pg_verifybackup: TAR format backup verification
Hi,
Currently, pg_verifybackup only works with plain (directory) format backups.
This proposal aims to add support for tar-format backups as well. The tar
files will be read from start to finish, and each file inside will be
verified against the backup_manifest information, just as plain files are
verified today.
We are introducing new options to pg_verifybackup: -F, --format=p|t and -Z,
--compress=METHOD, which allow users to specify the backup format and
compression type, similar to the corresponding pg_basebackup options. If
these options are not provided, the backup format and compression type will
be detected automatically. To determine the format, we look for the
PG_VERSION file in the backup directory: if it is present, the backup is in
plain format; otherwise, it is a tar-format backup. To determine the
compression type, we check the extension of the base.tar.xxx file of a
tar-format backup. Refer to patch 0008 for the details.
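
For illustration only, here is a minimal sketch of that auto-detection logic;
the function names and the fixed-size path buffer are placeholders, not the
code in the attached patches:

    /* Hypothetical sketch of the format/compression auto-detection. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* Plain format if <backup_dir>/PG_VERSION exists; otherwise assume tar. */
    static bool
    backup_is_plain_format(const char *backup_dir)
    {
        char        path[4096];
        FILE       *fp;

        snprintf(path, sizeof(path), "%s/PG_VERSION", backup_dir);
        if ((fp = fopen(path, "r")) != NULL)
        {
            fclose(fp);
            return true;
        }
        return false;
    }

    /* Guess the compression method from the suffix of base.tar.xxx. */
    static const char *
    guess_compression(const char *tar_name)
    {
        const char *ext = strrchr(tar_name, '.');

        if (ext == NULL || strcmp(ext, ".tar") == 0)
            return "none";
        if (strcmp(ext, ".gz") == 0)
            return "gzip";
        if (strcmp(ext, ".lz4") == 0)
            return "lz4";
        if (strcmp(ext, ".zst") == 0)
            return "zstd";
        return "unknown";
    }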
The main challenge is to structure the code neatly. For plain-format backups,
we verify bytes read directly from the individual files. For tar-format
backups, the bytes of each file we care about must be read out of the
containing tar archive. We need an abstraction that handles both formats
cleanly, without scattering if statements or special cases through the code.
To achieve this, we want to reuse the existing infrastructure without
duplicating code, so most of the work here is refactoring. Here is a
breakdown of the work:
1. BBSTREAMER Rename and Relocate:
BBSTREAMER, currently used by pg_basebackup for reading and decompressing tar
files, can also be used by pg_verifybackup. In the future, it could serve
other tools as well, such as pg_combinebackup for merging tar backups without
extracting them, and pg_waldump for verifying WAL files inside a tar backup.
To make it accessible to those tools, BBSTREAMER needs to move to a shared
location.
Moreover, renaming BBSTREAMER to ASTREAMER (short for Archive Streamer)
better reflects its general applicability across multiple tools. Moving it to
the src/fe_utils directory is appropriate, given that it is frontend
infrastructure.
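
To make the intended use concrete, here is a rough sketch (not taken from the
patches) of how pg_verifybackup could chain the relocated streamers to walk a
possibly compressed tar archive. astreamer_verify_member_new() is a
hypothetical verifier streamer; the other constructors are the ones renamed
by patch 0001, and astreamer.h plus common/compression.h are assumed to be
included:

    /*
     * Rough, hypothetical sketch: feed a tar archive through an astreamer
     * chain so that each member can be checked against the manifest.
     * Error handling is omitted for brevity.
     */
    static void
    verify_tar_archive(const char *tar_path, pg_compress_algorithm algo)
    {
        astreamer  *streamer;
        FILE       *fp;
        char        buf[65536];
        size_t      nread;

        /* Innermost streamer: receives typed chunks and verifies them. */
        streamer = astreamer_verify_member_new();   /* hypothetical */

        /* Split the raw stream into member header/contents/trailer chunks. */
        streamer = astreamer_tar_parser_new(streamer);

        /* Decompress first if the archive is compressed. */
        if (algo == PG_COMPRESSION_GZIP)
            streamer = astreamer_gzip_decompressor_new(streamer);
        else if (algo == PG_COMPRESSION_LZ4)
            streamer = astreamer_lz4_decompressor_new(streamer);
        else if (algo == PG_COMPRESSION_ZSTD)
            streamer = astreamer_zstd_decompressor_new(streamer);

        fp = fopen(tar_path, "rb");
        while ((nread = fread(buf, 1, sizeof(buf), fp)) > 0)
            astreamer_content(streamer, NULL, buf, (int) nread,
                              ASTREAMER_UNKNOWN);
        fclose(fp);

        astreamer_finalize(streamer);
        astreamer_free(streamer);
    }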
2. pg_verifybackup Code Refactoring:
The existing code for plain backup verification will be split into separate
files or functions, so it can also be reused for tar backup verification.
3. Adding TAR Backup Verification:
Finally, patches will be added to implement TAR backup verification, along with
tests and documentation.
Patches 0001-0003 focus on renaming and relocating BBSTREAMER, patches
0004-0007 on splitting the existing verification code, and patches 0008-0010 on
adding TAR backup verification capabilities, tests, and documentation. The last
set could be a single patch but is split to make the review easier.
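
For the verification step itself, the rough idea is that the verifier
streamer's content callback looks each tar member up in the manifest and
feeds its payload into a checksum. A hypothetical sketch follows;
lookup_manifest_entry(), update_member_checksum(), finalize_member_checksum()
and report_extra_file() stand in for the existing pg_verifybackup machinery
and are not the names used in the patches:

    /* Hypothetical verifier streamer state. */
    typedef struct astreamer_verify
    {
        astreamer       base;
        manifest_file  *mfile;      /* current member's manifest entry, if any */
    } astreamer_verify;

    static void
    astreamer_verify_content(astreamer *streamer, astreamer_member *member,
                             const char *data, int len,
                             astreamer_archive_context context)
    {
        astreamer_verify *mystreamer = (astreamer_verify *) streamer;

        switch (context)
        {
            case ASTREAMER_MEMBER_HEADER:
                /* Look the file up in backup_manifest; complain if absent. */
                mystreamer->mfile = lookup_manifest_entry(member->pathname);
                if (mystreamer->mfile == NULL)
                    report_extra_file(member->pathname);
                break;

            case ASTREAMER_MEMBER_CONTENTS:
                /* Accumulate the checksum over the member's payload bytes. */
                if (mystreamer->mfile != NULL)
                    update_member_checksum(mystreamer->mfile, data, len);
                break;

            case ASTREAMER_MEMBER_TRAILER:
                /* Compare observed size and checksum with the manifest entry. */
                if (mystreamer->mfile != NULL)
                    finalize_member_checksum(mystreamer->mfile);
                break;

            default:
                break;
        }
    }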
Please take a look at the attached patches and share your comments,
suggestions, or any ways to enhance them. Your feedback is greatly
appreciated.
Thank you!
--
Regards,
Amul Sul
EDB: http://www.enterprisedb.com
Attachments:
v1-0002-Refactor-Add-astreamer_inject.h-and-move-related-.patch
From 3d851acbab2a141b4a23b21c83260a4bace354fb Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Mon, 17 Jun 2024 15:33:15 +0530
Subject: [PATCH v1 02/10] Refactor: Add astreamer_inject.h and move related
declarations to it.
---
src/bin/pg_basebackup/astreamer.h | 7 -------
src/bin/pg_basebackup/astreamer_inject.c | 2 +-
src/bin/pg_basebackup/astreamer_inject.h | 24 ++++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 2 +-
4 files changed, 26 insertions(+), 9 deletions(-)
create mode 100644 src/bin/pg_basebackup/astreamer_inject.h
diff --git a/src/bin/pg_basebackup/astreamer.h b/src/bin/pg_basebackup/astreamer.h
index 6b0047418bb..b4b9e381900 100644
--- a/src/bin/pg_basebackup/astreamer.h
+++ b/src/bin/pg_basebackup/astreamer.h
@@ -24,7 +24,6 @@
#include "common/compression.h"
#include "lib/stringinfo.h"
-#include "pqexpbuffer.h"
struct astreamer;
struct astreamer_ops;
@@ -217,10 +216,4 @@ extern astreamer *astreamer_tar_parser_new(astreamer *next);
extern astreamer *astreamer_tar_terminator_new(astreamer *next);
extern astreamer *astreamer_tar_archiver_new(astreamer *next);
-extern astreamer *astreamer_recovery_injector_new(astreamer *next,
- bool is_recovery_guc_supported,
- PQExpBuffer recoveryconfcontents);
-extern void astreamer_inject_file(astreamer *streamer, char *pathname,
- char *data, int len);
-
#endif
diff --git a/src/bin/pg_basebackup/astreamer_inject.c b/src/bin/pg_basebackup/astreamer_inject.c
index 7f1decded8d..4ad8381f102 100644
--- a/src/bin/pg_basebackup/astreamer_inject.c
+++ b/src/bin/pg_basebackup/astreamer_inject.c
@@ -11,7 +11,7 @@
#include "postgres_fe.h"
-#include "astreamer.h"
+#include "astreamer_inject.h"
#include "common/file_perm.h"
#include "common/logging.h"
diff --git a/src/bin/pg_basebackup/astreamer_inject.h b/src/bin/pg_basebackup/astreamer_inject.h
new file mode 100644
index 00000000000..8504b3f5e0d
--- /dev/null
+++ b/src/bin/pg_basebackup/astreamer_inject.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_inject.h
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/astreamer_inject.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef ASTREAMER_INJECT_H
+#define ASTREAMER_INJECT_H
+
+#include "astreamer.h"
+#include "pqexpbuffer.h"
+
+extern astreamer *astreamer_recovery_injector_new(astreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents);
+extern void astreamer_inject_file(astreamer *streamer, char *pathname,
+ char *data, int len);
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 4179b064cbc..1e753e40c97 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -26,7 +26,7 @@
#endif
#include "access/xlog_internal.h"
-#include "astreamer.h"
+#include "astreamer_inject.h"
#include "backup/basebackup.h"
#include "common/compression.h"
#include "common/file_perm.h"
--
2.18.0
v1-0001-Refactor-Rename-all-bbstreamer-references-to-astr.patch
From a3101a0bde8f5a17c0a0cbc4bddbbebd86d0c7e9 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 09:39:32 +0530
Subject: [PATCH v1 01/10] Refactor: Rename all bbstreamer references to
astreamer.
BBSTREAMER is specific to pg_basebackup; we need a more generalized
name so it can be placed in a common area, making it accessible for
other modules. Renaming it to ASTREAMER, short for ARCHIVE STREAMER,
makes it more general.
---
src/bin/pg_basebackup/Makefile | 12 +-
src/bin/pg_basebackup/astreamer.h | 226 +++++++++++++
.../{bbstreamer_file.c => astreamer_file.c} | 148 ++++----
.../{bbstreamer_gzip.c => astreamer_gzip.c} | 154 ++++-----
...bbstreamer_inject.c => astreamer_inject.c} | 152 ++++-----
.../{bbstreamer_lz4.c => astreamer_lz4.c} | 172 +++++-----
.../{bbstreamer_tar.c => astreamer_tar.c} | 316 +++++++++---------
.../{bbstreamer_zstd.c => astreamer_zstd.c} | 160 ++++-----
src/bin/pg_basebackup/bbstreamer.h | 226 -------------
src/bin/pg_basebackup/meson.build | 12 +-
src/bin/pg_basebackup/nls.mk | 12 +-
src/bin/pg_basebackup/pg_basebackup.c | 74 ++--
src/tools/pgindent/typedefs.list | 26 +-
13 files changed, 845 insertions(+), 845 deletions(-)
create mode 100644 src/bin/pg_basebackup/astreamer.h
rename src/bin/pg_basebackup/{bbstreamer_file.c => astreamer_file.c} (69%)
rename src/bin/pg_basebackup/{bbstreamer_gzip.c => astreamer_gzip.c} (62%)
rename src/bin/pg_basebackup/{bbstreamer_inject.c => astreamer_inject.c} (53%)
rename src/bin/pg_basebackup/{bbstreamer_lz4.c => astreamer_lz4.c} (69%)
rename src/bin/pg_basebackup/{bbstreamer_tar.c => astreamer_tar.c} (50%)
rename src/bin/pg_basebackup/{bbstreamer_zstd.c => astreamer_zstd.c} (64%)
delete mode 100644 src/bin/pg_basebackup/bbstreamer.h
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 26c53e473f5..a71af2d48a7 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -37,12 +37,12 @@ OBJS = \
BBOBJS = \
pg_basebackup.o \
- bbstreamer_file.o \
- bbstreamer_gzip.o \
- bbstreamer_inject.o \
- bbstreamer_lz4.o \
- bbstreamer_tar.o \
- bbstreamer_zstd.o
+ astreamer_file.o \
+ astreamer_gzip.o \
+ astreamer_inject.o \
+ astreamer_lz4.o \
+ astreamer_tar.o \
+ astreamer_zstd.o
all: pg_basebackup pg_createsubscriber pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/astreamer.h b/src/bin/pg_basebackup/astreamer.h
new file mode 100644
index 00000000000..6b0047418bb
--- /dev/null
+++ b/src/bin/pg_basebackup/astreamer.h
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer.h
+ *
+ * Each tar archive returned by the server is passed to one or more
+ * astreamer objects for further processing. The astreamer may do
+ * something simple, like write the archive to a file, perhaps after
+ * compressing it, but it can also do more complicated things, like
+ * annotating the byte stream to indicate which parts of the data
+ * correspond to tar headers or trailing padding, vs. which parts are
+ * payload data. A subsequent astreamer may use this information to
+ * make further decisions about how to process the data; for example,
+ * it might choose to modify the archive contents.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/astreamer.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef ASTREAMER_H
+#define ASTREAMER_H
+
+#include "common/compression.h"
+#include "lib/stringinfo.h"
+#include "pqexpbuffer.h"
+
+struct astreamer;
+struct astreamer_ops;
+typedef struct astreamer astreamer;
+typedef struct astreamer_ops astreamer_ops;
+
+/*
+ * Each chunk of archive data passed to a astreamer is classified into one
+ * of these categories. When data is first received from the remote server,
+ * each chunk will be categorized as ASTREAMER_UNKNOWN, and the chunks will
+ * be of whatever size the remote server chose to send.
+ *
+ * If the archive is parsed (e.g. see astreamer_tar_parser_new()), then all
+ * chunks should be labelled as one of the other types listed here. In
+ * addition, there should be exactly one ASTREAMER_MEMBER_HEADER chunk and
+ * exactly one ASTREAMER_MEMBER_TRAILER chunk per archive member, even if
+ * that means a zero-length call. There can be any number of
+ * ASTREAMER_MEMBER_CONTENTS chunks in between those calls. There
+ * should be exactly one ASTREAMER_ARCHIVE_TRAILER chunk, and it should follow the
+ * last ASTREAMER_MEMBER_TRAILER chunk.
+ *
+ * In theory, we could need other classifications here, such as a way of
+ * indicating an archive header, but the "tar" format doesn't need anything
+ * else, so for the time being there's no point.
+ */
+typedef enum
+{
+ ASTREAMER_UNKNOWN,
+ ASTREAMER_MEMBER_HEADER,
+ ASTREAMER_MEMBER_CONTENTS,
+ ASTREAMER_MEMBER_TRAILER,
+ ASTREAMER_ARCHIVE_TRAILER,
+} astreamer_archive_context;
+
+/*
+ * Each chunk of data that is classified as ASTREAMER_MEMBER_HEADER,
+ * ASTREAMER_MEMBER_CONTENTS, or ASTREAMER_MEMBER_TRAILER should also
+ * pass a pointer to an instance of this struct. The details are expected
+ * to be present in the archive header and used to fill the struct, after
+ * which all subsequent calls for the same archive member are expected to
+ * pass the same details.
+ */
+typedef struct
+{
+ char pathname[MAXPGPATH];
+ pgoff_t size;
+ mode_t mode;
+ uid_t uid;
+ gid_t gid;
+ bool is_directory;
+ bool is_link;
+ char linktarget[MAXPGPATH];
+} astreamer_member;
+
+/*
+ * Generally, each type of astreamer will define its own struct, but the
+ * first element should be 'astreamer base'. A astreamer that does not
+ * require any additional private data could use this structure directly.
+ *
+ * bbs_ops is a pointer to the astreamer_ops object which contains the
+ * function pointers appropriate to this type of astreamer.
+ *
+ * bbs_next is a pointer to the successor astreamer, for those types of
+ * astreamer which forward data to a successor. It need not be used and
+ * should be set to NULL when not relevant.
+ *
+ * bbs_buffer is a buffer for accumulating data for temporary storage. Each
+ * type of astreamer makes its own decisions about whether and how to use
+ * this buffer.
+ */
+struct astreamer
+{
+ const astreamer_ops *bbs_ops;
+ astreamer *bbs_next;
+ StringInfoData bbs_buffer;
+};
+
+/*
+ * There are three callbacks for a astreamer. The 'content' callback is
+ * called repeatedly, as described in the astreamer_archive_context comments.
+ * Then, the 'finalize' callback is called once at the end, to give the
+ * astreamer a chance to perform cleanup such as closing files. Finally,
+ * because this code is running in a frontend environment where, as of this
+ * writing, there are no memory contexts, the 'free' callback is called to
+ * release memory. These callbacks should always be invoked using the static
+ * inline functions defined below.
+ */
+struct astreamer_ops
+{
+ void (*content) (astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+ void (*finalize) (astreamer *streamer);
+ void (*free) (astreamer *streamer);
+};
+
+/* Send some content to a astreamer. */
+static inline void
+astreamer_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->content(streamer, member, data, len, context);
+}
+
+/* Finalize a astreamer. */
+static inline void
+astreamer_finalize(astreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->finalize(streamer);
+}
+
+/* Free a astreamer. */
+static inline void
+astreamer_free(astreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->free(streamer);
+}
+
+/*
+ * This is a convenience method for use when implementing a astreamer; it is
+ * not for use by outside callers. It adds the amount of data specified by
+ * 'nbytes' to the astreamer's buffer and adjusts '*len' and '*data'
+ * accordingly.
+ */
+static inline void
+astreamer_buffer_bytes(astreamer *streamer, const char **data, int *len,
+ int nbytes)
+{
+ Assert(nbytes <= *len);
+
+ appendBinaryStringInfo(&streamer->bbs_buffer, *data, nbytes);
+ *len -= nbytes;
+ *data += nbytes;
+}
+
+/*
+ * This is a convenience method for use when implementing a astreamer; it is
+ * not for use by outsider callers. It attempts to add enough data to the
+ * astreamer's buffer to reach a length of target_bytes and adjusts '*len'
+ * and '*data' accordingly. It returns true if the target length has been
+ * reached and false otherwise.
+ */
+static inline bool
+astreamer_buffer_until(astreamer *streamer, const char **data, int *len,
+ int target_bytes)
+{
+ int buflen = streamer->bbs_buffer.len;
+
+ if (buflen >= target_bytes)
+ {
+ /* Target length already reached; nothing to do. */
+ return true;
+ }
+
+ if (buflen + *len < target_bytes)
+ {
+ /* Not enough data to reach target length; buffer all of it. */
+ astreamer_buffer_bytes(streamer, data, len, *len);
+ return false;
+ }
+
+ /* Buffer just enough to reach the target length. */
+ astreamer_buffer_bytes(streamer, data, len, target_bytes - buflen);
+ return true;
+}
+
+/*
+ * Functions for creating astreamer objects of various types. See the header
+ * comments for each of these functions for details.
+ */
+extern astreamer *astreamer_plain_writer_new(char *pathname, FILE *file);
+extern astreamer *astreamer_gzip_writer_new(char *pathname, FILE *file,
+ pg_compress_specification *compress);
+extern astreamer *astreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *));
+
+extern astreamer *astreamer_gzip_decompressor_new(astreamer *next);
+extern astreamer *astreamer_lz4_compressor_new(astreamer *next,
+ pg_compress_specification *compress);
+extern astreamer *astreamer_lz4_decompressor_new(astreamer *next);
+extern astreamer *astreamer_zstd_compressor_new(astreamer *next,
+ pg_compress_specification *compress);
+extern astreamer *astreamer_zstd_decompressor_new(astreamer *next);
+extern astreamer *astreamer_tar_parser_new(astreamer *next);
+extern astreamer *astreamer_tar_terminator_new(astreamer *next);
+extern astreamer *astreamer_tar_archiver_new(astreamer *next);
+
+extern astreamer *astreamer_recovery_injector_new(astreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents);
+extern void astreamer_inject_file(astreamer *streamer, char *pathname,
+ char *data, int len);
+
+#endif
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/astreamer_file.c
similarity index 69%
rename from src/bin/pg_basebackup/bbstreamer_file.c
rename to src/bin/pg_basebackup/astreamer_file.c
index bab6cd4a6b1..2742385e103 100644
--- a/src/bin/pg_basebackup/bbstreamer_file.c
+++ b/src/bin/pg_basebackup/astreamer_file.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_file.c
+ * astreamer_file.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_file.c
+ * src/bin/pg_basebackup/astreamer_file.c
*-------------------------------------------------------------------------
*/
@@ -13,60 +13,60 @@
#include <unistd.h>
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
-typedef struct bbstreamer_plain_writer
+typedef struct astreamer_plain_writer
{
- bbstreamer base;
+ astreamer base;
char *pathname;
FILE *file;
bool should_close_file;
-} bbstreamer_plain_writer;
+} astreamer_plain_writer;
-typedef struct bbstreamer_extractor
+typedef struct astreamer_extractor
{
- bbstreamer base;
+ astreamer base;
char *basepath;
const char *(*link_map) (const char *);
void (*report_output_file) (const char *);
char filename[MAXPGPATH];
FILE *file;
-} bbstreamer_extractor;
+} astreamer_extractor;
-static void bbstreamer_plain_writer_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_plain_writer_finalize(bbstreamer *streamer);
-static void bbstreamer_plain_writer_free(bbstreamer *streamer);
+static void astreamer_plain_writer_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_plain_writer_finalize(astreamer *streamer);
+static void astreamer_plain_writer_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_plain_writer_ops = {
- .content = bbstreamer_plain_writer_content,
- .finalize = bbstreamer_plain_writer_finalize,
- .free = bbstreamer_plain_writer_free
+static const astreamer_ops astreamer_plain_writer_ops = {
+ .content = astreamer_plain_writer_content,
+ .finalize = astreamer_plain_writer_finalize,
+ .free = astreamer_plain_writer_free
};
-static void bbstreamer_extractor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_extractor_finalize(bbstreamer *streamer);
-static void bbstreamer_extractor_free(bbstreamer *streamer);
+static void astreamer_extractor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_extractor_finalize(astreamer *streamer);
+static void astreamer_extractor_free(astreamer *streamer);
static void extract_directory(const char *filename, mode_t mode);
static void extract_link(const char *filename, const char *linktarget);
static FILE *create_file_for_extract(const char *filename, mode_t mode);
-static const bbstreamer_ops bbstreamer_extractor_ops = {
- .content = bbstreamer_extractor_content,
- .finalize = bbstreamer_extractor_finalize,
- .free = bbstreamer_extractor_free
+static const astreamer_ops astreamer_extractor_ops = {
+ .content = astreamer_extractor_content,
+ .finalize = astreamer_extractor_finalize,
+ .free = astreamer_extractor_free
};
/*
- * Create a bbstreamer that just writes data to a file.
+ * Create a astreamer that just writes data to a file.
*
* The caller must specify a pathname and may specify a file. The pathname is
* used for error-reporting purposes either way. If file is NULL, the pathname
@@ -74,14 +74,14 @@ static const bbstreamer_ops bbstreamer_extractor_ops = {
* for writing and closed when done. If file is not NULL, the data is written
* there.
*/
-bbstreamer *
-bbstreamer_plain_writer_new(char *pathname, FILE *file)
+astreamer *
+astreamer_plain_writer_new(char *pathname, FILE *file)
{
- bbstreamer_plain_writer *streamer;
+ astreamer_plain_writer *streamer;
- streamer = palloc0(sizeof(bbstreamer_plain_writer));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_plain_writer_ops;
+ streamer = palloc0(sizeof(astreamer_plain_writer));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_plain_writer_ops;
streamer->pathname = pstrdup(pathname);
streamer->file = file;
@@ -101,13 +101,13 @@ bbstreamer_plain_writer_new(char *pathname, FILE *file)
* Write archive content to file.
*/
static void
-bbstreamer_plain_writer_content(bbstreamer *streamer,
- bbstreamer_member *member, const char *data,
- int len, bbstreamer_archive_context context)
+astreamer_plain_writer_content(astreamer *streamer,
+ astreamer_member *member, const char *data,
+ int len, astreamer_archive_context context)
{
- bbstreamer_plain_writer *mystreamer;
+ astreamer_plain_writer *mystreamer;
- mystreamer = (bbstreamer_plain_writer *) streamer;
+ mystreamer = (astreamer_plain_writer *) streamer;
if (len == 0)
return;
@@ -128,11 +128,11 @@ bbstreamer_plain_writer_content(bbstreamer *streamer,
* the file if we opened it, but not if the caller provided it.
*/
static void
-bbstreamer_plain_writer_finalize(bbstreamer *streamer)
+astreamer_plain_writer_finalize(astreamer *streamer)
{
- bbstreamer_plain_writer *mystreamer;
+ astreamer_plain_writer *mystreamer;
- mystreamer = (bbstreamer_plain_writer *) streamer;
+ mystreamer = (astreamer_plain_writer *) streamer;
if (mystreamer->should_close_file && fclose(mystreamer->file) != 0)
pg_fatal("could not close file \"%s\": %m",
@@ -143,14 +143,14 @@ bbstreamer_plain_writer_finalize(bbstreamer *streamer)
}
/*
- * Free memory associated with this bbstreamer.
+ * Free memory associated with this astreamer.
*/
static void
-bbstreamer_plain_writer_free(bbstreamer *streamer)
+astreamer_plain_writer_free(astreamer *streamer)
{
- bbstreamer_plain_writer *mystreamer;
+ astreamer_plain_writer *mystreamer;
- mystreamer = (bbstreamer_plain_writer *) streamer;
+ mystreamer = (astreamer_plain_writer *) streamer;
Assert(!mystreamer->should_close_file);
Assert(mystreamer->base.bbs_next == NULL);
@@ -160,13 +160,13 @@ bbstreamer_plain_writer_free(bbstreamer *streamer)
}
/*
- * Create a bbstreamer that extracts an archive.
+ * Create a astreamer that extracts an archive.
*
* All pathnames in the archive are interpreted relative to basepath.
*
- * Unlike e.g. bbstreamer_plain_writer_new() we can't do anything useful here
+ * Unlike e.g. astreamer_plain_writer_new() we can't do anything useful here
* with untyped chunks; we need typed chunks which follow the rules described
- * in bbstreamer.h. Assuming we have that, we don't need to worry about the
+ * in astreamer.h. Assuming we have that, we don't need to worry about the
* original archive format; it's enough to just look at the member information
* provided and write to the corresponding file.
*
@@ -179,16 +179,16 @@ bbstreamer_plain_writer_free(bbstreamer *streamer)
* new output file. The pathname to that file is passed as an argument. If
* NULL, the call is skipped.
*/
-bbstreamer *
-bbstreamer_extractor_new(const char *basepath,
- const char *(*link_map) (const char *),
- void (*report_output_file) (const char *))
+astreamer *
+astreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *))
{
- bbstreamer_extractor *streamer;
+ astreamer_extractor *streamer;
- streamer = palloc0(sizeof(bbstreamer_extractor));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_extractor_ops;
+ streamer = palloc0(sizeof(astreamer_extractor));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_extractor_ops;
streamer->basepath = pstrdup(basepath);
streamer->link_map = link_map;
streamer->report_output_file = report_output_file;
@@ -200,19 +200,19 @@ bbstreamer_extractor_new(const char *basepath,
* Extract archive contents to the filesystem.
*/
static void
-bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_extractor_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+ astreamer_extractor *mystreamer = (astreamer_extractor *) streamer;
int fnamelen;
- Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
- Assert(context != BBSTREAMER_UNKNOWN);
+ Assert(member != NULL || context == ASTREAMER_ARCHIVE_TRAILER);
+ Assert(context != ASTREAMER_UNKNOWN);
switch (context)
{
- case BBSTREAMER_MEMBER_HEADER:
+ case ASTREAMER_MEMBER_HEADER:
Assert(mystreamer->file == NULL);
/* Prepend basepath. */
@@ -245,7 +245,7 @@ bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
mystreamer->report_output_file(mystreamer->filename);
break;
- case BBSTREAMER_MEMBER_CONTENTS:
+ case ASTREAMER_MEMBER_CONTENTS:
if (mystreamer->file == NULL)
break;
@@ -260,14 +260,14 @@ bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
}
break;
- case BBSTREAMER_MEMBER_TRAILER:
+ case ASTREAMER_MEMBER_TRAILER:
if (mystreamer->file == NULL)
break;
fclose(mystreamer->file);
mystreamer->file = NULL;
break;
- case BBSTREAMER_ARCHIVE_TRAILER:
+ case ASTREAMER_ARCHIVE_TRAILER:
break;
default:
@@ -375,10 +375,10 @@ create_file_for_extract(const char *filename, mode_t mode)
* There's nothing to do here but sanity checking.
*/
static void
-bbstreamer_extractor_finalize(bbstreamer *streamer)
+astreamer_extractor_finalize(astreamer *streamer)
{
- bbstreamer_extractor *mystreamer PG_USED_FOR_ASSERTS_ONLY
- = (bbstreamer_extractor *) streamer;
+ astreamer_extractor *mystreamer PG_USED_FOR_ASSERTS_ONLY
+ = (astreamer_extractor *) streamer;
Assert(mystreamer->file == NULL);
}
@@ -387,9 +387,9 @@ bbstreamer_extractor_finalize(bbstreamer *streamer)
* Free memory.
*/
static void
-bbstreamer_extractor_free(bbstreamer *streamer)
+astreamer_extractor_free(astreamer *streamer)
{
- bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+ astreamer_extractor *mystreamer = (astreamer_extractor *) streamer;
pfree(mystreamer->basepath);
pfree(mystreamer);
diff --git a/src/bin/pg_basebackup/bbstreamer_gzip.c b/src/bin/pg_basebackup/astreamer_gzip.c
similarity index 62%
rename from src/bin/pg_basebackup/bbstreamer_gzip.c
rename to src/bin/pg_basebackup/astreamer_gzip.c
index 0417fd9bc2c..6f7c27afbbc 100644
--- a/src/bin/pg_basebackup/bbstreamer_gzip.c
+++ b/src/bin/pg_basebackup/astreamer_gzip.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_gzip.c
+ * astreamer_gzip.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_gzip.c
+ * src/bin/pg_basebackup/astreamer_gzip.c
*-------------------------------------------------------------------------
*/
@@ -17,74 +17,74 @@
#include <zlib.h>
#endif
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
#ifdef HAVE_LIBZ
-typedef struct bbstreamer_gzip_writer
+typedef struct astreamer_gzip_writer
{
- bbstreamer base;
+ astreamer base;
char *pathname;
gzFile gzfile;
-} bbstreamer_gzip_writer;
+} astreamer_gzip_writer;
-typedef struct bbstreamer_gzip_decompressor
+typedef struct astreamer_gzip_decompressor
{
- bbstreamer base;
+ astreamer base;
z_stream zstream;
size_t bytes_written;
-} bbstreamer_gzip_decompressor;
+} astreamer_gzip_decompressor;
-static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
-static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
+static void astreamer_gzip_writer_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_gzip_writer_finalize(astreamer *streamer);
+static void astreamer_gzip_writer_free(astreamer *streamer);
static const char *get_gz_error(gzFile gzf);
-static const bbstreamer_ops bbstreamer_gzip_writer_ops = {
- .content = bbstreamer_gzip_writer_content,
- .finalize = bbstreamer_gzip_writer_finalize,
- .free = bbstreamer_gzip_writer_free
+static const astreamer_ops astreamer_gzip_writer_ops = {
+ .content = astreamer_gzip_writer_content,
+ .finalize = astreamer_gzip_writer_finalize,
+ .free = astreamer_gzip_writer_free
};
-static void bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_gzip_decompressor_finalize(bbstreamer *streamer);
-static void bbstreamer_gzip_decompressor_free(bbstreamer *streamer);
+static void astreamer_gzip_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_gzip_decompressor_finalize(astreamer *streamer);
+static void astreamer_gzip_decompressor_free(astreamer *streamer);
static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
static void gzip_pfree(void *opaque, void *address);
-static const bbstreamer_ops bbstreamer_gzip_decompressor_ops = {
- .content = bbstreamer_gzip_decompressor_content,
- .finalize = bbstreamer_gzip_decompressor_finalize,
- .free = bbstreamer_gzip_decompressor_free
+static const astreamer_ops astreamer_gzip_decompressor_ops = {
+ .content = astreamer_gzip_decompressor_content,
+ .finalize = astreamer_gzip_decompressor_finalize,
+ .free = astreamer_gzip_decompressor_free
};
#endif
/*
- * Create a bbstreamer that just compresses data using gzip, and then writes
+ * Create a astreamer that just compresses data using gzip, and then writes
* it to a file.
*
- * As in the case of bbstreamer_plain_writer_new, pathname is always used
+ * As in the case of astreamer_plain_writer_new, pathname is always used
* for error reporting purposes; if file is NULL, it is also the opened and
* closed so that the data may be written there.
*/
-bbstreamer *
-bbstreamer_gzip_writer_new(char *pathname, FILE *file,
- pg_compress_specification *compress)
+astreamer *
+astreamer_gzip_writer_new(char *pathname, FILE *file,
+ pg_compress_specification *compress)
{
#ifdef HAVE_LIBZ
- bbstreamer_gzip_writer *streamer;
+ astreamer_gzip_writer *streamer;
- streamer = palloc0(sizeof(bbstreamer_gzip_writer));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_gzip_writer_ops;
+ streamer = palloc0(sizeof(astreamer_gzip_writer));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_gzip_writer_ops;
streamer->pathname = pstrdup(pathname);
@@ -123,13 +123,13 @@ bbstreamer_gzip_writer_new(char *pathname, FILE *file,
* Write archive content to gzip file.
*/
static void
-bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member, const char *data,
- int len, bbstreamer_archive_context context)
+astreamer_gzip_writer_content(astreamer *streamer,
+ astreamer_member *member, const char *data,
+ int len, astreamer_archive_context context)
{
- bbstreamer_gzip_writer *mystreamer;
+ astreamer_gzip_writer *mystreamer;
- mystreamer = (bbstreamer_gzip_writer *) streamer;
+ mystreamer = (astreamer_gzip_writer *) streamer;
if (len == 0)
return;
@@ -151,16 +151,16 @@ bbstreamer_gzip_writer_content(bbstreamer *streamer,
*
* It makes no difference whether we opened the file or the caller did it,
* because libz provides no way of avoiding a close on the underlying file
- * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
+ * handle. Notice, however, that astreamer_gzip_writer_new() uses dup() to
* work around this issue, so that the behavior from the caller's viewpoint
- * is the same as for bbstreamer_plain_writer.
+ * is the same as for astreamer_plain_writer.
*/
static void
-bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
+astreamer_gzip_writer_finalize(astreamer *streamer)
{
- bbstreamer_gzip_writer *mystreamer;
+ astreamer_gzip_writer *mystreamer;
- mystreamer = (bbstreamer_gzip_writer *) streamer;
+ mystreamer = (astreamer_gzip_writer *) streamer;
errno = 0; /* in case gzclose() doesn't set it */
if (gzclose(mystreamer->gzfile) != 0)
@@ -171,14 +171,14 @@ bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
}
/*
- * Free memory associated with this bbstreamer.
+ * Free memory associated with this astreamer.
*/
static void
-bbstreamer_gzip_writer_free(bbstreamer *streamer)
+astreamer_gzip_writer_free(astreamer *streamer)
{
- bbstreamer_gzip_writer *mystreamer;
+ astreamer_gzip_writer *mystreamer;
- mystreamer = (bbstreamer_gzip_writer *) streamer;
+ mystreamer = (astreamer_gzip_writer *) streamer;
Assert(mystreamer->base.bbs_next == NULL);
Assert(mystreamer->gzfile == NULL);
@@ -208,18 +208,18 @@ get_gz_error(gzFile gzf)
* Create a new base backup streamer that performs decompression of gzip
* compressed blocks.
*/
-bbstreamer *
-bbstreamer_gzip_decompressor_new(bbstreamer *next)
+astreamer *
+astreamer_gzip_decompressor_new(astreamer *next)
{
#ifdef HAVE_LIBZ
- bbstreamer_gzip_decompressor *streamer;
+ astreamer_gzip_decompressor *streamer;
z_stream *zs;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_gzip_decompressor));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_gzip_decompressor_ops;
+ streamer = palloc0(sizeof(astreamer_gzip_decompressor));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_gzip_decompressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -258,15 +258,15 @@ bbstreamer_gzip_decompressor_new(bbstreamer *next)
* to the next streamer.
*/
static void
-bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_gzip_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_gzip_decompressor *mystreamer;
+ astreamer_gzip_decompressor *mystreamer;
z_stream *zs;
- mystreamer = (bbstreamer_gzip_decompressor *) streamer;
+ mystreamer = (astreamer_gzip_decompressor *) streamer;
zs = &mystreamer->zstream;
zs->next_in = (const uint8 *) data;
@@ -301,9 +301,9 @@ bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
/* If output buffer is full then pass data to next streamer */
if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen, context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen, context);
mystreamer->bytes_written = 0;
}
}
@@ -313,31 +313,31 @@ bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_gzip_decompressor_finalize(bbstreamer *streamer)
+astreamer_gzip_decompressor_finalize(astreamer *streamer)
{
- bbstreamer_gzip_decompressor *mystreamer;
+ astreamer_gzip_decompressor *mystreamer;
- mystreamer = (bbstreamer_gzip_decompressor *) streamer;
+ mystreamer = (astreamer_gzip_decompressor *) streamer;
/*
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_gzip_decompressor_free(bbstreamer *streamer)
+astreamer_gzip_decompressor_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_inject.c b/src/bin/pg_basebackup/astreamer_inject.c
similarity index 53%
rename from src/bin/pg_basebackup/bbstreamer_inject.c
rename to src/bin/pg_basebackup/astreamer_inject.c
index 194026b56e9..7f1decded8d 100644
--- a/src/bin/pg_basebackup/bbstreamer_inject.c
+++ b/src/bin/pg_basebackup/astreamer_inject.c
@@ -1,51 +1,51 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_inject.c
+ * astreamer_inject.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_inject.c
+ * src/bin/pg_basebackup/astreamer_inject.c
*-------------------------------------------------------------------------
*/
#include "postgres_fe.h"
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
-typedef struct bbstreamer_recovery_injector
+typedef struct astreamer_recovery_injector
{
- bbstreamer base;
+ astreamer base;
bool skip_file;
bool is_recovery_guc_supported;
bool is_postgresql_auto_conf;
bool found_postgresql_auto_conf;
PQExpBuffer recoveryconfcontents;
- bbstreamer_member member;
-} bbstreamer_recovery_injector;
+ astreamer_member member;
+} astreamer_recovery_injector;
-static void bbstreamer_recovery_injector_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_recovery_injector_finalize(bbstreamer *streamer);
-static void bbstreamer_recovery_injector_free(bbstreamer *streamer);
+static void astreamer_recovery_injector_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_recovery_injector_finalize(astreamer *streamer);
+static void astreamer_recovery_injector_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_recovery_injector_ops = {
- .content = bbstreamer_recovery_injector_content,
- .finalize = bbstreamer_recovery_injector_finalize,
- .free = bbstreamer_recovery_injector_free
+static const astreamer_ops astreamer_recovery_injector_ops = {
+ .content = astreamer_recovery_injector_content,
+ .finalize = astreamer_recovery_injector_finalize,
+ .free = astreamer_recovery_injector_free
};
/*
- * Create a bbstreamer that can edit recoverydata into an archive stream.
+ * Create a astreamer that can edit recoverydata into an archive stream.
*
- * The input should be a series of typed chunks (not BBSTREAMER_UNKNOWN) as
- * per the conventions described in bbstreamer.h; the chunks forwarded to
- * the next bbstreamer will be similarly typed, but the
- * BBSTREAMER_MEMBER_HEADER chunks may be zero-length in cases where we've
+ * The input should be a series of typed chunks (not ASTREAMER_UNKNOWN) as
+ * per the conventions described in astreamer.h; the chunks forwarded to
+ * the next astreamer will be similarly typed, but the
+ * ASTREAMER_MEMBER_HEADER chunks may be zero-length in cases where we've
* edited the archive stream.
*
* Our goal is to do one of the following three things with the content passed
@@ -61,16 +61,16 @@ static const bbstreamer_ops bbstreamer_recovery_injector_ops = {
* zero-length standby.signal file, dropping any file with that name from
* the archive.
*/
-bbstreamer *
-bbstreamer_recovery_injector_new(bbstreamer *next,
- bool is_recovery_guc_supported,
- PQExpBuffer recoveryconfcontents)
+astreamer *
+astreamer_recovery_injector_new(astreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents)
{
- bbstreamer_recovery_injector *streamer;
+ astreamer_recovery_injector *streamer;
- streamer = palloc0(sizeof(bbstreamer_recovery_injector));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_recovery_injector_ops;
+ streamer = palloc0(sizeof(astreamer_recovery_injector));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_recovery_injector_ops;
streamer->base.bbs_next = next;
streamer->is_recovery_guc_supported = is_recovery_guc_supported;
streamer->recoveryconfcontents = recoveryconfcontents;
@@ -82,21 +82,21 @@ bbstreamer_recovery_injector_new(bbstreamer *next,
* Handle each chunk of tar content while injecting recovery configuration.
*/
static void
-bbstreamer_recovery_injector_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_recovery_injector_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_recovery_injector *mystreamer;
+ astreamer_recovery_injector *mystreamer;
- mystreamer = (bbstreamer_recovery_injector *) streamer;
- Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+ mystreamer = (astreamer_recovery_injector *) streamer;
+ Assert(member != NULL || context == ASTREAMER_ARCHIVE_TRAILER);
switch (context)
{
- case BBSTREAMER_MEMBER_HEADER:
+ case ASTREAMER_MEMBER_HEADER:
/* Must copy provided data so we have the option to modify it. */
- memcpy(&mystreamer->member, member, sizeof(bbstreamer_member));
+ memcpy(&mystreamer->member, member, sizeof(astreamer_member));
/*
* On v12+, skip standby.signal and edit postgresql.auto.conf; on
@@ -119,8 +119,8 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
/*
* Zap data and len because the archive header is no
- * longer valid; some subsequent bbstreamer must
- * regenerate it if it's necessary.
+ * longer valid; some subsequent astreamer must regenerate
+ * it if it's necessary.
*/
data = NULL;
len = 0;
@@ -135,26 +135,26 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
return;
break;
- case BBSTREAMER_MEMBER_CONTENTS:
+ case ASTREAMER_MEMBER_CONTENTS:
/* Do not forward if the file is to be skipped. */
if (mystreamer->skip_file)
return;
break;
- case BBSTREAMER_MEMBER_TRAILER:
+ case ASTREAMER_MEMBER_TRAILER:
/* Do not forward it the file is to be skipped. */
if (mystreamer->skip_file)
return;
/* Append provided content to whatever we already sent. */
if (mystreamer->is_postgresql_auto_conf)
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->recoveryconfcontents->data,
- mystreamer->recoveryconfcontents->len,
- BBSTREAMER_MEMBER_CONTENTS);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len,
+ ASTREAMER_MEMBER_CONTENTS);
break;
- case BBSTREAMER_ARCHIVE_TRAILER:
+ case ASTREAMER_ARCHIVE_TRAILER:
if (mystreamer->is_recovery_guc_supported)
{
/*
@@ -163,22 +163,22 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
* member now.
*/
if (!mystreamer->found_postgresql_auto_conf)
- bbstreamer_inject_file(mystreamer->base.bbs_next,
- "postgresql.auto.conf",
- mystreamer->recoveryconfcontents->data,
- mystreamer->recoveryconfcontents->len);
+ astreamer_inject_file(mystreamer->base.bbs_next,
+ "postgresql.auto.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
/* Inject empty standby.signal file. */
- bbstreamer_inject_file(mystreamer->base.bbs_next,
- "standby.signal", "", 0);
+ astreamer_inject_file(mystreamer->base.bbs_next,
+ "standby.signal", "", 0);
}
else
{
/* Inject recovery.conf file with specified contents. */
- bbstreamer_inject_file(mystreamer->base.bbs_next,
- "recovery.conf",
- mystreamer->recoveryconfcontents->data,
- mystreamer->recoveryconfcontents->len);
+ astreamer_inject_file(mystreamer->base.bbs_next,
+ "recovery.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
}
/* Nothing to do here. */
@@ -189,26 +189,26 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
pg_fatal("unexpected state while injecting recovery settings");
}
- bbstreamer_content(mystreamer->base.bbs_next, &mystreamer->member,
- data, len, context);
+ astreamer_content(mystreamer->base.bbs_next, &mystreamer->member,
+ data, len, context);
}
/*
- * End-of-stream processing for this bbstreamer.
+ * End-of-stream processing for this astreamer.
*/
static void
-bbstreamer_recovery_injector_finalize(bbstreamer *streamer)
+astreamer_recovery_injector_finalize(astreamer *streamer)
{
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_finalize(streamer->bbs_next);
}
/*
- * Free memory associated with this bbstreamer.
+ * Free memory associated with this astreamer.
*/
static void
-bbstreamer_recovery_injector_free(bbstreamer *streamer)
+astreamer_recovery_injector_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer);
}
@@ -216,10 +216,10 @@ bbstreamer_recovery_injector_free(bbstreamer *streamer)
* Inject a member into the archive with specified contents.
*/
void
-bbstreamer_inject_file(bbstreamer *streamer, char *pathname, char *data,
- int len)
+astreamer_inject_file(astreamer *streamer, char *pathname, char *data,
+ int len)
{
- bbstreamer_member member;
+ astreamer_member member;
strlcpy(member.pathname, pathname, MAXPGPATH);
member.size = len;
@@ -238,12 +238,12 @@ bbstreamer_inject_file(bbstreamer *streamer, char *pathname, char *data,
/*
* We don't know here how to generate valid member headers and trailers
* for the archiving format in use, so if those are needed, some successor
- * bbstreamer will have to generate them using the data from 'member'.
+ * astreamer will have to generate them using the data from 'member'.
*/
- bbstreamer_content(streamer, &member, NULL, 0,
- BBSTREAMER_MEMBER_HEADER);
- bbstreamer_content(streamer, &member, data, len,
- BBSTREAMER_MEMBER_CONTENTS);
- bbstreamer_content(streamer, &member, NULL, 0,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(streamer, &member, NULL, 0,
+ ASTREAMER_MEMBER_HEADER);
+ astreamer_content(streamer, &member, data, len,
+ ASTREAMER_MEMBER_CONTENTS);
+ astreamer_content(streamer, &member, NULL, 0,
+ ASTREAMER_MEMBER_TRAILER);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/astreamer_lz4.c
similarity index 69%
rename from src/bin/pg_basebackup/bbstreamer_lz4.c
rename to src/bin/pg_basebackup/astreamer_lz4.c
index f5c9e68150c..1c40d7d8ad5 100644
--- a/src/bin/pg_basebackup/bbstreamer_lz4.c
+++ b/src/bin/pg_basebackup/astreamer_lz4.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_lz4.c
+ * astreamer_lz4.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_lz4.c
+ * src/bin/pg_basebackup/astreamer_lz4.c
*-------------------------------------------------------------------------
*/
@@ -17,15 +17,15 @@
#include <lz4frame.h>
#endif
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
#ifdef USE_LZ4
-typedef struct bbstreamer_lz4_frame
+typedef struct astreamer_lz4_frame
{
- bbstreamer base;
+ astreamer base;
LZ4F_compressionContext_t cctx;
LZ4F_decompressionContext_t dctx;
@@ -33,32 +33,32 @@ typedef struct bbstreamer_lz4_frame
size_t bytes_written;
bool header_written;
-} bbstreamer_lz4_frame;
+} astreamer_lz4_frame;
-static void bbstreamer_lz4_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_lz4_compressor_finalize(bbstreamer *streamer);
-static void bbstreamer_lz4_compressor_free(bbstreamer *streamer);
+static void astreamer_lz4_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_lz4_compressor_finalize(astreamer *streamer);
+static void astreamer_lz4_compressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_lz4_compressor_ops = {
- .content = bbstreamer_lz4_compressor_content,
- .finalize = bbstreamer_lz4_compressor_finalize,
- .free = bbstreamer_lz4_compressor_free
+static const astreamer_ops astreamer_lz4_compressor_ops = {
+ .content = astreamer_lz4_compressor_content,
+ .finalize = astreamer_lz4_compressor_finalize,
+ .free = astreamer_lz4_compressor_free
};
-static void bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer);
-static void bbstreamer_lz4_decompressor_free(bbstreamer *streamer);
+static void astreamer_lz4_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_lz4_decompressor_finalize(astreamer *streamer);
+static void astreamer_lz4_decompressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
- .content = bbstreamer_lz4_decompressor_content,
- .finalize = bbstreamer_lz4_decompressor_finalize,
- .free = bbstreamer_lz4_decompressor_free
+static const astreamer_ops astreamer_lz4_decompressor_ops = {
+ .content = astreamer_lz4_decompressor_content,
+ .finalize = astreamer_lz4_decompressor_finalize,
+ .free = astreamer_lz4_decompressor_free
};
#endif
@@ -66,19 +66,19 @@ static const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
* Create a new base backup streamer that performs lz4 compression of tar
* blocks.
*/
-bbstreamer *
-bbstreamer_lz4_compressor_new(bbstreamer *next, pg_compress_specification *compress)
+astreamer *
+astreamer_lz4_compressor_new(astreamer *next, pg_compress_specification *compress)
{
#ifdef USE_LZ4
- bbstreamer_lz4_frame *streamer;
+ astreamer_lz4_frame *streamer;
LZ4F_errorCode_t ctxError;
LZ4F_preferences_t *prefs;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_lz4_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_lz4_compressor_ops;
+ streamer = palloc0(sizeof(astreamer_lz4_frame));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_lz4_compressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -113,19 +113,19 @@ bbstreamer_lz4_compressor_new(bbstreamer *next, pg_compress_specification *compr
* of output buffer to next streamer and empty the buffer.
*/
static void
-bbstreamer_lz4_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_lz4_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
uint8 *next_in,
*next_out;
size_t out_bound,
compressed_size,
avail_out;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
next_in = (uint8 *) data;
/* Write header before processing the first input chunk. */
@@ -159,10 +159,10 @@ bbstreamer_lz4_compressor_content(bbstreamer *streamer,
out_bound = LZ4F_compressBound(len, &mystreamer->prefs);
if (avail_out < out_bound)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->base.bbs_buffer.data,
- mystreamer->bytes_written,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ context);
/* Enlarge buffer if it falls short of out bound. */
if (mystreamer->base.bbs_buffer.maxlen < out_bound)
@@ -196,25 +196,25 @@ bbstreamer_lz4_compressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_lz4_compressor_finalize(bbstreamer *streamer)
+astreamer_lz4_compressor_finalize(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
uint8 *next_out;
size_t footer_bound,
compressed_size,
avail_out;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
/* Find out the footer bound and update the output buffer. */
footer_bound = LZ4F_compressBound(0, &mystreamer->prefs);
if ((mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written) <
footer_bound)
{
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->bytes_written,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
/* Enlarge buffer if it falls short of footer bound. */
if (mystreamer->base.bbs_buffer.maxlen < footer_bound)
@@ -243,24 +243,24 @@ bbstreamer_lz4_compressor_finalize(bbstreamer *streamer)
mystreamer->bytes_written += compressed_size;
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->bytes_written,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_lz4_compressor_free(bbstreamer *streamer)
+astreamer_lz4_compressor_free(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ mystreamer = (astreamer_lz4_frame *) streamer;
+ astreamer_free(streamer->bbs_next);
LZ4F_freeCompressionContext(mystreamer->cctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
@@ -271,18 +271,18 @@ bbstreamer_lz4_compressor_free(bbstreamer *streamer)
* Create a new base backup streamer that performs decompression of lz4
* compressed blocks.
*/
-bbstreamer *
-bbstreamer_lz4_decompressor_new(bbstreamer *next)
+astreamer *
+astreamer_lz4_decompressor_new(astreamer *next)
{
#ifdef USE_LZ4
- bbstreamer_lz4_frame *streamer;
+ astreamer_lz4_frame *streamer;
LZ4F_errorCode_t ctxError;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_lz4_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_lz4_decompressor_ops;
+ streamer = palloc0(sizeof(astreamer_lz4_frame));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_lz4_decompressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -307,18 +307,18 @@ bbstreamer_lz4_decompressor_new(bbstreamer *next)
* to the next streamer.
*/
static void
-bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_lz4_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
uint8 *next_in,
*next_out;
size_t avail_in,
avail_out;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
next_in = (uint8 *) data;
next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
avail_in = len;
@@ -366,10 +366,10 @@ bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
*/
if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ context);
avail_out = mystreamer->base.bbs_buffer.maxlen;
mystreamer->bytes_written = 0;
@@ -387,34 +387,34 @@ bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer)
+astreamer_lz4_decompressor_finalize(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
/*
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_lz4_decompressor_free(bbstreamer *streamer)
+astreamer_lz4_decompressor_free(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ mystreamer = (astreamer_lz4_frame *) streamer;
+ astreamer_free(streamer->bbs_next);
LZ4F_freeDecompressionContext(mystreamer->dctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
diff --git a/src/bin/pg_basebackup/bbstreamer_tar.c b/src/bin/pg_basebackup/astreamer_tar.c
similarity index 50%
rename from src/bin/pg_basebackup/bbstreamer_tar.c
rename to src/bin/pg_basebackup/astreamer_tar.c
index 9137d17ddc1..673690cd18f 100644
--- a/src/bin/pg_basebackup/bbstreamer_tar.c
+++ b/src/bin/pg_basebackup/astreamer_tar.c
@@ -1,13 +1,13 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_tar.c
+ * astreamer_tar.c
*
* This module implements three types of tar processing. A tar parser
- * expects unlabelled chunks of data (e.g. BBSTREAMER_UNKNOWN) and splits
- * it into labelled chunks (any other value of bbstreamer_archive_context).
+ * expects unlabelled chunks of data (e.g. ASTREAMER_UNKNOWN) and splits
+ * it into labelled chunks (any other value of astreamer_archive_context).
* A tar archiver does the reverse: it takes a bunch of labelled chunks
* and produces a tarfile, optionally replacing member headers and trailers
- * so that upstream bbstreamer objects can perform surgery on the tarfile
+ * so that upstream astreamer objects can perform surgery on the tarfile
* contents without knowing the details of the tar format. A tar terminator
* just adds two blocks of NUL bytes to the end of the file, since older
* server versions produce files with this terminator omitted.
@@ -15,7 +15,7 @@
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_tar.c
+ * src/bin/pg_basebackup/astreamer_tar.c
*-------------------------------------------------------------------------
*/
@@ -23,83 +23,83 @@
#include <time.h>
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/logging.h"
#include "pgtar.h"
-typedef struct bbstreamer_tar_parser
+typedef struct astreamer_tar_parser
{
- bbstreamer base;
- bbstreamer_archive_context next_context;
- bbstreamer_member member;
+ astreamer base;
+ astreamer_archive_context next_context;
+ astreamer_member member;
size_t file_bytes_sent;
size_t pad_bytes_expected;
-} bbstreamer_tar_parser;
+} astreamer_tar_parser;
-typedef struct bbstreamer_tar_archiver
+typedef struct astreamer_tar_archiver
{
- bbstreamer base;
+ astreamer base;
bool rearchive_member;
-} bbstreamer_tar_archiver;
+} astreamer_tar_archiver;
-static void bbstreamer_tar_parser_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_tar_parser_finalize(bbstreamer *streamer);
-static void bbstreamer_tar_parser_free(bbstreamer *streamer);
-static bool bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer);
+static void astreamer_tar_parser_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_tar_parser_finalize(astreamer *streamer);
+static void astreamer_tar_parser_free(astreamer *streamer);
+static bool astreamer_tar_header(astreamer_tar_parser *mystreamer);
-static const bbstreamer_ops bbstreamer_tar_parser_ops = {
- .content = bbstreamer_tar_parser_content,
- .finalize = bbstreamer_tar_parser_finalize,
- .free = bbstreamer_tar_parser_free
+static const astreamer_ops astreamer_tar_parser_ops = {
+ .content = astreamer_tar_parser_content,
+ .finalize = astreamer_tar_parser_finalize,
+ .free = astreamer_tar_parser_free
};
-static void bbstreamer_tar_archiver_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_tar_archiver_finalize(bbstreamer *streamer);
-static void bbstreamer_tar_archiver_free(bbstreamer *streamer);
+static void astreamer_tar_archiver_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_tar_archiver_finalize(astreamer *streamer);
+static void astreamer_tar_archiver_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_tar_archiver_ops = {
- .content = bbstreamer_tar_archiver_content,
- .finalize = bbstreamer_tar_archiver_finalize,
- .free = bbstreamer_tar_archiver_free
+static const astreamer_ops astreamer_tar_archiver_ops = {
+ .content = astreamer_tar_archiver_content,
+ .finalize = astreamer_tar_archiver_finalize,
+ .free = astreamer_tar_archiver_free
};
-static void bbstreamer_tar_terminator_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_tar_terminator_finalize(bbstreamer *streamer);
-static void bbstreamer_tar_terminator_free(bbstreamer *streamer);
+static void astreamer_tar_terminator_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_tar_terminator_finalize(astreamer *streamer);
+static void astreamer_tar_terminator_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_tar_terminator_ops = {
- .content = bbstreamer_tar_terminator_content,
- .finalize = bbstreamer_tar_terminator_finalize,
- .free = bbstreamer_tar_terminator_free
+static const astreamer_ops astreamer_tar_terminator_ops = {
+ .content = astreamer_tar_terminator_content,
+ .finalize = astreamer_tar_terminator_finalize,
+ .free = astreamer_tar_terminator_free
};
/*
- * Create a bbstreamer that can parse a stream of content as tar data.
+ * Create an astreamer that can parse a stream of content as tar data.
*
- * The input should be a series of BBSTREAMER_UNKNOWN chunks; the bbstreamer
+ * The input should be a series of ASTREAMER_UNKNOWN chunks; the astreamer
* specified by 'next' will receive a series of typed chunks, as per the
- * conventions described in bbstreamer.h.
+ * conventions described in astreamer.h.
*/
-bbstreamer *
-bbstreamer_tar_parser_new(bbstreamer *next)
+astreamer *
+astreamer_tar_parser_new(astreamer *next)
{
- bbstreamer_tar_parser *streamer;
+ astreamer_tar_parser *streamer;
- streamer = palloc0(sizeof(bbstreamer_tar_parser));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_tar_parser_ops;
+ streamer = palloc0(sizeof(astreamer_tar_parser));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_tar_parser_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
- streamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ streamer->next_context = ASTREAMER_MEMBER_HEADER;
return &streamer->base;
}
@@ -108,29 +108,29 @@ bbstreamer_tar_parser_new(bbstreamer *next)
* Parse unknown content as tar data.
*/
static void
-bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_tar_parser_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+ astreamer_tar_parser *mystreamer = (astreamer_tar_parser *) streamer;
size_t nbytes;
/* Expect unparsed input. */
Assert(member == NULL);
- Assert(context == BBSTREAMER_UNKNOWN);
+ Assert(context == ASTREAMER_UNKNOWN);
while (len > 0)
{
switch (mystreamer->next_context)
{
- case BBSTREAMER_MEMBER_HEADER:
+ case ASTREAMER_MEMBER_HEADER:
/*
* If we're expecting an archive member header, accumulate a
* full block of data before doing anything further.
*/
- if (!bbstreamer_buffer_until(streamer, &data, &len,
- TAR_BLOCK_SIZE))
+ if (!astreamer_buffer_until(streamer, &data, &len,
+ TAR_BLOCK_SIZE))
return;
/*
@@ -139,32 +139,32 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
* thought was the next file header is actually the start of
* the archive trailer. Switch modes accordingly.
*/
- if (bbstreamer_tar_header(mystreamer))
+ if (astreamer_tar_header(mystreamer))
{
if (mystreamer->member.size == 0)
{
/* No content; trailer is zero-length. */
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- NULL, 0,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ ASTREAMER_MEMBER_TRAILER);
/* Expect next header. */
- mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->next_context = ASTREAMER_MEMBER_HEADER;
}
else
{
/* Expect contents. */
- mystreamer->next_context = BBSTREAMER_MEMBER_CONTENTS;
+ mystreamer->next_context = ASTREAMER_MEMBER_CONTENTS;
}
mystreamer->base.bbs_buffer.len = 0;
mystreamer->file_bytes_sent = 0;
}
else
- mystreamer->next_context = BBSTREAMER_ARCHIVE_TRAILER;
+ mystreamer->next_context = ASTREAMER_ARCHIVE_TRAILER;
break;
- case BBSTREAMER_MEMBER_CONTENTS:
+ case ASTREAMER_MEMBER_CONTENTS:
/*
* Send as much content as we have, but not more than the
@@ -174,10 +174,10 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
nbytes = mystreamer->member.size - mystreamer->file_bytes_sent;
nbytes = Min(nbytes, len);
Assert(nbytes > 0);
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- data, nbytes,
- BBSTREAMER_MEMBER_CONTENTS);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, nbytes,
+ ASTREAMER_MEMBER_CONTENTS);
mystreamer->file_bytes_sent += nbytes;
data += nbytes;
len -= nbytes;
@@ -193,53 +193,53 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
if (mystreamer->pad_bytes_expected == 0)
{
/* Trailer is zero-length. */
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- NULL, 0,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ ASTREAMER_MEMBER_TRAILER);
/* Expect next header. */
- mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->next_context = ASTREAMER_MEMBER_HEADER;
}
else
{
/* Trailer is not zero-length. */
- mystreamer->next_context = BBSTREAMER_MEMBER_TRAILER;
+ mystreamer->next_context = ASTREAMER_MEMBER_TRAILER;
}
mystreamer->base.bbs_buffer.len = 0;
}
break;
- case BBSTREAMER_MEMBER_TRAILER:
+ case ASTREAMER_MEMBER_TRAILER:
/*
* If we're expecting an archive member trailer, accumulate
* the expected number of padding bytes before sending
* anything onward.
*/
- if (!bbstreamer_buffer_until(streamer, &data, &len,
- mystreamer->pad_bytes_expected))
+ if (!astreamer_buffer_until(streamer, &data, &len,
+ mystreamer->pad_bytes_expected))
return;
/* OK, now we can send it. */
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- data, mystreamer->pad_bytes_expected,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, mystreamer->pad_bytes_expected,
+ ASTREAMER_MEMBER_TRAILER);
/* Expect next file header. */
- mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->next_context = ASTREAMER_MEMBER_HEADER;
mystreamer->base.bbs_buffer.len = 0;
break;
- case BBSTREAMER_ARCHIVE_TRAILER:
+ case ASTREAMER_ARCHIVE_TRAILER:
/*
* We've seen an end-of-archive indicator, so anything more is
* buffered and sent as part of the archive trailer. But we
* don't expect more than 2 blocks.
*/
- bbstreamer_buffer_bytes(streamer, &data, &len, len);
+ astreamer_buffer_bytes(streamer, &data, &len, len);
if (len > 2 * TAR_BLOCK_SIZE)
pg_fatal("tar file trailer exceeds 2 blocks");
return;
@@ -255,14 +255,14 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
* Parse a file header within a tar stream.
*
* The return value is true if we found a file header and passed it on to the
- * next bbstreamer; it is false if we have reached the archive trailer.
+ * next astreamer; it is false if we have reached the archive trailer.
*/
static bool
-bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
+astreamer_tar_header(astreamer_tar_parser *mystreamer)
{
bool has_nonzero_byte = false;
int i;
- bbstreamer_member *member = &mystreamer->member;
+ astreamer_member *member = &mystreamer->member;
char *buffer = mystreamer->base.bbs_buffer.data;
Assert(mystreamer->base.bbs_buffer.len == TAR_BLOCK_SIZE);
@@ -304,10 +304,10 @@ bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
/* Compute number of padding bytes. */
mystreamer->pad_bytes_expected = tarPaddingBytesRequired(member->size);
- /* Forward the entire header to the next bbstreamer. */
- bbstreamer_content(mystreamer->base.bbs_next, member,
- buffer, TAR_BLOCK_SIZE,
- BBSTREAMER_MEMBER_HEADER);
+ /* Forward the entire header to the next astreamer. */
+ astreamer_content(mystreamer->base.bbs_next, member,
+ buffer, TAR_BLOCK_SIZE,
+ ASTREAMER_MEMBER_HEADER);
return true;
}
@@ -316,50 +316,50 @@ bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
* End-of-stream processing for a tar parser.
*/
static void
-bbstreamer_tar_parser_finalize(bbstreamer *streamer)
+astreamer_tar_parser_finalize(astreamer *streamer)
{
- bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+ astreamer_tar_parser *mystreamer = (astreamer_tar_parser *) streamer;
- if (mystreamer->next_context != BBSTREAMER_ARCHIVE_TRAILER &&
- (mystreamer->next_context != BBSTREAMER_MEMBER_HEADER ||
+ if (mystreamer->next_context != ASTREAMER_ARCHIVE_TRAILER &&
+ (mystreamer->next_context != ASTREAMER_MEMBER_HEADER ||
mystreamer->base.bbs_buffer.len > 0))
pg_fatal("COPY stream ended before last file was finished");
/* Send the archive trailer, even if empty. */
- bbstreamer_content(streamer->bbs_next, NULL,
- streamer->bbs_buffer.data, streamer->bbs_buffer.len,
- BBSTREAMER_ARCHIVE_TRAILER);
+ astreamer_content(streamer->bbs_next, NULL,
+ streamer->bbs_buffer.data, streamer->bbs_buffer.len,
+ ASTREAMER_ARCHIVE_TRAILER);
/* Now finalize successor. */
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_finalize(streamer->bbs_next);
}
/*
* Free memory associated with a tar parser.
*/
static void
-bbstreamer_tar_parser_free(bbstreamer *streamer)
+astreamer_tar_parser_free(astreamer *streamer)
{
pfree(streamer->bbs_buffer.data);
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
}
/*
- * Create a bbstreamer that can generate a tar archive.
+ * Create an astreamer that can generate a tar archive.
*
* This is intended to be usable either for generating a brand-new tar archive
* or for modifying one on the fly. The input should be a series of typed
- * chunks (i.e. not BBSTREAMER_UNKNOWN). See also the comments for
- * bbstreamer_tar_parser_content.
+ * chunks (i.e. not ASTREAMER_UNKNOWN). See also the comments for
+ * astreamer_tar_parser_content.
*/
-bbstreamer *
-bbstreamer_tar_archiver_new(bbstreamer *next)
+astreamer *
+astreamer_tar_archiver_new(astreamer *next)
{
- bbstreamer_tar_archiver *streamer;
+ astreamer_tar_archiver *streamer;
- streamer = palloc0(sizeof(bbstreamer_tar_archiver));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_tar_archiver_ops;
+ streamer = palloc0(sizeof(astreamer_tar_archiver));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_tar_archiver_ops;
streamer->base.bbs_next = next;
return &streamer->base;
@@ -368,36 +368,36 @@ bbstreamer_tar_archiver_new(bbstreamer *next)
/*
* Fix up the stream of input chunks to create a valid tar file.
*
- * If a BBSTREAMER_MEMBER_HEADER chunk is of size 0, it is replaced with a
+ * If an ASTREAMER_MEMBER_HEADER chunk is of size 0, it is replaced with a
* newly-constructed tar header. If it is of size TAR_BLOCK_SIZE, it is
* passed through without change. Any other size is a fatal error (and
* indicates a bug).
*
- * Whenever a new BBSTREAMER_MEMBER_HEADER chunk is constructed, the
- * corresponding BBSTREAMER_MEMBER_TRAILER chunk is also constructed from
+ * Whenever a new ASTREAMER_MEMBER_HEADER chunk is constructed, the
+ * corresponding ASTREAMER_MEMBER_TRAILER chunk is also constructed from
* scratch. Specifically, we construct a block of zero bytes sufficient to
* pad out to a block boundary, as required by the tar format. Other
- * BBSTREAMER_MEMBER_TRAILER chunks are passed through without change.
+ * ASTREAMER_MEMBER_TRAILER chunks are passed through without change.
*
- * Any BBSTREAMER_MEMBER_CONTENTS chunks are passed through without change.
+ * Any ASTREAMER_MEMBER_CONTENTS chunks are passed through without change.
*
- * The BBSTREAMER_ARCHIVE_TRAILER chunk is replaced with two
+ * The ASTREAMER_ARCHIVE_TRAILER chunk is replaced with two
* blocks of zero bytes. Not all tar programs require this, but apparently
* some do. The server does not supply this trailer. If no archive trailer is
- * present, one will be added by bbstreamer_tar_parser_finalize.
+ * present, one will be added by astreamer_tar_parser_finalize.
*/
static void
-bbstreamer_tar_archiver_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_tar_archiver_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_tar_archiver *mystreamer = (bbstreamer_tar_archiver *) streamer;
+ astreamer_tar_archiver *mystreamer = (astreamer_tar_archiver *) streamer;
char buffer[2 * TAR_BLOCK_SIZE];
- Assert(context != BBSTREAMER_UNKNOWN);
+ Assert(context != ASTREAMER_UNKNOWN);
- if (context == BBSTREAMER_MEMBER_HEADER && len != TAR_BLOCK_SIZE)
+ if (context == ASTREAMER_MEMBER_HEADER && len != TAR_BLOCK_SIZE)
{
Assert(len == 0);
@@ -411,7 +411,7 @@ bbstreamer_tar_archiver_content(bbstreamer *streamer,
/* Also make a note to replace padding, in case size changed. */
mystreamer->rearchive_member = true;
}
- else if (context == BBSTREAMER_MEMBER_TRAILER &&
+ else if (context == ASTREAMER_MEMBER_TRAILER &&
mystreamer->rearchive_member)
{
int pad_bytes = tarPaddingBytesRequired(member->size);
@@ -424,7 +424,7 @@ bbstreamer_tar_archiver_content(bbstreamer *streamer,
/* Don't do this again unless we replace another header. */
mystreamer->rearchive_member = false;
}
- else if (context == BBSTREAMER_ARCHIVE_TRAILER)
+ else if (context == ASTREAMER_ARCHIVE_TRAILER)
{
/* Trailer should always be two blocks of zero bytes. */
memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
@@ -432,40 +432,40 @@ bbstreamer_tar_archiver_content(bbstreamer *streamer,
len = 2 * TAR_BLOCK_SIZE;
}
- bbstreamer_content(streamer->bbs_next, member, data, len, context);
+ astreamer_content(streamer->bbs_next, member, data, len, context);
}
/*
* End-of-stream processing for a tar archiver.
*/
static void
-bbstreamer_tar_archiver_finalize(bbstreamer *streamer)
+astreamer_tar_archiver_finalize(astreamer *streamer)
{
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_finalize(streamer->bbs_next);
}
/*
* Free memory associated with a tar archiver.
*/
static void
-bbstreamer_tar_archiver_free(bbstreamer *streamer)
+astreamer_tar_archiver_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer);
}
/*
- * Create a bbstreamer that blindly adds two blocks of NUL bytes to the
+ * Create an astreamer that blindly adds two blocks of NUL bytes to the
* end of an incomplete tarfile that the server might send us.
*/
-bbstreamer *
-bbstreamer_tar_terminator_new(bbstreamer *next)
+astreamer *
+astreamer_tar_terminator_new(astreamer *next)
{
- bbstreamer *streamer;
+ astreamer *streamer;
- streamer = palloc0(sizeof(bbstreamer));
- *((const bbstreamer_ops **) &streamer->bbs_ops) =
- &bbstreamer_tar_terminator_ops;
+ streamer = palloc0(sizeof(astreamer));
+ *((const astreamer_ops **) &streamer->bbs_ops) =
+ &astreamer_tar_terminator_ops;
streamer->bbs_next = next;
return streamer;
@@ -475,17 +475,17 @@ bbstreamer_tar_terminator_new(bbstreamer *next)
* Pass all the content through without change.
*/
static void
-bbstreamer_tar_terminator_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_tar_terminator_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
/* Expect unparsed input. */
Assert(member == NULL);
- Assert(context == BBSTREAMER_UNKNOWN);
+ Assert(context == ASTREAMER_UNKNOWN);
/* Just forward it. */
- bbstreamer_content(streamer->bbs_next, member, data, len, context);
+ astreamer_content(streamer->bbs_next, member, data, len, context);
}
/*
@@ -493,22 +493,22 @@ bbstreamer_tar_terminator_content(bbstreamer *streamer,
* to supply.
*/
static void
-bbstreamer_tar_terminator_finalize(bbstreamer *streamer)
+astreamer_tar_terminator_finalize(astreamer *streamer)
{
char buffer[2 * TAR_BLOCK_SIZE];
memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
- bbstreamer_content(streamer->bbs_next, NULL, buffer,
- 2 * TAR_BLOCK_SIZE, BBSTREAMER_UNKNOWN);
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_content(streamer->bbs_next, NULL, buffer,
+ 2 * TAR_BLOCK_SIZE, ASTREAMER_UNKNOWN);
+ astreamer_finalize(streamer->bbs_next);
}
/*
* Free memory associated with a tar terminator.
*/
static void
-bbstreamer_tar_terminator_free(bbstreamer *streamer)
+astreamer_tar_terminator_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/astreamer_zstd.c
similarity index 64%
rename from src/bin/pg_basebackup/bbstreamer_zstd.c
rename to src/bin/pg_basebackup/astreamer_zstd.c
index 20f11d4450e..58dc679ef99 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/astreamer_zstd.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_zstd.c
+ * astreamer_zstd.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_zstd.c
+ * src/bin/pg_basebackup/astreamer_zstd.c
*-------------------------------------------------------------------------
*/
@@ -17,44 +17,44 @@
#include <zstd.h>
#endif
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/logging.h"
#ifdef USE_ZSTD
-typedef struct bbstreamer_zstd_frame
+typedef struct astreamer_zstd_frame
{
- bbstreamer base;
+ astreamer base;
ZSTD_CCtx *cctx;
ZSTD_DCtx *dctx;
ZSTD_outBuffer zstd_outBuf;
-} bbstreamer_zstd_frame;
+} astreamer_zstd_frame;
-static void bbstreamer_zstd_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_zstd_compressor_finalize(bbstreamer *streamer);
-static void bbstreamer_zstd_compressor_free(bbstreamer *streamer);
+static void astreamer_zstd_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_zstd_compressor_finalize(astreamer *streamer);
+static void astreamer_zstd_compressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_zstd_compressor_ops = {
- .content = bbstreamer_zstd_compressor_content,
- .finalize = bbstreamer_zstd_compressor_finalize,
- .free = bbstreamer_zstd_compressor_free
+static const astreamer_ops astreamer_zstd_compressor_ops = {
+ .content = astreamer_zstd_compressor_content,
+ .finalize = astreamer_zstd_compressor_finalize,
+ .free = astreamer_zstd_compressor_free
};
-static void bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_zstd_decompressor_finalize(bbstreamer *streamer);
-static void bbstreamer_zstd_decompressor_free(bbstreamer *streamer);
+static void astreamer_zstd_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_zstd_decompressor_finalize(astreamer *streamer);
+static void astreamer_zstd_decompressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
- .content = bbstreamer_zstd_decompressor_content,
- .finalize = bbstreamer_zstd_decompressor_finalize,
- .free = bbstreamer_zstd_decompressor_free
+static const astreamer_ops astreamer_zstd_decompressor_ops = {
+ .content = astreamer_zstd_decompressor_content,
+ .finalize = astreamer_zstd_decompressor_finalize,
+ .free = astreamer_zstd_decompressor_free
};
#endif
@@ -62,19 +62,19 @@ static const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
* Create a new base backup streamer that performs zstd compression of tar
* blocks.
*/
-bbstreamer *
-bbstreamer_zstd_compressor_new(bbstreamer *next, pg_compress_specification *compress)
+astreamer *
+astreamer_zstd_compressor_new(astreamer *next, pg_compress_specification *compress)
{
#ifdef USE_ZSTD
- bbstreamer_zstd_frame *streamer;
+ astreamer_zstd_frame *streamer;
size_t ret;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_zstd_frame));
+ streamer = palloc0(sizeof(astreamer_zstd_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_zstd_compressor_ops;
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_zstd_compressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -142,12 +142,12 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, pg_compress_specification *comp
* of output buffer to next streamer and empty the buffer.
*/
static void
-bbstreamer_zstd_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_zstd_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
ZSTD_inBuffer inBuf = {data, len, 0};
while (inBuf.pos < inBuf.size)
@@ -162,10 +162,10 @@ bbstreamer_zstd_compressor_content(bbstreamer *streamer,
if (mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos <
max_needed)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ context);
/* Reset the ZSTD output buffer. */
mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
@@ -187,9 +187,9 @@ bbstreamer_zstd_compressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
+astreamer_zstd_compressor_finalize(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
size_t yet_to_flush;
do
@@ -204,10 +204,10 @@ bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
if (mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos <
max_needed)
{
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ ASTREAMER_UNKNOWN);
/* Reset the ZSTD output buffer. */
mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
@@ -227,23 +227,23 @@ bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
/* Make sure to pass any remaining bytes to the next streamer. */
if (mystreamer->zstd_outBuf.pos > 0)
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_zstd_compressor_free(bbstreamer *streamer)
+astreamer_zstd_compressor_free(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
ZSTD_freeCCtx(mystreamer->cctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
@@ -254,17 +254,17 @@ bbstreamer_zstd_compressor_free(bbstreamer *streamer)
* Create a new base backup streamer that performs decompression of zstd
* compressed blocks.
*/
-bbstreamer *
-bbstreamer_zstd_decompressor_new(bbstreamer *next)
+astreamer *
+astreamer_zstd_decompressor_new(astreamer *next)
{
#ifdef USE_ZSTD
- bbstreamer_zstd_frame *streamer;
+ astreamer_zstd_frame *streamer;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_zstd_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_zstd_decompressor_ops;
+ streamer = palloc0(sizeof(astreamer_zstd_frame));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_zstd_decompressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -293,12 +293,12 @@ bbstreamer_zstd_decompressor_new(bbstreamer *next)
* to the next streamer.
*/
static void
-bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_zstd_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
ZSTD_inBuffer inBuf = {data, len, 0};
while (inBuf.pos < inBuf.size)
@@ -311,10 +311,10 @@ bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
*/
if (mystreamer->zstd_outBuf.pos >= mystreamer->zstd_outBuf.size)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ context);
/* Reset the ZSTD output buffer. */
mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
@@ -335,32 +335,32 @@ bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_zstd_decompressor_finalize(bbstreamer *streamer)
+astreamer_zstd_decompressor_finalize(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
/*
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
if (mystreamer->zstd_outBuf.pos > 0)
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_zstd_decompressor_free(bbstreamer *streamer)
+astreamer_zstd_decompressor_free(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
ZSTD_freeDCtx(mystreamer->dctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
deleted file mode 100644
index 3b820f13b51..00000000000
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ /dev/null
@@ -1,226 +0,0 @@
-/*-------------------------------------------------------------------------
- *
- * bbstreamer.h
- *
- * Each tar archive returned by the server is passed to one or more
- * bbstreamer objects for further processing. The bbstreamer may do
- * something simple, like write the archive to a file, perhaps after
- * compressing it, but it can also do more complicated things, like
- * annotating the byte stream to indicate which parts of the data
- * correspond to tar headers or trailing padding, vs. which parts are
- * payload data. A subsequent bbstreamer may use this information to
- * make further decisions about how to process the data; for example,
- * it might choose to modify the archive contents.
- *
- * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
- *
- * IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer.h
- *-------------------------------------------------------------------------
- */
-
-#ifndef BBSTREAMER_H
-#define BBSTREAMER_H
-
-#include "common/compression.h"
-#include "lib/stringinfo.h"
-#include "pqexpbuffer.h"
-
-struct bbstreamer;
-struct bbstreamer_ops;
-typedef struct bbstreamer bbstreamer;
-typedef struct bbstreamer_ops bbstreamer_ops;
-
-/*
- * Each chunk of archive data passed to a bbstreamer is classified into one
- * of these categories. When data is first received from the remote server,
- * each chunk will be categorized as BBSTREAMER_UNKNOWN, and the chunks will
- * be of whatever size the remote server chose to send.
- *
- * If the archive is parsed (e.g. see bbstreamer_tar_parser_new()), then all
- * chunks should be labelled as one of the other types listed here. In
- * addition, there should be exactly one BBSTREAMER_MEMBER_HEADER chunk and
- * exactly one BBSTREAMER_MEMBER_TRAILER chunk per archive member, even if
- * that means a zero-length call. There can be any number of
- * BBSTREAMER_MEMBER_CONTENTS chunks in between those calls. There
- * should exactly BBSTREAMER_ARCHIVE_TRAILER chunk, and it should follow the
- * last BBSTREAMER_MEMBER_TRAILER chunk.
- *
- * In theory, we could need other classifications here, such as a way of
- * indicating an archive header, but the "tar" format doesn't need anything
- * else, so for the time being there's no point.
- */
-typedef enum
-{
- BBSTREAMER_UNKNOWN,
- BBSTREAMER_MEMBER_HEADER,
- BBSTREAMER_MEMBER_CONTENTS,
- BBSTREAMER_MEMBER_TRAILER,
- BBSTREAMER_ARCHIVE_TRAILER,
-} bbstreamer_archive_context;
-
-/*
- * Each chunk of data that is classified as BBSTREAMER_MEMBER_HEADER,
- * BBSTREAMER_MEMBER_CONTENTS, or BBSTREAMER_MEMBER_TRAILER should also
- * pass a pointer to an instance of this struct. The details are expected
- * to be present in the archive header and used to fill the struct, after
- * which all subsequent calls for the same archive member are expected to
- * pass the same details.
- */
-typedef struct
-{
- char pathname[MAXPGPATH];
- pgoff_t size;
- mode_t mode;
- uid_t uid;
- gid_t gid;
- bool is_directory;
- bool is_link;
- char linktarget[MAXPGPATH];
-} bbstreamer_member;
-
-/*
- * Generally, each type of bbstreamer will define its own struct, but the
- * first element should be 'bbstreamer base'. A bbstreamer that does not
- * require any additional private data could use this structure directly.
- *
- * bbs_ops is a pointer to the bbstreamer_ops object which contains the
- * function pointers appropriate to this type of bbstreamer.
- *
- * bbs_next is a pointer to the successor bbstreamer, for those types of
- * bbstreamer which forward data to a successor. It need not be used and
- * should be set to NULL when not relevant.
- *
- * bbs_buffer is a buffer for accumulating data for temporary storage. Each
- * type of bbstreamer makes its own decisions about whether and how to use
- * this buffer.
- */
-struct bbstreamer
-{
- const bbstreamer_ops *bbs_ops;
- bbstreamer *bbs_next;
- StringInfoData bbs_buffer;
-};
-
-/*
- * There are three callbacks for a bbstreamer. The 'content' callback is
- * called repeatedly, as described in the bbstreamer_archive_context comments.
- * Then, the 'finalize' callback is called once at the end, to give the
- * bbstreamer a chance to perform cleanup such as closing files. Finally,
- * because this code is running in a frontend environment where, as of this
- * writing, there are no memory contexts, the 'free' callback is called to
- * release memory. These callbacks should always be invoked using the static
- * inline functions defined below.
- */
-struct bbstreamer_ops
-{
- void (*content) (bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
- void (*finalize) (bbstreamer *streamer);
- void (*free) (bbstreamer *streamer);
-};
-
-/* Send some content to a bbstreamer. */
-static inline void
-bbstreamer_content(bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
-{
- Assert(streamer != NULL);
- streamer->bbs_ops->content(streamer, member, data, len, context);
-}
-
-/* Finalize a bbstreamer. */
-static inline void
-bbstreamer_finalize(bbstreamer *streamer)
-{
- Assert(streamer != NULL);
- streamer->bbs_ops->finalize(streamer);
-}
-
-/* Free a bbstreamer. */
-static inline void
-bbstreamer_free(bbstreamer *streamer)
-{
- Assert(streamer != NULL);
- streamer->bbs_ops->free(streamer);
-}
-
-/*
- * This is a convenience method for use when implementing a bbstreamer; it is
- * not for use by outside callers. It adds the amount of data specified by
- * 'nbytes' to the bbstreamer's buffer and adjusts '*len' and '*data'
- * accordingly.
- */
-static inline void
-bbstreamer_buffer_bytes(bbstreamer *streamer, const char **data, int *len,
- int nbytes)
-{
- Assert(nbytes <= *len);
-
- appendBinaryStringInfo(&streamer->bbs_buffer, *data, nbytes);
- *len -= nbytes;
- *data += nbytes;
-}
-
-/*
- * This is a convenience method for use when implementing a bbstreamer; it is
- * not for use by outsider callers. It attempts to add enough data to the
- * bbstreamer's buffer to reach a length of target_bytes and adjusts '*len'
- * and '*data' accordingly. It returns true if the target length has been
- * reached and false otherwise.
- */
-static inline bool
-bbstreamer_buffer_until(bbstreamer *streamer, const char **data, int *len,
- int target_bytes)
-{
- int buflen = streamer->bbs_buffer.len;
-
- if (buflen >= target_bytes)
- {
- /* Target length already reached; nothing to do. */
- return true;
- }
-
- if (buflen + *len < target_bytes)
- {
- /* Not enough data to reach target length; buffer all of it. */
- bbstreamer_buffer_bytes(streamer, data, len, *len);
- return false;
- }
-
- /* Buffer just enough to reach the target length. */
- bbstreamer_buffer_bytes(streamer, data, len, target_bytes - buflen);
- return true;
-}
-
-/*
- * Functions for creating bbstreamer objects of various types. See the header
- * comments for each of these functions for details.
- */
-extern bbstreamer *bbstreamer_plain_writer_new(char *pathname, FILE *file);
-extern bbstreamer *bbstreamer_gzip_writer_new(char *pathname, FILE *file,
- pg_compress_specification *compress);
-extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
- const char *(*link_map) (const char *),
- void (*report_output_file) (const char *));
-
-extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
- pg_compress_specification *compress);
-extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_zstd_compressor_new(bbstreamer *next,
- pg_compress_specification *compress);
-extern bbstreamer *bbstreamer_zstd_decompressor_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
-
-extern bbstreamer *bbstreamer_recovery_injector_new(bbstreamer *next,
- bool is_recovery_guc_supported,
- PQExpBuffer recoveryconfcontents);
-extern void bbstreamer_inject_file(bbstreamer *streamer, char *pathname,
- char *data, int len);
-
-#endif
diff --git a/src/bin/pg_basebackup/meson.build b/src/bin/pg_basebackup/meson.build
index c00acd5e118..a68dbd7837d 100644
--- a/src/bin/pg_basebackup/meson.build
+++ b/src/bin/pg_basebackup/meson.build
@@ -1,12 +1,12 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
common_sources = files(
- 'bbstreamer_file.c',
- 'bbstreamer_gzip.c',
- 'bbstreamer_inject.c',
- 'bbstreamer_lz4.c',
- 'bbstreamer_tar.c',
- 'bbstreamer_zstd.c',
+ 'astreamer_file.c',
+ 'astreamer_gzip.c',
+ 'astreamer_inject.c',
+ 'astreamer_lz4.c',
+ 'astreamer_tar.c',
+ 'astreamer_zstd.c',
'receivelog.c',
'streamutil.c',
'walmethods.c',
diff --git a/src/bin/pg_basebackup/nls.mk b/src/bin/pg_basebackup/nls.mk
index 384dbb021e9..950b9797b1e 100644
--- a/src/bin/pg_basebackup/nls.mk
+++ b/src/bin/pg_basebackup/nls.mk
@@ -1,12 +1,12 @@
# src/bin/pg_basebackup/nls.mk
CATALOG_NAME = pg_basebackup
GETTEXT_FILES = $(FRONTEND_COMMON_GETTEXT_FILES) \
- bbstreamer_file.c \
- bbstreamer_gzip.c \
- bbstreamer_inject.c \
- bbstreamer_lz4.c \
- bbstreamer_tar.c \
- bbstreamer_zstd.c \
+ astreamer_file.c \
+ astreamer_gzip.c \
+ astreamer_inject.c \
+ astreamer_lz4.c \
+ astreamer_tar.c \
+ astreamer_zstd.c \
pg_basebackup.c \
pg_createsubscriber.c \
pg_receivewal.c \
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 8f3dd04fd22..4179b064cbc 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -26,8 +26,8 @@
#endif
#include "access/xlog_internal.h"
+#include "astreamer.h"
#include "backup/basebackup.h"
-#include "bbstreamer.h"
#include "common/compression.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
@@ -57,8 +57,8 @@ typedef struct ArchiveStreamState
{
int tablespacenum;
pg_compress_specification *compress;
- bbstreamer *streamer;
- bbstreamer *manifest_inject_streamer;
+ astreamer *streamer;
+ astreamer *manifest_inject_streamer;
PQExpBuffer manifest_buffer;
char manifest_filename[MAXPGPATH];
FILE *manifest_file;
@@ -67,7 +67,7 @@ typedef struct ArchiveStreamState
typedef struct WriteTarState
{
int tablespacenum;
- bbstreamer *streamer;
+ astreamer *streamer;
} WriteTarState;
typedef struct WriteManifestState
@@ -199,8 +199,8 @@ static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *fo
static void progress_update_filename(const char *filename);
static void progress_report(int tablespacenum, bool force, bool finished);
-static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
- bbstreamer **manifest_inject_streamer_p,
+static astreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
+ astreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
bool expect_unterminated_tarfile,
pg_compress_specification *compress);
@@ -1053,19 +1053,19 @@ ReceiveCopyData(PGconn *conn, WriteDataCallback callback,
* the options selected by the user. We may just write the results directly
* to a file, or we might compress first, or we might extract the tar file
* and write each member separately. This function doesn't do any of that
- * directly, but it works out what kind of bbstreamer we need to create so
+ * directly, but it works out what kind of astreamer we need to create so
* that the right stuff happens when, down the road, we actually receive
* the data.
*/
-static bbstreamer *
+static astreamer *
CreateBackupStreamer(char *archive_name, char *spclocation,
- bbstreamer **manifest_inject_streamer_p,
+ astreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
bool expect_unterminated_tarfile,
pg_compress_specification *compress)
{
- bbstreamer *streamer = NULL;
- bbstreamer *manifest_inject_streamer = NULL;
+ astreamer *streamer = NULL;
+ astreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool is_tar,
is_tar_gz,
@@ -1160,7 +1160,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
directory = psprintf("%s/%s", basedir, spclocation);
else
directory = get_tablespace_mapping(spclocation);
- streamer = bbstreamer_extractor_new(directory,
+ streamer = astreamer_extractor_new(directory,
get_tablespace_mapping,
progress_update_filename);
}
@@ -1188,27 +1188,27 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
if (compress->algorithm == PG_COMPRESSION_NONE)
- streamer = bbstreamer_plain_writer_new(archive_filename,
+ streamer = astreamer_plain_writer_new(archive_filename,
archive_file);
else if (compress->algorithm == PG_COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
- streamer = bbstreamer_gzip_writer_new(archive_filename,
+ streamer = astreamer_gzip_writer_new(archive_filename,
archive_file, compress);
}
else if (compress->algorithm == PG_COMPRESSION_LZ4)
{
strlcat(archive_filename, ".lz4", sizeof(archive_filename));
- streamer = bbstreamer_plain_writer_new(archive_filename,
+ streamer = astreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_lz4_compressor_new(streamer, compress);
+ streamer = astreamer_lz4_compressor_new(streamer, compress);
}
else if (compress->algorithm == PG_COMPRESSION_ZSTD)
{
strlcat(archive_filename, ".zst", sizeof(archive_filename));
- streamer = bbstreamer_plain_writer_new(archive_filename,
+ streamer = astreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_zstd_compressor_new(streamer, compress);
+ streamer = astreamer_zstd_compressor_new(streamer, compress);
}
else
{
@@ -1222,7 +1222,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* into it.
*/
if (must_parse_archive)
- streamer = bbstreamer_tar_archiver_new(streamer);
+ streamer = astreamer_tar_archiver_new(streamer);
progress_update_filename(archive_filename);
}
@@ -1241,7 +1241,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
if (spclocation == NULL && writerecoveryconf)
{
Assert(must_parse_archive);
- streamer = bbstreamer_recovery_injector_new(streamer,
+ streamer = astreamer_recovery_injector_new(streamer,
is_recovery_guc_supported,
recoveryconfcontents);
}
@@ -1253,9 +1253,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* we're talking to such a server we'll need to add the terminator here.
*/
if (must_parse_archive)
- streamer = bbstreamer_tar_parser_new(streamer);
+ streamer = astreamer_tar_parser_new(streamer);
else if (expect_unterminated_tarfile)
- streamer = bbstreamer_tar_terminator_new(streamer);
+ streamer = astreamer_tar_terminator_new(streamer);
/*
* If the user has requested a server compressed archive along with
@@ -1264,11 +1264,11 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
if (format == 'p')
{
if (is_tar_gz)
- streamer = bbstreamer_gzip_decompressor_new(streamer);
+ streamer = astreamer_gzip_decompressor_new(streamer);
else if (is_tar_lz4)
- streamer = bbstreamer_lz4_decompressor_new(streamer);
+ streamer = astreamer_lz4_decompressor_new(streamer);
else if (is_tar_zstd)
- streamer = bbstreamer_zstd_decompressor_new(streamer);
+ streamer = astreamer_zstd_decompressor_new(streamer);
}
/* Return the results. */
@@ -1307,7 +1307,7 @@ ReceiveArchiveStream(PGconn *conn, pg_compress_specification *compress)
if (state.manifest_inject_streamer != NULL &&
state.manifest_buffer != NULL)
{
- bbstreamer_inject_file(state.manifest_inject_streamer,
+ astreamer_inject_file(state.manifest_inject_streamer,
"backup_manifest",
state.manifest_buffer->data,
state.manifest_buffer->len);
@@ -1318,8 +1318,8 @@ ReceiveArchiveStream(PGconn *conn, pg_compress_specification *compress)
/* If there's still an archive in progress, end processing. */
if (state.streamer != NULL)
{
- bbstreamer_finalize(state.streamer);
- bbstreamer_free(state.streamer);
+ astreamer_finalize(state.streamer);
+ astreamer_free(state.streamer);
state.streamer = NULL;
}
}
@@ -1383,8 +1383,8 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
/* End processing of any prior archive. */
if (state->streamer != NULL)
{
- bbstreamer_finalize(state->streamer);
- bbstreamer_free(state->streamer);
+ astreamer_finalize(state->streamer);
+ astreamer_free(state->streamer);
state->streamer = NULL;
}
@@ -1437,8 +1437,8 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
else if (state->streamer != NULL)
{
/* Archive data. */
- bbstreamer_content(state->streamer, NULL, copybuf + 1,
- r - 1, BBSTREAMER_UNKNOWN);
+ astreamer_content(state->streamer, NULL, copybuf + 1,
+ r - 1, ASTREAMER_UNKNOWN);
}
else
pg_fatal("unexpected payload data");
@@ -1600,7 +1600,7 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
bool tablespacenum, pg_compress_specification *compress)
{
WriteTarState state;
- bbstreamer *manifest_inject_streamer;
+ astreamer *manifest_inject_streamer;
bool is_recovery_guc_supported;
bool expect_unterminated_tarfile;
@@ -1636,7 +1636,7 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
pg_fatal("out of memory");
/* Inject it into the output tarfile. */
- bbstreamer_inject_file(manifest_inject_streamer, "backup_manifest",
+ astreamer_inject_file(manifest_inject_streamer, "backup_manifest",
buf.data, buf.len);
/* Free memory. */
@@ -1644,8 +1644,8 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
}
/* Cleanup. */
- bbstreamer_finalize(state.streamer);
- bbstreamer_free(state.streamer);
+ astreamer_finalize(state.streamer);
+ astreamer_free(state.streamer);
progress_report(tablespacenum, true, false);
@@ -1663,7 +1663,7 @@ ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
{
WriteTarState *state = callback_data;
- bbstreamer_content(state->streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);
+ astreamer_content(state->streamer, NULL, copybuf, r, ASTREAMER_UNKNOWN);
totaldone += r;
progress_report(state->tablespacenum, false, false);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 635e6d6e215..2d1ec373236 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3308,19 +3308,19 @@ bbsink_shell
bbsink_state
bbsink_throttle
bbsink_zstd
-bbstreamer
-bbstreamer_archive_context
-bbstreamer_extractor
-bbstreamer_gzip_decompressor
-bbstreamer_gzip_writer
-bbstreamer_lz4_frame
-bbstreamer_member
-bbstreamer_ops
-bbstreamer_plain_writer
-bbstreamer_recovery_injector
-bbstreamer_tar_archiver
-bbstreamer_tar_parser
-bbstreamer_zstd_frame
+astreamer
+astreamer_archive_context
+astreamer_extractor
+astreamer_gzip_decompressor
+astreamer_gzip_writer
+astreamer_lz4_frame
+astreamer_member
+astreamer_ops
+astreamer_plain_writer
+astreamer_recovery_injector
+astreamer_tar_archiver
+astreamer_tar_parser
+astreamer_zstd_frame
bgworker_main_type
bh_node_type
binaryheap
--
2.18.0
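To give reviewers a feel for the renamed interface from a consumer's point of
view, here is a minimal sketch of a pass-through streamer written against the
callback API shown in the patch above. It is not part of the attached patches;
the astreamer_counter name and behaviour are invented purely for illustration,
and the real verification consumer only appears later in the series.

    #include "postgres_fe.h"

    #include "astreamer.h"      /* becomes "fe_utils/astreamer.h" with patch 0003 */

    /* Hypothetical streamer that counts member payload bytes and forwards data. */
    typedef struct astreamer_counter
    {
        astreamer   base;
        uint64      payload_bytes;
    } astreamer_counter;

    static void
    astreamer_counter_content(astreamer *streamer, astreamer_member *member,
                              const char *data, int len,
                              astreamer_archive_context context)
    {
        astreamer_counter *mystreamer = (astreamer_counter *) streamer;

        if (context == ASTREAMER_MEMBER_CONTENTS)
            mystreamer->payload_bytes += len;

        /* Pass everything through unchanged, if there is a successor. */
        if (streamer->bbs_next)
            astreamer_content(streamer->bbs_next, member, data, len, context);
    }

    static void
    astreamer_counter_finalize(astreamer *streamer)
    {
        if (streamer->bbs_next)
            astreamer_finalize(streamer->bbs_next);
    }

    static void
    astreamer_counter_free(astreamer *streamer)
    {
        if (streamer->bbs_next)
            astreamer_free(streamer->bbs_next);
        pfree(streamer);
    }

    static const astreamer_ops astreamer_counter_ops = {
        .content = astreamer_counter_content,
        .finalize = astreamer_counter_finalize,
        .free = astreamer_counter_free
    };

    /* Create a counting streamer; 'next' may be NULL if nothing follows. */
    astreamer *
    astreamer_counter_new(astreamer *next)
    {
        astreamer_counter *streamer = palloc0(sizeof(astreamer_counter));

        *((const astreamer_ops **) &streamer->base.bbs_ops) = &astreamer_counter_ops;
        streamer->base.bbs_next = next;

        return &streamer->base;
    }

A caller would create it with astreamer_counter_new(NULL), or with a successor,
and feed it the typed chunks produced by astreamer_tar_parser_new(); nothing in
the renamed API changes beyond the identifiers.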
Attachment: v1-0003-Refactor-move-astreamer-files-to-fe_utils-to-make.patch (application/x-patch)
From 06b9374d6342565b55ad58250d466a472fde0e8e Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 9 Jul 2024 10:30:31 +0530
Subject: [PATCH v1 03/10] Refactor: move astreamer* files to fe_utils to make
 them commonly available.
To make it accessible to other code, we need to move the ASTREAMER
code (previously known as BBSTREAMER) to a common location. The
appropriate place would be src/fe_utils, as it is frontend
infrastructure intended for shared use.
---
src/bin/pg_basebackup/Makefile | 7 +------
src/bin/pg_basebackup/astreamer_inject.h | 2 +-
src/bin/pg_basebackup/meson.build | 5 -----
src/fe_utils/Makefile | 11 +++++++++--
src/{bin/pg_basebackup => fe_utils}/astreamer_file.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_gzip.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_lz4.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_tar.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_zstd.c | 2 +-
src/fe_utils/meson.build | 7 ++++++-
.../pg_basebackup => include/fe_utils}/astreamer.h | 0
11 files changed, 22 insertions(+), 20 deletions(-)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_file.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_gzip.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_lz4.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_tar.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_zstd.c (99%)
rename src/{bin/pg_basebackup => include/fe_utils}/astreamer.h (100%)
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index a71af2d48a7..f1e73058b23 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -37,12 +37,7 @@ OBJS = \
BBOBJS = \
pg_basebackup.o \
- astreamer_file.o \
- astreamer_gzip.o \
- astreamer_inject.o \
- astreamer_lz4.o \
- astreamer_tar.o \
- astreamer_zstd.o
+ astreamer_inject.o
all: pg_basebackup pg_createsubscriber pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/astreamer_inject.h b/src/bin/pg_basebackup/astreamer_inject.h
index 8504b3f5e0d..aeed533862b 100644
--- a/src/bin/pg_basebackup/astreamer_inject.h
+++ b/src/bin/pg_basebackup/astreamer_inject.h
@@ -12,7 +12,7 @@
#ifndef ASTREAMER_INJECT_H
#define ASTREAMER_INJECT_H
-#include "astreamer.h"
+#include "fe_utils/astreamer.h"
#include "pqexpbuffer.h"
extern astreamer *astreamer_recovery_injector_new(astreamer *next,
diff --git a/src/bin/pg_basebackup/meson.build b/src/bin/pg_basebackup/meson.build
index a68dbd7837d..9101fc18438 100644
--- a/src/bin/pg_basebackup/meson.build
+++ b/src/bin/pg_basebackup/meson.build
@@ -1,12 +1,7 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
common_sources = files(
- 'astreamer_file.c',
- 'astreamer_gzip.c',
'astreamer_inject.c',
- 'astreamer_lz4.c',
- 'astreamer_tar.c',
- 'astreamer_zstd.c',
'receivelog.c',
'streamutil.c',
'walmethods.c',
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index 946c05258f0..ff002f37d57 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -34,13 +34,20 @@ OBJS = \
simple_list.o \
string_utils.o
+AOBJS = \
+ astreamer_file.o \
+ astreamer_gzip.o \
+ astreamer_lz4.o \
+ astreamer_tar.o \
+ astreamer_zstd.o
+
ifeq ($(PORTNAME), win32)
override CPPFLAGS += -DFD_SETSIZE=1024
endif
all: libpgfeutils.a
-libpgfeutils.a: $(OBJS)
+libpgfeutils.a: $(AOBJS) $(OBJS)
rm -f $@
$(AR) $(AROPT) $@ $^
@@ -59,5 +66,5 @@ uninstall:
rm -f '$(DESTDIR)$(libdir)/libpgfeutils.a'
clean distclean:
- rm -f libpgfeutils.a $(OBJS) lex.backup
+ rm -f libpgfeutils.a $(AOBJS) $(OBJS) lex.backup
rm -f psqlscan.c
diff --git a/src/bin/pg_basebackup/astreamer_file.c b/src/fe_utils/astreamer_file.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_file.c
rename to src/fe_utils/astreamer_file.c
index 2742385e103..13d1192c6e6 100644
--- a/src/bin/pg_basebackup/astreamer_file.c
+++ b/src/fe_utils/astreamer_file.c
@@ -13,10 +13,10 @@
#include <unistd.h>
-#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/astreamer.h"
typedef struct astreamer_plain_writer
{
diff --git a/src/bin/pg_basebackup/astreamer_gzip.c b/src/fe_utils/astreamer_gzip.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_gzip.c
rename to src/fe_utils/astreamer_gzip.c
index 6f7c27afbbc..dd28defac7b 100644
--- a/src/bin/pg_basebackup/astreamer_gzip.c
+++ b/src/fe_utils/astreamer_gzip.c
@@ -17,10 +17,10 @@
#include <zlib.h>
#endif
-#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/astreamer.h"
#ifdef HAVE_LIBZ
typedef struct astreamer_gzip_writer
diff --git a/src/bin/pg_basebackup/astreamer_lz4.c b/src/fe_utils/astreamer_lz4.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_lz4.c
rename to src/fe_utils/astreamer_lz4.c
index 1c40d7d8ad5..d8b2a367e47 100644
--- a/src/bin/pg_basebackup/astreamer_lz4.c
+++ b/src/fe_utils/astreamer_lz4.c
@@ -17,10 +17,10 @@
#include <lz4frame.h>
#endif
-#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/astreamer.h"
#ifdef USE_LZ4
typedef struct astreamer_lz4_frame
diff --git a/src/bin/pg_basebackup/astreamer_tar.c b/src/fe_utils/astreamer_tar.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_tar.c
rename to src/fe_utils/astreamer_tar.c
index 673690cd18f..f5d3562d280 100644
--- a/src/bin/pg_basebackup/astreamer_tar.c
+++ b/src/fe_utils/astreamer_tar.c
@@ -23,8 +23,8 @@
#include <time.h>
-#include "astreamer.h"
#include "common/logging.h"
+#include "fe_utils/astreamer.h"
#include "pgtar.h"
typedef struct astreamer_tar_parser
diff --git a/src/bin/pg_basebackup/astreamer_zstd.c b/src/fe_utils/astreamer_zstd.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_zstd.c
rename to src/fe_utils/astreamer_zstd.c
index 58dc679ef99..45f6cb67363 100644
--- a/src/bin/pg_basebackup/astreamer_zstd.c
+++ b/src/fe_utils/astreamer_zstd.c
@@ -17,8 +17,8 @@
#include <zstd.h>
#endif
-#include "astreamer.h"
#include "common/logging.h"
+#include "fe_utils/astreamer.h"
#ifdef USE_ZSTD
diff --git a/src/fe_utils/meson.build b/src/fe_utils/meson.build
index 14d0482a2cc..0ec28e86af7 100644
--- a/src/fe_utils/meson.build
+++ b/src/fe_utils/meson.build
@@ -13,6 +13,11 @@ fe_utils_sources = files(
'recovery_gen.c',
'simple_list.c',
'string_utils.c',
+ 'astreamer_file.c',
+ 'astreamer_gzip.c',
+ 'astreamer_lz4.c',
+ 'astreamer_tar.c',
+ 'astreamer_zstd.c',
)
psqlscan = custom_target('psqlscan',
@@ -28,6 +33,6 @@ fe_utils = static_library('libpgfeutils',
c_pch: pch_postgres_fe_h,
include_directories: [postgres_inc, libpq_inc],
c_args: host_system == 'windows' ? ['-DFD_SETSIZE=1024'] : [],
- dependencies: frontend_common_code,
+ dependencies: [frontend_common_code, lz4, zlib, zstd],
kwargs: default_lib_args,
)
diff --git a/src/bin/pg_basebackup/astreamer.h b/src/include/fe_utils/astreamer.h
similarity index 100%
rename from src/bin/pg_basebackup/astreamer.h
rename to src/include/fe_utils/astreamer.h
--
2.18.0
v1-0006-Refactor-split-verify_file_checksum-function.patch (application/x-patch)
From 99c7408f9b7fe8abebf75eddf8fe4b3609a3671f Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Mon, 17 Jun 2024 14:22:40 +0530
Subject: [PATCH v1 06/10] Refactor: split verify_file_checksum() function.
Move the core functionality of verify_file_checksum to a new function
to enable incremental checksum computation.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 102 +++++++++++++++-------
src/bin/pg_verifybackup/pg_verifybackup.h | 4 +
2 files changed, 73 insertions(+), 33 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d17b565a604..7b845bece71 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -773,6 +773,72 @@ verify_backup_checksums(verifier_context *context)
progress_report(true);
}
+/*
+ * Update the checksum incrementally with the received bytes; the caller must
+ * pass a properly initialized checksum_ctx. Once the complete file content
+ * has been received, which is tracked via the computed_len parameter, the
+ * final checksum is verified against the manifest data. Returns false if
+ * any error occurs; otherwise returns true, meaning either that more file
+ * content is still expected or that checksum verification completed
+ * successfully.
+ */
+bool
+verify_content_checksum(verifier_context *context,
+ pg_checksum_context *checksum_ctx,
+ manifest_file *m, uint8 *buffer,
+ int buffer_len, size_t *computed_len)
+{
+ char *relpath = m->pathname;
+ uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
+ int checksumlen;
+
+ if (pg_checksum_update(checksum_ctx, buffer, buffer_len) < 0)
+ {
+ report_backup_error(context, "could not update checksum of file \"%s\"",
+ relpath);
+ return false;
+ }
+
+ /* Update the total count of computed checksum bytes. */
+ *computed_len += buffer_len;
+
+ /* Report progress */
+ done_size += buffer_len;
+ progress_report(false);
+
+ /* Yet to receive the full content of the file. */
+ if (*computed_len < m->size)
+ return true;
+
+ /* Get the final checksum. */
+ checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
+ if (checksumlen < 0)
+ {
+ report_backup_error(context,
+ "could not finalize checksum of file \"%s\"",
+ relpath);
+ return false;
+ }
+
+ /* And check it against the manifest. */
+ if (checksumlen != m->checksum_length)
+ {
+ report_backup_error(context,
+ "file \"%s\" has checksum of length %d, but expected %d",
+ relpath, m->checksum_length, checksumlen);
+ return false;
+ }
+ else if (memcmp(checksumbuf, m->checksum_payload, checksumlen) != 0)
+ {
+ report_backup_error(context,
+ "checksum mismatch for file \"%s\"",
+ relpath);
+ return false;
+ }
+
+ return true;
+}
+
/*
* Verify the checksum of a single file.
*/
@@ -785,8 +851,6 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
int fd;
int rc;
size_t bytes_read = 0;
- uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
- int checksumlen;
/* Open the target file. */
if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
@@ -808,19 +872,14 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
/* Read the file chunk by chunk, updating the checksum as we go. */
while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
{
- bytes_read += rc;
- if (pg_checksum_update(&checksum_ctx, buffer, rc) < 0)
+ if (!verify_content_checksum(context, &checksum_ctx, m, buffer, rc,
+ &bytes_read))
{
- report_backup_error(context, "could not update checksum of file \"%s\"",
- relpath);
close(fd);
return;
}
-
- /* Report progress */
- done_size += rc;
- progress_report(false);
}
+
if (rc < 0)
report_backup_error(context, "could not read file \"%s\": %m",
relpath);
@@ -845,32 +904,9 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
* filesystem misbehavior.
*/
if (bytes_read != m->size)
- {
report_backup_error(context,
"file \"%s\" should contain %zu bytes, but read %zu bytes",
relpath, m->size, bytes_read);
- return;
- }
-
- /* Get the final checksum. */
- checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
- if (checksumlen < 0)
- {
- report_backup_error(context,
- "could not finalize checksum of file \"%s\"",
- relpath);
- return;
- }
-
- /* And check it against the manifest. */
- if (checksumlen != m->checksum_length)
- report_backup_error(context,
- "file \"%s\" has checksum of length %d, but expected %d",
- relpath, m->checksum_length, checksumlen);
- else if (memcmp(checksumbuf, m->checksum_payload, checksumlen) != 0)
- report_backup_error(context,
- "checksum mismatch for file \"%s\"",
- relpath);
}
/*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index c11ff33a100..50a285752aa 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -101,6 +101,10 @@ typedef struct verifier_context
extern manifest_file *verify_manifest_entry(verifier_context *context,
char *relpath, size_t filesize);
+extern bool verify_content_checksum(verifier_context *context,
+ pg_checksum_context *checksum_ctx,
+ manifest_file *m, uint8 *buf,
+ int buf_len, size_t *computed_len);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
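A note on intended usage: verify_content_checksum() is split out above so that a caller which only sees the file content in chunks can still drive checksum verification. Below is a minimal sketch of such a caller (not part of the patch; next_chunk() is a made-up placeholder, and "m"/"context" stand for the manifest_file and verifier_context at hand). Patch 0009 does essentially this from its astreamer callback.

    pg_checksum_context checksum_ctx;
    size_t      computed_len = 0;
    uint8      *buf;
    int         len;
    bool        ok;

    /* Initialize once per file, using the checksum type from the manifest. */
    ok = (pg_checksum_init(&checksum_ctx, m->checksum_type) == 0);

    /* Feed each chunk as it arrives; the final chunk triggers verification. */
    while (ok && (len = next_chunk(&buf)) > 0)
        ok = verify_content_checksum(context, &checksum_ctx, m, buf, len,
                                     &computed_len);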
v1-0008-pg_verifybackup-Add-backup-format-and-compression.patch (application/x-patch)
From 9da0442e9f3f440d9e7a09f2ba2a4df7c10946ac Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 10:26:35 +0530
Subject: [PATCH v1 08/10] pg_verifybackup: Add backup format and compression
option
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the next patch that implements tar format support.
----
---
src/bin/pg_verifybackup/pg_verifybackup.c | 143 +++++++++++++++++++++-
1 file changed, 141 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 375f196b300..0d458298f34 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,6 +18,7 @@
#include <sys/stat.h>
#include <time.h>
+#include "common/compression.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -56,6 +57,8 @@ static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3) pg_attribute_noreturn();
+static char find_backup_format(verifier_context *context);
+static pg_compress_algorithm find_backup_compression(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
@@ -82,6 +85,9 @@ bool skip_checksums = false;
static uint64 total_size = 0;
static uint64 done_size = 0;
+char format = '\0'; /* p(lain)/t(ar) */
+pg_compress_algorithm compress_algorithm = PG_COMPRESSION_NONE;
+
/*
* Main entry point.
*/
@@ -92,11 +98,13 @@ main(int argc, char **argv)
{"exit-on-error", no_argument, NULL, 'e'},
{"ignore", required_argument, NULL, 'i'},
{"manifest-path", required_argument, NULL, 'm'},
+ {"format", required_argument, NULL, 'F'},
{"no-parse-wal", no_argument, NULL, 'n'},
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
{"skip-checksums", no_argument, NULL, 's'},
{"wal-directory", required_argument, NULL, 'w'},
+ {"compress", required_argument, NULL, 'Z'},
{NULL, 0, NULL, 0}
};
@@ -107,6 +115,7 @@ main(int argc, char **argv)
bool quiet = false;
char *wal_directory = NULL;
char *pg_waldump_path = NULL;
+ bool tar_compression_specified = false;
pg_logging_init(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verifybackup"));
@@ -149,7 +158,7 @@ main(int argc, char **argv)
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
- while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1)
+ while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:Z:", long_options, NULL)) != -1)
{
switch (c)
{
@@ -168,6 +177,15 @@ main(int argc, char **argv)
manifest_path = pstrdup(optarg);
canonicalize_path(manifest_path);
break;
+ case 'F':
+ if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
+ format = 'p';
+ else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
+ format = 't';
+ else
+ pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"",
+ optarg);
+ break;
case 'n':
no_parse_wal = true;
break;
@@ -184,6 +202,12 @@ main(int argc, char **argv)
wal_directory = pstrdup(optarg);
canonicalize_path(wal_directory);
break;
+ case 'Z':
+ if (!parse_compress_algorithm(optarg, &compress_algorithm))
+ pg_fatal("unrecognized compression algorithm: \"%s\"",
+ optarg);
+ tar_compression_specified = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -215,11 +239,41 @@ main(int argc, char **argv)
pg_fatal("cannot specify both %s and %s",
"-P/--progress", "-q/--quiet");
+ /* Complain if compression method specified but the format isn't tar. */
+ if (format != 't' && tar_compression_specified)
+ {
+ pg_log_error("only tar mode backups can be compressed");
+ pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+ exit(1);
+ }
+
+ /* Determine the backup format if it hasn't been specified. */
+ if (format == '\0')
+ format = find_backup_format(&context);
+
+ /*
+ * Determine the tar backup compression method if it hasn't been
+ * specified.
+ */
+ if (format == 't' && !tar_compression_specified)
+ compress_algorithm = find_backup_compression(&context);
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
int ret;
+ /*
+ * XXX: In the future, we should consider enhancing pg_waldump to read
+ * WAL files from the tar archive.
+ */
+ if (format != 'p')
+ {
+ pg_log_error("pg_waldump does not support parsing WAL files from a tar archive");
+ pg_log_error_hint("You can use --no-parse-wal to skip parsing the WAL files.");
+ exit(1);
+ }
+
pg_waldump_path = pg_malloc(MAXPGPATH);
ret = find_other_exec(argv[0], "pg_waldump",
"pg_waldump (PostgreSQL) " PG_VERSION "\n",
@@ -274,8 +328,13 @@ main(int argc, char **argv)
/*
* Now do the expensive work of verifying file checksums, unless we were
* told to skip it.
+ *
+ * This applies only to plain-format backups. For tar backups, checksum
+ * verification (if requested) is done immediately as each file's content
+ * is streamed, since we do not have random access to the files the way
+ * we do with plain backups.
*/
- if (!skip_checksums)
+ if (!skip_checksums && format == 'p')
verify_backup_checksums(&context);
/*
@@ -1041,6 +1100,84 @@ progress_report(bool finished)
fputc((!finished && isatty(fileno(stderr))) ? '\r' : '\n', stderr);
}
+/*
+ * Detect the backup format by checking for a PG_VERSION file in the backup
+ * directory. If it is found, the backup is considered plain format;
+ * otherwise, it is assumed to be a tar-format backup.
+ */
+static char
+find_backup_format(verifier_context *context)
+{
+ char result;
+ char *path;
+ struct stat sb;
+
+ /* Should be here only if the backup format is unknown */
+ Assert(format == '\0');
+
+ /* Check PG_VERSION file. */
+ path = psprintf("%s/%s", context->backup_directory, "PG_VERSION");
+ result = (stat(path, &sb) == 0) ? 'p' : 't';
+ pfree(path);
+
+ return result;
+}
+
+/*
+ * To determine the compression format, we will search for the main data
+ * directory archive and its extension, which starts with base.tar, as
+ * pg_basebackup writes the main data directory to an archive file named
+ * base.tar followed by a compression type extension like .gz, .lz4, or .zst.
+ */
+static pg_compress_algorithm
+find_backup_compression(verifier_context *context)
+{
+ char *path;
+ struct stat sb;
+ bool found;
+
+ /* Should be here only for tar backup */
+ Assert(format == 't');
+
+ /*
+ * Is this a tar archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_NONE;
+
+ /*
+ * Is this a .tar.gz archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.gz");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_GZIP;
+
+ /*
+ * Is this a .tar.lz4 archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.lz4");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_LZ4;
+
+ /*
+ * Is this a .tar.zst archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.zst");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_ZSTD;
+
+ return PG_COMPRESSION_NONE; /* placate compiler */
+}
+
/*
* Print out usage information and exit.
*/
@@ -1053,11 +1190,13 @@ usage(void)
printf(_(" -e, --exit-on-error exit immediately on error\n"));
printf(_(" -i, --ignore=RELATIVE_PATH ignore indicated path\n"));
printf(_(" -m, --manifest-path=PATH use specified path for manifest\n"));
+ printf(_(" -F, --format=p|t backup format (plain, tar)\n"));
printf(_(" -n, --no-parse-wal do not try to parse WAL files\n"));
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -s, --skip-checksums skip checksum verification\n"));
printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n"));
+ printf(_(" -Z, --compress=METHOD compress method (gzip, lz4, zstd, none) \n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
--
2.18.0
v1-0010-pg_verifybackup-Tests-and-document.patch (application/x-patch)
From 1ce3f3bbc5002b4f9245f928d738129d24c943c7 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 17:04:56 +0530
Subject: [PATCH v1 10/10] pg_verifybackup: Tests and document
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the previous patch that implements tar format support.
----
---
doc/src/sgml/ref/pg_verifybackup.sgml | 54 +++++++++++++-
src/bin/pg_verifybackup/t/001_basic.pl | 18 ++++-
src/bin/pg_verifybackup/t/004_options.pl | 2 +-
src/bin/pg_verifybackup/t/005_bad_manifest.pl | 2 +-
src/bin/pg_verifybackup/t/008_untar.pl | 73 ++++++-------------
src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +-----------
6 files changed, 96 insertions(+), 101 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index a3f167f9f6e..c743bd89a92 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -34,8 +34,10 @@ PostgreSQL documentation
integrity of a database cluster backup taken using
<command>pg_basebackup</command> against a
<literal>backup_manifest</literal> generated by the server at the time
- of the backup. The backup must be stored in the "plain"
- format; a "tar" format backup can be checked after extracting it.
+ of the backup. The backup must be stored in the "plain" or "tar"
+ format. Verification is supported for <literal>gzip</literal>,
+ <literal>lz4</literal>, and <literal>zstd</literal> compressed tar backups;
+ backups using any other compression format can be checked after decompressing them.
</para>
<para>
@@ -168,6 +170,42 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-F <replaceable class="parameter">format</replaceable></option></term>
+ <term><option>--format=<replaceable class="parameter">format</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the format of the backup. <replaceable>format</replaceable>
+ can be one of the following:
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>p</literal></term>
+ <term><literal>plain</literal></term>
+ <listitem>
+ <para>
+ Backup consists of plain files with the same layout as the
+ source server's data directory and tablespaces.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>t</literal></term>
+ <term><literal>tar</literal></term>
+ <listitem>
+ <para>
+ Backup consists of tar files. The main data directory's contents
+ will be written to a file named <filename>base.tar</filename>,
+ and each other tablespace will be written to a separate tar file
+ named after that tablespace's OID.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist></para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n</option></term>
<term><option>--no-parse-wal</option></term>
@@ -227,6 +265,18 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><option>-Z <replaceable class="parameter">method</replaceable></option></term>
+ <term><option>--compress=<replaceable class="parameter">method</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the compression method of the tar backup:
+ <literal>gzip</literal>, <literal>lz4</literal>, <literal>zstd</literal>,
+ or <literal>none</literal> for an uncompressed backup.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
diff --git a/src/bin/pg_verifybackup/t/001_basic.pl b/src/bin/pg_verifybackup/t/001_basic.pl
index 2f3e52d296f..d47ce1f04fc 100644
--- a/src/bin/pg_verifybackup/t/001_basic.pl
+++ b/src/bin/pg_verifybackup/t/001_basic.pl
@@ -17,13 +17,25 @@ command_fails_like(
qr/no backup directory specified/,
'target directory must be specified');
command_fails_like(
- [ 'pg_verifybackup', $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir ],
qr/could not open file.*\/backup_manifest\"/,
'pg_verifybackup requires a manifest');
command_fails_like(
- [ 'pg_verifybackup', $tempdir, $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir, $tempdir ],
qr/too many command-line arguments/,
'multiple target directories not allowed');
+command_fails_like(
+ [ 'pg_verifybackup', '-Zgzip', $tempdir ],
+ qr/only tar mode backups can be compressed/,
+ 'compression option requires the tar format option');
+command_fails_like(
+ [ 'pg_verifybackup', '-Fp', '-Zlz4', $tempdir ],
+ qr/only tar mode backups can be compressed/,
+ 'compression option not allowed with plain format');
+command_fails_like(
+ [ 'pg_verifybackup', '-Fp', '-Znon_exist', $tempdir ],
+ qr/unrecognized compression algorithm/,
+ 'compression method should be valid');
# create fake manifest file
open(my $fh, '>', "$tempdir/backup_manifest") || die "open: $!";
@@ -31,7 +43,7 @@ close($fh);
# but then try to use an alternate, nonexisting manifest
command_fails_like(
- [ 'pg_verifybackup', '-m', "$tempdir/not_the_manifest", $tempdir ],
+ [ 'pg_verifybackup', '-Fp', '-m', "$tempdir/not_the_manifest", $tempdir ],
qr/could not open file.*\/not_the_manifest\"/,
'pg_verifybackup respects -m flag');
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 8ed2214408e..2f197648740 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -108,7 +108,7 @@ unlike(
# Test valid manifest with nonexistent backup directory.
command_fails_like(
[
- 'pg_verifybackup', '-m',
+ 'pg_verifybackup', '-Fp', '-m',
"$backup_path/backup_manifest", "$backup_path/fake"
],
qr/could not open directory/,
diff --git a/src/bin/pg_verifybackup/t/005_bad_manifest.pl b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
index c4ed64b62d5..28c51b6feb0 100644
--- a/src/bin/pg_verifybackup/t/005_bad_manifest.pl
+++ b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
@@ -208,7 +208,7 @@ sub test_bad_manifest
print $fh $manifest_contents;
close($fh);
- command_fails_like([ 'pg_verifybackup', $tempdir ], $regexp, $test_name);
+ command_fails_like([ 'pg_verifybackup', '-Fp', $tempdir ], $regexp, $test_name);
return;
}
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 7a09f3b75b2..9896560adc3 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -16,6 +16,20 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
+# Create a couple of directories to use as tablespaces.
+my $TS1_LOCATION = $primary->backup_dir .'/ts1';
+$TS1_LOCATION =~ s/\/\.\//\//g; # collapse foo/./bar to foo/bar
+mkdir($TS1_LOCATION);
+
+# Create a tablespace with table in it.
+$primary->safe_psql('postgres', qq(
+ CREATE TABLESPACE regress_ts1 LOCATION '$TS1_LOCATION';
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1';
+ CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1;
+ INSERT INTO regress_tbl1 VALUES(generate_series(1,5));));
+my $tsoid = $primary->safe_psql('postgres', qq(
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
+
my $backup_path = $primary->backup_dir . '/server-backup';
my $extract_path = $primary->backup_dir . '/extracted-backup';
@@ -23,39 +37,31 @@ my @test_configuration = (
{
'compression_method' => 'none',
'backup_flags' => [],
- 'backup_archive' => 'base.tar',
+ 'backup_archive' => ['base.tar', "$tsoid.tar"],
'enabled' => 1
},
{
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'server-gzip' ],
- 'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'server-lz4' ],
- 'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => [ '-d', '-m' ],
+ 'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ],
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
});
@@ -86,47 +92,16 @@ for my $tc (@test_configuration)
my $backup_files = join(',',
sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
my $expected_backup_files =
- join(',', sort ('backup_manifest', $tc->{'backup_archive'}));
+ join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} }));
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path],
+ "verify backup, compression $method");
# Cleanup.
- unlink($backup_path . '/backup_manifest');
- unlink($backup_path . '/base.tar');
- unlink($backup_path . '/' . $tc->{'backup_archive'});
+ rmtree($backup_path);
rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 8c076d46dee..6b7d7483f6e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -29,41 +29,30 @@ my @test_configuration = (
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'client-gzip:5' ],
'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'client-lz4:5' ],
'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => ['-d'],
- 'output_file' => 'base.tar',
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:5' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'parallel zstd',
'backup_flags' => [ '--compress', 'client-zstd:workers=3' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1"),
'possibly_unsupported' =>
qr/could not set compression worker count to 3: Unsupported parameter/
@@ -118,40 +107,9 @@ for my $tc (@test_configuration)
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- push @decompress, $backup_path . '/' . $tc->{'output_file'}
- if $tc->{'output_file'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ],
+ "verify backup, compression $method");
# Cleanup.
rmtree($extract_path);
--
2.18.0
v1-0007-Refactor-split-verify_control_file.patch (application/x-patch)
From 302cf37bb76947fa8e850d4096a77575faf4dca3 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 10:10:08 +0530
Subject: [PATCH v1 07/10] Refactor: split verify_control_file.
Move the control file checks out of verify_control_file() into a separate
function that can be called from other places as well.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 40 ++++++++++-------------
src/bin/pg_verifybackup/pg_verifybackup.h | 14 ++++++++
2 files changed, 32 insertions(+), 22 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 7b845bece71..375f196b300 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,7 +18,6 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -61,8 +60,6 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_control_file(const char *controlpath,
- uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -626,14 +623,20 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
/* Check whether there's an entry in the manifest hash. */
m = verify_manifest_entry(context, relpath, sb.st_size);
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0 &&
- m->matched && !m->bad)
- verify_control_file(fullpath, context->manifest->system_identifier);
+ /* Validate the manifest system identifier */
+ if (m != NULL && should_verify_sysid(context->manifest, m))
+ {
+ ControlFileData *control_file;
+ bool crc_ok;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+ control_file = get_controlfile_by_exact_path(fullpath, &crc_ok);
+
+ verify_control_file_data(control_file, fullpath, crc_ok,
+ context->manifest->system_identifier);
+ /* Release memory. */
+ pfree(control_file);
+ }
}
/*
@@ -683,15 +686,11 @@ verify_manifest_entry(verifier_context *context, char *relpath, size_t filesize)
* Sanity check control file and validate system identifier against manifest
* system identifier.
*/
-static void
-verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
+void
+verify_control_file_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier)
{
- ControlFileData *control_file;
- bool crc_ok;
-
- pg_log_debug("reading \"%s\"", controlpath);
- control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
-
/* Control file contents not meaningful if CRC is bad. */
if (!crc_ok)
report_fatal_error("%s: CRC is incorrect", controlpath);
@@ -707,9 +706,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
controlpath,
(unsigned long long) manifest_system_identifier,
(unsigned long long) control_file->system_identifier);
-
- /* Release memory. */
- pfree(control_file);
}
/*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 50a285752aa..64508578290 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -16,6 +16,7 @@
#include "common/controldata_utils.h"
#include "common/hashfn_unstable.h"
+#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
@@ -46,6 +47,16 @@ typedef struct manifest_file
#define should_verify_checksum(m) \
(((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+/*
+ * Validate the manifest system identifier against the control file; this
+ * feature is not available in manifest version 1. This validation should be
+ * carried out only if the manifest entry validation is completed without any
+ * errors.
+ */
+#define should_verify_sysid(manifest, m) \
+ (((manifest)->version != 1) && \
+ ((m)->matched) && !((m)->bad) && (strcmp((m)->pathname, "global/pg_control") == 0))
+
/*
* Define a hash table which we can use to store information about the files
* mentioned in the backup manifest.
@@ -105,6 +116,9 @@ extern bool verify_content_checksum(verifier_context *context,
pg_checksum_context *checksum_ctx,
manifest_file *m, uint8 *buf,
int buf_len, size_t *computed_len);
+extern void verify_control_file_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
v1-0009-pg_verifybackup-Read-tar-files-and-verify-its-con.patch (application/x-patch)
From f5c5cff262268c3ffb356c31dc68e05bc12f6832 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Mon, 17 Jun 2024 17:11:26 +0530
Subject: [PATCH v1 09/10] pg_verifybackup: Read tar files and verify their
contents
---
src/bin/pg_verifybackup/Makefile | 4 +-
src/bin/pg_verifybackup/astreamer_verify.c | 250 +++++++++++++++++++++
src/bin/pg_verifybackup/meson.build | 6 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 220 +++++++++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 10 +
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 480 insertions(+), 11 deletions(-)
create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 7c045f142e8..df7aaabd530 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -17,11 +17,13 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
# We need libpq only because fe_utils does.
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
OBJS = \
$(WIN32RES) \
- pg_verifybackup.o
+ pg_verifybackup.o \
+ astreamer_verify.o
all: pg_verifybackup
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
new file mode 100644
index 00000000000..9be9a9bc04a
--- /dev/null
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -0,0 +1,250 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_verify.c
+ *
+ * Extend fe_utils/astreamer.h archive streaming facility to verify TAR
+ * backup.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * src/bin/pg_verifybackup/astreamer_verify.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "common/logging.h"
+#include "fe_utils/astreamer.h"
+#include "pg_verifybackup.h"
+
+typedef struct astreamer_verify
+{
+ astreamer base;
+ verifier_context *context;
+ char *archive_name;
+ Oid tblspc_oid;
+
+ manifest_file *mfile;
+ size_t received_bytes;
+ bool verify_checksums;
+ bool verify_sysid;
+ pg_checksum_context *checksum_ctx;
+} astreamer_verify;
+
+static void astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_verify_finalize(astreamer *streamer);
+static void astreamer_verify_free(astreamer *streamer);
+
+static const astreamer_ops astreamer_verify_ops = {
+ .content = astreamer_verify_content,
+ .finalize = astreamer_verify_finalize,
+ .free = astreamer_verify_free
+};
+
+/*
+ * Create an astreamer that verifies the contents of a TAR file.
+ */
+astreamer *
+astreamer_verify_content_new(astreamer *next, verifier_context *context,
+ char *archive_name, Oid tblspc_oid)
+{
+ astreamer_verify *streamer;
+
+ streamer = palloc0(sizeof(astreamer_verify));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_verify_ops;
+
+ streamer->base.bbs_next = next;
+ streamer->context = context;
+ streamer->archive_name = archive_name;
+ streamer->tblspc_oid = tblspc_oid;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ return &streamer->base;
+}
+
+/*
+ * It verifies each TAR member entry against the manifest data and performs
+ * checksum verification if enabled. Additionally, it validates the backup's
+ * system identifier against the backup_manifest.
+ */
+static void
+astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member, const char *data,
+ int len, astreamer_archive_context context)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+ if (!member->is_directory && !member->is_link &&
+ !should_ignore_relpath(mystreamer->context, member->pathname))
+ {
+ manifest_file *m;
+
+ /*
+ * The backup_manifest stores the path of a tablespace file
+ * relative to the base directory, whereas the entry in
+ * <tablespaceoid>.tar does not. Prepare the required path;
+ * otherwise, the manifest entry verification will fail.
+ */
+ if (OidIsValid(mystreamer->tblspc_oid))
+ {
+ char temp[MAXPGPATH];
+
+ /* Save the original name in a temporary buffer */
+ memcpy(temp, member->pathname, MAXPGPATH);
+
+ snprintf(member->pathname, MAXPGPATH, "%s/%u/%s",
+ "pg_tblspc", mystreamer->tblspc_oid, temp);
+ }
+
+ /* Check the manifest entry */
+ m = verify_manifest_entry(mystreamer->context, member->pathname,
+ member->size);
+ mystreamer->mfile = (void *) m;
+
+ /*
+ * Prepare for checksum and manifest system identifier
+ * verification.
+ *
+ * We could have these checks while receiving contents.
+ * However, since contents are received in multiple iterations,
+ * this would result in these lengthy checks being performed
+ * multiple times. Instead, having a single flag would be more
+ * efficient.
+ */
+ if (m != NULL)
+ {
+ mystreamer->verify_checksums =
+ (!skip_checksums && should_verify_checksum(m));
+ mystreamer->verify_sysid =
+ should_verify_sysid(mystreamer->context->manifest, m);
+ }
+ }
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Perform checksum verification as the file content becomes
+ * available, since the TAR format does not have random access to
+ * files like a normal backup directory, where checksum verification
+ * occurs at different points.
+ */
+ if (mystreamer->verify_checksums)
+ {
+ /* First chunk of this file: set up the checksum context. */
+ if (!mystreamer->checksum_ctx)
+ {
+ mystreamer->checksum_ctx = pg_malloc(sizeof(pg_checksum_context));
+
+ if (pg_checksum_init(mystreamer->checksum_ctx,
+ mystreamer->mfile->checksum_type) < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "%s: could not initialize checksum of file \"%s\"",
+ mystreamer->archive_name, member->pathname);
+ mystreamer->verify_checksums = false;
+ return;
+ }
+ }
+
+ /* Compute and do the checksum validation */
+ mystreamer->verify_checksums =
+ verify_content_checksum(mystreamer->context,
+ mystreamer->checksum_ctx,
+ mystreamer->mfile,
+ (uint8 *) data, len,
+ &mystreamer->received_bytes);
+ }
+
+ /* Do the manifest system identifier verification */
+ if (mystreamer->verify_sysid)
+ {
+ ControlFileData control_file;
+ uint64 manifest_system_identifier;
+ pg_crc32c crc;
+ bool crc_ok;
+
+ /* Should be here only for control file */
+ Assert(strcmp(member->pathname, "global/pg_control") == 0);
+ Assert(mystreamer->context->manifest->version != 1);
+
+ /* Should have whole control file data. */
+ if (!astreamer_buffer_until(streamer, &data, &len,
+ sizeof(ControlFileData)))
+ return;
+
+ pg_log_debug("%s: reading \"%s\"", mystreamer->archive_name,
+ member->pathname);
+
+ if (streamer->bbs_buffer.len != sizeof(ControlFileData))
+ report_fatal_error("%s: could not read control file: read %d of %zu",
+ mystreamer->archive_name, streamer->bbs_buffer.len,
+ sizeof(ControlFileData));
+
+ memcpy(&control_file, streamer->bbs_buffer.data,
+ sizeof(ControlFileData));
+
+ /* Check the CRC. */
+ INIT_CRC32C(crc);
+ COMP_CRC32C(crc,
+ (char *) (&control_file),
+ offsetof(ControlFileData, crc));
+ FIN_CRC32C(crc);
+
+ crc_ok = EQ_CRC32C(crc, control_file.crc);
+
+ manifest_system_identifier =
+ mystreamer->context->manifest->system_identifier;
+
+ verify_control_file_data(&control_file, member->pathname,
+ crc_ok, manifest_system_identifier);
+ }
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+ if (mystreamer->checksum_ctx)
+ pfree(mystreamer->checksum_ctx);
+ mystreamer->checksum_ctx = NULL;
+ mystreamer->mfile = NULL;
+ mystreamer->received_bytes = 0;
+ mystreamer->verify_checksums = false;
+ mystreamer->verify_sysid = false;
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar archive");
+ }
+}
+
+/*
+ * End-of-stream processing for a astreamer_verify stream.
+ */
+static void
+astreamer_verify_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with a astreamer_verify stream.
+ */
+static void
+astreamer_verify_free(astreamer *streamer)
+{
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build
index 7c7d31a0350..1e3fcf7ee5a 100644
--- a/src/bin/pg_verifybackup/meson.build
+++ b/src/bin/pg_verifybackup/meson.build
@@ -1,7 +1,8 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
pg_verifybackup_sources = files(
- 'pg_verifybackup.c'
+ 'pg_verifybackup.c',
+ 'astreamer_verify.c'
)
if host_system == 'windows'
@@ -10,9 +11,10 @@ if host_system == 'windows'
'--FILEDESC', 'pg_verifybackup - verify a backup against using a backup manifest'])
endif
+pg_verifybackup_deps = [frontend_code, libpq, lz4, zlib, zstd]
pg_verifybackup = executable('pg_verifybackup',
pg_verifybackup_sources,
- dependencies: [frontend_code, libpq],
+ dependencies: pg_verifybackup_deps,
kwargs: default_bin_args,
)
bin_targets += pg_verifybackup
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 0d458298f34..21b16c281cb 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -20,9 +20,12 @@
#include "common/compression.h"
#include "common/parse_manifest.h"
+#include "common/relpath.h"
+#include "fe_utils/astreamer.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
#include "pg_verifybackup.h"
+#include "pgtar.h"
#include "pgtime.h"
/*
@@ -39,6 +42,11 @@
*/
#define ESTIMATED_BYTES_PER_MANIFEST_LINE 100
+
+static void (*verify_backup_file_cb) (verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -63,6 +71,15 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
char *relpath, char *fullpath);
+static void verify_plain_file_cb(verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+static void verify_tar_file_cb(verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+static void verify_tar_content(verifier_context *context,
+ char *relpath, char *fullpath,
+ astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -71,6 +88,9 @@ static void verify_file_checksum(verifier_context *context,
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
+static astreamer *create_archive_verifier(verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
static void progress_report(bool finished);
static void usage(void);
@@ -154,6 +174,10 @@ main(int argc, char **argv)
*/
simple_string_list_append(&context.ignore_list, "backup_manifest");
simple_string_list_append(&context.ignore_list, "pg_wal");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.gz");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.lz4");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.zst");
simple_string_list_append(&context.ignore_list, "postgresql.auto.conf");
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
@@ -258,6 +282,15 @@ main(int argc, char **argv)
if (format == 't' && !tar_compression_specified)
compress_algorithm = find_backup_compression(&context);
+ /*
+ * Setup the required callback function to verify plain or tar backup
+ * files.
+ */
+ if (format == 'p')
+ verify_backup_file_cb = verify_plain_file_cb;
+ else
+ verify_backup_file_cb = verify_tar_file_cb;
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
@@ -637,7 +670,8 @@ verify_backup_directory(verifier_context *context, char *relpath,
}
/*
- * Verify one file (which might actually be a directory or a symlink).
+ * Verify one file (which might actually be a directory, a symlink or an
+ * archive).
*
* The arguments to this function have the same meaning as the arguments to
* verify_backup_directory.
@@ -646,7 +680,6 @@ static void
verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
{
struct stat sb;
- manifest_file *m;
if (stat(fullpath, &sb) != 0)
{
@@ -679,8 +712,25 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Do the further verifications */
+ verify_backup_file_cb(context, relpath, fullpath, sb.st_size);
+}
+
+/*
+ * Verify one plain file or a symlink.
+ *
+ * The arguments to this function are mostly the same as those of
+ * verify_backup_directory(). The additional argument is the file size,
+ * which is verified against the manifest entry.
+ */
+static void
+verify_plain_file_cb(verifier_context *context, char *relpath,
+ char *fullpath, size_t filesize)
+{
+ manifest_file *m;
+
/* Check whether there's an entry in the manifest hash. */
- m = verify_manifest_entry(context, relpath, sb.st_size);
+ m = verify_manifest_entry(context, relpath, filesize);
/* Validate the manifest system identifier */
if (m != NULL && should_verify_sysid(context->manifest, m))
@@ -698,6 +748,124 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
}
}
+/*
+ * Verify one tar file.
+ *
+ * The arguments to this function are mostly the same as those of
+ * verify_backup_directory(). The additional argument is the file size,
+ * which is verified against the manifest entry.
+ */
+static void
+verify_tar_file_cb(verifier_context *context, char *relpath,
+ char *fullpath, size_t filesize)
+{
+ astreamer *streamer;
+ Oid tblspc_oid = InvalidOid;
+ int file_name_len;
+ int file_extn_len = 0; /* placate compiler */
+ char *file_extn = "";
+
+ /* Should be tar backup */
+ Assert(format == 't');
+
+ /* Find the tar file extension. */
+ if (compress_algorithm == PG_COMPRESSION_NONE)
+ {
+ file_extn = ".tar";
+ file_extn_len = 4;
+
+ }
+ else if (compress_algorithm == PG_COMPRESSION_GZIP)
+ {
+ file_extn = ".tar.gz";
+ file_extn_len = 7;
+
+ }
+ else if (compress_algorithm == PG_COMPRESSION_LZ4)
+ {
+ file_extn = ".tar.lz4";
+ file_extn_len = 8;
+ }
+ else if (compress_algorithm == PG_COMPRESSION_ZSTD)
+ {
+ file_extn = ".tar.zst";
+ file_extn_len = 8;
+ }
+
+ /*
+ * Ensure that we have the correct file type corresponding to the backup
+ * format.
+ */
+ file_name_len = strlen(relpath);
+ if (file_name_len < file_extn_len ||
+ strcmp(relpath + file_name_len - file_extn_len, file_extn) != 0)
+ {
+ if (compress_algorithm == PG_COMPRESSION_NONE)
+ report_backup_error(context,
+ "\"%s\" is not a valid file, expecting tar file",
+ relpath);
+ else
+ report_backup_error(context,
+ "\"%s\" is not a valid file, expecting \"%s\" compressed tar file",
+ relpath,
+ get_compress_algorithm_name(compress_algorithm));
+ return;
+ }
+
+ /*
+ * For the tablespace, pg_basebackup writes the data out to
+ * <tablespaceoid>.tar. If a file matches that format, then extract the
+ * tablespaceoid, which we need to prepare the paths of the files
+ * belonging to that tablespace relative to the base directory.
+ */
+ if (strspn(relpath, "0123456789") == (file_name_len - file_extn_len))
+ tblspc_oid = strtoi64(relpath, NULL, 10);
+
+ streamer = create_archive_verifier(context, relpath, tblspc_oid);
+ verify_tar_content(context, relpath, fullpath, streamer);
+
+ /* Cleanup. */
+ astreamer_finalize(streamer);
+ astreamer_free(streamer);
+}
+
+/*
+ * Read the given tar file in fixed-size chunks and feed them to the
+ * astreamer, which performs decompression (if necessary) and then
+ * verification of each member within the tar archive.
+ */
+static void
+verify_tar_content(verifier_context *context, char *relpath, char *fullpath,
+ astreamer *streamer)
+{
+ int fd;
+ int rc;
+ char *buffer;
+
+ /* Open the target file. */
+ if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
+ {
+ report_backup_error(context, "could not open file \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+ /* Perform the reads */
+ while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
+ astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);
+
+ if (rc < 0)
+ report_backup_error(context, "could not read file \"%s\": %m",
+ relpath);
+
+ /* Close the file. */
+ if (close(fd) != 0)
+ report_backup_error(context, "could not close file \"%s\": %m",
+ relpath);
+}
+
/*
* Verify file and its size entry in the manifest.
*/
@@ -742,8 +910,8 @@ verify_manifest_entry(verifier_context *context, char *relpath, size_t filesize)
}
/*
- * Sanity check control file and validate system identifier against manifest
- * system identifier.
+ * Sanity check control file data and validate system identifier against
+ * manifest system identifier.
*/
void
verify_control_file_data(ControlFileData *control_file,
@@ -1124,10 +1292,10 @@ find_backup_format(verifier_context *context)
}
/*
- * To determine the compression format, we will search for the main data
- * directory archive and its extension, which starts with base.tar, as
* pg_basebackup writes the main data directory to an archive file named
- * base.tar followed by a compression type extension like .gz, .lz4, or .zst.
+ * base.tar, followed by a compression type extension such as .gz, .lz4, or
+ * .zst. To determine the compression format, we need to search for this main
+ * data directory archive file.
*/
static pg_compress_algorithm
find_backup_compression(verifier_context *context)
@@ -1178,6 +1346,42 @@ find_backup_compression(verifier_context *context)
return PG_COMPRESSION_NONE; /* placate compiler */
}
+/*
+ * Set up the astreamer chain needed for verifying the contents of the
+ * given tar file.
+ */
+static astreamer *
+create_archive_verifier(verifier_context *context, char *archive_name,
+ Oid tblspc_oid)
+{
+ astreamer *streamer = NULL;
+
+ /* Should be here only for tar backup */
+ Assert(format == 't');
+
+ /*
+ * To verify the contents of the tar file, the initial step is to parse
+ * its content.
+ */
+ streamer = astreamer_verify_content_new(streamer, context, archive_name,
+ tblspc_oid);
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /*
+ * If the tar file is compressed, we must perform the appropriate
+ * decompression operation before proceeding with the verification of its
+ * contents.
+ */
+ if (compress_algorithm == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compress_algorithm == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compress_algorithm == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ return streamer;
+}
+
/*
* Print out usage information and exit.
*/
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 64508578290..e27fe4da6a2 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -127,4 +127,14 @@ extern void report_fatal_error(const char *pg_restrict fmt,...)
pg_attribute_printf(1, 2) pg_attribute_noreturn();
extern bool should_ignore_relpath(verifier_context *context, const char *relpath);
+/* Forward declarations to avoid fe_utils/astreamer.h include. */
+struct astreamer;
+typedef struct astreamer astreamer;
+
+extern astreamer *astreamer_verify_content_new(astreamer *next,
+ verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
+
+
#endif /* PG_VERIFYBACKUP_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 2d1ec373236..7231a48fbee 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3320,6 +3320,7 @@ astreamer_plain_writer
astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
+astreamer_verify
astreamer_zstd_frame
bgworker_main_type
bh_node_type
--
2.18.0
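To make the data flow concrete, here is a rough sketch (not part of the patch, variable declarations omitted) of the chain that create_archive_verifier() assembles for a gzip-compressed base.tar.gz. The innermost consumer is built first and each stage forwards its output to the next, so verify_tar_content() only has to push raw file chunks into the outermost streamer.

    /* Hypothetical illustration; mirrors what patch 0009 builds. */
    astreamer  *chain;

    chain = astreamer_verify_content_new(NULL, context, "base.tar.gz",
                                         InvalidOid);
    chain = astreamer_tar_parser_new(chain);
    chain = astreamer_gzip_decompressor_new(chain);

    /* The read loop then just feeds chunks from the archive file ... */
    astreamer_content(chain, NULL, buffer, nread, ASTREAMER_UNKNOWN);

    /* ... and the whole chain is torn down once the file is exhausted. */
    astreamer_finalize(chain);
    astreamer_free(chain);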
v1-0005-Refactor-split-verify_backup_file-function.patch (application/x-patch)
From 292393cf8562bec33ee46dd9d24a694df9301cde Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Mon, 17 Jun 2024 14:17:22 +0530
Subject: [PATCH v1 05/10] Refactor: split verify_backup_file() function.
Separate the manifest entry verification code into a new function.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 36 ++++++++++++++++-------
1 file changed, 25 insertions(+), 11 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index a248db1b28a..d17b565a604 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -624,35 +624,47 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
}
/* Check whether there's an entry in the manifest hash. */
+ m = verify_manifest_entry(context, relpath, sb.st_size);
+
+ /*
+ * Validate the manifest system identifier, not available in manifest
+ * version 1.
+ */
+ if (context->manifest->version != 1 &&
+ strcmp(relpath, "global/pg_control") == 0 &&
+ m->matched && !m->bad)
+ verify_control_file(fullpath, context->manifest->system_identifier);
+}
+
+/*
+ * Verify file and its size entry in the manifest.
+ */
+manifest_file *
+verify_manifest_entry(verifier_context *context, char *relpath, size_t filesize)
+{
+ manifest_file *m;
+
m = manifest_files_lookup(context->manifest->files, relpath);
if (m == NULL)
{
report_backup_error(context,
"\"%s\" is present on disk but not in the manifest",
relpath);
- return;
+ return NULL;
}
/* Flag this entry as having been encountered in the filesystem. */
m->matched = true;
/* Check that the size matches. */
- if (m->size != sb.st_size)
+ if (m->size != filesize)
{
report_backup_error(context,
"\"%s\" has size %lld on disk but size %zu in the manifest",
- relpath, (long long int) sb.st_size, m->size);
+ relpath, (long long int) filesize, m->size);
m->bad = true;
}
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0)
- verify_control_file(fullpath, context->manifest->system_identifier);
-
/* Update statistics for progress report, if necessary */
if (show_progress && !skip_checksums && should_verify_checksum(m))
total_size += m->size;
@@ -663,6 +675,8 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
* afterwards verify the checksums. That's because computing checksums may
* take a while, and we'd like to report more obvious problems quickly.
*/
+
+ return m;
}
/*
--
2.18.0
v1-0004-Refactor-move-some-part-of-pg_verifybackup.c-to-p.patchapplication/x-patch; name=v1-0004-Refactor-move-some-part-of-pg_verifybackup.c-to-p.patchDownload
From f8ba9bfef39c908ab26db90167ea9aae97ce6287 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 10:32:11 +0530
Subject: [PATCH v1 04/10] Refactor: move some part of pg_verifybackup.c to
pg_verifybackup.h
---
src/bin/pg_basebackup/astreamer_inject.h | 1 -
src/bin/pg_verifybackup/pg_verifybackup.c | 99 ++-----------------
src/bin/pg_verifybackup/pg_verifybackup.h | 112 ++++++++++++++++++++++
src/include/fe_utils/astreamer.h | 1 +
4 files changed, 120 insertions(+), 93 deletions(-)
create mode 100644 src/bin/pg_verifybackup/pg_verifybackup.h
diff --git a/src/bin/pg_basebackup/astreamer_inject.h b/src/bin/pg_basebackup/astreamer_inject.h
index aeed533862b..21023b6cc47 100644
--- a/src/bin/pg_basebackup/astreamer_inject.h
+++ b/src/bin/pg_basebackup/astreamer_inject.h
@@ -13,7 +13,6 @@
#define ASTREAMER_INJECT_H
#include "fe_utils/astreamer.h"
-#include "pqexpbuffer.h"
extern astreamer *astreamer_recovery_injector_new(astreamer *next,
bool is_recovery_guc_supported,
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d77e70fbe38..a248db1b28a 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,12 +18,11 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/controldata_utils.h"
-#include "common/hashfn_unstable.h"
#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
+#include "pg_verifybackup.h"
#include "pgtime.h"
/*
@@ -40,83 +39,6 @@
*/
#define ESTIMATED_BYTES_PER_MANIFEST_LINE 100
-/*
- * How many bytes should we try to read from a file at once?
- */
-#define READ_CHUNK_SIZE (128 * 1024)
-
-/*
- * Each file described by the manifest file is parsed to produce an object
- * like this.
- */
-typedef struct manifest_file
-{
- uint32 status; /* hash status */
- const char *pathname;
- size_t size;
- pg_checksum_type checksum_type;
- int checksum_length;
- uint8 *checksum_payload;
- bool matched;
- bool bad;
-} manifest_file;
-
-#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
-
-/*
- * Define a hash table which we can use to store information about the files
- * mentioned in the backup manifest.
- */
-#define SH_PREFIX manifest_files
-#define SH_ELEMENT_TYPE manifest_file
-#define SH_KEY_TYPE const char *
-#define SH_KEY pathname
-#define SH_HASH_KEY(tb, key) hash_string(key)
-#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
-#define SH_SCOPE static inline
-#define SH_RAW_ALLOCATOR pg_malloc0
-#define SH_DECLARE
-#define SH_DEFINE
-#include "lib/simplehash.h"
-
-/*
- * Each WAL range described by the manifest file is parsed to produce an
- * object like this.
- */
-typedef struct manifest_wal_range
-{
- TimeLineID tli;
- XLogRecPtr start_lsn;
- XLogRecPtr end_lsn;
- struct manifest_wal_range *next;
- struct manifest_wal_range *prev;
-} manifest_wal_range;
-
-/*
- * All the data parsed from a backup_manifest file.
- */
-typedef struct manifest_data
-{
- int version;
- uint64 system_identifier;
- manifest_files_hash *files;
- manifest_wal_range *first_wal_range;
- manifest_wal_range *last_wal_range;
-} manifest_data;
-
-/*
- * All of the context information we need while checking a backup manifest.
- */
-typedef struct verifier_context
-{
- manifest_data *manifest;
- char *backup_directory;
- SimpleStringList ignore_list;
- bool exit_on_error;
- bool saw_any_error;
-} verifier_context;
-
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -150,21 +72,14 @@ static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
-static void report_backup_error(verifier_context *context,
- const char *pg_restrict fmt,...)
- pg_attribute_printf(2, 3);
-static void report_fatal_error(const char *pg_restrict fmt,...)
- pg_attribute_printf(1, 2) pg_attribute_noreturn();
-static bool should_ignore_relpath(verifier_context *context, const char *relpath);
-
static void progress_report(bool finished);
static void usage(void);
static const char *progname;
/* options */
-static bool show_progress = false;
-static bool skip_checksums = false;
+bool show_progress = false;
+bool skip_checksums = false;
/* Progress indicators */
static uint64 total_size = 0;
@@ -556,7 +471,7 @@ verifybackup_per_file_cb(JsonManifestParseContext *context,
bool found;
/* Make a new entry in the hash table for this file. */
- m = manifest_files_insert(ht, pathname, &found);
+ m = manifest_files_insert(ht, (char *) pathname, &found);
if (found)
report_fatal_error("duplicate path name in backup manifest: \"%s\"",
pathname);
@@ -979,7 +894,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
* Update the context to indicate that we saw an error, and exit if the
* context says we should.
*/
-static void
+void
report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
{
va_list ap;
@@ -996,7 +911,7 @@ report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
/*
* Report a fatal error and exit
*/
-static void
+void
report_fatal_error(const char *pg_restrict fmt,...)
{
va_list ap;
@@ -1015,7 +930,7 @@ report_fatal_error(const char *pg_restrict fmt,...)
* Note that by "prefix" we mean a parent directory; for this purpose,
* "aa/bb" is not a prefix of "aa/bbb", but it is a prefix of "aa/bb/cc".
*/
-static bool
+bool
should_ignore_relpath(verifier_context *context, const char *relpath)
{
SimpleStringListCell *cell;
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
new file mode 100644
index 00000000000..c11ff33a100
--- /dev/null
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -0,0 +1,112 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_verifybackup.h
+ * Verify a backup against a backup manifest.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/pg_verifybackup/pg_verifybackup.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef PG_VERIFYBACKUP_H
+#define PG_VERIFYBACKUP_H
+
+#include "common/controldata_utils.h"
+#include "common/hashfn_unstable.h"
+#include "common/parse_manifest.h"
+#include "fe_utils/simple_list.h"
+
+extern bool show_progress;
+extern bool skip_checksums;
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/*
+ * Each file described by the manifest file is parsed to produce an object
+ * like this.
+ */
+typedef struct manifest_file
+{
+ uint32 status; /* hash status */
+ char *pathname;
+ size_t size;
+ pg_checksum_type checksum_type;
+ int checksum_length;
+ uint8 *checksum_payload;
+ bool matched;
+ bool bad;
+} manifest_file;
+
+#define should_verify_checksum(m) \
+ (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+
+/*
+ * Define a hash table which we can use to store information about the files
+ * mentioned in the backup manifest.
+ */
+#define SH_PREFIX manifest_files
+#define SH_ELEMENT_TYPE manifest_file
+#define SH_KEY_TYPE char *
+#define SH_KEY pathname
+#define SH_HASH_KEY(tb, key) hash_string(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * Each WAL range described by the manifest file is parsed to produce an
+ * object like this.
+ */
+typedef struct manifest_wal_range
+{
+ TimeLineID tli;
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+ struct manifest_wal_range *next;
+ struct manifest_wal_range *prev;
+} manifest_wal_range;
+
+/*
+ * All the data parsed from a backup_manifest file.
+ */
+typedef struct manifest_data
+{
+ int version;
+ uint64 system_identifier;
+ manifest_files_hash *files;
+ manifest_wal_range *first_wal_range;
+ manifest_wal_range *last_wal_range;
+} manifest_data;
+
+/*
+ * All of the context information we need while checking a backup manifest.
+ */
+typedef struct verifier_context
+{
+ manifest_data *manifest;
+ char *backup_directory;
+ SimpleStringList ignore_list;
+ bool exit_on_error;
+ bool saw_any_error;
+} verifier_context;
+
+extern manifest_file *verify_manifest_entry(verifier_context *context,
+ char *relpath, size_t filesize);
+
+extern void report_backup_error(verifier_context *context,
+ const char *pg_restrict fmt,...)
+ pg_attribute_printf(2, 3);
+extern void report_fatal_error(const char *pg_restrict fmt,...)
+ pg_attribute_printf(1, 2) pg_attribute_noreturn();
+extern bool should_ignore_relpath(verifier_context *context, const char *relpath);
+
+#endif /* PG_VERIFYBACKUP_H */
diff --git a/src/include/fe_utils/astreamer.h b/src/include/fe_utils/astreamer.h
index b4b9e381900..9d0a8c4d0c2 100644
--- a/src/include/fe_utils/astreamer.h
+++ b/src/include/fe_utils/astreamer.h
@@ -24,6 +24,7 @@
#include "common/compression.h"
#include "lib/stringinfo.h"
+#include "pqexpbuffer.h"
struct astreamer;
struct astreamer_ops;
--
2.18.0
Hi Amul,
thanks for working on this.
+ file_name_len = strlen(relpath);
+ if (file_name_len < file_extn_len ||
+     strcmp(relpath + file_name_len - file_extn_len, file_extn) != 0)
+ {
+     if (compress_algorithm == PG_COMPRESSION_NONE)
+         report_backup_error(context,
+                             "\"%s\" is not a valid file, expecting tar file",
+                             relpath);
+     else
+         report_backup_error(context,
+                             "\"%s\" is not a valid file, expecting \"%s\" compressed tar file",
+                             relpath,
+                             get_compress_algorithm_name(compress_algorithm));
+     return;
+ }
I believe pg_verifybackup needs to exit after reporting a failure here
since it could not figure out a streamer to allocate.
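To make the suggestion concrete, here is a rough sketch (illustration only, not
taken from the patch) of what a fatal variant could look like, reusing the
existing report_fatal_error() helper, which prints the message and exits:

    /*
     * Hypothetical sketch: abort the whole run instead of skipping the
     * archive.  report_fatal_error() is declared pg_attribute_noreturn(),
     * so nothing after it is reached.
     */
    if (file_name_len < file_extn_len ||
        strcmp(relpath + file_name_len - file_extn_len, file_extn) != 0)
    {
        if (compress_algorithm == PG_COMPRESSION_NONE)
            report_fatal_error("\"%s\" is not a valid file, expecting tar file",
                               relpath);
        else
            report_fatal_error("\"%s\" is not a valid file, expecting \"%s\" compressed tar file",
                               relpath,
                               get_compress_algorithm_name(compress_algorithm));
    }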
Also, v1-0002 removes #include "pqexpbuffer.h" from astreamer.h and adds it
to the new .h file, and v1-0004 reverts that change. So this can be avoided
altogether.
--
Thanks & Regards,
Sravan Velagandula
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jul 22, 2024 at 8:29 AM Sravan Kumar <sravanvcybage@gmail.com> wrote:
Hi Amul,
thanks for working on this.
Thanks for your review.
+ file_name_len = strlen(relpath);
+ if (file_name_len < file_extn_len ||
+     strcmp(relpath + file_name_len - file_extn_len, file_extn) != 0)
+ {
+     if (compress_algorithm == PG_COMPRESSION_NONE)
+         report_backup_error(context,
+                             "\"%s\" is not a valid file, expecting tar file",
+                             relpath);
+     else
+         report_backup_error(context,
+                             "\"%s\" is not a valid file, expecting \"%s\" compressed tar file",
+                             relpath,
+                             get_compress_algorithm_name(compress_algorithm));
+     return;
+ }

I believe pg_verifybackup needs to exit after reporting a failure here
since it could not figure out a streamer to allocate.
The intention here is to continue the verification of the remaining tar files
instead of exiting immediately in case of an error. If the user prefers an
immediate exit, they can use the --exit-on-error option of pg_verifybackup.
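For reference, report_backup_error() already implements both behaviors; its
logic is roughly the following (paraphrased from the existing
pg_verifybackup.c, so treat the details as approximate):

    void
    report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
    {
        va_list		ap;

        va_start(ap, fmt);
        pg_log_generic_v(PG_LOG_ERROR, PG_LOG_PRIMARY, gettext(fmt), ap);
        va_end(ap);

        /* With -e/--exit-on-error, the first problem terminates the run. */
        if (context->exit_on_error)
            exit(1);

        /* Otherwise remember that something went wrong and keep verifying. */
        context->saw_any_error = true;
    }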
Also, v1-0002 removes #include "pqexpbuffer.h" from astreamer.h and adds it
to the new .h file, and v1-0004 reverts that change. So this can be avoided
altogether.
Fixed in the attached version.
Regards,
Amul
Attachments:
v2-0007-Refactor-split-verify_control_file.patchapplication/octet-stream; name=v2-0007-Refactor-split-verify_control_file.patchDownload
From ae9064a6eac9029390ca5c0228fcbcafc5ebc4af Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 10:10:08 +0530
Subject: [PATCH v2 07/10] Refactor: split verify_control_file.
Move the control file checks done by verify_control_file() into a separate
function that can be called from other places as well.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 40 ++++++++++-------------
src/bin/pg_verifybackup/pg_verifybackup.h | 14 ++++++++
2 files changed, 32 insertions(+), 22 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 7b845bece71..375f196b300 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,7 +18,6 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -61,8 +60,6 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_control_file(const char *controlpath,
- uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -626,14 +623,20 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
/* Check whether there's an entry in the manifest hash. */
m = verify_manifest_entry(context, relpath, sb.st_size);
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0 &&
- m->matched && !m->bad)
- verify_control_file(fullpath, context->manifest->system_identifier);
+ /* Validate the manifest system identifier */
+ if (m != NULL && should_verify_sysid(context->manifest, m))
+ {
+ ControlFileData *control_file;
+ bool crc_ok;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+ control_file = get_controlfile_by_exact_path(fullpath, &crc_ok);
+
+ verify_control_file_data(control_file, fullpath, crc_ok,
+ context->manifest->system_identifier);
+ /* Release memory. */
+ pfree(control_file);
+ }
}
/*
@@ -683,15 +686,11 @@ verify_manifest_entry(verifier_context *context, char *relpath, size_t filesize)
* Sanity check control file and validate system identifier against manifest
* system identifier.
*/
-static void
-verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
+void
+verify_control_file_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier)
{
- ControlFileData *control_file;
- bool crc_ok;
-
- pg_log_debug("reading \"%s\"", controlpath);
- control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
-
/* Control file contents not meaningful if CRC is bad. */
if (!crc_ok)
report_fatal_error("%s: CRC is incorrect", controlpath);
@@ -707,9 +706,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
controlpath,
(unsigned long long) manifest_system_identifier,
(unsigned long long) control_file->system_identifier);
-
- /* Release memory. */
- pfree(control_file);
}
/*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 50a285752aa..64508578290 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -16,6 +16,7 @@
#include "common/controldata_utils.h"
#include "common/hashfn_unstable.h"
+#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
@@ -46,6 +47,16 @@ typedef struct manifest_file
#define should_verify_checksum(m) \
(((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+/*
+ * Validate the manifest system identifier against the control file; this
+ * feature is not available in manifest version 1. This validation should be
+ * carried out only if the manifest entry validation is completed without any
+ * errors.
+ */
+#define should_verify_sysid(manifest, m) \
+ (((manifest)->version != 1) && \
+ ((m)->matched) && !((m)->bad) && (strcmp((m)->pathname, "global/pg_control") == 0))
+
/*
* Define a hash table which we can use to store information about the files
* mentioned in the backup manifest.
@@ -105,6 +116,9 @@ extern bool verify_content_checksum(verifier_context *context,
pg_checksum_context *checksum_ctx,
manifest_file *m, uint8 *buf,
int buf_len, size_t *computed_len);
+extern void verify_control_file_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
v2-0008-pg_verifybackup-Add-backup-format-and-compression.patchapplication/octet-stream; name=v2-0008-pg_verifybackup-Add-backup-format-and-compression.patchDownload
From c3262b852c3a2020535ee4aee6a3296a400ba4b7 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 10:26:35 +0530
Subject: [PATCH v2 08/10] pg_verifybackup: Add backup format and compression
option
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the next patch that implements tar format support.
----
---
src/bin/pg_verifybackup/pg_verifybackup.c | 143 +++++++++++++++++++++-
1 file changed, 141 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 375f196b300..0d458298f34 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,6 +18,7 @@
#include <sys/stat.h>
#include <time.h>
+#include "common/compression.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -56,6 +57,8 @@ static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3) pg_attribute_noreturn();
+static char find_backup_format(verifier_context *context);
+static pg_compress_algorithm find_backup_compression(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
@@ -82,6 +85,9 @@ bool skip_checksums = false;
static uint64 total_size = 0;
static uint64 done_size = 0;
+char format = '\0'; /* p(lain)/t(ar) */
+pg_compress_algorithm compress_algorithm = PG_COMPRESSION_NONE;
+
/*
* Main entry point.
*/
@@ -92,11 +98,13 @@ main(int argc, char **argv)
{"exit-on-error", no_argument, NULL, 'e'},
{"ignore", required_argument, NULL, 'i'},
{"manifest-path", required_argument, NULL, 'm'},
+ {"format", required_argument, NULL, 'F'},
{"no-parse-wal", no_argument, NULL, 'n'},
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
{"skip-checksums", no_argument, NULL, 's'},
{"wal-directory", required_argument, NULL, 'w'},
+ {"compress", required_argument, NULL, 'Z'},
{NULL, 0, NULL, 0}
};
@@ -107,6 +115,7 @@ main(int argc, char **argv)
bool quiet = false;
char *wal_directory = NULL;
char *pg_waldump_path = NULL;
+ bool tar_compression_specified = false;
pg_logging_init(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verifybackup"));
@@ -149,7 +158,7 @@ main(int argc, char **argv)
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
- while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1)
+ while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:Z:", long_options, NULL)) != -1)
{
switch (c)
{
@@ -168,6 +177,15 @@ main(int argc, char **argv)
manifest_path = pstrdup(optarg);
canonicalize_path(manifest_path);
break;
+ case 'F':
+ if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
+ format = 'p';
+ else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
+ format = 't';
+ else
+ pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"",
+ optarg);
+ break;
case 'n':
no_parse_wal = true;
break;
@@ -184,6 +202,12 @@ main(int argc, char **argv)
wal_directory = pstrdup(optarg);
canonicalize_path(wal_directory);
break;
+ case 'Z':
+ if (!parse_compress_algorithm(optarg, &compress_algorithm))
+ pg_fatal("unrecognized compression algorithm: \"%s\"",
+ optarg);
+ tar_compression_specified = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -215,11 +239,41 @@ main(int argc, char **argv)
pg_fatal("cannot specify both %s and %s",
"-P/--progress", "-q/--quiet");
+ /* Complain if compression method specified but the format isn't tar. */
+ if (format != 't' && tar_compression_specified)
+ {
+ pg_log_error("only tar mode backups can be compressed");
+ pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+ exit(1);
+ }
+
+ /* Determine the backup format if it hasn't been specified. */
+ if (format == '\0')
+ format = find_backup_format(&context);
+
+ /*
+ * Determine the tar backup compression method if it hasn't been
+ * specified.
+ */
+ if (format == 't' && !tar_compression_specified)
+ compress_algorithm = find_backup_compression(&context);
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
int ret;
+ /*
+ * XXX: In the future, we should consider enhancing pg_waldump to read
+ * WAL files from the tar archive.
+ */
+ if (format != 'p')
+ {
+ pg_log_error("pg_waldump does not support parsing WAL files from a tar archive.");
+ pg_log_error_hint("Try \"%s --help\" to skip parse WAL files option.", progname);
+ exit(1);
+ }
+
pg_waldump_path = pg_malloc(MAXPGPATH);
ret = find_other_exec(argv[0], "pg_waldump",
"pg_waldump (PostgreSQL) " PG_VERSION "\n",
@@ -274,8 +328,13 @@ main(int argc, char **argv)
/*
* Now do the expensive work of verifying file checksums, unless we were
* told to skip it.
+ *
+ * This pass applies only to plain backups. For tar backups, file checksum
+ * verification (if requested) is done immediately as each file's contents
+ * are read, since we don't have random access to the files like we do with
+ * plain backups.
*/
- if (!skip_checksums)
+ if (!skip_checksums && format == 'p')
verify_backup_checksums(&context);
/*
@@ -1041,6 +1100,84 @@ progress_report(bool finished)
fputc((!finished && isatty(fileno(stderr))) ? '\r' : '\n', stderr);
}
+/*
+ * To detect the backup format, it checks for the PG_VERSION file in the backup
+ * directory. If found, it will be considered a plain-format backup; otherwise,
+ * it will be assumed to be a tar-format backup.
+ */
+static char
+find_backup_format(verifier_context *context)
+{
+ char result;
+ char *path;
+ struct stat sb;
+
+ /* Should be here only if the backup format is unknown */
+ Assert(format == '\0');
+
+ /* Check PG_VERSION file. */
+ path = psprintf("%s/%s", context->backup_directory, "PG_VERSION");
+ result = (stat(path, &sb) == 0) ? 'p' : 't';
+ pfree(path);
+
+ return result;
+}
+
+/*
+ * To determine the compression format, we will search for the main data
+ * directory archive and its extension, which starts with base.tar, as
+ * pg_basebackup writes the main data directory to an archive file named
+ * base.tar followed by a compression type extension like .gz, .lz4, or .zst.
+ */
+static pg_compress_algorithm
+find_backup_compression(verifier_context *context)
+{
+ char *path;
+ struct stat sb;
+ bool found;
+
+ /* Should be here only for tar backup */
+ Assert(format == 't');
+
+ /*
+ * Is this a tar archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_NONE;
+
+ /*
+ * Is this a .tar.gz archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.gz");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_GZIP;
+
+ /*
+ * Is this a .tar.lz4 archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.lz4");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_LZ4;
+
+ /*
+ * Is this a .tar.zst archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.zst");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_ZSTD;
+
+ return PG_COMPRESSION_NONE; /* placate compiler */
+}
+
/*
* Print out usage information and exit.
*/
@@ -1053,11 +1190,13 @@ usage(void)
printf(_(" -e, --exit-on-error exit immediately on error\n"));
printf(_(" -i, --ignore=RELATIVE_PATH ignore indicated path\n"));
printf(_(" -m, --manifest-path=PATH use specified path for manifest\n"));
+ printf(_(" -F, --format=p|t backup format (plain, tar)\n"));
printf(_(" -n, --no-parse-wal do not try to parse WAL files\n"));
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -s, --skip-checksums skip checksum verification\n"));
printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n"));
+ printf(_(" -Z, --compress=METHOD compress method (gzip, lz4, zstd, none) \n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
--
2.18.0
v2-0006-Refactor-split-verify_file_checksum-function.patchapplication/octet-stream; name=v2-0006-Refactor-split-verify_file_checksum-function.patchDownload
From ce7198153612faf2afc3b1b45d64f331bbd8bae4 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Mon, 17 Jun 2024 14:22:40 +0530
Subject: [PATCH v2 06/10] Refactor: split verify_file_checksum() function.
Move the core functionality of verify_file_checksum to a new function
to enable incremental checksum computation.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 102 +++++++++++++++-------
src/bin/pg_verifybackup/pg_verifybackup.h | 4 +
2 files changed, 73 insertions(+), 33 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d17b565a604..7b845bece71 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -773,6 +773,72 @@ verify_backup_checksums(verifier_context *context)
progress_report(true);
}
+/*
+ * Compute the checksum incrementally for the received bytes; the caller must
+ * pass a properly initialized checksum_ctx parameter. Once the complete file
+ * content has been received -- which is tracked using the computed_len
+ * parameter -- the resulting checksum is verified against the manifest data.
+ * Returns false if any error occurs; otherwise returns true, indicating
+ * either that more file content is still expected or that checksum
+ * verification completed successfully.
+ */
+bool
+verify_content_checksum(verifier_context *context,
+ pg_checksum_context *checksum_ctx,
+ manifest_file *m, uint8 *buffer,
+ int buffer_len, size_t *computed_len)
+{
+ char *relpath = m->pathname;
+ uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
+ int checksumlen;
+
+ if (pg_checksum_update(checksum_ctx, buffer, buffer_len) < 0)
+ {
+ report_backup_error(context, "could not update checksum of file \"%s\"",
+ relpath);
+ return false;
+ }
+
+ /* Update the total count of computed checksum bytes. */
+ *computed_len += buffer_len;
+
+ /* Report progress */
+ done_size += buffer_len;
+ progress_report(false);
+
+ /* Yet to receive the full content of the file. */
+ if (*computed_len < m->size)
+ return true;
+
+ /* Get the final checksum. */
+ checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
+ if (checksumlen < 0)
+ {
+ report_backup_error(context,
+ "could not finalize checksum of file \"%s\"",
+ relpath);
+ return false;
+ }
+
+ /* And check it against the manifest. */
+ if (checksumlen != m->checksum_length)
+ {
+ report_backup_error(context,
+ "file \"%s\" has checksum of length %d, but expected %d",
+ relpath, m->checksum_length, checksumlen);
+ return false;
+ }
+ else if (memcmp(checksumbuf, m->checksum_payload, checksumlen) != 0)
+ {
+ report_backup_error(context,
+ "checksum mismatch for file \"%s\"",
+ relpath);
+ return false;
+ }
+
+ return true;
+}
+
/*
* Verify the checksum of a single file.
*/
@@ -785,8 +851,6 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
int fd;
int rc;
size_t bytes_read = 0;
- uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
- int checksumlen;
/* Open the target file. */
if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
@@ -808,19 +872,14 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
/* Read the file chunk by chunk, updating the checksum as we go. */
while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
{
- bytes_read += rc;
- if (pg_checksum_update(&checksum_ctx, buffer, rc) < 0)
+ if (!verify_content_checksum(context, &checksum_ctx, m, buffer, rc,
+ &bytes_read))
{
- report_backup_error(context, "could not update checksum of file \"%s\"",
- relpath);
close(fd);
return;
}
-
- /* Report progress */
- done_size += rc;
- progress_report(false);
}
+
if (rc < 0)
report_backup_error(context, "could not read file \"%s\": %m",
relpath);
@@ -845,32 +904,9 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
* filesystem misbehavior.
*/
if (bytes_read != m->size)
- {
report_backup_error(context,
"file \"%s\" should contain %zu bytes, but read %zu bytes",
relpath, m->size, bytes_read);
- return;
- }
-
- /* Get the final checksum. */
- checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
- if (checksumlen < 0)
- {
- report_backup_error(context,
- "could not finalize checksum of file \"%s\"",
- relpath);
- return;
- }
-
- /* And check it against the manifest. */
- if (checksumlen != m->checksum_length)
- report_backup_error(context,
- "file \"%s\" has checksum of length %d, but expected %d",
- relpath, m->checksum_length, checksumlen);
- else if (memcmp(checksumbuf, m->checksum_payload, checksumlen) != 0)
- report_backup_error(context,
- "checksum mismatch for file \"%s\"",
- relpath);
}
/*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index c11ff33a100..50a285752aa 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -101,6 +101,10 @@ typedef struct verifier_context
extern manifest_file *verify_manifest_entry(verifier_context *context,
char *relpath, size_t filesize);
+extern bool verify_content_checksum(verifier_context *context,
+ pg_checksum_context *checksum_ctx,
+ manifest_file *m, uint8 *buf,
+ int buf_len, size_t *computed_len);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
v2-0010-pg_verifybackup-Tests-and-document.patchapplication/octet-stream; name=v2-0010-pg_verifybackup-Tests-and-document.patchDownload
From 0fd6b72e710e08e96230e9b09974044eccdd4d44 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 17:04:56 +0530
Subject: [PATCH v2 10/10] pg_verifybackup: Tests and document
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the previous patch that implements tar format support.
----
---
doc/src/sgml/ref/pg_verifybackup.sgml | 54 +++++++++++++-
src/bin/pg_verifybackup/t/001_basic.pl | 18 ++++-
src/bin/pg_verifybackup/t/004_options.pl | 2 +-
src/bin/pg_verifybackup/t/005_bad_manifest.pl | 2 +-
src/bin/pg_verifybackup/t/008_untar.pl | 73 ++++++-------------
src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +-----------
6 files changed, 96 insertions(+), 101 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index a3f167f9f6e..c743bd89a92 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -34,8 +34,10 @@ PostgreSQL documentation
integrity of a database cluster backup taken using
<command>pg_basebackup</command> against a
<literal>backup_manifest</literal> generated by the server at the time
- of the backup. The backup must be stored in the "plain"
- format; a "tar" format backup can be checked after extracting it.
+ of the backup. The backup must be stored in the "plain" or "tar"
+ format. Verification is supported for <literal>gzip</literal>,
+ <literal>lz4</literal>, and <literal>zstd</literal> compressed tar backups;
+ backups compressed with any other method can be checked after decompressing them.
</para>
<para>
@@ -168,6 +170,42 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-F <replaceable class="parameter">format</replaceable></option></term>
+ <term><option>--format=<replaceable class="parameter">format</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the format of the backup. <replaceable>format</replaceable>
+ can be one of the following:
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>p</literal></term>
+ <term><literal>plain</literal></term>
+ <listitem>
+ <para>
+ Backup consists of plain files with the same layout as the
+ source server's data directory and tablespaces.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>t</literal></term>
+ <term><literal>tar</literal></term>
+ <listitem>
+ <para>
+ Backup consists of tar files. The main data directory's contents
+ will be written to a file named <filename>base.tar</filename>,
+ and each other tablespace will be written to a separate tar file
+ named after that tablespace's OID.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist></para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n</option></term>
<term><option>--no-parse-wal</option></term>
@@ -227,6 +265,18 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><option>-Z <replaceable class="parameter">method</replaceable></option></term>
+ <term><option>--compress=<replaceable class="parameter">method</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the compression method of the tar backup. The method can be
+ <literal>gzip</literal>, <literal>lz4</literal>, <literal>zstd</literal>,
+ or <literal>none</literal> for an uncompressed backup.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
diff --git a/src/bin/pg_verifybackup/t/001_basic.pl b/src/bin/pg_verifybackup/t/001_basic.pl
index 2f3e52d296f..d47ce1f04fc 100644
--- a/src/bin/pg_verifybackup/t/001_basic.pl
+++ b/src/bin/pg_verifybackup/t/001_basic.pl
@@ -17,13 +17,25 @@ command_fails_like(
qr/no backup directory specified/,
'target directory must be specified');
command_fails_like(
- [ 'pg_verifybackup', $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir ],
qr/could not open file.*\/backup_manifest\"/,
'pg_verifybackup requires a manifest');
command_fails_like(
- [ 'pg_verifybackup', $tempdir, $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir, $tempdir ],
qr/too many command-line arguments/,
'multiple target directories not allowed');
+command_fails_like(
+ [ 'pg_verifybackup', '-Zgzip', $tempdir ],
+ qr/only tar mode backups can be compressed/,
+ 'compression option requires tar format');
+command_fails_like(
+ [ 'pg_verifybackup', '-Fp', '-Zlz4', $tempdir ],
+ qr/only tar mode backups can be compressed/,
+ 'compression option rejected with plain format');
+command_fails_like(
+ [ 'pg_verifybackup', '-Fp', '-Znon_exist', $tempdir ],
+ qr/unrecognized compression algorithm/,
+ 'compression method should be valid');
# create fake manifest file
open(my $fh, '>', "$tempdir/backup_manifest") || die "open: $!";
@@ -31,7 +43,7 @@ close($fh);
# but then try to use an alternate, nonexisting manifest
command_fails_like(
- [ 'pg_verifybackup', '-m', "$tempdir/not_the_manifest", $tempdir ],
+ [ 'pg_verifybackup', '-Fp', '-m', "$tempdir/not_the_manifest", $tempdir ],
qr/could not open file.*\/not_the_manifest\"/,
'pg_verifybackup respects -m flag');
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 8ed2214408e..2f197648740 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -108,7 +108,7 @@ unlike(
# Test valid manifest with nonexistent backup directory.
command_fails_like(
[
- 'pg_verifybackup', '-m',
+ 'pg_verifybackup', '-Fp', '-m',
"$backup_path/backup_manifest", "$backup_path/fake"
],
qr/could not open directory/,
diff --git a/src/bin/pg_verifybackup/t/005_bad_manifest.pl b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
index c4ed64b62d5..28c51b6feb0 100644
--- a/src/bin/pg_verifybackup/t/005_bad_manifest.pl
+++ b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
@@ -208,7 +208,7 @@ sub test_bad_manifest
print $fh $manifest_contents;
close($fh);
- command_fails_like([ 'pg_verifybackup', $tempdir ], $regexp, $test_name);
+ command_fails_like([ 'pg_verifybackup', '-Fp', $tempdir ], $regexp, $test_name);
return;
}
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 7a09f3b75b2..9896560adc3 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -16,6 +16,20 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
+# Create a couple of directories to use as tablespaces.
+my $TS1_LOCATION = $primary->backup_dir .'/ts1';
+$TS1_LOCATION =~ s/\/\.\//\//g; # collapse foo/./bar to foo/bar
+mkdir($TS1_LOCATION);
+
+# Create a tablespace with table in it.
+$primary->safe_psql('postgres', qq(
+ CREATE TABLESPACE regress_ts1 LOCATION '$TS1_LOCATION';
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1';
+ CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1;
+ INSERT INTO regress_tbl1 VALUES(generate_series(1,5));));
+my $tsoid = $primary->safe_psql('postgres', qq(
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
+
my $backup_path = $primary->backup_dir . '/server-backup';
my $extract_path = $primary->backup_dir . '/extracted-backup';
@@ -23,39 +37,31 @@ my @test_configuration = (
{
'compression_method' => 'none',
'backup_flags' => [],
- 'backup_archive' => 'base.tar',
+ 'backup_archive' => ['base.tar', "$tsoid.tar"],
'enabled' => 1
},
{
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'server-gzip' ],
- 'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'server-lz4' ],
- 'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => [ '-d', '-m' ],
+ 'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ],
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
});
@@ -86,47 +92,16 @@ for my $tc (@test_configuration)
my $backup_files = join(',',
sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
my $expected_backup_files =
- join(',', sort ('backup_manifest', $tc->{'backup_archive'}));
+ join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} }));
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path],
+ "verify backup, compression $method");
# Cleanup.
- unlink($backup_path . '/backup_manifest');
- unlink($backup_path . '/base.tar');
- unlink($backup_path . '/' . $tc->{'backup_archive'});
+ rmtree($backup_path);
rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 8c076d46dee..6b7d7483f6e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -29,41 +29,30 @@ my @test_configuration = (
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'client-gzip:5' ],
'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'client-lz4:5' ],
'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => ['-d'],
- 'output_file' => 'base.tar',
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:5' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'parallel zstd',
'backup_flags' => [ '--compress', 'client-zstd:workers=3' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1"),
'possibly_unsupported' =>
qr/could not set compression worker count to 3: Unsupported parameter/
@@ -118,40 +107,9 @@ for my $tc (@test_configuration)
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- push @decompress, $backup_path . '/' . $tc->{'output_file'}
- if $tc->{'output_file'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ],
+ "verify backup, compression $method");
# Cleanup.
rmtree($extract_path);
--
2.18.0
v2-0009-pg_verifybackup-Read-tar-files-and-verify-its-con.patchapplication/octet-stream; name=v2-0009-pg_verifybackup-Read-tar-files-and-verify-its-con.patchDownload
From 15d61fbcceb84d8b8c780309a58893a88a197411 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Mon, 17 Jun 2024 17:11:26 +0530
Subject: [PATCH v2 09/10] pg_verifybackup: Read tar files and verify its
contents
---
src/bin/pg_verifybackup/Makefile | 4 +-
src/bin/pg_verifybackup/astreamer_verify.c | 250 +++++++++++++++++++++
src/bin/pg_verifybackup/meson.build | 6 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 220 +++++++++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 10 +
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 480 insertions(+), 11 deletions(-)
create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 7c045f142e8..df7aaabd530 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -17,11 +17,13 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
# We need libpq only because fe_utils does.
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
OBJS = \
$(WIN32RES) \
- pg_verifybackup.o
+ pg_verifybackup.o \
+ astreamer_verify.o
all: pg_verifybackup
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
new file mode 100644
index 00000000000..9be9a9bc04a
--- /dev/null
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -0,0 +1,250 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_verify.c
+ *
+ * Extend fe_utils/astreamer.h archive streaming facility to verify TAR
+ * backup.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * src/bin/pg_verifybackup/astreamer_verify.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "common/logging.h"
+#include "fe_utils/astreamer.h"
+#include "pg_verifybackup.h"
+
+typedef struct astreamer_verify
+{
+ astreamer base;
+ verifier_context *context;
+ char *archive_name;
+ Oid tblspc_oid;
+
+ manifest_file *mfile;
+ size_t received_bytes;
+ bool verify_checksums;
+ bool verify_sysid;
+ pg_checksum_context *checksum_ctx;
+} astreamer_verify;
+
+static void astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_verify_finalize(astreamer *streamer);
+static void astreamer_verify_free(astreamer *streamer);
+
+static const astreamer_ops astreamer_verify_ops = {
+ .content = astreamer_verify_content,
+ .finalize = astreamer_verify_finalize,
+ .free = astreamer_verify_free
+};
+
+/*
+ * Create an astreamer that verifies the contents of a TAR file.
+ */
+astreamer *
+astreamer_verify_content_new(astreamer *next, verifier_context *context,
+ char *archive_name, Oid tblspc_oid)
+{
+ astreamer_verify *streamer;
+
+ streamer = palloc0(sizeof(astreamer_verify));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_verify_ops;
+
+ streamer->base.bbs_next = next;
+ streamer->context = context;
+ streamer->archive_name = archive_name;
+ streamer->tblspc_oid = tblspc_oid;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ return &streamer->base;
+}
+
+/*
+ * It verifies each TAR member entry against the manifest data and performs
+ * checksum verification if enabled. Additionally, it validates the backup's
+ * system identifier against the backup_manifest.
+ */
+static void
+astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member, const char *data,
+ int len, astreamer_archive_context context)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+ if (!member->is_directory && !member->is_link &&
+ !should_ignore_relpath(mystreamer->context, member->pathname))
+ {
+ manifest_file *m;
+
+ /*
+ * The backup_manifest stores the path of a tablespace file relative
+ * to the base directory, whereas <tablespaceoid>.tar doesn't.
+ * Prepare the required path here, otherwise the manifest entry
+ * verification will fail.
+ */
+ if (OidIsValid(mystreamer->tblspc_oid))
+ {
+ char temp[MAXPGPATH];
+
+ /* Copy original name at temporary space */
+ memcpy(temp, member->pathname, MAXPGPATH);
+
+ snprintf(member->pathname, MAXPGPATH, "%s/%d/%s",
+ "pg_tblspc", mystreamer->tblspc_oid, temp);
+ }
+
+ /* Check the manifest entry */
+ m = verify_manifest_entry(mystreamer->context, member->pathname,
+ member->size);
+ mystreamer->mfile = (void *) m;
+
+ /*
+ * Prepare for checksum and manifest system identifier
+ * verification.
+ *
+ * We could perform these checks while receiving the contents.
+ * However, since the contents arrive over multiple iterations,
+ * that would repeat these relatively expensive checks many
+ * times. Instead, computing them once here and caching the
+ * result in a flag is more efficient.
+ */
+ if (m != NULL)
+ {
+ mystreamer->verify_checksums =
+ (!skip_checksums && should_verify_checksum(m));
+ mystreamer->verify_sysid =
+ should_verify_sysid(mystreamer->context->manifest, m);
+ }
+ }
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Perform checksum verification as the file content becomes
+ * available, since the TAR format does not have random access to
+ * files like a normal backup directory, where checksum verification
+ * occurs at different points.
+ */
+ if (mystreamer->verify_checksums)
+ {
+ /* Initialize the checksum context on the first chunk of this file. */
+ if (!mystreamer->checksum_ctx)
+ {
+ mystreamer->checksum_ctx = pg_malloc(sizeof(pg_checksum_context));
+
+ if (pg_checksum_init(mystreamer->checksum_ctx,
+ mystreamer->mfile->checksum_type) < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "%s: could not initialize checksum of file \"%s\"",
+ mystreamer->archive_name, member->pathname);
+ mystreamer->verify_checksums = false;
+ return;
+ }
+ }
+
+ /* Compute and do the checksum validation */
+ mystreamer->verify_checksums =
+ verify_content_checksum(mystreamer->context,
+ mystreamer->checksum_ctx,
+ mystreamer->mfile,
+ (uint8 *) data, len,
+ &mystreamer->received_bytes);
+ }
+
+ /* Do the manifest system identifier verification */
+ if (mystreamer->verify_sysid)
+ {
+ ControlFileData control_file;
+ uint64 manifest_system_identifier;
+ pg_crc32c crc;
+ bool crc_ok;
+
+ /* Should be here only for control file */
+ Assert(strcmp(member->pathname, "global/pg_control") == 0);
+ Assert(mystreamer->context->manifest->version != 1);
+
+ /* Should have whole control file data. */
+ if (!astreamer_buffer_until(streamer, &data, &len,
+ sizeof(ControlFileData)))
+ return;
+
+ pg_log_debug("%s: reading \"%s\"", mystreamer->archive_name,
+ member->pathname);
+
+ if (streamer->bbs_buffer.len != sizeof(ControlFileData))
+ report_fatal_error("%s: could not read control file: read %d of %zu",
+ mystreamer->archive_name, streamer->bbs_buffer.len,
+ sizeof(ControlFileData));
+
+ memcpy(&control_file, streamer->bbs_buffer.data,
+ sizeof(ControlFileData));
+
+ /* Check the CRC. */
+ INIT_CRC32C(crc);
+ COMP_CRC32C(crc,
+ (char *) (&control_file),
+ offsetof(ControlFileData, crc));
+ FIN_CRC32C(crc);
+
+ crc_ok = EQ_CRC32C(crc, control_file.crc);
+
+ manifest_system_identifier =
+ mystreamer->context->manifest->system_identifier;
+
+ verify_control_file_data(&control_file, member->pathname,
+ crc_ok, manifest_system_identifier);
+ }
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+ if (mystreamer->checksum_ctx)
+ pfree(mystreamer->checksum_ctx);
+ mystreamer->checksum_ctx = NULL;
+ mystreamer->mfile = NULL;
+ mystreamer->received_bytes = 0;
+ mystreamer->verify_checksums = false;
+ mystreamer->verify_sysid = false;
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar archive");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_verify stream.
+ */
+static void
+astreamer_verify_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_verify stream.
+ */
+static void
+astreamer_verify_free(astreamer *streamer)
+{
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build
index 7c7d31a0350..1e3fcf7ee5a 100644
--- a/src/bin/pg_verifybackup/meson.build
+++ b/src/bin/pg_verifybackup/meson.build
@@ -1,7 +1,8 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
pg_verifybackup_sources = files(
- 'pg_verifybackup.c'
+ 'pg_verifybackup.c',
+ 'astreamer_verify.c'
)
if host_system == 'windows'
@@ -10,9 +11,10 @@ if host_system == 'windows'
'--FILEDESC', 'pg_verifybackup - verify a backup against using a backup manifest'])
endif
+pg_verifybackup_deps = [frontend_code, libpq, lz4, zlib, zstd]
pg_verifybackup = executable('pg_verifybackup',
pg_verifybackup_sources,
- dependencies: [frontend_code, libpq],
+ dependencies: pg_verifybackup_deps,
kwargs: default_bin_args,
)
bin_targets += pg_verifybackup
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 0d458298f34..21b16c281cb 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -20,9 +20,12 @@
#include "common/compression.h"
#include "common/parse_manifest.h"
+#include "common/relpath.h"
+#include "fe_utils/astreamer.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
#include "pg_verifybackup.h"
+#include "pgtar.h"
#include "pgtime.h"
/*
@@ -39,6 +42,11 @@
*/
#define ESTIMATED_BYTES_PER_MANIFEST_LINE 100
+
+static void (*verify_backup_file_cb) (verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -63,6 +71,15 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
char *relpath, char *fullpath);
+static void verify_plain_file_cb(verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+static void verify_tar_file_cb(verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+static void verify_tar_content(verifier_context *context,
+ char *relpath, char *fullpath,
+ astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -71,6 +88,9 @@ static void verify_file_checksum(verifier_context *context,
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
+static astreamer *create_archive_verifier(verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
static void progress_report(bool finished);
static void usage(void);
@@ -154,6 +174,10 @@ main(int argc, char **argv)
*/
simple_string_list_append(&context.ignore_list, "backup_manifest");
simple_string_list_append(&context.ignore_list, "pg_wal");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.gz");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.lz4");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.zst");
simple_string_list_append(&context.ignore_list, "postgresql.auto.conf");
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
@@ -258,6 +282,15 @@ main(int argc, char **argv)
if (format == 't' && !tar_compression_specified)
compress_algorithm = find_backup_compression(&context);
+ /*
+	 * Set up the callback function required to verify plain or tar backup
+	 * files.
+ */
+ if (format == 'p')
+ verify_backup_file_cb = verify_plain_file_cb;
+ else
+ verify_backup_file_cb = verify_tar_file_cb;
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
@@ -637,7 +670,8 @@ verify_backup_directory(verifier_context *context, char *relpath,
}
/*
- * Verify one file (which might actually be a directory or a symlink).
+ * Verify one file (which might actually be a directory, a symlink, or an
+ * archive).
*
* The arguments to this function have the same meaning as the arguments to
* verify_backup_directory.
@@ -646,7 +680,6 @@ static void
verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
{
struct stat sb;
- manifest_file *m;
if (stat(fullpath, &sb) != 0)
{
@@ -679,8 +712,25 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+	/* Perform the remaining verification steps. */
+ verify_backup_file_cb(context, relpath, fullpath, sb.st_size);
+}
+
+/*
+ * Verify one plain file or symlink.
+ *
+ * The arguments to this function are mostly the same as for
+ * verify_backup_directory. The additional argument is the file size, used
+ * for verification against the manifest entry.
+ */
+static void
+verify_plain_file_cb(verifier_context *context, char *relpath,
+ char *fullpath, size_t filesize)
+{
+ manifest_file *m;
+
/* Check whether there's an entry in the manifest hash. */
- m = verify_manifest_entry(context, relpath, sb.st_size);
+ m = verify_manifest_entry(context, relpath, filesize);
/* Validate the manifest system identifier */
if (m != NULL && should_verify_sysid(context->manifest, m))
@@ -698,6 +748,124 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
}
}
+/*
+ * Verify one tar file.
+ *
+ * The arguments to this function are mostly the same as for
+ * verify_backup_directory. The additional argument is the file size, used
+ * for verification against the manifest entry.
+ */
+static void
+verify_tar_file_cb(verifier_context *context, char *relpath,
+ char *fullpath, size_t filesize)
+{
+ astreamer *streamer;
+ Oid tblspc_oid = InvalidOid;
+ int file_name_len;
+ int file_extn_len = 0; /* placate compiler */
+ char *file_extn = "";
+
+ /* Should be tar backup */
+ Assert(format == 't');
+
+ /* Find the tar file extension. */
+ if (compress_algorithm == PG_COMPRESSION_NONE)
+ {
+ file_extn = ".tar";
+ file_extn_len = 4;
+
+ }
+ else if (compress_algorithm == PG_COMPRESSION_GZIP)
+ {
+ file_extn = ".tar.gz";
+ file_extn_len = 7;
+
+ }
+ else if (compress_algorithm == PG_COMPRESSION_LZ4)
+ {
+ file_extn = ".tar.lz4";
+ file_extn_len = 8;
+ }
+ else if (compress_algorithm == PG_COMPRESSION_ZSTD)
+ {
+ file_extn = ".tar.zst";
+ file_extn_len = 8;
+ }
+
+ /*
+ * Ensure that we have the correct file type corresponding to the backup
+ * format.
+ */
+ file_name_len = strlen(relpath);
+ if (file_name_len < file_extn_len ||
+ strcmp(relpath + file_name_len - file_extn_len, file_extn) != 0)
+ {
+ if (compress_algorithm == PG_COMPRESSION_NONE)
+ report_backup_error(context,
+ "\"%s\" is not a valid file, expecting tar file",
+ relpath);
+ else
+ report_backup_error(context,
+ "\"%s\" is not a valid file, expecting \"%s\" compressed tar file",
+ relpath,
+ get_compress_algorithm_name(compress_algorithm));
+ return;
+ }
+
+ /*
+	 * For a tablespace, pg_basebackup writes the data out to
+	 * <tablespaceoid>.tar. If a file matches that pattern, extract the
+	 * tablespace OID, which we need in order to build the paths of the
+	 * files belonging to that tablespace relative to the base directory.
+ */
+ if (strspn(relpath, "0123456789") == (file_name_len - file_extn_len))
+ tblspc_oid = strtoi64(relpath, NULL, 10);
+
+ streamer = create_archive_verifier(context, relpath, tblspc_oid);
+ verify_tar_content(context, relpath, fullpath, streamer);
+
+ /* Cleanup. */
+ astreamer_finalize(streamer);
+ astreamer_free(streamer);
+}
+
+/*
+ * Read the given tar file in predefined chunks and pass them to the
+ * astreamer, which initiates decompression (if necessary) and then
+ * verification of each member within the tar archive.
+ */
+static void
+verify_tar_content(verifier_context *context, char *relpath, char *fullpath,
+ astreamer *streamer)
+{
+ int fd;
+ int rc;
+ char *buffer;
+
+ /* Open the target file. */
+ if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
+ {
+ report_backup_error(context, "could not open file \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+ /* Perform the reads */
+ while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
+ astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);
+
+ if (rc < 0)
+ report_backup_error(context, "could not read file \"%s\": %m",
+ relpath);
+
+ /* Close the file. */
+ if (close(fd) != 0)
+ report_backup_error(context, "could not close file \"%s\": %m",
+ relpath);
+}
+
/*
* Verify file and its size entry in the manifest.
*/
@@ -742,8 +910,8 @@ verify_manifest_entry(verifier_context *context, char *relpath, size_t filesize)
}
/*
- * Sanity check control file and validate system identifier against manifest
- * system identifier.
+ * Sanity check control file data and validate system identifier against
+ * manifest system identifier.
*/
void
verify_control_file_data(ControlFileData *control_file,
@@ -1124,10 +1292,10 @@ find_backup_format(verifier_context *context)
}
/*
- * To determine the compression format, we will search for the main data
- * directory archive and its extension, which starts with base.tar, as
* pg_basebackup writes the main data directory to an archive file named
- * base.tar followed by a compression type extension like .gz, .lz4, or .zst.
+ * base.tar, followed by a compression type extension such as .gz, .lz4, or
+ * .zst. To determine the compression format, we need to search for this main
+ * data directory archive file.
*/
static pg_compress_algorithm
find_backup_compression(verifier_context *context)
@@ -1178,6 +1346,42 @@ find_backup_compression(verifier_context *context)
return PG_COMPRESSION_NONE; /* placate compiler */
}
+/*
+ * Set up the astreamer chain needed to verify the contents of the
+ * provided tar file.
+ */
+static astreamer *
+create_archive_verifier(verifier_context *context, char *archive_name,
+ Oid tblspc_oid)
+{
+ astreamer *streamer = NULL;
+
+ /* Should be here only for tar backup */
+ Assert(format == 't');
+
+ /*
+ * To verify the contents of the tar file, the initial step is to parse
+ * its content.
+ */
+ streamer = astreamer_verify_content_new(streamer, context, archive_name,
+ tblspc_oid);
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /*
+ * If the tar file is compressed, we must perform the appropriate
+ * decompression operation before proceeding with the verification of its
+ * contents.
+ */
+ if (compress_algorithm == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compress_algorithm == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compress_algorithm == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ return streamer;
+}
+
/*
* Print out usage information and exit.
*/
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 64508578290..e27fe4da6a2 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -127,4 +127,14 @@ extern void report_fatal_error(const char *pg_restrict fmt,...)
pg_attribute_printf(1, 2) pg_attribute_noreturn();
extern bool should_ignore_relpath(verifier_context *context, const char *relpath);
+/* Forward declarations to avoid fe_utils/astreamer.h include. */
+struct astreamer;
+typedef struct astreamer astreamer;
+
+extern astreamer *astreamer_verify_content_new(astreamer *next,
+ verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
+
+
#endif /* PG_VERIFYBACKUP_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b982dffa5fc..ab279c9eb39 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3321,6 +3321,7 @@ astreamer_plain_writer
astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
+astreamer_verify
astreamer_zstd_frame
bgworker_main_type
bh_node_type
--
2.18.0
v2-0003-Refactor-move-astreamer-files-to-fe_utils-to-make.patchapplication/octet-stream; name=v2-0003-Refactor-move-astreamer-files-to-fe_utils-to-make.patchDownload
From b34e3ecffa493b8037416805106c931105d9a9a3 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 9 Jul 2024 10:30:31 +0530
Subject: [PATCH v2 03/10] Refactor: move astreamer* files to fe_utils to make
 them commonly available.
To make it accessible to other code, we need to move the ASTREAMER
code (previously known as BBSTREAMER) to a common location. The
appropriate place is src/fe_utils, since it holds frontend
infrastructure intended for shared use.
---
src/bin/pg_basebackup/Makefile | 7 +------
src/bin/pg_basebackup/astreamer_inject.h | 2 +-
src/bin/pg_basebackup/meson.build | 5 -----
src/fe_utils/Makefile | 11 +++++++++--
src/{bin/pg_basebackup => fe_utils}/astreamer_file.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_gzip.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_lz4.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_tar.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_zstd.c | 2 +-
src/fe_utils/meson.build | 7 ++++++-
.../pg_basebackup => include/fe_utils}/astreamer.h | 0
11 files changed, 22 insertions(+), 20 deletions(-)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_file.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_gzip.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_lz4.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_tar.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_zstd.c (99%)
rename src/{bin/pg_basebackup => include/fe_utils}/astreamer.h (100%)
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index a71af2d48a7..f1e73058b23 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -37,12 +37,7 @@ OBJS = \
BBOBJS = \
pg_basebackup.o \
- astreamer_file.o \
- astreamer_gzip.o \
- astreamer_inject.o \
- astreamer_lz4.o \
- astreamer_tar.o \
- astreamer_zstd.o
+ astreamer_inject.o
all: pg_basebackup pg_createsubscriber pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/astreamer_inject.h b/src/bin/pg_basebackup/astreamer_inject.h
index 8504b3f5e0d..aeed533862b 100644
--- a/src/bin/pg_basebackup/astreamer_inject.h
+++ b/src/bin/pg_basebackup/astreamer_inject.h
@@ -12,7 +12,7 @@
#ifndef ASTREAMER_INJECT_H
#define ASTREAMER_INJECT_H
-#include "astreamer.h"
+#include "fe_utils/astreamer.h"
#include "pqexpbuffer.h"
extern astreamer *astreamer_recovery_injector_new(astreamer *next,
diff --git a/src/bin/pg_basebackup/meson.build b/src/bin/pg_basebackup/meson.build
index a68dbd7837d..9101fc18438 100644
--- a/src/bin/pg_basebackup/meson.build
+++ b/src/bin/pg_basebackup/meson.build
@@ -1,12 +1,7 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
common_sources = files(
- 'astreamer_file.c',
- 'astreamer_gzip.c',
'astreamer_inject.c',
- 'astreamer_lz4.c',
- 'astreamer_tar.c',
- 'astreamer_zstd.c',
'receivelog.c',
'streamutil.c',
'walmethods.c',
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index 946c05258f0..ff002f37d57 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -34,13 +34,20 @@ OBJS = \
simple_list.o \
string_utils.o
+AOBJS = \
+ astreamer_file.o \
+ astreamer_gzip.o \
+ astreamer_lz4.o \
+ astreamer_tar.o \
+ astreamer_zstd.o
+
ifeq ($(PORTNAME), win32)
override CPPFLAGS += -DFD_SETSIZE=1024
endif
all: libpgfeutils.a
-libpgfeutils.a: $(OBJS)
+libpgfeutils.a: $(AOBJS) $(OBJS)
rm -f $@
$(AR) $(AROPT) $@ $^
@@ -59,5 +66,5 @@ uninstall:
rm -f '$(DESTDIR)$(libdir)/libpgfeutils.a'
clean distclean:
- rm -f libpgfeutils.a $(OBJS) lex.backup
+ rm -f libpgfeutils.a $(AOBJS) $(OBJS) lex.backup
rm -f psqlscan.c
diff --git a/src/bin/pg_basebackup/astreamer_file.c b/src/fe_utils/astreamer_file.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_file.c
rename to src/fe_utils/astreamer_file.c
index 2742385e103..13d1192c6e6 100644
--- a/src/bin/pg_basebackup/astreamer_file.c
+++ b/src/fe_utils/astreamer_file.c
@@ -13,10 +13,10 @@
#include <unistd.h>
-#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/astreamer.h"
typedef struct astreamer_plain_writer
{
diff --git a/src/bin/pg_basebackup/astreamer_gzip.c b/src/fe_utils/astreamer_gzip.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_gzip.c
rename to src/fe_utils/astreamer_gzip.c
index 6f7c27afbbc..dd28defac7b 100644
--- a/src/bin/pg_basebackup/astreamer_gzip.c
+++ b/src/fe_utils/astreamer_gzip.c
@@ -17,10 +17,10 @@
#include <zlib.h>
#endif
-#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/astreamer.h"
#ifdef HAVE_LIBZ
typedef struct astreamer_gzip_writer
diff --git a/src/bin/pg_basebackup/astreamer_lz4.c b/src/fe_utils/astreamer_lz4.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_lz4.c
rename to src/fe_utils/astreamer_lz4.c
index 1c40d7d8ad5..d8b2a367e47 100644
--- a/src/bin/pg_basebackup/astreamer_lz4.c
+++ b/src/fe_utils/astreamer_lz4.c
@@ -17,10 +17,10 @@
#include <lz4frame.h>
#endif
-#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/astreamer.h"
#ifdef USE_LZ4
typedef struct astreamer_lz4_frame
diff --git a/src/bin/pg_basebackup/astreamer_tar.c b/src/fe_utils/astreamer_tar.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_tar.c
rename to src/fe_utils/astreamer_tar.c
index 673690cd18f..f5d3562d280 100644
--- a/src/bin/pg_basebackup/astreamer_tar.c
+++ b/src/fe_utils/astreamer_tar.c
@@ -23,8 +23,8 @@
#include <time.h>
-#include "astreamer.h"
#include "common/logging.h"
+#include "fe_utils/astreamer.h"
#include "pgtar.h"
typedef struct astreamer_tar_parser
diff --git a/src/bin/pg_basebackup/astreamer_zstd.c b/src/fe_utils/astreamer_zstd.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_zstd.c
rename to src/fe_utils/astreamer_zstd.c
index 58dc679ef99..45f6cb67363 100644
--- a/src/bin/pg_basebackup/astreamer_zstd.c
+++ b/src/fe_utils/astreamer_zstd.c
@@ -17,8 +17,8 @@
#include <zstd.h>
#endif
-#include "astreamer.h"
#include "common/logging.h"
+#include "fe_utils/astreamer.h"
#ifdef USE_ZSTD
diff --git a/src/fe_utils/meson.build b/src/fe_utils/meson.build
index 14d0482a2cc..0ec28e86af7 100644
--- a/src/fe_utils/meson.build
+++ b/src/fe_utils/meson.build
@@ -13,6 +13,11 @@ fe_utils_sources = files(
'recovery_gen.c',
'simple_list.c',
'string_utils.c',
+ 'astreamer_file.c',
+ 'astreamer_gzip.c',
+ 'astreamer_lz4.c',
+ 'astreamer_tar.c',
+ 'astreamer_zstd.c',
)
psqlscan = custom_target('psqlscan',
@@ -28,6 +33,6 @@ fe_utils = static_library('libpgfeutils',
c_pch: pch_postgres_fe_h,
include_directories: [postgres_inc, libpq_inc],
c_args: host_system == 'windows' ? ['-DFD_SETSIZE=1024'] : [],
- dependencies: frontend_common_code,
+ dependencies: [frontend_common_code, lz4, zlib, zstd],
kwargs: default_lib_args,
)
diff --git a/src/bin/pg_basebackup/astreamer.h b/src/include/fe_utils/astreamer.h
similarity index 100%
rename from src/bin/pg_basebackup/astreamer.h
rename to src/include/fe_utils/astreamer.h
--
2.18.0
v2-0004-Refactor-move-some-part-of-pg_verifybackup.c-to-p.patchapplication/octet-stream; name=v2-0004-Refactor-move-some-part-of-pg_verifybackup.c-to-p.patchDownload
From 92b5d91ce00087ea30cec35420d0deb124aa1531 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 10:32:11 +0530
Subject: [PATCH v2 04/10] Refactor: move some part of pg_verifybackup.c to
pg_verifybackup.h
---
src/bin/pg_verifybackup/pg_verifybackup.c | 99 ++-----------------
src/bin/pg_verifybackup/pg_verifybackup.h | 112 ++++++++++++++++++++++
2 files changed, 119 insertions(+), 92 deletions(-)
create mode 100644 src/bin/pg_verifybackup/pg_verifybackup.h
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d77e70fbe38..a248db1b28a 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,12 +18,11 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/controldata_utils.h"
-#include "common/hashfn_unstable.h"
#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
+#include "pg_verifybackup.h"
#include "pgtime.h"
/*
@@ -40,83 +39,6 @@
*/
#define ESTIMATED_BYTES_PER_MANIFEST_LINE 100
-/*
- * How many bytes should we try to read from a file at once?
- */
-#define READ_CHUNK_SIZE (128 * 1024)
-
-/*
- * Each file described by the manifest file is parsed to produce an object
- * like this.
- */
-typedef struct manifest_file
-{
- uint32 status; /* hash status */
- const char *pathname;
- size_t size;
- pg_checksum_type checksum_type;
- int checksum_length;
- uint8 *checksum_payload;
- bool matched;
- bool bad;
-} manifest_file;
-
-#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
-
-/*
- * Define a hash table which we can use to store information about the files
- * mentioned in the backup manifest.
- */
-#define SH_PREFIX manifest_files
-#define SH_ELEMENT_TYPE manifest_file
-#define SH_KEY_TYPE const char *
-#define SH_KEY pathname
-#define SH_HASH_KEY(tb, key) hash_string(key)
-#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
-#define SH_SCOPE static inline
-#define SH_RAW_ALLOCATOR pg_malloc0
-#define SH_DECLARE
-#define SH_DEFINE
-#include "lib/simplehash.h"
-
-/*
- * Each WAL range described by the manifest file is parsed to produce an
- * object like this.
- */
-typedef struct manifest_wal_range
-{
- TimeLineID tli;
- XLogRecPtr start_lsn;
- XLogRecPtr end_lsn;
- struct manifest_wal_range *next;
- struct manifest_wal_range *prev;
-} manifest_wal_range;
-
-/*
- * All the data parsed from a backup_manifest file.
- */
-typedef struct manifest_data
-{
- int version;
- uint64 system_identifier;
- manifest_files_hash *files;
- manifest_wal_range *first_wal_range;
- manifest_wal_range *last_wal_range;
-} manifest_data;
-
-/*
- * All of the context information we need while checking a backup manifest.
- */
-typedef struct verifier_context
-{
- manifest_data *manifest;
- char *backup_directory;
- SimpleStringList ignore_list;
- bool exit_on_error;
- bool saw_any_error;
-} verifier_context;
-
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -150,21 +72,14 @@ static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
-static void report_backup_error(verifier_context *context,
- const char *pg_restrict fmt,...)
- pg_attribute_printf(2, 3);
-static void report_fatal_error(const char *pg_restrict fmt,...)
- pg_attribute_printf(1, 2) pg_attribute_noreturn();
-static bool should_ignore_relpath(verifier_context *context, const char *relpath);
-
static void progress_report(bool finished);
static void usage(void);
static const char *progname;
/* options */
-static bool show_progress = false;
-static bool skip_checksums = false;
+bool show_progress = false;
+bool skip_checksums = false;
/* Progress indicators */
static uint64 total_size = 0;
@@ -556,7 +471,7 @@ verifybackup_per_file_cb(JsonManifestParseContext *context,
bool found;
/* Make a new entry in the hash table for this file. */
- m = manifest_files_insert(ht, pathname, &found);
+ m = manifest_files_insert(ht, (char *) pathname, &found);
if (found)
report_fatal_error("duplicate path name in backup manifest: \"%s\"",
pathname);
@@ -979,7 +894,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
* Update the context to indicate that we saw an error, and exit if the
* context says we should.
*/
-static void
+void
report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
{
va_list ap;
@@ -996,7 +911,7 @@ report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
/*
* Report a fatal error and exit
*/
-static void
+void
report_fatal_error(const char *pg_restrict fmt,...)
{
va_list ap;
@@ -1015,7 +930,7 @@ report_fatal_error(const char *pg_restrict fmt,...)
* Note that by "prefix" we mean a parent directory; for this purpose,
* "aa/bb" is not a prefix of "aa/bbb", but it is a prefix of "aa/bb/cc".
*/
-static bool
+bool
should_ignore_relpath(verifier_context *context, const char *relpath)
{
SimpleStringListCell *cell;
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
new file mode 100644
index 00000000000..c11ff33a100
--- /dev/null
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -0,0 +1,112 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_verifybackup.h
+ * Verify a backup against a backup manifest.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/pg_verifybackup/pg_verifybackup.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef PG_VERIFYBACKUP_H
+#define PG_VERIFYBACKUP_H
+
+#include "common/controldata_utils.h"
+#include "common/hashfn_unstable.h"
+#include "common/parse_manifest.h"
+#include "fe_utils/simple_list.h"
+
+extern bool show_progress;
+extern bool skip_checksums;
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/*
+ * Each file described by the manifest file is parsed to produce an object
+ * like this.
+ */
+typedef struct manifest_file
+{
+ uint32 status; /* hash status */
+ char *pathname;
+ size_t size;
+ pg_checksum_type checksum_type;
+ int checksum_length;
+ uint8 *checksum_payload;
+ bool matched;
+ bool bad;
+} manifest_file;
+
+#define should_verify_checksum(m) \
+ (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+
+/*
+ * Define a hash table which we can use to store information about the files
+ * mentioned in the backup manifest.
+ */
+#define SH_PREFIX manifest_files
+#define SH_ELEMENT_TYPE manifest_file
+#define SH_KEY_TYPE char *
+#define SH_KEY pathname
+#define SH_HASH_KEY(tb, key) hash_string(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * Each WAL range described by the manifest file is parsed to produce an
+ * object like this.
+ */
+typedef struct manifest_wal_range
+{
+ TimeLineID tli;
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+ struct manifest_wal_range *next;
+ struct manifest_wal_range *prev;
+} manifest_wal_range;
+
+/*
+ * All the data parsed from a backup_manifest file.
+ */
+typedef struct manifest_data
+{
+ int version;
+ uint64 system_identifier;
+ manifest_files_hash *files;
+ manifest_wal_range *first_wal_range;
+ manifest_wal_range *last_wal_range;
+} manifest_data;
+
+/*
+ * All of the context information we need while checking a backup manifest.
+ */
+typedef struct verifier_context
+{
+ manifest_data *manifest;
+ char *backup_directory;
+ SimpleStringList ignore_list;
+ bool exit_on_error;
+ bool saw_any_error;
+} verifier_context;
+
+extern manifest_file *verify_manifest_entry(verifier_context *context,
+ char *relpath, size_t filesize);
+
+extern void report_backup_error(verifier_context *context,
+ const char *pg_restrict fmt,...)
+ pg_attribute_printf(2, 3);
+extern void report_fatal_error(const char *pg_restrict fmt,...)
+ pg_attribute_printf(1, 2) pg_attribute_noreturn();
+extern bool should_ignore_relpath(verifier_context *context, const char *relpath);
+
+#endif /* PG_VERIFYBACKUP_H */
--
2.18.0
v2-0002-Refactor-Add-astreamer_inject.h-and-move-related-.patchapplication/octet-stream; name=v2-0002-Refactor-Add-astreamer_inject.h-and-move-related-.patchDownload
From 011ff073e67e9bcc5c3fc5bf30cff996d3a78e03 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 17 Jul 2024 14:23:27 +0530
Subject: [PATCH v2 02/10] Refactor: Add astreamer_inject.h and move related
declarations to it.
---
src/bin/pg_basebackup/astreamer.h | 6 ------
src/bin/pg_basebackup/astreamer_inject.c | 2 +-
src/bin/pg_basebackup/astreamer_inject.h | 24 ++++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 2 +-
4 files changed, 26 insertions(+), 8 deletions(-)
create mode 100644 src/bin/pg_basebackup/astreamer_inject.h
diff --git a/src/bin/pg_basebackup/astreamer.h b/src/bin/pg_basebackup/astreamer.h
index 6b0047418bb..9d0a8c4d0c2 100644
--- a/src/bin/pg_basebackup/astreamer.h
+++ b/src/bin/pg_basebackup/astreamer.h
@@ -217,10 +217,4 @@ extern astreamer *astreamer_tar_parser_new(astreamer *next);
extern astreamer *astreamer_tar_terminator_new(astreamer *next);
extern astreamer *astreamer_tar_archiver_new(astreamer *next);
-extern astreamer *astreamer_recovery_injector_new(astreamer *next,
- bool is_recovery_guc_supported,
- PQExpBuffer recoveryconfcontents);
-extern void astreamer_inject_file(astreamer *streamer, char *pathname,
- char *data, int len);
-
#endif
diff --git a/src/bin/pg_basebackup/astreamer_inject.c b/src/bin/pg_basebackup/astreamer_inject.c
index 7f1decded8d..4ad8381f102 100644
--- a/src/bin/pg_basebackup/astreamer_inject.c
+++ b/src/bin/pg_basebackup/astreamer_inject.c
@@ -11,7 +11,7 @@
#include "postgres_fe.h"
-#include "astreamer.h"
+#include "astreamer_inject.h"
#include "common/file_perm.h"
#include "common/logging.h"
diff --git a/src/bin/pg_basebackup/astreamer_inject.h b/src/bin/pg_basebackup/astreamer_inject.h
new file mode 100644
index 00000000000..8504b3f5e0d
--- /dev/null
+++ b/src/bin/pg_basebackup/astreamer_inject.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_inject.h
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/astreamer_inject.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef ASTREAMER_INJECT_H
+#define ASTREAMER_INJECT_H
+
+#include "astreamer.h"
+#include "pqexpbuffer.h"
+
+extern astreamer *astreamer_recovery_injector_new(astreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents);
+extern void astreamer_inject_file(astreamer *streamer, char *pathname,
+ char *data, int len);
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 4179b064cbc..1e753e40c97 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -26,7 +26,7 @@
#endif
#include "access/xlog_internal.h"
-#include "astreamer.h"
+#include "astreamer_inject.h"
#include "backup/basebackup.h"
#include "common/compression.h"
#include "common/file_perm.h"
--
2.18.0
v2-0005-Refactor-split-verify_backup_file-function.patchapplication/octet-stream; name=v2-0005-Refactor-split-verify_backup_file-function.patchDownload
From f013bc9f93ac0a31b22fcdaa8c57422a15adc552 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Mon, 17 Jun 2024 14:17:22 +0530
Subject: [PATCH v2 05/10] Refactor: split verify_backup_file() function.
Separate the manifest entry verification code into a new function.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 36 ++++++++++++++++-------
1 file changed, 25 insertions(+), 11 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index a248db1b28a..d17b565a604 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -624,35 +624,47 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
}
/* Check whether there's an entry in the manifest hash. */
+ m = verify_manifest_entry(context, relpath, sb.st_size);
+
+ /*
+ * Validate the manifest system identifier, not available in manifest
+ * version 1.
+ */
+ if (context->manifest->version != 1 &&
+ strcmp(relpath, "global/pg_control") == 0 &&
+ m->matched && !m->bad)
+ verify_control_file(fullpath, context->manifest->system_identifier);
+}
+
+/*
+ * Verify file and its size entry in the manifest.
+ */
+manifest_file *
+verify_manifest_entry(verifier_context *context, char *relpath, size_t filesize)
+{
+ manifest_file *m;
+
m = manifest_files_lookup(context->manifest->files, relpath);
if (m == NULL)
{
report_backup_error(context,
"\"%s\" is present on disk but not in the manifest",
relpath);
- return;
+ return NULL;
}
/* Flag this entry as having been encountered in the filesystem. */
m->matched = true;
/* Check that the size matches. */
- if (m->size != sb.st_size)
+ if (m->size != filesize)
{
report_backup_error(context,
"\"%s\" has size %lld on disk but size %zu in the manifest",
- relpath, (long long int) sb.st_size, m->size);
+ relpath, (long long int) filesize, m->size);
m->bad = true;
}
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0)
- verify_control_file(fullpath, context->manifest->system_identifier);
-
/* Update statistics for progress report, if necessary */
if (show_progress && !skip_checksums && should_verify_checksum(m))
total_size += m->size;
@@ -663,6 +675,8 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
* afterwards verify the checksums. That's because computing checksums may
* take a while, and we'd like to report more obvious problems quickly.
*/
+
+ return m;
}
/*
--
2.18.0
v2-0001-Refactor-Rename-all-bbstreamer-references-to-astr.patchapplication/octet-stream; name=v2-0001-Refactor-Rename-all-bbstreamer-references-to-astr.patchDownload
From c83e66ccff090c15979093692f33fcedbc3ccf39 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 09:39:32 +0530
Subject: [PATCH v2 01/10] Refactor: Rename all bbstreamer references to
astreamer.
BBSTREAMER is specific to pg_basebackup; we need a more generalized
name so it can be placed in a common area, making it accessible for
other modules. Renaming it to ASTREAMER, short for ARCHIVE STREAMER,
makes it more general.
---
src/bin/pg_basebackup/Makefile | 12 +-
src/bin/pg_basebackup/astreamer.h | 226 +++++++++++++
.../{bbstreamer_file.c => astreamer_file.c} | 148 ++++----
.../{bbstreamer_gzip.c => astreamer_gzip.c} | 154 ++++-----
...bbstreamer_inject.c => astreamer_inject.c} | 152 ++++-----
.../{bbstreamer_lz4.c => astreamer_lz4.c} | 172 +++++-----
.../{bbstreamer_tar.c => astreamer_tar.c} | 316 +++++++++---------
.../{bbstreamer_zstd.c => astreamer_zstd.c} | 160 ++++-----
src/bin/pg_basebackup/bbstreamer.h | 226 -------------
src/bin/pg_basebackup/meson.build | 12 +-
src/bin/pg_basebackup/nls.mk | 12 +-
src/bin/pg_basebackup/pg_basebackup.c | 74 ++--
src/tools/pgindent/typedefs.list | 26 +-
13 files changed, 845 insertions(+), 845 deletions(-)
create mode 100644 src/bin/pg_basebackup/astreamer.h
rename src/bin/pg_basebackup/{bbstreamer_file.c => astreamer_file.c} (69%)
rename src/bin/pg_basebackup/{bbstreamer_gzip.c => astreamer_gzip.c} (62%)
rename src/bin/pg_basebackup/{bbstreamer_inject.c => astreamer_inject.c} (53%)
rename src/bin/pg_basebackup/{bbstreamer_lz4.c => astreamer_lz4.c} (69%)
rename src/bin/pg_basebackup/{bbstreamer_tar.c => astreamer_tar.c} (50%)
rename src/bin/pg_basebackup/{bbstreamer_zstd.c => astreamer_zstd.c} (64%)
delete mode 100644 src/bin/pg_basebackup/bbstreamer.h
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 26c53e473f5..a71af2d48a7 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -37,12 +37,12 @@ OBJS = \
BBOBJS = \
pg_basebackup.o \
- bbstreamer_file.o \
- bbstreamer_gzip.o \
- bbstreamer_inject.o \
- bbstreamer_lz4.o \
- bbstreamer_tar.o \
- bbstreamer_zstd.o
+ astreamer_file.o \
+ astreamer_gzip.o \
+ astreamer_inject.o \
+ astreamer_lz4.o \
+ astreamer_tar.o \
+ astreamer_zstd.o
all: pg_basebackup pg_createsubscriber pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/astreamer.h b/src/bin/pg_basebackup/astreamer.h
new file mode 100644
index 00000000000..6b0047418bb
--- /dev/null
+++ b/src/bin/pg_basebackup/astreamer.h
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer.h
+ *
+ * Each tar archive returned by the server is passed to one or more
+ * astreamer objects for further processing. The astreamer may do
+ * something simple, like write the archive to a file, perhaps after
+ * compressing it, but it can also do more complicated things, like
+ * annotating the byte stream to indicate which parts of the data
+ * correspond to tar headers or trailing padding, vs. which parts are
+ * payload data. A subsequent astreamer may use this information to
+ * make further decisions about how to process the data; for example,
+ * it might choose to modify the archive contents.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/astreamer.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef ASTREAMER_H
+#define ASTREAMER_H
+
+#include "common/compression.h"
+#include "lib/stringinfo.h"
+#include "pqexpbuffer.h"
+
+struct astreamer;
+struct astreamer_ops;
+typedef struct astreamer astreamer;
+typedef struct astreamer_ops astreamer_ops;
+
+/*
+ * Each chunk of archive data passed to an astreamer is classified into one
+ * of these categories. When data is first received from the remote server,
+ * each chunk will be categorized as ASTREAMER_UNKNOWN, and the chunks will
+ * be of whatever size the remote server chose to send.
+ *
+ * If the archive is parsed (e.g. see astreamer_tar_parser_new()), then all
+ * chunks should be labelled as one of the other types listed here. In
+ * addition, there should be exactly one ASTREAMER_MEMBER_HEADER chunk and
+ * exactly one ASTREAMER_MEMBER_TRAILER chunk per archive member, even if
+ * that means a zero-length call. There can be any number of
+ * ASTREAMER_MEMBER_CONTENTS chunks in between those calls. There
+ * should exactly ASTREAMER_ARCHIVE_TRAILER chunk, and it should follow the
+ * last ASTREAMER_MEMBER_TRAILER chunk.
+ *
+ * In theory, we could need other classifications here, such as a way of
+ * indicating an archive header, but the "tar" format doesn't need anything
+ * else, so for the time being there's no point.
+ */
+typedef enum
+{
+ ASTREAMER_UNKNOWN,
+ ASTREAMER_MEMBER_HEADER,
+ ASTREAMER_MEMBER_CONTENTS,
+ ASTREAMER_MEMBER_TRAILER,
+ ASTREAMER_ARCHIVE_TRAILER,
+} astreamer_archive_context;
+
+/*
+ * Each chunk of data that is classified as ASTREAMER_MEMBER_HEADER,
+ * ASTREAMER_MEMBER_CONTENTS, or ASTREAMER_MEMBER_TRAILER should also
+ * pass a pointer to an instance of this struct. The details are expected
+ * to be present in the archive header and used to fill the struct, after
+ * which all subsequent calls for the same archive member are expected to
+ * pass the same details.
+ */
+typedef struct
+{
+ char pathname[MAXPGPATH];
+ pgoff_t size;
+ mode_t mode;
+ uid_t uid;
+ gid_t gid;
+ bool is_directory;
+ bool is_link;
+ char linktarget[MAXPGPATH];
+} astreamer_member;
+
+/*
+ * Generally, each type of astreamer will define its own struct, but the
+ * first element should be 'astreamer base'. An astreamer that does not
+ * require any additional private data could use this structure directly.
+ *
+ * bbs_ops is a pointer to the astreamer_ops object which contains the
+ * function pointers appropriate to this type of astreamer.
+ *
+ * bbs_next is a pointer to the successor astreamer, for those types of
+ * astreamer which forward data to a successor. It need not be used and
+ * should be set to NULL when not relevant.
+ *
+ * bbs_buffer is a buffer for accumulating data for temporary storage. Each
+ * type of astreamer makes its own decisions about whether and how to use
+ * this buffer.
+ */
+struct astreamer
+{
+ const astreamer_ops *bbs_ops;
+ astreamer *bbs_next;
+ StringInfoData bbs_buffer;
+};
+
+/*
+ * There are three callbacks for an astreamer. The 'content' callback is
+ * called repeatedly, as described in the astreamer_archive_context comments.
+ * Then, the 'finalize' callback is called once at the end, to give the
+ * astreamer a chance to perform cleanup such as closing files. Finally,
+ * because this code is running in a frontend environment where, as of this
+ * writing, there are no memory contexts, the 'free' callback is called to
+ * release memory. These callbacks should always be invoked using the static
+ * inline functions defined below.
+ */
+struct astreamer_ops
+{
+ void (*content) (astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+ void (*finalize) (astreamer *streamer);
+ void (*free) (astreamer *streamer);
+};
+
+/* Send some content to an astreamer. */
+static inline void
+astreamer_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->content(streamer, member, data, len, context);
+}
+
+/* Finalize an astreamer. */
+static inline void
+astreamer_finalize(astreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->finalize(streamer);
+}
+
+/* Free an astreamer. */
+static inline void
+astreamer_free(astreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->free(streamer);
+}
+
+/*
+ * This is a convenience method for use when implementing an astreamer; it is
+ * not for use by outside callers. It adds the amount of data specified by
+ * 'nbytes' to the astreamer's buffer and adjusts '*len' and '*data'
+ * accordingly.
+ */
+static inline void
+astreamer_buffer_bytes(astreamer *streamer, const char **data, int *len,
+ int nbytes)
+{
+ Assert(nbytes <= *len);
+
+ appendBinaryStringInfo(&streamer->bbs_buffer, *data, nbytes);
+ *len -= nbytes;
+ *data += nbytes;
+}
+
+/*
+ * This is a convenience method for use when implementing an astreamer; it is
+ * not for use by outside callers. It attempts to add enough data to the
+ * astreamer's buffer to reach a length of target_bytes and adjusts '*len'
+ * and '*data' accordingly. It returns true if the target length has been
+ * reached and false otherwise.
+ */
+static inline bool
+astreamer_buffer_until(astreamer *streamer, const char **data, int *len,
+ int target_bytes)
+{
+ int buflen = streamer->bbs_buffer.len;
+
+ if (buflen >= target_bytes)
+ {
+ /* Target length already reached; nothing to do. */
+ return true;
+ }
+
+ if (buflen + *len < target_bytes)
+ {
+ /* Not enough data to reach target length; buffer all of it. */
+ astreamer_buffer_bytes(streamer, data, len, *len);
+ return false;
+ }
+
+ /* Buffer just enough to reach the target length. */
+ astreamer_buffer_bytes(streamer, data, len, target_bytes - buflen);
+ return true;
+}
+
+/*
+ * Functions for creating astreamer objects of various types. See the header
+ * comments for each of these functions for details.
+ */
+extern astreamer *astreamer_plain_writer_new(char *pathname, FILE *file);
+extern astreamer *astreamer_gzip_writer_new(char *pathname, FILE *file,
+ pg_compress_specification *compress);
+extern astreamer *astreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *));
+
+extern astreamer *astreamer_gzip_decompressor_new(astreamer *next);
+extern astreamer *astreamer_lz4_compressor_new(astreamer *next,
+ pg_compress_specification *compress);
+extern astreamer *astreamer_lz4_decompressor_new(astreamer *next);
+extern astreamer *astreamer_zstd_compressor_new(astreamer *next,
+ pg_compress_specification *compress);
+extern astreamer *astreamer_zstd_decompressor_new(astreamer *next);
+extern astreamer *astreamer_tar_parser_new(astreamer *next);
+extern astreamer *astreamer_tar_terminator_new(astreamer *next);
+extern astreamer *astreamer_tar_archiver_new(astreamer *next);
+
+extern astreamer *astreamer_recovery_injector_new(astreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents);
+extern void astreamer_inject_file(astreamer *streamer, char *pathname,
+ char *data, int len);
+
+#endif
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/astreamer_file.c
similarity index 69%
rename from src/bin/pg_basebackup/bbstreamer_file.c
rename to src/bin/pg_basebackup/astreamer_file.c
index bab6cd4a6b1..2742385e103 100644
--- a/src/bin/pg_basebackup/bbstreamer_file.c
+++ b/src/bin/pg_basebackup/astreamer_file.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_file.c
+ * astreamer_file.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_file.c
+ * src/bin/pg_basebackup/astreamer_file.c
*-------------------------------------------------------------------------
*/
@@ -13,60 +13,60 @@
#include <unistd.h>
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
-typedef struct bbstreamer_plain_writer
+typedef struct astreamer_plain_writer
{
- bbstreamer base;
+ astreamer base;
char *pathname;
FILE *file;
bool should_close_file;
-} bbstreamer_plain_writer;
+} astreamer_plain_writer;
-typedef struct bbstreamer_extractor
+typedef struct astreamer_extractor
{
- bbstreamer base;
+ astreamer base;
char *basepath;
const char *(*link_map) (const char *);
void (*report_output_file) (const char *);
char filename[MAXPGPATH];
FILE *file;
-} bbstreamer_extractor;
+} astreamer_extractor;
-static void bbstreamer_plain_writer_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_plain_writer_finalize(bbstreamer *streamer);
-static void bbstreamer_plain_writer_free(bbstreamer *streamer);
+static void astreamer_plain_writer_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_plain_writer_finalize(astreamer *streamer);
+static void astreamer_plain_writer_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_plain_writer_ops = {
- .content = bbstreamer_plain_writer_content,
- .finalize = bbstreamer_plain_writer_finalize,
- .free = bbstreamer_plain_writer_free
+static const astreamer_ops astreamer_plain_writer_ops = {
+ .content = astreamer_plain_writer_content,
+ .finalize = astreamer_plain_writer_finalize,
+ .free = astreamer_plain_writer_free
};
-static void bbstreamer_extractor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_extractor_finalize(bbstreamer *streamer);
-static void bbstreamer_extractor_free(bbstreamer *streamer);
+static void astreamer_extractor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_extractor_finalize(astreamer *streamer);
+static void astreamer_extractor_free(astreamer *streamer);
static void extract_directory(const char *filename, mode_t mode);
static void extract_link(const char *filename, const char *linktarget);
static FILE *create_file_for_extract(const char *filename, mode_t mode);
-static const bbstreamer_ops bbstreamer_extractor_ops = {
- .content = bbstreamer_extractor_content,
- .finalize = bbstreamer_extractor_finalize,
- .free = bbstreamer_extractor_free
+static const astreamer_ops astreamer_extractor_ops = {
+ .content = astreamer_extractor_content,
+ .finalize = astreamer_extractor_finalize,
+ .free = astreamer_extractor_free
};
/*
- * Create a bbstreamer that just writes data to a file.
+ * Create an astreamer that just writes data to a file.
*
* The caller must specify a pathname and may specify a file. The pathname is
* used for error-reporting purposes either way. If file is NULL, the pathname
@@ -74,14 +74,14 @@ static const bbstreamer_ops bbstreamer_extractor_ops = {
* for writing and closed when done. If file is not NULL, the data is written
* there.
*/
-bbstreamer *
-bbstreamer_plain_writer_new(char *pathname, FILE *file)
+astreamer *
+astreamer_plain_writer_new(char *pathname, FILE *file)
{
- bbstreamer_plain_writer *streamer;
+ astreamer_plain_writer *streamer;
- streamer = palloc0(sizeof(bbstreamer_plain_writer));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_plain_writer_ops;
+ streamer = palloc0(sizeof(astreamer_plain_writer));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_plain_writer_ops;
streamer->pathname = pstrdup(pathname);
streamer->file = file;
@@ -101,13 +101,13 @@ bbstreamer_plain_writer_new(char *pathname, FILE *file)
* Write archive content to file.
*/
static void
-bbstreamer_plain_writer_content(bbstreamer *streamer,
- bbstreamer_member *member, const char *data,
- int len, bbstreamer_archive_context context)
+astreamer_plain_writer_content(astreamer *streamer,
+ astreamer_member *member, const char *data,
+ int len, astreamer_archive_context context)
{
- bbstreamer_plain_writer *mystreamer;
+ astreamer_plain_writer *mystreamer;
- mystreamer = (bbstreamer_plain_writer *) streamer;
+ mystreamer = (astreamer_plain_writer *) streamer;
if (len == 0)
return;
@@ -128,11 +128,11 @@ bbstreamer_plain_writer_content(bbstreamer *streamer,
* the file if we opened it, but not if the caller provided it.
*/
static void
-bbstreamer_plain_writer_finalize(bbstreamer *streamer)
+astreamer_plain_writer_finalize(astreamer *streamer)
{
- bbstreamer_plain_writer *mystreamer;
+ astreamer_plain_writer *mystreamer;
- mystreamer = (bbstreamer_plain_writer *) streamer;
+ mystreamer = (astreamer_plain_writer *) streamer;
if (mystreamer->should_close_file && fclose(mystreamer->file) != 0)
pg_fatal("could not close file \"%s\": %m",
@@ -143,14 +143,14 @@ bbstreamer_plain_writer_finalize(bbstreamer *streamer)
}
/*
- * Free memory associated with this bbstreamer.
+ * Free memory associated with this astreamer.
*/
static void
-bbstreamer_plain_writer_free(bbstreamer *streamer)
+astreamer_plain_writer_free(astreamer *streamer)
{
- bbstreamer_plain_writer *mystreamer;
+ astreamer_plain_writer *mystreamer;
- mystreamer = (bbstreamer_plain_writer *) streamer;
+ mystreamer = (astreamer_plain_writer *) streamer;
Assert(!mystreamer->should_close_file);
Assert(mystreamer->base.bbs_next == NULL);
@@ -160,13 +160,13 @@ bbstreamer_plain_writer_free(bbstreamer *streamer)
}
/*
- * Create a bbstreamer that extracts an archive.
+ * Create an astreamer that extracts an archive.
*
* All pathnames in the archive are interpreted relative to basepath.
*
- * Unlike e.g. bbstreamer_plain_writer_new() we can't do anything useful here
+ * Unlike e.g. astreamer_plain_writer_new() we can't do anything useful here
* with untyped chunks; we need typed chunks which follow the rules described
- * in bbstreamer.h. Assuming we have that, we don't need to worry about the
+ * in astreamer.h. Assuming we have that, we don't need to worry about the
* original archive format; it's enough to just look at the member information
* provided and write to the corresponding file.
*
@@ -179,16 +179,16 @@ bbstreamer_plain_writer_free(bbstreamer *streamer)
* new output file. The pathname to that file is passed as an argument. If
* NULL, the call is skipped.
*/
-bbstreamer *
-bbstreamer_extractor_new(const char *basepath,
- const char *(*link_map) (const char *),
- void (*report_output_file) (const char *))
+astreamer *
+astreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *))
{
- bbstreamer_extractor *streamer;
+ astreamer_extractor *streamer;
- streamer = palloc0(sizeof(bbstreamer_extractor));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_extractor_ops;
+ streamer = palloc0(sizeof(astreamer_extractor));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_extractor_ops;
streamer->basepath = pstrdup(basepath);
streamer->link_map = link_map;
streamer->report_output_file = report_output_file;
@@ -200,19 +200,19 @@ bbstreamer_extractor_new(const char *basepath,
* Extract archive contents to the filesystem.
*/
static void
-bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_extractor_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+ astreamer_extractor *mystreamer = (astreamer_extractor *) streamer;
int fnamelen;
- Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
- Assert(context != BBSTREAMER_UNKNOWN);
+ Assert(member != NULL || context == ASTREAMER_ARCHIVE_TRAILER);
+ Assert(context != ASTREAMER_UNKNOWN);
switch (context)
{
- case BBSTREAMER_MEMBER_HEADER:
+ case ASTREAMER_MEMBER_HEADER:
Assert(mystreamer->file == NULL);
/* Prepend basepath. */
@@ -245,7 +245,7 @@ bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
mystreamer->report_output_file(mystreamer->filename);
break;
- case BBSTREAMER_MEMBER_CONTENTS:
+ case ASTREAMER_MEMBER_CONTENTS:
if (mystreamer->file == NULL)
break;
@@ -260,14 +260,14 @@ bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
}
break;
- case BBSTREAMER_MEMBER_TRAILER:
+ case ASTREAMER_MEMBER_TRAILER:
if (mystreamer->file == NULL)
break;
fclose(mystreamer->file);
mystreamer->file = NULL;
break;
- case BBSTREAMER_ARCHIVE_TRAILER:
+ case ASTREAMER_ARCHIVE_TRAILER:
break;
default:
@@ -375,10 +375,10 @@ create_file_for_extract(const char *filename, mode_t mode)
* There's nothing to do here but sanity checking.
*/
static void
-bbstreamer_extractor_finalize(bbstreamer *streamer)
+astreamer_extractor_finalize(astreamer *streamer)
{
- bbstreamer_extractor *mystreamer PG_USED_FOR_ASSERTS_ONLY
- = (bbstreamer_extractor *) streamer;
+ astreamer_extractor *mystreamer PG_USED_FOR_ASSERTS_ONLY
+ = (astreamer_extractor *) streamer;
Assert(mystreamer->file == NULL);
}
@@ -387,9 +387,9 @@ bbstreamer_extractor_finalize(bbstreamer *streamer)
* Free memory.
*/
static void
-bbstreamer_extractor_free(bbstreamer *streamer)
+astreamer_extractor_free(astreamer *streamer)
{
- bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+ astreamer_extractor *mystreamer = (astreamer_extractor *) streamer;
pfree(mystreamer->basepath);
pfree(mystreamer);
diff --git a/src/bin/pg_basebackup/bbstreamer_gzip.c b/src/bin/pg_basebackup/astreamer_gzip.c
similarity index 62%
rename from src/bin/pg_basebackup/bbstreamer_gzip.c
rename to src/bin/pg_basebackup/astreamer_gzip.c
index 0417fd9bc2c..6f7c27afbbc 100644
--- a/src/bin/pg_basebackup/bbstreamer_gzip.c
+++ b/src/bin/pg_basebackup/astreamer_gzip.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_gzip.c
+ * astreamer_gzip.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_gzip.c
+ * src/bin/pg_basebackup/astreamer_gzip.c
*-------------------------------------------------------------------------
*/
@@ -17,74 +17,74 @@
#include <zlib.h>
#endif
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
#ifdef HAVE_LIBZ
-typedef struct bbstreamer_gzip_writer
+typedef struct astreamer_gzip_writer
{
- bbstreamer base;
+ astreamer base;
char *pathname;
gzFile gzfile;
-} bbstreamer_gzip_writer;
+} astreamer_gzip_writer;
-typedef struct bbstreamer_gzip_decompressor
+typedef struct astreamer_gzip_decompressor
{
- bbstreamer base;
+ astreamer base;
z_stream zstream;
size_t bytes_written;
-} bbstreamer_gzip_decompressor;
+} astreamer_gzip_decompressor;
-static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
-static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
+static void astreamer_gzip_writer_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_gzip_writer_finalize(astreamer *streamer);
+static void astreamer_gzip_writer_free(astreamer *streamer);
static const char *get_gz_error(gzFile gzf);
-static const bbstreamer_ops bbstreamer_gzip_writer_ops = {
- .content = bbstreamer_gzip_writer_content,
- .finalize = bbstreamer_gzip_writer_finalize,
- .free = bbstreamer_gzip_writer_free
+static const astreamer_ops astreamer_gzip_writer_ops = {
+ .content = astreamer_gzip_writer_content,
+ .finalize = astreamer_gzip_writer_finalize,
+ .free = astreamer_gzip_writer_free
};
-static void bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_gzip_decompressor_finalize(bbstreamer *streamer);
-static void bbstreamer_gzip_decompressor_free(bbstreamer *streamer);
+static void astreamer_gzip_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_gzip_decompressor_finalize(astreamer *streamer);
+static void astreamer_gzip_decompressor_free(astreamer *streamer);
static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
static void gzip_pfree(void *opaque, void *address);
-static const bbstreamer_ops bbstreamer_gzip_decompressor_ops = {
- .content = bbstreamer_gzip_decompressor_content,
- .finalize = bbstreamer_gzip_decompressor_finalize,
- .free = bbstreamer_gzip_decompressor_free
+static const astreamer_ops astreamer_gzip_decompressor_ops = {
+ .content = astreamer_gzip_decompressor_content,
+ .finalize = astreamer_gzip_decompressor_finalize,
+ .free = astreamer_gzip_decompressor_free
};
#endif
/*
- * Create a bbstreamer that just compresses data using gzip, and then writes
+ * Create an astreamer that just compresses data using gzip, and then writes
* it to a file.
*
- * As in the case of bbstreamer_plain_writer_new, pathname is always used
+ * As in the case of astreamer_plain_writer_new, pathname is always used
* for error reporting purposes; if file is NULL, it is also the opened and
* closed so that the data may be written there.
*/
-bbstreamer *
-bbstreamer_gzip_writer_new(char *pathname, FILE *file,
- pg_compress_specification *compress)
+astreamer *
+astreamer_gzip_writer_new(char *pathname, FILE *file,
+ pg_compress_specification *compress)
{
#ifdef HAVE_LIBZ
- bbstreamer_gzip_writer *streamer;
+ astreamer_gzip_writer *streamer;
- streamer = palloc0(sizeof(bbstreamer_gzip_writer));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_gzip_writer_ops;
+ streamer = palloc0(sizeof(astreamer_gzip_writer));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_gzip_writer_ops;
streamer->pathname = pstrdup(pathname);
@@ -123,13 +123,13 @@ bbstreamer_gzip_writer_new(char *pathname, FILE *file,
* Write archive content to gzip file.
*/
static void
-bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member, const char *data,
- int len, bbstreamer_archive_context context)
+astreamer_gzip_writer_content(astreamer *streamer,
+ astreamer_member *member, const char *data,
+ int len, astreamer_archive_context context)
{
- bbstreamer_gzip_writer *mystreamer;
+ astreamer_gzip_writer *mystreamer;
- mystreamer = (bbstreamer_gzip_writer *) streamer;
+ mystreamer = (astreamer_gzip_writer *) streamer;
if (len == 0)
return;
@@ -151,16 +151,16 @@ bbstreamer_gzip_writer_content(bbstreamer *streamer,
*
* It makes no difference whether we opened the file or the caller did it,
* because libz provides no way of avoiding a close on the underlying file
- * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
+ * handle. Notice, however, that astreamer_gzip_writer_new() uses dup() to
* work around this issue, so that the behavior from the caller's viewpoint
- * is the same as for bbstreamer_plain_writer.
+ * is the same as for astreamer_plain_writer.
*/
static void
-bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
+astreamer_gzip_writer_finalize(astreamer *streamer)
{
- bbstreamer_gzip_writer *mystreamer;
+ astreamer_gzip_writer *mystreamer;
- mystreamer = (bbstreamer_gzip_writer *) streamer;
+ mystreamer = (astreamer_gzip_writer *) streamer;
errno = 0; /* in case gzclose() doesn't set it */
if (gzclose(mystreamer->gzfile) != 0)
@@ -171,14 +171,14 @@ bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
}
/*
- * Free memory associated with this bbstreamer.
+ * Free memory associated with this astreamer.
*/
static void
-bbstreamer_gzip_writer_free(bbstreamer *streamer)
+astreamer_gzip_writer_free(astreamer *streamer)
{
- bbstreamer_gzip_writer *mystreamer;
+ astreamer_gzip_writer *mystreamer;
- mystreamer = (bbstreamer_gzip_writer *) streamer;
+ mystreamer = (astreamer_gzip_writer *) streamer;
Assert(mystreamer->base.bbs_next == NULL);
Assert(mystreamer->gzfile == NULL);
@@ -208,18 +208,18 @@ get_gz_error(gzFile gzf)
* Create a new base backup streamer that performs decompression of gzip
* compressed blocks.
*/
-bbstreamer *
-bbstreamer_gzip_decompressor_new(bbstreamer *next)
+astreamer *
+astreamer_gzip_decompressor_new(astreamer *next)
{
#ifdef HAVE_LIBZ
- bbstreamer_gzip_decompressor *streamer;
+ astreamer_gzip_decompressor *streamer;
z_stream *zs;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_gzip_decompressor));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_gzip_decompressor_ops;
+ streamer = palloc0(sizeof(astreamer_gzip_decompressor));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_gzip_decompressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -258,15 +258,15 @@ bbstreamer_gzip_decompressor_new(bbstreamer *next)
* to the next streamer.
*/
static void
-bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_gzip_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_gzip_decompressor *mystreamer;
+ astreamer_gzip_decompressor *mystreamer;
z_stream *zs;
- mystreamer = (bbstreamer_gzip_decompressor *) streamer;
+ mystreamer = (astreamer_gzip_decompressor *) streamer;
zs = &mystreamer->zstream;
zs->next_in = (const uint8 *) data;
@@ -301,9 +301,9 @@ bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
/* If output buffer is full then pass data to next streamer */
if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen, context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen, context);
mystreamer->bytes_written = 0;
}
}
@@ -313,31 +313,31 @@ bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_gzip_decompressor_finalize(bbstreamer *streamer)
+astreamer_gzip_decompressor_finalize(astreamer *streamer)
{
- bbstreamer_gzip_decompressor *mystreamer;
+ astreamer_gzip_decompressor *mystreamer;
- mystreamer = (bbstreamer_gzip_decompressor *) streamer;
+ mystreamer = (astreamer_gzip_decompressor *) streamer;
/*
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_gzip_decompressor_free(bbstreamer *streamer)
+astreamer_gzip_decompressor_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_inject.c b/src/bin/pg_basebackup/astreamer_inject.c
similarity index 53%
rename from src/bin/pg_basebackup/bbstreamer_inject.c
rename to src/bin/pg_basebackup/astreamer_inject.c
index 194026b56e9..7f1decded8d 100644
--- a/src/bin/pg_basebackup/bbstreamer_inject.c
+++ b/src/bin/pg_basebackup/astreamer_inject.c
@@ -1,51 +1,51 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_inject.c
+ * astreamer_inject.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_inject.c
+ * src/bin/pg_basebackup/astreamer_inject.c
*-------------------------------------------------------------------------
*/
#include "postgres_fe.h"
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
-typedef struct bbstreamer_recovery_injector
+typedef struct astreamer_recovery_injector
{
- bbstreamer base;
+ astreamer base;
bool skip_file;
bool is_recovery_guc_supported;
bool is_postgresql_auto_conf;
bool found_postgresql_auto_conf;
PQExpBuffer recoveryconfcontents;
- bbstreamer_member member;
-} bbstreamer_recovery_injector;
+ astreamer_member member;
+} astreamer_recovery_injector;
-static void bbstreamer_recovery_injector_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_recovery_injector_finalize(bbstreamer *streamer);
-static void bbstreamer_recovery_injector_free(bbstreamer *streamer);
+static void astreamer_recovery_injector_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_recovery_injector_finalize(astreamer *streamer);
+static void astreamer_recovery_injector_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_recovery_injector_ops = {
- .content = bbstreamer_recovery_injector_content,
- .finalize = bbstreamer_recovery_injector_finalize,
- .free = bbstreamer_recovery_injector_free
+static const astreamer_ops astreamer_recovery_injector_ops = {
+ .content = astreamer_recovery_injector_content,
+ .finalize = astreamer_recovery_injector_finalize,
+ .free = astreamer_recovery_injector_free
};
/*
- * Create a bbstreamer that can edit recoverydata into an archive stream.
+ * Create an astreamer that can edit recoverydata into an archive stream.
*
- * The input should be a series of typed chunks (not BBSTREAMER_UNKNOWN) as
- * per the conventions described in bbstreamer.h; the chunks forwarded to
- * the next bbstreamer will be similarly typed, but the
- * BBSTREAMER_MEMBER_HEADER chunks may be zero-length in cases where we've
+ * The input should be a series of typed chunks (not ASTREAMER_UNKNOWN) as
+ * per the conventions described in astreamer.h; the chunks forwarded to
+ * the next astreamer will be similarly typed, but the
+ * ASTREAMER_MEMBER_HEADER chunks may be zero-length in cases where we've
* edited the archive stream.
*
* Our goal is to do one of the following three things with the content passed
@@ -61,16 +61,16 @@ static const bbstreamer_ops bbstreamer_recovery_injector_ops = {
* zero-length standby.signal file, dropping any file with that name from
* the archive.
*/
-bbstreamer *
-bbstreamer_recovery_injector_new(bbstreamer *next,
- bool is_recovery_guc_supported,
- PQExpBuffer recoveryconfcontents)
+astreamer *
+astreamer_recovery_injector_new(astreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents)
{
- bbstreamer_recovery_injector *streamer;
+ astreamer_recovery_injector *streamer;
- streamer = palloc0(sizeof(bbstreamer_recovery_injector));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_recovery_injector_ops;
+ streamer = palloc0(sizeof(astreamer_recovery_injector));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_recovery_injector_ops;
streamer->base.bbs_next = next;
streamer->is_recovery_guc_supported = is_recovery_guc_supported;
streamer->recoveryconfcontents = recoveryconfcontents;
@@ -82,21 +82,21 @@ bbstreamer_recovery_injector_new(bbstreamer *next,
* Handle each chunk of tar content while injecting recovery configuration.
*/
static void
-bbstreamer_recovery_injector_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_recovery_injector_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_recovery_injector *mystreamer;
+ astreamer_recovery_injector *mystreamer;
- mystreamer = (bbstreamer_recovery_injector *) streamer;
- Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+ mystreamer = (astreamer_recovery_injector *) streamer;
+ Assert(member != NULL || context == ASTREAMER_ARCHIVE_TRAILER);
switch (context)
{
- case BBSTREAMER_MEMBER_HEADER:
+ case ASTREAMER_MEMBER_HEADER:
/* Must copy provided data so we have the option to modify it. */
- memcpy(&mystreamer->member, member, sizeof(bbstreamer_member));
+ memcpy(&mystreamer->member, member, sizeof(astreamer_member));
/*
* On v12+, skip standby.signal and edit postgresql.auto.conf; on
@@ -119,8 +119,8 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
/*
* Zap data and len because the archive header is no
- * longer valid; some subsequent bbstreamer must
- * regenerate it if it's necessary.
+ * longer valid; some subsequent astreamer must regenerate
+ * it if it's necessary.
*/
data = NULL;
len = 0;
@@ -135,26 +135,26 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
return;
break;
- case BBSTREAMER_MEMBER_CONTENTS:
+ case ASTREAMER_MEMBER_CONTENTS:
/* Do not forward if the file is to be skipped. */
if (mystreamer->skip_file)
return;
break;
- case BBSTREAMER_MEMBER_TRAILER:
+ case ASTREAMER_MEMBER_TRAILER:
/* Do not forward it the file is to be skipped. */
if (mystreamer->skip_file)
return;
/* Append provided content to whatever we already sent. */
if (mystreamer->is_postgresql_auto_conf)
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->recoveryconfcontents->data,
- mystreamer->recoveryconfcontents->len,
- BBSTREAMER_MEMBER_CONTENTS);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len,
+ ASTREAMER_MEMBER_CONTENTS);
break;
- case BBSTREAMER_ARCHIVE_TRAILER:
+ case ASTREAMER_ARCHIVE_TRAILER:
if (mystreamer->is_recovery_guc_supported)
{
/*
@@ -163,22 +163,22 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
* member now.
*/
if (!mystreamer->found_postgresql_auto_conf)
- bbstreamer_inject_file(mystreamer->base.bbs_next,
- "postgresql.auto.conf",
- mystreamer->recoveryconfcontents->data,
- mystreamer->recoveryconfcontents->len);
+ astreamer_inject_file(mystreamer->base.bbs_next,
+ "postgresql.auto.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
/* Inject empty standby.signal file. */
- bbstreamer_inject_file(mystreamer->base.bbs_next,
- "standby.signal", "", 0);
+ astreamer_inject_file(mystreamer->base.bbs_next,
+ "standby.signal", "", 0);
}
else
{
/* Inject recovery.conf file with specified contents. */
- bbstreamer_inject_file(mystreamer->base.bbs_next,
- "recovery.conf",
- mystreamer->recoveryconfcontents->data,
- mystreamer->recoveryconfcontents->len);
+ astreamer_inject_file(mystreamer->base.bbs_next,
+ "recovery.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
}
/* Nothing to do here. */
@@ -189,26 +189,26 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
pg_fatal("unexpected state while injecting recovery settings");
}
- bbstreamer_content(mystreamer->base.bbs_next, &mystreamer->member,
- data, len, context);
+ astreamer_content(mystreamer->base.bbs_next, &mystreamer->member,
+ data, len, context);
}
/*
- * End-of-stream processing for this bbstreamer.
+ * End-of-stream processing for this astreamer.
*/
static void
-bbstreamer_recovery_injector_finalize(bbstreamer *streamer)
+astreamer_recovery_injector_finalize(astreamer *streamer)
{
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_finalize(streamer->bbs_next);
}
/*
- * Free memory associated with this bbstreamer.
+ * Free memory associated with this astreamer.
*/
static void
-bbstreamer_recovery_injector_free(bbstreamer *streamer)
+astreamer_recovery_injector_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer);
}
@@ -216,10 +216,10 @@ bbstreamer_recovery_injector_free(bbstreamer *streamer)
* Inject a member into the archive with specified contents.
*/
void
-bbstreamer_inject_file(bbstreamer *streamer, char *pathname, char *data,
- int len)
+astreamer_inject_file(astreamer *streamer, char *pathname, char *data,
+ int len)
{
- bbstreamer_member member;
+ astreamer_member member;
strlcpy(member.pathname, pathname, MAXPGPATH);
member.size = len;
@@ -238,12 +238,12 @@ bbstreamer_inject_file(bbstreamer *streamer, char *pathname, char *data,
/*
* We don't know here how to generate valid member headers and trailers
* for the archiving format in use, so if those are needed, some successor
- * bbstreamer will have to generate them using the data from 'member'.
+ * astreamer will have to generate them using the data from 'member'.
*/
- bbstreamer_content(streamer, &member, NULL, 0,
- BBSTREAMER_MEMBER_HEADER);
- bbstreamer_content(streamer, &member, data, len,
- BBSTREAMER_MEMBER_CONTENTS);
- bbstreamer_content(streamer, &member, NULL, 0,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(streamer, &member, NULL, 0,
+ ASTREAMER_MEMBER_HEADER);
+ astreamer_content(streamer, &member, data, len,
+ ASTREAMER_MEMBER_CONTENTS);
+ astreamer_content(streamer, &member, NULL, 0,
+ ASTREAMER_MEMBER_TRAILER);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/astreamer_lz4.c
similarity index 69%
rename from src/bin/pg_basebackup/bbstreamer_lz4.c
rename to src/bin/pg_basebackup/astreamer_lz4.c
index f5c9e68150c..1c40d7d8ad5 100644
--- a/src/bin/pg_basebackup/bbstreamer_lz4.c
+++ b/src/bin/pg_basebackup/astreamer_lz4.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_lz4.c
+ * astreamer_lz4.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_lz4.c
+ * src/bin/pg_basebackup/astreamer_lz4.c
*-------------------------------------------------------------------------
*/
@@ -17,15 +17,15 @@
#include <lz4frame.h>
#endif
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
#ifdef USE_LZ4
-typedef struct bbstreamer_lz4_frame
+typedef struct astreamer_lz4_frame
{
- bbstreamer base;
+ astreamer base;
LZ4F_compressionContext_t cctx;
LZ4F_decompressionContext_t dctx;
@@ -33,32 +33,32 @@ typedef struct bbstreamer_lz4_frame
size_t bytes_written;
bool header_written;
-} bbstreamer_lz4_frame;
+} astreamer_lz4_frame;
-static void bbstreamer_lz4_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_lz4_compressor_finalize(bbstreamer *streamer);
-static void bbstreamer_lz4_compressor_free(bbstreamer *streamer);
+static void astreamer_lz4_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_lz4_compressor_finalize(astreamer *streamer);
+static void astreamer_lz4_compressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_lz4_compressor_ops = {
- .content = bbstreamer_lz4_compressor_content,
- .finalize = bbstreamer_lz4_compressor_finalize,
- .free = bbstreamer_lz4_compressor_free
+static const astreamer_ops astreamer_lz4_compressor_ops = {
+ .content = astreamer_lz4_compressor_content,
+ .finalize = astreamer_lz4_compressor_finalize,
+ .free = astreamer_lz4_compressor_free
};
-static void bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer);
-static void bbstreamer_lz4_decompressor_free(bbstreamer *streamer);
+static void astreamer_lz4_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_lz4_decompressor_finalize(astreamer *streamer);
+static void astreamer_lz4_decompressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
- .content = bbstreamer_lz4_decompressor_content,
- .finalize = bbstreamer_lz4_decompressor_finalize,
- .free = bbstreamer_lz4_decompressor_free
+static const astreamer_ops astreamer_lz4_decompressor_ops = {
+ .content = astreamer_lz4_decompressor_content,
+ .finalize = astreamer_lz4_decompressor_finalize,
+ .free = astreamer_lz4_decompressor_free
};
#endif
@@ -66,19 +66,19 @@ static const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
* Create a new base backup streamer that performs lz4 compression of tar
* blocks.
*/
-bbstreamer *
-bbstreamer_lz4_compressor_new(bbstreamer *next, pg_compress_specification *compress)
+astreamer *
+astreamer_lz4_compressor_new(astreamer *next, pg_compress_specification *compress)
{
#ifdef USE_LZ4
- bbstreamer_lz4_frame *streamer;
+ astreamer_lz4_frame *streamer;
LZ4F_errorCode_t ctxError;
LZ4F_preferences_t *prefs;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_lz4_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_lz4_compressor_ops;
+ streamer = palloc0(sizeof(astreamer_lz4_frame));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_lz4_compressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -113,19 +113,19 @@ bbstreamer_lz4_compressor_new(bbstreamer *next, pg_compress_specification *compr
* of output buffer to next streamer and empty the buffer.
*/
static void
-bbstreamer_lz4_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_lz4_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
uint8 *next_in,
*next_out;
size_t out_bound,
compressed_size,
avail_out;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
next_in = (uint8 *) data;
/* Write header before processing the first input chunk. */
@@ -159,10 +159,10 @@ bbstreamer_lz4_compressor_content(bbstreamer *streamer,
out_bound = LZ4F_compressBound(len, &mystreamer->prefs);
if (avail_out < out_bound)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->base.bbs_buffer.data,
- mystreamer->bytes_written,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ context);
/* Enlarge buffer if it falls short of out bound. */
if (mystreamer->base.bbs_buffer.maxlen < out_bound)
@@ -196,25 +196,25 @@ bbstreamer_lz4_compressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_lz4_compressor_finalize(bbstreamer *streamer)
+astreamer_lz4_compressor_finalize(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
uint8 *next_out;
size_t footer_bound,
compressed_size,
avail_out;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
/* Find out the footer bound and update the output buffer. */
footer_bound = LZ4F_compressBound(0, &mystreamer->prefs);
if ((mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written) <
footer_bound)
{
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->bytes_written,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
/* Enlarge buffer if it falls short of footer bound. */
if (mystreamer->base.bbs_buffer.maxlen < footer_bound)
@@ -243,24 +243,24 @@ bbstreamer_lz4_compressor_finalize(bbstreamer *streamer)
mystreamer->bytes_written += compressed_size;
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->bytes_written,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_lz4_compressor_free(bbstreamer *streamer)
+astreamer_lz4_compressor_free(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ mystreamer = (astreamer_lz4_frame *) streamer;
+ astreamer_free(streamer->bbs_next);
LZ4F_freeCompressionContext(mystreamer->cctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
@@ -271,18 +271,18 @@ bbstreamer_lz4_compressor_free(bbstreamer *streamer)
* Create a new base backup streamer that performs decompression of lz4
* compressed blocks.
*/
-bbstreamer *
-bbstreamer_lz4_decompressor_new(bbstreamer *next)
+astreamer *
+astreamer_lz4_decompressor_new(astreamer *next)
{
#ifdef USE_LZ4
- bbstreamer_lz4_frame *streamer;
+ astreamer_lz4_frame *streamer;
LZ4F_errorCode_t ctxError;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_lz4_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_lz4_decompressor_ops;
+ streamer = palloc0(sizeof(astreamer_lz4_frame));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_lz4_decompressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -307,18 +307,18 @@ bbstreamer_lz4_decompressor_new(bbstreamer *next)
* to the next streamer.
*/
static void
-bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_lz4_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
uint8 *next_in,
*next_out;
size_t avail_in,
avail_out;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
next_in = (uint8 *) data;
next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
avail_in = len;
@@ -366,10 +366,10 @@ bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
*/
if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ context);
avail_out = mystreamer->base.bbs_buffer.maxlen;
mystreamer->bytes_written = 0;
@@ -387,34 +387,34 @@ bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer)
+astreamer_lz4_decompressor_finalize(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
/*
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_lz4_decompressor_free(bbstreamer *streamer)
+astreamer_lz4_decompressor_free(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ mystreamer = (astreamer_lz4_frame *) streamer;
+ astreamer_free(streamer->bbs_next);
LZ4F_freeDecompressionContext(mystreamer->dctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
diff --git a/src/bin/pg_basebackup/bbstreamer_tar.c b/src/bin/pg_basebackup/astreamer_tar.c
similarity index 50%
rename from src/bin/pg_basebackup/bbstreamer_tar.c
rename to src/bin/pg_basebackup/astreamer_tar.c
index 9137d17ddc1..673690cd18f 100644
--- a/src/bin/pg_basebackup/bbstreamer_tar.c
+++ b/src/bin/pg_basebackup/astreamer_tar.c
@@ -1,13 +1,13 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_tar.c
+ * astreamer_tar.c
*
* This module implements three types of tar processing. A tar parser
- * expects unlabelled chunks of data (e.g. BBSTREAMER_UNKNOWN) and splits
- * it into labelled chunks (any other value of bbstreamer_archive_context).
+ * expects unlabelled chunks of data (e.g. ASTREAMER_UNKNOWN) and splits
+ * it into labelled chunks (any other value of astreamer_archive_context).
* A tar archiver does the reverse: it takes a bunch of labelled chunks
* and produces a tarfile, optionally replacing member headers and trailers
- * so that upstream bbstreamer objects can perform surgery on the tarfile
+ * so that upstream astreamer objects can perform surgery on the tarfile
* contents without knowing the details of the tar format. A tar terminator
* just adds two blocks of NUL bytes to the end of the file, since older
* server versions produce files with this terminator omitted.
@@ -15,7 +15,7 @@
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_tar.c
+ * src/bin/pg_basebackup/astreamer_tar.c
*-------------------------------------------------------------------------
*/
@@ -23,83 +23,83 @@
#include <time.h>
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/logging.h"
#include "pgtar.h"
-typedef struct bbstreamer_tar_parser
+typedef struct astreamer_tar_parser
{
- bbstreamer base;
- bbstreamer_archive_context next_context;
- bbstreamer_member member;
+ astreamer base;
+ astreamer_archive_context next_context;
+ astreamer_member member;
size_t file_bytes_sent;
size_t pad_bytes_expected;
-} bbstreamer_tar_parser;
+} astreamer_tar_parser;
-typedef struct bbstreamer_tar_archiver
+typedef struct astreamer_tar_archiver
{
- bbstreamer base;
+ astreamer base;
bool rearchive_member;
-} bbstreamer_tar_archiver;
+} astreamer_tar_archiver;
-static void bbstreamer_tar_parser_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_tar_parser_finalize(bbstreamer *streamer);
-static void bbstreamer_tar_parser_free(bbstreamer *streamer);
-static bool bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer);
+static void astreamer_tar_parser_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_tar_parser_finalize(astreamer *streamer);
+static void astreamer_tar_parser_free(astreamer *streamer);
+static bool astreamer_tar_header(astreamer_tar_parser *mystreamer);
-static const bbstreamer_ops bbstreamer_tar_parser_ops = {
- .content = bbstreamer_tar_parser_content,
- .finalize = bbstreamer_tar_parser_finalize,
- .free = bbstreamer_tar_parser_free
+static const astreamer_ops astreamer_tar_parser_ops = {
+ .content = astreamer_tar_parser_content,
+ .finalize = astreamer_tar_parser_finalize,
+ .free = astreamer_tar_parser_free
};
-static void bbstreamer_tar_archiver_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_tar_archiver_finalize(bbstreamer *streamer);
-static void bbstreamer_tar_archiver_free(bbstreamer *streamer);
+static void astreamer_tar_archiver_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_tar_archiver_finalize(astreamer *streamer);
+static void astreamer_tar_archiver_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_tar_archiver_ops = {
- .content = bbstreamer_tar_archiver_content,
- .finalize = bbstreamer_tar_archiver_finalize,
- .free = bbstreamer_tar_archiver_free
+static const astreamer_ops astreamer_tar_archiver_ops = {
+ .content = astreamer_tar_archiver_content,
+ .finalize = astreamer_tar_archiver_finalize,
+ .free = astreamer_tar_archiver_free
};
-static void bbstreamer_tar_terminator_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_tar_terminator_finalize(bbstreamer *streamer);
-static void bbstreamer_tar_terminator_free(bbstreamer *streamer);
+static void astreamer_tar_terminator_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_tar_terminator_finalize(astreamer *streamer);
+static void astreamer_tar_terminator_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_tar_terminator_ops = {
- .content = bbstreamer_tar_terminator_content,
- .finalize = bbstreamer_tar_terminator_finalize,
- .free = bbstreamer_tar_terminator_free
+static const astreamer_ops astreamer_tar_terminator_ops = {
+ .content = astreamer_tar_terminator_content,
+ .finalize = astreamer_tar_terminator_finalize,
+ .free = astreamer_tar_terminator_free
};
/*
- * Create a bbstreamer that can parse a stream of content as tar data.
+ * Create an astreamer that can parse a stream of content as tar data.
*
- * The input should be a series of BBSTREAMER_UNKNOWN chunks; the bbstreamer
+ * The input should be a series of ASTREAMER_UNKNOWN chunks; the astreamer
* specified by 'next' will receive a series of typed chunks, as per the
- * conventions described in bbstreamer.h.
+ * conventions described in astreamer.h.
*/
-bbstreamer *
-bbstreamer_tar_parser_new(bbstreamer *next)
+astreamer *
+astreamer_tar_parser_new(astreamer *next)
{
- bbstreamer_tar_parser *streamer;
+ astreamer_tar_parser *streamer;
- streamer = palloc0(sizeof(bbstreamer_tar_parser));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_tar_parser_ops;
+ streamer = palloc0(sizeof(astreamer_tar_parser));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_tar_parser_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
- streamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ streamer->next_context = ASTREAMER_MEMBER_HEADER;
return &streamer->base;
}
@@ -108,29 +108,29 @@ bbstreamer_tar_parser_new(bbstreamer *next)
* Parse unknown content as tar data.
*/
static void
-bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_tar_parser_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+ astreamer_tar_parser *mystreamer = (astreamer_tar_parser *) streamer;
size_t nbytes;
/* Expect unparsed input. */
Assert(member == NULL);
- Assert(context == BBSTREAMER_UNKNOWN);
+ Assert(context == ASTREAMER_UNKNOWN);
while (len > 0)
{
switch (mystreamer->next_context)
{
- case BBSTREAMER_MEMBER_HEADER:
+ case ASTREAMER_MEMBER_HEADER:
/*
* If we're expecting an archive member header, accumulate a
* full block of data before doing anything further.
*/
- if (!bbstreamer_buffer_until(streamer, &data, &len,
- TAR_BLOCK_SIZE))
+ if (!astreamer_buffer_until(streamer, &data, &len,
+ TAR_BLOCK_SIZE))
return;
/*
@@ -139,32 +139,32 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
* thought was the next file header is actually the start of
* the archive trailer. Switch modes accordingly.
*/
- if (bbstreamer_tar_header(mystreamer))
+ if (astreamer_tar_header(mystreamer))
{
if (mystreamer->member.size == 0)
{
/* No content; trailer is zero-length. */
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- NULL, 0,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ ASTREAMER_MEMBER_TRAILER);
/* Expect next header. */
- mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->next_context = ASTREAMER_MEMBER_HEADER;
}
else
{
/* Expect contents. */
- mystreamer->next_context = BBSTREAMER_MEMBER_CONTENTS;
+ mystreamer->next_context = ASTREAMER_MEMBER_CONTENTS;
}
mystreamer->base.bbs_buffer.len = 0;
mystreamer->file_bytes_sent = 0;
}
else
- mystreamer->next_context = BBSTREAMER_ARCHIVE_TRAILER;
+ mystreamer->next_context = ASTREAMER_ARCHIVE_TRAILER;
break;
- case BBSTREAMER_MEMBER_CONTENTS:
+ case ASTREAMER_MEMBER_CONTENTS:
/*
* Send as much content as we have, but not more than the
@@ -174,10 +174,10 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
nbytes = mystreamer->member.size - mystreamer->file_bytes_sent;
nbytes = Min(nbytes, len);
Assert(nbytes > 0);
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- data, nbytes,
- BBSTREAMER_MEMBER_CONTENTS);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, nbytes,
+ ASTREAMER_MEMBER_CONTENTS);
mystreamer->file_bytes_sent += nbytes;
data += nbytes;
len -= nbytes;
@@ -193,53 +193,53 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
if (mystreamer->pad_bytes_expected == 0)
{
/* Trailer is zero-length. */
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- NULL, 0,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ ASTREAMER_MEMBER_TRAILER);
/* Expect next header. */
- mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->next_context = ASTREAMER_MEMBER_HEADER;
}
else
{
/* Trailer is not zero-length. */
- mystreamer->next_context = BBSTREAMER_MEMBER_TRAILER;
+ mystreamer->next_context = ASTREAMER_MEMBER_TRAILER;
}
mystreamer->base.bbs_buffer.len = 0;
}
break;
- case BBSTREAMER_MEMBER_TRAILER:
+ case ASTREAMER_MEMBER_TRAILER:
/*
* If we're expecting an archive member trailer, accumulate
* the expected number of padding bytes before sending
* anything onward.
*/
- if (!bbstreamer_buffer_until(streamer, &data, &len,
- mystreamer->pad_bytes_expected))
+ if (!astreamer_buffer_until(streamer, &data, &len,
+ mystreamer->pad_bytes_expected))
return;
/* OK, now we can send it. */
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- data, mystreamer->pad_bytes_expected,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, mystreamer->pad_bytes_expected,
+ ASTREAMER_MEMBER_TRAILER);
/* Expect next file header. */
- mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->next_context = ASTREAMER_MEMBER_HEADER;
mystreamer->base.bbs_buffer.len = 0;
break;
- case BBSTREAMER_ARCHIVE_TRAILER:
+ case ASTREAMER_ARCHIVE_TRAILER:
/*
* We've seen an end-of-archive indicator, so anything more is
* buffered and sent as part of the archive trailer. But we
* don't expect more than 2 blocks.
*/
- bbstreamer_buffer_bytes(streamer, &data, &len, len);
+ astreamer_buffer_bytes(streamer, &data, &len, len);
if (len > 2 * TAR_BLOCK_SIZE)
pg_fatal("tar file trailer exceeds 2 blocks");
return;
@@ -255,14 +255,14 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
* Parse a file header within a tar stream.
*
* The return value is true if we found a file header and passed it on to the
- * next bbstreamer; it is false if we have reached the archive trailer.
+ * next astreamer; it is false if we have reached the archive trailer.
*/
static bool
-bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
+astreamer_tar_header(astreamer_tar_parser *mystreamer)
{
bool has_nonzero_byte = false;
int i;
- bbstreamer_member *member = &mystreamer->member;
+ astreamer_member *member = &mystreamer->member;
char *buffer = mystreamer->base.bbs_buffer.data;
Assert(mystreamer->base.bbs_buffer.len == TAR_BLOCK_SIZE);
@@ -304,10 +304,10 @@ bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
/* Compute number of padding bytes. */
mystreamer->pad_bytes_expected = tarPaddingBytesRequired(member->size);
- /* Forward the entire header to the next bbstreamer. */
- bbstreamer_content(mystreamer->base.bbs_next, member,
- buffer, TAR_BLOCK_SIZE,
- BBSTREAMER_MEMBER_HEADER);
+ /* Forward the entire header to the next astreamer. */
+ astreamer_content(mystreamer->base.bbs_next, member,
+ buffer, TAR_BLOCK_SIZE,
+ ASTREAMER_MEMBER_HEADER);
return true;
}
@@ -316,50 +316,50 @@ bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
* End-of-stream processing for a tar parser.
*/
static void
-bbstreamer_tar_parser_finalize(bbstreamer *streamer)
+astreamer_tar_parser_finalize(astreamer *streamer)
{
- bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+ astreamer_tar_parser *mystreamer = (astreamer_tar_parser *) streamer;
- if (mystreamer->next_context != BBSTREAMER_ARCHIVE_TRAILER &&
- (mystreamer->next_context != BBSTREAMER_MEMBER_HEADER ||
+ if (mystreamer->next_context != ASTREAMER_ARCHIVE_TRAILER &&
+ (mystreamer->next_context != ASTREAMER_MEMBER_HEADER ||
mystreamer->base.bbs_buffer.len > 0))
pg_fatal("COPY stream ended before last file was finished");
/* Send the archive trailer, even if empty. */
- bbstreamer_content(streamer->bbs_next, NULL,
- streamer->bbs_buffer.data, streamer->bbs_buffer.len,
- BBSTREAMER_ARCHIVE_TRAILER);
+ astreamer_content(streamer->bbs_next, NULL,
+ streamer->bbs_buffer.data, streamer->bbs_buffer.len,
+ ASTREAMER_ARCHIVE_TRAILER);
/* Now finalize successor. */
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_finalize(streamer->bbs_next);
}
/*
* Free memory associated with a tar parser.
*/
static void
-bbstreamer_tar_parser_free(bbstreamer *streamer)
+astreamer_tar_parser_free(astreamer *streamer)
{
pfree(streamer->bbs_buffer.data);
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
}
/*
- * Create a bbstreamer that can generate a tar archive.
+ * Create an astreamer that can generate a tar archive.
*
* This is intended to be usable either for generating a brand-new tar archive
* or for modifying one on the fly. The input should be a series of typed
- * chunks (i.e. not BBSTREAMER_UNKNOWN). See also the comments for
- * bbstreamer_tar_parser_content.
+ * chunks (i.e. not ASTREAMER_UNKNOWN). See also the comments for
+ * astreamer_tar_parser_content.
*/
-bbstreamer *
-bbstreamer_tar_archiver_new(bbstreamer *next)
+astreamer *
+astreamer_tar_archiver_new(astreamer *next)
{
- bbstreamer_tar_archiver *streamer;
+ astreamer_tar_archiver *streamer;
- streamer = palloc0(sizeof(bbstreamer_tar_archiver));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_tar_archiver_ops;
+ streamer = palloc0(sizeof(astreamer_tar_archiver));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_tar_archiver_ops;
streamer->base.bbs_next = next;
return &streamer->base;
@@ -368,36 +368,36 @@ bbstreamer_tar_archiver_new(bbstreamer *next)
/*
* Fix up the stream of input chunks to create a valid tar file.
*
- * If a BBSTREAMER_MEMBER_HEADER chunk is of size 0, it is replaced with a
+ * If an ASTREAMER_MEMBER_HEADER chunk is of size 0, it is replaced with a
* newly-constructed tar header. If it is of size TAR_BLOCK_SIZE, it is
* passed through without change. Any other size is a fatal error (and
* indicates a bug).
*
- * Whenever a new BBSTREAMER_MEMBER_HEADER chunk is constructed, the
- * corresponding BBSTREAMER_MEMBER_TRAILER chunk is also constructed from
+ * Whenever a new ASTREAMER_MEMBER_HEADER chunk is constructed, the
+ * corresponding ASTREAMER_MEMBER_TRAILER chunk is also constructed from
* scratch. Specifically, we construct a block of zero bytes sufficient to
* pad out to a block boundary, as required by the tar format. Other
- * BBSTREAMER_MEMBER_TRAILER chunks are passed through without change.
+ * ASTREAMER_MEMBER_TRAILER chunks are passed through without change.
*
- * Any BBSTREAMER_MEMBER_CONTENTS chunks are passed through without change.
+ * Any ASTREAMER_MEMBER_CONTENTS chunks are passed through without change.
*
- * The BBSTREAMER_ARCHIVE_TRAILER chunk is replaced with two
+ * The ASTREAMER_ARCHIVE_TRAILER chunk is replaced with two
* blocks of zero bytes. Not all tar programs require this, but apparently
* some do. The server does not supply this trailer. If no archive trailer is
- * present, one will be added by bbstreamer_tar_parser_finalize.
+ * present, one will be added by astreamer_tar_parser_finalize.
*/
static void
-bbstreamer_tar_archiver_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_tar_archiver_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_tar_archiver *mystreamer = (bbstreamer_tar_archiver *) streamer;
+ astreamer_tar_archiver *mystreamer = (astreamer_tar_archiver *) streamer;
char buffer[2 * TAR_BLOCK_SIZE];
- Assert(context != BBSTREAMER_UNKNOWN);
+ Assert(context != ASTREAMER_UNKNOWN);
- if (context == BBSTREAMER_MEMBER_HEADER && len != TAR_BLOCK_SIZE)
+ if (context == ASTREAMER_MEMBER_HEADER && len != TAR_BLOCK_SIZE)
{
Assert(len == 0);
@@ -411,7 +411,7 @@ bbstreamer_tar_archiver_content(bbstreamer *streamer,
/* Also make a note to replace padding, in case size changed. */
mystreamer->rearchive_member = true;
}
- else if (context == BBSTREAMER_MEMBER_TRAILER &&
+ else if (context == ASTREAMER_MEMBER_TRAILER &&
mystreamer->rearchive_member)
{
int pad_bytes = tarPaddingBytesRequired(member->size);
@@ -424,7 +424,7 @@ bbstreamer_tar_archiver_content(bbstreamer *streamer,
/* Don't do this again unless we replace another header. */
mystreamer->rearchive_member = false;
}
- else if (context == BBSTREAMER_ARCHIVE_TRAILER)
+ else if (context == ASTREAMER_ARCHIVE_TRAILER)
{
/* Trailer should always be two blocks of zero bytes. */
memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
@@ -432,40 +432,40 @@ bbstreamer_tar_archiver_content(bbstreamer *streamer,
len = 2 * TAR_BLOCK_SIZE;
}
- bbstreamer_content(streamer->bbs_next, member, data, len, context);
+ astreamer_content(streamer->bbs_next, member, data, len, context);
}
/*
* End-of-stream processing for a tar archiver.
*/
static void
-bbstreamer_tar_archiver_finalize(bbstreamer *streamer)
+astreamer_tar_archiver_finalize(astreamer *streamer)
{
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_finalize(streamer->bbs_next);
}
/*
* Free memory associated with a tar archiver.
*/
static void
-bbstreamer_tar_archiver_free(bbstreamer *streamer)
+astreamer_tar_archiver_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer);
}
/*
- * Create a bbstreamer that blindly adds two blocks of NUL bytes to the
+ * Create an astreamer that blindly adds two blocks of NUL bytes to the
* end of an incomplete tarfile that the server might send us.
*/
-bbstreamer *
-bbstreamer_tar_terminator_new(bbstreamer *next)
+astreamer *
+astreamer_tar_terminator_new(astreamer *next)
{
- bbstreamer *streamer;
+ astreamer *streamer;
- streamer = palloc0(sizeof(bbstreamer));
- *((const bbstreamer_ops **) &streamer->bbs_ops) =
- &bbstreamer_tar_terminator_ops;
+ streamer = palloc0(sizeof(astreamer));
+ *((const astreamer_ops **) &streamer->bbs_ops) =
+ &astreamer_tar_terminator_ops;
streamer->bbs_next = next;
return streamer;
@@ -475,17 +475,17 @@ bbstreamer_tar_terminator_new(bbstreamer *next)
* Pass all the content through without change.
*/
static void
-bbstreamer_tar_terminator_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_tar_terminator_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
/* Expect unparsed input. */
Assert(member == NULL);
- Assert(context == BBSTREAMER_UNKNOWN);
+ Assert(context == ASTREAMER_UNKNOWN);
/* Just forward it. */
- bbstreamer_content(streamer->bbs_next, member, data, len, context);
+ astreamer_content(streamer->bbs_next, member, data, len, context);
}
/*
@@ -493,22 +493,22 @@ bbstreamer_tar_terminator_content(bbstreamer *streamer,
* to supply.
*/
static void
-bbstreamer_tar_terminator_finalize(bbstreamer *streamer)
+astreamer_tar_terminator_finalize(astreamer *streamer)
{
char buffer[2 * TAR_BLOCK_SIZE];
memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
- bbstreamer_content(streamer->bbs_next, NULL, buffer,
- 2 * TAR_BLOCK_SIZE, BBSTREAMER_UNKNOWN);
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_content(streamer->bbs_next, NULL, buffer,
+ 2 * TAR_BLOCK_SIZE, ASTREAMER_UNKNOWN);
+ astreamer_finalize(streamer->bbs_next);
}
/*
* Free memory associated with a tar terminator.
*/
static void
-bbstreamer_tar_terminator_free(bbstreamer *streamer)
+astreamer_tar_terminator_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/astreamer_zstd.c
similarity index 64%
rename from src/bin/pg_basebackup/bbstreamer_zstd.c
rename to src/bin/pg_basebackup/astreamer_zstd.c
index 20f11d4450e..58dc679ef99 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/astreamer_zstd.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_zstd.c
+ * astreamer_zstd.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_zstd.c
+ * src/bin/pg_basebackup/astreamer_zstd.c
*-------------------------------------------------------------------------
*/
@@ -17,44 +17,44 @@
#include <zstd.h>
#endif
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/logging.h"
#ifdef USE_ZSTD
-typedef struct bbstreamer_zstd_frame
+typedef struct astreamer_zstd_frame
{
- bbstreamer base;
+ astreamer base;
ZSTD_CCtx *cctx;
ZSTD_DCtx *dctx;
ZSTD_outBuffer zstd_outBuf;
-} bbstreamer_zstd_frame;
+} astreamer_zstd_frame;
-static void bbstreamer_zstd_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_zstd_compressor_finalize(bbstreamer *streamer);
-static void bbstreamer_zstd_compressor_free(bbstreamer *streamer);
+static void astreamer_zstd_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_zstd_compressor_finalize(astreamer *streamer);
+static void astreamer_zstd_compressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_zstd_compressor_ops = {
- .content = bbstreamer_zstd_compressor_content,
- .finalize = bbstreamer_zstd_compressor_finalize,
- .free = bbstreamer_zstd_compressor_free
+static const astreamer_ops astreamer_zstd_compressor_ops = {
+ .content = astreamer_zstd_compressor_content,
+ .finalize = astreamer_zstd_compressor_finalize,
+ .free = astreamer_zstd_compressor_free
};
-static void bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_zstd_decompressor_finalize(bbstreamer *streamer);
-static void bbstreamer_zstd_decompressor_free(bbstreamer *streamer);
+static void astreamer_zstd_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_zstd_decompressor_finalize(astreamer *streamer);
+static void astreamer_zstd_decompressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
- .content = bbstreamer_zstd_decompressor_content,
- .finalize = bbstreamer_zstd_decompressor_finalize,
- .free = bbstreamer_zstd_decompressor_free
+static const astreamer_ops astreamer_zstd_decompressor_ops = {
+ .content = astreamer_zstd_decompressor_content,
+ .finalize = astreamer_zstd_decompressor_finalize,
+ .free = astreamer_zstd_decompressor_free
};
#endif
@@ -62,19 +62,19 @@ static const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
* Create a new base backup streamer that performs zstd compression of tar
* blocks.
*/
-bbstreamer *
-bbstreamer_zstd_compressor_new(bbstreamer *next, pg_compress_specification *compress)
+astreamer *
+astreamer_zstd_compressor_new(astreamer *next, pg_compress_specification *compress)
{
#ifdef USE_ZSTD
- bbstreamer_zstd_frame *streamer;
+ astreamer_zstd_frame *streamer;
size_t ret;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_zstd_frame));
+ streamer = palloc0(sizeof(astreamer_zstd_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_zstd_compressor_ops;
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_zstd_compressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -142,12 +142,12 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, pg_compress_specification *comp
* of output buffer to next streamer and empty the buffer.
*/
static void
-bbstreamer_zstd_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_zstd_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
ZSTD_inBuffer inBuf = {data, len, 0};
while (inBuf.pos < inBuf.size)
@@ -162,10 +162,10 @@ bbstreamer_zstd_compressor_content(bbstreamer *streamer,
if (mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos <
max_needed)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ context);
/* Reset the ZSTD output buffer. */
mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
@@ -187,9 +187,9 @@ bbstreamer_zstd_compressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
+astreamer_zstd_compressor_finalize(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
size_t yet_to_flush;
do
@@ -204,10 +204,10 @@ bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
if (mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos <
max_needed)
{
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ ASTREAMER_UNKNOWN);
/* Reset the ZSTD output buffer. */
mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
@@ -227,23 +227,23 @@ bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
/* Make sure to pass any remaining bytes to the next streamer. */
if (mystreamer->zstd_outBuf.pos > 0)
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_zstd_compressor_free(bbstreamer *streamer)
+astreamer_zstd_compressor_free(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
ZSTD_freeCCtx(mystreamer->cctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
@@ -254,17 +254,17 @@ bbstreamer_zstd_compressor_free(bbstreamer *streamer)
* Create a new base backup streamer that performs decompression of zstd
* compressed blocks.
*/
-bbstreamer *
-bbstreamer_zstd_decompressor_new(bbstreamer *next)
+astreamer *
+astreamer_zstd_decompressor_new(astreamer *next)
{
#ifdef USE_ZSTD
- bbstreamer_zstd_frame *streamer;
+ astreamer_zstd_frame *streamer;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_zstd_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_zstd_decompressor_ops;
+ streamer = palloc0(sizeof(astreamer_zstd_frame));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_zstd_decompressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -293,12 +293,12 @@ bbstreamer_zstd_decompressor_new(bbstreamer *next)
* to the next streamer.
*/
static void
-bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_zstd_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
ZSTD_inBuffer inBuf = {data, len, 0};
while (inBuf.pos < inBuf.size)
@@ -311,10 +311,10 @@ bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
*/
if (mystreamer->zstd_outBuf.pos >= mystreamer->zstd_outBuf.size)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ context);
/* Reset the ZSTD output buffer. */
mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
@@ -335,32 +335,32 @@ bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_zstd_decompressor_finalize(bbstreamer *streamer)
+astreamer_zstd_decompressor_finalize(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
/*
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
if (mystreamer->zstd_outBuf.pos > 0)
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_zstd_decompressor_free(bbstreamer *streamer)
+astreamer_zstd_decompressor_free(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
ZSTD_freeDCtx(mystreamer->dctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
deleted file mode 100644
index 3b820f13b51..00000000000
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ /dev/null
@@ -1,226 +0,0 @@
-/*-------------------------------------------------------------------------
- *
- * bbstreamer.h
- *
- * Each tar archive returned by the server is passed to one or more
- * bbstreamer objects for further processing. The bbstreamer may do
- * something simple, like write the archive to a file, perhaps after
- * compressing it, but it can also do more complicated things, like
- * annotating the byte stream to indicate which parts of the data
- * correspond to tar headers or trailing padding, vs. which parts are
- * payload data. A subsequent bbstreamer may use this information to
- * make further decisions about how to process the data; for example,
- * it might choose to modify the archive contents.
- *
- * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
- *
- * IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer.h
- *-------------------------------------------------------------------------
- */
-
-#ifndef BBSTREAMER_H
-#define BBSTREAMER_H
-
-#include "common/compression.h"
-#include "lib/stringinfo.h"
-#include "pqexpbuffer.h"
-
-struct bbstreamer;
-struct bbstreamer_ops;
-typedef struct bbstreamer bbstreamer;
-typedef struct bbstreamer_ops bbstreamer_ops;
-
-/*
- * Each chunk of archive data passed to a bbstreamer is classified into one
- * of these categories. When data is first received from the remote server,
- * each chunk will be categorized as BBSTREAMER_UNKNOWN, and the chunks will
- * be of whatever size the remote server chose to send.
- *
- * If the archive is parsed (e.g. see bbstreamer_tar_parser_new()), then all
- * chunks should be labelled as one of the other types listed here. In
- * addition, there should be exactly one BBSTREAMER_MEMBER_HEADER chunk and
- * exactly one BBSTREAMER_MEMBER_TRAILER chunk per archive member, even if
- * that means a zero-length call. There can be any number of
- * BBSTREAMER_MEMBER_CONTENTS chunks in between those calls. There
- * should be exactly one BBSTREAMER_ARCHIVE_TRAILER chunk, and it should follow the
- * last BBSTREAMER_MEMBER_TRAILER chunk.
- *
- * In theory, we could need other classifications here, such as a way of
- * indicating an archive header, but the "tar" format doesn't need anything
- * else, so for the time being there's no point.
- */
-typedef enum
-{
- BBSTREAMER_UNKNOWN,
- BBSTREAMER_MEMBER_HEADER,
- BBSTREAMER_MEMBER_CONTENTS,
- BBSTREAMER_MEMBER_TRAILER,
- BBSTREAMER_ARCHIVE_TRAILER,
-} bbstreamer_archive_context;
-
-/*
- * Each chunk of data that is classified as BBSTREAMER_MEMBER_HEADER,
- * BBSTREAMER_MEMBER_CONTENTS, or BBSTREAMER_MEMBER_TRAILER should also
- * pass a pointer to an instance of this struct. The details are expected
- * to be present in the archive header and used to fill the struct, after
- * which all subsequent calls for the same archive member are expected to
- * pass the same details.
- */
-typedef struct
-{
- char pathname[MAXPGPATH];
- pgoff_t size;
- mode_t mode;
- uid_t uid;
- gid_t gid;
- bool is_directory;
- bool is_link;
- char linktarget[MAXPGPATH];
-} bbstreamer_member;
-
-/*
- * Generally, each type of bbstreamer will define its own struct, but the
- * first element should be 'bbstreamer base'. A bbstreamer that does not
- * require any additional private data could use this structure directly.
- *
- * bbs_ops is a pointer to the bbstreamer_ops object which contains the
- * function pointers appropriate to this type of bbstreamer.
- *
- * bbs_next is a pointer to the successor bbstreamer, for those types of
- * bbstreamer which forward data to a successor. It need not be used and
- * should be set to NULL when not relevant.
- *
- * bbs_buffer is a buffer for accumulating data for temporary storage. Each
- * type of bbstreamer makes its own decisions about whether and how to use
- * this buffer.
- */
-struct bbstreamer
-{
- const bbstreamer_ops *bbs_ops;
- bbstreamer *bbs_next;
- StringInfoData bbs_buffer;
-};
-
-/*
- * There are three callbacks for a bbstreamer. The 'content' callback is
- * called repeatedly, as described in the bbstreamer_archive_context comments.
- * Then, the 'finalize' callback is called once at the end, to give the
- * bbstreamer a chance to perform cleanup such as closing files. Finally,
- * because this code is running in a frontend environment where, as of this
- * writing, there are no memory contexts, the 'free' callback is called to
- * release memory. These callbacks should always be invoked using the static
- * inline functions defined below.
- */
-struct bbstreamer_ops
-{
- void (*content) (bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
- void (*finalize) (bbstreamer *streamer);
- void (*free) (bbstreamer *streamer);
-};
-
-/* Send some content to a bbstreamer. */
-static inline void
-bbstreamer_content(bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
-{
- Assert(streamer != NULL);
- streamer->bbs_ops->content(streamer, member, data, len, context);
-}
-
-/* Finalize a bbstreamer. */
-static inline void
-bbstreamer_finalize(bbstreamer *streamer)
-{
- Assert(streamer != NULL);
- streamer->bbs_ops->finalize(streamer);
-}
-
-/* Free a bbstreamer. */
-static inline void
-bbstreamer_free(bbstreamer *streamer)
-{
- Assert(streamer != NULL);
- streamer->bbs_ops->free(streamer);
-}
-
-/*
- * This is a convenience method for use when implementing a bbstreamer; it is
- * not for use by outside callers. It adds the amount of data specified by
- * 'nbytes' to the bbstreamer's buffer and adjusts '*len' and '*data'
- * accordingly.
- */
-static inline void
-bbstreamer_buffer_bytes(bbstreamer *streamer, const char **data, int *len,
- int nbytes)
-{
- Assert(nbytes <= *len);
-
- appendBinaryStringInfo(&streamer->bbs_buffer, *data, nbytes);
- *len -= nbytes;
- *data += nbytes;
-}
-
-/*
- * This is a convenience method for use when implementing a bbstreamer; it is
- * not for use by outside callers. It attempts to add enough data to the
- * bbstreamer's buffer to reach a length of target_bytes and adjusts '*len'
- * and '*data' accordingly. It returns true if the target length has been
- * reached and false otherwise.
- */
-static inline bool
-bbstreamer_buffer_until(bbstreamer *streamer, const char **data, int *len,
- int target_bytes)
-{
- int buflen = streamer->bbs_buffer.len;
-
- if (buflen >= target_bytes)
- {
- /* Target length already reached; nothing to do. */
- return true;
- }
-
- if (buflen + *len < target_bytes)
- {
- /* Not enough data to reach target length; buffer all of it. */
- bbstreamer_buffer_bytes(streamer, data, len, *len);
- return false;
- }
-
- /* Buffer just enough to reach the target length. */
- bbstreamer_buffer_bytes(streamer, data, len, target_bytes - buflen);
- return true;
-}
-
-/*
- * Functions for creating bbstreamer objects of various types. See the header
- * comments for each of these functions for details.
- */
-extern bbstreamer *bbstreamer_plain_writer_new(char *pathname, FILE *file);
-extern bbstreamer *bbstreamer_gzip_writer_new(char *pathname, FILE *file,
- pg_compress_specification *compress);
-extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
- const char *(*link_map) (const char *),
- void (*report_output_file) (const char *));
-
-extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
- pg_compress_specification *compress);
-extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_zstd_compressor_new(bbstreamer *next,
- pg_compress_specification *compress);
-extern bbstreamer *bbstreamer_zstd_decompressor_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
-
-extern bbstreamer *bbstreamer_recovery_injector_new(bbstreamer *next,
- bool is_recovery_guc_supported,
- PQExpBuffer recoveryconfcontents);
-extern void bbstreamer_inject_file(bbstreamer *streamer, char *pathname,
- char *data, int len);
-
-#endif
diff --git a/src/bin/pg_basebackup/meson.build b/src/bin/pg_basebackup/meson.build
index c00acd5e118..a68dbd7837d 100644
--- a/src/bin/pg_basebackup/meson.build
+++ b/src/bin/pg_basebackup/meson.build
@@ -1,12 +1,12 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
common_sources = files(
- 'bbstreamer_file.c',
- 'bbstreamer_gzip.c',
- 'bbstreamer_inject.c',
- 'bbstreamer_lz4.c',
- 'bbstreamer_tar.c',
- 'bbstreamer_zstd.c',
+ 'astreamer_file.c',
+ 'astreamer_gzip.c',
+ 'astreamer_inject.c',
+ 'astreamer_lz4.c',
+ 'astreamer_tar.c',
+ 'astreamer_zstd.c',
'receivelog.c',
'streamutil.c',
'walmethods.c',
diff --git a/src/bin/pg_basebackup/nls.mk b/src/bin/pg_basebackup/nls.mk
index 384dbb021e9..950b9797b1e 100644
--- a/src/bin/pg_basebackup/nls.mk
+++ b/src/bin/pg_basebackup/nls.mk
@@ -1,12 +1,12 @@
# src/bin/pg_basebackup/nls.mk
CATALOG_NAME = pg_basebackup
GETTEXT_FILES = $(FRONTEND_COMMON_GETTEXT_FILES) \
- bbstreamer_file.c \
- bbstreamer_gzip.c \
- bbstreamer_inject.c \
- bbstreamer_lz4.c \
- bbstreamer_tar.c \
- bbstreamer_zstd.c \
+ astreamer_file.c \
+ astreamer_gzip.c \
+ astreamer_inject.c \
+ astreamer_lz4.c \
+ astreamer_tar.c \
+ astreamer_zstd.c \
pg_basebackup.c \
pg_createsubscriber.c \
pg_receivewal.c \
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 8f3dd04fd22..4179b064cbc 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -26,8 +26,8 @@
#endif
#include "access/xlog_internal.h"
+#include "astreamer.h"
#include "backup/basebackup.h"
-#include "bbstreamer.h"
#include "common/compression.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
@@ -57,8 +57,8 @@ typedef struct ArchiveStreamState
{
int tablespacenum;
pg_compress_specification *compress;
- bbstreamer *streamer;
- bbstreamer *manifest_inject_streamer;
+ astreamer *streamer;
+ astreamer *manifest_inject_streamer;
PQExpBuffer manifest_buffer;
char manifest_filename[MAXPGPATH];
FILE *manifest_file;
@@ -67,7 +67,7 @@ typedef struct ArchiveStreamState
typedef struct WriteTarState
{
int tablespacenum;
- bbstreamer *streamer;
+ astreamer *streamer;
} WriteTarState;
typedef struct WriteManifestState
@@ -199,8 +199,8 @@ static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *fo
static void progress_update_filename(const char *filename);
static void progress_report(int tablespacenum, bool force, bool finished);
-static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
- bbstreamer **manifest_inject_streamer_p,
+static astreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
+ astreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
bool expect_unterminated_tarfile,
pg_compress_specification *compress);
@@ -1053,19 +1053,19 @@ ReceiveCopyData(PGconn *conn, WriteDataCallback callback,
* the options selected by the user. We may just write the results directly
* to a file, or we might compress first, or we might extract the tar file
* and write each member separately. This function doesn't do any of that
- * directly, but it works out what kind of bbstreamer we need to create so
+ * directly, but it works out what kind of astreamer we need to create so
* that the right stuff happens when, down the road, we actually receive
* the data.
*/
-static bbstreamer *
+static astreamer *
CreateBackupStreamer(char *archive_name, char *spclocation,
- bbstreamer **manifest_inject_streamer_p,
+ astreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
bool expect_unterminated_tarfile,
pg_compress_specification *compress)
{
- bbstreamer *streamer = NULL;
- bbstreamer *manifest_inject_streamer = NULL;
+ astreamer *streamer = NULL;
+ astreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool is_tar,
is_tar_gz,
@@ -1160,7 +1160,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
directory = psprintf("%s/%s", basedir, spclocation);
else
directory = get_tablespace_mapping(spclocation);
- streamer = bbstreamer_extractor_new(directory,
+ streamer = astreamer_extractor_new(directory,
get_tablespace_mapping,
progress_update_filename);
}
@@ -1188,27 +1188,27 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
if (compress->algorithm == PG_COMPRESSION_NONE)
- streamer = bbstreamer_plain_writer_new(archive_filename,
+ streamer = astreamer_plain_writer_new(archive_filename,
archive_file);
else if (compress->algorithm == PG_COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
- streamer = bbstreamer_gzip_writer_new(archive_filename,
+ streamer = astreamer_gzip_writer_new(archive_filename,
archive_file, compress);
}
else if (compress->algorithm == PG_COMPRESSION_LZ4)
{
strlcat(archive_filename, ".lz4", sizeof(archive_filename));
- streamer = bbstreamer_plain_writer_new(archive_filename,
+ streamer = astreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_lz4_compressor_new(streamer, compress);
+ streamer = astreamer_lz4_compressor_new(streamer, compress);
}
else if (compress->algorithm == PG_COMPRESSION_ZSTD)
{
strlcat(archive_filename, ".zst", sizeof(archive_filename));
- streamer = bbstreamer_plain_writer_new(archive_filename,
+ streamer = astreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_zstd_compressor_new(streamer, compress);
+ streamer = astreamer_zstd_compressor_new(streamer, compress);
}
else
{
@@ -1222,7 +1222,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* into it.
*/
if (must_parse_archive)
- streamer = bbstreamer_tar_archiver_new(streamer);
+ streamer = astreamer_tar_archiver_new(streamer);
progress_update_filename(archive_filename);
}
@@ -1241,7 +1241,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
if (spclocation == NULL && writerecoveryconf)
{
Assert(must_parse_archive);
- streamer = bbstreamer_recovery_injector_new(streamer,
+ streamer = astreamer_recovery_injector_new(streamer,
is_recovery_guc_supported,
recoveryconfcontents);
}
@@ -1253,9 +1253,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* we're talking to such a server we'll need to add the terminator here.
*/
if (must_parse_archive)
- streamer = bbstreamer_tar_parser_new(streamer);
+ streamer = astreamer_tar_parser_new(streamer);
else if (expect_unterminated_tarfile)
- streamer = bbstreamer_tar_terminator_new(streamer);
+ streamer = astreamer_tar_terminator_new(streamer);
/*
* If the user has requested a server compressed archive along with
@@ -1264,11 +1264,11 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
if (format == 'p')
{
if (is_tar_gz)
- streamer = bbstreamer_gzip_decompressor_new(streamer);
+ streamer = astreamer_gzip_decompressor_new(streamer);
else if (is_tar_lz4)
- streamer = bbstreamer_lz4_decompressor_new(streamer);
+ streamer = astreamer_lz4_decompressor_new(streamer);
else if (is_tar_zstd)
- streamer = bbstreamer_zstd_decompressor_new(streamer);
+ streamer = astreamer_zstd_decompressor_new(streamer);
}
/* Return the results. */
@@ -1307,7 +1307,7 @@ ReceiveArchiveStream(PGconn *conn, pg_compress_specification *compress)
if (state.manifest_inject_streamer != NULL &&
state.manifest_buffer != NULL)
{
- bbstreamer_inject_file(state.manifest_inject_streamer,
+ astreamer_inject_file(state.manifest_inject_streamer,
"backup_manifest",
state.manifest_buffer->data,
state.manifest_buffer->len);
@@ -1318,8 +1318,8 @@ ReceiveArchiveStream(PGconn *conn, pg_compress_specification *compress)
/* If there's still an archive in progress, end processing. */
if (state.streamer != NULL)
{
- bbstreamer_finalize(state.streamer);
- bbstreamer_free(state.streamer);
+ astreamer_finalize(state.streamer);
+ astreamer_free(state.streamer);
state.streamer = NULL;
}
}
@@ -1383,8 +1383,8 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
/* End processing of any prior archive. */
if (state->streamer != NULL)
{
- bbstreamer_finalize(state->streamer);
- bbstreamer_free(state->streamer);
+ astreamer_finalize(state->streamer);
+ astreamer_free(state->streamer);
state->streamer = NULL;
}
@@ -1437,8 +1437,8 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
else if (state->streamer != NULL)
{
/* Archive data. */
- bbstreamer_content(state->streamer, NULL, copybuf + 1,
- r - 1, BBSTREAMER_UNKNOWN);
+ astreamer_content(state->streamer, NULL, copybuf + 1,
+ r - 1, ASTREAMER_UNKNOWN);
}
else
pg_fatal("unexpected payload data");
@@ -1600,7 +1600,7 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
bool tablespacenum, pg_compress_specification *compress)
{
WriteTarState state;
- bbstreamer *manifest_inject_streamer;
+ astreamer *manifest_inject_streamer;
bool is_recovery_guc_supported;
bool expect_unterminated_tarfile;
@@ -1636,7 +1636,7 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
pg_fatal("out of memory");
/* Inject it into the output tarfile. */
- bbstreamer_inject_file(manifest_inject_streamer, "backup_manifest",
+ astreamer_inject_file(manifest_inject_streamer, "backup_manifest",
buf.data, buf.len);
/* Free memory. */
@@ -1644,8 +1644,8 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
}
/* Cleanup. */
- bbstreamer_finalize(state.streamer);
- bbstreamer_free(state.streamer);
+ astreamer_finalize(state.streamer);
+ astreamer_free(state.streamer);
progress_report(tablespacenum, true, false);
@@ -1663,7 +1663,7 @@ ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
{
WriteTarState *state = callback_data;
- bbstreamer_content(state->streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);
+ astreamer_content(state->streamer, NULL, copybuf, r, ASTREAMER_UNKNOWN);
totaldone += r;
progress_report(state->tablespacenum, false, false);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b4d7f9217ce..b982dffa5fc 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3309,19 +3309,19 @@ bbsink_shell
bbsink_state
bbsink_throttle
bbsink_zstd
-bbstreamer
-bbstreamer_archive_context
-bbstreamer_extractor
-bbstreamer_gzip_decompressor
-bbstreamer_gzip_writer
-bbstreamer_lz4_frame
-bbstreamer_member
-bbstreamer_ops
-bbstreamer_plain_writer
-bbstreamer_recovery_injector
-bbstreamer_tar_archiver
-bbstreamer_tar_parser
-bbstreamer_zstd_frame
+astreamer
+astreamer_archive_context
+astreamer_extractor
+astreamer_gzip_decompressor
+astreamer_gzip_writer
+astreamer_lz4_frame
+astreamer_member
+astreamer_ops
+astreamer_plain_writer
+astreamer_recovery_injector
+astreamer_tar_archiver
+astreamer_tar_parser
+astreamer_zstd_frame
bgworker_main_type
bh_node_type
binaryheap
--
2.18.0
On Mon, Jul 22, 2024 at 7:53 AM Amul Sul <sulamul@gmail.com> wrote:
Fix in the attached version.
First of all, in the interest of full disclosure, I suggested this
project to Amul, so I'm +1 on the concept. I think making more of our
backup-related tools work with tar and compressed tar formats -- and
perhaps eventually data not stored locally -- will make them a lot
more usable. If, for example, you take a full backup and an
incremental backup, each in tar format, store them in the cloud
someplace, and then want to verify and afterwards restore the
incremental backup, you would need to download the tar files from the
cloud, then extract all the tar files, then run pg_verifybackup and
pg_combinebackup over the results. With this patch set, and similar
work for pg_combinebackup, you could skip the step where you need to
extract the tar files, saving significant amounts of time and disk
space. If the tools also had the ability to access data remotely, you
could save even more, but that's a much harder project, so it makes
sense to me to start with this.
Second, I think this patch set is quite well-organized and easy to
read. That's not to say there is nothing in these patches to which
someone might object, but it seems to me that it should at least be
simple for anyone who wants to review to find the things to which they
object in the patch set without having to spend too much time on it,
which is fantastic.
Third, I think the general approach that these patches take to the
problem - namely, renaming bbstreamer to astreamer and moving it
somewhere that permits it to be reused - makes a lot of sense. To be
honest, I don't think I had it in mind that bbstreamer would be a
reusable component when I wrote it, or if I did have it in mind, it
was off in some dusty corner of my mind that doesn't get visited very
often. I was imagining that you would need to build new infrastructure
to deal with reading the tar file, but I think what you've done here
is better. Reusing the bbstreamer stuff gives you tar file parsing,
and decompression if necessary, basically for free, and IMHO the
result looks rather elegant.
However, I'm not very convinced by 0003. The handling between the
meson and make-based build systems doesn't seem consistent. On the
meson side, you just add the objects to the same list that contains
all of the other files (but not in alphabetical order, which should be
fixed). But on the make side, you for some reason invent a separate
AOBJS list instead of just adding the files to OBJS. I don't think it
should be necessary to treat these objects any differently from any
other objects, so they should be able to just go in OBJS: but if it
were necessary, then I feel like the meson side would need something
similar.
Also, I'm not so sure about this change to src/fe_utils/meson.build:
- dependencies: frontend_common_code,
+ dependencies: [frontend_common_code, lz4, zlib, zstd],
frontend_common_code already includes dependencies on zlib and zstd,
so we probably don't need to add those again here. I checked the
result of otool -L src/bin/pg_controldata/pg_controldata from the
meson build directory, and I find that currently it links against libz
and libzstd but not liblz4. However, if I either make this line say
dependencies: [frontend_common_code, lz4] or if I just update
frontend_common_code to include lz4, then it starts linking against
liblz4 as well. I'm not entirely sure if there's any reason to do one
or the other of those things, but I think I'd be inclined to make
frontend_common_code just include lz4 since it already includes zlib
and zstd anyway, and then you don't need this change.
Alternatively, we could take the position that random front-end tools
that don't do anything with compression, like pg_controldata for
example, shouldn't link with compression libraries at
all. But then we'd need to rethink a bunch of things that have not
much to do with this patch.
Regarding 0004, I would rather not move show_progress and
skip_checksums to the new header file. I suppose I was a bit lazy in
making these file-level global variables instead of passing them down
using function arguments and/or a context object, but at least right
now they're only global within a single file. Can we think of
inserting a preparatory patch that moves these into verifier_context?
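Just to illustrate what I have in mind -- a sketch only, with the existing
members elided:

typedef struct verifier_context
{
	/* ... existing members (manifest, backup_directory, and so on) ... */
	bool		show_progress;	/* moved here from the file-level global */
	bool		skip_checksums; /* moved here from the file-level global */
} verifier_context;

Call sites such as progress_report() would then test context->show_progress
instead of consulting a global.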
Regarding 0005, the comment /* Check whether there's an entry in the
manifest hash. */ should move inside verify_manifest_entry, where
manifest_files_lookup is called. The call to the new function
verify_manifest_entry() needs its own, proper comment. Also, I think
there's a null-pointer deference hazard here, because
verify_manifest_entry() can return NULL but the "Validate the manifest
system identifier" chunk assumes it isn't. I think you could hit this
- and presumably seg fault - if pg_control is on disk but not in the
manifest. Seems like just adding an m != NULL test is easiest, but
see also below comments about 0006.
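Roughly like this, I mean (sketch only; the pg_control validation itself is
left as a placeholder since its helper comes from the later patches):

	m = verify_manifest_entry(context, relpath, filesize);

	/*
	 * Validate the manifest system identifier against pg_control, but only
	 * if there actually was a manifest entry; when there wasn't,
	 * verify_manifest_entry() has already reported the problem and
	 * returned NULL.
	 */
	if (m != NULL && strcmp(relpath, "global/pg_control") == 0)
	{
		/* ... existing system identifier validation ... */
	}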
Regarding 0006, suppose that the member file within the tar archive is
longer than expected. With the unpatched code, we'll feed all of the
data to the checksum context, but then, after the read-loop
terminates, we'll complain about the file being the wrong length. With
the patched code, we'll complain about the checksum mismatch before
returning from verify_content_checksum(). I think that's an unintended
behavior change, and I think the new behavior is worse than the old
behavior. But also, I think that in the case of a tar file, the
desired behavior is quite different. In that case, we know the length
of the file from the member header, so we can check whether the length
is as expected before we read any of the data bytes. If we discover
that the size is wrong, we can complain about that and need not feed
the checksum bytes to the checksum context at all -- we can just skip
them, which will be faster. That approach doesn't really make sense
for a file, because even if we were to stat() the file before we
started reading it, the length could theoretically change as we are
reading it, if someone is concurrently modifying it, but for a tar
file I think it does.
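To sketch what I mean for the archive case (astreamer_member and its size
field exist already; the verify-streamer field names here are invented):

		case ASTREAMER_MEMBER_HEADER:
			/* Look up the manifest entry as soon as the header arrives. */
			mystreamer->mfile = verify_manifest_entry(mystreamer->context,
													  member->pathname,
													  member->size);

			/*
			 * The tar header already tells us the member's size, so if it
			 * disagrees with the manifest, verify_manifest_entry() will have
			 * complained and we can skip feeding the data bytes to the
			 * checksum context entirely.
			 */
			mystreamer->verify_checksum =
				(mystreamer->mfile != NULL && !mystreamer->mfile->bad);
			break;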
I would suggest abandoning this refactoring. There's very little logic
in verify_file_checksum() that you can actually reuse. I think you
should just duplicate the code. If you really want, you could arrange
to reuse the error-reporting code that checks for checksumlen !=
m->checksum_length and memcmp(checksumbuf, m->checksum_payload,
checksumlen) != 0, but even that I think is little enough that it's
fine to just duplicate it. The rest is either (1) OS calls like
open(), read(), etc. which won't be applicable to the
read-from-archive case or (2) calls to pg_checksum_WHATEVER, which are
fine to just duplicate, IMHO.
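If anything is worth sharing, it's only that final comparison, which could
be a tiny helper of roughly this shape (the name is made up):

static void
report_checksum_result(verifier_context *context, manifest_file *m,
					   uint8 *checksumbuf, int checksumlen)
{
	if (checksumlen != m->checksum_length)
		report_backup_error(context,
							"file \"%s\" has checksum of length %d, but expected %d",
							m->pathname, m->checksum_length, checksumlen);
	else if (memcmp(checksumbuf, m->checksum_payload, checksumlen) != 0)
		report_backup_error(context,
							"checksum mismatch for file \"%s\"",
							m->pathname);
}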
My eyes are starting to glaze over a bit here so expect comments below
this point to be only a partial review of the corresponding patch.
Regarding 0007, I think that the should_verify_sysid terminology is
problematic. I made all the code and identifier names talk only about
the control file, not the specific thing in the control file that we
are going to verify, in case in the future we want to verify
additional things. This breaks that abstraction.
Regarding 0009, I feel like astreamer_verify_content() might want to
grow some subroutines. One idea could be to move the
ASTREAMER_MEMBER_HEADER case and likewise ASTREAMER_MEMBER_CONTENTS
cases into a new function for each; another idea could be to move
smaller chunks of logic, e.g. under the ASTREAMER_MEMBER_CONTENTS
case, the verify_checksums could be one subroutine and the ill-named
verify_sysid stuff could be another. I'm not certain exactly what's
best here, but some of this code is as deeply as six levels nested,
which is not such a terrible thing that nobody should ever do it, but
it is bad enough that we should at least look around for a better way.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Jul 30, 2024 at 9:04 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jul 22, 2024 at 7:53 AM Amul Sul <sulamul@gmail.com> wrote:
Fix in the attached version.
First of all, in the interest of full disclosure, I suggested this
project to Amul, so I'm +1 on the concept. I think making more of our
backup-related tools work with tar and compressed tar formats -- and
perhaps eventually data not stored locally -- will make them a lot
more usable. If, for example, you take a full backup and an
incremental backup, each in tar format, store them in the cloud
someplace, and then want to verify and afterwards restore the
incremental backup, you would need to download the tar files from the
cloud, then extract all the tar files, then run pg_verifybackup and
pg_combinebackup over the results. With this patch set, and similar
work for pg_combinebackup, you could skip the step where you need to
extract the tar files, saving significant amounts of time and disk
space. If the tools also had the ability to access data remotely, you
could save even more, but that's a much harder project, so it makes
sense to me to start with this.
Second, I think this patch set is quite well-organized and easy to
read. That's not to say there is nothing in these patches to which
someone might object, but it seems to me that it should at least be
simple for anyone who wants to review to find the things to which they
object in the patch set without having to spend too much time on it,
which is fantastic.
Third, I think the general approach that these patches take to the
problem - namely, renaming bbstreamer to astreamer and moving it
somewhere that permits it to be reused - makes a lot of sense. To be
honest, I don't think I had it in mind that bbstreamer would be a
reusable component when I wrote it, or if I did have it in mind, it
was off in some dusty corner of my mind that doesn't get visited very
often. I was imagining that you would need to build new infrastructure
to deal with reading the tar file, but I think what you've done here
is better. Reusing the bbstreamer stuff gives you tar file parsing,
and decompression if necessary, basically for free, and IMHO the
result looks rather elegant.
Thank you so much for the summary and the review.
However, I'm not very convinced by 0003. The handling between the
meson and make-based build systems doesn't seem consistent. On the
meson side, you just add the objects to the same list that contains
all of the other files (but not in alphabetical order, which should be
fixed). But on the make side, you for some reason invent a separate
AOBJS list instead of just adding the files to OBJS. I don't think it
should be necessary to treat these objects any differently from any
other objects, so they should be able to just go in OBJS: but if it
were necessary, then I feel like the meson side would need something
similar.
Fixed -- I did that because it was part of a separate group in pg_basebackup.
Also, I'm not so sure about this change to src/fe_utils/meson.build:
- dependencies: frontend_common_code,
+ dependencies: [frontend_common_code, lz4, zlib, zstd],
frontend_common_code already includes dependencies on zlib and zstd,
so we probably don't need to add those again here. I checked the
result of otool -L src/bin/pg_controldata/pg_controldata from the
meson build directory, and I find that currently it links against libz
and libzstd but not liblz4. However, if I either make this line say
dependencies: [frontend_common_code, lz4] or if I just update
frontend_common_code to include lz4, then it starts linking against
liblz4 as well. I'm not entirely sure if there's any reason to do one
or the other of those things, but I think I'd be inclined to make
frontend_common_code just include lz4 since it already includes zlib
and zstd anyway, and then you don't need this change.
Fixed -- frontend_common_code now includes lz4 as well.
Alternatively, we could take the position that random front-end tools
that don't do anything with compression, like pg_controldata for
example, shouldn't link with compression libraries at
all. But then we'd need to rethink a bunch of things that have not
much to do with this patch.
Noted. I might give it a try another day, unless someone else beats
me to it, perhaps in a separate thread.
Regarding 0004, I would rather not move show_progress and
skip_checksums to the new header file. I suppose I was a bit lazy in
making these file-level global variables instead of passing them down
using function arguments and/or a context object, but at least right
now they're only global within a single file. Can we think of
inserting a preparatory patch that moves these into verifier_context?
Done -- added a new patch as 0004, and the subsequent patch numbers
have been incremented accordingly.
Regarding 0005, the comment /* Check whether there's an entry in the
manifest hash. */ should move inside verify_manifest_entry, where
manifest_files_lookup is called. The call to the new function
verify_manifest_entry() needs its own, proper comment. Also, I think
there's a null-pointer deference hazard here, because
verify_manifest_entry() can return NULL but the "Validate the manifest
system identifier" chunk assumes it isn't. I think you could hit this
- and presumably seg fault - if pg_control is on disk but not in the
manifest. Seems like just adding an m != NULL test is easiest, but
see also below comments about 0006.
Fixed -- I did the NULL check in the earlier 0007 patch, but it should
have been done in this patch.
Regarding 0006, suppose that the member file within the tar archive is
longer than expected. With the unpatched code, we'll feed all of the
data to the checksum context, but then, after the read-loop
terminates, we'll complain about the file being the wrong length. With
the patched code, we'll complain about the checksum mismatch before
returning from verify_content_checksum(). I think that's an unintended
behavior change, and I think the new behavior is worse than the old
behavior. But also, I think that in the case of a tar file, the
desired behavior is quite different. In that case, we know the length
of the file from the member header, so we can check whether the length
is as expected before we read any of the data bytes. If we discover
that the size is wrong, we can complain about that and need not feed
the checksum bytes to the checksum context at all -- we can just skip
them, which will be faster. That approach doesn't really make sense
for a file, because even if we were to stat() the file before we
started reading it, the length could theoretically change as we are
reading it, if someone is concurrently modifying it, but for a tar
file I think it does.
In the case of a file size mismatch, we never reach the point where
checksum calculation is performed, because verify_manifest_entry()
encounters an error and sets manifest_file->bad to true, which causes
the checksum verification to be skipped. For that reason, I didn’t include
the size check again in the checksum calculation part. This behavior
is the same for plain backups, but the additional file size check was
added as a precaution (per comment in verify_file_checksum()),
possibly for the same reasons you mentioned.
I agree, changing the order of errors could create confusion.
Previously, a file size mismatch was a clear and appropriate error
that was reported before the checksum failure error.
However, this can be fixed by delaying the final checksum computation until
the expected amount of file content has been received. Specifically,
verify_content_checksum() now returns early if (*computed_len != m->size).
If the file size is incorrect, the checksum is never finalized or compared,
and the caller's read loop (in verify_file_checksum()) eventually exits and
then reports the size mismatch error, as before.
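For reference, the relevant part of verify_content_checksum() in the attached
version now reads (see the v3-0007 patch below):

	/* Update the total count of computed checksum bytes. */
	*computed_len += buffer_len;

	/* Report progress */
	done_size += buffer_len;
	progress_report(context, false);

	/* Yet to receive the full content of the file. */
	if (*computed_len != m->size)
		return true;

	/* Get the final checksum. */
	checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);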
I would suggest abandoning this refactoring. There's very little logic
in verify_file_checksum() that you can actually reuse. I think you
should just duplicate the code. If you really want, you could arrange
to reuse the error-reporting code that checks for checksumlen !=
m->checksum_length and memcmp(checksumbuf, m->checksum_payload,
checksumlen) != 0, but even that I think is little enough that it's
fine to just duplicate it. The rest is either (1) OS calls like
open(), read(), etc. which won't be applicable to the
read-from-archive case or (2) calls to pg_checksum_WHATEVER, which are
fine to just duplicate, IMHO.
I have kept the refactoring as is, fixing verify_content_checksum() as
described in the previous paragraph. Please let me know if this fix
and the explanation make sense to you. I’m okay with abandoning this
refactoring patch if you prefer.
My eyes are starting to glaze over a bit here so expect comments below
this point to be only a partial review of the corresponding patch.
Regarding 0007, I think that the should_verify_sysid terminology is
problematic. I made all the code and identifier names talk only about
the control file, not the specific thing in the control file that we
are going to verify, in case in the future we want to verify
additional things. This breaks that abstraction.
Agreed, changed to should_verify_control_data.
Regarding 0009, I feel like astreamer_verify_content() might want to
grow some subroutines. One idea could be to move the
ASTREAMER_MEMBER_HEADER case and likewise ASTREAMER_MEMBER_CONTENTS
cases into a new function for each; another idea could be to move
smaller chunks of logic, e.g. under the ASTREAMER_MEMBER_CONTENTS
case, the verify_checksums could be one subroutine and the ill-named
verify_sysid stuff could be another. I'm not certain exactly what's
best here, but some of this code is as deeply as six levels nested,
which is not such a terrible thing that nobody should ever do it, but
it is bad enough that we should at least look around for a better way.
Okay, I added the verify_checksums() and verify_controldata()
functions to the astreamer_verify.c file. I also updated related
variables that were clashing with these function names:
verify_checksums has been renamed to verifyChecksums, and verify_sysid
has been renamed to verifyControlData.
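For illustration, the content callback now dispatches roughly like this (a
simplified sketch; the struct name, the header-case helper, and the argument
lists are placeholders, not the exact code from the patch):

static void
astreamer_verify_content(astreamer *streamer, astreamer_member *member,
						 const char *data, int len,
						 astreamer_archive_context context)
{
	astreamer_verify *mystreamer = (astreamer_verify *) streamer;

	switch (context)
	{
		case ASTREAMER_MEMBER_HEADER:
			/* Look up the manifest entry and decide what needs verifying. */
			member_verify_header(mystreamer, member);
			break;

		case ASTREAMER_MEMBER_CONTENTS:
			if (mystreamer->verifyChecksums)
				verify_checksums(mystreamer, member, data, len);
			if (mystreamer->verifyControlData)
				verify_controldata(mystreamer, member, data, len);
			break;

		default:
			break;
	}
}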
Thanks again for the review comments. Please have a look at the
attached version.
Regards,
Amul
Attachments:
v3-0011-pg_verifybackup-Tests-and-document.patch (application/x-patch)
From 547b92c51a47fa03b02721d5012c9f7ab3e9bb27 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 17:04:56 +0530
Subject: [PATCH v3 11/11] pg_verifybackup: Tests and document
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the previous patch that implements tar format support.
----
---
doc/src/sgml/ref/pg_verifybackup.sgml | 54 +++++++++++++-
src/bin/pg_verifybackup/t/001_basic.pl | 18 ++++-
src/bin/pg_verifybackup/t/004_options.pl | 2 +-
src/bin/pg_verifybackup/t/005_bad_manifest.pl | 2 +-
src/bin/pg_verifybackup/t/008_untar.pl | 73 ++++++-------------
src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +-----------
6 files changed, 96 insertions(+), 101 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index a3f167f9f6e..c743bd89a92 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -34,8 +34,10 @@ PostgreSQL documentation
integrity of a database cluster backup taken using
<command>pg_basebackup</command> against a
<literal>backup_manifest</literal> generated by the server at the time
- of the backup. The backup must be stored in the "plain"
- format; a "tar" format backup can be checked after extracting it.
+ of the backup. The backup must be stored in the "plain" or "tar"
+ format. Verification is supported for <literal>gzip</literal>,
+ <literal>lz4</literal>, and <literal>zstd</literal> compressed tar backups;
+ backups using any other compression format can be checked after decompressing them.
</para>
<para>
@@ -168,6 +170,42 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-F <replaceable class="parameter">format</replaceable></option></term>
+ <term><option>--format=<replaceable class="parameter">format</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the format of the backup. <replaceable>format</replaceable>
+ can be one of the following:
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>p</literal></term>
+ <term><literal>plain</literal></term>
+ <listitem>
+ <para>
+ Backup consists of plain files with the same layout as the
+ source server's data directory and tablespaces.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>t</literal></term>
+ <term><literal>tar</literal></term>
+ <listitem>
+ <para>
+ Backup consists of tar files. The main data directory's contents
+ will be written to a file named <filename>base.tar</filename>,
+ and each other tablespace will be written to a separate tar file
+ named after that tablespace's OID.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist></para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n</option></term>
<term><option>--no-parse-wal</option></term>
@@ -227,6 +265,18 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><option>-Z <replaceable class="parameter">method</replaceable></option></term>
+ <term><option>--compress=<replaceable class="parameter">method</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the compression method of the tar backup. It can be
+ <literal>gzip</literal>, <literal>lz4</literal>, <literal>zstd</literal>,
+ or <literal>none</literal> if the backup is not compressed.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
diff --git a/src/bin/pg_verifybackup/t/001_basic.pl b/src/bin/pg_verifybackup/t/001_basic.pl
index 2f3e52d296f..d47ce1f04fc 100644
--- a/src/bin/pg_verifybackup/t/001_basic.pl
+++ b/src/bin/pg_verifybackup/t/001_basic.pl
@@ -17,13 +17,25 @@ command_fails_like(
qr/no backup directory specified/,
'target directory must be specified');
command_fails_like(
- [ 'pg_verifybackup', $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir ],
qr/could not open file.*\/backup_manifest\"/,
'pg_verifybackup requires a manifest');
command_fails_like(
- [ 'pg_verifybackup', $tempdir, $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir, $tempdir ],
qr/too many command-line arguments/,
'multiple target directories not allowed');
+command_fails_like(
+ [ 'pg_verifybackup', '-Zgzip', $tempdir ],
+ qr/only tar mode backups can be compressed/,
+ 'compression option requires tar format');
+command_fails_like(
+ [ 'pg_verifybackup', '-Fp', '-Zlz4', $tempdir ],
+ qr/only tar mode backups can be compressed/,
+ 'compression option not allowed with plain format');
+command_fails_like(
+ [ 'pg_verifybackup', '-Fp', '-Znon_exist', $tempdir ],
+ qr/unrecognized compression algorithm/,
+ 'compression method should be valid');
# create fake manifest file
open(my $fh, '>', "$tempdir/backup_manifest") || die "open: $!";
@@ -31,7 +43,7 @@ close($fh);
# but then try to use an alternate, nonexisting manifest
command_fails_like(
- [ 'pg_verifybackup', '-m', "$tempdir/not_the_manifest", $tempdir ],
+ [ 'pg_verifybackup', '-Fp', '-m', "$tempdir/not_the_manifest", $tempdir ],
qr/could not open file.*\/not_the_manifest\"/,
'pg_verifybackup respects -m flag');
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 8ed2214408e..2f197648740 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -108,7 +108,7 @@ unlike(
# Test valid manifest with nonexistent backup directory.
command_fails_like(
[
- 'pg_verifybackup', '-m',
+ 'pg_verifybackup', '-Fp', '-m',
"$backup_path/backup_manifest", "$backup_path/fake"
],
qr/could not open directory/,
diff --git a/src/bin/pg_verifybackup/t/005_bad_manifest.pl b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
index c4ed64b62d5..28c51b6feb0 100644
--- a/src/bin/pg_verifybackup/t/005_bad_manifest.pl
+++ b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
@@ -208,7 +208,7 @@ sub test_bad_manifest
print $fh $manifest_contents;
close($fh);
- command_fails_like([ 'pg_verifybackup', $tempdir ], $regexp, $test_name);
+ command_fails_like([ 'pg_verifybackup', '-Fp', $tempdir ], $regexp, $test_name);
return;
}
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 7a09f3b75b2..9896560adc3 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -16,6 +16,20 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
+# Create a directory to use as a tablespace.
+my $TS1_LOCATION = $primary->backup_dir .'/ts1';
+$TS1_LOCATION =~ s/\/\.\//\//g; # collapse foo/./bar to foo/bar
+mkdir($TS1_LOCATION);
+
+# Create a tablespace with a table in it.
+$primary->safe_psql('postgres', qq(
+ CREATE TABLESPACE regress_ts1 LOCATION '$TS1_LOCATION';
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1';
+ CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1;
+ INSERT INTO regress_tbl1 VALUES(generate_series(1,5));));
+my $tsoid = $primary->safe_psql('postgres', qq(
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
+
my $backup_path = $primary->backup_dir . '/server-backup';
my $extract_path = $primary->backup_dir . '/extracted-backup';
@@ -23,39 +37,31 @@ my @test_configuration = (
{
'compression_method' => 'none',
'backup_flags' => [],
- 'backup_archive' => 'base.tar',
+ 'backup_archive' => ['base.tar', "$tsoid.tar"],
'enabled' => 1
},
{
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'server-gzip' ],
- 'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'server-lz4' ],
- 'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => [ '-d', '-m' ],
+ 'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ],
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
});
@@ -86,47 +92,16 @@ for my $tc (@test_configuration)
my $backup_files = join(',',
sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
my $expected_backup_files =
- join(',', sort ('backup_manifest', $tc->{'backup_archive'}));
+ join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} }));
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path],
+ "verify backup, compression $method");
# Cleanup.
- unlink($backup_path . '/backup_manifest');
- unlink($backup_path . '/base.tar');
- unlink($backup_path . '/' . $tc->{'backup_archive'});
+ rmtree($backup_path);
rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 8c076d46dee..6b7d7483f6e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -29,41 +29,30 @@ my @test_configuration = (
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'client-gzip:5' ],
'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'client-lz4:5' ],
'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => ['-d'],
- 'output_file' => 'base.tar',
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:5' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'parallel zstd',
'backup_flags' => [ '--compress', 'client-zstd:workers=3' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1"),
'possibly_unsupported' =>
qr/could not set compression worker count to 3: Unsupported parameter/
@@ -118,40 +107,9 @@ for my $tc (@test_configuration)
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- push @decompress, $backup_path . '/' . $tc->{'output_file'}
- if $tc->{'output_file'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ],
+ "verify backup, compression $method");
# Cleanup.
rmtree($extract_path);
--
2.18.0
v3-0007-Refactor-split-verify_file_checksum-function.patch (application/x-patch)
From de07a9a975221e679a8c3d61a0ab9b7ee808539f Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 15:28:12 +0530
Subject: [PATCH v3 07/11] Refactor: split verify_file_checksum() function.
Move the core functionality of verify_file_checksum to a new function
to enable incremental checksum computation.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 102 +++++++++++++++-------
src/bin/pg_verifybackup/pg_verifybackup.h | 4 +
2 files changed, 73 insertions(+), 33 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index bb732bf06ca..7e94af387ad 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -771,6 +771,72 @@ verify_backup_checksums(verifier_context *context)
progress_report(context, true);
}
+/*
+ * Incrementally update the checksum with the received bytes; the caller must
+ * pass a properly initialized checksum_ctx. Once the complete file content
+ * has been received, which is tracked using the computed_len parameter, the
+ * final checksum is verified against the manifest data. If any error occurs,
+ * this returns false; otherwise, it returns true, indicating either that more
+ * file content is still expected or that checksum verification completed
+ * successfully.
+ */
+bool
+verify_content_checksum(verifier_context *context,
+ pg_checksum_context *checksum_ctx,
+ manifest_file *m, uint8 *buffer,
+ int buffer_len, size_t *computed_len)
+{
+ char *relpath = m->pathname;
+ uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
+ int checksumlen;
+
+ if (pg_checksum_update(checksum_ctx, buffer, buffer_len) < 0)
+ {
+ report_backup_error(context, "could not update checksum of file \"%s\"",
+ relpath);
+ return false;
+ }
+
+ /* Update the total count of computed checksum bytes. */
+ *computed_len += buffer_len;
+
+ /* Report progress */
+ done_size += buffer_len;
+ progress_report(context, false);
+
+ /* Yet to receive the full content of the file. */
+ if (*computed_len != m->size)
+ return true;
+
+ /* Get the final checksum. */
+ checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
+ if (checksumlen < 0)
+ {
+ report_backup_error(context,
+ "could not finalize checksum of file \"%s\"",
+ relpath);
+ return false;
+ }
+
+ /* And check it against the manifest. */
+ if (checksumlen != m->checksum_length)
+ {
+ report_backup_error(context,
+ "file \"%s\" has checksum of length %d, but expected %d",
+ relpath, m->checksum_length, checksumlen);
+ return false;
+ }
+ else if (memcmp(checksumbuf, m->checksum_payload, checksumlen) != 0)
+ {
+ report_backup_error(context,
+ "checksum mismatch for file \"%s\"",
+ relpath);
+ return false;
+ }
+
+ return true;
+}
+
/*
* Verify the checksum of a single file.
*/
@@ -783,8 +849,6 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
int fd;
int rc;
size_t bytes_read = 0;
- uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
- int checksumlen;
/* Open the target file. */
if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
@@ -806,19 +870,14 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
/* Read the file chunk by chunk, updating the checksum as we go. */
while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
{
- bytes_read += rc;
- if (pg_checksum_update(&checksum_ctx, buffer, rc) < 0)
+ if (!verify_content_checksum(context, &checksum_ctx, m, buffer, rc,
+ &bytes_read))
{
- report_backup_error(context, "could not update checksum of file \"%s\"",
- relpath);
close(fd);
return;
}
-
- /* Report progress */
- done_size += rc;
- progress_report(context, false);
}
+
if (rc < 0)
report_backup_error(context, "could not read file \"%s\": %m",
relpath);
@@ -843,32 +902,9 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
* filesystem misbehavior.
*/
if (bytes_read != m->size)
- {
report_backup_error(context,
"file \"%s\" should contain %zu bytes, but read %zu bytes",
relpath, m->size, bytes_read);
- return;
- }
-
- /* Get the final checksum. */
- checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
- if (checksumlen < 0)
- {
- report_backup_error(context,
- "could not finalize checksum of file \"%s\"",
- relpath);
- return;
- }
-
- /* And check it against the manifest. */
- if (checksumlen != m->checksum_length)
- report_backup_error(context,
- "file \"%s\" has checksum of length %d, but expected %d",
- relpath, m->checksum_length, checksumlen);
- else if (memcmp(checksumbuf, m->checksum_payload, checksumlen) != 0)
- report_backup_error(context,
- "checksum mismatch for file \"%s\"",
- relpath);
}
/*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index f4a93d3f137..fe56c63f99b 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -103,6 +103,10 @@ typedef struct verifier_context
extern manifest_file *verify_manifest_entry(verifier_context *context,
char *relpath, size_t filesize);
+extern bool verify_content_checksum(verifier_context *context,
+ pg_checksum_context *checksum_ctx,
+ manifest_file *m, uint8 *buf,
+ int buf_len, size_t *computed_len);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
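(For context, and not part of the patch: a minimal sketch of how a streaming
caller is expected to drive verify_content_checksum() when file contents arrive
chunk by chunk. The chunk source next_chunk() is a placeholder; the real caller
is the tar verification streamer added later in the series.)

    pg_checksum_context ctx;
    size_t      received = 0;
    const char *chunk;
    int         chunk_len;

    if (pg_checksum_init(&ctx, m->checksum_type) < 0)
        report_backup_error(context, "could not initialize checksum of file \"%s\"",
                            m->pathname);
    else
    {
        /* Feed each chunk as it arrives; next_chunk() is hypothetical. */
        while (next_chunk(&chunk, &chunk_len) &&
               verify_content_checksum(context, &ctx, m,
                                       (uint8 *) chunk, chunk_len, &received))
            ;
    }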
Attachment: v3-0009-pg_verifybackup-Add-backup-format-and-compression.patch (application/x-patch)
From feecd16679b80f8a1b2c98cfd4ac8286e370d796 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 10:26:35 +0530
Subject: [PATCH v3 09/11] pg_verifybackup: Add backup format and compression
option
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the next patch that implements tar format support.
----
---
src/bin/pg_verifybackup/pg_verifybackup.c | 143 +++++++++++++++++++++-
1 file changed, 141 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index ba2a9b44d2d..2a7d4869ce5 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,6 +18,7 @@
#include <sys/stat.h>
#include <time.h>
+#include "common/compression.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -56,6 +57,8 @@ static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3) pg_attribute_noreturn();
+static char find_backup_format(verifier_context *context);
+static pg_compress_algorithm find_backup_compression(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
@@ -78,6 +81,9 @@ static const char *progname;
static uint64 total_size = 0;
static uint64 done_size = 0;
+char format = '\0'; /* p(lain)/t(ar) */
+pg_compress_algorithm compress_algorithm = PG_COMPRESSION_NONE;
+
/*
* Main entry point.
*/
@@ -88,11 +94,13 @@ main(int argc, char **argv)
{"exit-on-error", no_argument, NULL, 'e'},
{"ignore", required_argument, NULL, 'i'},
{"manifest-path", required_argument, NULL, 'm'},
+ {"format", required_argument, NULL, 'F'},
{"no-parse-wal", no_argument, NULL, 'n'},
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
{"skip-checksums", no_argument, NULL, 's'},
{"wal-directory", required_argument, NULL, 'w'},
+ {"compress", required_argument, NULL, 'Z'},
{NULL, 0, NULL, 0}
};
@@ -103,6 +111,7 @@ main(int argc, char **argv)
bool quiet = false;
char *wal_directory = NULL;
char *pg_waldump_path = NULL;
+ bool tar_compression_specified = false;
pg_logging_init(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verifybackup"));
@@ -145,7 +154,7 @@ main(int argc, char **argv)
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
- while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1)
+ while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:Z:", long_options, NULL)) != -1)
{
switch (c)
{
@@ -164,6 +173,15 @@ main(int argc, char **argv)
manifest_path = pstrdup(optarg);
canonicalize_path(manifest_path);
break;
+ case 'F':
+ if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
+ format = 'p';
+ else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
+ format = 't';
+ else
+ pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"",
+ optarg);
+ break;
case 'n':
no_parse_wal = true;
break;
@@ -180,6 +198,12 @@ main(int argc, char **argv)
wal_directory = pstrdup(optarg);
canonicalize_path(wal_directory);
break;
+ case 'Z':
+ if (!parse_compress_algorithm(optarg, &compress_algorithm))
+ pg_fatal("unrecognized compression algorithm: \"%s\"",
+ optarg);
+ tar_compression_specified = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -211,11 +235,41 @@ main(int argc, char **argv)
pg_fatal("cannot specify both %s and %s",
"-P/--progress", "-q/--quiet");
+ /* Complain if compression method specified but the format isn't tar. */
+ if (format != 't' && tar_compression_specified)
+ {
+ pg_log_error("only tar mode backups can be compressed");
+ pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+ exit(1);
+ }
+
+ /* Determine the backup format if it hasn't been specified. */
+ if (format == '\0')
+ format = find_backup_format(&context);
+
+ /*
+ * Determine the tar backup compression method if it hasn't been
+ * specified.
+ */
+ if (format == 't' && !tar_compression_specified)
+ compress_algorithm = find_backup_compression(&context);
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
int ret;
+ /*
+ * XXX: In the future, we should consider enhancing pg_waldump to read
+ * WAL files from the tar archive.
+ */
+ if (format != 'p')
+ {
+ pg_log_error("pg_waldump does not support parsing WAL files from a tar archive.");
+ pg_log_error_hint("Try \"%s --help\" to skip parse WAL files option.", progname);
+ exit(1);
+ }
+
pg_waldump_path = pg_malloc(MAXPGPATH);
ret = find_other_exec(argv[0], "pg_waldump",
"pg_waldump (PostgreSQL) " PG_VERSION "\n",
@@ -270,8 +324,13 @@ main(int argc, char **argv)
/*
* Now do the expensive work of verifying file checksums, unless we were
* told to skip it.
+ *
+ * We only check plain-format backups here. For tar backups, file checksum
+ * verification (if requested) is performed immediately as each file's
+ * contents are streamed, since we do not have random access to the files
+ * as we do with plain backups.
*/
- if (!context.skip_checksums)
+ if (!context.skip_checksums && format == 'p')
verify_backup_checksums(&context);
/*
@@ -1039,6 +1098,84 @@ progress_report(verifier_context *context, bool finished)
fputc((!finished && isatty(fileno(stderr))) ? '\r' : '\n', stderr);
}
+/*
+ * To detect the backup format, check for the PG_VERSION file in the backup
+ * directory: if it is found, the backup is considered plain format;
+ * otherwise, it is assumed to be a tar-format backup.
+ */
+static char
+find_backup_format(verifier_context *context)
+{
+ char result;
+ char *path;
+ struct stat sb;
+
+ /* Should be here only if the backup format is unknown */
+ Assert(format == '\0');
+
+ /* Check PG_VERSION file. */
+ path = psprintf("%s/%s", context->backup_directory, "PG_VERSION");
+ result = (stat(path, &sb) == 0) ? 'p' : 't';
+ pfree(path);
+
+ return result;
+}
+
+/*
+ * To determine the compression format, we will search for the main data
+ * directory archive and its extension, which starts with base.tar, as
+ * pg_basebackup writes the main data directory to an archive file named
+ * base.tar followed by a compression type extension like .gz, .lz4, or .zst.
+ */
+static pg_compress_algorithm
+find_backup_compression(verifier_context *context)
+{
+ char *path;
+ struct stat sb;
+ bool found;
+
+ /* Should be here only for tar backup */
+ Assert(format == 't');
+
+ /*
+ * Is this a tar archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_NONE;
+
+ /*
+ * Is this a .tar.gz archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.gz");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_GZIP;
+
+ /*
+ * Is this a .tar.lz4 archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.lz4");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_LZ4;
+
+ /*
+ * Is this a .tar.zst archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.zst");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_ZSTD;
+
+ return PG_COMPRESSION_NONE; /* placate compiler */
+}
+
/*
* Print out usage information and exit.
*/
@@ -1051,11 +1188,13 @@ usage(void)
printf(_(" -e, --exit-on-error exit immediately on error\n"));
printf(_(" -i, --ignore=RELATIVE_PATH ignore indicated path\n"));
printf(_(" -m, --manifest-path=PATH use specified path for manifest\n"));
+ printf(_(" -F, --format=p|t backup format (plain, tar)\n"));
printf(_(" -n, --no-parse-wal do not try to parse WAL files\n"));
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -s, --skip-checksums skip checksum verification\n"));
printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n"));
+ printf(_(" -Z, --compress=METHOD compress method (gzip, lz4, zstd, none) \n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
--
2.18.0
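(Illustration only, not part of the patch: with these options, verification can
be invoked roughly as follows; the backup paths are examples. Note that, per
this patch, tar-format verification currently requires -n/--no-parse-wal,
since pg_waldump cannot yet read WAL from a tar archive.)

    pg_verifybackup /path/to/backup_plain                  # format auto-detected as plain
    pg_verifybackup -n /path/to/backup_tar                 # tar format and compression auto-detected
    pg_verifybackup -n -F t -Z zstd /path/to/backup_tar    # format and compression given explicitly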
Attachment: v3-0008-Refactor-split-verify_control_file.patch (application/x-patch)
From 91cd914f9acb6ade6744cae6d45c6d56d3e78bc0 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 16:08:43 +0530
Subject: [PATCH v3 08/11] Refactor: split verify_control_file.
Move the control file checks out of verify_control_file into a separate
function that can be called from other places as well.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 44 +++++++++++------------
src/bin/pg_verifybackup/pg_verifybackup.h | 14 ++++++++
2 files changed, 34 insertions(+), 24 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 7e94af387ad..ba2a9b44d2d 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,7 +18,6 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -61,8 +60,6 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_control_file(const char *controlpath,
- uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -622,14 +619,20 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
/* Check the backup manifest entry for this file. */
m = verify_manifest_entry(context, relpath, sb.st_size);
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0 &&
- m != NULL && m->matched && !m->bad)
- verify_control_file(fullpath, context->manifest->system_identifier);
+ /* Validate the manifest system identifier */
+ if (m != NULL && should_verify_control_data(context->manifest, m))
+ {
+ ControlFileData *control_file;
+ bool crc_ok;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+ control_file = get_controlfile_by_exact_path(fullpath, &crc_ok);
+
+ verify_control_data(control_file, fullpath, crc_ok,
+ context->manifest->system_identifier);
+ /* Release memory. */
+ pfree(control_file);
+ }
}
/*
@@ -678,18 +681,14 @@ verify_manifest_entry(verifier_context *context, char *relpath, size_t filesize)
}
/*
- * Sanity check control file and validate system identifier against manifest
- * system identifier.
+ * Sanity check control file data and validate system identifier against
+ * manifest system identifier.
*/
-static void
-verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
+void
+verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier)
{
- ControlFileData *control_file;
- bool crc_ok;
-
- pg_log_debug("reading \"%s\"", controlpath);
- control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
-
/* Control file contents not meaningful if CRC is bad. */
if (!crc_ok)
report_fatal_error("%s: CRC is incorrect", controlpath);
@@ -705,9 +704,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
controlpath,
(unsigned long long) manifest_system_identifier,
(unsigned long long) control_file->system_identifier);
-
- /* Release memory. */
- pfree(control_file);
}
/*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index fe56c63f99b..a1a34c4b773 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -16,6 +16,7 @@
#include "common/controldata_utils.h"
#include "common/hashfn_unstable.h"
+#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
@@ -46,6 +47,16 @@ typedef struct manifest_file
#define should_verify_checksum(m) \
(((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+/*
+ * Validate the manifest system identifier against the control file; this
+ * feature is not available in manifest version 1. This validation should be
+ * carried out only if the manifest entry validation is completed without any
+ * errors.
+ */
+#define should_verify_control_data(manifest, m) \
+ (((manifest)->version != 1) && \
+ ((m)->matched) && !((m)->bad) && (strcmp((m)->pathname, "global/pg_control") == 0))
+
/*
* Define a hash table which we can use to store information about the files
* mentioned in the backup manifest.
@@ -107,6 +118,9 @@ extern bool verify_content_checksum(verifier_context *context,
pg_checksum_context *checksum_ctx,
manifest_file *m, uint8 *buf,
int buf_len, size_t *computed_len);
+extern void verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
Attachment: v3-0010-pg_verifybackup-Read-tar-files-and-verify-its-con.patch (application/x-patch)
From bb7de7434e4fb9b85d805946ca6aa7d2b9cc8e9e Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 16:22:07 +0530
Subject: [PATCH v3 10/11] pg_verifybackup: Read tar files and verify their
contents
---
src/bin/pg_verifybackup/Makefile | 4 +-
src/bin/pg_verifybackup/astreamer_verify.c | 277 +++++++++++++++++++++
src/bin/pg_verifybackup/meson.build | 6 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 216 +++++++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 10 +
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 505 insertions(+), 9 deletions(-)
create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 7c045f142e8..df7aaabd530 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -17,11 +17,13 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
# We need libpq only because fe_utils does.
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
OBJS = \
$(WIN32RES) \
- pg_verifybackup.o
+ pg_verifybackup.o \
+ astreamer_verify.o
all: pg_verifybackup
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
new file mode 100644
index 00000000000..05119e4b7cb
--- /dev/null
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -0,0 +1,277 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_verify.c
+ *
+ * Extend fe_utils/astreamer.h archive streaming facility to verify TAR
+ * backup.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * src/bin/pg_verifybackup/astreamer_verify.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "common/logging.h"
+#include "fe_utils/astreamer.h"
+#include "pg_verifybackup.h"
+
+typedef struct astreamer_verify
+{
+ astreamer base;
+ verifier_context *context;
+ char *archiveName;
+ Oid tblspcOid;
+
+ manifest_file *mfile;
+ size_t receivedBytes;
+ bool verifyChecksums;
+ bool verifyControlData;
+ pg_checksum_context *checksum_ctx;
+} astreamer_verify;
+
+static void astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_verify_finalize(astreamer *streamer);
+static void astreamer_verify_free(astreamer *streamer);
+
+static void verify_checksums(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void verify_controldata(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+
+static const astreamer_ops astreamer_verify_ops = {
+ .content = astreamer_verify_content,
+ .finalize = astreamer_verify_finalize,
+ .free = astreamer_verify_free
+};
+
+/*
+ * Create an astreamer that verifies the contents of a TAR file.
+ */
+astreamer *
+astreamer_verify_content_new(astreamer *next, verifier_context *context,
+ char *archive_name, Oid tblspc_oid)
+{
+ astreamer_verify *streamer;
+
+ streamer = palloc0(sizeof(astreamer_verify));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_verify_ops;
+
+ streamer->base.bbs_next = next;
+ streamer->context = context;
+ streamer->archiveName = archive_name;
+ streamer->tblspcOid = tblspc_oid;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ return &streamer->base;
+}
+
+/*
+ * It verifies each TAR member entry against the manifest data and performs
+ * checksum verification if enabled. Additionally, it validates the backup's
+ * system identifier against the backup_manifest.
+ */
+static void
+astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member, const char *data,
+ int len, astreamer_archive_context context)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+ if (!member->is_directory && !member->is_link &&
+ !should_ignore_relpath(mystreamer->context, member->pathname))
+ {
+ manifest_file *m;
+
+ /*
+ * The backup_manifest stores paths for files belonging to a
+ * tablespace relative to the base directory, whereas
+ * <tablespaceoid>.tar does not. Prepare the required path;
+ * otherwise, the manifest entry verification will fail.
+ */
+ if (OidIsValid(mystreamer->tblspcOid))
+ {
+ char temp[MAXPGPATH];
+
+ /* Copy original name at temporary space */
+ memcpy(temp, member->pathname, MAXPGPATH);
+
+ snprintf(member->pathname, MAXPGPATH, "%s/%d/%s",
+ "pg_tblspc", mystreamer->tblspcOid, temp);
+ }
+
+ /* Check the manifest entry */
+ m = verify_manifest_entry(mystreamer->context, member->pathname,
+ member->size);
+ mystreamer->mfile = (void *) m;
+
+ /*
+ * Prepare for checksum and manifest system identifier
+ * verification.
+ *
+ * We could perform these checks while receiving the
+ * contents. However, since contents arrive in multiple
+ * iterations, these relatively expensive checks would be
+ * repeated for every chunk. Caching the decision in a single
+ * flag is more efficient.
+ */
+ if (m != NULL)
+ {
+ mystreamer->verifyChecksums =
+ (!mystreamer->context->skip_checksums &&
+ should_verify_checksum(m));
+ mystreamer->verifyControlData =
+ should_verify_control_data(mystreamer->context->manifest, m);
+ }
+ }
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Perform checksum verification as the file content becomes
+ * available, since the TAR format does not have random access to
+ * files like a normal backup directory, where checksum
+ * verification occurs at different points.
+ */
+ if (mystreamer->verifyChecksums)
+ verify_checksums(streamer, member, data, len);
+
+ /* Verify pg_control file information */
+ if (mystreamer->verifyControlData)
+ verify_controldata(streamer, member, data, len);
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+ if (mystreamer->checksum_ctx)
+ pfree(mystreamer->checksum_ctx);
+ mystreamer->checksum_ctx = NULL;
+ mystreamer->mfile = NULL;
+ mystreamer->receivedBytes = 0;
+ mystreamer->verifyChecksums = false;
+ mystreamer->verifyControlData = false;
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar archive");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_verify stream.
+ */
+static void
+astreamer_verify_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_verify stream.
+ */
+static void
+astreamer_verify_free(astreamer *streamer)
+{
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+
+/*
+ * Perform checksum verification of the file content, which may be received in
+ * multiple iterations.
+ */
+static void
+verify_checksums(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ /* First call for this file: initialize the checksum context. */
+ if (!mystreamer->checksum_ctx)
+ {
+ mystreamer->checksum_ctx = pg_malloc(sizeof(pg_checksum_context));
+
+ if (pg_checksum_init(mystreamer->checksum_ctx,
+ mystreamer->mfile->checksum_type) < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "%s: could not initialize checksum of file \"%s\"",
+ mystreamer->archiveName, member->pathname);
+ mystreamer->verifyChecksums = false;
+ return;
+ }
+ }
+
+ /* Compute and do the checksum validation */
+ mystreamer->verifyChecksums =
+ verify_content_checksum(mystreamer->context,
+ mystreamer->checksum_ctx,
+ mystreamer->mfile,
+ (uint8 *) data, len,
+ &mystreamer->receivedBytes);
+}
+
+/*
+ * Assemble the control file data from the received contents (expected to be
+ * those of pg_control) and compute its CRC. Then call the routine that
+ * performs the final verification of the control file information.
+ */
+static void
+verify_controldata(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_data *manifest = mystreamer->context->manifest;
+ ControlFileData control_file;
+ pg_crc32c crc;
+ bool crc_ok;
+
+ /* Should be here only for control file */
+ Assert(strcmp(member->pathname, "global/pg_control") == 0);
+ Assert(manifest->version != 1);
+
+ /* Should have whole control file data. */
+ if (!astreamer_buffer_until(streamer, &data, &len, sizeof(ControlFileData)))
+ return;
+
+ pg_log_debug("%s: reading \"%s\"", mystreamer->archiveName,
+ member->pathname);
+
+ if (streamer->bbs_buffer.len != sizeof(ControlFileData))
+ report_fatal_error("%s: could not read control file: read %d of %zu",
+ mystreamer->archiveName, streamer->bbs_buffer.len,
+ sizeof(ControlFileData));
+
+ memcpy(&control_file, streamer->bbs_buffer.data,
+ sizeof(ControlFileData));
+
+ /* Check the CRC. */
+ INIT_CRC32C(crc);
+ COMP_CRC32C(crc,
+ (char *) (&control_file),
+ offsetof(ControlFileData, crc));
+ FIN_CRC32C(crc);
+
+ crc_ok = EQ_CRC32C(crc, control_file.crc);
+
+ /* Do the final control data verification */
+ verify_control_data(&control_file, member->pathname, crc_ok,
+ manifest->system_identifier);
+}
diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build
index 7c7d31a0350..1e3fcf7ee5a 100644
--- a/src/bin/pg_verifybackup/meson.build
+++ b/src/bin/pg_verifybackup/meson.build
@@ -1,7 +1,8 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
pg_verifybackup_sources = files(
- 'pg_verifybackup.c'
+ 'pg_verifybackup.c',
+ 'astreamer_verify.c'
)
if host_system == 'windows'
@@ -10,9 +11,10 @@ if host_system == 'windows'
'--FILEDESC', 'pg_verifybackup - verify a backup against using a backup manifest'])
endif
+pg_verifybackup_deps = [frontend_code, libpq, lz4, zlib, zstd]
pg_verifybackup = executable('pg_verifybackup',
pg_verifybackup_sources,
- dependencies: [frontend_code, libpq],
+ dependencies: pg_verifybackup_deps,
kwargs: default_bin_args,
)
bin_targets += pg_verifybackup
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 2a7d4869ce5..ffc7842b350 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -20,9 +20,12 @@
#include "common/compression.h"
#include "common/parse_manifest.h"
+#include "common/relpath.h"
+#include "fe_utils/astreamer.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
#include "pg_verifybackup.h"
+#include "pgtar.h"
#include "pgtime.h"
/*
@@ -39,6 +42,11 @@
*/
#define ESTIMATED_BYTES_PER_MANIFEST_LINE 100
+
+static void (*verify_backup_file_cb) (verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -63,6 +71,15 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
char *relpath, char *fullpath);
+static void verify_plain_file_cb(verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+static void verify_tar_file_cb(verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+static void verify_tar_content(verifier_context *context,
+ char *relpath, char *fullpath,
+ astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -71,6 +88,9 @@ static void verify_file_checksum(verifier_context *context,
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
+static astreamer *create_archive_verifier(verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
static void progress_report(verifier_context *context, bool finished);
static void usage(void);
@@ -150,6 +170,10 @@ main(int argc, char **argv)
*/
simple_string_list_append(&context.ignore_list, "backup_manifest");
simple_string_list_append(&context.ignore_list, "pg_wal");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.gz");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.lz4");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.zst");
simple_string_list_append(&context.ignore_list, "postgresql.auto.conf");
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
@@ -254,6 +278,15 @@ main(int argc, char **argv)
if (format == 't' && !tar_compression_specified)
compress_algorithm = find_backup_compression(&context);
+ /*
+ * Setup the required callback function to verify plain or tar backup
+ * files.
+ */
+ if (format == 'p')
+ verify_backup_file_cb = verify_plain_file_cb;
+ else
+ verify_backup_file_cb = verify_tar_file_cb;
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
@@ -633,7 +666,8 @@ verify_backup_directory(verifier_context *context, char *relpath,
}
/*
- * Verify one file (which might actually be a directory or a symlink).
+ * Verify one file (which might actually be a directory, a symlink or a
+ * archive).
*
* The arguments to this function have the same meaning as the arguments to
* verify_backup_directory.
@@ -642,7 +676,6 @@ static void
verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
{
struct stat sb;
- manifest_file *m;
if (stat(fullpath, &sb) != 0)
{
@@ -675,8 +708,25 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Do the remaining verification steps. */
+ verify_backup_file_cb(context, relpath, fullpath, sb.st_size);
+}
+
+/*
+ * Verify one plain file or a symlink.
+ *
+ * The arguments to this function are mostly the same as those of
+ * verify_backup_directory. The additional argument is the file size, used
+ * for verification against the manifest entry.
+ */
+static void
+verify_plain_file_cb(verifier_context *context, char *relpath,
+ char *fullpath, size_t filesize)
+{
+ manifest_file *m;
+
/* Check the backup manifest entry for this file. */
- m = verify_manifest_entry(context, relpath, sb.st_size);
+ m = verify_manifest_entry(context, relpath, filesize);
/* Validate the manifest system identifier */
if (m != NULL && should_verify_control_data(context->manifest, m))
@@ -694,6 +744,124 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
}
}
+/*
+ * Verify one tar file.
+ *
+ * The arguments to this function are mostly the same as those of
+ * verify_backup_directory. The additional argument is the file size, used
+ * for verification against the manifest entry.
+ */
+static void
+verify_tar_file_cb(verifier_context *context, char *relpath,
+ char *fullpath, size_t filesize)
+{
+ astreamer *streamer;
+ Oid tblspc_oid = InvalidOid;
+ int file_name_len;
+ int file_extn_len = 0; /* placate compiler */
+ char *file_extn = "";
+
+ /* Should be tar backup */
+ Assert(format == 't');
+
+ /* Find the tar file extension. */
+ if (compress_algorithm == PG_COMPRESSION_NONE)
+ {
+ file_extn = ".tar";
+ file_extn_len = 4;
+
+ }
+ else if (compress_algorithm == PG_COMPRESSION_GZIP)
+ {
+ file_extn = ".tar.gz";
+ file_extn_len = 7;
+
+ }
+ else if (compress_algorithm == PG_COMPRESSION_LZ4)
+ {
+ file_extn = ".tar.lz4";
+ file_extn_len = 8;
+ }
+ else if (compress_algorithm == PG_COMPRESSION_ZSTD)
+ {
+ file_extn = ".tar.zst";
+ file_extn_len = 8;
+ }
+
+ /*
+ * Ensure that we have the correct file type corresponding to the backup
+ * format.
+ */
+ file_name_len = strlen(relpath);
+ if (file_name_len < file_extn_len ||
+ strcmp(relpath + file_name_len - file_extn_len, file_extn) != 0)
+ {
+ if (compress_algorithm == PG_COMPRESSION_NONE)
+ report_backup_error(context,
+ "\"%s\" is not a valid file, expecting tar file",
+ relpath);
+ else
+ report_backup_error(context,
+ "\"%s\" is not a valid file, expecting \"%s\" compressed tar file",
+ relpath,
+ get_compress_algorithm_name(compress_algorithm));
+ return;
+ }
+
+ /*
+ * For the tablespace, pg_basebackup writes the data out to
+ * <tablespaceoid>.tar. If a file matches that format, then extract the
+ * tablespaceoid, which we need to prepare the paths of the files
+ * belonging to that tablespace relative to the base directory.
+ */
+ if (strspn(relpath, "0123456789") == (file_name_len - file_extn_len))
+ tblspc_oid = strtoi64(relpath, NULL, 10);
+
+ streamer = create_archive_verifier(context, relpath, tblspc_oid);
+ verify_tar_content(context, relpath, fullpath, streamer);
+
+ /* Cleanup. */
+ astreamer_finalize(streamer);
+ astreamer_free(streamer);
+}
+
+/*
+ * Read the given tar file in predefined chunks and pass them to the
+ * astreamer, which performs decompression (if necessary) and then
+ * verification of each member within the tar archive.
+ */
+static void
+verify_tar_content(verifier_context *context, char *relpath, char *fullpath,
+ astreamer *streamer)
+{
+ int fd;
+ int rc;
+ char *buffer;
+
+ /* Open the target file. */
+ if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
+ {
+ report_backup_error(context, "could not open file \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+ /* Perform the reads */
+ while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
+ astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);
+
+ if (rc < 0)
+ report_backup_error(context, "could not read file \"%s\": %m",
+ relpath);
+
+ /* Close the file. */
+ if (close(fd) != 0)
+ report_backup_error(context, "could not close file \"%s\": %m",
+ relpath);
+}
+
/*
* Verify file and its size entry in the manifest.
*/
@@ -1122,10 +1290,10 @@ find_backup_format(verifier_context *context)
}
/*
- * To determine the compression format, we will search for the main data
- * directory archive and its extension, which starts with base.tar, as
* pg_basebackup writes the main data directory to an archive file named
- * base.tar followed by a compression type extension like .gz, .lz4, or .zst.
+ * base.tar, followed by a compression type extension such as .gz, .lz4, or
+ * .zst. To determine the compression format, we need to search for this main
+ * data directory archive file.
*/
static pg_compress_algorithm
find_backup_compression(verifier_context *context)
@@ -1176,6 +1344,42 @@ find_backup_compression(verifier_context *context)
return PG_COMPRESSION_NONE; /* placate compiler */
}
+/*
+ * Create the chain of astreamers needed to verify the contents of the
+ * provided tar file.
+ */
+static astreamer *
+create_archive_verifier(verifier_context *context, char *archive_name,
+ Oid tblspc_oid)
+{
+ astreamer *streamer = NULL;
+
+ /* Should be here only for tar backup */
+ Assert(format == 't');
+
+ /*
+ * To verify the contents of the tar file, the initial step is to parse
+ * its content.
+ */
+ streamer = astreamer_verify_content_new(streamer, context, archive_name,
+ tblspc_oid);
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /*
+ * If the tar file is compressed, we must perform the appropriate
+ * decompression operation before proceeding with the verification of its
+ * contents.
+ */
+ if (compress_algorithm == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compress_algorithm == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compress_algorithm == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ return streamer;
+}
+
/*
* Print out usage information and exit.
*/
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index a1a34c4b773..fc53e069854 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -129,4 +129,14 @@ extern void report_fatal_error(const char *pg_restrict fmt,...)
pg_attribute_printf(1, 2) pg_attribute_noreturn();
extern bool should_ignore_relpath(verifier_context *context, const char *relpath);
+/* Forward declarations to avoid fe_utils/astreamer.h include. */
+struct astreamer;
+typedef struct astreamer astreamer;
+
+extern astreamer *astreamer_verify_content_new(astreamer *next,
+ verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
+
+
#endif /* PG_VERIFYBACKUP_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f59f7acdb7e..e4f4c64c1a2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3321,6 +3321,7 @@ astreamer_plain_writer
astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
+astreamer_verify
astreamer_zstd_frame
bgworker_main_type
bh_node_type
--
2.18.0
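(Summary of the data flow added above, for reviewers; descriptive only.
verify_tar_content() reads the archive in READ_CHUNK_SIZE chunks and feeds
them into the astreamer chain built by create_archive_verifier(); for a
compressed archive the chain is:

    file chunks -> gzip/lz4/zstd decompressor -> astreamer_tar_parser -> astreamer_verify

astreamer_verify then calls verify_manifest_entry() for each member header,
verify_content_checksum() for member contents, and verify_control_data() once
the full pg_control contents have been buffered.)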
Attachment: v3-0006-Refactor-split-verify_backup_file-function.patch (application/x-patch)
From 031e17e9e5bf36e1f6ae5e9b1b3ae15b14f60088 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Mon, 17 Jun 2024 14:17:22 +0530
Subject: [PATCH v3 06/11] Refactor: split verify_backup_file() function.
Separate the manifest entry verification code into a new function.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 37 ++++++++++++++++-------
1 file changed, 26 insertions(+), 11 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 134d575d738..bb732bf06ca 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -619,6 +619,27 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Check the backup manifest entry for this file. */
+ m = verify_manifest_entry(context, relpath, sb.st_size);
+
+ /*
+ * Validate the manifest system identifier, not available in manifest
+ * version 1.
+ */
+ if (context->manifest->version != 1 &&
+ strcmp(relpath, "global/pg_control") == 0 &&
+ m != NULL && m->matched && !m->bad)
+ verify_control_file(fullpath, context->manifest->system_identifier);
+}
+
+/*
+ * Verify file and its size entry in the manifest.
+ */
+manifest_file *
+verify_manifest_entry(verifier_context *context, char *relpath, size_t filesize)
+{
+ manifest_file *m;
+
/* Check whether there's an entry in the manifest hash. */
m = manifest_files_lookup(context->manifest->files, relpath);
if (m == NULL)
@@ -626,29 +647,21 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
report_backup_error(context,
"\"%s\" is present on disk but not in the manifest",
relpath);
- return;
+ return NULL;
}
/* Flag this entry as having been encountered in the filesystem. */
m->matched = true;
/* Check that the size matches. */
- if (m->size != sb.st_size)
+ if (m->size != filesize)
{
report_backup_error(context,
"\"%s\" has size %lld on disk but size %zu in the manifest",
- relpath, (long long int) sb.st_size, m->size);
+ relpath, (long long int) filesize, m->size);
m->bad = true;
}
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0)
- verify_control_file(fullpath, context->manifest->system_identifier);
-
/* Update statistics for progress report, if necessary */
if (context->show_progress && !context->skip_checksums &&
should_verify_checksum(m))
@@ -660,6 +673,8 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
* afterwards verify the checksums. That's because computing checksums may
* take a while, and we'd like to report more obvious problems quickly.
*/
+
+ return m;
}
/*
--
2.18.0
Attachment: v3-0005-Refactor-move-some-part-of-pg_verifybackup.c-to-p.patch (application/x-patch)
From cf2e5f89557cb6716760f49be5e78cda9b066914 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 10:32:11 +0530
Subject: [PATCH v3 05/11] Refactor: move some part of pg_verifybackup.c to
pg_verifybackup.h
---
src/bin/pg_verifybackup/pg_verifybackup.c | 97 +-----------------
src/bin/pg_verifybackup/pg_verifybackup.h | 114 ++++++++++++++++++++++
2 files changed, 119 insertions(+), 92 deletions(-)
create mode 100644 src/bin/pg_verifybackup/pg_verifybackup.h
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 74b3a66835a..134d575d738 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,12 +18,11 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/controldata_utils.h"
-#include "common/hashfn_unstable.h"
#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
+#include "pg_verifybackup.h"
#include "pgtime.h"
/*
@@ -40,85 +39,6 @@
*/
#define ESTIMATED_BYTES_PER_MANIFEST_LINE 100
-/*
- * How many bytes should we try to read from a file at once?
- */
-#define READ_CHUNK_SIZE (128 * 1024)
-
-/*
- * Each file described by the manifest file is parsed to produce an object
- * like this.
- */
-typedef struct manifest_file
-{
- uint32 status; /* hash status */
- const char *pathname;
- size_t size;
- pg_checksum_type checksum_type;
- int checksum_length;
- uint8 *checksum_payload;
- bool matched;
- bool bad;
-} manifest_file;
-
-#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
-
-/*
- * Define a hash table which we can use to store information about the files
- * mentioned in the backup manifest.
- */
-#define SH_PREFIX manifest_files
-#define SH_ELEMENT_TYPE manifest_file
-#define SH_KEY_TYPE const char *
-#define SH_KEY pathname
-#define SH_HASH_KEY(tb, key) hash_string(key)
-#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
-#define SH_SCOPE static inline
-#define SH_RAW_ALLOCATOR pg_malloc0
-#define SH_DECLARE
-#define SH_DEFINE
-#include "lib/simplehash.h"
-
-/*
- * Each WAL range described by the manifest file is parsed to produce an
- * object like this.
- */
-typedef struct manifest_wal_range
-{
- TimeLineID tli;
- XLogRecPtr start_lsn;
- XLogRecPtr end_lsn;
- struct manifest_wal_range *next;
- struct manifest_wal_range *prev;
-} manifest_wal_range;
-
-/*
- * All the data parsed from a backup_manifest file.
- */
-typedef struct manifest_data
-{
- int version;
- uint64 system_identifier;
- manifest_files_hash *files;
- manifest_wal_range *first_wal_range;
- manifest_wal_range *last_wal_range;
-} manifest_data;
-
-/*
- * All of the context information we need while checking a backup manifest.
- */
-typedef struct verifier_context
-{
- manifest_data *manifest;
- char *backup_directory;
- SimpleStringList ignore_list;
- bool show_progress;
- bool skip_checksums;
- bool exit_on_error;
- bool saw_any_error;
-} verifier_context;
-
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -152,13 +72,6 @@ static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
-static void report_backup_error(verifier_context *context,
- const char *pg_restrict fmt,...)
- pg_attribute_printf(2, 3);
-static void report_fatal_error(const char *pg_restrict fmt,...)
- pg_attribute_printf(1, 2) pg_attribute_noreturn();
-static bool should_ignore_relpath(verifier_context *context, const char *relpath);
-
static void progress_report(verifier_context *context, bool finished);
static void usage(void);
@@ -554,7 +467,7 @@ verifybackup_per_file_cb(JsonManifestParseContext *context,
bool found;
/* Make a new entry in the hash table for this file. */
- m = manifest_files_insert(ht, pathname, &found);
+ m = manifest_files_insert(ht, (char *) pathname, &found);
if (found)
report_fatal_error("duplicate path name in backup manifest: \"%s\"",
pathname);
@@ -978,7 +891,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
* Update the context to indicate that we saw an error, and exit if the
* context says we should.
*/
-static void
+void
report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
{
va_list ap;
@@ -995,7 +908,7 @@ report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
/*
* Report a fatal error and exit
*/
-static void
+void
report_fatal_error(const char *pg_restrict fmt,...)
{
va_list ap;
@@ -1014,7 +927,7 @@ report_fatal_error(const char *pg_restrict fmt,...)
* Note that by "prefix" we mean a parent directory; for this purpose,
* "aa/bb" is not a prefix of "aa/bbb", but it is a prefix of "aa/bb/cc".
*/
-static bool
+bool
should_ignore_relpath(verifier_context *context, const char *relpath)
{
SimpleStringListCell *cell;
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
new file mode 100644
index 00000000000..f4a93d3f137
--- /dev/null
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -0,0 +1,114 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_verifybackup.h
+ * Verify a backup against a backup manifest.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/pg_verifybackup/pg_verifybackup.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef PG_VERIFYBACKUP_H
+#define PG_VERIFYBACKUP_H
+
+#include "common/controldata_utils.h"
+#include "common/hashfn_unstable.h"
+#include "common/parse_manifest.h"
+#include "fe_utils/simple_list.h"
+
+extern bool show_progress;
+extern bool skip_checksums;
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/*
+ * Each file described by the manifest file is parsed to produce an object
+ * like this.
+ */
+typedef struct manifest_file
+{
+ uint32 status; /* hash status */
+ char *pathname;
+ size_t size;
+ pg_checksum_type checksum_type;
+ int checksum_length;
+ uint8 *checksum_payload;
+ bool matched;
+ bool bad;
+} manifest_file;
+
+#define should_verify_checksum(m) \
+ (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+
+/*
+ * Define a hash table which we can use to store information about the files
+ * mentioned in the backup manifest.
+ */
+#define SH_PREFIX manifest_files
+#define SH_ELEMENT_TYPE manifest_file
+#define SH_KEY_TYPE char *
+#define SH_KEY pathname
+#define SH_HASH_KEY(tb, key) hash_string(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * Each WAL range described by the manifest file is parsed to produce an
+ * object like this.
+ */
+typedef struct manifest_wal_range
+{
+ TimeLineID tli;
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+ struct manifest_wal_range *next;
+ struct manifest_wal_range *prev;
+} manifest_wal_range;
+
+/*
+ * All the data parsed from a backup_manifest file.
+ */
+typedef struct manifest_data
+{
+ int version;
+ uint64 system_identifier;
+ manifest_files_hash *files;
+ manifest_wal_range *first_wal_range;
+ manifest_wal_range *last_wal_range;
+} manifest_data;
+
+/*
+ * All of the context information we need while checking a backup manifest.
+ */
+typedef struct verifier_context
+{
+ manifest_data *manifest;
+ char *backup_directory;
+ SimpleStringList ignore_list;
+ bool show_progress;
+ bool skip_checksums;
+ bool exit_on_error;
+ bool saw_any_error;
+} verifier_context;
+
+extern manifest_file *verify_manifest_entry(verifier_context *context,
+ char *relpath, size_t filesize);
+
+extern void report_backup_error(verifier_context *context,
+ const char *pg_restrict fmt,...)
+ pg_attribute_printf(2, 3);
+extern void report_fatal_error(const char *pg_restrict fmt,...)
+ pg_attribute_printf(1, 2) pg_attribute_noreturn();
+extern bool should_ignore_relpath(verifier_context *context, const char *relpath);
+
+#endif /* PG_VERIFYBACKUP_H */
--
2.18.0
Attachment: v3-0004-Refactor-move-show_progress-and-skip_checksums-to.patch (application/x-patch)
From 67985f9eb69510800cd53333ce456a72363499d0 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 11:43:52 +0530
Subject: [PATCH v3 04/11] Refactor: move show_progress and skip_checksums to
verifier_context struct
---
src/bin/pg_verifybackup/pg_verifybackup.c | 29 +++++++++++------------
1 file changed, 14 insertions(+), 15 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d77e70fbe38..74b3a66835a 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -113,6 +113,8 @@ typedef struct verifier_context
manifest_data *manifest;
char *backup_directory;
SimpleStringList ignore_list;
+ bool show_progress;
+ bool skip_checksums;
bool exit_on_error;
bool saw_any_error;
} verifier_context;
@@ -157,15 +159,11 @@ static void report_fatal_error(const char *pg_restrict fmt,...)
pg_attribute_printf(1, 2) pg_attribute_noreturn();
static bool should_ignore_relpath(verifier_context *context, const char *relpath);
-static void progress_report(bool finished);
+static void progress_report(verifier_context *context, bool finished);
static void usage(void);
static const char *progname;
-/* options */
-static bool show_progress = false;
-static bool skip_checksums = false;
-
/* Progress indicators */
static uint64 total_size = 0;
static uint64 done_size = 0;
@@ -260,13 +258,13 @@ main(int argc, char **argv)
no_parse_wal = true;
break;
case 'P':
- show_progress = true;
+ context.show_progress = true;
break;
case 'q':
quiet = true;
break;
case 's':
- skip_checksums = true;
+ context.skip_checksums = true;
break;
case 'w':
wal_directory = pstrdup(optarg);
@@ -299,7 +297,7 @@ main(int argc, char **argv)
}
/* Complain if the specified arguments conflict */
- if (show_progress && quiet)
+ if (context.show_progress && quiet)
pg_fatal("cannot specify both %s and %s",
"-P/--progress", "-q/--quiet");
@@ -363,7 +361,7 @@ main(int argc, char **argv)
* Now do the expensive work of verifying file checksums, unless we were
* told to skip it.
*/
- if (!skip_checksums)
+ if (!context.skip_checksums)
verify_backup_checksums(&context);
/*
@@ -739,7 +737,8 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
verify_control_file(fullpath, context->manifest->system_identifier);
/* Update statistics for progress report, if necessary */
- if (show_progress && !skip_checksums && should_verify_checksum(m))
+ if (context->show_progress && !context->skip_checksums &&
+ should_verify_checksum(m))
total_size += m->size;
/*
@@ -815,7 +814,7 @@ verify_backup_checksums(verifier_context *context)
manifest_file *m;
uint8 *buffer;
- progress_report(false);
+ progress_report(context, false);
buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
@@ -841,7 +840,7 @@ verify_backup_checksums(verifier_context *context)
pfree(buffer);
- progress_report(true);
+ progress_report(context, true);
}
/*
@@ -890,7 +889,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
/* Report progress */
done_size += rc;
- progress_report(false);
+ progress_report(context, false);
}
if (rc < 0)
report_backup_error(context, "could not read file \"%s\": %m",
@@ -1045,7 +1044,7 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
* is moved to the next line.
*/
static void
-progress_report(bool finished)
+progress_report(verifier_context *context, bool finished)
{
static pg_time_t last_progress_report = 0;
pg_time_t now;
@@ -1053,7 +1052,7 @@ progress_report(bool finished)
char totalsize_str[32];
char donesize_str[32];
- if (!show_progress)
+ if (!context->show_progress)
return;
now = time(NULL);
--
2.18.0
Attachment: v3-0003-Refactor-move-astreamer-files-to-fe_utils-to-make.patch (application/x-patch)
From 6577e6c506cba5d91721ba9f55ea9400fc1a72c0 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 11:20:52 +0530
Subject: [PATCH v3 03/11] Refactor: move astreamer* files to fe_utils to make
them commonly available.
To make it accessible to other code, we need to move the ASTREAMER
code (previously known as BBSTREAMER) to a common location. The
appropriate place would be src/fe_utils, as it is frontend
infrastructure intended for shared use.
---
meson.build | 2 +-
src/bin/pg_basebackup/Makefile | 7 +------
src/bin/pg_basebackup/astreamer_inject.h | 2 +-
src/bin/pg_basebackup/meson.build | 5 -----
src/fe_utils/Makefile | 5 +++++
src/{bin/pg_basebackup => fe_utils}/astreamer_file.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_gzip.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_lz4.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_tar.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_zstd.c | 2 +-
src/fe_utils/meson.build | 5 +++++
src/{bin/pg_basebackup => include/fe_utils}/astreamer.h | 0
12 files changed, 18 insertions(+), 18 deletions(-)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_file.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_gzip.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_lz4.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_tar.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_zstd.c (99%)
rename src/{bin/pg_basebackup => include/fe_utils}/astreamer.h (100%)
diff --git a/meson.build b/meson.build
index 7de0371226d..f7a5d2aea9a 100644
--- a/meson.build
+++ b/meson.build
@@ -3027,7 +3027,7 @@ frontend_common_code = declare_dependency(
compile_args: ['-DFRONTEND'],
include_directories: [postgres_inc],
sources: generated_headers,
- dependencies: [os_deps, zlib, zstd],
+ dependencies: [os_deps, zlib, zstd, lz4],
)
backend_common_code = declare_dependency(
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index a71af2d48a7..f1e73058b23 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -37,12 +37,7 @@ OBJS = \
BBOBJS = \
pg_basebackup.o \
- astreamer_file.o \
- astreamer_gzip.o \
- astreamer_inject.o \
- astreamer_lz4.o \
- astreamer_tar.o \
- astreamer_zstd.o
+ astreamer_inject.o
all: pg_basebackup pg_createsubscriber pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/astreamer_inject.h b/src/bin/pg_basebackup/astreamer_inject.h
index 8504b3f5e0d..aeed533862b 100644
--- a/src/bin/pg_basebackup/astreamer_inject.h
+++ b/src/bin/pg_basebackup/astreamer_inject.h
@@ -12,7 +12,7 @@
#ifndef ASTREAMER_INJECT_H
#define ASTREAMER_INJECT_H
-#include "astreamer.h"
+#include "fe_utils/astreamer.h"
#include "pqexpbuffer.h"
extern astreamer *astreamer_recovery_injector_new(astreamer *next,
diff --git a/src/bin/pg_basebackup/meson.build b/src/bin/pg_basebackup/meson.build
index a68dbd7837d..9101fc18438 100644
--- a/src/bin/pg_basebackup/meson.build
+++ b/src/bin/pg_basebackup/meson.build
@@ -1,12 +1,7 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
common_sources = files(
- 'astreamer_file.c',
- 'astreamer_gzip.c',
'astreamer_inject.c',
- 'astreamer_lz4.c',
- 'astreamer_tar.c',
- 'astreamer_zstd.c',
'receivelog.c',
'streamutil.c',
'walmethods.c',
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index 946c05258f0..2694be4b859 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -21,6 +21,11 @@ override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) $(CPPFLAGS)
OBJS = \
archive.o \
+ astreamer_file.o \
+ astreamer_gzip.o \
+ astreamer_lz4.o \
+ astreamer_tar.o \
+ astreamer_zstd.o \
cancel.o \
conditional.o \
connect_utils.o \
diff --git a/src/bin/pg_basebackup/astreamer_file.c b/src/fe_utils/astreamer_file.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_file.c
rename to src/fe_utils/astreamer_file.c
index 2742385e103..13d1192c6e6 100644
--- a/src/bin/pg_basebackup/astreamer_file.c
+++ b/src/fe_utils/astreamer_file.c
@@ -13,10 +13,10 @@
#include <unistd.h>
-#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/astreamer.h"
typedef struct astreamer_plain_writer
{
diff --git a/src/bin/pg_basebackup/astreamer_gzip.c b/src/fe_utils/astreamer_gzip.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_gzip.c
rename to src/fe_utils/astreamer_gzip.c
index 6f7c27afbbc..dd28defac7b 100644
--- a/src/bin/pg_basebackup/astreamer_gzip.c
+++ b/src/fe_utils/astreamer_gzip.c
@@ -17,10 +17,10 @@
#include <zlib.h>
#endif
-#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/astreamer.h"
#ifdef HAVE_LIBZ
typedef struct astreamer_gzip_writer
diff --git a/src/bin/pg_basebackup/astreamer_lz4.c b/src/fe_utils/astreamer_lz4.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_lz4.c
rename to src/fe_utils/astreamer_lz4.c
index 1c40d7d8ad5..d8b2a367e47 100644
--- a/src/bin/pg_basebackup/astreamer_lz4.c
+++ b/src/fe_utils/astreamer_lz4.c
@@ -17,10 +17,10 @@
#include <lz4frame.h>
#endif
-#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/astreamer.h"
#ifdef USE_LZ4
typedef struct astreamer_lz4_frame
diff --git a/src/bin/pg_basebackup/astreamer_tar.c b/src/fe_utils/astreamer_tar.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_tar.c
rename to src/fe_utils/astreamer_tar.c
index 673690cd18f..f5d3562d280 100644
--- a/src/bin/pg_basebackup/astreamer_tar.c
+++ b/src/fe_utils/astreamer_tar.c
@@ -23,8 +23,8 @@
#include <time.h>
-#include "astreamer.h"
#include "common/logging.h"
+#include "fe_utils/astreamer.h"
#include "pgtar.h"
typedef struct astreamer_tar_parser
diff --git a/src/bin/pg_basebackup/astreamer_zstd.c b/src/fe_utils/astreamer_zstd.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_zstd.c
rename to src/fe_utils/astreamer_zstd.c
index 58dc679ef99..45f6cb67363 100644
--- a/src/bin/pg_basebackup/astreamer_zstd.c
+++ b/src/fe_utils/astreamer_zstd.c
@@ -17,8 +17,8 @@
#include <zstd.h>
#endif
-#include "astreamer.h"
#include "common/logging.h"
+#include "fe_utils/astreamer.h"
#ifdef USE_ZSTD
diff --git a/src/fe_utils/meson.build b/src/fe_utils/meson.build
index 14d0482a2cc..043021d826d 100644
--- a/src/fe_utils/meson.build
+++ b/src/fe_utils/meson.build
@@ -2,6 +2,11 @@
fe_utils_sources = files(
'archive.c',
+ 'astreamer_file.c',
+ 'astreamer_gzip.c',
+ 'astreamer_lz4.c',
+ 'astreamer_tar.c',
+ 'astreamer_zstd.c',
'cancel.c',
'conditional.c',
'connect_utils.c',
diff --git a/src/bin/pg_basebackup/astreamer.h b/src/include/fe_utils/astreamer.h
similarity index 100%
rename from src/bin/pg_basebackup/astreamer.h
rename to src/include/fe_utils/astreamer.h
--
2.18.0
Attachment: v3-0002-Refactor-Add-astreamer_inject.h-and-move-related-.patch (application/x-patch)
From 90ea115b1a3d9ffcf2d58f9c4a63663245679d1e Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 17 Jul 2024 14:23:27 +0530
Subject: [PATCH v3 02/11] Refactor: Add astreamer_inject.h and move related
declarations to it.
---
src/bin/pg_basebackup/astreamer.h | 6 ------
src/bin/pg_basebackup/astreamer_inject.c | 2 +-
src/bin/pg_basebackup/astreamer_inject.h | 24 ++++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 2 +-
4 files changed, 26 insertions(+), 8 deletions(-)
create mode 100644 src/bin/pg_basebackup/astreamer_inject.h
diff --git a/src/bin/pg_basebackup/astreamer.h b/src/bin/pg_basebackup/astreamer.h
index 6b0047418bb..9d0a8c4d0c2 100644
--- a/src/bin/pg_basebackup/astreamer.h
+++ b/src/bin/pg_basebackup/astreamer.h
@@ -217,10 +217,4 @@ extern astreamer *astreamer_tar_parser_new(astreamer *next);
extern astreamer *astreamer_tar_terminator_new(astreamer *next);
extern astreamer *astreamer_tar_archiver_new(astreamer *next);
-extern astreamer *astreamer_recovery_injector_new(astreamer *next,
- bool is_recovery_guc_supported,
- PQExpBuffer recoveryconfcontents);
-extern void astreamer_inject_file(astreamer *streamer, char *pathname,
- char *data, int len);
-
#endif
diff --git a/src/bin/pg_basebackup/astreamer_inject.c b/src/bin/pg_basebackup/astreamer_inject.c
index 7f1decded8d..4ad8381f102 100644
--- a/src/bin/pg_basebackup/astreamer_inject.c
+++ b/src/bin/pg_basebackup/astreamer_inject.c
@@ -11,7 +11,7 @@
#include "postgres_fe.h"
-#include "astreamer.h"
+#include "astreamer_inject.h"
#include "common/file_perm.h"
#include "common/logging.h"
diff --git a/src/bin/pg_basebackup/astreamer_inject.h b/src/bin/pg_basebackup/astreamer_inject.h
new file mode 100644
index 00000000000..8504b3f5e0d
--- /dev/null
+++ b/src/bin/pg_basebackup/astreamer_inject.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_inject.h
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/astreamer_inject.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef ASTREAMER_INJECT_H
+#define ASTREAMER_INJECT_H
+
+#include "astreamer.h"
+#include "pqexpbuffer.h"
+
+extern astreamer *astreamer_recovery_injector_new(astreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents);
+extern void astreamer_inject_file(astreamer *streamer, char *pathname,
+ char *data, int len);
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 4179b064cbc..1e753e40c97 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -26,7 +26,7 @@
#endif
#include "access/xlog_internal.h"
-#include "astreamer.h"
+#include "astreamer_inject.h"
#include "backup/basebackup.h"
#include "common/compression.h"
#include "common/file_perm.h"
--
2.18.0
Attachment: v3-0001-Refactor-Rename-all-bbstreamer-references-to-astr.patch (application/x-patch)
From ea8d3d56a37ded2e24535577e5c97f43a782b04d Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 09:39:32 +0530
Subject: [PATCH v3 01/11] Refactor: Rename all bbstreamer references to
astreamer.
BBSTREAMER is specific to pg_basebackup; we need a more generalized
name so it can be placed in a common area, making it accessible for
other modules. Renaming it to ASTREAMER, short for ARCHIVE STREAMER,
makes it more general.
---
src/bin/pg_basebackup/Makefile | 12 +-
src/bin/pg_basebackup/astreamer.h | 226 +++++++++++++
.../{bbstreamer_file.c => astreamer_file.c} | 148 ++++----
.../{bbstreamer_gzip.c => astreamer_gzip.c} | 154 ++++-----
...bbstreamer_inject.c => astreamer_inject.c} | 152 ++++-----
.../{bbstreamer_lz4.c => astreamer_lz4.c} | 172 +++++-----
.../{bbstreamer_tar.c => astreamer_tar.c} | 316 +++++++++---------
.../{bbstreamer_zstd.c => astreamer_zstd.c} | 160 ++++-----
src/bin/pg_basebackup/bbstreamer.h | 226 -------------
src/bin/pg_basebackup/meson.build | 12 +-
src/bin/pg_basebackup/nls.mk | 12 +-
src/bin/pg_basebackup/pg_basebackup.c | 74 ++--
src/tools/pgindent/typedefs.list | 26 +-
13 files changed, 845 insertions(+), 845 deletions(-)
create mode 100644 src/bin/pg_basebackup/astreamer.h
rename src/bin/pg_basebackup/{bbstreamer_file.c => astreamer_file.c} (69%)
rename src/bin/pg_basebackup/{bbstreamer_gzip.c => astreamer_gzip.c} (62%)
rename src/bin/pg_basebackup/{bbstreamer_inject.c => astreamer_inject.c} (53%)
rename src/bin/pg_basebackup/{bbstreamer_lz4.c => astreamer_lz4.c} (69%)
rename src/bin/pg_basebackup/{bbstreamer_tar.c => astreamer_tar.c} (50%)
rename src/bin/pg_basebackup/{bbstreamer_zstd.c => astreamer_zstd.c} (64%)
delete mode 100644 src/bin/pg_basebackup/bbstreamer.h
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 26c53e473f5..a71af2d48a7 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -37,12 +37,12 @@ OBJS = \
BBOBJS = \
pg_basebackup.o \
- bbstreamer_file.o \
- bbstreamer_gzip.o \
- bbstreamer_inject.o \
- bbstreamer_lz4.o \
- bbstreamer_tar.o \
- bbstreamer_zstd.o
+ astreamer_file.o \
+ astreamer_gzip.o \
+ astreamer_inject.o \
+ astreamer_lz4.o \
+ astreamer_tar.o \
+ astreamer_zstd.o
all: pg_basebackup pg_createsubscriber pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/astreamer.h b/src/bin/pg_basebackup/astreamer.h
new file mode 100644
index 00000000000..6b0047418bb
--- /dev/null
+++ b/src/bin/pg_basebackup/astreamer.h
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer.h
+ *
+ * Each tar archive returned by the server is passed to one or more
+ * astreamer objects for further processing. The astreamer may do
+ * something simple, like write the archive to a file, perhaps after
+ * compressing it, but it can also do more complicated things, like
+ * annotating the byte stream to indicate which parts of the data
+ * correspond to tar headers or trailing padding, vs. which parts are
+ * payload data. A subsequent astreamer may use this information to
+ * make further decisions about how to process the data; for example,
+ * it might choose to modify the archive contents.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/astreamer.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef ASTREAMER_H
+#define ASTREAMER_H
+
+#include "common/compression.h"
+#include "lib/stringinfo.h"
+#include "pqexpbuffer.h"
+
+struct astreamer;
+struct astreamer_ops;
+typedef struct astreamer astreamer;
+typedef struct astreamer_ops astreamer_ops;
+
+/*
+ * Each chunk of archive data passed to a astreamer is classified into one
+ * of these categories. When data is first received from the remote server,
+ * each chunk will be categorized as ASTREAMER_UNKNOWN, and the chunks will
+ * be of whatever size the remote server chose to send.
+ *
+ * If the archive is parsed (e.g. see astreamer_tar_parser_new()), then all
+ * chunks should be labelled as one of the other types listed here. In
+ * addition, there should be exactly one ASTREAMER_MEMBER_HEADER chunk and
+ * exactly one ASTREAMER_MEMBER_TRAILER chunk per archive member, even if
+ * that means a zero-length call. There can be any number of
+ * ASTREAMER_MEMBER_CONTENTS chunks in between those calls. There
+ * should exactly ASTREAMER_ARCHIVE_TRAILER chunk, and it should follow the
+ * last ASTREAMER_MEMBER_TRAILER chunk.
+ *
+ * In theory, we could need other classifications here, such as a way of
+ * indicating an archive header, but the "tar" format doesn't need anything
+ * else, so for the time being there's no point.
+ */
+typedef enum
+{
+ ASTREAMER_UNKNOWN,
+ ASTREAMER_MEMBER_HEADER,
+ ASTREAMER_MEMBER_CONTENTS,
+ ASTREAMER_MEMBER_TRAILER,
+ ASTREAMER_ARCHIVE_TRAILER,
+} astreamer_archive_context;
+
+/*
+ * Each chunk of data that is classified as ASTREAMER_MEMBER_HEADER,
+ * ASTREAMER_MEMBER_CONTENTS, or ASTREAMER_MEMBER_TRAILER should also
+ * pass a pointer to an instance of this struct. The details are expected
+ * to be present in the archive header and used to fill the struct, after
+ * which all subsequent calls for the same archive member are expected to
+ * pass the same details.
+ */
+typedef struct
+{
+ char pathname[MAXPGPATH];
+ pgoff_t size;
+ mode_t mode;
+ uid_t uid;
+ gid_t gid;
+ bool is_directory;
+ bool is_link;
+ char linktarget[MAXPGPATH];
+} astreamer_member;
+
+/*
+ * Generally, each type of astreamer will define its own struct, but the
+ * first element should be 'astreamer base'. A astreamer that does not
+ * require any additional private data could use this structure directly.
+ *
+ * bbs_ops is a pointer to the astreamer_ops object which contains the
+ * function pointers appropriate to this type of astreamer.
+ *
+ * bbs_next is a pointer to the successor astreamer, for those types of
+ * astreamer which forward data to a successor. It need not be used and
+ * should be set to NULL when not relevant.
+ *
+ * bbs_buffer is a buffer for accumulating data for temporary storage. Each
+ * type of astreamer makes its own decisions about whether and how to use
+ * this buffer.
+ */
+struct astreamer
+{
+ const astreamer_ops *bbs_ops;
+ astreamer *bbs_next;
+ StringInfoData bbs_buffer;
+};
+
+/*
+ * There are three callbacks for a astreamer. The 'content' callback is
+ * called repeatedly, as described in the astreamer_archive_context comments.
+ * Then, the 'finalize' callback is called once at the end, to give the
+ * astreamer a chance to perform cleanup such as closing files. Finally,
+ * because this code is running in a frontend environment where, as of this
+ * writing, there are no memory contexts, the 'free' callback is called to
+ * release memory. These callbacks should always be invoked using the static
+ * inline functions defined below.
+ */
+struct astreamer_ops
+{
+ void (*content) (astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+ void (*finalize) (astreamer *streamer);
+ void (*free) (astreamer *streamer);
+};
+
+/* Send some content to a astreamer. */
+static inline void
+astreamer_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->content(streamer, member, data, len, context);
+}
+
+/* Finalize a astreamer. */
+static inline void
+astreamer_finalize(astreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->finalize(streamer);
+}
+
+/* Free a astreamer. */
+static inline void
+astreamer_free(astreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->free(streamer);
+}
+
+/*
+ * This is a convenience method for use when implementing a astreamer; it is
+ * not for use by outside callers. It adds the amount of data specified by
+ * 'nbytes' to the astreamer's buffer and adjusts '*len' and '*data'
+ * accordingly.
+ */
+static inline void
+astreamer_buffer_bytes(astreamer *streamer, const char **data, int *len,
+ int nbytes)
+{
+ Assert(nbytes <= *len);
+
+ appendBinaryStringInfo(&streamer->bbs_buffer, *data, nbytes);
+ *len -= nbytes;
+ *data += nbytes;
+}
+
+/*
+ * This is a convenience method for use when implementing a astreamer; it is
+ * not for use by outsider callers. It attempts to add enough data to the
+ * astreamer's buffer to reach a length of target_bytes and adjusts '*len'
+ * and '*data' accordingly. It returns true if the target length has been
+ * reached and false otherwise.
+ */
+static inline bool
+astreamer_buffer_until(astreamer *streamer, const char **data, int *len,
+ int target_bytes)
+{
+ int buflen = streamer->bbs_buffer.len;
+
+ if (buflen >= target_bytes)
+ {
+ /* Target length already reached; nothing to do. */
+ return true;
+ }
+
+ if (buflen + *len < target_bytes)
+ {
+ /* Not enough data to reach target length; buffer all of it. */
+ astreamer_buffer_bytes(streamer, data, len, *len);
+ return false;
+ }
+
+ /* Buffer just enough to reach the target length. */
+ astreamer_buffer_bytes(streamer, data, len, target_bytes - buflen);
+ return true;
+}
+
+/*
+ * Functions for creating astreamer objects of various types. See the header
+ * comments for each of these functions for details.
+ */
+extern astreamer *astreamer_plain_writer_new(char *pathname, FILE *file);
+extern astreamer *astreamer_gzip_writer_new(char *pathname, FILE *file,
+ pg_compress_specification *compress);
+extern astreamer *astreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *));
+
+extern astreamer *astreamer_gzip_decompressor_new(astreamer *next);
+extern astreamer *astreamer_lz4_compressor_new(astreamer *next,
+ pg_compress_specification *compress);
+extern astreamer *astreamer_lz4_decompressor_new(astreamer *next);
+extern astreamer *astreamer_zstd_compressor_new(astreamer *next,
+ pg_compress_specification *compress);
+extern astreamer *astreamer_zstd_decompressor_new(astreamer *next);
+extern astreamer *astreamer_tar_parser_new(astreamer *next);
+extern astreamer *astreamer_tar_terminator_new(astreamer *next);
+extern astreamer *astreamer_tar_archiver_new(astreamer *next);
+
+extern astreamer *astreamer_recovery_injector_new(astreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents);
+extern void astreamer_inject_file(astreamer *streamer, char *pathname,
+ char *data, int len);
+
+#endif
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/astreamer_file.c
similarity index 69%
rename from src/bin/pg_basebackup/bbstreamer_file.c
rename to src/bin/pg_basebackup/astreamer_file.c
index bab6cd4a6b1..2742385e103 100644
--- a/src/bin/pg_basebackup/bbstreamer_file.c
+++ b/src/bin/pg_basebackup/astreamer_file.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_file.c
+ * astreamer_file.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_file.c
+ * src/bin/pg_basebackup/astreamer_file.c
*-------------------------------------------------------------------------
*/
@@ -13,60 +13,60 @@
#include <unistd.h>
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
-typedef struct bbstreamer_plain_writer
+typedef struct astreamer_plain_writer
{
- bbstreamer base;
+ astreamer base;
char *pathname;
FILE *file;
bool should_close_file;
-} bbstreamer_plain_writer;
+} astreamer_plain_writer;
-typedef struct bbstreamer_extractor
+typedef struct astreamer_extractor
{
- bbstreamer base;
+ astreamer base;
char *basepath;
const char *(*link_map) (const char *);
void (*report_output_file) (const char *);
char filename[MAXPGPATH];
FILE *file;
-} bbstreamer_extractor;
+} astreamer_extractor;
-static void bbstreamer_plain_writer_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_plain_writer_finalize(bbstreamer *streamer);
-static void bbstreamer_plain_writer_free(bbstreamer *streamer);
+static void astreamer_plain_writer_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_plain_writer_finalize(astreamer *streamer);
+static void astreamer_plain_writer_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_plain_writer_ops = {
- .content = bbstreamer_plain_writer_content,
- .finalize = bbstreamer_plain_writer_finalize,
- .free = bbstreamer_plain_writer_free
+static const astreamer_ops astreamer_plain_writer_ops = {
+ .content = astreamer_plain_writer_content,
+ .finalize = astreamer_plain_writer_finalize,
+ .free = astreamer_plain_writer_free
};
-static void bbstreamer_extractor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_extractor_finalize(bbstreamer *streamer);
-static void bbstreamer_extractor_free(bbstreamer *streamer);
+static void astreamer_extractor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_extractor_finalize(astreamer *streamer);
+static void astreamer_extractor_free(astreamer *streamer);
static void extract_directory(const char *filename, mode_t mode);
static void extract_link(const char *filename, const char *linktarget);
static FILE *create_file_for_extract(const char *filename, mode_t mode);
-static const bbstreamer_ops bbstreamer_extractor_ops = {
- .content = bbstreamer_extractor_content,
- .finalize = bbstreamer_extractor_finalize,
- .free = bbstreamer_extractor_free
+static const astreamer_ops astreamer_extractor_ops = {
+ .content = astreamer_extractor_content,
+ .finalize = astreamer_extractor_finalize,
+ .free = astreamer_extractor_free
};
/*
- * Create a bbstreamer that just writes data to a file.
+ * Create a astreamer that just writes data to a file.
*
* The caller must specify a pathname and may specify a file. The pathname is
* used for error-reporting purposes either way. If file is NULL, the pathname
@@ -74,14 +74,14 @@ static const bbstreamer_ops bbstreamer_extractor_ops = {
* for writing and closed when done. If file is not NULL, the data is written
* there.
*/
-bbstreamer *
-bbstreamer_plain_writer_new(char *pathname, FILE *file)
+astreamer *
+astreamer_plain_writer_new(char *pathname, FILE *file)
{
- bbstreamer_plain_writer *streamer;
+ astreamer_plain_writer *streamer;
- streamer = palloc0(sizeof(bbstreamer_plain_writer));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_plain_writer_ops;
+ streamer = palloc0(sizeof(astreamer_plain_writer));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_plain_writer_ops;
streamer->pathname = pstrdup(pathname);
streamer->file = file;
@@ -101,13 +101,13 @@ bbstreamer_plain_writer_new(char *pathname, FILE *file)
* Write archive content to file.
*/
static void
-bbstreamer_plain_writer_content(bbstreamer *streamer,
- bbstreamer_member *member, const char *data,
- int len, bbstreamer_archive_context context)
+astreamer_plain_writer_content(astreamer *streamer,
+ astreamer_member *member, const char *data,
+ int len, astreamer_archive_context context)
{
- bbstreamer_plain_writer *mystreamer;
+ astreamer_plain_writer *mystreamer;
- mystreamer = (bbstreamer_plain_writer *) streamer;
+ mystreamer = (astreamer_plain_writer *) streamer;
if (len == 0)
return;
@@ -128,11 +128,11 @@ bbstreamer_plain_writer_content(bbstreamer *streamer,
* the file if we opened it, but not if the caller provided it.
*/
static void
-bbstreamer_plain_writer_finalize(bbstreamer *streamer)
+astreamer_plain_writer_finalize(astreamer *streamer)
{
- bbstreamer_plain_writer *mystreamer;
+ astreamer_plain_writer *mystreamer;
- mystreamer = (bbstreamer_plain_writer *) streamer;
+ mystreamer = (astreamer_plain_writer *) streamer;
if (mystreamer->should_close_file && fclose(mystreamer->file) != 0)
pg_fatal("could not close file \"%s\": %m",
@@ -143,14 +143,14 @@ bbstreamer_plain_writer_finalize(bbstreamer *streamer)
}
/*
- * Free memory associated with this bbstreamer.
+ * Free memory associated with this astreamer.
*/
static void
-bbstreamer_plain_writer_free(bbstreamer *streamer)
+astreamer_plain_writer_free(astreamer *streamer)
{
- bbstreamer_plain_writer *mystreamer;
+ astreamer_plain_writer *mystreamer;
- mystreamer = (bbstreamer_plain_writer *) streamer;
+ mystreamer = (astreamer_plain_writer *) streamer;
Assert(!mystreamer->should_close_file);
Assert(mystreamer->base.bbs_next == NULL);
@@ -160,13 +160,13 @@ bbstreamer_plain_writer_free(bbstreamer *streamer)
}
/*
- * Create a bbstreamer that extracts an archive.
+ * Create a astreamer that extracts an archive.
*
* All pathnames in the archive are interpreted relative to basepath.
*
- * Unlike e.g. bbstreamer_plain_writer_new() we can't do anything useful here
+ * Unlike e.g. astreamer_plain_writer_new() we can't do anything useful here
* with untyped chunks; we need typed chunks which follow the rules described
- * in bbstreamer.h. Assuming we have that, we don't need to worry about the
+ * in astreamer.h. Assuming we have that, we don't need to worry about the
* original archive format; it's enough to just look at the member information
* provided and write to the corresponding file.
*
@@ -179,16 +179,16 @@ bbstreamer_plain_writer_free(bbstreamer *streamer)
* new output file. The pathname to that file is passed as an argument. If
* NULL, the call is skipped.
*/
-bbstreamer *
-bbstreamer_extractor_new(const char *basepath,
- const char *(*link_map) (const char *),
- void (*report_output_file) (const char *))
+astreamer *
+astreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *))
{
- bbstreamer_extractor *streamer;
+ astreamer_extractor *streamer;
- streamer = palloc0(sizeof(bbstreamer_extractor));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_extractor_ops;
+ streamer = palloc0(sizeof(astreamer_extractor));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_extractor_ops;
streamer->basepath = pstrdup(basepath);
streamer->link_map = link_map;
streamer->report_output_file = report_output_file;
@@ -200,19 +200,19 @@ bbstreamer_extractor_new(const char *basepath,
* Extract archive contents to the filesystem.
*/
static void
-bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_extractor_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+ astreamer_extractor *mystreamer = (astreamer_extractor *) streamer;
int fnamelen;
- Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
- Assert(context != BBSTREAMER_UNKNOWN);
+ Assert(member != NULL || context == ASTREAMER_ARCHIVE_TRAILER);
+ Assert(context != ASTREAMER_UNKNOWN);
switch (context)
{
- case BBSTREAMER_MEMBER_HEADER:
+ case ASTREAMER_MEMBER_HEADER:
Assert(mystreamer->file == NULL);
/* Prepend basepath. */
@@ -245,7 +245,7 @@ bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
mystreamer->report_output_file(mystreamer->filename);
break;
- case BBSTREAMER_MEMBER_CONTENTS:
+ case ASTREAMER_MEMBER_CONTENTS:
if (mystreamer->file == NULL)
break;
@@ -260,14 +260,14 @@ bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
}
break;
- case BBSTREAMER_MEMBER_TRAILER:
+ case ASTREAMER_MEMBER_TRAILER:
if (mystreamer->file == NULL)
break;
fclose(mystreamer->file);
mystreamer->file = NULL;
break;
- case BBSTREAMER_ARCHIVE_TRAILER:
+ case ASTREAMER_ARCHIVE_TRAILER:
break;
default:
@@ -375,10 +375,10 @@ create_file_for_extract(const char *filename, mode_t mode)
* There's nothing to do here but sanity checking.
*/
static void
-bbstreamer_extractor_finalize(bbstreamer *streamer)
+astreamer_extractor_finalize(astreamer *streamer)
{
- bbstreamer_extractor *mystreamer PG_USED_FOR_ASSERTS_ONLY
- = (bbstreamer_extractor *) streamer;
+ astreamer_extractor *mystreamer PG_USED_FOR_ASSERTS_ONLY
+ = (astreamer_extractor *) streamer;
Assert(mystreamer->file == NULL);
}
@@ -387,9 +387,9 @@ bbstreamer_extractor_finalize(bbstreamer *streamer)
* Free memory.
*/
static void
-bbstreamer_extractor_free(bbstreamer *streamer)
+astreamer_extractor_free(astreamer *streamer)
{
- bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+ astreamer_extractor *mystreamer = (astreamer_extractor *) streamer;
pfree(mystreamer->basepath);
pfree(mystreamer);
diff --git a/src/bin/pg_basebackup/bbstreamer_gzip.c b/src/bin/pg_basebackup/astreamer_gzip.c
similarity index 62%
rename from src/bin/pg_basebackup/bbstreamer_gzip.c
rename to src/bin/pg_basebackup/astreamer_gzip.c
index 0417fd9bc2c..6f7c27afbbc 100644
--- a/src/bin/pg_basebackup/bbstreamer_gzip.c
+++ b/src/bin/pg_basebackup/astreamer_gzip.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_gzip.c
+ * astreamer_gzip.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_gzip.c
+ * src/bin/pg_basebackup/astreamer_gzip.c
*-------------------------------------------------------------------------
*/
@@ -17,74 +17,74 @@
#include <zlib.h>
#endif
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
#ifdef HAVE_LIBZ
-typedef struct bbstreamer_gzip_writer
+typedef struct astreamer_gzip_writer
{
- bbstreamer base;
+ astreamer base;
char *pathname;
gzFile gzfile;
-} bbstreamer_gzip_writer;
+} astreamer_gzip_writer;
-typedef struct bbstreamer_gzip_decompressor
+typedef struct astreamer_gzip_decompressor
{
- bbstreamer base;
+ astreamer base;
z_stream zstream;
size_t bytes_written;
-} bbstreamer_gzip_decompressor;
+} astreamer_gzip_decompressor;
-static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
-static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
+static void astreamer_gzip_writer_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_gzip_writer_finalize(astreamer *streamer);
+static void astreamer_gzip_writer_free(astreamer *streamer);
static const char *get_gz_error(gzFile gzf);
-static const bbstreamer_ops bbstreamer_gzip_writer_ops = {
- .content = bbstreamer_gzip_writer_content,
- .finalize = bbstreamer_gzip_writer_finalize,
- .free = bbstreamer_gzip_writer_free
+static const astreamer_ops astreamer_gzip_writer_ops = {
+ .content = astreamer_gzip_writer_content,
+ .finalize = astreamer_gzip_writer_finalize,
+ .free = astreamer_gzip_writer_free
};
-static void bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_gzip_decompressor_finalize(bbstreamer *streamer);
-static void bbstreamer_gzip_decompressor_free(bbstreamer *streamer);
+static void astreamer_gzip_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_gzip_decompressor_finalize(astreamer *streamer);
+static void astreamer_gzip_decompressor_free(astreamer *streamer);
static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
static void gzip_pfree(void *opaque, void *address);
-static const bbstreamer_ops bbstreamer_gzip_decompressor_ops = {
- .content = bbstreamer_gzip_decompressor_content,
- .finalize = bbstreamer_gzip_decompressor_finalize,
- .free = bbstreamer_gzip_decompressor_free
+static const astreamer_ops astreamer_gzip_decompressor_ops = {
+ .content = astreamer_gzip_decompressor_content,
+ .finalize = astreamer_gzip_decompressor_finalize,
+ .free = astreamer_gzip_decompressor_free
};
#endif
/*
- * Create a bbstreamer that just compresses data using gzip, and then writes
+ * Create a astreamer that just compresses data using gzip, and then writes
* it to a file.
*
- * As in the case of bbstreamer_plain_writer_new, pathname is always used
+ * As in the case of astreamer_plain_writer_new, pathname is always used
* for error reporting purposes; if file is NULL, it is also the opened and
* closed so that the data may be written there.
*/
-bbstreamer *
-bbstreamer_gzip_writer_new(char *pathname, FILE *file,
- pg_compress_specification *compress)
+astreamer *
+astreamer_gzip_writer_new(char *pathname, FILE *file,
+ pg_compress_specification *compress)
{
#ifdef HAVE_LIBZ
- bbstreamer_gzip_writer *streamer;
+ astreamer_gzip_writer *streamer;
- streamer = palloc0(sizeof(bbstreamer_gzip_writer));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_gzip_writer_ops;
+ streamer = palloc0(sizeof(astreamer_gzip_writer));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_gzip_writer_ops;
streamer->pathname = pstrdup(pathname);
@@ -123,13 +123,13 @@ bbstreamer_gzip_writer_new(char *pathname, FILE *file,
* Write archive content to gzip file.
*/
static void
-bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member, const char *data,
- int len, bbstreamer_archive_context context)
+astreamer_gzip_writer_content(astreamer *streamer,
+ astreamer_member *member, const char *data,
+ int len, astreamer_archive_context context)
{
- bbstreamer_gzip_writer *mystreamer;
+ astreamer_gzip_writer *mystreamer;
- mystreamer = (bbstreamer_gzip_writer *) streamer;
+ mystreamer = (astreamer_gzip_writer *) streamer;
if (len == 0)
return;
@@ -151,16 +151,16 @@ bbstreamer_gzip_writer_content(bbstreamer *streamer,
*
* It makes no difference whether we opened the file or the caller did it,
* because libz provides no way of avoiding a close on the underlying file
- * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
+ * handle. Notice, however, that astreamer_gzip_writer_new() uses dup() to
* work around this issue, so that the behavior from the caller's viewpoint
- * is the same as for bbstreamer_plain_writer.
+ * is the same as for astreamer_plain_writer.
*/
static void
-bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
+astreamer_gzip_writer_finalize(astreamer *streamer)
{
- bbstreamer_gzip_writer *mystreamer;
+ astreamer_gzip_writer *mystreamer;
- mystreamer = (bbstreamer_gzip_writer *) streamer;
+ mystreamer = (astreamer_gzip_writer *) streamer;
errno = 0; /* in case gzclose() doesn't set it */
if (gzclose(mystreamer->gzfile) != 0)
@@ -171,14 +171,14 @@ bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
}
/*
- * Free memory associated with this bbstreamer.
+ * Free memory associated with this astreamer.
*/
static void
-bbstreamer_gzip_writer_free(bbstreamer *streamer)
+astreamer_gzip_writer_free(astreamer *streamer)
{
- bbstreamer_gzip_writer *mystreamer;
+ astreamer_gzip_writer *mystreamer;
- mystreamer = (bbstreamer_gzip_writer *) streamer;
+ mystreamer = (astreamer_gzip_writer *) streamer;
Assert(mystreamer->base.bbs_next == NULL);
Assert(mystreamer->gzfile == NULL);
@@ -208,18 +208,18 @@ get_gz_error(gzFile gzf)
* Create a new base backup streamer that performs decompression of gzip
* compressed blocks.
*/
-bbstreamer *
-bbstreamer_gzip_decompressor_new(bbstreamer *next)
+astreamer *
+astreamer_gzip_decompressor_new(astreamer *next)
{
#ifdef HAVE_LIBZ
- bbstreamer_gzip_decompressor *streamer;
+ astreamer_gzip_decompressor *streamer;
z_stream *zs;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_gzip_decompressor));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_gzip_decompressor_ops;
+ streamer = palloc0(sizeof(astreamer_gzip_decompressor));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_gzip_decompressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -258,15 +258,15 @@ bbstreamer_gzip_decompressor_new(bbstreamer *next)
* to the next streamer.
*/
static void
-bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_gzip_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_gzip_decompressor *mystreamer;
+ astreamer_gzip_decompressor *mystreamer;
z_stream *zs;
- mystreamer = (bbstreamer_gzip_decompressor *) streamer;
+ mystreamer = (astreamer_gzip_decompressor *) streamer;
zs = &mystreamer->zstream;
zs->next_in = (const uint8 *) data;
@@ -301,9 +301,9 @@ bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
/* If output buffer is full then pass data to next streamer */
if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen, context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen, context);
mystreamer->bytes_written = 0;
}
}
@@ -313,31 +313,31 @@ bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_gzip_decompressor_finalize(bbstreamer *streamer)
+astreamer_gzip_decompressor_finalize(astreamer *streamer)
{
- bbstreamer_gzip_decompressor *mystreamer;
+ astreamer_gzip_decompressor *mystreamer;
- mystreamer = (bbstreamer_gzip_decompressor *) streamer;
+ mystreamer = (astreamer_gzip_decompressor *) streamer;
/*
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_gzip_decompressor_free(bbstreamer *streamer)
+astreamer_gzip_decompressor_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_inject.c b/src/bin/pg_basebackup/astreamer_inject.c
similarity index 53%
rename from src/bin/pg_basebackup/bbstreamer_inject.c
rename to src/bin/pg_basebackup/astreamer_inject.c
index 194026b56e9..7f1decded8d 100644
--- a/src/bin/pg_basebackup/bbstreamer_inject.c
+++ b/src/bin/pg_basebackup/astreamer_inject.c
@@ -1,51 +1,51 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_inject.c
+ * astreamer_inject.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_inject.c
+ * src/bin/pg_basebackup/astreamer_inject.c
*-------------------------------------------------------------------------
*/
#include "postgres_fe.h"
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
-typedef struct bbstreamer_recovery_injector
+typedef struct astreamer_recovery_injector
{
- bbstreamer base;
+ astreamer base;
bool skip_file;
bool is_recovery_guc_supported;
bool is_postgresql_auto_conf;
bool found_postgresql_auto_conf;
PQExpBuffer recoveryconfcontents;
- bbstreamer_member member;
-} bbstreamer_recovery_injector;
+ astreamer_member member;
+} astreamer_recovery_injector;
-static void bbstreamer_recovery_injector_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_recovery_injector_finalize(bbstreamer *streamer);
-static void bbstreamer_recovery_injector_free(bbstreamer *streamer);
+static void astreamer_recovery_injector_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_recovery_injector_finalize(astreamer *streamer);
+static void astreamer_recovery_injector_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_recovery_injector_ops = {
- .content = bbstreamer_recovery_injector_content,
- .finalize = bbstreamer_recovery_injector_finalize,
- .free = bbstreamer_recovery_injector_free
+static const astreamer_ops astreamer_recovery_injector_ops = {
+ .content = astreamer_recovery_injector_content,
+ .finalize = astreamer_recovery_injector_finalize,
+ .free = astreamer_recovery_injector_free
};
/*
- * Create a bbstreamer that can edit recoverydata into an archive stream.
+ * Create a astreamer that can edit recoverydata into an archive stream.
*
- * The input should be a series of typed chunks (not BBSTREAMER_UNKNOWN) as
- * per the conventions described in bbstreamer.h; the chunks forwarded to
- * the next bbstreamer will be similarly typed, but the
- * BBSTREAMER_MEMBER_HEADER chunks may be zero-length in cases where we've
+ * The input should be a series of typed chunks (not ASTREAMER_UNKNOWN) as
+ * per the conventions described in astreamer.h; the chunks forwarded to
+ * the next astreamer will be similarly typed, but the
+ * ASTREAMER_MEMBER_HEADER chunks may be zero-length in cases where we've
* edited the archive stream.
*
* Our goal is to do one of the following three things with the content passed
@@ -61,16 +61,16 @@ static const bbstreamer_ops bbstreamer_recovery_injector_ops = {
* zero-length standby.signal file, dropping any file with that name from
* the archive.
*/
-bbstreamer *
-bbstreamer_recovery_injector_new(bbstreamer *next,
- bool is_recovery_guc_supported,
- PQExpBuffer recoveryconfcontents)
+astreamer *
+astreamer_recovery_injector_new(astreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents)
{
- bbstreamer_recovery_injector *streamer;
+ astreamer_recovery_injector *streamer;
- streamer = palloc0(sizeof(bbstreamer_recovery_injector));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_recovery_injector_ops;
+ streamer = palloc0(sizeof(astreamer_recovery_injector));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_recovery_injector_ops;
streamer->base.bbs_next = next;
streamer->is_recovery_guc_supported = is_recovery_guc_supported;
streamer->recoveryconfcontents = recoveryconfcontents;
@@ -82,21 +82,21 @@ bbstreamer_recovery_injector_new(bbstreamer *next,
* Handle each chunk of tar content while injecting recovery configuration.
*/
static void
-bbstreamer_recovery_injector_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_recovery_injector_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_recovery_injector *mystreamer;
+ astreamer_recovery_injector *mystreamer;
- mystreamer = (bbstreamer_recovery_injector *) streamer;
- Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+ mystreamer = (astreamer_recovery_injector *) streamer;
+ Assert(member != NULL || context == ASTREAMER_ARCHIVE_TRAILER);
switch (context)
{
- case BBSTREAMER_MEMBER_HEADER:
+ case ASTREAMER_MEMBER_HEADER:
/* Must copy provided data so we have the option to modify it. */
- memcpy(&mystreamer->member, member, sizeof(bbstreamer_member));
+ memcpy(&mystreamer->member, member, sizeof(astreamer_member));
/*
* On v12+, skip standby.signal and edit postgresql.auto.conf; on
@@ -119,8 +119,8 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
/*
* Zap data and len because the archive header is no
- * longer valid; some subsequent bbstreamer must
- * regenerate it if it's necessary.
+ * longer valid; some subsequent astreamer must regenerate
+ * it if it's necessary.
*/
data = NULL;
len = 0;
@@ -135,26 +135,26 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
return;
break;
- case BBSTREAMER_MEMBER_CONTENTS:
+ case ASTREAMER_MEMBER_CONTENTS:
/* Do not forward if the file is to be skipped. */
if (mystreamer->skip_file)
return;
break;
- case BBSTREAMER_MEMBER_TRAILER:
+ case ASTREAMER_MEMBER_TRAILER:
/* Do not forward it the file is to be skipped. */
if (mystreamer->skip_file)
return;
/* Append provided content to whatever we already sent. */
if (mystreamer->is_postgresql_auto_conf)
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->recoveryconfcontents->data,
- mystreamer->recoveryconfcontents->len,
- BBSTREAMER_MEMBER_CONTENTS);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len,
+ ASTREAMER_MEMBER_CONTENTS);
break;
- case BBSTREAMER_ARCHIVE_TRAILER:
+ case ASTREAMER_ARCHIVE_TRAILER:
if (mystreamer->is_recovery_guc_supported)
{
/*
@@ -163,22 +163,22 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
* member now.
*/
if (!mystreamer->found_postgresql_auto_conf)
- bbstreamer_inject_file(mystreamer->base.bbs_next,
- "postgresql.auto.conf",
- mystreamer->recoveryconfcontents->data,
- mystreamer->recoveryconfcontents->len);
+ astreamer_inject_file(mystreamer->base.bbs_next,
+ "postgresql.auto.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
/* Inject empty standby.signal file. */
- bbstreamer_inject_file(mystreamer->base.bbs_next,
- "standby.signal", "", 0);
+ astreamer_inject_file(mystreamer->base.bbs_next,
+ "standby.signal", "", 0);
}
else
{
/* Inject recovery.conf file with specified contents. */
- bbstreamer_inject_file(mystreamer->base.bbs_next,
- "recovery.conf",
- mystreamer->recoveryconfcontents->data,
- mystreamer->recoveryconfcontents->len);
+ astreamer_inject_file(mystreamer->base.bbs_next,
+ "recovery.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
}
/* Nothing to do here. */
@@ -189,26 +189,26 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
pg_fatal("unexpected state while injecting recovery settings");
}
- bbstreamer_content(mystreamer->base.bbs_next, &mystreamer->member,
- data, len, context);
+ astreamer_content(mystreamer->base.bbs_next, &mystreamer->member,
+ data, len, context);
}
/*
- * End-of-stream processing for this bbstreamer.
+ * End-of-stream processing for this astreamer.
*/
static void
-bbstreamer_recovery_injector_finalize(bbstreamer *streamer)
+astreamer_recovery_injector_finalize(astreamer *streamer)
{
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_finalize(streamer->bbs_next);
}
/*
- * Free memory associated with this bbstreamer.
+ * Free memory associated with this astreamer.
*/
static void
-bbstreamer_recovery_injector_free(bbstreamer *streamer)
+astreamer_recovery_injector_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer);
}
@@ -216,10 +216,10 @@ bbstreamer_recovery_injector_free(bbstreamer *streamer)
* Inject a member into the archive with specified contents.
*/
void
-bbstreamer_inject_file(bbstreamer *streamer, char *pathname, char *data,
- int len)
+astreamer_inject_file(astreamer *streamer, char *pathname, char *data,
+ int len)
{
- bbstreamer_member member;
+ astreamer_member member;
strlcpy(member.pathname, pathname, MAXPGPATH);
member.size = len;
@@ -238,12 +238,12 @@ bbstreamer_inject_file(bbstreamer *streamer, char *pathname, char *data,
/*
* We don't know here how to generate valid member headers and trailers
* for the archiving format in use, so if those are needed, some successor
- * bbstreamer will have to generate them using the data from 'member'.
+ * astreamer will have to generate them using the data from 'member'.
*/
- bbstreamer_content(streamer, &member, NULL, 0,
- BBSTREAMER_MEMBER_HEADER);
- bbstreamer_content(streamer, &member, data, len,
- BBSTREAMER_MEMBER_CONTENTS);
- bbstreamer_content(streamer, &member, NULL, 0,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(streamer, &member, NULL, 0,
+ ASTREAMER_MEMBER_HEADER);
+ astreamer_content(streamer, &member, data, len,
+ ASTREAMER_MEMBER_CONTENTS);
+ astreamer_content(streamer, &member, NULL, 0,
+ ASTREAMER_MEMBER_TRAILER);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/astreamer_lz4.c
similarity index 69%
rename from src/bin/pg_basebackup/bbstreamer_lz4.c
rename to src/bin/pg_basebackup/astreamer_lz4.c
index f5c9e68150c..1c40d7d8ad5 100644
--- a/src/bin/pg_basebackup/bbstreamer_lz4.c
+++ b/src/bin/pg_basebackup/astreamer_lz4.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_lz4.c
+ * astreamer_lz4.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_lz4.c
+ * src/bin/pg_basebackup/astreamer_lz4.c
*-------------------------------------------------------------------------
*/
@@ -17,15 +17,15 @@
#include <lz4frame.h>
#endif
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
#ifdef USE_LZ4
-typedef struct bbstreamer_lz4_frame
+typedef struct astreamer_lz4_frame
{
- bbstreamer base;
+ astreamer base;
LZ4F_compressionContext_t cctx;
LZ4F_decompressionContext_t dctx;
@@ -33,32 +33,32 @@ typedef struct bbstreamer_lz4_frame
size_t bytes_written;
bool header_written;
-} bbstreamer_lz4_frame;
+} astreamer_lz4_frame;
-static void bbstreamer_lz4_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_lz4_compressor_finalize(bbstreamer *streamer);
-static void bbstreamer_lz4_compressor_free(bbstreamer *streamer);
+static void astreamer_lz4_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_lz4_compressor_finalize(astreamer *streamer);
+static void astreamer_lz4_compressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_lz4_compressor_ops = {
- .content = bbstreamer_lz4_compressor_content,
- .finalize = bbstreamer_lz4_compressor_finalize,
- .free = bbstreamer_lz4_compressor_free
+static const astreamer_ops astreamer_lz4_compressor_ops = {
+ .content = astreamer_lz4_compressor_content,
+ .finalize = astreamer_lz4_compressor_finalize,
+ .free = astreamer_lz4_compressor_free
};
-static void bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer);
-static void bbstreamer_lz4_decompressor_free(bbstreamer *streamer);
+static void astreamer_lz4_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_lz4_decompressor_finalize(astreamer *streamer);
+static void astreamer_lz4_decompressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
- .content = bbstreamer_lz4_decompressor_content,
- .finalize = bbstreamer_lz4_decompressor_finalize,
- .free = bbstreamer_lz4_decompressor_free
+static const astreamer_ops astreamer_lz4_decompressor_ops = {
+ .content = astreamer_lz4_decompressor_content,
+ .finalize = astreamer_lz4_decompressor_finalize,
+ .free = astreamer_lz4_decompressor_free
};
#endif
@@ -66,19 +66,19 @@ static const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
* Create a new base backup streamer that performs lz4 compression of tar
* blocks.
*/
-bbstreamer *
-bbstreamer_lz4_compressor_new(bbstreamer *next, pg_compress_specification *compress)
+astreamer *
+astreamer_lz4_compressor_new(astreamer *next, pg_compress_specification *compress)
{
#ifdef USE_LZ4
- bbstreamer_lz4_frame *streamer;
+ astreamer_lz4_frame *streamer;
LZ4F_errorCode_t ctxError;
LZ4F_preferences_t *prefs;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_lz4_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_lz4_compressor_ops;
+ streamer = palloc0(sizeof(astreamer_lz4_frame));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_lz4_compressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -113,19 +113,19 @@ bbstreamer_lz4_compressor_new(bbstreamer *next, pg_compress_specification *compr
* of output buffer to next streamer and empty the buffer.
*/
static void
-bbstreamer_lz4_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_lz4_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
uint8 *next_in,
*next_out;
size_t out_bound,
compressed_size,
avail_out;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
next_in = (uint8 *) data;
/* Write header before processing the first input chunk. */
@@ -159,10 +159,10 @@ bbstreamer_lz4_compressor_content(bbstreamer *streamer,
out_bound = LZ4F_compressBound(len, &mystreamer->prefs);
if (avail_out < out_bound)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->base.bbs_buffer.data,
- mystreamer->bytes_written,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ context);
/* Enlarge buffer if it falls short of out bound. */
if (mystreamer->base.bbs_buffer.maxlen < out_bound)
@@ -196,25 +196,25 @@ bbstreamer_lz4_compressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_lz4_compressor_finalize(bbstreamer *streamer)
+astreamer_lz4_compressor_finalize(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
uint8 *next_out;
size_t footer_bound,
compressed_size,
avail_out;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
/* Find out the footer bound and update the output buffer. */
footer_bound = LZ4F_compressBound(0, &mystreamer->prefs);
if ((mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written) <
footer_bound)
{
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->bytes_written,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
/* Enlarge buffer if it falls short of footer bound. */
if (mystreamer->base.bbs_buffer.maxlen < footer_bound)
@@ -243,24 +243,24 @@ bbstreamer_lz4_compressor_finalize(bbstreamer *streamer)
mystreamer->bytes_written += compressed_size;
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->bytes_written,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_lz4_compressor_free(bbstreamer *streamer)
+astreamer_lz4_compressor_free(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ mystreamer = (astreamer_lz4_frame *) streamer;
+ astreamer_free(streamer->bbs_next);
LZ4F_freeCompressionContext(mystreamer->cctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
@@ -271,18 +271,18 @@ bbstreamer_lz4_compressor_free(bbstreamer *streamer)
* Create a new base backup streamer that performs decompression of lz4
* compressed blocks.
*/
-bbstreamer *
-bbstreamer_lz4_decompressor_new(bbstreamer *next)
+astreamer *
+astreamer_lz4_decompressor_new(astreamer *next)
{
#ifdef USE_LZ4
- bbstreamer_lz4_frame *streamer;
+ astreamer_lz4_frame *streamer;
LZ4F_errorCode_t ctxError;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_lz4_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_lz4_decompressor_ops;
+ streamer = palloc0(sizeof(astreamer_lz4_frame));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_lz4_decompressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -307,18 +307,18 @@ bbstreamer_lz4_decompressor_new(bbstreamer *next)
* to the next streamer.
*/
static void
-bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_lz4_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
uint8 *next_in,
*next_out;
size_t avail_in,
avail_out;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
next_in = (uint8 *) data;
next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
avail_in = len;
@@ -366,10 +366,10 @@ bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
*/
if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ context);
avail_out = mystreamer->base.bbs_buffer.maxlen;
mystreamer->bytes_written = 0;
@@ -387,34 +387,34 @@ bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer)
+astreamer_lz4_decompressor_finalize(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
/*
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_lz4_decompressor_free(bbstreamer *streamer)
+astreamer_lz4_decompressor_free(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ mystreamer = (astreamer_lz4_frame *) streamer;
+ astreamer_free(streamer->bbs_next);
LZ4F_freeDecompressionContext(mystreamer->dctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
diff --git a/src/bin/pg_basebackup/bbstreamer_tar.c b/src/bin/pg_basebackup/astreamer_tar.c
similarity index 50%
rename from src/bin/pg_basebackup/bbstreamer_tar.c
rename to src/bin/pg_basebackup/astreamer_tar.c
index 9137d17ddc1..673690cd18f 100644
--- a/src/bin/pg_basebackup/bbstreamer_tar.c
+++ b/src/bin/pg_basebackup/astreamer_tar.c
@@ -1,13 +1,13 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_tar.c
+ * astreamer_tar.c
*
* This module implements three types of tar processing. A tar parser
- * expects unlabelled chunks of data (e.g. BBSTREAMER_UNKNOWN) and splits
- * it into labelled chunks (any other value of bbstreamer_archive_context).
+ * expects unlabelled chunks of data (e.g. ASTREAMER_UNKNOWN) and splits
+ * it into labelled chunks (any other value of astreamer_archive_context).
* A tar archiver does the reverse: it takes a bunch of labelled chunks
* and produces a tarfile, optionally replacing member headers and trailers
- * so that upstream bbstreamer objects can perform surgery on the tarfile
+ * so that upstream astreamer objects can perform surgery on the tarfile
* contents without knowing the details of the tar format. A tar terminator
* just adds two blocks of NUL bytes to the end of the file, since older
* server versions produce files with this terminator omitted.
@@ -15,7 +15,7 @@
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_tar.c
+ * src/bin/pg_basebackup/astreamer_tar.c
*-------------------------------------------------------------------------
*/
@@ -23,83 +23,83 @@
#include <time.h>
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/logging.h"
#include "pgtar.h"
-typedef struct bbstreamer_tar_parser
+typedef struct astreamer_tar_parser
{
- bbstreamer base;
- bbstreamer_archive_context next_context;
- bbstreamer_member member;
+ astreamer base;
+ astreamer_archive_context next_context;
+ astreamer_member member;
size_t file_bytes_sent;
size_t pad_bytes_expected;
-} bbstreamer_tar_parser;
+} astreamer_tar_parser;
-typedef struct bbstreamer_tar_archiver
+typedef struct astreamer_tar_archiver
{
- bbstreamer base;
+ astreamer base;
bool rearchive_member;
-} bbstreamer_tar_archiver;
+} astreamer_tar_archiver;
-static void bbstreamer_tar_parser_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_tar_parser_finalize(bbstreamer *streamer);
-static void bbstreamer_tar_parser_free(bbstreamer *streamer);
-static bool bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer);
+static void astreamer_tar_parser_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_tar_parser_finalize(astreamer *streamer);
+static void astreamer_tar_parser_free(astreamer *streamer);
+static bool astreamer_tar_header(astreamer_tar_parser *mystreamer);
-static const bbstreamer_ops bbstreamer_tar_parser_ops = {
- .content = bbstreamer_tar_parser_content,
- .finalize = bbstreamer_tar_parser_finalize,
- .free = bbstreamer_tar_parser_free
+static const astreamer_ops astreamer_tar_parser_ops = {
+ .content = astreamer_tar_parser_content,
+ .finalize = astreamer_tar_parser_finalize,
+ .free = astreamer_tar_parser_free
};
-static void bbstreamer_tar_archiver_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_tar_archiver_finalize(bbstreamer *streamer);
-static void bbstreamer_tar_archiver_free(bbstreamer *streamer);
+static void astreamer_tar_archiver_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_tar_archiver_finalize(astreamer *streamer);
+static void astreamer_tar_archiver_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_tar_archiver_ops = {
- .content = bbstreamer_tar_archiver_content,
- .finalize = bbstreamer_tar_archiver_finalize,
- .free = bbstreamer_tar_archiver_free
+static const astreamer_ops astreamer_tar_archiver_ops = {
+ .content = astreamer_tar_archiver_content,
+ .finalize = astreamer_tar_archiver_finalize,
+ .free = astreamer_tar_archiver_free
};
-static void bbstreamer_tar_terminator_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_tar_terminator_finalize(bbstreamer *streamer);
-static void bbstreamer_tar_terminator_free(bbstreamer *streamer);
+static void astreamer_tar_terminator_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_tar_terminator_finalize(astreamer *streamer);
+static void astreamer_tar_terminator_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_tar_terminator_ops = {
- .content = bbstreamer_tar_terminator_content,
- .finalize = bbstreamer_tar_terminator_finalize,
- .free = bbstreamer_tar_terminator_free
+static const astreamer_ops astreamer_tar_terminator_ops = {
+ .content = astreamer_tar_terminator_content,
+ .finalize = astreamer_tar_terminator_finalize,
+ .free = astreamer_tar_terminator_free
};
/*
- * Create a bbstreamer that can parse a stream of content as tar data.
+ * Create an astreamer that can parse a stream of content as tar data.
*
- * The input should be a series of BBSTREAMER_UNKNOWN chunks; the bbstreamer
+ * The input should be a series of ASTREAMER_UNKNOWN chunks; the astreamer
* specified by 'next' will receive a series of typed chunks, as per the
- * conventions described in bbstreamer.h.
+ * conventions described in astreamer.h.
*/
-bbstreamer *
-bbstreamer_tar_parser_new(bbstreamer *next)
+astreamer *
+astreamer_tar_parser_new(astreamer *next)
{
- bbstreamer_tar_parser *streamer;
+ astreamer_tar_parser *streamer;
- streamer = palloc0(sizeof(bbstreamer_tar_parser));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_tar_parser_ops;
+ streamer = palloc0(sizeof(astreamer_tar_parser));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_tar_parser_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
- streamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ streamer->next_context = ASTREAMER_MEMBER_HEADER;
return &streamer->base;
}
@@ -108,29 +108,29 @@ bbstreamer_tar_parser_new(bbstreamer *next)
* Parse unknown content as tar data.
*/
static void
-bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_tar_parser_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+ astreamer_tar_parser *mystreamer = (astreamer_tar_parser *) streamer;
size_t nbytes;
/* Expect unparsed input. */
Assert(member == NULL);
- Assert(context == BBSTREAMER_UNKNOWN);
+ Assert(context == ASTREAMER_UNKNOWN);
while (len > 0)
{
switch (mystreamer->next_context)
{
- case BBSTREAMER_MEMBER_HEADER:
+ case ASTREAMER_MEMBER_HEADER:
/*
* If we're expecting an archive member header, accumulate a
* full block of data before doing anything further.
*/
- if (!bbstreamer_buffer_until(streamer, &data, &len,
- TAR_BLOCK_SIZE))
+ if (!astreamer_buffer_until(streamer, &data, &len,
+ TAR_BLOCK_SIZE))
return;
/*
@@ -139,32 +139,32 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
* thought was the next file header is actually the start of
* the archive trailer. Switch modes accordingly.
*/
- if (bbstreamer_tar_header(mystreamer))
+ if (astreamer_tar_header(mystreamer))
{
if (mystreamer->member.size == 0)
{
/* No content; trailer is zero-length. */
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- NULL, 0,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ ASTREAMER_MEMBER_TRAILER);
/* Expect next header. */
- mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->next_context = ASTREAMER_MEMBER_HEADER;
}
else
{
/* Expect contents. */
- mystreamer->next_context = BBSTREAMER_MEMBER_CONTENTS;
+ mystreamer->next_context = ASTREAMER_MEMBER_CONTENTS;
}
mystreamer->base.bbs_buffer.len = 0;
mystreamer->file_bytes_sent = 0;
}
else
- mystreamer->next_context = BBSTREAMER_ARCHIVE_TRAILER;
+ mystreamer->next_context = ASTREAMER_ARCHIVE_TRAILER;
break;
- case BBSTREAMER_MEMBER_CONTENTS:
+ case ASTREAMER_MEMBER_CONTENTS:
/*
* Send as much content as we have, but not more than the
@@ -174,10 +174,10 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
nbytes = mystreamer->member.size - mystreamer->file_bytes_sent;
nbytes = Min(nbytes, len);
Assert(nbytes > 0);
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- data, nbytes,
- BBSTREAMER_MEMBER_CONTENTS);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, nbytes,
+ ASTREAMER_MEMBER_CONTENTS);
mystreamer->file_bytes_sent += nbytes;
data += nbytes;
len -= nbytes;
@@ -193,53 +193,53 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
if (mystreamer->pad_bytes_expected == 0)
{
/* Trailer is zero-length. */
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- NULL, 0,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ ASTREAMER_MEMBER_TRAILER);
/* Expect next header. */
- mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->next_context = ASTREAMER_MEMBER_HEADER;
}
else
{
/* Trailer is not zero-length. */
- mystreamer->next_context = BBSTREAMER_MEMBER_TRAILER;
+ mystreamer->next_context = ASTREAMER_MEMBER_TRAILER;
}
mystreamer->base.bbs_buffer.len = 0;
}
break;
- case BBSTREAMER_MEMBER_TRAILER:
+ case ASTREAMER_MEMBER_TRAILER:
/*
* If we're expecting an archive member trailer, accumulate
* the expected number of padding bytes before sending
* anything onward.
*/
- if (!bbstreamer_buffer_until(streamer, &data, &len,
- mystreamer->pad_bytes_expected))
+ if (!astreamer_buffer_until(streamer, &data, &len,
+ mystreamer->pad_bytes_expected))
return;
/* OK, now we can send it. */
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- data, mystreamer->pad_bytes_expected,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, mystreamer->pad_bytes_expected,
+ ASTREAMER_MEMBER_TRAILER);
/* Expect next file header. */
- mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->next_context = ASTREAMER_MEMBER_HEADER;
mystreamer->base.bbs_buffer.len = 0;
break;
- case BBSTREAMER_ARCHIVE_TRAILER:
+ case ASTREAMER_ARCHIVE_TRAILER:
/*
* We've seen an end-of-archive indicator, so anything more is
* buffered and sent as part of the archive trailer. But we
* don't expect more than 2 blocks.
*/
- bbstreamer_buffer_bytes(streamer, &data, &len, len);
+ astreamer_buffer_bytes(streamer, &data, &len, len);
if (len > 2 * TAR_BLOCK_SIZE)
pg_fatal("tar file trailer exceeds 2 blocks");
return;
@@ -255,14 +255,14 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
* Parse a file header within a tar stream.
*
* The return value is true if we found a file header and passed it on to the
- * next bbstreamer; it is false if we have reached the archive trailer.
+ * next astreamer; it is false if we have reached the archive trailer.
*/
static bool
-bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
+astreamer_tar_header(astreamer_tar_parser *mystreamer)
{
bool has_nonzero_byte = false;
int i;
- bbstreamer_member *member = &mystreamer->member;
+ astreamer_member *member = &mystreamer->member;
char *buffer = mystreamer->base.bbs_buffer.data;
Assert(mystreamer->base.bbs_buffer.len == TAR_BLOCK_SIZE);
@@ -304,10 +304,10 @@ bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
/* Compute number of padding bytes. */
mystreamer->pad_bytes_expected = tarPaddingBytesRequired(member->size);
- /* Forward the entire header to the next bbstreamer. */
- bbstreamer_content(mystreamer->base.bbs_next, member,
- buffer, TAR_BLOCK_SIZE,
- BBSTREAMER_MEMBER_HEADER);
+ /* Forward the entire header to the next astreamer. */
+ astreamer_content(mystreamer->base.bbs_next, member,
+ buffer, TAR_BLOCK_SIZE,
+ ASTREAMER_MEMBER_HEADER);
return true;
}
@@ -316,50 +316,50 @@ bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
* End-of-stream processing for a tar parser.
*/
static void
-bbstreamer_tar_parser_finalize(bbstreamer *streamer)
+astreamer_tar_parser_finalize(astreamer *streamer)
{
- bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+ astreamer_tar_parser *mystreamer = (astreamer_tar_parser *) streamer;
- if (mystreamer->next_context != BBSTREAMER_ARCHIVE_TRAILER &&
- (mystreamer->next_context != BBSTREAMER_MEMBER_HEADER ||
+ if (mystreamer->next_context != ASTREAMER_ARCHIVE_TRAILER &&
+ (mystreamer->next_context != ASTREAMER_MEMBER_HEADER ||
mystreamer->base.bbs_buffer.len > 0))
pg_fatal("COPY stream ended before last file was finished");
/* Send the archive trailer, even if empty. */
- bbstreamer_content(streamer->bbs_next, NULL,
- streamer->bbs_buffer.data, streamer->bbs_buffer.len,
- BBSTREAMER_ARCHIVE_TRAILER);
+ astreamer_content(streamer->bbs_next, NULL,
+ streamer->bbs_buffer.data, streamer->bbs_buffer.len,
+ ASTREAMER_ARCHIVE_TRAILER);
/* Now finalize successor. */
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_finalize(streamer->bbs_next);
}
/*
* Free memory associated with a tar parser.
*/
static void
-bbstreamer_tar_parser_free(bbstreamer *streamer)
+astreamer_tar_parser_free(astreamer *streamer)
{
pfree(streamer->bbs_buffer.data);
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
}
/*
- * Create a bbstreamer that can generate a tar archive.
+ * Create an astreamer that can generate a tar archive.
*
* This is intended to be usable either for generating a brand-new tar archive
* or for modifying one on the fly. The input should be a series of typed
- * chunks (i.e. not BBSTREAMER_UNKNOWN). See also the comments for
- * bbstreamer_tar_parser_content.
+ * chunks (i.e. not ASTREAMER_UNKNOWN). See also the comments for
+ * astreamer_tar_parser_content.
*/
-bbstreamer *
-bbstreamer_tar_archiver_new(bbstreamer *next)
+astreamer *
+astreamer_tar_archiver_new(astreamer *next)
{
- bbstreamer_tar_archiver *streamer;
+ astreamer_tar_archiver *streamer;
- streamer = palloc0(sizeof(bbstreamer_tar_archiver));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_tar_archiver_ops;
+ streamer = palloc0(sizeof(astreamer_tar_archiver));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_tar_archiver_ops;
streamer->base.bbs_next = next;
return &streamer->base;
@@ -368,36 +368,36 @@ bbstreamer_tar_archiver_new(bbstreamer *next)
/*
* Fix up the stream of input chunks to create a valid tar file.
*
- * If a BBSTREAMER_MEMBER_HEADER chunk is of size 0, it is replaced with a
+ * If an ASTREAMER_MEMBER_HEADER chunk is of size 0, it is replaced with a
* newly-constructed tar header. If it is of size TAR_BLOCK_SIZE, it is
* passed through without change. Any other size is a fatal error (and
* indicates a bug).
*
- * Whenever a new BBSTREAMER_MEMBER_HEADER chunk is constructed, the
- * corresponding BBSTREAMER_MEMBER_TRAILER chunk is also constructed from
+ * Whenever a new ASTREAMER_MEMBER_HEADER chunk is constructed, the
+ * corresponding ASTREAMER_MEMBER_TRAILER chunk is also constructed from
* scratch. Specifically, we construct a block of zero bytes sufficient to
* pad out to a block boundary, as required by the tar format. Other
- * BBSTREAMER_MEMBER_TRAILER chunks are passed through without change.
+ * ASTREAMER_MEMBER_TRAILER chunks are passed through without change.
*
- * Any BBSTREAMER_MEMBER_CONTENTS chunks are passed through without change.
+ * Any ASTREAMER_MEMBER_CONTENTS chunks are passed through without change.
*
- * The BBSTREAMER_ARCHIVE_TRAILER chunk is replaced with two
+ * The ASTREAMER_ARCHIVE_TRAILER chunk is replaced with two
* blocks of zero bytes. Not all tar programs require this, but apparently
* some do. The server does not supply this trailer. If no archive trailer is
- * present, one will be added by bbstreamer_tar_parser_finalize.
+ * present, one will be added by astreamer_tar_parser_finalize.
*/
static void
-bbstreamer_tar_archiver_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_tar_archiver_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_tar_archiver *mystreamer = (bbstreamer_tar_archiver *) streamer;
+ astreamer_tar_archiver *mystreamer = (astreamer_tar_archiver *) streamer;
char buffer[2 * TAR_BLOCK_SIZE];
- Assert(context != BBSTREAMER_UNKNOWN);
+ Assert(context != ASTREAMER_UNKNOWN);
- if (context == BBSTREAMER_MEMBER_HEADER && len != TAR_BLOCK_SIZE)
+ if (context == ASTREAMER_MEMBER_HEADER && len != TAR_BLOCK_SIZE)
{
Assert(len == 0);
@@ -411,7 +411,7 @@ bbstreamer_tar_archiver_content(bbstreamer *streamer,
/* Also make a note to replace padding, in case size changed. */
mystreamer->rearchive_member = true;
}
- else if (context == BBSTREAMER_MEMBER_TRAILER &&
+ else if (context == ASTREAMER_MEMBER_TRAILER &&
mystreamer->rearchive_member)
{
int pad_bytes = tarPaddingBytesRequired(member->size);
@@ -424,7 +424,7 @@ bbstreamer_tar_archiver_content(bbstreamer *streamer,
/* Don't do this again unless we replace another header. */
mystreamer->rearchive_member = false;
}
- else if (context == BBSTREAMER_ARCHIVE_TRAILER)
+ else if (context == ASTREAMER_ARCHIVE_TRAILER)
{
/* Trailer should always be two blocks of zero bytes. */
memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
@@ -432,40 +432,40 @@ bbstreamer_tar_archiver_content(bbstreamer *streamer,
len = 2 * TAR_BLOCK_SIZE;
}
- bbstreamer_content(streamer->bbs_next, member, data, len, context);
+ astreamer_content(streamer->bbs_next, member, data, len, context);
}
/*
* End-of-stream processing for a tar archiver.
*/
static void
-bbstreamer_tar_archiver_finalize(bbstreamer *streamer)
+astreamer_tar_archiver_finalize(astreamer *streamer)
{
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_finalize(streamer->bbs_next);
}
/*
* Free memory associated with a tar archiver.
*/
static void
-bbstreamer_tar_archiver_free(bbstreamer *streamer)
+astreamer_tar_archiver_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer);
}
/*
- * Create a bbstreamer that blindly adds two blocks of NUL bytes to the
+ * Create an astreamer that blindly adds two blocks of NUL bytes to the
* end of an incomplete tarfile that the server might send us.
*/
-bbstreamer *
-bbstreamer_tar_terminator_new(bbstreamer *next)
+astreamer *
+astreamer_tar_terminator_new(astreamer *next)
{
- bbstreamer *streamer;
+ astreamer *streamer;
- streamer = palloc0(sizeof(bbstreamer));
- *((const bbstreamer_ops **) &streamer->bbs_ops) =
- &bbstreamer_tar_terminator_ops;
+ streamer = palloc0(sizeof(astreamer));
+ *((const astreamer_ops **) &streamer->bbs_ops) =
+ &astreamer_tar_terminator_ops;
streamer->bbs_next = next;
return streamer;
@@ -475,17 +475,17 @@ bbstreamer_tar_terminator_new(bbstreamer *next)
* Pass all the content through without change.
*/
static void
-bbstreamer_tar_terminator_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_tar_terminator_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
/* Expect unparsed input. */
Assert(member == NULL);
- Assert(context == BBSTREAMER_UNKNOWN);
+ Assert(context == ASTREAMER_UNKNOWN);
/* Just forward it. */
- bbstreamer_content(streamer->bbs_next, member, data, len, context);
+ astreamer_content(streamer->bbs_next, member, data, len, context);
}
/*
@@ -493,22 +493,22 @@ bbstreamer_tar_terminator_content(bbstreamer *streamer,
* to supply.
*/
static void
-bbstreamer_tar_terminator_finalize(bbstreamer *streamer)
+astreamer_tar_terminator_finalize(astreamer *streamer)
{
char buffer[2 * TAR_BLOCK_SIZE];
memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
- bbstreamer_content(streamer->bbs_next, NULL, buffer,
- 2 * TAR_BLOCK_SIZE, BBSTREAMER_UNKNOWN);
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_content(streamer->bbs_next, NULL, buffer,
+ 2 * TAR_BLOCK_SIZE, ASTREAMER_UNKNOWN);
+ astreamer_finalize(streamer->bbs_next);
}
/*
* Free memory associated with a tar terminator.
*/
static void
-bbstreamer_tar_terminator_free(bbstreamer *streamer)
+astreamer_tar_terminator_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/astreamer_zstd.c
similarity index 64%
rename from src/bin/pg_basebackup/bbstreamer_zstd.c
rename to src/bin/pg_basebackup/astreamer_zstd.c
index 20f11d4450e..58dc679ef99 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/astreamer_zstd.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_zstd.c
+ * astreamer_zstd.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_zstd.c
+ * src/bin/pg_basebackup/astreamer_zstd.c
*-------------------------------------------------------------------------
*/
@@ -17,44 +17,44 @@
#include <zstd.h>
#endif
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/logging.h"
#ifdef USE_ZSTD
-typedef struct bbstreamer_zstd_frame
+typedef struct astreamer_zstd_frame
{
- bbstreamer base;
+ astreamer base;
ZSTD_CCtx *cctx;
ZSTD_DCtx *dctx;
ZSTD_outBuffer zstd_outBuf;
-} bbstreamer_zstd_frame;
+} astreamer_zstd_frame;
-static void bbstreamer_zstd_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_zstd_compressor_finalize(bbstreamer *streamer);
-static void bbstreamer_zstd_compressor_free(bbstreamer *streamer);
+static void astreamer_zstd_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_zstd_compressor_finalize(astreamer *streamer);
+static void astreamer_zstd_compressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_zstd_compressor_ops = {
- .content = bbstreamer_zstd_compressor_content,
- .finalize = bbstreamer_zstd_compressor_finalize,
- .free = bbstreamer_zstd_compressor_free
+static const astreamer_ops astreamer_zstd_compressor_ops = {
+ .content = astreamer_zstd_compressor_content,
+ .finalize = astreamer_zstd_compressor_finalize,
+ .free = astreamer_zstd_compressor_free
};
-static void bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_zstd_decompressor_finalize(bbstreamer *streamer);
-static void bbstreamer_zstd_decompressor_free(bbstreamer *streamer);
+static void astreamer_zstd_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_zstd_decompressor_finalize(astreamer *streamer);
+static void astreamer_zstd_decompressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
- .content = bbstreamer_zstd_decompressor_content,
- .finalize = bbstreamer_zstd_decompressor_finalize,
- .free = bbstreamer_zstd_decompressor_free
+static const astreamer_ops astreamer_zstd_decompressor_ops = {
+ .content = astreamer_zstd_decompressor_content,
+ .finalize = astreamer_zstd_decompressor_finalize,
+ .free = astreamer_zstd_decompressor_free
};
#endif
@@ -62,19 +62,19 @@ static const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
* Create a new base backup streamer that performs zstd compression of tar
* blocks.
*/
-bbstreamer *
-bbstreamer_zstd_compressor_new(bbstreamer *next, pg_compress_specification *compress)
+astreamer *
+astreamer_zstd_compressor_new(astreamer *next, pg_compress_specification *compress)
{
#ifdef USE_ZSTD
- bbstreamer_zstd_frame *streamer;
+ astreamer_zstd_frame *streamer;
size_t ret;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_zstd_frame));
+ streamer = palloc0(sizeof(astreamer_zstd_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_zstd_compressor_ops;
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_zstd_compressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -142,12 +142,12 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, pg_compress_specification *comp
* of output buffer to next streamer and empty the buffer.
*/
static void
-bbstreamer_zstd_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_zstd_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
ZSTD_inBuffer inBuf = {data, len, 0};
while (inBuf.pos < inBuf.size)
@@ -162,10 +162,10 @@ bbstreamer_zstd_compressor_content(bbstreamer *streamer,
if (mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos <
max_needed)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ context);
/* Reset the ZSTD output buffer. */
mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
@@ -187,9 +187,9 @@ bbstreamer_zstd_compressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
+astreamer_zstd_compressor_finalize(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
size_t yet_to_flush;
do
@@ -204,10 +204,10 @@ bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
if (mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos <
max_needed)
{
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ ASTREAMER_UNKNOWN);
/* Reset the ZSTD output buffer. */
mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
@@ -227,23 +227,23 @@ bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
/* Make sure to pass any remaining bytes to the next streamer. */
if (mystreamer->zstd_outBuf.pos > 0)
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_zstd_compressor_free(bbstreamer *streamer)
+astreamer_zstd_compressor_free(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
ZSTD_freeCCtx(mystreamer->cctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
@@ -254,17 +254,17 @@ bbstreamer_zstd_compressor_free(bbstreamer *streamer)
* Create a new base backup streamer that performs decompression of zstd
* compressed blocks.
*/
-bbstreamer *
-bbstreamer_zstd_decompressor_new(bbstreamer *next)
+astreamer *
+astreamer_zstd_decompressor_new(astreamer *next)
{
#ifdef USE_ZSTD
- bbstreamer_zstd_frame *streamer;
+ astreamer_zstd_frame *streamer;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_zstd_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_zstd_decompressor_ops;
+ streamer = palloc0(sizeof(astreamer_zstd_frame));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_zstd_decompressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -293,12 +293,12 @@ bbstreamer_zstd_decompressor_new(bbstreamer *next)
* to the next streamer.
*/
static void
-bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_zstd_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
ZSTD_inBuffer inBuf = {data, len, 0};
while (inBuf.pos < inBuf.size)
@@ -311,10 +311,10 @@ bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
*/
if (mystreamer->zstd_outBuf.pos >= mystreamer->zstd_outBuf.size)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ context);
/* Reset the ZSTD output buffer. */
mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
@@ -335,32 +335,32 @@ bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_zstd_decompressor_finalize(bbstreamer *streamer)
+astreamer_zstd_decompressor_finalize(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
/*
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
if (mystreamer->zstd_outBuf.pos > 0)
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_zstd_decompressor_free(bbstreamer *streamer)
+astreamer_zstd_decompressor_free(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
ZSTD_freeDCtx(mystreamer->dctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
deleted file mode 100644
index 3b820f13b51..00000000000
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ /dev/null
@@ -1,226 +0,0 @@
-/*-------------------------------------------------------------------------
- *
- * bbstreamer.h
- *
- * Each tar archive returned by the server is passed to one or more
- * bbstreamer objects for further processing. The bbstreamer may do
- * something simple, like write the archive to a file, perhaps after
- * compressing it, but it can also do more complicated things, like
- * annotating the byte stream to indicate which parts of the data
- * correspond to tar headers or trailing padding, vs. which parts are
- * payload data. A subsequent bbstreamer may use this information to
- * make further decisions about how to process the data; for example,
- * it might choose to modify the archive contents.
- *
- * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
- *
- * IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer.h
- *-------------------------------------------------------------------------
- */
-
-#ifndef BBSTREAMER_H
-#define BBSTREAMER_H
-
-#include "common/compression.h"
-#include "lib/stringinfo.h"
-#include "pqexpbuffer.h"
-
-struct bbstreamer;
-struct bbstreamer_ops;
-typedef struct bbstreamer bbstreamer;
-typedef struct bbstreamer_ops bbstreamer_ops;
-
-/*
- * Each chunk of archive data passed to a bbstreamer is classified into one
- * of these categories. When data is first received from the remote server,
- * each chunk will be categorized as BBSTREAMER_UNKNOWN, and the chunks will
- * be of whatever size the remote server chose to send.
- *
- * If the archive is parsed (e.g. see bbstreamer_tar_parser_new()), then all
- * chunks should be labelled as one of the other types listed here. In
- * addition, there should be exactly one BBSTREAMER_MEMBER_HEADER chunk and
- * exactly one BBSTREAMER_MEMBER_TRAILER chunk per archive member, even if
- * that means a zero-length call. There can be any number of
- * BBSTREAMER_MEMBER_CONTENTS chunks in between those calls. There
- * should exactly BBSTREAMER_ARCHIVE_TRAILER chunk, and it should follow the
- * last BBSTREAMER_MEMBER_TRAILER chunk.
- *
- * In theory, we could need other classifications here, such as a way of
- * indicating an archive header, but the "tar" format doesn't need anything
- * else, so for the time being there's no point.
- */
-typedef enum
-{
- BBSTREAMER_UNKNOWN,
- BBSTREAMER_MEMBER_HEADER,
- BBSTREAMER_MEMBER_CONTENTS,
- BBSTREAMER_MEMBER_TRAILER,
- BBSTREAMER_ARCHIVE_TRAILER,
-} bbstreamer_archive_context;
-
-/*
- * Each chunk of data that is classified as BBSTREAMER_MEMBER_HEADER,
- * BBSTREAMER_MEMBER_CONTENTS, or BBSTREAMER_MEMBER_TRAILER should also
- * pass a pointer to an instance of this struct. The details are expected
- * to be present in the archive header and used to fill the struct, after
- * which all subsequent calls for the same archive member are expected to
- * pass the same details.
- */
-typedef struct
-{
- char pathname[MAXPGPATH];
- pgoff_t size;
- mode_t mode;
- uid_t uid;
- gid_t gid;
- bool is_directory;
- bool is_link;
- char linktarget[MAXPGPATH];
-} bbstreamer_member;
-
-/*
- * Generally, each type of bbstreamer will define its own struct, but the
- * first element should be 'bbstreamer base'. A bbstreamer that does not
- * require any additional private data could use this structure directly.
- *
- * bbs_ops is a pointer to the bbstreamer_ops object which contains the
- * function pointers appropriate to this type of bbstreamer.
- *
- * bbs_next is a pointer to the successor bbstreamer, for those types of
- * bbstreamer which forward data to a successor. It need not be used and
- * should be set to NULL when not relevant.
- *
- * bbs_buffer is a buffer for accumulating data for temporary storage. Each
- * type of bbstreamer makes its own decisions about whether and how to use
- * this buffer.
- */
-struct bbstreamer
-{
- const bbstreamer_ops *bbs_ops;
- bbstreamer *bbs_next;
- StringInfoData bbs_buffer;
-};
-
-/*
- * There are three callbacks for a bbstreamer. The 'content' callback is
- * called repeatedly, as described in the bbstreamer_archive_context comments.
- * Then, the 'finalize' callback is called once at the end, to give the
- * bbstreamer a chance to perform cleanup such as closing files. Finally,
- * because this code is running in a frontend environment where, as of this
- * writing, there are no memory contexts, the 'free' callback is called to
- * release memory. These callbacks should always be invoked using the static
- * inline functions defined below.
- */
-struct bbstreamer_ops
-{
- void (*content) (bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
- void (*finalize) (bbstreamer *streamer);
- void (*free) (bbstreamer *streamer);
-};
-
-/* Send some content to a bbstreamer. */
-static inline void
-bbstreamer_content(bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
-{
- Assert(streamer != NULL);
- streamer->bbs_ops->content(streamer, member, data, len, context);
-}
-
-/* Finalize a bbstreamer. */
-static inline void
-bbstreamer_finalize(bbstreamer *streamer)
-{
- Assert(streamer != NULL);
- streamer->bbs_ops->finalize(streamer);
-}
-
-/* Free a bbstreamer. */
-static inline void
-bbstreamer_free(bbstreamer *streamer)
-{
- Assert(streamer != NULL);
- streamer->bbs_ops->free(streamer);
-}
-
-/*
- * This is a convenience method for use when implementing a bbstreamer; it is
- * not for use by outside callers. It adds the amount of data specified by
- * 'nbytes' to the bbstreamer's buffer and adjusts '*len' and '*data'
- * accordingly.
- */
-static inline void
-bbstreamer_buffer_bytes(bbstreamer *streamer, const char **data, int *len,
- int nbytes)
-{
- Assert(nbytes <= *len);
-
- appendBinaryStringInfo(&streamer->bbs_buffer, *data, nbytes);
- *len -= nbytes;
- *data += nbytes;
-}
-
-/*
- * This is a convenience method for use when implementing a bbstreamer; it is
- * not for use by outsider callers. It attempts to add enough data to the
- * bbstreamer's buffer to reach a length of target_bytes and adjusts '*len'
- * and '*data' accordingly. It returns true if the target length has been
- * reached and false otherwise.
- */
-static inline bool
-bbstreamer_buffer_until(bbstreamer *streamer, const char **data, int *len,
- int target_bytes)
-{
- int buflen = streamer->bbs_buffer.len;
-
- if (buflen >= target_bytes)
- {
- /* Target length already reached; nothing to do. */
- return true;
- }
-
- if (buflen + *len < target_bytes)
- {
- /* Not enough data to reach target length; buffer all of it. */
- bbstreamer_buffer_bytes(streamer, data, len, *len);
- return false;
- }
-
- /* Buffer just enough to reach the target length. */
- bbstreamer_buffer_bytes(streamer, data, len, target_bytes - buflen);
- return true;
-}
-
-/*
- * Functions for creating bbstreamer objects of various types. See the header
- * comments for each of these functions for details.
- */
-extern bbstreamer *bbstreamer_plain_writer_new(char *pathname, FILE *file);
-extern bbstreamer *bbstreamer_gzip_writer_new(char *pathname, FILE *file,
- pg_compress_specification *compress);
-extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
- const char *(*link_map) (const char *),
- void (*report_output_file) (const char *));
-
-extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
- pg_compress_specification *compress);
-extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_zstd_compressor_new(bbstreamer *next,
- pg_compress_specification *compress);
-extern bbstreamer *bbstreamer_zstd_decompressor_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
-
-extern bbstreamer *bbstreamer_recovery_injector_new(bbstreamer *next,
- bool is_recovery_guc_supported,
- PQExpBuffer recoveryconfcontents);
-extern void bbstreamer_inject_file(bbstreamer *streamer, char *pathname,
- char *data, int len);
-
-#endif
diff --git a/src/bin/pg_basebackup/meson.build b/src/bin/pg_basebackup/meson.build
index c00acd5e118..a68dbd7837d 100644
--- a/src/bin/pg_basebackup/meson.build
+++ b/src/bin/pg_basebackup/meson.build
@@ -1,12 +1,12 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
common_sources = files(
- 'bbstreamer_file.c',
- 'bbstreamer_gzip.c',
- 'bbstreamer_inject.c',
- 'bbstreamer_lz4.c',
- 'bbstreamer_tar.c',
- 'bbstreamer_zstd.c',
+ 'astreamer_file.c',
+ 'astreamer_gzip.c',
+ 'astreamer_inject.c',
+ 'astreamer_lz4.c',
+ 'astreamer_tar.c',
+ 'astreamer_zstd.c',
'receivelog.c',
'streamutil.c',
'walmethods.c',
diff --git a/src/bin/pg_basebackup/nls.mk b/src/bin/pg_basebackup/nls.mk
index 384dbb021e9..950b9797b1e 100644
--- a/src/bin/pg_basebackup/nls.mk
+++ b/src/bin/pg_basebackup/nls.mk
@@ -1,12 +1,12 @@
# src/bin/pg_basebackup/nls.mk
CATALOG_NAME = pg_basebackup
GETTEXT_FILES = $(FRONTEND_COMMON_GETTEXT_FILES) \
- bbstreamer_file.c \
- bbstreamer_gzip.c \
- bbstreamer_inject.c \
- bbstreamer_lz4.c \
- bbstreamer_tar.c \
- bbstreamer_zstd.c \
+ astreamer_file.c \
+ astreamer_gzip.c \
+ astreamer_inject.c \
+ astreamer_lz4.c \
+ astreamer_tar.c \
+ astreamer_zstd.c \
pg_basebackup.c \
pg_createsubscriber.c \
pg_receivewal.c \
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 8f3dd04fd22..4179b064cbc 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -26,8 +26,8 @@
#endif
#include "access/xlog_internal.h"
+#include "astreamer.h"
#include "backup/basebackup.h"
-#include "bbstreamer.h"
#include "common/compression.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
@@ -57,8 +57,8 @@ typedef struct ArchiveStreamState
{
int tablespacenum;
pg_compress_specification *compress;
- bbstreamer *streamer;
- bbstreamer *manifest_inject_streamer;
+ astreamer *streamer;
+ astreamer *manifest_inject_streamer;
PQExpBuffer manifest_buffer;
char manifest_filename[MAXPGPATH];
FILE *manifest_file;
@@ -67,7 +67,7 @@ typedef struct ArchiveStreamState
typedef struct WriteTarState
{
int tablespacenum;
- bbstreamer *streamer;
+ astreamer *streamer;
} WriteTarState;
typedef struct WriteManifestState
@@ -199,8 +199,8 @@ static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *fo
static void progress_update_filename(const char *filename);
static void progress_report(int tablespacenum, bool force, bool finished);
-static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
- bbstreamer **manifest_inject_streamer_p,
+static astreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
+ astreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
bool expect_unterminated_tarfile,
pg_compress_specification *compress);
@@ -1053,19 +1053,19 @@ ReceiveCopyData(PGconn *conn, WriteDataCallback callback,
* the options selected by the user. We may just write the results directly
* to a file, or we might compress first, or we might extract the tar file
* and write each member separately. This function doesn't do any of that
- * directly, but it works out what kind of bbstreamer we need to create so
+ * directly, but it works out what kind of astreamer we need to create so
* that the right stuff happens when, down the road, we actually receive
* the data.
*/
-static bbstreamer *
+static astreamer *
CreateBackupStreamer(char *archive_name, char *spclocation,
- bbstreamer **manifest_inject_streamer_p,
+ astreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
bool expect_unterminated_tarfile,
pg_compress_specification *compress)
{
- bbstreamer *streamer = NULL;
- bbstreamer *manifest_inject_streamer = NULL;
+ astreamer *streamer = NULL;
+ astreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool is_tar,
is_tar_gz,
@@ -1160,7 +1160,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
directory = psprintf("%s/%s", basedir, spclocation);
else
directory = get_tablespace_mapping(spclocation);
- streamer = bbstreamer_extractor_new(directory,
+ streamer = astreamer_extractor_new(directory,
get_tablespace_mapping,
progress_update_filename);
}
@@ -1188,27 +1188,27 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
if (compress->algorithm == PG_COMPRESSION_NONE)
- streamer = bbstreamer_plain_writer_new(archive_filename,
+ streamer = astreamer_plain_writer_new(archive_filename,
archive_file);
else if (compress->algorithm == PG_COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
- streamer = bbstreamer_gzip_writer_new(archive_filename,
+ streamer = astreamer_gzip_writer_new(archive_filename,
archive_file, compress);
}
else if (compress->algorithm == PG_COMPRESSION_LZ4)
{
strlcat(archive_filename, ".lz4", sizeof(archive_filename));
- streamer = bbstreamer_plain_writer_new(archive_filename,
+ streamer = astreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_lz4_compressor_new(streamer, compress);
+ streamer = astreamer_lz4_compressor_new(streamer, compress);
}
else if (compress->algorithm == PG_COMPRESSION_ZSTD)
{
strlcat(archive_filename, ".zst", sizeof(archive_filename));
- streamer = bbstreamer_plain_writer_new(archive_filename,
+ streamer = astreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_zstd_compressor_new(streamer, compress);
+ streamer = astreamer_zstd_compressor_new(streamer, compress);
}
else
{
@@ -1222,7 +1222,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* into it.
*/
if (must_parse_archive)
- streamer = bbstreamer_tar_archiver_new(streamer);
+ streamer = astreamer_tar_archiver_new(streamer);
progress_update_filename(archive_filename);
}
@@ -1241,7 +1241,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
if (spclocation == NULL && writerecoveryconf)
{
Assert(must_parse_archive);
- streamer = bbstreamer_recovery_injector_new(streamer,
+ streamer = astreamer_recovery_injector_new(streamer,
is_recovery_guc_supported,
recoveryconfcontents);
}
@@ -1253,9 +1253,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* we're talking to such a server we'll need to add the terminator here.
*/
if (must_parse_archive)
- streamer = bbstreamer_tar_parser_new(streamer);
+ streamer = astreamer_tar_parser_new(streamer);
else if (expect_unterminated_tarfile)
- streamer = bbstreamer_tar_terminator_new(streamer);
+ streamer = astreamer_tar_terminator_new(streamer);
/*
* If the user has requested a server compressed archive along with
@@ -1264,11 +1264,11 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
if (format == 'p')
{
if (is_tar_gz)
- streamer = bbstreamer_gzip_decompressor_new(streamer);
+ streamer = astreamer_gzip_decompressor_new(streamer);
else if (is_tar_lz4)
- streamer = bbstreamer_lz4_decompressor_new(streamer);
+ streamer = astreamer_lz4_decompressor_new(streamer);
else if (is_tar_zstd)
- streamer = bbstreamer_zstd_decompressor_new(streamer);
+ streamer = astreamer_zstd_decompressor_new(streamer);
}
/* Return the results. */
@@ -1307,7 +1307,7 @@ ReceiveArchiveStream(PGconn *conn, pg_compress_specification *compress)
if (state.manifest_inject_streamer != NULL &&
state.manifest_buffer != NULL)
{
- bbstreamer_inject_file(state.manifest_inject_streamer,
+ astreamer_inject_file(state.manifest_inject_streamer,
"backup_manifest",
state.manifest_buffer->data,
state.manifest_buffer->len);
@@ -1318,8 +1318,8 @@ ReceiveArchiveStream(PGconn *conn, pg_compress_specification *compress)
/* If there's still an archive in progress, end processing. */
if (state.streamer != NULL)
{
- bbstreamer_finalize(state.streamer);
- bbstreamer_free(state.streamer);
+ astreamer_finalize(state.streamer);
+ astreamer_free(state.streamer);
state.streamer = NULL;
}
}
@@ -1383,8 +1383,8 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
/* End processing of any prior archive. */
if (state->streamer != NULL)
{
- bbstreamer_finalize(state->streamer);
- bbstreamer_free(state->streamer);
+ astreamer_finalize(state->streamer);
+ astreamer_free(state->streamer);
state->streamer = NULL;
}
@@ -1437,8 +1437,8 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
else if (state->streamer != NULL)
{
/* Archive data. */
- bbstreamer_content(state->streamer, NULL, copybuf + 1,
- r - 1, BBSTREAMER_UNKNOWN);
+ astreamer_content(state->streamer, NULL, copybuf + 1,
+ r - 1, ASTREAMER_UNKNOWN);
}
else
pg_fatal("unexpected payload data");
@@ -1600,7 +1600,7 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
bool tablespacenum, pg_compress_specification *compress)
{
WriteTarState state;
- bbstreamer *manifest_inject_streamer;
+ astreamer *manifest_inject_streamer;
bool is_recovery_guc_supported;
bool expect_unterminated_tarfile;
@@ -1636,7 +1636,7 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
pg_fatal("out of memory");
/* Inject it into the output tarfile. */
- bbstreamer_inject_file(manifest_inject_streamer, "backup_manifest",
+ astreamer_inject_file(manifest_inject_streamer, "backup_manifest",
buf.data, buf.len);
/* Free memory. */
@@ -1644,8 +1644,8 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
}
/* Cleanup. */
- bbstreamer_finalize(state.streamer);
- bbstreamer_free(state.streamer);
+ astreamer_finalize(state.streamer);
+ astreamer_free(state.streamer);
progress_report(tablespacenum, true, false);
@@ -1663,7 +1663,7 @@ ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
{
WriteTarState *state = callback_data;
- bbstreamer_content(state->streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);
+ astreamer_content(state->streamer, NULL, copybuf, r, ASTREAMER_UNKNOWN);
totaldone += r;
progress_report(state->tablespacenum, false, false);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3deb6113b80..f59f7acdb7e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3309,19 +3309,19 @@ bbsink_shell
bbsink_state
bbsink_throttle
bbsink_zstd
-bbstreamer
-bbstreamer_archive_context
-bbstreamer_extractor
-bbstreamer_gzip_decompressor
-bbstreamer_gzip_writer
-bbstreamer_lz4_frame
-bbstreamer_member
-bbstreamer_ops
-bbstreamer_plain_writer
-bbstreamer_recovery_injector
-bbstreamer_tar_archiver
-bbstreamer_tar_parser
-bbstreamer_zstd_frame
+astreamer
+astreamer_archive_context
+astreamer_extractor
+astreamer_gzip_decompressor
+astreamer_gzip_writer
+astreamer_lz4_frame
+astreamer_member
+astreamer_ops
+astreamer_plain_writer
+astreamer_recovery_injector
+astreamer_tar_archiver
+astreamer_tar_parser
+astreamer_zstd_frame
bgworker_main_type
bh_node_type
binaryheap
--
2.18.0
On Wed, Jul 31, 2024 at 9:28 AM Amul Sul <sulamul@gmail.com> wrote:
Fixed -- I did that because it was part of a separate group in pg_basebackup.
Well, that's because pg_basebackup builds multiple executables, and
these files needed to be linked with some but not others. It looks
like when Andres added meson support, instead of linking each object
file into the binaries that need it, he had it just build a static
library and link every executable to that. That puts the linker in
charge of sorting out which binaries need which files, instead of
having the makefile do it. In any case, this consideration doesn't
apply when we're putting the object files into a library, so there was
no need to preserve the separate makefile variable. I think this looks
good now.
Fixed -- frontend_common_code now includes lz4 as well.
Cool. 0003 overall looks good to me now, unless Andres wants to object.
Noted. I might give it a try another day, unless someone else beats
me, perhaps in a separate thread.
Probably not too important, since nobody has complained.
Done -- added a new patch as 0004, and the subsequent patch numbers
have been incremented accordingly.
I think I would have made this pass context->show_progress to
progress_report() instead of the whole verifier_context, but that's an
arguable stylistic choice, so I'll defer to you if you prefer it the
way you have it. Other than that, this LGTM.
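In case it helps, the shape I had in mind for the show_progress version is
roughly this -- a sketch only, where the globals and the output format are
placeholders rather than the patch's actual code:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t total_size = 0;
static uint64_t done_size = 0;

/* Take just the flag, not the whole verifier_context. */
static void
progress_report(bool show_progress, bool finished)
{
	if (!show_progress)
		return;

	fprintf(stderr, "%llu/%llu kB (%d%%) verified%s",
			(unsigned long long) (done_size / 1024),
			(unsigned long long) (total_size / 1024),
			total_size > 0 ? (int) ((done_size * 100) / total_size) : 0,
			finished ? "\n" : "\r");
}

A call site would then read progress_report(context->show_progress, false)
instead of passing the whole context.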
However, what is now 0005 does something rather evil. The commit
message claims that it's just rearranging code, and that's almost
entirely true, except that it also changes manifest_file's pathname
member to be char * instead of const char *. I do not think that is a
good idea, and I definitely do not think you should do it in a patch
that purports to just be doing code movement, and I even more
definitely think that you should not do it without even mentioning
that you did it, and why you did it.
Fixed -- I did the NULL check in the earlier 0007 patch, but it should
have been done in this patch.
This is now 0006. struct stat's st_size is of type off_t -- or maybe
ssize_t on some platforms? -- not type size_t. I suggest making the
filesize argument use int64 as we do in some other places. size_t is,
I believe, defined to be the right width to hold the size of an object
in memory, not the size of a file on disk, so it isn't really relevant
here.
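To make the type point concrete, something along these lines is what I'm
suggesting; the helper itself is purely illustrative, not a function the
patch adds:

#include <stdint.h>
#include <sys/stat.h>

/*
 * Carry the on-disk size as int64, not size_t: size_t is sized for objects
 * in memory and may be narrower than off_t, so a large file's size could be
 * silently truncated.
 */
static int64_t
get_file_size(const char *pathname)
{
	struct stat st;

	if (stat(pathname, &st) != 0)
		return -1;				/* caller reports the error */
	return (int64_t) st.st_size;	/* off_t fits in int64 */
}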
Other than that, my only comment on this patch is that I think I would
find it more natural to write the check in verify_backup_file() in a
different order: I'd write it as context->manifest->version != 1 && m != NULL
&& m->matched && !m->bad && strcmp(), because (a) that way the most
expensive test is last and (b) it feels weird to think about whether
we have the right pathname if we don't even have a valid manifest
entry. But this is minor and just a stylistic preference, so it's also
OK as you have it if you prefer.
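Spelled out, the ordering I have in mind looks roughly like this; I've
wrapped it in a throwaway predicate so it stands alone, and the parameter
names are mine rather than the patch's:

#include <stdbool.h>
#include <string.h>

/* In the patch this would stay an inline condition in verify_backup_file(). */
static bool
manifest_path_mismatch(int manifest_version, bool have_entry, bool matched,
					   bool bad, const char *manifest_path, const char *relpath)
{
	return manifest_version != 1 &&
		have_entry && matched && !bad &&
		strcmp(manifest_path, relpath) != 0;	/* costliest test last */
}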
I agree, changing the order of errors could create confusion.
Previously, a file size mismatch was a clear and appropriate error
that was reported before the checksum failure error.
In my opinion, this patch (currently 0007) creates a rather confusing
situation that I can't easily reason about. Post-patch,
verify_content_checksum() is a mix of two different things: it ends up
containing all of the logic that needs to be performed on every chunk
of bytes read from the file plus some but not all of the end-of-file
error-checks from verify_file_checksum(). That's really weird. I'm not
very convinced that the test for whether we've reached the end of the
file is 100% correct, but even if it is, the stuff before that point
is stuff that is supposed to happen many times and the stuff after
that is only supposed to happen once, and I don't see any good reason
to smush those two categories of things into a single function. Plus,
changing the order in which those end-of-file checks happen doesn't
seem like the right idea either: the current ordering is good the way
it is. Maybe you want to think of refactoring to create TWO new
functions, one to do the per-hunk work and a second to do the
end-of-file "is the checksum OK?" stuff, or maybe you can just open
code it, but I'm not willing to commit this the way it is.
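To sketch what I mean by two functions -- assuming the verifier declarations
end up in a shared header as the earlier refactoring patches intend, and with
function names, messages, and the bytes_read bookkeeping that are mine rather
than the patch's:

#include "postgres_fe.h"

#include "common/checksum_helper.h"
#include "pg_verifybackup.h"

/* Called once per chunk of data read from the file. */
static void
verify_checksum_chunk(verifier_context *context, manifest_file *m,
					  pg_checksum_context *checksum_ctx,
					  const uint8 *buffer, size_t nbytes, uint64 *bytes_read)
{
	if (pg_checksum_update(checksum_ctx, buffer, nbytes) < 0)
		report_backup_error(context,
							"could not update checksum of file \"%s\"",
							m->pathname);
	*bytes_read += nbytes;
}

/* Called exactly once, after the last chunk has been seen. */
static void
verify_checksum_final(verifier_context *context, manifest_file *m,
					  pg_checksum_context *checksum_ctx, uint64 bytes_read)
{
	uint8		checksumbuf[PG_CHECKSUM_MAX_LENGTH];
	int			checksumlen;

	if (bytes_read != m->size)
	{
		report_backup_error(context,
							"file \"%s\" should contain %llu bytes, but read %llu bytes",
							m->pathname,
							(unsigned long long) m->size,
							(unsigned long long) bytes_read);
		return;
	}

	checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
	if (checksumlen != m->checksum_length ||
		memcmp(checksumbuf, m->checksum_payload, checksumlen) != 0)
		report_backup_error(context,
							"checksum mismatch for file \"%s\"",
							m->pathname);
}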
Regarding 0008, I don't really see a reason why the m != NULL
shouldn't also move inside should_verify_control_data(). Yeah, the
caller added in 0010 might not need the check, but it won't really
cost anything. Also, it seems to me that the logic in 0010 is actually
wrong. If m == NULL, we'll keep the values of verifyChecksums and
verifyControlData from the previous iteration, whereas what we should
do is make them both false. How about removing the if m == NULL guard
here and making both should_verify_checksum() and
should_verify_control_data() test m != NULL internally? Then it all
works out nicely, I think. Or alternatively you need an else clause
that resets both values to false when m == NULL.
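For clarity, the else-clause alternative would look roughly like this (the assignments are the ones the 0010 patch already makes; only the else branch would be new):

if (m != NULL)
{
    mystreamer->verifyChecksums =
        (!mystreamer->context->skip_checksums && should_verify_checksum(m));
    mystreamer->verifyControlData =
        should_verify_control_data(mystreamer->context->manifest, m);
}
else
{
    /* No manifest entry: don't carry flags over from the previous member. */
    mystreamer->verifyChecksums = false;
    mystreamer->verifyControlData = false;
}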
Okay, I added the verify_checksums() and verify_controldata()
functions to the astreamer_verify.c file. I also updated related
variables that were clashing with these function names:
verify_checksums has been renamed to verifyChecksums, and verify_sysid
has been renamed to verifyControlData.
Maybe think of doing something with the ASTREAMER_MEMBER_HEADER case also.
Out of time for today, will look again soon. I think the first few of
these are probably pretty much ready for commit already, and with a
little more adjustment they'll probably be ready up through about
0006.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
On 2024-07-31 16:07:03 -0400, Robert Haas wrote:
On Wed, Jul 31, 2024 at 9:28 AM Amul Sul <sulamul@gmail.com> wrote:
Fixed -- I did that because it was part of a separate group in pg_basebackup.
Well, that's because pg_basebackup builds multiple executables, and
these files needed to be linked with some but not others. It looks
like when Andres added meson support, instead of linking each object
file into the binaries that need it, he had it just build a static
library and link every executable to that. That puts the linker in
charge of sorting out which binaries need which files, instead of
having the makefile do it.
Right. Meson supports using the same file with different compilation flags,
depending on the context it's used in (i.e. as part of an executable or a shared
library). But that also ends up compiling files multiple times when using the
same file in multiple binaries, which wasn't desirable here -- hence moving it
to a static lib.
Fixed -- frontend_common_code now includes lz4 as well.
Cool. 0003 overall looks good to me now, unless Andres wants to object.
Nope.
Greetings,
Andres Freund
On Thu, Aug 1, 2024 at 1:37 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jul 31, 2024 at 9:28 AM Amul Sul <sulamul@gmail.com> wrote:
Fixed -- I did that because it was part of a separate group in pg_basebackup.
Well, that's because pg_basebackup builds multiple executables, and
these files needed to be linked with some but not others. It looks
like when Andres added meson support, instead of linking each object
file into the binaries that need it, he had it just build a static
library and link every executable to that. That puts the linker in
charge of sorting out which binaries need which files, instead of
having the makefile do it. In any case, this consideration doesn't
apply when we're putting the object files into a library, so there was
no need to preserve the separate makefile variable. I think this looks
good now.
Understood.
Fixed -- frontend_common_code now includes lz4 as well.
Cool. 0003 overall looks good to me now, unless Andres wants to object.
Noted. I might give it a try another day, unless someone else beats
me, perhaps in a separate thread.
Probably not too important, since nobody has complained.
Done -- added a new patch as 0004, and the subsequent patch numbers
have been incremented accordingly.
I think I would have made this pass context->show_progress to
progress_report() instead of the whole verifier_context, but that's an
arguable stylistic choice, so I'll defer to you if you prefer it the
way you have it. Other than that, this LGTM.
Additionally, I moved total_size and done_size into verifier_context
because done_size needs to be accessed in astreamer_verify.c.
With this change, passing verifier_context to progress_report() seems
more suitable than passing show_progress alone.
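Roughly, the intent is the following shape (a sketch only; the existing members are elided, and the uint64 types are assumed to follow the old static variables):

typedef struct verifier_context
{
    /* ... existing members (manifest, backup_directory, ignore_list, ...) */
    uint64      total_size;     /* total bytes we expect to verify */
    uint64      done_size;      /* bytes verified so far; now also updated
                                 * from astreamer_verify.c for tar members */
} verifier_context;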
However, what is now 0005 does something rather evil. The commit
message claims that it's just rearranging code, and that's almost
entirely true, except that it also changes manifest_file's pathname
member to be char * instead of const char *. I do not think that is a
good idea, and I definitely do not think you should do it in a patch
that purports to just be doing code movement, and I even more
definitely think that you should not do it without even mentioning
that you did it, and why you did it.
True, that was a mistake on my part during the rebase. Fixed in the
attached version.
Fixed -- I did the NULL check in the earlier 0007 patch, but it should
have been done in this patch.
This is now 0006. struct stat's st_size is of type off_t -- or maybe
ssize_t on some platforms? - not type size_t. I suggest making the
filesize argument use int64 as we do in some other places. size_t is,
I believe, defined to be the right width to hold the size of an object
in memory, not the size of a file on disk, so it isn't really relevant
here.
Ok, used int64.
Other than that, my only comment on this patch is that I think I would
find it more natural to write the check in verify_backup_file() in a
different order: I'd put context->manifest->version != 1 && m != NULL
&& m->matched && !m->bad && strcmp() because (a) that way the most
expensive test is last and (b) it feels weird to think about whether
we have the right pathname if we don't even have a valid manifest
entry. But this is minor and just a stylistic preference, so it's also
OK as you have it if you prefer.
I usually do it that way (a) -- keeping the expensive check for last.
I did the same thing while adding should_verify_control_data() in the
later patch. Somehow I missed it here; maybe I didn't pay enough
attention to this patch :(
I agree, changing the order of errors could create confusion.
Previously, a file size mismatch was a clear and appropriate error
that was reported before the checksum failure error.
In my opinion, this patch (currently 0007) creates a rather confusing
situation that I can't easily reason about. Post-patch,
verify_content_checksum() is a mix of two different things: it ends up
containing all of the logic that needs to be performed on every chunk
of bytes read from the file plus some but not all of the end-of-file
error-checks from verify_file_checksum(). That's really weird. I'm not
very convinced that the test for whether we've reached the end of the
file is 100% correct, but even if it is, the stuff before that point
is stuff that is supposed to happen many times and the stuff after
that is only supposed to happen once, and I don't see any good reason
to smush those two categories of things into a single function. Plus,
changing the order in which those end-of-file checks happen doesn't
seem like the right idea either: the current ordering is good the way
it is. Maybe you want to think of refactoring to create TWO new
functions, one to do the per-hunk work and a second to do the
end-of-file "is the checksum OK?" stuff, or maybe you can just open
code it, but I'm not willing to commit this the way it is.
Understood. When I started addressing the v3 review, I thought of
completely discarding the 0007 patch and copying most of
verify_file_checksum() to a new function in astreamer_verify.c.
However, I later realized we could deduplicate some parts, so I split
verify_file_checksum() and moved the reusable part to a separate
function. Please have a look at v4-0007.
Regarding 0008, I don't really see a reason why the m != NULL
shouldn't also move inside should_verify_control_data(). Yeah, the
caller added in 0010 might not need the check, but it won't really
cost anything. Also, it seems to me that the logic in 0010 is actually
wrong. If m == NULL, we'll keep the values of verifyChecksums and
verifyControlData from the previous iteration, whereas what we should
do is make them both false. How about removing the if m == NULL guard
here and making both should_verify_checksum() and
should_verify_control_data() test m != NULL internally? Then it all
works out nicely, I think. Or alternatively you need an else clause
that resets both values to false when m == NULL.
I had the same thought about checking for NULL inside
should_verify_control_data(), but I wanted to keep its structure
similar to should_verify_checksum(). Making this change would have
also required altering should_verify_checksum(), and I wasn't sure
whether I should make that change earlier. Now I have done that in
the attached version -- the 0008 patch.
Okay, I added the verify_checksums() and verify_controldata()
functions to the astreamer_verify.c file. I also updated related
variables that were clashing with these function names:
verify_checksums has been renamed to verifyChecksums, and verify_sysid
has been renamed to verifyControlData.
Maybe think of doing something with the ASTREAMER_MEMBER_HEADER case also.
Done.
Out of time for today, will look again soon. I think the first few of
these are probably pretty much ready for commit already, and with a
little more adjustment they'll probably be ready up through about
0006.
Sure, thank you.
Regards,
Amul
Attachments:
v4-0008-Refactor-split-verify_control_file.patch
From 335d17b88f26235a71ec4cb3b79cbc7b7b4bf22c Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 15:47:26 +0530
Subject: [PATCH v4 08/11] Refactor: split verify_control_file.
Separated the control file verification code into a new function and
introduced the should_verify_control_data() macro, similar to
should_verify_checksum().
Note that should_verify_checksum() has been slightly modified to
include a NULL check for its argument, maintaining the same code
structure as should_verify_control_data().
---
src/bin/pg_verifybackup/pg_verifybackup.c | 44 +++++++++++------------
src/bin/pg_verifybackup/pg_verifybackup.h | 18 +++++++++-
2 files changed, 37 insertions(+), 25 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 9479b439bd4..57cb037bb76 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,7 +18,6 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -61,8 +60,6 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_control_file(const char *controlpath,
- uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -617,14 +614,20 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
/* Check the backup manifest entry for this file. */
m = verify_manifest_entry(context, relpath, sb.st_size);
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0 &&
- m != NULL && m->matched && !m->bad)
- verify_control_file(fullpath, context->manifest->system_identifier);
+ /* Validate the manifest system identifier */
+ if (should_verify_control_data(context->manifest, m))
+ {
+ ControlFileData *control_file;
+ bool crc_ok;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+ control_file = get_controlfile_by_exact_path(fullpath, &crc_ok);
+
+ verify_control_data(control_file, fullpath, crc_ok,
+ context->manifest->system_identifier);
+ /* Release memory. */
+ pfree(control_file);
+ }
}
/*
@@ -673,18 +676,14 @@ verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
}
/*
- * Sanity check control file and validate system identifier against manifest
- * system identifier.
+ * Sanity check control file data and validate system identifier against
+ * manifest system identifier.
*/
-static void
-verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
+void
+verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier)
{
- ControlFileData *control_file;
- bool crc_ok;
-
- pg_log_debug("reading \"%s\"", controlpath);
- control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
-
/* Control file contents not meaningful if CRC is bad. */
if (!crc_ok)
report_fatal_error("%s: CRC is incorrect", controlpath);
@@ -700,9 +699,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
controlpath,
(unsigned long long) manifest_system_identifier,
(unsigned long long) control_file->system_identifier);
-
- /* Release memory. */
- pfree(control_file);
}
/*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 1bc5f7a6b4a..c88f71ff14b 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -16,6 +16,7 @@
#include "common/controldata_utils.h"
#include "common/hashfn_unstable.h"
+#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
@@ -44,7 +45,19 @@ typedef struct manifest_file
} manifest_file;
#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+ (((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+
+/*
+ * Validate the manifest system identifier against the control file; this
+ * feature is not available in manifest version 1. This validation should be
+ * carried out only if the manifest entry validation is completed without any
+ * errors.
+ */
+#define should_verify_control_data(manifest, m) \
+ (((manifest)->version != 1) && \
+ ((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (strcmp((m)->pathname, "global/pg_control") == 0))
/*
* Define a hash table which we can use to store information about the files
@@ -110,6 +123,9 @@ extern manifest_file *verify_manifest_entry(verifier_context *context,
extern void verify_checksum(verifier_context *context, manifest_file *m,
pg_checksum_context *checksum_ctx,
uint8 *checksumbuf);
+extern void verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
v4-0007-Refactor-split-verify_file_checksum-function.patch
From 816e29e50120b40729377336229714dc0462ec7c Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 16:45:55 +0530
Subject: [PATCH v4 07/11] Refactor: split verify_file_checksum() function.
Move the core functionality of verify_file_checksum to a new function
to reuse it instead of duplicating the code.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 18 ++++++++++++++++--
src/bin/pg_verifybackup/pg_verifybackup.h | 3 +++
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index b8db83ed2cd..9479b439bd4 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -779,7 +779,6 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
int rc;
size_t bytes_read = 0;
uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
- int checksumlen;
/* Open the target file. */
if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
@@ -845,8 +844,23 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
return;
}
+ /* Do the final computation and verification. */
+ verify_checksum(context, m, &checksum_ctx, checksumbuf);
+}
+
+/*
+ * A helper function to finalize checksum computation and verify it against the
+ * backup manifest information.
+ */
+void
+verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx, uint8 *checksumbuf)
+{
+ int checksumlen;
+ const char *relpath = m->pathname;
+
/* Get the final checksum. */
- checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
+ checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
if (checksumlen < 0)
{
report_backup_error(context,
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 98c75916255..1bc5f7a6b4a 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -107,6 +107,9 @@ typedef struct verifier_context
extern manifest_file *verify_manifest_entry(verifier_context *context,
char *relpath, int64 filesize);
+extern void verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx,
+ uint8 *checksumbuf);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
v4-0009-pg_verifybackup-Add-backup-format-and-compression.patch
From 694a28770bfc2dfb90b14442be081bdbc54d4987 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 10:26:35 +0530
Subject: [PATCH v4 09/11] pg_verifybackup: Add backup format and compression
option
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the next patch that implements tar format support.
----
---
src/bin/pg_verifybackup/pg_verifybackup.c | 143 +++++++++++++++++++++-
1 file changed, 141 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 57cb037bb76..2ca6b51741f 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,6 +18,7 @@
#include <sys/stat.h>
#include <time.h>
+#include "common/compression.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -56,6 +57,8 @@ static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3) pg_attribute_noreturn();
+static char find_backup_format(verifier_context *context);
+static pg_compress_algorithm find_backup_compression(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
@@ -73,6 +76,9 @@ static void usage(void);
static const char *progname;
+char format = '\0'; /* p(lain)/t(ar) */
+pg_compress_algorithm compress_algorithm = PG_COMPRESSION_NONE;
+
/*
* Main entry point.
*/
@@ -83,11 +89,13 @@ main(int argc, char **argv)
{"exit-on-error", no_argument, NULL, 'e'},
{"ignore", required_argument, NULL, 'i'},
{"manifest-path", required_argument, NULL, 'm'},
+ {"format", required_argument, NULL, 'F'},
{"no-parse-wal", no_argument, NULL, 'n'},
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
{"skip-checksums", no_argument, NULL, 's'},
{"wal-directory", required_argument, NULL, 'w'},
+ {"compress", required_argument, NULL, 'Z'},
{NULL, 0, NULL, 0}
};
@@ -98,6 +106,7 @@ main(int argc, char **argv)
bool quiet = false;
char *wal_directory = NULL;
char *pg_waldump_path = NULL;
+ bool tar_compression_specified = false;
pg_logging_init(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verifybackup"));
@@ -140,7 +149,7 @@ main(int argc, char **argv)
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
- while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1)
+ while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:Z:", long_options, NULL)) != -1)
{
switch (c)
{
@@ -159,6 +168,15 @@ main(int argc, char **argv)
manifest_path = pstrdup(optarg);
canonicalize_path(manifest_path);
break;
+ case 'F':
+ if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
+ format = 'p';
+ else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
+ format = 't';
+ else
+ pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"",
+ optarg);
+ break;
case 'n':
no_parse_wal = true;
break;
@@ -175,6 +193,12 @@ main(int argc, char **argv)
wal_directory = pstrdup(optarg);
canonicalize_path(wal_directory);
break;
+ case 'Z':
+ if (!parse_compress_algorithm(optarg, &compress_algorithm))
+ pg_fatal("unrecognized compression algorithm: \"%s\"",
+ optarg);
+ tar_compression_specified = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -206,11 +230,41 @@ main(int argc, char **argv)
pg_fatal("cannot specify both %s and %s",
"-P/--progress", "-q/--quiet");
+ /* Complain if compression method specified but the format isn't tar. */
+ if (format != 't' && tar_compression_specified)
+ {
+ pg_log_error("only tar mode backups can be compressed");
+ pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+ exit(1);
+ }
+
+ /* Determine the backup format if it hasn't been specified. */
+ if (format == '\0')
+ format = find_backup_format(&context);
+
+ /*
+ * Determine the tar backup compression method if it hasn't been
+ * specified.
+ */
+ if (format == 't' && !tar_compression_specified)
+ compress_algorithm = find_backup_compression(&context);
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
int ret;
+ /*
+ * XXX: In the future, we should consider enhancing pg_waldump to read
+ * WAL files from the tar archive.
+ */
+ if (format != 'p')
+ {
+ pg_log_error("pg_waldump does not support parsing WAL files from a tar archive");
+ pg_log_error_hint("Try \"%s --help\" for the option to skip parsing WAL files.", progname);
+ exit(1);
+ }
+
pg_waldump_path = pg_malloc(MAXPGPATH);
ret = find_other_exec(argv[0], "pg_waldump",
"pg_waldump (PostgreSQL) " PG_VERSION "\n",
@@ -265,8 +319,13 @@ main(int argc, char **argv)
/*
* Now do the expensive work of verifying file checksums, unless we were
* told to skip it.
+ *
+ * We do this only for plain backups here. For a tar backup, file
+ * checksum verification (if requested) is done as soon as the file
+ * contents are read, since we don't have random access to the files
+ * like we do with plain backups.
*/
- if (!context.skip_checksums)
+ if (!context.skip_checksums && format == 'p')
verify_backup_checksums(&context);
/*
@@ -1013,6 +1072,84 @@ progress_report(verifier_context *context, bool finished)
fputc((!finished && isatty(fileno(stderr))) ? '\r' : '\n', stderr);
}
+/*
+ * To detect the backup format, check for the PG_VERSION file in the backup
+ * directory. If it is found, the backup is considered plain format; otherwise,
+ * it is assumed to be a tar-format backup.
+ */
+static char
+find_backup_format(verifier_context *context)
+{
+ char result;
+ char *path;
+ struct stat sb;
+
+ /* Should be here only if the backup format is unknown */
+ Assert(format == '\0');
+
+ /* Check PG_VERSION file. */
+ path = psprintf("%s/%s", context->backup_directory, "PG_VERSION");
+ result = (stat(path, &sb) == 0) ? 'p' : 't';
+ pfree(path);
+
+ return result;
+}
+
+/*
+ * To determine the compression format, we will search for the main data
+ * directory archive and its extension, which starts with base.tar, as
+ * pg_basebackup writes the main data directory to an archive file named
+ * base.tar followed by a compression type extension like .gz, .lz4, or .zst.
+ */
+static pg_compress_algorithm
+find_backup_compression(verifier_context *context)
+{
+ char *path;
+ struct stat sb;
+ bool found;
+
+ /* Should be here only for tar backup */
+ Assert(format == 't');
+
+ /*
+ * Is this a tar archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_NONE;
+
+ /*
+ * Is this a .tar.gz archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.gz");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_GZIP;
+
+ /*
+ * Is this a .tar.lz4 archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.lz4");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_LZ4;
+
+ /*
+ * Is this a .tar.zst archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.zst");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_ZSTD;
+
+ return PG_COMPRESSION_NONE; /* placate compiler */
+}
+
/*
* Print out usage information and exit.
*/
@@ -1025,11 +1162,13 @@ usage(void)
printf(_(" -e, --exit-on-error exit immediately on error\n"));
printf(_(" -i, --ignore=RELATIVE_PATH ignore indicated path\n"));
printf(_(" -m, --manifest-path=PATH use specified path for manifest\n"));
+ printf(_(" -F, --format=p|t backup format (plain, tar)\n"));
printf(_(" -n, --no-parse-wal do not try to parse WAL files\n"));
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -s, --skip-checksums skip checksum verification\n"));
printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n"));
+ printf(_(" -Z, --compress=METHOD compression method (gzip, lz4, zstd, none)\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
--
2.18.0
v4-0010-pg_verifybackup-Read-tar-files-and-verify-its-con.patch
From f3f176fe4f0879e9295b07546b9449471d7e4030 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 16:22:07 +0530
Subject: [PATCH v4 10/11] pg_verifybackup: Read tar files and verify their
contents
---
src/bin/pg_verifybackup/Makefile | 4 +-
src/bin/pg_verifybackup/astreamer_verify.c | 342 +++++++++++++++++++++
src/bin/pg_verifybackup/meson.build | 6 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 216 ++++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 9 +
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 569 insertions(+), 9 deletions(-)
create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 7c045f142e8..df7aaabd530 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -17,11 +17,13 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
# We need libpq only because fe_utils does.
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
OBJS = \
$(WIN32RES) \
- pg_verifybackup.o
+ pg_verifybackup.o \
+ astreamer_verify.o
all: pg_verifybackup
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
new file mode 100644
index 00000000000..55d352e1754
--- /dev/null
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -0,0 +1,342 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_verify.c
+ *
+ * Extend fe_utils/astreamer.h archive streaming facility to verify TAR
+ * backup.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * src/bin/pg_verifybackup/astreamer_verify.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "common/logging.h"
+#include "fe_utils/astreamer.h"
+#include "pg_verifybackup.h"
+
+typedef struct astreamer_verify
+{
+ astreamer base;
+ verifier_context *context;
+ char *archiveName;
+ Oid tblspcOid;
+
+ /*
+ * Hold information for a member file verification that needs to be reset at
+ * the end of each file.
+ */
+ manifest_file *mfile;
+ int64 receivedBytes;
+ bool verifyChecksums;
+ bool verifyControlData;
+ pg_checksum_context *checksum_ctx;
+} astreamer_verify;
+
+static void astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_verify_finalize(astreamer *streamer);
+static void astreamer_verify_free(astreamer *streamer);
+
+static void verify_member_file(astreamer *streamer, astreamer_member *member);
+static void verify_content_checksum(astreamer *streamer,
+ astreamer_member *member,
+ const char *buffer, int buffer_len);
+static void verify_controldata(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void reset_member_info(astreamer *streamer);
+
+static const astreamer_ops astreamer_verify_ops = {
+ .content = astreamer_verify_content,
+ .finalize = astreamer_verify_finalize,
+ .free = astreamer_verify_free
+};
+
+/*
+ * Create an astreamer that verifies the contents of a TAR file.
+ */
+astreamer *
+astreamer_verify_content_new(astreamer *next, verifier_context *context,
+ char *archive_name, Oid tblspc_oid)
+{
+ astreamer_verify *streamer;
+
+ streamer = palloc0(sizeof(astreamer_verify));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_verify_ops;
+
+ streamer->base.bbs_next = next;
+ streamer->context = context;
+ streamer->archiveName = archive_name;
+ streamer->tblspcOid = tblspc_oid;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ return &streamer->base;
+}
+
+/*
+ * It verifies each TAR member entry against the manifest data and performs
+ * checksum verification if enabled. Additionally, it validates the backup's
+ * system identifier against the backup_manifest.
+ */
+static void
+astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member, const char *data,
+ int len, astreamer_archive_context context)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+ if (!member->is_directory && !member->is_link &&
+ !should_ignore_relpath(mystreamer->context, member->pathname))
+ verify_member_file(streamer, member);
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Perform checksum verification as the file content becomes
+ * available, since the TAR format does not have random access to
+ * files like a normal backup directory, where checksum
+ * verification occurs at different points.
+ */
+ if (mystreamer->verifyChecksums)
+ verify_content_checksum(streamer, member, data, len);
+
+ /* Verify pg_control file information */
+ if (mystreamer->verifyControlData)
+ verify_controldata(streamer, member, data, len);
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+ reset_member_info(streamer);
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar archive");
+ }
+}
+
+/*
+ * End-of-stream processing for a astreamer_verify stream.
+ */
+static void
+astreamer_verify_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with a astreamer_verify stream.
+ */
+static void
+astreamer_verify_free(astreamer *streamer)
+{
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+
+/*
+ * Verifies the member file entry against the backup manifest. If the archive
+ * being processed is a tablespace, the required file path is prepared for
+ * subsequent operations.
+ */
+static void
+verify_member_file(astreamer *streamer, astreamer_member *member)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_file *m;
+
+ /*
+ * The backup_manifest stores the paths of tablespace files relative
+ * to the base directory, whereas <tablespaceoid>.tar doesn't.
+ * Prepare the required path; otherwise, the manifest entry
+ * verification will fail.
+ */
+ if (OidIsValid(mystreamer->tblspcOid))
+ {
+ char temp[MAXPGPATH];
+
+ /* Copy original name at temporary space */
+ memcpy(temp, member->pathname, MAXPGPATH);
+
+ snprintf(member->pathname, MAXPGPATH, "%s/%d/%s",
+ "pg_tblspc", mystreamer->tblspcOid, temp);
+ }
+
+ /* Check the manifest entry */
+ m = verify_manifest_entry(mystreamer->context, member->pathname,
+ member->size);
+ mystreamer->mfile = (void *) m;
+
+ /*
+ * Prepare for checksum and manifest system identifier
+ * verification.
+ *
+ * We could perform these checks while receiving the contents.
+ * However, since the contents are received in multiple
+ * iterations, this would result in these lengthy checks being
+ * performed multiple times. Instead, computing a single flag
+ * up front is more efficient.
+ */
+ mystreamer->verifyChecksums =
+ (!mystreamer->context->skip_checksums && should_verify_checksum(m));
+ mystreamer->verifyControlData =
+ should_verify_control_data(mystreamer->context->manifest, m);
+}
+
+/*
+ * Similar to verify_file_checksum() in terms of implementation and code, which
+ * computes the checksum for a single file, this function computes the checksum
+ * incrementally for the received file content. On the first call for a file, it
+ * initializes checksum_ctx, which will be used in subsequent calls. Once the
+ * complete file content is received, tracked using the receivedBytes parameter,
+ * it verifies the checksum against the manifest data.
+ */
+static void
+verify_content_checksum(astreamer *streamer, astreamer_member *member,
+ const char *buffer, int buffer_len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ pg_checksum_context *checksum_ctx = mystreamer->checksum_ctx;
+ verifier_context *context = mystreamer->context;
+ manifest_file *m = mystreamer->mfile;
+ const char *relpath = m->pathname;
+ uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
+
+ /*
+ * Mark it false to avoid unexpected re-entry for the same file content
+ * (e.g. content that has already caused an error should not be revisited).
+ */
+ Assert(mystreamer->verifyChecksums);
+ mystreamer->verifyChecksums = false;
+
+ /* Should have been called for the right file */
+ Assert(strcmp(member->pathname, relpath) == 0);
+
+ /* If this is the first call for this file */
+ if (!checksum_ctx)
+ {
+ checksum_ctx = pg_malloc(sizeof(pg_checksum_context));
+ mystreamer->checksum_ctx = checksum_ctx;
+
+ if (pg_checksum_init(checksum_ctx, m->checksum_type) < 0)
+ {
+ report_backup_error(context,
+ "%s: could not initialize checksum of file \"%s\"",
+ mystreamer->archiveName, relpath);
+ return;
+ }
+ }
+
+ /* Update the total count of computed checksum bytes. */
+ mystreamer->receivedBytes += buffer_len;
+
+ if (pg_checksum_update(checksum_ctx, (uint8 *) buffer, buffer_len) < 0)
+ {
+ report_backup_error(context, "could not update checksum of file \"%s\"",
+ relpath);
+ return;
+ }
+
+ /* Report progress */
+ context->done_size += buffer_len;
+ progress_report(context, false);
+
+ /* Yet to receive the full content of the file. */
+ if (mystreamer->receivedBytes < m->size)
+ {
+ mystreamer->verifyChecksums = true;
+ return;
+ }
+
+ /* Do the final computation and verification. */
+ verify_checksum(context, m, checksum_ctx, checksumbuf);
+}
+
+/*
+ * Prepare the control data from the received file contents, which are supposed
+ * to be from the pg_control file, including CRC calculation. Then, call the
+ * routines that perform the final verification of the control file information.
+ */
+static void
+verify_controldata(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_data *manifest = mystreamer->context->manifest;
+ ControlFileData control_file;
+ pg_crc32c crc;
+ bool crc_ok;
+
+ /* Should be here only for control file */
+ Assert(strcmp(member->pathname, "global/pg_control") == 0);
+ Assert(manifest->version != 1);
+
+ /* Mark it as false to avoid unexpected re-entrance */
+ Assert(mystreamer->verifyControlData);
+ mystreamer->verifyControlData = false;
+
+ /* Should have whole control file data. */
+ if (!astreamer_buffer_until(streamer, &data, &len, sizeof(ControlFileData)))
+ {
+ mystreamer->verifyControlData = true;
+ return;
+ }
+
+ pg_log_debug("%s: reading \"%s\"", mystreamer->archiveName,
+ member->pathname);
+
+ if (streamer->bbs_buffer.len != sizeof(ControlFileData))
+ report_fatal_error("%s: could not read control file: read %d of %zu",
+ mystreamer->archiveName, streamer->bbs_buffer.len,
+ sizeof(ControlFileData));
+
+ memcpy(&control_file, streamer->bbs_buffer.data,
+ sizeof(ControlFileData));
+
+ /* Check the CRC. */
+ INIT_CRC32C(crc);
+ COMP_CRC32C(crc,
+ (char *) (&control_file),
+ offsetof(ControlFileData, crc));
+ FIN_CRC32C(crc);
+
+ crc_ok = EQ_CRC32C(crc, control_file.crc);
+
+ /* Do the final control data verification. */
+ verify_control_data(&control_file, member->pathname, crc_ok,
+ manifest->system_identifier);
+}
+
+/*
+ * Reset flags and free memory allocations for member file verification.
+ */
+static void
+reset_member_info(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ mystreamer->mfile = NULL;
+ mystreamer->receivedBytes = 0;
+ mystreamer->verifyChecksums = false;
+ mystreamer->verifyControlData = false;
+
+ if (mystreamer->checksum_ctx)
+ pfree(mystreamer->checksum_ctx);
+ mystreamer->checksum_ctx = NULL;
+}
diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build
index 7c7d31a0350..1e3fcf7ee5a 100644
--- a/src/bin/pg_verifybackup/meson.build
+++ b/src/bin/pg_verifybackup/meson.build
@@ -1,7 +1,8 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
pg_verifybackup_sources = files(
- 'pg_verifybackup.c'
+ 'pg_verifybackup.c',
+ 'astreamer_verify.c'
)
if host_system == 'windows'
@@ -10,9 +11,10 @@ if host_system == 'windows'
'--FILEDESC', 'pg_verifybackup - verify a backup against using a backup manifest'])
endif
+pg_verifybackup_deps = [frontend_code, libpq, lz4, zlib, zstd]
pg_verifybackup = executable('pg_verifybackup',
pg_verifybackup_sources,
- dependencies: [frontend_code, libpq],
+ dependencies: pg_verifybackup_deps,
kwargs: default_bin_args,
)
bin_targets += pg_verifybackup
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 2ca6b51741f..39f2e524a3e 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -20,9 +20,12 @@
#include "common/compression.h"
#include "common/parse_manifest.h"
+#include "common/relpath.h"
+#include "fe_utils/astreamer.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
#include "pg_verifybackup.h"
+#include "pgtar.h"
#include "pgtime.h"
/*
@@ -39,6 +42,11 @@
*/
#define ESTIMATED_BYTES_PER_MANIFEST_LINE 100
+
+static void (*verify_backup_file_cb) (verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -63,6 +71,15 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
char *relpath, char *fullpath);
+static void verify_plain_file_cb(verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+static void verify_tar_file_cb(verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+static void verify_tar_content(verifier_context *context,
+ char *relpath, char *fullpath,
+ astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -71,6 +88,9 @@ static void verify_file_checksum(verifier_context *context,
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
+static astreamer *create_archive_verifier(verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
static void usage(void);
@@ -145,6 +165,10 @@ main(int argc, char **argv)
*/
simple_string_list_append(&context.ignore_list, "backup_manifest");
simple_string_list_append(&context.ignore_list, "pg_wal");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.gz");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.lz4");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.zst");
simple_string_list_append(&context.ignore_list, "postgresql.auto.conf");
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
@@ -249,6 +273,15 @@ main(int argc, char **argv)
if (format == 't' && !tar_compression_specified)
compress_algorithm = find_backup_compression(&context);
+ /*
+ * Setup the required callback function to verify plain or tar backup
+ * files.
+ */
+ if (format == 'p')
+ verify_backup_file_cb = verify_plain_file_cb;
+ else
+ verify_backup_file_cb = verify_tar_file_cb;
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
@@ -628,7 +661,8 @@ verify_backup_directory(verifier_context *context, char *relpath,
}
/*
- * Verify one file (which might actually be a directory or a symlink).
+ * Verify one file (which might actually be a directory, a symlink, or an
+ * archive).
*
* The arguments to this function have the same meaning as the arguments to
* verify_backup_directory.
@@ -637,7 +671,6 @@ static void
verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
{
struct stat sb;
- manifest_file *m;
if (stat(fullpath, &sb) != 0)
{
@@ -670,8 +703,25 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Do the further verifications */
+ verify_backup_file_cb(context, relpath, fullpath, sb.st_size);
+}
+
+/*
+ * Verify one plain file or a symlink.
+ *
+ * The arguments to this function are mostly the same as for
+ * verify_backup_directory. The additional argument is the file size, used
+ * for verification against the manifest entry.
+ */
+static void
+verify_plain_file_cb(verifier_context *context, char *relpath,
+ char *fullpath, size_t filesize)
+{
+ manifest_file *m;
+
/* Check the backup manifest entry for this file. */
- m = verify_manifest_entry(context, relpath, sb.st_size);
+ m = verify_manifest_entry(context, relpath, filesize);
/* Validate the manifest system identifier */
if (should_verify_control_data(context->manifest, m))
@@ -689,6 +739,124 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
}
}
+/*
+ * Verify one tar file.
+ *
+ * The arguments to this function are mostly the same as for
+ * verify_backup_directory. The additional argument is the file size, used
+ * for verification against the manifest entry.
+ */
+static void
+verify_tar_file_cb(verifier_context *context, char *relpath,
+ char *fullpath, size_t filesize)
+{
+ astreamer *streamer;
+ Oid tblspc_oid = InvalidOid;
+ int file_name_len;
+ int file_extn_len = 0; /* placate compiler */
+ char *file_extn = "";
+
+ /* Should be tar backup */
+ Assert(format == 't');
+
+ /* Find the tar file extension. */
+ if (compress_algorithm == PG_COMPRESSION_NONE)
+ {
+ file_extn = ".tar";
+ file_extn_len = 4;
+
+ }
+ else if (compress_algorithm == PG_COMPRESSION_GZIP)
+ {
+ file_extn = ".tar.gz";
+ file_extn_len = 7;
+
+ }
+ else if (compress_algorithm == PG_COMPRESSION_LZ4)
+ {
+ file_extn = ".tar.lz4";
+ file_extn_len = 8;
+ }
+ else if (compress_algorithm == PG_COMPRESSION_ZSTD)
+ {
+ file_extn = ".tar.zst";
+ file_extn_len = 8;
+ }
+
+ /*
+ * Ensure that we have the correct file type corresponding to the backup
+ * format.
+ */
+ file_name_len = strlen(relpath);
+ if (file_name_len < file_extn_len ||
+ strcmp(relpath + file_name_len - file_extn_len, file_extn) != 0)
+ {
+ if (compress_algorithm == PG_COMPRESSION_NONE)
+ report_backup_error(context,
+ "\"%s\" is not a valid file, expecting tar file",
+ relpath);
+ else
+ report_backup_error(context,
+ "\"%s\" is not a valid file, expecting \"%s\" compressed tar file",
+ relpath,
+ get_compress_algorithm_name(compress_algorithm));
+ return;
+ }
+
+ /*
+ * For the tablespace, pg_basebackup writes the data out to
+ * <tablespaceoid>.tar. If a file matches that format, then extract the
+ * tablespaceoid, which we need to prepare the paths of the files
+ * belonging to that tablespace relative to the base directory.
+ */
+ if (strspn(relpath, "0123456789") == (file_name_len - file_extn_len))
+ tblspc_oid = strtoi64(relpath, NULL, 10);
+
+ streamer = create_archive_verifier(context, relpath, tblspc_oid);
+ verify_tar_content(context, relpath, fullpath, streamer);
+
+ /* Cleanup. */
+ astreamer_finalize(streamer);
+ astreamer_free(streamer);
+}
+
+/*
+ * Reads a given tar file in predefined chunks and passes them to the
+ * astreamer, which initiates routines for decompression (if necessary) and
+ * then verification of each member within the tar archive.
+ */
+static void
+verify_tar_content(verifier_context *context, char *relpath, char *fullpath,
+ astreamer *streamer)
+{
+ int fd;
+ int rc;
+ char *buffer;
+
+ /* Open the target file. */
+ if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
+ {
+ report_backup_error(context, "could not open file \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+ /* Perform the reads */
+ while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
+ astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);
+
+ if (rc < 0)
+ report_backup_error(context, "could not read file \"%s\": %m",
+ relpath);
+
+ /* Close the file. */
+ if (close(fd) != 0)
+ report_backup_error(context, "could not close file \"%s\": %m",
+ relpath);
+}
+
/*
* Verify file and its size entry in the manifest.
*/
@@ -1096,10 +1264,10 @@ find_backup_format(verifier_context *context)
}
/*
- * To determine the compression format, we will search for the main data
- * directory archive and its extension, which starts with base.tar, as
* pg_basebackup writes the main data directory to an archive file named
- * base.tar followed by a compression type extension like .gz, .lz4, or .zst.
+ * base.tar, followed by a compression type extension such as .gz, .lz4, or
+ * .zst. To determine the compression format, we need to search for this main
+ * data directory archive file.
*/
static pg_compress_algorithm
find_backup_compression(verifier_context *context)
@@ -1150,6 +1318,42 @@ find_backup_compression(verifier_context *context)
return PG_COMPRESSION_NONE; /* placate compiler */
}
+/*
+ * Identifies the necessary steps for verifying the contents of the
+ * provided tar file.
+ */
+static astreamer *
+create_archive_verifier(verifier_context *context, char *archive_name,
+ Oid tblspc_oid)
+{
+ astreamer *streamer = NULL;
+
+ /* Should be here only for tar backup */
+ Assert(format == 't');
+
+ /*
+ * To verify the contents of the tar file, the initial step is to parse
+ * its content.
+ */
+ streamer = astreamer_verify_content_new(streamer, context, archive_name,
+ tblspc_oid);
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /*
+ * If the tar file is compressed, we must perform the appropriate
+ * decompression operation before proceeding with the verification of its
+ * contents.
+ */
+ if (compress_algorithm == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compress_algorithm == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compress_algorithm == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ return streamer;
+}
+
/*
* Print out usage information and exit.
*/
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index c88f71ff14b..f0a7c8918fb 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -137,4 +137,13 @@ extern bool should_ignore_relpath(verifier_context *context,
extern void progress_report(verifier_context *context, bool finished);
+/* Forward declarations to avoid fe_utils/astreamer.h include. */
+struct astreamer;
+typedef struct astreamer astreamer;
+
+extern astreamer *astreamer_verify_content_new(astreamer *next,
+ verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
+
#endif /* PG_VERIFYBACKUP_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ba9e0200b3f..8c708b02cc2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3323,6 +3323,7 @@ astreamer_plain_writer
astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
+astreamer_verify
astreamer_zstd_frame
bgworker_main_type
bh_node_type
--
2.18.0
v4-0011-pg_verifybackup-Tests-and-document.patch
From b531a41a2e53617c5ec539951218ac7f0e45233b Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 17:04:56 +0530
Subject: [PATCH v4 11/11] pg_verifybackup: Tests and documentation
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the previous patch that implements tar format support.
----
---
doc/src/sgml/ref/pg_verifybackup.sgml | 54 +++++++++++++-
src/bin/pg_verifybackup/t/001_basic.pl | 18 ++++-
src/bin/pg_verifybackup/t/004_options.pl | 2 +-
src/bin/pg_verifybackup/t/005_bad_manifest.pl | 2 +-
src/bin/pg_verifybackup/t/008_untar.pl | 73 ++++++-------------
src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +-----------
6 files changed, 96 insertions(+), 101 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index a3f167f9f6e..c743bd89a92 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -34,8 +34,10 @@ PostgreSQL documentation
integrity of a database cluster backup taken using
<command>pg_basebackup</command> against a
<literal>backup_manifest</literal> generated by the server at the time
- of the backup. The backup must be stored in the "plain"
- format; a "tar" format backup can be checked after extracting it.
+ of the backup. The backup must be stored in the "plain" or "tar"
+ format. Verification is supported for <literal>gzip</literal>,
+ <literal>lz4</literal>, and <literal>zstd</literal> compressed tar backups;
+ backups using any other compression format can be checked after decompressing them.
</para>
<para>
@@ -168,6 +170,42 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-F <replaceable class="parameter">format</replaceable></option></term>
+ <term><option>--format=<replaceable class="parameter">format</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the format of the backup. <replaceable>format</replaceable>
+ can be one of the following:
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>p</literal></term>
+ <term><literal>plain</literal></term>
+ <listitem>
+ <para>
+ Backup consists of plain files with the same layout as the
+ source server's data directory and tablespaces.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>t</literal></term>
+ <term><literal>tar</literal></term>
+ <listitem>
+ <para>
+ Backup consists of tar files. The main data directory's contents
+ will be written to a file named <filename>base.tar</filename>,
+ and each other tablespace will be written to a separate tar file
+ named after that tablespace's OID.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist></para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n</option></term>
<term><option>--no-parse-wal</option></term>
@@ -227,6 +265,18 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><option>-Z <replaceable class="parameter">method</replaceable></option></term>
+ <term><option>--compress=<replaceable class="parameter">method</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the compression method of the tar backup. It can be
+ <literal>gzip</literal>, <literal>lz4</literal>, <literal>zstd</literal>, or
+ <literal>none</literal> for no compression.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
diff --git a/src/bin/pg_verifybackup/t/001_basic.pl b/src/bin/pg_verifybackup/t/001_basic.pl
index 2f3e52d296f..d47ce1f04fc 100644
--- a/src/bin/pg_verifybackup/t/001_basic.pl
+++ b/src/bin/pg_verifybackup/t/001_basic.pl
@@ -17,13 +17,25 @@ command_fails_like(
qr/no backup directory specified/,
'target directory must be specified');
command_fails_like(
- [ 'pg_verifybackup', $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir ],
qr/could not open file.*\/backup_manifest\"/,
'pg_verifybackup requires a manifest');
command_fails_like(
- [ 'pg_verifybackup', $tempdir, $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir, $tempdir ],
qr/too many command-line arguments/,
'multiple target directories not allowed');
+command_fails_like(
+ [ 'pg_verifybackup', '-Zgzip', $tempdir ],
+ qr/only tar mode backups can be compressed/,
+ 'compression method required format option');
+command_fails_like(
+ [ 'pg_verifybackup', '-Fp', '-Zlz4', $tempdir ],
+ qr/only tar mode backups can be compressed/,
+ 'compression method requires tar format option');
+command_fails_like(
+ [ 'pg_verifybackup', '-Fp', '-Znon_exist', $tempdir ],
+ qr/unrecognized compression algorithm/,
+ 'compression method should be valid');
# create fake manifest file
open(my $fh, '>', "$tempdir/backup_manifest") || die "open: $!";
@@ -31,7 +43,7 @@ close($fh);
# but then try to use an alternate, nonexisting manifest
command_fails_like(
- [ 'pg_verifybackup', '-m', "$tempdir/not_the_manifest", $tempdir ],
+ [ 'pg_verifybackup', '-Fp', '-m', "$tempdir/not_the_manifest", $tempdir ],
qr/could not open file.*\/not_the_manifest\"/,
'pg_verifybackup respects -m flag');
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 8ed2214408e..2f197648740 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -108,7 +108,7 @@ unlike(
# Test valid manifest with nonexistent backup directory.
command_fails_like(
[
- 'pg_verifybackup', '-m',
+ 'pg_verifybackup', '-Fp', '-m',
"$backup_path/backup_manifest", "$backup_path/fake"
],
qr/could not open directory/,
diff --git a/src/bin/pg_verifybackup/t/005_bad_manifest.pl b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
index c4ed64b62d5..28c51b6feb0 100644
--- a/src/bin/pg_verifybackup/t/005_bad_manifest.pl
+++ b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
@@ -208,7 +208,7 @@ sub test_bad_manifest
print $fh $manifest_contents;
close($fh);
- command_fails_like([ 'pg_verifybackup', $tempdir ], $regexp, $test_name);
+ command_fails_like([ 'pg_verifybackup', '-Fp', $tempdir ], $regexp, $test_name);
return;
}
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 7a09f3b75b2..9896560adc3 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -16,6 +16,20 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
+# Create a couple of directories to use as tablespaces.
+my $TS1_LOCATION = $primary->backup_dir .'/ts1';
+$TS1_LOCATION =~ s/\/\.\//\//g; # collapse foo/./bar to foo/bar
+mkdir($TS1_LOCATION);
+
+# Create a tablespace with table in it.
+$primary->safe_psql('postgres', qq(
+ CREATE TABLESPACE regress_ts1 LOCATION '$TS1_LOCATION';
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1';
+ CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1;
+ INSERT INTO regress_tbl1 VALUES(generate_series(1,5));));
+my $tsoid = $primary->safe_psql('postgres', qq(
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
+
my $backup_path = $primary->backup_dir . '/server-backup';
my $extract_path = $primary->backup_dir . '/extracted-backup';
@@ -23,39 +37,31 @@ my @test_configuration = (
{
'compression_method' => 'none',
'backup_flags' => [],
- 'backup_archive' => 'base.tar',
+ 'backup_archive' => ['base.tar', "$tsoid.tar"],
'enabled' => 1
},
{
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'server-gzip' ],
- 'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'server-lz4' ],
- 'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => [ '-d', '-m' ],
+ 'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ],
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
});
@@ -86,47 +92,16 @@ for my $tc (@test_configuration)
my $backup_files = join(',',
sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
my $expected_backup_files =
- join(',', sort ('backup_manifest', $tc->{'backup_archive'}));
+ join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} }));
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path],
+ "verify backup, compression $method");
# Cleanup.
- unlink($backup_path . '/backup_manifest');
- unlink($backup_path . '/base.tar');
- unlink($backup_path . '/' . $tc->{'backup_archive'});
+ rmtree($backup_path);
rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 8c076d46dee..6b7d7483f6e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -29,41 +29,30 @@ my @test_configuration = (
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'client-gzip:5' ],
'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'client-lz4:5' ],
'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => ['-d'],
- 'output_file' => 'base.tar',
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:5' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'parallel zstd',
'backup_flags' => [ '--compress', 'client-zstd:workers=3' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1"),
'possibly_unsupported' =>
qr/could not set compression worker count to 3: Unsupported parameter/
@@ -118,40 +107,9 @@ for my $tc (@test_configuration)
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- push @decompress, $backup_path . '/' . $tc->{'output_file'}
- if $tc->{'output_file'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ],
+ "verify backup, compression $method");
# Cleanup.
rmtree($extract_path);
--
2.18.0
v4-0006-Refactor-split-verify_backup_file-function.patch
From b5f845c39be20f6c2c820e3a053980fa1ca45919 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 12:15:26 +0530
Subject: [PATCH v4 06/11] Refactor: split verify_backup_file() function.
Separate the manifest entry verification code into a new function.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 39 ++++++++++++++++-------
src/bin/pg_verifybackup/pg_verifybackup.h | 3 ++
2 files changed, 30 insertions(+), 12 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 4e42757c346..b8db83ed2cd 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -614,6 +614,27 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Check the backup manifest entry for this file. */
+ m = verify_manifest_entry(context, relpath, sb.st_size);
+
+ /*
+ * Validate the manifest system identifier, not available in manifest
+ * version 1.
+ */
+ if (context->manifest->version != 1 &&
+ strcmp(relpath, "global/pg_control") == 0 &&
+ m != NULL && m->matched && !m->bad)
+ verify_control_file(fullpath, context->manifest->system_identifier);
+}
+
+/*
+ * Verify that the file has an entry in the manifest and that its size matches.
+ */
+manifest_file *
+verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
+{
+ manifest_file *m;
+
/* Check whether there's an entry in the manifest hash. */
m = manifest_files_lookup(context->manifest->files, relpath);
if (m == NULL)
@@ -621,29 +642,21 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
report_backup_error(context,
"\"%s\" is present on disk but not in the manifest",
relpath);
- return;
+ return NULL;
}
/* Flag this entry as having been encountered in the filesystem. */
m->matched = true;
/* Check that the size matches. */
- if (m->size != sb.st_size)
+ if (m->size != filesize)
{
report_backup_error(context,
"\"%s\" has size %lld on disk but size %zu in the manifest",
- relpath, (long long int) sb.st_size, m->size);
+ relpath, (long long int) filesize, m->size);
m->bad = true;
}
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0)
- verify_control_file(fullpath, context->manifest->system_identifier);
-
/* Update statistics for progress report, if necessary */
if (context->show_progress && !context->skip_checksums &&
should_verify_checksum(m))
@@ -655,6 +668,8 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
* afterwards verify the checksums. That's because computing checksums may
* take a while, and we'd like to report more obvious problems quickly.
*/
+
+ return m;
}
/*
@@ -817,7 +832,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
/*
* Double-check that we read the expected number of bytes from the file.
- * Normally, a file size mismatch would be caught in verify_backup_file
+ * Normally, a file size mismatch would be caught in verify_manifest_entry
* and this check would never be reached, but this provides additional
* safety and clarity in the event of concurrent modifications or
* filesystem misbehavior.
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 90900048547..98c75916255 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -105,6 +105,9 @@ typedef struct verifier_context
uint64 done_size;
} verifier_context;
+extern manifest_file *verify_manifest_entry(verifier_context *context,
+ char *relpath, int64 filesize);
+
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
pg_attribute_printf(2, 3);
--
2.18.0
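(Aside on how I expect the split to be used: the sketch below is only an
illustration written for this mail, not code from the attached patches. It
shows how a tar-format reader could reuse verify_manifest_entry(), passing
the pathname and size taken from a parsed tar member header; the helper name
verify_tar_member_header() is made up here.)

/* Illustration only -- not part of the attached patches. */
#include "postgres_fe.h"

#include "fe_utils/astreamer.h"
#include "pg_verifybackup.h"

static manifest_file *
verify_tar_member_header(verifier_context *context, astreamer_member *member)
{
	/*
	 * member->pathname and member->size come from the parsed tar header,
	 * so the same manifest lookup and size check used for plain-format
	 * backups applies unchanged.
	 */
	return verify_manifest_entry(context, member->pathname, member->size);
}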
v4-0005-Refactor-move-some-part-of-pg_verifybackup.c-to-p.patch
From d6e791de02b2d5e32059326952ecbb7c440bcf99 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 12:10:34 +0530
Subject: [PATCH v4 05/11] Refactor: move some part of pg_verifybackup.c to
pg_verifybackup.h
---
src/bin/pg_verifybackup/pg_verifybackup.c | 102 +------------------
src/bin/pg_verifybackup/pg_verifybackup.h | 118 ++++++++++++++++++++++
2 files changed, 123 insertions(+), 97 deletions(-)
create mode 100644 src/bin/pg_verifybackup/pg_verifybackup.h
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 71585ffc50e..4e42757c346 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,12 +18,11 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/controldata_utils.h"
-#include "common/hashfn_unstable.h"
#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
+#include "pg_verifybackup.h"
#include "pgtime.h"
/*
@@ -40,89 +39,6 @@
*/
#define ESTIMATED_BYTES_PER_MANIFEST_LINE 100
-/*
- * How many bytes should we try to read from a file at once?
- */
-#define READ_CHUNK_SIZE (128 * 1024)
-
-/*
- * Each file described by the manifest file is parsed to produce an object
- * like this.
- */
-typedef struct manifest_file
-{
- uint32 status; /* hash status */
- const char *pathname;
- size_t size;
- pg_checksum_type checksum_type;
- int checksum_length;
- uint8 *checksum_payload;
- bool matched;
- bool bad;
-} manifest_file;
-
-#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
-
-/*
- * Define a hash table which we can use to store information about the files
- * mentioned in the backup manifest.
- */
-#define SH_PREFIX manifest_files
-#define SH_ELEMENT_TYPE manifest_file
-#define SH_KEY_TYPE const char *
-#define SH_KEY pathname
-#define SH_HASH_KEY(tb, key) hash_string(key)
-#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
-#define SH_SCOPE static inline
-#define SH_RAW_ALLOCATOR pg_malloc0
-#define SH_DECLARE
-#define SH_DEFINE
-#include "lib/simplehash.h"
-
-/*
- * Each WAL range described by the manifest file is parsed to produce an
- * object like this.
- */
-typedef struct manifest_wal_range
-{
- TimeLineID tli;
- XLogRecPtr start_lsn;
- XLogRecPtr end_lsn;
- struct manifest_wal_range *next;
- struct manifest_wal_range *prev;
-} manifest_wal_range;
-
-/*
- * All the data parsed from a backup_manifest file.
- */
-typedef struct manifest_data
-{
- int version;
- uint64 system_identifier;
- manifest_files_hash *files;
- manifest_wal_range *first_wal_range;
- manifest_wal_range *last_wal_range;
-} manifest_data;
-
-/*
- * All of the context information we need while checking a backup manifest.
- */
-typedef struct verifier_context
-{
- manifest_data *manifest;
- char *backup_directory;
- SimpleStringList ignore_list;
- bool skip_checksums;
- bool exit_on_error;
- bool saw_any_error;
-
- /* Progress indicators */
- bool show_progress;
- uint64 total_size;
- uint64 done_size;
-} verifier_context;
-
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -156,14 +72,6 @@ static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
-static void report_backup_error(verifier_context *context,
- const char *pg_restrict fmt,...)
- pg_attribute_printf(2, 3);
-static void report_fatal_error(const char *pg_restrict fmt,...)
- pg_attribute_printf(1, 2) pg_attribute_noreturn();
-static bool should_ignore_relpath(verifier_context *context, const char *relpath);
-
-static void progress_report(verifier_context *context, bool finished);
static void usage(void);
static const char *progname;
@@ -978,7 +886,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
* Update the context to indicate that we saw an error, and exit if the
* context says we should.
*/
-static void
+void
report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
{
va_list ap;
@@ -995,7 +903,7 @@ report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
/*
* Report a fatal error and exit
*/
-static void
+void
report_fatal_error(const char *pg_restrict fmt,...)
{
va_list ap;
@@ -1014,7 +922,7 @@ report_fatal_error(const char *pg_restrict fmt,...)
* Note that by "prefix" we mean a parent directory; for this purpose,
* "aa/bb" is not a prefix of "aa/bbb", but it is a prefix of "aa/bb/cc".
*/
-static bool
+bool
should_ignore_relpath(verifier_context *context, const char *relpath)
{
SimpleStringListCell *cell;
@@ -1043,7 +951,7 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
* If finished is set to true, this is the last progress report. The cursor
* is moved to the next line.
*/
-static void
+void
progress_report(verifier_context *context, bool finished)
{
static pg_time_t last_progress_report = 0;
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
new file mode 100644
index 00000000000..90900048547
--- /dev/null
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -0,0 +1,118 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_verifybackup.h
+ * Verify a backup against a backup manifest.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/pg_verifybackup/pg_verifybackup.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef PG_VERIFYBACKUP_H
+#define PG_VERIFYBACKUP_H
+
+#include "common/controldata_utils.h"
+#include "common/hashfn_unstable.h"
+#include "common/parse_manifest.h"
+#include "fe_utils/simple_list.h"
+
+extern bool show_progress;
+extern bool skip_checksums;
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/*
+ * Each file described by the manifest file is parsed to produce an object
+ * like this.
+ */
+typedef struct manifest_file
+{
+ uint32 status; /* hash status */
+ const char *pathname;
+ size_t size;
+ pg_checksum_type checksum_type;
+ int checksum_length;
+ uint8 *checksum_payload;
+ bool matched;
+ bool bad;
+} manifest_file;
+
+#define should_verify_checksum(m) \
+ (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+
+/*
+ * Define a hash table which we can use to store information about the files
+ * mentioned in the backup manifest.
+ */
+#define SH_PREFIX manifest_files
+#define SH_ELEMENT_TYPE manifest_file
+#define SH_KEY_TYPE const char *
+#define SH_KEY pathname
+#define SH_HASH_KEY(tb, key) hash_string(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * Each WAL range described by the manifest file is parsed to produce an
+ * object like this.
+ */
+typedef struct manifest_wal_range
+{
+ TimeLineID tli;
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+ struct manifest_wal_range *next;
+ struct manifest_wal_range *prev;
+} manifest_wal_range;
+
+/*
+ * All the data parsed from a backup_manifest file.
+ */
+typedef struct manifest_data
+{
+ int version;
+ uint64 system_identifier;
+ manifest_files_hash *files;
+ manifest_wal_range *first_wal_range;
+ manifest_wal_range *last_wal_range;
+} manifest_data;
+
+/*
+ * All of the context information we need while checking a backup manifest.
+ */
+typedef struct verifier_context
+{
+ manifest_data *manifest;
+ char *backup_directory;
+ SimpleStringList ignore_list;
+ bool skip_checksums;
+ bool exit_on_error;
+ bool saw_any_error;
+
+ /* Progress indicators */
+ bool show_progress;
+ uint64 total_size;
+ uint64 done_size;
+} verifier_context;
+
+extern void report_backup_error(verifier_context *context,
+ const char *pg_restrict fmt,...)
+ pg_attribute_printf(2, 3);
+extern void report_fatal_error(const char *pg_restrict fmt,...)
+ pg_attribute_printf(1, 2) pg_attribute_noreturn();
+extern bool should_ignore_relpath(verifier_context *context,
+ const char *relpath);
+
+extern void progress_report(verifier_context *context, bool finished);
+
+#endif /* PG_VERIFYBACKUP_H */
--
2.18.0
v4-0003-Refactor-move-astreamer-files-to-fe_utils-to-make.patch
From 59cbaf1aa862fc0969b1efdd2b1702f7530814c1 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 11:20:52 +0530
Subject: [PATCH v4 03/11] Refactor: move astreamer* files to fe_utils to make
them commonly available.
To make the ASTREAMER code (previously known as BBSTREAMER) accessible
to other code, move it to a common location. The appropriate place is
src/fe_utils, since it is frontend infrastructure intended for shared
use.
---
meson.build | 2 +-
src/bin/pg_basebackup/Makefile | 7 +------
src/bin/pg_basebackup/astreamer_inject.h | 2 +-
src/bin/pg_basebackup/meson.build | 5 -----
src/fe_utils/Makefile | 5 +++++
src/{bin/pg_basebackup => fe_utils}/astreamer_file.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_gzip.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_lz4.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_tar.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_zstd.c | 2 +-
src/fe_utils/meson.build | 5 +++++
src/{bin/pg_basebackup => include/fe_utils}/astreamer.h | 0
12 files changed, 18 insertions(+), 18 deletions(-)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_file.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_gzip.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_lz4.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_tar.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_zstd.c (99%)
rename src/{bin/pg_basebackup => include/fe_utils}/astreamer.h (100%)
diff --git a/meson.build b/meson.build
index 7de0371226d..f7a5d2aea9a 100644
--- a/meson.build
+++ b/meson.build
@@ -3027,7 +3027,7 @@ frontend_common_code = declare_dependency(
compile_args: ['-DFRONTEND'],
include_directories: [postgres_inc],
sources: generated_headers,
- dependencies: [os_deps, zlib, zstd],
+ dependencies: [os_deps, zlib, zstd, lz4],
)
backend_common_code = declare_dependency(
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index a71af2d48a7..f1e73058b23 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -37,12 +37,7 @@ OBJS = \
BBOBJS = \
pg_basebackup.o \
- astreamer_file.o \
- astreamer_gzip.o \
- astreamer_inject.o \
- astreamer_lz4.o \
- astreamer_tar.o \
- astreamer_zstd.o
+ astreamer_inject.o
all: pg_basebackup pg_createsubscriber pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/astreamer_inject.h b/src/bin/pg_basebackup/astreamer_inject.h
index 8504b3f5e0d..aeed533862b 100644
--- a/src/bin/pg_basebackup/astreamer_inject.h
+++ b/src/bin/pg_basebackup/astreamer_inject.h
@@ -12,7 +12,7 @@
#ifndef ASTREAMER_INJECT_H
#define ASTREAMER_INJECT_H
-#include "astreamer.h"
+#include "fe_utils/astreamer.h"
#include "pqexpbuffer.h"
extern astreamer *astreamer_recovery_injector_new(astreamer *next,
diff --git a/src/bin/pg_basebackup/meson.build b/src/bin/pg_basebackup/meson.build
index a68dbd7837d..9101fc18438 100644
--- a/src/bin/pg_basebackup/meson.build
+++ b/src/bin/pg_basebackup/meson.build
@@ -1,12 +1,7 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
common_sources = files(
- 'astreamer_file.c',
- 'astreamer_gzip.c',
'astreamer_inject.c',
- 'astreamer_lz4.c',
- 'astreamer_tar.c',
- 'astreamer_zstd.c',
'receivelog.c',
'streamutil.c',
'walmethods.c',
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index 946c05258f0..2694be4b859 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -21,6 +21,11 @@ override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) $(CPPFLAGS)
OBJS = \
archive.o \
+ astreamer_file.o \
+ astreamer_gzip.o \
+ astreamer_lz4.o \
+ astreamer_tar.o \
+ astreamer_zstd.o \
cancel.o \
conditional.o \
connect_utils.o \
diff --git a/src/bin/pg_basebackup/astreamer_file.c b/src/fe_utils/astreamer_file.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_file.c
rename to src/fe_utils/astreamer_file.c
index 2742385e103..13d1192c6e6 100644
--- a/src/bin/pg_basebackup/astreamer_file.c
+++ b/src/fe_utils/astreamer_file.c
@@ -13,10 +13,10 @@
#include <unistd.h>
-#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/astreamer.h"
typedef struct astreamer_plain_writer
{
diff --git a/src/bin/pg_basebackup/astreamer_gzip.c b/src/fe_utils/astreamer_gzip.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_gzip.c
rename to src/fe_utils/astreamer_gzip.c
index 6f7c27afbbc..dd28defac7b 100644
--- a/src/bin/pg_basebackup/astreamer_gzip.c
+++ b/src/fe_utils/astreamer_gzip.c
@@ -17,10 +17,10 @@
#include <zlib.h>
#endif
-#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/astreamer.h"
#ifdef HAVE_LIBZ
typedef struct astreamer_gzip_writer
diff --git a/src/bin/pg_basebackup/astreamer_lz4.c b/src/fe_utils/astreamer_lz4.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_lz4.c
rename to src/fe_utils/astreamer_lz4.c
index 1c40d7d8ad5..d8b2a367e47 100644
--- a/src/bin/pg_basebackup/astreamer_lz4.c
+++ b/src/fe_utils/astreamer_lz4.c
@@ -17,10 +17,10 @@
#include <lz4frame.h>
#endif
-#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/astreamer.h"
#ifdef USE_LZ4
typedef struct astreamer_lz4_frame
diff --git a/src/bin/pg_basebackup/astreamer_tar.c b/src/fe_utils/astreamer_tar.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_tar.c
rename to src/fe_utils/astreamer_tar.c
index 673690cd18f..f5d3562d280 100644
--- a/src/bin/pg_basebackup/astreamer_tar.c
+++ b/src/fe_utils/astreamer_tar.c
@@ -23,8 +23,8 @@
#include <time.h>
-#include "astreamer.h"
#include "common/logging.h"
+#include "fe_utils/astreamer.h"
#include "pgtar.h"
typedef struct astreamer_tar_parser
diff --git a/src/bin/pg_basebackup/astreamer_zstd.c b/src/fe_utils/astreamer_zstd.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_zstd.c
rename to src/fe_utils/astreamer_zstd.c
index 58dc679ef99..45f6cb67363 100644
--- a/src/bin/pg_basebackup/astreamer_zstd.c
+++ b/src/fe_utils/astreamer_zstd.c
@@ -17,8 +17,8 @@
#include <zstd.h>
#endif
-#include "astreamer.h"
#include "common/logging.h"
+#include "fe_utils/astreamer.h"
#ifdef USE_ZSTD
diff --git a/src/fe_utils/meson.build b/src/fe_utils/meson.build
index 14d0482a2cc..043021d826d 100644
--- a/src/fe_utils/meson.build
+++ b/src/fe_utils/meson.build
@@ -2,6 +2,11 @@
fe_utils_sources = files(
'archive.c',
+ 'astreamer_file.c',
+ 'astreamer_gzip.c',
+ 'astreamer_lz4.c',
+ 'astreamer_tar.c',
+ 'astreamer_zstd.c',
'cancel.c',
'conditional.c',
'connect_utils.c',
diff --git a/src/bin/pg_basebackup/astreamer.h b/src/include/fe_utils/astreamer.h
similarity index 100%
rename from src/bin/pg_basebackup/astreamer.h
rename to src/include/fe_utils/astreamer.h
--
2.18.0
v4-0004-Refactor-move-few-global-variable-to-verifier_con.patch
From 63c2694c9fe17a15bdec481397acf4ee54f00ab1 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 11:43:52 +0530
Subject: [PATCH v4 04/11] Refactor: move a few global variables to the
verifier_context struct
The global variables are:
1. show_progress
2. skip_checksums
3. total_size
4. done_size
---
src/bin/pg_verifybackup/pg_verifybackup.c | 50 +++++++++++------------
1 file changed, 25 insertions(+), 25 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d77e70fbe38..71585ffc50e 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -113,8 +113,14 @@ typedef struct verifier_context
manifest_data *manifest;
char *backup_directory;
SimpleStringList ignore_list;
+ bool skip_checksums;
bool exit_on_error;
bool saw_any_error;
+
+ /* Progress indicators */
+ bool show_progress;
+ uint64 total_size;
+ uint64 done_size;
} verifier_context;
static manifest_data *parse_manifest_file(char *manifest_path);
@@ -157,19 +163,11 @@ static void report_fatal_error(const char *pg_restrict fmt,...)
pg_attribute_printf(1, 2) pg_attribute_noreturn();
static bool should_ignore_relpath(verifier_context *context, const char *relpath);
-static void progress_report(bool finished);
+static void progress_report(verifier_context *context, bool finished);
static void usage(void);
static const char *progname;
-/* options */
-static bool show_progress = false;
-static bool skip_checksums = false;
-
-/* Progress indicators */
-static uint64 total_size = 0;
-static uint64 done_size = 0;
-
/*
* Main entry point.
*/
@@ -260,13 +258,13 @@ main(int argc, char **argv)
no_parse_wal = true;
break;
case 'P':
- show_progress = true;
+ context.show_progress = true;
break;
case 'q':
quiet = true;
break;
case 's':
- skip_checksums = true;
+ context.skip_checksums = true;
break;
case 'w':
wal_directory = pstrdup(optarg);
@@ -299,7 +297,7 @@ main(int argc, char **argv)
}
/* Complain if the specified arguments conflict */
- if (show_progress && quiet)
+ if (context.show_progress && quiet)
pg_fatal("cannot specify both %s and %s",
"-P/--progress", "-q/--quiet");
@@ -363,7 +361,7 @@ main(int argc, char **argv)
* Now do the expensive work of verifying file checksums, unless we were
* told to skip it.
*/
- if (!skip_checksums)
+ if (!context.skip_checksums)
verify_backup_checksums(&context);
/*
@@ -739,8 +737,9 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
verify_control_file(fullpath, context->manifest->system_identifier);
/* Update statistics for progress report, if necessary */
- if (show_progress && !skip_checksums && should_verify_checksum(m))
- total_size += m->size;
+ if (context->show_progress && !context->skip_checksums &&
+ should_verify_checksum(m))
+ context->total_size += m->size;
/*
* We don't verify checksums at this stage. We first finish verifying that
@@ -815,7 +814,7 @@ verify_backup_checksums(verifier_context *context)
manifest_file *m;
uint8 *buffer;
- progress_report(false);
+ progress_report(context, false);
buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
@@ -841,7 +840,7 @@ verify_backup_checksums(verifier_context *context)
pfree(buffer);
- progress_report(true);
+ progress_report(context, true);
}
/*
@@ -889,8 +888,8 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
}
/* Report progress */
- done_size += rc;
- progress_report(false);
+ context->done_size += rc;
+ progress_report(context, false);
}
if (rc < 0)
report_backup_error(context, "could not read file \"%s\": %m",
@@ -1036,7 +1035,7 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
}
/*
- * Print a progress report based on the global variables.
+ * Print a progress report based on the variables in verifier_context.
*
* Progress report is written at maximum once per second, unless the finished
* parameter is set to true.
@@ -1045,7 +1044,7 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
* is moved to the next line.
*/
static void
-progress_report(bool finished)
+progress_report(verifier_context *context, bool finished)
{
static pg_time_t last_progress_report = 0;
pg_time_t now;
@@ -1053,7 +1052,7 @@ progress_report(bool finished)
char totalsize_str[32];
char donesize_str[32];
- if (!show_progress)
+ if (!context->show_progress)
return;
now = time(NULL);
@@ -1061,12 +1060,13 @@ progress_report(bool finished)
return; /* Max once per second */
last_progress_report = now;
- percent_size = total_size ? (int) ((done_size * 100 / total_size)) : 0;
+ percent_size = context->total_size ?
+ (int) ((context->done_size * 100 / context->total_size)) : 0;
snprintf(totalsize_str, sizeof(totalsize_str), UINT64_FORMAT,
- total_size / 1024);
+ context->total_size / 1024);
snprintf(donesize_str, sizeof(donesize_str), UINT64_FORMAT,
- done_size / 1024);
+ context->done_size / 1024);
fprintf(stderr,
_("%*s/%s kB (%d%%) verified"),
--
2.18.0
v4-0002-Refactor-Add-astreamer_inject.h-and-move-related-.patch
From 5b0e83ae2b1f444b7bd9277b0c202758d2eb5934 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 17 Jul 2024 14:23:27 +0530
Subject: [PATCH v4 02/11] Refactor: Add astreamer_inject.h and move related
declarations to it.
---
src/bin/pg_basebackup/astreamer.h | 6 ------
src/bin/pg_basebackup/astreamer_inject.c | 2 +-
src/bin/pg_basebackup/astreamer_inject.h | 24 ++++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 2 +-
4 files changed, 26 insertions(+), 8 deletions(-)
create mode 100644 src/bin/pg_basebackup/astreamer_inject.h
diff --git a/src/bin/pg_basebackup/astreamer.h b/src/bin/pg_basebackup/astreamer.h
index 6b0047418bb..9d0a8c4d0c2 100644
--- a/src/bin/pg_basebackup/astreamer.h
+++ b/src/bin/pg_basebackup/astreamer.h
@@ -217,10 +217,4 @@ extern astreamer *astreamer_tar_parser_new(astreamer *next);
extern astreamer *astreamer_tar_terminator_new(astreamer *next);
extern astreamer *astreamer_tar_archiver_new(astreamer *next);
-extern astreamer *astreamer_recovery_injector_new(astreamer *next,
- bool is_recovery_guc_supported,
- PQExpBuffer recoveryconfcontents);
-extern void astreamer_inject_file(astreamer *streamer, char *pathname,
- char *data, int len);
-
#endif
diff --git a/src/bin/pg_basebackup/astreamer_inject.c b/src/bin/pg_basebackup/astreamer_inject.c
index 7f1decded8d..4ad8381f102 100644
--- a/src/bin/pg_basebackup/astreamer_inject.c
+++ b/src/bin/pg_basebackup/astreamer_inject.c
@@ -11,7 +11,7 @@
#include "postgres_fe.h"
-#include "astreamer.h"
+#include "astreamer_inject.h"
#include "common/file_perm.h"
#include "common/logging.h"
diff --git a/src/bin/pg_basebackup/astreamer_inject.h b/src/bin/pg_basebackup/astreamer_inject.h
new file mode 100644
index 00000000000..8504b3f5e0d
--- /dev/null
+++ b/src/bin/pg_basebackup/astreamer_inject.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_inject.h
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/astreamer_inject.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef ASTREAMER_INJECT_H
+#define ASTREAMER_INJECT_H
+
+#include "astreamer.h"
+#include "pqexpbuffer.h"
+
+extern astreamer *astreamer_recovery_injector_new(astreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents);
+extern void astreamer_inject_file(astreamer *streamer, char *pathname,
+ char *data, int len);
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 4179b064cbc..1e753e40c97 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -26,7 +26,7 @@
#endif
#include "access/xlog_internal.h"
-#include "astreamer.h"
+#include "astreamer_inject.h"
#include "backup/basebackup.h"
#include "common/compression.h"
#include "common/file_perm.h"
--
2.18.0
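(Illustration of what the new header exposes -- again just a sketch written
for this mail, not code from the patches; the helper, the identity link map,
and the particular chaining order are invented for the example. A caller that
wants the recovery injector now includes astreamer_inject.h, which in turn
pulls in astreamer.h.)

/* Illustration only -- not part of the attached patches. */
#include "postgres_fe.h"

#include "astreamer_inject.h"

static const char *
identity_link_map(const char *path)
{
	return path;				/* no tablespace remapping in this sketch */
}

static astreamer *
make_injecting_extractor(const char *destdir, PQExpBuffer recoveryconf)
{
	astreamer  *extractor;

	/* Extract archive members under destdir. */
	extractor = astreamer_extractor_new(destdir, identity_link_map, NULL);

	/*
	 * Put a recovery injector in front of it; "true" assumes a server that
	 * uses recovery GUCs rather than recovery.conf.
	 */
	return astreamer_recovery_injector_new(extractor, true, recoveryconf);
}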
v4-0001-Refactor-Rename-all-bbstreamer-references-to-astr.patch
From 412266b417727de9eedd58cfb744cca35f0d2c22 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 09:39:32 +0530
Subject: [PATCH v4 01/11] Refactor: Rename all bbstreamer references to
astreamer.
BBSTREAMER is named for pg_basebackup, but the code is about to move to
a common area where other modules can use it, so it needs a more general
name. Rename it to ASTREAMER, short for ARCHIVE STREAMER.
---
src/bin/pg_basebackup/Makefile | 12 +-
src/bin/pg_basebackup/astreamer.h | 226 +++++++++++++
.../{bbstreamer_file.c => astreamer_file.c} | 148 ++++----
.../{bbstreamer_gzip.c => astreamer_gzip.c} | 154 ++++-----
...bbstreamer_inject.c => astreamer_inject.c} | 152 ++++-----
.../{bbstreamer_lz4.c => astreamer_lz4.c} | 172 +++++-----
.../{bbstreamer_tar.c => astreamer_tar.c} | 316 +++++++++---------
.../{bbstreamer_zstd.c => astreamer_zstd.c} | 160 ++++-----
src/bin/pg_basebackup/bbstreamer.h | 226 -------------
src/bin/pg_basebackup/meson.build | 12 +-
src/bin/pg_basebackup/nls.mk | 12 +-
src/bin/pg_basebackup/pg_basebackup.c | 74 ++--
src/tools/pgindent/typedefs.list | 26 +-
13 files changed, 845 insertions(+), 845 deletions(-)
create mode 100644 src/bin/pg_basebackup/astreamer.h
rename src/bin/pg_basebackup/{bbstreamer_file.c => astreamer_file.c} (69%)
rename src/bin/pg_basebackup/{bbstreamer_gzip.c => astreamer_gzip.c} (62%)
rename src/bin/pg_basebackup/{bbstreamer_inject.c => astreamer_inject.c} (53%)
rename src/bin/pg_basebackup/{bbstreamer_lz4.c => astreamer_lz4.c} (69%)
rename src/bin/pg_basebackup/{bbstreamer_tar.c => astreamer_tar.c} (50%)
rename src/bin/pg_basebackup/{bbstreamer_zstd.c => astreamer_zstd.c} (64%)
delete mode 100644 src/bin/pg_basebackup/bbstreamer.h
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 26c53e473f5..a71af2d48a7 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -37,12 +37,12 @@ OBJS = \
BBOBJS = \
pg_basebackup.o \
- bbstreamer_file.o \
- bbstreamer_gzip.o \
- bbstreamer_inject.o \
- bbstreamer_lz4.o \
- bbstreamer_tar.o \
- bbstreamer_zstd.o
+ astreamer_file.o \
+ astreamer_gzip.o \
+ astreamer_inject.o \
+ astreamer_lz4.o \
+ astreamer_tar.o \
+ astreamer_zstd.o
all: pg_basebackup pg_createsubscriber pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/astreamer.h b/src/bin/pg_basebackup/astreamer.h
new file mode 100644
index 00000000000..6b0047418bb
--- /dev/null
+++ b/src/bin/pg_basebackup/astreamer.h
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer.h
+ *
+ * Each tar archive returned by the server is passed to one or more
+ * astreamer objects for further processing. The astreamer may do
+ * something simple, like write the archive to a file, perhaps after
+ * compressing it, but it can also do more complicated things, like
+ * annotating the byte stream to indicate which parts of the data
+ * correspond to tar headers or trailing padding, vs. which parts are
+ * payload data. A subsequent astreamer may use this information to
+ * make further decisions about how to process the data; for example,
+ * it might choose to modify the archive contents.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/astreamer.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef ASTREAMER_H
+#define ASTREAMER_H
+
+#include "common/compression.h"
+#include "lib/stringinfo.h"
+#include "pqexpbuffer.h"
+
+struct astreamer;
+struct astreamer_ops;
+typedef struct astreamer astreamer;
+typedef struct astreamer_ops astreamer_ops;
+
+/*
+ * Each chunk of archive data passed to an astreamer is classified into one
+ * of these categories. When data is first received from the remote server,
+ * each chunk will be categorized as ASTREAMER_UNKNOWN, and the chunks will
+ * be of whatever size the remote server chose to send.
+ *
+ * If the archive is parsed (e.g. see astreamer_tar_parser_new()), then all
+ * chunks should be labelled as one of the other types listed here. In
+ * addition, there should be exactly one ASTREAMER_MEMBER_HEADER chunk and
+ * exactly one ASTREAMER_MEMBER_TRAILER chunk per archive member, even if
+ * that means a zero-length call. There can be any number of
+ * ASTREAMER_MEMBER_CONTENTS chunks in between those calls. There
+ * should be exactly one ASTREAMER_ARCHIVE_TRAILER chunk, and it should follow the
+ * last ASTREAMER_MEMBER_TRAILER chunk.
+ *
+ * In theory, we could need other classifications here, such as a way of
+ * indicating an archive header, but the "tar" format doesn't need anything
+ * else, so for the time being there's no point.
+ */
+typedef enum
+{
+ ASTREAMER_UNKNOWN,
+ ASTREAMER_MEMBER_HEADER,
+ ASTREAMER_MEMBER_CONTENTS,
+ ASTREAMER_MEMBER_TRAILER,
+ ASTREAMER_ARCHIVE_TRAILER,
+} astreamer_archive_context;
+
+/*
+ * Each chunk of data that is classified as ASTREAMER_MEMBER_HEADER,
+ * ASTREAMER_MEMBER_CONTENTS, or ASTREAMER_MEMBER_TRAILER should also
+ * pass a pointer to an instance of this struct. The details are expected
+ * to be present in the archive header and used to fill the struct, after
+ * which all subsequent calls for the same archive member are expected to
+ * pass the same details.
+ */
+typedef struct
+{
+ char pathname[MAXPGPATH];
+ pgoff_t size;
+ mode_t mode;
+ uid_t uid;
+ gid_t gid;
+ bool is_directory;
+ bool is_link;
+ char linktarget[MAXPGPATH];
+} astreamer_member;
+
+/*
+ * Generally, each type of astreamer will define its own struct, but the
+ * first element should be 'astreamer base'. An astreamer that does not
+ * require any additional private data could use this structure directly.
+ *
+ * bbs_ops is a pointer to the astreamer_ops object which contains the
+ * function pointers appropriate to this type of astreamer.
+ *
+ * bbs_next is a pointer to the successor astreamer, for those types of
+ * astreamer which forward data to a successor. It need not be used and
+ * should be set to NULL when not relevant.
+ *
+ * bbs_buffer is a buffer for accumulating data for temporary storage. Each
+ * type of astreamer makes its own decisions about whether and how to use
+ * this buffer.
+ */
+struct astreamer
+{
+ const astreamer_ops *bbs_ops;
+ astreamer *bbs_next;
+ StringInfoData bbs_buffer;
+};
+
+/*
+ * There are three callbacks for an astreamer. The 'content' callback is
+ * called repeatedly, as described in the astreamer_archive_context comments.
+ * Then, the 'finalize' callback is called once at the end, to give the
+ * astreamer a chance to perform cleanup such as closing files. Finally,
+ * because this code is running in a frontend environment where, as of this
+ * writing, there are no memory contexts, the 'free' callback is called to
+ * release memory. These callbacks should always be invoked using the static
+ * inline functions defined below.
+ */
+struct astreamer_ops
+{
+ void (*content) (astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+ void (*finalize) (astreamer *streamer);
+ void (*free) (astreamer *streamer);
+};
+
+/* Send some content to an astreamer. */
+static inline void
+astreamer_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->content(streamer, member, data, len, context);
+}
+
+/* Finalize an astreamer. */
+static inline void
+astreamer_finalize(astreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->finalize(streamer);
+}
+
+/* Free an astreamer. */
+static inline void
+astreamer_free(astreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->free(streamer);
+}
+
+/*
+ * This is a convenience method for use when implementing an astreamer; it is
+ * not for use by outside callers. It adds the amount of data specified by
+ * 'nbytes' to the astreamer's buffer and adjusts '*len' and '*data'
+ * accordingly.
+ */
+static inline void
+astreamer_buffer_bytes(astreamer *streamer, const char **data, int *len,
+ int nbytes)
+{
+ Assert(nbytes <= *len);
+
+ appendBinaryStringInfo(&streamer->bbs_buffer, *data, nbytes);
+ *len -= nbytes;
+ *data += nbytes;
+}
+
+/*
+ * This is a convenience method for use when implementing an astreamer; it is
+ * not for use by outside callers. It attempts to add enough data to the
+ * astreamer's buffer to reach a length of target_bytes and adjusts '*len'
+ * and '*data' accordingly. It returns true if the target length has been
+ * reached and false otherwise.
+ */
+static inline bool
+astreamer_buffer_until(astreamer *streamer, const char **data, int *len,
+ int target_bytes)
+{
+ int buflen = streamer->bbs_buffer.len;
+
+ if (buflen >= target_bytes)
+ {
+ /* Target length already reached; nothing to do. */
+ return true;
+ }
+
+ if (buflen + *len < target_bytes)
+ {
+ /* Not enough data to reach target length; buffer all of it. */
+ astreamer_buffer_bytes(streamer, data, len, *len);
+ return false;
+ }
+
+ /* Buffer just enough to reach the target length. */
+ astreamer_buffer_bytes(streamer, data, len, target_bytes - buflen);
+ return true;
+}
+
+/*
+ * Functions for creating astreamer objects of various types. See the header
+ * comments for each of these functions for details.
+ */
+extern astreamer *astreamer_plain_writer_new(char *pathname, FILE *file);
+extern astreamer *astreamer_gzip_writer_new(char *pathname, FILE *file,
+ pg_compress_specification *compress);
+extern astreamer *astreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *));
+
+extern astreamer *astreamer_gzip_decompressor_new(astreamer *next);
+extern astreamer *astreamer_lz4_compressor_new(astreamer *next,
+ pg_compress_specification *compress);
+extern astreamer *astreamer_lz4_decompressor_new(astreamer *next);
+extern astreamer *astreamer_zstd_compressor_new(astreamer *next,
+ pg_compress_specification *compress);
+extern astreamer *astreamer_zstd_decompressor_new(astreamer *next);
+extern astreamer *astreamer_tar_parser_new(astreamer *next);
+extern astreamer *astreamer_tar_terminator_new(astreamer *next);
+extern astreamer *astreamer_tar_archiver_new(astreamer *next);
+
+extern astreamer *astreamer_recovery_injector_new(astreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents);
+extern void astreamer_inject_file(astreamer *streamer, char *pathname,
+ char *data, int len);
+
+#endif
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/astreamer_file.c
similarity index 69%
rename from src/bin/pg_basebackup/bbstreamer_file.c
rename to src/bin/pg_basebackup/astreamer_file.c
index bab6cd4a6b1..2742385e103 100644
--- a/src/bin/pg_basebackup/bbstreamer_file.c
+++ b/src/bin/pg_basebackup/astreamer_file.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_file.c
+ * astreamer_file.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_file.c
+ * src/bin/pg_basebackup/astreamer_file.c
*-------------------------------------------------------------------------
*/
@@ -13,60 +13,60 @@
#include <unistd.h>
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
-typedef struct bbstreamer_plain_writer
+typedef struct astreamer_plain_writer
{
- bbstreamer base;
+ astreamer base;
char *pathname;
FILE *file;
bool should_close_file;
-} bbstreamer_plain_writer;
+} astreamer_plain_writer;
-typedef struct bbstreamer_extractor
+typedef struct astreamer_extractor
{
- bbstreamer base;
+ astreamer base;
char *basepath;
const char *(*link_map) (const char *);
void (*report_output_file) (const char *);
char filename[MAXPGPATH];
FILE *file;
-} bbstreamer_extractor;
+} astreamer_extractor;
-static void bbstreamer_plain_writer_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_plain_writer_finalize(bbstreamer *streamer);
-static void bbstreamer_plain_writer_free(bbstreamer *streamer);
+static void astreamer_plain_writer_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_plain_writer_finalize(astreamer *streamer);
+static void astreamer_plain_writer_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_plain_writer_ops = {
- .content = bbstreamer_plain_writer_content,
- .finalize = bbstreamer_plain_writer_finalize,
- .free = bbstreamer_plain_writer_free
+static const astreamer_ops astreamer_plain_writer_ops = {
+ .content = astreamer_plain_writer_content,
+ .finalize = astreamer_plain_writer_finalize,
+ .free = astreamer_plain_writer_free
};
-static void bbstreamer_extractor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_extractor_finalize(bbstreamer *streamer);
-static void bbstreamer_extractor_free(bbstreamer *streamer);
+static void astreamer_extractor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_extractor_finalize(astreamer *streamer);
+static void astreamer_extractor_free(astreamer *streamer);
static void extract_directory(const char *filename, mode_t mode);
static void extract_link(const char *filename, const char *linktarget);
static FILE *create_file_for_extract(const char *filename, mode_t mode);
-static const bbstreamer_ops bbstreamer_extractor_ops = {
- .content = bbstreamer_extractor_content,
- .finalize = bbstreamer_extractor_finalize,
- .free = bbstreamer_extractor_free
+static const astreamer_ops astreamer_extractor_ops = {
+ .content = astreamer_extractor_content,
+ .finalize = astreamer_extractor_finalize,
+ .free = astreamer_extractor_free
};
/*
- * Create a bbstreamer that just writes data to a file.
+ * Create an astreamer that just writes data to a file.
*
* The caller must specify a pathname and may specify a file. The pathname is
* used for error-reporting purposes either way. If file is NULL, the pathname
@@ -74,14 +74,14 @@ static const bbstreamer_ops bbstreamer_extractor_ops = {
* for writing and closed when done. If file is not NULL, the data is written
* there.
*/
-bbstreamer *
-bbstreamer_plain_writer_new(char *pathname, FILE *file)
+astreamer *
+astreamer_plain_writer_new(char *pathname, FILE *file)
{
- bbstreamer_plain_writer *streamer;
+ astreamer_plain_writer *streamer;
- streamer = palloc0(sizeof(bbstreamer_plain_writer));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_plain_writer_ops;
+ streamer = palloc0(sizeof(astreamer_plain_writer));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_plain_writer_ops;
streamer->pathname = pstrdup(pathname);
streamer->file = file;
@@ -101,13 +101,13 @@ bbstreamer_plain_writer_new(char *pathname, FILE *file)
* Write archive content to file.
*/
static void
-bbstreamer_plain_writer_content(bbstreamer *streamer,
- bbstreamer_member *member, const char *data,
- int len, bbstreamer_archive_context context)
+astreamer_plain_writer_content(astreamer *streamer,
+ astreamer_member *member, const char *data,
+ int len, astreamer_archive_context context)
{
- bbstreamer_plain_writer *mystreamer;
+ astreamer_plain_writer *mystreamer;
- mystreamer = (bbstreamer_plain_writer *) streamer;
+ mystreamer = (astreamer_plain_writer *) streamer;
if (len == 0)
return;
@@ -128,11 +128,11 @@ bbstreamer_plain_writer_content(bbstreamer *streamer,
* the file if we opened it, but not if the caller provided it.
*/
static void
-bbstreamer_plain_writer_finalize(bbstreamer *streamer)
+astreamer_plain_writer_finalize(astreamer *streamer)
{
- bbstreamer_plain_writer *mystreamer;
+ astreamer_plain_writer *mystreamer;
- mystreamer = (bbstreamer_plain_writer *) streamer;
+ mystreamer = (astreamer_plain_writer *) streamer;
if (mystreamer->should_close_file && fclose(mystreamer->file) != 0)
pg_fatal("could not close file \"%s\": %m",
@@ -143,14 +143,14 @@ bbstreamer_plain_writer_finalize(bbstreamer *streamer)
}
/*
- * Free memory associated with this bbstreamer.
+ * Free memory associated with this astreamer.
*/
static void
-bbstreamer_plain_writer_free(bbstreamer *streamer)
+astreamer_plain_writer_free(astreamer *streamer)
{
- bbstreamer_plain_writer *mystreamer;
+ astreamer_plain_writer *mystreamer;
- mystreamer = (bbstreamer_plain_writer *) streamer;
+ mystreamer = (astreamer_plain_writer *) streamer;
Assert(!mystreamer->should_close_file);
Assert(mystreamer->base.bbs_next == NULL);
@@ -160,13 +160,13 @@ bbstreamer_plain_writer_free(bbstreamer *streamer)
}
/*
- * Create a bbstreamer that extracts an archive.
+ * Create an astreamer that extracts an archive.
*
* All pathnames in the archive are interpreted relative to basepath.
*
- * Unlike e.g. bbstreamer_plain_writer_new() we can't do anything useful here
+ * Unlike e.g. astreamer_plain_writer_new() we can't do anything useful here
* with untyped chunks; we need typed chunks which follow the rules described
- * in bbstreamer.h. Assuming we have that, we don't need to worry about the
+ * in astreamer.h. Assuming we have that, we don't need to worry about the
* original archive format; it's enough to just look at the member information
* provided and write to the corresponding file.
*
@@ -179,16 +179,16 @@ bbstreamer_plain_writer_free(bbstreamer *streamer)
* new output file. The pathname to that file is passed as an argument. If
* NULL, the call is skipped.
*/
-bbstreamer *
-bbstreamer_extractor_new(const char *basepath,
- const char *(*link_map) (const char *),
- void (*report_output_file) (const char *))
+astreamer *
+astreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *))
{
- bbstreamer_extractor *streamer;
+ astreamer_extractor *streamer;
- streamer = palloc0(sizeof(bbstreamer_extractor));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_extractor_ops;
+ streamer = palloc0(sizeof(astreamer_extractor));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_extractor_ops;
streamer->basepath = pstrdup(basepath);
streamer->link_map = link_map;
streamer->report_output_file = report_output_file;
@@ -200,19 +200,19 @@ bbstreamer_extractor_new(const char *basepath,
* Extract archive contents to the filesystem.
*/
static void
-bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_extractor_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+ astreamer_extractor *mystreamer = (astreamer_extractor *) streamer;
int fnamelen;
- Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
- Assert(context != BBSTREAMER_UNKNOWN);
+ Assert(member != NULL || context == ASTREAMER_ARCHIVE_TRAILER);
+ Assert(context != ASTREAMER_UNKNOWN);
switch (context)
{
- case BBSTREAMER_MEMBER_HEADER:
+ case ASTREAMER_MEMBER_HEADER:
Assert(mystreamer->file == NULL);
/* Prepend basepath. */
@@ -245,7 +245,7 @@ bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
mystreamer->report_output_file(mystreamer->filename);
break;
- case BBSTREAMER_MEMBER_CONTENTS:
+ case ASTREAMER_MEMBER_CONTENTS:
if (mystreamer->file == NULL)
break;
@@ -260,14 +260,14 @@ bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
}
break;
- case BBSTREAMER_MEMBER_TRAILER:
+ case ASTREAMER_MEMBER_TRAILER:
if (mystreamer->file == NULL)
break;
fclose(mystreamer->file);
mystreamer->file = NULL;
break;
- case BBSTREAMER_ARCHIVE_TRAILER:
+ case ASTREAMER_ARCHIVE_TRAILER:
break;
default:
@@ -375,10 +375,10 @@ create_file_for_extract(const char *filename, mode_t mode)
* There's nothing to do here but sanity checking.
*/
static void
-bbstreamer_extractor_finalize(bbstreamer *streamer)
+astreamer_extractor_finalize(astreamer *streamer)
{
- bbstreamer_extractor *mystreamer PG_USED_FOR_ASSERTS_ONLY
- = (bbstreamer_extractor *) streamer;
+ astreamer_extractor *mystreamer PG_USED_FOR_ASSERTS_ONLY
+ = (astreamer_extractor *) streamer;
Assert(mystreamer->file == NULL);
}
@@ -387,9 +387,9 @@ bbstreamer_extractor_finalize(bbstreamer *streamer)
* Free memory.
*/
static void
-bbstreamer_extractor_free(bbstreamer *streamer)
+astreamer_extractor_free(astreamer *streamer)
{
- bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+ astreamer_extractor *mystreamer = (astreamer_extractor *) streamer;
pfree(mystreamer->basepath);
pfree(mystreamer);
diff --git a/src/bin/pg_basebackup/bbstreamer_gzip.c b/src/bin/pg_basebackup/astreamer_gzip.c
similarity index 62%
rename from src/bin/pg_basebackup/bbstreamer_gzip.c
rename to src/bin/pg_basebackup/astreamer_gzip.c
index 0417fd9bc2c..6f7c27afbbc 100644
--- a/src/bin/pg_basebackup/bbstreamer_gzip.c
+++ b/src/bin/pg_basebackup/astreamer_gzip.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_gzip.c
+ * astreamer_gzip.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_gzip.c
+ * src/bin/pg_basebackup/astreamer_gzip.c
*-------------------------------------------------------------------------
*/
@@ -17,74 +17,74 @@
#include <zlib.h>
#endif
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
#ifdef HAVE_LIBZ
-typedef struct bbstreamer_gzip_writer
+typedef struct astreamer_gzip_writer
{
- bbstreamer base;
+ astreamer base;
char *pathname;
gzFile gzfile;
-} bbstreamer_gzip_writer;
+} astreamer_gzip_writer;
-typedef struct bbstreamer_gzip_decompressor
+typedef struct astreamer_gzip_decompressor
{
- bbstreamer base;
+ astreamer base;
z_stream zstream;
size_t bytes_written;
-} bbstreamer_gzip_decompressor;
+} astreamer_gzip_decompressor;
-static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
-static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
+static void astreamer_gzip_writer_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_gzip_writer_finalize(astreamer *streamer);
+static void astreamer_gzip_writer_free(astreamer *streamer);
static const char *get_gz_error(gzFile gzf);
-static const bbstreamer_ops bbstreamer_gzip_writer_ops = {
- .content = bbstreamer_gzip_writer_content,
- .finalize = bbstreamer_gzip_writer_finalize,
- .free = bbstreamer_gzip_writer_free
+static const astreamer_ops astreamer_gzip_writer_ops = {
+ .content = astreamer_gzip_writer_content,
+ .finalize = astreamer_gzip_writer_finalize,
+ .free = astreamer_gzip_writer_free
};
-static void bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_gzip_decompressor_finalize(bbstreamer *streamer);
-static void bbstreamer_gzip_decompressor_free(bbstreamer *streamer);
+static void astreamer_gzip_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_gzip_decompressor_finalize(astreamer *streamer);
+static void astreamer_gzip_decompressor_free(astreamer *streamer);
static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
static void gzip_pfree(void *opaque, void *address);
-static const bbstreamer_ops bbstreamer_gzip_decompressor_ops = {
- .content = bbstreamer_gzip_decompressor_content,
- .finalize = bbstreamer_gzip_decompressor_finalize,
- .free = bbstreamer_gzip_decompressor_free
+static const astreamer_ops astreamer_gzip_decompressor_ops = {
+ .content = astreamer_gzip_decompressor_content,
+ .finalize = astreamer_gzip_decompressor_finalize,
+ .free = astreamer_gzip_decompressor_free
};
#endif
/*
- * Create a bbstreamer that just compresses data using gzip, and then writes
+ * Create an astreamer that just compresses data using gzip, and then writes
* it to a file.
*
- * As in the case of bbstreamer_plain_writer_new, pathname is always used
+ * As in the case of astreamer_plain_writer_new, pathname is always used
* for error reporting purposes; if file is NULL, it is also the opened and
* closed so that the data may be written there.
*/
-bbstreamer *
-bbstreamer_gzip_writer_new(char *pathname, FILE *file,
- pg_compress_specification *compress)
+astreamer *
+astreamer_gzip_writer_new(char *pathname, FILE *file,
+ pg_compress_specification *compress)
{
#ifdef HAVE_LIBZ
- bbstreamer_gzip_writer *streamer;
+ astreamer_gzip_writer *streamer;
- streamer = palloc0(sizeof(bbstreamer_gzip_writer));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_gzip_writer_ops;
+ streamer = palloc0(sizeof(astreamer_gzip_writer));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_gzip_writer_ops;
streamer->pathname = pstrdup(pathname);
@@ -123,13 +123,13 @@ bbstreamer_gzip_writer_new(char *pathname, FILE *file,
* Write archive content to gzip file.
*/
static void
-bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member, const char *data,
- int len, bbstreamer_archive_context context)
+astreamer_gzip_writer_content(astreamer *streamer,
+ astreamer_member *member, const char *data,
+ int len, astreamer_archive_context context)
{
- bbstreamer_gzip_writer *mystreamer;
+ astreamer_gzip_writer *mystreamer;
- mystreamer = (bbstreamer_gzip_writer *) streamer;
+ mystreamer = (astreamer_gzip_writer *) streamer;
if (len == 0)
return;
@@ -151,16 +151,16 @@ bbstreamer_gzip_writer_content(bbstreamer *streamer,
*
* It makes no difference whether we opened the file or the caller did it,
* because libz provides no way of avoiding a close on the underlying file
- * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
+ * handle. Notice, however, that astreamer_gzip_writer_new() uses dup() to
* work around this issue, so that the behavior from the caller's viewpoint
- * is the same as for bbstreamer_plain_writer.
+ * is the same as for astreamer_plain_writer.
*/
static void
-bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
+astreamer_gzip_writer_finalize(astreamer *streamer)
{
- bbstreamer_gzip_writer *mystreamer;
+ astreamer_gzip_writer *mystreamer;
- mystreamer = (bbstreamer_gzip_writer *) streamer;
+ mystreamer = (astreamer_gzip_writer *) streamer;
errno = 0; /* in case gzclose() doesn't set it */
if (gzclose(mystreamer->gzfile) != 0)
@@ -171,14 +171,14 @@ bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
}
/*
- * Free memory associated with this bbstreamer.
+ * Free memory associated with this astreamer.
*/
static void
-bbstreamer_gzip_writer_free(bbstreamer *streamer)
+astreamer_gzip_writer_free(astreamer *streamer)
{
- bbstreamer_gzip_writer *mystreamer;
+ astreamer_gzip_writer *mystreamer;
- mystreamer = (bbstreamer_gzip_writer *) streamer;
+ mystreamer = (astreamer_gzip_writer *) streamer;
Assert(mystreamer->base.bbs_next == NULL);
Assert(mystreamer->gzfile == NULL);
@@ -208,18 +208,18 @@ get_gz_error(gzFile gzf)
* Create a new base backup streamer that performs decompression of gzip
* compressed blocks.
*/
-bbstreamer *
-bbstreamer_gzip_decompressor_new(bbstreamer *next)
+astreamer *
+astreamer_gzip_decompressor_new(astreamer *next)
{
#ifdef HAVE_LIBZ
- bbstreamer_gzip_decompressor *streamer;
+ astreamer_gzip_decompressor *streamer;
z_stream *zs;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_gzip_decompressor));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_gzip_decompressor_ops;
+ streamer = palloc0(sizeof(astreamer_gzip_decompressor));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_gzip_decompressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -258,15 +258,15 @@ bbstreamer_gzip_decompressor_new(bbstreamer *next)
* to the next streamer.
*/
static void
-bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_gzip_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_gzip_decompressor *mystreamer;
+ astreamer_gzip_decompressor *mystreamer;
z_stream *zs;
- mystreamer = (bbstreamer_gzip_decompressor *) streamer;
+ mystreamer = (astreamer_gzip_decompressor *) streamer;
zs = &mystreamer->zstream;
zs->next_in = (const uint8 *) data;
@@ -301,9 +301,9 @@ bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
/* If output buffer is full then pass data to next streamer */
if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen, context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen, context);
mystreamer->bytes_written = 0;
}
}
@@ -313,31 +313,31 @@ bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_gzip_decompressor_finalize(bbstreamer *streamer)
+astreamer_gzip_decompressor_finalize(astreamer *streamer)
{
- bbstreamer_gzip_decompressor *mystreamer;
+ astreamer_gzip_decompressor *mystreamer;
- mystreamer = (bbstreamer_gzip_decompressor *) streamer;
+ mystreamer = (astreamer_gzip_decompressor *) streamer;
/*
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_gzip_decompressor_free(bbstreamer *streamer)
+astreamer_gzip_decompressor_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
}
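
For illustration only (this is not part of the patch): with the renamed
constructors, a caller builds a gzip decompression pipeline by wiring the
innermost consumer first and wrapping outward. The 'sink' streamer below is
hypothetical -- any downstream consumer, such as a tar extractor, would do:

#include "postgres_fe.h"
#include "astreamer.h"

/*
 * Sketch: decompress a gzip-compressed tar archive and hand the parsed,
 * typed chunks to whatever consumer the caller supplies.
 * Data flows: gzip decompressor -> tar parser -> sink.
 */
static astreamer *
make_gzip_pipeline(astreamer *sink)
{
    astreamer  *streamer;

    streamer = astreamer_tar_parser_new(sink);
    streamer = astreamer_gzip_decompressor_new(streamer);

    return streamer;
}

The caller then pushes raw archive bytes with astreamer_content(streamer,
NULL, buf, len, ASTREAMER_UNKNOWN) and finishes with astreamer_finalize()
followed by astreamer_free().
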
diff --git a/src/bin/pg_basebackup/bbstreamer_inject.c b/src/bin/pg_basebackup/astreamer_inject.c
similarity index 53%
rename from src/bin/pg_basebackup/bbstreamer_inject.c
rename to src/bin/pg_basebackup/astreamer_inject.c
index 194026b56e9..7f1decded8d 100644
--- a/src/bin/pg_basebackup/bbstreamer_inject.c
+++ b/src/bin/pg_basebackup/astreamer_inject.c
@@ -1,51 +1,51 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_inject.c
+ * astreamer_inject.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_inject.c
+ * src/bin/pg_basebackup/astreamer_inject.c
*-------------------------------------------------------------------------
*/
#include "postgres_fe.h"
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
-typedef struct bbstreamer_recovery_injector
+typedef struct astreamer_recovery_injector
{
- bbstreamer base;
+ astreamer base;
bool skip_file;
bool is_recovery_guc_supported;
bool is_postgresql_auto_conf;
bool found_postgresql_auto_conf;
PQExpBuffer recoveryconfcontents;
- bbstreamer_member member;
-} bbstreamer_recovery_injector;
+ astreamer_member member;
+} astreamer_recovery_injector;
-static void bbstreamer_recovery_injector_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_recovery_injector_finalize(bbstreamer *streamer);
-static void bbstreamer_recovery_injector_free(bbstreamer *streamer);
+static void astreamer_recovery_injector_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_recovery_injector_finalize(astreamer *streamer);
+static void astreamer_recovery_injector_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_recovery_injector_ops = {
- .content = bbstreamer_recovery_injector_content,
- .finalize = bbstreamer_recovery_injector_finalize,
- .free = bbstreamer_recovery_injector_free
+static const astreamer_ops astreamer_recovery_injector_ops = {
+ .content = astreamer_recovery_injector_content,
+ .finalize = astreamer_recovery_injector_finalize,
+ .free = astreamer_recovery_injector_free
};
/*
- * Create a bbstreamer that can edit recoverydata into an archive stream.
+ * Create an astreamer that can edit recovery data into an archive stream.
*
- * The input should be a series of typed chunks (not BBSTREAMER_UNKNOWN) as
- * per the conventions described in bbstreamer.h; the chunks forwarded to
- * the next bbstreamer will be similarly typed, but the
- * BBSTREAMER_MEMBER_HEADER chunks may be zero-length in cases where we've
+ * The input should be a series of typed chunks (not ASTREAMER_UNKNOWN) as
+ * per the conventions described in astreamer.h; the chunks forwarded to
+ * the next astreamer will be similarly typed, but the
+ * ASTREAMER_MEMBER_HEADER chunks may be zero-length in cases where we've
* edited the archive stream.
*
* Our goal is to do one of the following three things with the content passed
@@ -61,16 +61,16 @@ static const bbstreamer_ops bbstreamer_recovery_injector_ops = {
* zero-length standby.signal file, dropping any file with that name from
* the archive.
*/
-bbstreamer *
-bbstreamer_recovery_injector_new(bbstreamer *next,
- bool is_recovery_guc_supported,
- PQExpBuffer recoveryconfcontents)
+astreamer *
+astreamer_recovery_injector_new(astreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents)
{
- bbstreamer_recovery_injector *streamer;
+ astreamer_recovery_injector *streamer;
- streamer = palloc0(sizeof(bbstreamer_recovery_injector));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_recovery_injector_ops;
+ streamer = palloc0(sizeof(astreamer_recovery_injector));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_recovery_injector_ops;
streamer->base.bbs_next = next;
streamer->is_recovery_guc_supported = is_recovery_guc_supported;
streamer->recoveryconfcontents = recoveryconfcontents;
@@ -82,21 +82,21 @@ bbstreamer_recovery_injector_new(bbstreamer *next,
* Handle each chunk of tar content while injecting recovery configuration.
*/
static void
-bbstreamer_recovery_injector_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_recovery_injector_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_recovery_injector *mystreamer;
+ astreamer_recovery_injector *mystreamer;
- mystreamer = (bbstreamer_recovery_injector *) streamer;
- Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+ mystreamer = (astreamer_recovery_injector *) streamer;
+ Assert(member != NULL || context == ASTREAMER_ARCHIVE_TRAILER);
switch (context)
{
- case BBSTREAMER_MEMBER_HEADER:
+ case ASTREAMER_MEMBER_HEADER:
/* Must copy provided data so we have the option to modify it. */
- memcpy(&mystreamer->member, member, sizeof(bbstreamer_member));
+ memcpy(&mystreamer->member, member, sizeof(astreamer_member));
/*
* On v12+, skip standby.signal and edit postgresql.auto.conf; on
@@ -119,8 +119,8 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
/*
* Zap data and len because the archive header is no
- * longer valid; some subsequent bbstreamer must
- * regenerate it if it's necessary.
+ * longer valid; some subsequent astreamer must regenerate
+ * it if it's necessary.
*/
data = NULL;
len = 0;
@@ -135,26 +135,26 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
return;
break;
- case BBSTREAMER_MEMBER_CONTENTS:
+ case ASTREAMER_MEMBER_CONTENTS:
/* Do not forward if the file is to be skipped. */
if (mystreamer->skip_file)
return;
break;
- case BBSTREAMER_MEMBER_TRAILER:
+ case ASTREAMER_MEMBER_TRAILER:
/* Do not forward it the file is to be skipped. */
if (mystreamer->skip_file)
return;
/* Append provided content to whatever we already sent. */
if (mystreamer->is_postgresql_auto_conf)
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->recoveryconfcontents->data,
- mystreamer->recoveryconfcontents->len,
- BBSTREAMER_MEMBER_CONTENTS);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len,
+ ASTREAMER_MEMBER_CONTENTS);
break;
- case BBSTREAMER_ARCHIVE_TRAILER:
+ case ASTREAMER_ARCHIVE_TRAILER:
if (mystreamer->is_recovery_guc_supported)
{
/*
@@ -163,22 +163,22 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
* member now.
*/
if (!mystreamer->found_postgresql_auto_conf)
- bbstreamer_inject_file(mystreamer->base.bbs_next,
- "postgresql.auto.conf",
- mystreamer->recoveryconfcontents->data,
- mystreamer->recoveryconfcontents->len);
+ astreamer_inject_file(mystreamer->base.bbs_next,
+ "postgresql.auto.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
/* Inject empty standby.signal file. */
- bbstreamer_inject_file(mystreamer->base.bbs_next,
- "standby.signal", "", 0);
+ astreamer_inject_file(mystreamer->base.bbs_next,
+ "standby.signal", "", 0);
}
else
{
/* Inject recovery.conf file with specified contents. */
- bbstreamer_inject_file(mystreamer->base.bbs_next,
- "recovery.conf",
- mystreamer->recoveryconfcontents->data,
- mystreamer->recoveryconfcontents->len);
+ astreamer_inject_file(mystreamer->base.bbs_next,
+ "recovery.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
}
/* Nothing to do here. */
@@ -189,26 +189,26 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
pg_fatal("unexpected state while injecting recovery settings");
}
- bbstreamer_content(mystreamer->base.bbs_next, &mystreamer->member,
- data, len, context);
+ astreamer_content(mystreamer->base.bbs_next, &mystreamer->member,
+ data, len, context);
}
/*
- * End-of-stream processing for this bbstreamer.
+ * End-of-stream processing for this astreamer.
*/
static void
-bbstreamer_recovery_injector_finalize(bbstreamer *streamer)
+astreamer_recovery_injector_finalize(astreamer *streamer)
{
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_finalize(streamer->bbs_next);
}
/*
- * Free memory associated with this bbstreamer.
+ * Free memory associated with this astreamer.
*/
static void
-bbstreamer_recovery_injector_free(bbstreamer *streamer)
+astreamer_recovery_injector_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer);
}
@@ -216,10 +216,10 @@ bbstreamer_recovery_injector_free(bbstreamer *streamer)
* Inject a member into the archive with specified contents.
*/
void
-bbstreamer_inject_file(bbstreamer *streamer, char *pathname, char *data,
- int len)
+astreamer_inject_file(astreamer *streamer, char *pathname, char *data,
+ int len)
{
- bbstreamer_member member;
+ astreamer_member member;
strlcpy(member.pathname, pathname, MAXPGPATH);
member.size = len;
@@ -238,12 +238,12 @@ bbstreamer_inject_file(bbstreamer *streamer, char *pathname, char *data,
/*
* We don't know here how to generate valid member headers and trailers
* for the archiving format in use, so if those are needed, some successor
- * bbstreamer will have to generate them using the data from 'member'.
+ * astreamer will have to generate them using the data from 'member'.
*/
- bbstreamer_content(streamer, &member, NULL, 0,
- BBSTREAMER_MEMBER_HEADER);
- bbstreamer_content(streamer, &member, data, len,
- BBSTREAMER_MEMBER_CONTENTS);
- bbstreamer_content(streamer, &member, NULL, 0,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(streamer, &member, NULL, 0,
+ ASTREAMER_MEMBER_HEADER);
+ astreamer_content(streamer, &member, data, len,
+ ASTREAMER_MEMBER_CONTENTS);
+ astreamer_content(streamer, &member, NULL, 0,
+ ASTREAMER_MEMBER_TRAILER);
}
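
Again for illustration only, not part of the patch: the recovery injector
sits between a tar parser and a tar archiver, so the caller can splice
recovery settings into the stream without touching the tar format itself.
The 'writer' argument below is a hypothetical downstream streamer that
persists the rebuilt archive:

#include "postgres_fe.h"
#include "astreamer.h"
#include "astreamer_inject.h"
#include "pqexpbuffer.h"

/*
 * Sketch: parse the incoming tar stream, inject recovery configuration,
 * and re-archive the result into the supplied writer.
 * Data flows: tar parser -> recovery injector -> tar archiver -> writer.
 */
static astreamer *
make_recovery_pipeline(astreamer *writer, bool recovery_guc_supported,
                       PQExpBuffer recoveryconf)
{
    astreamer  *streamer;

    streamer = astreamer_tar_archiver_new(writer);
    streamer = astreamer_recovery_injector_new(streamer,
                                               recovery_guc_supported,
                                               recoveryconf);
    streamer = astreamer_tar_parser_new(streamer);

    return streamer;
}
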
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/astreamer_lz4.c
similarity index 69%
rename from src/bin/pg_basebackup/bbstreamer_lz4.c
rename to src/bin/pg_basebackup/astreamer_lz4.c
index f5c9e68150c..1c40d7d8ad5 100644
--- a/src/bin/pg_basebackup/bbstreamer_lz4.c
+++ b/src/bin/pg_basebackup/astreamer_lz4.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_lz4.c
+ * astreamer_lz4.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_lz4.c
+ * src/bin/pg_basebackup/astreamer_lz4.c
*-------------------------------------------------------------------------
*/
@@ -17,15 +17,15 @@
#include <lz4frame.h>
#endif
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
#ifdef USE_LZ4
-typedef struct bbstreamer_lz4_frame
+typedef struct astreamer_lz4_frame
{
- bbstreamer base;
+ astreamer base;
LZ4F_compressionContext_t cctx;
LZ4F_decompressionContext_t dctx;
@@ -33,32 +33,32 @@ typedef struct bbstreamer_lz4_frame
size_t bytes_written;
bool header_written;
-} bbstreamer_lz4_frame;
+} astreamer_lz4_frame;
-static void bbstreamer_lz4_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_lz4_compressor_finalize(bbstreamer *streamer);
-static void bbstreamer_lz4_compressor_free(bbstreamer *streamer);
+static void astreamer_lz4_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_lz4_compressor_finalize(astreamer *streamer);
+static void astreamer_lz4_compressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_lz4_compressor_ops = {
- .content = bbstreamer_lz4_compressor_content,
- .finalize = bbstreamer_lz4_compressor_finalize,
- .free = bbstreamer_lz4_compressor_free
+static const astreamer_ops astreamer_lz4_compressor_ops = {
+ .content = astreamer_lz4_compressor_content,
+ .finalize = astreamer_lz4_compressor_finalize,
+ .free = astreamer_lz4_compressor_free
};
-static void bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer);
-static void bbstreamer_lz4_decompressor_free(bbstreamer *streamer);
+static void astreamer_lz4_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_lz4_decompressor_finalize(astreamer *streamer);
+static void astreamer_lz4_decompressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
- .content = bbstreamer_lz4_decompressor_content,
- .finalize = bbstreamer_lz4_decompressor_finalize,
- .free = bbstreamer_lz4_decompressor_free
+static const astreamer_ops astreamer_lz4_decompressor_ops = {
+ .content = astreamer_lz4_decompressor_content,
+ .finalize = astreamer_lz4_decompressor_finalize,
+ .free = astreamer_lz4_decompressor_free
};
#endif
@@ -66,19 +66,19 @@ static const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
* Create a new base backup streamer that performs lz4 compression of tar
* blocks.
*/
-bbstreamer *
-bbstreamer_lz4_compressor_new(bbstreamer *next, pg_compress_specification *compress)
+astreamer *
+astreamer_lz4_compressor_new(astreamer *next, pg_compress_specification *compress)
{
#ifdef USE_LZ4
- bbstreamer_lz4_frame *streamer;
+ astreamer_lz4_frame *streamer;
LZ4F_errorCode_t ctxError;
LZ4F_preferences_t *prefs;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_lz4_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_lz4_compressor_ops;
+ streamer = palloc0(sizeof(astreamer_lz4_frame));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_lz4_compressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -113,19 +113,19 @@ bbstreamer_lz4_compressor_new(bbstreamer *next, pg_compress_specification *compr
* of output buffer to next streamer and empty the buffer.
*/
static void
-bbstreamer_lz4_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_lz4_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
uint8 *next_in,
*next_out;
size_t out_bound,
compressed_size,
avail_out;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
next_in = (uint8 *) data;
/* Write header before processing the first input chunk. */
@@ -159,10 +159,10 @@ bbstreamer_lz4_compressor_content(bbstreamer *streamer,
out_bound = LZ4F_compressBound(len, &mystreamer->prefs);
if (avail_out < out_bound)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->base.bbs_buffer.data,
- mystreamer->bytes_written,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ context);
/* Enlarge buffer if it falls short of out bound. */
if (mystreamer->base.bbs_buffer.maxlen < out_bound)
@@ -196,25 +196,25 @@ bbstreamer_lz4_compressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_lz4_compressor_finalize(bbstreamer *streamer)
+astreamer_lz4_compressor_finalize(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
uint8 *next_out;
size_t footer_bound,
compressed_size,
avail_out;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
/* Find out the footer bound and update the output buffer. */
footer_bound = LZ4F_compressBound(0, &mystreamer->prefs);
if ((mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written) <
footer_bound)
{
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->bytes_written,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
/* Enlarge buffer if it falls short of footer bound. */
if (mystreamer->base.bbs_buffer.maxlen < footer_bound)
@@ -243,24 +243,24 @@ bbstreamer_lz4_compressor_finalize(bbstreamer *streamer)
mystreamer->bytes_written += compressed_size;
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->bytes_written,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_lz4_compressor_free(bbstreamer *streamer)
+astreamer_lz4_compressor_free(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ mystreamer = (astreamer_lz4_frame *) streamer;
+ astreamer_free(streamer->bbs_next);
LZ4F_freeCompressionContext(mystreamer->cctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
@@ -271,18 +271,18 @@ bbstreamer_lz4_compressor_free(bbstreamer *streamer)
* Create a new base backup streamer that performs decompression of lz4
* compressed blocks.
*/
-bbstreamer *
-bbstreamer_lz4_decompressor_new(bbstreamer *next)
+astreamer *
+astreamer_lz4_decompressor_new(astreamer *next)
{
#ifdef USE_LZ4
- bbstreamer_lz4_frame *streamer;
+ astreamer_lz4_frame *streamer;
LZ4F_errorCode_t ctxError;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_lz4_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_lz4_decompressor_ops;
+ streamer = palloc0(sizeof(astreamer_lz4_frame));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_lz4_decompressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -307,18 +307,18 @@ bbstreamer_lz4_decompressor_new(bbstreamer *next)
* to the next streamer.
*/
static void
-bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_lz4_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
uint8 *next_in,
*next_out;
size_t avail_in,
avail_out;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
next_in = (uint8 *) data;
next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
avail_in = len;
@@ -366,10 +366,10 @@ bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
*/
if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ context);
avail_out = mystreamer->base.bbs_buffer.maxlen;
mystreamer->bytes_written = 0;
@@ -387,34 +387,34 @@ bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer)
+astreamer_lz4_decompressor_finalize(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
/*
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_lz4_decompressor_free(bbstreamer *streamer)
+astreamer_lz4_decompressor_free(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ mystreamer = (astreamer_lz4_frame *) streamer;
+ astreamer_free(streamer->bbs_next);
LZ4F_freeDecompressionContext(mystreamer->dctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
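
For reviewers skimming the renames, here is the boilerplate that every
astreamer follows, reduced to a sketch. The astreamer_counter names are
hypothetical and exist only to show the pattern -- embed the base struct,
fill in an ops table, chain to bbs_next -- which is the same pattern any
new streamer added on top of this series would use:

#include "postgres_fe.h"
#include "astreamer.h"

/* Hypothetical pass-through streamer that counts the bytes it sees. */
typedef struct astreamer_counter
{
    astreamer   base;
    uint64      total_bytes;
} astreamer_counter;

static void
astreamer_counter_content(astreamer *streamer, astreamer_member *member,
                          const char *data, int len,
                          astreamer_archive_context context)
{
    astreamer_counter *mystreamer = (astreamer_counter *) streamer;

    mystreamer->total_bytes += len;

    /* Forward everything unchanged to the next streamer, if any. */
    if (mystreamer->base.bbs_next)
        astreamer_content(mystreamer->base.bbs_next, member, data, len,
                          context);
}

static void
astreamer_counter_finalize(astreamer *streamer)
{
    if (streamer->bbs_next)
        astreamer_finalize(streamer->bbs_next);
}

static void
astreamer_counter_free(astreamer *streamer)
{
    if (streamer->bbs_next)
        astreamer_free(streamer->bbs_next);
    pfree(streamer);
}

static const astreamer_ops astreamer_counter_ops = {
    .content = astreamer_counter_content,
    .finalize = astreamer_counter_finalize,
    .free = astreamer_counter_free
};

astreamer *
astreamer_counter_new(astreamer *next)
{
    astreamer_counter *streamer;

    streamer = palloc0(sizeof(astreamer_counter));
    *((const astreamer_ops **) &streamer->base.bbs_ops) =
        &astreamer_counter_ops;
    streamer->base.bbs_next = next;

    return &streamer->base;
}
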
diff --git a/src/bin/pg_basebackup/bbstreamer_tar.c b/src/bin/pg_basebackup/astreamer_tar.c
similarity index 50%
rename from src/bin/pg_basebackup/bbstreamer_tar.c
rename to src/bin/pg_basebackup/astreamer_tar.c
index 9137d17ddc1..673690cd18f 100644
--- a/src/bin/pg_basebackup/bbstreamer_tar.c
+++ b/src/bin/pg_basebackup/astreamer_tar.c
@@ -1,13 +1,13 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_tar.c
+ * astreamer_tar.c
*
* This module implements three types of tar processing. A tar parser
- * expects unlabelled chunks of data (e.g. BBSTREAMER_UNKNOWN) and splits
- * it into labelled chunks (any other value of bbstreamer_archive_context).
+ * expects unlabelled chunks of data (e.g. ASTREAMER_UNKNOWN) and splits
+ * it into labelled chunks (any other value of astreamer_archive_context).
* A tar archiver does the reverse: it takes a bunch of labelled chunks
* and produces a tarfile, optionally replacing member headers and trailers
- * so that upstream bbstreamer objects can perform surgery on the tarfile
+ * so that upstream astreamer objects can perform surgery on the tarfile
* contents without knowing the details of the tar format. A tar terminator
* just adds two blocks of NUL bytes to the end of the file, since older
* server versions produce files with this terminator omitted.
@@ -15,7 +15,7 @@
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_tar.c
+ * src/bin/pg_basebackup/astreamer_tar.c
*-------------------------------------------------------------------------
*/
@@ -23,83 +23,83 @@
#include <time.h>
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/logging.h"
#include "pgtar.h"
-typedef struct bbstreamer_tar_parser
+typedef struct astreamer_tar_parser
{
- bbstreamer base;
- bbstreamer_archive_context next_context;
- bbstreamer_member member;
+ astreamer base;
+ astreamer_archive_context next_context;
+ astreamer_member member;
size_t file_bytes_sent;
size_t pad_bytes_expected;
-} bbstreamer_tar_parser;
+} astreamer_tar_parser;
-typedef struct bbstreamer_tar_archiver
+typedef struct astreamer_tar_archiver
{
- bbstreamer base;
+ astreamer base;
bool rearchive_member;
-} bbstreamer_tar_archiver;
+} astreamer_tar_archiver;
-static void bbstreamer_tar_parser_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_tar_parser_finalize(bbstreamer *streamer);
-static void bbstreamer_tar_parser_free(bbstreamer *streamer);
-static bool bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer);
+static void astreamer_tar_parser_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_tar_parser_finalize(astreamer *streamer);
+static void astreamer_tar_parser_free(astreamer *streamer);
+static bool astreamer_tar_header(astreamer_tar_parser *mystreamer);
-static const bbstreamer_ops bbstreamer_tar_parser_ops = {
- .content = bbstreamer_tar_parser_content,
- .finalize = bbstreamer_tar_parser_finalize,
- .free = bbstreamer_tar_parser_free
+static const astreamer_ops astreamer_tar_parser_ops = {
+ .content = astreamer_tar_parser_content,
+ .finalize = astreamer_tar_parser_finalize,
+ .free = astreamer_tar_parser_free
};
-static void bbstreamer_tar_archiver_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_tar_archiver_finalize(bbstreamer *streamer);
-static void bbstreamer_tar_archiver_free(bbstreamer *streamer);
+static void astreamer_tar_archiver_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_tar_archiver_finalize(astreamer *streamer);
+static void astreamer_tar_archiver_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_tar_archiver_ops = {
- .content = bbstreamer_tar_archiver_content,
- .finalize = bbstreamer_tar_archiver_finalize,
- .free = bbstreamer_tar_archiver_free
+static const astreamer_ops astreamer_tar_archiver_ops = {
+ .content = astreamer_tar_archiver_content,
+ .finalize = astreamer_tar_archiver_finalize,
+ .free = astreamer_tar_archiver_free
};
-static void bbstreamer_tar_terminator_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_tar_terminator_finalize(bbstreamer *streamer);
-static void bbstreamer_tar_terminator_free(bbstreamer *streamer);
+static void astreamer_tar_terminator_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_tar_terminator_finalize(astreamer *streamer);
+static void astreamer_tar_terminator_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_tar_terminator_ops = {
- .content = bbstreamer_tar_terminator_content,
- .finalize = bbstreamer_tar_terminator_finalize,
- .free = bbstreamer_tar_terminator_free
+static const astreamer_ops astreamer_tar_terminator_ops = {
+ .content = astreamer_tar_terminator_content,
+ .finalize = astreamer_tar_terminator_finalize,
+ .free = astreamer_tar_terminator_free
};
/*
- * Create a bbstreamer that can parse a stream of content as tar data.
+ * Create an astreamer that can parse a stream of content as tar data.
*
- * The input should be a series of BBSTREAMER_UNKNOWN chunks; the bbstreamer
+ * The input should be a series of ASTREAMER_UNKNOWN chunks; the astreamer
* specified by 'next' will receive a series of typed chunks, as per the
- * conventions described in bbstreamer.h.
+ * conventions described in astreamer.h.
*/
-bbstreamer *
-bbstreamer_tar_parser_new(bbstreamer *next)
+astreamer *
+astreamer_tar_parser_new(astreamer *next)
{
- bbstreamer_tar_parser *streamer;
+ astreamer_tar_parser *streamer;
- streamer = palloc0(sizeof(bbstreamer_tar_parser));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_tar_parser_ops;
+ streamer = palloc0(sizeof(astreamer_tar_parser));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_tar_parser_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
- streamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ streamer->next_context = ASTREAMER_MEMBER_HEADER;
return &streamer->base;
}
@@ -108,29 +108,29 @@ bbstreamer_tar_parser_new(bbstreamer *next)
* Parse unknown content as tar data.
*/
static void
-bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_tar_parser_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+ astreamer_tar_parser *mystreamer = (astreamer_tar_parser *) streamer;
size_t nbytes;
/* Expect unparsed input. */
Assert(member == NULL);
- Assert(context == BBSTREAMER_UNKNOWN);
+ Assert(context == ASTREAMER_UNKNOWN);
while (len > 0)
{
switch (mystreamer->next_context)
{
- case BBSTREAMER_MEMBER_HEADER:
+ case ASTREAMER_MEMBER_HEADER:
/*
* If we're expecting an archive member header, accumulate a
* full block of data before doing anything further.
*/
- if (!bbstreamer_buffer_until(streamer, &data, &len,
- TAR_BLOCK_SIZE))
+ if (!astreamer_buffer_until(streamer, &data, &len,
+ TAR_BLOCK_SIZE))
return;
/*
@@ -139,32 +139,32 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
* thought was the next file header is actually the start of
* the archive trailer. Switch modes accordingly.
*/
- if (bbstreamer_tar_header(mystreamer))
+ if (astreamer_tar_header(mystreamer))
{
if (mystreamer->member.size == 0)
{
/* No content; trailer is zero-length. */
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- NULL, 0,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ ASTREAMER_MEMBER_TRAILER);
/* Expect next header. */
- mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->next_context = ASTREAMER_MEMBER_HEADER;
}
else
{
/* Expect contents. */
- mystreamer->next_context = BBSTREAMER_MEMBER_CONTENTS;
+ mystreamer->next_context = ASTREAMER_MEMBER_CONTENTS;
}
mystreamer->base.bbs_buffer.len = 0;
mystreamer->file_bytes_sent = 0;
}
else
- mystreamer->next_context = BBSTREAMER_ARCHIVE_TRAILER;
+ mystreamer->next_context = ASTREAMER_ARCHIVE_TRAILER;
break;
- case BBSTREAMER_MEMBER_CONTENTS:
+ case ASTREAMER_MEMBER_CONTENTS:
/*
* Send as much content as we have, but not more than the
@@ -174,10 +174,10 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
nbytes = mystreamer->member.size - mystreamer->file_bytes_sent;
nbytes = Min(nbytes, len);
Assert(nbytes > 0);
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- data, nbytes,
- BBSTREAMER_MEMBER_CONTENTS);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, nbytes,
+ ASTREAMER_MEMBER_CONTENTS);
mystreamer->file_bytes_sent += nbytes;
data += nbytes;
len -= nbytes;
@@ -193,53 +193,53 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
if (mystreamer->pad_bytes_expected == 0)
{
/* Trailer is zero-length. */
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- NULL, 0,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ ASTREAMER_MEMBER_TRAILER);
/* Expect next header. */
- mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->next_context = ASTREAMER_MEMBER_HEADER;
}
else
{
/* Trailer is not zero-length. */
- mystreamer->next_context = BBSTREAMER_MEMBER_TRAILER;
+ mystreamer->next_context = ASTREAMER_MEMBER_TRAILER;
}
mystreamer->base.bbs_buffer.len = 0;
}
break;
- case BBSTREAMER_MEMBER_TRAILER:
+ case ASTREAMER_MEMBER_TRAILER:
/*
* If we're expecting an archive member trailer, accumulate
* the expected number of padding bytes before sending
* anything onward.
*/
- if (!bbstreamer_buffer_until(streamer, &data, &len,
- mystreamer->pad_bytes_expected))
+ if (!astreamer_buffer_until(streamer, &data, &len,
+ mystreamer->pad_bytes_expected))
return;
/* OK, now we can send it. */
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- data, mystreamer->pad_bytes_expected,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, mystreamer->pad_bytes_expected,
+ ASTREAMER_MEMBER_TRAILER);
/* Expect next file header. */
- mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->next_context = ASTREAMER_MEMBER_HEADER;
mystreamer->base.bbs_buffer.len = 0;
break;
- case BBSTREAMER_ARCHIVE_TRAILER:
+ case ASTREAMER_ARCHIVE_TRAILER:
/*
* We've seen an end-of-archive indicator, so anything more is
* buffered and sent as part of the archive trailer. But we
* don't expect more than 2 blocks.
*/
- bbstreamer_buffer_bytes(streamer, &data, &len, len);
+ astreamer_buffer_bytes(streamer, &data, &len, len);
if (len > 2 * TAR_BLOCK_SIZE)
pg_fatal("tar file trailer exceeds 2 blocks");
return;
@@ -255,14 +255,14 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
* Parse a file header within a tar stream.
*
* The return value is true if we found a file header and passed it on to the
- * next bbstreamer; it is false if we have reached the archive trailer.
+ * next astreamer; it is false if we have reached the archive trailer.
*/
static bool
-bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
+astreamer_tar_header(astreamer_tar_parser *mystreamer)
{
bool has_nonzero_byte = false;
int i;
- bbstreamer_member *member = &mystreamer->member;
+ astreamer_member *member = &mystreamer->member;
char *buffer = mystreamer->base.bbs_buffer.data;
Assert(mystreamer->base.bbs_buffer.len == TAR_BLOCK_SIZE);
@@ -304,10 +304,10 @@ bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
/* Compute number of padding bytes. */
mystreamer->pad_bytes_expected = tarPaddingBytesRequired(member->size);
- /* Forward the entire header to the next bbstreamer. */
- bbstreamer_content(mystreamer->base.bbs_next, member,
- buffer, TAR_BLOCK_SIZE,
- BBSTREAMER_MEMBER_HEADER);
+ /* Forward the entire header to the next astreamer. */
+ astreamer_content(mystreamer->base.bbs_next, member,
+ buffer, TAR_BLOCK_SIZE,
+ ASTREAMER_MEMBER_HEADER);
return true;
}
@@ -316,50 +316,50 @@ bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
* End-of-stream processing for a tar parser.
*/
static void
-bbstreamer_tar_parser_finalize(bbstreamer *streamer)
+astreamer_tar_parser_finalize(astreamer *streamer)
{
- bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+ astreamer_tar_parser *mystreamer = (astreamer_tar_parser *) streamer;
- if (mystreamer->next_context != BBSTREAMER_ARCHIVE_TRAILER &&
- (mystreamer->next_context != BBSTREAMER_MEMBER_HEADER ||
+ if (mystreamer->next_context != ASTREAMER_ARCHIVE_TRAILER &&
+ (mystreamer->next_context != ASTREAMER_MEMBER_HEADER ||
mystreamer->base.bbs_buffer.len > 0))
pg_fatal("COPY stream ended before last file was finished");
/* Send the archive trailer, even if empty. */
- bbstreamer_content(streamer->bbs_next, NULL,
- streamer->bbs_buffer.data, streamer->bbs_buffer.len,
- BBSTREAMER_ARCHIVE_TRAILER);
+ astreamer_content(streamer->bbs_next, NULL,
+ streamer->bbs_buffer.data, streamer->bbs_buffer.len,
+ ASTREAMER_ARCHIVE_TRAILER);
/* Now finalize successor. */
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_finalize(streamer->bbs_next);
}
/*
* Free memory associated with a tar parser.
*/
static void
-bbstreamer_tar_parser_free(bbstreamer *streamer)
+astreamer_tar_parser_free(astreamer *streamer)
{
pfree(streamer->bbs_buffer.data);
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
}
/*
- * Create a bbstreamer that can generate a tar archive.
+ * Create an astreamer that can generate a tar archive.
*
* This is intended to be usable either for generating a brand-new tar archive
* or for modifying one on the fly. The input should be a series of typed
- * chunks (i.e. not BBSTREAMER_UNKNOWN). See also the comments for
- * bbstreamer_tar_parser_content.
+ * chunks (i.e. not ASTREAMER_UNKNOWN). See also the comments for
+ * astreamer_tar_parser_content.
*/
-bbstreamer *
-bbstreamer_tar_archiver_new(bbstreamer *next)
+astreamer *
+astreamer_tar_archiver_new(astreamer *next)
{
- bbstreamer_tar_archiver *streamer;
+ astreamer_tar_archiver *streamer;
- streamer = palloc0(sizeof(bbstreamer_tar_archiver));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_tar_archiver_ops;
+ streamer = palloc0(sizeof(astreamer_tar_archiver));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_tar_archiver_ops;
streamer->base.bbs_next = next;
return &streamer->base;
@@ -368,36 +368,36 @@ bbstreamer_tar_archiver_new(bbstreamer *next)
/*
* Fix up the stream of input chunks to create a valid tar file.
*
- * If a BBSTREAMER_MEMBER_HEADER chunk is of size 0, it is replaced with a
+ * If an ASTREAMER_MEMBER_HEADER chunk is of size 0, it is replaced with a
* newly-constructed tar header. If it is of size TAR_BLOCK_SIZE, it is
* passed through without change. Any other size is a fatal error (and
* indicates a bug).
*
- * Whenever a new BBSTREAMER_MEMBER_HEADER chunk is constructed, the
- * corresponding BBSTREAMER_MEMBER_TRAILER chunk is also constructed from
+ * Whenever a new ASTREAMER_MEMBER_HEADER chunk is constructed, the
+ * corresponding ASTREAMER_MEMBER_TRAILER chunk is also constructed from
* scratch. Specifically, we construct a block of zero bytes sufficient to
* pad out to a block boundary, as required by the tar format. Other
- * BBSTREAMER_MEMBER_TRAILER chunks are passed through without change.
+ * ASTREAMER_MEMBER_TRAILER chunks are passed through without change.
*
- * Any BBSTREAMER_MEMBER_CONTENTS chunks are passed through without change.
+ * Any ASTREAMER_MEMBER_CONTENTS chunks are passed through without change.
*
- * The BBSTREAMER_ARCHIVE_TRAILER chunk is replaced with two
+ * The ASTREAMER_ARCHIVE_TRAILER chunk is replaced with two
* blocks of zero bytes. Not all tar programs require this, but apparently
* some do. The server does not supply this trailer. If no archive trailer is
- * present, one will be added by bbstreamer_tar_parser_finalize.
+ * present, one will be added by astreamer_tar_parser_finalize.
*/
static void
-bbstreamer_tar_archiver_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_tar_archiver_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_tar_archiver *mystreamer = (bbstreamer_tar_archiver *) streamer;
+ astreamer_tar_archiver *mystreamer = (astreamer_tar_archiver *) streamer;
char buffer[2 * TAR_BLOCK_SIZE];
- Assert(context != BBSTREAMER_UNKNOWN);
+ Assert(context != ASTREAMER_UNKNOWN);
- if (context == BBSTREAMER_MEMBER_HEADER && len != TAR_BLOCK_SIZE)
+ if (context == ASTREAMER_MEMBER_HEADER && len != TAR_BLOCK_SIZE)
{
Assert(len == 0);
@@ -411,7 +411,7 @@ bbstreamer_tar_archiver_content(bbstreamer *streamer,
/* Also make a note to replace padding, in case size changed. */
mystreamer->rearchive_member = true;
}
- else if (context == BBSTREAMER_MEMBER_TRAILER &&
+ else if (context == ASTREAMER_MEMBER_TRAILER &&
mystreamer->rearchive_member)
{
int pad_bytes = tarPaddingBytesRequired(member->size);
@@ -424,7 +424,7 @@ bbstreamer_tar_archiver_content(bbstreamer *streamer,
/* Don't do this again unless we replace another header. */
mystreamer->rearchive_member = false;
}
- else if (context == BBSTREAMER_ARCHIVE_TRAILER)
+ else if (context == ASTREAMER_ARCHIVE_TRAILER)
{
/* Trailer should always be two blocks of zero bytes. */
memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
@@ -432,40 +432,40 @@ bbstreamer_tar_archiver_content(bbstreamer *streamer,
len = 2 * TAR_BLOCK_SIZE;
}
- bbstreamer_content(streamer->bbs_next, member, data, len, context);
+ astreamer_content(streamer->bbs_next, member, data, len, context);
}
/*
* End-of-stream processing for a tar archiver.
*/
static void
-bbstreamer_tar_archiver_finalize(bbstreamer *streamer)
+astreamer_tar_archiver_finalize(astreamer *streamer)
{
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_finalize(streamer->bbs_next);
}
/*
* Free memory associated with a tar archiver.
*/
static void
-bbstreamer_tar_archiver_free(bbstreamer *streamer)
+astreamer_tar_archiver_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer);
}
/*
- * Create a bbstreamer that blindly adds two blocks of NUL bytes to the
+ * Create an astreamer that blindly adds two blocks of NUL bytes to the
* end of an incomplete tarfile that the server might send us.
*/
-bbstreamer *
-bbstreamer_tar_terminator_new(bbstreamer *next)
+astreamer *
+astreamer_tar_terminator_new(astreamer *next)
{
- bbstreamer *streamer;
+ astreamer *streamer;
- streamer = palloc0(sizeof(bbstreamer));
- *((const bbstreamer_ops **) &streamer->bbs_ops) =
- &bbstreamer_tar_terminator_ops;
+ streamer = palloc0(sizeof(astreamer));
+ *((const astreamer_ops **) &streamer->bbs_ops) =
+ &astreamer_tar_terminator_ops;
streamer->bbs_next = next;
return streamer;
@@ -475,17 +475,17 @@ bbstreamer_tar_terminator_new(bbstreamer *next)
* Pass all the content through without change.
*/
static void
-bbstreamer_tar_terminator_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_tar_terminator_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
/* Expect unparsed input. */
Assert(member == NULL);
- Assert(context == BBSTREAMER_UNKNOWN);
+ Assert(context == ASTREAMER_UNKNOWN);
/* Just forward it. */
- bbstreamer_content(streamer->bbs_next, member, data, len, context);
+ astreamer_content(streamer->bbs_next, member, data, len, context);
}
/*
@@ -493,22 +493,22 @@ bbstreamer_tar_terminator_content(bbstreamer *streamer,
* to supply.
*/
static void
-bbstreamer_tar_terminator_finalize(bbstreamer *streamer)
+astreamer_tar_terminator_finalize(astreamer *streamer)
{
char buffer[2 * TAR_BLOCK_SIZE];
memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
- bbstreamer_content(streamer->bbs_next, NULL, buffer,
- 2 * TAR_BLOCK_SIZE, BBSTREAMER_UNKNOWN);
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_content(streamer->bbs_next, NULL, buffer,
+ 2 * TAR_BLOCK_SIZE, ASTREAMER_UNKNOWN);
+ astreamer_finalize(streamer->bbs_next);
}
/*
* Free memory associated with a tar terminator.
*/
static void
-bbstreamer_tar_terminator_free(bbstreamer *streamer)
+astreamer_tar_terminator_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer);
}
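
Not part of the patch, but useful context for how tar verification would
eventually consume this: the parser above emits typed chunks, and a
downstream streamer simply switches on the context, in the same style as
the extractor and injector. The function name below is hypothetical:

#include "postgres_fe.h"
#include "astreamer.h"
#include "common/logging.h"

/*
 * Sketch of a downstream consumer of astreamer_tar_parser_new() output.
 * A real verifier would hash ASTREAMER_MEMBER_CONTENTS chunks and compare
 * the result against the backup manifest.
 */
static void
example_consumer_content(astreamer *streamer, astreamer_member *member,
                         const char *data, int len,
                         astreamer_archive_context context)
{
    switch (context)
    {
        case ASTREAMER_MEMBER_HEADER:
            /* A new member begins; member->pathname and member->size
             * describe it. */
            break;
        case ASTREAMER_MEMBER_CONTENTS:
            /* data/len carry the member's payload, possibly split across
             * several calls. */
            break;
        case ASTREAMER_MEMBER_TRAILER:
            /* The member is complete; only padding remains. */
            break;
        case ASTREAMER_ARCHIVE_TRAILER:
            /* End of the archive. */
            break;
        default:
            pg_fatal("unexpected archive context");
    }
}
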
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/astreamer_zstd.c
similarity index 64%
rename from src/bin/pg_basebackup/bbstreamer_zstd.c
rename to src/bin/pg_basebackup/astreamer_zstd.c
index 20f11d4450e..58dc679ef99 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/astreamer_zstd.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_zstd.c
+ * astreamer_zstd.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_zstd.c
+ * src/bin/pg_basebackup/astreamer_zstd.c
*-------------------------------------------------------------------------
*/
@@ -17,44 +17,44 @@
#include <zstd.h>
#endif
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/logging.h"
#ifdef USE_ZSTD
-typedef struct bbstreamer_zstd_frame
+typedef struct astreamer_zstd_frame
{
- bbstreamer base;
+ astreamer base;
ZSTD_CCtx *cctx;
ZSTD_DCtx *dctx;
ZSTD_outBuffer zstd_outBuf;
-} bbstreamer_zstd_frame;
+} astreamer_zstd_frame;
-static void bbstreamer_zstd_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_zstd_compressor_finalize(bbstreamer *streamer);
-static void bbstreamer_zstd_compressor_free(bbstreamer *streamer);
+static void astreamer_zstd_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_zstd_compressor_finalize(astreamer *streamer);
+static void astreamer_zstd_compressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_zstd_compressor_ops = {
- .content = bbstreamer_zstd_compressor_content,
- .finalize = bbstreamer_zstd_compressor_finalize,
- .free = bbstreamer_zstd_compressor_free
+static const astreamer_ops astreamer_zstd_compressor_ops = {
+ .content = astreamer_zstd_compressor_content,
+ .finalize = astreamer_zstd_compressor_finalize,
+ .free = astreamer_zstd_compressor_free
};
-static void bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_zstd_decompressor_finalize(bbstreamer *streamer);
-static void bbstreamer_zstd_decompressor_free(bbstreamer *streamer);
+static void astreamer_zstd_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_zstd_decompressor_finalize(astreamer *streamer);
+static void astreamer_zstd_decompressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
- .content = bbstreamer_zstd_decompressor_content,
- .finalize = bbstreamer_zstd_decompressor_finalize,
- .free = bbstreamer_zstd_decompressor_free
+static const astreamer_ops astreamer_zstd_decompressor_ops = {
+ .content = astreamer_zstd_decompressor_content,
+ .finalize = astreamer_zstd_decompressor_finalize,
+ .free = astreamer_zstd_decompressor_free
};
#endif
@@ -62,19 +62,19 @@ static const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
* Create a new base backup streamer that performs zstd compression of tar
* blocks.
*/
-bbstreamer *
-bbstreamer_zstd_compressor_new(bbstreamer *next, pg_compress_specification *compress)
+astreamer *
+astreamer_zstd_compressor_new(astreamer *next, pg_compress_specification *compress)
{
#ifdef USE_ZSTD
- bbstreamer_zstd_frame *streamer;
+ astreamer_zstd_frame *streamer;
size_t ret;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_zstd_frame));
+ streamer = palloc0(sizeof(astreamer_zstd_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_zstd_compressor_ops;
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_zstd_compressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -142,12 +142,12 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, pg_compress_specification *comp
* of output buffer to next streamer and empty the buffer.
*/
static void
-bbstreamer_zstd_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_zstd_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
ZSTD_inBuffer inBuf = {data, len, 0};
while (inBuf.pos < inBuf.size)
@@ -162,10 +162,10 @@ bbstreamer_zstd_compressor_content(bbstreamer *streamer,
if (mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos <
max_needed)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ context);
/* Reset the ZSTD output buffer. */
mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
@@ -187,9 +187,9 @@ bbstreamer_zstd_compressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
+astreamer_zstd_compressor_finalize(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
size_t yet_to_flush;
do
@@ -204,10 +204,10 @@ bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
if (mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos <
max_needed)
{
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ ASTREAMER_UNKNOWN);
/* Reset the ZSTD output buffer. */
mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
@@ -227,23 +227,23 @@ bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
/* Make sure to pass any remaining bytes to the next streamer. */
if (mystreamer->zstd_outBuf.pos > 0)
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_zstd_compressor_free(bbstreamer *streamer)
+astreamer_zstd_compressor_free(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
ZSTD_freeCCtx(mystreamer->cctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
@@ -254,17 +254,17 @@ bbstreamer_zstd_compressor_free(bbstreamer *streamer)
* Create a new base backup streamer that performs decompression of zstd
* compressed blocks.
*/
-bbstreamer *
-bbstreamer_zstd_decompressor_new(bbstreamer *next)
+astreamer *
+astreamer_zstd_decompressor_new(astreamer *next)
{
#ifdef USE_ZSTD
- bbstreamer_zstd_frame *streamer;
+ astreamer_zstd_frame *streamer;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_zstd_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_zstd_decompressor_ops;
+ streamer = palloc0(sizeof(astreamer_zstd_frame));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_zstd_decompressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -293,12 +293,12 @@ bbstreamer_zstd_decompressor_new(bbstreamer *next)
* to the next streamer.
*/
static void
-bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_zstd_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
ZSTD_inBuffer inBuf = {data, len, 0};
while (inBuf.pos < inBuf.size)
@@ -311,10 +311,10 @@ bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
*/
if (mystreamer->zstd_outBuf.pos >= mystreamer->zstd_outBuf.size)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ context);
/* Reset the ZSTD output buffer. */
mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
@@ -335,32 +335,32 @@ bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_zstd_decompressor_finalize(bbstreamer *streamer)
+astreamer_zstd_decompressor_finalize(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
/*
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
if (mystreamer->zstd_outBuf.pos > 0)
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_zstd_decompressor_free(bbstreamer *streamer)
+astreamer_zstd_decompressor_free(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
ZSTD_freeDCtx(mystreamer->dctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
deleted file mode 100644
index 3b820f13b51..00000000000
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ /dev/null
@@ -1,226 +0,0 @@
-/*-------------------------------------------------------------------------
- *
- * bbstreamer.h
- *
- * Each tar archive returned by the server is passed to one or more
- * bbstreamer objects for further processing. The bbstreamer may do
- * something simple, like write the archive to a file, perhaps after
- * compressing it, but it can also do more complicated things, like
- * annotating the byte stream to indicate which parts of the data
- * correspond to tar headers or trailing padding, vs. which parts are
- * payload data. A subsequent bbstreamer may use this information to
- * make further decisions about how to process the data; for example,
- * it might choose to modify the archive contents.
- *
- * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
- *
- * IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer.h
- *-------------------------------------------------------------------------
- */
-
-#ifndef BBSTREAMER_H
-#define BBSTREAMER_H
-
-#include "common/compression.h"
-#include "lib/stringinfo.h"
-#include "pqexpbuffer.h"
-
-struct bbstreamer;
-struct bbstreamer_ops;
-typedef struct bbstreamer bbstreamer;
-typedef struct bbstreamer_ops bbstreamer_ops;
-
-/*
- * Each chunk of archive data passed to a bbstreamer is classified into one
- * of these categories. When data is first received from the remote server,
- * each chunk will be categorized as BBSTREAMER_UNKNOWN, and the chunks will
- * be of whatever size the remote server chose to send.
- *
- * If the archive is parsed (e.g. see bbstreamer_tar_parser_new()), then all
- * chunks should be labelled as one of the other types listed here. In
- * addition, there should be exactly one BBSTREAMER_MEMBER_HEADER chunk and
- * exactly one BBSTREAMER_MEMBER_TRAILER chunk per archive member, even if
- * that means a zero-length call. There can be any number of
- * BBSTREAMER_MEMBER_CONTENTS chunks in between those calls. There
- * should exactly BBSTREAMER_ARCHIVE_TRAILER chunk, and it should follow the
- * last BBSTREAMER_MEMBER_TRAILER chunk.
- *
- * In theory, we could need other classifications here, such as a way of
- * indicating an archive header, but the "tar" format doesn't need anything
- * else, so for the time being there's no point.
- */
-typedef enum
-{
- BBSTREAMER_UNKNOWN,
- BBSTREAMER_MEMBER_HEADER,
- BBSTREAMER_MEMBER_CONTENTS,
- BBSTREAMER_MEMBER_TRAILER,
- BBSTREAMER_ARCHIVE_TRAILER,
-} bbstreamer_archive_context;
-
-/*
- * Each chunk of data that is classified as BBSTREAMER_MEMBER_HEADER,
- * BBSTREAMER_MEMBER_CONTENTS, or BBSTREAMER_MEMBER_TRAILER should also
- * pass a pointer to an instance of this struct. The details are expected
- * to be present in the archive header and used to fill the struct, after
- * which all subsequent calls for the same archive member are expected to
- * pass the same details.
- */
-typedef struct
-{
- char pathname[MAXPGPATH];
- pgoff_t size;
- mode_t mode;
- uid_t uid;
- gid_t gid;
- bool is_directory;
- bool is_link;
- char linktarget[MAXPGPATH];
-} bbstreamer_member;
-
-/*
- * Generally, each type of bbstreamer will define its own struct, but the
- * first element should be 'bbstreamer base'. A bbstreamer that does not
- * require any additional private data could use this structure directly.
- *
- * bbs_ops is a pointer to the bbstreamer_ops object which contains the
- * function pointers appropriate to this type of bbstreamer.
- *
- * bbs_next is a pointer to the successor bbstreamer, for those types of
- * bbstreamer which forward data to a successor. It need not be used and
- * should be set to NULL when not relevant.
- *
- * bbs_buffer is a buffer for accumulating data for temporary storage. Each
- * type of bbstreamer makes its own decisions about whether and how to use
- * this buffer.
- */
-struct bbstreamer
-{
- const bbstreamer_ops *bbs_ops;
- bbstreamer *bbs_next;
- StringInfoData bbs_buffer;
-};
-
-/*
- * There are three callbacks for a bbstreamer. The 'content' callback is
- * called repeatedly, as described in the bbstreamer_archive_context comments.
- * Then, the 'finalize' callback is called once at the end, to give the
- * bbstreamer a chance to perform cleanup such as closing files. Finally,
- * because this code is running in a frontend environment where, as of this
- * writing, there are no memory contexts, the 'free' callback is called to
- * release memory. These callbacks should always be invoked using the static
- * inline functions defined below.
- */
-struct bbstreamer_ops
-{
- void (*content) (bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
- void (*finalize) (bbstreamer *streamer);
- void (*free) (bbstreamer *streamer);
-};
-
-/* Send some content to a bbstreamer. */
-static inline void
-bbstreamer_content(bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
-{
- Assert(streamer != NULL);
- streamer->bbs_ops->content(streamer, member, data, len, context);
-}
-
-/* Finalize a bbstreamer. */
-static inline void
-bbstreamer_finalize(bbstreamer *streamer)
-{
- Assert(streamer != NULL);
- streamer->bbs_ops->finalize(streamer);
-}
-
-/* Free a bbstreamer. */
-static inline void
-bbstreamer_free(bbstreamer *streamer)
-{
- Assert(streamer != NULL);
- streamer->bbs_ops->free(streamer);
-}
-
-/*
- * This is a convenience method for use when implementing a bbstreamer; it is
- * not for use by outside callers. It adds the amount of data specified by
- * 'nbytes' to the bbstreamer's buffer and adjusts '*len' and '*data'
- * accordingly.
- */
-static inline void
-bbstreamer_buffer_bytes(bbstreamer *streamer, const char **data, int *len,
- int nbytes)
-{
- Assert(nbytes <= *len);
-
- appendBinaryStringInfo(&streamer->bbs_buffer, *data, nbytes);
- *len -= nbytes;
- *data += nbytes;
-}
-
-/*
- * This is a convenience method for use when implementing a bbstreamer; it is
- * not for use by outsider callers. It attempts to add enough data to the
- * bbstreamer's buffer to reach a length of target_bytes and adjusts '*len'
- * and '*data' accordingly. It returns true if the target length has been
- * reached and false otherwise.
- */
-static inline bool
-bbstreamer_buffer_until(bbstreamer *streamer, const char **data, int *len,
- int target_bytes)
-{
- int buflen = streamer->bbs_buffer.len;
-
- if (buflen >= target_bytes)
- {
- /* Target length already reached; nothing to do. */
- return true;
- }
-
- if (buflen + *len < target_bytes)
- {
- /* Not enough data to reach target length; buffer all of it. */
- bbstreamer_buffer_bytes(streamer, data, len, *len);
- return false;
- }
-
- /* Buffer just enough to reach the target length. */
- bbstreamer_buffer_bytes(streamer, data, len, target_bytes - buflen);
- return true;
-}
-
-/*
- * Functions for creating bbstreamer objects of various types. See the header
- * comments for each of these functions for details.
- */
-extern bbstreamer *bbstreamer_plain_writer_new(char *pathname, FILE *file);
-extern bbstreamer *bbstreamer_gzip_writer_new(char *pathname, FILE *file,
- pg_compress_specification *compress);
-extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
- const char *(*link_map) (const char *),
- void (*report_output_file) (const char *));
-
-extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
- pg_compress_specification *compress);
-extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_zstd_compressor_new(bbstreamer *next,
- pg_compress_specification *compress);
-extern bbstreamer *bbstreamer_zstd_decompressor_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
-
-extern bbstreamer *bbstreamer_recovery_injector_new(bbstreamer *next,
- bool is_recovery_guc_supported,
- PQExpBuffer recoveryconfcontents);
-extern void bbstreamer_inject_file(bbstreamer *streamer, char *pathname,
- char *data, int len);
-
-#endif
diff --git a/src/bin/pg_basebackup/meson.build b/src/bin/pg_basebackup/meson.build
index c00acd5e118..a68dbd7837d 100644
--- a/src/bin/pg_basebackup/meson.build
+++ b/src/bin/pg_basebackup/meson.build
@@ -1,12 +1,12 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
common_sources = files(
- 'bbstreamer_file.c',
- 'bbstreamer_gzip.c',
- 'bbstreamer_inject.c',
- 'bbstreamer_lz4.c',
- 'bbstreamer_tar.c',
- 'bbstreamer_zstd.c',
+ 'astreamer_file.c',
+ 'astreamer_gzip.c',
+ 'astreamer_inject.c',
+ 'astreamer_lz4.c',
+ 'astreamer_tar.c',
+ 'astreamer_zstd.c',
'receivelog.c',
'streamutil.c',
'walmethods.c',
diff --git a/src/bin/pg_basebackup/nls.mk b/src/bin/pg_basebackup/nls.mk
index 384dbb021e9..950b9797b1e 100644
--- a/src/bin/pg_basebackup/nls.mk
+++ b/src/bin/pg_basebackup/nls.mk
@@ -1,12 +1,12 @@
# src/bin/pg_basebackup/nls.mk
CATALOG_NAME = pg_basebackup
GETTEXT_FILES = $(FRONTEND_COMMON_GETTEXT_FILES) \
- bbstreamer_file.c \
- bbstreamer_gzip.c \
- bbstreamer_inject.c \
- bbstreamer_lz4.c \
- bbstreamer_tar.c \
- bbstreamer_zstd.c \
+ astreamer_file.c \
+ astreamer_gzip.c \
+ astreamer_inject.c \
+ astreamer_lz4.c \
+ astreamer_tar.c \
+ astreamer_zstd.c \
pg_basebackup.c \
pg_createsubscriber.c \
pg_receivewal.c \
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 8f3dd04fd22..4179b064cbc 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -26,8 +26,8 @@
#endif
#include "access/xlog_internal.h"
+#include "astreamer.h"
#include "backup/basebackup.h"
-#include "bbstreamer.h"
#include "common/compression.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
@@ -57,8 +57,8 @@ typedef struct ArchiveStreamState
{
int tablespacenum;
pg_compress_specification *compress;
- bbstreamer *streamer;
- bbstreamer *manifest_inject_streamer;
+ astreamer *streamer;
+ astreamer *manifest_inject_streamer;
PQExpBuffer manifest_buffer;
char manifest_filename[MAXPGPATH];
FILE *manifest_file;
@@ -67,7 +67,7 @@ typedef struct ArchiveStreamState
typedef struct WriteTarState
{
int tablespacenum;
- bbstreamer *streamer;
+ astreamer *streamer;
} WriteTarState;
typedef struct WriteManifestState
@@ -199,8 +199,8 @@ static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *fo
static void progress_update_filename(const char *filename);
static void progress_report(int tablespacenum, bool force, bool finished);
-static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
- bbstreamer **manifest_inject_streamer_p,
+static astreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
+ astreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
bool expect_unterminated_tarfile,
pg_compress_specification *compress);
@@ -1053,19 +1053,19 @@ ReceiveCopyData(PGconn *conn, WriteDataCallback callback,
* the options selected by the user. We may just write the results directly
* to a file, or we might compress first, or we might extract the tar file
* and write each member separately. This function doesn't do any of that
- * directly, but it works out what kind of bbstreamer we need to create so
+ * directly, but it works out what kind of astreamer we need to create so
* that the right stuff happens when, down the road, we actually receive
* the data.
*/
-static bbstreamer *
+static astreamer *
CreateBackupStreamer(char *archive_name, char *spclocation,
- bbstreamer **manifest_inject_streamer_p,
+ astreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
bool expect_unterminated_tarfile,
pg_compress_specification *compress)
{
- bbstreamer *streamer = NULL;
- bbstreamer *manifest_inject_streamer = NULL;
+ astreamer *streamer = NULL;
+ astreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool is_tar,
is_tar_gz,
@@ -1160,7 +1160,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
directory = psprintf("%s/%s", basedir, spclocation);
else
directory = get_tablespace_mapping(spclocation);
- streamer = bbstreamer_extractor_new(directory,
+ streamer = astreamer_extractor_new(directory,
get_tablespace_mapping,
progress_update_filename);
}
@@ -1188,27 +1188,27 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
if (compress->algorithm == PG_COMPRESSION_NONE)
- streamer = bbstreamer_plain_writer_new(archive_filename,
+ streamer = astreamer_plain_writer_new(archive_filename,
archive_file);
else if (compress->algorithm == PG_COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
- streamer = bbstreamer_gzip_writer_new(archive_filename,
+ streamer = astreamer_gzip_writer_new(archive_filename,
archive_file, compress);
}
else if (compress->algorithm == PG_COMPRESSION_LZ4)
{
strlcat(archive_filename, ".lz4", sizeof(archive_filename));
- streamer = bbstreamer_plain_writer_new(archive_filename,
+ streamer = astreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_lz4_compressor_new(streamer, compress);
+ streamer = astreamer_lz4_compressor_new(streamer, compress);
}
else if (compress->algorithm == PG_COMPRESSION_ZSTD)
{
strlcat(archive_filename, ".zst", sizeof(archive_filename));
- streamer = bbstreamer_plain_writer_new(archive_filename,
+ streamer = astreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_zstd_compressor_new(streamer, compress);
+ streamer = astreamer_zstd_compressor_new(streamer, compress);
}
else
{
@@ -1222,7 +1222,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* into it.
*/
if (must_parse_archive)
- streamer = bbstreamer_tar_archiver_new(streamer);
+ streamer = astreamer_tar_archiver_new(streamer);
progress_update_filename(archive_filename);
}
@@ -1241,7 +1241,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
if (spclocation == NULL && writerecoveryconf)
{
Assert(must_parse_archive);
- streamer = bbstreamer_recovery_injector_new(streamer,
+ streamer = astreamer_recovery_injector_new(streamer,
is_recovery_guc_supported,
recoveryconfcontents);
}
@@ -1253,9 +1253,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* we're talking to such a server we'll need to add the terminator here.
*/
if (must_parse_archive)
- streamer = bbstreamer_tar_parser_new(streamer);
+ streamer = astreamer_tar_parser_new(streamer);
else if (expect_unterminated_tarfile)
- streamer = bbstreamer_tar_terminator_new(streamer);
+ streamer = astreamer_tar_terminator_new(streamer);
/*
* If the user has requested a server compressed archive along with
@@ -1264,11 +1264,11 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
if (format == 'p')
{
if (is_tar_gz)
- streamer = bbstreamer_gzip_decompressor_new(streamer);
+ streamer = astreamer_gzip_decompressor_new(streamer);
else if (is_tar_lz4)
- streamer = bbstreamer_lz4_decompressor_new(streamer);
+ streamer = astreamer_lz4_decompressor_new(streamer);
else if (is_tar_zstd)
- streamer = bbstreamer_zstd_decompressor_new(streamer);
+ streamer = astreamer_zstd_decompressor_new(streamer);
}
/* Return the results. */
@@ -1307,7 +1307,7 @@ ReceiveArchiveStream(PGconn *conn, pg_compress_specification *compress)
if (state.manifest_inject_streamer != NULL &&
state.manifest_buffer != NULL)
{
- bbstreamer_inject_file(state.manifest_inject_streamer,
+ astreamer_inject_file(state.manifest_inject_streamer,
"backup_manifest",
state.manifest_buffer->data,
state.manifest_buffer->len);
@@ -1318,8 +1318,8 @@ ReceiveArchiveStream(PGconn *conn, pg_compress_specification *compress)
/* If there's still an archive in progress, end processing. */
if (state.streamer != NULL)
{
- bbstreamer_finalize(state.streamer);
- bbstreamer_free(state.streamer);
+ astreamer_finalize(state.streamer);
+ astreamer_free(state.streamer);
state.streamer = NULL;
}
}
@@ -1383,8 +1383,8 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
/* End processing of any prior archive. */
if (state->streamer != NULL)
{
- bbstreamer_finalize(state->streamer);
- bbstreamer_free(state->streamer);
+ astreamer_finalize(state->streamer);
+ astreamer_free(state->streamer);
state->streamer = NULL;
}
@@ -1437,8 +1437,8 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
else if (state->streamer != NULL)
{
/* Archive data. */
- bbstreamer_content(state->streamer, NULL, copybuf + 1,
- r - 1, BBSTREAMER_UNKNOWN);
+ astreamer_content(state->streamer, NULL, copybuf + 1,
+ r - 1, ASTREAMER_UNKNOWN);
}
else
pg_fatal("unexpected payload data");
@@ -1600,7 +1600,7 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
bool tablespacenum, pg_compress_specification *compress)
{
WriteTarState state;
- bbstreamer *manifest_inject_streamer;
+ astreamer *manifest_inject_streamer;
bool is_recovery_guc_supported;
bool expect_unterminated_tarfile;
@@ -1636,7 +1636,7 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
pg_fatal("out of memory");
/* Inject it into the output tarfile. */
- bbstreamer_inject_file(manifest_inject_streamer, "backup_manifest",
+ astreamer_inject_file(manifest_inject_streamer, "backup_manifest",
buf.data, buf.len);
/* Free memory. */
@@ -1644,8 +1644,8 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
}
/* Cleanup. */
- bbstreamer_finalize(state.streamer);
- bbstreamer_free(state.streamer);
+ astreamer_finalize(state.streamer);
+ astreamer_free(state.streamer);
progress_report(tablespacenum, true, false);
@@ -1663,7 +1663,7 @@ ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
{
WriteTarState *state = callback_data;
- bbstreamer_content(state->streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);
+ astreamer_content(state->streamer, NULL, copybuf, r, ASTREAMER_UNKNOWN);
totaldone += r;
progress_report(state->tablespacenum, false, false);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8de9978ad8d..ba9e0200b3f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3311,19 +3311,19 @@ bbsink_shell
bbsink_state
bbsink_throttle
bbsink_zstd
-bbstreamer
-bbstreamer_archive_context
-bbstreamer_extractor
-bbstreamer_gzip_decompressor
-bbstreamer_gzip_writer
-bbstreamer_lz4_frame
-bbstreamer_member
-bbstreamer_ops
-bbstreamer_plain_writer
-bbstreamer_recovery_injector
-bbstreamer_tar_archiver
-bbstreamer_tar_parser
-bbstreamer_zstd_frame
+astreamer
+astreamer_archive_context
+astreamer_extractor
+astreamer_gzip_decompressor
+astreamer_gzip_writer
+astreamer_lz4_frame
+astreamer_member
+astreamer_ops
+astreamer_plain_writer
+astreamer_recovery_injector
+astreamer_tar_archiver
+astreamer_tar_parser
+astreamer_zstd_frame
bgworker_main_type
bh_node_type
binaryheap
--
2.18.0
On Thu, Aug 1, 2024 at 6:48 PM Amul Sul <sulamul@gmail.com> wrote:
On Thu, Aug 1, 2024 at 1:37 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jul 31, 2024 at 9:28 AM Amul Sul <sulamul@gmail.com> wrote:
Fixed -- I did that because it was part of a separate group in pg_basebackup.
[...]
Out of time for today, will look again soon. I think the first few of
these are probably pretty much ready for commit already, and with a
little more adjustment they'll probably be ready up through about
0006.
Sure, thank you.
The v4 version doesn't handle the progress report correctly because
the total_size calculation is done in verify_manifest_entry(), while
done_size is updated during checksum verification. That works well
for a plain backup but fails for a tar backup, where checksum
verification occurs right after verify_manifest_entry(), leading to an
incorrect total_size in the progress report output.
Additionally, the patch missed the final progress_report(true) call
for persistent output, which is called from verify_backup_checksums()
for the plain backup but never for tar backup verification. To address
this, I moved the first and last progress_report() calls to the main
function. Although this is a small change, I placed it in a separate
patch, 0009, in the attached version.
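To illustrate the shape of that change (a rough sketch only, paraphrasing the
attached patches rather than quoting them exactly; names follow the existing
pg_verifybackup code):

    /* in main(), after walking the backup directory */
    if (!context.skip_checksums)
    {
        /*
         * A plain backup gets a second pass over the files here; for a tar
         * backup the checksums were already verified (and done_size
         * updated) while streaming each archive.
         */
        if (format == 'p')
            verify_backup_checksums(&context);

        /* Final, persistent progress report, for both formats. */
        progress_report(&context, true);
    }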
In addition to these changes, the attached version includes
improvements in code comments, function names, and their arrangements
in astreamer_verify.c.
Please consider the attached version for the review.
Regards,
Amul
Attachments:
v5-0010-pg_verifybackup-Add-backup-format-and-compression.patch (application/x-patch)
From 4b3af1ab14a61419e2ef3258b7fadc7ef6230da2 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 10:26:35 +0530
Subject: [PATCH v5 10/12] pg_verifybackup: Add backup format and compression
option
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the next patch that implements tar format support.
----
---
src/bin/pg_verifybackup/pg_verifybackup.c | 146 +++++++++++++++++++++-
1 file changed, 144 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 801e13886c2..2738bb30d83 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,6 +18,7 @@
#include <sys/stat.h>
#include <time.h>
+#include "common/compression.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -56,6 +57,8 @@ static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3) pg_attribute_noreturn();
+static char find_backup_format(verifier_context *context);
+static pg_compress_algorithm find_backup_compression(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
@@ -74,6 +77,9 @@ static void usage(void);
static const char *progname;
+char format = '\0'; /* p(lain)/t(ar) */
+pg_compress_algorithm compress_algorithm = PG_COMPRESSION_NONE;
+
/*
* Main entry point.
*/
@@ -84,11 +90,13 @@ main(int argc, char **argv)
{"exit-on-error", no_argument, NULL, 'e'},
{"ignore", required_argument, NULL, 'i'},
{"manifest-path", required_argument, NULL, 'm'},
+ {"format", required_argument, NULL, 'F'},
{"no-parse-wal", no_argument, NULL, 'n'},
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
{"skip-checksums", no_argument, NULL, 's'},
{"wal-directory", required_argument, NULL, 'w'},
+ {"compress", required_argument, NULL, 'Z'},
{NULL, 0, NULL, 0}
};
@@ -99,6 +107,7 @@ main(int argc, char **argv)
bool quiet = false;
char *wal_directory = NULL;
char *pg_waldump_path = NULL;
+ bool tar_compression_specified = false;
pg_logging_init(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verifybackup"));
@@ -141,7 +150,7 @@ main(int argc, char **argv)
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
- while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1)
+ while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:Z:", long_options, NULL)) != -1)
{
switch (c)
{
@@ -160,6 +169,15 @@ main(int argc, char **argv)
manifest_path = pstrdup(optarg);
canonicalize_path(manifest_path);
break;
+ case 'F':
+ if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
+ format = 'p';
+ else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
+ format = 't';
+ else
+ pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"",
+ optarg);
+ break;
case 'n':
no_parse_wal = true;
break;
@@ -176,6 +194,12 @@ main(int argc, char **argv)
wal_directory = pstrdup(optarg);
canonicalize_path(wal_directory);
break;
+ case 'Z':
+ if (!parse_compress_algorithm(optarg, &compress_algorithm))
+ pg_fatal("unrecognized compression algorithm: \"%s\"",
+ optarg);
+ tar_compression_specified = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -207,11 +231,41 @@ main(int argc, char **argv)
pg_fatal("cannot specify both %s and %s",
"-P/--progress", "-q/--quiet");
+ /* Complain if compression method specified but the format isn't tar. */
+ if (format != 't' && tar_compression_specified)
+ {
+ pg_log_error("only tar mode backups can be compressed");
+ pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+ exit(1);
+ }
+
+ /* Determine the backup format if it hasn't been specified. */
+ if (format == '\0')
+ format = find_backup_format(&context);
+
+ /*
+ * Determine the tar backup compression method if it hasn't been
+ * specified.
+ */
+ if (format == 't' && !tar_compression_specified)
+ compress_algorithm = find_backup_compression(&context);
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
int ret;
+ /*
+ * XXX: In the future, we should consider enhancing pg_waldump to read
+ * WAL files from the tar archive.
+ */
+ if (format != 'p')
+ {
+ pg_log_error("pg_waldump does not support parsing WAL files from a tar archive.");
+ pg_log_error_hint("Try \"%s --help\" to skip parse WAL files option.", progname);
+ exit(1);
+ }
+
pg_waldump_path = pg_malloc(MAXPGPATH);
ret = find_other_exec(argv[0], "pg_waldump",
"pg_waldump (PostgreSQL) " PG_VERSION "\n",
@@ -279,7 +333,15 @@ main(int argc, char **argv)
*/
if (!context.skip_checksums)
{
- verify_backup_checksums(&context);
+ /*
+ * Only plain backups are checked here. For tar backups, file checksum
+ * verification (if requested) is done immediately when the file is
+ * read, as we don't have random access to the files like we do with
+ * plain backups.
+ */
+ if (format == 'p')
+ verify_backup_checksums(&context);
+
progress_report(&context, true);
}
@@ -972,6 +1034,84 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
return false;
}
+/*
+ * To detect the backup format, check for the PG_VERSION file in the backup
+ * directory. If it is found, the backup is considered plain format;
+ * otherwise, it is assumed to be a tar-format backup.
+ */
+static char
+find_backup_format(verifier_context *context)
+{
+ char result;
+ char *path;
+ struct stat sb;
+
+ /* Should be here only if the backup format is unknown */
+ Assert(format == '\0');
+
+ /* Check PG_VERSION file. */
+ path = psprintf("%s/%s", context->backup_directory, "PG_VERSION");
+ result = (stat(path, &sb) == 0) ? 'p' : 't';
+ pfree(path);
+
+ return result;
+}
+
+/*
+ * To determine the compression format, we will search for the main data
+ * directory archive and its extension, which starts with base.tar, as
+ * pg_basebackup writes the main data directory to an archive file named
+ * base.tar followed by a compression type extension like .gz, .lz4, or .zst.
+ */
+static pg_compress_algorithm
+find_backup_compression(verifier_context *context)
+{
+ char *path;
+ struct stat sb;
+ bool found;
+
+ /* Should be here only for tar backup */
+ Assert(format == 't');
+
+ /*
+ * Is this a tar archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_NONE;
+
+ /*
+ * Is this a .tar.gz archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.gz");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_GZIP;
+
+ /*
+ * Is this a .tar.lz4 archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.lz4");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_LZ4;
+
+ /*
+ * Is this a .tar.zst archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.zst");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_ZSTD;
+
+ return PG_COMPRESSION_NONE; /* placate compiler */
+}
+
/*
* Print a progress report based on the variables in verifier_context.
*
@@ -1054,11 +1194,13 @@ usage(void)
printf(_(" -e, --exit-on-error exit immediately on error\n"));
printf(_(" -i, --ignore=RELATIVE_PATH ignore indicated path\n"));
printf(_(" -m, --manifest-path=PATH use specified path for manifest\n"));
+ printf(_(" -F, --format=p|t backup format (plain, tar)\n"));
printf(_(" -n, --no-parse-wal do not try to parse WAL files\n"));
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -s, --skip-checksums skip checksum verification\n"));
printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n"));
+ printf(_(" -Z, --compress=METHOD compress method (gzip, lz4, zstd, none) \n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
--
2.18.0
v5-0011-pg_verifybackup-Read-tar-files-and-verify-its-con.patch (application/x-patch)
From f34279f9f934fdcad6d3b3dbfb0721bd7959bfdf Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 16:22:07 +0530
Subject: [PATCH v5 11/12] pg_verifybackup: Read tar files and verify its
contents
---
src/bin/pg_verifybackup/Makefile | 4 +-
src/bin/pg_verifybackup/astreamer_verify.c | 367 +++++++++++++++++++++
src/bin/pg_verifybackup/meson.build | 6 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 216 +++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 9 +
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 594 insertions(+), 9 deletions(-)
create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 7c045f142e8..df7aaabd530 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -17,11 +17,13 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
# We need libpq only because fe_utils does.
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
OBJS = \
$(WIN32RES) \
- pg_verifybackup.o
+ pg_verifybackup.o \
+ astreamer_verify.o
all: pg_verifybackup
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
new file mode 100644
index 00000000000..be40922c042
--- /dev/null
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -0,0 +1,367 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_verify.c
+ *
+ * Extend the fe_utils/astreamer.h archive streaming facility to verify TAR
+ * backups.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * src/bin/pg_verifybackup/astreamer_verify.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "common/logging.h"
+#include "fe_utils/astreamer.h"
+#include "pg_verifybackup.h"
+
+typedef struct astreamer_verify
+{
+ astreamer base;
+ verifier_context *context;
+ char *archiveName;
+ Oid tblspcOid;
+
+ /* Hold information for a member file verification */
+ manifest_file *mfile;
+ int64 receivedBytes;
+ bool verifyChecksums;
+ bool verifyControlData;
+ pg_checksum_context *checksum_ctx;
+} astreamer_verify;
+
+static void astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_verify_finalize(astreamer *streamer);
+static void astreamer_verify_free(astreamer *streamer);
+
+static void verify_member_header(astreamer *streamer, astreamer_member *member);
+static void verify_member_contents(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void verify_content_checksum(astreamer *streamer,
+ astreamer_member *member,
+ const char *buffer, int buffer_len);
+static void verify_controldata(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void reset_member_info(astreamer *streamer);
+
+static const astreamer_ops astreamer_verify_ops = {
+ .content = astreamer_verify_content,
+ .finalize = astreamer_verify_finalize,
+ .free = astreamer_verify_free
+};
+
+/*
+ * Create an astreamer that verifies the contents of a TAR file.
+ */
+astreamer *
+astreamer_verify_content_new(astreamer *next, verifier_context *context,
+ char *archive_name, Oid tblspc_oid)
+{
+ astreamer_verify *streamer;
+
+ streamer = palloc0(sizeof(astreamer_verify));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_verify_ops;
+
+ streamer->base.bbs_next = next;
+ streamer->context = context;
+ streamer->archiveName = archive_name;
+ streamer->tblspcOid = tblspc_oid;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ return &streamer->base;
+}
+
+/*
+ * It verifies each TAR member entry against the manifest data and performs
+ * checksum verification if enabled. Additionally, it validates the backup's
+ * system identifier against the backup_manifest.
+ */
+static void
+astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member, const char *data,
+ int len, astreamer_archive_context context)
+{
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+
+ /*
+ * Perform the initial check and setup for the verification.
+ */
+ verify_member_header(streamer, member);
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Perform the required contents verification.
+ */
+ verify_member_contents(streamer, member, data, len);
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+
+ /*
+ * Reset the temporary information stored for a verification.
+ */
+ reset_member_info(streamer);
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar archive");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_verify stream.
+ */
+static void
+astreamer_verify_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_verify stream.
+ */
+static void
+astreamer_verify_free(astreamer *streamer)
+{
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+
+/*
+ * Verify the tar member against the backup manifest if it is a regular file.
+ * If the archive being processed is a tablespace, prepare the required file
+ * path for subsequent operations. Finally, decide whether checksum and control
+ * data verification need to be performed while processing the file contents.
+ */
+static void
+verify_member_header(astreamer *streamer, astreamer_member *member)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_file *m;
+
+ /* We are only interested in files that are not in the ignore list. */
+ if (member->is_directory || member->is_link ||
+ should_ignore_relpath(mystreamer->context, member->pathname))
+ return;
+
+ /*
+ * The backup_manifest stores paths relative to the base directory for
+ * files belonging to a tablespace, whereas <tablespaceoid>.tar doesn't.
+ * Prepare the required path; otherwise, the manifest entry verification
+ * will fail.
+ */
+ if (OidIsValid(mystreamer->tblspcOid))
+ {
+ char temp[MAXPGPATH];
+
+ /* Copy original name at temporary space */
+ memcpy(temp, member->pathname, MAXPGPATH);
+
+ snprintf(member->pathname, MAXPGPATH, "%s/%d/%s",
+ "pg_tblspc", mystreamer->tblspcOid, temp);
+ }
+
+ /* Check the manifest entry */
+ m = verify_manifest_entry(mystreamer->context, member->pathname,
+ member->size);
+ mystreamer->mfile = (void *) m;
+
+ /*
+ * Prepare for checksum and manifest system identifier verification.
+ *
+ * We could have these checks while receiving contents. However, since
+ * contents are received in multiple iterations, this would result in
+ * these lengthy checks being performed multiple times. Instead, having a
+ * single flag would be more efficient.
+ */
+ mystreamer->verifyChecksums =
+ (!mystreamer->context->skip_checksums && should_verify_checksum(m));
+ mystreamer->verifyControlData =
+ should_verify_control_data(mystreamer->context->manifest, m);
+}
+
+/*
+ * Process the member content according to the flags set by the member header
+ * processing routine for checksum and control data verification.
+ */
+static void
+verify_member_contents(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ /* Verify the checksums */
+ if (mystreamer->verifyChecksums)
+ verify_content_checksum(streamer, member, data, len);
+
+ /* Verify pg_control information */
+ if (mystreamer->verifyControlData)
+ verify_controldata(streamer, member, data, len);
+}
+
+/*
+ * Similar to verify_file_checksum() but this function computes the checksum
+ * incrementally for the received file content. Unlike a normal backup
+ * directory, TAR format files do not allow random access, so checksum
+ * verification occurs progressively as the file contents are received.
+ * (Control data verification, when required, is driven separately from
+ * verify_member_contents().)
+ *
+ * On the first visit, the function initializes checksum_ctx, which is used
+ * for incremental checksum calculation. Once the complete file content has
+ * been received (tracked using receivedBytes), the routine that performs the
+ * final checksum verification is called.
+ */
+static void
+verify_content_checksum(astreamer *streamer, astreamer_member *member,
+ const char *buffer, int buffer_len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ pg_checksum_context *checksum_ctx = mystreamer->checksum_ctx;
+ verifier_context *context = mystreamer->context;
+ manifest_file *m = mystreamer->mfile;
+ const char *relpath = m->pathname;
+ uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
+
+ /*
+ * Mark it false to avoid unexpected re-entry for the same file content
+ * (e.g., content for which an error was reported should not be revisited).
+ */
+ Assert(mystreamer->verifyChecksums);
+ mystreamer->verifyChecksums = false;
+
+ /* Should have been called for the right file */
+ Assert(strcmp(member->pathname, relpath) == 0);
+
+ /* If this is the first time we've seen this file */
+ if (!checksum_ctx)
+ {
+ checksum_ctx = pg_malloc(sizeof(pg_checksum_context));
+ mystreamer->checksum_ctx = checksum_ctx;
+
+ if (pg_checksum_init(checksum_ctx, m->checksum_type) < 0)
+ {
+ report_backup_error(context,
+ "%s: could not initialize checksum of file \"%s\"",
+ mystreamer->archiveName, relpath);
+ return;
+ }
+ }
+
+ /* Update the total count of computed checksum bytes. */
+ mystreamer->receivedBytes += buffer_len;
+
+ if (pg_checksum_update(checksum_ctx, (uint8 *) buffer, buffer_len) < 0)
+ {
+ report_backup_error(context, "could not update checksum of file \"%s\"",
+ relpath);
+ return;
+ }
+
+ /* Report progress */
+ context->done_size += buffer_len;
+ progress_report(context, false);
+
+ /* Yet to receive the full content of the file. */
+ if (mystreamer->receivedBytes < m->size)
+ {
+ mystreamer->verifyChecksums = true;
+ return;
+ }
+
+ /* Do the final computation and verification. */
+ verify_checksum(context, m, checksum_ctx, checksumbuf);
+}
+
+/*
+ * Prepare the control data from the received file contents, which are supposed
+ * to be from the pg_control file, including CRC calculation. Then, call the
+ * routines that perform the final verification of the control file information.
+ */
+static void
+verify_controldata(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_data *manifest = mystreamer->context->manifest;
+ ControlFileData control_file;
+ pg_crc32c crc;
+ bool crc_ok;
+
+ /* Should be here only for control file */
+ Assert(strcmp(member->pathname, "global/pg_control") == 0);
+ Assert(manifest->version != 1);
+
+ /* Mark it as false to avoid unexpected re-entrance */
+ Assert(mystreamer->verifyControlData);
+ mystreamer->verifyControlData = false;
+
+ /* Should have whole control file data. */
+ if (!astreamer_buffer_until(streamer, &data, &len, sizeof(ControlFileData)))
+ {
+ mystreamer->verifyControlData = true;
+ return;
+ }
+
+ pg_log_debug("%s: reading \"%s\"", mystreamer->archiveName,
+ member->pathname);
+
+ if (streamer->bbs_buffer.len != sizeof(ControlFileData))
+ report_fatal_error("%s: could not read control file: read %d of %zu",
+ mystreamer->archiveName, streamer->bbs_buffer.len,
+ sizeof(ControlFileData));
+
+ memcpy(&control_file, streamer->bbs_buffer.data,
+ sizeof(ControlFileData));
+
+ /* Check the CRC. */
+ INIT_CRC32C(crc);
+ COMP_CRC32C(crc,
+ (char *) (&control_file),
+ offsetof(ControlFileData, crc));
+ FIN_CRC32C(crc);
+
+ crc_ok = EQ_CRC32C(crc, control_file.crc);
+
+ /* Do the final control data verification. */
+ verify_control_data(&control_file, member->pathname, crc_ok,
+ manifest->system_identifier);
+}
+
+/*
+ * Reset flags and free memory allocations for member file verification.
+ */
+static void
+reset_member_info(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ mystreamer->mfile = NULL;
+ mystreamer->receivedBytes = 0;
+ mystreamer->verifyChecksums = false;
+ mystreamer->verifyControlData = false;
+
+ if (mystreamer->checksum_ctx)
+ pfree(mystreamer->checksum_ctx);
+ mystreamer->checksum_ctx = NULL;
+}
diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build
index 7c7d31a0350..1e3fcf7ee5a 100644
--- a/src/bin/pg_verifybackup/meson.build
+++ b/src/bin/pg_verifybackup/meson.build
@@ -1,7 +1,8 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
pg_verifybackup_sources = files(
- 'pg_verifybackup.c'
+ 'pg_verifybackup.c',
+ 'astreamer_verify.c'
)
if host_system == 'windows'
@@ -10,9 +11,10 @@ if host_system == 'windows'
'--FILEDESC', 'pg_verifybackup - verify a backup against using a backup manifest'])
endif
+pg_verifybackup_deps = [frontend_code, libpq, lz4, zlib, zstd]
pg_verifybackup = executable('pg_verifybackup',
pg_verifybackup_sources,
- dependencies: [frontend_code, libpq],
+ dependencies: pg_verifybackup_deps,
kwargs: default_bin_args,
)
bin_targets += pg_verifybackup
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 2738bb30d83..d8ab9a3a3dc 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -20,9 +20,12 @@
#include "common/compression.h"
#include "common/parse_manifest.h"
+#include "common/relpath.h"
+#include "fe_utils/astreamer.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
#include "pg_verifybackup.h"
+#include "pgtar.h"
#include "pgtime.h"
/*
@@ -39,6 +42,11 @@
*/
#define ESTIMATED_BYTES_PER_MANIFEST_LINE 100
+
+static void (*verify_backup_file_cb) (verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -63,6 +71,15 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
char *relpath, char *fullpath);
+static void verify_plain_file_cb(verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+static void verify_tar_file_cb(verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+static void verify_tar_content(verifier_context *context,
+ char *relpath, char *fullpath,
+ astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -71,6 +88,9 @@ static void verify_file_checksum(verifier_context *context,
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
+static astreamer *create_archive_verifier(verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
static void compute_total_size(verifier_context *context);
static void usage(void);
@@ -146,6 +166,10 @@ main(int argc, char **argv)
*/
simple_string_list_append(&context.ignore_list, "backup_manifest");
simple_string_list_append(&context.ignore_list, "pg_wal");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.gz");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.lz4");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.zst");
simple_string_list_append(&context.ignore_list, "postgresql.auto.conf");
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
@@ -250,6 +274,15 @@ main(int argc, char **argv)
if (format == 't' && !tar_compression_specified)
compress_algorithm = find_backup_compression(&context);
+ /*
+ * Setup the required callback function to verify plain or tar backup
+ * files.
+ */
+ if (format == 'p')
+ verify_backup_file_cb = verify_plain_file_cb;
+ else
+ verify_backup_file_cb = verify_tar_file_cb;
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
@@ -645,7 +678,8 @@ verify_backup_directory(verifier_context *context, char *relpath,
}
/*
- * Verify one file (which might actually be a directory or a symlink).
+ * Verify one file (which might actually be a directory, a symlink or an
+ * archive).
*
* The arguments to this function have the same meaning as the arguments to
* verify_backup_directory.
@@ -654,7 +688,6 @@ static void
verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
{
struct stat sb;
- manifest_file *m;
if (stat(fullpath, &sb) != 0)
{
@@ -687,8 +720,25 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Do the further verifications */
+ verify_backup_file_cb(context, relpath, fullpath, sb.st_size);
+}
+
+/*
+ * Verify one plain file or a symlink.
+ *
+ * The arguments to this function are mostly the same as for
+ * verify_backup_directory. The additional argument is the file size, used
+ * for verification against the manifest entry.
+ */
+static void
+verify_plain_file_cb(verifier_context *context, char *relpath,
+ char *fullpath, size_t filesize)
+{
+ manifest_file *m;
+
/* Check the backup manifest entry for this file. */
- m = verify_manifest_entry(context, relpath, sb.st_size);
+ m = verify_manifest_entry(context, relpath, filesize);
/* Validate the manifest system identifier */
if (should_verify_control_data(context->manifest, m))
@@ -706,6 +756,124 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
}
}
+/*
+ * Verify one tar file.
+ *
+ * The arguments to this function are mostly the same as for
+ * verify_backup_directory. The additional argument is the file size, used
+ * for verification against the manifest entry.
+ */
+static void
+verify_tar_file_cb(verifier_context *context, char *relpath,
+ char *fullpath, size_t filesize)
+{
+ astreamer *streamer;
+ Oid tblspc_oid = InvalidOid;
+ int file_name_len;
+ int file_extn_len = 0; /* placate compiler */
+ char *file_extn = "";
+
+ /* Should be tar backup */
+ Assert(format == 't');
+
+ /* Find the tar file extension. */
+ if (compress_algorithm == PG_COMPRESSION_NONE)
+ {
+ file_extn = ".tar";
+ file_extn_len = 4;
+
+ }
+ else if (compress_algorithm == PG_COMPRESSION_GZIP)
+ {
+ file_extn = ".tar.gz";
+ file_extn_len = 7;
+
+ }
+ else if (compress_algorithm == PG_COMPRESSION_LZ4)
+ {
+ file_extn = ".tar.lz4";
+ file_extn_len = 8;
+ }
+ else if (compress_algorithm == PG_COMPRESSION_ZSTD)
+ {
+ file_extn = ".tar.zst";
+ file_extn_len = 8;
+ }
+
+ /*
+ * Ensure that we have the correct file type corresponding to the backup
+ * format.
+ */
+ file_name_len = strlen(relpath);
+ if (file_name_len < file_extn_len ||
+ strcmp(relpath + file_name_len - file_extn_len, file_extn) != 0)
+ {
+ if (compress_algorithm == PG_COMPRESSION_NONE)
+ report_backup_error(context,
+ "\"%s\" is not a valid file, expecting tar file",
+ relpath);
+ else
+ report_backup_error(context,
+ "\"%s\" is not a valid file, expecting \"%s\" compressed tar file",
+ relpath,
+ get_compress_algorithm_name(compress_algorithm));
+ return;
+ }
+
+ /*
+ * For the tablespace, pg_basebackup writes the data out to
+ * <tablespaceoid>.tar. If a file matches that format, then extract the
+ * tablespaceoid, which we need to prepare the paths of the files
+ * belonging to that tablespace relative to the base directory.
+ */
+ if (strspn(relpath, "0123456789") == (file_name_len - file_extn_len))
+ tblspc_oid = strtoi64(relpath, NULL, 10);
+
+ streamer = create_archive_verifier(context, relpath, tblspc_oid);
+ verify_tar_content(context, relpath, fullpath, streamer);
+
+ /* Cleanup. */
+ astreamer_finalize(streamer);
+ astreamer_free(streamer);
+}
+
+/*
+ * Reads a given tar file in predefined chunks and passes them to the
+ * astreamer, which initiates routines for decompression (if necessary) and
+ * then verification of each member within the tar archive.
+ */
+static void
+verify_tar_content(verifier_context *context, char *relpath, char *fullpath,
+ astreamer *streamer)
+{
+ int fd;
+ int rc;
+ char *buffer;
+
+ /* Open the target file. */
+ if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
+ {
+ report_backup_error(context, "could not open file \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+ /* Perform the reads */
+ while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
+ astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);
+
+ if (rc < 0)
+ report_backup_error(context, "could not read file \"%s\": %m",
+ relpath);
+
+ /* Close the file. */
+ if (close(fd) != 0)
+ report_backup_error(context, "could not close file \"%s\": %m",
+ relpath);
+}
+
/*
* Verify file and its size entry in the manifest.
*/
@@ -1058,10 +1226,10 @@ find_backup_format(verifier_context *context)
}
/*
- * To determine the compression format, we will search for the main data
- * directory archive and its extension, which starts with base.tar, as
* pg_basebackup writes the main data directory to an archive file named
- * base.tar followed by a compression type extension like .gz, .lz4, or .zst.
+ * base.tar, followed by a compression type extension such as .gz, .lz4, or
+ * .zst. To determine the compression format, we need to search for this main
+ * data directory archive file.
*/
static pg_compress_algorithm
find_backup_compression(verifier_context *context)
@@ -1112,6 +1280,42 @@ find_backup_compression(verifier_context *context)
return PG_COMPRESSION_NONE; /* placate compiler */
}
+/*
+ * Identifies the necessary steps for verifying the contents of the
+ * provided tar file.
+ */
+static astreamer *
+create_archive_verifier(verifier_context *context, char *archive_name,
+ Oid tblspc_oid)
+{
+ astreamer *streamer = NULL;
+
+ /* Should be here only for tar backup */
+ Assert(format == 't');
+
+ /*
+ * To verify the contents of the tar file, the initial step is to parse
+ * its content.
+ */
+ streamer = astreamer_verify_content_new(streamer, context, archive_name,
+ tblspc_oid);
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /*
+ * If the tar file is compressed, we must perform the appropriate
+ * decompression operation before proceeding with the verification of its
+ * contents.
+ */
+ if (compress_algorithm == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compress_algorithm == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compress_algorithm == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ return streamer;
+}
+
/*
* Print a progress report based on the variables in verifier_context.
*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index c88f71ff14b..f0a7c8918fb 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -137,4 +137,13 @@ extern bool should_ignore_relpath(verifier_context *context,
extern void progress_report(verifier_context *context, bool finished);
+/* Forward declarations to avoid fe_utils/astreamer.h include. */
+struct astreamer;
+typedef struct astreamer astreamer;
+
+extern astreamer *astreamer_verify_content_new(astreamer *next,
+ verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
+
#endif /* PG_VERIFYBACKUP_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ba9e0200b3f..8c708b02cc2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3323,6 +3323,7 @@ astreamer_plain_writer
astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
+astreamer_verify
astreamer_zstd_frame
bgworker_main_type
bh_node_type
--
2.18.0
v5-0012-pg_verifybackup-Tests-and-document.patch (application/x-patch)
From beff1feb10a75336369d3157650117d9597899b7 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 17:04:56 +0530
Subject: [PATCH v5 12/12] pg_verifybackup: Tests and documentation
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the previous patch that implements tar format support.
----
---
doc/src/sgml/ref/pg_verifybackup.sgml | 54 +++++++++++++-
src/bin/pg_verifybackup/t/001_basic.pl | 18 ++++-
src/bin/pg_verifybackup/t/004_options.pl | 2 +-
src/bin/pg_verifybackup/t/005_bad_manifest.pl | 2 +-
src/bin/pg_verifybackup/t/008_untar.pl | 73 ++++++-------------
src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +-----------
6 files changed, 96 insertions(+), 101 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index a3f167f9f6e..c743bd89a92 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -34,8 +34,10 @@ PostgreSQL documentation
integrity of a database cluster backup taken using
<command>pg_basebackup</command> against a
<literal>backup_manifest</literal> generated by the server at the time
- of the backup. The backup must be stored in the "plain"
- format; a "tar" format backup can be checked after extracting it.
+ of the backup. The backup must be stored in the "plain" or "tar"
+ format. Verification is supported for <literal>gzip</literal>,
+ <literal>lz4</literal>, and <literal>zstd</literal> compressed tar backups;
+ backups using other compression methods can be checked after decompressing them.
</para>
<para>
@@ -168,6 +170,42 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-F <replaceable class="parameter">format</replaceable></option></term>
+ <term><option>--format=<replaceable class="parameter">format</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the format of the backup. <replaceable>format</replaceable>
+ can be one of the following:
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>p</literal></term>
+ <term><literal>plain</literal></term>
+ <listitem>
+ <para>
+ Backup consists of plain files with the same layout as the
+ source server's data directory and tablespaces.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>t</literal></term>
+ <term><literal>tar</literal></term>
+ <listitem>
+ <para>
+ Backup consists of tar files. The main data directory's contents
+ will be written to a file named <filename>base.tar</filename>,
+ and each other tablespace will be written to a separate tar file
+ named after that tablespace's OID.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist></para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n</option></term>
<term><option>--no-parse-wal</option></term>
@@ -227,6 +265,18 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><option>-Z <replaceable class="parameter">method</replaceable></option></term>
+ <term><option>--compress=<replaceable class="parameter">method</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the compression method of the tar backup: <literal>gzip</literal>,
+ <literal>lz4</literal>, <literal>zstd</literal>, or
+ <literal>none</literal> for an uncompressed backup.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
diff --git a/src/bin/pg_verifybackup/t/001_basic.pl b/src/bin/pg_verifybackup/t/001_basic.pl
index 2f3e52d296f..d47ce1f04fc 100644
--- a/src/bin/pg_verifybackup/t/001_basic.pl
+++ b/src/bin/pg_verifybackup/t/001_basic.pl
@@ -17,13 +17,25 @@ command_fails_like(
qr/no backup directory specified/,
'target directory must be specified');
command_fails_like(
- [ 'pg_verifybackup', $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir ],
qr/could not open file.*\/backup_manifest\"/,
'pg_verifybackup requires a manifest');
command_fails_like(
- [ 'pg_verifybackup', $tempdir, $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir, $tempdir ],
qr/too many command-line arguments/,
'multiple target directories not allowed');
+command_fails_like(
+ [ 'pg_verifybackup', '-Zgzip', $tempdir ],
+ qr/only tar mode backups can be compressed/,
+ 'compression option requires tar format');
+command_fails_like(
+ [ 'pg_verifybackup', '-Fp', '-Zlz4', $tempdir ],
+ qr/only tar mode backups can be compressed/,
+ 'compression option not allowed with plain format');
+command_fails_like(
+ [ 'pg_verifybackup', '-Fp', '-Znon_exist', $tempdir ],
+ qr/unrecognized compression algorithm/,
+ 'compression method should be valid');
# create fake manifest file
open(my $fh, '>', "$tempdir/backup_manifest") || die "open: $!";
@@ -31,7 +43,7 @@ close($fh);
# but then try to use an alternate, nonexisting manifest
command_fails_like(
- [ 'pg_verifybackup', '-m', "$tempdir/not_the_manifest", $tempdir ],
+ [ 'pg_verifybackup', '-Fp', '-m', "$tempdir/not_the_manifest", $tempdir ],
qr/could not open file.*\/not_the_manifest\"/,
'pg_verifybackup respects -m flag');
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 8ed2214408e..2f197648740 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -108,7 +108,7 @@ unlike(
# Test valid manifest with nonexistent backup directory.
command_fails_like(
[
- 'pg_verifybackup', '-m',
+ 'pg_verifybackup', '-Fp', '-m',
"$backup_path/backup_manifest", "$backup_path/fake"
],
qr/could not open directory/,
diff --git a/src/bin/pg_verifybackup/t/005_bad_manifest.pl b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
index c4ed64b62d5..28c51b6feb0 100644
--- a/src/bin/pg_verifybackup/t/005_bad_manifest.pl
+++ b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
@@ -208,7 +208,7 @@ sub test_bad_manifest
print $fh $manifest_contents;
close($fh);
- command_fails_like([ 'pg_verifybackup', $tempdir ], $regexp, $test_name);
+ command_fails_like([ 'pg_verifybackup', '-Fp', $tempdir ], $regexp, $test_name);
return;
}
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 7a09f3b75b2..9896560adc3 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -16,6 +16,20 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
+# Create a directory to use as a tablespace.
+my $TS1_LOCATION = $primary->backup_dir .'/ts1';
+$TS1_LOCATION =~ s/\/\.\//\//g; # collapse foo/./bar to foo/bar
+mkdir($TS1_LOCATION);
+
+# Create a tablespace with a table in it.
+$primary->safe_psql('postgres', qq(
+ CREATE TABLESPACE regress_ts1 LOCATION '$TS1_LOCATION';
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1';
+ CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1;
+ INSERT INTO regress_tbl1 VALUES(generate_series(1,5));));
+my $tsoid = $primary->safe_psql('postgres', qq(
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
+
my $backup_path = $primary->backup_dir . '/server-backup';
my $extract_path = $primary->backup_dir . '/extracted-backup';
@@ -23,39 +37,31 @@ my @test_configuration = (
{
'compression_method' => 'none',
'backup_flags' => [],
- 'backup_archive' => 'base.tar',
+ 'backup_archive' => ['base.tar', "$tsoid.tar"],
'enabled' => 1
},
{
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'server-gzip' ],
- 'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'server-lz4' ],
- 'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => [ '-d', '-m' ],
+ 'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ],
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
});
@@ -86,47 +92,16 @@ for my $tc (@test_configuration)
my $backup_files = join(',',
sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
my $expected_backup_files =
- join(',', sort ('backup_manifest', $tc->{'backup_archive'}));
+ join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} }));
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path],
+ "verify backup, compression $method");
# Cleanup.
- unlink($backup_path . '/backup_manifest');
- unlink($backup_path . '/base.tar');
- unlink($backup_path . '/' . $tc->{'backup_archive'});
+ rmtree($backup_path);
rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 8c076d46dee..6b7d7483f6e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -29,41 +29,30 @@ my @test_configuration = (
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'client-gzip:5' ],
'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'client-lz4:5' ],
'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => ['-d'],
- 'output_file' => 'base.tar',
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:5' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'parallel zstd',
'backup_flags' => [ '--compress', 'client-zstd:workers=3' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1"),
'possibly_unsupported' =>
qr/could not set compression worker count to 3: Unsupported parameter/
@@ -118,40 +107,9 @@ for my $tc (@test_configuration)
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- push @decompress, $backup_path . '/' . $tc->{'output_file'}
- if $tc->{'output_file'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ],
+ "verify backup, compression $method");
# Cleanup.
rmtree($extract_path);
--
2.18.0
v5-0009-Refactor-move-first-and-last-progress_report-call.patch
From 498ee467c0123d816d14717c20746e55b26d974f Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Fri, 2 Aug 2024 16:37:38 +0530
Subject: [PATCH v5 09/12] Refactor: move the first and last progress_report()
 calls to main().
---
src/bin/pg_verifybackup/pg_verifybackup.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 5f055a23a63..801e13886c2 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -253,7 +253,10 @@ main(int argc, char **argv)
* read, which occurs only when checksum verification is enabled.
*/
if (!context.skip_checksums)
+ {
compute_total_size(&context);
+ progress_report(&context, false);
+ }
/*
* Now scan the files in the backup directory. At this stage, we verify
@@ -275,7 +278,10 @@ main(int argc, char **argv)
* told to skip it.
*/
if (!context.skip_checksums)
+ {
verify_backup_checksums(&context);
+ progress_report(&context, true);
+ }
/*
* Try to parse the required ranges of WAL records, unless we were told
@@ -736,8 +742,6 @@ verify_backup_checksums(verifier_context *context)
manifest_file *m;
uint8 *buffer;
- progress_report(context, false);
-
buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
manifest_files_start_iterate(manifest->files, &it);
@@ -761,8 +765,6 @@ verify_backup_checksums(verifier_context *context)
}
pfree(buffer);
-
- progress_report(context, true);
}
/*
--
2.18.0
v5-0008-Refactor-split-verify_control_file.patch
From 07e402b82c7f9af43703aa5759da5ee7c8fcc3df Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 15:47:26 +0530
Subject: [PATCH v5 08/12] Refactor: split verify_control_file.
Separated the control file data verification code into a new function and
introduced the should_verify_control_data() macro, similar to
should_verify_checksum().
Note that should_verify_checksum() has been slightly modified to
include a NULL check for its argument, maintaining the same code
structure as should_verify_control_data().
---
src/bin/pg_verifybackup/pg_verifybackup.c | 44 +++++++++++------------
src/bin/pg_verifybackup/pg_verifybackup.h | 18 +++++++++-
2 files changed, 37 insertions(+), 25 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 3eddaa2468e..5f055a23a63 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,7 +18,6 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -61,8 +60,6 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_control_file(const char *controlpath,
- uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -625,14 +622,20 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
/* Check the backup manifest entry for this file. */
m = verify_manifest_entry(context, relpath, sb.st_size);
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0 &&
- m != NULL && m->matched && !m->bad)
- verify_control_file(fullpath, context->manifest->system_identifier);
+ /* Validate the manifest system identifier */
+ if (should_verify_control_data(context->manifest, m))
+ {
+ ControlFileData *control_file;
+ bool crc_ok;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+ control_file = get_controlfile_by_exact_path(fullpath, &crc_ok);
+
+ verify_control_data(control_file, fullpath, crc_ok,
+ context->manifest->system_identifier);
+ /* Release memory. */
+ pfree(control_file);
+ }
}
/*
@@ -676,18 +679,14 @@ verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
}
/*
- * Sanity check control file and validate system identifier against manifest
- * system identifier.
+ * Sanity check control file data and validate system identifier against
+ * manifest system identifier.
*/
-static void
-verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
+void
+verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier)
{
- ControlFileData *control_file;
- bool crc_ok;
-
- pg_log_debug("reading \"%s\"", controlpath);
- control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
-
/* Control file contents not meaningful if CRC is bad. */
if (!crc_ok)
report_fatal_error("%s: CRC is incorrect", controlpath);
@@ -703,9 +702,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
controlpath,
(unsigned long long) manifest_system_identifier,
(unsigned long long) control_file->system_identifier);
-
- /* Release memory. */
- pfree(control_file);
}
/*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 1bc5f7a6b4a..c88f71ff14b 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -16,6 +16,7 @@
#include "common/controldata_utils.h"
#include "common/hashfn_unstable.h"
+#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
@@ -44,7 +45,19 @@ typedef struct manifest_file
} manifest_file;
#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+ (((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+
+/*
+ * Validate the manifest system identifier against the control file; this
+ * feature is not available in manifest version 1. This validation should be
+ * carried out only if the manifest entry validation is completed without any
+ * errors.
+ */
+#define should_verify_control_data(manifest, m) \
+ (((manifest)->version != 1) && \
+ ((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (strcmp((m)->pathname, "global/pg_control") == 0))
/*
* Define a hash table which we can use to store information about the files
@@ -110,6 +123,9 @@ extern manifest_file *verify_manifest_entry(verifier_context *context,
extern void verify_checksum(verifier_context *context, manifest_file *m,
pg_checksum_context *checksum_ctx,
uint8 *checksumbuf);
+extern void verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
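To show why this split is useful, here is a rough sketch (not part of the patch) of how verify_control_data() could be fed pg_control contents that have been buffered in memory, e.g. from a tar member, rather than read from disk. The function name is hypothetical, the buffering is assumed to happen elsewhere, and the CRC check simply mirrors what get_controlfile_by_exact_path() does.

    /*
     * Illustrative only: validate control file contents already in memory.
     */
    static void
    verify_buffered_control_data(verifier_context *context, const char *path,
                                 const char *data, size_t len)
    {
        ControlFileData control_file;
        pg_crc32c   crc;
        bool        crc_ok;

        if (len < sizeof(ControlFileData))
        {
            report_backup_error(context, "%s: unexpected control file size",
                                path);
            return;
        }
        memcpy(&control_file, data, sizeof(ControlFileData));

        /* Same CRC check that get_controlfile_by_exact_path() performs. */
        INIT_CRC32C(crc);
        COMP_CRC32C(crc, (char *) &control_file,
                    offsetof(ControlFileData, crc));
        FIN_CRC32C(crc);
        crc_ok = EQ_CRC32C(crc, control_file.crc);

        verify_control_data(&control_file, path, crc_ok,
                            context->manifest->system_identifier);
    }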
v5-0007-Refactor-split-verify_file_checksum-function.patch
From dc90660cbc524ff04b88aadb027be22b81c94ff7 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 16:45:55 +0530
Subject: [PATCH v5 07/12] Refactor: split verify_file_checksum() function.
Move the core functionality of verify_file_checksum() into a new function
so it can be reused instead of duplicated.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 18 ++++++++++++++++--
src/bin/pg_verifybackup/pg_verifybackup.h | 3 +++
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index ab6bda8c9dc..3eddaa2468e 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -782,7 +782,6 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
int rc;
size_t bytes_read = 0;
uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
- int checksumlen;
/* Open the target file. */
if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
@@ -848,8 +847,23 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
return;
}
+ /* Do the final computation and verification. */
+ verify_checksum(context, m, &checksum_ctx, checksumbuf);
+}
+
+/*
+ * A helper function to finalize checksum computation and verify it against the
+ * backup manifest information.
+ */
+void
+verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx, uint8 *checksumbuf)
+{
+ int checksumlen;
+ const char *relpath = m->pathname;
+
/* Get the final checksum. */
- checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
+ checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
if (checksumlen < 0)
{
report_backup_error(context,
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 98c75916255..1bc5f7a6b4a 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -107,6 +107,9 @@ typedef struct verifier_context
extern manifest_file *verify_manifest_entry(verifier_context *context,
char *relpath, int64 filesize);
+extern void verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx,
+ uint8 *checksumbuf);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
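As an illustration of the intended reuse (not part of the patch), a caller that receives a file's contents in chunks, as a tar reader would, can accumulate the checksum incrementally and then hand the checksum context to the new verify_checksum() helper. The function name and chunk representation below are hypothetical.

    /*
     * Illustrative only: checksum a file delivered as a series of chunks and
     * verify the result against its manifest entry.
     */
    static void
    checksum_in_chunks(verifier_context *context, manifest_file *m,
                       const char **chunks, size_t *chunk_lens, int nchunks)
    {
        pg_checksum_context checksum_ctx;
        uint8       checksumbuf[PG_CHECKSUM_MAX_LENGTH];
        int         i;

        if (pg_checksum_init(&checksum_ctx, m->checksum_type) < 0)
        {
            report_backup_error(context,
                                "could not initialize checksum of file \"%s\"",
                                m->pathname);
            return;
        }

        /* Accumulate the checksum over each chunk as it arrives. */
        for (i = 0; i < nchunks; i++)
        {
            if (pg_checksum_update(&checksum_ctx,
                                   (const uint8 *) chunks[i],
                                   chunk_lens[i]) < 0)
            {
                report_backup_error(context,
                                    "could not update checksum of file \"%s\"",
                                    m->pathname);
                return;
            }
        }

        /* Finalize and compare against the manifest entry. */
        verify_checksum(context, m, &checksum_ctx, checksumbuf);
    }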
v5-0005-Refactor-move-some-part-of-pg_verifybackup.c-to-p.patch
From 8befb1d9759e3a23a2a4b951d7a6ac800dc8e72e Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 12:10:34 +0530
Subject: [PATCH v5 05/12] Refactor: move shared definitions from
 pg_verifybackup.c to a new pg_verifybackup.h
---
src/bin/pg_verifybackup/pg_verifybackup.c | 102 +------------------
src/bin/pg_verifybackup/pg_verifybackup.h | 118 ++++++++++++++++++++++
2 files changed, 123 insertions(+), 97 deletions(-)
create mode 100644 src/bin/pg_verifybackup/pg_verifybackup.h
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 71585ffc50e..4e42757c346 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,12 +18,11 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/controldata_utils.h"
-#include "common/hashfn_unstable.h"
#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
+#include "pg_verifybackup.h"
#include "pgtime.h"
/*
@@ -40,89 +39,6 @@
*/
#define ESTIMATED_BYTES_PER_MANIFEST_LINE 100
-/*
- * How many bytes should we try to read from a file at once?
- */
-#define READ_CHUNK_SIZE (128 * 1024)
-
-/*
- * Each file described by the manifest file is parsed to produce an object
- * like this.
- */
-typedef struct manifest_file
-{
- uint32 status; /* hash status */
- const char *pathname;
- size_t size;
- pg_checksum_type checksum_type;
- int checksum_length;
- uint8 *checksum_payload;
- bool matched;
- bool bad;
-} manifest_file;
-
-#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
-
-/*
- * Define a hash table which we can use to store information about the files
- * mentioned in the backup manifest.
- */
-#define SH_PREFIX manifest_files
-#define SH_ELEMENT_TYPE manifest_file
-#define SH_KEY_TYPE const char *
-#define SH_KEY pathname
-#define SH_HASH_KEY(tb, key) hash_string(key)
-#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
-#define SH_SCOPE static inline
-#define SH_RAW_ALLOCATOR pg_malloc0
-#define SH_DECLARE
-#define SH_DEFINE
-#include "lib/simplehash.h"
-
-/*
- * Each WAL range described by the manifest file is parsed to produce an
- * object like this.
- */
-typedef struct manifest_wal_range
-{
- TimeLineID tli;
- XLogRecPtr start_lsn;
- XLogRecPtr end_lsn;
- struct manifest_wal_range *next;
- struct manifest_wal_range *prev;
-} manifest_wal_range;
-
-/*
- * All the data parsed from a backup_manifest file.
- */
-typedef struct manifest_data
-{
- int version;
- uint64 system_identifier;
- manifest_files_hash *files;
- manifest_wal_range *first_wal_range;
- manifest_wal_range *last_wal_range;
-} manifest_data;
-
-/*
- * All of the context information we need while checking a backup manifest.
- */
-typedef struct verifier_context
-{
- manifest_data *manifest;
- char *backup_directory;
- SimpleStringList ignore_list;
- bool skip_checksums;
- bool exit_on_error;
- bool saw_any_error;
-
- /* Progress indicators */
- bool show_progress;
- uint64 total_size;
- uint64 done_size;
-} verifier_context;
-
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -156,14 +72,6 @@ static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
-static void report_backup_error(verifier_context *context,
- const char *pg_restrict fmt,...)
- pg_attribute_printf(2, 3);
-static void report_fatal_error(const char *pg_restrict fmt,...)
- pg_attribute_printf(1, 2) pg_attribute_noreturn();
-static bool should_ignore_relpath(verifier_context *context, const char *relpath);
-
-static void progress_report(verifier_context *context, bool finished);
static void usage(void);
static const char *progname;
@@ -978,7 +886,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
* Update the context to indicate that we saw an error, and exit if the
* context says we should.
*/
-static void
+void
report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
{
va_list ap;
@@ -995,7 +903,7 @@ report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
/*
* Report a fatal error and exit
*/
-static void
+void
report_fatal_error(const char *pg_restrict fmt,...)
{
va_list ap;
@@ -1014,7 +922,7 @@ report_fatal_error(const char *pg_restrict fmt,...)
* Note that by "prefix" we mean a parent directory; for this purpose,
* "aa/bb" is not a prefix of "aa/bbb", but it is a prefix of "aa/bb/cc".
*/
-static bool
+bool
should_ignore_relpath(verifier_context *context, const char *relpath)
{
SimpleStringListCell *cell;
@@ -1043,7 +951,7 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
* If finished is set to true, this is the last progress report. The cursor
* is moved to the next line.
*/
-static void
+void
progress_report(verifier_context *context, bool finished)
{
static pg_time_t last_progress_report = 0;
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
new file mode 100644
index 00000000000..90900048547
--- /dev/null
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -0,0 +1,118 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_verifybackup.h
+ * Verify a backup against a backup manifest.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/pg_verifybackup/pg_verifybackup.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef PG_VERIFYBACKUP_H
+#define PG_VERIFYBACKUP_H
+
+#include "common/controldata_utils.h"
+#include "common/hashfn_unstable.h"
+#include "common/parse_manifest.h"
+#include "fe_utils/simple_list.h"
+
+extern bool show_progress;
+extern bool skip_checksums;
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/*
+ * Each file described by the manifest file is parsed to produce an object
+ * like this.
+ */
+typedef struct manifest_file
+{
+ uint32 status; /* hash status */
+ const char *pathname;
+ size_t size;
+ pg_checksum_type checksum_type;
+ int checksum_length;
+ uint8 *checksum_payload;
+ bool matched;
+ bool bad;
+} manifest_file;
+
+#define should_verify_checksum(m) \
+ (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+
+/*
+ * Define a hash table which we can use to store information about the files
+ * mentioned in the backup manifest.
+ */
+#define SH_PREFIX manifest_files
+#define SH_ELEMENT_TYPE manifest_file
+#define SH_KEY_TYPE const char *
+#define SH_KEY pathname
+#define SH_HASH_KEY(tb, key) hash_string(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * Each WAL range described by the manifest file is parsed to produce an
+ * object like this.
+ */
+typedef struct manifest_wal_range
+{
+ TimeLineID tli;
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+ struct manifest_wal_range *next;
+ struct manifest_wal_range *prev;
+} manifest_wal_range;
+
+/*
+ * All the data parsed from a backup_manifest file.
+ */
+typedef struct manifest_data
+{
+ int version;
+ uint64 system_identifier;
+ manifest_files_hash *files;
+ manifest_wal_range *first_wal_range;
+ manifest_wal_range *last_wal_range;
+} manifest_data;
+
+/*
+ * All of the context information we need while checking a backup manifest.
+ */
+typedef struct verifier_context
+{
+ manifest_data *manifest;
+ char *backup_directory;
+ SimpleStringList ignore_list;
+ bool skip_checksums;
+ bool exit_on_error;
+ bool saw_any_error;
+
+ /* Progress indicators */
+ bool show_progress;
+ uint64 total_size;
+ uint64 done_size;
+} verifier_context;
+
+extern void report_backup_error(verifier_context *context,
+ const char *pg_restrict fmt,...)
+ pg_attribute_printf(2, 3);
+extern void report_fatal_error(const char *pg_restrict fmt,...)
+ pg_attribute_printf(1, 2) pg_attribute_noreturn();
+extern bool should_ignore_relpath(verifier_context *context,
+ const char *relpath);
+
+extern void progress_report(verifier_context *context, bool finished);
+
+#endif /* PG_VERIFYBACKUP_H */
--
2.18.0
v5-0004-Refactor-move-few-global-variable-to-verifier_con.patch
From 7720bebed0b8456fec741f9083d3fd71278dea42 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 11:43:52 +0530
Subject: [PATCH v5 04/12] Refactor: move a few global variables into the
 verifier_context struct
Global variables are:
1. show_progress
2. skip_checksums
3. total_size
4. done_size
---
src/bin/pg_verifybackup/pg_verifybackup.c | 50 +++++++++++------------
1 file changed, 25 insertions(+), 25 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d77e70fbe38..71585ffc50e 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -113,8 +113,14 @@ typedef struct verifier_context
manifest_data *manifest;
char *backup_directory;
SimpleStringList ignore_list;
+ bool skip_checksums;
bool exit_on_error;
bool saw_any_error;
+
+ /* Progress indicators */
+ bool show_progress;
+ uint64 total_size;
+ uint64 done_size;
} verifier_context;
static manifest_data *parse_manifest_file(char *manifest_path);
@@ -157,19 +163,11 @@ static void report_fatal_error(const char *pg_restrict fmt,...)
pg_attribute_printf(1, 2) pg_attribute_noreturn();
static bool should_ignore_relpath(verifier_context *context, const char *relpath);
-static void progress_report(bool finished);
+static void progress_report(verifier_context *context, bool finished);
static void usage(void);
static const char *progname;
-/* options */
-static bool show_progress = false;
-static bool skip_checksums = false;
-
-/* Progress indicators */
-static uint64 total_size = 0;
-static uint64 done_size = 0;
-
/*
* Main entry point.
*/
@@ -260,13 +258,13 @@ main(int argc, char **argv)
no_parse_wal = true;
break;
case 'P':
- show_progress = true;
+ context.show_progress = true;
break;
case 'q':
quiet = true;
break;
case 's':
- skip_checksums = true;
+ context.skip_checksums = true;
break;
case 'w':
wal_directory = pstrdup(optarg);
@@ -299,7 +297,7 @@ main(int argc, char **argv)
}
/* Complain if the specified arguments conflict */
- if (show_progress && quiet)
+ if (context.show_progress && quiet)
pg_fatal("cannot specify both %s and %s",
"-P/--progress", "-q/--quiet");
@@ -363,7 +361,7 @@ main(int argc, char **argv)
* Now do the expensive work of verifying file checksums, unless we were
* told to skip it.
*/
- if (!skip_checksums)
+ if (!context.skip_checksums)
verify_backup_checksums(&context);
/*
@@ -739,8 +737,9 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
verify_control_file(fullpath, context->manifest->system_identifier);
/* Update statistics for progress report, if necessary */
- if (show_progress && !skip_checksums && should_verify_checksum(m))
- total_size += m->size;
+ if (context->show_progress && !context->skip_checksums &&
+ should_verify_checksum(m))
+ context->total_size += m->size;
/*
* We don't verify checksums at this stage. We first finish verifying that
@@ -815,7 +814,7 @@ verify_backup_checksums(verifier_context *context)
manifest_file *m;
uint8 *buffer;
- progress_report(false);
+ progress_report(context, false);
buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
@@ -841,7 +840,7 @@ verify_backup_checksums(verifier_context *context)
pfree(buffer);
- progress_report(true);
+ progress_report(context, true);
}
/*
@@ -889,8 +888,8 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
}
/* Report progress */
- done_size += rc;
- progress_report(false);
+ context->done_size += rc;
+ progress_report(context, false);
}
if (rc < 0)
report_backup_error(context, "could not read file \"%s\": %m",
@@ -1036,7 +1035,7 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
}
/*
- * Print a progress report based on the global variables.
+ * Print a progress report based on the variables in verifier_context.
*
* Progress report is written at maximum once per second, unless the finished
* parameter is set to true.
@@ -1045,7 +1044,7 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
* is moved to the next line.
*/
static void
-progress_report(bool finished)
+progress_report(verifier_context *context, bool finished)
{
static pg_time_t last_progress_report = 0;
pg_time_t now;
@@ -1053,7 +1052,7 @@ progress_report(bool finished)
char totalsize_str[32];
char donesize_str[32];
- if (!show_progress)
+ if (!context->show_progress)
return;
now = time(NULL);
@@ -1061,12 +1060,13 @@ progress_report(bool finished)
return; /* Max once per second */
last_progress_report = now;
- percent_size = total_size ? (int) ((done_size * 100 / total_size)) : 0;
+ percent_size = context->total_size ?
+ (int) ((context->done_size * 100 / context->total_size)) : 0;
snprintf(totalsize_str, sizeof(totalsize_str), UINT64_FORMAT,
- total_size / 1024);
+ context->total_size / 1024);
snprintf(donesize_str, sizeof(donesize_str), UINT64_FORMAT,
- done_size / 1024);
+ context->done_size / 1024);
fprintf(stderr,
_("%*s/%s kB (%d%%) verified"),
--
2.18.0
v5-0006-Refactor-split-verify_backup_file-function.patch
From 2206efa24af8882877ec216e0001dee8538e30c1 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 12:15:26 +0530
Subject: [PATCH v5 06/12] Refactor: split verify_backup_file() function.
Move the manifest entry verification code into a new function,
verify_manifest_entry(), and the total size computation code into
another new function, compute_total_size(), which is called from
main().
---
src/bin/pg_verifybackup/pg_verifybackup.c | 76 ++++++++++++++++++-----
src/bin/pg_verifybackup/pg_verifybackup.h | 3 +
2 files changed, 62 insertions(+), 17 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 4e42757c346..ab6bda8c9dc 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -72,6 +72,7 @@ static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
+static void compute_total_size(verifier_context *context);
static void usage(void);
static const char *progname;
@@ -250,6 +251,13 @@ main(int argc, char **argv)
*/
context.manifest = parse_manifest_file(manifest_path);
+ /*
+ * For the progress report, compute the total size of the files to be
+ * read, which occurs only when checksum verification is enabled.
+ */
+ if (!context.skip_checksums)
+ compute_total_size(&context);
+
/*
* Now scan the files in the backup directory. At this stage, we verify
* that every file on disk is present in the manifest and that the sizes
@@ -614,6 +622,27 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Check the backup manifest entry for this file. */
+ m = verify_manifest_entry(context, relpath, sb.st_size);
+
+ /*
+ * Validate the manifest system identifier, not available in manifest
+ * version 1.
+ */
+ if (context->manifest->version != 1 &&
+ strcmp(relpath, "global/pg_control") == 0 &&
+ m != NULL && m->matched && !m->bad)
+ verify_control_file(fullpath, context->manifest->system_identifier);
+}
+
+/*
+ * Verify file and its size entry in the manifest.
+ */
+manifest_file *
+verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
+{
+ manifest_file *m;
+
/* Check whether there's an entry in the manifest hash. */
m = manifest_files_lookup(context->manifest->files, relpath);
if (m == NULL)
@@ -621,40 +650,29 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
report_backup_error(context,
"\"%s\" is present on disk but not in the manifest",
relpath);
- return;
+ return NULL;
}
/* Flag this entry as having been encountered in the filesystem. */
m->matched = true;
/* Check that the size matches. */
- if (m->size != sb.st_size)
+ if (m->size != filesize)
{
report_backup_error(context,
"\"%s\" has size %lld on disk but size %zu in the manifest",
- relpath, (long long int) sb.st_size, m->size);
+ relpath, (long long int) filesize, m->size);
m->bad = true;
}
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0)
- verify_control_file(fullpath, context->manifest->system_identifier);
-
- /* Update statistics for progress report, if necessary */
- if (context->show_progress && !context->skip_checksums &&
- should_verify_checksum(m))
- context->total_size += m->size;
-
/*
* We don't verify checksums at this stage. We first finish verifying that
* we have the expected set of files with the expected sizes, and only
* afterwards verify the checksums. That's because computing checksums may
* take a while, and we'd like to report more obvious problems quickly.
*/
+
+ return m;
}
/*
@@ -817,7 +835,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
/*
* Double-check that we read the expected number of bytes from the file.
- * Normally, a file size mismatch would be caught in verify_backup_file
+ * Normally, a file size mismatch would be caught in verify_manifest_entry
* and this check would never be reached, but this provides additional
* safety and clarity in the event of concurrent modifications or
* filesystem misbehavior.
@@ -988,6 +1006,30 @@ progress_report(verifier_context *context, bool finished)
fputc((!finished && isatty(fileno(stderr))) ? '\r' : '\n', stderr);
}
+/*
+ * Compute the total size of backup files for progress reporting.
+ */
+static void
+compute_total_size(verifier_context *context)
+{
+ manifest_data *manifest = context->manifest;
+ manifest_files_iterator it;
+ manifest_file *m;
+ uint64 total_size = 0;
+
+ if (!context->show_progress)
+ return;
+
+ manifest_files_start_iterate(manifest->files, &it);
+ while ((m = manifest_files_iterate(manifest->files, &it)) != NULL)
+ {
+ if (!should_ignore_relpath(context, m->pathname))
+ total_size += m->size;
+ }
+
+ context->total_size = total_size;
+}
+
/*
* Print out usage information and exit.
*/
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 90900048547..98c75916255 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -105,6 +105,9 @@ typedef struct verifier_context
uint64 done_size;
} verifier_context;
+extern manifest_file *verify_manifest_entry(verifier_context *context,
+ char *relpath, int64 filesize);
+
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
pg_attribute_printf(2, 3);
--
2.18.0
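To illustrate the intended reuse (not part of the patch), here is a sketch of how a tar-aware verifier could call verify_manifest_entry() as soon as it sees an archive member header, using the astreamer_member metadata instead of stat(). The function name is hypothetical, and the path mapping that a tablespace archive would need is ignored for brevity.

    /*
     * Illustrative only: check one tar member against the backup manifest
     * as soon as its header has been parsed.
     */
    static void
    verify_tar_member(verifier_context *context, astreamer_member *member)
    {
        manifest_file *m;

        /* Directories and symlinks have no manifest entries to check. */
        if (member->is_directory || member->is_link)
            return;

        m = verify_manifest_entry(context, member->pathname, member->size);
        if (m == NULL)
            return;             /* already reported as not in the manifest */

        /* Checksums are still verified later, as in the plain-format path. */
    }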
v5-0003-Refactor-move-astreamer-files-to-fe_utils-to-make.patch
From ecc3a97ee176b00810a57c21bc84e612ddd94fe0 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 11:20:52 +0530
Subject: [PATCH v5 03/12] Refactor: move astreamer* files to fe_utils so they
 can be shared by other tools.
To make it accessible to other code, move the ASTREAMER code (previously
known as BBSTREAMER) to a common location. The appropriate place is
src/fe_utils, since it is frontend infrastructure intended for shared use.
---
meson.build | 2 +-
src/bin/pg_basebackup/Makefile | 7 +------
src/bin/pg_basebackup/astreamer_inject.h | 2 +-
src/bin/pg_basebackup/meson.build | 5 -----
src/fe_utils/Makefile | 5 +++++
src/{bin/pg_basebackup => fe_utils}/astreamer_file.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_gzip.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_lz4.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_tar.c | 2 +-
src/{bin/pg_basebackup => fe_utils}/astreamer_zstd.c | 2 +-
src/fe_utils/meson.build | 5 +++++
src/{bin/pg_basebackup => include/fe_utils}/astreamer.h | 0
12 files changed, 18 insertions(+), 18 deletions(-)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_file.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_gzip.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_lz4.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_tar.c (99%)
rename src/{bin/pg_basebackup => fe_utils}/astreamer_zstd.c (99%)
rename src/{bin/pg_basebackup => include/fe_utils}/astreamer.h (100%)
diff --git a/meson.build b/meson.build
index 7de0371226d..f7a5d2aea9a 100644
--- a/meson.build
+++ b/meson.build
@@ -3027,7 +3027,7 @@ frontend_common_code = declare_dependency(
compile_args: ['-DFRONTEND'],
include_directories: [postgres_inc],
sources: generated_headers,
- dependencies: [os_deps, zlib, zstd],
+ dependencies: [os_deps, zlib, zstd, lz4],
)
backend_common_code = declare_dependency(
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index a71af2d48a7..f1e73058b23 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -37,12 +37,7 @@ OBJS = \
BBOBJS = \
pg_basebackup.o \
- astreamer_file.o \
- astreamer_gzip.o \
- astreamer_inject.o \
- astreamer_lz4.o \
- astreamer_tar.o \
- astreamer_zstd.o
+ astreamer_inject.o
all: pg_basebackup pg_createsubscriber pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/astreamer_inject.h b/src/bin/pg_basebackup/astreamer_inject.h
index 8504b3f5e0d..aeed533862b 100644
--- a/src/bin/pg_basebackup/astreamer_inject.h
+++ b/src/bin/pg_basebackup/astreamer_inject.h
@@ -12,7 +12,7 @@
#ifndef ASTREAMER_INJECT_H
#define ASTREAMER_INJECT_H
-#include "astreamer.h"
+#include "fe_utils/astreamer.h"
#include "pqexpbuffer.h"
extern astreamer *astreamer_recovery_injector_new(astreamer *next,
diff --git a/src/bin/pg_basebackup/meson.build b/src/bin/pg_basebackup/meson.build
index a68dbd7837d..9101fc18438 100644
--- a/src/bin/pg_basebackup/meson.build
+++ b/src/bin/pg_basebackup/meson.build
@@ -1,12 +1,7 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
common_sources = files(
- 'astreamer_file.c',
- 'astreamer_gzip.c',
'astreamer_inject.c',
- 'astreamer_lz4.c',
- 'astreamer_tar.c',
- 'astreamer_zstd.c',
'receivelog.c',
'streamutil.c',
'walmethods.c',
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index 946c05258f0..2694be4b859 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -21,6 +21,11 @@ override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) $(CPPFLAGS)
OBJS = \
archive.o \
+ astreamer_file.o \
+ astreamer_gzip.o \
+ astreamer_lz4.o \
+ astreamer_tar.o \
+ astreamer_zstd.o \
cancel.o \
conditional.o \
connect_utils.o \
diff --git a/src/bin/pg_basebackup/astreamer_file.c b/src/fe_utils/astreamer_file.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_file.c
rename to src/fe_utils/astreamer_file.c
index 2742385e103..13d1192c6e6 100644
--- a/src/bin/pg_basebackup/astreamer_file.c
+++ b/src/fe_utils/astreamer_file.c
@@ -13,10 +13,10 @@
#include <unistd.h>
-#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/astreamer.h"
typedef struct astreamer_plain_writer
{
diff --git a/src/bin/pg_basebackup/astreamer_gzip.c b/src/fe_utils/astreamer_gzip.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_gzip.c
rename to src/fe_utils/astreamer_gzip.c
index 6f7c27afbbc..dd28defac7b 100644
--- a/src/bin/pg_basebackup/astreamer_gzip.c
+++ b/src/fe_utils/astreamer_gzip.c
@@ -17,10 +17,10 @@
#include <zlib.h>
#endif
-#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/astreamer.h"
#ifdef HAVE_LIBZ
typedef struct astreamer_gzip_writer
diff --git a/src/bin/pg_basebackup/astreamer_lz4.c b/src/fe_utils/astreamer_lz4.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_lz4.c
rename to src/fe_utils/astreamer_lz4.c
index 1c40d7d8ad5..d8b2a367e47 100644
--- a/src/bin/pg_basebackup/astreamer_lz4.c
+++ b/src/fe_utils/astreamer_lz4.c
@@ -17,10 +17,10 @@
#include <lz4frame.h>
#endif
-#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/astreamer.h"
#ifdef USE_LZ4
typedef struct astreamer_lz4_frame
diff --git a/src/bin/pg_basebackup/astreamer_tar.c b/src/fe_utils/astreamer_tar.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_tar.c
rename to src/fe_utils/astreamer_tar.c
index 673690cd18f..f5d3562d280 100644
--- a/src/bin/pg_basebackup/astreamer_tar.c
+++ b/src/fe_utils/astreamer_tar.c
@@ -23,8 +23,8 @@
#include <time.h>
-#include "astreamer.h"
#include "common/logging.h"
+#include "fe_utils/astreamer.h"
#include "pgtar.h"
typedef struct astreamer_tar_parser
diff --git a/src/bin/pg_basebackup/astreamer_zstd.c b/src/fe_utils/astreamer_zstd.c
similarity index 99%
rename from src/bin/pg_basebackup/astreamer_zstd.c
rename to src/fe_utils/astreamer_zstd.c
index 58dc679ef99..45f6cb67363 100644
--- a/src/bin/pg_basebackup/astreamer_zstd.c
+++ b/src/fe_utils/astreamer_zstd.c
@@ -17,8 +17,8 @@
#include <zstd.h>
#endif
-#include "astreamer.h"
#include "common/logging.h"
+#include "fe_utils/astreamer.h"
#ifdef USE_ZSTD
diff --git a/src/fe_utils/meson.build b/src/fe_utils/meson.build
index 14d0482a2cc..043021d826d 100644
--- a/src/fe_utils/meson.build
+++ b/src/fe_utils/meson.build
@@ -2,6 +2,11 @@
fe_utils_sources = files(
'archive.c',
+ 'astreamer_file.c',
+ 'astreamer_gzip.c',
+ 'astreamer_lz4.c',
+ 'astreamer_tar.c',
+ 'astreamer_zstd.c',
'cancel.c',
'conditional.c',
'connect_utils.c',
diff --git a/src/bin/pg_basebackup/astreamer.h b/src/include/fe_utils/astreamer.h
similarity index 100%
rename from src/bin/pg_basebackup/astreamer.h
rename to src/include/fe_utils/astreamer.h
--
2.18.0
v5-0002-Refactor-Add-astreamer_inject.h-and-move-related-.patch
From 8614be90305258751b760e746a44cbbe7ba0ad6c Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 17 Jul 2024 14:23:27 +0530
Subject: [PATCH v5 02/12] Refactor: Add astreamer_inject.h and move related
declarations to it.
---
src/bin/pg_basebackup/astreamer.h | 6 ------
src/bin/pg_basebackup/astreamer_inject.c | 2 +-
src/bin/pg_basebackup/astreamer_inject.h | 24 ++++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 2 +-
4 files changed, 26 insertions(+), 8 deletions(-)
create mode 100644 src/bin/pg_basebackup/astreamer_inject.h
diff --git a/src/bin/pg_basebackup/astreamer.h b/src/bin/pg_basebackup/astreamer.h
index 6b0047418bb..9d0a8c4d0c2 100644
--- a/src/bin/pg_basebackup/astreamer.h
+++ b/src/bin/pg_basebackup/astreamer.h
@@ -217,10 +217,4 @@ extern astreamer *astreamer_tar_parser_new(astreamer *next);
extern astreamer *astreamer_tar_terminator_new(astreamer *next);
extern astreamer *astreamer_tar_archiver_new(astreamer *next);
-extern astreamer *astreamer_recovery_injector_new(astreamer *next,
- bool is_recovery_guc_supported,
- PQExpBuffer recoveryconfcontents);
-extern void astreamer_inject_file(astreamer *streamer, char *pathname,
- char *data, int len);
-
#endif
diff --git a/src/bin/pg_basebackup/astreamer_inject.c b/src/bin/pg_basebackup/astreamer_inject.c
index 7f1decded8d..4ad8381f102 100644
--- a/src/bin/pg_basebackup/astreamer_inject.c
+++ b/src/bin/pg_basebackup/astreamer_inject.c
@@ -11,7 +11,7 @@
#include "postgres_fe.h"
-#include "astreamer.h"
+#include "astreamer_inject.h"
#include "common/file_perm.h"
#include "common/logging.h"
diff --git a/src/bin/pg_basebackup/astreamer_inject.h b/src/bin/pg_basebackup/astreamer_inject.h
new file mode 100644
index 00000000000..8504b3f5e0d
--- /dev/null
+++ b/src/bin/pg_basebackup/astreamer_inject.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_inject.h
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/astreamer_inject.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef ASTREAMER_INJECT_H
+#define ASTREAMER_INJECT_H
+
+#include "astreamer.h"
+#include "pqexpbuffer.h"
+
+extern astreamer *astreamer_recovery_injector_new(astreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents);
+extern void astreamer_inject_file(astreamer *streamer, char *pathname,
+ char *data, int len);
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 4179b064cbc..1e753e40c97 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -26,7 +26,7 @@
#endif
#include "access/xlog_internal.h"
-#include "astreamer.h"
+#include "astreamer_inject.h"
#include "backup/basebackup.h"
#include "common/compression.h"
#include "common/file_perm.h"
--
2.18.0
v5-0001-Refactor-Rename-all-bbstreamer-references-to-astr.patch
From 1107087beed27e1e242a058fba30e4fe62b4e620 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 09:39:32 +0530
Subject: [PATCH v5 01/12] Refactor: Rename all bbstreamer references to
astreamer.
BBSTREAMER is specific to pg_basebackup; we need a more general name
so the code can be placed in a common area and made accessible to
other modules. Renaming it to ASTREAMER, short for Archive Streamer,
serves that purpose.
---
src/bin/pg_basebackup/Makefile | 12 +-
src/bin/pg_basebackup/astreamer.h | 226 +++++++++++++
.../{bbstreamer_file.c => astreamer_file.c} | 148 ++++----
.../{bbstreamer_gzip.c => astreamer_gzip.c} | 154 ++++-----
...bbstreamer_inject.c => astreamer_inject.c} | 152 ++++-----
.../{bbstreamer_lz4.c => astreamer_lz4.c} | 172 +++++-----
.../{bbstreamer_tar.c => astreamer_tar.c} | 316 +++++++++---------
.../{bbstreamer_zstd.c => astreamer_zstd.c} | 160 ++++-----
src/bin/pg_basebackup/bbstreamer.h | 226 -------------
src/bin/pg_basebackup/meson.build | 12 +-
src/bin/pg_basebackup/nls.mk | 12 +-
src/bin/pg_basebackup/pg_basebackup.c | 74 ++--
src/tools/pgindent/typedefs.list | 26 +-
13 files changed, 845 insertions(+), 845 deletions(-)
create mode 100644 src/bin/pg_basebackup/astreamer.h
rename src/bin/pg_basebackup/{bbstreamer_file.c => astreamer_file.c} (69%)
rename src/bin/pg_basebackup/{bbstreamer_gzip.c => astreamer_gzip.c} (62%)
rename src/bin/pg_basebackup/{bbstreamer_inject.c => astreamer_inject.c} (53%)
rename src/bin/pg_basebackup/{bbstreamer_lz4.c => astreamer_lz4.c} (69%)
rename src/bin/pg_basebackup/{bbstreamer_tar.c => astreamer_tar.c} (50%)
rename src/bin/pg_basebackup/{bbstreamer_zstd.c => astreamer_zstd.c} (64%)
delete mode 100644 src/bin/pg_basebackup/bbstreamer.h
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 26c53e473f5..a71af2d48a7 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -37,12 +37,12 @@ OBJS = \
BBOBJS = \
pg_basebackup.o \
- bbstreamer_file.o \
- bbstreamer_gzip.o \
- bbstreamer_inject.o \
- bbstreamer_lz4.o \
- bbstreamer_tar.o \
- bbstreamer_zstd.o
+ astreamer_file.o \
+ astreamer_gzip.o \
+ astreamer_inject.o \
+ astreamer_lz4.o \
+ astreamer_tar.o \
+ astreamer_zstd.o
all: pg_basebackup pg_createsubscriber pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/astreamer.h b/src/bin/pg_basebackup/astreamer.h
new file mode 100644
index 00000000000..6b0047418bb
--- /dev/null
+++ b/src/bin/pg_basebackup/astreamer.h
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer.h
+ *
+ * Each tar archive returned by the server is passed to one or more
+ * astreamer objects for further processing. The astreamer may do
+ * something simple, like write the archive to a file, perhaps after
+ * compressing it, but it can also do more complicated things, like
+ * annotating the byte stream to indicate which parts of the data
+ * correspond to tar headers or trailing padding, vs. which parts are
+ * payload data. A subsequent astreamer may use this information to
+ * make further decisions about how to process the data; for example,
+ * it might choose to modify the archive contents.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/astreamer.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef ASTREAMER_H
+#define ASTREAMER_H
+
+#include "common/compression.h"
+#include "lib/stringinfo.h"
+#include "pqexpbuffer.h"
+
+struct astreamer;
+struct astreamer_ops;
+typedef struct astreamer astreamer;
+typedef struct astreamer_ops astreamer_ops;
+
+/*
+ * Each chunk of archive data passed to an astreamer is classified into one
+ * of these categories. When data is first received from the remote server,
+ * each chunk will be categorized as ASTREAMER_UNKNOWN, and the chunks will
+ * be of whatever size the remote server chose to send.
+ *
+ * If the archive is parsed (e.g. see astreamer_tar_parser_new()), then all
+ * chunks should be labelled as one of the other types listed here. In
+ * addition, there should be exactly one ASTREAMER_MEMBER_HEADER chunk and
+ * exactly one ASTREAMER_MEMBER_TRAILER chunk per archive member, even if
+ * that means a zero-length call. There can be any number of
+ * ASTREAMER_MEMBER_CONTENTS chunks in between those calls. There
+ * should be exactly one ASTREAMER_ARCHIVE_TRAILER chunk, and it should follow the
+ * last ASTREAMER_MEMBER_TRAILER chunk.
+ *
+ * In theory, we could need other classifications here, such as a way of
+ * indicating an archive header, but the "tar" format doesn't need anything
+ * else, so for the time being there's no point.
+ */
+typedef enum
+{
+ ASTREAMER_UNKNOWN,
+ ASTREAMER_MEMBER_HEADER,
+ ASTREAMER_MEMBER_CONTENTS,
+ ASTREAMER_MEMBER_TRAILER,
+ ASTREAMER_ARCHIVE_TRAILER,
+} astreamer_archive_context;
+
+/*
+ * Each chunk of data that is classified as ASTREAMER_MEMBER_HEADER,
+ * ASTREAMER_MEMBER_CONTENTS, or ASTREAMER_MEMBER_TRAILER should also
+ * pass a pointer to an instance of this struct. The details are expected
+ * to be present in the archive header and used to fill the struct, after
+ * which all subsequent calls for the same archive member are expected to
+ * pass the same details.
+ */
+typedef struct
+{
+ char pathname[MAXPGPATH];
+ pgoff_t size;
+ mode_t mode;
+ uid_t uid;
+ gid_t gid;
+ bool is_directory;
+ bool is_link;
+ char linktarget[MAXPGPATH];
+} astreamer_member;
+
+/*
+ * Generally, each type of astreamer will define its own struct, but the
+ * first element should be 'astreamer base'. An astreamer that does not
+ * require any additional private data could use this structure directly.
+ *
+ * bbs_ops is a pointer to the astreamer_ops object which contains the
+ * function pointers appropriate to this type of astreamer.
+ *
+ * bbs_next is a pointer to the successor astreamer, for those types of
+ * astreamer which forward data to a successor. It need not be used and
+ * should be set to NULL when not relevant.
+ *
+ * bbs_buffer is a buffer for accumulating data for temporary storage. Each
+ * type of astreamer makes its own decisions about whether and how to use
+ * this buffer.
+ */
+struct astreamer
+{
+ const astreamer_ops *bbs_ops;
+ astreamer *bbs_next;
+ StringInfoData bbs_buffer;
+};
+
+/*
+ * There are three callbacks for an astreamer. The 'content' callback is
+ * called repeatedly, as described in the astreamer_archive_context comments.
+ * Then, the 'finalize' callback is called once at the end, to give the
+ * astreamer a chance to perform cleanup such as closing files. Finally,
+ * because this code is running in a frontend environment where, as of this
+ * writing, there are no memory contexts, the 'free' callback is called to
+ * release memory. These callbacks should always be invoked using the static
+ * inline functions defined below.
+ */
+struct astreamer_ops
+{
+ void (*content) (astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+ void (*finalize) (astreamer *streamer);
+ void (*free) (astreamer *streamer);
+};
+
+/* Send some content to an astreamer. */
+static inline void
+astreamer_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->content(streamer, member, data, len, context);
+}
+
+/* Finalize an astreamer. */
+static inline void
+astreamer_finalize(astreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->finalize(streamer);
+}
+
+/* Free an astreamer. */
+static inline void
+astreamer_free(astreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->free(streamer);
+}
+
+/*
+ * This is a convenience method for use when implementing an astreamer; it is
+ * not for use by outside callers. It adds the amount of data specified by
+ * 'nbytes' to the astreamer's buffer and adjusts '*len' and '*data'
+ * accordingly.
+ */
+static inline void
+astreamer_buffer_bytes(astreamer *streamer, const char **data, int *len,
+ int nbytes)
+{
+ Assert(nbytes <= *len);
+
+ appendBinaryStringInfo(&streamer->bbs_buffer, *data, nbytes);
+ *len -= nbytes;
+ *data += nbytes;
+}
+
+/*
+ * This is a convenience method for use when implementing an astreamer; it is
+ * not for use by outside callers. It attempts to add enough data to the
+ * astreamer's buffer to reach a length of target_bytes and adjusts '*len'
+ * and '*data' accordingly. It returns true if the target length has been
+ * reached and false otherwise.
+ */
+static inline bool
+astreamer_buffer_until(astreamer *streamer, const char **data, int *len,
+ int target_bytes)
+{
+ int buflen = streamer->bbs_buffer.len;
+
+ if (buflen >= target_bytes)
+ {
+ /* Target length already reached; nothing to do. */
+ return true;
+ }
+
+ if (buflen + *len < target_bytes)
+ {
+ /* Not enough data to reach target length; buffer all of it. */
+ astreamer_buffer_bytes(streamer, data, len, *len);
+ return false;
+ }
+
+ /* Buffer just enough to reach the target length. */
+ astreamer_buffer_bytes(streamer, data, len, target_bytes - buflen);
+ return true;
+}
+
+/*
+ * Functions for creating astreamer objects of various types. See the header
+ * comments for each of these functions for details.
+ */
+extern astreamer *astreamer_plain_writer_new(char *pathname, FILE *file);
+extern astreamer *astreamer_gzip_writer_new(char *pathname, FILE *file,
+ pg_compress_specification *compress);
+extern astreamer *astreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *));
+
+extern astreamer *astreamer_gzip_decompressor_new(astreamer *next);
+extern astreamer *astreamer_lz4_compressor_new(astreamer *next,
+ pg_compress_specification *compress);
+extern astreamer *astreamer_lz4_decompressor_new(astreamer *next);
+extern astreamer *astreamer_zstd_compressor_new(astreamer *next,
+ pg_compress_specification *compress);
+extern astreamer *astreamer_zstd_decompressor_new(astreamer *next);
+extern astreamer *astreamer_tar_parser_new(astreamer *next);
+extern astreamer *astreamer_tar_terminator_new(astreamer *next);
+extern astreamer *astreamer_tar_archiver_new(astreamer *next);
+
+extern astreamer *astreamer_recovery_injector_new(astreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents);
+extern void astreamer_inject_file(astreamer *streamer, char *pathname,
+ char *data, int len);
+
+#endif
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/astreamer_file.c
similarity index 69%
rename from src/bin/pg_basebackup/bbstreamer_file.c
rename to src/bin/pg_basebackup/astreamer_file.c
index bab6cd4a6b1..2742385e103 100644
--- a/src/bin/pg_basebackup/bbstreamer_file.c
+++ b/src/bin/pg_basebackup/astreamer_file.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_file.c
+ * astreamer_file.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_file.c
+ * src/bin/pg_basebackup/astreamer_file.c
*-------------------------------------------------------------------------
*/
@@ -13,60 +13,60 @@
#include <unistd.h>
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
-typedef struct bbstreamer_plain_writer
+typedef struct astreamer_plain_writer
{
- bbstreamer base;
+ astreamer base;
char *pathname;
FILE *file;
bool should_close_file;
-} bbstreamer_plain_writer;
+} astreamer_plain_writer;
-typedef struct bbstreamer_extractor
+typedef struct astreamer_extractor
{
- bbstreamer base;
+ astreamer base;
char *basepath;
const char *(*link_map) (const char *);
void (*report_output_file) (const char *);
char filename[MAXPGPATH];
FILE *file;
-} bbstreamer_extractor;
+} astreamer_extractor;
-static void bbstreamer_plain_writer_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_plain_writer_finalize(bbstreamer *streamer);
-static void bbstreamer_plain_writer_free(bbstreamer *streamer);
+static void astreamer_plain_writer_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_plain_writer_finalize(astreamer *streamer);
+static void astreamer_plain_writer_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_plain_writer_ops = {
- .content = bbstreamer_plain_writer_content,
- .finalize = bbstreamer_plain_writer_finalize,
- .free = bbstreamer_plain_writer_free
+static const astreamer_ops astreamer_plain_writer_ops = {
+ .content = astreamer_plain_writer_content,
+ .finalize = astreamer_plain_writer_finalize,
+ .free = astreamer_plain_writer_free
};
-static void bbstreamer_extractor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_extractor_finalize(bbstreamer *streamer);
-static void bbstreamer_extractor_free(bbstreamer *streamer);
+static void astreamer_extractor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_extractor_finalize(astreamer *streamer);
+static void astreamer_extractor_free(astreamer *streamer);
static void extract_directory(const char *filename, mode_t mode);
static void extract_link(const char *filename, const char *linktarget);
static FILE *create_file_for_extract(const char *filename, mode_t mode);
-static const bbstreamer_ops bbstreamer_extractor_ops = {
- .content = bbstreamer_extractor_content,
- .finalize = bbstreamer_extractor_finalize,
- .free = bbstreamer_extractor_free
+static const astreamer_ops astreamer_extractor_ops = {
+ .content = astreamer_extractor_content,
+ .finalize = astreamer_extractor_finalize,
+ .free = astreamer_extractor_free
};
/*
- * Create a bbstreamer that just writes data to a file.
+ * Create an astreamer that just writes data to a file.
*
* The caller must specify a pathname and may specify a file. The pathname is
* used for error-reporting purposes either way. If file is NULL, the pathname
@@ -74,14 +74,14 @@ static const bbstreamer_ops bbstreamer_extractor_ops = {
* for writing and closed when done. If file is not NULL, the data is written
* there.
*/
-bbstreamer *
-bbstreamer_plain_writer_new(char *pathname, FILE *file)
+astreamer *
+astreamer_plain_writer_new(char *pathname, FILE *file)
{
- bbstreamer_plain_writer *streamer;
+ astreamer_plain_writer *streamer;
- streamer = palloc0(sizeof(bbstreamer_plain_writer));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_plain_writer_ops;
+ streamer = palloc0(sizeof(astreamer_plain_writer));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_plain_writer_ops;
streamer->pathname = pstrdup(pathname);
streamer->file = file;
@@ -101,13 +101,13 @@ bbstreamer_plain_writer_new(char *pathname, FILE *file)
* Write archive content to file.
*/
static void
-bbstreamer_plain_writer_content(bbstreamer *streamer,
- bbstreamer_member *member, const char *data,
- int len, bbstreamer_archive_context context)
+astreamer_plain_writer_content(astreamer *streamer,
+ astreamer_member *member, const char *data,
+ int len, astreamer_archive_context context)
{
- bbstreamer_plain_writer *mystreamer;
+ astreamer_plain_writer *mystreamer;
- mystreamer = (bbstreamer_plain_writer *) streamer;
+ mystreamer = (astreamer_plain_writer *) streamer;
if (len == 0)
return;
@@ -128,11 +128,11 @@ bbstreamer_plain_writer_content(bbstreamer *streamer,
* the file if we opened it, but not if the caller provided it.
*/
static void
-bbstreamer_plain_writer_finalize(bbstreamer *streamer)
+astreamer_plain_writer_finalize(astreamer *streamer)
{
- bbstreamer_plain_writer *mystreamer;
+ astreamer_plain_writer *mystreamer;
- mystreamer = (bbstreamer_plain_writer *) streamer;
+ mystreamer = (astreamer_plain_writer *) streamer;
if (mystreamer->should_close_file && fclose(mystreamer->file) != 0)
pg_fatal("could not close file \"%s\": %m",
@@ -143,14 +143,14 @@ bbstreamer_plain_writer_finalize(bbstreamer *streamer)
}
/*
- * Free memory associated with this bbstreamer.
+ * Free memory associated with this astreamer.
*/
static void
-bbstreamer_plain_writer_free(bbstreamer *streamer)
+astreamer_plain_writer_free(astreamer *streamer)
{
- bbstreamer_plain_writer *mystreamer;
+ astreamer_plain_writer *mystreamer;
- mystreamer = (bbstreamer_plain_writer *) streamer;
+ mystreamer = (astreamer_plain_writer *) streamer;
Assert(!mystreamer->should_close_file);
Assert(mystreamer->base.bbs_next == NULL);
@@ -160,13 +160,13 @@ bbstreamer_plain_writer_free(bbstreamer *streamer)
}
/*
- * Create a bbstreamer that extracts an archive.
+ * Create an astreamer that extracts an archive.
*
* All pathnames in the archive are interpreted relative to basepath.
*
- * Unlike e.g. bbstreamer_plain_writer_new() we can't do anything useful here
+ * Unlike e.g. astreamer_plain_writer_new() we can't do anything useful here
* with untyped chunks; we need typed chunks which follow the rules described
- * in bbstreamer.h. Assuming we have that, we don't need to worry about the
+ * in astreamer.h. Assuming we have that, we don't need to worry about the
* original archive format; it's enough to just look at the member information
* provided and write to the corresponding file.
*
@@ -179,16 +179,16 @@ bbstreamer_plain_writer_free(bbstreamer *streamer)
* new output file. The pathname to that file is passed as an argument. If
* NULL, the call is skipped.
*/
-bbstreamer *
-bbstreamer_extractor_new(const char *basepath,
- const char *(*link_map) (const char *),
- void (*report_output_file) (const char *))
+astreamer *
+astreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *))
{
- bbstreamer_extractor *streamer;
+ astreamer_extractor *streamer;
- streamer = palloc0(sizeof(bbstreamer_extractor));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_extractor_ops;
+ streamer = palloc0(sizeof(astreamer_extractor));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_extractor_ops;
streamer->basepath = pstrdup(basepath);
streamer->link_map = link_map;
streamer->report_output_file = report_output_file;
@@ -200,19 +200,19 @@ bbstreamer_extractor_new(const char *basepath,
* Extract archive contents to the filesystem.
*/
static void
-bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_extractor_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+ astreamer_extractor *mystreamer = (astreamer_extractor *) streamer;
int fnamelen;
- Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
- Assert(context != BBSTREAMER_UNKNOWN);
+ Assert(member != NULL || context == ASTREAMER_ARCHIVE_TRAILER);
+ Assert(context != ASTREAMER_UNKNOWN);
switch (context)
{
- case BBSTREAMER_MEMBER_HEADER:
+ case ASTREAMER_MEMBER_HEADER:
Assert(mystreamer->file == NULL);
/* Prepend basepath. */
@@ -245,7 +245,7 @@ bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
mystreamer->report_output_file(mystreamer->filename);
break;
- case BBSTREAMER_MEMBER_CONTENTS:
+ case ASTREAMER_MEMBER_CONTENTS:
if (mystreamer->file == NULL)
break;
@@ -260,14 +260,14 @@ bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
}
break;
- case BBSTREAMER_MEMBER_TRAILER:
+ case ASTREAMER_MEMBER_TRAILER:
if (mystreamer->file == NULL)
break;
fclose(mystreamer->file);
mystreamer->file = NULL;
break;
- case BBSTREAMER_ARCHIVE_TRAILER:
+ case ASTREAMER_ARCHIVE_TRAILER:
break;
default:
@@ -375,10 +375,10 @@ create_file_for_extract(const char *filename, mode_t mode)
* There's nothing to do here but sanity checking.
*/
static void
-bbstreamer_extractor_finalize(bbstreamer *streamer)
+astreamer_extractor_finalize(astreamer *streamer)
{
- bbstreamer_extractor *mystreamer PG_USED_FOR_ASSERTS_ONLY
- = (bbstreamer_extractor *) streamer;
+ astreamer_extractor *mystreamer PG_USED_FOR_ASSERTS_ONLY
+ = (astreamer_extractor *) streamer;
Assert(mystreamer->file == NULL);
}
@@ -387,9 +387,9 @@ bbstreamer_extractor_finalize(bbstreamer *streamer)
* Free memory.
*/
static void
-bbstreamer_extractor_free(bbstreamer *streamer)
+astreamer_extractor_free(astreamer *streamer)
{
- bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+ astreamer_extractor *mystreamer = (astreamer_extractor *) streamer;
pfree(mystreamer->basepath);
pfree(mystreamer);
diff --git a/src/bin/pg_basebackup/bbstreamer_gzip.c b/src/bin/pg_basebackup/astreamer_gzip.c
similarity index 62%
rename from src/bin/pg_basebackup/bbstreamer_gzip.c
rename to src/bin/pg_basebackup/astreamer_gzip.c
index 0417fd9bc2c..6f7c27afbbc 100644
--- a/src/bin/pg_basebackup/bbstreamer_gzip.c
+++ b/src/bin/pg_basebackup/astreamer_gzip.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_gzip.c
+ * astreamer_gzip.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_gzip.c
+ * src/bin/pg_basebackup/astreamer_gzip.c
*-------------------------------------------------------------------------
*/
@@ -17,74 +17,74 @@
#include <zlib.h>
#endif
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
#ifdef HAVE_LIBZ
-typedef struct bbstreamer_gzip_writer
+typedef struct astreamer_gzip_writer
{
- bbstreamer base;
+ astreamer base;
char *pathname;
gzFile gzfile;
-} bbstreamer_gzip_writer;
+} astreamer_gzip_writer;
-typedef struct bbstreamer_gzip_decompressor
+typedef struct astreamer_gzip_decompressor
{
- bbstreamer base;
+ astreamer base;
z_stream zstream;
size_t bytes_written;
-} bbstreamer_gzip_decompressor;
+} astreamer_gzip_decompressor;
-static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
-static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
+static void astreamer_gzip_writer_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_gzip_writer_finalize(astreamer *streamer);
+static void astreamer_gzip_writer_free(astreamer *streamer);
static const char *get_gz_error(gzFile gzf);
-static const bbstreamer_ops bbstreamer_gzip_writer_ops = {
- .content = bbstreamer_gzip_writer_content,
- .finalize = bbstreamer_gzip_writer_finalize,
- .free = bbstreamer_gzip_writer_free
+static const astreamer_ops astreamer_gzip_writer_ops = {
+ .content = astreamer_gzip_writer_content,
+ .finalize = astreamer_gzip_writer_finalize,
+ .free = astreamer_gzip_writer_free
};
-static void bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_gzip_decompressor_finalize(bbstreamer *streamer);
-static void bbstreamer_gzip_decompressor_free(bbstreamer *streamer);
+static void astreamer_gzip_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_gzip_decompressor_finalize(astreamer *streamer);
+static void astreamer_gzip_decompressor_free(astreamer *streamer);
static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
static void gzip_pfree(void *opaque, void *address);
-static const bbstreamer_ops bbstreamer_gzip_decompressor_ops = {
- .content = bbstreamer_gzip_decompressor_content,
- .finalize = bbstreamer_gzip_decompressor_finalize,
- .free = bbstreamer_gzip_decompressor_free
+static const astreamer_ops astreamer_gzip_decompressor_ops = {
+ .content = astreamer_gzip_decompressor_content,
+ .finalize = astreamer_gzip_decompressor_finalize,
+ .free = astreamer_gzip_decompressor_free
};
#endif
/*
- * Create a bbstreamer that just compresses data using gzip, and then writes
+ * Create an astreamer that just compresses data using gzip, and then writes
* it to a file.
*
- * As in the case of bbstreamer_plain_writer_new, pathname is always used
+ * As in the case of astreamer_plain_writer_new, pathname is always used
* for error reporting purposes; if file is NULL, it is also the opened and
* closed so that the data may be written there.
*/
-bbstreamer *
-bbstreamer_gzip_writer_new(char *pathname, FILE *file,
- pg_compress_specification *compress)
+astreamer *
+astreamer_gzip_writer_new(char *pathname, FILE *file,
+ pg_compress_specification *compress)
{
#ifdef HAVE_LIBZ
- bbstreamer_gzip_writer *streamer;
+ astreamer_gzip_writer *streamer;
- streamer = palloc0(sizeof(bbstreamer_gzip_writer));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_gzip_writer_ops;
+ streamer = palloc0(sizeof(astreamer_gzip_writer));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_gzip_writer_ops;
streamer->pathname = pstrdup(pathname);
@@ -123,13 +123,13 @@ bbstreamer_gzip_writer_new(char *pathname, FILE *file,
* Write archive content to gzip file.
*/
static void
-bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member, const char *data,
- int len, bbstreamer_archive_context context)
+astreamer_gzip_writer_content(astreamer *streamer,
+ astreamer_member *member, const char *data,
+ int len, astreamer_archive_context context)
{
- bbstreamer_gzip_writer *mystreamer;
+ astreamer_gzip_writer *mystreamer;
- mystreamer = (bbstreamer_gzip_writer *) streamer;
+ mystreamer = (astreamer_gzip_writer *) streamer;
if (len == 0)
return;
@@ -151,16 +151,16 @@ bbstreamer_gzip_writer_content(bbstreamer *streamer,
*
* It makes no difference whether we opened the file or the caller did it,
* because libz provides no way of avoiding a close on the underlying file
- * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
+ * handle. Notice, however, that astreamer_gzip_writer_new() uses dup() to
* work around this issue, so that the behavior from the caller's viewpoint
- * is the same as for bbstreamer_plain_writer.
+ * is the same as for astreamer_plain_writer.
*/
static void
-bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
+astreamer_gzip_writer_finalize(astreamer *streamer)
{
- bbstreamer_gzip_writer *mystreamer;
+ astreamer_gzip_writer *mystreamer;
- mystreamer = (bbstreamer_gzip_writer *) streamer;
+ mystreamer = (astreamer_gzip_writer *) streamer;
errno = 0; /* in case gzclose() doesn't set it */
if (gzclose(mystreamer->gzfile) != 0)
@@ -171,14 +171,14 @@ bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
}
/*
- * Free memory associated with this bbstreamer.
+ * Free memory associated with this astreamer.
*/
static void
-bbstreamer_gzip_writer_free(bbstreamer *streamer)
+astreamer_gzip_writer_free(astreamer *streamer)
{
- bbstreamer_gzip_writer *mystreamer;
+ astreamer_gzip_writer *mystreamer;
- mystreamer = (bbstreamer_gzip_writer *) streamer;
+ mystreamer = (astreamer_gzip_writer *) streamer;
Assert(mystreamer->base.bbs_next == NULL);
Assert(mystreamer->gzfile == NULL);
@@ -208,18 +208,18 @@ get_gz_error(gzFile gzf)
* Create a new base backup streamer that performs decompression of gzip
* compressed blocks.
*/
-bbstreamer *
-bbstreamer_gzip_decompressor_new(bbstreamer *next)
+astreamer *
+astreamer_gzip_decompressor_new(astreamer *next)
{
#ifdef HAVE_LIBZ
- bbstreamer_gzip_decompressor *streamer;
+ astreamer_gzip_decompressor *streamer;
z_stream *zs;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_gzip_decompressor));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_gzip_decompressor_ops;
+ streamer = palloc0(sizeof(astreamer_gzip_decompressor));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_gzip_decompressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -258,15 +258,15 @@ bbstreamer_gzip_decompressor_new(bbstreamer *next)
* to the next streamer.
*/
static void
-bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_gzip_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_gzip_decompressor *mystreamer;
+ astreamer_gzip_decompressor *mystreamer;
z_stream *zs;
- mystreamer = (bbstreamer_gzip_decompressor *) streamer;
+ mystreamer = (astreamer_gzip_decompressor *) streamer;
zs = &mystreamer->zstream;
zs->next_in = (const uint8 *) data;
@@ -301,9 +301,9 @@ bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
/* If output buffer is full then pass data to next streamer */
if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen, context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen, context);
mystreamer->bytes_written = 0;
}
}
@@ -313,31 +313,31 @@ bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_gzip_decompressor_finalize(bbstreamer *streamer)
+astreamer_gzip_decompressor_finalize(astreamer *streamer)
{
- bbstreamer_gzip_decompressor *mystreamer;
+ astreamer_gzip_decompressor *mystreamer;
- mystreamer = (bbstreamer_gzip_decompressor *) streamer;
+ mystreamer = (astreamer_gzip_decompressor *) streamer;
/*
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_gzip_decompressor_free(bbstreamer *streamer)
+astreamer_gzip_decompressor_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_inject.c b/src/bin/pg_basebackup/astreamer_inject.c
similarity index 53%
rename from src/bin/pg_basebackup/bbstreamer_inject.c
rename to src/bin/pg_basebackup/astreamer_inject.c
index 194026b56e9..7f1decded8d 100644
--- a/src/bin/pg_basebackup/bbstreamer_inject.c
+++ b/src/bin/pg_basebackup/astreamer_inject.c
@@ -1,51 +1,51 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_inject.c
+ * astreamer_inject.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_inject.c
+ * src/bin/pg_basebackup/astreamer_inject.c
*-------------------------------------------------------------------------
*/
#include "postgres_fe.h"
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
-typedef struct bbstreamer_recovery_injector
+typedef struct astreamer_recovery_injector
{
- bbstreamer base;
+ astreamer base;
bool skip_file;
bool is_recovery_guc_supported;
bool is_postgresql_auto_conf;
bool found_postgresql_auto_conf;
PQExpBuffer recoveryconfcontents;
- bbstreamer_member member;
-} bbstreamer_recovery_injector;
+ astreamer_member member;
+} astreamer_recovery_injector;
-static void bbstreamer_recovery_injector_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_recovery_injector_finalize(bbstreamer *streamer);
-static void bbstreamer_recovery_injector_free(bbstreamer *streamer);
+static void astreamer_recovery_injector_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_recovery_injector_finalize(astreamer *streamer);
+static void astreamer_recovery_injector_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_recovery_injector_ops = {
- .content = bbstreamer_recovery_injector_content,
- .finalize = bbstreamer_recovery_injector_finalize,
- .free = bbstreamer_recovery_injector_free
+static const astreamer_ops astreamer_recovery_injector_ops = {
+ .content = astreamer_recovery_injector_content,
+ .finalize = astreamer_recovery_injector_finalize,
+ .free = astreamer_recovery_injector_free
};
/*
- * Create a bbstreamer that can edit recoverydata into an archive stream.
+ * Create an astreamer that can edit recoverydata into an archive stream.
*
- * The input should be a series of typed chunks (not BBSTREAMER_UNKNOWN) as
- * per the conventions described in bbstreamer.h; the chunks forwarded to
- * the next bbstreamer will be similarly typed, but the
- * BBSTREAMER_MEMBER_HEADER chunks may be zero-length in cases where we've
+ * The input should be a series of typed chunks (not ASTREAMER_UNKNOWN) as
+ * per the conventions described in astreamer.h; the chunks forwarded to
+ * the next astreamer will be similarly typed, but the
+ * ASTREAMER_MEMBER_HEADER chunks may be zero-length in cases where we've
* edited the archive stream.
*
* Our goal is to do one of the following three things with the content passed
@@ -61,16 +61,16 @@ static const bbstreamer_ops bbstreamer_recovery_injector_ops = {
* zero-length standby.signal file, dropping any file with that name from
* the archive.
*/
-bbstreamer *
-bbstreamer_recovery_injector_new(bbstreamer *next,
- bool is_recovery_guc_supported,
- PQExpBuffer recoveryconfcontents)
+astreamer *
+astreamer_recovery_injector_new(astreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents)
{
- bbstreamer_recovery_injector *streamer;
+ astreamer_recovery_injector *streamer;
- streamer = palloc0(sizeof(bbstreamer_recovery_injector));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_recovery_injector_ops;
+ streamer = palloc0(sizeof(astreamer_recovery_injector));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_recovery_injector_ops;
streamer->base.bbs_next = next;
streamer->is_recovery_guc_supported = is_recovery_guc_supported;
streamer->recoveryconfcontents = recoveryconfcontents;
@@ -82,21 +82,21 @@ bbstreamer_recovery_injector_new(bbstreamer *next,
* Handle each chunk of tar content while injecting recovery configuration.
*/
static void
-bbstreamer_recovery_injector_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_recovery_injector_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_recovery_injector *mystreamer;
+ astreamer_recovery_injector *mystreamer;
- mystreamer = (bbstreamer_recovery_injector *) streamer;
- Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+ mystreamer = (astreamer_recovery_injector *) streamer;
+ Assert(member != NULL || context == ASTREAMER_ARCHIVE_TRAILER);
switch (context)
{
- case BBSTREAMER_MEMBER_HEADER:
+ case ASTREAMER_MEMBER_HEADER:
/* Must copy provided data so we have the option to modify it. */
- memcpy(&mystreamer->member, member, sizeof(bbstreamer_member));
+ memcpy(&mystreamer->member, member, sizeof(astreamer_member));
/*
* On v12+, skip standby.signal and edit postgresql.auto.conf; on
@@ -119,8 +119,8 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
/*
* Zap data and len because the archive header is no
- * longer valid; some subsequent bbstreamer must
- * regenerate it if it's necessary.
+ * longer valid; some subsequent astreamer must regenerate
+ * it if it's necessary.
*/
data = NULL;
len = 0;
@@ -135,26 +135,26 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
return;
break;
- case BBSTREAMER_MEMBER_CONTENTS:
+ case ASTREAMER_MEMBER_CONTENTS:
/* Do not forward if the file is to be skipped. */
if (mystreamer->skip_file)
return;
break;
- case BBSTREAMER_MEMBER_TRAILER:
+ case ASTREAMER_MEMBER_TRAILER:
/* Do not forward it the file is to be skipped. */
if (mystreamer->skip_file)
return;
/* Append provided content to whatever we already sent. */
if (mystreamer->is_postgresql_auto_conf)
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->recoveryconfcontents->data,
- mystreamer->recoveryconfcontents->len,
- BBSTREAMER_MEMBER_CONTENTS);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len,
+ ASTREAMER_MEMBER_CONTENTS);
break;
- case BBSTREAMER_ARCHIVE_TRAILER:
+ case ASTREAMER_ARCHIVE_TRAILER:
if (mystreamer->is_recovery_guc_supported)
{
/*
@@ -163,22 +163,22 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
* member now.
*/
if (!mystreamer->found_postgresql_auto_conf)
- bbstreamer_inject_file(mystreamer->base.bbs_next,
- "postgresql.auto.conf",
- mystreamer->recoveryconfcontents->data,
- mystreamer->recoveryconfcontents->len);
+ astreamer_inject_file(mystreamer->base.bbs_next,
+ "postgresql.auto.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
/* Inject empty standby.signal file. */
- bbstreamer_inject_file(mystreamer->base.bbs_next,
- "standby.signal", "", 0);
+ astreamer_inject_file(mystreamer->base.bbs_next,
+ "standby.signal", "", 0);
}
else
{
/* Inject recovery.conf file with specified contents. */
- bbstreamer_inject_file(mystreamer->base.bbs_next,
- "recovery.conf",
- mystreamer->recoveryconfcontents->data,
- mystreamer->recoveryconfcontents->len);
+ astreamer_inject_file(mystreamer->base.bbs_next,
+ "recovery.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
}
/* Nothing to do here. */
@@ -189,26 +189,26 @@ bbstreamer_recovery_injector_content(bbstreamer *streamer,
pg_fatal("unexpected state while injecting recovery settings");
}
- bbstreamer_content(mystreamer->base.bbs_next, &mystreamer->member,
- data, len, context);
+ astreamer_content(mystreamer->base.bbs_next, &mystreamer->member,
+ data, len, context);
}
/*
- * End-of-stream processing for this bbstreamer.
+ * End-of-stream processing for this astreamer.
*/
static void
-bbstreamer_recovery_injector_finalize(bbstreamer *streamer)
+astreamer_recovery_injector_finalize(astreamer *streamer)
{
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_finalize(streamer->bbs_next);
}
/*
- * Free memory associated with this bbstreamer.
+ * Free memory associated with this astreamer.
*/
static void
-bbstreamer_recovery_injector_free(bbstreamer *streamer)
+astreamer_recovery_injector_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer);
}
@@ -216,10 +216,10 @@ bbstreamer_recovery_injector_free(bbstreamer *streamer)
* Inject a member into the archive with specified contents.
*/
void
-bbstreamer_inject_file(bbstreamer *streamer, char *pathname, char *data,
- int len)
+astreamer_inject_file(astreamer *streamer, char *pathname, char *data,
+ int len)
{
- bbstreamer_member member;
+ astreamer_member member;
strlcpy(member.pathname, pathname, MAXPGPATH);
member.size = len;
@@ -238,12 +238,12 @@ bbstreamer_inject_file(bbstreamer *streamer, char *pathname, char *data,
/*
* We don't know here how to generate valid member headers and trailers
* for the archiving format in use, so if those are needed, some successor
- * bbstreamer will have to generate them using the data from 'member'.
+ * astreamer will have to generate them using the data from 'member'.
*/
- bbstreamer_content(streamer, &member, NULL, 0,
- BBSTREAMER_MEMBER_HEADER);
- bbstreamer_content(streamer, &member, data, len,
- BBSTREAMER_MEMBER_CONTENTS);
- bbstreamer_content(streamer, &member, NULL, 0,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(streamer, &member, NULL, 0,
+ ASTREAMER_MEMBER_HEADER);
+ astreamer_content(streamer, &member, data, len,
+ ASTREAMER_MEMBER_CONTENTS);
+ astreamer_content(streamer, &member, NULL, 0,
+ ASTREAMER_MEMBER_TRAILER);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/astreamer_lz4.c
similarity index 69%
rename from src/bin/pg_basebackup/bbstreamer_lz4.c
rename to src/bin/pg_basebackup/astreamer_lz4.c
index f5c9e68150c..1c40d7d8ad5 100644
--- a/src/bin/pg_basebackup/bbstreamer_lz4.c
+++ b/src/bin/pg_basebackup/astreamer_lz4.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_lz4.c
+ * astreamer_lz4.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_lz4.c
+ * src/bin/pg_basebackup/astreamer_lz4.c
*-------------------------------------------------------------------------
*/
@@ -17,15 +17,15 @@
#include <lz4frame.h>
#endif
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/file_perm.h"
#include "common/logging.h"
#include "common/string.h"
#ifdef USE_LZ4
-typedef struct bbstreamer_lz4_frame
+typedef struct astreamer_lz4_frame
{
- bbstreamer base;
+ astreamer base;
LZ4F_compressionContext_t cctx;
LZ4F_decompressionContext_t dctx;
@@ -33,32 +33,32 @@ typedef struct bbstreamer_lz4_frame
size_t bytes_written;
bool header_written;
-} bbstreamer_lz4_frame;
+} astreamer_lz4_frame;
-static void bbstreamer_lz4_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_lz4_compressor_finalize(bbstreamer *streamer);
-static void bbstreamer_lz4_compressor_free(bbstreamer *streamer);
+static void astreamer_lz4_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_lz4_compressor_finalize(astreamer *streamer);
+static void astreamer_lz4_compressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_lz4_compressor_ops = {
- .content = bbstreamer_lz4_compressor_content,
- .finalize = bbstreamer_lz4_compressor_finalize,
- .free = bbstreamer_lz4_compressor_free
+static const astreamer_ops astreamer_lz4_compressor_ops = {
+ .content = astreamer_lz4_compressor_content,
+ .finalize = astreamer_lz4_compressor_finalize,
+ .free = astreamer_lz4_compressor_free
};
-static void bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer);
-static void bbstreamer_lz4_decompressor_free(bbstreamer *streamer);
+static void astreamer_lz4_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_lz4_decompressor_finalize(astreamer *streamer);
+static void astreamer_lz4_decompressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
- .content = bbstreamer_lz4_decompressor_content,
- .finalize = bbstreamer_lz4_decompressor_finalize,
- .free = bbstreamer_lz4_decompressor_free
+static const astreamer_ops astreamer_lz4_decompressor_ops = {
+ .content = astreamer_lz4_decompressor_content,
+ .finalize = astreamer_lz4_decompressor_finalize,
+ .free = astreamer_lz4_decompressor_free
};
#endif
@@ -66,19 +66,19 @@ static const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
* Create a new base backup streamer that performs lz4 compression of tar
* blocks.
*/
-bbstreamer *
-bbstreamer_lz4_compressor_new(bbstreamer *next, pg_compress_specification *compress)
+astreamer *
+astreamer_lz4_compressor_new(astreamer *next, pg_compress_specification *compress)
{
#ifdef USE_LZ4
- bbstreamer_lz4_frame *streamer;
+ astreamer_lz4_frame *streamer;
LZ4F_errorCode_t ctxError;
LZ4F_preferences_t *prefs;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_lz4_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_lz4_compressor_ops;
+ streamer = palloc0(sizeof(astreamer_lz4_frame));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_lz4_compressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -113,19 +113,19 @@ bbstreamer_lz4_compressor_new(bbstreamer *next, pg_compress_specification *compr
* of output buffer to next streamer and empty the buffer.
*/
static void
-bbstreamer_lz4_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_lz4_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
uint8 *next_in,
*next_out;
size_t out_bound,
compressed_size,
avail_out;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
next_in = (uint8 *) data;
/* Write header before processing the first input chunk. */
@@ -159,10 +159,10 @@ bbstreamer_lz4_compressor_content(bbstreamer *streamer,
out_bound = LZ4F_compressBound(len, &mystreamer->prefs);
if (avail_out < out_bound)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->base.bbs_buffer.data,
- mystreamer->bytes_written,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ context);
/* Enlarge buffer if it falls short of out bound. */
if (mystreamer->base.bbs_buffer.maxlen < out_bound)
@@ -196,25 +196,25 @@ bbstreamer_lz4_compressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_lz4_compressor_finalize(bbstreamer *streamer)
+astreamer_lz4_compressor_finalize(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
uint8 *next_out;
size_t footer_bound,
compressed_size,
avail_out;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
/* Find out the footer bound and update the output buffer. */
footer_bound = LZ4F_compressBound(0, &mystreamer->prefs);
if ((mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written) <
footer_bound)
{
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->bytes_written,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
/* Enlarge buffer if it falls short of footer bound. */
if (mystreamer->base.bbs_buffer.maxlen < footer_bound)
@@ -243,24 +243,24 @@ bbstreamer_lz4_compressor_finalize(bbstreamer *streamer)
mystreamer->bytes_written += compressed_size;
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->bytes_written,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_lz4_compressor_free(bbstreamer *streamer)
+astreamer_lz4_compressor_free(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ mystreamer = (astreamer_lz4_frame *) streamer;
+ astreamer_free(streamer->bbs_next);
LZ4F_freeCompressionContext(mystreamer->cctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
@@ -271,18 +271,18 @@ bbstreamer_lz4_compressor_free(bbstreamer *streamer)
* Create a new base backup streamer that performs decompression of lz4
* compressed blocks.
*/
-bbstreamer *
-bbstreamer_lz4_decompressor_new(bbstreamer *next)
+astreamer *
+astreamer_lz4_decompressor_new(astreamer *next)
{
#ifdef USE_LZ4
- bbstreamer_lz4_frame *streamer;
+ astreamer_lz4_frame *streamer;
LZ4F_errorCode_t ctxError;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_lz4_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_lz4_decompressor_ops;
+ streamer = palloc0(sizeof(astreamer_lz4_frame));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_lz4_decompressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -307,18 +307,18 @@ bbstreamer_lz4_decompressor_new(bbstreamer *next)
* to the next streamer.
*/
static void
-bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_lz4_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
uint8 *next_in,
*next_out;
size_t avail_in,
avail_out;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
next_in = (uint8 *) data;
next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
avail_in = len;
@@ -366,10 +366,10 @@ bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
*/
if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ context);
avail_out = mystreamer->base.bbs_buffer.maxlen;
mystreamer->bytes_written = 0;
@@ -387,34 +387,34 @@ bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer)
+astreamer_lz4_decompressor_finalize(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
+ mystreamer = (astreamer_lz4_frame *) streamer;
/*
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_lz4_decompressor_free(bbstreamer *streamer)
+astreamer_lz4_decompressor_free(astreamer *streamer)
{
- bbstreamer_lz4_frame *mystreamer;
+ astreamer_lz4_frame *mystreamer;
- mystreamer = (bbstreamer_lz4_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ mystreamer = (astreamer_lz4_frame *) streamer;
+ astreamer_free(streamer->bbs_next);
LZ4F_freeDecompressionContext(mystreamer->dctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
diff --git a/src/bin/pg_basebackup/bbstreamer_tar.c b/src/bin/pg_basebackup/astreamer_tar.c
similarity index 50%
rename from src/bin/pg_basebackup/bbstreamer_tar.c
rename to src/bin/pg_basebackup/astreamer_tar.c
index 9137d17ddc1..673690cd18f 100644
--- a/src/bin/pg_basebackup/bbstreamer_tar.c
+++ b/src/bin/pg_basebackup/astreamer_tar.c
@@ -1,13 +1,13 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_tar.c
+ * astreamer_tar.c
*
* This module implements three types of tar processing. A tar parser
- * expects unlabelled chunks of data (e.g. BBSTREAMER_UNKNOWN) and splits
- * it into labelled chunks (any other value of bbstreamer_archive_context).
+ * expects unlabelled chunks of data (e.g. ASTREAMER_UNKNOWN) and splits
+ * it into labelled chunks (any other value of astreamer_archive_context).
* A tar archiver does the reverse: it takes a bunch of labelled chunks
* and produces a tarfile, optionally replacing member headers and trailers
- * so that upstream bbstreamer objects can perform surgery on the tarfile
+ * so that upstream astreamer objects can perform surgery on the tarfile
* contents without knowing the details of the tar format. A tar terminator
* just adds two blocks of NUL bytes to the end of the file, since older
* server versions produce files with this terminator omitted.
@@ -15,7 +15,7 @@
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_tar.c
+ * src/bin/pg_basebackup/astreamer_tar.c
*-------------------------------------------------------------------------
*/
@@ -23,83 +23,83 @@
#include <time.h>
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/logging.h"
#include "pgtar.h"
-typedef struct bbstreamer_tar_parser
+typedef struct astreamer_tar_parser
{
- bbstreamer base;
- bbstreamer_archive_context next_context;
- bbstreamer_member member;
+ astreamer base;
+ astreamer_archive_context next_context;
+ astreamer_member member;
size_t file_bytes_sent;
size_t pad_bytes_expected;
-} bbstreamer_tar_parser;
+} astreamer_tar_parser;
-typedef struct bbstreamer_tar_archiver
+typedef struct astreamer_tar_archiver
{
- bbstreamer base;
+ astreamer base;
bool rearchive_member;
-} bbstreamer_tar_archiver;
+} astreamer_tar_archiver;
-static void bbstreamer_tar_parser_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_tar_parser_finalize(bbstreamer *streamer);
-static void bbstreamer_tar_parser_free(bbstreamer *streamer);
-static bool bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer);
+static void astreamer_tar_parser_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_tar_parser_finalize(astreamer *streamer);
+static void astreamer_tar_parser_free(astreamer *streamer);
+static bool astreamer_tar_header(astreamer_tar_parser *mystreamer);
-static const bbstreamer_ops bbstreamer_tar_parser_ops = {
- .content = bbstreamer_tar_parser_content,
- .finalize = bbstreamer_tar_parser_finalize,
- .free = bbstreamer_tar_parser_free
+static const astreamer_ops astreamer_tar_parser_ops = {
+ .content = astreamer_tar_parser_content,
+ .finalize = astreamer_tar_parser_finalize,
+ .free = astreamer_tar_parser_free
};
-static void bbstreamer_tar_archiver_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_tar_archiver_finalize(bbstreamer *streamer);
-static void bbstreamer_tar_archiver_free(bbstreamer *streamer);
+static void astreamer_tar_archiver_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_tar_archiver_finalize(astreamer *streamer);
+static void astreamer_tar_archiver_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_tar_archiver_ops = {
- .content = bbstreamer_tar_archiver_content,
- .finalize = bbstreamer_tar_archiver_finalize,
- .free = bbstreamer_tar_archiver_free
+static const astreamer_ops astreamer_tar_archiver_ops = {
+ .content = astreamer_tar_archiver_content,
+ .finalize = astreamer_tar_archiver_finalize,
+ .free = astreamer_tar_archiver_free
};
-static void bbstreamer_tar_terminator_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_tar_terminator_finalize(bbstreamer *streamer);
-static void bbstreamer_tar_terminator_free(bbstreamer *streamer);
+static void astreamer_tar_terminator_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_tar_terminator_finalize(astreamer *streamer);
+static void astreamer_tar_terminator_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_tar_terminator_ops = {
- .content = bbstreamer_tar_terminator_content,
- .finalize = bbstreamer_tar_terminator_finalize,
- .free = bbstreamer_tar_terminator_free
+static const astreamer_ops astreamer_tar_terminator_ops = {
+ .content = astreamer_tar_terminator_content,
+ .finalize = astreamer_tar_terminator_finalize,
+ .free = astreamer_tar_terminator_free
};
/*
- * Create a bbstreamer that can parse a stream of content as tar data.
+ * Create an astreamer that can parse a stream of content as tar data.
*
- * The input should be a series of BBSTREAMER_UNKNOWN chunks; the bbstreamer
+ * The input should be a series of ASTREAMER_UNKNOWN chunks; the astreamer
* specified by 'next' will receive a series of typed chunks, as per the
- * conventions described in bbstreamer.h.
+ * conventions described in astreamer.h.
*/
-bbstreamer *
-bbstreamer_tar_parser_new(bbstreamer *next)
+astreamer *
+astreamer_tar_parser_new(astreamer *next)
{
- bbstreamer_tar_parser *streamer;
+ astreamer_tar_parser *streamer;
- streamer = palloc0(sizeof(bbstreamer_tar_parser));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_tar_parser_ops;
+ streamer = palloc0(sizeof(astreamer_tar_parser));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_tar_parser_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
- streamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ streamer->next_context = ASTREAMER_MEMBER_HEADER;
return &streamer->base;
}
@@ -108,29 +108,29 @@ bbstreamer_tar_parser_new(bbstreamer *next)
* Parse unknown content as tar data.
*/
static void
-bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_tar_parser_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+ astreamer_tar_parser *mystreamer = (astreamer_tar_parser *) streamer;
size_t nbytes;
/* Expect unparsed input. */
Assert(member == NULL);
- Assert(context == BBSTREAMER_UNKNOWN);
+ Assert(context == ASTREAMER_UNKNOWN);
while (len > 0)
{
switch (mystreamer->next_context)
{
- case BBSTREAMER_MEMBER_HEADER:
+ case ASTREAMER_MEMBER_HEADER:
/*
* If we're expecting an archive member header, accumulate a
* full block of data before doing anything further.
*/
- if (!bbstreamer_buffer_until(streamer, &data, &len,
- TAR_BLOCK_SIZE))
+ if (!astreamer_buffer_until(streamer, &data, &len,
+ TAR_BLOCK_SIZE))
return;
/*
@@ -139,32 +139,32 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
* thought was the next file header is actually the start of
* the archive trailer. Switch modes accordingly.
*/
- if (bbstreamer_tar_header(mystreamer))
+ if (astreamer_tar_header(mystreamer))
{
if (mystreamer->member.size == 0)
{
/* No content; trailer is zero-length. */
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- NULL, 0,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ ASTREAMER_MEMBER_TRAILER);
/* Expect next header. */
- mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->next_context = ASTREAMER_MEMBER_HEADER;
}
else
{
/* Expect contents. */
- mystreamer->next_context = BBSTREAMER_MEMBER_CONTENTS;
+ mystreamer->next_context = ASTREAMER_MEMBER_CONTENTS;
}
mystreamer->base.bbs_buffer.len = 0;
mystreamer->file_bytes_sent = 0;
}
else
- mystreamer->next_context = BBSTREAMER_ARCHIVE_TRAILER;
+ mystreamer->next_context = ASTREAMER_ARCHIVE_TRAILER;
break;
- case BBSTREAMER_MEMBER_CONTENTS:
+ case ASTREAMER_MEMBER_CONTENTS:
/*
* Send as much content as we have, but not more than the
@@ -174,10 +174,10 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
nbytes = mystreamer->member.size - mystreamer->file_bytes_sent;
nbytes = Min(nbytes, len);
Assert(nbytes > 0);
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- data, nbytes,
- BBSTREAMER_MEMBER_CONTENTS);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, nbytes,
+ ASTREAMER_MEMBER_CONTENTS);
mystreamer->file_bytes_sent += nbytes;
data += nbytes;
len -= nbytes;
@@ -193,53 +193,53 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
if (mystreamer->pad_bytes_expected == 0)
{
/* Trailer is zero-length. */
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- NULL, 0,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ ASTREAMER_MEMBER_TRAILER);
/* Expect next header. */
- mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->next_context = ASTREAMER_MEMBER_HEADER;
}
else
{
/* Trailer is not zero-length. */
- mystreamer->next_context = BBSTREAMER_MEMBER_TRAILER;
+ mystreamer->next_context = ASTREAMER_MEMBER_TRAILER;
}
mystreamer->base.bbs_buffer.len = 0;
}
break;
- case BBSTREAMER_MEMBER_TRAILER:
+ case ASTREAMER_MEMBER_TRAILER:
/*
* If we're expecting an archive member trailer, accumulate
* the expected number of padding bytes before sending
* anything onward.
*/
- if (!bbstreamer_buffer_until(streamer, &data, &len,
- mystreamer->pad_bytes_expected))
+ if (!astreamer_buffer_until(streamer, &data, &len,
+ mystreamer->pad_bytes_expected))
return;
/* OK, now we can send it. */
- bbstreamer_content(mystreamer->base.bbs_next,
- &mystreamer->member,
- data, mystreamer->pad_bytes_expected,
- BBSTREAMER_MEMBER_TRAILER);
+ astreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, mystreamer->pad_bytes_expected,
+ ASTREAMER_MEMBER_TRAILER);
/* Expect next file header. */
- mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->next_context = ASTREAMER_MEMBER_HEADER;
mystreamer->base.bbs_buffer.len = 0;
break;
- case BBSTREAMER_ARCHIVE_TRAILER:
+ case ASTREAMER_ARCHIVE_TRAILER:
/*
* We've seen an end-of-archive indicator, so anything more is
* buffered and sent as part of the archive trailer. But we
* don't expect more than 2 blocks.
*/
- bbstreamer_buffer_bytes(streamer, &data, &len, len);
+ astreamer_buffer_bytes(streamer, &data, &len, len);
if (len > 2 * TAR_BLOCK_SIZE)
pg_fatal("tar file trailer exceeds 2 blocks");
return;
@@ -255,14 +255,14 @@ bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
* Parse a file header within a tar stream.
*
* The return value is true if we found a file header and passed it on to the
- * next bbstreamer; it is false if we have reached the archive trailer.
+ * next astreamer; it is false if we have reached the archive trailer.
*/
static bool
-bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
+astreamer_tar_header(astreamer_tar_parser *mystreamer)
{
bool has_nonzero_byte = false;
int i;
- bbstreamer_member *member = &mystreamer->member;
+ astreamer_member *member = &mystreamer->member;
char *buffer = mystreamer->base.bbs_buffer.data;
Assert(mystreamer->base.bbs_buffer.len == TAR_BLOCK_SIZE);
@@ -304,10 +304,10 @@ bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
/* Compute number of padding bytes. */
mystreamer->pad_bytes_expected = tarPaddingBytesRequired(member->size);
- /* Forward the entire header to the next bbstreamer. */
- bbstreamer_content(mystreamer->base.bbs_next, member,
- buffer, TAR_BLOCK_SIZE,
- BBSTREAMER_MEMBER_HEADER);
+ /* Forward the entire header to the next astreamer. */
+ astreamer_content(mystreamer->base.bbs_next, member,
+ buffer, TAR_BLOCK_SIZE,
+ ASTREAMER_MEMBER_HEADER);
return true;
}
@@ -316,50 +316,50 @@ bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
* End-of-stream processing for a tar parser.
*/
static void
-bbstreamer_tar_parser_finalize(bbstreamer *streamer)
+astreamer_tar_parser_finalize(astreamer *streamer)
{
- bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+ astreamer_tar_parser *mystreamer = (astreamer_tar_parser *) streamer;
- if (mystreamer->next_context != BBSTREAMER_ARCHIVE_TRAILER &&
- (mystreamer->next_context != BBSTREAMER_MEMBER_HEADER ||
+ if (mystreamer->next_context != ASTREAMER_ARCHIVE_TRAILER &&
+ (mystreamer->next_context != ASTREAMER_MEMBER_HEADER ||
mystreamer->base.bbs_buffer.len > 0))
pg_fatal("COPY stream ended before last file was finished");
/* Send the archive trailer, even if empty. */
- bbstreamer_content(streamer->bbs_next, NULL,
- streamer->bbs_buffer.data, streamer->bbs_buffer.len,
- BBSTREAMER_ARCHIVE_TRAILER);
+ astreamer_content(streamer->bbs_next, NULL,
+ streamer->bbs_buffer.data, streamer->bbs_buffer.len,
+ ASTREAMER_ARCHIVE_TRAILER);
/* Now finalize successor. */
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_finalize(streamer->bbs_next);
}
/*
* Free memory associated with a tar parser.
*/
static void
-bbstreamer_tar_parser_free(bbstreamer *streamer)
+astreamer_tar_parser_free(astreamer *streamer)
{
pfree(streamer->bbs_buffer.data);
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
}
/*
- * Create a bbstreamer that can generate a tar archive.
+ * Create a astreamer that can generate a tar archive.
*
* This is intended to be usable either for generating a brand-new tar archive
* or for modifying one on the fly. The input should be a series of typed
- * chunks (i.e. not BBSTREAMER_UNKNOWN). See also the comments for
- * bbstreamer_tar_parser_content.
+ * chunks (i.e. not ASTREAMER_UNKNOWN). See also the comments for
+ * astreamer_tar_parser_content.
*/
-bbstreamer *
-bbstreamer_tar_archiver_new(bbstreamer *next)
+astreamer *
+astreamer_tar_archiver_new(astreamer *next)
{
- bbstreamer_tar_archiver *streamer;
+ astreamer_tar_archiver *streamer;
- streamer = palloc0(sizeof(bbstreamer_tar_archiver));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_tar_archiver_ops;
+ streamer = palloc0(sizeof(astreamer_tar_archiver));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_tar_archiver_ops;
streamer->base.bbs_next = next;
return &streamer->base;
@@ -368,36 +368,36 @@ bbstreamer_tar_archiver_new(bbstreamer *next)
/*
* Fix up the stream of input chunks to create a valid tar file.
*
- * If a BBSTREAMER_MEMBER_HEADER chunk is of size 0, it is replaced with a
+ * If a ASTREAMER_MEMBER_HEADER chunk is of size 0, it is replaced with a
* newly-constructed tar header. If it is of size TAR_BLOCK_SIZE, it is
* passed through without change. Any other size is a fatal error (and
* indicates a bug).
*
- * Whenever a new BBSTREAMER_MEMBER_HEADER chunk is constructed, the
- * corresponding BBSTREAMER_MEMBER_TRAILER chunk is also constructed from
+ * Whenever a new ASTREAMER_MEMBER_HEADER chunk is constructed, the
+ * corresponding ASTREAMER_MEMBER_TRAILER chunk is also constructed from
* scratch. Specifically, we construct a block of zero bytes sufficient to
* pad out to a block boundary, as required by the tar format. Other
- * BBSTREAMER_MEMBER_TRAILER chunks are passed through without change.
+ * ASTREAMER_MEMBER_TRAILER chunks are passed through without change.
*
- * Any BBSTREAMER_MEMBER_CONTENTS chunks are passed through without change.
+ * Any ASTREAMER_MEMBER_CONTENTS chunks are passed through without change.
*
- * The BBSTREAMER_ARCHIVE_TRAILER chunk is replaced with two
+ * The ASTREAMER_ARCHIVE_TRAILER chunk is replaced with two
* blocks of zero bytes. Not all tar programs require this, but apparently
* some do. The server does not supply this trailer. If no archive trailer is
- * present, one will be added by bbstreamer_tar_parser_finalize.
+ * present, one will be added by astreamer_tar_parser_finalize.
*/
static void
-bbstreamer_tar_archiver_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_tar_archiver_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_tar_archiver *mystreamer = (bbstreamer_tar_archiver *) streamer;
+ astreamer_tar_archiver *mystreamer = (astreamer_tar_archiver *) streamer;
char buffer[2 * TAR_BLOCK_SIZE];
- Assert(context != BBSTREAMER_UNKNOWN);
+ Assert(context != ASTREAMER_UNKNOWN);
- if (context == BBSTREAMER_MEMBER_HEADER && len != TAR_BLOCK_SIZE)
+ if (context == ASTREAMER_MEMBER_HEADER && len != TAR_BLOCK_SIZE)
{
Assert(len == 0);
@@ -411,7 +411,7 @@ bbstreamer_tar_archiver_content(bbstreamer *streamer,
/* Also make a note to replace padding, in case size changed. */
mystreamer->rearchive_member = true;
}
- else if (context == BBSTREAMER_MEMBER_TRAILER &&
+ else if (context == ASTREAMER_MEMBER_TRAILER &&
mystreamer->rearchive_member)
{
int pad_bytes = tarPaddingBytesRequired(member->size);
@@ -424,7 +424,7 @@ bbstreamer_tar_archiver_content(bbstreamer *streamer,
/* Don't do this again unless we replace another header. */
mystreamer->rearchive_member = false;
}
- else if (context == BBSTREAMER_ARCHIVE_TRAILER)
+ else if (context == ASTREAMER_ARCHIVE_TRAILER)
{
/* Trailer should always be two blocks of zero bytes. */
memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
@@ -432,40 +432,40 @@ bbstreamer_tar_archiver_content(bbstreamer *streamer,
len = 2 * TAR_BLOCK_SIZE;
}
- bbstreamer_content(streamer->bbs_next, member, data, len, context);
+ astreamer_content(streamer->bbs_next, member, data, len, context);
}
/*
* End-of-stream processing for a tar archiver.
*/
static void
-bbstreamer_tar_archiver_finalize(bbstreamer *streamer)
+astreamer_tar_archiver_finalize(astreamer *streamer)
{
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_finalize(streamer->bbs_next);
}
/*
* Free memory associated with a tar archiver.
*/
static void
-bbstreamer_tar_archiver_free(bbstreamer *streamer)
+astreamer_tar_archiver_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer);
}
/*
- * Create a bbstreamer that blindly adds two blocks of NUL bytes to the
+ * Create a astreamer that blindly adds two blocks of NUL bytes to the
* end of an incomplete tarfile that the server might send us.
*/
-bbstreamer *
-bbstreamer_tar_terminator_new(bbstreamer *next)
+astreamer *
+astreamer_tar_terminator_new(astreamer *next)
{
- bbstreamer *streamer;
+ astreamer *streamer;
- streamer = palloc0(sizeof(bbstreamer));
- *((const bbstreamer_ops **) &streamer->bbs_ops) =
- &bbstreamer_tar_terminator_ops;
+ streamer = palloc0(sizeof(astreamer));
+ *((const astreamer_ops **) &streamer->bbs_ops) =
+ &astreamer_tar_terminator_ops;
streamer->bbs_next = next;
return streamer;
@@ -475,17 +475,17 @@ bbstreamer_tar_terminator_new(bbstreamer *next)
* Pass all the content through without change.
*/
static void
-bbstreamer_tar_terminator_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_tar_terminator_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
/* Expect unparsed input. */
Assert(member == NULL);
- Assert(context == BBSTREAMER_UNKNOWN);
+ Assert(context == ASTREAMER_UNKNOWN);
/* Just forward it. */
- bbstreamer_content(streamer->bbs_next, member, data, len, context);
+ astreamer_content(streamer->bbs_next, member, data, len, context);
}
/*
@@ -493,22 +493,22 @@ bbstreamer_tar_terminator_content(bbstreamer *streamer,
* to supply.
*/
static void
-bbstreamer_tar_terminator_finalize(bbstreamer *streamer)
+astreamer_tar_terminator_finalize(astreamer *streamer)
{
char buffer[2 * TAR_BLOCK_SIZE];
memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
- bbstreamer_content(streamer->bbs_next, NULL, buffer,
- 2 * TAR_BLOCK_SIZE, BBSTREAMER_UNKNOWN);
- bbstreamer_finalize(streamer->bbs_next);
+ astreamer_content(streamer->bbs_next, NULL, buffer,
+ 2 * TAR_BLOCK_SIZE, ASTREAMER_UNKNOWN);
+ astreamer_finalize(streamer->bbs_next);
}
/*
* Free memory associated with a tar terminator.
*/
static void
-bbstreamer_tar_terminator_free(bbstreamer *streamer)
+astreamer_tar_terminator_free(astreamer *streamer)
{
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
pfree(streamer);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/astreamer_zstd.c
similarity index 64%
rename from src/bin/pg_basebackup/bbstreamer_zstd.c
rename to src/bin/pg_basebackup/astreamer_zstd.c
index 20f11d4450e..58dc679ef99 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/astreamer_zstd.c
@@ -1,11 +1,11 @@
/*-------------------------------------------------------------------------
*
- * bbstreamer_zstd.c
+ * astreamer_zstd.c
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer_zstd.c
+ * src/bin/pg_basebackup/astreamer_zstd.c
*-------------------------------------------------------------------------
*/
@@ -17,44 +17,44 @@
#include <zstd.h>
#endif
-#include "bbstreamer.h"
+#include "astreamer.h"
#include "common/logging.h"
#ifdef USE_ZSTD
-typedef struct bbstreamer_zstd_frame
+typedef struct astreamer_zstd_frame
{
- bbstreamer base;
+ astreamer base;
ZSTD_CCtx *cctx;
ZSTD_DCtx *dctx;
ZSTD_outBuffer zstd_outBuf;
-} bbstreamer_zstd_frame;
+} astreamer_zstd_frame;
-static void bbstreamer_zstd_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_zstd_compressor_finalize(bbstreamer *streamer);
-static void bbstreamer_zstd_compressor_free(bbstreamer *streamer);
+static void astreamer_zstd_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_zstd_compressor_finalize(astreamer *streamer);
+static void astreamer_zstd_compressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_zstd_compressor_ops = {
- .content = bbstreamer_zstd_compressor_content,
- .finalize = bbstreamer_zstd_compressor_finalize,
- .free = bbstreamer_zstd_compressor_free
+static const astreamer_ops astreamer_zstd_compressor_ops = {
+ .content = astreamer_zstd_compressor_content,
+ .finalize = astreamer_zstd_compressor_finalize,
+ .free = astreamer_zstd_compressor_free
};
-static void bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_zstd_decompressor_finalize(bbstreamer *streamer);
-static void bbstreamer_zstd_decompressor_free(bbstreamer *streamer);
+static void astreamer_zstd_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_zstd_decompressor_finalize(astreamer *streamer);
+static void astreamer_zstd_decompressor_free(astreamer *streamer);
-static const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
- .content = bbstreamer_zstd_decompressor_content,
- .finalize = bbstreamer_zstd_decompressor_finalize,
- .free = bbstreamer_zstd_decompressor_free
+static const astreamer_ops astreamer_zstd_decompressor_ops = {
+ .content = astreamer_zstd_decompressor_content,
+ .finalize = astreamer_zstd_decompressor_finalize,
+ .free = astreamer_zstd_decompressor_free
};
#endif
@@ -62,19 +62,19 @@ static const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
* Create a new base backup streamer that performs zstd compression of tar
* blocks.
*/
-bbstreamer *
-bbstreamer_zstd_compressor_new(bbstreamer *next, pg_compress_specification *compress)
+astreamer *
+astreamer_zstd_compressor_new(astreamer *next, pg_compress_specification *compress)
{
#ifdef USE_ZSTD
- bbstreamer_zstd_frame *streamer;
+ astreamer_zstd_frame *streamer;
size_t ret;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_zstd_frame));
+ streamer = palloc0(sizeof(astreamer_zstd_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_zstd_compressor_ops;
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_zstd_compressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -142,12 +142,12 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, pg_compress_specification *comp
* of output buffer to next streamer and empty the buffer.
*/
static void
-bbstreamer_zstd_compressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_zstd_compressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
ZSTD_inBuffer inBuf = {data, len, 0};
while (inBuf.pos < inBuf.size)
@@ -162,10 +162,10 @@ bbstreamer_zstd_compressor_content(bbstreamer *streamer,
if (mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos <
max_needed)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ context);
/* Reset the ZSTD output buffer. */
mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
@@ -187,9 +187,9 @@ bbstreamer_zstd_compressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
+astreamer_zstd_compressor_finalize(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
size_t yet_to_flush;
do
@@ -204,10 +204,10 @@ bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
if (mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos <
max_needed)
{
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ ASTREAMER_UNKNOWN);
/* Reset the ZSTD output buffer. */
mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
@@ -227,23 +227,23 @@ bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
/* Make sure to pass any remaining bytes to the next streamer. */
if (mystreamer->zstd_outBuf.pos > 0)
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_zstd_compressor_free(bbstreamer *streamer)
+astreamer_zstd_compressor_free(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
ZSTD_freeCCtx(mystreamer->cctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
@@ -254,17 +254,17 @@ bbstreamer_zstd_compressor_free(bbstreamer *streamer)
* Create a new base backup streamer that performs decompression of zstd
* compressed blocks.
*/
-bbstreamer *
-bbstreamer_zstd_decompressor_new(bbstreamer *next)
+astreamer *
+astreamer_zstd_decompressor_new(astreamer *next)
{
#ifdef USE_ZSTD
- bbstreamer_zstd_frame *streamer;
+ astreamer_zstd_frame *streamer;
Assert(next != NULL);
- streamer = palloc0(sizeof(bbstreamer_zstd_frame));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_zstd_decompressor_ops;
+ streamer = palloc0(sizeof(astreamer_zstd_frame));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_zstd_decompressor_ops;
streamer->base.bbs_next = next;
initStringInfo(&streamer->base.bbs_buffer);
@@ -293,12 +293,12 @@ bbstreamer_zstd_decompressor_new(bbstreamer *next)
* to the next streamer.
*/
static void
-bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
+astreamer_zstd_decompressor_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
ZSTD_inBuffer inBuf = {data, len, 0};
while (inBuf.pos < inBuf.size)
@@ -311,10 +311,10 @@ bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
*/
if (mystreamer->zstd_outBuf.pos >= mystreamer->zstd_outBuf.size)
{
- bbstreamer_content(mystreamer->base.bbs_next, member,
- mystreamer->zstd_outBuf.dst,
- mystreamer->zstd_outBuf.pos,
- context);
+ astreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ context);
/* Reset the ZSTD output buffer. */
mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
@@ -335,32 +335,32 @@ bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
* End-of-stream processing.
*/
static void
-bbstreamer_zstd_decompressor_finalize(bbstreamer *streamer)
+astreamer_zstd_decompressor_finalize(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
/*
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
if (mystreamer->zstd_outBuf.pos > 0)
- bbstreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- BBSTREAMER_UNKNOWN);
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ ASTREAMER_UNKNOWN);
- bbstreamer_finalize(mystreamer->base.bbs_next);
+ astreamer_finalize(mystreamer->base.bbs_next);
}
/*
* Free memory.
*/
static void
-bbstreamer_zstd_decompressor_free(bbstreamer *streamer)
+astreamer_zstd_decompressor_free(astreamer *streamer)
{
- bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ astreamer_zstd_frame *mystreamer = (astreamer_zstd_frame *) streamer;
- bbstreamer_free(streamer->bbs_next);
+ astreamer_free(streamer->bbs_next);
ZSTD_freeDCtx(mystreamer->dctx);
pfree(streamer->bbs_buffer.data);
pfree(streamer);
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
deleted file mode 100644
index 3b820f13b51..00000000000
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ /dev/null
@@ -1,226 +0,0 @@
-/*-------------------------------------------------------------------------
- *
- * bbstreamer.h
- *
- * Each tar archive returned by the server is passed to one or more
- * bbstreamer objects for further processing. The bbstreamer may do
- * something simple, like write the archive to a file, perhaps after
- * compressing it, but it can also do more complicated things, like
- * annotating the byte stream to indicate which parts of the data
- * correspond to tar headers or trailing padding, vs. which parts are
- * payload data. A subsequent bbstreamer may use this information to
- * make further decisions about how to process the data; for example,
- * it might choose to modify the archive contents.
- *
- * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
- *
- * IDENTIFICATION
- * src/bin/pg_basebackup/bbstreamer.h
- *-------------------------------------------------------------------------
- */
-
-#ifndef BBSTREAMER_H
-#define BBSTREAMER_H
-
-#include "common/compression.h"
-#include "lib/stringinfo.h"
-#include "pqexpbuffer.h"
-
-struct bbstreamer;
-struct bbstreamer_ops;
-typedef struct bbstreamer bbstreamer;
-typedef struct bbstreamer_ops bbstreamer_ops;
-
-/*
- * Each chunk of archive data passed to a bbstreamer is classified into one
- * of these categories. When data is first received from the remote server,
- * each chunk will be categorized as BBSTREAMER_UNKNOWN, and the chunks will
- * be of whatever size the remote server chose to send.
- *
- * If the archive is parsed (e.g. see bbstreamer_tar_parser_new()), then all
- * chunks should be labelled as one of the other types listed here. In
- * addition, there should be exactly one BBSTREAMER_MEMBER_HEADER chunk and
- * exactly one BBSTREAMER_MEMBER_TRAILER chunk per archive member, even if
- * that means a zero-length call. There can be any number of
- * BBSTREAMER_MEMBER_CONTENTS chunks in between those calls. There
- * should exactly BBSTREAMER_ARCHIVE_TRAILER chunk, and it should follow the
- * last BBSTREAMER_MEMBER_TRAILER chunk.
- *
- * In theory, we could need other classifications here, such as a way of
- * indicating an archive header, but the "tar" format doesn't need anything
- * else, so for the time being there's no point.
- */
-typedef enum
-{
- BBSTREAMER_UNKNOWN,
- BBSTREAMER_MEMBER_HEADER,
- BBSTREAMER_MEMBER_CONTENTS,
- BBSTREAMER_MEMBER_TRAILER,
- BBSTREAMER_ARCHIVE_TRAILER,
-} bbstreamer_archive_context;
-
-/*
- * Each chunk of data that is classified as BBSTREAMER_MEMBER_HEADER,
- * BBSTREAMER_MEMBER_CONTENTS, or BBSTREAMER_MEMBER_TRAILER should also
- * pass a pointer to an instance of this struct. The details are expected
- * to be present in the archive header and used to fill the struct, after
- * which all subsequent calls for the same archive member are expected to
- * pass the same details.
- */
-typedef struct
-{
- char pathname[MAXPGPATH];
- pgoff_t size;
- mode_t mode;
- uid_t uid;
- gid_t gid;
- bool is_directory;
- bool is_link;
- char linktarget[MAXPGPATH];
-} bbstreamer_member;
-
-/*
- * Generally, each type of bbstreamer will define its own struct, but the
- * first element should be 'bbstreamer base'. A bbstreamer that does not
- * require any additional private data could use this structure directly.
- *
- * bbs_ops is a pointer to the bbstreamer_ops object which contains the
- * function pointers appropriate to this type of bbstreamer.
- *
- * bbs_next is a pointer to the successor bbstreamer, for those types of
- * bbstreamer which forward data to a successor. It need not be used and
- * should be set to NULL when not relevant.
- *
- * bbs_buffer is a buffer for accumulating data for temporary storage. Each
- * type of bbstreamer makes its own decisions about whether and how to use
- * this buffer.
- */
-struct bbstreamer
-{
- const bbstreamer_ops *bbs_ops;
- bbstreamer *bbs_next;
- StringInfoData bbs_buffer;
-};
-
-/*
- * There are three callbacks for a bbstreamer. The 'content' callback is
- * called repeatedly, as described in the bbstreamer_archive_context comments.
- * Then, the 'finalize' callback is called once at the end, to give the
- * bbstreamer a chance to perform cleanup such as closing files. Finally,
- * because this code is running in a frontend environment where, as of this
- * writing, there are no memory contexts, the 'free' callback is called to
- * release memory. These callbacks should always be invoked using the static
- * inline functions defined below.
- */
-struct bbstreamer_ops
-{
- void (*content) (bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
- void (*finalize) (bbstreamer *streamer);
- void (*free) (bbstreamer *streamer);
-};
-
-/* Send some content to a bbstreamer. */
-static inline void
-bbstreamer_content(bbstreamer *streamer, bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context)
-{
- Assert(streamer != NULL);
- streamer->bbs_ops->content(streamer, member, data, len, context);
-}
-
-/* Finalize a bbstreamer. */
-static inline void
-bbstreamer_finalize(bbstreamer *streamer)
-{
- Assert(streamer != NULL);
- streamer->bbs_ops->finalize(streamer);
-}
-
-/* Free a bbstreamer. */
-static inline void
-bbstreamer_free(bbstreamer *streamer)
-{
- Assert(streamer != NULL);
- streamer->bbs_ops->free(streamer);
-}
-
-/*
- * This is a convenience method for use when implementing a bbstreamer; it is
- * not for use by outside callers. It adds the amount of data specified by
- * 'nbytes' to the bbstreamer's buffer and adjusts '*len' and '*data'
- * accordingly.
- */
-static inline void
-bbstreamer_buffer_bytes(bbstreamer *streamer, const char **data, int *len,
- int nbytes)
-{
- Assert(nbytes <= *len);
-
- appendBinaryStringInfo(&streamer->bbs_buffer, *data, nbytes);
- *len -= nbytes;
- *data += nbytes;
-}
-
-/*
- * This is a convenience method for use when implementing a bbstreamer; it is
- * not for use by outsider callers. It attempts to add enough data to the
- * bbstreamer's buffer to reach a length of target_bytes and adjusts '*len'
- * and '*data' accordingly. It returns true if the target length has been
- * reached and false otherwise.
- */
-static inline bool
-bbstreamer_buffer_until(bbstreamer *streamer, const char **data, int *len,
- int target_bytes)
-{
- int buflen = streamer->bbs_buffer.len;
-
- if (buflen >= target_bytes)
- {
- /* Target length already reached; nothing to do. */
- return true;
- }
-
- if (buflen + *len < target_bytes)
- {
- /* Not enough data to reach target length; buffer all of it. */
- bbstreamer_buffer_bytes(streamer, data, len, *len);
- return false;
- }
-
- /* Buffer just enough to reach the target length. */
- bbstreamer_buffer_bytes(streamer, data, len, target_bytes - buflen);
- return true;
-}
-
-/*
- * Functions for creating bbstreamer objects of various types. See the header
- * comments for each of these functions for details.
- */
-extern bbstreamer *bbstreamer_plain_writer_new(char *pathname, FILE *file);
-extern bbstreamer *bbstreamer_gzip_writer_new(char *pathname, FILE *file,
- pg_compress_specification *compress);
-extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
- const char *(*link_map) (const char *),
- void (*report_output_file) (const char *));
-
-extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
- pg_compress_specification *compress);
-extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_zstd_compressor_new(bbstreamer *next,
- pg_compress_specification *compress);
-extern bbstreamer *bbstreamer_zstd_decompressor_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
-extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
-
-extern bbstreamer *bbstreamer_recovery_injector_new(bbstreamer *next,
- bool is_recovery_guc_supported,
- PQExpBuffer recoveryconfcontents);
-extern void bbstreamer_inject_file(bbstreamer *streamer, char *pathname,
- char *data, int len);
-
-#endif
diff --git a/src/bin/pg_basebackup/meson.build b/src/bin/pg_basebackup/meson.build
index c00acd5e118..a68dbd7837d 100644
--- a/src/bin/pg_basebackup/meson.build
+++ b/src/bin/pg_basebackup/meson.build
@@ -1,12 +1,12 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
common_sources = files(
- 'bbstreamer_file.c',
- 'bbstreamer_gzip.c',
- 'bbstreamer_inject.c',
- 'bbstreamer_lz4.c',
- 'bbstreamer_tar.c',
- 'bbstreamer_zstd.c',
+ 'astreamer_file.c',
+ 'astreamer_gzip.c',
+ 'astreamer_inject.c',
+ 'astreamer_lz4.c',
+ 'astreamer_tar.c',
+ 'astreamer_zstd.c',
'receivelog.c',
'streamutil.c',
'walmethods.c',
diff --git a/src/bin/pg_basebackup/nls.mk b/src/bin/pg_basebackup/nls.mk
index 384dbb021e9..950b9797b1e 100644
--- a/src/bin/pg_basebackup/nls.mk
+++ b/src/bin/pg_basebackup/nls.mk
@@ -1,12 +1,12 @@
# src/bin/pg_basebackup/nls.mk
CATALOG_NAME = pg_basebackup
GETTEXT_FILES = $(FRONTEND_COMMON_GETTEXT_FILES) \
- bbstreamer_file.c \
- bbstreamer_gzip.c \
- bbstreamer_inject.c \
- bbstreamer_lz4.c \
- bbstreamer_tar.c \
- bbstreamer_zstd.c \
+ astreamer_file.c \
+ astreamer_gzip.c \
+ astreamer_inject.c \
+ astreamer_lz4.c \
+ astreamer_tar.c \
+ astreamer_zstd.c \
pg_basebackup.c \
pg_createsubscriber.c \
pg_receivewal.c \
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 8f3dd04fd22..4179b064cbc 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -26,8 +26,8 @@
#endif
#include "access/xlog_internal.h"
+#include "astreamer.h"
#include "backup/basebackup.h"
-#include "bbstreamer.h"
#include "common/compression.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
@@ -57,8 +57,8 @@ typedef struct ArchiveStreamState
{
int tablespacenum;
pg_compress_specification *compress;
- bbstreamer *streamer;
- bbstreamer *manifest_inject_streamer;
+ astreamer *streamer;
+ astreamer *manifest_inject_streamer;
PQExpBuffer manifest_buffer;
char manifest_filename[MAXPGPATH];
FILE *manifest_file;
@@ -67,7 +67,7 @@ typedef struct ArchiveStreamState
typedef struct WriteTarState
{
int tablespacenum;
- bbstreamer *streamer;
+ astreamer *streamer;
} WriteTarState;
typedef struct WriteManifestState
@@ -199,8 +199,8 @@ static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *fo
static void progress_update_filename(const char *filename);
static void progress_report(int tablespacenum, bool force, bool finished);
-static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
- bbstreamer **manifest_inject_streamer_p,
+static astreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
+ astreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
bool expect_unterminated_tarfile,
pg_compress_specification *compress);
@@ -1053,19 +1053,19 @@ ReceiveCopyData(PGconn *conn, WriteDataCallback callback,
* the options selected by the user. We may just write the results directly
* to a file, or we might compress first, or we might extract the tar file
* and write each member separately. This function doesn't do any of that
- * directly, but it works out what kind of bbstreamer we need to create so
+ * directly, but it works out what kind of astreamer we need to create so
* that the right stuff happens when, down the road, we actually receive
* the data.
*/
-static bbstreamer *
+static astreamer *
CreateBackupStreamer(char *archive_name, char *spclocation,
- bbstreamer **manifest_inject_streamer_p,
+ astreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
bool expect_unterminated_tarfile,
pg_compress_specification *compress)
{
- bbstreamer *streamer = NULL;
- bbstreamer *manifest_inject_streamer = NULL;
+ astreamer *streamer = NULL;
+ astreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool is_tar,
is_tar_gz,
@@ -1160,7 +1160,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
directory = psprintf("%s/%s", basedir, spclocation);
else
directory = get_tablespace_mapping(spclocation);
- streamer = bbstreamer_extractor_new(directory,
+ streamer = astreamer_extractor_new(directory,
get_tablespace_mapping,
progress_update_filename);
}
@@ -1188,27 +1188,27 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
if (compress->algorithm == PG_COMPRESSION_NONE)
- streamer = bbstreamer_plain_writer_new(archive_filename,
+ streamer = astreamer_plain_writer_new(archive_filename,
archive_file);
else if (compress->algorithm == PG_COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
- streamer = bbstreamer_gzip_writer_new(archive_filename,
+ streamer = astreamer_gzip_writer_new(archive_filename,
archive_file, compress);
}
else if (compress->algorithm == PG_COMPRESSION_LZ4)
{
strlcat(archive_filename, ".lz4", sizeof(archive_filename));
- streamer = bbstreamer_plain_writer_new(archive_filename,
+ streamer = astreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_lz4_compressor_new(streamer, compress);
+ streamer = astreamer_lz4_compressor_new(streamer, compress);
}
else if (compress->algorithm == PG_COMPRESSION_ZSTD)
{
strlcat(archive_filename, ".zst", sizeof(archive_filename));
- streamer = bbstreamer_plain_writer_new(archive_filename,
+ streamer = astreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_zstd_compressor_new(streamer, compress);
+ streamer = astreamer_zstd_compressor_new(streamer, compress);
}
else
{
@@ -1222,7 +1222,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* into it.
*/
if (must_parse_archive)
- streamer = bbstreamer_tar_archiver_new(streamer);
+ streamer = astreamer_tar_archiver_new(streamer);
progress_update_filename(archive_filename);
}
@@ -1241,7 +1241,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
if (spclocation == NULL && writerecoveryconf)
{
Assert(must_parse_archive);
- streamer = bbstreamer_recovery_injector_new(streamer,
+ streamer = astreamer_recovery_injector_new(streamer,
is_recovery_guc_supported,
recoveryconfcontents);
}
@@ -1253,9 +1253,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* we're talking to such a server we'll need to add the terminator here.
*/
if (must_parse_archive)
- streamer = bbstreamer_tar_parser_new(streamer);
+ streamer = astreamer_tar_parser_new(streamer);
else if (expect_unterminated_tarfile)
- streamer = bbstreamer_tar_terminator_new(streamer);
+ streamer = astreamer_tar_terminator_new(streamer);
/*
* If the user has requested a server compressed archive along with
@@ -1264,11 +1264,11 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
if (format == 'p')
{
if (is_tar_gz)
- streamer = bbstreamer_gzip_decompressor_new(streamer);
+ streamer = astreamer_gzip_decompressor_new(streamer);
else if (is_tar_lz4)
- streamer = bbstreamer_lz4_decompressor_new(streamer);
+ streamer = astreamer_lz4_decompressor_new(streamer);
else if (is_tar_zstd)
- streamer = bbstreamer_zstd_decompressor_new(streamer);
+ streamer = astreamer_zstd_decompressor_new(streamer);
}
/* Return the results. */
@@ -1307,7 +1307,7 @@ ReceiveArchiveStream(PGconn *conn, pg_compress_specification *compress)
if (state.manifest_inject_streamer != NULL &&
state.manifest_buffer != NULL)
{
- bbstreamer_inject_file(state.manifest_inject_streamer,
+ astreamer_inject_file(state.manifest_inject_streamer,
"backup_manifest",
state.manifest_buffer->data,
state.manifest_buffer->len);
@@ -1318,8 +1318,8 @@ ReceiveArchiveStream(PGconn *conn, pg_compress_specification *compress)
/* If there's still an archive in progress, end processing. */
if (state.streamer != NULL)
{
- bbstreamer_finalize(state.streamer);
- bbstreamer_free(state.streamer);
+ astreamer_finalize(state.streamer);
+ astreamer_free(state.streamer);
state.streamer = NULL;
}
}
@@ -1383,8 +1383,8 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
/* End processing of any prior archive. */
if (state->streamer != NULL)
{
- bbstreamer_finalize(state->streamer);
- bbstreamer_free(state->streamer);
+ astreamer_finalize(state->streamer);
+ astreamer_free(state->streamer);
state->streamer = NULL;
}
@@ -1437,8 +1437,8 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
else if (state->streamer != NULL)
{
/* Archive data. */
- bbstreamer_content(state->streamer, NULL, copybuf + 1,
- r - 1, BBSTREAMER_UNKNOWN);
+ astreamer_content(state->streamer, NULL, copybuf + 1,
+ r - 1, ASTREAMER_UNKNOWN);
}
else
pg_fatal("unexpected payload data");
@@ -1600,7 +1600,7 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
bool tablespacenum, pg_compress_specification *compress)
{
WriteTarState state;
- bbstreamer *manifest_inject_streamer;
+ astreamer *manifest_inject_streamer;
bool is_recovery_guc_supported;
bool expect_unterminated_tarfile;
@@ -1636,7 +1636,7 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
pg_fatal("out of memory");
/* Inject it into the output tarfile. */
- bbstreamer_inject_file(manifest_inject_streamer, "backup_manifest",
+ astreamer_inject_file(manifest_inject_streamer, "backup_manifest",
buf.data, buf.len);
/* Free memory. */
@@ -1644,8 +1644,8 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
}
/* Cleanup. */
- bbstreamer_finalize(state.streamer);
- bbstreamer_free(state.streamer);
+ astreamer_finalize(state.streamer);
+ astreamer_free(state.streamer);
progress_report(tablespacenum, true, false);
@@ -1663,7 +1663,7 @@ ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
{
WriteTarState *state = callback_data;
- bbstreamer_content(state->streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);
+ astreamer_content(state->streamer, NULL, copybuf, r, ASTREAMER_UNKNOWN);
totaldone += r;
progress_report(state->tablespacenum, false, false);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8de9978ad8d..ba9e0200b3f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3311,19 +3311,19 @@ bbsink_shell
bbsink_state
bbsink_throttle
bbsink_zstd
-bbstreamer
-bbstreamer_archive_context
-bbstreamer_extractor
-bbstreamer_gzip_decompressor
-bbstreamer_gzip_writer
-bbstreamer_lz4_frame
-bbstreamer_member
-bbstreamer_ops
-bbstreamer_plain_writer
-bbstreamer_recovery_injector
-bbstreamer_tar_archiver
-bbstreamer_tar_parser
-bbstreamer_zstd_frame
+astreamer
+astreamer_archive_context
+astreamer_extractor
+astreamer_gzip_decompressor
+astreamer_gzip_writer
+astreamer_lz4_frame
+astreamer_member
+astreamer_ops
+astreamer_plain_writer
+astreamer_recovery_injector
+astreamer_tar_archiver
+astreamer_tar_parser
+astreamer_zstd_frame
bgworker_main_type
bh_node_type
binaryheap
--
2.18.0
On Fri, Aug 2, 2024 at 7:43 AM Amul Sul <sulamul@gmail.com> wrote:
Please consider the attached version for the review.
Thanks. I committed 0001-0003. The only thing that I changed was that
in 0001, you forgot to pgindent, which actually mattered quite a bit,
because astreamer is one character shorter than bbstreamer.
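To illustrate what that looks like (a schematic example rather than lines
copied from the committed patch), pgindent aligns wrapped arguments under
the opening parenthesis, so the shorter prefix shifts every continuation
line one column to the left:

    /* before the rename: wrapped arguments align under the '(' */
    static inline void
    bbstreamer_content(bbstreamer *streamer, bbstreamer_member *member,
                       const char *data, int len,
                       bbstreamer_archive_context context);

    /* after the rename, pgindent pulls each wrapped line in by one column */
    static inline void
    astreamer_content(astreamer *streamer, astreamer_member *member,
                      const char *data, int len,
                      astreamer_archive_context context);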
Before we proceed with the rest of this patch series, I think we
should fix up the comments for some of the astreamer files. Proposed
patch for that attached; please review.
I also noticed that cfbot was unhappy about this patch set:
[10:37:55.075] pg_verifybackup.c:100:7: error: no previous extern
declaration for non-static variable 'format'
[-Werror,-Wmissing-variable-declarations]
[10:37:55.075] char format = '\0'; /* p(lain)/t(ar) */
[10:37:55.075] ^
[10:37:55.075] pg_verifybackup.c:100:1: note: declare 'static' if the
variable is not intended to be used outside of this translation unit
[10:37:55.075] char format = '\0'; /* p(lain)/t(ar) */
[10:37:55.075] ^
[10:37:55.075] pg_verifybackup.c:101:23: error: no previous extern
declaration for non-static variable 'compress_algorithm'
[-Werror,-Wmissing-variable-declarations]
[10:37:55.075] pg_compress_algorithm compress_algorithm = PG_COMPRESSION_NONE;
[10:37:55.075] ^
[10:37:55.075] pg_verifybackup.c:101:1: note: declare 'static' if the
variable is not intended to be used outside of this translation unit
[10:37:55.075] pg_compress_algorithm compress_algorithm = PG_COMPRESSION_NONE;
[10:37:55.075] ^
[10:37:55.075] 2 errors generated.
Please fix and, after posting future versions of the patch set, try to
remember to check http://cfbot.cputube.org/amul-sul.html
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v6-0001-Improve-file-header-comments-for-astramer-code.patch (application/octet-stream)
From 2ae018686a1e1b9b2ada8735e4e35f214b8a75bd Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 5 Aug 2024 12:45:58 -0400
Subject: [PATCH v6] Improve file header comments for astramer code.
Make it clear that "astreamer" stands for "archive streamer".
Generalize comments that still believe this code can only be used
by pg_basebackup. Add some comments explaining the asymmetry
between the gzip, lz4, and zstd astreamers, in the hopes of making
life easier for anyone who hacks on this code in the future.
---
src/fe_utils/astreamer_file.c | 4 ++++
src/fe_utils/astreamer_gzip.c | 15 +++++++++++++++
src/fe_utils/astreamer_lz4.c | 4 ++++
src/fe_utils/astreamer_zstd.c | 4 ++++
src/include/fe_utils/astreamer.h | 21 +++++++++++++++------
5 files changed, 42 insertions(+), 6 deletions(-)
diff --git a/src/fe_utils/astreamer_file.c b/src/fe_utils/astreamer_file.c
index 13d1192c6e..e75d166ebf 100644
--- a/src/fe_utils/astreamer_file.c
+++ b/src/fe_utils/astreamer_file.c
@@ -2,6 +2,10 @@
*
* astreamer_file.c
*
+ * Archive streamers that write to files. astreamer_plan_writer writes
+ * the whole archive to a single file, and astreamer_extractor writes
+ * each archive member to a separate file in a given directory.
+ *
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
diff --git a/src/fe_utils/astreamer_gzip.c b/src/fe_utils/astreamer_gzip.c
index dd28defac7..1c773a2384 100644
--- a/src/fe_utils/astreamer_gzip.c
+++ b/src/fe_utils/astreamer_gzip.c
@@ -2,6 +2,21 @@
*
* astreamer_gzip.c
*
+ * Archive streamers that deal with data compressed using gzip.
+ * astreamer_gzip_writer applies gzip compression to the input data
+ * and writes the result to a file. astreamer_gzip_decompressor assumes
+ * that the input stream is compressed using gzip and decompresses it.
+ *
+ * Note that the code in this file is asymmetric with what we do for
+ * other compression types: for lz4 and zstd, there is a compressor and
+ * a decompressor, rather than a writer and a decompressor. The approach
+ * taken here is less flexible, because a writer can only write to a file,
+ * while a compressor can write to a subsequent astreamer which is free
+ * to do whatever it likes. The reason it's like this is because this
+ * code was adapated from old, less-modular pg_basebackup that used the
+ * same APIs that astreamer_gzip_writer uses, and it didn't seem
+ * necessary to change anything at the time.
+ *
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
diff --git a/src/fe_utils/astreamer_lz4.c b/src/fe_utils/astreamer_lz4.c
index d8b2a367e4..2bf14084e7 100644
--- a/src/fe_utils/astreamer_lz4.c
+++ b/src/fe_utils/astreamer_lz4.c
@@ -2,6 +2,10 @@
*
* astreamer_lz4.c
*
+ * Archive streamers that deal with data compressed using lz4.
+ * astreamer_lz4_compressor applies lz4 compression to the input stream,
+ * and astreamer_lz4_decompressor does the reverse.
+ *
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
diff --git a/src/fe_utils/astreamer_zstd.c b/src/fe_utils/astreamer_zstd.c
index 45f6cb6736..4b2d42b231 100644
--- a/src/fe_utils/astreamer_zstd.c
+++ b/src/fe_utils/astreamer_zstd.c
@@ -2,6 +2,10 @@
*
* astreamer_zstd.c
*
+ * Archive streamers that deal with data compressed using zstd.
+ * astreamer_zstd_compressor applies lz4 compression to the input stream,
+ * and astreamer_zstd_decompressor does the reverse.
+ *
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
diff --git a/src/include/fe_utils/astreamer.h b/src/include/fe_utils/astreamer.h
index 2c014dbddb..570cfba304 100644
--- a/src/include/fe_utils/astreamer.h
+++ b/src/include/fe_utils/astreamer.h
@@ -2,9 +2,18 @@
*
* astreamer.h
*
- * Each tar archive returned by the server is passed to one or more
- * astreamer objects for further processing. The astreamer may do
- * something simple, like write the archive to a file, perhaps after
+ * The "archive streamer" interface is intended to allow frontend code
+ * to stream from possibly-compressed archive files from any source and
+ * perform arbitrary actions based on the contents of those archives.
+ * Archive streamers are intended to be composable, and most tasks will
+ * require two or more archive streamers to complete. For instance,
+ * if the input is an uncompressed tar stream, a tar parser astreamer
+ * could be used to interpret it, and then an extractor astreamer could
+ * be used to write each archive member out to a file.
+ *
+ * In general, each archive streamer is relatively free to take whatever
+ * action it desires in the stream of chunks provided by the caller. It
+ * may do something simple, like write the archive to a file, perhaps after
* compressing it, but it can also do more complicated things, like
* annotating the byte stream to indicate which parts of the data
* correspond to tar headers or trailing padding, vs. which parts are
@@ -33,9 +42,9 @@ typedef struct astreamer_ops astreamer_ops;
/*
* Each chunk of archive data passed to a astreamer is classified into one
- * of these categories. When data is first received from the remote server,
- * each chunk will be categorized as ASTREAMER_UNKNOWN, and the chunks will
- * be of whatever size the remote server chose to send.
+ * of these categories. When data is initially passed to an archive streamer,
+ * each chunk will be categorized as ASTREAMER_UNKNOWN, and the chunks can
+ * be of whatever size the caller finds convenient.
*
* If the archive is parsed (e.g. see astreamer_tar_parser_new()), then all
* chunks should be labelled as one of the other types listed here. In
--
2.39.3 (Apple Git-145)
On Mon, Aug 5, 2024 at 10:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Aug 2, 2024 at 7:43 AM Amul Sul <sulamul@gmail.com> wrote:
Please consider the attached version for the review.
Thanks. I committed 0001-0003. The only thing that I changed was that
in 0001, you forgot to pgindent, which actually mattered quite a bit,
because astreamer is one character shorter than bbstreamer.
Understood. Thanks for tidying up and committing the patches.
Before we proceed with the rest of this patch series, I think we
should fix up the comments for some of the astreamer files. Proposed
patch for that attached; please review.
Looks good to me, except for the following typo that I have fixed in
the attached version:
s/astreamer_plan_writer/astreamer_plain_writer/
I also noticed that cfbot was unhappy about this patch set:
[10:37:55.075] pg_verifybackup.c:100:7: error: no previous extern
declaration for non-static variable 'format'
[-Werror,-Wmissing-variable-declarations]
[10:37:55.075] char format = '\0'; /* p(lain)/t(ar) */
[10:37:55.075] ^
[10:37:55.075] pg_verifybackup.c:100:1: note: declare 'static' if the
variable is not intended to be used outside of this translation unit
[10:37:55.075] char format = '\0'; /* p(lain)/t(ar) */
[10:37:55.075] ^
[10:37:55.075] pg_verifybackup.c:101:23: error: no previous extern
declaration for non-static variable 'compress_algorithm'
[-Werror,-Wmissing-variable-declarations]
[10:37:55.075] pg_compress_algorithm compress_algorithm = PG_COMPRESSION_NONE;
[10:37:55.075] ^
[10:37:55.075] pg_verifybackup.c:101:1: note: declare 'static' if the
variable is not intended to be used outside of this translation unit
[10:37:55.075] pg_compress_algorithm compress_algorithm = PG_COMPRESSION_NONE;
[10:37:55.075] ^
[10:37:55.075] 2 errors generated.
Fixed in the attached version.
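For reference, the change is just to give those file-scope option
variables internal linkage, as in the attached v7-0010 patch:

    /* pg_verifybackup.c: option variables are only used in this file,
     * so declare them static to satisfy -Wmissing-variable-declarations. */
    static char format = '\0';  /* p(lain)/t(ar) */
    static pg_compress_algorithm compress_algorithm = PG_COMPRESSION_NONE;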
Please fix and, after posting future versions of the patch set, try to
remember to check http://cfbot.cputube.org/amul-sul.html
Sure. I used to rely on that earlier, but after enabling Cirrus CI on my
GitHub repo, I assumed its workflow would be the same as cfbot's and
started overlooking it. However, cfbot reported a warning that didn't
appear in my GitHub run. From now on, I'll make sure to check cfbot as
well.
Regards,
Amul
Attachments:
v7-0010-pg_verifybackup-Add-backup-format-and-compression.patch (application/x-patch)
From 975b994d8ca76f3c16b7eb9119e45ec25dba410b Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 10:26:35 +0530
Subject: [PATCH v7 10/12] pg_verifybackup: Add backup format and compression
option
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the next patch that implements tar format support.
----
---
src/bin/pg_verifybackup/pg_verifybackup.c | 146 +++++++++++++++++++++-
1 file changed, 144 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 801e13886c2..f20b6e2895c 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,6 +18,7 @@
#include <sys/stat.h>
#include <time.h>
+#include "common/compression.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -56,6 +57,8 @@ static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3) pg_attribute_noreturn();
+static char find_backup_format(verifier_context *context);
+static pg_compress_algorithm find_backup_compression(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
@@ -74,6 +77,9 @@ static void usage(void);
static const char *progname;
+static char format = '\0'; /* p(lain)/t(ar) */
+static pg_compress_algorithm compress_algorithm = PG_COMPRESSION_NONE;
+
/*
* Main entry point.
*/
@@ -84,11 +90,13 @@ main(int argc, char **argv)
{"exit-on-error", no_argument, NULL, 'e'},
{"ignore", required_argument, NULL, 'i'},
{"manifest-path", required_argument, NULL, 'm'},
+ {"format", required_argument, NULL, 'F'},
{"no-parse-wal", no_argument, NULL, 'n'},
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
{"skip-checksums", no_argument, NULL, 's'},
{"wal-directory", required_argument, NULL, 'w'},
+ {"compress", required_argument, NULL, 'Z'},
{NULL, 0, NULL, 0}
};
@@ -99,6 +107,7 @@ main(int argc, char **argv)
bool quiet = false;
char *wal_directory = NULL;
char *pg_waldump_path = NULL;
+ bool tar_compression_specified = false;
pg_logging_init(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verifybackup"));
@@ -141,7 +150,7 @@ main(int argc, char **argv)
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
- while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1)
+ while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:Z:", long_options, NULL)) != -1)
{
switch (c)
{
@@ -160,6 +169,15 @@ main(int argc, char **argv)
manifest_path = pstrdup(optarg);
canonicalize_path(manifest_path);
break;
+ case 'F':
+ if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
+ format = 'p';
+ else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
+ format = 't';
+ else
+ pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"",
+ optarg);
+ break;
case 'n':
no_parse_wal = true;
break;
@@ -176,6 +194,12 @@ main(int argc, char **argv)
wal_directory = pstrdup(optarg);
canonicalize_path(wal_directory);
break;
+ case 'Z':
+ if (!parse_compress_algorithm(optarg, &compress_algorithm))
+ pg_fatal("unrecognized compression algorithm: \"%s\"",
+ optarg);
+ tar_compression_specified = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -207,11 +231,41 @@ main(int argc, char **argv)
pg_fatal("cannot specify both %s and %s",
"-P/--progress", "-q/--quiet");
+ /* Complain if compression method specified but the format isn't tar. */
+ if (format != 't' && tar_compression_specified)
+ {
+ pg_log_error("only tar mode backups can be compressed");
+ pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+ exit(1);
+ }
+
+ /* Determine the backup format if it hasn't been specified. */
+ if (format == '\0')
+ format = find_backup_format(&context);
+
+ /*
+ * Determine the tar backup compression method if it hasn't been
+ * specified.
+ */
+ if (format == 't' && !tar_compression_specified)
+ compress_algorithm = find_backup_compression(&context);
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
int ret;
+ /*
+ * XXX: In the future, we should consider enhancing pg_waldump to read
+ * WAL files from the tar archive.
+ */
+ if (format != 'p')
+ {
+ pg_log_error("pg_waldump does not support parsing WAL files from a tar archive.");
+ pg_log_error_hint("Try \"%s --help\" to skip parse WAL files option.", progname);
+ exit(1);
+ }
+
pg_waldump_path = pg_malloc(MAXPGPATH);
ret = find_other_exec(argv[0], "pg_waldump",
"pg_waldump (PostgreSQL) " PG_VERSION "\n",
@@ -279,7 +333,15 @@ main(int argc, char **argv)
*/
if (!context.skip_checksums)
{
- verify_backup_checksums(&context);
+ /*
+ * Only plain backups are checked here. For a tar backup, checksum
+ * verification (if requested) is performed immediately while each
+ * file's contents are being read, since we don't have random access
+ * to the files like we do with plain backups.
+ */
+ if (format == 'p')
+ verify_backup_checksums(&context);
+
progress_report(&context, true);
}
@@ -972,6 +1034,84 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
return false;
}
+/*
+ * Detect the backup format by checking for the PG_VERSION file in the backup
+ * directory. If it is found, the backup is considered a plain-format backup;
+ * otherwise, it is assumed to be a tar-format backup.
+ */
+static char
+find_backup_format(verifier_context *context)
+{
+ char result;
+ char *path;
+ struct stat sb;
+
+ /* Should be here only if the backup format is unknown */
+ Assert(format == '\0');
+
+ /* Check PG_VERSION file. */
+ path = psprintf("%s/%s", context->backup_directory, "PG_VERSION");
+ result = (stat(path, &sb) == 0) ? 'p' : 't';
+ pfree(path);
+
+ return result;
+}
+
+/*
+ * To determine the compression format, we will search for the main data
+ * directory archive and its extension, which starts with base.tar, as
+ * pg_basebackup writes the main data directory to an archive file named
+ * base.tar followed by a compression type extension like .gz, .lz4, or .zst.
+ */
+static pg_compress_algorithm
+find_backup_compression(verifier_context *context)
+{
+ char *path;
+ struct stat sb;
+ bool found;
+
+ /* Should be here only for tar backup */
+ Assert(format == 't');
+
+ /*
+ * Is this a tar archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_NONE;
+
+ /*
+ * Is this a .tar.gz archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.gz");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_GZIP;
+
+ /*
+ * Is this a .tar.lz4 archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.lz4");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_LZ4;
+
+ /*
+ * Is this a .tar.zst archive?
+ */
+ path = psprintf("%s/%s", context->backup_directory, "base.tar.zst");
+ found = (stat(path, &sb) == 0);
+ pfree(path);
+ if (found)
+ return PG_COMPRESSION_ZSTD;
+
+ return PG_COMPRESSION_NONE; /* placate compiler */
+}
+
/*
* Print a progress report based on the variables in verifier_context.
*
@@ -1054,11 +1194,13 @@ usage(void)
printf(_(" -e, --exit-on-error exit immediately on error\n"));
printf(_(" -i, --ignore=RELATIVE_PATH ignore indicated path\n"));
printf(_(" -m, --manifest-path=PATH use specified path for manifest\n"));
+ printf(_(" -F, --format=p|t backup format (plain, tar)\n"));
printf(_(" -n, --no-parse-wal do not try to parse WAL files\n"));
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -s, --skip-checksums skip checksum verification\n"));
printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n"));
+ printf(_(" -Z, --compress=METHOD compress method (gzip, lz4, zstd, none) \n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
--
2.18.0
Attachment: v7-0012-pg_verifybackup-Tests-and-document.patch (application/x-patch)
From c8ca272edb9cc433f7eb680876f5947b98fe010f Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 17:04:56 +0530
Subject: [PATCH v7 12/12] pg_verifybackup: Tests and document
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the previous patch that implements tar format support.
----
---
doc/src/sgml/ref/pg_verifybackup.sgml | 54 +++++++++++++-
src/bin/pg_verifybackup/t/001_basic.pl | 18 ++++-
src/bin/pg_verifybackup/t/004_options.pl | 2 +-
src/bin/pg_verifybackup/t/005_bad_manifest.pl | 2 +-
src/bin/pg_verifybackup/t/008_untar.pl | 73 ++++++-------------
src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +-----------
6 files changed, 96 insertions(+), 101 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index a3f167f9f6e..c743bd89a92 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -34,8 +34,10 @@ PostgreSQL documentation
integrity of a database cluster backup taken using
<command>pg_basebackup</command> against a
<literal>backup_manifest</literal> generated by the server at the time
- of the backup. The backup must be stored in the "plain"
- format; a "tar" format backup can be checked after extracting it.
+ of the backup. The backup must be stored in the "plain" or "tar"
+ format. Verification is supported for <literal>gzip</literal>,
+ <literal>lz4</literal>, and <literal>zstd</literal> compressed tar backups;
+ backups compressed with any other method can be checked after decompressing them.
</para>
<para>
@@ -168,6 +170,42 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-F <replaceable class="parameter">format</replaceable></option></term>
+ <term><option>--format=<replaceable class="parameter">format</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the format of the backup. <replaceable>format</replaceable>
+ can be one of the following:
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>p</literal></term>
+ <term><literal>plain</literal></term>
+ <listitem>
+ <para>
+ Backup consists of plain files with the same layout as the
+ source server's data directory and tablespaces.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>t</literal></term>
+ <term><literal>tar</literal></term>
+ <listitem>
+ <para>
+ Backup consists of tar files. The main data directory's contents
+ will be written to a file named <filename>base.tar</filename>,
+ and each other tablespace will be written to a separate tar file
+ named after that tablespace's OID.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist></para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n</option></term>
<term><option>--no-parse-wal</option></term>
@@ -227,6 +265,18 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><option>-Z <replaceable class="parameter">method</replaceable></option></term>
+ <term><option>--compress=<replaceable class="parameter">method</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the compression method of the tar backup. It can be
+ <literal>gzip</literal>, <literal>lz4</literal>, <literal>zstd</literal>,
+ or <literal>none</literal> for an uncompressed tar backup.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
diff --git a/src/bin/pg_verifybackup/t/001_basic.pl b/src/bin/pg_verifybackup/t/001_basic.pl
index 2f3e52d296f..d47ce1f04fc 100644
--- a/src/bin/pg_verifybackup/t/001_basic.pl
+++ b/src/bin/pg_verifybackup/t/001_basic.pl
@@ -17,13 +17,25 @@ command_fails_like(
qr/no backup directory specified/,
'target directory must be specified');
command_fails_like(
- [ 'pg_verifybackup', $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir ],
qr/could not open file.*\/backup_manifest\"/,
'pg_verifybackup requires a manifest');
command_fails_like(
- [ 'pg_verifybackup', $tempdir, $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir, $tempdir ],
qr/too many command-line arguments/,
'multiple target directories not allowed');
+command_fails_like(
+ [ 'pg_verifybackup', '-Zgzip', $tempdir ],
+ qr/only tar mode backups can be compressed/,
+ 'compression option requires tar format');
+command_fails_like(
+ [ 'pg_verifybackup', '-Fp', '-Zlz4', $tempdir ],
+ qr/only tar mode backups can be compressed/,
+ 'compression option not allowed with plain format');
+command_fails_like(
+ [ 'pg_verifybackup', '-Fp', '-Znon_exist', $tempdir ],
+ qr/unrecognized compression algorithm/,
+ 'compression method should be valid');
# create fake manifest file
open(my $fh, '>', "$tempdir/backup_manifest") || die "open: $!";
@@ -31,7 +43,7 @@ close($fh);
# but then try to use an alternate, nonexisting manifest
command_fails_like(
- [ 'pg_verifybackup', '-m', "$tempdir/not_the_manifest", $tempdir ],
+ [ 'pg_verifybackup', '-Fp', '-m', "$tempdir/not_the_manifest", $tempdir ],
qr/could not open file.*\/not_the_manifest\"/,
'pg_verifybackup respects -m flag');
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 8ed2214408e..2f197648740 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -108,7 +108,7 @@ unlike(
# Test valid manifest with nonexistent backup directory.
command_fails_like(
[
- 'pg_verifybackup', '-m',
+ 'pg_verifybackup', '-Fp', '-m',
"$backup_path/backup_manifest", "$backup_path/fake"
],
qr/could not open directory/,
diff --git a/src/bin/pg_verifybackup/t/005_bad_manifest.pl b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
index c4ed64b62d5..28c51b6feb0 100644
--- a/src/bin/pg_verifybackup/t/005_bad_manifest.pl
+++ b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
@@ -208,7 +208,7 @@ sub test_bad_manifest
print $fh $manifest_contents;
close($fh);
- command_fails_like([ 'pg_verifybackup', $tempdir ], $regexp, $test_name);
+ command_fails_like([ 'pg_verifybackup', '-Fp', $tempdir ], $regexp, $test_name);
return;
}
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 7a09f3b75b2..9896560adc3 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -16,6 +16,20 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
+# Create a directory to use as a tablespace.
+my $TS1_LOCATION = $primary->backup_dir .'/ts1';
+$TS1_LOCATION =~ s/\/\.\//\//g; # collapse foo/./bar to foo/bar
+mkdir($TS1_LOCATION);
+
+# Create a tablespace with a table in it.
+$primary->safe_psql('postgres', qq(
+ CREATE TABLESPACE regress_ts1 LOCATION '$TS1_LOCATION';
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1';
+ CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1;
+ INSERT INTO regress_tbl1 VALUES(generate_series(1,5));));
+my $tsoid = $primary->safe_psql('postgres', qq(
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
+
my $backup_path = $primary->backup_dir . '/server-backup';
my $extract_path = $primary->backup_dir . '/extracted-backup';
@@ -23,39 +37,31 @@ my @test_configuration = (
{
'compression_method' => 'none',
'backup_flags' => [],
- 'backup_archive' => 'base.tar',
+ 'backup_archive' => ['base.tar', "$tsoid.tar"],
'enabled' => 1
},
{
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'server-gzip' ],
- 'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'server-lz4' ],
- 'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => [ '-d', '-m' ],
+ 'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ],
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
});
@@ -86,47 +92,16 @@ for my $tc (@test_configuration)
my $backup_files = join(',',
sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
my $expected_backup_files =
- join(',', sort ('backup_manifest', $tc->{'backup_archive'}));
+ join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} }));
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path],
+ "verify backup, compression $method");
# Cleanup.
- unlink($backup_path . '/backup_manifest');
- unlink($backup_path . '/base.tar');
- unlink($backup_path . '/' . $tc->{'backup_archive'});
+ rmtree($backup_path);
rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 8c076d46dee..6b7d7483f6e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -29,41 +29,30 @@ my @test_configuration = (
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'client-gzip:5' ],
'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'client-lz4:5' ],
'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => ['-d'],
- 'output_file' => 'base.tar',
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:5' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'parallel zstd',
'backup_flags' => [ '--compress', 'client-zstd:workers=3' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1"),
'possibly_unsupported' =>
qr/could not set compression worker count to 3: Unsupported parameter/
@@ -118,40 +107,9 @@ for my $tc (@test_configuration)
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- push @decompress, $backup_path . '/' . $tc->{'output_file'}
- if $tc->{'output_file'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ],
+ "verify backup, compression $method");
# Cleanup.
rmtree($extract_path);
--
2.18.0
Attachment: v7-0011-pg_verifybackup-Read-tar-files-and-verify-its-con.patch (application/x-patch)
From c9415e97006e5765151d7fcd3990bcb0c4a05966 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 16:22:07 +0530
Subject: [PATCH v7 11/12] pg_verifybackup: Read tar files and verify its
contents
---
src/bin/pg_verifybackup/Makefile | 4 +-
src/bin/pg_verifybackup/astreamer_verify.c | 367 +++++++++++++++++++++
src/bin/pg_verifybackup/meson.build | 6 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 216 +++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 9 +
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 594 insertions(+), 9 deletions(-)
create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 7c045f142e8..df7aaabd530 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -17,11 +17,13 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
# We need libpq only because fe_utils does.
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
OBJS = \
$(WIN32RES) \
- pg_verifybackup.o
+ pg_verifybackup.o \
+ astreamer_verify.o
all: pg_verifybackup
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
new file mode 100644
index 00000000000..be40922c042
--- /dev/null
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -0,0 +1,367 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_verify.c
+ *
+ * Extend fe_utils/astreamer.h archive streaming facility to verify TAR
+ * backup.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * src/bin/pg_verifybackup/astreamer_verify.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "common/logging.h"
+#include "fe_utils/astreamer.h"
+#include "pg_verifybackup.h"
+
+typedef struct astreamer_verify
+{
+ astreamer base;
+ verifier_context *context;
+ char *archiveName;
+ Oid tblspcOid;
+
+ /* Hold information for a member file verification */
+ manifest_file *mfile;
+ int64 receivedBytes;
+ bool verifyChecksums;
+ bool verifyControlData;
+ pg_checksum_context *checksum_ctx;
+} astreamer_verify;
+
+static void astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_verify_finalize(astreamer *streamer);
+static void astreamer_verify_free(astreamer *streamer);
+
+static void verify_member_header(astreamer *streamer, astreamer_member *member);
+static void verify_member_contents(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void verify_content_checksum(astreamer *streamer,
+ astreamer_member *member,
+ const char *buffer, int buffer_len);
+static void verify_controldata(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void reset_member_info(astreamer *streamer);
+
+static const astreamer_ops astreamer_verify_ops = {
+ .content = astreamer_verify_content,
+ .finalize = astreamer_verify_finalize,
+ .free = astreamer_verify_free
+};
+
+/*
+ * Create an astreamer that verifies the contents of a TAR file.
+ */
+astreamer *
+astreamer_verify_content_new(astreamer *next, verifier_context *context,
+ char *archive_name, Oid tblspc_oid)
+{
+ astreamer_verify *streamer;
+
+ streamer = palloc0(sizeof(astreamer_verify));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_verify_ops;
+
+ streamer->base.bbs_next = next;
+ streamer->context = context;
+ streamer->archiveName = archive_name;
+ streamer->tblspcOid = tblspc_oid;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ return &streamer->base;
+}
+
+/*
+ * It verifies each TAR member entry against the manifest data and performs
+ * checksum verification if enabled. Additionally, it validates the backup's
+ * system identifier against the backup_manifest.
+ */
+static void
+astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member, const char *data,
+ int len, astreamer_archive_context context)
+{
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+
+ /*
+ * Perform the initial check and setup for the verification.
+ */
+ verify_member_header(streamer, member);
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Perform the required contents verification.
+ */
+ verify_member_contents(streamer, member, data, len);
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+
+ /*
+ * Reset the temporary information stored for a verification.
+ */
+ reset_member_info(streamer);
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar archive");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_verify stream.
+ */
+static void
+astreamer_verify_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_verify stream.
+ */
+static void
+astreamer_verify_free(astreamer *streamer)
+{
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+
+/*
+ * Verify the tar member against the backup manifest if it is a regular file.
+ * If the archive being processed is a tablespace archive, first rewrite the
+ * member path to the form used in the manifest. Finally, decide whether
+ * checksum and control data verification are needed while the contents are read.
+ */
+static void
+verify_member_header(astreamer *streamer, astreamer_member *member)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_file *m;
+
+ /* We are only interested in files that are not in the ignore list. */
+ if (member->is_directory || member->is_link ||
+ should_ignore_relpath(mystreamer->context, member->pathname))
+ return;
+
+ /*
+ * The backup_manifest stores paths relative to the base directory for
+ * files belonging to a tablespace, whereas <tablespaceoid>.tar doesn't.
+ * Prepare the required path; otherwise, the manifest entry verification
+ * will fail.
+ */
+ if (OidIsValid(mystreamer->tblspcOid))
+ {
+ char temp[MAXPGPATH];
+
+ /* Copy original name at temporary space */
+ memcpy(temp, member->pathname, MAXPGPATH);
+
+ snprintf(member->pathname, MAXPGPATH, "%s/%u/%s",
+ "pg_tblspc", mystreamer->tblspcOid, temp);
+ }
+
+ /* Check the manifest entry */
+ m = verify_manifest_entry(mystreamer->context, member->pathname,
+ member->size);
+ mystreamer->mfile = (void *) m;
+
+ /*
+ * Prepare for checksum and manifest system identifier verification.
+ *
+ * We could have these checks while receiving contents. However, since
+ * contents are received in multiple iterations, this would result in
+ * these lengthy checks being performed multiple times. Instead, having a
+ * single flag would be more efficient.
+ */
+ mystreamer->verifyChecksums =
+ (!mystreamer->context->skip_checksums && should_verify_checksum(m));
+ mystreamer->verifyControlData =
+ should_verify_control_data(mystreamer->context->manifest, m);
+}
+
+/*
+ * Process the member content according to the flags set by the member header
+ * processing routine for checksum and control data verification.
+ */
+static void
+verify_member_contents(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ /* Verify the checksums */
+ if (mystreamer->verifyChecksums)
+ verify_content_checksum(streamer, member, data, len);
+
+ /* Verify pg_control information */
+ if (mystreamer->verifyControlData)
+ verify_controldata(streamer, member, data, len);
+}
+
+/*
+ * Similar to verify_file_checksum(), but this function computes the checksum
+ * incrementally for the file content received in chunks. Unlike a plain
+ * backup directory, a TAR archive does not allow random access to its
+ * members, so the checksum has to be verified progressively as the content
+ * streams by.
+ *
+ * On the first call for a file, the function initializes checksum_ctx, which
+ * is then used for the incremental checksum calculation. Once the complete
+ * file content has been received (tracked using receivedBytes), the routine
+ * that performs the final checksum computation and verification against the
+ * manifest is called.
+ */
+static void
+verify_content_checksum(astreamer *streamer, astreamer_member *member,
+ const char *buffer, int buffer_len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ pg_checksum_context *checksum_ctx = mystreamer->checksum_ctx;
+ verifier_context *context = mystreamer->context;
+ manifest_file *m = mystreamer->mfile;
+ const char *relpath = m->pathname;
+ uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
+
+ /*
+ * Clear the flag now so that, if an error is reported below, the remaining
+ * chunks of this file are skipped; it is set again if more content is expected.
+ */
+ Assert(mystreamer->verifyChecksums);
+ mystreamer->verifyChecksums = false;
+
+ /* Should have come here for the right file */
+ Assert(strcmp(member->pathname, relpath) == 0);
+
+ /* If this is the first chunk of content for this file */
+ if (!checksum_ctx)
+ {
+ checksum_ctx = pg_malloc(sizeof(pg_checksum_context));
+ mystreamer->checksum_ctx = checksum_ctx;
+
+ if (pg_checksum_init(checksum_ctx, m->checksum_type) < 0)
+ {
+ report_backup_error(context,
+ "%s: could not initialize checksum of file \"%s\"",
+ mystreamer->archiveName, relpath);
+ return;
+ }
+ }
+
+ /* Update the total count of computed checksum bytes. */
+ mystreamer->receivedBytes += buffer_len;
+
+ if (pg_checksum_update(checksum_ctx, (uint8 *) buffer, buffer_len) < 0)
+ {
+ report_backup_error(context, "could not update checksum of file \"%s\"",
+ relpath);
+ return;
+ }
+
+ /* Report progress */
+ context->done_size += buffer_len;
+ progress_report(context, false);
+
+ /* Yet to receive the full content of the file. */
+ if (mystreamer->receivedBytes < m->size)
+ {
+ mystreamer->verifyChecksums = true;
+ return;
+ }
+
+ /* Do the final computation and verification. */
+ verify_checksum(context, m, checksum_ctx, checksumbuf);
+}
+
+/*
+ * Assemble the control file data from the received file contents, which are
+ * expected to be from the pg_control file, and compute its CRC. Then call the
+ * routine that performs the final verification of the control file information.
+ */
+static void
+verify_controldata(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_data *manifest = mystreamer->context->manifest;
+ ControlFileData control_file;
+ pg_crc32c crc;
+ bool crc_ok;
+
+ /* Should be here only for control file */
+ Assert(strcmp(member->pathname, "global/pg_control") == 0);
+ Assert(manifest->version != 1);
+
+ /* Mark it as false to avoid unexpected re-entrance */
+ Assert(mystreamer->verifyControlData);
+ mystreamer->verifyControlData = false;
+
+ /* Should have whole control file data. */
+ if (!astreamer_buffer_until(streamer, &data, &len, sizeof(ControlFileData)))
+ {
+ mystreamer->verifyControlData = true;
+ return;
+ }
+
+ pg_log_debug("%s: reading \"%s\"", mystreamer->archiveName,
+ member->pathname);
+
+ if (streamer->bbs_buffer.len != sizeof(ControlFileData))
+ report_fatal_error("%s: could not read control file: read %d of %zu",
+ mystreamer->archiveName, streamer->bbs_buffer.len,
+ sizeof(ControlFileData));
+
+ memcpy(&control_file, streamer->bbs_buffer.data,
+ sizeof(ControlFileData));
+
+ /* Check the CRC. */
+ INIT_CRC32C(crc);
+ COMP_CRC32C(crc,
+ (char *) (&control_file),
+ offsetof(ControlFileData, crc));
+ FIN_CRC32C(crc);
+
+ crc_ok = EQ_CRC32C(crc, control_file.crc);
+
+ /* Do the final control data verification. */
+ verify_control_data(&control_file, member->pathname, crc_ok,
+ manifest->system_identifier);
+}
+
+/*
+ * Reset flags and free memory allocations for member file verification.
+ */
+static void
+reset_member_info(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ mystreamer->mfile = NULL;
+ mystreamer->receivedBytes = 0;
+ mystreamer->verifyChecksums = false;
+ mystreamer->verifyControlData = false;
+
+ if (mystreamer->checksum_ctx)
+ pfree(mystreamer->checksum_ctx);
+ mystreamer->checksum_ctx = NULL;
+}
diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build
index 7c7d31a0350..1e3fcf7ee5a 100644
--- a/src/bin/pg_verifybackup/meson.build
+++ b/src/bin/pg_verifybackup/meson.build
@@ -1,7 +1,8 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
pg_verifybackup_sources = files(
- 'pg_verifybackup.c'
+ 'pg_verifybackup.c',
+ 'astreamer_verify.c'
)
if host_system == 'windows'
@@ -10,9 +11,10 @@ if host_system == 'windows'
'--FILEDESC', 'pg_verifybackup - verify a backup against using a backup manifest'])
endif
+pg_verifybackup_deps = [frontend_code, libpq, lz4, zlib, zstd]
pg_verifybackup = executable('pg_verifybackup',
pg_verifybackup_sources,
- dependencies: [frontend_code, libpq],
+ dependencies: pg_verifybackup_deps,
kwargs: default_bin_args,
)
bin_targets += pg_verifybackup
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index f20b6e2895c..ce2ba7437be 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -20,9 +20,12 @@
#include "common/compression.h"
#include "common/parse_manifest.h"
+#include "common/relpath.h"
+#include "fe_utils/astreamer.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
#include "pg_verifybackup.h"
+#include "pgtar.h"
#include "pgtime.h"
/*
@@ -39,6 +42,11 @@
*/
#define ESTIMATED_BYTES_PER_MANIFEST_LINE 100
+
+static void (*verify_backup_file_cb) (verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -63,6 +71,15 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
char *relpath, char *fullpath);
+static void verify_plain_file_cb(verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+static void verify_tar_file_cb(verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+static void verify_tar_content(verifier_context *context,
+ char *relpath, char *fullpath,
+ astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -71,6 +88,9 @@ static void verify_file_checksum(verifier_context *context,
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
+static astreamer *create_archive_verifier(verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
static void compute_total_size(verifier_context *context);
static void usage(void);
@@ -146,6 +166,10 @@ main(int argc, char **argv)
*/
simple_string_list_append(&context.ignore_list, "backup_manifest");
simple_string_list_append(&context.ignore_list, "pg_wal");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.gz");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.lz4");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.zst");
simple_string_list_append(&context.ignore_list, "postgresql.auto.conf");
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
@@ -250,6 +274,15 @@ main(int argc, char **argv)
if (format == 't' && !tar_compression_specified)
compress_algorithm = find_backup_compression(&context);
+ /*
+ * Set up the required callback function to verify plain or tar backup
+ * files.
+ */
+ if (format == 'p')
+ verify_backup_file_cb = verify_plain_file_cb;
+ else
+ verify_backup_file_cb = verify_tar_file_cb;
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
@@ -645,7 +678,8 @@ verify_backup_directory(verifier_context *context, char *relpath,
}
/*
- * Verify one file (which might actually be a directory or a symlink).
+ * Verify one file (which might actually be a directory, a symlink, or an
+ * archive).
*
* The arguments to this function have the same meaning as the arguments to
* verify_backup_directory.
@@ -654,7 +688,6 @@ static void
verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
{
struct stat sb;
- manifest_file *m;
if (stat(fullpath, &sb) != 0)
{
@@ -687,8 +720,25 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Do the further verifications */
+ verify_backup_file_cb(context, relpath, fullpath, sb.st_size);
+}
+
+/*
+ * Verify one plain file or a symlink.
+ *
+ * The arguments to this function are mostly the same as those of
+ * verify_backup_directory(). The additional argument is the file size, used
+ * for verification against the manifest entry.
+ */
+static void
+verify_plain_file_cb(verifier_context *context, char *relpath,
+ char *fullpath, size_t filesize)
+{
+ manifest_file *m;
+
/* Check the backup manifest entry for this file. */
- m = verify_manifest_entry(context, relpath, sb.st_size);
+ m = verify_manifest_entry(context, relpath, filesize);
/* Validate the manifest system identifier */
if (should_verify_control_data(context->manifest, m))
@@ -706,6 +756,124 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
}
}
+/*
+ * Verify one tar file.
+ *
+ * The arguments to this function are mostly the same as those of
+ * verify_backup_directory(). The additional argument is the file size, used
+ * for verification against the manifest entry.
+ */
+static void
+verify_tar_file_cb(verifier_context *context, char *relpath,
+ char *fullpath, size_t filesize)
+{
+ astreamer *streamer;
+ Oid tblspc_oid = InvalidOid;
+ int file_name_len;
+ int file_extn_len = 0; /* placate compiler */
+ char *file_extn = "";
+
+ /* Should be tar backup */
+ Assert(format == 't');
+
+ /* Find the tar file extension. */
+ if (compress_algorithm == PG_COMPRESSION_NONE)
+ {
+ file_extn = ".tar";
+ file_extn_len = 4;
+
+ }
+ else if (compress_algorithm == PG_COMPRESSION_GZIP)
+ {
+ file_extn = ".tar.gz";
+ file_extn_len = 7;
+
+ }
+ else if (compress_algorithm == PG_COMPRESSION_LZ4)
+ {
+ file_extn = ".tar.lz4";
+ file_extn_len = 8;
+ }
+ else if (compress_algorithm == PG_COMPRESSION_ZSTD)
+ {
+ file_extn = ".tar.zst";
+ file_extn_len = 8;
+ }
+
+ /*
+ * Ensure that we have the correct file type corresponding to the backup
+ * format.
+ */
+ file_name_len = strlen(relpath);
+ if (file_name_len < file_extn_len ||
+ strcmp(relpath + file_name_len - file_extn_len, file_extn) != 0)
+ {
+ if (compress_algorithm == PG_COMPRESSION_NONE)
+ report_backup_error(context,
+ "\"%s\" is not a valid file, expecting tar file",
+ relpath);
+ else
+ report_backup_error(context,
+ "\"%s\" is not a valid file, expecting \"%s\" compressed tar file",
+ relpath,
+ get_compress_algorithm_name(compress_algorithm));
+ return;
+ }
+
+ /*
+ * For the tablespace, pg_basebackup writes the data out to
+ * <tablespaceoid>.tar. If a file matches that format, then extract the
+ * tablespaceoid, which we need to prepare the paths of the files
+ * belonging to that tablespace relative to the base directory.
+ */
+ if (strspn(relpath, "0123456789") == (file_name_len - file_extn_len))
+ tblspc_oid = strtoi64(relpath, NULL, 10);
+
+ streamer = create_archive_verifier(context, relpath, tblspc_oid);
+ verify_tar_content(context, relpath, fullpath, streamer);
+
+ /* Cleanup. */
+ astreamer_finalize(streamer);
+ astreamer_free(streamer);
+}
+
+/*
+ * Read the given tar file in predefined chunks and pass them to the astreamer,
+ * which initiates the routines for decompression (if necessary) and then
+ * verification of each member within the tar archive.
+ */
+static void
+verify_tar_content(verifier_context *context, char *relpath, char *fullpath,
+ astreamer *streamer)
+{
+ int fd;
+ int rc;
+ char *buffer;
+
+ /* Open the target file. */
+ if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
+ {
+ report_backup_error(context, "could not open file \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+ /* Perform the reads */
+ while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
+ astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);
+
+ if (rc < 0)
+ report_backup_error(context, "could not read file \"%s\": %m",
+ relpath);
+
+ /* Close the file. */
+ if (close(fd) != 0)
+ report_backup_error(context, "could not close file \"%s\": %m",
+ relpath);
+}
+
/*
* Verify file and its size entry in the manifest.
*/
@@ -1058,10 +1226,10 @@ find_backup_format(verifier_context *context)
}
/*
- * To determine the compression format, we will search for the main data
- * directory archive and its extension, which starts with base.tar, as
* pg_basebackup writes the main data directory to an archive file named
- * base.tar followed by a compression type extension like .gz, .lz4, or .zst.
+ * base.tar, followed by a compression type extension such as .gz, .lz4, or
+ * .zst. To determine the compression format, we need to search for this main
+ * data directory archive file.
*/
static pg_compress_algorithm
find_backup_compression(verifier_context *context)
@@ -1112,6 +1280,42 @@ find_backup_compression(verifier_context *context)
return PG_COMPRESSION_NONE; /* placate compiler */
}
+/*
+ * Set up the astreamer chain needed to verify the contents of the given
+ * tar file, including decompression if the archive is compressed.
+ */
+static astreamer *
+create_archive_verifier(verifier_context *context, char *archive_name,
+ Oid tblspc_oid)
+{
+ astreamer *streamer = NULL;
+
+ /* Should be here only for tar backup */
+ Assert(format == 't');
+
+ /*
+ * To verify the contents of the tar file, the initial step is to parse
+ * its content.
+ */
+ streamer = astreamer_verify_content_new(streamer, context, archive_name,
+ tblspc_oid);
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /*
+ * If the tar file is compressed, we must perform the appropriate
+ * decompression operation before proceeding with the verification of its
+ * contents.
+ */
+ if (compress_algorithm == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compress_algorithm == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compress_algorithm == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ return streamer;
+}
+
/*
* Print a progress report based on the variables in verifier_context.
*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index c88f71ff14b..f0a7c8918fb 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -137,4 +137,13 @@ extern bool should_ignore_relpath(verifier_context *context,
extern void progress_report(verifier_context *context, bool finished);
+/* Forward declarations to avoid fe_utils/astreamer.h include. */
+struct astreamer;
+typedef struct astreamer astreamer;
+
+extern astreamer *astreamer_verify_content_new(astreamer *next,
+ verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
+
#endif /* PG_VERIFYBACKUP_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 547d14b3e7c..d86b28b260e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3329,6 +3329,7 @@ astreamer_plain_writer
astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
+astreamer_verify
astreamer_zstd_frame
bgworker_main_type
bh_node_type
--
2.18.0
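A side note for reviewers on the incremental checksum handling in
astreamer_verify.c above: because tar members are only available as a stream
of chunks, the checksum context has to survive across calls instead of being
confined to one function as in verify_file_checksum(). Below is a minimal
standalone sketch of that pattern; it is not part of the patch, it only
assumes the pg_checksum_init/update/final API from common/checksum_helper.h
that the patch already uses, and the helper name and chunk size are invented
for illustration:

/* Illustrative only: compute a checksum over data delivered in chunks. */
#include "postgres_fe.h"
#include "common/checksum_helper.h"

static bool
checksum_in_chunks(pg_checksum_type type, const uint8 *data, size_t size,
		   uint8 *result, int *result_len)
{
	pg_checksum_context ctx;
	size_t		done = 0;

	if (pg_checksum_init(&ctx, type) < 0)
		return false;

	/* One update per chunk, exactly as the streamer callback does. */
	while (done < size)
	{
		size_t		chunk = Min(size - done, (size_t) 128 * 1024);

		if (pg_checksum_update(&ctx, data + done, chunk) < 0)
			return false;
		done += chunk;
	}

	/* Only after the last chunk can the final digest be computed. */
	*result_len = pg_checksum_final(&ctx, result);
	return (*result_len >= 0);
}

This is why astreamer_verify keeps checksum_ctx and receivedBytes in its
per-member state rather than on the stack.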
Attachment: v7-0008-Refactor-split-verify_control_file.patch (application/x-patch)
From 44a78699dacfa90f36a37c8868ad11a50d53cb12 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 15:47:26 +0530
Subject: [PATCH v7 08/12] Refactor: split verify_control_file.
Separate the control file verification code into a new function,
verify_control_data(), and introduce the should_verify_control_data()
macro, similar to should_verify_checksum().
Note that should_verify_checksum() has been slightly modified to
include a NULL check for its argument, maintaining the same code
structure as should_verify_control_data().
---
src/bin/pg_verifybackup/pg_verifybackup.c | 44 +++++++++++------------
src/bin/pg_verifybackup/pg_verifybackup.h | 18 +++++++++-
2 files changed, 37 insertions(+), 25 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 3eddaa2468e..5f055a23a63 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,7 +18,6 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -61,8 +60,6 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_control_file(const char *controlpath,
- uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -625,14 +622,20 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
/* Check the backup manifest entry for this file. */
m = verify_manifest_entry(context, relpath, sb.st_size);
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0 &&
- m != NULL && m->matched && !m->bad)
- verify_control_file(fullpath, context->manifest->system_identifier);
+ /* Validate the manifest system identifier */
+ if (should_verify_control_data(context->manifest, m))
+ {
+ ControlFileData *control_file;
+ bool crc_ok;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+ control_file = get_controlfile_by_exact_path(fullpath, &crc_ok);
+
+ verify_control_data(control_file, fullpath, crc_ok,
+ context->manifest->system_identifier);
+ /* Release memory. */
+ pfree(control_file);
+ }
}
/*
@@ -676,18 +679,14 @@ verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
}
/*
- * Sanity check control file and validate system identifier against manifest
- * system identifier.
+ * Sanity check control file data and validate system identifier against
+ * manifest system identifier.
*/
-static void
-verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
+void
+verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier)
{
- ControlFileData *control_file;
- bool crc_ok;
-
- pg_log_debug("reading \"%s\"", controlpath);
- control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
-
/* Control file contents not meaningful if CRC is bad. */
if (!crc_ok)
report_fatal_error("%s: CRC is incorrect", controlpath);
@@ -703,9 +702,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
controlpath,
(unsigned long long) manifest_system_identifier,
(unsigned long long) control_file->system_identifier);
-
- /* Release memory. */
- pfree(control_file);
}
/*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 1bc5f7a6b4a..c88f71ff14b 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -16,6 +16,7 @@
#include "common/controldata_utils.h"
#include "common/hashfn_unstable.h"
+#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
@@ -44,7 +45,19 @@ typedef struct manifest_file
} manifest_file;
#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+ (((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+
+/*
+ * Validate the manifest system identifier against the control file; this
+ * feature is not available in manifest version 1. This validation should be
+ * carried out only if the manifest entry validation is completed without any
+ * errors.
+ */
+#define should_verify_control_data(manifest, m) \
+ (((manifest)->version != 1) && \
+ ((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (strcmp((m)->pathname, "global/pg_control") == 0))
/*
* Define a hash table which we can use to store information about the files
@@ -110,6 +123,9 @@ extern manifest_file *verify_manifest_entry(verifier_context *context,
extern void verify_checksum(verifier_context *context, manifest_file *m,
pg_checksum_context *checksum_ctx,
uint8 *checksumbuf);
+extern void verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
Attachment: v7-0009-Refactor-move-first-and-last-progress_report-call.patch (application/x-patch)
From ee279a6a540a640857599951807438cda02f30d6 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Fri, 2 Aug 2024 16:37:38 +0530
Subject: [PATCH v7 09/12] Refactor: move first and last progress_report call
to main().
---
src/bin/pg_verifybackup/pg_verifybackup.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 5f055a23a63..801e13886c2 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -253,7 +253,10 @@ main(int argc, char **argv)
* read, which occurs only when checksum verification is enabled.
*/
if (!context.skip_checksums)
+ {
compute_total_size(&context);
+ progress_report(&context, false);
+ }
/*
* Now scan the files in the backup directory. At this stage, we verify
@@ -275,7 +278,10 @@ main(int argc, char **argv)
* told to skip it.
*/
if (!context.skip_checksums)
+ {
verify_backup_checksums(&context);
+ progress_report(&context, true);
+ }
/*
* Try to parse the required ranges of WAL records, unless we were told
@@ -736,8 +742,6 @@ verify_backup_checksums(verifier_context *context)
manifest_file *m;
uint8 *buffer;
- progress_report(context, false);
-
buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
manifest_files_start_iterate(manifest->files, &it);
@@ -761,8 +765,6 @@ verify_backup_checksums(verifier_context *context)
}
pfree(buffer);
-
- progress_report(context, true);
}
/*
--
2.18.0
Attachment: v7-0006-Refactor-split-verify_backup_file-function.patch (application/x-patch)
From 37148e750aa68092eaa3e13c0c46cb4978c3f67a Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 12:15:26 +0530
Subject: [PATCH v7 06/12] Refactor: split verify_backup_file() function.
Move the manifest entry verification code into a new function,
verify_manifest_entry(), and the total size computation code into
another new function, compute_total_size(), which is called from
main().
---
src/bin/pg_verifybackup/pg_verifybackup.c | 76 ++++++++++++++++++-----
src/bin/pg_verifybackup/pg_verifybackup.h | 3 +
2 files changed, 62 insertions(+), 17 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 4e42757c346..ab6bda8c9dc 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -72,6 +72,7 @@ static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
+static void compute_total_size(verifier_context *context);
static void usage(void);
static const char *progname;
@@ -250,6 +251,13 @@ main(int argc, char **argv)
*/
context.manifest = parse_manifest_file(manifest_path);
+ /*
+ * For the progress report, compute the total size of the files to be
+ * read, which occurs only when checksum verification is enabled.
+ */
+ if (!context.skip_checksums)
+ compute_total_size(&context);
+
/*
* Now scan the files in the backup directory. At this stage, we verify
* that every file on disk is present in the manifest and that the sizes
@@ -614,6 +622,27 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Check the backup manifest entry for this file. */
+ m = verify_manifest_entry(context, relpath, sb.st_size);
+
+ /*
+ * Validate the manifest system identifier, not available in manifest
+ * version 1.
+ */
+ if (context->manifest->version != 1 &&
+ strcmp(relpath, "global/pg_control") == 0 &&
+ m != NULL && m->matched && !m->bad)
+ verify_control_file(fullpath, context->manifest->system_identifier);
+}
+
+/*
+ * Verify file and its size entry in the manifest.
+ */
+manifest_file *
+verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
+{
+ manifest_file *m;
+
/* Check whether there's an entry in the manifest hash. */
m = manifest_files_lookup(context->manifest->files, relpath);
if (m == NULL)
@@ -621,40 +650,29 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
report_backup_error(context,
"\"%s\" is present on disk but not in the manifest",
relpath);
- return;
+ return NULL;
}
/* Flag this entry as having been encountered in the filesystem. */
m->matched = true;
/* Check that the size matches. */
- if (m->size != sb.st_size)
+ if (m->size != filesize)
{
report_backup_error(context,
"\"%s\" has size %lld on disk but size %zu in the manifest",
- relpath, (long long int) sb.st_size, m->size);
+ relpath, (long long int) filesize, m->size);
m->bad = true;
}
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0)
- verify_control_file(fullpath, context->manifest->system_identifier);
-
- /* Update statistics for progress report, if necessary */
- if (context->show_progress && !context->skip_checksums &&
- should_verify_checksum(m))
- context->total_size += m->size;
-
/*
* We don't verify checksums at this stage. We first finish verifying that
* we have the expected set of files with the expected sizes, and only
* afterwards verify the checksums. That's because computing checksums may
* take a while, and we'd like to report more obvious problems quickly.
*/
+
+ return m;
}
/*
@@ -817,7 +835,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
/*
* Double-check that we read the expected number of bytes from the file.
- * Normally, a file size mismatch would be caught in verify_backup_file
+ * Normally, a file size mismatch would be caught in verify_manifest_entry
* and this check would never be reached, but this provides additional
* safety and clarity in the event of concurrent modifications or
* filesystem misbehavior.
@@ -988,6 +1006,30 @@ progress_report(verifier_context *context, bool finished)
fputc((!finished && isatty(fileno(stderr))) ? '\r' : '\n', stderr);
}
+/*
+ * Compute the total size of backup files for progress reporting.
+ */
+static void
+compute_total_size(verifier_context *context)
+{
+ manifest_data *manifest = context->manifest;
+ manifest_files_iterator it;
+ manifest_file *m;
+ uint64 total_size = 0;
+
+ if (!context->show_progress)
+ return;
+
+ manifest_files_start_iterate(manifest->files, &it);
+ while ((m = manifest_files_iterate(manifest->files, &it)) != NULL)
+ {
+ if (!should_ignore_relpath(context, m->pathname))
+ total_size += m->size;
+ }
+
+ context->total_size = total_size;
+}
+
/*
* Print out usage information and exit.
*/
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 90900048547..98c75916255 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -105,6 +105,9 @@ typedef struct verifier_context
uint64 done_size;
} verifier_context;
+extern manifest_file *verify_manifest_entry(verifier_context *context,
+ char *relpath, int64 filesize);
+
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
pg_attribute_printf(2, 3);
--
2.18.0
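To make the purpose of this split a bit more concrete: the idea is that the
tar-verification code added later in the series can call the same
verify_manifest_entry() with the size taken from the tar member header
instead of a stat() result. A rough, illustrative sketch follows; it is not
part of the patch, the helper name is invented, and the astreamer_member
fields are the ones used in the 0011 patch:

/*
 * Illustrative only: the tar path reuses the same manifest check as the
 * plain path, just with the size reported by the archive member header.
 */
static void
check_member_against_manifest(verifier_context *context,
			      astreamer_member *member)
{
	manifest_file *m;

	/* Directories and symlinks in the archive carry no verifiable contents. */
	if (member->is_directory || member->is_link)
		return;

	m = verify_manifest_entry(context, member->pathname, member->size);
	if (m == NULL)
		return;			/* already reported as missing from the manifest */

	/* Checksums, if requested, are verified later as the contents stream by. */
}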
Attachment: v7-0007-Refactor-split-verify_file_checksum-function.patch (application/x-patch)
From 6ec5031183fe3c42f0dbc69ba5374ea302116ef7 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 16:45:55 +0530
Subject: [PATCH v7 07/12] Refactor: split verify_file_checksum() function.
Move the core functionality of verify_file_checksum() into a new function,
verify_checksum(), so that it can be reused instead of duplicating the code.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 18 ++++++++++++++++--
src/bin/pg_verifybackup/pg_verifybackup.h | 3 +++
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index ab6bda8c9dc..3eddaa2468e 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -782,7 +782,6 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
int rc;
size_t bytes_read = 0;
uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
- int checksumlen;
/* Open the target file. */
if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
@@ -848,8 +847,23 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
return;
}
+ /* Do the final computation and verification. */
+ verify_checksum(context, m, &checksum_ctx, checksumbuf);
+}
+
+/*
+ * A helper function to finalize checksum computation and verify it against the
+ * backup manifest information.
+ */
+void
+verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx, uint8 *checksumbuf)
+{
+ int checksumlen;
+ const char *relpath = m->pathname;
+
/* Get the final checksum. */
- checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
+ checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
if (checksumlen < 0)
{
report_backup_error(context,
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 98c75916255..1bc5f7a6b4a 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -107,6 +107,9 @@ typedef struct verifier_context
extern manifest_file *verify_manifest_entry(verifier_context *context,
char *relpath, int64 filesize);
+extern void verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx,
+ uint8 *checksumbuf);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
Attachment: v7-0005-Refactor-move-some-part-of-pg_verifybackup.c-to-p.patch (application/x-patch)
From bb127a0c8f25d15ee9c10d111be7611ce58dcb39 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 12:10:34 +0530
Subject: [PATCH v7 05/12] Refactor: move some part of pg_verifybackup.c to
pg_verifybackup.h
---
src/bin/pg_verifybackup/pg_verifybackup.c | 102 +------------------
src/bin/pg_verifybackup/pg_verifybackup.h | 118 ++++++++++++++++++++++
2 files changed, 123 insertions(+), 97 deletions(-)
create mode 100644 src/bin/pg_verifybackup/pg_verifybackup.h
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 71585ffc50e..4e42757c346 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,12 +18,11 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/controldata_utils.h"
-#include "common/hashfn_unstable.h"
#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
+#include "pg_verifybackup.h"
#include "pgtime.h"
/*
@@ -40,89 +39,6 @@
*/
#define ESTIMATED_BYTES_PER_MANIFEST_LINE 100
-/*
- * How many bytes should we try to read from a file at once?
- */
-#define READ_CHUNK_SIZE (128 * 1024)
-
-/*
- * Each file described by the manifest file is parsed to produce an object
- * like this.
- */
-typedef struct manifest_file
-{
- uint32 status; /* hash status */
- const char *pathname;
- size_t size;
- pg_checksum_type checksum_type;
- int checksum_length;
- uint8 *checksum_payload;
- bool matched;
- bool bad;
-} manifest_file;
-
-#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
-
-/*
- * Define a hash table which we can use to store information about the files
- * mentioned in the backup manifest.
- */
-#define SH_PREFIX manifest_files
-#define SH_ELEMENT_TYPE manifest_file
-#define SH_KEY_TYPE const char *
-#define SH_KEY pathname
-#define SH_HASH_KEY(tb, key) hash_string(key)
-#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
-#define SH_SCOPE static inline
-#define SH_RAW_ALLOCATOR pg_malloc0
-#define SH_DECLARE
-#define SH_DEFINE
-#include "lib/simplehash.h"
-
-/*
- * Each WAL range described by the manifest file is parsed to produce an
- * object like this.
- */
-typedef struct manifest_wal_range
-{
- TimeLineID tli;
- XLogRecPtr start_lsn;
- XLogRecPtr end_lsn;
- struct manifest_wal_range *next;
- struct manifest_wal_range *prev;
-} manifest_wal_range;
-
-/*
- * All the data parsed from a backup_manifest file.
- */
-typedef struct manifest_data
-{
- int version;
- uint64 system_identifier;
- manifest_files_hash *files;
- manifest_wal_range *first_wal_range;
- manifest_wal_range *last_wal_range;
-} manifest_data;
-
-/*
- * All of the context information we need while checking a backup manifest.
- */
-typedef struct verifier_context
-{
- manifest_data *manifest;
- char *backup_directory;
- SimpleStringList ignore_list;
- bool skip_checksums;
- bool exit_on_error;
- bool saw_any_error;
-
- /* Progress indicators */
- bool show_progress;
- uint64 total_size;
- uint64 done_size;
-} verifier_context;
-
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -156,14 +72,6 @@ static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
-static void report_backup_error(verifier_context *context,
- const char *pg_restrict fmt,...)
- pg_attribute_printf(2, 3);
-static void report_fatal_error(const char *pg_restrict fmt,...)
- pg_attribute_printf(1, 2) pg_attribute_noreturn();
-static bool should_ignore_relpath(verifier_context *context, const char *relpath);
-
-static void progress_report(verifier_context *context, bool finished);
static void usage(void);
static const char *progname;
@@ -978,7 +886,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
* Update the context to indicate that we saw an error, and exit if the
* context says we should.
*/
-static void
+void
report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
{
va_list ap;
@@ -995,7 +903,7 @@ report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
/*
* Report a fatal error and exit
*/
-static void
+void
report_fatal_error(const char *pg_restrict fmt,...)
{
va_list ap;
@@ -1014,7 +922,7 @@ report_fatal_error(const char *pg_restrict fmt,...)
* Note that by "prefix" we mean a parent directory; for this purpose,
* "aa/bb" is not a prefix of "aa/bbb", but it is a prefix of "aa/bb/cc".
*/
-static bool
+bool
should_ignore_relpath(verifier_context *context, const char *relpath)
{
SimpleStringListCell *cell;
@@ -1043,7 +951,7 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
* If finished is set to true, this is the last progress report. The cursor
* is moved to the next line.
*/
-static void
+void
progress_report(verifier_context *context, bool finished)
{
static pg_time_t last_progress_report = 0;
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
new file mode 100644
index 00000000000..90900048547
--- /dev/null
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -0,0 +1,118 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_verifybackup.h
+ * Verify a backup against a backup manifest.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/pg_verifybackup/pg_verifybackup.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef PG_VERIFYBACKUP_H
+#define PG_VERIFYBACKUP_H
+
+#include "common/controldata_utils.h"
+#include "common/hashfn_unstable.h"
+#include "common/parse_manifest.h"
+#include "fe_utils/simple_list.h"
+
+extern bool show_progress;
+extern bool skip_checksums;
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/*
+ * Each file described by the manifest file is parsed to produce an object
+ * like this.
+ */
+typedef struct manifest_file
+{
+ uint32 status; /* hash status */
+ const char *pathname;
+ size_t size;
+ pg_checksum_type checksum_type;
+ int checksum_length;
+ uint8 *checksum_payload;
+ bool matched;
+ bool bad;
+} manifest_file;
+
+#define should_verify_checksum(m) \
+ (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+
+/*
+ * Define a hash table which we can use to store information about the files
+ * mentioned in the backup manifest.
+ */
+#define SH_PREFIX manifest_files
+#define SH_ELEMENT_TYPE manifest_file
+#define SH_KEY_TYPE const char *
+#define SH_KEY pathname
+#define SH_HASH_KEY(tb, key) hash_string(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * Each WAL range described by the manifest file is parsed to produce an
+ * object like this.
+ */
+typedef struct manifest_wal_range
+{
+ TimeLineID tli;
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+ struct manifest_wal_range *next;
+ struct manifest_wal_range *prev;
+} manifest_wal_range;
+
+/*
+ * All the data parsed from a backup_manifest file.
+ */
+typedef struct manifest_data
+{
+ int version;
+ uint64 system_identifier;
+ manifest_files_hash *files;
+ manifest_wal_range *first_wal_range;
+ manifest_wal_range *last_wal_range;
+} manifest_data;
+
+/*
+ * All of the context information we need while checking a backup manifest.
+ */
+typedef struct verifier_context
+{
+ manifest_data *manifest;
+ char *backup_directory;
+ SimpleStringList ignore_list;
+ bool skip_checksums;
+ bool exit_on_error;
+ bool saw_any_error;
+
+ /* Progress indicators */
+ bool show_progress;
+ uint64 total_size;
+ uint64 done_size;
+} verifier_context;
+
+extern void report_backup_error(verifier_context *context,
+ const char *pg_restrict fmt,...)
+ pg_attribute_printf(2, 3);
+extern void report_fatal_error(const char *pg_restrict fmt,...)
+ pg_attribute_printf(1, 2) pg_attribute_noreturn();
+extern bool should_ignore_relpath(verifier_context *context,
+ const char *relpath);
+
+extern void progress_report(verifier_context *context, bool finished);
+
+#endif /* PG_VERIFYBACKUP_H */
--
2.18.0
v7-0001-Improve-file-header-comments-for-astramer-code.patch
From 0ef14ab4be362f6ab48c6ebd501d3036ba4d21d9 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 6 Aug 2024 10:23:45 +0530
Subject: [PATCH v7] Improve file header comments for astramer code.
Make it clear that "astreamer" stands for "archive streamer".
Generalize comments that still believe this code can only be used
by pg_basebackup. Add some comments explaining the asymmetry
between the gzip, lz4, and zstd astreamers, in the hopes of making
life easier for anyone who hacks on this code in the future.
---
src/fe_utils/astreamer_file.c | 4 ++++
src/fe_utils/astreamer_gzip.c | 15 +++++++++++++++
src/fe_utils/astreamer_lz4.c | 4 ++++
src/fe_utils/astreamer_zstd.c | 4 ++++
src/include/fe_utils/astreamer.h | 21 +++++++++++++++------
5 files changed, 42 insertions(+), 6 deletions(-)
diff --git a/src/fe_utils/astreamer_file.c b/src/fe_utils/astreamer_file.c
index 13d1192c6e6..c9a030853bc 100644
--- a/src/fe_utils/astreamer_file.c
+++ b/src/fe_utils/astreamer_file.c
@@ -2,6 +2,10 @@
*
* astreamer_file.c
*
+ * Archive streamers that write to files. astreamer_plain_writer writes
+ * the whole archive to a single file, and astreamer_extractor writes
+ * each archive member to a separate file in a given directory.
+ *
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
diff --git a/src/fe_utils/astreamer_gzip.c b/src/fe_utils/astreamer_gzip.c
index dd28defac7b..1c773a23848 100644
--- a/src/fe_utils/astreamer_gzip.c
+++ b/src/fe_utils/astreamer_gzip.c
@@ -2,6 +2,21 @@
*
* astreamer_gzip.c
*
+ * Archive streamers that deal with data compressed using gzip.
+ * astreamer_gzip_writer applies gzip compression to the input data
+ * and writes the result to a file. astreamer_gzip_decompressor assumes
+ * that the input stream is compressed using gzip and decompresses it.
+ *
+ * Note that the code in this file is asymmetric with what we do for
+ * other compression types: for lz4 and zstd, there is a compressor and
+ * a decompressor, rather than a writer and a decompressor. The approach
+ * taken here is less flexible, because a writer can only write to a file,
+ * while a compressor can write to a subsequent astreamer which is free
+ * to do whatever it likes. The reason it's like this is because this
+ * code was adapted from old, less-modular pg_basebackup that used the
+ * same APIs that astreamer_gzip_writer uses, and it didn't seem
+ * necessary to change anything at the time.
+ *
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
diff --git a/src/fe_utils/astreamer_lz4.c b/src/fe_utils/astreamer_lz4.c
index d8b2a367e47..2bf14084e7f 100644
--- a/src/fe_utils/astreamer_lz4.c
+++ b/src/fe_utils/astreamer_lz4.c
@@ -2,6 +2,10 @@
*
* astreamer_lz4.c
*
+ * Archive streamers that deal with data compressed using lz4.
+ * astreamer_lz4_compressor applies lz4 compression to the input stream,
+ * and astreamer_lz4_decompressor does the reverse.
+ *
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
diff --git a/src/fe_utils/astreamer_zstd.c b/src/fe_utils/astreamer_zstd.c
index 45f6cb67363..4b2d42b2311 100644
--- a/src/fe_utils/astreamer_zstd.c
+++ b/src/fe_utils/astreamer_zstd.c
@@ -2,6 +2,10 @@
*
* astreamer_zstd.c
*
+ * Archive streamers that deal with data compressed using zstd.
+ * astreamer_zstd_compressor applies zstd compression to the input stream,
+ * and astreamer_zstd_decompressor does the reverse.
+ *
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
diff --git a/src/include/fe_utils/astreamer.h b/src/include/fe_utils/astreamer.h
index 2c014dbddbe..570cfba3040 100644
--- a/src/include/fe_utils/astreamer.h
+++ b/src/include/fe_utils/astreamer.h
@@ -2,9 +2,18 @@
*
* astreamer.h
*
- * Each tar archive returned by the server is passed to one or more
- * astreamer objects for further processing. The astreamer may do
- * something simple, like write the archive to a file, perhaps after
+ * The "archive streamer" interface is intended to allow frontend code
+ * to stream from possibly-compressed archive files from any source and
+ * perform arbitrary actions based on the contents of those archives.
+ * Archive streamers are intended to be composable, and most tasks will
+ * require two or more archive streamers to complete. For instance,
+ * if the input is an uncompressed tar stream, a tar parser astreamer
+ * could be used to interpret it, and then an extractor astreamer could
+ * be used to write each archive member out to a file.
+ *
+ * In general, each archive streamer is relatively free to take whatever
+ * action it desires in the stream of chunks provided by the caller. It
+ * may do something simple, like write the archive to a file, perhaps after
* compressing it, but it can also do more complicated things, like
* annotating the byte stream to indicate which parts of the data
* correspond to tar headers or trailing padding, vs. which parts are
@@ -33,9 +42,9 @@ typedef struct astreamer_ops astreamer_ops;
/*
* Each chunk of archive data passed to a astreamer is classified into one
- * of these categories. When data is first received from the remote server,
- * each chunk will be categorized as ASTREAMER_UNKNOWN, and the chunks will
- * be of whatever size the remote server chose to send.
+ * of these categories. When data is initially passed to an archive streamer,
+ * each chunk will be categorized as ASTREAMER_UNKNOWN, and the chunks can
+ * be of whatever size the caller finds convenient.
*
* If the archive is parsed (e.g. see astreamer_tar_parser_new()), then all
* chunks should be labelled as one of the other types listed here. In
--
2.18.0
v7-0004-Refactor-move-few-global-variable-to-verifier_con.patch
From 43fb489aecb872cc6f9a59ebbdca9c5a1110ea72 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 11:43:52 +0530
Subject: [PATCH v7 04/12] Refactor: move few global variable to
verifier_context struct
Global variables are:
1. show_progress
2. skip_checksums
3. total_size
4. done_size
---
src/bin/pg_verifybackup/pg_verifybackup.c | 50 +++++++++++------------
1 file changed, 25 insertions(+), 25 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d77e70fbe38..71585ffc50e 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -113,8 +113,14 @@ typedef struct verifier_context
manifest_data *manifest;
char *backup_directory;
SimpleStringList ignore_list;
+ bool skip_checksums;
bool exit_on_error;
bool saw_any_error;
+
+ /* Progress indicators */
+ bool show_progress;
+ uint64 total_size;
+ uint64 done_size;
} verifier_context;
static manifest_data *parse_manifest_file(char *manifest_path);
@@ -157,19 +163,11 @@ static void report_fatal_error(const char *pg_restrict fmt,...)
pg_attribute_printf(1, 2) pg_attribute_noreturn();
static bool should_ignore_relpath(verifier_context *context, const char *relpath);
-static void progress_report(bool finished);
+static void progress_report(verifier_context *context, bool finished);
static void usage(void);
static const char *progname;
-/* options */
-static bool show_progress = false;
-static bool skip_checksums = false;
-
-/* Progress indicators */
-static uint64 total_size = 0;
-static uint64 done_size = 0;
-
/*
* Main entry point.
*/
@@ -260,13 +258,13 @@ main(int argc, char **argv)
no_parse_wal = true;
break;
case 'P':
- show_progress = true;
+ context.show_progress = true;
break;
case 'q':
quiet = true;
break;
case 's':
- skip_checksums = true;
+ context.skip_checksums = true;
break;
case 'w':
wal_directory = pstrdup(optarg);
@@ -299,7 +297,7 @@ main(int argc, char **argv)
}
/* Complain if the specified arguments conflict */
- if (show_progress && quiet)
+ if (context.show_progress && quiet)
pg_fatal("cannot specify both %s and %s",
"-P/--progress", "-q/--quiet");
@@ -363,7 +361,7 @@ main(int argc, char **argv)
* Now do the expensive work of verifying file checksums, unless we were
* told to skip it.
*/
- if (!skip_checksums)
+ if (!context.skip_checksums)
verify_backup_checksums(&context);
/*
@@ -739,8 +737,9 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
verify_control_file(fullpath, context->manifest->system_identifier);
/* Update statistics for progress report, if necessary */
- if (show_progress && !skip_checksums && should_verify_checksum(m))
- total_size += m->size;
+ if (context->show_progress && !context->skip_checksums &&
+ should_verify_checksum(m))
+ context->total_size += m->size;
/*
* We don't verify checksums at this stage. We first finish verifying that
@@ -815,7 +814,7 @@ verify_backup_checksums(verifier_context *context)
manifest_file *m;
uint8 *buffer;
- progress_report(false);
+ progress_report(context, false);
buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
@@ -841,7 +840,7 @@ verify_backup_checksums(verifier_context *context)
pfree(buffer);
- progress_report(true);
+ progress_report(context, true);
}
/*
@@ -889,8 +888,8 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
}
/* Report progress */
- done_size += rc;
- progress_report(false);
+ context->done_size += rc;
+ progress_report(context, false);
}
if (rc < 0)
report_backup_error(context, "could not read file \"%s\": %m",
@@ -1036,7 +1035,7 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
}
/*
- * Print a progress report based on the global variables.
+ * Print a progress report based on the variables in verifier_context.
*
* Progress report is written at maximum once per second, unless the finished
* parameter is set to true.
@@ -1045,7 +1044,7 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
* is moved to the next line.
*/
static void
-progress_report(bool finished)
+progress_report(verifier_context *context, bool finished)
{
static pg_time_t last_progress_report = 0;
pg_time_t now;
@@ -1053,7 +1052,7 @@ progress_report(bool finished)
char totalsize_str[32];
char donesize_str[32];
- if (!show_progress)
+ if (!context->show_progress)
return;
now = time(NULL);
@@ -1061,12 +1060,13 @@ progress_report(bool finished)
return; /* Max once per second */
last_progress_report = now;
- percent_size = total_size ? (int) ((done_size * 100 / total_size)) : 0;
+ percent_size = context->total_size ?
+ (int) ((context->done_size * 100 / context->total_size)) : 0;
snprintf(totalsize_str, sizeof(totalsize_str), UINT64_FORMAT,
- total_size / 1024);
+ context->total_size / 1024);
snprintf(donesize_str, sizeof(donesize_str), UINT64_FORMAT,
- done_size / 1024);
+ context->done_size / 1024);
fprintf(stderr,
_("%*s/%s kB (%d%%) verified"),
--
2.18.0
On Thu, Aug 1, 2024 at 9:19 AM Amul Sul <sulamul@gmail.com> wrote:
I think I would have made this pass context->show_progress to
progress_report() instead of the whole verifier_context, but that's an
arguable stylistic choice, so I'll defer to you if you prefer it the
way you have it. Other than that, this LGTM.
Additionally, I moved total_size and done_size to verifier_context
because done_size needs to be accessed in astreamer_verify.c.
With this change, verifier_context is now more suitable.
But it seems like 0006 now changes the logic for computing total_size.
Prepatch, the condition is:
- if (context->show_progress && !context->skip_checksums &&
- should_verify_checksum(m))
- context->total_size += m->size;
where should_verify_checksum(m) checks (((m)->matched) && !((m)->bad)
&& (((m)->checksum_type) != CHECKSUM_TYPE_NONE)). But post-patch the
condition is:
+ if (!context.skip_checksums)
...
+ if (!should_ignore_relpath(context, m->pathname))
+ total_size += m->size;
The old logic was reached from verify_backup_directory() which does
check should_ignore_relpath(), so the new condition hasn't added
anything. But it seems to have lost the show_progress condition, and
the m->checksum_type != CHECKSUM_TYPE_NONE condition. I think this
means that we'll sum the sizes even when not displaying progress, and
that if some of the files in the manifest had no checksums, our
progress reporting would compute wrong percentages after the patch.
Understood. At the start of working on the v3 review, I thought of
completely discarding the 0007 patch and copying most of
verify_file_checksum() to a new function in astreamer_verify.c.
However, I later realized we could deduplicate some parts, so I split
verify_file_checksum() and moved the reusable part to a separate
function. Please have a look at v4-0007.
Yeah, that seems OK.
The fact that these patches don't have commit messages is making life
more difficult for me than it needs to be. In particular, I'm looking
at 0009 and there's no hint about why you want to do this. In fact
that's the case for all of these refactoring patches. Instead of
saying something like "tar format verification will want to verify the
control file, but will not be able to read the file directly from
disk, so separate the logic that reads the control file from the logic
that verifies it" you just say what code you moved. Then I have to
guess why you moved it, or flip back and forth between the refactoring
patch and 0011 to try to figure it out. It would be nice if each of
these refactoring patches contained a clear indication about the
purpose of the refactoring in the commit message.
I had the same thought about checking for NULL inside
should_verify_control_data(), but I wanted to maintain the structure
similar to should_verify_checksum(). Making this change would have
also required altering should_verify_checksum(), I wasn’t sure if I
should make that change before. Now, I did that in the attached
version -- 0008 patch.
I believe there is no reason for this change to be part of 0008 at
all, and that this should be part of whatever later patch needs it.
Maybe think of doing something with the ASTREAMER_MEMBER_HEADER case also.
Done.
OK, the formatting of 0011 looks much better now.
It seems to me that 0011 is arranging to palloc the checksum context
for every file and then pfree it at the end. It seems like it would be
considerably more efficient if astreamer_verify contained a
pg_checksum_context instead of a pointer to a pg_checksum_context. If
you need a flag to indicate whether we've reinitialized the checksum
for the current file, it's better to add that than to have all of
these unnecessary allocate/free cycles.
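To make that concrete, here is a rough sketch of what I mean (the field
names are illustrative, not the exact struct from the patch):

    typedef struct astreamer_verify
    {
        astreamer   base;
        verifier_context *context;

        /* Embedded context, so no per-file palloc()/pfree() is needed. */
        pg_checksum_context checksum_ctx;

        /* Per-member state, reset whenever a new member header is seen. */
        manifest_file *mfile;
        bool        verify_checksum;
    } astreamer_verify;

With that, pg_checksum_init(&mystreamer->checksum_ctx, m->checksum_type)
can simply be called once per member file when its header arrives.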
Existing astreamer code uses struct member names_like_this. For the
new one, you mostly used namesLikeThis except when you used
names_like_this or namesLkThs.
It seems to me that instead of adding a global variable
verify_backup_file_cb, it would be better to move the 'format'
variable into verifier_context. Then you can do something like if
(context->format == 'p') verify_plain_backup_file() else
verify_tar_backup_file().
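Spelled out a bit more (the argument lists here are just an assumption;
I'm imagining both functions take the context plus the relative and full
paths):

    /* Wherever the per-file dispatch ends up living. */
    if (context->format == 'p')
        verify_plain_backup_file(context, relpath, fullpath);
    else
        verify_tar_backup_file(context, relpath, fullpath);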
It's pretty common for .tar.gz to be abbreviated to .tgz. I think we
should support that.
Let's suppose that I have a backup which, for some reason, does not
use the same compression for all files (base.tar, 16384.tgz,
16385.tar.gz, 16366.tar.lz4). With this patch, that will fail. Now,
that's not really a problem, because having a backup with mixed
compression algorithms like that is strange and you probably wouldn't
try to do it. But on the other hand, it looks to me like making the
code support that would be more elegant than what you have now.
Because, right now, you have code to detect what type of backup you've
got by looking at base.WHATEVER_EXTENSION ... but then you have to
also have code that complains if some later file doesn't have the same
extension. But you could just detect the type of every file
individually.
In fact, I wonder if we even need -Z. What value is that actually
providing? Why not just always auto-detect?
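Something along these lines, per file (a sketch only; the helper name is
made up, and it leans on pg_str_endswith() from src/common and the
pg_compress_algorithm values from common/compression.h):

    static pg_compress_algorithm
    detect_compression(const char *filename)
    {
        if (pg_str_endswith(filename, ".tar"))
            return PG_COMPRESSION_NONE;
        if (pg_str_endswith(filename, ".tgz") ||
            pg_str_endswith(filename, ".tar.gz"))
            return PG_COMPRESSION_GZIP;
        if (pg_str_endswith(filename, ".tar.lz4"))
            return PG_COMPRESSION_LZ4;
        if (pg_str_endswith(filename, ".tar.zst"))
            return PG_COMPRESSION_ZSTD;

        /* Not a file we expect; let the caller decide whether to complain. */
        return PG_COMPRESSION_NONE;
    }

That way base.tar, 16384.tgz, 16385.tar.gz, and 16366.tar.lz4 each get
handled on their own, and -Z has nothing left to do.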
find_backup_format() ignores the possibility of stat() throwing an
error. That's not good.
Suppose that the backup directory contains main.tar, 16385.tar, and
snuffleupagus.tar. It looks to me like what will happen here is that
we'll verify main.tar with tblspc_oid = InvalidOid, 16385.tar with
tblspc_oid = 16385, and snuffleupagus.tar with tblspc_oid =
InvalidOid. That doesn't sound right. I think we should either
completely ignore snuffleupagus.tar just as if it were completely
imaginary, or perhaps there's an argument for emitting a warning
saying that we weren't expecting a snuffleupagus to exist.
In general, I think all unexpected files in a tar-format backup
directory should get the same treatment, regardless of whether the
problem is with the extension or the file itself. We should either
silently ignore everything that isn't expected to be present, or we
should emit a complaint saying that the file isn't expected to be
present. Right now, you say that it's "not a valid file" if the
extension isn't what you expect (which doesn't seem like a good error
message, because the file may be perfectly valid for what it is, it's
just not a file we're expecting to see) and say nothing if the
extension is right but the part of the filename preceding the
extension is unexpected.
A related issue is that it's a little unclear what --ignore is
supposed to do for tar-format backups. Does that ignore files in the
backup directory, or files instead of the tar files inside of the
backup directory? If we decide that --ignore ignores files in the
backup directory, then we should complain about any unexpected files
that are present there unless they've been ignored. If we decide that
--ignore ignores files inside of the tar files, then I suggest we just
silently skip any files in the backup directory that don't seem to
have file names in the correct format. I think I prefer the latter
approach, but I'm not 100% sure what's best.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Aug 6, 2024 at 10:39 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Aug 1, 2024 at 9:19 AM Amul Sul <sulamul@gmail.com> wrote:
I think I would have made this pass context->show_progress to
progress_report() instead of the whole verifier_context, but that's an
arguable stylistic choice, so I'll defer to you if you prefer it the
way you have it. Other than that, this LGTM.
Additionally, I moved total_size and done_size to verifier_context
because done_size needs to be accessed in astreamer_verify.c.
With this change, verifier_context is now more suitable.
But it seems like 0006 now changes the logic for computing total_size.
Prepatch, the condition is:
- if (context->show_progress && !context->skip_checksums &&
- should_verify_checksum(m))
- context->total_size += m->size;
where should_verify_checksum(m) checks (((m)->matched) && !((m)->bad)
&& (((m)->checksum_type) != CHECKSUM_TYPE_NONE)). But post-patch the
condition is:
+ if (!context.skip_checksums)
...
+ if (!should_ignore_relpath(context, m->pathname))
+ total_size += m->size;
The old logic was reached from verify_backup_directory() which does
check should_ignore_relpath(), so the new condition hasn't added
anything. But it seems to have lost the show_progress condition, and
the m->checksum_type != CHECKSUM_TYPE_NONE condition. I think this
means that we'll sum the sizes even when not displaying progress, and
that if some of the files in the manifest had no checksums, our
progress reporting would compute wrong percentages after the patch.
That is not true. The compute_total_size() function doesn't do
anything when progress is not being displayed; its first if condition
returns early, the same way progress_report() does. I omitted
should_verify_checksum() since we don't have match and bad flag
information at the start, and we won't have that for TAR files at all.
However, I missed the checksum_type check, which is necessary, and
have added it now.
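To spell that out, the function is roughly shaped like this (a
simplified sketch of the attached 0006, not the exact code):

    static void
    compute_total_size(verifier_context *context)
    {
        manifest_files_iterator it;
        manifest_file *m;

        if (!context->show_progress)
            return;

        manifest_files_start_iterate(context->manifest->files, &it);
        while ((m = manifest_files_iterate(context->manifest->files, &it)) != NULL)
        {
            /* Skip files with no checksum, and ignored paths. */
            if (m->checksum_type == CHECKSUM_TYPE_NONE)
                continue;
            if (!should_ignore_relpath(context, m->pathname))
                context->total_size += m->size;
        }
    }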
With the patch, I am concerned that we won't be able to give as
accurate a progress report as before. We add all the file sizes in the
backup manifest to the total_size without checking if they exist on
disk. Therefore, sometimes the reported progress completion might not
show 100% when we encounter files where m->bad or m->match == false at
a later stage. However, I think this should be acceptable since there
will be an error for the respective missing or bad file, and it can be
understood that verification is complete even if the progress isn't
100% in that case. Thoughts/Comments?
Understood. At the start of working on the v3 review, I thought of
completely discarding the 0007 patch and copying most of
verify_file_checksum() to a new function in astreamer_verify.c.
However, I later realized we could deduplicate some parts, so I split
verify_file_checksum() and moved the reusable part to a separate
function. Please have a look at v4-0007.
Yeah, that seems OK.
The fact that these patches don't have commit messages is making life
more difficult for me than it needs to be. In particular, I'm looking
at 0009 and there's no hint about why you want to do this. In fact
that's the case for all of these refactoring patches. Instead of
saying something like "tar format verification will want to verify the
control file, but will not be able to read the file directly from
disk, so separate the logic that reads the control file from the logic
that verifies it" you just say what code you moved. Then I have to
guess why you moved it, or flip back and forth between the refactoring
patch and 0011 to try to figure it out. It would be nice if each of
these refactoring patches contained a clear indication about the
purpose of the refactoring in the commit message.
Sorry, I was a bit lazy there, assuming you'd handle the review :).
I can understand the frustration -- added some description.
I had the same thought about checking for NULL inside
should_verify_control_data(), but I wanted to maintain the structure
similar to should_verify_checksum(). Making this change would have
also required altering should_verify_checksum(), I wasn’t sure if I
should make that change before. Now, I did that in the attached
version -- 0008 patch.
I believe there is no reason for this change to be part of 0008 at
all, and that this should be part of whatever later patch needs it.
Ok
Maybe think of doing something with the ASTREAMER_MEMBER_HEADER case also.
Done.
OK, the formatting of 0011 looks much better now.
It seems to me that 0011 is arranging to palloc the checksum context
for every file and then pfree it at the end. It seems like it would be
considerably more efficient if astreamer_verify contained a
pg_checksum_context instead of a pointer to a pg_checksum_context. If
you need a flag to indicate whether we've reinitialized the checksum
for the current file, it's better to add that than to have all of
these unnecessary allocate/free cycles.
I tried that in the attached version, and it’s a good improvement. We don’t
need any flags; we can allocate that during astreamer creation. Later,
in the ASTREAMER_MEMBER_HEADER case while reading, we can
(re)initialize the context for each file as needed.
Existing astreamer code uses struct member names_like_this. For the
new one, you mostly used namesLikeThis except when you used
names_like_this or namesLkThs.
Yeah, in my patch, I ended up using the same name for both the
variable and the function. To avoid that, I made this change. This
could be a minor inconvenience for someone using ctags/cscope to find
the definition of the function or variable, as they might be directed
to the wrong place. However, I think it’s still okay since there are
ways to find the correct definition. I reverted those changes in the
attached version.
It seems to me that instead of adding a global variable
verify_backup_file_cb, it would be better to move the 'format'
variable into verifier_context. Then you can do something like if
(context->format == 'p') verify_plain_backup_file() else
verify_tar_backup_file().
Done.
It's pretty common for .tar.gz to be abbreviated to .tgz. I think we
should support that.
Done.
Let's suppose that I have a backup which, for some reason, does not
use the same compression for all files (base.tar, 16384.tgz,
16385.tar.gz, 16366.tar.lz4). With this patch, that will fail. Now,
that's not really a problem, because having a backup with mixed
compression algorithms like that is strange and you probably wouldn't
try to do it. But on the other hand, it looks to me like making the
code support that would be more elegant than what you have now.
Because, right now, you have code to detect what type of backup you've
got by looking at base.WHATEVER_EXTENSION ... but then you have to
also have code that complains if some later file doesn't have the same
extension. But you could just detect the type of every file
individually.
In fact, I wonder if we even need -Z. What value is that actually
providing? Why not just always auto-detect?
+1, removed -Z option.
find_backup_format() ignores the possibility of stat() throwing an
error. That's not good.
I wasn't sure about that before -- I tried it in the attached version.
See if it looks good to you.
Suppose that the backup directory contains main.tar, 16385.tar, and
snuffleupagus.tar. It looks to me like what will happen here is that
we'll verify main.tar with tblspc_oid = InvalidOid, 16385.tar with
tblspc_oid = 16385, and snuffleupagus.tar with tblspc_oid =
InvalidOid. That doesn't sound right. I think we should either
completely ignore snuffleupagus.tar just as if it were completely
imaginary, or perhaps there's an argument for emitting a warning
saying that we weren't expecting a snuffleupagus to exist.
In general, I think all unexpected files in a tar-format backup
directory should get the same treatment, regardless of whether the
problem is with the extension or the file itself. We should either
silently ignore everything that isn't expected to be present, or we
should emit a complaint saying that the file isn't expected to be
present. Right now, you say that it's "not a valid file" if the
extension isn't what you expect (which doesn't seem like a good error
message, because the file may be perfectly valid for what it is, it's
just not a file we're expecting to see) and say nothing if the
extension is right but the part of the filename preceding the
extension is unexpected.
I added an error for files other than base.tar and
<tablespacesoid>.tar. I think the error message could be improved.
A related issue is that it's a little unclear what --ignore is
supposed to do for tar-format backups. Does that ignore files in the
backup directory, or files instead of the tar files inside of the
backup directory? If we decide that --ignore ignores files in the
backup directory, then we should complain about any unexpected files
that are present there unless they've been ignored. If we decide that
--ignore ignores files inside of the tar files, then I suggest we just
silently skip any files in the backup directory that don't seem to
have file names in the correct format. I think I prefer the latter
approach, but I'm not 100% sure what's best.
I am interested in having that feature be as useful as possible --
I mean, allowing the option to ignore files from the backup directory
and from the archive file as well. I don't see any major drawbacks,
apart from spending extra CPU cycles to browse the ignore list.
Regards,
Amul
Attachments:
v8-0010-pg_verifybackup-Add-backup-format-and-compression.patch
From 6b6192100ff02e04e50d9400a0d2b14216f6615e Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 10:26:35 +0530
Subject: [PATCH v8 10/12] pg_verifybackup: Add backup format and compression
option
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the next patch that implements tar format support.
----
---
src/bin/pg_verifybackup/pg_verifybackup.c | 77 ++++++++++++++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 1 +
2 files changed, 76 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index cb4094c8138..8b92b26f4d7 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,6 +18,7 @@
#include <sys/stat.h>
#include <time.h>
+#include "common/compression.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -56,6 +57,7 @@ static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3) pg_attribute_noreturn();
+static char find_backup_format(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
@@ -84,6 +86,7 @@ main(int argc, char **argv)
{"exit-on-error", no_argument, NULL, 'e'},
{"ignore", required_argument, NULL, 'i'},
{"manifest-path", required_argument, NULL, 'm'},
+ {"format", required_argument, NULL, 'F'},
{"no-parse-wal", no_argument, NULL, 'n'},
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
@@ -105,6 +108,7 @@ main(int argc, char **argv)
progname = get_progname(argv[0]);
memset(&context, 0, sizeof(context));
+ context.format = '\0';
if (argc > 1)
{
@@ -141,7 +145,7 @@ main(int argc, char **argv)
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
- while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1)
+ while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:", long_options, NULL)) != -1)
{
switch (c)
{
@@ -160,6 +164,15 @@ main(int argc, char **argv)
manifest_path = pstrdup(optarg);
canonicalize_path(manifest_path);
break;
+ case 'F':
+ if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
+ context.format = 'p';
+ else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
+ context.format = 't';
+ else
+ pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"",
+ optarg);
+ break;
case 'n':
no_parse_wal = true;
break;
@@ -207,11 +220,26 @@ main(int argc, char **argv)
pg_fatal("cannot specify both %s and %s",
"-P/--progress", "-q/--quiet");
+ /* Determine the backup format if it hasn't been specified. */
+ if (context.format == '\0')
+ context.format = find_backup_format(&context);
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
int ret;
+ /*
+ * XXX: In the future, we should consider enhancing pg_waldump to read
+ * WAL files from the tar archive.
+ */
+ if (context.format != 'p')
+ {
+ pg_log_error("pg_waldump does not support parsing WAL files from a tar archive.");
+ pg_log_error_hint("Try \"%s --help\" to skip parse WAL files option.", progname);
+ exit(1);
+ }
+
pg_waldump_path = pg_malloc(MAXPGPATH);
ret = find_other_exec(argv[0], "pg_waldump",
"pg_waldump (PostgreSQL) " PG_VERSION "\n",
@@ -279,7 +307,15 @@ main(int argc, char **argv)
*/
if (!context.skip_checksums)
{
- verify_backup_checksums(&context);
+ /*
+ * Checksums are verified here only for plain backups. For tar backups,
+ * checksum verification (if requested) is done as soon as each file is
+ * read, since we don't have random access to the files as we do with
+ * plain backups.
+ */
+ if (context.format == 'p')
+ verify_backup_checksums(&context);
+
progress_report(&context, true);
}
@@ -972,6 +1008,42 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
return false;
}
+/*
+ * To detect the backup format, it checks for the PG_VERSION file in the backup
+ * directory. If found, it will be considered a plain-format backup; otherwise,
+ * it will be assumed to be a tar-format backup.
+ */
+static char
+find_backup_format(verifier_context *context)
+{
+ char result;
+ char *path;
+ struct stat sb;
+
+ /* Should be here only if the backup format is unknown */
+ Assert(context->format == '\0');
+
+ /* Check PG_VERSION file. */
+ path = psprintf("%s/%s", context->backup_directory, "PG_VERSION");
+ if (stat(path, &sb) == 0)
+ result = 'p';
+ else
+ {
+ if (errno != ENOENT)
+ {
+ pg_log_error("cannot determine backup format");
+ pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+ exit(1);
+ }
+
+ /* Otherwise, it is assumed to be a TAR backup. */
+ result = 't';
+ }
+ pfree(path);
+
+ return result;
+}
+
/*
* Print a progress report based on the variables in verifier_context.
*
@@ -1060,6 +1132,7 @@ usage(void)
printf(_(" -e, --exit-on-error exit immediately on error\n"));
printf(_(" -i, --ignore=RELATIVE_PATH ignore indicated path\n"));
printf(_(" -m, --manifest-path=PATH use specified path for manifest\n"));
+ printf(_(" -F, --format=p|t backup format (plain, tar)\n"));
printf(_(" -n, --no-parse-wal do not try to parse WAL files\n"));
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 8a4046b0e33..fcef972d9ad 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -107,6 +107,7 @@ typedef struct verifier_context
manifest_data *manifest;
char *backup_directory;
SimpleStringList ignore_list;
+ char format; /* backup format: p(lain)/t(ar) */
bool skip_checksums;
bool exit_on_error;
bool saw_any_error;
--
2.18.0
v8-0009-Refactor-move-first-and-last-progress_report-call.patch
From 83357e1fcccb433a5e4c343232f66431998e9e59 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Fri, 2 Aug 2024 16:37:38 +0530
Subject: [PATCH v8 09/12] Refactor: move first and last progress_report call
to Main.
The progress_report() is currently called at the start and end of
verify_backup_checksums(), which is used only for plain backups. Since
we also need to report progress for TAR backups, the progress_report()
has been moved from verify_backup_checksums() to a common
location.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d518f995298..cb4094c8138 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -253,7 +253,10 @@ main(int argc, char **argv)
* read, which occurs only when checksum verification is enabled.
*/
if (!context.skip_checksums)
+ {
compute_total_size(&context);
+ progress_report(&context, false);
+ }
/*
* Now scan the files in the backup directory. At this stage, we verify
@@ -275,7 +278,10 @@ main(int argc, char **argv)
* told to skip it.
*/
if (!context.skip_checksums)
+ {
verify_backup_checksums(&context);
+ progress_report(&context, true);
+ }
/*
* Try to parse the required ranges of WAL records, unless we were told
@@ -736,8 +742,6 @@ verify_backup_checksums(verifier_context *context)
manifest_file *m;
uint8 *buffer;
- progress_report(context, false);
-
buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
manifest_files_start_iterate(manifest->files, &it);
@@ -761,8 +765,6 @@ verify_backup_checksums(verifier_context *context)
}
pfree(buffer);
-
- progress_report(context, true);
}
/*
--
2.18.0
v8-0008-Refactor-split-verify_control_file.patch
From a8018ab64411eca3663761ca35913f30176b9102 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 15:47:26 +0530
Subject: [PATCH v8 08/12] Refactor: split verify_control_file.
Separated the manifest entry verification code into a new function and
introduced the should_verify_control_data() macro, similar to
should_verify_checksum().
Like verify_file_checksum(), verify_control_file() is designed to
accept a pg_control file path, which is opened and whose contents are
verified. But in the case of a tar backup, we will have the pg_control
file contents instead of a path, and they need to be verified in the
same way. For that reason, the code that does the verification is
moved to a separate function so that it can be reused for tar backup
verification as well.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 44 +++++++++++------------
src/bin/pg_verifybackup/pg_verifybackup.h | 15 ++++++++
2 files changed, 35 insertions(+), 24 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 44b2cd49e0c..d518f995298 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,7 +18,6 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -61,8 +60,6 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_control_file(const char *controlpath,
- uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -625,14 +622,20 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
/* Check the backup manifest entry for this file. */
m = verify_manifest_entry(context, relpath, sb.st_size);
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0 &&
- m != NULL && m->matched && !m->bad)
- verify_control_file(fullpath, context->manifest->system_identifier);
+ /* Validate the manifest system identifier */
+ if (should_verify_control_data(context->manifest, m))
+ {
+ ControlFileData *control_file;
+ bool crc_ok;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+ control_file = get_controlfile_by_exact_path(fullpath, &crc_ok);
+
+ verify_control_data(control_file, fullpath, crc_ok,
+ context->manifest->system_identifier);
+ /* Release memory. */
+ pfree(control_file);
+ }
}
/*
@@ -676,18 +679,14 @@ verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
}
/*
- * Sanity check control file and validate system identifier against manifest
- * system identifier.
+ * Sanity check control file data and validate system identifier against
+ * manifest system identifier.
*/
-static void
-verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
+void
+verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier)
{
- ControlFileData *control_file;
- bool crc_ok;
-
- pg_log_debug("reading \"%s\"", controlpath);
- control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
-
/* Control file contents not meaningful if CRC is bad. */
if (!crc_ok)
report_fatal_error("%s: CRC is incorrect", controlpath);
@@ -703,9 +702,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
controlpath,
(unsigned long long) manifest_system_identifier,
(unsigned long long) control_file->system_identifier);
-
- /* Release memory. */
- pfree(control_file);
}
/*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 1bc5f7a6b4a..8a4046b0e33 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -16,6 +16,7 @@
#include "common/controldata_utils.h"
#include "common/hashfn_unstable.h"
+#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
@@ -46,6 +47,17 @@ typedef struct manifest_file
#define should_verify_checksum(m) \
(((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+/*
+ * Validate the manifest system identifier against the control file; this
+ * feature is not available in manifest version 1. This validation should be
+ * carried out only if the manifest entry validation is completed without any
+ * errors.
+ */
+#define should_verify_control_data(manifest, m) \
+ (((manifest)->version != 1) && \
+ ((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (strcmp((m)->pathname, "global/pg_control") == 0))
+
/*
* Define a hash table which we can use to store information about the files
* mentioned in the backup manifest.
@@ -110,6 +122,9 @@ extern manifest_file *verify_manifest_entry(verifier_context *context,
extern void verify_checksum(verifier_context *context, manifest_file *m,
pg_checksum_context *checksum_ctx,
uint8 *checksumbuf);
+extern void verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
v8-0011-pg_verifybackup-Read-tar-files-and-verify-its-con.patch
From 4108699d558f2d935aff10944361b4dc93b5fcc3 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 16:22:07 +0530
Subject: [PATCH v8 11/12] pg_verifybackup: Read tar files and verify its
contents
This patch implements TAR format backup verification.
For reading and decompression, fe_utils/astreamer.h is used. For
verification, a new archive streamer has been added in
astreamer_verify.c to handle TAR member files and their contents; see
astreamer_verify_content() for details. The stack of astreamers will
be set up for each TAR file in verify_tar_file, depending on its
compression type, which will be auto-detected.
When information about a TAR member file (i.e., ASTREAMER_MEMBER_HEADER)
is received, we first verify its entry against the backup manifest. We
then decide if further checks are needed, such as checksum
verification and control data verification (if it is a pg_control
file), once the member file contents are received. Although this
decision could be made when the contents are received, it is more
efficient to make it earlier since the member file contents are
received in multiple iterations. In short, we process
ASTREAMER_MEMBER_CONTENTS multiple times but only once for other
ASTREAMER_MEMBER_* cases. We maintain this information in the
astreamer_verify structure for each member file, which is reset when
the file ends.
---
src/bin/pg_verifybackup/Makefile | 4 +-
src/bin/pg_verifybackup/astreamer_verify.c | 377 +++++++++++++++++++++
src/bin/pg_verifybackup/meson.build | 6 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 208 +++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 12 +-
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 601 insertions(+), 7 deletions(-)
create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 7c045f142e8..df7aaabd530 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -17,11 +17,13 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
# We need libpq only because fe_utils does.
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
OBJS = \
$(WIN32RES) \
- pg_verifybackup.o
+ pg_verifybackup.o \
+ astreamer_verify.o
all: pg_verifybackup
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
new file mode 100644
index 00000000000..91d324fddce
--- /dev/null
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -0,0 +1,377 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_verify.c
+ *
+ * Extend fe_utils/astreamer.h archive streaming facility to verify TAR
+ * backup.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * src/bin/pg_verifybackup/astreamer_verify.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "common/logging.h"
+#include "fe_utils/astreamer.h"
+#include "pg_verifybackup.h"
+
+typedef struct astreamer_verify
+{
+ astreamer base;
+ verifier_context *context;
+ char *archive_name;
+ Oid tblspc_oid;
+ pg_checksum_context *checksum_ctx;
+
+ /* Hold information for a member file verification */
+ manifest_file *mfile;
+ int64 received_bytes;
+ bool verify_checksum;
+ bool verify_control_data;
+} astreamer_verify;
+
+static void astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_verify_finalize(astreamer *streamer);
+static void astreamer_verify_free(astreamer *streamer);
+
+static void verify_member_header(astreamer *streamer, astreamer_member *member);
+static void verify_member_contents(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void verify_content_checksum(astreamer *streamer,
+ astreamer_member *member,
+ const char *buffer, int buffer_len);
+static void verify_controldata(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void reset_member_info(astreamer *streamer);
+
+static const astreamer_ops astreamer_verify_ops = {
+ .content = astreamer_verify_content,
+ .finalize = astreamer_verify_finalize,
+ .free = astreamer_verify_free
+};
+
+/*
+ * Create an astreamer that verifies the contents of a TAR file.
+ */
+astreamer *
+astreamer_verify_content_new(astreamer *next, verifier_context *context,
+ char *archive_name, Oid tblspc_oid)
+{
+ astreamer_verify *streamer;
+
+ streamer = palloc0(sizeof(astreamer_verify));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_verify_ops;
+
+ streamer->base.bbs_next = next;
+ streamer->context = context;
+ streamer->archive_name = archive_name;
+ streamer->tblspc_oid = tblspc_oid;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ if (!context->skip_checksums)
+ streamer->checksum_ctx = pg_malloc(sizeof(pg_checksum_context));
+
+ return &streamer->base;
+}
+
+/*
+ * It verifies each TAR member entry against the manifest data and performs
+ * checksum verification if enabled. Additionally, it validates the backup's
+ * system identifier against the backup_manifest.
+ */
+static void
+astreamer_verify_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+
+ /*
+ * Perform the initial check and setup for the verification.
+ */
+ verify_member_header(streamer, member);
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Perform the required contents verification.
+ */
+ verify_member_contents(streamer, member, data, len);
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+
+ /*
+ * Reset the temporary information stored for a verification.
+ */
+ reset_member_info(streamer);
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar archive");
+ }
+}
+
+/*
+ * End-of-stream processing for a astreamer_verify stream.
+ */
+static void
+astreamer_verify_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with a astreamer_verify stream.
+ */
+static void
+astreamer_verify_free(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ if (mystreamer->checksum_ctx)
+ pfree(mystreamer->checksum_ctx);
+
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+
+/*
+ * Verify the entry if it is a file in the backup manifest. If the archive being
+ * processed is a tablespace, prepare the required file path for subsequent
+ * operations. Finally, check if it needs to perform checksum verification and
+ * control data verification during file content processing.
+ */
+static void
+verify_member_header(astreamer *streamer, astreamer_member *member)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_file *m;
+
+ /* We are only interested in files that are not in the ignore list. */
+ if (member->is_directory || member->is_link ||
+ should_ignore_relpath(mystreamer->context, member->pathname))
+ return;
+
+ /*
+ * The backup_manifest stores a relative path to the base directory for
+ * files belonging to a tablespace, whereas <tablespaceoid>.tar doesn't. Prepare
+ * the required path; otherwise, the manifest entry verification will
+ * fail.
+ */
+ if (OidIsValid(mystreamer->tblspc_oid))
+ {
+ char temp[MAXPGPATH];
+
+ /* Copy original name at temporary space */
+ memcpy(temp, member->pathname, MAXPGPATH);
+
+ snprintf(member->pathname, MAXPGPATH, "%s/%u/%s",
+ "pg_tblspc", mystreamer->tblspc_oid, temp);
+ }
+
+ /* Check the manifest entry */
+ m = verify_manifest_entry(mystreamer->context, member->pathname,
+ member->size);
+ mystreamer->mfile = (void *) m;
+
+ /*
+ * Prepare for checksum and control data verification.
+ *
+ * We could have these checks while receiving contents. However, since
+ * contents are received in multiple iterations, this would result in
+ * these lengthy checks being performed multiple times. Instead, having a
+ * single flag would be more efficient.
+ */
+ mystreamer->verify_checksum =
+ (!mystreamer->context->skip_checksums && should_verify_checksum(m));
+ mystreamer->verify_control_data =
+ should_verify_control_data(mystreamer->context->manifest, m);
+
+ /* Initialize the context required for checksum verification. */
+ if (mystreamer->verify_checksum &&
+ pg_checksum_init(mystreamer->checksum_ctx, m->checksum_type) < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "%s: could not initialize checksum of file \"%s\"",
+ mystreamer->archive_name, m->pathname);
+
+ /*
+ * Checksum verification cannot be performed without proper context
+ * initialization.
+ */
+ mystreamer->verify_checksum = false;
+ }
+}
+
+/*
+ * Process the member content according to the flags set by the member header
+ * processing routine for checksum and control data verification.
+ */
+static void
+verify_member_contents(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ /* Verify the checksums */
+ if (mystreamer->verify_checksum)
+ verify_content_checksum(streamer, member, data, len);
+
+ /* Verify pg_control information */
+ if (mystreamer->verify_control_data)
+ verify_controldata(streamer, member, data, len);
+}
+
+/*
+ * Similar to verify_file_checksum(), but this function computes the checksum
+ * incrementally over the received file content. Unlike a plain backup
+ * directory, TAR format files do not allow random access, so checksum
+ * verification has to proceed progressively.
+ *
+ * The caller should pass a correctly initialized checksum_ctx, which will be
+ * used for the incremental checksum calculation. Once the complete file
+ * content has been received (tracked using received_bytes), the routine that
+ * performs the final checksum verification is called.
+ */
+static void
+verify_content_checksum(astreamer *streamer, astreamer_member *member,
+ const char *buffer, int buffer_len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ pg_checksum_context *checksum_ctx = mystreamer->checksum_ctx;
+ verifier_context *context = mystreamer->context;
+ manifest_file *m = mystreamer->mfile;
+ const char *relpath = m->pathname;
+ uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
+
+ /*
+ * Mark it false up front so that, if an error is reported below, the
+ * remaining content of this file is not processed again.
+ */
+ Assert(mystreamer->verify_checksum);
+ mystreamer->verify_checksum = false;
+
+ /* Should have come here for the right file */
+ Assert(strcmp(member->pathname, relpath) == 0);
+
+ /*
+ * The checksum context should match the type noted in the backup
+ * manifest.
+ */
+ Assert(checksum_ctx->type == m->checksum_type);
+
+ /* Update the total count of computed checksum bytes. */
+ mystreamer->received_bytes += buffer_len;
+
+ if (pg_checksum_update(checksum_ctx, (uint8 *) buffer, buffer_len) < 0)
+ {
+ report_backup_error(context, "could not update checksum of file \"%s\"",
+ relpath);
+ return;
+ }
+
+ /* Report progress */
+ context->done_size += buffer_len;
+ progress_report(context, false);
+
+ /* Yet to receive the full content of the file. */
+ if (mystreamer->received_bytes < m->size)
+ {
+ mystreamer->verify_checksum = true;
+ return;
+ }
+
+ /* Do the final computation and verification. */
+ verify_checksum(context, m, checksum_ctx, checksumbuf);
+}
+
+/*
+ * Assemble the control file data from the received file contents, which are
+ * expected to come from the pg_control file, and compute its CRC. Then call
+ * the routine that performs the final verification of the control file
+ * information.
+ */
+static void
+verify_controldata(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_data *manifest = mystreamer->context->manifest;
+ ControlFileData control_file;
+ pg_crc32c crc;
+ bool crc_ok;
+
+ /* Should be here only for control file */
+ Assert(strcmp(member->pathname, "global/pg_control") == 0);
+ Assert(manifest->version != 1);
+
+ /* Mark it as false to avoid unexpected re-entrance */
+ Assert(mystreamer->verify_control_data);
+ mystreamer->verify_control_data = false;
+
+ /* Should have whole control file data. */
+ if (!astreamer_buffer_until(streamer, &data, &len, sizeof(ControlFileData)))
+ {
+ mystreamer->verify_control_data = true;
+ return;
+ }
+
+ pg_log_debug("%s: reading \"%s\"", mystreamer->archive_name,
+ member->pathname);
+
+ if (streamer->bbs_buffer.len != sizeof(ControlFileData))
+ report_fatal_error("%s: could not read control file: read %d of %zu",
+ mystreamer->archive_name, streamer->bbs_buffer.len,
+ sizeof(ControlFileData));
+
+ memcpy(&control_file, streamer->bbs_buffer.data,
+ sizeof(ControlFileData));
+
+ /* Check the CRC. */
+ INIT_CRC32C(crc);
+ COMP_CRC32C(crc,
+ (char *) (&control_file),
+ offsetof(ControlFileData, crc));
+ FIN_CRC32C(crc);
+
+ crc_ok = EQ_CRC32C(crc, control_file.crc);
+
+ /* Do the final control data verification. */
+ verify_control_data(&control_file, member->pathname, crc_ok,
+ manifest->system_identifier);
+}
+
+/*
+ * Reset flags and free memory allocations for member file verification.
+ */
+static void
+reset_member_info(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ mystreamer->mfile = NULL;
+ mystreamer->received_bytes = 0;
+ mystreamer->verify_checksum = false;
+ mystreamer->verify_control_data = false;
+}
diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build
index 7c7d31a0350..1e3fcf7ee5a 100644
--- a/src/bin/pg_verifybackup/meson.build
+++ b/src/bin/pg_verifybackup/meson.build
@@ -1,7 +1,8 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
pg_verifybackup_sources = files(
- 'pg_verifybackup.c'
+ 'pg_verifybackup.c',
+ 'astreamer_verify.c'
)
if host_system == 'windows'
@@ -10,9 +11,10 @@ if host_system == 'windows'
'--FILEDESC', 'pg_verifybackup - verify a backup against using a backup manifest'])
endif
+pg_verifybackup_deps = [frontend_code, libpq, lz4, zlib, zstd]
pg_verifybackup = executable('pg_verifybackup',
pg_verifybackup_sources,
- dependencies: [frontend_code, libpq],
+ dependencies: pg_verifybackup_deps,
kwargs: default_bin_args,
)
bin_targets += pg_verifybackup
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 8b92b26f4d7..2a058938986 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -20,9 +20,12 @@
#include "common/compression.h"
#include "common/parse_manifest.h"
+#include "common/relpath.h"
+#include "fe_utils/astreamer.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
#include "pg_verifybackup.h"
+#include "pgtar.h"
#include "pgtime.h"
/*
@@ -62,6 +65,15 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
char *relpath, char *fullpath);
+static void verify_plain_file(verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+static void verify_tar_file(verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+static void verify_tar_content(verifier_context *context,
+ char *relpath, char *fullpath,
+ astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -70,6 +82,10 @@ static void verify_file_checksum(verifier_context *context,
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
+static astreamer *create_archive_verifier(verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid,
+ pg_compress_algorithm compress_algo);
static void compute_total_size(verifier_context *context);
static void usage(void);
@@ -141,6 +157,10 @@ main(int argc, char **argv)
*/
simple_string_list_append(&context.ignore_list, "backup_manifest");
simple_string_list_append(&context.ignore_list, "pg_wal");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.gz");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.lz4");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.zst");
simple_string_list_append(&context.ignore_list, "postgresql.auto.conf");
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
@@ -619,7 +639,8 @@ verify_backup_directory(verifier_context *context, char *relpath,
}
/*
- * Verify one file (which might actually be a directory or a symlink).
+ * Verify one file (which might actually be a directory, a symlink or an
+ * archive).
*
* The arguments to this function have the same meaning as the arguments to
* verify_backup_directory.
@@ -628,7 +649,6 @@ static void
verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
{
struct stat sb;
- manifest_file *m;
if (stat(fullpath, &sb) != 0)
{
@@ -661,8 +681,28 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Do the further verifications */
+ if (context->format == 'p')
+ verify_plain_file(context, relpath, fullpath, sb.st_size);
+ else
+ verify_tar_file(context, relpath, fullpath, sb.st_size);
+}
+
+/*
+ * Verify one plain file or a symlink.
+ *
+ * The arguments to this function are mostly the same as those of
+ * verify_backup_directory. The additional argument is the file size, which is
+ * verified against the manifest entry.
+ */
+static void
+verify_plain_file(verifier_context *context, char *relpath, char *fullpath,
+ size_t filesize)
+{
+ manifest_file *m;
+
/* Check the backup manifest entry for this file. */
- m = verify_manifest_entry(context, relpath, sb.st_size);
+ m = verify_manifest_entry(context, relpath, filesize);
/* Validate the manifest system identifier */
if (should_verify_control_data(context->manifest, m))
@@ -680,6 +720,132 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
}
}
+/*
+ * Verify one tar file.
+ *
+ * The arguments to this function are mostly the same as those of
+ * verify_backup_directory. The additional argument is the file size, which is
+ * verified against the manifest entry.
+ */
+static void
+verify_tar_file(verifier_context *context, char *relpath, char *fullpath,
+ size_t filesize)
+{
+ astreamer *streamer;
+ Oid tblspc_oid = InvalidOid;
+ int file_name_len;
+ int file_extn_len;
+ pg_compress_algorithm compress_algorithm;
+
+ /* Should be tar backup */
+ Assert(context->format == 't');
+
+ /* Find the compression type of the tar file */
+ if (strstr(relpath, ".tgz") != NULL)
+ {
+ compress_algorithm = PG_COMPRESSION_GZIP;
+ file_extn_len = 4; /* length of ".tgz" */
+ }
+ else if (strstr(relpath, ".tar.gz") != NULL)
+ {
+ compress_algorithm = PG_COMPRESSION_GZIP;
+ file_extn_len = 7; /* length of ".tar.gz" */
+ }
+ else if (strstr(relpath, ".tar.lz4") != NULL)
+ {
+ compress_algorithm = PG_COMPRESSION_LZ4;
+ file_extn_len = 8; /* length of ".tar.lz4" */
+ }
+ else if (strstr(relpath, ".tar.zst") != NULL)
+ {
+ compress_algorithm = PG_COMPRESSION_ZSTD;
+ file_extn_len = 8; /* length of ".tar.zst" */
+ }
+ else if (strstr(relpath, ".tar") != NULL)
+ {
+ compress_algorithm = PG_COMPRESSION_NONE;
+ file_extn_len = 4; /* length of ".tar" */
+ }
+ else
+ {
+ report_backup_error(context,
+ "\"%s\" unexpected file in the tar format backup",
+ relpath);
+ return;
+ }
+
+ /*
+ * We expect TAR files to back up the main directory and tablespace.
+ *
+ * pg_basebackup writes the main data directory to an archive file named
+ * base.tar and the tablespace directory to <tablespaceoid>.tar, followed
+ * by a compression type extension such as .gz, .lz4, or .zst.
+ */
+ file_name_len = strlen(relpath);
+ if (strspn(relpath, "0123456789") == (file_name_len - file_extn_len))
+ {
+ /*
+ * Since the file matches the <tablespaceoid>.tar format, extract the
+ * tablespaceoid, which is needed to prepare the paths of the files
+ * belonging to that tablespace relative to the base directory.
+ */
+ tblspc_oid = strtoi64(relpath, NULL, 10);
+ }
+ /* Otherwise, it should be a base.tar file; if not, raise an error. */
+ else if (strncmp("base", relpath, file_name_len - file_extn_len) != 0)
+ {
+ report_backup_error(context,
+ "\"%s\" unexpected file in the tar format backup",
+ relpath);
+ return;
+ }
+
+ streamer = create_archive_verifier(context, relpath, tblspc_oid,
+ compress_algorithm);
+ verify_tar_content(context, relpath, fullpath, streamer);
+
+ /* Cleanup. */
+ astreamer_finalize(streamer);
+ astreamer_free(streamer);
+}
+
+/*
+ * Read the given tar file in predefined chunks and pass them to the
+ * astreamer, which initiates decompression (if necessary) and then
+ * verification of each member within the tar archive.
+ */
+static void
+verify_tar_content(verifier_context *context, char *relpath, char *fullpath,
+ astreamer *streamer)
+{
+ int fd;
+ int rc;
+ char *buffer;
+
+ /* Open the target file. */
+ if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
+ {
+ report_backup_error(context, "could not open file \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+ /* Perform the reads */
+ while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
+ astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);
+
+ if (rc < 0)
+ report_backup_error(context, "could not read file \"%s\": %m",
+ relpath);
+
+ /* Close the file. */
+ if (close(fd) != 0)
+ report_backup_error(context, "could not close file \"%s\": %m",
+ relpath);
+}
+
/*
* Verify file and its size entry in the manifest.
*/
@@ -1044,6 +1210,42 @@ find_backup_format(verifier_context *context)
return result;
}
+/*
+ * Create the chain of archive streamers needed to verify the contents of the
+ * provided tar file.
+ */
+static astreamer *
+create_archive_verifier(verifier_context *context, char *archive_name,
+ Oid tblspc_oid, pg_compress_algorithm compress_algo)
+{
+ astreamer *streamer = NULL;
+
+ /* Should be here only for tar backup */
+ Assert(context->format == 't');
+
+ /*
+ * To verify the contents of the tar file, the initial step is to parse
+ * its content.
+ */
+ streamer = astreamer_verify_content_new(streamer, context, archive_name,
+ tblspc_oid);
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /*
+ * If the tar file is compressed, we must perform the appropriate
+ * decompression operation before proceeding with the verification of its
+ * contents.
+ */
+ if (compress_algo == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compress_algo == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compress_algo == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ return streamer;
+}
+
/*
* Print a progress report based on the variables in verifier_context.
*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index fcef972d9ad..82f845b449f 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -45,7 +45,8 @@ typedef struct manifest_file
} manifest_file;
#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+ (((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
/*
* Validate the manifest system identifier against the control file; this
@@ -137,4 +138,13 @@ extern bool should_ignore_relpath(verifier_context *context,
extern void progress_report(verifier_context *context, bool finished);
+/* Forward declarations to avoid fe_utils/astreamer.h include. */
+struct astreamer;
+typedef struct astreamer astreamer;
+
+extern astreamer *astreamer_verify_content_new(astreamer *next,
+ verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
+
#endif /* PG_VERIFYBACKUP_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 547d14b3e7c..d86b28b260e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3329,6 +3329,7 @@ astreamer_plain_writer
astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
+astreamer_verify
astreamer_zstd_frame
bgworker_main_type
bh_node_type
--
2.18.0
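To make the data flow in the tar verification patch above easier to follow, here is a minimal sketch (not part of the patches) that condenses create_archive_verifier() and verify_tar_content() into one routine. Only the gzip case is shown, error handling is mostly omitted, and the function name sketch_verify_gzip_tar is invented purely for illustration:

#include "postgres_fe.h"

#include <fcntl.h>
#include <unistd.h>

#include "common/logging.h"
#include "fe_utils/astreamer.h"
#include "pg_verifybackup.h"

/*
 * Sketch only: verify one gzip-compressed tar archive.  The chain is built
 * from the consumer backwards, so data flows
 * gzip decompressor -> tar parser -> verifier.
 */
static void
sketch_verify_gzip_tar(verifier_context *context, const char *fullpath,
                       char *archive_name, Oid tblspc_oid)
{
    astreamer  *streamer;
    char       *buffer = pg_malloc(READ_CHUNK_SIZE);
    int         fd;
    int         rc;

    streamer = astreamer_verify_content_new(NULL, context, archive_name,
                                            tblspc_oid);
    streamer = astreamer_tar_parser_new(streamer);
    streamer = astreamer_gzip_decompressor_new(streamer);

    if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
        pg_fatal("could not open file \"%s\": %m", fullpath);

    /* Push raw chunks; each streamer forwards its output to the next one. */
    while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
        astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);

    close(fd);
    astreamer_finalize(streamer);
    astreamer_free(streamer);
    pfree(buffer);
}

Because each constructor takes the streamer that should receive its output, the chunks pushed in as ASTREAMER_UNKNOWN are decompressed and classified by the tar parser before they ever reach the verifier.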
Attachment: v8-0012-pg_verifybackup-Tests-and-document.patch (application/x-patch)
From 583c90336870a528d4df5ed83296ff2c995ae385 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 7 Aug 2024 18:15:29 +0530
Subject: [PATCH v8 12/12] pg_verifybackup: Tests and document
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the previous patch that implements tar format support.
----
---
doc/src/sgml/ref/pg_verifybackup.sgml | 42 ++++++++++-
src/bin/pg_verifybackup/t/001_basic.pl | 6 +-
src/bin/pg_verifybackup/t/004_options.pl | 2 +-
src/bin/pg_verifybackup/t/005_bad_manifest.pl | 2 +-
src/bin/pg_verifybackup/t/008_untar.pl | 73 ++++++-------------
src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +-----------
6 files changed, 72 insertions(+), 101 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index a3f167f9f6e..60f771c7663 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -34,8 +34,10 @@ PostgreSQL documentation
integrity of a database cluster backup taken using
<command>pg_basebackup</command> against a
<literal>backup_manifest</literal> generated by the server at the time
- of the backup. The backup must be stored in the "plain"
- format; a "tar" format backup can be checked after extracting it.
+ of the backup. The backup must be stored in the "plain" or "tar"
+ format. Verification is supported for <literal>gzip</literal>,
+ <literal>lz4</literal>, and <literal>zstd</literal> compressed tar backups;
+ backups compressed with any other method can be checked after decompressing them.
</para>
<para>
@@ -168,6 +170,42 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-F <replaceable class="parameter">format</replaceable></option></term>
+ <term><option>--format=<replaceable class="parameter">format</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the format of the backup. <replaceable>format</replaceable>
+ can be one of the following:
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>p</literal></term>
+ <term><literal>plain</literal></term>
+ <listitem>
+ <para>
+ Backup consists of plain files with the same layout as the
+ source server's data directory and tablespaces.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>t</literal></term>
+ <term><literal>tar</literal></term>
+ <listitem>
+ <para>
+ Backup consists of tar files. The main data directory's contents
+ will be written to a file named <filename>base.tar</filename>,
+ and each other tablespace will be written to a separate tar file
+ named after that tablespace's OID.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist></para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n</option></term>
<term><option>--no-parse-wal</option></term>
diff --git a/src/bin/pg_verifybackup/t/001_basic.pl b/src/bin/pg_verifybackup/t/001_basic.pl
index 2f3e52d296f..ca5b0402b7d 100644
--- a/src/bin/pg_verifybackup/t/001_basic.pl
+++ b/src/bin/pg_verifybackup/t/001_basic.pl
@@ -17,11 +17,11 @@ command_fails_like(
qr/no backup directory specified/,
'target directory must be specified');
command_fails_like(
- [ 'pg_verifybackup', $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir ],
qr/could not open file.*\/backup_manifest\"/,
'pg_verifybackup requires a manifest');
command_fails_like(
- [ 'pg_verifybackup', $tempdir, $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir, $tempdir ],
qr/too many command-line arguments/,
'multiple target directories not allowed');
@@ -31,7 +31,7 @@ close($fh);
# but then try to use an alternate, nonexisting manifest
command_fails_like(
- [ 'pg_verifybackup', '-m', "$tempdir/not_the_manifest", $tempdir ],
+ [ 'pg_verifybackup', '-Fp', '-m', "$tempdir/not_the_manifest", $tempdir ],
qr/could not open file.*\/not_the_manifest\"/,
'pg_verifybackup respects -m flag');
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 8ed2214408e..2f197648740 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -108,7 +108,7 @@ unlike(
# Test valid manifest with nonexistent backup directory.
command_fails_like(
[
- 'pg_verifybackup', '-m',
+ 'pg_verifybackup', '-Fp', '-m',
"$backup_path/backup_manifest", "$backup_path/fake"
],
qr/could not open directory/,
diff --git a/src/bin/pg_verifybackup/t/005_bad_manifest.pl b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
index c4ed64b62d5..28c51b6feb0 100644
--- a/src/bin/pg_verifybackup/t/005_bad_manifest.pl
+++ b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
@@ -208,7 +208,7 @@ sub test_bad_manifest
print $fh $manifest_contents;
close($fh);
- command_fails_like([ 'pg_verifybackup', $tempdir ], $regexp, $test_name);
+ command_fails_like([ 'pg_verifybackup', '-Fp', $tempdir ], $regexp, $test_name);
return;
}
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 7a09f3b75b2..9896560adc3 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -16,6 +16,20 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
+# Create a couple of directories to use as tablespaces.
+my $TS1_LOCATION = $primary->backup_dir .'/ts1';
+$TS1_LOCATION =~ s/\/\.\//\//g; # collapse foo/./bar to foo/bar
+mkdir($TS1_LOCATION);
+
+# Create a tablespace with table in it.
+$primary->safe_psql('postgres', qq(
+ CREATE TABLESPACE regress_ts1 LOCATION '$TS1_LOCATION';
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1';
+ CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1;
+ INSERT INTO regress_tbl1 VALUES(generate_series(1,5));));
+my $tsoid = $primary->safe_psql('postgres', qq(
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
+
my $backup_path = $primary->backup_dir . '/server-backup';
my $extract_path = $primary->backup_dir . '/extracted-backup';
@@ -23,39 +37,31 @@ my @test_configuration = (
{
'compression_method' => 'none',
'backup_flags' => [],
- 'backup_archive' => 'base.tar',
+ 'backup_archive' => ['base.tar', "$tsoid.tar"],
'enabled' => 1
},
{
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'server-gzip' ],
- 'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'server-lz4' ],
- 'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => [ '-d', '-m' ],
+ 'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ],
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
});
@@ -86,47 +92,16 @@ for my $tc (@test_configuration)
my $backup_files = join(',',
sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
my $expected_backup_files =
- join(',', sort ('backup_manifest', $tc->{'backup_archive'}));
+ join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} }));
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path],
+ "verify backup, compression $method");
# Cleanup.
- unlink($backup_path . '/backup_manifest');
- unlink($backup_path . '/base.tar');
- unlink($backup_path . '/' . $tc->{'backup_archive'});
+ rmtree($backup_path);
rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 8c076d46dee..6b7d7483f6e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -29,41 +29,30 @@ my @test_configuration = (
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'client-gzip:5' ],
'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'client-lz4:5' ],
'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => ['-d'],
- 'output_file' => 'base.tar',
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:5' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'parallel zstd',
'backup_flags' => [ '--compress', 'client-zstd:workers=3' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1"),
'possibly_unsupported' =>
qr/could not set compression worker count to 3: Unsupported parameter/
@@ -118,40 +107,9 @@ for my $tc (@test_configuration)
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- push @decompress, $backup_path . '/' . $tc->{'output_file'}
- if $tc->{'output_file'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ],
+ "verify backup, compression $method");
# Cleanup.
rmtree($extract_path);
--
2.18.0
Attachment: v8-0006-Refactor-split-verify_backup_file-function.patch (application/x-patch)
From 4be84391a1e05b70ca6fcb51e050a1a08de2b94d Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 12:15:26 +0530
Subject: [PATCH v8 06/12] Refactor: split verify_backup_file() function.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Move the manifest entry verification code into a new function called
verify_manifest_entry(). Also, move the total size computation code
into another new function called compute_total_size(), which will be
called from the main function.
The current computation is designed for plain backups, which operate
in two rounds. In the first round, only the files are checked against
the backup manifest, and their sizes are added to the total size for
progress reporting. In the second round, the actual files are read for
checksum verification, and the progress of the bytes read is noted and
reported.
However, for tar backups, we do not operate in two rounds because TAR
format files do not allow random access like plain backups. Therefore,
we verify the file entries against the backup manifest and right
after, perform the checksum verification in a single pass. That’s why
we need to compute the total size before starting the TAR backup
verification pass.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 82 ++++++++++++++++++-----
src/bin/pg_verifybackup/pg_verifybackup.h | 3 +
2 files changed, 68 insertions(+), 17 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 4e42757c346..8eadaac72e3 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -72,6 +72,7 @@ static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
+static void compute_total_size(verifier_context *context);
static void usage(void);
static const char *progname;
@@ -250,6 +251,13 @@ main(int argc, char **argv)
*/
context.manifest = parse_manifest_file(manifest_path);
+ /*
+ * For the progress report, compute the total size of the files to be
+ * read, which occurs only when checksum verification is enabled.
+ */
+ if (!context.skip_checksums)
+ compute_total_size(&context);
+
/*
* Now scan the files in the backup directory. At this stage, we verify
* that every file on disk is present in the manifest and that the sizes
@@ -614,6 +622,27 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Check the backup manifest entry for this file. */
+ m = verify_manifest_entry(context, relpath, sb.st_size);
+
+ /*
+ * Validate the manifest system identifier, not available in manifest
+ * version 1.
+ */
+ if (context->manifest->version != 1 &&
+ strcmp(relpath, "global/pg_control") == 0 &&
+ m != NULL && m->matched && !m->bad)
+ verify_control_file(fullpath, context->manifest->system_identifier);
+}
+
+/*
+ * Verify file and its size entry in the manifest.
+ */
+manifest_file *
+verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
+{
+ manifest_file *m;
+
/* Check whether there's an entry in the manifest hash. */
m = manifest_files_lookup(context->manifest->files, relpath);
if (m == NULL)
@@ -621,40 +650,29 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
report_backup_error(context,
"\"%s\" is present on disk but not in the manifest",
relpath);
- return;
+ return NULL;
}
/* Flag this entry as having been encountered in the filesystem. */
m->matched = true;
/* Check that the size matches. */
- if (m->size != sb.st_size)
+ if (m->size != filesize)
{
report_backup_error(context,
"\"%s\" has size %lld on disk but size %zu in the manifest",
- relpath, (long long int) sb.st_size, m->size);
+ relpath, (long long int) filesize, m->size);
m->bad = true;
}
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0)
- verify_control_file(fullpath, context->manifest->system_identifier);
-
- /* Update statistics for progress report, if necessary */
- if (context->show_progress && !context->skip_checksums &&
- should_verify_checksum(m))
- context->total_size += m->size;
-
/*
* We don't verify checksums at this stage. We first finish verifying that
* we have the expected set of files with the expected sizes, and only
* afterwards verify the checksums. That's because computing checksums may
* take a while, and we'd like to report more obvious problems quickly.
*/
+
+ return m;
}
/*
@@ -817,7 +835,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
/*
* Double-check that we read the expected number of bytes from the file.
- * Normally, a file size mismatch would be caught in verify_backup_file
+ * Normally, a file size mismatch would be caught in verify_manifest_entry
* and this check would never be reached, but this provides additional
* safety and clarity in the event of concurrent modifications or
* filesystem misbehavior.
@@ -988,6 +1006,36 @@ progress_report(verifier_context *context, bool finished)
fputc((!finished && isatty(fileno(stderr))) ? '\r' : '\n', stderr);
}
+/*
+ * Compute the total size of backup files for progress reporting.
+ */
+static void
+compute_total_size(verifier_context *context)
+{
+ manifest_data *manifest = context->manifest;
+ manifest_files_iterator it;
+ manifest_file *m;
+ uint64 total_size = 0;
+
+ if (!context->show_progress)
+ return;
+
+ manifest_files_start_iterate(manifest->files, &it);
+ while ((m = manifest_files_iterate(manifest->files, &it)) != NULL)
+ {
+ /*
+ * We are not going to read files that are ignored or whose checksums
+ * are not calculated, so their sizes should be excluded from the
+ * total.
+ */
+ if (!should_ignore_relpath(context, m->pathname) &&
+ m->checksum_type != CHECKSUM_TYPE_NONE)
+ total_size += m->size;
+ }
+
+ context->total_size = total_size;
+}
+
/*
* Print out usage information and exit.
*/
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 90900048547..98c75916255 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -105,6 +105,9 @@ typedef struct verifier_context
uint64 done_size;
} verifier_context;
+extern manifest_file *verify_manifest_entry(verifier_context *context,
+ char *relpath, int64 filesize);
+
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
pg_attribute_printf(2, 3);
--
2.18.0
Attachment: v8-0007-Refactor-split-verify_file_checksum-function.patch (application/x-patch)
From 62ecfedd2666b23d171539fed738281702f824fd Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 16:45:55 +0530
Subject: [PATCH v8 07/12] Refactor: split verify_file_checksum() function.
Move the core functionality of verify_file_checksum to a new function
to reuse it instead of duplicating the code.
The verify_file_checksum() function is designed to take a file path,
open and read the file contents, and then calculate the checksum.
However, for TAR backups, instead of a file path, we receive the file
content in chunks, and the checksum needs to be calculated
incrementally. While the initial operations for plain and TAR backup
checksum calculations differ, the final checks and error handling are
the same. By moving the shared logic to a separate function, we can
reuse the code for both types of backups.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 18 ++++++++++++++++--
src/bin/pg_verifybackup/pg_verifybackup.h | 3 +++
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 8eadaac72e3..44b2cd49e0c 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -782,7 +782,6 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
int rc;
size_t bytes_read = 0;
uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
- int checksumlen;
/* Open the target file. */
if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
@@ -848,8 +847,23 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
return;
}
+ /* Do the final computation and verification. */
+ verify_checksum(context, m, &checksum_ctx, checksumbuf);
+}
+
+/*
+ * A helper function to finalize checksum computation and verify it against the
+ * backup manifest information.
+ */
+void
+verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx, uint8 *checksumbuf)
+{
+ int checksumlen;
+ const char *relpath = m->pathname;
+
/* Get the final checksum. */
- checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
+ checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
if (checksumlen < 0)
{
report_backup_error(context,
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 98c75916255..1bc5f7a6b4a 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -107,6 +107,9 @@ typedef struct verifier_context
extern manifest_file *verify_manifest_entry(verifier_context *context,
char *relpath, int64 filesize);
+extern void verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx,
+ uint8 *checksumbuf);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
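Taken together, 0006 and 0007 are what let the tar path do everything in a single pass: verify_manifest_entry() runs when the member header arrives, the checksum is updated chunk by chunk, and verify_checksum() finishes exactly as the plain path does. Here is a rough sketch of that per-chunk step (not part of the patches; the function name and the received_bytes bookkeeping argument are invented for illustration, and error reporting is omitted):

#include "postgres_fe.h"

#include "common/checksum_helper.h"
#include "pg_verifybackup.h"

/*
 * Sketch only: one incremental step of tar-format checksum verification.
 * "chunk" is whatever content the tar parser handed over for manifest
 * entry "m"; *received_bytes tracks how much of the member has been seen.
 */
static void
sketch_checksum_step(verifier_context *context, manifest_file *m,
                     pg_checksum_context *checksum_ctx,
                     const char *chunk, int chunk_len,
                     uint64 *received_bytes)
{
    uint8       checksumbuf[PG_CHECKSUM_MAX_LENGTH];

    /* First chunk: set up the context, as verify_member_header() does. */
    if (*received_bytes == 0 &&
        pg_checksum_init(checksum_ctx, m->checksum_type) < 0)
        return;

    /* Fold this chunk into the running checksum. */
    if (pg_checksum_update(checksum_ctx, (uint8 *) chunk, chunk_len) < 0)
        return;
    *received_bytes += chunk_len;

    /* Whole member seen: finish the same way the plain path does (0007). */
    if (*received_bytes >= m->size)
        verify_checksum(context, m, checksum_ctx, checksumbuf);
}

Since the file sizes can no longer be accumulated while reading (there is only one pass), compute_total_size() from 0006 has to provide the progress-report total up front from the manifest alone.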
Attachment: v8-0004-Refactor-move-few-global-variable-to-verifier_con.patch (application/x-patch)
From c630430b0ec31674091be6952811934ebc4cff3d Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 11:43:52 +0530
Subject: [PATCH v8 04/12] Refactor: move a few global variables to
verifier_context struct
Global variables are:
1. show_progress
2. skip_checksums
3. total_size
4. done_size
---
src/bin/pg_verifybackup/pg_verifybackup.c | 50 +++++++++++------------
1 file changed, 25 insertions(+), 25 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d77e70fbe38..71585ffc50e 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -113,8 +113,14 @@ typedef struct verifier_context
manifest_data *manifest;
char *backup_directory;
SimpleStringList ignore_list;
+ bool skip_checksums;
bool exit_on_error;
bool saw_any_error;
+
+ /* Progress indicators */
+ bool show_progress;
+ uint64 total_size;
+ uint64 done_size;
} verifier_context;
static manifest_data *parse_manifest_file(char *manifest_path);
@@ -157,19 +163,11 @@ static void report_fatal_error(const char *pg_restrict fmt,...)
pg_attribute_printf(1, 2) pg_attribute_noreturn();
static bool should_ignore_relpath(verifier_context *context, const char *relpath);
-static void progress_report(bool finished);
+static void progress_report(verifier_context *context, bool finished);
static void usage(void);
static const char *progname;
-/* options */
-static bool show_progress = false;
-static bool skip_checksums = false;
-
-/* Progress indicators */
-static uint64 total_size = 0;
-static uint64 done_size = 0;
-
/*
* Main entry point.
*/
@@ -260,13 +258,13 @@ main(int argc, char **argv)
no_parse_wal = true;
break;
case 'P':
- show_progress = true;
+ context.show_progress = true;
break;
case 'q':
quiet = true;
break;
case 's':
- skip_checksums = true;
+ context.skip_checksums = true;
break;
case 'w':
wal_directory = pstrdup(optarg);
@@ -299,7 +297,7 @@ main(int argc, char **argv)
}
/* Complain if the specified arguments conflict */
- if (show_progress && quiet)
+ if (context.show_progress && quiet)
pg_fatal("cannot specify both %s and %s",
"-P/--progress", "-q/--quiet");
@@ -363,7 +361,7 @@ main(int argc, char **argv)
* Now do the expensive work of verifying file checksums, unless we were
* told to skip it.
*/
- if (!skip_checksums)
+ if (!context.skip_checksums)
verify_backup_checksums(&context);
/*
@@ -739,8 +737,9 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
verify_control_file(fullpath, context->manifest->system_identifier);
/* Update statistics for progress report, if necessary */
- if (show_progress && !skip_checksums && should_verify_checksum(m))
- total_size += m->size;
+ if (context->show_progress && !context->skip_checksums &&
+ should_verify_checksum(m))
+ context->total_size += m->size;
/*
* We don't verify checksums at this stage. We first finish verifying that
@@ -815,7 +814,7 @@ verify_backup_checksums(verifier_context *context)
manifest_file *m;
uint8 *buffer;
- progress_report(false);
+ progress_report(context, false);
buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
@@ -841,7 +840,7 @@ verify_backup_checksums(verifier_context *context)
pfree(buffer);
- progress_report(true);
+ progress_report(context, true);
}
/*
@@ -889,8 +888,8 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
}
/* Report progress */
- done_size += rc;
- progress_report(false);
+ context->done_size += rc;
+ progress_report(context, false);
}
if (rc < 0)
report_backup_error(context, "could not read file \"%s\": %m",
@@ -1036,7 +1035,7 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
}
/*
- * Print a progress report based on the global variables.
+ * Print a progress report based on the variables in verifier_context.
*
* Progress report is written at maximum once per second, unless the finished
* parameter is set to true.
@@ -1045,7 +1044,7 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
* is moved to the next line.
*/
static void
-progress_report(bool finished)
+progress_report(verifier_context *context, bool finished)
{
static pg_time_t last_progress_report = 0;
pg_time_t now;
@@ -1053,7 +1052,7 @@ progress_report(bool finished)
char totalsize_str[32];
char donesize_str[32];
- if (!show_progress)
+ if (!context->show_progress)
return;
now = time(NULL);
@@ -1061,12 +1060,13 @@ progress_report(bool finished)
return; /* Max once per second */
last_progress_report = now;
- percent_size = total_size ? (int) ((done_size * 100 / total_size)) : 0;
+ percent_size = context->total_size ?
+ (int) ((context->done_size * 100 / context->total_size)) : 0;
snprintf(totalsize_str, sizeof(totalsize_str), UINT64_FORMAT,
- total_size / 1024);
+ context->total_size / 1024);
snprintf(donesize_str, sizeof(donesize_str), UINT64_FORMAT,
- done_size / 1024);
+ context->done_size / 1024);
fprintf(stderr,
_("%*s/%s kB (%d%%) verified"),
--
2.18.0
Attachment: v8-0005-Refactor-move-some-part-of-pg_verifybackup.c-to-p.patch (application/x-patch)
From aadd222cb50197b79e152daaa6f728ced368ae38 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 12:10:34 +0530
Subject: [PATCH v8 05/12] Refactor: move some part of pg_verifybackup.c to
pg_verifybackup.h
---
src/bin/pg_verifybackup/pg_verifybackup.c | 102 +------------------
src/bin/pg_verifybackup/pg_verifybackup.h | 118 ++++++++++++++++++++++
2 files changed, 123 insertions(+), 97 deletions(-)
create mode 100644 src/bin/pg_verifybackup/pg_verifybackup.h
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 71585ffc50e..4e42757c346 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,12 +18,11 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/controldata_utils.h"
-#include "common/hashfn_unstable.h"
#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
+#include "pg_verifybackup.h"
#include "pgtime.h"
/*
@@ -40,89 +39,6 @@
*/
#define ESTIMATED_BYTES_PER_MANIFEST_LINE 100
-/*
- * How many bytes should we try to read from a file at once?
- */
-#define READ_CHUNK_SIZE (128 * 1024)
-
-/*
- * Each file described by the manifest file is parsed to produce an object
- * like this.
- */
-typedef struct manifest_file
-{
- uint32 status; /* hash status */
- const char *pathname;
- size_t size;
- pg_checksum_type checksum_type;
- int checksum_length;
- uint8 *checksum_payload;
- bool matched;
- bool bad;
-} manifest_file;
-
-#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
-
-/*
- * Define a hash table which we can use to store information about the files
- * mentioned in the backup manifest.
- */
-#define SH_PREFIX manifest_files
-#define SH_ELEMENT_TYPE manifest_file
-#define SH_KEY_TYPE const char *
-#define SH_KEY pathname
-#define SH_HASH_KEY(tb, key) hash_string(key)
-#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
-#define SH_SCOPE static inline
-#define SH_RAW_ALLOCATOR pg_malloc0
-#define SH_DECLARE
-#define SH_DEFINE
-#include "lib/simplehash.h"
-
-/*
- * Each WAL range described by the manifest file is parsed to produce an
- * object like this.
- */
-typedef struct manifest_wal_range
-{
- TimeLineID tli;
- XLogRecPtr start_lsn;
- XLogRecPtr end_lsn;
- struct manifest_wal_range *next;
- struct manifest_wal_range *prev;
-} manifest_wal_range;
-
-/*
- * All the data parsed from a backup_manifest file.
- */
-typedef struct manifest_data
-{
- int version;
- uint64 system_identifier;
- manifest_files_hash *files;
- manifest_wal_range *first_wal_range;
- manifest_wal_range *last_wal_range;
-} manifest_data;
-
-/*
- * All of the context information we need while checking a backup manifest.
- */
-typedef struct verifier_context
-{
- manifest_data *manifest;
- char *backup_directory;
- SimpleStringList ignore_list;
- bool skip_checksums;
- bool exit_on_error;
- bool saw_any_error;
-
- /* Progress indicators */
- bool show_progress;
- uint64 total_size;
- uint64 done_size;
-} verifier_context;
-
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -156,14 +72,6 @@ static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
-static void report_backup_error(verifier_context *context,
- const char *pg_restrict fmt,...)
- pg_attribute_printf(2, 3);
-static void report_fatal_error(const char *pg_restrict fmt,...)
- pg_attribute_printf(1, 2) pg_attribute_noreturn();
-static bool should_ignore_relpath(verifier_context *context, const char *relpath);
-
-static void progress_report(verifier_context *context, bool finished);
static void usage(void);
static const char *progname;
@@ -978,7 +886,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
* Update the context to indicate that we saw an error, and exit if the
* context says we should.
*/
-static void
+void
report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
{
va_list ap;
@@ -995,7 +903,7 @@ report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
/*
* Report a fatal error and exit
*/
-static void
+void
report_fatal_error(const char *pg_restrict fmt,...)
{
va_list ap;
@@ -1014,7 +922,7 @@ report_fatal_error(const char *pg_restrict fmt,...)
* Note that by "prefix" we mean a parent directory; for this purpose,
* "aa/bb" is not a prefix of "aa/bbb", but it is a prefix of "aa/bb/cc".
*/
-static bool
+bool
should_ignore_relpath(verifier_context *context, const char *relpath)
{
SimpleStringListCell *cell;
@@ -1043,7 +951,7 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
* If finished is set to true, this is the last progress report. The cursor
* is moved to the next line.
*/
-static void
+void
progress_report(verifier_context *context, bool finished)
{
static pg_time_t last_progress_report = 0;
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
new file mode 100644
index 00000000000..90900048547
--- /dev/null
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -0,0 +1,118 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_verifybackup.h
+ * Verify a backup against a backup manifest.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/pg_verifybackup/pg_verifybackup.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef PG_VERIFYBACKUP_H
+#define PG_VERIFYBACKUP_H
+
+#include "common/controldata_utils.h"
+#include "common/hashfn_unstable.h"
+#include "common/parse_manifest.h"
+#include "fe_utils/simple_list.h"
+
+extern bool show_progress;
+extern bool skip_checksums;
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/*
+ * Each file described by the manifest file is parsed to produce an object
+ * like this.
+ */
+typedef struct manifest_file
+{
+ uint32 status; /* hash status */
+ const char *pathname;
+ size_t size;
+ pg_checksum_type checksum_type;
+ int checksum_length;
+ uint8 *checksum_payload;
+ bool matched;
+ bool bad;
+} manifest_file;
+
+#define should_verify_checksum(m) \
+ (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+
+/*
+ * Define a hash table which we can use to store information about the files
+ * mentioned in the backup manifest.
+ */
+#define SH_PREFIX manifest_files
+#define SH_ELEMENT_TYPE manifest_file
+#define SH_KEY_TYPE const char *
+#define SH_KEY pathname
+#define SH_HASH_KEY(tb, key) hash_string(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * Each WAL range described by the manifest file is parsed to produce an
+ * object like this.
+ */
+typedef struct manifest_wal_range
+{
+ TimeLineID tli;
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+ struct manifest_wal_range *next;
+ struct manifest_wal_range *prev;
+} manifest_wal_range;
+
+/*
+ * All the data parsed from a backup_manifest file.
+ */
+typedef struct manifest_data
+{
+ int version;
+ uint64 system_identifier;
+ manifest_files_hash *files;
+ manifest_wal_range *first_wal_range;
+ manifest_wal_range *last_wal_range;
+} manifest_data;
+
+/*
+ * All of the context information we need while checking a backup manifest.
+ */
+typedef struct verifier_context
+{
+ manifest_data *manifest;
+ char *backup_directory;
+ SimpleStringList ignore_list;
+ bool skip_checksums;
+ bool exit_on_error;
+ bool saw_any_error;
+
+ /* Progress indicators */
+ bool show_progress;
+ uint64 total_size;
+ uint64 done_size;
+} verifier_context;
+
+extern void report_backup_error(verifier_context *context,
+ const char *pg_restrict fmt,...)
+ pg_attribute_printf(2, 3);
+extern void report_fatal_error(const char *pg_restrict fmt,...)
+ pg_attribute_printf(1, 2) pg_attribute_noreturn();
+extern bool should_ignore_relpath(verifier_context *context,
+ const char *relpath);
+
+extern void progress_report(verifier_context *context, bool finished);
+
+#endif /* PG_VERIFYBACKUP_H */
--
2.18.0
Attachment: v8-0001-Improve-file-header-comments-for-astramer-code.patch (application/x-patch)
From 87de8fb8600c7e373a44e0b3dddd25044c524352 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 6 Aug 2024 10:23:45 +0530
Subject: [PATCH v8] Improve file header comments for astramer code.
Make it clear that "astreamer" stands for "archive streamer".
Generalize comments that still believe this code can only be used
by pg_basebackup. Add some comments explaining the asymmetry
between the gzip, lz4, and zstd astreamers, in the hopes of making
life easier for anyone who hacks on this code in the future.
---
src/fe_utils/astreamer_file.c | 4 ++++
src/fe_utils/astreamer_gzip.c | 15 +++++++++++++++
src/fe_utils/astreamer_lz4.c | 4 ++++
src/fe_utils/astreamer_zstd.c | 4 ++++
src/include/fe_utils/astreamer.h | 21 +++++++++++++++------
5 files changed, 42 insertions(+), 6 deletions(-)
diff --git a/src/fe_utils/astreamer_file.c b/src/fe_utils/astreamer_file.c
index 13d1192c6e6..c9a030853bc 100644
--- a/src/fe_utils/astreamer_file.c
+++ b/src/fe_utils/astreamer_file.c
@@ -2,6 +2,10 @@
*
* astreamer_file.c
*
+ * Archive streamers that write to files. astreamer_plain_writer writes
+ * the whole archive to a single file, and astreamer_extractor writes
+ * each archive member to a separate file in a given directory.
+ *
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
diff --git a/src/fe_utils/astreamer_gzip.c b/src/fe_utils/astreamer_gzip.c
index dd28defac7b..1c773a23848 100644
--- a/src/fe_utils/astreamer_gzip.c
+++ b/src/fe_utils/astreamer_gzip.c
@@ -2,6 +2,21 @@
*
* astreamer_gzip.c
*
+ * Archive streamers that deal with data compressed using gzip.
+ * astreamer_gzip_writer applies gzip compression to the input data
+ * and writes the result to a file. astreamer_gzip_decompressor assumes
+ * that the input stream is compressed using gzip and decompresses it.
+ *
+ * Note that the code in this file is asymmetric with what we do for
+ * other compression types: for lz4 and zstd, there is a compressor and
+ * a decompressor, rather than a writer and a decompressor. The approach
+ * taken here is less flexible, because a writer can only write to a file,
+ * while a compressor can write to a subsequent astreamer which is free
+ * to do whatever it likes. The reason it's like this is that this
+ * code was adapted from old, less-modular pg_basebackup code that used the
+ * same APIs that astreamer_gzip_writer uses, and it didn't seem
+ * necessary to change anything at the time.
+ *
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
diff --git a/src/fe_utils/astreamer_lz4.c b/src/fe_utils/astreamer_lz4.c
index d8b2a367e47..2bf14084e7f 100644
--- a/src/fe_utils/astreamer_lz4.c
+++ b/src/fe_utils/astreamer_lz4.c
@@ -2,6 +2,10 @@
*
* astreamer_lz4.c
*
+ * Archive streamers that deal with data compressed using lz4.
+ * astreamer_lz4_compressor applies lz4 compression to the input stream,
+ * and astreamer_lz4_decompressor does the reverse.
+ *
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
diff --git a/src/fe_utils/astreamer_zstd.c b/src/fe_utils/astreamer_zstd.c
index 45f6cb67363..4b2d42b2311 100644
--- a/src/fe_utils/astreamer_zstd.c
+++ b/src/fe_utils/astreamer_zstd.c
@@ -2,6 +2,10 @@
*
* astreamer_zstd.c
*
+ * Archive streamers that deal with data compressed using zstd.
+ * astreamer_zstd_compressor applies zstd compression to the input stream,
+ * and astreamer_zstd_decompressor does the reverse.
+ *
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
diff --git a/src/include/fe_utils/astreamer.h b/src/include/fe_utils/astreamer.h
index 2c014dbddbe..570cfba3040 100644
--- a/src/include/fe_utils/astreamer.h
+++ b/src/include/fe_utils/astreamer.h
@@ -2,9 +2,18 @@
*
* astreamer.h
*
- * Each tar archive returned by the server is passed to one or more
- * astreamer objects for further processing. The astreamer may do
- * something simple, like write the archive to a file, perhaps after
+ * The "archive streamer" interface is intended to allow frontend code
+ * to stream from possibly-compressed archive files from any source and
+ * perform arbitrary actions based on the contents of those archives.
+ * Archive streamers are intended to be composable, and most tasks will
+ * require two or more archive streamers to complete. For instance,
+ * if the input is an uncompressed tar stream, a tar parser astreamer
+ * could be used to interpret it, and then an extractor astreamer could
+ * be used to write each archive member out to a file.
+ *
+ * In general, each archive streamer is relatively free to take whatever
+ * action it desires in the stream of chunks provided by the caller. It
+ * may do something simple, like write the archive to a file, perhaps after
* compressing it, but it can also do more complicated things, like
* annotating the byte stream to indicate which parts of the data
* correspond to tar headers or trailing padding, vs. which parts are
@@ -33,9 +42,9 @@ typedef struct astreamer_ops astreamer_ops;
/*
* Each chunk of archive data passed to a astreamer is classified into one
- * of these categories. When data is first received from the remote server,
- * each chunk will be categorized as ASTREAMER_UNKNOWN, and the chunks will
- * be of whatever size the remote server chose to send.
+ * of these categories. When data is initially passed to an archive streamer,
+ * each chunk will be categorized as ASTREAMER_UNKNOWN, and the chunks can
+ * be of whatever size the caller finds convenient.
*
* If the archive is parsed (e.g. see astreamer_tar_parser_new()), then all
* chunks should be labelled as one of the other types listed here. In
--
2.18.0
[ I committed 0001, then noticed I had a typo in the subject line of
the commit message. Argh. ]
On Wed, Aug 7, 2024 at 9:41 AM Amul Sul <sulamul@gmail.com> wrote:
With the patch, I am concerned that we won't be able to give an
accurate progress report as before. We add all the file sizes in the
backup manifest to the total_size without checking if they exist on
disk. Therefore, sometimes the reported progress completion might not
show 100% when we encounter files where m->bad or m->match == false at
a later stage. However, I think this should be acceptable since there
will be an error for the respective missing or bad file, and it can be
understood that verification is complete even if the progress isn't
100% in that case. Thoughts/Comments?
When somebody says that something is a refactoring commit, my
assumption is that there should be no behavior change. If the behavior
is changing, it's not purely a refactoring, and it shouldn't be
labelled as a refactoring (or at least there should be a prominent
disclaimer identifying whatever behavior has changed, if a small
change was deemed acceptable and unavoidable).
I am very reluctant to accept a functional regression of the type that
you describe here (or the type that I postulated might occur, even if
I was wrong and it doesn't). The point here is that we're trying to
reuse the code, and I support that goal, because code reuse is good.
But it's not such a good thing that we should do it if it has negative
consequences. We should either figure out some other way of
refactoring it that doesn't have those negative side-effects, or we
should leave the existing code alone and have separate code for the
new stuff we want to do.
I do realize that the type of side effect you describe here is quite
minor. I could live with it if it were unavoidable. But I really don't
see why we can't avoid it.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Aug 7, 2024 at 9:12 PM Robert Haas <robertmhaas@gmail.com> wrote:
[ I committed 0001, then noticed I had a typo in the subject line of
the commit message. Argh. ]
On Wed, Aug 7, 2024 at 9:41 AM Amul Sul <sulamul@gmail.com> wrote:
With the patch, I am concerned that we won't be able to give an
accurate progress report as before. We add all the file sizes in the
backup manifest to the total_size without checking if they exist on
disk. Therefore, sometimes the reported progress completion might not
show 100% when we encounter files where m->bad or m->match == false at
a later stage. However, I think this should be acceptable since there
will be an error for the respective missing or bad file, and it can be
understood that verification is complete even if the progress isn't
100% in that case. Thoughts/Comments?
When somebody says that something is a refactoring commit, my
assumption is that there should be no behavior change. If the behavior
is changing, it's not purely a refactoring, and it shouldn't be
labelled as a refactoring (or at least there should be a prominent
disclaimer identifying whatever behavior has changed, if a small
change was deemed acceptable and unavoidable).
I agree; I'll be more careful next time.
I am very reluctant to accept a functional regression of the type that
you describe here (or the type that I postulated might occur, even if
I was wrong and it doesn't). The point here is that we're trying to
reuse the code, and I support that goal, because code reuse is good.
But it's not such a good thing that we should do it if it has negative
consequences. We should either figure out some other way of
refactoring it that doesn't have those negative side-effects, or we
should leave the existing code alone and have separate code for the
new stuff we want to do.
I do realize that the type of side effect you describe here is quite
minor. I could live with it if it were unavoidable. But I really don't
see why we can't avoid it.
The main issue I have is computing the total_size of valid files that
will be checksummed and that exist in both the manifests and the
backup, in the case of a tar backup. This cannot be done in the same
way as with a plain backup.
Another consideration is that, in the case of a tar backup, since
we're reading all tar files from the backup rather than individual
files to be checksummed, should we consider total_size as the sum of
all valid tar files in the backup, regardless of checksum
verification? If so, we would need to perform an initial pass to
calculate the total_size in the directory, similar to what
verify_backup_directory() does, but I am a bit uncomfortable doing that
and unsure whether we should add that pass.
An alternative idea is to provide progress reports per file instead of
for the entire backup directory. We could report the size of each file
and keep track of done_size as we read each tar header and content.
For example, the report could be:
109032/109032 kB (100%) base.tar file verified
123444/123444 kB (100%) 16245.tar file verified
23478/23478 kB (100%) 16246.tar file verified
.....
<total_done_size>/<total_size> (NNN%) verified.
I think this type of reporting can be implemented with minimal
changes, and for plain backups, we can keep the reporting as it is.
Thoughts?
Regards,
Amul
On Wed, Aug 7, 2024 at 1:05 PM Amul Sul <sulamul@gmail.com> wrote:
The main issue I have is computing the total_size of valid files that
will be checksummed and that exist in both the manifests and the
backup, in the case of a tar backup. This cannot be done in the same
way as with a plain backup.
I think you should compute and sum the sizes of the tar files
themselves. Suppose you readdir(), make a list of files that look
relevant, and stat() each one. total_size is the sum of the file
sizes. Then you work your way through the list of files and read each
one. done_size is the total size of all files you've read completely
plus the number of bytes you've read from the current file so far.
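For concreteness, here is a minimal, self-contained sketch of that
bookkeeping (not taken from the attached patches; the ".tar" substring
check and the fixed-size path buffer are just stand-ins for whatever
filtering and path handling the patch actually does):

#include <dirent.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Sum the sizes of the tar archives themselves; the result becomes
 * total_size for progress reporting.  While streaming each archive,
 * done_size then advances by the number of bytes read so far, so
 * fully-read archives count in full and the current one partially.
 */
static int64_t
sum_tar_archive_sizes(const char *backup_directory)
{
    DIR        *dir = opendir(backup_directory);
    struct dirent *de;
    int64_t     total_size = 0;

    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL)
    {
        char        fullpath[4096];
        struct stat st;

        /* stand-in filter: anything that looks like a tar archive */
        if (strstr(de->d_name, ".tar") == NULL)
            continue;

        snprintf(fullpath, sizeof(fullpath), "%s/%s",
                 backup_directory, de->d_name);
        if (stat(fullpath, &st) == 0)
            total_size += st.st_size;
    }
    closedir(dir);

    return total_size;
}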
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Aug 7, 2024 at 11:28 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 7, 2024 at 1:05 PM Amul Sul <sulamul@gmail.com> wrote:
The main issue I have is computing the total_size of valid files that
will be checksummed and that exist in both the manifests and the
backup, in the case of a tar backup. This cannot be done in the same
way as with a plain backup.
I think you should compute and sum the sizes of the tar files
themselves. Suppose you readdir(), make a list of files that look
relevant, and stat() each one. total_size is the sum of the file
sizes. Then you work your way through the list of files and read each
one. done_size is the total size of all files you've read completely
plus the number of bytes you've read from the current file so far.
I tried this in the attached version and made a few additional changes
based on Sravan's off-list comments regarding function names and
descriptions.
Now, verification happens in two passes. The first pass simply
verifies the file names, determines their compression types, and
returns a list of valid tar files whose contents need to be verified
in the second pass. The second pass is called at the end of
verify_backup_directory() after all files in that directory have been
scanned. I named the functions for pass 1 and pass 2 as
verify_tar_file_name() and verify_tar_file_contents(), respectively.
The rest of the code flow is similar to the previous version.
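For reviewers skimming the thread, here is a simplified, self-contained
sketch of that two-pass structure (the struct and helper names below are
made up for illustration; the actual patch uses verifier_context,
SimplePtrList, and the astreamer stack instead):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustration only: one node per tar archive noted during pass 1. */
typedef struct TarEntry
{
    char       *relpath;
    const char *compression;    /* "none", "gzip", "lz4", or "zstd" */
    struct TarEntry *next;
} TarEntry;

/* Pass 1: validate the archive name, detect its compression, queue it. */
static TarEntry *
note_tar_file(TarEntry *head, const char *relpath)
{
    TarEntry   *e = malloc(sizeof(TarEntry));

    e->relpath = strdup(relpath);
    if (strstr(relpath, ".tar.gz") || strstr(relpath, ".tgz"))
        e->compression = "gzip";
    else if (strstr(relpath, ".tar.lz4"))
        e->compression = "lz4";
    else if (strstr(relpath, ".tar.zst"))
        e->compression = "zstd";
    else
        e->compression = "none";
    e->next = head;
    return e;
}

/* Pass 2: after the directory scan, walk the queue and verify each one. */
static void
verify_queued_tar_files(TarEntry *head)
{
    for (TarEntry *e = head; e != NULL; e = e->next)
        printf("would decompress (%s) and verify members of %s\n",
               e->compression, e->relpath);
}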
In the attached patch set, I abandoned the changes touching the
progress reporting code of plain backups by dropping the previous 0009
patch. The new 0009 patch adds missing APIs to simple_list.c to
destroy SimplePtrList. The rest of the patch numbers remain unchanged.
Regards,
Amul
Attachments:
v9-0011-pg_verifybackup-Read-tar-files-and-verify-its-con.patchapplication/x-patch; name=v9-0011-pg_verifybackup-Read-tar-files-and-verify-its-con.patchDownload
From 98ecaf7d965d44e4c5d1e558b1406230da31a79c Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 16:22:07 +0530
Subject: [PATCH v9 11/12] pg_verifybackup: Read tar files and verify its
contents
This patch implements TAR format backup verification.
For progress reporting support, we perform this verification in two
passes: the first pass calculates total_size, and the second pass
updates done_size as verification progresses.
For the verification, in the first pass, we call verify_tar_file_name(),
which performs basic verification by expecting only base.tar or
<tablespaceoid>.tar files and raises an error for any other files. It
also determines the compression type of the archive file. All this
information is stored in a newly added tarFile struct, which is
appended to a list that will be used in the second pass (by
verify_tar_file_contents()) for the final verification. In the second pass,
the tar archives are read, decompressed, and the required verification
is carried out.
For reading and decompression, fe_utils/astreamer.h is used. For
verification, a new archive streamer has been added in
astreamer_verify.c to handle TAR member files and their contents; see
astreamer_verify_content() for details. The stack of astreamers will
be set up for each TAR file in verify_tar_file_contents(), depending on its
compression type which is detected in the first pass.
When information about a TAR member file (i.e., ASTREAMER_MEMBER_HEADER)
is received, we first verify its entry against the backup manifest. We
then decide if further checks are needed, such as checksum
verification and control data verification (if it is a pg_control
file), once the member file contents are received. Although this
decision could be made when the contents are received, it is more
efficient to make it earlier since the member file contents are
received in multiple iterations. In short, we process
ASTREAMER_MEMBER_CONTENTS multiple times but only once for other
ASTREAMER_MEMBER_* cases. We maintain this information in the
astreamer_verify structure for each member file, which is reset when
the file ends.
---
src/bin/pg_verifybackup/Makefile | 4 +-
src/bin/pg_verifybackup/astreamer_verify.c | 354 +++++++++++++++++++++
src/bin/pg_verifybackup/meson.build | 6 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 313 +++++++++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 6 +
src/tools/pgindent/typedefs.list | 2 +
6 files changed, 675 insertions(+), 10 deletions(-)
create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 7c045f142e8..df7aaabd530 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -17,11 +17,13 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
# We need libpq only because fe_utils does.
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
OBJS = \
$(WIN32RES) \
- pg_verifybackup.o
+ pg_verifybackup.o \
+ astreamer_verify.o
all: pg_verifybackup
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
new file mode 100644
index 00000000000..0983dffde8e
--- /dev/null
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -0,0 +1,354 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_verify.c
+ *
+ * Extend fe_utils/astreamer.h archive streaming facility to verify TAR
+ * backup.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * src/bin/pg_verifybackup/astreamer_verify.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "common/logging.h"
+#include "pg_verifybackup.h"
+
+typedef struct astreamer_verify
+{
+ astreamer base;
+ verifier_context *context;
+ char *archive_name;
+ Oid tblspc_oid;
+ pg_checksum_context *checksum_ctx;
+
+ /* Hold information for a member file verification */
+ manifest_file *mfile;
+ int64 received_bytes;
+ bool verify_checksum;
+ bool verify_control_data;
+} astreamer_verify;
+
+static void astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_verify_finalize(astreamer *streamer);
+static void astreamer_verify_free(astreamer *streamer);
+
+static void verify_member_header(astreamer *streamer, astreamer_member *member);
+static void verify_member_checksum(astreamer *streamer,
+ astreamer_member *member,
+ const char *buffer, int buffer_len);
+static void verify_member_control_data(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void reset_member_info(astreamer *streamer);
+
+static const astreamer_ops astreamer_verify_ops = {
+ .content = astreamer_verify_content,
+ .finalize = astreamer_verify_finalize,
+ .free = astreamer_verify_free
+};
+
+/*
+ * Create an astreamer that verifies the contents of a TAR file.
+ */
+astreamer *
+astreamer_verify_content_new(astreamer *next, verifier_context *context,
+ char *archive_name, Oid tblspc_oid)
+{
+ astreamer_verify *streamer;
+
+ streamer = palloc0(sizeof(astreamer_verify));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_verify_ops;
+
+ streamer->base.bbs_next = next;
+ streamer->context = context;
+ streamer->archive_name = archive_name;
+ streamer->tblspc_oid = tblspc_oid;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ if (!context->skip_checksums)
+ streamer->checksum_ctx = pg_malloc(sizeof(pg_checksum_context));
+
+ return &streamer->base;
+}
+
+/*
+ * The main entry point of the archive streamer for verifying tar members.
+ */
+static void
+astreamer_verify_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+
+ /*
+ * Perform the initial check and setup verification steps.
+ */
+ verify_member_header(streamer, member);
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Process the member content according to the flags set by the
+ * member header processing routine for checksum and control data
+ * verification.
+ */
+ if (mystreamer->verify_checksum)
+ verify_member_checksum(streamer, member, data, len);
+
+ if (mystreamer->verify_control_data)
+ verify_member_control_data(streamer, member, data, len);
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+
+ /*
+ * Reset the temporary information stored for the verification.
+ */
+ reset_member_info(streamer);
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar archive");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_verify stream.
+ */
+static void
+astreamer_verify_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_verify stream.
+ */
+static void
+astreamer_verify_free(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ if (mystreamer->checksum_ctx)
+ pfree(mystreamer->checksum_ctx);
+
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+
+/*
+ * Verifies the tar member entry if it corresponds to a file in the backup
+ * manifest. If the archive being processed is a tablespace, prepares the
+ * required file path for subsequent operations. Finally, determines if
+ * checksum verification and control data verification need to be performed
+ * during file content processing
+ */
+static void
+verify_member_header(astreamer *streamer, astreamer_member *member)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_file *m;
+
+	/* We are only interested in regular files that are not in the ignore list. */
+ if (member->is_directory || member->is_link ||
+ should_ignore_relpath(mystreamer->context, member->pathname))
+ return;
+
+ /*
+ * The backup_manifest stores a relative path to the base directory for
+	 * files belonging to a tablespace, whereas <tablespaceoid>.tar doesn't. Prepare
+	 * the required path; otherwise, the manifest entry verification will
+ * fail.
+ */
+ if (OidIsValid(mystreamer->tblspc_oid))
+ {
+ char temp[MAXPGPATH];
+
+ /* Copy original name at temporary space */
+ memcpy(temp, member->pathname, MAXPGPATH);
+
+		snprintf(member->pathname, MAXPGPATH, "%s/%u/%s",
+ "pg_tblspc", mystreamer->tblspc_oid, temp);
+ }
+
+ /* Check the manifest entry */
+ m = verify_manifest_entry(mystreamer->context, member->pathname,
+ member->size);
+ mystreamer->mfile = (void *) m;
+
+ /*
+ * Prepare for checksum and control data verification.
+ *
+ * We could have these checks while receiving contents. However, since
+ * contents are received in multiple iterations, this would result in
+ * these lengthy checks being performed multiple times. Instead, having a
+ * single flag would be more efficient.
+ */
+ mystreamer->verify_checksum =
+ (!mystreamer->context->skip_checksums && should_verify_checksum(m));
+ mystreamer->verify_control_data =
+ should_verify_control_data(mystreamer->context->manifest, m);
+
+ /* Initialize the context required for checksum verification. */
+ if (mystreamer->verify_checksum &&
+ pg_checksum_init(mystreamer->checksum_ctx, m->checksum_type) < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "%s: could not initialize checksum of file \"%s\"",
+ mystreamer->archive_name, m->pathname);
+
+ /*
+ * Checksum verification cannot be performed without proper context
+ * initialization.
+ */
+ mystreamer->verify_checksum = false;
+ }
+}
+
+/*
+ * Computes the checksum incrementally for the received file content, and
+ * finally calls the routine for checksum verification, similar to
+ * verify_file_checksum().
+ *
+ * The caller should pass a correctly initialized checksum_ctx, which will be
+ * used for incremental checksum computation. Once the complete file content is
+ * received (tracked using received_bytes), the final checksum verification
+ * happens.
+ */
+static void
+verify_member_checksum(astreamer *streamer, astreamer_member *member,
+ const char *buffer, int buffer_len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ pg_checksum_context *checksum_ctx = mystreamer->checksum_ctx;
+ verifier_context *context = mystreamer->context;
+ manifest_file *m = mystreamer->mfile;
+ const char *relpath = m->pathname;
+ uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
+
+ /*
+	 * Mark it false to avoid unexpected re-entrance for the same file content
+	 * (e.g., content for which an error was reported should not be revisited).
+ */
+ Assert(mystreamer->verify_checksum);
+ mystreamer->verify_checksum = false;
+
+	/* Should have come here for the right file */
+ Assert(strcmp(member->pathname, relpath) == 0);
+
+ /*
+ * The checksum context should match the type noted in the backup
+ * manifest.
+ */
+ Assert(checksum_ctx->type == m->checksum_type);
+
+ /* Update the total count of computed checksum bytes. */
+ mystreamer->received_bytes += buffer_len;
+
+ if (pg_checksum_update(checksum_ctx, (uint8 *) buffer, buffer_len) < 0)
+ {
+ report_backup_error(context, "could not update checksum of file \"%s\"",
+ relpath);
+ return;
+ }
+
+ /* Yet to receive the full content of the file. */
+ if (mystreamer->received_bytes < m->size)
+ {
+ mystreamer->verify_checksum = true;
+ return;
+ }
+
+ /* Do the final computation and verification. */
+ verify_checksum(context, m, checksum_ctx, checksumbuf);
+}
+
+/*
+ * Prepare the control data from the received tar member contents, which are
+ * supposed to be from the pg_control file, including CRC calculation. Then,
+ * call the routines that perform the final verification of the control file
+ * information.
+ */
+static void
+verify_member_control_data(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_data *manifest = mystreamer->context->manifest;
+ ControlFileData control_file;
+ pg_crc32c crc;
+ bool crc_ok;
+
+ /* Should be here only for control file */
+ Assert(strcmp(member->pathname, "global/pg_control") == 0);
+ Assert(manifest->version != 1);
+
+ /* Mark it as false to avoid unexpected re-entrance */
+ Assert(mystreamer->verify_control_data);
+ mystreamer->verify_control_data = false;
+
+ /* Should have whole control file data. */
+ if (!astreamer_buffer_until(streamer, &data, &len, sizeof(ControlFileData)))
+ {
+ mystreamer->verify_control_data = true;
+ return;
+ }
+
+ pg_log_debug("%s: reading \"%s\"", mystreamer->archive_name,
+ member->pathname);
+
+ if (streamer->bbs_buffer.len != sizeof(ControlFileData))
+ report_fatal_error("%s: could not read control file: read %d of %zu",
+ mystreamer->archive_name, streamer->bbs_buffer.len,
+ sizeof(ControlFileData));
+
+ memcpy(&control_file, streamer->bbs_buffer.data,
+ sizeof(ControlFileData));
+
+ /* Check the CRC. */
+ INIT_CRC32C(crc);
+ COMP_CRC32C(crc,
+ (char *) (&control_file),
+ offsetof(ControlFileData, crc));
+ FIN_CRC32C(crc);
+
+ crc_ok = EQ_CRC32C(crc, control_file.crc);
+
+ /* Do the final control data verification. */
+ verify_control_data(&control_file, member->pathname, crc_ok,
+ manifest->system_identifier);
+}
+
+/*
+ * Reset flags and free memory allocations for member file verification.
+ */
+static void
+reset_member_info(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ mystreamer->mfile = NULL;
+ mystreamer->received_bytes = 0;
+ mystreamer->verify_checksum = false;
+ mystreamer->verify_control_data = false;
+}
diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build
index 7c7d31a0350..1e3fcf7ee5a 100644
--- a/src/bin/pg_verifybackup/meson.build
+++ b/src/bin/pg_verifybackup/meson.build
@@ -1,7 +1,8 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
pg_verifybackup_sources = files(
- 'pg_verifybackup.c'
+ 'pg_verifybackup.c',
+ 'astreamer_verify.c'
)
if host_system == 'windows'
@@ -10,9 +11,10 @@ if host_system == 'windows'
'--FILEDESC', 'pg_verifybackup - verify a backup against using a backup manifest'])
endif
+pg_verifybackup_deps = [frontend_code, libpq, lz4, zlib, zstd]
pg_verifybackup = executable('pg_verifybackup',
pg_verifybackup_sources,
- dependencies: [frontend_code, libpq],
+ dependencies: pg_verifybackup_deps,
kwargs: default_bin_args,
)
bin_targets += pg_verifybackup
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 88196cca4e0..42b776eda18 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -20,9 +20,11 @@
#include "common/compression.h"
#include "common/parse_manifest.h"
+#include "common/relpath.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
#include "pg_verifybackup.h"
+#include "pgtar.h"
#include "pgtime.h"
/*
@@ -39,6 +41,16 @@
*/
#define ESTIMATED_BYTES_PER_MANIFEST_LINE 100
+/*
+ * Tar archive information needed for content verification.
+ */
+typedef struct tarFile
+{
+ char *relpath;
+ Oid tblspc_oid;
+ pg_compress_algorithm compress_algorithm;
+} tarFile;
+
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -61,7 +73,18 @@ static char find_backup_format(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
- char *relpath, char *fullpath);
+ char *relpath, char *fullpath,
+ SimplePtrList *tarFiles);
+static void verify_plain_file(verifier_context *context,
+ char *relpath, char *fullpath,
+ size_t filesize);
+static void verify_tar_file_name(verifier_context *context, char *relpath,
+ char *fullpath, int64 filesize,
+ SimplePtrList *tarFiles);
+static void verify_tar_file_contents(verifier_context *context,
+ SimplePtrList *tarFiles);
+static void verify_tar_contents(verifier_context *context, char *relpath,
+ char *fullpath, astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -70,6 +93,10 @@ static void verify_file_checksum(verifier_context *context,
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
+static astreamer *create_archive_verifier(verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid,
+ pg_compress_algorithm compress_algo);
static void progress_report(bool finished);
static void usage(void);
@@ -148,6 +175,10 @@ main(int argc, char **argv)
*/
simple_string_list_append(&context.ignore_list, "backup_manifest");
simple_string_list_append(&context.ignore_list, "pg_wal");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.gz");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.lz4");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.zst");
simple_string_list_append(&context.ignore_list, "postgresql.auto.conf");
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
@@ -556,6 +587,7 @@ verify_backup_directory(verifier_context *context, char *relpath,
{
DIR *dir;
struct dirent *dirent;
+ SimplePtrList tarFiles = {NULL, NULL};
dir = opendir(fullpath);
if (dir == NULL)
@@ -595,12 +627,17 @@ verify_backup_directory(verifier_context *context, char *relpath,
newrelpath = psprintf("%s/%s", relpath, filename);
if (!should_ignore_relpath(context, newrelpath))
- verify_backup_file(context, newrelpath, newfullpath);
+ verify_backup_file(context, newrelpath, newfullpath, &tarFiles);
pfree(newfullpath);
pfree(newrelpath);
}
+ /* Perform the final verification of the tar contents, if any. */
+ Assert(tarFiles.head == NULL || context->format == 't');
+ if (tarFiles.head != NULL)
+ verify_tar_file_contents(context, &tarFiles);
+
if (closedir(dir))
{
report_backup_error(context,
@@ -610,16 +647,18 @@ verify_backup_directory(verifier_context *context, char *relpath,
}
/*
- * Verify one file (which might actually be a directory or a symlink).
+ * Verify one file (which might actually be a directory, a symlink, or an
+ * archive).
*
* The arguments to this function have the same meaning as the arguments to
- * verify_backup_directory.
+ * verify_backup_directory. The additional argument returns a list of tar
+ * archive information, if any.
*/
static void
-verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
+verify_backup_file(verifier_context *context, char *relpath, char *fullpath,
+ SimplePtrList *tarFiles)
{
struct stat sb;
- manifest_file *m;
if (stat(fullpath, &sb) != 0)
{
@@ -652,8 +691,36 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Do the further verifications */
+ if (context->format == 'p')
+ verify_plain_file(context, relpath, fullpath, sb.st_size);
+ else
+ {
+ /*
+ * This is preparatory work for the tar format backup verification,
+ * where we verify only the archive file name and its compression
+ * type. The final verification will be carried out after listing all
+ * the archives from the backup directory.
+ */
+ verify_tar_file_name(context, relpath, fullpath, sb.st_size, tarFiles);
+ }
+}
+
+/*
+ * Verify one plain file or a symlink.
+ *
+ * The arguments to this function are mostly the same as for
+ * verify_backup_directory. The additional argument is the file size, which
+ * is verified against the manifest entry.
+ */
+static void
+verify_plain_file(verifier_context *context, char *relpath, char *fullpath,
+ size_t filesize)
+{
+ manifest_file *m;
+
/* Check the backup manifest entry for this file. */
- m = verify_manifest_entry(context, relpath, sb.st_size);
+ m = verify_manifest_entry(context, relpath, filesize);
/* Validate the manifest system identifier */
if (should_verify_control_data(context->manifest, m))
@@ -676,6 +743,202 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
total_size += m->size;
}
+/*
+ * Verify one tar archive file.
+ *
+ * This does not perform a complete verification; it only performs basic
+ * validation of the tar format backup file, detects the compression type, and
+ * appends that information to the tarFiles list. An error will be reported if
+ * the tar archive name or compression type is not as expected.
+ *
+ * The arguments to this function are mostly the same as for
+ * verify_backup_file. The additional argument is the file size, which is
+ * used for progress reporting.
+ */
+static void
+verify_tar_file_name(verifier_context *context, char *relpath, char *fullpath,
+ int64 filesize, SimplePtrList *tarFiles)
+{
+ Oid tblspc_oid = InvalidOid;
+ int file_name_len;
+ int file_extn_len;
+ pg_compress_algorithm compress_algorithm;
+ tarFile *tar_file;
+
+ /* Should be tar format backup */
+ Assert(context->format == 't');
+
+ /* Find the compression type of the tar file */
+ if (strstr(relpath, ".tgz") != NULL)
+ {
+ compress_algorithm = PG_COMPRESSION_GZIP;
+ file_extn_len = 4; /* length of ".tgz" */
+ }
+ else if (strstr(relpath, ".tar.gz") != NULL)
+ {
+ compress_algorithm = PG_COMPRESSION_GZIP;
+ file_extn_len = 7; /* length of ".tar.gz" */
+ }
+ else if (strstr(relpath, ".tar.lz4") != NULL)
+ {
+ compress_algorithm = PG_COMPRESSION_LZ4;
+ file_extn_len = 8; /* length of ".tar.lz4" */
+ }
+ else if (strstr(relpath, ".tar.zst") != NULL)
+ {
+ compress_algorithm = PG_COMPRESSION_ZSTD;
+ file_extn_len = 8; /* length of ".tar.zst" */
+ }
+ else if (strstr(relpath, ".tar") != NULL)
+ {
+ compress_algorithm = PG_COMPRESSION_NONE;
+ file_extn_len = 4; /* length of ".tar" */
+ }
+ else
+ {
+ report_backup_error(context,
+ "\"%s\" unexpected file in the tar format backup",
+ relpath);
+ return;
+ }
+
+ /*
+	 * We expect only the tar archives for the main data directory and the
+	 * tablespaces.
+ *
+ * pg_basebackup writes the main data directory to an archive file named
+	 * base.tar and each tablespace directory to <tablespaceoid>.tar, followed
+ * by a compression type extension such as .gz, .lz4, or .zst.
+ */
+ file_name_len = strlen(relpath);
+ if (strspn(relpath, "0123456789") == (file_name_len - file_extn_len))
+ {
+ /*
+ * Since the file matches the <tablespaceoid>.tar format, extract the
+ * tablespaceoid, which is needed to prepare the paths of the files
+ * belonging to that tablespace relative to the base directory.
+ */
+ tblspc_oid = strtoi64(relpath, NULL, 10);
+ }
+ /* Otherwise, it should be a base.tar file; if not, raise an error. */
+ else if (strncmp("base", relpath, file_name_len - file_extn_len) != 0)
+ {
+ report_backup_error(context,
+ "\"%s\" unexpected file in the tar format backup",
+ relpath);
+ return;
+ }
+
+ /*
+ * Append the information to the list for complete verification at a later
+ * stage.
+ */
+ tar_file = pg_malloc(sizeof(tarFile));
+ tar_file->relpath = pstrdup(relpath);
+ tar_file->tblspc_oid = tblspc_oid;
+ tar_file->compress_algorithm = compress_algorithm;
+
+ simple_ptr_list_append(tarFiles, tar_file);
+
+ /* Update statistics for progress report, if necessary */
+ if (show_progress)
+ total_size += filesize;
+}
+
+/*
+ * This is the final part of tar file verification, which prepares the archive
+ * streamer stack according to the tar file compression format for each tar
+ * archive and invokes them for reading, decompressing, and ultimately
+ * verifying the contents.
+ *
+ * The tarFiles argument is a list of valid tar archives to verify; its
+ * allocations are freed once the verification is complete.
+ */
+static void
+verify_tar_file_contents(verifier_context *context, SimplePtrList *tarFiles)
+{
+ SimplePtrListCell *cell;
+
+ progress_report(false);
+
+ for (cell = tarFiles->head; cell != NULL; cell = cell->next)
+ {
+ tarFile *tar_file = (tarFile *) cell->ptr;
+ astreamer *streamer;
+ char *fullpath;
+
+ /* Prepare archive streamer stack */
+ streamer = create_archive_verifier(context,
+ tar_file->relpath,
+ tar_file->tblspc_oid,
+ tar_file->compress_algorithm);
+
+ /* Compute the full pathname to the target file. */
+ fullpath = psprintf("%s/%s", context->backup_directory,
+ tar_file->relpath);
+
+ /* Invoke the streamer for reading, decompressing, and verifying. */
+ verify_tar_contents(context, tar_file->relpath, fullpath, streamer);
+
+ /* Cleanup. */
+ pfree(tar_file->relpath);
+ pfree(tar_file);
+ pfree(fullpath);
+
+ astreamer_finalize(streamer);
+ astreamer_free(streamer);
+ }
+ simple_ptr_list_destroy(tarFiles);
+
+ progress_report(true);
+}
+
+/*
+ * Performs the actual work for tar content verification. It reads a given tar
+ * file in predefined chunks and passes it to the streamer, which initiates
+ * routines for decompression (if necessary) and then verifies each member
+ * within the tar archive.
+ */
+static void
+verify_tar_contents(verifier_context *context, char *relpath, char *fullpath,
+ astreamer *streamer)
+{
+ int fd;
+ int rc;
+ char *buffer;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+
+ /* Open the target file. */
+ if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
+ {
+ report_backup_error(context, "could not open file \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+ /* Perform the reads */
+ while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
+ {
+ astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);
+
+ /* Report progress */
+ done_size += rc;
+ progress_report(false);
+ }
+
+ if (rc < 0)
+ report_backup_error(context, "could not read file \"%s\": %m",
+ relpath);
+
+ /* Close the file. */
+ if (close(fd) != 0)
+ report_backup_error(context, "could not close file \"%s\": %m",
+ relpath);
+}
+
/*
* Verify file and its size entry in the manifest.
*/
@@ -1044,6 +1307,42 @@ find_backup_format(verifier_context *context)
return result;
}
+/*
+ * Set up the archive streamer stack needed to verify the contents of the
+ * provided tar file.
+ */
+static astreamer *
+create_archive_verifier(verifier_context *context, char *archive_name,
+ Oid tblspc_oid, pg_compress_algorithm compress_algo)
+{
+ astreamer *streamer = NULL;
+
+ /* Should be here only for tar backup */
+ Assert(context->format == 't');
+
+ /*
+ * To verify the contents of the tar file, the initial step is to parse
+ * its content.
+ */
+ streamer = astreamer_verify_content_new(streamer, context, archive_name,
+ tblspc_oid);
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /*
+ * If the tar file is compressed, we must perform the appropriate
+ * decompression operation before proceeding with the verification of its
+ * contents.
+ */
+ if (compress_algo == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compress_algo == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compress_algo == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ return streamer;
+}
+
/*
* Print a progress report based on the global variables.
*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 856e8947c1d..db847a59657 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -18,6 +18,7 @@
#include "common/hashfn_unstable.h"
#include "common/logging.h"
#include "common/parse_manifest.h"
+#include "fe_utils/astreamer.h"
#include "fe_utils/simple_list.h"
/*
@@ -128,4 +129,9 @@ extern void report_fatal_error(const char *pg_restrict fmt,...)
extern bool should_ignore_relpath(verifier_context *context,
const char *relpath);
+extern astreamer *astreamer_verify_content_new(astreamer *next,
+ verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
+
#endif /* PG_VERIFYBACKUP_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 547d14b3e7c..47b5f0edcc7 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3329,6 +3329,7 @@ astreamer_plain_writer
astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
+astreamer_verify
astreamer_zstd_frame
bgworker_main_type
bh_node_type
@@ -3950,6 +3951,7 @@ substitute_phv_relids_context
subxids_array_status
symbol
tablespaceinfo
+tarFile
td_entry
teSection
temp_tablespaces_extra
--
2.18.0
v9-0008-Refactor-split-verify_control_file.patchapplication/x-patch; name=v9-0008-Refactor-split-verify_control_file.patchDownload
From 3ba9b317cc313be4c3927b59d1d967458f5bcb66 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 15:47:26 +0530
Subject: [PATCH v9 08/12] Refactor: split verify_control_file.
Separated the control data verification code into a new function and
introduced the should_verify_control_data() macro, similar to
should_verify_checksum().
Like verify_file_checksum(), verify_control_file() is designed to accept
the path of the pg_control file, which it opens so that the relevant
information can be verified. In the case of a tar backup, however, we have
the pg_control file contents instead of a path, and those need to be
verified in the same way. For that reason, the code that does the
verification is moved into a separate function so that it can be reused
for tar backup verification as well.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 44 +++++++++++------------
src/bin/pg_verifybackup/pg_verifybackup.h | 15 ++++++++
2 files changed, 35 insertions(+), 24 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index e4f499fcd37..5adf24e8f90 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,7 +18,6 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -61,8 +60,6 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_control_file(const char *controlpath,
- uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -625,14 +622,20 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
/* Check the backup manifest entry for this file. */
m = verify_manifest_entry(context, relpath, sb.st_size);
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0 &&
- m != NULL && m->matched && !m->bad)
- verify_control_file(fullpath, context->manifest->system_identifier);
+ /* Validate the manifest system identifier */
+ if (should_verify_control_data(context->manifest, m))
+ {
+ ControlFileData *control_file;
+ bool crc_ok;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+ control_file = get_controlfile_by_exact_path(fullpath, &crc_ok);
+
+ verify_control_data(control_file, fullpath, crc_ok,
+ context->manifest->system_identifier);
+ /* Release memory. */
+ pfree(control_file);
+ }
/* Update statistics for progress report, if necessary */
if (show_progress && !context->skip_checksums &&
@@ -681,18 +684,14 @@ verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
}
/*
- * Sanity check control file and validate system identifier against manifest
- * system identifier.
+ * Sanity check control file data and validate system identifier against
+ * manifest system identifier.
*/
-static void
-verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
+void
+verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier)
{
- ControlFileData *control_file;
- bool crc_ok;
-
- pg_log_debug("reading \"%s\"", controlpath);
- control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
-
/* Control file contents not meaningful if CRC is bad. */
if (!crc_ok)
report_fatal_error("%s: CRC is incorrect", controlpath);
@@ -708,9 +707,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
controlpath,
(unsigned long long) manifest_system_identifier,
(unsigned long long) control_file->system_identifier);
-
- /* Release memory. */
- pfree(control_file);
}
/*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 12812cf5584..42d01c26466 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -16,6 +16,7 @@
#include "common/controldata_utils.h"
#include "common/hashfn_unstable.h"
+#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
@@ -44,6 +45,17 @@ typedef struct manifest_file
(((m) != NULL) && ((m)->matched) && !((m)->bad) && \
(((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+/*
+ * Validate the manifest system identifier against the control file; this
+ * feature is not available in manifest version 1. This validation should be
+ * carried out only if the manifest entry validation is completed without any
+ * errors.
+ */
+#define should_verify_control_data(manifest, m) \
+ (((manifest)->version != 1) && \
+ ((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (strcmp((m)->pathname, "global/pg_control") == 0))
+
/*
* Define a hash table which we can use to store information about the files
* mentioned in the backup manifest.
@@ -103,6 +115,9 @@ extern manifest_file *verify_manifest_entry(verifier_context *context,
extern void verify_checksum(verifier_context *context, manifest_file *m,
pg_checksum_context *checksum_ctx,
uint8 *checksumbuf);
+extern void verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
v9-0012-pg_verifybackup-Tests-and-document.patchapplication/x-patch; name=v9-0012-pg_verifybackup-Tests-and-document.patchDownload
From 6004858acf1740710f529d526ef70c96dee0bf9a Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 7 Aug 2024 18:15:29 +0530
Subject: [PATCH v9 12/12] pg_verifybackup: Tests and document
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the previous patch that implements tar format support.
----
---
doc/src/sgml/ref/pg_verifybackup.sgml | 42 ++++++++++-
src/bin/pg_verifybackup/t/001_basic.pl | 6 +-
src/bin/pg_verifybackup/t/004_options.pl | 2 +-
src/bin/pg_verifybackup/t/005_bad_manifest.pl | 2 +-
src/bin/pg_verifybackup/t/008_untar.pl | 73 ++++++-------------
src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +-----------
6 files changed, 72 insertions(+), 101 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index a3f167f9f6e..60f771c7663 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -34,8 +34,10 @@ PostgreSQL documentation
integrity of a database cluster backup taken using
<command>pg_basebackup</command> against a
<literal>backup_manifest</literal> generated by the server at the time
- of the backup. The backup must be stored in the "plain"
- format; a "tar" format backup can be checked after extracting it.
+ of the backup. The backup must be stored in the "plain" or "tar"
+ format. Verification is supported for <literal>gzip</literal>,
+ <literal>lz4</literal>, and <literal>zstd</literal> compressed tar backups;
+ backups using any other compression can be checked after decompressing them.
</para>
<para>
@@ -168,6 +170,42 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-F <replaceable class="parameter">format</replaceable></option></term>
+ <term><option>--format=<replaceable class="parameter">format</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the format of the backup. <replaceable>format</replaceable>
+ can be one of the following:
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>p</literal></term>
+ <term><literal>plain</literal></term>
+ <listitem>
+ <para>
+ Backup consists of plain files with the same layout as the
+ source server's data directory and tablespaces.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>t</literal></term>
+ <term><literal>tar</literal></term>
+ <listitem>
+ <para>
+ Backup consists of tar files. The main data directory's contents
+ will be written to a file named <filename>base.tar</filename>,
+ and each other tablespace will be written to a separate tar file
+ named after that tablespace's OID.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist></para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n</option></term>
<term><option>--no-parse-wal</option></term>
diff --git a/src/bin/pg_verifybackup/t/001_basic.pl b/src/bin/pg_verifybackup/t/001_basic.pl
index 2f3e52d296f..ca5b0402b7d 100644
--- a/src/bin/pg_verifybackup/t/001_basic.pl
+++ b/src/bin/pg_verifybackup/t/001_basic.pl
@@ -17,11 +17,11 @@ command_fails_like(
qr/no backup directory specified/,
'target directory must be specified');
command_fails_like(
- [ 'pg_verifybackup', $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir ],
qr/could not open file.*\/backup_manifest\"/,
'pg_verifybackup requires a manifest');
command_fails_like(
- [ 'pg_verifybackup', $tempdir, $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir, $tempdir ],
qr/too many command-line arguments/,
'multiple target directories not allowed');
@@ -31,7 +31,7 @@ close($fh);
# but then try to use an alternate, nonexisting manifest
command_fails_like(
- [ 'pg_verifybackup', '-m', "$tempdir/not_the_manifest", $tempdir ],
+ [ 'pg_verifybackup', '-Fp', '-m', "$tempdir/not_the_manifest", $tempdir ],
qr/could not open file.*\/not_the_manifest\"/,
'pg_verifybackup respects -m flag');
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 8ed2214408e..2f197648740 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -108,7 +108,7 @@ unlike(
# Test valid manifest with nonexistent backup directory.
command_fails_like(
[
- 'pg_verifybackup', '-m',
+ 'pg_verifybackup', '-Fp', '-m',
"$backup_path/backup_manifest", "$backup_path/fake"
],
qr/could not open directory/,
diff --git a/src/bin/pg_verifybackup/t/005_bad_manifest.pl b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
index c4ed64b62d5..28c51b6feb0 100644
--- a/src/bin/pg_verifybackup/t/005_bad_manifest.pl
+++ b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
@@ -208,7 +208,7 @@ sub test_bad_manifest
print $fh $manifest_contents;
close($fh);
- command_fails_like([ 'pg_verifybackup', $tempdir ], $regexp, $test_name);
+ command_fails_like([ 'pg_verifybackup', '-Fp', $tempdir ], $regexp, $test_name);
return;
}
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 7a09f3b75b2..9896560adc3 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -16,6 +16,20 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
+# Create a couple of directories to use as tablespaces.
+my $TS1_LOCATION = $primary->backup_dir .'/ts1';
+$TS1_LOCATION =~ s/\/\.\//\//g; # collapse foo/./bar to foo/bar
+mkdir($TS1_LOCATION);
+
+# Create a tablespace with a table in it.
+$primary->safe_psql('postgres', qq(
+ CREATE TABLESPACE regress_ts1 LOCATION '$TS1_LOCATION';
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1';
+ CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1;
+ INSERT INTO regress_tbl1 VALUES(generate_series(1,5));));
+my $tsoid = $primary->safe_psql('postgres', qq(
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
+
my $backup_path = $primary->backup_dir . '/server-backup';
my $extract_path = $primary->backup_dir . '/extracted-backup';
@@ -23,39 +37,31 @@ my @test_configuration = (
{
'compression_method' => 'none',
'backup_flags' => [],
- 'backup_archive' => 'base.tar',
+ 'backup_archive' => ['base.tar', "$tsoid.tar"],
'enabled' => 1
},
{
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'server-gzip' ],
- 'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'server-lz4' ],
- 'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => [ '-d', '-m' ],
+ 'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ],
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
});
@@ -86,47 +92,16 @@ for my $tc (@test_configuration)
my $backup_files = join(',',
sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
my $expected_backup_files =
- join(',', sort ('backup_manifest', $tc->{'backup_archive'}));
+ join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} }));
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path],
+ "verify backup, compression $method");
# Cleanup.
- unlink($backup_path . '/backup_manifest');
- unlink($backup_path . '/base.tar');
- unlink($backup_path . '/' . $tc->{'backup_archive'});
+ rmtree($backup_path);
rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 8c076d46dee..6b7d7483f6e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -29,41 +29,30 @@ my @test_configuration = (
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'client-gzip:5' ],
'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'client-lz4:5' ],
'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => ['-d'],
- 'output_file' => 'base.tar',
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:5' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'parallel zstd',
'backup_flags' => [ '--compress', 'client-zstd:workers=3' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1"),
'possibly_unsupported' =>
qr/could not set compression worker count to 3: Unsupported parameter/
@@ -118,40 +107,9 @@ for my $tc (@test_configuration)
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- push @decompress, $backup_path . '/' . $tc->{'output_file'}
- if $tc->{'output_file'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ],
+ "verify backup, compression $method");
# Cleanup.
rmtree($extract_path);
--
2.18.0
v9-0010-pg_verifybackup-Add-backup-format-and-compression.patchapplication/x-patch; name=v9-0010-pg_verifybackup-Add-backup-format-and-compression.patchDownload
From 0420df794f89344b4b96df0a767e1ea186bcacce Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 10:26:35 +0530
Subject: [PATCH v9 10/12] pg_verifybackup: Add backup format and compression
option
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the next patch that implements tar format support.
----
---
src/bin/pg_verifybackup/pg_verifybackup.c | 74 ++++++++++++++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 1 +
2 files changed, 73 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 5adf24e8f90..88196cca4e0 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,6 +18,7 @@
#include <sys/stat.h>
#include <time.h>
+#include "common/compression.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -56,6 +57,7 @@ static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3) pg_attribute_noreturn();
+static char find_backup_format(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_backup_file(verifier_context *context,
@@ -91,6 +93,7 @@ main(int argc, char **argv)
{"exit-on-error", no_argument, NULL, 'e'},
{"ignore", required_argument, NULL, 'i'},
{"manifest-path", required_argument, NULL, 'm'},
+ {"format", required_argument, NULL, 'F'},
{"no-parse-wal", no_argument, NULL, 'n'},
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
@@ -112,6 +115,7 @@ main(int argc, char **argv)
progname = get_progname(argv[0]);
memset(&context, 0, sizeof(context));
+ context.format = '\0';
if (argc > 1)
{
@@ -148,7 +152,7 @@ main(int argc, char **argv)
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
- while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1)
+ while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:", long_options, NULL)) != -1)
{
switch (c)
{
@@ -167,6 +171,15 @@ main(int argc, char **argv)
manifest_path = pstrdup(optarg);
canonicalize_path(manifest_path);
break;
+ case 'F':
+ if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
+ context.format = 'p';
+ else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
+ context.format = 't';
+ else
+ pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"",
+ optarg);
+ break;
case 'n':
no_parse_wal = true;
break;
@@ -214,11 +227,26 @@ main(int argc, char **argv)
pg_fatal("cannot specify both %s and %s",
"-P/--progress", "-q/--quiet");
+ /* Determine the backup format if it hasn't been specified. */
+ if (context.format == '\0')
+ context.format = find_backup_format(&context);
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
int ret;
+ /*
+ * XXX: In the future, we should consider enhancing pg_waldump to read
+ * WAL files from the tar archive.
+ */
+ if (context.format != 'p')
+ {
+ pg_log_error("pg_waldump does not support parsing WAL files from a tar archive.");
+ pg_log_error_hint("Try \"%s --help\" to skip parse WAL files option.", progname);
+ exit(1);
+ }
+
pg_waldump_path = pg_malloc(MAXPGPATH);
ret = find_other_exec(argv[0], "pg_waldump",
"pg_waldump (PostgreSQL) " PG_VERSION "\n",
@@ -273,8 +301,13 @@ main(int argc, char **argv)
/*
* Now do the expensive work of verifying file checksums, unless we were
* told to skip it.
+ *
+ * This applies only to plain-format backups. For tar-format backups, file
+ * checksum verification (if requested) is done immediately when each
+ * file is accessed, since we do not have random access to the files as
+ * we do with plain backups.
*/
- if (!context.skip_checksums)
+ if (!context.skip_checksums && context.format == 'p')
verify_backup_checksums(&context);
/*
@@ -975,6 +1008,42 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
return false;
}
+/*
+ * To detect the backup format, it checks for the PG_VERSION file in the backup
+ * directory. If found, it will be considered a plain-format backup; otherwise,
+ * it will be assumed to be a tar-format backup.
+ */
+static char
+find_backup_format(verifier_context *context)
+{
+ char result;
+ char *path;
+ struct stat sb;
+
+ /* Should be here only if the backup format is unknown */
+ Assert(context->format == '\0');
+
+ /* Check PG_VERSION file. */
+ path = psprintf("%s/%s", context->backup_directory, "PG_VERSION");
+ if (stat(path, &sb) == 0)
+ result = 'p';
+ else
+ {
+ if (errno != ENOENT)
+ {
+ pg_log_error("cannot determine backup format");
+ pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+ exit(1);
+ }
+
+ /* Otherwise, it is assumed to be a TAR backup. */
+ result = 't';
+ }
+ pfree(path);
+
+ return result;
+}
+
/*
* Print a progress report based on the global variables.
*
@@ -1032,6 +1101,7 @@ usage(void)
printf(_(" -e, --exit-on-error exit immediately on error\n"));
printf(_(" -i, --ignore=RELATIVE_PATH ignore indicated path\n"));
printf(_(" -m, --manifest-path=PATH use specified path for manifest\n"));
+ printf(_(" -F, --format=p|t backup format (plain, tar)\n"));
printf(_(" -n, --no-parse-wal do not try to parse WAL files\n"));
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 42d01c26466..856e8947c1d 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -105,6 +105,7 @@ typedef struct verifier_context
manifest_data *manifest;
char *backup_directory;
SimpleStringList ignore_list;
+ char format; /* backup format: p(lain)/t(ar) */
bool skip_checksums;
bool exit_on_error;
bool saw_any_error;
--
2.18.0
v9-0009-Add-simple_ptr_list_destroy-and-simple_ptr_list_d.patchapplication/x-patch; name=v9-0009-Add-simple_ptr_list_destroy-and-simple_ptr_list_d.patchDownload
From 18887cb590f950cdbc7f4e5bc11812f4e85ccfdd Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 8 Aug 2024 16:01:33 +0530
Subject: [PATCH v9 09/12] Add simple_ptr_list_destroy() and
simple_ptr_list_destroy_deep() API.
We didn't have any helper function to destroy SimplePtrList, likely
because it wasn't needed before, but it's required in a later patch in
this set. I've added two functions for this purpose, inspired by
list_free() and list_free_deep().
---
src/fe_utils/simple_list.c | 39 ++++++++++++++++++++++++++++++
src/include/fe_utils/simple_list.h | 2 ++
2 files changed, 41 insertions(+)
diff --git a/src/fe_utils/simple_list.c b/src/fe_utils/simple_list.c
index 2d88eb54067..9d218911c31 100644
--- a/src/fe_utils/simple_list.c
+++ b/src/fe_utils/simple_list.c
@@ -173,3 +173,42 @@ simple_ptr_list_append(SimplePtrList *list, void *ptr)
list->head = cell;
list->tail = cell;
}
+
+/*
+ * Destroy a pointer list and optionally the pointed-to element
+ */
+static void
+simple_ptr_list_destroy_private(SimplePtrList *list, bool deep)
+{
+ SimplePtrListCell *cell;
+
+ cell = list->head;
+ while (cell != NULL)
+ {
+ SimplePtrListCell *next;
+
+ next = cell->next;
+ if (deep)
+ pg_free(cell->ptr);
+ pg_free(cell);
+ cell = next;
+ }
+}
+
+/*
+ * Destroy a pointer list and the pointed-to element
+ */
+void
+simple_ptr_list_destroy_deep(SimplePtrList *list)
+{
+ simple_ptr_list_destroy_private(list, true);
+}
+
+/*
+ * Destroy only pointer list and not the pointed-to element
+ */
+void
+simple_ptr_list_destroy(SimplePtrList *list)
+{
+ simple_ptr_list_destroy_private(list, false);
+}
diff --git a/src/include/fe_utils/simple_list.h b/src/include/fe_utils/simple_list.h
index d42ecded8ed..5b7cbec8a62 100644
--- a/src/include/fe_utils/simple_list.h
+++ b/src/include/fe_utils/simple_list.h
@@ -66,5 +66,7 @@ extern void simple_string_list_destroy(SimpleStringList *list);
extern const char *simple_string_list_not_touched(SimpleStringList *list);
extern void simple_ptr_list_append(SimplePtrList *list, void *ptr);
+extern void simple_ptr_list_destroy_deep(SimplePtrList *list);
+extern void simple_ptr_list_destroy(SimplePtrList *list);
#endif /* SIMPLE_LIST_H */
--
2.18.0
v9-0005-Refactor-move-some-part-of-pg_verifybackup.c-to-p.patchapplication/x-patch; name=v9-0005-Refactor-move-some-part-of-pg_verifybackup.c-to-p.patchDownload
From 6ef19e787db0174040cd0d020085d2a78c2c78f7 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 12:10:34 +0530
Subject: [PATCH v9 05/12] Refactor: move some part of pg_verifybackup.c to
pg_verifybackup.h
---
src/bin/pg_verifybackup/pg_verifybackup.c | 94 +------------------
src/bin/pg_verifybackup/pg_verifybackup.h | 108 ++++++++++++++++++++++
2 files changed, 112 insertions(+), 90 deletions(-)
create mode 100644 src/bin/pg_verifybackup/pg_verifybackup.h
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index c6d01d52335..384a4dd3500 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,12 +18,11 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/controldata_utils.h"
-#include "common/hashfn_unstable.h"
#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
+#include "pg_verifybackup.h"
#include "pgtime.h"
/*
@@ -40,84 +39,6 @@
*/
#define ESTIMATED_BYTES_PER_MANIFEST_LINE 100
-/*
- * How many bytes should we try to read from a file at once?
- */
-#define READ_CHUNK_SIZE (128 * 1024)
-
-/*
- * Each file described by the manifest file is parsed to produce an object
- * like this.
- */
-typedef struct manifest_file
-{
- uint32 status; /* hash status */
- const char *pathname;
- size_t size;
- pg_checksum_type checksum_type;
- int checksum_length;
- uint8 *checksum_payload;
- bool matched;
- bool bad;
-} manifest_file;
-
-#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
-
-/*
- * Define a hash table which we can use to store information about the files
- * mentioned in the backup manifest.
- */
-#define SH_PREFIX manifest_files
-#define SH_ELEMENT_TYPE manifest_file
-#define SH_KEY_TYPE const char *
-#define SH_KEY pathname
-#define SH_HASH_KEY(tb, key) hash_string(key)
-#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
-#define SH_SCOPE static inline
-#define SH_RAW_ALLOCATOR pg_malloc0
-#define SH_DECLARE
-#define SH_DEFINE
-#include "lib/simplehash.h"
-
-/*
- * Each WAL range described by the manifest file is parsed to produce an
- * object like this.
- */
-typedef struct manifest_wal_range
-{
- TimeLineID tli;
- XLogRecPtr start_lsn;
- XLogRecPtr end_lsn;
- struct manifest_wal_range *next;
- struct manifest_wal_range *prev;
-} manifest_wal_range;
-
-/*
- * All the data parsed from a backup_manifest file.
- */
-typedef struct manifest_data
-{
- int version;
- uint64 system_identifier;
- manifest_files_hash *files;
- manifest_wal_range *first_wal_range;
- manifest_wal_range *last_wal_range;
-} manifest_data;
-
-/*
- * All of the context information we need while checking a backup manifest.
- */
-typedef struct verifier_context
-{
- manifest_data *manifest;
- char *backup_directory;
- SimpleStringList ignore_list;
- bool skip_checksums;
- bool exit_on_error;
- bool saw_any_error;
-} verifier_context;
-
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -151,13 +72,6 @@ static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
-static void report_backup_error(verifier_context *context,
- const char *pg_restrict fmt,...)
- pg_attribute_printf(2, 3);
-static void report_fatal_error(const char *pg_restrict fmt,...)
- pg_attribute_printf(1, 2) pg_attribute_noreturn();
-static bool should_ignore_relpath(verifier_context *context, const char *relpath);
-
static void progress_report(bool finished);
static void usage(void);
@@ -980,7 +894,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
* Update the context to indicate that we saw an error, and exit if the
* context says we should.
*/
-static void
+void
report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
{
va_list ap;
@@ -997,7 +911,7 @@ report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
/*
* Report a fatal error and exit
*/
-static void
+void
report_fatal_error(const char *pg_restrict fmt,...)
{
va_list ap;
@@ -1016,7 +930,7 @@ report_fatal_error(const char *pg_restrict fmt,...)
* Note that by "prefix" we mean a parent directory; for this purpose,
* "aa/bb" is not a prefix of "aa/bbb", but it is a prefix of "aa/bb/cc".
*/
-static bool
+bool
should_ignore_relpath(verifier_context *context, const char *relpath)
{
SimpleStringListCell *cell;
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
new file mode 100644
index 00000000000..bd9c95c477a
--- /dev/null
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -0,0 +1,108 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_verifybackup.h
+ * Verify a backup against a backup manifest.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/pg_verifybackup/pg_verifybackup.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef PG_VERIFYBACKUP_H
+#define PG_VERIFYBACKUP_H
+
+#include "common/controldata_utils.h"
+#include "common/hashfn_unstable.h"
+#include "common/parse_manifest.h"
+#include "fe_utils/simple_list.h"
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/*
+ * Each file described by the manifest file is parsed to produce an object
+ * like this.
+ */
+typedef struct manifest_file
+{
+ uint32 status; /* hash status */
+ const char *pathname;
+ size_t size;
+ pg_checksum_type checksum_type;
+ int checksum_length;
+ uint8 *checksum_payload;
+ bool matched;
+ bool bad;
+} manifest_file;
+
+#define should_verify_checksum(m) \
+ (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+
+/*
+ * Define a hash table which we can use to store information about the files
+ * mentioned in the backup manifest.
+ */
+#define SH_PREFIX manifest_files
+#define SH_ELEMENT_TYPE manifest_file
+#define SH_KEY_TYPE const char *
+#define SH_KEY pathname
+#define SH_HASH_KEY(tb, key) hash_string(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * Each WAL range described by the manifest file is parsed to produce an
+ * object like this.
+ */
+typedef struct manifest_wal_range
+{
+ TimeLineID tli;
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+ struct manifest_wal_range *next;
+ struct manifest_wal_range *prev;
+} manifest_wal_range;
+
+/*
+ * All the data parsed from a backup_manifest file.
+ */
+typedef struct manifest_data
+{
+ int version;
+ uint64 system_identifier;
+ manifest_files_hash *files;
+ manifest_wal_range *first_wal_range;
+ manifest_wal_range *last_wal_range;
+} manifest_data;
+
+/*
+ * All of the context information we need while checking a backup manifest.
+ */
+typedef struct verifier_context
+{
+ manifest_data *manifest;
+ char *backup_directory;
+ SimpleStringList ignore_list;
+ bool skip_checksums;
+ bool exit_on_error;
+ bool saw_any_error;
+} verifier_context;
+
+extern void report_backup_error(verifier_context *context,
+ const char *pg_restrict fmt,...)
+ pg_attribute_printf(2, 3);
+extern void report_fatal_error(const char *pg_restrict fmt,...)
+ pg_attribute_printf(1, 2) pg_attribute_noreturn();
+extern bool should_ignore_relpath(verifier_context *context,
+ const char *relpath);
+
+#endif /* PG_VERIFYBACKUP_H */
--
2.18.0
v9-0007-Refactor-split-verify_file_checksum-function.patchapplication/x-patch; name=v9-0007-Refactor-split-verify_file_checksum-function.patchDownload
From 424537524164197abacfb9ce81001ee5da04d9c5 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 16:45:55 +0530
Subject: [PATCH v9 07/12] Refactor: split verify_file_checksum() function.
Move the core functionality of verify_file_checksum to a new function
to reuse it instead of duplicating the code.
The verify_file_checksum() function is designed to take a file path,
open and read the file contents, and then calculate the checksum.
However, for TAR backups, instead of a file path, we receive the file
content in chunks, and the checksum needs to be calculated
incrementally. While the initial operations for plain and TAR backup
checksum calculations differ, the final checks and error handling are
the same. By moving the shared logic to a separate function, we can
reuse the code for both types of backups.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 18 ++++++++++++++++--
src/bin/pg_verifybackup/pg_verifybackup.h | 3 +++
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index e4288f453ef..e4f499fcd37 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -787,7 +787,6 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
int rc;
size_t bytes_read = 0;
uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
- int checksumlen;
/* Open the target file. */
if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
@@ -853,8 +852,23 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
return;
}
+ /* Do the final computation and verification. */
+ verify_checksum(context, m, &checksum_ctx, checksumbuf);
+}
+
+/*
+ * A helper function to finalize checksum computation and verify it against the
+ * backup manifest information.
+ */
+void
+verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx, uint8 *checksumbuf)
+{
+ int checksumlen;
+ const char *relpath = m->pathname;
+
/* Get the final checksum. */
- checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
+ checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
if (checksumlen < 0)
{
report_backup_error(context,
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 2e71f14669b..12812cf5584 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -100,6 +100,9 @@ typedef struct verifier_context
extern manifest_file *verify_manifest_entry(verifier_context *context,
char *relpath, int64 filesize);
+extern void verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx,
+ uint8 *checksumbuf);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
v9-0004-Refactor-move-skip_checksums-global-variable-to-v.patchapplication/x-patch; name=v9-0004-Refactor-move-skip_checksums-global-variable-to-v.patchDownload
From 47c999edada3ce31009351ed2393b04d7ea9f67e Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 8 Aug 2024 15:10:43 +0530
Subject: [PATCH v9 04/12] Refactor: move skip_checksums global variable to
verifier_context struct
To enable access to this flag in another file.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d77e70fbe38..c6d01d52335 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -113,6 +113,7 @@ typedef struct verifier_context
manifest_data *manifest;
char *backup_directory;
SimpleStringList ignore_list;
+ bool skip_checksums;
bool exit_on_error;
bool saw_any_error;
} verifier_context;
@@ -164,7 +165,6 @@ static const char *progname;
/* options */
static bool show_progress = false;
-static bool skip_checksums = false;
/* Progress indicators */
static uint64 total_size = 0;
@@ -266,7 +266,7 @@ main(int argc, char **argv)
quiet = true;
break;
case 's':
- skip_checksums = true;
+ context.skip_checksums = true;
break;
case 'w':
wal_directory = pstrdup(optarg);
@@ -363,7 +363,7 @@ main(int argc, char **argv)
* Now do the expensive work of verifying file checksums, unless we were
* told to skip it.
*/
- if (!skip_checksums)
+ if (!context.skip_checksums)
verify_backup_checksums(&context);
/*
@@ -739,7 +739,8 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
verify_control_file(fullpath, context->manifest->system_identifier);
/* Update statistics for progress report, if necessary */
- if (show_progress && !skip_checksums && should_verify_checksum(m))
+ if (show_progress && !context->skip_checksums &&
+ should_verify_checksum(m))
total_size += m->size;
/*
--
2.18.0
v9-0006-Refactor-split-verify_backup_file-function.patchapplication/x-patch; name=v9-0006-Refactor-split-verify_backup_file-function.patchDownload
From 5a323df24f2d0b76fb06ac58c1c70de46056e4ee Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 8 Aug 2024 14:45:04 +0530
Subject: [PATCH v9 06/12] Refactor: split verify_backup_file() function.
Move the manifest entry verification code into a new function called
verify_manifest_entry() so that it can be reused for tar backup
verification. If verify_manifest_entry() doesn't find an entry, it
reports an error as before and returns NULL to the caller. This is why
a NULL check is added to should_verify_checksum().
---
src/bin/pg_verifybackup/pg_verifybackup.c | 49 +++++++++++++++--------
src/bin/pg_verifybackup/pg_verifybackup.h | 6 ++-
2 files changed, 37 insertions(+), 18 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 384a4dd3500..e4288f453ef 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -622,6 +622,32 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Check the backup manifest entry for this file. */
+ m = verify_manifest_entry(context, relpath, sb.st_size);
+
+ /*
+ * Validate the manifest system identifier, not available in manifest
+ * version 1.
+ */
+ if (context->manifest->version != 1 &&
+ strcmp(relpath, "global/pg_control") == 0 &&
+ m != NULL && m->matched && !m->bad)
+ verify_control_file(fullpath, context->manifest->system_identifier);
+
+ /* Update statistics for progress report, if necessary */
+ if (show_progress && !context->skip_checksums &&
+ should_verify_checksum(m))
+ total_size += m->size;
+}
+
+/*
+ * Verify file and its size entry in the manifest.
+ */
+manifest_file *
+verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
+{
+ manifest_file *m;
+
/* Check whether there's an entry in the manifest hash. */
m = manifest_files_lookup(context->manifest->files, relpath);
if (m == NULL)
@@ -629,40 +655,29 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
report_backup_error(context,
"\"%s\" is present on disk but not in the manifest",
relpath);
- return;
+ return NULL;
}
/* Flag this entry as having been encountered in the filesystem. */
m->matched = true;
/* Check that the size matches. */
- if (m->size != sb.st_size)
+ if (m->size != filesize)
{
report_backup_error(context,
"\"%s\" has size %lld on disk but size %zu in the manifest",
- relpath, (long long int) sb.st_size, m->size);
+ relpath, (long long int) filesize, m->size);
m->bad = true;
}
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0)
- verify_control_file(fullpath, context->manifest->system_identifier);
-
- /* Update statistics for progress report, if necessary */
- if (show_progress && !context->skip_checksums &&
- should_verify_checksum(m))
- total_size += m->size;
-
/*
* We don't verify checksums at this stage. We first finish verifying that
* we have the expected set of files with the expected sizes, and only
* afterwards verify the checksums. That's because computing checksums may
* take a while, and we'd like to report more obvious problems quickly.
*/
+
+ return m;
}
/*
@@ -825,7 +840,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
/*
* Double-check that we read the expected number of bytes from the file.
- * Normally, a file size mismatch would be caught in verify_backup_file
+ * Normally, a file size mismatch would be caught in verify_manifest_entry
* and this check would never be reached, but this provides additional
* safety and clarity in the event of concurrent modifications or
* filesystem misbehavior.
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index bd9c95c477a..2e71f14669b 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -41,7 +41,8 @@ typedef struct manifest_file
} manifest_file;
#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+ (((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
/*
* Define a hash table which we can use to store information about the files
@@ -97,6 +98,9 @@ typedef struct verifier_context
bool saw_any_error;
} verifier_context;
+extern manifest_file *verify_manifest_entry(verifier_context *context,
+ char *relpath, int64 filesize);
+
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
pg_attribute_printf(2, 3);
--
2.18.0
On Mon, Aug 12, 2024 at 5:13 AM Amul Sul <sulamul@gmail.com> wrote:
I tried this in the attached version and made a few additional changes
based on Sravan's off-list comments regarding function names and
descriptions.

Now, verification happens in two passes. The first pass simply
verifies the file names, determines their compression types, and
returns a list of valid tar files whose contents need to be verified
in the second pass. The second pass is called at the end of
verify_backup_directory() after all files in that directory have been
scanned. I named the functions for pass 1 and pass 2 as
verify_tar_file_name() and verify_tar_file_contents(), respectively.
The rest of the code flow is similar to the previous version.

In the attached patch set, I abandoned the changes touching the
progress reporting code of plain backups by dropping the previous 0009
patch. The new 0009 patch adds missing APIs to simple_list.c to
destroy SimplePtrList. The rest of the patch numbers remain unchanged.
I think you've entangled the code paths here for plain-format backup
and tar-format backup in a way that is not very nice. I suggest
refactoring things so that verify_backup_directory() is only called
for plain-format backups, and you have some completely separate
function (perhaps verify_tar_backup) that is called for tar-format
backups. I don't think verify_backup_file() should be shared between
tar-format and plain-format backups either. Let that be just for
plain-format backups, and have separate logic for tar-format backups.
Right now you've got "if" statements in various places trying to get
all the cases correct, but I think you've missed some (and there's
also the issue of updating all the comments).
For instance, verify_backup_file() recurses into subdirectories, but
that behavior is inappropriate for a tar format backup, where
subdirectories should instead be treated like stray files: complain
that they exist. pg_verify_backup() does this:
/* If it's a directory, just recurse. */
if (S_ISDIR(sb.st_mode))
{
verify_backup_directory(context, relpath, fullpath);
return;
}
/* If it's not a directory, it should be a plain file. */
if (!S_ISREG(sb.st_mode))
{
report_backup_error(context,
"\"%s\" is not a file or directory",
relpath);
return;
}
For a tar format backup, this whole thing should be just:
/* In a tar format backup, we expect only plain files. */
if (!S_ISREG(sb.st_mode))
{
report_backup_error(context,
"\"%s\" is not a plain file",
relpath);
return;
}
Also, immediately above, you do
simple_string_list_append(&context->ignore_list, relpath), but that is
pointless in the tar-file case, and arguably wrong, if -i is going to
ignore both pathnames in the base directory and also pathnames inside
the tar files, because we could add something to the ignore list here
-- accomplishing nothing useful -- and then that ignore-list entry
could cause us to disregard a stray file with the same name present
inside one of the tar files -- which is silly. Note that the actual
point of this logic is to make sure that if we can't stat() a certain
directory, we don't go off and issue a million complaints about all
the files in that directory being missing. But this doesn't accomplish
that goal for a tar-format backup. For a tar-format backup, you'd want
to figure out which files in the manifest we don't expect to see based
on this file being inaccessible, and then arrange to suppress future
complaints about all of those files. But you can't implement that
here, because you haven't parsed the file name yet. That happens
later, in verify_tar_file_name().
You could add a whole bunch more if statements here and try to work
around these issues, but it seems pretty obviously a dead end to me.
Almost the entire function is going to end up being inside of an
if-statement. Essentially the only thing in verify_backup_file() that
should actually be the same in the plain and tar-format cases is that
you should call stat() either way and check whether it throws an
error. But that is not enough justification for trying to share the
whole function.
I find the logic in verify_tar_file_name() to be somewhat tortured as
well. The strstr() calls will match those strings anywhere in the
filename, not just at the end. But also, even if you fixed that, why
work backward from the end of the filename toward the beginning? It
would seem like it would more sense to (1) first check if the string
starts with "base" and set suffix equal to pathname+4, (2) if not,
strtol(pathname, &suffix, 10) and complain if we didn't eat at least
one character or got 0 or something too big to be an OID, (3) check
whether suffix is .tar, .tar.gz, etc.
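
To make that suggested order concrete, here is a rough standalone sketch of
how such a check could look (an illustration only, not the patch's actual
code; parse_tar_archive_name() is a hypothetical helper, and I've used
strtoul() rather than strtol() for simplicity):

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Recognize "base<suffix>" or "<tablespace OID><suffix>". */
static bool
parse_tar_archive_name(const char *pathname, const char **suffix_out)
{
    const char *suffix;

    if (strncmp(pathname, "base", 4) == 0)
        suffix = pathname + 4;      /* main data directory archive */
    else
    {
        char       *end;
        unsigned long oid = strtoul(pathname, &end, 10);

        /* must consume at least one digit and stay within OID range */
        if (end == pathname || oid == 0 || oid > 4294967295UL)
            return false;
        suffix = end;               /* tablespace archive */
    }

    /* Accept only the known (possibly compressed) tar suffixes. */
    if (strcmp(suffix, ".tar") != 0 &&
        strcmp(suffix, ".tar.gz") != 0 &&
        strcmp(suffix, ".tar.lz4") != 0 &&
        strcmp(suffix, ".tar.zst") != 0)
        return false;

    *suffix_out = suffix;
    return true;
}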
In verify_member_checksum(), you set mystreamer->verify_checksum =
false. That would be correct if there could only ever be one call to
verify_member_checksum() per member file, but there is no such rule.
There can be (and, I think, normally will be) more than one
ASTREAMER_MEMBER_CONTENTS chunk. I'm a little confused as to how this
code passes any kind of testing.
Also in verify_member_checksum(), the mystreamer->received_bytes <
m->size seems strange. I don't think this is the right way to do
something when you reach the end of an archive member. The right way
to do that is to do it when the ASTREAMER_MEMBER_TRAILER chunk shows
up.
In verify_member_control_data(), you use astreamer_buffer_until(). But
that's the same buffer that is being used by verify_member_checksum(),
so I don't really understand how you expect this to work. If this code
path were ever taken, verify_member_checksum() would see the same data
more than once.
The call to pg_log_debug() in this function also seems quite random.
In a plain-format backup, we'd actually be doing something different
for pg_controldata vs. other files, namely reading it during the
initial directory scan. But here we're reading the file in exactly the
same sense as we're reading every other file, neither more nor less,
so why mention this file and not all of the others? And why at this
exact spot in the code?
I suspect that the report_fatal_error("%s: could not read control
file: read %d of %zu", ...) call is unreachable. I agree that you need
to handle the case where the control file inside the tar file is not
the expected length, and in fact I think you should probably write a
TAP test for that exact scenario to make sure it works. I bet this
doesn't. Even if it did, the error message makes no sense in context.
In the plain-format backup, this error would come from code reading
the actual bytes off the disk -- i.e. the complaint about not being
able to read the control file would come from the read() system call.
Here it doesn't.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Aug 13, 2024 at 10:49 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Aug 12, 2024 at 5:13 AM Amul Sul <sulamul@gmail.com> wrote:
I tried this in the attached version and made a few additional changes
based on Sravan's off-list comments regarding function names and
descriptions.

Now, verification happens in two passes. The first pass simply
verifies the file names, determines their compression types, and
returns a list of valid tar files whose contents need to be verified
in the second pass. The second pass is called at the end of
verify_backup_directory() after all files in that directory have been
scanned. I named the functions for pass 1 and pass 2 as
verify_tar_file_name() and verify_tar_file_contents(), respectively.
The rest of the code flow is similar to the previous version.

In the attached patch set, I abandoned the changes touching the
progress reporting code of plain backups by dropping the previous 0009
patch. The new 0009 patch adds missing APIs to simple_list.c to
destroy SimplePtrList. The rest of the patch numbers remain unchanged.

I think you've entangled the code paths here for plain-format backup
and tar-format backup in a way that is not very nice. I suggest
refactoring things so that verify_backup_directory() is only called
for plain-format backups, and you have some completely separate
function (perhaps verify_tar_backup) that is called for tar-format
backups. I don't think verify_backup_file() should be shared between
tar-format and plain-format backups either. Let that be just for
plain-format backups, and have separate logic for tar-format backups.
Right now you've got "if" statements in various places trying to get
all the cases correct, but I think you've missed some (and there's
also the issue of updating all the comments).

For instance, verify_backup_file() recurses into subdirectories, but
that behavior is inappropriate for a tar format backup, where
subdirectories should instead be treated like stray files: complain
that they exist. pg_verify_backup() does this:

/* If it's a directory, just recurse. */
if (S_ISDIR(sb.st_mode))
{
verify_backup_directory(context, relpath, fullpath);
return;
}

/* If it's not a directory, it should be a plain file. */
if (!S_ISREG(sb.st_mode))
{
report_backup_error(context,
"\"%s\" is not a file or directory",
relpath);
return;
}

For a tar format backup, this whole thing should be just:
/* In a tar format backup, we expect only plain files. */
if (!S_ISREG(sb.st_mode))
{
report_backup_error(context,
"\"%s\" is not a plain file",
relpath);
return;
}

Also, immediately above, you do
simple_string_list_append(&context->ignore_list, relpath), but that is
pointless in the tar-file case, and arguably wrong, if -i is going to
ignore both pathnames in the base directory and also pathnames inside
the tar files, because we could add something to the ignore list here
-- accomplishing nothing useful -- and then that ignore-list entry
could cause us to disregard a stray file with the same name present
inside one of the tar files -- which is silly. Note that the actual
point of this logic is to make sure that if we can't stat() a certain
directory, we don't go off and issue a million complaints about all
the files in that directory being missing. But this doesn't accomplish
that goal for a tar-format backup. For a tar-format backup, you'd want
to figure out which files in the manifest we don't expect to see based
on this file being inaccessible, and then arrange to suppress future
complaints about all of those files. But you can't implement that
here, because you haven't parsed the file name yet. That happens
later, in verify_tar_file_name().

You could add a whole bunch more if statements here and try to work
around these issues, but it seems pretty obviously a dead end to me.
Almost the entire function is going to end up being inside of an
if-statement. Essentially the only thing in verify_backup_file() that
should actually be the same in the plain and tar-format cases is that
you should call stat() either way and check whether it throws an
error. But that is not enough justification for trying to share the
whole function.
I agree with keeping verify_backup_file() separate, but I'm hesitant
to do the same for verify_backup_directory(), since we would end up
with two nearly identical functions. The changes to
verify_backup_directory() are minimal, so I've left it as is; let me
know your thoughts on that. I've kept verify_backup_file() dedicated
to plain backup files and added a new function,
verify_tar_backup_file(), in patch 0011. For consistency, I also
renamed verify_backup_file() to verify_plain_backup_file() in patch
0006.
I find the logic in verify_tar_file_name() to be somewhat tortured as
well. The strstr() calls will match those strings anywhere in the
filename, not just at the end. But also, even if you fixed that, why
work backward from the end of the filename toward the beginning? It
would seem like it would more sense to (1) first check if the string
starts with "base" and set suffix equal to pathname+4, (2) if not,
strtol(pathname, &suffix, 10) and complain if we didn't eat at least
one character or got 0 or something too big to be an OID, (3) check
whether suffix is .tar, .tar.gz, etc.
Ok, did it this way.
In verify_member_checksum(), you set mystreamer->verify_checksum =
false. That would be correct if there could only ever be one call to
verify_member_checksum() per member file, but there is no such rule.
There can be (and, I think, normally will be) more than one
ASTREAMER_MEMBER_CONTENTS chunk. I'm a little confused as to how this
code passes any kind of testing.
I did that to avoid adding the same line to every error case where we
return early. The flag is re-enabled while more file contents are
still expected, i.e. while mystreamer->received_bytes < m->size.
Also in verify_member_checksum(), the mystreamer->received_bytes <
m->size seems strange. I don't think this is the right way to do
something when you reach the end of an archive member. The right way
to do that is to do it when the ASTREAMER_MEMBER_TRAILER chunk shows
up.
Ok, I've split this into two parts: the first part handles incremental
computation at ASTREAMER_MEMBER_CONTENTS, and the second part performs
the final verification at the ASTREAMER_MEMBER_TRAILER stage.
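
Just to illustrate the shape of that incremental-then-finalize pattern,
here is a simplified standalone sketch using the common checksum_helper
API (not the code from the patch; checksum_in_chunks() is a made-up
helper and the 128kB chunk size is arbitrary):

#include "postgres_fe.h"
#include "common/checksum_helper.h"

/*
 * Feed "filedata" to the checksum in fixed-size chunks, as if it arrived
 * in several ASTREAMER_MEMBER_CONTENTS callbacks, then finalize and
 * compare once at the end, as the ASTREAMER_MEMBER_TRAILER handling does.
 */
static bool
checksum_in_chunks(const uint8 *filedata, size_t filesize,
                   const uint8 *expected, int expected_len)
{
    pg_checksum_context ctx;
    uint8       result[PG_CHECKSUM_MAX_LENGTH];
    int         result_len;
    size_t      offset = 0;

    if (pg_checksum_init(&ctx, CHECKSUM_TYPE_SHA256) < 0)
        return false;

    while (offset < filesize)
    {
        size_t      chunk = Min(filesize - offset, 128 * 1024);

        if (pg_checksum_update(&ctx, filedata + offset, chunk) < 0)
            return false;
        offset += chunk;
    }

    result_len = pg_checksum_final(&ctx, result);
    return result_len == expected_len &&
        memcmp(result, expected, result_len) == 0;
}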
In verify_member_control_data(), you use astreamer_buffer_until(). But
that's the same buffer that is being used by verify_member_checksum(),
so I don't really understand how you expect this to work. If this code
path were ever taken, verify_member_checksum() would see the same data
more than once.
No, for checksum calculation we compute the checksum directly on the
received content (which is the caller's buffer) without copying it.
However, for control file verification we need the entire file, so we
first copy it into a local buffer within mystreamer. This local buffer
is used solely for storing control file data.
I've made some adjustments to align this code with the checksum
verification style: the copying happens during the
ASTREAMER_MEMBER_CONTENTS stage, and the final verification is
performed at the ASTREAMER_MEMBER_TRAILER stage.
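
For what it's worth, the check that runs once the member is complete
boils down to roughly the following standalone sketch (an illustration
only: check_buffered_control_data() is a hypothetical helper, and the
real code reports problems through report_backup_error() and
report_fatal_error() rather than returning a bool):

#include "postgres_fe.h"
#include "catalog/pg_control.h"
#include "port/pg_crc32c.h"

/*
 * Given the fully buffered contents of the pg_control member from a tar
 * archive, verify its CRC and compare its system identifier against the
 * one recorded in the backup manifest.
 */
static bool
check_buffered_control_data(const char *buffer, size_t len,
                            uint64 manifest_system_identifier)
{
    ControlFileData control_file;
    pg_crc32c   crc;

    /* A truncated pg_control member cannot be interpreted at all. */
    if (len < sizeof(ControlFileData))
        return false;

    memcpy(&control_file, buffer, sizeof(ControlFileData));

    /* The stored CRC covers everything that precedes it in the struct. */
    INIT_CRC32C(crc);
    COMP_CRC32C(crc, &control_file, offsetof(ControlFileData, crc));
    FIN_CRC32C(crc);
    if (!EQ_CRC32C(crc, control_file.crc))
        return false;

    return control_file.system_identifier == manifest_system_identifier;
}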
The call to pg_log_debug() in this function also seems quite random.
In a plain-format backup, we'd actually be doing something different
for pg_controldata vs. other files, namely reading it during the
initial directory scan. But here we're reading the file in exactly the
same sense as we're reading every other file, neither more nor less,
so why mention this file and not all of the others? And why at this
exact spot in the code?
Agreed; it was added without much thought.
I suspect that the report_fatal_error("%s: could not read control
file: read %d of %zu", ...) call is unreachable. I agree that you need
to handle the case where the control file inside the tar file is not
the expected length, and in fact I think you should probably write a
TAP test for that exact scenario to make sure it works. I bet this
doesn't. Even if it did, the error message makes no sense in context.
In the plain-format backup, this error would come from code reading
the actual bytes off the disk -- i.e. the complaint about not being
able to read the control file would come from the read() system call.
Here it doesn't.
Agreed. I've replaced that with a check against the expected file size.
In addition to the changes mentioned above, I have renamed the
functions in astreamer_verify.c to use a consistent naming scheme.
Regards,
Amul.
Attachments:
v10-0012-pg_verifybackup-Tests-and-document.patchapplication/x-patch; name=v10-0012-pg_verifybackup-Tests-and-document.patchDownload
From acabc34488287ab3346551db23b4c99fac0850a9 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 7 Aug 2024 18:15:29 +0530
Subject: [PATCH v10 12/12] pg_verifybackup: Tests and document
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the previous patch that implements tar format support.
----
---
doc/src/sgml/ref/pg_verifybackup.sgml | 42 ++++++++++-
src/bin/pg_verifybackup/t/001_basic.pl | 6 +-
src/bin/pg_verifybackup/t/004_options.pl | 2 +-
src/bin/pg_verifybackup/t/005_bad_manifest.pl | 2 +-
src/bin/pg_verifybackup/t/008_untar.pl | 73 ++++++-------------
src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +-----------
6 files changed, 72 insertions(+), 101 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index a3f167f9f6e..60f771c7663 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -34,8 +34,10 @@ PostgreSQL documentation
integrity of a database cluster backup taken using
<command>pg_basebackup</command> against a
<literal>backup_manifest</literal> generated by the server at the time
- of the backup. The backup must be stored in the "plain"
- format; a "tar" format backup can be checked after extracting it.
+ of the backup. The backup must be stored in the "plain" or "tar"
+ format. Verification is supported for <literal>gzip</literal>,
+ <literal>lz4</literal>, and <literal>zstd</literal> compressed tar backup;
+ any other compressed format backups can be checked after decompressing them.
</para>
<para>
@@ -168,6 +170,42 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-F <replaceable class="parameter">format</replaceable></option></term>
+ <term><option>--format=<replaceable class="parameter">format</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the format of the backup. <replaceable>format</replaceable>
+ can be one of the following:
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>p</literal></term>
+ <term><literal>plain</literal></term>
+ <listitem>
+ <para>
+ Backup consists of plain files with the same layout as the
+ source server's data directory and tablespaces.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>t</literal></term>
+ <term><literal>tar</literal></term>
+ <listitem>
+ <para>
+ Backup consists of tar files. The main data directory's contents
+ will be written to a file named <filename>base.tar</filename>,
+ and each other tablespace will be written to a separate tar file
+ named after that tablespace's OID.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist></para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n</option></term>
<term><option>--no-parse-wal</option></term>
diff --git a/src/bin/pg_verifybackup/t/001_basic.pl b/src/bin/pg_verifybackup/t/001_basic.pl
index 2f3e52d296f..ca5b0402b7d 100644
--- a/src/bin/pg_verifybackup/t/001_basic.pl
+++ b/src/bin/pg_verifybackup/t/001_basic.pl
@@ -17,11 +17,11 @@ command_fails_like(
qr/no backup directory specified/,
'target directory must be specified');
command_fails_like(
- [ 'pg_verifybackup', $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir ],
qr/could not open file.*\/backup_manifest\"/,
'pg_verifybackup requires a manifest');
command_fails_like(
- [ 'pg_verifybackup', $tempdir, $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir, $tempdir ],
qr/too many command-line arguments/,
'multiple target directories not allowed');
@@ -31,7 +31,7 @@ close($fh);
# but then try to use an alternate, nonexisting manifest
command_fails_like(
- [ 'pg_verifybackup', '-m', "$tempdir/not_the_manifest", $tempdir ],
+ [ 'pg_verifybackup', '-Fp', '-m', "$tempdir/not_the_manifest", $tempdir ],
qr/could not open file.*\/not_the_manifest\"/,
'pg_verifybackup respects -m flag');
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 8ed2214408e..2f197648740 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -108,7 +108,7 @@ unlike(
# Test valid manifest with nonexistent backup directory.
command_fails_like(
[
- 'pg_verifybackup', '-m',
+ 'pg_verifybackup', '-Fp', '-m',
"$backup_path/backup_manifest", "$backup_path/fake"
],
qr/could not open directory/,
diff --git a/src/bin/pg_verifybackup/t/005_bad_manifest.pl b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
index c4ed64b62d5..28c51b6feb0 100644
--- a/src/bin/pg_verifybackup/t/005_bad_manifest.pl
+++ b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
@@ -208,7 +208,7 @@ sub test_bad_manifest
print $fh $manifest_contents;
close($fh);
- command_fails_like([ 'pg_verifybackup', $tempdir ], $regexp, $test_name);
+ command_fails_like([ 'pg_verifybackup', '-Fp', $tempdir ], $regexp, $test_name);
return;
}
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 7a09f3b75b2..9896560adc3 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -16,6 +16,20 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
+# Create a couple of directories to use as tablespaces.
+my $TS1_LOCATION = $primary->backup_dir .'/ts1';
+$TS1_LOCATION =~ s/\/\.\//\//g; # collapse foo/./bar to foo/bar
+mkdir($TS1_LOCATION);
+
+# Create a tablespace with table in it.
+$primary->safe_psql('postgres', qq(
+ CREATE TABLESPACE regress_ts1 LOCATION '$TS1_LOCATION';
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1';
+ CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1;
+ INSERT INTO regress_tbl1 VALUES(generate_series(1,5));));
+my $tsoid = $primary->safe_psql('postgres', qq(
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
+
my $backup_path = $primary->backup_dir . '/server-backup';
my $extract_path = $primary->backup_dir . '/extracted-backup';
@@ -23,39 +37,31 @@ my @test_configuration = (
{
'compression_method' => 'none',
'backup_flags' => [],
- 'backup_archive' => 'base.tar',
+ 'backup_archive' => ['base.tar', "$tsoid.tar"],
'enabled' => 1
},
{
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'server-gzip' ],
- 'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'server-lz4' ],
- 'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => [ '-d', '-m' ],
+ 'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ],
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
});
@@ -86,47 +92,16 @@ for my $tc (@test_configuration)
my $backup_files = join(',',
sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
my $expected_backup_files =
- join(',', sort ('backup_manifest', $tc->{'backup_archive'}));
+ join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} }));
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path],
+ "verify backup, compression $method");
# Cleanup.
- unlink($backup_path . '/backup_manifest');
- unlink($backup_path . '/base.tar');
- unlink($backup_path . '/' . $tc->{'backup_archive'});
+ rmtree($backup_path);
rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 8c076d46dee..6b7d7483f6e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -29,41 +29,30 @@ my @test_configuration = (
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'client-gzip:5' ],
'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'client-lz4:5' ],
'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => ['-d'],
- 'output_file' => 'base.tar',
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:5' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'parallel zstd',
'backup_flags' => [ '--compress', 'client-zstd:workers=3' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1"),
'possibly_unsupported' =>
qr/could not set compression worker count to 3: Unsupported parameter/
@@ -118,40 +107,9 @@ for my $tc (@test_configuration)
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- push @decompress, $backup_path . '/' . $tc->{'output_file'}
- if $tc->{'output_file'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ],
+ "verify backup, compression $method");
# Cleanup.
rmtree($extract_path);
--
2.18.0
v10-0011-pg_verifybackup-Read-tar-files-and-verify-its-co.patchapplication/x-patch; name=v10-0011-pg_verifybackup-Read-tar-files-and-verify-its-co.patchDownload
From 52e11acc54815e9e5db92171b3fba9522f6da683 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 16:22:07 +0530
Subject: [PATCH v10 11/12] pg_verifybackup: Read tar files and verify its
contents
This patch implements TAR format backup verification.
For progress reporting support, we perform this verification in two
passes: the first pass calculates total_size, and the second pass
updates done_size as verification progresses.
For the verification, in the first pass, we call verify_tar_backup_file(),
which performs basic verification by expecting only base.tar or
<tablespaceoid>.tar files and raises an error for any other files. It
also determines the compression type of the archive file. All this
information is stored in a newly added tarFile struct, which is
appended to a list that will be used in the second pass (by
verify_tar_content()) for the final verification. In the second pass,
the tar archives are read, decompressed, and the required verification
is carried out.
For reading and decompression, fe_utils/astreamer.h is used. For
verification, a new archive streamer has been added in
astreamer_verify.c to handle TAR member files and their contents; see
astreamer_verify_content() for details. The stack of astreamers will
be set up for each TAR file in verify_tar_content(), depending on its
compression type which is detected in the first pass.
When information about a TAR member file (i.e., ASTREAMER_MEMBER_HEADER)
is received, we first verify its entry against the backup manifest. We
then decide if further checks are needed, such as checksum
verification and control data verification (if it is a pg_control
file), once the member file contents are received. Although this
decision could be made when the contents are received, it is more
efficient to make it earlier since the member file contents are
received in multiple iterations. In short, we process
ASTREAMER_MEMBER_CONTENTS multiple times but only once for other
ASTREAMER_MEMBER_* cases. We maintain this information in the
astreamer_verify structure for each member file, which is reset when
the file ends.
Unlike in a plain backup, checksum verification here occurs in two
steps. First, as the contents are received, the checksum is computed
incrementally (see member_compute_checksum). Then, at the end of
processing the member file, the final verification is performed (see
member_verify_checksum).
Similarly, during the content receiving stage, if the file is
pg_control, the data will be copied into a local buffer (see
member_copy_control_data). The verification will then be carried out
at the end of the member file processing (see member_verify_control_data)
---
src/bin/pg_verifybackup/Makefile | 4 +-
src/bin/pg_verifybackup/astreamer_verify.c | 355 +++++++++++++++++++++
src/bin/pg_verifybackup/meson.build | 6 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 283 +++++++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 6 +
src/tools/pgindent/typedefs.list | 2 +
6 files changed, 649 insertions(+), 7 deletions(-)
create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 7c045f142e8..df7aaabd530 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -17,11 +17,13 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
# We need libpq only because fe_utils does.
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
OBJS = \
$(WIN32RES) \
- pg_verifybackup.o
+ pg_verifybackup.o \
+ astreamer_verify.o
all: pg_verifybackup
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
new file mode 100644
index 00000000000..28a3f976877
--- /dev/null
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -0,0 +1,355 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_verify.c
+ *
+ * Extend fe_utils/astreamer.h archive streaming facility to verify TAR
+ * backup.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * src/bin/pg_verifybackup/astreamer_verify.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "common/logging.h"
+#include "pg_verifybackup.h"
+
+typedef struct astreamer_verify
+{
+ astreamer base;
+ verifier_context *context;
+ char *archive_name;
+ Oid tblspc_oid;
+ pg_checksum_context *checksum_ctx;
+
+ /* Hold information for a member file verification */
+ manifest_file *mfile;
+ int64 received_bytes;
+ bool verify_checksum;
+ bool verify_control_data;
+} astreamer_verify;
+
+static void astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_verify_finalize(astreamer *streamer);
+static void astreamer_verify_free(astreamer *streamer);
+
+static void member_verify_header(astreamer *streamer, astreamer_member *member);
+static void member_compute_checksum(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void member_verify_checksum(astreamer *streamer);
+static void member_copy_control_data(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void member_verify_control_data(astreamer *streamer);
+static void member_reset_info(astreamer *streamer);
+
+static const astreamer_ops astreamer_verify_ops = {
+ .content = astreamer_verify_content,
+ .finalize = astreamer_verify_finalize,
+ .free = astreamer_verify_free
+};
+
+/*
+ * Create an astreamer that verifies the contents of a TAR file.
+ */
+astreamer *
+astreamer_verify_content_new(astreamer *next, verifier_context *context,
+ char *archive_name, Oid tblspc_oid)
+{
+ astreamer_verify *streamer;
+
+ streamer = palloc0(sizeof(astreamer_verify));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_verify_ops;
+
+ streamer->base.bbs_next = next;
+ streamer->context = context;
+ streamer->archive_name = archive_name;
+ streamer->tblspc_oid = tblspc_oid;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ if (!context->skip_checksums)
+ streamer->checksum_ctx = pg_malloc(sizeof(pg_checksum_context));
+
+ return &streamer->base;
+}
+
+/*
+ * The main entry point of the archive streamer for verifying tar members.
+ */
+static void
+astreamer_verify_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+
+ /*
+ * Perform the initial check and setup verification steps.
+ */
+ member_verify_header(streamer, member);
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Process the member contents according to the flags set by the
+ * member header processing routine: compute the checksum and/or copy
+ * the control data to a local buffer.
+ */
+ if (mystreamer->verify_checksum)
+ member_compute_checksum(streamer, member, data, len);
+
+ if (mystreamer->verify_control_data)
+ member_copy_control_data(streamer, member, data, len);
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+
+ /* Do the final checksum verification. */
+ if (mystreamer->verify_checksum)
+ member_verify_checksum(streamer);
+
+ /* Do the control data verification */
+ if (mystreamer->verify_control_data)
+ member_verify_control_data(streamer);
+
+ /*
+ * Reset the temporary information stored for the verification.
+ */
+ member_reset_info(streamer);
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar archive");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_verify stream.
+ */
+static void
+astreamer_verify_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_verify stream.
+ */
+static void
+astreamer_verify_free(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ if (mystreamer->checksum_ctx)
+ pfree(mystreamer->checksum_ctx);
+
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+
+/*
+ * Verifies the tar member entry against the backup manifest. If the
+ * archive being processed is a tablespace, prepares the required file
+ * path for subsequent operations. Finally, determines whether checksum
+ * verification and control data verification need to be performed while
+ * the file contents are processed.
+ */
+static void
+member_verify_header(astreamer *streamer, astreamer_member *member)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_file *m;
+
+ /* We are only interested in files that are not in the ignore list. */
+ if (member->is_directory || member->is_link ||
+ should_ignore_relpath(mystreamer->context, member->pathname))
+ return;
+
+ /*
+ * The backup_manifest stores a path relative to the base directory for
+ * files belonging to a tablespace, whereas <tablespaceoid>.tar doesn't.
+ * Prepare the required path, otherwise the manifest entry verification
+ * will fail.
+ */
+ if (OidIsValid(mystreamer->tblspc_oid))
+ {
+ char temp[MAXPGPATH];
+
+ /* Copy original name at temporary space */
+ memcpy(temp, member->pathname, MAXPGPATH);
+
+ snprintf(member->pathname, MAXPGPATH, "%s/%u/%s",
+ "pg_tblspc", mystreamer->tblspc_oid, temp);
+ }
+
+ /* Check the manifest entry */
+ m = verify_manifest_entry(mystreamer->context, member->pathname,
+ member->size);
+ mystreamer->mfile = (void *) m;
+
+ /*
+ * Prepare for checksum and control data verification.
+ *
+ * We could have these checks while receiving contents. However, since
+ * contents are received in multiple iterations, this would result in
+ * these lengthy checks being performed multiple times. Instead, having a
+ * single flag would be more efficient.
+ */
+ mystreamer->verify_checksum =
+ (!mystreamer->context->skip_checksums && should_verify_checksum(m));
+ mystreamer->verify_control_data =
+ should_verify_control_data(mystreamer->context->manifest, m);
+
+ /* Initialize the context required for checksum verification. */
+ if (mystreamer->verify_checksum &&
+ pg_checksum_init(mystreamer->checksum_ctx, m->checksum_type) < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "%s: could not initialize checksum of file \"%s\"",
+ mystreamer->archive_name, m->pathname);
+
+ /*
+ * Checksum verification cannot be performed without proper context
+ * initialization.
+ */
+ mystreamer->verify_checksum = false;
+ }
+}
+
+/*
+ * Computes the checksum incrementally for the received file content.
+ *
+ * The caller should pass a correctly initialized checksum_ctx, which will be
+ * used for incremental checksum computation.
+ */
+static void
+member_compute_checksum(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ pg_checksum_context *checksum_ctx = mystreamer->checksum_ctx;
+ manifest_file *m = mystreamer->mfile;
+
+ Assert(mystreamer->verify_checksum);
+
+ /* Should be called only for the expected file */
+ Assert(strcmp(member->pathname, m->pathname) == 0);
+
+ /*
+ * The checksum context should match the type noted in the backup
+ * manifest.
+ */
+ Assert(checksum_ctx->type == m->checksum_type);
+
+ /* Update the total count of computed checksum bytes. */
+ mystreamer->received_bytes += len;
+
+ if (pg_checksum_update(checksum_ctx, (uint8 *) data, len) < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "could not update checksum of file \"%s\"",
+ m->pathname);
+ mystreamer->verify_checksum = false;
+ }
+}
+
+/*
+ * Perform the final computation and checksum verification after the entire
+ * file content has been processed.
+ */
+static void
+member_verify_checksum(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ Assert(mystreamer->verify_checksum);
+
+ verify_checksum(mystreamer->context, mystreamer->mfile,
+ mystreamer->checksum_ctx, mystreamer->received_bytes);
+}
+
+/*
+ * Stores the pg_control file contents into a local buffer; we need the entire
+ * control file data for verification.
+ */
+static void
+member_copy_control_data(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ /* Should be here only for control file */
+ Assert(strcmp(member->pathname, "global/pg_control") == 0);
+ Assert(((astreamer_verify *) streamer)->verify_control_data);
+
+ /* Copy enough control file data needed for verification. */
+ astreamer_buffer_until(streamer, &data, &len, sizeof(ControlFileData));
+}
+
+/*
+ * Performs the CRC calculation of pg_control data and then calls the routines
+ * that execute the final verification of the control file information.
+ */
+static void
+member_verify_control_data(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_data *manifest = mystreamer->context->manifest;
+ ControlFileData control_file;
+ pg_crc32c crc;
+ bool crc_ok;
+
+ /* Should be here only for control file */
+ Assert(strcmp(mystreamer->mfile->pathname, "global/pg_control") == 0);
+ Assert(mystreamer->verify_control_data);
+
+ /* Should have enough control file data needed for verification. */
+ if (streamer->bbs_buffer.len != sizeof(ControlFileData))
+ report_fatal_error("%s: unexpected control file size: %d, should be %zu",
+ mystreamer->archive_name, streamer->bbs_buffer.len,
+ sizeof(ControlFileData));
+
+ memcpy(&control_file, streamer->bbs_buffer.data, sizeof(ControlFileData));
+
+ /* Check the CRC. */
+ INIT_CRC32C(crc);
+ COMP_CRC32C(crc, (char *) (&control_file), offsetof(ControlFileData, crc));
+ FIN_CRC32C(crc);
+
+ crc_ok = EQ_CRC32C(crc, control_file.crc);
+
+ /* Do the final control data verification. */
+ verify_control_data(&control_file, mystreamer->mfile->pathname, crc_ok,
+ manifest->system_identifier);
+}
+
+/*
+ * Reset flags and free memory allocations for member file verification.
+ */
+static void
+member_reset_info(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ mystreamer->mfile = NULL;
+ mystreamer->received_bytes = 0;
+ mystreamer->verify_checksum = false;
+ mystreamer->verify_control_data = false;
+}
diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build
index 7c7d31a0350..1e3fcf7ee5a 100644
--- a/src/bin/pg_verifybackup/meson.build
+++ b/src/bin/pg_verifybackup/meson.build
@@ -1,7 +1,8 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
pg_verifybackup_sources = files(
- 'pg_verifybackup.c'
+ 'pg_verifybackup.c',
+ 'astreamer_verify.c'
)
if host_system == 'windows'
@@ -10,9 +11,10 @@ if host_system == 'windows'
'--FILEDESC', 'pg_verifybackup - verify a backup against using a backup manifest'])
endif
+pg_verifybackup_deps = [frontend_code, libpq, lz4, zlib, zstd]
pg_verifybackup = executable('pg_verifybackup',
pg_verifybackup_sources,
- dependencies: [frontend_code, libpq],
+ dependencies: pg_verifybackup_deps,
kwargs: default_bin_args,
)
bin_targets += pg_verifybackup
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 3dcb174e0a9..aed1adef4e7 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -20,9 +20,12 @@
#include "common/compression.h"
#include "common/parse_manifest.h"
+#include "common/relpath.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
+#include "limits.h"
#include "pg_verifybackup.h"
+#include "pgtar.h"
#include "pgtime.h"
/*
@@ -39,6 +42,16 @@
*/
#define ESTIMATED_BYTES_PER_MANIFEST_LINE 100
+/*
+ * Tar archive information needed for content verification.
+ */
+typedef struct tarFile
+{
+ char *relpath;
+ Oid tblspc_oid;
+ pg_compress_algorithm compress_algorithm;
+} tarFile;
+
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -60,8 +73,14 @@ static void report_manifest_error(JsonManifestParseContext *context,
static char find_backup_format(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_plain_backup_file(verifier_context *context,
- char *relpath, char *fullpath);
+static void verify_plain_backup_file(verifier_context *context, char *relpath,
+ char *fullpath);
+static void verify_tar_backup_file(verifier_context *context, char *relpath,
+ char *fullpath, SimplePtrList *tarFiles);
+static void verify_tar_file_contents(verifier_context *context,
+ SimplePtrList *tarFiles);
+static void verify_tar_contents(verifier_context *context, char *relpath,
+ char *fullpath, astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -70,6 +89,10 @@ static void verify_file_checksum(verifier_context *context,
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
+static astreamer *create_archive_verifier(verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid,
+ pg_compress_algorithm compress_algo);
static void progress_report(bool finished);
static void usage(void);
@@ -148,6 +171,10 @@ main(int argc, char **argv)
*/
simple_string_list_append(&context.ignore_list, "backup_manifest");
simple_string_list_append(&context.ignore_list, "pg_wal");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.gz");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.lz4");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.zst");
simple_string_list_append(&context.ignore_list, "postgresql.auto.conf");
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
@@ -556,6 +583,7 @@ verify_backup_directory(verifier_context *context, char *relpath,
{
DIR *dir;
struct dirent *dirent;
+ SimplePtrList tarFiles = {NULL, NULL};
dir = opendir(fullpath);
if (dir == NULL)
@@ -595,12 +623,23 @@ verify_backup_directory(verifier_context *context, char *relpath,
newrelpath = psprintf("%s/%s", relpath, filename);
if (!should_ignore_relpath(context, newrelpath))
- verify_plain_backup_file(context, newrelpath, newfullpath);
+ {
+ if (context->format == 'p')
+ verify_plain_backup_file(context, newrelpath, newfullpath);
+ else
+ verify_tar_backup_file(context, newrelpath, newfullpath,
+ &tarFiles);
+ }
pfree(newfullpath);
pfree(newrelpath);
}
+ /* Perform the final verification of the tar contents, if any. */
+ Assert(tarFiles.head == NULL || context->format == 't');
+ if (tarFiles.head != NULL)
+ verify_tar_file_contents(context, &tarFiles);
+
if (closedir(dir))
{
report_backup_error(context,
@@ -610,7 +649,8 @@ verify_backup_directory(verifier_context *context, char *relpath,
}
/*
- * Verify one file (which might actually be a directory or a symlink).
+ * Verify one file (which might actually be a directory, a symlink, or an
+ * archive).
*
* The arguments to this function have the same meaning as the arguments to
* verify_backup_directory.
@@ -677,6 +717,205 @@ verify_plain_backup_file(verifier_context *context, char *relpath,
total_size += m->size;
}
+/*
+ * Verify one tar archive file.
+ *
+ * This function does not perform a complete verification; it only carries out
+ * basic validation of the tar format backup file, detects the compression
+ * type, and appends that information to the tarFiles list. An error will be
+ * reported if the tar archive is inaccessible, or if the file type, name, or
+ * compression type is not as expected.
+ *
+ * The arguments to this function are mostly the same as those of
+ * verify_plain_backup_file(). The additional argument is the list to
+ * which this tar file's information is appended for the second pass.
+ */
+static void
+verify_tar_backup_file(verifier_context *context, char *relpath,
+ char *fullpath, SimplePtrList *tarFiles)
+{
+ struct stat sb;
+ Oid tblspc_oid = InvalidOid;
+ pg_compress_algorithm compress_algorithm;
+ tarFile *tar_file;
+ char *suffix = NULL;
+
+ /* Should be tar format backup */
+ Assert(context->format == 't');
+
+ /* Get file information */
+ if (stat(fullpath, &sb) != 0)
+ {
+ report_backup_error(context,
+ "could not stat file or directory \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ /* In a tar format backup, we expect only plain files. */
+ if (!S_ISREG(sb.st_mode))
+ {
+ report_backup_error(context,
+ "\"%s\" is not a plain file",
+ relpath);
+ return;
+ }
+
+ /*
+ * We expect only the tar archives produced for the main data directory
+ * and for tablespaces.
+ *
+ * pg_basebackup writes the main data directory to an archive file named
+ * base.tar and the tablespace directory to <tablespaceoid>.tar, followed
+ * by a compression type extension such as .gz, .lz4, or .zst.
+ */
+ if (strncmp("base", relpath, 4) == 0)
+ suffix = relpath + 4;
+ else
+ {
+ /* Expected a <tablespaceoid>.tar file here. */
+ int64 num = strtoi64(relpath, &suffix, 10);
+
+ /*
+ * Report an error if we didn't consume at least one character, if the
+ * result is 0, or if the value is too large to be a valid OID.
+ */
+ if (suffix == NULL || (num <= 0) || (num > OID_MAX))
+ report_backup_error(context,
+ "\"%s\" unexpected file in the tar format backup",
+ relpath);
+ tblspc_oid = (Oid) num;
+ }
+
+ /* Now, check the compression type of the tar file */
+ if (strcmp(suffix, ".tar") == 0)
+ compress_algorithm = PG_COMPRESSION_NONE;
+ else if (strcmp(suffix, ".tgz") == 0)
+ compress_algorithm = PG_COMPRESSION_GZIP;
+ else if (strcmp(suffix, ".tar.gz") == 0)
+ compress_algorithm = PG_COMPRESSION_GZIP;
+ else if (strcmp(suffix, ".tar.lz4") == 0)
+ compress_algorithm = PG_COMPRESSION_LZ4;
+ else if (strcmp(suffix, ".tar.zst") == 0)
+ compress_algorithm = PG_COMPRESSION_ZSTD;
+ else
+ {
+ report_backup_error(context,
+ "\"%s\" unexpected file in the tar format backup",
+ relpath);
+ return;
+ }
+
+ /*
+ * Append the information to the list for complete verification at a later
+ * stage.
+ */
+ tar_file = pg_malloc(sizeof(tarFile));
+ tar_file->relpath = pstrdup(relpath);
+ tar_file->tblspc_oid = tblspc_oid;
+ tar_file->compress_algorithm = compress_algorithm;
+
+ simple_ptr_list_append(tarFiles, tar_file);
+
+ /* Update statistics for progress report, if necessary */
+ if (show_progress)
+ total_size += sb.st_size;
+}
+
+/*
+ * This is the final part of tar file verification, which prepares the archive
+ * streamer stack according to the tar file compression format for each tar
+ * archive and invokes them for reading, decompressing, and ultimately
+ * verifying the contents.
+ *
+ * The arguments to this function should be a list of valid tar archives to
+ * verify, and the allocation will be freed once the verification is complete.
+ */
+static void
+verify_tar_file_contents(verifier_context *context, SimplePtrList *tarFiles)
+{
+ SimplePtrListCell *cell;
+
+ progress_report(false);
+
+ for (cell = tarFiles->head; cell != NULL; cell = cell->next)
+ {
+ tarFile *tar_file = (tarFile *) cell->ptr;
+ astreamer *streamer;
+ char *fullpath;
+
+ /* Prepare archive streamer stack */
+ streamer = create_archive_verifier(context,
+ tar_file->relpath,
+ tar_file->tblspc_oid,
+ tar_file->compress_algorithm);
+
+ /* Compute the full pathname to the target file. */
+ fullpath = psprintf("%s/%s", context->backup_directory,
+ tar_file->relpath);
+
+ /* Invoke the streamer for reading, decompressing, and verifying. */
+ verify_tar_contents(context, tar_file->relpath, fullpath, streamer);
+
+ /* Cleanup. */
+ pfree(tar_file->relpath);
+ pfree(tar_file);
+ pfree(fullpath);
+
+ astreamer_finalize(streamer);
+ astreamer_free(streamer);
+ }
+ simple_ptr_list_destroy(tarFiles);
+
+ progress_report(true);
+}
+
+/*
+ * Performs the actual work for tar content verification. It reads a given tar
+ * file in predefined chunks and passes it to the streamer, which initiates
+ * routines for decompression (if necessary) and then verifies each member
+ * within the tar archive.
+ */
+static void
+verify_tar_contents(verifier_context *context, char *relpath, char *fullpath,
+ astreamer *streamer)
+{
+ int fd;
+ int rc;
+ char *buffer;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+
+ /* Open the target file. */
+ if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
+ {
+ report_backup_error(context, "could not open file \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+ /* Perform the reads */
+ while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
+ {
+ astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);
+
+ /* Report progress */
+ done_size += rc;
+ progress_report(false);
+ }
+
+ if (rc < 0)
+ report_backup_error(context, "could not read file \"%s\": %m",
+ relpath);
+
+ /* Close the file. */
+ if (close(fd) != 0)
+ report_backup_error(context, "could not close file \"%s\": %m",
+ relpath);
+}
+
/*
* Verify file and its size entry in the manifest.
*/
@@ -1045,6 +1284,42 @@ find_backup_format(verifier_context *context)
return result;
}
+/*
+ * Sets up the archive streamer stack needed to verify the contents of the
+ * provided tar file.
+ */
+static astreamer *
+create_archive_verifier(verifier_context *context, char *archive_name,
+ Oid tblspc_oid, pg_compress_algorithm compress_algo)
+{
+ astreamer *streamer = NULL;
+
+ /* Should be here only for tar backup */
+ Assert(context->format == 't');
+
+ /*
+ * To verify the contents of the tar file, the initial step is to parse
+ * its content.
+ */
+ streamer = astreamer_verify_content_new(streamer, context, archive_name,
+ tblspc_oid);
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /*
+ * If the tar file is compressed, we must perform the appropriate
+ * decompression operation before proceeding with the verification of its
+ * contents.
+ */
+ if (compress_algo == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compress_algo == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compress_algo == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ return streamer;
+}
+
/*
* Print a progress report based on the global variables.
*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 1bba4e7ea92..963ec71e270 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -18,6 +18,7 @@
#include "common/hashfn_unstable.h"
#include "common/logging.h"
#include "common/parse_manifest.h"
+#include "fe_utils/astreamer.h"
#include "fe_utils/simple_list.h"
/*
@@ -128,4 +129,9 @@ extern void report_fatal_error(const char *pg_restrict fmt,...)
extern bool should_ignore_relpath(verifier_context *context,
const char *relpath);
+extern astreamer *astreamer_verify_content_new(astreamer *next,
+ verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
+
#endif /* PG_VERIFYBACKUP_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 547d14b3e7c..47b5f0edcc7 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3329,6 +3329,7 @@ astreamer_plain_writer
astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
+astreamer_verify
astreamer_zstd_frame
bgworker_main_type
bh_node_type
@@ -3950,6 +3951,7 @@ substitute_phv_relids_context
subxids_array_status
symbol
tablespaceinfo
+tarFile
td_entry
teSection
temp_tablespaces_extra
--
2.18.0
Attachment: v10-0010-pg_verifybackup-Add-backup-format-and-compressio.patch (application/x-patch)
From 1b07c42e3ea2b4a1b3b463611d93f8fa94426260 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 10:26:35 +0530
Subject: [PATCH v10 10/12] pg_verifybackup: Add backup format and compression
option
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the next patch that implements tar format support.
----
---
src/bin/pg_verifybackup/pg_verifybackup.c | 74 ++++++++++++++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 1 +
2 files changed, 73 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index ddc6ed7471b..3dcb174e0a9 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,6 +18,7 @@
#include <sys/stat.h>
#include <time.h>
+#include "common/compression.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -56,6 +57,7 @@ static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3) pg_attribute_noreturn();
+static char find_backup_format(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_plain_backup_file(verifier_context *context,
@@ -91,6 +93,7 @@ main(int argc, char **argv)
{"exit-on-error", no_argument, NULL, 'e'},
{"ignore", required_argument, NULL, 'i'},
{"manifest-path", required_argument, NULL, 'm'},
+ {"format", required_argument, NULL, 'F'},
{"no-parse-wal", no_argument, NULL, 'n'},
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
@@ -112,6 +115,7 @@ main(int argc, char **argv)
progname = get_progname(argv[0]);
memset(&context, 0, sizeof(context));
+ context.format = '\0';
if (argc > 1)
{
@@ -148,7 +152,7 @@ main(int argc, char **argv)
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
- while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1)
+ while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:", long_options, NULL)) != -1)
{
switch (c)
{
@@ -167,6 +171,15 @@ main(int argc, char **argv)
manifest_path = pstrdup(optarg);
canonicalize_path(manifest_path);
break;
+ case 'F':
+ if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
+ context.format = 'p';
+ else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
+ context.format = 't';
+ else
+ pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"",
+ optarg);
+ break;
case 'n':
no_parse_wal = true;
break;
@@ -214,11 +227,26 @@ main(int argc, char **argv)
pg_fatal("cannot specify both %s and %s",
"-P/--progress", "-q/--quiet");
+ /* Determine the backup format if it hasn't been specified. */
+ if (context.format == '\0')
+ context.format = find_backup_format(&context);
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
int ret;
+ /*
+ * XXX: In the future, we should consider enhancing pg_waldump to read
+ * WAL files from the tar archive.
+ */
+ if (context.format != 'p')
+ {
+ pg_log_error("pg_waldump does not support parsing WAL files from a tar archive.");
+ pg_log_error_hint("Try \"%s --help\" to skip parse WAL files option.", progname);
+ exit(1);
+ }
+
pg_waldump_path = pg_malloc(MAXPGPATH);
ret = find_other_exec(argv[0], "pg_waldump",
"pg_waldump (PostgreSQL) " PG_VERSION "\n",
@@ -273,8 +301,13 @@ main(int argc, char **argv)
/*
* Now do the expensive work of verifying file checksums, unless we were
* told to skip it.
+ *
+ * We only check the plain backup here. For a tar backup, file checksum
+ * verification (if requested) is done immediately as the file is read,
+ * since we don't have random access to the files like we do with plain
+ * backups.
*/
- if (!context.skip_checksums)
+ if (!context.skip_checksums && context.format == 'p')
verify_backup_checksums(&context);
/*
@@ -976,6 +1009,42 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
return false;
}
+/*
+ * To detect the backup format, check for the PG_VERSION file in the backup
+ * directory. If it is found, the backup is considered plain format;
+ * otherwise, it is assumed to be tar format.
+ */
+static char
+find_backup_format(verifier_context *context)
+{
+ char result;
+ char *path;
+ struct stat sb;
+
+ /* Should be here only if the backup format is unknown */
+ Assert(context->format == '\0');
+
+ /* Check PG_VERSION file. */
+ path = psprintf("%s/%s", context->backup_directory, "PG_VERSION");
+ if (stat(path, &sb) == 0)
+ result = 'p';
+ else
+ {
+ if (errno != ENOENT)
+ {
+ pg_log_error("cannot determine backup format");
+ pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+ exit(1);
+ }
+
+ /* Otherwise, it is assumed to be a TAR backup. */
+ result = 't';
+ }
+ pfree(path);
+
+ return result;
+}
+
/*
* Print a progress report based on the global variables.
*
@@ -1033,6 +1102,7 @@ usage(void)
printf(_(" -e, --exit-on-error exit immediately on error\n"));
printf(_(" -i, --ignore=RELATIVE_PATH ignore indicated path\n"));
printf(_(" -m, --manifest-path=PATH use specified path for manifest\n"));
+ printf(_(" -F, --format=p|t backup format (plain, tar)\n"));
printf(_(" -n, --no-parse-wal do not try to parse WAL files\n"));
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 56fbb731337..1bba4e7ea92 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -105,6 +105,7 @@ typedef struct verifier_context
manifest_data *manifest;
char *backup_directory;
SimpleStringList ignore_list;
+ char format; /* backup format: p(lain)/t(ar) */
bool skip_checksums;
bool exit_on_error;
bool saw_any_error;
--
2.18.0
Attachment: v10-0009-Add-simple_ptr_list_destroy-and-simple_ptr_list_.patch (application/x-patch)
From 6a5086e67647bbc90e71e469d5195b76b6ba9f4b Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 8 Aug 2024 16:01:33 +0530
Subject: [PATCH v10 09/12] Add simple_ptr_list_destroy() and
simple_ptr_list_destroy_deep() API.
We didn't have any helper function to destroy a SimplePtrList, likely
because it wasn't needed before, but it's required in a later patch in
this set. I've added two functions for this purpose, inspired by
list_free() and list_free_deep().
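Usage looks roughly like this (a sketch; the file names are only
illustrative):

    SimplePtrList list = {NULL, NULL};

    simple_ptr_list_append(&list, pg_strdup("base.tar"));
    simple_ptr_list_append(&list, pg_strdup("16385.tar"));

    /* ... walk list.head ... */

    /*
     * Free the cells and the pointed-to strings. Use
     * simple_ptr_list_destroy() instead when the elements are owned
     * elsewhere and only the cells should be freed.
     */
    simple_ptr_list_destroy_deep(&list);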
---
src/fe_utils/simple_list.c | 39 ++++++++++++++++++++++++++++++
src/include/fe_utils/simple_list.h | 2 ++
2 files changed, 41 insertions(+)
diff --git a/src/fe_utils/simple_list.c b/src/fe_utils/simple_list.c
index 2d88eb54067..9d218911c31 100644
--- a/src/fe_utils/simple_list.c
+++ b/src/fe_utils/simple_list.c
@@ -173,3 +173,42 @@ simple_ptr_list_append(SimplePtrList *list, void *ptr)
list->head = cell;
list->tail = cell;
}
+
+/*
+ * Destroy a pointer list and optionally the pointed-to elements
+ */
+static void
+simple_ptr_list_destroy_private(SimplePtrList *list, bool deep)
+{
+ SimplePtrListCell *cell;
+
+ cell = list->head;
+ while (cell != NULL)
+ {
+ SimplePtrListCell *next;
+
+ next = cell->next;
+ if (deep)
+ pg_free(cell->ptr);
+ pg_free(cell);
+ cell = next;
+ }
+}
+
+/*
+ * Destroy a pointer list along with the pointed-to elements
+ */
+void
+simple_ptr_list_destroy_deep(SimplePtrList *list)
+{
+ simple_ptr_list_destroy_private(list, true);
+}
+
+/*
+ * Destroy only the pointer list, not the pointed-to elements
+ */
+void
+simple_ptr_list_destroy(SimplePtrList *list)
+{
+ simple_ptr_list_destroy_private(list, false);
+}
diff --git a/src/include/fe_utils/simple_list.h b/src/include/fe_utils/simple_list.h
index d42ecded8ed..5b7cbec8a62 100644
--- a/src/include/fe_utils/simple_list.h
+++ b/src/include/fe_utils/simple_list.h
@@ -66,5 +66,7 @@ extern void simple_string_list_destroy(SimpleStringList *list);
extern const char *simple_string_list_not_touched(SimpleStringList *list);
extern void simple_ptr_list_append(SimplePtrList *list, void *ptr);
+extern void simple_ptr_list_destroy_deep(SimplePtrList *list);
+extern void simple_ptr_list_destroy(SimplePtrList *list);
#endif /* SIMPLE_LIST_H */
--
2.18.0
Attachment: v10-0008-Refactor-split-verify_control_file.patch (application/x-patch)
From 0542b8d4d620a181f06a16339202928b7ec3ec8c Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 15:47:26 +0530
Subject: [PATCH v10 08/12] Refactor: split verify_control_file.
Separated the control data verification code into a new function,
verify_control_data(), and introduced the should_verify_control_data()
macro, similar to should_verify_checksum().
Like verify_file_checksum(), verify_control_file() is designed to accept
a path to the pg_control file, which it opens and reads before verifying
the relevant information. In the case of a tar backup, however, we have
the pg_control file contents in memory instead, and they need to be
verified in the same way. For that reason, the code that performs the
verification is moved into a separate function so that it can be reused
for tar backup verification as well.
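In rough outline, both callers then funnel into the same check (a sketch,
not the literal hunks; buffered_control_file is just an illustrative name
for the copy a tar reader would hold in memory):

    /* Plain backup: read pg_control from disk, then verify it. */
    control_file = get_controlfile_by_exact_path(fullpath, &crc_ok);
    verify_control_data(control_file, fullpath, crc_ok,
                        context->manifest->system_identifier);

    /*
     * Tar backup (later patch): the contents are already buffered, so the
     * caller computes crc_ok itself and passes its own copy.
     */
    verify_control_data(&buffered_control_file, "global/pg_control", crc_ok,
                        context->manifest->system_identifier);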
---
src/bin/pg_verifybackup/pg_verifybackup.c | 44 +++++++++++------------
src/bin/pg_verifybackup/pg_verifybackup.h | 15 ++++++++
2 files changed, 35 insertions(+), 24 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index c8543cb4f7f..ddc6ed7471b 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,7 +18,6 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
@@ -61,8 +60,6 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_plain_backup_file(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_control_file(const char *controlpath,
- uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -626,14 +623,20 @@ verify_plain_backup_file(verifier_context *context, char *relpath,
/* Check the backup manifest entry for this file. */
m = verify_manifest_entry(context, relpath, sb.st_size);
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0 &&
- m != NULL && m->matched && !m->bad)
- verify_control_file(fullpath, context->manifest->system_identifier);
+ /* Validate the manifest system identifier */
+ if (should_verify_control_data(context->manifest, m))
+ {
+ ControlFileData *control_file;
+ bool crc_ok;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+ control_file = get_controlfile_by_exact_path(fullpath, &crc_ok);
+
+ verify_control_data(control_file, fullpath, crc_ok,
+ context->manifest->system_identifier);
+ /* Release memory. */
+ pfree(control_file);
+ }
/* Update statistics for progress report, if necessary */
if (show_progress && !context->skip_checksums &&
@@ -682,18 +685,14 @@ verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
}
/*
- * Sanity check control file and validate system identifier against manifest
- * system identifier.
+ * Sanity check control file data and validate system identifier against
+ * manifest system identifier.
*/
-static void
-verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
+void
+verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier)
{
- ControlFileData *control_file;
- bool crc_ok;
-
- pg_log_debug("reading \"%s\"", controlpath);
- control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
-
/* Control file contents not meaningful if CRC is bad. */
if (!crc_ok)
report_fatal_error("%s: CRC is incorrect", controlpath);
@@ -709,9 +708,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
controlpath,
(unsigned long long) manifest_system_identifier,
(unsigned long long) control_file->system_identifier);
-
- /* Release memory. */
- pfree(control_file);
}
/*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 93859d9d541..56fbb731337 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -16,6 +16,7 @@
#include "common/controldata_utils.h"
#include "common/hashfn_unstable.h"
+#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
@@ -44,6 +45,17 @@ typedef struct manifest_file
(((m) != NULL) && ((m)->matched) && !((m)->bad) && \
(((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+/*
+ * Validate the manifest system identifier against the control file; this
+ * feature is not available in manifest version 1. This validation should be
+ * carried out only if the manifest entry validation is completed without any
+ * errors.
+ */
+#define should_verify_control_data(manifest, m) \
+ (((manifest)->version != 1) && \
+ ((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (strcmp((m)->pathname, "global/pg_control") == 0))
+
/*
* Define a hash table which we can use to store information about the files
* mentioned in the backup manifest.
@@ -103,6 +115,9 @@ extern manifest_file *verify_manifest_entry(verifier_context *context,
extern void verify_checksum(verifier_context *context, manifest_file *m,
pg_checksum_context *checksum_ctx,
int64 bytes_read);
+extern void verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
Attachment: v10-0007-Refactor-split-verify_file_checksum-function.patch (application/x-patch)
From 6c9cb1c2725b9a60c5e549808555b15b06ea44e9 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 14 Aug 2024 15:14:15 +0530
Subject: [PATCH v10 07/12] Refactor: split verify_file_checksum() function.
Move the core functionality of verify_file_checksum() to a new function,
verify_checksum(), so that it can be reused instead of duplicating the code.
The verify_file_checksum() function is designed to take a file path,
open and read the file contents, and then calculate the checksum.
However, for TAR backups, instead of a file path, we receive the file
content in chunks, and the checksum needs to be calculated
incrementally. While the initial operations for plain and TAR backup
checksum calculations differ, the final checks and error handling are
the same. By moving the shared logic to a separate function, we can
reuse the code for both types of backups.
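So the tar-side caller, added in a later patch, can be sketched as:

    /* For each received chunk of a member file's contents: */
    pg_checksum_update(checksum_ctx, (uint8 *) data, len);
    bytes_read += len;

    /* Once the member file ends, reuse the shared tail of the logic: */
    verify_checksum(context, m, checksum_ctx, bytes_read);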
---
src/bin/pg_verifybackup/pg_verifybackup.c | 20 +++++++++++++++++---
src/bin/pg_verifybackup/pg_verifybackup.h | 3 +++
2 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index f85dfefab81..c8543cb4f7f 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -787,8 +787,6 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
int fd;
int rc;
size_t bytes_read = 0;
- uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
- int checksumlen;
/* Open the target file. */
if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
@@ -839,6 +837,22 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
if (rc < 0)
return;
+ /* Do the final computation and verification. */
+ verify_checksum(context, m, &checksum_ctx, bytes_read);
+}
+
+/*
+ * A helper function to finalize checksum computation and verify it against the
+ * backup manifest information.
+ */
+void
+verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx, int64 bytes_read)
+{
+ const char *relpath = m->pathname;
+ int checksumlen;
+ uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
+
/*
* Double-check that we read the expected number of bytes from the file.
* Normally, a file size mismatch would be caught in verify_manifest_entry
@@ -855,7 +869,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
}
/* Get the final checksum. */
- checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
+ checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
if (checksumlen < 0)
{
report_backup_error(context,
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 2e71f14669b..93859d9d541 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -100,6 +100,9 @@ typedef struct verifier_context
extern manifest_file *verify_manifest_entry(verifier_context *context,
char *relpath, int64 filesize);
+extern void verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx,
+ int64 bytes_read);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
Attachment: v10-0006-Refactor-split-verify_backup_file-function-and-r.patch (application/x-patch)
From deba61f678fb652056d51889103e547d016dc6da Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 14 Aug 2024 10:42:37 +0530
Subject: [PATCH v10 06/12] Refactor: split verify_backup_file() function and
rename it.
The function verify_backup_file() has now been renamed to
verify_plain_backup_file() to make it clearer that it is specifically
used for verifying files in a plain backup. Similarly, a later patch
will add a verify_tar_backup_file() function for verifying TAR backup
files.
In addition, the manifest entry verification code is moved into a
new function called verify_manifest_entry() so that it can be reused
for tar backup verification. If verify_manifest_entry() doesn't find
an entry, it reports an error as before and returns NULL to the
caller. This is why a NULL check is added to should_verify_checksum().
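That lets a caller be written as (sketch):

    manifest_file *m;

    m = verify_manifest_entry(context, relpath, filesize);

    /* Safe even when no manifest entry was found and m is NULL. */
    if (should_verify_checksum(m))
        total_size += m->size;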
---
src/bin/pg_verifybackup/pg_verifybackup.c | 58 +++++++++++++++--------
src/bin/pg_verifybackup/pg_verifybackup.h | 6 ++-
2 files changed, 42 insertions(+), 22 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 384a4dd3500..f85dfefab81 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -59,8 +59,8 @@ static void report_manifest_error(JsonManifestParseContext *context,
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_backup_file(verifier_context *context,
- char *relpath, char *fullpath);
+static void verify_plain_backup_file(verifier_context *context,
+ char *relpath, char *fullpath);
static void verify_control_file(const char *controlpath,
uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
@@ -565,7 +565,7 @@ verify_backup_directory(verifier_context *context, char *relpath,
newrelpath = psprintf("%s/%s", relpath, filename);
if (!should_ignore_relpath(context, newrelpath))
- verify_backup_file(context, newrelpath, newfullpath);
+ verify_plain_backup_file(context, newrelpath, newfullpath);
pfree(newfullpath);
pfree(newrelpath);
@@ -586,7 +586,8 @@ verify_backup_directory(verifier_context *context, char *relpath,
* verify_backup_directory.
*/
static void
-verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
+verify_plain_backup_file(verifier_context *context, char *relpath,
+ char *fullpath)
{
struct stat sb;
manifest_file *m;
@@ -622,6 +623,32 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Check the backup manifest entry for this file. */
+ m = verify_manifest_entry(context, relpath, sb.st_size);
+
+ /*
+ * Validate the manifest system identifier, not available in manifest
+ * version 1.
+ */
+ if (context->manifest->version != 1 &&
+ strcmp(relpath, "global/pg_control") == 0 &&
+ m != NULL && m->matched && !m->bad)
+ verify_control_file(fullpath, context->manifest->system_identifier);
+
+ /* Update statistics for progress report, if necessary */
+ if (show_progress && !context->skip_checksums &&
+ should_verify_checksum(m))
+ total_size += m->size;
+}
+
+/*
+ * Verify file and its size entry in the manifest.
+ */
+manifest_file *
+verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
+{
+ manifest_file *m;
+
/* Check whether there's an entry in the manifest hash. */
m = manifest_files_lookup(context->manifest->files, relpath);
if (m == NULL)
@@ -629,40 +656,29 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
report_backup_error(context,
"\"%s\" is present on disk but not in the manifest",
relpath);
- return;
+ return NULL;
}
/* Flag this entry as having been encountered in the filesystem. */
m->matched = true;
/* Check that the size matches. */
- if (m->size != sb.st_size)
+ if (m->size != filesize)
{
report_backup_error(context,
"\"%s\" has size %lld on disk but size %zu in the manifest",
- relpath, (long long int) sb.st_size, m->size);
+ relpath, (long long int) filesize, m->size);
m->bad = true;
}
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0)
- verify_control_file(fullpath, context->manifest->system_identifier);
-
- /* Update statistics for progress report, if necessary */
- if (show_progress && !context->skip_checksums &&
- should_verify_checksum(m))
- total_size += m->size;
-
/*
* We don't verify checksums at this stage. We first finish verifying that
* we have the expected set of files with the expected sizes, and only
* afterwards verify the checksums. That's because computing checksums may
* take a while, and we'd like to report more obvious problems quickly.
*/
+
+ return m;
}
/*
@@ -825,7 +841,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
/*
* Double-check that we read the expected number of bytes from the file.
- * Normally, a file size mismatch would be caught in verify_backup_file
+ * Normally, a file size mismatch would be caught in verify_manifest_entry
* and this check would never be reached, but this provides additional
* safety and clarity in the event of concurrent modifications or
* filesystem misbehavior.
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index bd9c95c477a..2e71f14669b 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -41,7 +41,8 @@ typedef struct manifest_file
} manifest_file;
#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+ (((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
/*
* Define a hash table which we can use to store information about the files
@@ -97,6 +98,9 @@ typedef struct verifier_context
bool saw_any_error;
} verifier_context;
+extern manifest_file *verify_manifest_entry(verifier_context *context,
+ char *relpath, int64 filesize);
+
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
pg_attribute_printf(2, 3);
--
2.18.0
Attachment: v10-0005-Refactor-move-some-part-of-pg_verifybackup.c-to-.patch (application/x-patch)
From dea8e0cd58a41c0912a5dff3ed32ada80ce0034b Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 12:10:34 +0530
Subject: [PATCH v10 05/12] Refactor: move some part of pg_verifybackup.c to
pg_verifybackup.h
---
src/bin/pg_verifybackup/pg_verifybackup.c | 94 +------------------
src/bin/pg_verifybackup/pg_verifybackup.h | 108 ++++++++++++++++++++++
2 files changed, 112 insertions(+), 90 deletions(-)
create mode 100644 src/bin/pg_verifybackup/pg_verifybackup.h
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index c6d01d52335..384a4dd3500 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,12 +18,11 @@
#include <sys/stat.h>
#include <time.h>
-#include "common/controldata_utils.h"
-#include "common/hashfn_unstable.h"
#include "common/logging.h"
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
+#include "pg_verifybackup.h"
#include "pgtime.h"
/*
@@ -40,84 +39,6 @@
*/
#define ESTIMATED_BYTES_PER_MANIFEST_LINE 100
-/*
- * How many bytes should we try to read from a file at once?
- */
-#define READ_CHUNK_SIZE (128 * 1024)
-
-/*
- * Each file described by the manifest file is parsed to produce an object
- * like this.
- */
-typedef struct manifest_file
-{
- uint32 status; /* hash status */
- const char *pathname;
- size_t size;
- pg_checksum_type checksum_type;
- int checksum_length;
- uint8 *checksum_payload;
- bool matched;
- bool bad;
-} manifest_file;
-
-#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
-
-/*
- * Define a hash table which we can use to store information about the files
- * mentioned in the backup manifest.
- */
-#define SH_PREFIX manifest_files
-#define SH_ELEMENT_TYPE manifest_file
-#define SH_KEY_TYPE const char *
-#define SH_KEY pathname
-#define SH_HASH_KEY(tb, key) hash_string(key)
-#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
-#define SH_SCOPE static inline
-#define SH_RAW_ALLOCATOR pg_malloc0
-#define SH_DECLARE
-#define SH_DEFINE
-#include "lib/simplehash.h"
-
-/*
- * Each WAL range described by the manifest file is parsed to produce an
- * object like this.
- */
-typedef struct manifest_wal_range
-{
- TimeLineID tli;
- XLogRecPtr start_lsn;
- XLogRecPtr end_lsn;
- struct manifest_wal_range *next;
- struct manifest_wal_range *prev;
-} manifest_wal_range;
-
-/*
- * All the data parsed from a backup_manifest file.
- */
-typedef struct manifest_data
-{
- int version;
- uint64 system_identifier;
- manifest_files_hash *files;
- manifest_wal_range *first_wal_range;
- manifest_wal_range *last_wal_range;
-} manifest_data;
-
-/*
- * All of the context information we need while checking a backup manifest.
- */
-typedef struct verifier_context
-{
- manifest_data *manifest;
- char *backup_directory;
- SimpleStringList ignore_list;
- bool skip_checksums;
- bool exit_on_error;
- bool saw_any_error;
-} verifier_context;
-
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -151,13 +72,6 @@ static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
-static void report_backup_error(verifier_context *context,
- const char *pg_restrict fmt,...)
- pg_attribute_printf(2, 3);
-static void report_fatal_error(const char *pg_restrict fmt,...)
- pg_attribute_printf(1, 2) pg_attribute_noreturn();
-static bool should_ignore_relpath(verifier_context *context, const char *relpath);
-
static void progress_report(bool finished);
static void usage(void);
@@ -980,7 +894,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
* Update the context to indicate that we saw an error, and exit if the
* context says we should.
*/
-static void
+void
report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
{
va_list ap;
@@ -997,7 +911,7 @@ report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
/*
* Report a fatal error and exit
*/
-static void
+void
report_fatal_error(const char *pg_restrict fmt,...)
{
va_list ap;
@@ -1016,7 +930,7 @@ report_fatal_error(const char *pg_restrict fmt,...)
* Note that by "prefix" we mean a parent directory; for this purpose,
* "aa/bb" is not a prefix of "aa/bbb", but it is a prefix of "aa/bb/cc".
*/
-static bool
+bool
should_ignore_relpath(verifier_context *context, const char *relpath)
{
SimpleStringListCell *cell;
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
new file mode 100644
index 00000000000..bd9c95c477a
--- /dev/null
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -0,0 +1,108 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_verifybackup.h
+ * Verify a backup against a backup manifest.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/pg_verifybackup/pg_verifybackup.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef PG_VERIFYBACKUP_H
+#define PG_VERIFYBACKUP_H
+
+#include "common/controldata_utils.h"
+#include "common/hashfn_unstable.h"
+#include "common/parse_manifest.h"
+#include "fe_utils/simple_list.h"
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/*
+ * Each file described by the manifest file is parsed to produce an object
+ * like this.
+ */
+typedef struct manifest_file
+{
+ uint32 status; /* hash status */
+ const char *pathname;
+ size_t size;
+ pg_checksum_type checksum_type;
+ int checksum_length;
+ uint8 *checksum_payload;
+ bool matched;
+ bool bad;
+} manifest_file;
+
+#define should_verify_checksum(m) \
+ (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+
+/*
+ * Define a hash table which we can use to store information about the files
+ * mentioned in the backup manifest.
+ */
+#define SH_PREFIX manifest_files
+#define SH_ELEMENT_TYPE manifest_file
+#define SH_KEY_TYPE const char *
+#define SH_KEY pathname
+#define SH_HASH_KEY(tb, key) hash_string(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * Each WAL range described by the manifest file is parsed to produce an
+ * object like this.
+ */
+typedef struct manifest_wal_range
+{
+ TimeLineID tli;
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+ struct manifest_wal_range *next;
+ struct manifest_wal_range *prev;
+} manifest_wal_range;
+
+/*
+ * All the data parsed from a backup_manifest file.
+ */
+typedef struct manifest_data
+{
+ int version;
+ uint64 system_identifier;
+ manifest_files_hash *files;
+ manifest_wal_range *first_wal_range;
+ manifest_wal_range *last_wal_range;
+} manifest_data;
+
+/*
+ * All of the context information we need while checking a backup manifest.
+ */
+typedef struct verifier_context
+{
+ manifest_data *manifest;
+ char *backup_directory;
+ SimpleStringList ignore_list;
+ bool skip_checksums;
+ bool exit_on_error;
+ bool saw_any_error;
+} verifier_context;
+
+extern void report_backup_error(verifier_context *context,
+ const char *pg_restrict fmt,...)
+ pg_attribute_printf(2, 3);
+extern void report_fatal_error(const char *pg_restrict fmt,...)
+ pg_attribute_printf(1, 2) pg_attribute_noreturn();
+extern bool should_ignore_relpath(verifier_context *context,
+ const char *relpath);
+
+#endif /* PG_VERIFYBACKUP_H */
--
2.18.0
Attachment: v10-0004-Refactor-move-skip_checksums-global-variable-to-.patch (application/x-patch)
From bcd2f84eb984badfdaa28ad84806c8e401259f9c Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 8 Aug 2024 15:10:43 +0530
Subject: [PATCH v10 04/12] Refactor: move skip_checksums global variable to
verifier_context struct
To enable access to this flag in another file.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d77e70fbe38..c6d01d52335 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -113,6 +113,7 @@ typedef struct verifier_context
manifest_data *manifest;
char *backup_directory;
SimpleStringList ignore_list;
+ bool skip_checksums;
bool exit_on_error;
bool saw_any_error;
} verifier_context;
@@ -164,7 +165,6 @@ static const char *progname;
/* options */
static bool show_progress = false;
-static bool skip_checksums = false;
/* Progress indicators */
static uint64 total_size = 0;
@@ -266,7 +266,7 @@ main(int argc, char **argv)
quiet = true;
break;
case 's':
- skip_checksums = true;
+ context.skip_checksums = true;
break;
case 'w':
wal_directory = pstrdup(optarg);
@@ -363,7 +363,7 @@ main(int argc, char **argv)
* Now do the expensive work of verifying file checksums, unless we were
* told to skip it.
*/
- if (!skip_checksums)
+ if (!context.skip_checksums)
verify_backup_checksums(&context);
/*
@@ -739,7 +739,8 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
verify_control_file(fullpath, context->manifest->system_identifier);
/* Update statistics for progress report, if necessary */
- if (show_progress && !skip_checksums && should_verify_checksum(m))
+ if (show_progress && !context->skip_checksums &&
+ should_verify_checksum(m))
total_size += m->size;
/*
--
2.18.0
On Wed, Aug 14, 2024 at 9:20 AM Amul Sul <sulamul@gmail.com> wrote:
I agree with keeping verify_backup_file() separate, but I'm hesitant
about doing the same for verify_backup_directory().
I don't have time today to go through your whole email or re-review
the code, but I plan to circle back to that later. However, I want to
respond to this point in the meantime. There are two big
things that are different for a tar-format backup vs. a
directory-format backup as far as verify_backup_directory() is
concerned. One is that, for a directory format backup, we need to be
able to recurse down through subdirectories; for tar-format backups we
don't. So a version of this function that only handled tar-format
backups would be somewhat shorter and simpler, and would need one
fewer argument. The second difference is that for the tar-format
backup, you need to make a list of the files you see and then go back
and visit each one a second time, and for a directory-format backup
you don't need to do that. It seems to me that those differences are
significant enough to warrant having two separate functions. If you
unify them, I think that less than half of the resulting function is
going to be common to both cases. Yeah, a few bits of logic will be
duplicated, like the error handling for closedir(), the logic to skip
"." and "..", and the psprintf() to construct a full pathname for the
directory entry. But that isn't really very much code, and it's code
that is pretty straightforward and also present in various other
places in the PostgreSQL source tree, perhaps not in precisely the
same form. The fact that two functions both call readdir() and do
something with each file in the directory isn't enough to say that
they should be the same function, IMHO.
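
To make the shape concrete, here is a minimal standalone sketch (plain
POSIX C, hypothetical and not the patch code) of the structure a
tar-only scan would have: a single, non-recursive readdir() pass that
only records the file names it sees, followed by a second pass that
visits each recorded archive.

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char **argv)
{
	const char *backup_dir = argc > 1 ? argv[1] : ".";
	DIR		   *dir;
	struct dirent *de;
	char	  **tarfiles = NULL;
	int			ntarfiles = 0;

	dir = opendir(backup_dir);
	if (dir == NULL)
	{
		perror(backup_dir);
		return 1;
	}

	/* First pass: no recursion; just remember each file we see. */
	while ((de = readdir(dir)) != NULL)
	{
		char	  **tmp;

		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
			continue;

		tmp = realloc(tarfiles, (ntarfiles + 1) * sizeof(char *));
		if (tmp == NULL)
		{
			perror("realloc");
			return 1;
		}
		tarfiles = tmp;
		tarfiles[ntarfiles++] = strdup(de->d_name);
	}
	if (closedir(dir) != 0)
		perror(backup_dir);

	/* Second pass: visit each remembered archive for content verification. */
	for (int i = 0; i < ntarfiles; i++)
	{
		printf("would verify contents of \"%s/%s\"\n", backup_dir, tarfiles[i]);
		free(tarfiles[i]);
	}
	free(tarfiles);

	return 0;
}

The directory-format version instead recurses into subdirectories and
verifies each file as it is encountered, which is why keeping the two
loops separate costs very little duplication.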
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Aug 14, 2024 at 12:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 14, 2024 at 9:20 AM Amul Sul <sulamul@gmail.com> wrote:
I agree with keeping verify_backup_file() separate, but I'm hesitant
about doing the same for verify_backup_directory().

I don't have time today to go through your whole email or re-review
the code, but I plan to circle back to that later. However, I want to
respond to this point in the meantime.
I have committed 0004 (except that I changed a comment) and 0005
(except that I didn't move READ_CHUNK_SIZE).
Looking at the issue mentioned above again, I agree that the changes
in verify_backup_directory() in this version don't look overly
invasive. I'm still not 100% convinced it's the right call, but it
doesn't seem bad.
You have a spurious comment change to the header of verify_plain_backup_file().
I feel like the naming of tarFile and tarFiles is not consistent with
the overall style of this file.
I don't like this:
[robert.haas ~]$ pg_verifybackup btar
pg_verifybackup: error: pg_waldump does not support parsing WAL files
from a tar archive.
pg_verifybackup: hint: Try "pg_verifybackup --help" to skip parse WAL
files option.
The hint seems like it isn't really correct grammar, and I also don't
see why we can't be more specific. How about "You must use -n,
--no-parse-wal when verifying a tar-format backup."?
The primary message seems a bit redundant, because parsing WAL files
is the only thing pg_waldump does. How about "pg_waldump cannot read
from a tar archive"? Note that primary error messages should not end
in a period (while hint and detail messages should).
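
For reference, the reworded check could look roughly like this (a
sketch only; the corresponding hunk shows up in the attached v11-0010
patch further down, using the usual frontend pg_log_error() and
pg_log_error_hint() calls):

	if (context.format != 'p')
	{
		pg_log_error("pg_waldump cannot read from a tar archive");
		pg_log_error_hint("You must use -n or --no-parse-wal when verifying a tar-format backup.");
		exit(1);
	}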
+ int64 num = strtoi64(relpath, &suffix, 10);
+ if (suffix == NULL || (num <= 0) || (num > OID_MAX))
Seems like too many parentheses.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Aug 16, 2024 at 3:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
+ int64 num = strtoi64(relpath, &suffix, 10);
Hit send too early. Here, seems like this should be strtoul(), not strtoi64().
The documentation of --format seems to be cut-and-pasted from
pg_basebackup and the language isn't really appropriate here. e.g.
"The main data directory's contents will be written to a file
named..." but pg_verifybackup writes nothing.
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.gz");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.lz4");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.zst");
Why not make the same logic that recognizes base or an OID also
recognize pg_wal as a prefix, and identify that as the WAL archive?
For now we'll have to skip it, but if you do it that way then if we
add future support for more suffixes, it'll just work, whereas this
way won't. And you'd need that code anyway if we ever can run
pg_waldump on a tarfile, because you would need to identify the
compression method. Note that the danger of the list of suffixes
getting out of sync here is not hypothetical: you added .tgz elsewhere
but not here.
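
As a standalone sketch of what I mean (illustrative names only, not the
patch code): recognize the archive prefix (base, pg_wal, or a
tablespace OID parsed with strtoul()) and then check the compression
suffix in one shared place, so that adding a new suffix later only
needs one change.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define OID_MAX 4294967295UL

static int
classify_archive(const char *name)
{
	const char *suffix;

	if (strncmp(name, "base", 4) == 0)
		suffix = name + 4;
	else if (strncmp(name, "pg_wal", 6) == 0)
		suffix = name + 6;
	else
	{
		char	   *end;
		unsigned long oid = strtoul(name, &end, 10);

		if (end == name || oid == 0 || oid > OID_MAX)
			return -1;			/* not a recognized archive name */
		suffix = end;
	}

	/* One suffix list shared by every kind of archive. */
	if (strcmp(suffix, ".tar") == 0 ||
		strcmp(suffix, ".tgz") == 0 ||
		strcmp(suffix, ".tar.gz") == 0 ||
		strcmp(suffix, ".tar.lz4") == 0 ||
		strcmp(suffix, ".tar.zst") == 0)
		return 0;

	return -1;
}

int
main(void)
{
	const char *names[] = {"base.tar.zst", "pg_wal.tar", "16384.tar.lz4", "junk.txt"};

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		printf("%s: %s\n", names[i],
			   classify_archive(names[i]) == 0 ? "recognized" : "unexpected");
	return 0;
}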
There's probably more to look at here but I'm running out of energy for today.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, Aug 17, 2024 at 1:34 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Aug 16, 2024 at 3:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
+ int64 num = strtoi64(relpath, &suffix, 10);
Hit send too early. Here, seems like this should be strtoul(), not strtoi64().
Fixed in the attached version, along with the other suggestions from that mail.
The documentation of --format seems to be cut-and-pasted from
pg_basebackup and the language isn't really appropriate here. e.g.
"The main data directory's contents will be written to a file
named..." but pg_verifybackup writes nothing.
I wrote that intentionally -- I didn’t mean to imply that
pg_verifybackup handles this; rather, I meant that the backup tool (in
this case, pg_basebackup) produces those files. I can see the
confusion and have rephrased the text accordingly.
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.gz");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.lz4");
+ simple_string_list_append(&context.ignore_list, "pg_wal.tar.zst");

Why not make the same logic that recognizes base or an OID also
recognize pg_wal as a prefix, and identify that as the WAL archive?
For now we'll have to skip it, but if you do it that way then if we
add future support for more suffixes, it'll just work, whereas this
way won't. And you'd need that code anyway if we ever can run
pg_waldump on a tarfile, because you would need to identify the
compression method. Note that the danger of the list of suffixes
getting out of sync here is not hypothetical: you added .tgz elsewhere
but not here.
Did it that way.
There's probably more to look at here but I'm running out of energy for today.
Thank you for the review and for committing the 0004 and 0006 patches.
Regards,
Amul
Attachments:
v11-0012-pg_verifybackup-Tests-and-document.patch
From dfaeebdc09fd689b7e45a705e32111cb226a0657 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 20 Aug 2024 13:14:22 +0530
Subject: [PATCH v11 12/12] pg_verifybackup: Tests and document
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the previous patch that implements tar format support.
----
---
doc/src/sgml/ref/pg_verifybackup.sgml | 43 ++++++++++-
src/bin/pg_verifybackup/t/001_basic.pl | 6 +-
src/bin/pg_verifybackup/t/004_options.pl | 2 +-
src/bin/pg_verifybackup/t/005_bad_manifest.pl | 2 +-
src/bin/pg_verifybackup/t/008_untar.pl | 73 ++++++-------------
src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +-----------
6 files changed, 73 insertions(+), 101 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index a3f167f9f6e..ea6bc3ccb12 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -34,8 +34,10 @@ PostgreSQL documentation
integrity of a database cluster backup taken using
<command>pg_basebackup</command> against a
<literal>backup_manifest</literal> generated by the server at the time
- of the backup. The backup must be stored in the "plain"
- format; a "tar" format backup can be checked after extracting it.
+ of the backup. The backup must be stored in the "plain" or "tar"
+ format. Verification is supported for <literal>gzip</literal>,
+ <literal>lz4</literal>, and <literal>zstd</literal> compressed tar backups;
+ backups using any other compression format can be checked after decompressing them.
</para>
<para>
@@ -168,6 +170,43 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-F <replaceable class="parameter">format</replaceable></option></term>
+ <term><option>--format=<replaceable class="parameter">format</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the format of the backup. <replaceable>format</replaceable>
+ can be one of the following:
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>p</literal></term>
+ <term><literal>plain</literal></term>
+ <listitem>
+ <para>
+ Backup consists of plain files with the same layout as the
+ source server's data directory and tablespaces.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>t</literal></term>
+ <term><literal>tar</literal></term>
+ <listitem>
+ <para>
+ Backup consists of tar files. A valid backup includes the main data
+ directory in a file named <filename>base.tar</filename>, the WAL
+ files in <filename>pg_wal.tar</filename>, and separate tar files for
+ each tablespace, named after the tablespace's OID, followed by the
+ compression extension.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist></para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n</option></term>
<term><option>--no-parse-wal</option></term>
diff --git a/src/bin/pg_verifybackup/t/001_basic.pl b/src/bin/pg_verifybackup/t/001_basic.pl
index 2f3e52d296f..ca5b0402b7d 100644
--- a/src/bin/pg_verifybackup/t/001_basic.pl
+++ b/src/bin/pg_verifybackup/t/001_basic.pl
@@ -17,11 +17,11 @@ command_fails_like(
qr/no backup directory specified/,
'target directory must be specified');
command_fails_like(
- [ 'pg_verifybackup', $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir ],
qr/could not open file.*\/backup_manifest\"/,
'pg_verifybackup requires a manifest');
command_fails_like(
- [ 'pg_verifybackup', $tempdir, $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir, $tempdir ],
qr/too many command-line arguments/,
'multiple target directories not allowed');
@@ -31,7 +31,7 @@ close($fh);
# but then try to use an alternate, nonexisting manifest
command_fails_like(
- [ 'pg_verifybackup', '-m', "$tempdir/not_the_manifest", $tempdir ],
+ [ 'pg_verifybackup', '-Fp', '-m', "$tempdir/not_the_manifest", $tempdir ],
qr/could not open file.*\/not_the_manifest\"/,
'pg_verifybackup respects -m flag');
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 8ed2214408e..2f197648740 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -108,7 +108,7 @@ unlike(
# Test valid manifest with nonexistent backup directory.
command_fails_like(
[
- 'pg_verifybackup', '-m',
+ 'pg_verifybackup', '-Fp', '-m',
"$backup_path/backup_manifest", "$backup_path/fake"
],
qr/could not open directory/,
diff --git a/src/bin/pg_verifybackup/t/005_bad_manifest.pl b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
index c4ed64b62d5..28c51b6feb0 100644
--- a/src/bin/pg_verifybackup/t/005_bad_manifest.pl
+++ b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
@@ -208,7 +208,7 @@ sub test_bad_manifest
print $fh $manifest_contents;
close($fh);
- command_fails_like([ 'pg_verifybackup', $tempdir ], $regexp, $test_name);
+ command_fails_like([ 'pg_verifybackup', '-Fp', $tempdir ], $regexp, $test_name);
return;
}
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 7a09f3b75b2..9896560adc3 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -16,6 +16,20 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
+# Create a couple of directories to use as tablespaces.
+my $TS1_LOCATION = $primary->backup_dir .'/ts1';
+$TS1_LOCATION =~ s/\/\.\//\//g; # collapse foo/./bar to foo/bar
+mkdir($TS1_LOCATION);
+
+# Create a tablespace with table in it.
+$primary->safe_psql('postgres', qq(
+ CREATE TABLESPACE regress_ts1 LOCATION '$TS1_LOCATION';
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1';
+ CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1;
+ INSERT INTO regress_tbl1 VALUES(generate_series(1,5));));
+my $tsoid = $primary->safe_psql('postgres', qq(
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
+
my $backup_path = $primary->backup_dir . '/server-backup';
my $extract_path = $primary->backup_dir . '/extracted-backup';
@@ -23,39 +37,31 @@ my @test_configuration = (
{
'compression_method' => 'none',
'backup_flags' => [],
- 'backup_archive' => 'base.tar',
+ 'backup_archive' => ['base.tar', "$tsoid.tar"],
'enabled' => 1
},
{
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'server-gzip' ],
- 'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'server-lz4' ],
- 'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => [ '-d', '-m' ],
+ 'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ],
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
});
@@ -86,47 +92,16 @@ for my $tc (@test_configuration)
my $backup_files = join(',',
sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
my $expected_backup_files =
- join(',', sort ('backup_manifest', $tc->{'backup_archive'}));
+ join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} }));
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path],
+ "verify backup, compression $method");
# Cleanup.
- unlink($backup_path . '/backup_manifest');
- unlink($backup_path . '/base.tar');
- unlink($backup_path . '/' . $tc->{'backup_archive'});
+ rmtree($backup_path);
rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 8c076d46dee..6b7d7483f6e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -29,41 +29,30 @@ my @test_configuration = (
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'client-gzip:5' ],
'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'client-lz4:5' ],
'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => ['-d'],
- 'output_file' => 'base.tar',
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:5' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'parallel zstd',
'backup_flags' => [ '--compress', 'client-zstd:workers=3' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1"),
'possibly_unsupported' =>
qr/could not set compression worker count to 3: Unsupported parameter/
@@ -118,40 +107,9 @@ for my $tc (@test_configuration)
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- push @decompress, $backup_path . '/' . $tc->{'output_file'}
- if $tc->{'output_file'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ],
+ "verify backup, compression $method");
# Cleanup.
rmtree($extract_path);
--
2.18.0
v11-0011-pg_verifybackup-Read-tar-files-and-verify-its-co.patch
From afc610cc11c3ec4afdbc354a0ea08340b473b254 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 16:22:07 +0530
Subject: [PATCH v11 11/12] pg_verifybackup: Read tar files and verify its
contents
This patch implements TAR format backup verification.
For progress reporting support, we perform this verification in two
passes: the first pass calculates total_size, and the second pass
updates done_size as verification progresses.
For the verification, in the first pass, we call verify_tar_backup_file(),
which performs basic verification by expecting only base.tar, pg_wal.tar, or
<tablespaceoid>.tar files and raises an error for any other files. It
also determines the compression type of the archive file. All this
information is stored in a newly added tarFile struct, which is
appended to a list that will be used in the second pass (by
verify_tar_content()) for the final verification. In the second pass,
the tar archives are read, decompressed, and the required verification
is carried out.
For reading and decompression, fe_utils/astreamer.h is used. For
verification, a new archive streamer has been added in
astreamer_verify.c to handle TAR member files and their contents; see
astreamer_verify_content() for details. The stack of astreamers will
be set up for each TAR file in verify_tar_content(), depending on its
compression type which is detected in the first pass.
When information about a TAR member file (i.e., ASTREAMER_MEMBER_HEADER)
is received, we first verify its entry against the backup manifest. We
then decide if further checks are needed, such as checksum
verification and control data verification (if it is a pg_control
file), once the member file contents are received. Although this
decision could be made when the contents are received, it is more
efficient to make it earlier since the member file contents are
received in multiple iterations. In short, we process
ASTREAMER_MEMBER_CONTENTS multiple times but only once for other
ASTREAMER_MEMBER_* cases. We maintain this information in the
astreamer_verify structure for each member file, which is reset when
the file ends.
Unlike in a plain backup, checksum verification here occurs in two
steps. First, as the contents are received, the checksum is computed
incrementally (see member_compute_checksum). Then, at the end of
processing the member file, the final verification is performed (see
member_verify_checksum).
Similarly, during the content receiving stage, if the file is
pg_control, the data will be copied into a local buffer (see
member_copy_control_data). The verification will then be carried out
at the end of the member file processing (see member_verify_control_data)
---
src/bin/pg_verifybackup/Makefile | 4 +-
src/bin/pg_verifybackup/astreamer_verify.c | 363 +++++++++++++++++++++
src/bin/pg_verifybackup/meson.build | 6 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 284 +++++++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 6 +
src/tools/pgindent/typedefs.list | 2 +
6 files changed, 659 insertions(+), 6 deletions(-)
create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 7c045f142e8..df7aaabd530 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -17,11 +17,13 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
# We need libpq only because fe_utils does.
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
OBJS = \
$(WIN32RES) \
- pg_verifybackup.o
+ pg_verifybackup.o \
+ astreamer_verify.o
all: pg_verifybackup
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
new file mode 100644
index 00000000000..3d3659195a9
--- /dev/null
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -0,0 +1,363 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_verify.c
+ *
+ * Extend fe_utils/astreamer.h archive streaming facility to verify TAR
+ * backup.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * src/bin/pg_verifybackup/astreamer_verify.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "common/logging.h"
+#include "pg_verifybackup.h"
+
+typedef struct astreamer_verify
+{
+ astreamer base;
+ verifier_context *context;
+ char *archive_name;
+ Oid tblspc_oid;
+ pg_checksum_context *checksum_ctx;
+
+ /* Hold information for a member file verification */
+ manifest_file *mfile;
+ int64 received_bytes;
+ bool verify_checksum;
+ bool verify_control_data;
+} astreamer_verify;
+
+static void astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_verify_finalize(astreamer *streamer);
+static void astreamer_verify_free(astreamer *streamer);
+
+static void member_verify_header(astreamer *streamer, astreamer_member *member);
+static void member_compute_checksum(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void member_verify_checksum(astreamer *streamer);
+static void member_copy_control_data(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void member_verify_control_data(astreamer *streamer);
+static void member_reset_info(astreamer *streamer);
+
+static const astreamer_ops astreamer_verify_ops = {
+ .content = astreamer_verify_content,
+ .finalize = astreamer_verify_finalize,
+ .free = astreamer_verify_free
+};
+
+/*
+ * Create an astreamer that verifies the contents of a TAR file.
+ */
+astreamer *
+astreamer_verify_content_new(astreamer *next, verifier_context *context,
+ char *archive_name, Oid tblspc_oid)
+{
+ astreamer_verify *streamer;
+
+ streamer = palloc0(sizeof(astreamer_verify));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_verify_ops;
+
+ streamer->base.bbs_next = next;
+ streamer->context = context;
+ streamer->archive_name = archive_name;
+ streamer->tblspc_oid = tblspc_oid;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ if (!context->skip_checksums)
+ streamer->checksum_ctx = pg_malloc(sizeof(pg_checksum_context));
+
+ return &streamer->base;
+}
+
+/*
+ * The main entry point of the archive streamer for verifying tar members.
+ */
+static void
+astreamer_verify_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+
+ /*
+ * Perform the initial check and setup verification steps.
+ */
+ member_verify_header(streamer, member);
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Since we are receiving the file in chunks, we need to process
+ * the member content according to the flags set by the member
+ * header processing routine, which includes checksum computation
+ * and copying control data to the local buffer.
+ */
+ if (mystreamer->verify_checksum)
+ member_compute_checksum(streamer, member, data, len);
+
+ if (mystreamer->verify_control_data)
+ member_copy_control_data(streamer, member, data, len);
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+
+ /*
+ * We have reached the end of the member file. By this point, we
+ * should have successfully computed the checksum of the received
+ * content and copied the entire pg_control file data into our
+ * local buffer. We can now proceed with the final verification.
+ */
+ if (mystreamer->verify_checksum)
+ member_verify_checksum(streamer);
+
+ if (mystreamer->verify_control_data)
+ member_verify_control_data(streamer);
+
+ /*
+ * Reset the temporary information stored for the verification.
+ */
+ member_reset_info(streamer);
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar archive");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_verify stream.
+ */
+static void
+astreamer_verify_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_verify stream.
+ */
+static void
+astreamer_verify_free(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ if (mystreamer->checksum_ctx)
+ pfree(mystreamer->checksum_ctx);
+
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+
+/*
+ * Verifies the tar member entry if it corresponds to a file in the backup
+ * manifest. If the archive being processed is a tablespace, prepares the
+ * required file path for subsequent operations. Finally, determines if
+ * checksum verification and control data verification need to be performed
+ * during file content processing
+ */
+static void
+member_verify_header(astreamer *streamer, astreamer_member *member)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_file *m;
+
+ /* We are only interested in files that are not in the ignore list. */
+ if (member->is_directory || member->is_link ||
+ should_ignore_relpath(mystreamer->context, member->pathname))
+ return;
+
+ /*
+ * The backup_manifest stores paths relative to the base directory for
+ * files belonging to a tablespace, whereas <tablespaceoid>.tar doesn't.
+ * Prepare the required path; otherwise, the manifest entry verification
+ * will fail.
+ */
+ if (OidIsValid(mystreamer->tblspc_oid))
+ {
+ char temp[MAXPGPATH];
+
+ /* Copy original name at temporary space */
+ memcpy(temp, member->pathname, MAXPGPATH);
+
+ snprintf(member->pathname, MAXPGPATH, "%s/%d/%s",
+ "pg_tblspc", mystreamer->tblspc_oid, temp);
+ }
+
+ /* Check the manifest entry */
+ m = verify_manifest_entry(mystreamer->context, member->pathname,
+ member->size);
+ mystreamer->mfile = (void *) m;
+
+ /*
+ * Prepare for checksum and control data verification.
+ *
+ * We could have these checks while receiving contents. However, since
+ * contents are received in multiple iterations, this would result in
+ * these lengthy checks being performed multiple times. Instead, having a
+ * single flag would be more efficient.
+ */
+ mystreamer->verify_checksum =
+ (!mystreamer->context->skip_checksums && should_verify_checksum(m));
+ mystreamer->verify_control_data =
+ should_verify_control_data(mystreamer->context->manifest, m);
+
+ /* Initialize the context required for checksum verification. */
+ if (mystreamer->verify_checksum &&
+ pg_checksum_init(mystreamer->checksum_ctx, m->checksum_type) < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "%s: could not initialize checksum of file \"%s\"",
+ mystreamer->archive_name, m->pathname);
+
+ /*
+ * Checksum verification cannot be performed without proper context
+ * initialization.
+ */
+ mystreamer->verify_checksum = false;
+ }
+}
+
+/*
+ * Computes the checksum incrementally for the received file content.
+ *
+ * The caller should pass a correctly initialized checksum_ctx, which will be
+ * used for incremental checksum computation.
+ */
+static void
+member_compute_checksum(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ pg_checksum_context *checksum_ctx = mystreamer->checksum_ctx;
+ manifest_file *m = mystreamer->mfile;
+
+ Assert(mystreamer->verify_checksum);
+
+ /* Should have come here for the right file */
+ Assert(strcmp(member->pathname, m->pathname) == 0);
+
+ /*
+ * The checksum context should match the type noted in the backup
+ * manifest.
+ */
+ Assert(checksum_ctx->type == m->checksum_type);
+
+ /*
+ * Update the total count of computed checksum bytes for cross-checking
+ * with the file size in the final verification stage.
+ */
+ mystreamer->received_bytes += len;
+
+ if (pg_checksum_update(checksum_ctx, (uint8 *) data, len) < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "could not update checksum of file \"%s\"",
+ m->pathname);
+ mystreamer->verify_checksum = false;
+ }
+}
+
+/*
+ * Perform the final computation and checksum verification after the entire
+ * file content has been processed.
+ */
+static void
+member_verify_checksum(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ Assert(mystreamer->verify_checksum);
+
+ verify_checksum(mystreamer->context, mystreamer->mfile,
+ mystreamer->checksum_ctx, mystreamer->received_bytes);
+}
+
+/*
+ * Stores the pg_control file contents into a local buffer; we need the entire
+ * control file data for verification.
+ */
+static void
+member_copy_control_data(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ /* Should be here only for control file */
+ Assert(strcmp(member->pathname, "global/pg_control") == 0);
+ Assert(((astreamer_verify *) streamer)->verify_control_data);
+
+ /* Copy enough control file data needed for verification. */
+ astreamer_buffer_until(streamer, &data, &len, sizeof(ControlFileData));
+}
+
+/*
+ * Performs the CRC calculation of pg_control data and then calls the routines
+ * that execute the final verification of the control file information.
+ */
+static void
+member_verify_control_data(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_data *manifest = mystreamer->context->manifest;
+ ControlFileData control_file;
+ pg_crc32c crc;
+ bool crc_ok;
+
+ /* Should be here only for control file */
+ Assert(strcmp(mystreamer->mfile->pathname, "global/pg_control") == 0);
+ Assert(mystreamer->verify_control_data);
+
+ /* Should have enough control file data needed for verification. */
+ if (streamer->bbs_buffer.len != sizeof(ControlFileData))
+ report_fatal_error("%s: unexpected control file size: %d, should be %zu",
+ mystreamer->archive_name, streamer->bbs_buffer.len,
+ sizeof(ControlFileData));
+
+ memcpy(&control_file, streamer->bbs_buffer.data, sizeof(ControlFileData));
+
+ /* Check the CRC. */
+ INIT_CRC32C(crc);
+ COMP_CRC32C(crc, (char *) (&control_file), offsetof(ControlFileData, crc));
+ FIN_CRC32C(crc);
+
+ crc_ok = EQ_CRC32C(crc, control_file.crc);
+
+ /* Do the final control data verification. */
+ verify_control_data(&control_file, mystreamer->mfile->pathname, crc_ok,
+ manifest->system_identifier);
+}
+
+/*
+ * Reset flags and free memory allocations for member file verification.
+ */
+static void
+member_reset_info(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ mystreamer->mfile = NULL;
+ mystreamer->received_bytes = 0;
+ mystreamer->verify_checksum = false;
+ mystreamer->verify_control_data = false;
+}
diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build
index 7c7d31a0350..1e3fcf7ee5a 100644
--- a/src/bin/pg_verifybackup/meson.build
+++ b/src/bin/pg_verifybackup/meson.build
@@ -1,7 +1,8 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
pg_verifybackup_sources = files(
- 'pg_verifybackup.c'
+ 'pg_verifybackup.c',
+ 'astreamer_verify.c'
)
if host_system == 'windows'
@@ -10,9 +11,10 @@ if host_system == 'windows'
'--FILEDESC', 'pg_verifybackup - verify a backup against using a backup manifest'])
endif
+pg_verifybackup_deps = [frontend_code, libpq, lz4, zlib, zstd]
pg_verifybackup = executable('pg_verifybackup',
pg_verifybackup_sources,
- dependencies: [frontend_code, libpq],
+ dependencies: pg_verifybackup_deps,
kwargs: default_bin_args,
)
bin_targets += pg_verifybackup
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 4c0367038b4..09840a2cef4 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -22,6 +22,7 @@
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
+#include "limits.h"
#include "pg_verifybackup.h"
#include "pgtime.h"
@@ -44,6 +45,16 @@
*/
#define READ_CHUNK_SIZE (128 * 1024)
+/*
+ * Tar archive information needed for content verification.
+ */
+typedef struct tar_archive
+{
+ char *relpath;
+ Oid tblspc_oid;
+ pg_compress_algorithm compress_algorithm;
+} tar_archive;
+
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -65,8 +76,14 @@ static void report_manifest_error(JsonManifestParseContext *context,
static char find_backup_format(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_plain_backup_file(verifier_context *context,
- char *relpath, char *fullpath);
+static void verify_plain_backup_file(verifier_context *context, char *relpath,
+ char *fullpath);
+static void verify_tar_backup_file(verifier_context *context, char *relpath,
+ char *fullpath, SimplePtrList *tarfiles);
+static void verify_tar_file_contents(verifier_context *context,
+ SimplePtrList *tarfiles);
+static void verify_tar_contents(verifier_context *context, char *relpath,
+ char *fullpath, astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -75,6 +92,10 @@ static void verify_file_checksum(verifier_context *context,
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
+static astreamer *create_archive_verifier(verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid,
+ pg_compress_algorithm compress_algo);
static void progress_report(bool finished);
static void usage(void);
@@ -561,6 +582,7 @@ verify_backup_directory(verifier_context *context, char *relpath,
{
DIR *dir;
struct dirent *dirent;
+ SimplePtrList tarfiles = {NULL, NULL};
dir = opendir(fullpath);
if (dir == NULL)
@@ -600,12 +622,23 @@ verify_backup_directory(verifier_context *context, char *relpath,
newrelpath = psprintf("%s/%s", relpath, filename);
if (!should_ignore_relpath(context, newrelpath))
- verify_plain_backup_file(context, newrelpath, newfullpath);
+ {
+ if (context->format == 'p')
+ verify_plain_backup_file(context, newrelpath, newfullpath);
+ else
+ verify_tar_backup_file(context, newrelpath, newfullpath,
+ &tarfiles);
+ }
pfree(newfullpath);
pfree(newrelpath);
}
+ /* Perform the final verification of the tar contents, if any. */
+ Assert(tarfiles.head == NULL || context->format == 't');
+ if (tarfiles.head != NULL)
+ verify_tar_file_contents(context, &tarfiles);
+
if (closedir(dir))
{
report_backup_error(context,
@@ -682,6 +715,215 @@ verify_plain_backup_file(verifier_context *context, char *relpath,
total_size += m->size;
}
+/*
+ * Verify one tar archive file.
+ *
+ * This function does not perform a complete verification; it only carries out
+ * basic validation of the tar format backup file, detects the compression
+ * type, and appends that information to the tarfiles list. An error will be
+ * reported if the tar archive is inaccessible, or if the file type, name, or
+ * compression type is not as expected.
+ *
+ * The arguments to this function are mostly the same as for
+ * verify_plain_backup_file. The additional argument is the list to which
+ * the tar archive information is appended for later verification.
+ */
+static void
+verify_tar_backup_file(verifier_context *context, char *relpath,
+ char *fullpath, SimplePtrList *tarfiles)
+{
+ struct stat sb;
+ Oid tblspc_oid = InvalidOid;
+ pg_compress_algorithm compress_algorithm;
+ tar_archive *tar_file;
+ char *suffix = NULL;
+
+ /* Should be tar format backup */
+ Assert(context->format == 't');
+
+ /* Get file information */
+ if (stat(fullpath, &sb) != 0)
+ {
+ report_backup_error(context,
+ "could not stat file or directory \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ /* In a tar format backup, we expect only plain files. */
+ if (!S_ISREG(sb.st_mode))
+ {
+ report_backup_error(context,
+ "\"%s\" is not a plain file",
+ relpath);
+ return;
+ }
+
+ /*
+ * We expect tar archive files for backing up the main directory,
+ * tablespace, and pg_wal directory.
+ *
+ * pg_basebackup writes the main data directory to an archive file named
+ * base.tar, the pg_wal directory to pg_wal.tar, and the tablespace
+ * directory to <tablespaceoid>.tar, each followed by a compression type
+ * extension such as .gz, .lz4, or .zst.
+ */
+ if (strncmp("base", relpath, 4) == 0)
+ suffix = relpath + 4;
+ else if (strncmp("pg_wal", relpath, 6) == 0)
+ suffix = relpath + 6;
+ else
+ {
+ /* Expected a <tablespaceoid>.tar file here. */
+ uint64 num = strtoul(relpath, &suffix, 10);
+
+ /*
+ * Report an error if we didn't consume at least one character, if the
+ * result is 0, or if the value is too large to be a valid OID.
+ */
+ if (suffix == NULL || num <= 0 || num > OID_MAX)
+ report_backup_error(context,
+ "\"%s\" unexpected file in the tar format backup",
+ relpath);
+ tblspc_oid = (Oid) num;
+ }
+
+ /* Now, check the compression type of the tar file */
+ if (strcmp(suffix, ".tar") == 0)
+ compress_algorithm = PG_COMPRESSION_NONE;
+ else if (strcmp(suffix, ".tgz") == 0)
+ compress_algorithm = PG_COMPRESSION_GZIP;
+ else if (strcmp(suffix, ".tar.gz") == 0)
+ compress_algorithm = PG_COMPRESSION_GZIP;
+ else if (strcmp(suffix, ".tar.lz4") == 0)
+ compress_algorithm = PG_COMPRESSION_LZ4;
+ else if (strcmp(suffix, ".tar.zst") == 0)
+ compress_algorithm = PG_COMPRESSION_ZSTD;
+ else
+ {
+ report_backup_error(context,
+ "\"%s\" unexpected file in the tar format backup",
+ relpath);
+ return;
+ }
+
+ /*
+ * Ignore WALs, as reading and verification will be handled through
+ * pg_waldump.
+ */
+ if (strncmp("pg_wal", relpath, 6) == 0)
+ return;
+
+ /*
+ * Append the information to the list for complete verification at a later
+ * stage.
+ */
+ tar_file = pg_malloc(sizeof(tar_archive));
+ tar_file->relpath = pstrdup(relpath);
+ tar_file->tblspc_oid = tblspc_oid;
+ tar_file->compress_algorithm = compress_algorithm;
+
+ simple_ptr_list_append(tarfiles, tar_file);
+
+ /* Update statistics for progress report, if necessary */
+ if (show_progress)
+ total_size += sb.st_size;
+}
+
+/*
+ * This is the final part of tar file verification, which prepares the archive
+ * streamer stack according to the tar file compression format for each tar
+ * archive and invokes them for reading, decompressing, and ultimately
+ * verifying the contents.
+ *
+ * The arguments to this function should be a list of valid tar archives to
+ * verify, and the allocation will be freed once the verification is complete.
+ */
+static void
+verify_tar_file_contents(verifier_context *context, SimplePtrList *tarfiles)
+{
+ SimplePtrListCell *cell;
+
+ progress_report(false);
+
+ for (cell = tarfiles->head; cell != NULL; cell = cell->next)
+ {
+ tar_archive *tar_file = (tar_archive *) cell->ptr;
+ astreamer *streamer;
+ char *fullpath;
+
+ /* Prepare archive streamer stack */
+ streamer = create_archive_verifier(context,
+ tar_file->relpath,
+ tar_file->tblspc_oid,
+ tar_file->compress_algorithm);
+
+ /* Compute the full pathname to the target file. */
+ fullpath = psprintf("%s/%s", context->backup_directory,
+ tar_file->relpath);
+
+ /* Invoke the streamer for reading, decompressing, and verifying. */
+ verify_tar_contents(context, tar_file->relpath, fullpath, streamer);
+
+ /* Cleanup. */
+ pfree(tar_file->relpath);
+ pfree(tar_file);
+ pfree(fullpath);
+
+ astreamer_finalize(streamer);
+ astreamer_free(streamer);
+ }
+ simple_ptr_list_destroy(tarfiles);
+
+ progress_report(true);
+}
+
+/*
+ * Performs the actual work for tar content verification. It reads a given tar
+ * file in predefined chunks and passes it to the streamer, which initiates
+ * routines for decompression (if necessary) and then verifies each member
+ * within the tar archive.
+ */
+static void
+verify_tar_contents(verifier_context *context, char *relpath, char *fullpath,
+ astreamer *streamer)
+{
+ int fd;
+ int rc;
+ char *buffer;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+
+ /* Open the target file. */
+ if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
+ {
+ report_backup_error(context, "could not open file \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+ /* Perform the reads */
+ while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
+ {
+ astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);
+
+ /* Report progress */
+ done_size += rc;
+ progress_report(false);
+ }
+
+ if (rc < 0)
+ report_backup_error(context, "could not read file \"%s\": %m",
+ relpath);
+
+ /* Close the file. */
+ if (close(fd) != 0)
+ report_backup_error(context, "could not close file \"%s\": %m",
+ relpath);
+}
+
/*
* Verify file and its size entry in the manifest.
*/
@@ -1050,6 +1292,42 @@ find_backup_format(verifier_context *context)
return result;
}
+/*
+ * Identifies the necessary steps for verifying the contents of the
+ * provided tar file.
+ */
+static astreamer *
+create_archive_verifier(verifier_context *context, char *archive_name,
+ Oid tblspc_oid, pg_compress_algorithm compress_algo)
+{
+ astreamer *streamer = NULL;
+
+ /* Should be here only for tar backup */
+ Assert(context->format == 't');
+
+ /*
+ * To verify the contents of the tar file, the initial step is to parse
+ * its content.
+ */
+ streamer = astreamer_verify_content_new(streamer, context, archive_name,
+ tblspc_oid);
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /*
+ * If the tar file is compressed, we must perform the appropriate
+ * decompression operation before proceeding with the verification of its
+ * contents.
+ */
+ if (compress_algo == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compress_algo == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compress_algo == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ return streamer;
+}
+
/*
* Print a progress report based on the global variables.
*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index d3b9f733087..006d457511d 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -18,6 +18,7 @@
#include "common/hashfn_unstable.h"
#include "common/logging.h"
#include "common/parse_manifest.h"
+#include "fe_utils/astreamer.h"
#include "fe_utils/simple_list.h"
/*
@@ -123,4 +124,9 @@ extern void report_fatal_error(const char *pg_restrict fmt,...)
extern bool should_ignore_relpath(verifier_context *context,
const char *relpath);
+extern astreamer *astreamer_verify_content_new(astreamer *next,
+ verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
+
#endif /* PG_VERIFYBACKUP_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6d424c89186..143dea2feaf 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3330,6 +3330,7 @@ astreamer_plain_writer
astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
+astreamer_verify
astreamer_zstd_frame
bgworker_main_type
bh_node_type
@@ -3951,6 +3952,7 @@ substitute_phv_relids_context
subxids_array_status
symbol
tablespaceinfo
+tar_archive
td_entry
teSection
temp_tablespaces_extra
--
2.18.0
v11-0010-pg_verifybackup-Add-backup-format-and-compressio.patch
From 34d3478011bb94672378b3860724a9a89e2023c0 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 10:26:35 +0530
Subject: [PATCH v11 10/12] pg_verifybackup: Add backup format and compression
option
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the next patch that implements tar format support.
----
---
src/bin/pg_verifybackup/pg_verifybackup.c | 73 ++++++++++++++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 1 +
2 files changed, 72 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index a19f344ea67..4c0367038b4 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -62,6 +62,7 @@ static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3) pg_attribute_noreturn();
+static char find_backup_format(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_plain_backup_file(verifier_context *context,
@@ -97,6 +98,7 @@ main(int argc, char **argv)
{"exit-on-error", no_argument, NULL, 'e'},
{"ignore", required_argument, NULL, 'i'},
{"manifest-path", required_argument, NULL, 'm'},
+ {"format", required_argument, NULL, 'F'},
{"no-parse-wal", no_argument, NULL, 'n'},
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
@@ -118,6 +120,7 @@ main(int argc, char **argv)
progname = get_progname(argv[0]);
memset(&context, 0, sizeof(context));
+ context.format = '\0';
if (argc > 1)
{
@@ -154,7 +157,7 @@ main(int argc, char **argv)
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
- while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1)
+ while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:", long_options, NULL)) != -1)
{
switch (c)
{
@@ -173,6 +176,15 @@ main(int argc, char **argv)
manifest_path = pstrdup(optarg);
canonicalize_path(manifest_path);
break;
+ case 'F':
+ if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
+ context.format = 'p';
+ else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
+ context.format = 't';
+ else
+ pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"",
+ optarg);
+ break;
case 'n':
no_parse_wal = true;
break;
@@ -220,11 +232,26 @@ main(int argc, char **argv)
pg_fatal("cannot specify both %s and %s",
"-P/--progress", "-q/--quiet");
+ /* Determine the backup format if it hasn't been specified. */
+ if (context.format == '\0')
+ context.format = find_backup_format(&context);
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
int ret;
+ /*
+ * XXX: In the future, we should consider enhancing pg_waldump to read
+ * WAL files from the tar archive.
+ */
+ if (context.format != 'p')
+ {
+ pg_log_error("pg_waldump cannot read from a tar archive");
+ pg_log_error_hint("You must use -n or --no-parse-wal when verifying a tar-format backup.");
+ exit(1);
+ }
+
pg_waldump_path = pg_malloc(MAXPGPATH);
ret = find_other_exec(argv[0], "pg_waldump",
"pg_waldump (PostgreSQL) " PG_VERSION "\n",
@@ -279,8 +306,13 @@ main(int argc, char **argv)
/*
* Now do the expensive work of verifying file checksums, unless we were
* told to skip it.
+ *
+ * We only check the plain backup here. For a tar backup, file checksum
+ * verification (if requested) is done immediately when the file is
+ * accessed, since we don't have random access to the files as we do
+ * with plain backups.
*/
- if (!context.skip_checksums)
+ if (!context.skip_checksums && context.format == 'p')
verify_backup_checksums(&context);
/*
@@ -982,6 +1014,42 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
return false;
}
+/*
+ * To detect the backup format, it checks for the PG_VERSION file in the backup
+ * directory. If found, it will be considered a plain-format backup; otherwise,
+ * it will be assumed to be a tar-format backup.
+ */
+static char
+find_backup_format(verifier_context *context)
+{
+ char result;
+ char *path;
+ struct stat sb;
+
+ /* Should be here only if the backup format is unknown */
+ Assert(context->format == '\0');
+
+ /* Check PG_VERSION file. */
+ path = psprintf("%s/%s", context->backup_directory, "PG_VERSION");
+ if (stat(path, &sb) == 0)
+ result = 'p';
+ else
+ {
+ if (errno != ENOENT)
+ {
+ pg_log_error("cannot determine backup format");
+ pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+ exit(1);
+ }
+
+ /* Otherwise, it is assumed to be a tar format backup. */
+ result = 't';
+ }
+ pfree(path);
+
+ return result;
+}
+
/*
* Print a progress report based on the global variables.
*
@@ -1039,6 +1107,7 @@ usage(void)
printf(_(" -e, --exit-on-error exit immediately on error\n"));
printf(_(" -i, --ignore=RELATIVE_PATH ignore indicated path\n"));
printf(_(" -m, --manifest-path=PATH use specified path for manifest\n"));
+ printf(_(" -F, --format=p|t backup format (plain, tar)\n"));
printf(_(" -n, --no-parse-wal do not try to parse WAL files\n"));
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 2901b00870a..d3b9f733087 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -100,6 +100,7 @@ typedef struct verifier_context
manifest_data *manifest;
char *backup_directory;
SimpleStringList ignore_list;
+ char format; /* backup format: p(lain)/t(ar) */
bool skip_checksums;
bool exit_on_error;
bool saw_any_error;
--
2.18.0
v11-0009-Add-simple_ptr_list_destroy-and-simple_ptr_list_.patch
From 7322f5ee8d461eb0c3d54086946850adee14a001 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 8 Aug 2024 16:01:33 +0530
Subject: [PATCH v11 09/12] Add simple_ptr_list_destroy() and
simple_ptr_list_destroy_deep() API.
We didn't have any helper function to destroy SimplePtrList, likely
because it wasn't needed before, but it's required in a later patch in
this set. I've added two functions for this purpose, inspired by
list_free() and list_free_deep().
---
src/fe_utils/simple_list.c | 39 ++++++++++++++++++++++++++++++
src/include/fe_utils/simple_list.h | 2 ++
2 files changed, 41 insertions(+)
diff --git a/src/fe_utils/simple_list.c b/src/fe_utils/simple_list.c
index 2d88eb54067..9d218911c31 100644
--- a/src/fe_utils/simple_list.c
+++ b/src/fe_utils/simple_list.c
@@ -173,3 +173,42 @@ simple_ptr_list_append(SimplePtrList *list, void *ptr)
list->head = cell;
list->tail = cell;
}
+
+/*
+ * Destroy a pointer list and optionally the pointed-to element
+ */
+static void
+simple_ptr_list_destroy_private(SimplePtrList *list, bool deep)
+{
+ SimplePtrListCell *cell;
+
+ cell = list->head;
+ while (cell != NULL)
+ {
+ SimplePtrListCell *next;
+
+ next = cell->next;
+ if (deep)
+ pg_free(cell->ptr);
+ pg_free(cell);
+ cell = next;
+ }
+}
+
+/*
+ * Destroy a pointer list and the pointed-to element
+ */
+void
+simple_ptr_list_destroy_deep(SimplePtrList *list)
+{
+ simple_ptr_list_destroy_private(list, true);
+}
+
+/*
+ * Destroy only the pointer list, not the pointed-to elements
+ */
+void
+simple_ptr_list_destroy(SimplePtrList *list)
+{
+ simple_ptr_list_destroy_private(list, false);
+}
diff --git a/src/include/fe_utils/simple_list.h b/src/include/fe_utils/simple_list.h
index d42ecded8ed..5b7cbec8a62 100644
--- a/src/include/fe_utils/simple_list.h
+++ b/src/include/fe_utils/simple_list.h
@@ -66,5 +66,7 @@ extern void simple_string_list_destroy(SimpleStringList *list);
extern const char *simple_string_list_not_touched(SimpleStringList *list);
extern void simple_ptr_list_append(SimplePtrList *list, void *ptr);
+extern void simple_ptr_list_destroy_deep(SimplePtrList *list);
+extern void simple_ptr_list_destroy(SimplePtrList *list);
#endif /* SIMPLE_LIST_H */
--
2.18.0
v11-0008-Refactor-split-verify_control_file.patch
From 792b4f4a19e458de284e062adc16caa0528539bd Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 15:47:26 +0530
Subject: [PATCH v11 08/12] Refactor: split verify_control_file.
Separated the control file verification code into a new function and
introduced the should_verify_control_data() macro, similar to
should_verify_checksum().
Like verify_file_checksum(), verify_control_file() is designed to
accept the path to the pg_control file, which is then opened and the
respective information verified. In the case of a tar backup, however,
we have the pg_control file contents instead of a path, and they need
to be verified in the same way. For that reason, the code that performs
the verification is moved into a separate function so that it can be
reused for tar backup verification as well.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 43 +++++++++++------------
src/bin/pg_verifybackup/pg_verifybackup.h | 14 ++++++++
2 files changed, 34 insertions(+), 23 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index e44d0377cd5..a19f344ea67 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -66,8 +66,6 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_plain_backup_file(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_control_file(const char *controlpath,
- uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -631,14 +629,20 @@ verify_plain_backup_file(verifier_context *context, char *relpath,
/* Check the backup manifest entry for this file. */
m = verify_manifest_entry(context, relpath, sb.st_size);
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0 &&
- m != NULL && m->matched && !m->bad)
- verify_control_file(fullpath, context->manifest->system_identifier);
+ /* Validate the pg_control file information */
+ if (should_verify_control_data(context->manifest, m))
+ {
+ ControlFileData *control_file;
+ bool crc_ok;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+ control_file = get_controlfile_by_exact_path(fullpath, &crc_ok);
+
+ verify_control_data(control_file, fullpath, crc_ok,
+ context->manifest->system_identifier);
+ /* Release memory. */
+ pfree(control_file);
+ }
/* Update statistics for progress report, if necessary */
if (show_progress && !context->skip_checksums &&
@@ -687,18 +691,14 @@ verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
}
/*
- * Sanity check control file and validate system identifier against manifest
- * system identifier.
+ * Sanity check control file data and validate system identifier against
+ * manifest system identifier.
*/
-static void
-verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
+void
+verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier)
{
- ControlFileData *control_file;
- bool crc_ok;
-
- pg_log_debug("reading \"%s\"", controlpath);
- control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
-
/* Control file contents not meaningful if CRC is bad. */
if (!crc_ok)
report_fatal_error("%s: CRC is incorrect", controlpath);
@@ -714,9 +714,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
controlpath,
(unsigned long long) manifest_system_identifier,
(unsigned long long) control_file->system_identifier);
-
- /* Release memory. */
- pfree(control_file);
}
/*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index fe0ce8a89aa..2901b00870a 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -40,6 +40,17 @@ typedef struct manifest_file
(((m) != NULL) && ((m)->matched) && !((m)->bad) && \
(((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+/*
+ * Validate the control file and its system identifier against the manifest
+ * system identifier. Note that this feature is not available in manifest
+ * version 1. This validation should only be performed if the manifest entry
+ * validation has been completed without errors.
+ */
+#define should_verify_control_data(manifest, m) \
+ (((manifest)->version != 1) && \
+ ((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (strcmp((m)->pathname, "global/pg_control") == 0))
+
/*
* Define a hash table which we can use to store information about the files
* mentioned in the backup manifest.
@@ -99,6 +110,9 @@ extern manifest_file *verify_manifest_entry(verifier_context *context,
extern void verify_checksum(verifier_context *context, manifest_file *m,
pg_checksum_context *checksum_ctx,
int64 bytes_read);
+extern void verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
v11-0007-Refactor-split-verify_file_checksum-function.patch
From a297985e7b7815f7bda545a50b05b19877b01cea Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 14 Aug 2024 15:14:15 +0530
Subject: [PATCH v11 07/12] Refactor: split verify_file_checksum() function.
Move the core functionality of verify_file_checksum to a new function
to reuse it instead of duplicating the code.
The verify_file_checksum() function is designed to take a file path,
open and read the file contents, and then calculate the checksum.
However, for TAR backups, instead of a file path, we receive the file
content in chunks, and the checksum needs to be calculated
incrementally. While the initial operations for plain and TAR backup
checksum calculations differ, the final checks and error handling are
the same. By moving the shared logic to a separate function, we can
reuse the code for both types of backups.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 20 +++++++++++++++++---
src/bin/pg_verifybackup/pg_verifybackup.h | 3 +++
2 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 5bfc98e7874..e44d0377cd5 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -792,8 +792,6 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
int fd;
int rc;
size_t bytes_read = 0;
- uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
- int checksumlen;
/* Open the target file. */
if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
@@ -844,6 +842,22 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
if (rc < 0)
return;
+ /* Do the final computation and verification. */
+ verify_checksum(context, m, &checksum_ctx, bytes_read);
+}
+
+/*
+ * A helper function to finalize checksum computation and verify it against the
+ * backup manifest information.
+ */
+void
+verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx, int64 bytes_read)
+{
+ const char *relpath = m->pathname;
+ int checksumlen;
+ uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
+
/*
* Double-check that we read the expected number of bytes from the file.
* Normally, a file size mismatch would be caught in verify_manifest_entry
@@ -860,7 +874,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
}
/* Get the final checksum. */
- checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
+ checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
if (checksumlen < 0)
{
report_backup_error(context,
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index ff9476e356e..fe0ce8a89aa 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -96,6 +96,9 @@ typedef struct verifier_context
extern manifest_file *verify_manifest_entry(verifier_context *context,
char *relpath, int64 filesize);
+extern void verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx,
+ int64 bytes_read);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
v11-0006-Refactor-split-verify_backup_file-function-and-r.patch
From 7173d77e6da29ddc35bf8d87e20db9b16f06b26c Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 14 Aug 2024 10:42:37 +0530
Subject: [PATCH v11 06/12] Refactor: split verify_backup_file() function and
rename it.
The function verify_backup_file() has now been renamed to
verify_plain_backup_file() to make it clearer that it is specifically
used for verifying files in a plain backup. Similarly, a future
patch will add a verify_tar_backup_file() function for verifying TAR
backup files.
In addition, the manifest entry verification code has been moved into a
new function called verify_manifest_entry() so that it can be reused
for tar backup verification. If verify_manifest_entry() doesn't find
an entry, it reports an error as before and returns NULL to the
caller. This is why a NULL check is added to should_verify_checksum().
---
src/bin/pg_verifybackup/pg_verifybackup.c | 58 +++++++++++++++--------
src/bin/pg_verifybackup/pg_verifybackup.h | 6 ++-
2 files changed, 42 insertions(+), 22 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 3fcfb167217..5bfc98e7874 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -64,8 +64,8 @@ static void report_manifest_error(JsonManifestParseContext *context,
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_backup_file(verifier_context *context,
- char *relpath, char *fullpath);
+static void verify_plain_backup_file(verifier_context *context,
+ char *relpath, char *fullpath);
static void verify_control_file(const char *controlpath,
uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
@@ -570,7 +570,7 @@ verify_backup_directory(verifier_context *context, char *relpath,
newrelpath = psprintf("%s/%s", relpath, filename);
if (!should_ignore_relpath(context, newrelpath))
- verify_backup_file(context, newrelpath, newfullpath);
+ verify_plain_backup_file(context, newrelpath, newfullpath);
pfree(newfullpath);
pfree(newrelpath);
@@ -591,7 +591,8 @@ verify_backup_directory(verifier_context *context, char *relpath,
* verify_backup_directory.
*/
static void
-verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
+verify_plain_backup_file(verifier_context *context, char *relpath,
+ char *fullpath)
{
struct stat sb;
manifest_file *m;
@@ -627,6 +628,32 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Check the backup manifest entry for this file. */
+ m = verify_manifest_entry(context, relpath, sb.st_size);
+
+ /*
+ * Validate the manifest system identifier, not available in manifest
+ * version 1.
+ */
+ if (context->manifest->version != 1 &&
+ strcmp(relpath, "global/pg_control") == 0 &&
+ m != NULL && m->matched && !m->bad)
+ verify_control_file(fullpath, context->manifest->system_identifier);
+
+ /* Update statistics for progress report, if necessary */
+ if (show_progress && !context->skip_checksums &&
+ should_verify_checksum(m))
+ total_size += m->size;
+}
+
+/*
+ * Verify file and its size entry in the manifest.
+ */
+manifest_file *
+verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
+{
+ manifest_file *m;
+
/* Check whether there's an entry in the manifest hash. */
m = manifest_files_lookup(context->manifest->files, relpath);
if (m == NULL)
@@ -634,40 +661,29 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
report_backup_error(context,
"\"%s\" is present on disk but not in the manifest",
relpath);
- return;
+ return NULL;
}
/* Flag this entry as having been encountered in the filesystem. */
m->matched = true;
/* Check that the size matches. */
- if (m->size != sb.st_size)
+ if (m->size != filesize)
{
report_backup_error(context,
"\"%s\" has size %lld on disk but size %zu in the manifest",
- relpath, (long long int) sb.st_size, m->size);
+ relpath, (long long int) filesize, m->size);
m->bad = true;
}
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0)
- verify_control_file(fullpath, context->manifest->system_identifier);
-
- /* Update statistics for progress report, if necessary */
- if (show_progress && !context->skip_checksums &&
- should_verify_checksum(m))
- total_size += m->size;
-
/*
* We don't verify checksums at this stage. We first finish verifying that
* we have the expected set of files with the expected sizes, and only
* afterwards verify the checksums. That's because computing checksums may
* take a while, and we'd like to report more obvious problems quickly.
*/
+
+ return m;
}
/*
@@ -830,7 +846,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
/*
* Double-check that we read the expected number of bytes from the file.
- * Normally, a file size mismatch would be caught in verify_backup_file
+ * Normally, a file size mismatch would be caught in verify_manifest_entry
* and this check would never be reached, but this provides additional
* safety and clarity in the event of concurrent modifications or
* filesystem misbehavior.
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index d8c566ed587..ff9476e356e 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -37,7 +37,8 @@ typedef struct manifest_file
} manifest_file;
#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+ (((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
/*
* Define a hash table which we can use to store information about the files
@@ -93,6 +94,9 @@ typedef struct verifier_context
bool saw_any_error;
} verifier_context;
+extern manifest_file *verify_manifest_entry(verifier_context *context,
+ char *relpath, int64 filesize);
+
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
pg_attribute_printf(2, 3);
--
2.18.0
On Tue, Aug 20, 2024 at 3:56 PM Amul Sul <sulamul@gmail.com> wrote:
On Sat, Aug 17, 2024 at 1:34 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Aug 16, 2024 at 3:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
[...]
There's probably more to look at here but I'm running out of energy for today.
Thank you for the review and for committing the 0004 and 0006 patches.
I have reworked a few comments, revised error messages, and made some
minor tweaks in the attached version.
Additionally, I would like to discuss a couple of concerns regarding
error placement and function names to gather your suggestions.
0007 patch: Regarding error placement:
1. I'm a bit unsure whether the (bytes_read != m->size) check that I
placed in verify_checksum() is in the right place. Per our previous
discussion, this check matters for plain backup files, since they can
change while being read, but not for files belonging to tar backups.
For consistency, I included the check for tar backups as well, as it
does no harm. Is it okay to keep this check in verify_checksum(), or
should I move it back to verify_file_checksum() and apply it only to
the plain backup format? (See the short sketch after these questions.)
2. For the verify_checksum() function, I kept the argument name as
bytes_read. Should we rename it to something more meaningful like
computed_bytes, computed_size, or checksum_computed_size?
0011 patch: Regarding function names:
1. I named the function verify_tar_backup_file() to align with
verify_plain_backup_file(), but it does not perform the complete
verification the way verify_plain_backup_file() does. I'm not sure it
is the right name.
2. verify_tar_file_contents() is the second and final part of tar
backup verification. Should its name be aligned with
verify_tar_backup_file()? I’m unsure what the best name would be.
Perhaps verify_tar_backup_file_final(), but then
verify_tar_backup_file() would need to be renamed to something like
verify_tar_backup_file_initial(), which might be too lengthy.
3. verify_tar_contents() is the core of verify_tar_file_contents()
that handles the actual verification. I'm unsure about the current
naming. Should we rename it to something like
verify_tar_contents_core()? It wouldn't be an issue if we renamed
verify_tar_file_contents() as suggested in point #2. (A rough sketch of
the per-archive streamer setup these functions perform follows below.)
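To make the naming question a bit more concrete, here is a rough sketch of
the per-archive streamer stack that verify_tar_file_contents() sets up for
each entry collected in the tarfiles list. This is only an illustration
based on the 0011 commit message; the actual create_archive_verifier() in
the patch may differ in detail:

    static astreamer *
    create_archive_verifier(verifier_context *context, char *archive_name,
                            Oid tblspc_oid, pg_compress_algorithm compress_algo)
    {
        astreamer  *streamer;

        /* Innermost: verify each tar member against the backup manifest. */
        streamer = astreamer_verify_content_new(NULL, context, archive_name,
                                                tblspc_oid);

        /* Split the (decompressed) stream into member headers and contents. */
        streamer = astreamer_tar_parser_new(streamer);

        /* Outermost: decompress the raw archive bytes, if necessary. */
        switch (compress_algo)
        {
            case PG_COMPRESSION_GZIP:
                streamer = astreamer_gzip_decompressor_new(streamer);
                break;
            case PG_COMPRESSION_LZ4:
                streamer = astreamer_lz4_decompressor_new(streamer);
                break;
            case PG_COMPRESSION_ZSTD:
                streamer = astreamer_zstd_decompressor_new(streamer);
                break;
            default:
                break;
        }

        return streamer;
    }

verify_tar_contents() would then just read the archive in READ_CHUNK_SIZE
pieces and push them through this stack via astreamer_content() and
astreamer_finalize(); that loop is the "core" whose name point #3 asks
about.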
Regards,
Amul
Attachments:
v12-0006-Refactor-split-verify_backup_file-function-and-r.patch
From 8c5c624fa89eaa42e8812dad840842b30ec35ec7 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 14 Aug 2024 10:42:37 +0530
Subject: [PATCH v12 06/12] Refactor: split verify_backup_file() function and
rename it.
The function verify_backup_file() has now been renamed to
verify_plain_backup_file() to make it clearer that it is specifically
used for verifying files in a plain backup. Similarly, a future
patch will add a verify_tar_backup_file() function for verifying TAR
backup files.
In addition, the manifest entry verification code has been moved into a
new function called verify_manifest_entry() so that it can be reused
for tar backup verification. If verify_manifest_entry() doesn't find
an entry, it reports an error as before and returns NULL to the
caller. This is why a NULL check is added to should_verify_checksum().
---
src/bin/pg_verifybackup/pg_verifybackup.c | 58 +++++++++++++++--------
src/bin/pg_verifybackup/pg_verifybackup.h | 6 ++-
2 files changed, 42 insertions(+), 22 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 3fcfb167217..5bfc98e7874 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -64,8 +64,8 @@ static void report_manifest_error(JsonManifestParseContext *context,
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_backup_file(verifier_context *context,
- char *relpath, char *fullpath);
+static void verify_plain_backup_file(verifier_context *context,
+ char *relpath, char *fullpath);
static void verify_control_file(const char *controlpath,
uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
@@ -570,7 +570,7 @@ verify_backup_directory(verifier_context *context, char *relpath,
newrelpath = psprintf("%s/%s", relpath, filename);
if (!should_ignore_relpath(context, newrelpath))
- verify_backup_file(context, newrelpath, newfullpath);
+ verify_plain_backup_file(context, newrelpath, newfullpath);
pfree(newfullpath);
pfree(newrelpath);
@@ -591,7 +591,8 @@ verify_backup_directory(verifier_context *context, char *relpath,
* verify_backup_directory.
*/
static void
-verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
+verify_plain_backup_file(verifier_context *context, char *relpath,
+ char *fullpath)
{
struct stat sb;
manifest_file *m;
@@ -627,6 +628,32 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Check the backup manifest entry for this file. */
+ m = verify_manifest_entry(context, relpath, sb.st_size);
+
+ /*
+ * Validate the manifest system identifier, not available in manifest
+ * version 1.
+ */
+ if (context->manifest->version != 1 &&
+ strcmp(relpath, "global/pg_control") == 0 &&
+ m != NULL && m->matched && !m->bad)
+ verify_control_file(fullpath, context->manifest->system_identifier);
+
+ /* Update statistics for progress report, if necessary */
+ if (show_progress && !context->skip_checksums &&
+ should_verify_checksum(m))
+ total_size += m->size;
+}
+
+/*
+ * Verify file and its size entry in the manifest.
+ */
+manifest_file *
+verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
+{
+ manifest_file *m;
+
/* Check whether there's an entry in the manifest hash. */
m = manifest_files_lookup(context->manifest->files, relpath);
if (m == NULL)
@@ -634,40 +661,29 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
report_backup_error(context,
"\"%s\" is present on disk but not in the manifest",
relpath);
- return;
+ return NULL;
}
/* Flag this entry as having been encountered in the filesystem. */
m->matched = true;
/* Check that the size matches. */
- if (m->size != sb.st_size)
+ if (m->size != filesize)
{
report_backup_error(context,
"\"%s\" has size %lld on disk but size %zu in the manifest",
- relpath, (long long int) sb.st_size, m->size);
+ relpath, (long long int) filesize, m->size);
m->bad = true;
}
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0)
- verify_control_file(fullpath, context->manifest->system_identifier);
-
- /* Update statistics for progress report, if necessary */
- if (show_progress && !context->skip_checksums &&
- should_verify_checksum(m))
- total_size += m->size;
-
/*
* We don't verify checksums at this stage. We first finish verifying that
* we have the expected set of files with the expected sizes, and only
* afterwards verify the checksums. That's because computing checksums may
* take a while, and we'd like to report more obvious problems quickly.
*/
+
+ return m;
}
/*
@@ -830,7 +846,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
/*
* Double-check that we read the expected number of bytes from the file.
- * Normally, a file size mismatch would be caught in verify_backup_file
+ * Normally, a file size mismatch would be caught in verify_manifest_entry
* and this check would never be reached, but this provides additional
* safety and clarity in the event of concurrent modifications or
* filesystem misbehavior.
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index d8c566ed587..ff9476e356e 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -37,7 +37,8 @@ typedef struct manifest_file
} manifest_file;
#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+ (((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
/*
* Define a hash table which we can use to store information about the files
@@ -93,6 +94,9 @@ typedef struct verifier_context
bool saw_any_error;
} verifier_context;
+extern manifest_file *verify_manifest_entry(verifier_context *context,
+ char *relpath, int64 filesize);
+
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
pg_attribute_printf(2, 3);
--
2.18.0
v12-0007-Refactor-split-verify_file_checksum-function.patch
From 0d01f43ee33c20cbd1341a3454e312d85612104e Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 14 Aug 2024 15:14:15 +0530
Subject: [PATCH v12 07/12] Refactor: split verify_file_checksum() function.
Move the core functionality of verify_file_checksum to a new function
to reuse it instead of duplicating the code.
The verify_file_checksum() function is designed to take a file path,
open and read the file contents, and then calculate the checksum.
However, for TAR backups, instead of a file path, we receive the file
content in chunks, and the checksum needs to be calculated
incrementally. While the initial operations for plain and TAR backup
checksum calculations differ, the final checks and error handling are
the same. By moving the shared logic to a separate function, we can
reuse the code for both types of backups.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 20 +++++++++++++++++---
src/bin/pg_verifybackup/pg_verifybackup.h | 3 +++
2 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 5bfc98e7874..e44d0377cd5 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -792,8 +792,6 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
int fd;
int rc;
size_t bytes_read = 0;
- uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
- int checksumlen;
/* Open the target file. */
if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
@@ -844,6 +842,22 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
if (rc < 0)
return;
+ /* Do the final computation and verification. */
+ verify_checksum(context, m, &checksum_ctx, bytes_read);
+}
+
+/*
+ * A helper function to finalize checksum computation and verify it against the
+ * backup manifest information.
+ */
+void
+verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx, int64 bytes_read)
+{
+ const char *relpath = m->pathname;
+ int checksumlen;
+ uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
+
/*
* Double-check that we read the expected number of bytes from the file.
* Normally, a file size mismatch would be caught in verify_manifest_entry
@@ -860,7 +874,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
}
/* Get the final checksum. */
- checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
+ checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
if (checksumlen < 0)
{
report_backup_error(context,
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index ff9476e356e..fe0ce8a89aa 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -96,6 +96,9 @@ typedef struct verifier_context
extern manifest_file *verify_manifest_entry(verifier_context *context,
char *relpath, int64 filesize);
+extern void verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx,
+ int64 bytes_read);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
v12-0008-Refactor-split-verify_control_file.patch
From b877ed19cdb43bdce5aabb715b4dd6a4a3642d1a Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 21 Aug 2024 10:51:23 +0530
Subject: [PATCH v12 08/12] Refactor: split verify_control_file.
Separated the manifest entry verification code into a new function and
introduced the should_verify_control_data() macro, similar to
should_verify_checksum().
Like verify_file_checksum(), verify_control_file() is designed to
accept the pg_control file path, which it opens so that the respective
information can be verified. In the case of a tar backup, however, we
have the pg_control file contents instead of a path, and they need to
be verified in the same way. For that reason, the code that performs
the verification has been moved into a separate function so that it
can be reused for tar backup verification as well.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 42 ++++++++++-------------
src/bin/pg_verifybackup/pg_verifybackup.h | 14 ++++++++
2 files changed, 33 insertions(+), 23 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index e44d0377cd5..d04e1d8c8ac 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -66,8 +66,6 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_plain_backup_file(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_control_file(const char *controlpath,
- uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -631,14 +629,20 @@ verify_plain_backup_file(verifier_context *context, char *relpath,
/* Check the backup manifest entry for this file. */
m = verify_manifest_entry(context, relpath, sb.st_size);
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0 &&
- m != NULL && m->matched && !m->bad)
- verify_control_file(fullpath, context->manifest->system_identifier);
+ /* Validate the pg_control information */
+ if (should_verify_control_data(context->manifest, m))
+ {
+ ControlFileData *control_file;
+ bool crc_ok;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+ control_file = get_controlfile_by_exact_path(fullpath, &crc_ok);
+
+ verify_control_data(control_file, fullpath, crc_ok,
+ context->manifest->system_identifier);
+ /* Release memory. */
+ pfree(control_file);
+ }
/* Update statistics for progress report, if necessary */
if (show_progress && !context->skip_checksums &&
@@ -687,18 +691,13 @@ verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
}
/*
- * Sanity check control file and validate system identifier against manifest
- * system identifier.
+ * Sanity check control file data and validate system identifier against
+ * manifest system identifier.
*/
-static void
-verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
+void
+verify_control_data(ControlFileData *control_file, const char *controlpath,
+ bool crc_ok, uint64 manifest_system_identifier)
{
- ControlFileData *control_file;
- bool crc_ok;
-
- pg_log_debug("reading \"%s\"", controlpath);
- control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
-
/* Control file contents not meaningful if CRC is bad. */
if (!crc_ok)
report_fatal_error("%s: CRC is incorrect", controlpath);
@@ -714,9 +713,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
controlpath,
(unsigned long long) manifest_system_identifier,
(unsigned long long) control_file->system_identifier);
-
- /* Release memory. */
- pfree(control_file);
}
/*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index fe0ce8a89aa..818064c6eed 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -40,6 +40,17 @@ typedef struct manifest_file
(((m) != NULL) && ((m)->matched) && !((m)->bad) && \
(((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+/*
+ * Validate the control file and its system identifier against the manifest
+ * system identifier. Note that this feature is not available in manifest
+ * version 1. This validation should only be performed after the manifest entry
+ * validation for the pg_control file has been completed without errors.
+ */
+#define should_verify_control_data(manifest, m) \
+ (((manifest)->version != 1) && \
+ ((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (strcmp((m)->pathname, "global/pg_control") == 0))
+
/*
* Define a hash table which we can use to store information about the files
* mentioned in the backup manifest.
@@ -99,6 +110,9 @@ extern manifest_file *verify_manifest_entry(verifier_context *context,
extern void verify_checksum(verifier_context *context, manifest_file *m,
pg_checksum_context *checksum_ctx,
int64 bytes_read);
+extern void verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
v12-0009-Add-simple_ptr_list_destroy-and-simple_ptr_list_.patch
From 61edb2a348d87563aa3fde6b7c6b9e3a6dbeda9f Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 8 Aug 2024 16:01:33 +0530
Subject: [PATCH v12 09/12] Add simple_ptr_list_destroy() and
simple_ptr_list_destroy_deep() API.
We didn't have any helper function to destroy SimplePtrList, likely
because it wasn't needed before, but it's required in a later patch in
this set. I've added two functions for this purpose, inspired by
list_free() and list_free_deep().
---
src/fe_utils/simple_list.c | 39 ++++++++++++++++++++++++++++++
src/include/fe_utils/simple_list.h | 2 ++
2 files changed, 41 insertions(+)
diff --git a/src/fe_utils/simple_list.c b/src/fe_utils/simple_list.c
index 2d88eb54067..9d218911c31 100644
--- a/src/fe_utils/simple_list.c
+++ b/src/fe_utils/simple_list.c
@@ -173,3 +173,42 @@ simple_ptr_list_append(SimplePtrList *list, void *ptr)
list->head = cell;
list->tail = cell;
}
+
+/*
+ * Destroy a pointer list and optionally the pointed-to element
+ */
+static void
+simple_ptr_list_destroy_private(SimplePtrList *list, bool deep)
+{
+ SimplePtrListCell *cell;
+
+ cell = list->head;
+ while (cell != NULL)
+ {
+ SimplePtrListCell *next;
+
+ next = cell->next;
+ if (deep)
+ pg_free(cell->ptr);
+ pg_free(cell);
+ cell = next;
+ }
+}
+
+/*
+ * Destroy a pointer list and the pointed-to element
+ */
+void
+simple_ptr_list_destroy_deep(SimplePtrList *list)
+{
+ simple_ptr_list_destroy_private(list, true);
+}
+
+/*
+ * Destroy only pointer list and not the pointed-to element
+ */
+void
+simple_ptr_list_destroy(SimplePtrList *list)
+{
+ simple_ptr_list_destroy_private(list, false);
+}
diff --git a/src/include/fe_utils/simple_list.h b/src/include/fe_utils/simple_list.h
index d42ecded8ed..5b7cbec8a62 100644
--- a/src/include/fe_utils/simple_list.h
+++ b/src/include/fe_utils/simple_list.h
@@ -66,5 +66,7 @@ extern void simple_string_list_destroy(SimpleStringList *list);
extern const char *simple_string_list_not_touched(SimpleStringList *list);
extern void simple_ptr_list_append(SimplePtrList *list, void *ptr);
+extern void simple_ptr_list_destroy_deep(SimplePtrList *list);
+extern void simple_ptr_list_destroy(SimplePtrList *list);
#endif /* SIMPLE_LIST_H */
--
2.18.0
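A minimal usage sketch of the new API, assuming (as the later 0011 patch
suggests) that the list of tar_file entries built during the first
verification pass is freed deeply once the second pass is finished; the
exact call site in that patch may differ:

    SimplePtrList tarfiles = {NULL, NULL};
    tar_file   *tar;

    /* First pass: remember each valid tar archive that was found. */
    tar = pg_malloc(sizeof(tar_file));
    tar->relpath = pstrdup("base.tar.gz");          /* example value */
    tar->tblspc_oid = InvalidOid;
    tar->compress_algorithm = PG_COMPRESSION_GZIP;  /* example value */
    simple_ptr_list_append(&tarfiles, tar);

    /* ... second pass: verify the contents of each remembered archive ... */

    /* Free both the list cells and the tar_file structs they point to. */
    simple_ptr_list_destroy_deep(&tarfiles);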
v12-0010-pg_verifybackup-Add-backup-format-and-compressio.patch
From 0ba8580c0ee6489f1195bae4d3d5427543479f76 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 21 Aug 2024 11:03:44 +0530
Subject: [PATCH v12 10/12] pg_verifybackup: Add backup format and compression
option
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the next patch that implements tar format support.
----
---
src/bin/pg_verifybackup/pg_verifybackup.c | 73 ++++++++++++++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 1 +
2 files changed, 72 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d04e1d8c8ac..c186678a592 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -62,6 +62,7 @@ static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3) pg_attribute_noreturn();
+static char find_backup_format(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_plain_backup_file(verifier_context *context,
@@ -97,6 +98,7 @@ main(int argc, char **argv)
{"exit-on-error", no_argument, NULL, 'e'},
{"ignore", required_argument, NULL, 'i'},
{"manifest-path", required_argument, NULL, 'm'},
+ {"format", required_argument, NULL, 'F'},
{"no-parse-wal", no_argument, NULL, 'n'},
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
@@ -118,6 +120,7 @@ main(int argc, char **argv)
progname = get_progname(argv[0]);
memset(&context, 0, sizeof(context));
+ context.format = '\0';
if (argc > 1)
{
@@ -154,7 +157,7 @@ main(int argc, char **argv)
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
- while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1)
+ while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:", long_options, NULL)) != -1)
{
switch (c)
{
@@ -173,6 +176,15 @@ main(int argc, char **argv)
manifest_path = pstrdup(optarg);
canonicalize_path(manifest_path);
break;
+ case 'F':
+ if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
+ context.format = 'p';
+ else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
+ context.format = 't';
+ else
+ pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"",
+ optarg);
+ break;
case 'n':
no_parse_wal = true;
break;
@@ -220,11 +232,26 @@ main(int argc, char **argv)
pg_fatal("cannot specify both %s and %s",
"-P/--progress", "-q/--quiet");
+ /* Determine the backup format if it hasn't been specified. */
+ if (context.format == '\0')
+ context.format = find_backup_format(&context);
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
int ret;
+ /*
+ * XXX: In the future, we should consider enhancing pg_waldump to read
+ * WAL files from the tar archive.
+ */
+ if (context.format != 'p')
+ {
+ pg_log_error("pg_waldump cannot read from a tar archive");
+ pg_log_error_hint("You must use -n or --no-parse-wal when verifying a tar-format backup.");
+ exit(1);
+ }
+
pg_waldump_path = pg_malloc(MAXPGPATH);
ret = find_other_exec(argv[0], "pg_waldump",
"pg_waldump (PostgreSQL) " PG_VERSION "\n",
@@ -279,8 +306,13 @@ main(int argc, char **argv)
/*
* Now do the expensive work of verifying file checksums, unless we were
* told to skip it.
+ *
+ * We were only checking the plain backup here. For the tar backup, file
+ * checksums verification (if requested) will be done immediately when the
+ * file is accessed, as we don't have random access to the files like we
+ * do with plain backups.
*/
- if (!context.skip_checksums)
+ if (!context.skip_checksums && context.format == 'p')
verify_backup_checksums(&context);
/*
@@ -981,6 +1013,42 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
return false;
}
+/*
+ * To detect the backup format, it checks for the PG_VERSION file in the backup
+ * directory. If found, it will be considered a plain-format backup; otherwise,
+ * it will be assumed to be a tar-format backup.
+ */
+static char
+find_backup_format(verifier_context *context)
+{
+ char result;
+ char *path;
+ struct stat sb;
+
+ /* Should be here only if the backup format is unknown */
+ Assert(context->format == '\0');
+
+ /* Check PG_VERSION file. */
+ path = psprintf("%s/%s", context->backup_directory, "PG_VERSION");
+ if (stat(path, &sb) == 0)
+ result = 'p';
+ else
+ {
+ if (errno != ENOENT)
+ {
+ pg_log_error("cannot determine backup format : %m");
+ pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+ exit(1);
+ }
+
+ /* Otherwise, it is assumed to be a tar format backup. */
+ result = 't';
+ }
+ pfree(path);
+
+ return result;
+}
+
/*
* Print a progress report based on the global variables.
*
@@ -1038,6 +1106,7 @@ usage(void)
printf(_(" -e, --exit-on-error exit immediately on error\n"));
printf(_(" -i, --ignore=RELATIVE_PATH ignore indicated path\n"));
printf(_(" -m, --manifest-path=PATH use specified path for manifest\n"));
+ printf(_(" -F, --format=p|t backup format (plain, tar)\n"));
printf(_(" -n, --no-parse-wal do not try to parse WAL files\n"));
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 818064c6eed..80031ad4dbc 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -100,6 +100,7 @@ typedef struct verifier_context
manifest_data *manifest;
char *backup_directory;
SimpleStringList ignore_list;
+ char format; /* backup format: p(lain)/t(ar) */
bool skip_checksums;
bool exit_on_error;
bool saw_any_error;
--
2.18.0
v12-0011-pg_verifybackup-Read-tar-files-and-verify-its-co.patch
From 34ef4b8f5bad4c232aec86e837bad135449a0f02 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 21 Aug 2024 12:49:04 +0530
Subject: [PATCH v12 11/12] pg_verifybackup: Read tar files and verify its
contents
This patch implements TAR format backup verification.
For progress reporting support, we perform this verification in two
passes: the first pass calculates total_size, and the second pass
updates done_size as verification progresses.
For the verification, in the first pass, we call verify_tar_backup_file(),
which performs basic verification by expecting only base.tar, pg_wal.tar, or
<tablespaceoid>.tar files and raises an error for any other files. It
also determines the compression type of the archive file. All this
information is stored in a newly added tarFile struct, which is
appended to a list that will be used in the second pass (by
verify_tar_content()) for the final verification. In the second pass,
the tar archives are read, decompressed, and the required verification
is carried out.
For reading and decompression, fe_utils/astreamer.h is used. For
verification, a new archive streamer has been added in
astreamer_verify.c to handle TAR member files and their contents; see
astreamer_verify_content() for details. The stack of astreamers will
be set up for each TAR file in verify_tar_content(), depending on its
compression type which is detected in the first pass.
When information about a TAR member file (i.e., ASTREAMER_MEMBER_HEADER)
is received, we first verify its entry against the backup manifest. We
then decide if further checks are needed, such as checksum
verification and control data verification (if it is a pg_control
file), once the member file contents are received. Although this
decision could be made when the contents are received, it is more
efficient to make it earlier since the member file contents are
received in multiple iterations. In short, we process
ASTREAMER_MEMBER_CONTENTS multiple times but only once for other
ASTREAMER_MEMBER_* cases. We maintain this information in the
astreamer_verify structure for each member file, which is reset when
the file ends.
Unlike in a plain backup, checksum verification here occurs in two
steps. First, as the contents are received, the checksum is computed
incrementally (see member_compute_checksum). Then, at the end of
processing the member file, the final verification is performed (see
member_verify_checksum).
Similarly, during the content receiving stage, if the file is
pg_control, the data will be copied into a local buffer (see
member_copy_control_data). The verification will then be carried out
at the end of the member file processing (see member_verify_control_data)
---
src/bin/pg_verifybackup/Makefile | 2 +
src/bin/pg_verifybackup/astreamer_verify.c | 365 +++++++++++++++++++++
src/bin/pg_verifybackup/meson.build | 1 +
src/bin/pg_verifybackup/pg_verifybackup.c | 287 +++++++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 6 +
src/tools/pgindent/typedefs.list | 2 +
6 files changed, 658 insertions(+), 5 deletions(-)
create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 7c045f142e8..374d4a8afd1 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -17,10 +17,12 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
# We need libpq only because fe_utils does.
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
OBJS = \
$(WIN32RES) \
+ astreamer_verify.o \
pg_verifybackup.o
all: pg_verifybackup
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
new file mode 100644
index 00000000000..f08868170b4
--- /dev/null
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -0,0 +1,365 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_verify.c
+ *
+ * Extend fe_utils/astreamer.h archive streaming facility to verify TAR
+ * format backup.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * src/bin/pg_verifybackup/astreamer_verify.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "pg_verifybackup.h"
+
+typedef struct astreamer_verify
+{
+ astreamer base;
+ verifier_context *context;
+ char *archive_name;
+ Oid tblspc_oid;
+ pg_checksum_context *checksum_ctx;
+
+ /* Hold information for a member file verification */
+ manifest_file *mfile;
+ int64 received_bytes;
+ bool verify_checksum;
+ bool verify_control_data;
+} astreamer_verify;
+
+static void astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_verify_finalize(astreamer *streamer);
+static void astreamer_verify_free(astreamer *streamer);
+
+static void member_verify_header(astreamer *streamer, astreamer_member *member);
+static void member_compute_checksum(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void member_verify_checksum(astreamer *streamer);
+static void member_copy_control_data(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void member_verify_control_data(astreamer *streamer);
+static void member_reset_info(astreamer *streamer);
+
+static const astreamer_ops astreamer_verify_ops = {
+ .content = astreamer_verify_content,
+ .finalize = astreamer_verify_finalize,
+ .free = astreamer_verify_free
+};
+
+/*
+ * Create an astreamer that verifies the contents of a TAR file.
+ */
+astreamer *
+astreamer_verify_content_new(astreamer *next, verifier_context *context,
+ char *archive_name, Oid tblspc_oid)
+{
+ astreamer_verify *streamer;
+
+ streamer = palloc0(sizeof(astreamer_verify));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_verify_ops;
+
+ streamer->base.bbs_next = next;
+ streamer->context = context;
+ streamer->archive_name = archive_name;
+ streamer->tblspc_oid = tblspc_oid;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ if (!context->skip_checksums)
+ streamer->checksum_ctx = pg_malloc(sizeof(pg_checksum_context));
+
+ return &streamer->base;
+}
+
+/*
+ * The main entry point of the archive streamer for verifying tar members.
+ */
+static void
+astreamer_verify_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+
+ /*
+ * Perform the initial check and setup verification steps.
+ */
+ member_verify_header(streamer, member);
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Since we are receiving the member content in chunks, it must be
+ * processed according to the flags set by the member header
+ * processing routine. This includes performing incremental
+ * checksum computations and copying control data to the local
+ * buffer.
+ */
+ if (mystreamer->verify_checksum)
+ member_compute_checksum(streamer, member, data, len);
+
+ if (mystreamer->verify_control_data)
+ member_copy_control_data(streamer, member, data, len);
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+
+ /*
+ * We have reached the end of the member file. By this point, we
+ * should have successfully computed the checksum of the received
+ * content and copied the entire pg_control file data into our
+ * local buffer. We can now proceed with the final verification.
+ */
+ if (mystreamer->verify_checksum)
+ member_verify_checksum(streamer);
+
+ if (mystreamer->verify_control_data)
+ member_verify_control_data(streamer);
+
+ /*
+ * Reset the temporary information stored for the verification.
+ */
+ member_reset_info(streamer);
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar file");
+ }
+}
+
+/*
+ * End-of-stream processing for a astreamer_verify stream.
+ */
+static void
+astreamer_verify_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with a astreamer_verify stream.
+ */
+static void
+astreamer_verify_free(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ if (mystreamer->checksum_ctx)
+ pfree(mystreamer->checksum_ctx);
+
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+
+/*
+ * Verifies whether the tar member entry exists in the backup manifest.
+ *
+ * If the archive being processed is a tablespace, it prepares the necessary
+ * file path first. If a valid entry is found in the backup manifest, it then
+ * determines whether checksum and control data verification should be
+ * performed during file content processing.
+ */
+static void
+member_verify_header(astreamer *streamer, astreamer_member *member)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_file *m;
+
+ /* We are only interested in files that are not in the ignore list. */
+ if (member->is_directory || member->is_link ||
+ should_ignore_relpath(mystreamer->context, member->pathname))
+ return;
+
+ /*
+ * The backup manifest stores a relative path to the base directory for
+ * files belonging to a tablespace, while the tablespace backup tar
+ * archive does not include this path. Ensure the required path is
+ * prepared; otherwise, the manifest entry verification will fail.
+ */
+ if (OidIsValid(mystreamer->tblspc_oid))
+ {
+ char temp[MAXPGPATH];
+
+ /* Copy original name at temporary space */
+ memcpy(temp, member->pathname, MAXPGPATH);
+
+ snprintf(member->pathname, MAXPGPATH, "%s/%d/%s",
+ "pg_tblspc", mystreamer->tblspc_oid, temp);
+ }
+
+ /* Check the manifest entry */
+ m = verify_manifest_entry(mystreamer->context, member->pathname,
+ member->size);
+ mystreamer->mfile = (void *) m;
+
+ /*
+ * Prepare for checksum and control data verification.
+ *
+ * We could have these checks while receiving contents. However, since
+ * contents are received in multiple iterations, this would result in
+ * these lengthy checks being performed multiple times. Instead, setting
+ * flags here and using them before proceeding with verification will be
+ * more efficient.
+ */
+ mystreamer->verify_checksum =
+ (!mystreamer->context->skip_checksums && should_verify_checksum(m));
+ mystreamer->verify_control_data =
+ should_verify_control_data(mystreamer->context->manifest, m);
+
+ /* Initialize the context required for checksum verification. */
+ if (mystreamer->verify_checksum &&
+ pg_checksum_init(mystreamer->checksum_ctx, m->checksum_type) < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "%s: could not initialize checksum of file \"%s\"",
+ mystreamer->archive_name, m->pathname);
+
+ /*
+ * Checksum verification cannot be performed without proper context
+ * initialization.
+ */
+ mystreamer->verify_checksum = false;
+ }
+}
+
+/*
+ * Computes the checksum incrementally for the received file content.
+ *
+ * Should have a correctly initialized checksum_ctx, which will be used for
+ * incremental checksum computation.
+ */
+static void
+member_compute_checksum(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ pg_checksum_context *checksum_ctx = mystreamer->checksum_ctx;
+ manifest_file *m = mystreamer->mfile;
+
+ Assert(mystreamer->verify_checksum);
+
+ /* Should have come here for the right file */
+ Assert(strcmp(member->pathname, m->pathname) == 0);
+
+ /*
+ * The checksum context should match the type noted in the backup
+ * manifest.
+ */
+ Assert(checksum_ctx->type == m->checksum_type);
+
+ /*
+ * Update the total count of computed checksum bytes for cross-checking
+ * with the file size in the final verification stage.
+ */
+ mystreamer->received_bytes += len;
+
+ if (pg_checksum_update(checksum_ctx, (uint8 *) data, len) < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "could not update checksum of file \"%s\"",
+ m->pathname);
+ mystreamer->verify_checksum = false;
+ }
+}
+
+/*
+ * Perform the final computation and checksum verification after the entire
+ * file content has been processed.
+ */
+static void
+member_verify_checksum(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ Assert(mystreamer->verify_checksum);
+
+ verify_checksum(mystreamer->context, mystreamer->mfile,
+ mystreamer->checksum_ctx, mystreamer->received_bytes);
+}
+
+/*
+ * Stores the pg_control file contents into a local buffer; we need the entire
+ * control file data for verification.
+ */
+static void
+member_copy_control_data(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ /* Should be here only for control file */
+ Assert(strcmp(member->pathname, "global/pg_control") == 0);
+ Assert(((astreamer_verify *) streamer)->verify_control_data);
+
+ /* Copy enough control file data needed for verification. */
+ astreamer_buffer_until(streamer, &data, &len, sizeof(ControlFileData));
+}
+
+/*
+ * Performs the CRC calculation of pg_control data and then calls the routines
+ * that execute the final verification of the control file information.
+ */
+static void
+member_verify_control_data(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_data *manifest = mystreamer->context->manifest;
+ ControlFileData control_file;
+ pg_crc32c crc;
+ bool crc_ok;
+
+ /* Should be here only for control file */
+ Assert(strcmp(mystreamer->mfile->pathname, "global/pg_control") == 0);
+ Assert(mystreamer->verify_control_data);
+
+ /* Should have enough control file data needed for verification. */
+ if (streamer->bbs_buffer.len != sizeof(ControlFileData))
+ report_fatal_error("%s: unexpected control file size: %d, should be %zu",
+ mystreamer->archive_name, streamer->bbs_buffer.len,
+ sizeof(ControlFileData));
+
+ memcpy(&control_file, streamer->bbs_buffer.data, sizeof(ControlFileData));
+
+ /* Check the CRC. */
+ INIT_CRC32C(crc);
+ COMP_CRC32C(crc, (char *) (&control_file), offsetof(ControlFileData, crc));
+ FIN_CRC32C(crc);
+
+ crc_ok = EQ_CRC32C(crc, control_file.crc);
+
+ /* Do the final control data verification. */
+ verify_control_data(&control_file, mystreamer->mfile->pathname, crc_ok,
+ manifest->system_identifier);
+}
+
+/*
+ * Reset flags and free memory allocations for member file verification.
+ */
+static void
+member_reset_info(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ mystreamer->mfile = NULL;
+ mystreamer->received_bytes = 0;
+ mystreamer->verify_checksum = false;
+ mystreamer->verify_control_data = false;
+}
diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build
index 7c7d31a0350..0e09d1379d1 100644
--- a/src/bin/pg_verifybackup/meson.build
+++ b/src/bin/pg_verifybackup/meson.build
@@ -1,6 +1,7 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
pg_verifybackup_sources = files(
+ 'astreamer_verify.c',
'pg_verifybackup.c'
)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index c186678a592..80b1a639803 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -22,6 +22,7 @@
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
+#include "limits.h"
#include "pg_verifybackup.h"
#include "pgtime.h"
@@ -44,6 +45,16 @@
*/
#define READ_CHUNK_SIZE (128 * 1024)
+/*
+ * Tar file information needed for content verification.
+ */
+typedef struct tar_file
+{
+ char *relpath;
+ Oid tblspc_oid;
+ pg_compress_algorithm compress_algorithm;
+} tar_file;
+
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -65,8 +76,14 @@ static void report_manifest_error(JsonManifestParseContext *context,
static char find_backup_format(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_plain_backup_file(verifier_context *context,
- char *relpath, char *fullpath);
+static void verify_plain_backup_file(verifier_context *context, char *relpath,
+ char *fullpath);
+static void verify_tar_backup_file(verifier_context *context, char *relpath,
+ char *fullpath, SimplePtrList *tarfiles);
+static void verify_tar_file_contents(verifier_context *context,
+ SimplePtrList *tarfiles);
+static void verify_tar_contents(verifier_context *context, char *relpath,
+ char *fullpath, astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -75,6 +92,10 @@ static void verify_file_checksum(verifier_context *context,
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
+static astreamer *create_archive_verifier(verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid,
+ pg_compress_algorithm compress_algo);
static void progress_report(bool finished);
static void usage(void);
@@ -243,11 +264,11 @@ main(int argc, char **argv)
/*
* XXX: In the future, we should consider enhancing pg_waldump to read
- * WAL files from the tar archive.
+ * WAL files from the tar file.
*/
if (context.format != 'p')
{
- pg_log_error("pg_waldump cannot read from a tar archive");
+ pg_log_error("pg_waldump cannot read from a tar");
pg_log_error_hint("You must use -n or --no-parse-wal when verifying a tar-format backup.");
exit(1);
}
@@ -561,6 +582,7 @@ verify_backup_directory(verifier_context *context, char *relpath,
{
DIR *dir;
struct dirent *dirent;
+ SimplePtrList tarfiles = {NULL, NULL};
dir = opendir(fullpath);
if (dir == NULL)
@@ -600,12 +622,23 @@ verify_backup_directory(verifier_context *context, char *relpath,
newrelpath = psprintf("%s/%s", relpath, filename);
if (!should_ignore_relpath(context, newrelpath))
- verify_plain_backup_file(context, newrelpath, newfullpath);
+ {
+ if (context->format == 'p')
+ verify_plain_backup_file(context, newrelpath, newfullpath);
+ else
+ verify_tar_backup_file(context, newrelpath, newfullpath,
+ &tarfiles);
+ }
pfree(newfullpath);
pfree(newrelpath);
}
+ /* Perform the final verification of the tar contents, if any. */
+ Assert(tarfiles.head == NULL || context->format == 't');
+ if (tarfiles.head != NULL)
+ verify_tar_file_contents(context, &tarfiles);
+
if (closedir(dir))
{
report_backup_error(context,
@@ -682,6 +715,215 @@ verify_plain_backup_file(verifier_context *context, char *relpath,
total_size += m->size;
}
+/*
+ * Verify one tar file.
+ *
+ * This function does not perform a complete verification; it only carries out
+ * basic validation of the tar format backup file, detects the compression
+ * type, and appends that information to the tarfiles list. An error will be
+ * reported if the tar file is inaccessible, or if the file type, name, or
+ * compression type is not as expected.
+ *
+ * The arguments to this function are mostly the same as the
+ * verify_plain_backup_file. The additional argument outputs a list of valid
+ * tar files.
+ */
+static void
+verify_tar_backup_file(verifier_context *context, char *relpath,
+ char *fullpath, SimplePtrList *tarfiles)
+{
+ struct stat sb;
+ Oid tblspc_oid = InvalidOid;
+ pg_compress_algorithm compress_algorithm;
+ tar_file *tar;
+ char *suffix = NULL;
+
+ /* Should be tar format backup */
+ Assert(context->format == 't');
+
+ /* Get file information */
+ if (stat(fullpath, &sb) != 0)
+ {
+ report_backup_error(context,
+ "could not stat file or directory \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ /* In a tar format backup, we expect only plain files. */
+ if (!S_ISREG(sb.st_mode))
+ {
+ report_backup_error(context,
+ "\"%s\" is not a plain file",
+ relpath);
+ return;
+ }
+
+ /*
+ * We expect tar files for backing up the main directory, tablespace, and
+ * pg_wal directory.
+ *
+ * pg_basebackup writes the main data directory to an archive file named
+ * base.tar, the pg_wal directory to pg_wal.tar, and the tablespace
+ * directory to <tablespaceoid>.tar, each followed by a compression type
+ * extension such as .gz, .lz4, or .zst.
+ */
+ if (strncmp("base", relpath, 4) == 0)
+ suffix = relpath + 4;
+ else if (strncmp("pg_wal", relpath, 6) == 0)
+ suffix = relpath + 6;
+ else
+ {
+ /* Expected a <tablespaceoid>.tar file here. */
+ uint64 num = strtoul(relpath, &suffix, 10);
+
+ /*
+ * Report an error if we didn't consume at least one character, if the
+ * result is 0, or if the value is too large to be a valid OID.
+ */
+ if (suffix == NULL || num <= 0 || num > OID_MAX)
+ report_backup_error(context,
+ "\"%s\" unexpected file in the tar format backup",
+ relpath);
+ tblspc_oid = (Oid) num;
+ }
+
+ /* Now, check the compression type of the tar */
+ if (strcmp(suffix, ".tar") == 0)
+ compress_algorithm = PG_COMPRESSION_NONE;
+ else if (strcmp(suffix, ".tgz") == 0)
+ compress_algorithm = PG_COMPRESSION_GZIP;
+ else if (strcmp(suffix, ".tar.gz") == 0)
+ compress_algorithm = PG_COMPRESSION_GZIP;
+ else if (strcmp(suffix, ".tar.lz4") == 0)
+ compress_algorithm = PG_COMPRESSION_LZ4;
+ else if (strcmp(suffix, ".tar.zst") == 0)
+ compress_algorithm = PG_COMPRESSION_ZSTD;
+ else
+ {
+ report_backup_error(context,
+ "\"%s\" unexpected file in the tar format backup",
+ relpath);
+ return;
+ }
+
+ /*
+ * Ignore WALs, as reading and verification will be handled through
+ * pg_waldump.
+ */
+ if (strncmp("pg_wal", relpath, 6) == 0)
+ return;
+
+ /*
+ * Append the information to the list for complete verification at a later
+ * stage.
+ */
+ tar = pg_malloc(sizeof(tar_file));
+ tar->relpath = pstrdup(relpath);
+ tar->tblspc_oid = tblspc_oid;
+ tar->compress_algorithm = compress_algorithm;
+
+ simple_ptr_list_append(tarfiles, tar);
+
+ /* Update statistics for progress report, if necessary */
+ if (show_progress)
+ total_size += sb.st_size;
+}
+
+/*
+ * This is the final part of tar file verification, which prepares the
+ * archive streamer stack according to the tar compression format for
+ * each tar file and invokes them for reading, decompressing, and ultimately
+ * verifying the contents.
+ *
+ * The arguments to this function should be a list of valid tar files to
+ * verify, and the allocation will be freed once the verification is complete.
+ */
+static void
+verify_tar_file_contents(verifier_context *context, SimplePtrList *tarfiles)
+{
+ SimplePtrListCell *cell;
+
+ progress_report(false);
+
+ for (cell = tarfiles->head; cell != NULL; cell = cell->next)
+ {
+ tar_file *tar = (tar_file *) cell->ptr;
+ astreamer *streamer;
+ char *fullpath;
+
+ /* Prepare archive streamer stack */
+ streamer = create_archive_verifier(context,
+ tar->relpath,
+ tar->tblspc_oid,
+ tar->compress_algorithm);
+
+ /* Compute the full pathname to the target file. */
+ fullpath = psprintf("%s/%s", context->backup_directory,
+ tar->relpath);
+
+ /* Invoke the streamer for reading, decompressing, and verifying. */
+ verify_tar_contents(context, tar->relpath, fullpath, streamer);
+
+ /* Cleanup. */
+ pfree(tar->relpath);
+ pfree(tar);
+ pfree(fullpath);
+
+ astreamer_finalize(streamer);
+ astreamer_free(streamer);
+ }
+ simple_ptr_list_destroy(tarfiles);
+
+ progress_report(true);
+}
+
+/*
+ * Performs the actual work for tar content verification. It reads a given tar
+ * archive in predefined chunks and passes it to the streamer, which initiates
+ * routines for decompression (if necessary) and then verifies each member
+ * within the tar file.
+ */
+static void
+verify_tar_contents(verifier_context *context, char *relpath, char *fullpath,
+ astreamer *streamer)
+{
+ int fd;
+ int rc;
+ char *buffer;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+
+ /* Open the target file. */
+ if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
+ {
+ report_backup_error(context, "could not open file \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+ /* Perform the reads */
+ while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
+ {
+ astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);
+
+ /* Report progress */
+ done_size += rc;
+ progress_report(false);
+ }
+
+ if (rc < 0)
+ report_backup_error(context, "could not read file \"%s\": %m",
+ relpath);
+
+ /* Close the file. */
+ if (close(fd) != 0)
+ report_backup_error(context, "could not close file \"%s\": %m",
+ relpath);
+}
+
/*
* Verify file and its size entry in the manifest.
*/
@@ -1049,6 +1291,41 @@ find_backup_format(verifier_context *context)
return result;
}
+/*
+ * Identifies the necessary steps for verifying the contents of the
+ * provided tar file.
+ */
+static astreamer *
+create_archive_verifier(verifier_context *context, char *archive_name,
+ Oid tblspc_oid, pg_compress_algorithm compress_algo)
+{
+ astreamer *streamer = NULL;
+
+ /* Should be here only for tar backup */
+ Assert(context->format == 't');
+
+ /*
+ * To verify the contents of the tar, the initial step is to parse its
+ * content.
+ */
+ streamer = astreamer_verify_content_new(streamer, context, archive_name,
+ tblspc_oid);
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /*
+ * If the tar is compressed, we must perform the appropriate decompression
+ * operation before proceeding with the verification of its contents.
+ */
+ if (compress_algo == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compress_algo == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compress_algo == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ return streamer;
+}
+
/*
* Print a progress report based on the global variables.
*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 80031ad4dbc..be7438af346 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -18,6 +18,7 @@
#include "common/hashfn_unstable.h"
#include "common/logging.h"
#include "common/parse_manifest.h"
+#include "fe_utils/astreamer.h"
#include "fe_utils/simple_list.h"
/*
@@ -123,4 +124,9 @@ extern void report_fatal_error(const char *pg_restrict fmt,...)
extern bool should_ignore_relpath(verifier_context *context,
const char *relpath);
+extern astreamer *astreamer_verify_content_new(astreamer *next,
+ verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
+
#endif /* PG_VERIFYBACKUP_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6d424c89186..48f37131e6a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3330,6 +3330,7 @@ astreamer_plain_writer
astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
+astreamer_verify
astreamer_zstd_frame
bgworker_main_type
bh_node_type
@@ -3951,6 +3952,7 @@ substitute_phv_relids_context
subxids_array_status
symbol
tablespaceinfo
+tar_file
td_entry
teSection
temp_tablespaces_extra
--
2.18.0
v12-0012-pg_verifybackup-Tests-and-document.patch
From 00fc4d6018fb6f8c3826bf18144d578fd51a66e8 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 21 Aug 2024 16:04:37 +0530
Subject: [PATCH v12 12/12] pg_verifybackup: Tests and document
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the previous patch that implements tar format support.
----
---
doc/src/sgml/ref/pg_verifybackup.sgml | 43 ++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.c | 6 +-
src/bin/pg_verifybackup/t/001_basic.pl | 6 +-
src/bin/pg_verifybackup/t/004_options.pl | 2 +-
src/bin/pg_verifybackup/t/005_bad_manifest.pl | 2 +-
src/bin/pg_verifybackup/t/008_untar.pl | 73 ++++++-------------
src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +-----------
7 files changed, 76 insertions(+), 104 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index a3f167f9f6e..ea6bc3ccb12 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -34,8 +34,10 @@ PostgreSQL documentation
integrity of a database cluster backup taken using
<command>pg_basebackup</command> against a
<literal>backup_manifest</literal> generated by the server at the time
- of the backup. The backup must be stored in the "plain"
- format; a "tar" format backup can be checked after extracting it.
+ of the backup. The backup must be stored in the "plain" or "tar"
+ format. Verification is supported for <literal>gzip</literal>,
+ <literal>lz4</literal>, and <literal>zstd</literal> compressed tar backups;
+ any other compressed format backups can be checked after decompressing them.
</para>
<para>
@@ -168,6 +170,43 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-F <replaceable class="parameter">format</replaceable></option></term>
+ <term><option>--format=<replaceable class="parameter">format</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the format of the backup. <replaceable>format</replaceable>
+ can be one of the following:
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>p</literal></term>
+ <term><literal>plain</literal></term>
+ <listitem>
+ <para>
+ Backup consists of plain files with the same layout as the
+ source server's data directory and tablespaces.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>t</literal></term>
+ <term><literal>tar</literal></term>
+ <listitem>
+ <para>
+ Backup consists of tar files. A valid backup includes the main data
+ directory in a file named <filename>base.tar</filename>, the WAL
+ files in <filename>pg_wal.tar</filename>, and separate tar files for
+ each tablespace, named after the tablespace's OID, followed by the
+ compression extension.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist></para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n</option></term>
<term><option>--no-parse-wal</option></term>
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 80b1a639803..8302f022b22 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -880,9 +880,9 @@ verify_tar_file_contents(verifier_context *context, SimplePtrList *tarfiles)
/*
* Performs the actual work for tar content verification. It reads a given tar
- * archive in predefined chunks and passes it to the streamer, which initiates
- * routines for decompression (if necessary) and then verifies each member
- * within the tar file.
+ * file in predefined chunks and passes it to the archive streamer, which
+ * initiates routines for decompression (if necessary) and then verifies each
+ * member within the tar file.
*/
static void
verify_tar_contents(verifier_context *context, char *relpath, char *fullpath,
diff --git a/src/bin/pg_verifybackup/t/001_basic.pl b/src/bin/pg_verifybackup/t/001_basic.pl
index 2f3e52d296f..ca5b0402b7d 100644
--- a/src/bin/pg_verifybackup/t/001_basic.pl
+++ b/src/bin/pg_verifybackup/t/001_basic.pl
@@ -17,11 +17,11 @@ command_fails_like(
qr/no backup directory specified/,
'target directory must be specified');
command_fails_like(
- [ 'pg_verifybackup', $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir ],
qr/could not open file.*\/backup_manifest\"/,
'pg_verifybackup requires a manifest');
command_fails_like(
- [ 'pg_verifybackup', $tempdir, $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir, $tempdir ],
qr/too many command-line arguments/,
'multiple target directories not allowed');
@@ -31,7 +31,7 @@ close($fh);
# but then try to use an alternate, nonexisting manifest
command_fails_like(
- [ 'pg_verifybackup', '-m', "$tempdir/not_the_manifest", $tempdir ],
+ [ 'pg_verifybackup', '-Fp', '-m', "$tempdir/not_the_manifest", $tempdir ],
qr/could not open file.*\/not_the_manifest\"/,
'pg_verifybackup respects -m flag');
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 8ed2214408e..2f197648740 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -108,7 +108,7 @@ unlike(
# Test valid manifest with nonexistent backup directory.
command_fails_like(
[
- 'pg_verifybackup', '-m',
+ 'pg_verifybackup', '-Fp', '-m',
"$backup_path/backup_manifest", "$backup_path/fake"
],
qr/could not open directory/,
diff --git a/src/bin/pg_verifybackup/t/005_bad_manifest.pl b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
index c4ed64b62d5..28c51b6feb0 100644
--- a/src/bin/pg_verifybackup/t/005_bad_manifest.pl
+++ b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
@@ -208,7 +208,7 @@ sub test_bad_manifest
print $fh $manifest_contents;
close($fh);
- command_fails_like([ 'pg_verifybackup', $tempdir ], $regexp, $test_name);
+ command_fails_like([ 'pg_verifybackup', '-Fp', $tempdir ], $regexp, $test_name);
return;
}
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 7a09f3b75b2..1c83f38d5b5 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -16,6 +16,20 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
+# Create a tablespace directory.
+my $TS1_LOCATION = $primary->backup_dir .'/ts1';
+$TS1_LOCATION =~ s/\/\.\//\//g; # collapse foo/./bar to foo/bar
+mkdir($TS1_LOCATION);
+
+# Create a tablespace with table in it.
+$primary->safe_psql('postgres', qq(
+ CREATE TABLESPACE regress_ts1 LOCATION '$TS1_LOCATION';
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1';
+ CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1;
+ INSERT INTO regress_tbl1 VALUES(generate_series(1,5));));
+my $tsoid = $primary->safe_psql('postgres', qq(
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
+
my $backup_path = $primary->backup_dir . '/server-backup';
my $extract_path = $primary->backup_dir . '/extracted-backup';
@@ -23,39 +37,31 @@ my @test_configuration = (
{
'compression_method' => 'none',
'backup_flags' => [],
- 'backup_archive' => 'base.tar',
+ 'backup_archive' => ['base.tar', "$tsoid.tar"],
'enabled' => 1
},
{
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'server-gzip' ],
- 'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'server-lz4' ],
- 'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => [ '-d', '-m' ],
+ 'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ],
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
});
@@ -86,47 +92,16 @@ for my $tc (@test_configuration)
my $backup_files = join(',',
sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
my $expected_backup_files =
- join(',', sort ('backup_manifest', $tc->{'backup_archive'}));
+ join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} }));
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path],
+ "verify backup, compression $method");
# Cleanup.
- unlink($backup_path . '/backup_manifest');
- unlink($backup_path . '/base.tar');
- unlink($backup_path . '/' . $tc->{'backup_archive'});
+ rmtree($backup_path);
rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 8c076d46dee..6b7d7483f6e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -29,41 +29,30 @@ my @test_configuration = (
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'client-gzip:5' ],
'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'client-lz4:5' ],
'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => ['-d'],
- 'output_file' => 'base.tar',
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:5' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'parallel zstd',
'backup_flags' => [ '--compress', 'client-zstd:workers=3' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1"),
'possibly_unsupported' =>
qr/could not set compression worker count to 3: Unsupported parameter/
@@ -118,40 +107,9 @@ for my $tc (@test_configuration)
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- push @decompress, $backup_path . '/' . $tc->{'output_file'}
- if $tc->{'output_file'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ],
+ "verify backup, compression $method");
# Cleanup.
rmtree($extract_path);
--
2.18.0
On Wed, Aug 21, 2024 at 7:08 AM Amul Sul <sulamul@gmail.com> wrote:
I have reworked a few comments, revised error messages, and made some
minor tweaks in the attached version.
Thanks.
Additionally, I would like to discuss a couple of concerns regarding
error placement and function names to gather your suggestions.
0007 patch: Regarding error placement:
1. I'm a bit unsure about the (bytes_read != m->size) check that I
placed in verify_checksum() and whether it's in the right place. Per
our previous discussion, this check is applicable to plain backup
files since they can change while being read, but not for files
belonging to tar backups. For consistency, I included the check for
tar backups as well, as it doesn't cause any harm. Is it okay to keep
this check in verify_checksum(), or should I move it back to
verify_file_checksum() and apply it only to the plain backup format?
I think it's a good sanity check. For a long time I thought it was
triggerable until I eventually realized that you just get this message
if the file size is wrong:
pg_verifybackup: error: "pg_xact/0000" has size 8203 on disk but size
8192 in the manifest
After realizing that, I agree with you that this doesn't really seem
reachable for tar backups, but I don't think it hurts anything either.
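For context, here is a minimal sketch of the kind of size cross-check being
discussed; it assumes the verifier_context/manifest_file types and
report_backup_error() from pg_verifybackup.h, and the helper name and message
text are illustrative rather than the patch's exact code:
#include "postgres_fe.h"
#include "pg_verifybackup.h"
/*
 * Illustrative only: after a checksum has been computed over bytes_read
 * bytes, cross-check that count against the size recorded in the manifest.
 * A plain-format file can change while it is being read, so a mismatch is
 * possible there; for a tar member the length comes from the tar header,
 * so this should not normally fire.
 */
static bool
checksum_length_matches(verifier_context *context, manifest_file *m,
                        uint64 bytes_read)
{
    if (bytes_read != m->size)
    {
        report_backup_error(context,
                            "file \"%s\" should contain %llu bytes, but read %llu bytes",
                            m->pathname,
                            (unsigned long long) m->size,
                            (unsigned long long) bytes_read);
        return false;
    }
    return true;
}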
While I was investigating this question, I discovered this problem:
$ pg_basebackup -cfast -Ft -Dx
$ pg_verifybackup -n x
backup successfully verified
$ mkdir x/tmpdir
$ tar -C x/tmpdir -xf x/base.tar
$ rm x/base.tar
$ tar -C x/tmpdir -cf x/base.tar .
$ pg_verifybackup -n x
<lots of errors>
It appears that the reason why this fails is that the paths in the
original base.tar from the server do not include "./" at the
beginning, and the ones that I get when I create my own tarfile have
that. But that shouldn't matter. Anyway, I was able to work around it
like this:
$ tar -C x/tmpdir -cf x/base.tar `(cd x/tmpdir; echo *)`
Then the result verifies. But I feel like we should have some test
cases that do this kind of stuff so that there is automated
verification. In fact, the current patch seems to have no negative
test cases at all. I think we should test all the cases in
003_corruption.pl with tar format backups as well as with plain format
backups, which we could do by untarring one of the archives, messing
something up, and then retarring it. I also think we should have some
negative test case specific to tar-format backup. I suggest running a
coverage analysis and trying to craft test cases that hit as much of
the code as possible. There will probably be some errors you can't
hit, but you should try to hit the ones you can.
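To make the "./" prefix issue above concrete, here is a tiny, hypothetical
helper showing the kind of canonicalization that would let the re-created
base.tar verify; none of the posted patches do this, it only illustrates why
the leading "./" breaks the manifest lookup:
/*
 * Hypothetical helper, not part of the posted patches: strip a leading
 * "./" from a tar member pathname so that an entry stored as
 * "./global/pg_control" matches the manifest entry "global/pg_control".
 */
static const char *
canonicalize_member_pathname(const char *pathname)
{
    while (pathname[0] == '.' && pathname[1] == '/')
        pathname += 2;
    return pathname;
}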
2. For the verify_checksum() function, I kept the argument name as
bytes_read. Should we rename it to something more meaningful like
computed_bytes, computed_size, or checksum_computed_size?
I think it's fine the way you have it.
0011 patch: Regarding function names:
1. named the function verify_tar_backup_file() to align with
verify_plain_backup_file(), but it does not perform the complete
verification as verify_plain_backup_file does. Not sure if it is the
right name.
I was thinking of something like precheck_tar_backup_file().
2. verify_tar_file_contents() is the second and final part of tar
backup verification. Should its name be aligned with
verify_tar_backup_file()? I’m unsure what the best name would be.
Perhaps verify_tar_backup_file_final(), but then
verify_tar_backup_file() would need to be renamed to something like
verify_tar_backup_file_initial(), which might be too lengthy.
verify_tar_file_contents() actually verifies the contents of all the
tar files, not just one, so the name is a bit confusing. Maybe
verify_all_tar_files().
3. verify_tar_contents() is the core of verify_tar_file_contents()
that handles the actual verification. I’m unsure about the current
naming. Should we rename it to something like
verify_tar_contents_core()? It wouldn’t be an issue if we renamed
verify_tar_file_contents() as pointed in point #2.
verify_one_tar_file()?
But with those renames, I think you really start to see why I'm not
very comfortable with verify_backup_directory(). The tar and plain
format cases aren't really doing the same thing. We're just gluing
them into a single function anyway.
I am also still uncomfortable with the way you've refactored some of
this so that we end up with very small amounts of code far away from
other code that they influence. Like you end up with this:
/* Check the backup manifest entry for this file. */
m = verify_manifest_entry(context, relpath, sb.st_size);
/* Validate the pg_control information */
if (should_verify_control_data(context->manifest, m))
...
if (show_progress && !context->skip_checksums &&
should_verify_checksum(m))
But verify_manifest_entry can return NULL or it can set m->bad and
either of those change the result of should_verify_control_data() and
should_verify_checksum(), but none of that is obvious when you just
look at this. Admittedly, the code in master isn't brilliant in terms
of making it obvious what's happening either, but I think this is
worse. I'm not really sure what I think we should do about that yet,
but I'm uncomfortable with it.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, Aug 24, 2024 at 2:02 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 21, 2024 at 7:08 AM Amul Sul <sulamul@gmail.com> wrote:
[....]
Then the result verifies. But I feel like we should have some test
cases that do this kind of stuff so that there is automated
verification. In fact, the current patch seems to have no negative
test cases at all. I think we should test all the cases in
003_corruption.pl with tar format backups as well as with plain format
backups, which we could do by untarring one of the archives, messing
something up, and then retarring it. I also think we should have some
negative test case specific to tar-format backup. I suggest running a
coverage analysis and trying to craft test cases that hit as much of
the code as possible. There will probably be some errors you can't
hit, but you should try to hit the ones you can.
Done. I’ve added a few tests that extract, modify, and repack the tar
files, mainly base.tar, skipping tablespace.tar since it would mostly
duplicate those tests. I’ve also updated 002_algorithm.pl to cover
tests for tar backups.
0011 patch: Regarding function names:
1. named the function verify_tar_backup_file() to align with
verify_plain_backup_file(), but it does not perform the complete
verification as verify_plain_backup_file does. Not sure if it is the
right name.
I was thinking of something like precheck_tar_backup_file().
Done.
2. verify_tar_file_contents() is the second and final part of tar
backup verification. Should its name be aligned with
verify_tar_backup_file()? I’m unsure what the best name would be.
Perhaps verify_tar_backup_file_final(), but then
verify_tar_backup_file() would need to be renamed to something like
verify_tar_backup_file_initial(), which might be too lengthy.
verify_tar_file_contents() actually verifies the contents of all the
tar files, not just one, so the name is a bit confusing. Maybe
verify_all_tar_files().
Done.
3. verify_tar_contents() is the core of verify_tar_file_contents()
that handles the actual verification. I’m unsure about the current
naming. Should we rename it to something like
verify_tar_contents_core()? It wouldn’t be an issue if we renamed
verify_tar_file_contents() as pointed out in point #2.
verify_one_tar_file()?
Done.
But with those renames, I think you really start to see why I'm not
very comfortable with verify_backup_directory(). The tar and plain
format cases aren't really doing the same thing. We're just gluing
them into a single function anyway.
Agreed. I can see the discomfort -- added a new function.
I am also still uncomfortable with the way you've refactored some of
this so that we end up with very small amounts of code far away from
other code that they influence. Like you end up with this:
/* Check the backup manifest entry for this file. */
m = verify_manifest_entry(context, relpath, sb.st_size);
/* Validate the pg_control information */
if (should_verify_control_data(context->manifest, m))
...
if (show_progress && !context->skip_checksums &&
should_verify_checksum(m))
But verify_manifest_entry can return NULL or it can set m->bad and
either of those change the result of should_verify_control_data() and
should_verify_checksum(), but none of that is obvious when you just
look at this. Admittedly, the code in master isn't brilliant in terms
of making it obvious what's happening either, but I think this is
worse. I'm not really sure what I think we should do about that yet,
but I'm uncomfortable with it.
I am not sure if I fully understand the concern, but I see it
differently. The verify_manifest_entry function returns an entry, m,
that the caller doesn't need to worry about, as it simply passes it to
subsequent routines or macros that are aware of the possible inputs --
whether it's NULL, m->bad, or something else.
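For instance, the helper checks are written so that a NULL or bad entry simply
makes them evaluate to false; roughly (these are simplified approximations,
not the exact definitions from pg_verifybackup.h):
/*
 * Rough approximations for illustration only; the real definitions are in
 * pg_verifybackup.h and may differ.  The point is that a NULL entry, or one
 * that verify_manifest_entry() flagged as bad, makes both checks evaluate
 * to false, so callers need not test those cases explicitly.
 */
#define should_verify_checksum(m) \
    ((m) != NULL && !(m)->bad && (m)->checksum_type != CHECKSUM_TYPE_NONE)
#define should_verify_control_data(manifest, m) \
    ((m) != NULL && !(m)->bad && (manifest)->system_identifier != 0 && \
     strcmp((m)->pathname, "global/pg_control") == 0)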
Regards,
Amul
Attachments:
v13-0009-Add-simple_ptr_list_destroy-and-simple_ptr_list_.patch
From b74a6bed49474c4cacdb1a7a3626004ae9ad7f13 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 8 Aug 2024 16:01:33 +0530
Subject: [PATCH v13 09/12] Add simple_ptr_list_destroy() and
simple_ptr_list_destroy_deep() API.
We didn't have any helper function to destroy SimplePtrList, likely
because it wasn't needed before, but it's required in a later patch in
this set. I've added two functions for this purpose, inspired by
list_free() and list_free_deep().
---
src/fe_utils/simple_list.c | 39 ++++++++++++++++++++++++++++++
src/include/fe_utils/simple_list.h | 2 ++
2 files changed, 41 insertions(+)
diff --git a/src/fe_utils/simple_list.c b/src/fe_utils/simple_list.c
index 2d88eb54067..9d218911c31 100644
--- a/src/fe_utils/simple_list.c
+++ b/src/fe_utils/simple_list.c
@@ -173,3 +173,42 @@ simple_ptr_list_append(SimplePtrList *list, void *ptr)
list->head = cell;
list->tail = cell;
}
+
+/*
+ * Destroy a pointer list and optionally the pointed-to element
+ */
+static void
+simple_ptr_list_destroy_private(SimplePtrList *list, bool deep)
+{
+ SimplePtrListCell *cell;
+
+ cell = list->head;
+ while (cell != NULL)
+ {
+ SimplePtrListCell *next;
+
+ next = cell->next;
+ if (deep)
+ pg_free(cell->ptr);
+ pg_free(cell);
+ cell = next;
+ }
+}
+
+/*
+ * Destroy a pointer list and the pointed-to element
+ */
+void
+simple_ptr_list_destroy_deep(SimplePtrList *list)
+{
+ simple_ptr_list_destroy_private(list, true);
+}
+
+/*
+ * Destroy only the pointer list and not the pointed-to element
+ */
+void
+simple_ptr_list_destroy(SimplePtrList *list)
+{
+ simple_ptr_list_destroy_private(list, false);
+}
diff --git a/src/include/fe_utils/simple_list.h b/src/include/fe_utils/simple_list.h
index d42ecded8ed..5b7cbec8a62 100644
--- a/src/include/fe_utils/simple_list.h
+++ b/src/include/fe_utils/simple_list.h
@@ -66,5 +66,7 @@ extern void simple_string_list_destroy(SimpleStringList *list);
extern const char *simple_string_list_not_touched(SimpleStringList *list);
extern void simple_ptr_list_append(SimplePtrList *list, void *ptr);
+extern void simple_ptr_list_destroy_deep(SimplePtrList *list);
+extern void simple_ptr_list_destroy(SimplePtrList *list);
#endif /* SIMPLE_LIST_H */
--
2.18.0
v13-0010-pg_verifybackup-Add-backup-format-and-compressio.patch
From b4fd85d44473e4fb21eee92f06a547f8518ca203 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 21 Aug 2024 11:03:44 +0530
Subject: [PATCH v13 10/12] pg_verifybackup: Add backup format and compression
option
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the next patch that implements tar format support.
----
---
src/bin/pg_verifybackup/pg_verifybackup.c | 73 ++++++++++++++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 1 +
2 files changed, 72 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d04e1d8c8ac..bcae2d2990f 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -62,6 +62,7 @@ static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3) pg_attribute_noreturn();
+static char find_backup_format(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_plain_backup_file(verifier_context *context,
@@ -97,6 +98,7 @@ main(int argc, char **argv)
{"exit-on-error", no_argument, NULL, 'e'},
{"ignore", required_argument, NULL, 'i'},
{"manifest-path", required_argument, NULL, 'm'},
+ {"format", required_argument, NULL, 'F'},
{"no-parse-wal", no_argument, NULL, 'n'},
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
@@ -118,6 +120,7 @@ main(int argc, char **argv)
progname = get_progname(argv[0]);
memset(&context, 0, sizeof(context));
+ context.format = '\0';
if (argc > 1)
{
@@ -154,7 +157,7 @@ main(int argc, char **argv)
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
- while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1)
+ while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:", long_options, NULL)) != -1)
{
switch (c)
{
@@ -173,6 +176,15 @@ main(int argc, char **argv)
manifest_path = pstrdup(optarg);
canonicalize_path(manifest_path);
break;
+ case 'F':
+ if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
+ context.format = 'p';
+ else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
+ context.format = 't';
+ else
+ pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"",
+ optarg);
+ break;
case 'n':
no_parse_wal = true;
break;
@@ -220,11 +232,26 @@ main(int argc, char **argv)
pg_fatal("cannot specify both %s and %s",
"-P/--progress", "-q/--quiet");
+ /* Determine the backup format if it hasn't been specified. */
+ if (context.format == '\0')
+ context.format = find_backup_format(&context);
+
/* Unless --no-parse-wal was specified, we will need pg_waldump. */
if (!no_parse_wal)
{
int ret;
+ /*
+ * XXX: In the future, we should consider enhancing pg_waldump to read
+ * WAL files from the tar file.
+ */
+ if (context.format != 'p')
+ {
+ pg_log_error("pg_waldump cannot read from a tar");
+ pg_log_error_hint("You must use -n or --no-parse-wal when verifying a tar-format backup.");
+ exit(1);
+ }
+
pg_waldump_path = pg_malloc(MAXPGPATH);
ret = find_other_exec(argv[0], "pg_waldump",
"pg_waldump (PostgreSQL) " PG_VERSION "\n",
@@ -279,8 +306,13 @@ main(int argc, char **argv)
/*
* Now do the expensive work of verifying file checksums, unless we were
* told to skip it.
+ *
+ * We were only checking the plain backup here. For the tar backup, file
+ * checksums verification (if requested) will be done immediately when the
+ * file is accessed, as we don't have random access to the files like we
+ * do with plain backups.
*/
- if (!context.skip_checksums)
+ if (!context.skip_checksums && context.format == 'p')
verify_backup_checksums(&context);
/*
@@ -981,6 +1013,42 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
return false;
}
+/*
+ * To detect the backup format, it checks for the PG_VERSION file in the backup
+ * directory. If found, it will be considered a plain-format backup; otherwise,
+ * it will be assumed to be a tar-format backup.
+ */
+static char
+find_backup_format(verifier_context *context)
+{
+ char result;
+ char *path;
+ struct stat sb;
+
+ /* Should be here only if the backup format is unknown */
+ Assert(context->format == '\0');
+
+ /* Check PG_VERSION file. */
+ path = psprintf("%s/%s", context->backup_directory, "PG_VERSION");
+ if (stat(path, &sb) == 0)
+ result = 'p';
+ else
+ {
+ if (errno != ENOENT)
+ {
+ pg_log_error("cannot determine backup format : %m");
+ pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+ exit(1);
+ }
+
+ /* Otherwise, it is assumed to be a tar format backup. */
+ result = 't';
+ }
+ pfree(path);
+
+ return result;
+}
+
/*
* Print a progress report based on the global variables.
*
@@ -1038,6 +1106,7 @@ usage(void)
printf(_(" -e, --exit-on-error exit immediately on error\n"));
printf(_(" -i, --ignore=RELATIVE_PATH ignore indicated path\n"));
printf(_(" -m, --manifest-path=PATH use specified path for manifest\n"));
+ printf(_(" -F, --format=p|t backup format (plain, tar)\n"));
printf(_(" -n, --no-parse-wal do not try to parse WAL files\n"));
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 818064c6eed..80031ad4dbc 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -100,6 +100,7 @@ typedef struct verifier_context
manifest_data *manifest;
char *backup_directory;
SimpleStringList ignore_list;
+ char format; /* backup format: p(lain)/t(ar) */
bool skip_checksums;
bool exit_on_error;
bool saw_any_error;
--
2.18.0
v13-0011-pg_verifybackup-Read-tar-files-and-verify-its-co.patch
From 402a301f203cebac028d1aef5ba8db2cb87890b7 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 21 Aug 2024 12:49:04 +0530
Subject: [PATCH v13 11/12] pg_verifybackup: Read tar files and verify its
contents
This patch implements TAR format backup verification.
For progress reporting support, we perform this verification in two
passes: the first pass calculates total_size, and the second pass
updates done_size as verification progresses.
For the verification, in the first pass, we call verify_tar_backup_file(),
which performs basic verification by expecting only base.tar, pg_wal.tar, or
<tablespaceoid>.tar files and raises an error for any other files. It
also determines the compression type of the archive file. All this
information is stored in a newly added tar_file struct, which is
appended to a list that will be used in the second pass (by
verify_tar_content()) for the final verification. In the second pass,
the tar archives are read, decompressed, and the required verification
is carried out.
For reading and decompression, fe_utils/astreamer.h is used. For
verification, a new archive streamer has been added in
astreamer_verify.c to handle TAR member files and their contents; see
astreamer_verify_content() for details. The stack of astreamers will
be set up for each TAR file in verify_tar_content(), depending on its
compression type which is detected in the first pass.
When information about a TAR member file (i.e., ASTREAMER_MEMBER_HEADER)
is received, we first verify its entry against the backup manifest. We
then decide if further checks are needed, such as checksum
verification and control data verification (if it is a pg_control
file), once the member file contents are received. Although this
decision could be made when the contents are received, it is more
efficient to make it earlier since the member file contents are
received in multiple iterations. In short, we process
ASTREAMER_MEMBER_CONTENTS multiple times but only once for other
ASTREAMER_MEMBER_* cases. We maintain this information in the
astreamer_verify structure for each member file, which is reset when
the file ends.
Unlike in a plain backup, checksum verification here occurs in two
steps. First, as the contents are received, the checksum is computed
incrementally (see member_compute_checksum). Then, at the end of
processing the member file, the final verification is performed (see
member_verify_checksum).
Similarly, during the content receiving stage, if the file is
pg_control, the data will be copied into a local buffer (see
member_copy_control_data). The verification will then be carried out
at the end of the member file processing (see member_verify_control_data).
---
---
src/bin/pg_verifybackup/Makefile | 2 +
src/bin/pg_verifybackup/astreamer_verify.c | 365 +++++++++++++++++++++
src/bin/pg_verifybackup/meson.build | 1 +
src/bin/pg_verifybackup/pg_verifybackup.c | 341 ++++++++++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 6 +
src/tools/pgindent/typedefs.list | 2 +
6 files changed, 714 insertions(+), 3 deletions(-)
create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 7c045f142e8..374d4a8afd1 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -17,10 +17,12 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
# We need libpq only because fe_utils does.
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
OBJS = \
$(WIN32RES) \
+ astreamer_verify.o \
pg_verifybackup.o
all: pg_verifybackup
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
new file mode 100644
index 00000000000..f08868170b4
--- /dev/null
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -0,0 +1,365 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_verify.c
+ *
+ * Extend fe_utils/astreamer.h archive streaming facility to verify TAR
+ * format backup.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * src/bin/pg_verifybackup/astreamer_verify.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "pg_verifybackup.h"
+
+typedef struct astreamer_verify
+{
+ astreamer base;
+ verifier_context *context;
+ char *archive_name;
+ Oid tblspc_oid;
+ pg_checksum_context *checksum_ctx;
+
+ /* Hold information for a member file verification */
+ manifest_file *mfile;
+ int64 received_bytes;
+ bool verify_checksum;
+ bool verify_control_data;
+} astreamer_verify;
+
+static void astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_verify_finalize(astreamer *streamer);
+static void astreamer_verify_free(astreamer *streamer);
+
+static void member_verify_header(astreamer *streamer, astreamer_member *member);
+static void member_compute_checksum(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void member_verify_checksum(astreamer *streamer);
+static void member_copy_control_data(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void member_verify_control_data(astreamer *streamer);
+static void member_reset_info(astreamer *streamer);
+
+static const astreamer_ops astreamer_verify_ops = {
+ .content = astreamer_verify_content,
+ .finalize = astreamer_verify_finalize,
+ .free = astreamer_verify_free
+};
+
+/*
+ * Create an astreamer that verifies the contents of a TAR file.
+ */
+astreamer *
+astreamer_verify_content_new(astreamer *next, verifier_context *context,
+ char *archive_name, Oid tblspc_oid)
+{
+ astreamer_verify *streamer;
+
+ streamer = palloc0(sizeof(astreamer_verify));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_verify_ops;
+
+ streamer->base.bbs_next = next;
+ streamer->context = context;
+ streamer->archive_name = archive_name;
+ streamer->tblspc_oid = tblspc_oid;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ if (!context->skip_checksums)
+ streamer->checksum_ctx = pg_malloc(sizeof(pg_checksum_context));
+
+ return &streamer->base;
+}
+
+/*
+ * The main entry point of the archive streamer for verifying tar members.
+ */
+static void
+astreamer_verify_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+
+ /*
+ * Perform the initial check and setup verification steps.
+ */
+ member_verify_header(streamer, member);
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Since we are receiving the member content in chunks, it must be
+ * processed according to the flags set by the member header
+ * processing routine. This includes performing incremental
+ * checksum computations and copying control data to the local
+ * buffer.
+ */
+ if (mystreamer->verify_checksum)
+ member_compute_checksum(streamer, member, data, len);
+
+ if (mystreamer->verify_control_data)
+ member_copy_control_data(streamer, member, data, len);
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+
+ /*
+ * We have reached the end of the member file. By this point, we
+ * should have successfully computed the checksum of the received
+ * content and copied the entire pg_control file data into our
+ * local buffer. We can now proceed with the final verification.
+ */
+ if (mystreamer->verify_checksum)
+ member_verify_checksum(streamer);
+
+ if (mystreamer->verify_control_data)
+ member_verify_control_data(streamer);
+
+ /*
+ * Reset the temporary information stored for the verification.
+ */
+ member_reset_info(streamer);
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar file");
+ }
+}
+
+/*
+ * End-of-stream processing for a astreamer_verify stream.
+ */
+static void
+astreamer_verify_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with a astreamer_verify stream.
+ */
+static void
+astreamer_verify_free(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ if (mystreamer->checksum_ctx)
+ pfree(mystreamer->checksum_ctx);
+
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+
+/*
+ * Verifies whether the tar member entry exists in the backup manifest.
+ *
+ * If the archive being processed is a tablespace, it prepares the necessary
+ * file path first. If a valid entry is found in the backup manifest, it then
+ * determines whether checksum and control data verification should be
+ * performed during file content processing.
+ */
+static void
+member_verify_header(astreamer *streamer, astreamer_member *member)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_file *m;
+
+ /* We are only interested in files that are not in the ignore list. */
+ if (member->is_directory || member->is_link ||
+ should_ignore_relpath(mystreamer->context, member->pathname))
+ return;
+
+ /*
+ * The backup manifest stores a relative path to the base directory for
+ * files belonging to a tablespace, while the tablespace backup tar
+ * archive does not include this path. Ensure the required path is
+ * prepared; otherwise, the manifest entry verification will fail.
+ */
+ if (OidIsValid(mystreamer->tblspc_oid))
+ {
+ char temp[MAXPGPATH];
+
+ /* Copy original name at temporary space */
+ memcpy(temp, member->pathname, MAXPGPATH);
+
+ snprintf(member->pathname, MAXPGPATH, "%s/%d/%s",
+ "pg_tblspc", mystreamer->tblspc_oid, temp);
+ }
+
+ /* Check the manifest entry */
+ m = verify_manifest_entry(mystreamer->context, member->pathname,
+ member->size);
+ mystreamer->mfile = (void *) m;
+
+ /*
+ * Prepare for checksum and control data verification.
+ *
+ * We could have these checks while receiving contents. However, since
+ * contents are received in multiple iterations, this would result in
+ * these lengthy checks being performed multiple times. Instead, setting
+ * flags here and using them before proceeding with verification will be
+ * more efficient.
+ */
+ mystreamer->verify_checksum =
+ (!mystreamer->context->skip_checksums && should_verify_checksum(m));
+ mystreamer->verify_control_data =
+ should_verify_control_data(mystreamer->context->manifest, m);
+
+ /* Initialize the context required for checksum verification. */
+ if (mystreamer->verify_checksum &&
+ pg_checksum_init(mystreamer->checksum_ctx, m->checksum_type) < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "%s: could not initialize checksum of file \"%s\"",
+ mystreamer->archive_name, m->pathname);
+
+ /*
+ * Checksum verification cannot be performed without proper context
+ * initialization.
+ */
+ mystreamer->verify_checksum = false;
+ }
+}
+
+/*
+ * Computes the checksum incrementally for the received file content.
+ *
+ * Should have a correctly initialized checksum_ctx, which will be used for
+ * incremental checksum computation.
+ */
+static void
+member_compute_checksum(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ pg_checksum_context *checksum_ctx = mystreamer->checksum_ctx;
+ manifest_file *m = mystreamer->mfile;
+
+ Assert(mystreamer->verify_checksum);
+
+ /* Should have came for the right file */
+ Assert(strcmp(member->pathname, m->pathname) == 0);
+
+ /*
+ * The checksum context should match the type noted in the backup
+ * manifest.
+ */
+ Assert(checksum_ctx->type == m->checksum_type);
+
+ /*
+ * Update the total count of computed checksum bytes for cross-checking
+ * with the file size in the final verification stage.
+ */
+ mystreamer->received_bytes += len;
+
+ if (pg_checksum_update(checksum_ctx, (uint8 *) data, len) < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "could not update checksum of file \"%s\"",
+ m->pathname);
+ mystreamer->verify_checksum = false;
+ }
+}
+
+/*
+ * Perform the final computation and checksum verification after the entire
+ * file content has been processed.
+ */
+static void
+member_verify_checksum(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ Assert(mystreamer->verify_checksum);
+
+ verify_checksum(mystreamer->context, mystreamer->mfile,
+ mystreamer->checksum_ctx, mystreamer->received_bytes);
+}
+
+/*
+ * Stores the pg_control file contents into a local buffer; we need the entire
+ * control file data for verification.
+ */
+static void
+member_copy_control_data(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ /* Should be here only for control file */
+ Assert(strcmp(member->pathname, "global/pg_control") == 0);
+ Assert(((astreamer_verify *) streamer)->verify_control_data);
+
+ /* Copy enough control file data needed for verification. */
+ astreamer_buffer_until(streamer, &data, &len, sizeof(ControlFileData));
+}
+
+/*
+ * Performs the CRC calculation of pg_control data and then calls the routines
+ * that execute the final verification of the control file information.
+ */
+static void
+member_verify_control_data(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_data *manifest = mystreamer->context->manifest;
+ ControlFileData control_file;
+ pg_crc32c crc;
+ bool crc_ok;
+
+ /* Should be here only for control file */
+ Assert(strcmp(mystreamer->mfile->pathname, "global/pg_control") == 0);
+ Assert(mystreamer->verify_control_data);
+
+ /* Should have enough control file data needed for verification. */
+ if (streamer->bbs_buffer.len != sizeof(ControlFileData))
+ report_fatal_error("%s: unexpected control file size: %d, should be %zu",
+ mystreamer->archive_name, streamer->bbs_buffer.len,
+ sizeof(ControlFileData));
+
+ memcpy(&control_file, streamer->bbs_buffer.data, sizeof(ControlFileData));
+
+ /* Check the CRC. */
+ INIT_CRC32C(crc);
+ COMP_CRC32C(crc, (char *) (&control_file), offsetof(ControlFileData, crc));
+ FIN_CRC32C(crc);
+
+ crc_ok = EQ_CRC32C(crc, control_file.crc);
+
+ /* Do the final control data verification. */
+ verify_control_data(&control_file, mystreamer->mfile->pathname, crc_ok,
+ manifest->system_identifier);
+}
+
+/*
+ * Reset flags and free memory allocations for member file verification.
+ */
+static void
+member_reset_info(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ mystreamer->mfile = NULL;
+ mystreamer->received_bytes = 0;
+ mystreamer->verify_checksum = false;
+ mystreamer->verify_control_data = false;
+}
diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build
index 7c7d31a0350..0e09d1379d1 100644
--- a/src/bin/pg_verifybackup/meson.build
+++ b/src/bin/pg_verifybackup/meson.build
@@ -1,6 +1,7 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
pg_verifybackup_sources = files(
+ 'astreamer_verify.c',
'pg_verifybackup.c'
)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index bcae2d2990f..9cbfc07a7ba 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -22,6 +22,7 @@
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
+#include "limits.h"
#include "pg_verifybackup.h"
#include "pgtime.h"
@@ -44,6 +45,16 @@
*/
#define READ_CHUNK_SIZE (128 * 1024)
+/*
+ * Tar file information needed for content verification.
+ */
+typedef struct tar_file
+{
+ char *relpath;
+ Oid tblspc_oid;
+ pg_compress_algorithm compress_algorithm;
+} tar_file;
+
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -63,10 +74,18 @@ static void report_manifest_error(JsonManifestParseContext *context,
pg_attribute_printf(2, 3) pg_attribute_noreturn();
static char find_backup_format(verifier_context *context);
+static void verify_plain_backup(verifier_context *context);
+static void verify_tar_backup(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_plain_backup_file(verifier_context *context,
- char *relpath, char *fullpath);
+static void verify_plain_backup_file(verifier_context *context, char *relpath,
+ char *fullpath);
+static void precheck_tar_backup_file(verifier_context *context, char *relpath,
+ char *fullpath, SimplePtrList *tarfiles);
+static void verify_all_tar_files(verifier_context *context,
+ SimplePtrList *tarfiles);
+static void verify_one_tar_file(verifier_context *context, char *relpath,
+ char *fullpath, astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -75,6 +94,10 @@ static void verify_file_checksum(verifier_context *context,
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
+static astreamer *create_archive_verifier(verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid,
+ pg_compress_algorithm compress_algo);
static void progress_report(bool finished);
static void usage(void);
@@ -294,7 +317,10 @@ main(int argc, char **argv)
* match. We also set the "matched" flag on every manifest entry that
* corresponds to a file on disk.
*/
- verify_backup_directory(&context, NULL, context.backup_directory);
+ if (context.format == 'p')
+ verify_plain_backup(&context);
+ else
+ verify_tar_backup(&context);
/*
* The "matched" flag should now be set on every entry in the hash table.
@@ -546,6 +572,16 @@ verifybackup_per_wal_range_cb(JsonManifestParseContext *context,
manifest->last_wal_range = range;
}
+/*
+ * Verify plain backup.
+ */
+static void
+verify_plain_backup(verifier_context *context)
+{
+ Assert(context->format == 'p');
+ verify_backup_directory(context, NULL, context->backup_directory);
+}
+
/*
* Verify one directory.
*
@@ -682,6 +718,270 @@ verify_plain_backup_file(verifier_context *context, char *relpath,
total_size += m->size;
}
+/*
+ * Verify tar backup.
+ *
+ * Unlike plain backup verification, tar backup verification is carried out in
+ * two steps. In the first step, we do a simple sanity check on the tar files
+ * expected in the backup directory, detect their compression type, and collect
+ * this information in a list. In the second pass, the tar archives are read,
+ * decompressed, and the required verification is carried out.
+ */
+static void
+verify_tar_backup(verifier_context *context)
+{
+ DIR *dir;
+ struct dirent *dirent;
+ SimplePtrList tarfiles = {NULL, NULL};
+ char *fullpath = context->backup_directory;
+
+ Assert(context->format == 't');
+
+ /*
+ * If the backup directory cannot be found, treat this as a fatal error.
+ */
+ dir = opendir(fullpath);
+ if (dir == NULL)
+ report_fatal_error("could not open directory \"%s\": %m", fullpath);
+
+ while (errno = 0, (dirent = readdir(dir)) != NULL)
+ {
+ char *filename = dirent->d_name;
+ char *newfullpath = psprintf("%s/%s", fullpath, filename);
+
+ /* Skip "." and ".." */
+ if (filename[0] == '.' && (filename[1] == '\0'
+ || strcmp(filename, "..") == 0))
+ continue;
+
+ if (!should_ignore_relpath(context, filename))
+ precheck_tar_backup_file(context, filename, newfullpath,
+ &tarfiles);
+
+ pfree(newfullpath);
+ }
+
+ /* Perform the final verification of the tar contents. */
+ verify_all_tar_files(context, &tarfiles);
+
+ if (closedir(dir))
+ {
+ report_backup_error(context,
+ "could not close directory \"%s\": %m", fullpath);
+ return;
+ }
+}
+
+/*
+ * Preparatory steps for verifying files in tar format backups.
+ *
+ * Carries out basic validation of the tar format backup file, detects the
+ * compression type, and appends that information to the tarfiles list. An
+ * error will be reported if the tar file is inaccessible, or if the file type,
+ * name, or compression type is not as expected.
+ *
+ * The arguments to this function are mostly the same as the
+ * verify_plain_backup_file. The additional argument outputs a list of valid
+ * tar files.
+ */
+static void
+precheck_tar_backup_file(verifier_context *context, char *relpath,
+ char *fullpath, SimplePtrList *tarfiles)
+{
+ struct stat sb;
+ Oid tblspc_oid = InvalidOid;
+ pg_compress_algorithm compress_algorithm;
+ tar_file *tar;
+ char *suffix = NULL;
+
+ /* Should be tar format backup */
+ Assert(context->format == 't');
+
+ /* Get file information */
+ if (stat(fullpath, &sb) != 0)
+ {
+ report_backup_error(context,
+ "could not stat file or directory \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ /* In a tar format backup, we expect only plain files. */
+ if (!S_ISREG(sb.st_mode))
+ {
+ report_backup_error(context,
+ "\"%s\" is not a plain file",
+ relpath);
+ return;
+ }
+
+ /*
+ * We expect tar files for backing up the main directory, tablespace, and
+ * pg_wal directory.
+ *
+ * pg_basebackup writes the main data directory to an archive file named
+ * base.tar, the pg_wal directory to pg_wal.tar, and the tablespace
+ * directory to <tablespaceoid>.tar, each followed by a compression type
+ * extension such as .gz, .lz4, or .zst.
+ */
+ if (strncmp("base", relpath, 4) == 0)
+ suffix = relpath + 4;
+ else if (strncmp("pg_wal", relpath, 6) == 0)
+ suffix = relpath + 6;
+ else
+ {
+ /* Expected a <tablespaceoid>.tar file here. */
+ uint64 num = strtoul(relpath, &suffix, 10);
+
+ /*
+ * Report an error if we didn't consume at least one character, if the
+ * result is 0, or if the value is too large to be a valid OID.
+ */
+ if (suffix == NULL || num <= 0 || num > OID_MAX)
+ report_backup_error(context,
+ "\"%s\" unexpected file in the tar format backup",
+ relpath);
+ tblspc_oid = (Oid) num;
+ }
+
+ /* Now, check the compression type of the tar */
+ if (strcmp(suffix, ".tar") == 0)
+ compress_algorithm = PG_COMPRESSION_NONE;
+ else if (strcmp(suffix, ".tgz") == 0)
+ compress_algorithm = PG_COMPRESSION_GZIP;
+ else if (strcmp(suffix, ".tar.gz") == 0)
+ compress_algorithm = PG_COMPRESSION_GZIP;
+ else if (strcmp(suffix, ".tar.lz4") == 0)
+ compress_algorithm = PG_COMPRESSION_LZ4;
+ else if (strcmp(suffix, ".tar.zst") == 0)
+ compress_algorithm = PG_COMPRESSION_ZSTD;
+ else
+ {
+ report_backup_error(context,
+ "\"%s\" unexpected file in the tar format backup",
+ relpath);
+ return;
+ }
+
+ /*
+ * Ignore WALs, as reading and verification will be handled through
+ * pg_waldump.
+ */
+ if (strncmp("pg_wal", relpath, 6) == 0)
+ return;
+
+ /*
+ * Append the information to the list for complete verification at a later
+ * stage.
+ */
+ tar = pg_malloc(sizeof(tar_file));
+ tar->relpath = pstrdup(relpath);
+ tar->tblspc_oid = tblspc_oid;
+ tar->compress_algorithm = compress_algorithm;
+
+ simple_ptr_list_append(tarfiles, tar);
+
+ /* Update statistics for progress report, if necessary */
+ if (show_progress)
+ total_size += sb.st_size;
+}
+
+/*
+ * The final steps for verifying files in tar format backups.
+ *
+ * Prepares the archive streamer stack according to the tar compression format
+ * for each tar file and invokes them for reading, decompressing, and
+ * ultimately verifying the contents.
+ *
+ * The arguments to this function should be a list of valid tar files to
+ * verify, and the allocation will be freed once the verification is complete.
+ */
+static void
+verify_all_tar_files(verifier_context *context, SimplePtrList *tarfiles)
+{
+ SimplePtrListCell *cell;
+
+ progress_report(false);
+
+ for (cell = tarfiles->head; cell != NULL; cell = cell->next)
+ {
+ tar_file *tar = (tar_file *) cell->ptr;
+ astreamer *streamer;
+ char *fullpath;
+
+ /* Prepare archive streamer stack */
+ streamer = create_archive_verifier(context,
+ tar->relpath,
+ tar->tblspc_oid,
+ tar->compress_algorithm);
+
+ /* Compute the full pathname to the target file. */
+ fullpath = psprintf("%s/%s", context->backup_directory,
+ tar->relpath);
+
+ /* Invoke the streamer for reading, decompressing, and verifying. */
+ verify_one_tar_file(context, tar->relpath, fullpath, streamer);
+
+ /* Cleanup. */
+ pfree(tar->relpath);
+ pfree(tar);
+ pfree(fullpath);
+
+ astreamer_finalize(streamer);
+ astreamer_free(streamer);
+ }
+ simple_ptr_list_destroy(tarfiles);
+
+ progress_report(true);
+}
+
+/*
+ * Verification of a single tar file content.
+ *
+ * It reads a given tar archive in predefined chunks and passes it to the
+ * streamer, which initiates routines for decompression (if necessary) and then
+ * verifies each member within the tar file.
+ */
+static void
+verify_one_tar_file(verifier_context *context, char *relpath, char *fullpath,
+ astreamer *streamer)
+{
+ int fd;
+ int rc;
+ char *buffer;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+
+ /* Open the target file. */
+ if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
+ {
+ report_backup_error(context, "could not open file \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+ /* Perform the reads */
+ while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
+ {
+ astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);
+
+ /* Report progress */
+ done_size += rc;
+ progress_report(false);
+ }
+
+ if (rc < 0)
+ report_backup_error(context, "could not read file \"%s\": %m",
+ relpath);
+
+ /* Close the file. */
+ if (close(fd) != 0)
+ report_backup_error(context, "could not close file \"%s\": %m",
+ relpath);
+}
+
/*
* Verify file and its size entry in the manifest.
*/
@@ -1049,6 +1349,41 @@ find_backup_format(verifier_context *context)
return result;
}
+/*
+ * Identifies the necessary steps for verifying the contents of the
+ * provided tar file.
+ */
+static astreamer *
+create_archive_verifier(verifier_context *context, char *archive_name,
+ Oid tblspc_oid, pg_compress_algorithm compress_algo)
+{
+ astreamer *streamer = NULL;
+
+ /* Should be here only for tar backup */
+ Assert(context->format == 't');
+
+ /*
+ * To verify the contents of the tar, the initial step is to parse its
+ * content.
+ */
+ streamer = astreamer_verify_content_new(streamer, context, archive_name,
+ tblspc_oid);
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /*
+ * If the tar is compressed, we must perform the appropriate decompression
+ * operation before proceeding with the verification of its contents.
+ */
+ if (compress_algo == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compress_algo == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compress_algo == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ return streamer;
+}
+
/*
* Print a progress report based on the global variables.
*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 80031ad4dbc..be7438af346 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -18,6 +18,7 @@
#include "common/hashfn_unstable.h"
#include "common/logging.h"
#include "common/parse_manifest.h"
+#include "fe_utils/astreamer.h"
#include "fe_utils/simple_list.h"
/*
@@ -123,4 +124,9 @@ extern void report_fatal_error(const char *pg_restrict fmt,...)
extern bool should_ignore_relpath(verifier_context *context,
const char *relpath);
+extern astreamer *astreamer_verify_content_new(astreamer *next,
+ verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
+
#endif /* PG_VERIFYBACKUP_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9e951a9e6f3..be3d32ef68a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3329,6 +3329,7 @@ astreamer_plain_writer
astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
+astreamer_verify
astreamer_zstd_frame
bgworker_main_type
bh_node_type
@@ -3950,6 +3951,7 @@ substitute_phv_relids_context
subxids_array_status
symbol
tablespaceinfo
+tar_file
td_entry
teSection
temp_tablespaces_extra
--
2.18.0
v13-0012-pg_verifybackup-Tests-and-document.patchapplication/x-patch; name=v13-0012-pg_verifybackup-Tests-and-document.patchDownload
From 500d9bc69d7d786fae637315261dca79916b91ab Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 29 Aug 2024 19:01:22 +0530
Subject: [PATCH v13 12/12] pg_verifybackup: Tests and document
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the previous patch that implements tar format support.
----
---
doc/src/sgml/ref/pg_verifybackup.sgml | 43 +++-
src/bin/pg_verifybackup/t/001_basic.pl | 6 +-
src/bin/pg_verifybackup/t/002_algorithm.pl | 32 ++-
src/bin/pg_verifybackup/t/003_corruption.pl | 187 ++++++++++++++++--
src/bin/pg_verifybackup/t/004_options.pl | 2 +-
src/bin/pg_verifybackup/t/005_bad_manifest.pl | 2 +-
src/bin/pg_verifybackup/t/008_untar.pl | 73 +++----
src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +----
8 files changed, 274 insertions(+), 119 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index a3f167f9f6e..ea6bc3ccb12 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -34,8 +34,10 @@ PostgreSQL documentation
integrity of a database cluster backup taken using
<command>pg_basebackup</command> against a
<literal>backup_manifest</literal> generated by the server at the time
- of the backup. The backup must be stored in the "plain"
- format; a "tar" format backup can be checked after extracting it.
+ of the backup. The backup must be stored in the "plain" or "tar"
+ format. Verification is supported for <literal>gzip</literal>,
+ <literal>lz4</literal>, and <literal>zstd</literal> compressed tar backup;
+ any other compressed format backups can be checked after decompressing them.
</para>
<para>
@@ -168,6 +170,43 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-F <replaceable class="parameter">format</replaceable></option></term>
+ <term><option>--format=<replaceable class="parameter">format</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the format of the backup. <replaceable>format</replaceable>
+ can be one of the following:
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>p</literal></term>
+ <term><literal>plain</literal></term>
+ <listitem>
+ <para>
+ Backup consists of plain files with the same layout as the
+ source server's data directory and tablespaces.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>t</literal></term>
+ <term><literal>tar</literal></term>
+ <listitem>
+ <para>
+ Backup consists of tar files. A valid backup includes the main data
+ directory in a file named <filename>base.tar</filename>, the WAL
+ files in <filename>pg_wal.tar</filename>, and separate tar files for
+ each tablespace, named after the tablespace's OID, followed by the
+ compression extension.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist></para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n</option></term>
<term><option>--no-parse-wal</option></term>
diff --git a/src/bin/pg_verifybackup/t/001_basic.pl b/src/bin/pg_verifybackup/t/001_basic.pl
index 2f3e52d296f..ca5b0402b7d 100644
--- a/src/bin/pg_verifybackup/t/001_basic.pl
+++ b/src/bin/pg_verifybackup/t/001_basic.pl
@@ -17,11 +17,11 @@ command_fails_like(
qr/no backup directory specified/,
'target directory must be specified');
command_fails_like(
- [ 'pg_verifybackup', $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir ],
qr/could not open file.*\/backup_manifest\"/,
'pg_verifybackup requires a manifest');
command_fails_like(
- [ 'pg_verifybackup', $tempdir, $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir, $tempdir ],
qr/too many command-line arguments/,
'multiple target directories not allowed');
@@ -31,7 +31,7 @@ close($fh);
# but then try to use an alternate, nonexisting manifest
command_fails_like(
- [ 'pg_verifybackup', '-m', "$tempdir/not_the_manifest", $tempdir ],
+ [ 'pg_verifybackup', '-Fp', '-m', "$tempdir/not_the_manifest", $tempdir ],
qr/could not open file.*\/not_the_manifest\"/,
'pg_verifybackup respects -m flag');
diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl
index fb2a1fd7c4e..ac276f3da6b 100644
--- a/src/bin/pg_verifybackup/t/002_algorithm.pl
+++ b/src/bin/pg_verifybackup/t/002_algorithm.pl
@@ -14,24 +14,33 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
-for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512))
+sub test_checksums
{
- my $backup_path = $primary->backup_dir . '/' . $algorithm;
+ my ($format, $algorithm) = @_;
+ my $backup_path = $primary->backup_dir . '/' . $format . '/' . $algorithm;
my @backup = (
'pg_basebackup', '-D', $backup_path,
'--manifest-checksums', $algorithm, '--no-sync', '-cfast');
my @verify = ('pg_verifybackup', '-e', $backup_path);
+ if ($format eq 'tar')
+ {
+ # Add tar backup format option
+ push @backup, ('-F', 't');
+ # Add switch to skip WAL verification.
+ push @verify, ('-n');
+ }
+
# A backup with a bogus algorithm should fail.
if ($algorithm eq 'bogus')
{
$primary->command_fails(\@backup,
- "backup fails with algorithm \"$algorithm\"");
- next;
+ "$format backup fails with algorithm \"$algorithm\"");
+ return;
}
# A backup with a valid algorithm should work.
- $primary->command_ok(\@backup, "backup ok with algorithm \"$algorithm\"");
+ $primary->command_ok(\@backup, "$format backup ok with algorithm \"$algorithm\"");
# We expect each real checksum algorithm to be mentioned on every line of
# the backup manifest file except the first and last; for simplicity, we
@@ -39,7 +48,7 @@ for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512))
# is none, we just check that the manifest exists.
if ($algorithm eq 'none')
{
- ok(-f "$backup_path/backup_manifest", "backup manifest exists");
+ ok(-f "$backup_path/backup_manifest", "$format backup manifest exists");
}
else
{
@@ -52,10 +61,19 @@ for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512))
# Make sure that it verifies OK.
$primary->command_ok(\@verify,
- "verify backup with algorithm \"$algorithm\"");
+ "verify $format backup with algorithm \"$algorithm\"");
# Remove backup immediately to save disk space.
rmtree($backup_path);
}
+# Do the check
+for my $format (qw(plain tar))
+{
+ for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512))
+ {
+ test_checksums($format, $algorithm);
+ }
+}
+
done_testing();
diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl
index ae91e043384..d0c3ffedd14 100644
--- a/src/bin/pg_verifybackup/t/003_corruption.pl
+++ b/src/bin/pg_verifybackup/t/003_corruption.pl
@@ -11,6 +11,8 @@ use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+my $tar = $ENV{TAR};
+
my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
@@ -32,62 +34,73 @@ EOM
my @scenario = (
{
'name' => 'extra_file',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_extra_file,
'fails_like' =>
qr/extra_file.*present on disk but not in the manifest/
},
{
'name' => 'extra_tablespace_file',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_extra_tablespace_file,
'fails_like' =>
qr/extra_ts_file.*present on disk but not in the manifest/
},
{
'name' => 'missing_file',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_missing_file,
'fails_like' =>
qr/pg_xact\/0000.*present in the manifest but not on disk/
},
{
'name' => 'missing_tablespace',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_missing_tablespace,
'fails_like' =>
qr/pg_tblspc.*present in the manifest but not on disk/
},
{
'name' => 'append_to_file',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_append_to_file,
'fails_like' => qr/has size \d+ on disk but size \d+ in the manifest/
},
{
'name' => 'truncate_file',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_truncate_file,
'fails_like' => qr/has size 0 on disk but size \d+ in the manifest/
},
{
'name' => 'replace_file',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_replace_file,
'fails_like' => qr/checksum mismatch for file/
},
{
'name' => 'system_identifier',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_system_identifier,
'fails_like' =>
qr/manifest system identifier is .*, but control file has/
},
{
'name' => 'bad_manifest',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_bad_manifest,
'fails_like' => qr/manifest checksum mismatch/
},
{
'name' => 'open_file_fails',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_open_file_fails,
'fails_like' => qr/could not open file/,
'skip_on_windows' => 1
},
{
'name' => 'open_directory_fails',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_open_directory_fails,
'cleanup' => \&cleanup_open_directory_fails,
'fails_like' => qr/could not open directory/,
@@ -95,10 +108,61 @@ my @scenario = (
},
{
'name' => 'search_directory_fails',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_search_directory_fails,
'cleanup' => \&cleanup_search_directory_fails,
'fails_like' => qr/could not stat file or directory/,
'skip_on_windows' => 1
+ },
+ {
+ 'name' => 'tar_backup_unexpected_file',
+ 'backup_format' => 't',
+ 'mutilate' => \&mutilate_extra_file,
+ 'fails_like' =>
+ qr/extra_file.*unexpected file in the tar format backup/
+ },
+ {
+ 'name' => 'tar_backup_extra_file',
+ 'backup_format' => 't',
+ 'mutilate' => \&mutilate_tar_backup_extra_file,
+ 'fails_like' =>
+ qr/extra_tar_member_file.*present on disk but not in the manifest/
+ },
+ {
+ 'name' => 'tar_backup_missing_file',
+ 'backup_format' => 't',
+ 'mutilate' => \&mutilate_tar_backup_missing_file,
+ 'fails_like' =>
+ qr/pg_xact\/0000.*present in the manifest but not on disk/,
+ 'skip_on_windows' => 1
+ },
+ {
+ 'name' => 'tar_backup_append_to_file',
+ 'backup_format' => 't',
+ 'mutilate' => \&mutilate_tar_backup_append_to_file,
+ 'fails_like' => qr/has size \d+ on disk but size \d+ in the manifest/,
+ 'skip_on_windows' => 1
+ },
+ {
+ 'name' => 'tar_backup_truncate_file',
+ 'backup_format' => 't',
+ 'mutilate' => \&mutilate_tar_backup_truncate_file,
+ 'fails_like' => qr/has size 0 on disk but size \d+ in the manifest/,
+ 'skip_on_windows' => 1
+ },
+ {
+ 'name' => 'tar_backup_replace_file',
+ 'backup_format' => 't',
+ 'mutilate' => \&mutilate_tar_backup_replace_file,
+ 'fails_like' => qr/checksum mismatch for file/,
+ 'skip_on_windows' => 1
+ },
+ {
+ 'name' => 'tar_backup_system_identifier',
+ 'backup_format' => 't',
+ 'mutilate' => \&mutilate_system_identifier,
+ 'fails_like' =>
+ qr/manifest system identifier is .*, but control file has/
});
for my $scenario (@scenario)
@@ -111,29 +175,40 @@ for my $scenario (@scenario)
if ($scenario->{'skip_on_windows'}
&& ($windows_os || $Config::Config{osname} eq 'cygwin'));
+	# Skip tests for tar format backup if tar is not available.
+ skip "no tar program available", 4
+ if ($scenario->{'backup_format'} eq 't' && (!defined $tar || $tar eq ''));
+
# Take a backup and check that it verifies OK.
my $backup_path = $primary->backup_dir . '/' . $name;
my $backup_ts_path = PostgreSQL::Test::Utils::tempdir_short();
+
+ my @backup = (
+ 'pg_basebackup', '-D', $backup_path, '--no-sync', '-cfast',
+ '-T', "${source_ts_path}=${backup_ts_path}");
+ my @verify = ('pg_verifybackup', $backup_path);
+
+ if ($scenario->{'backup_format'} eq 't')
+ {
+ # Add tar backup format option
+ push @backup, ('-F', 't');
+ # Add switch to skip WAL verification.
+ push @verify, ('-n');
+ }
+
# The tablespace map parameter confuses Msys2, which tries to mangle
# it. Tell it not to.
# See https://www.msys2.org/wiki/Porting/#filesystem-namespaces
local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix;
- $primary->command_ok(
- [
- 'pg_basebackup', '-D', $backup_path, '--no-sync', '-cfast',
- '-T', "${source_ts_path}=${backup_ts_path}"
- ],
- "base backup ok");
- command_ok([ 'pg_verifybackup', $backup_path ],
- "intact backup verified");
+
+ $primary->command_ok( \@backup, "base backup ok");
+ command_ok(\@verify, "intact backup verified");
# Mutilate the backup in some way.
$scenario->{'mutilate'}->($backup_path);
# Now check that the backup no longer verifies.
- command_fails_like(
- [ 'pg_verifybackup', $backup_path ],
- $scenario->{'fails_like'},
+ command_fails_like(\@verify, $scenario->{'fails_like'},
"corrupt backup fails verification: $name");
# Run cleanup hook, if provided.
@@ -260,6 +335,7 @@ sub mutilate_system_identifier
$backup_path . '/backup_manifest')
or BAIL_OUT "could not copy manifest to $backup_path";
$node->teardown_node(fail_ok => 1);
+ $node->clean_node();
return;
}
@@ -316,4 +392,93 @@ sub cleanup_search_directory_fails
return;
}
+# Unpack base.tar, perform the specified file operation, and then repack the
+# modified content into base.tar at the same location.
+sub mutilate_base_tar
+{
+ my ($backup_path, $op) = @_;
+
+ my $archive = 'base.tar';
+ my $tmpdir = "$backup_path/tmpdir";
+ mkdir($tmpdir) || die "$!";
+
+ # Extract the archive
+ system_or_bail($tar, '-xf', "$backup_path/$archive", '-C', "$tmpdir");
+ unlink("$backup_path/$archive") || die "$!";
+
+ if ($op eq 'add')
+ {
+ create_extra_file($tmpdir, 'extra_tar_member_file');
+ }
+ elsif ($op eq 'delete')
+ {
+ mutilate_missing_file($tmpdir);
+ }
+ elsif ($op eq 'append')
+ {
+ mutilate_append_to_file($tmpdir);
+ }
+ elsif ($op eq 'truncate')
+ {
+ mutilate_truncate_file($tmpdir);
+ }
+ elsif ($op eq 'replace')
+ {
+ mutilate_replace_file($tmpdir);
+ }
+ else
+ {
+ die "mutilate_tar_backup: \"$op\" invalid operation";
+ }
+
+
+ # Navigate to the extracted location and list the files.
+ chdir("$tmpdir") || die "$!";
+ my @files = glob("*");
+ # Repack the extracted content
+ system_or_bail($tar, '-cf', "$backup_path/$archive", @files);
+ chdir($backup_path) || die "$!";
+ rmtree("$tmpdir") || die "$!";
+}
+
+# Add a file into the base.tar of the backup.
+sub mutilate_tar_backup_extra_file
+{
+ my ($backup_path) = @_;
+ mutilate_base_tar($backup_path, 'add');
+ return;
+}
+
+# Remove a file.
+sub mutilate_tar_backup_missing_file
+{
+ my ($backup_path) = @_;
+ mutilate_base_tar($backup_path, 'delete');
+ return;
+}
+
+# Append additional bytes to a file.
+sub mutilate_tar_backup_append_to_file
+{
+ my ($backup_path) = @_;
+ mutilate_base_tar($backup_path, 'append');
+ return;
+}
+
+# Truncate a file to zero length.
+sub mutilate_tar_backup_truncate_file
+{
+ my ($backup_path) = @_;
+ mutilate_base_tar($backup_path, 'truncate');
+ return;
+}
+
+# Replace a file's contents
+sub mutilate_tar_backup_replace_file
+{
+ my ($backup_path) = @_;
+ mutilate_base_tar($backup_path, 'replace');
+ return;
+}
+
done_testing();
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 8ed2214408e..2f197648740 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -108,7 +108,7 @@ unlike(
# Test valid manifest with nonexistent backup directory.
command_fails_like(
[
- 'pg_verifybackup', '-m',
+ 'pg_verifybackup', '-Fp', '-m',
"$backup_path/backup_manifest", "$backup_path/fake"
],
qr/could not open directory/,
diff --git a/src/bin/pg_verifybackup/t/005_bad_manifest.pl b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
index c4ed64b62d5..28c51b6feb0 100644
--- a/src/bin/pg_verifybackup/t/005_bad_manifest.pl
+++ b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
@@ -208,7 +208,7 @@ sub test_bad_manifest
print $fh $manifest_contents;
close($fh);
- command_fails_like([ 'pg_verifybackup', $tempdir ], $regexp, $test_name);
+ command_fails_like([ 'pg_verifybackup', '-Fp', $tempdir ], $regexp, $test_name);
return;
}
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 7a09f3b75b2..1c83f38d5b5 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -16,6 +16,20 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
+# Create a tablespace directory.
+my $TS1_LOCATION = $primary->backup_dir .'/ts1';
+$TS1_LOCATION =~ s/\/\.\//\//g; # collapse foo/./bar to foo/bar
+mkdir($TS1_LOCATION);
+
+# Create a tablespace with table in it.
+$primary->safe_psql('postgres', qq(
+ CREATE TABLESPACE regress_ts1 LOCATION '$TS1_LOCATION';
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1';
+ CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1;
+ INSERT INTO regress_tbl1 VALUES(generate_series(1,5));));
+my $tsoid = $primary->safe_psql('postgres', qq(
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
+
my $backup_path = $primary->backup_dir . '/server-backup';
my $extract_path = $primary->backup_dir . '/extracted-backup';
@@ -23,39 +37,31 @@ my @test_configuration = (
{
'compression_method' => 'none',
'backup_flags' => [],
- 'backup_archive' => 'base.tar',
+ 'backup_archive' => ['base.tar', "$tsoid.tar"],
'enabled' => 1
},
{
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'server-gzip' ],
- 'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'server-lz4' ],
- 'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => [ '-d', '-m' ],
+ 'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ],
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
});
@@ -86,47 +92,16 @@ for my $tc (@test_configuration)
my $backup_files = join(',',
sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
my $expected_backup_files =
- join(',', sort ('backup_manifest', $tc->{'backup_archive'}));
+ join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} }));
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path],
+ "verify backup, compression $method");
# Cleanup.
- unlink($backup_path . '/backup_manifest');
- unlink($backup_path . '/base.tar');
- unlink($backup_path . '/' . $tc->{'backup_archive'});
+ rmtree($backup_path);
rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 8c076d46dee..6b7d7483f6e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -29,41 +29,30 @@ my @test_configuration = (
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'client-gzip:5' ],
'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'client-lz4:5' ],
'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => ['-d'],
- 'output_file' => 'base.tar',
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:5' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'parallel zstd',
'backup_flags' => [ '--compress', 'client-zstd:workers=3' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1"),
'possibly_unsupported' =>
qr/could not set compression worker count to 3: Unsupported parameter/
@@ -118,40 +107,9 @@ for my $tc (@test_configuration)
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- push @decompress, $backup_path . '/' . $tc->{'output_file'}
- if $tc->{'output_file'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ],
+ "verify backup, compression $method");
# Cleanup.
rmtree($extract_path);
--
2.18.0
v13-0006-Refactor-split-verify_backup_file-function-and-r.patchapplication/x-patch; name=v13-0006-Refactor-split-verify_backup_file-function-and-r.patchDownload
From 4f98ffa42916fe179e9a87b9043393b6449f1705 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 14 Aug 2024 10:42:37 +0530
Subject: [PATCH v13 06/12] Refactor: split verify_backup_file() function and
rename it.
The function verify_backup_file() has now been renamed to
verify_plain_backup_file() to make it clearer that it is specifically
used for verifying files in a plain backup. Similarly, in a future
patch, we would have a verify_tar_backup_file() function for
verifying TAR backup files.
In addition to that, moved the manifest entry verification code into a
new function called verify_manifest_entry() so that it can be reused
for tar backup verification. If verify_manifest_entry() doesn't find
an entry, it reports an error as before and returns NULL to the
caller. This is why a NULL check is added to should_verify_checksum().
---
src/bin/pg_verifybackup/pg_verifybackup.c | 58 +++++++++++++++--------
src/bin/pg_verifybackup/pg_verifybackup.h | 6 ++-
2 files changed, 42 insertions(+), 22 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 3fcfb167217..5bfc98e7874 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -64,8 +64,8 @@ static void report_manifest_error(JsonManifestParseContext *context,
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_backup_file(verifier_context *context,
- char *relpath, char *fullpath);
+static void verify_plain_backup_file(verifier_context *context,
+ char *relpath, char *fullpath);
static void verify_control_file(const char *controlpath,
uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
@@ -570,7 +570,7 @@ verify_backup_directory(verifier_context *context, char *relpath,
newrelpath = psprintf("%s/%s", relpath, filename);
if (!should_ignore_relpath(context, newrelpath))
- verify_backup_file(context, newrelpath, newfullpath);
+ verify_plain_backup_file(context, newrelpath, newfullpath);
pfree(newfullpath);
pfree(newrelpath);
@@ -591,7 +591,8 @@ verify_backup_directory(verifier_context *context, char *relpath,
* verify_backup_directory.
*/
static void
-verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
+verify_plain_backup_file(verifier_context *context, char *relpath,
+ char *fullpath)
{
struct stat sb;
manifest_file *m;
@@ -627,6 +628,32 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Check the backup manifest entry for this file. */
+ m = verify_manifest_entry(context, relpath, sb.st_size);
+
+ /*
+ * Validate the manifest system identifier, not available in manifest
+ * version 1.
+ */
+ if (context->manifest->version != 1 &&
+ strcmp(relpath, "global/pg_control") == 0 &&
+ m != NULL && m->matched && !m->bad)
+ verify_control_file(fullpath, context->manifest->system_identifier);
+
+ /* Update statistics for progress report, if necessary */
+ if (show_progress && !context->skip_checksums &&
+ should_verify_checksum(m))
+ total_size += m->size;
+}
+
+/*
+ * Verify file and its size entry in the manifest.
+ */
+manifest_file *
+verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
+{
+ manifest_file *m;
+
/* Check whether there's an entry in the manifest hash. */
m = manifest_files_lookup(context->manifest->files, relpath);
if (m == NULL)
@@ -634,40 +661,29 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
report_backup_error(context,
"\"%s\" is present on disk but not in the manifest",
relpath);
- return;
+ return NULL;
}
/* Flag this entry as having been encountered in the filesystem. */
m->matched = true;
/* Check that the size matches. */
- if (m->size != sb.st_size)
+ if (m->size != filesize)
{
report_backup_error(context,
"\"%s\" has size %lld on disk but size %zu in the manifest",
- relpath, (long long int) sb.st_size, m->size);
+ relpath, (long long int) filesize, m->size);
m->bad = true;
}
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0)
- verify_control_file(fullpath, context->manifest->system_identifier);
-
- /* Update statistics for progress report, if necessary */
- if (show_progress && !context->skip_checksums &&
- should_verify_checksum(m))
- total_size += m->size;
-
/*
* We don't verify checksums at this stage. We first finish verifying that
* we have the expected set of files with the expected sizes, and only
* afterwards verify the checksums. That's because computing checksums may
* take a while, and we'd like to report more obvious problems quickly.
*/
+
+ return m;
}
/*
@@ -830,7 +846,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
/*
* Double-check that we read the expected number of bytes from the file.
- * Normally, a file size mismatch would be caught in verify_backup_file
+ * Normally, a file size mismatch would be caught in verify_manifest_entry
* and this check would never be reached, but this provides additional
* safety and clarity in the event of concurrent modifications or
* filesystem misbehavior.
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index d8c566ed587..ff9476e356e 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -37,7 +37,8 @@ typedef struct manifest_file
} manifest_file;
#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+ (((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
/*
* Define a hash table which we can use to store information about the files
@@ -93,6 +94,9 @@ typedef struct verifier_context
bool saw_any_error;
} verifier_context;
+extern manifest_file *verify_manifest_entry(verifier_context *context,
+ char *relpath, int64 filesize);
+
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
pg_attribute_printf(2, 3);
--
2.18.0
v13-0007-Refactor-split-verify_file_checksum-function.patchapplication/x-patch; name=v13-0007-Refactor-split-verify_file_checksum-function.patchDownload
From ab254802604c8c31b5ec05784d938593c6efd9b2 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 14 Aug 2024 15:14:15 +0530
Subject: [PATCH v13 07/12] Refactor: split verify_file_checksum() function.
Move the core functionality of verify_file_checksum to a new function
to reuse it instead of duplicating the code.
The verify_file_checksum() function is designed to take a file path,
open and read the file contents, and then calculate the checksum.
However, for TAR backups, instead of a file path, we receive the file
content in chunks, and the checksum needs to be calculated
incrementally. While the initial operations for plain and TAR backup
checksum calculations differ, the final checks and error handling are
the same. By moving the shared logic to a separate function, we can
reuse the code for both types of backups.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 20 +++++++++++++++++---
src/bin/pg_verifybackup/pg_verifybackup.h | 3 +++
2 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 5bfc98e7874..e44d0377cd5 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -792,8 +792,6 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
int fd;
int rc;
size_t bytes_read = 0;
- uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
- int checksumlen;
/* Open the target file. */
if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
@@ -844,6 +842,22 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
if (rc < 0)
return;
+ /* Do the final computation and verification. */
+ verify_checksum(context, m, &checksum_ctx, bytes_read);
+}
+
+/*
+ * A helper function to finalize checksum computation and verify it against the
+ * backup manifest information.
+ */
+void
+verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx, int64 bytes_read)
+{
+ const char *relpath = m->pathname;
+ int checksumlen;
+ uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
+
/*
* Double-check that we read the expected number of bytes from the file.
* Normally, a file size mismatch would be caught in verify_manifest_entry
@@ -860,7 +874,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
}
/* Get the final checksum. */
- checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
+ checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
if (checksumlen < 0)
{
report_backup_error(context,
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index ff9476e356e..fe0ce8a89aa 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -96,6 +96,9 @@ typedef struct verifier_context
extern manifest_file *verify_manifest_entry(verifier_context *context,
char *relpath, int64 filesize);
+extern void verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx,
+ int64 bytes_read);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
v13-0008-Refactor-split-verify_control_file.patchapplication/x-patch; name=v13-0008-Refactor-split-verify_control_file.patchDownload
From 744f359f0782d590cac0f29bde76430561c038b5 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 21 Aug 2024 10:51:23 +0530
Subject: [PATCH v13 08/12] Refactor: split verify_control_file.
Separated the control file verification code into a new function and
introduced the should_verify_control_data() macro, similar to
should_verify_checksum().
Like verify_file_checksum(), verify_control_file() is designed to
accept a path to the pg_control file, which it opens and whose
contents it then verifies. But in the case of a tar backup, we have
the pg_control file contents instead, and those need to be verified in
the same way. For that reason, the code that does the verification is
moved into a separate function so that it can be reused for tar backup
verification as well.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 42 ++++++++++-------------
src/bin/pg_verifybackup/pg_verifybackup.h | 14 ++++++++
2 files changed, 33 insertions(+), 23 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index e44d0377cd5..d04e1d8c8ac 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -66,8 +66,6 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_plain_backup_file(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_control_file(const char *controlpath,
- uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -631,14 +629,20 @@ verify_plain_backup_file(verifier_context *context, char *relpath,
/* Check the backup manifest entry for this file. */
m = verify_manifest_entry(context, relpath, sb.st_size);
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0 &&
- m != NULL && m->matched && !m->bad)
- verify_control_file(fullpath, context->manifest->system_identifier);
+ /* Validate the pg_control information */
+ if (should_verify_control_data(context->manifest, m))
+ {
+ ControlFileData *control_file;
+ bool crc_ok;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+ control_file = get_controlfile_by_exact_path(fullpath, &crc_ok);
+
+ verify_control_data(control_file, fullpath, crc_ok,
+ context->manifest->system_identifier);
+ /* Release memory. */
+ pfree(control_file);
+ }
/* Update statistics for progress report, if necessary */
if (show_progress && !context->skip_checksums &&
@@ -687,18 +691,13 @@ verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
}
/*
- * Sanity check control file and validate system identifier against manifest
- * system identifier.
+ * Sanity check control file data and validate system identifier against
+ * manifest system identifier.
*/
-static void
-verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
+void
+verify_control_data(ControlFileData *control_file, const char *controlpath,
+ bool crc_ok, uint64 manifest_system_identifier)
{
- ControlFileData *control_file;
- bool crc_ok;
-
- pg_log_debug("reading \"%s\"", controlpath);
- control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
-
/* Control file contents not meaningful if CRC is bad. */
if (!crc_ok)
report_fatal_error("%s: CRC is incorrect", controlpath);
@@ -714,9 +713,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
controlpath,
(unsigned long long) manifest_system_identifier,
(unsigned long long) control_file->system_identifier);
-
- /* Release memory. */
- pfree(control_file);
}
/*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index fe0ce8a89aa..818064c6eed 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -40,6 +40,17 @@ typedef struct manifest_file
(((m) != NULL) && ((m)->matched) && !((m)->bad) && \
(((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+/*
+ * Validate the control file and its system identifier against the manifest
+ * system identifier. Note that this feature is not available in manifest
+ * version 1. This validation should only be performed after the manifest entry
+ * validation for the pg_control file has been completed without errors.
+ */
+#define should_verify_control_data(manifest, m) \
+ (((manifest)->version != 1) && \
+ ((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (strcmp((m)->pathname, "global/pg_control") == 0))
+
/*
* Define a hash table which we can use to store information about the files
* mentioned in the backup manifest.
@@ -99,6 +110,9 @@ extern manifest_file *verify_manifest_entry(verifier_context *context,
extern void verify_checksum(verifier_context *context, manifest_file *m,
pg_checksum_context *checksum_ctx,
int64 bytes_read);
+extern void verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
I would rather that you didn't add simple_ptr_list_destroy_deep()
given that you don't need it for this patch series.
+
"\"%s\" unexpected file in the tar format backup",
This doesn't seem grammatical to me. Perhaps change this to: file
\"%s\" is not expected in a tar format backup
+ /* We are only interested in files that are not in the ignore list. */
+ if (member->is_directory || member->is_link ||
+ should_ignore_relpath(mystreamer->context, member->pathname))
+ return;
Doesn't this need to happen after we add pg_tblspc/$OID to the path,
rather than before? I bet this doesn't work correctly for files in
user-defined tablespaces, compared to the way it work for a
directory-format backup.
+ char temp[MAXPGPATH];
+
+ /* Copy original name at temporary space */
+ memcpy(temp, member->pathname, MAXPGPATH);
+
+ snprintf(member->pathname, MAXPGPATH, "%s/%d/%s",
+ "pg_tblspc", mystreamer->tblspc_oid, temp);
I don't like this at all. This function doesn't have any business
modifying the astreamer_member, and it doesn't need to. It can just do
char *pathname; char tmppathbuf[MAXPGPATH] and then set pathname to
either member->pathname or tmppathbuf depending on
OidIsValid(tblspcoid). Also, shouldn't this be using %u, not %d?
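For concreteness, here is a rough, untested sketch covering both points
above, i.e. leaving the astreamer_member untouched and applying the
ignore-list check to the final path; pathname and tmppathbuf are just
illustrative names:

    const char *pathname;
    char        tmppathbuf[MAXPGPATH];

    /* Build the full path without modifying the astreamer_member. */
    if (OidIsValid(mystreamer->tblspc_oid))
    {
        snprintf(tmppathbuf, sizeof(tmppathbuf), "pg_tblspc/%u/%s",
                 mystreamer->tblspc_oid, member->pathname);
        pathname = tmppathbuf;
    }
    else
        pathname = member->pathname;

    /* Apply the ignore list to the final path, not the raw member name. */
    if (member->is_directory || member->is_link ||
        should_ignore_relpath(mystreamer->context, pathname))
        return;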
+ mystreamer->mfile = (void *) m;
Either the cast to void * isn't necessary, or it indicates that
there's a type mismatch that should be fixed.
+ * We could have these checks while receiving contents. However, since
+ * contents are received in multiple iterations, this would result in
+ * these lengthy checks being performed multiple times. Instead, setting
+ * flags here and using them before proceeding with verification will be
+ * more efficient.
Seems unnecessary to explain this.
+ Assert(mystreamer->verify_checksum);
+
+ /* Should have came for the right file */
+ Assert(strcmp(member->pathname, m->pathname) == 0);
+
+ /*
+ * The checksum context should match the type noted in the backup
+ * manifest.
+ */
+ Assert(checksum_ctx->type == m->checksum_type);
What do you think about:
Assert(m != NULL && !m->bad);
Assert(checksum_ctx->type == m->checksum_type);
Assert(strcmp(member->pathname, m->pathname) == 0);
Or possibly change the first one to Assert(should_verify_checksum(m))?
+ memcpy(&control_file, streamer->bbs_buffer.data,
sizeof(ControlFileData));
This probably doesn't really hurt anything, but it's a bit ugly. You
first use astreamer_buffer_until() to force the entire file into a
buffer. And then here, you copy the entire file into a second buffer
which is exactly the same except that it's guaranteed to be properly
aligned. It would be possible to include a ControlFileData in
astreamer_verify and copy the bytes directly into it (you'd need a
second astreamer_verify field for the number of bytes already copied
into that structure). I'm not 100% sure that's worth the code, but it
seems like it wouldn't take more than a few lines, so perhaps it is.
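If we go that route, it might look roughly like this (untested; it
assumes astreamer_verify grows new control_file and control_file_bytes
fields):

    static void
    member_copy_control_data(astreamer *streamer, astreamer_member *member,
                             const char *data, int len)
    {
        astreamer_verify *mystreamer = (astreamer_verify *) streamer;
        int         remaining;

        /* Copy no more than what is still missing from ControlFileData. */
        remaining = sizeof(ControlFileData) - mystreamer->control_file_bytes;
        if (len > remaining)
            len = remaining;

        memcpy((char *) &mystreamer->control_file +
               mystreamer->control_file_bytes, data, len);
        mystreamer->control_file_bytes += len;
    }

member_verify_control_data() could then insist that control_file_bytes
equals sizeof(ControlFileData) and use &mystreamer->control_file
directly, so the second memcpy() goes away.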
+/*
+ * Verify plain backup.
+ */
+static void
+verify_plain_backup(verifier_context *context)
+{
+ Assert(context->format == 'p');
+ verify_backup_directory(context, NULL, context->backup_directory);
+}
+
This seems like a waste of space.
+verify_tar_backup(verifier_context *context)
I like this a lot better now! I'm still not quite sure about the
decision to have the ignore list apply to both the backup directory
and the tar file contents -- but given the low participation on this
thread, I don't think we have much chance of getting feedback from
anyone else right now, so let's just do it the way you have it and we
can change it later if someone shows up to complain.
+verify_all_tar_files(verifier_context *context, SimplePtrList *tarfiles)
I think this code could be moved into its only caller instead of
having a separate function. And then if you do that, maybe
verify_one_tar_file could be renamed to just verify_tar_file. Or
perhaps that function could also be removed and just move the code
into the caller. It's not much code and not very deeply nested.
Similarly create_archive_verifier() could be considered for this
treatment. Maybe inlining all of these is too much and the result will
look messy, but I think we should at least try to combine some of
them.
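To illustrate, after the directory scan the tail of verify_tar_backup()
could absorb the loop, roughly like this (a sketch, with
verify_one_tar_file renamed to verify_tar_file as suggested):

    SimplePtrListCell *cell;

    progress_report(false);
    for (cell = tarfiles.head; cell != NULL; cell = cell->next)
    {
        tar_file   *tar = (tar_file *) cell->ptr;
        astreamer  *streamer;
        char       *fullpath;

        streamer = create_archive_verifier(context, tar->relpath,
                                           tar->tblspc_oid,
                                           tar->compress_algorithm);
        fullpath = psprintf("%s/%s", context->backup_directory, tar->relpath);

        verify_tar_file(context, tar->relpath, fullpath, streamer);

        astreamer_finalize(streamer);
        astreamer_free(streamer);
        pfree(tar->relpath);
        pfree(tar);
        pfree(fullpath);
    }
    simple_ptr_list_destroy(&tarfiles);
    progress_report(true);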
...Robert
+ pg_log_error("pg_waldump cannot read from a tar");
"tar" isn't normally used as a noun as you do here, so I think this
should say "pg_waldump cannot read tar files".
Technically, the position of this check could lead to an unnecessary
failure, if -n wasn't specified but pg_wal.tar(.whatever) also doesn't
exist on disk. But I think it's OK to ignore that case.
However, I also notice this change to the TAP tests in a few places:
- [ 'pg_verifybackup', $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir ],
It's not the end of the world to have to make a change like this, but
it seems easy to do better. Just postpone the call to
find_backup_format() until right after we call parse_manifest_file().
That also means postponing the check mentioned above until right after
that, but that's fine: after parse_manifest_file() and then
find_backup_format(), you can do if (!no_parse_wal && context.format
== 't') { bail out }.
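In other words, the relevant part of main() could be ordered roughly
like this (a sketch only; format, no_parse_wal, and manifest_path are
assumed local variable names):

    /* Parse the manifest first; format detection can follow it. */
    context.manifest = parse_manifest_file(manifest_path);

    if (format == '\0')         /* no -F/--format option given */
        context.format = find_backup_format(&context);
    else
        context.format = format;

    /* WAL parsing is only possible for plain-format backups. */
    if (!no_parse_wal && context.format == 't')
    {
        pg_log_error("pg_waldump cannot read tar files");
        exit(1);
    }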
+ if (stat(path, &sb) == 0)
+ result = 'p';
+ else
+ {
+ if (errno != ENOENT)
+ {
+ pg_log_error("cannot determine backup format : %m");
+ pg_log_error_hint("Try \"%s --help\" for more
information.", progname);
+ exit(1);
+ }
+
+ /* Otherwise, it is assumed to be a tar format backup. */
+ result = 't';
+ }
This doesn't look good, for a few reasons:
1. It would be clearer to structure this as if (stat(...) == 0) result
= 'p'; else if (errno == ENOENT) result = 't'; else { report an error;
} instead of the way you have it.
2. "cannot determine backup format" is not an appropriate way to
report the failure of stat(). The appropriate message is "could not
stat file \"%s\"".
3. It is almost never correct to put a space before a colon in an error message.
4. The hint doesn't look helpful, or necessary. I think you can just
delete that.
Regarding both point #2 and point #4, I think we should ask ourselves
how stat() could realistically fail here. On my system (macOS), the
documented failure modes for stat() are EACCES (i.e. permission denied),
EFAULT (i.e. we have a bug in pg_verifybackup), EIO (I/O Error), ELOOP
(symlink loop), ENAMETOOLONG, ENOENT, ENOTDIR, and EOVERFLOW. In none
of those cases does it seem likely that specifying the format manually
will help anything. Thus, suggesting that the user look at the help,
presumably to find --format, is unlikely to solve anything, and
telling them that the error happened while trying to determine the
backup format isn't really helping anything, either. What the user
needs to know is that it was stat() that failed, and the pathname for
which it failed. Then they need to sort out whatever problem is
causing them to get one of those really weird errors.
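Putting points #1 and #2 together, the check would look something like:

	if (stat(path, &sb) == 0)
		result = 'p';
	else if (errno == ENOENT)
		result = 't';
	else
	{
		pg_log_error("could not stat file \"%s\": %m", path);
		exit(1);
	}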
Aside from the above, 0010 looks pretty reasonable, although I'll
probably want to do some wordsmithing on some of the comments at some
point.
- of the backup. The backup must be stored in the "plain"
- format; a "tar" format backup can be checked after extracting it.
+ of the backup. The backup must be stored in the "plain" or "tar"
+ format. Verification is supported for <literal>gzip</literal>,
+ <literal>lz4</literal>, and <literal>zstd</literal> compressed tar backup;
+ any other compressed format backups can be checked after decompressing them.
I don't think that we need to say that the backup must be stored in
the plain or tar format, because those are the only backup formats
pg_basebackup knows about. Similarly, it doesn't seem helpful to me to
enumerate all the compression algorithms that pg_basebackup supports
and say we only support those; what else would a user expect?
What I would do is replace the original sentence ("The backup must be
stored...") with: The backup may be stored either in the "plain" or
the "tar" format; this includes "tar" backups compressed with any
algorithm supported by pg_basebackup. However, at present, WAL
verification is supported only for plain-format backups. Therefore, if
the backup is stored in "tar" format, the <literal>-n,
--no-parse-wal</literal> option should be used.
+ # Add tar backup format option
+ push @backup, ('-F', 't');
+ # Add switch to skip WAL verification.
+ push @verify, ('-n');
Say why, not what. The second comment should say something like "WAL
verification not yet supported for tar-format backups".
+ "$format backup fails with algorithm \"$algorithm\"");
+ $primary->command_ok(\@backup, "$format backup ok with
algorithm \"$algorithm\"");
+ ok(-f "$backup_path/backup_manifest", "$format backup
manifest exists");
+ "verify $format backup with algorithm \"$algorithm\"");
Personally I would change "$format" to "$format format" in all of
these places, so that we talk about a "tar format backup" or a "plain
format backup" instead of a "tar backup" or a "plain backup".
+ 'skip_on_windows' => 1
I don't understand why 4 of the 7 new tests are skipped on Windows.
The existing "skip" message for this parameter says "unix-style
permissions not supported on Windows" but that doesn't seem applicable
for any of the new cases, and I couldn't find a comment about it,
either.
+ my @files = glob("*");
+ system_or_bail($tar, '-cf', "$backup_path/$archive", @files);
Why not just system_or_bail($tar, '-cf', "$backup_path/$archive", '.')?
Also, instead of having separate entries in the test array to do
basically the same thing on Windows, could we just iterate through the
test array twice and do everything once for plain format and then a
second time for tar format, and do the tests once for each? I don't
think that idea QUITE works, because the open_file_fails,
open_directory_fails, and search_directory_fails tests are really not
applicable to tar format. But we could rename skip_on_windows to
tests_file_permissions and skip those both on Windows and for tar
format. But aside from that, I don't quite see why it makes sense to,
for example, test extra_file for both formats but not
extra_tablespace_file, and indeed it seems like an important bit of
test coverage.
I also feel like we should have tests someplace that add extra files
to a tar-format backup in the backup directory (e.g. 1234567.tar, or
wug.tar, or 123456.wug) or remove entire files.
...Robert
On Tue, Sep 10, 2024 at 1:31 AM Robert Haas <robertmhaas@gmail.com> wrote:
I would rather that you didn't add simple_ptr_list_destroy_deep()
given that you don't need it for this patch series.
Done.
+ "\"%s\" unexpected file in the tar format backup",

This doesn't seem grammatical to me. Perhaps change this to: file
\"%s\" is not expected in a tar format backup
Ok, updated in the attached version.
+ /* We are only interested in files that are not in the ignore list. */
+ if (member->is_directory || member->is_link ||
+     should_ignore_relpath(mystreamer->context, member->pathname))
+     return;

Doesn't this need to happen after we add pg_tblspc/$OID to the path,
rather than before? I bet this doesn't work correctly for files in
user-defined tablespaces, compared to the way it works for a
directory-format backup.

+ char temp[MAXPGPATH];
+
+ /* Copy original name at temporary space */
+ memcpy(temp, member->pathname, MAXPGPATH);
+
+ snprintf(member->pathname, MAXPGPATH, "%s/%d/%s",
+          "pg_tblspc", mystreamer->tblspc_oid, temp);

I don't like this at all. This function doesn't have any business
modifying the astreamer_member, and it doesn't need to. It can just do
char *pathname; char tmppathbuf[MAXPGPATH] and then set pathname to
either member->pathname or tmppathbuf depending on
OidIsValid(tblspcoid). Also, shouldn't this be using %u, not %d?
True, fixed in the attached version.
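The suggested shape, roughly (variable names follow the review comment; the
attached version uses a single local buffer in much the same way):

	const char *pathname;
	char        tmppathbuf[MAXPGPATH];

	if (OidIsValid(mystreamer->tblspc_oid))
	{
		/* Tablespace members need a pg_tblspc/<oid>/ prefix; OIDs print with %u. */
		snprintf(tmppathbuf, sizeof(tmppathbuf), "pg_tblspc/%u/%s",
				 mystreamer->tblspc_oid, member->pathname);
		pathname = tmppathbuf;
	}
	else
		pathname = member->pathname;

	/* Use 'pathname' for the ignore-list and manifest checks; member stays untouched. */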
+ mystreamer->mfile = (void *) m;
Either the cast to void * isn't necessary, or it indicates that
there's a type mismatch that should be fixed.
Fixed -- was added in the very first version and forgotten in later updates.
+ * We could have these checks while receiving contents. However, since
+ * contents are received in multiple iterations, this would result in
+ * these lengthy checks being performed multiple times. Instead, setting
+ * flags here and using them before proceeding with verification will be
+ * more efficient.

Seems unnecessary to explain this.
Removed.
+ Assert(mystreamer->verify_checksum);
+
+ /* Should have came for the right file */
+ Assert(strcmp(member->pathname, m->pathname) == 0);
+
+ /*
+  * The checksum context should match the type noted in the backup
+  * manifest.
+  */
+ Assert(checksum_ctx->type == m->checksum_type);

What do you think about:

Assert(m != NULL && !m->bad);
Assert(checksum_ctx->type == m->checksum_type);
Assert(strcmp(member->pathname, m->pathname) == 0);

Or possibly change the first one to Assert(should_verify_checksum(m))?
LGTM.
+ memcpy(&control_file, streamer->bbs_buffer.data,
sizeof(ControlFileData));

This probably doesn't really hurt anything, but it's a bit ugly. You
first use astreamer_buffer_until() to force the entire file into a
buffer. And then here, you copy the entire file into a second buffer
which is exactly the same except that it's guaranteed to be properly
aligned. It would be possible to include a ControlFileData in
astreamer_verify and copy the bytes directly into it (you'd need a
second astreamer_verify field for the number of bytes already copied
into that structure). I'm not 100% sure that's worth the code, but it
seems like it wouldn't take more than a few lines, so perhaps it is.
I think we could skip this memcpy() and directly cast
streamer->bbs_buffer.data to ControlFileData *, as we already ensure
that the correct length is being read just before this memcpy(). Did
the same in the attached version.
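Concretely, the relevant part of member_verify_control_data() in the attached
v14-0011 patch now reads roughly:

	/* Should have enough control file data needed for verification. */
	if (streamer->bbs_buffer.len != sizeof(ControlFileData))
		report_fatal_error("%s: unexpected control file size: %d, should be %zu",
						   mystreamer->archive_name, streamer->bbs_buffer.len,
						   sizeof(ControlFileData));

	/* The buffered bytes are used in place; no aligned copy is made. */
	control_file = (ControlFileData *) streamer->bbs_buffer.data;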
+/*
+ * Verify plain backup.
+ */
+static void
+verify_plain_backup(verifier_context *context)
+{
+ Assert(context->format == 'p');
+ verify_backup_directory(context, NULL, context->backup_directory);
+}
+

This seems like a waste of space.
Yeah, but the aim is to keep the function names self-explanatory and
consistent with the existing naming style.
+verify_tar_backup(verifier_context *context)
I like this a lot better now! I'm still not quite sure about the
decision to have the ignore list apply to both the backup directory
and the tar file contents -- but given the low participation on this
thread, I don't think we have much chance of getting feedback from
anyone else right now, so let's just do it the way you have it and we
can change it later if someone shows up to complain.
Ok.
+verify_all_tar_files(verifier_context *context, SimplePtrList *tarfiles)
I think this code could be moved into its only caller instead of
having a separate function. And then if you do that, maybe
verify_one_tar_file could be renamed to just verify_tar_file. Or
perhaps that function could also be removed and just move the code
into the caller. It's not much code and not very deeply nested.
Similarly create_archive_verifier() could be considered for this
treatment. Maybe inlining all of these is too much and the result will
look messy, but I think we should at least try to combine some of
them.
I have removed verify_all_tar_files() and renamed
verify_one_tar_file() as suggested. However, I can't merge further
because I need verify_tar_file() (formerly verify_one_tar_file()) to
remain a separate function. This way, regardless of whether it
succeeds or encounters an error, I can easily perform cleanup
afterward.
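So the caller in verify_tar_backup() keeps a pattern along these lines (sketch
based on the attached version):

	streamer = create_archive_verifier(context, tar->relpath,
									   tar->tblspc_oid, tar->compress_algorithm);
	fullpath = psprintf("%s/%s", context->backup_directory, tar->relpath);

	/* Read, decompress, and verify; errors are reported as they are found. */
	verify_tar_file(context, tar->relpath, fullpath, streamer);

	/* Cleanup runs here no matter how the verification went. */
	astreamer_finalize(streamer);
	astreamer_free(streamer);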
On Tue, Sep 10, 2024 at 10:54 PM Robert Haas <robertmhaas@gmail.com> wrote:
+ pg_log_error("pg_waldump cannot read from a tar");
"tar" isn't normally used as a noun as you do here, so I think this
should say "pg_waldump cannot read tar files".
Done.
Technically, the position of this check could lead to an unnecessary
failure, if -n wasn't specified but pg_wal.tar(.whatever) also doesn't
exist on disk. But I think it's OK to ignore that case.

However, I also notice this change to the TAP tests in a few places:

- [ 'pg_verifybackup', $tempdir ],
+ [ 'pg_verifybackup', '-Fp', $tempdir ],

It's not the end of the world to have to make a change like this, but
it seems easy to do better. Just postpone the call to
find_backup_format() until right after we call parse_manifest_file().
That also means postponing the check mentioned above until right after
that, but that's fine: after parse_manifest_file() and then
find_backup_format(), you can do if (!no_parse_wal && context.format
== 't') { bail out }.
Done.
+ if (stat(path, &sb) == 0)
+     result = 'p';
+ else
+ {
+     if (errno != ENOENT)
+     {
+         pg_log_error("cannot determine backup format : %m");
+         pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+         exit(1);
+     }
+
+     /* Otherwise, it is assumed to be a tar format backup. */
+     result = 't';
+ }

This doesn't look good, for a few reasons:

1. It would be clearer to structure this as if (stat(...) == 0) result
= 'p'; else if (errno == ENOENT) result = 't'; else { report an error;
} instead of the way you have it.

2. "cannot determine backup format" is not an appropriate way to
report the failure of stat(). The appropriate message is "could not
stat file \"%s\"".

3. It is almost never correct to put a space before a colon in an error message.

4. The hint doesn't look helpful, or necessary. I think you can just
delete that.

Regarding both point #2 and point #4, I think we should ask ourselves
how stat() could realistically fail here. On my system (macOS), the
documented failure modes for stat() are EACCES (i.e. permission denied),
EFAULT (i.e. we have a bug in pg_verifybackup), EIO (I/O Error), ELOOP
(symlink loop), ENAMETOOLONG, ENOENT, ENOTDIR, and EOVERFLOW. In none
of those cases does it seem likely that specifying the format manually
will help anything. Thus, suggesting that the user look at the help,
presumably to find --format, is unlikely to solve anything, and
telling them that the error happened while trying to determine the
backup format isn't really helping anything, either. What the user
needs to know is that it was stat() that failed, and the pathname for
which it failed. Then they need to sort out whatever problem is
causing them to get one of those really weird errors.
Done.
- of the backup. The backup must be stored in the "plain"
- format; a "tar" format backup can be checked after extracting it.
+ of the backup. The backup must be stored in the "plain" or "tar"
+ format. Verification is supported for <literal>gzip</literal>,
+ <literal>lz4</literal>, and <literal>zstd</literal> compressed tar backup;
+ any other compressed format backups can be checked after decompressing them.

I don't think that we need to say that the backup must be stored in
the plain or tar format, because those are the only backup formats
pg_basebackup knows about. Similarly, it doesn't seem helpful to me to
enumerate all the compression algorithms that pg_basebackup supports
and say we only support those; what else would a user expect?

What I would do is replace the original sentence ("The backup must be
stored...") with: The backup may be stored either in the "plain" or
the "tar" format; this includes "tar" backups compressed with any
algorithm supported by pg_basebackup. However, at present, WAL
verification is supported only for plain-format backups. Therefore, if
the backup is stored in "tar" format, the <literal>-n,
--no-parse-wal</literal> option should be used.
Done
+ # Add tar backup format option
+ push @backup, ('-F', 't');

+ # Add switch to skip WAL verification.
+ push @verify, ('-n');

Say why, not what. The second comment should say something like "WAL
verification not yet supported for tar-format backups".
Done.
+ "$format backup fails with algorithm \"$algorithm\""); + $primary->command_ok(\@backup, "$format backup ok with algorithm \"$algorithm\""); + ok(-f "$backup_path/backup_manifest", "$format backup manifest exists"); + "verify $format backup with algorithm \"$algorithm\"");Personally I would change "$format" to "$format format" in all of
these places, so that we talk about a "tar format backup" or a "plain
format backup" instead of a "tar backup" or a "plain backup".
Done.
+ 'skip_on_windows' => 1
I don't understand why 4 of the 7 new tests are skipped on Windows.
The existing "skip" message for this parameter says "unix-style
permissions not supported on Windows" but that doesn't seem applicable
for any of the new cases, and I couldn't find a comment about it,
either.
I was a bit unsure whether Windows could handle unpacking and
repacking tar files and the required path formats for these tests but
the "Cirrus CI / Windows - Server 2019, VS 2019" workflow doesn’t have
any issues with them. I’ve removed the flag.
+ my @files = glob("*");
+ system_or_bail($tar, '-cf', "$backup_path/$archive", @files);

Why not just system_or_bail($tar, '-cf', "$backup_path/$archive", '.')?

That doesn't work here, since re-packing that way includes "./" at the
beginning of each file path in the archive.
Also, instead of having separate entries in the test array to do
basically the same thing on Windows, could we just iterate through the
test array twice and do everything once for plain format and then a
second time for tar format, and do the tests once for each? I don't
think that idea QUITE works, because the open_file_fails,
open_directory_fails, and search_directory_fails tests are really not
applicable to tar format. But we could rename skip_on_windows to
tests_file_permissions and skip those both on Windows and for tar
format. But aside from that, I don't quite see why it makes sense to,
for example, test extra_file for both formats but not
extra_tablespace_file, and indeed it seems like an important bit of
test coverage.
Added extra_file and missing_file tests for tablespaces as well.
I also feel like we should have tests someplace that add extra files
to a tar-format backup in the backup directory (e.g. 1234567.tar, or
wug.tar, or 123456.wug) or remove entire files.
If I am not missing something, the tar_backup_unexpected_file test does
that. I have added a test that removes the tablespace archive in the
attached version.
The updated version is attached. Thank you for the review!
Regards,
Amul
Attachments:
v14-0006-Refactor-split-verify_backup_file-function-and-r.patchapplication/x-patch; name=v14-0006-Refactor-split-verify_backup_file-function-and-r.patchDownload
From e02e69b65e70553313ea1287e65d77cd34a00688 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 14 Aug 2024 10:42:37 +0530
Subject: [PATCH v14 06/12] Refactor: split verify_backup_file() function and
rename it.
The function verify_backup_file() has now been renamed to
verify_plain_backup_file() to make it clearer that it is specifically
used for verifying files in a plain backup. Similarly, in a future
patch, we would have a verify_tar_backup_file() function for
verifying TAR backup files.
In addition to that, moved the manifest entry verification code into a
new function called verify_manifest_entry() so that it can be reused
for tar backup verification. If verify_manifest_entry() doesn't find
an entry, it reports an error as before and returns NULL to the
caller. This is why a NULL check is added to should_verify_checksum().
---
src/bin/pg_verifybackup/pg_verifybackup.c | 58 +++++++++++++++--------
src/bin/pg_verifybackup/pg_verifybackup.h | 6 ++-
2 files changed, 42 insertions(+), 22 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 3fcfb167217..5bfc98e7874 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -64,8 +64,8 @@ static void report_manifest_error(JsonManifestParseContext *context,
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_backup_file(verifier_context *context,
- char *relpath, char *fullpath);
+static void verify_plain_backup_file(verifier_context *context,
+ char *relpath, char *fullpath);
static void verify_control_file(const char *controlpath,
uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
@@ -570,7 +570,7 @@ verify_backup_directory(verifier_context *context, char *relpath,
newrelpath = psprintf("%s/%s", relpath, filename);
if (!should_ignore_relpath(context, newrelpath))
- verify_backup_file(context, newrelpath, newfullpath);
+ verify_plain_backup_file(context, newrelpath, newfullpath);
pfree(newfullpath);
pfree(newrelpath);
@@ -591,7 +591,8 @@ verify_backup_directory(verifier_context *context, char *relpath,
* verify_backup_directory.
*/
static void
-verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
+verify_plain_backup_file(verifier_context *context, char *relpath,
+ char *fullpath)
{
struct stat sb;
manifest_file *m;
@@ -627,6 +628,32 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
return;
}
+ /* Check the backup manifest entry for this file. */
+ m = verify_manifest_entry(context, relpath, sb.st_size);
+
+ /*
+ * Validate the manifest system identifier, not available in manifest
+ * version 1.
+ */
+ if (context->manifest->version != 1 &&
+ strcmp(relpath, "global/pg_control") == 0 &&
+ m != NULL && m->matched && !m->bad)
+ verify_control_file(fullpath, context->manifest->system_identifier);
+
+ /* Update statistics for progress report, if necessary */
+ if (show_progress && !context->skip_checksums &&
+ should_verify_checksum(m))
+ total_size += m->size;
+}
+
+/*
+ * Verify file and its size entry in the manifest.
+ */
+manifest_file *
+verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
+{
+ manifest_file *m;
+
/* Check whether there's an entry in the manifest hash. */
m = manifest_files_lookup(context->manifest->files, relpath);
if (m == NULL)
@@ -634,40 +661,29 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
report_backup_error(context,
"\"%s\" is present on disk but not in the manifest",
relpath);
- return;
+ return NULL;
}
/* Flag this entry as having been encountered in the filesystem. */
m->matched = true;
/* Check that the size matches. */
- if (m->size != sb.st_size)
+ if (m->size != filesize)
{
report_backup_error(context,
"\"%s\" has size %lld on disk but size %zu in the manifest",
- relpath, (long long int) sb.st_size, m->size);
+ relpath, (long long int) filesize, m->size);
m->bad = true;
}
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0)
- verify_control_file(fullpath, context->manifest->system_identifier);
-
- /* Update statistics for progress report, if necessary */
- if (show_progress && !context->skip_checksums &&
- should_verify_checksum(m))
- total_size += m->size;
-
/*
* We don't verify checksums at this stage. We first finish verifying that
* we have the expected set of files with the expected sizes, and only
* afterwards verify the checksums. That's because computing checksums may
* take a while, and we'd like to report more obvious problems quickly.
*/
+
+ return m;
}
/*
@@ -830,7 +846,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
/*
* Double-check that we read the expected number of bytes from the file.
- * Normally, a file size mismatch would be caught in verify_backup_file
+ * Normally, a file size mismatch would be caught in verify_manifest_entry
* and this check would never be reached, but this provides additional
* safety and clarity in the event of concurrent modifications or
* filesystem misbehavior.
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index d8c566ed587..ff9476e356e 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -37,7 +37,8 @@ typedef struct manifest_file
} manifest_file;
#define should_verify_checksum(m) \
- (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+ (((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
/*
* Define a hash table which we can use to store information about the files
@@ -93,6 +94,9 @@ typedef struct verifier_context
bool saw_any_error;
} verifier_context;
+extern manifest_file *verify_manifest_entry(verifier_context *context,
+ char *relpath, int64 filesize);
+
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
pg_attribute_printf(2, 3);
--
2.18.0
v14-0007-Refactor-split-verify_file_checksum-function.patchapplication/x-patch; name=v14-0007-Refactor-split-verify_file_checksum-function.patchDownload
From 01286f76085dcaf72238eb6ad0a80f38d9ee05a3 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 14 Aug 2024 15:14:15 +0530
Subject: [PATCH v14 07/12] Refactor: split verify_file_checksum() function.
Move the core functionality of verify_file_checksum to a new function
to reuse it instead of duplicating the code.
The verify_file_checksum() function is designed to take a file path,
open and read the file contents, and then calculate the checksum.
However, for TAR backups, instead of a file path, we receive the file
content in chunks, and the checksum needs to be calculated
incrementally. While the initial operations for plain and TAR backup
checksum calculations differ, the final checks and error handling are
the same. By moving the shared logic to a separate function, we can
reuse the code for both types of backups.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 20 +++++++++++++++++---
src/bin/pg_verifybackup/pg_verifybackup.h | 3 +++
2 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 5bfc98e7874..e44d0377cd5 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -792,8 +792,6 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
int fd;
int rc;
size_t bytes_read = 0;
- uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
- int checksumlen;
/* Open the target file. */
if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
@@ -844,6 +842,22 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
if (rc < 0)
return;
+ /* Do the final computation and verification. */
+ verify_checksum(context, m, &checksum_ctx, bytes_read);
+}
+
+/*
+ * A helper function to finalize checksum computation and verify it against the
+ * backup manifest information.
+ */
+void
+verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx, int64 bytes_read)
+{
+ const char *relpath = m->pathname;
+ int checksumlen;
+ uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
+
/*
* Double-check that we read the expected number of bytes from the file.
* Normally, a file size mismatch would be caught in verify_manifest_entry
@@ -860,7 +874,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
}
/* Get the final checksum. */
- checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
+ checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
if (checksumlen < 0)
{
report_backup_error(context,
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index ff9476e356e..fe0ce8a89aa 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -96,6 +96,9 @@ typedef struct verifier_context
extern manifest_file *verify_manifest_entry(verifier_context *context,
char *relpath, int64 filesize);
+extern void verify_checksum(verifier_context *context, manifest_file *m,
+ pg_checksum_context *checksum_ctx,
+ int64 bytes_read);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
v14-0008-Refactor-split-verify_control_file.patchapplication/x-patch; name=v14-0008-Refactor-split-verify_control_file.patchDownload
From 616f5a911dbc4c82e999d3b64a98bec7c9f557a8 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 21 Aug 2024 10:51:23 +0530
Subject: [PATCH v14 08/12] Refactor: split verify_control_file.
Separated the manifest entry verification code into a new function and
introduced the should_verify_control_data() macro, similar to
should_verify_checksum().
Like verify_file_checksum(), verify_control_file() is designed to
accept the pg_control file path, which it opens in order to verify the
respective information. But in the case of a tar backup we have the
pg_control file contents instead, and those need to be verified in the
same way. For that reason, the code doing the verification is moved
into a separate function so that it can be reused for tar backup
verification as well.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 42 ++++++++++-------------
src/bin/pg_verifybackup/pg_verifybackup.h | 14 ++++++++
2 files changed, 33 insertions(+), 23 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index e44d0377cd5..d04e1d8c8ac 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -66,8 +66,6 @@ static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_plain_backup_file(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_control_file(const char *controlpath,
- uint64 manifest_system_identifier);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -631,14 +629,20 @@ verify_plain_backup_file(verifier_context *context, char *relpath,
/* Check the backup manifest entry for this file. */
m = verify_manifest_entry(context, relpath, sb.st_size);
- /*
- * Validate the manifest system identifier, not available in manifest
- * version 1.
- */
- if (context->manifest->version != 1 &&
- strcmp(relpath, "global/pg_control") == 0 &&
- m != NULL && m->matched && !m->bad)
- verify_control_file(fullpath, context->manifest->system_identifier);
+ /* Validate the pg_control information */
+ if (should_verify_control_data(context->manifest, m))
+ {
+ ControlFileData *control_file;
+ bool crc_ok;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+ control_file = get_controlfile_by_exact_path(fullpath, &crc_ok);
+
+ verify_control_data(control_file, fullpath, crc_ok,
+ context->manifest->system_identifier);
+ /* Release memory. */
+ pfree(control_file);
+ }
/* Update statistics for progress report, if necessary */
if (show_progress && !context->skip_checksums &&
@@ -687,18 +691,13 @@ verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
}
/*
- * Sanity check control file and validate system identifier against manifest
- * system identifier.
+ * Sanity check control file data and validate system identifier against
+ * manifest system identifier.
*/
-static void
-verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
+void
+verify_control_data(ControlFileData *control_file, const char *controlpath,
+ bool crc_ok, uint64 manifest_system_identifier)
{
- ControlFileData *control_file;
- bool crc_ok;
-
- pg_log_debug("reading \"%s\"", controlpath);
- control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
-
/* Control file contents not meaningful if CRC is bad. */
if (!crc_ok)
report_fatal_error("%s: CRC is incorrect", controlpath);
@@ -714,9 +713,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
controlpath,
(unsigned long long) manifest_system_identifier,
(unsigned long long) control_file->system_identifier);
-
- /* Release memory. */
- pfree(control_file);
}
/*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index fe0ce8a89aa..818064c6eed 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -40,6 +40,17 @@ typedef struct manifest_file
(((m) != NULL) && ((m)->matched) && !((m)->bad) && \
(((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+/*
+ * Validate the control file and its system identifier against the manifest
+ * system identifier. Note that this feature is not available in manifest
+ * version 1. This validation should only be performed after the manifest entry
+ * validation for the pg_control file has been completed without errors.
+ */
+#define should_verify_control_data(manifest, m) \
+ (((manifest)->version != 1) && \
+ ((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+ (strcmp((m)->pathname, "global/pg_control") == 0))
+
/*
* Define a hash table which we can use to store information about the files
* mentioned in the backup manifest.
@@ -99,6 +110,9 @@ extern manifest_file *verify_manifest_entry(verifier_context *context,
extern void verify_checksum(verifier_context *context, manifest_file *m,
pg_checksum_context *checksum_ctx,
int64 bytes_read);
+extern void verify_control_data(ControlFileData *control_file,
+ const char *controlpath, bool crc_ok,
+ uint64 manifest_system_identifier);
extern void report_backup_error(verifier_context *context,
const char *pg_restrict fmt,...)
--
2.18.0
v14-0009-Add-simple_ptr_list_destroy-and-simple_ptr_list_.patchapplication/x-patch; name=v14-0009-Add-simple_ptr_list_destroy-and-simple_ptr_list_.patchDownload
From 8b60229b2ae59e7c01a2fdb83614b1aa362d52ab Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 8 Aug 2024 16:01:33 +0530
Subject: [PATCH v14 09/12] Add simple_ptr_list_destroy() and
simple_ptr_list_destroy_deep() API.
We didn't have any helper function to destroy SimplePtrList, likely
because it wasn't needed before, but it's required in a later patch in
this set. I've added two functions for this purpose, inspired by
list_free() and list_free_deep().
---
src/fe_utils/simple_list.c | 19 +++++++++++++++++++
src/include/fe_utils/simple_list.h | 1 +
2 files changed, 20 insertions(+)
diff --git a/src/fe_utils/simple_list.c b/src/fe_utils/simple_list.c
index 2d88eb54067..c07e6bd9180 100644
--- a/src/fe_utils/simple_list.c
+++ b/src/fe_utils/simple_list.c
@@ -173,3 +173,22 @@ simple_ptr_list_append(SimplePtrList *list, void *ptr)
list->head = cell;
list->tail = cell;
}
+
+/*
+ * Destroy only pointer list and not the pointed-to element
+ */
+void
+simple_ptr_list_destroy(SimplePtrList *list)
+{
+ SimplePtrListCell *cell;
+
+ cell = list->head;
+ while (cell != NULL)
+ {
+ SimplePtrListCell *next;
+
+ next = cell->next;
+ pg_free(cell);
+ cell = next;
+ }
+}
diff --git a/src/include/fe_utils/simple_list.h b/src/include/fe_utils/simple_list.h
index d42ecded8ed..c83ab6f77e4 100644
--- a/src/include/fe_utils/simple_list.h
+++ b/src/include/fe_utils/simple_list.h
@@ -66,5 +66,6 @@ extern void simple_string_list_destroy(SimpleStringList *list);
extern const char *simple_string_list_not_touched(SimpleStringList *list);
extern void simple_ptr_list_append(SimplePtrList *list, void *ptr);
+extern void simple_ptr_list_destroy(SimplePtrList *list);
#endif /* SIMPLE_LIST_H */
--
2.18.0
v14-0010-pg_verifybackup-Add-backup-format-and-compressio.patchapplication/x-patch; name=v14-0010-pg_verifybackup-Add-backup-format-and-compressio.patchDownload
From d253d21df5f3137461e6410b2ee590ca052a3aea Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 21 Aug 2024 11:03:44 +0530
Subject: [PATCH v14 10/12] pg_verifybackup: Add backup format and compression
option
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the next patch that implements tar format support.
----
---
src/bin/pg_verifybackup/pg_verifybackup.c | 68 ++++++++++++++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 1 +
2 files changed, 67 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d04e1d8c8ac..c1542983b93 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -62,6 +62,7 @@ static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3) pg_attribute_noreturn();
+static char find_backup_format(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
static void verify_plain_backup_file(verifier_context *context,
@@ -97,6 +98,7 @@ main(int argc, char **argv)
{"exit-on-error", no_argument, NULL, 'e'},
{"ignore", required_argument, NULL, 'i'},
{"manifest-path", required_argument, NULL, 'm'},
+ {"format", required_argument, NULL, 'F'},
{"no-parse-wal", no_argument, NULL, 'n'},
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
@@ -118,6 +120,7 @@ main(int argc, char **argv)
progname = get_progname(argv[0]);
memset(&context, 0, sizeof(context));
+ context.format = '\0';
if (argc > 1)
{
@@ -154,7 +157,7 @@ main(int argc, char **argv)
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
- while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1)
+ while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:", long_options, NULL)) != -1)
{
switch (c)
{
@@ -173,6 +176,15 @@ main(int argc, char **argv)
manifest_path = pstrdup(optarg);
canonicalize_path(manifest_path);
break;
+ case 'F':
+ if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
+ context.format = 'p';
+ else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
+ context.format = 't';
+ else
+ pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"",
+ optarg);
+ break;
case 'n':
no_parse_wal = true;
break;
@@ -261,6 +273,21 @@ main(int argc, char **argv)
*/
context.manifest = parse_manifest_file(manifest_path);
+ /* Determine the backup format if it hasn't been specified. */
+ if (context.format == '\0')
+ context.format = find_backup_format(&context);
+
+ /*
+ * XXX: In the future, we should consider enhancing pg_waldump to read
+ * WAL files from the tar file.
+ */
+ if (!no_parse_wal && context.format == 't')
+ {
+ pg_log_error("pg_waldump cannot read tar files");
+ pg_log_error_hint("You must use -n or --no-parse-wal when verifying a tar-format backup.");
+ exit(1);
+ }
+
/*
* Now scan the files in the backup directory. At this stage, we verify
* that every file on disk is present in the manifest and that the sizes
@@ -279,8 +306,13 @@ main(int argc, char **argv)
/*
* Now do the expensive work of verifying file checksums, unless we were
* told to skip it.
+ *
+ * We only do this for plain backups here. For a tar backup, file
+ * checksum verification (if requested) is done immediately when the
+ * file is accessed, as we don't have random access to the files like we
+ * do with plain backups.
*/
- if (!context.skip_checksums)
+ if (!context.skip_checksums && context.format == 'p')
verify_backup_checksums(&context);
/*
@@ -981,6 +1013,37 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
return false;
}
+/*
+ * To detect the backup format, it checks for the PG_VERSION file in the backup
+ * directory. If found, it will be considered a plain-format backup; otherwise,
+ * it will be assumed to be a tar-format backup.
+ */
+static char
+find_backup_format(verifier_context *context)
+{
+ char result;
+ char *path;
+ struct stat sb;
+
+ /* Should be here only if the backup format is unknown */
+ Assert(context->format == '\0');
+
+ /* Check PG_VERSION file. */
+ path = psprintf("%s/%s", context->backup_directory, "PG_VERSION");
+ if (stat(path, &sb) == 0)
+ result = 'p';
+ else if (errno == ENOENT)
+ result = 't';
+ else
+ {
+ pg_log_error("could not stat file \"%s\": %m", path);
+ exit(1);
+ }
+ pfree(path);
+
+ return result;
+}
+
/*
* Print a progress report based on the global variables.
*
@@ -1038,6 +1101,7 @@ usage(void)
printf(_(" -e, --exit-on-error exit immediately on error\n"));
printf(_(" -i, --ignore=RELATIVE_PATH ignore indicated path\n"));
printf(_(" -m, --manifest-path=PATH use specified path for manifest\n"));
+ printf(_(" -F, --format=p|t backup format (plain, tar)\n"));
printf(_(" -n, --no-parse-wal do not try to parse WAL files\n"));
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 818064c6eed..80031ad4dbc 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -100,6 +100,7 @@ typedef struct verifier_context
manifest_data *manifest;
char *backup_directory;
SimpleStringList ignore_list;
+ char format; /* backup format: p(lain)/t(ar) */
bool skip_checksums;
bool exit_on_error;
bool saw_any_error;
--
2.18.0
v14-0011-pg_verifybackup-Read-tar-files-and-verify-its-co.patchapplication/x-patch; name=v14-0011-pg_verifybackup-Read-tar-files-and-verify-its-co.patchDownload
From 68a85869b9a8c3b872035661df85976c890116f4 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 21 Aug 2024 12:49:04 +0530
Subject: [PATCH v14 11/12] pg_verifybackup: Read tar files and verify its
contents
This patch implements TAR format backup verification.
For progress reporting support, we perform this verification in two
passes: the first pass calculates total_size, and the second pass
updates done_size as verification progresses.
For the verification, in the first pass, we call precheck_tar_backup_file(),
which performs basic verification by expecting only base.tar, pg_wal.tar, or
<tablespaceoid>.tar files and raises an error for any other files. It
also determines the compression type of the archive file. All this
information is stored in a newly added tarFile struct, which is
appended to a list that will be used in the second pass for the final
verification. In the second pass, the tar archives are read,
decompressed, and the required verification is carried out.
For reading and decompression, fe_utils/astreamer.h is used. For
verification, a new archive streamer has been added in
astreamer_verify.c to handle TAR member files and their contents; see
astreamer_verify_content() for details. The stack of astreamers will
be set up for each TAR file in verify_tar_content(), depending on its
compression type which is detected in the first pass.
When information about a TAR member file (i.e., ASTREAMER_MEMBER_HEADER)
is received, we first verify its entry against the backup manifest. We
then decide if further checks are needed, such as checksum
verification and control data verification (if it is a pg_control
file), once the member file contents are received. Although this
decision could be made when the contents are received, it is more
efficient to make it earlier since the member file contents are
received in multiple iterations. In short, we process
ASTREAMER_MEMBER_CONTENTS multiple times but only once for other
ASTREAMER_MEMBER_* cases. We maintain this information in the
astreamer_verify structure for each member file, which is reset when
the file ends.
Unlike in a plain backup, checksum verification here occurs in two
steps. First, as the contents are received, the checksum is computed
incrementally (see member_compute_checksum). Then, at the end of
processing the member file, the final verification is performed (see
member_verify_checksum).
Similarly, during the content receiving stage, if the file is
pg_control, the data will be copied into a local buffer (see
member_copy_control_data). The verification will then be carried out
at the end of the member file processing (see member_verify_control_data)
---
src/bin/pg_verifybackup/Makefile | 2 +
src/bin/pg_verifybackup/astreamer_verify.c | 358 +++++++++++++++++++++
src/bin/pg_verifybackup/meson.build | 1 +
src/bin/pg_verifybackup/pg_verifybackup.c | 326 ++++++++++++++++++-
src/bin/pg_verifybackup/pg_verifybackup.h | 6 +
src/tools/pgindent/typedefs.list | 2 +
6 files changed, 692 insertions(+), 3 deletions(-)
create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 7c045f142e8..374d4a8afd1 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -17,10 +17,12 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
# We need libpq only because fe_utils does.
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
OBJS = \
$(WIN32RES) \
+ astreamer_verify.o \
pg_verifybackup.o
all: pg_verifybackup
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
new file mode 100644
index 00000000000..b496e9320ea
--- /dev/null
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -0,0 +1,358 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_verify.c
+ *
+ * Extend fe_utils/astreamer.h archive streaming facility to verify TAR
+ * format backup.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * src/bin/pg_verifybackup/astreamer_verify.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "pg_verifybackup.h"
+
+typedef struct astreamer_verify
+{
+ astreamer base;
+ verifier_context *context;
+ char *archive_name;
+ Oid tblspc_oid;
+ pg_checksum_context *checksum_ctx;
+
+ /* Hold information for a member file verification */
+ manifest_file *mfile;
+ int64 received_bytes;
+ bool verify_checksum;
+ bool verify_control_data;
+} astreamer_verify;
+
+static void astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_verify_finalize(astreamer *streamer);
+static void astreamer_verify_free(astreamer *streamer);
+
+static void member_verify_header(astreamer *streamer, astreamer_member *member);
+static void member_compute_checksum(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void member_verify_checksum(astreamer *streamer);
+static void member_copy_control_data(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void member_verify_control_data(astreamer *streamer);
+static void member_reset_info(astreamer *streamer);
+
+static const astreamer_ops astreamer_verify_ops = {
+ .content = astreamer_verify_content,
+ .finalize = astreamer_verify_finalize,
+ .free = astreamer_verify_free
+};
+
+/*
+ * Create an astreamer that verifies the contents of a TAR file.
+ */
+astreamer *
+astreamer_verify_content_new(astreamer *next, verifier_context *context,
+ char *archive_name, Oid tblspc_oid)
+{
+ astreamer_verify *streamer;
+
+ streamer = palloc0(sizeof(astreamer_verify));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_verify_ops;
+
+ streamer->base.bbs_next = next;
+ streamer->context = context;
+ streamer->archive_name = archive_name;
+ streamer->tblspc_oid = tblspc_oid;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ if (!context->skip_checksums)
+ streamer->checksum_ctx = pg_malloc(sizeof(pg_checksum_context));
+
+ return &streamer->base;
+}
+
+/*
+ * The main entry point of the archive streamer for verifying tar members.
+ */
+static void
+astreamer_verify_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+
+ /*
+ * Perform the initial check and setup verification steps.
+ */
+ member_verify_header(streamer, member);
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Since we are receiving the member content in chunks, it must be
+ * processed according to the flags set by the member header
+ * processing routine. This includes performing incremental
+ * checksum computations and copying control data to the local
+ * buffer.
+ */
+ if (mystreamer->verify_checksum)
+ member_compute_checksum(streamer, member, data, len);
+
+ if (mystreamer->verify_control_data)
+ member_copy_control_data(streamer, member, data, len);
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+
+ /*
+ * We have reached the end of the member file. By this point, we
+ * should have successfully computed the checksum of the received
+ * content and copied the entire pg_control file data into our
+ * local buffer. We can now proceed with the final verification.
+ */
+ if (mystreamer->verify_checksum)
+ member_verify_checksum(streamer);
+
+ if (mystreamer->verify_control_data)
+ member_verify_control_data(streamer);
+
+ /*
+ * Reset the temporary information stored for the verification.
+ */
+ member_reset_info(streamer);
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar file");
+ }
+}
+
+/*
+ * End-of-stream processing for a astreamer_verify stream.
+ */
+static void
+astreamer_verify_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with a astreamer_verify stream.
+ */
+static void
+astreamer_verify_free(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ if (mystreamer->checksum_ctx)
+ pfree(mystreamer->checksum_ctx);
+
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+
+/*
+ * Verifies whether the tar member entry exists in the backup manifest.
+ *
+ * If the archive being processed is a tablespace, it prepares the necessary
+ * file path first. If a valid entry is found in the backup manifest, it then
+ * determines whether checksum and control data verification should be
+ * performed during file content processing.
+ */
+static void
+member_verify_header(astreamer *streamer, astreamer_member *member)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_file *m;
+ char pathname[MAXPGPATH];
+
+ /* We are only interested in normal files. */
+ if (member->is_directory || member->is_link)
+ return;
+
+ /*
+ * The backup manifest stores a relative path to the base directory for
+ * files belonging to a tablespace, while the tablespace backup tar
+ * archive does not include this path. Ensure the required path is
+ * prepared; otherwise, the manifest entry verification will fail.
+ */
+ if (OidIsValid(mystreamer->tblspc_oid))
+ snprintf(pathname, MAXPGPATH, "%s/%u/%s",
+ "pg_tblspc", mystreamer->tblspc_oid, member->pathname);
+ else
+ memcpy(pathname, member->pathname, MAXPGPATH);
+
+
+ /* Ignore any files that are listed in the ignore list. */
+ if (should_ignore_relpath(mystreamer->context, pathname))
+ return;
+
+ /* Check the manifest entry */
+ m = verify_manifest_entry(mystreamer->context, pathname,
+ member->size);
+ mystreamer->mfile = m;
+
+ /* Prepare for checksum and control data verification. */
+ mystreamer->verify_checksum =
+ (!mystreamer->context->skip_checksums && should_verify_checksum(m));
+ mystreamer->verify_control_data =
+ should_verify_control_data(mystreamer->context->manifest, m);
+
+ /* Initialize the context required for checksum verification. */
+ if (mystreamer->verify_checksum &&
+ pg_checksum_init(mystreamer->checksum_ctx, m->checksum_type) < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "%s: could not initialize checksum of file \"%s\"",
+ mystreamer->archive_name, m->pathname);
+
+ /*
+ * Checksum verification cannot be performed without proper context
+ * initialization.
+ */
+ mystreamer->verify_checksum = false;
+ }
+}
+
+/*
+ * Computes the checksum incrementally for the received file content.
+ *
+ * Should have a correctly initialized checksum_ctx, which will be used for
+ * incremental checksum computation.
+ */
+static void
+member_compute_checksum(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ pg_checksum_context *checksum_ctx = mystreamer->checksum_ctx;
+ manifest_file *m = mystreamer->mfile;
+
+ Assert(mystreamer->verify_checksum);
+
+ /*
+ * Should have been applied to the correct file. Note that strcmp() cannot
+ * be used because the member pathname (if it belongs to a tablespace) is
+ * not relative to the base directory, unlike the backup manifest. For
+ * more details, see member_verify_header().
+ */
+ Assert(should_verify_checksum(m));
+ Assert(m->checksum_type == checksum_ctx->type);
+ Assert(strstr(m->pathname, member->pathname));
+
+ /*
+ * Update the total count of computed checksum bytes for cross-checking
+ * with the file size in the final verification stage.
+ */
+ mystreamer->received_bytes += len;
+
+ if (pg_checksum_update(checksum_ctx, (uint8 *) data, len) < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "could not update checksum of file \"%s\"",
+ m->pathname);
+ mystreamer->verify_checksum = false;
+ }
+}
+
+/*
+ * Perform the final computation and checksum verification after the entire
+ * file content has been processed.
+ */
+static void
+member_verify_checksum(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ Assert(mystreamer->verify_checksum);
+
+ verify_checksum(mystreamer->context, mystreamer->mfile,
+ mystreamer->checksum_ctx, mystreamer->received_bytes);
+}
+
+/*
+ * Stores the pg_control file contents into a local buffer; we need the entire
+ * control file data for verification.
+ */
+static void
+member_copy_control_data(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ /* Should be here only for control file */
+ Assert(strcmp(member->pathname, "global/pg_control") == 0);
+ Assert(((astreamer_verify *) streamer)->verify_control_data);
+
+ /* Copy enough control file data needed for verification. */
+ astreamer_buffer_until(streamer, &data, &len, sizeof(ControlFileData));
+}
+
+/*
+ * Performs the CRC calculation of pg_control data and then calls the routines
+ * that execute the final verification of the control file information.
+ */
+static void
+member_verify_control_data(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_data *manifest = mystreamer->context->manifest;
+ ControlFileData *control_file;
+ pg_crc32c crc;
+ bool crc_ok;
+
+ /* Should be here only for control file */
+ Assert(strcmp(mystreamer->mfile->pathname, "global/pg_control") == 0);
+ Assert(mystreamer->verify_control_data);
+
+ /* Should have enough control file data needed for verification. */
+ if (streamer->bbs_buffer.len != sizeof(ControlFileData))
+ report_fatal_error("%s: unexpected control file size: %d, should be %zu",
+ mystreamer->archive_name, streamer->bbs_buffer.len,
+ sizeof(ControlFileData));
+
+ control_file = (ControlFileData *) streamer->bbs_buffer.data;
+
+ /* Check the CRC. */
+ INIT_CRC32C(crc);
+ COMP_CRC32C(crc, (char *) (control_file), offsetof(ControlFileData, crc));
+ FIN_CRC32C(crc);
+
+ crc_ok = EQ_CRC32C(crc, control_file->crc);
+
+ /* Do the final control data verification. */
+ verify_control_data(control_file, mystreamer->mfile->pathname, crc_ok,
+ manifest->system_identifier);
+}
+
+/*
+ * Reset flags and free memory allocations for member file verification.
+ */
+static void
+member_reset_info(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ mystreamer->mfile = NULL;
+ mystreamer->received_bytes = 0;
+ mystreamer->verify_checksum = false;
+ mystreamer->verify_control_data = false;
+}
diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build
index 7c7d31a0350..0e09d1379d1 100644
--- a/src/bin/pg_verifybackup/meson.build
+++ b/src/bin/pg_verifybackup/meson.build
@@ -1,6 +1,7 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
pg_verifybackup_sources = files(
+ 'astreamer_verify.c',
'pg_verifybackup.c'
)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index c1542983b93..e63a0ed0798 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -22,6 +22,7 @@
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
+#include "limits.h"
#include "pg_verifybackup.h"
#include "pgtime.h"
@@ -44,6 +45,16 @@
*/
#define READ_CHUNK_SIZE (128 * 1024)
+/*
+ * Tar file information needed for content verification.
+ */
+typedef struct tar_file
+{
+ char *relpath;
+ Oid tblspc_oid;
+ pg_compress_algorithm compress_algorithm;
+} tar_file;
+
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -63,10 +74,16 @@ static void report_manifest_error(JsonManifestParseContext *context,
pg_attribute_printf(2, 3) pg_attribute_noreturn();
static char find_backup_format(verifier_context *context);
+static void verify_plain_backup(verifier_context *context);
+static void verify_tar_backup(verifier_context *context);
static void verify_backup_directory(verifier_context *context,
char *relpath, char *fullpath);
-static void verify_plain_backup_file(verifier_context *context,
- char *relpath, char *fullpath);
+static void verify_plain_backup_file(verifier_context *context, char *relpath,
+ char *fullpath);
+static void precheck_tar_backup_file(verifier_context *context, char *relpath,
+ char *fullpath, SimplePtrList *tarfiles);
+static void verify_tar_file(verifier_context *context, char *relpath,
+ char *fullpath, astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -75,6 +92,10 @@ static void verify_file_checksum(verifier_context *context,
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
+static astreamer *create_archive_verifier(verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid,
+ pg_compress_algorithm compress_algo);
static void progress_report(bool finished);
static void usage(void);
@@ -294,7 +315,10 @@ main(int argc, char **argv)
* match. We also set the "matched" flag on every manifest entry that
* corresponds to a file on disk.
*/
- verify_backup_directory(&context, NULL, context.backup_directory);
+ if (context.format == 'p')
+ verify_plain_backup(&context);
+ else
+ verify_tar_backup(&context);
/*
* The "matched" flag should now be set on every entry in the hash table.
@@ -546,6 +570,16 @@ verifybackup_per_wal_range_cb(JsonManifestParseContext *context,
manifest->last_wal_range = range;
}
+/*
+ * Verify plain backup.
+ */
+static void
+verify_plain_backup(verifier_context *context)
+{
+ Assert(context->format == 'p');
+ verify_backup_directory(context, NULL, context->backup_directory);
+}
+
/*
* Verify one directory.
*
@@ -682,6 +716,257 @@ verify_plain_backup_file(verifier_context *context, char *relpath,
total_size += m->size;
}
+/*
+ * Verify tar backup.
+ *
+ * Unlike plain backup verification, tar backup verification is carried out in
+ * two passes. The first pass performs a sanity check on each tar file expected
+ * to be present in the backup directory, detects its compression type, and
+ * collects this information in a list. In the second pass, the tar archives
+ * are read, decompressed, and the required verification is carried out.
+ */
+static void
+verify_tar_backup(verifier_context *context)
+{
+ DIR *dir;
+ struct dirent *dirent;
+ SimplePtrList tarfiles = {NULL, NULL};
+ SimplePtrListCell *cell;
+ char *fullpath;
+
+ Assert(context->format == 't');
+
+ progress_report(false);
+
+ /*
+ * If the backup directory cannot be found, treat this as a fatal error.
+ */
+ fullpath = context->backup_directory;
+ dir = opendir(fullpath);
+ if (dir == NULL)
+ report_fatal_error("could not open directory \"%s\": %m", fullpath);
+
+ while (errno = 0, (dirent = readdir(dir)) != NULL)
+ {
+ char *filename = dirent->d_name;
+ char *newfullpath = psprintf("%s/%s", fullpath, filename);
+
+ /* Skip "." and ".." */
+ if (filename[0] == '.' && (filename[1] == '\0'
+ || strcmp(filename, "..") == 0))
+ continue;
+
+ /* First pass: Collect valid tar files from the backup. */
+ if (!should_ignore_relpath(context, filename))
+ precheck_tar_backup_file(context, filename, newfullpath,
+ &tarfiles);
+
+ pfree(newfullpath);
+ }
+
+ if (closedir(dir))
+ {
+ report_backup_error(context,
+ "could not close directory \"%s\": %m", fullpath);
+ return;
+ }
+
+ /* Second pass: Perform the final verification of the tar contents. */
+ for (cell = tarfiles.head; cell != NULL; cell = cell->next)
+ {
+ tar_file *tar = (tar_file *) cell->ptr;
+ astreamer *streamer;
+
+ /*
+ * Prepares the archive streamer stack according to the tar
+ * compression format.
+ */
+ streamer = create_archive_verifier(context,
+ tar->relpath,
+ tar->tblspc_oid,
+ tar->compress_algorithm);
+
+ /* Compute the full pathname to the target file. */
+ fullpath = psprintf("%s/%s", context->backup_directory,
+ tar->relpath);
+
+ /* Invoke the streamer for reading, decompressing, and verifying. */
+ verify_tar_file(context, tar->relpath, fullpath, streamer);
+
+ /* Cleanup. */
+ pfree(tar->relpath);
+ pfree(tar);
+ pfree(fullpath);
+
+ astreamer_finalize(streamer);
+ astreamer_free(streamer);
+ }
+ simple_ptr_list_destroy(&tarfiles);
+
+ progress_report(true);
+}
+
+/*
+ * Preparatory steps for verifying files in tar format backups.
+ *
+ * Carries out basic validation of the tar format backup file, detects the
+ * compression type, and appends that information to the tarfiles list. An
+ * error will be reported if the tar file is inaccessible, or if the file type,
+ * name, or compression type is not as expected.
+ *
+ * The arguments to this function are mostly the same as those of
+ * verify_plain_backup_file. The additional argument is used to return a list
+ * of valid tar files.
+ */
+static void
+precheck_tar_backup_file(verifier_context *context, char *relpath,
+ char *fullpath, SimplePtrList *tarfiles)
+{
+ struct stat sb;
+ Oid tblspc_oid = InvalidOid;
+ pg_compress_algorithm compress_algorithm;
+ tar_file *tar;
+ char *suffix = NULL;
+
+ /* Should be tar format backup */
+ Assert(context->format == 't');
+
+ /* Get file information */
+ if (stat(fullpath, &sb) != 0)
+ {
+ report_backup_error(context,
+ "could not stat file or directory \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ /* In a tar format backup, we expect only plain files. */
+ if (!S_ISREG(sb.st_mode))
+ {
+ report_backup_error(context,
+ "\"%s\" is not a plain file",
+ relpath);
+ return;
+ }
+
+ /*
+ * We expect tar files for backing up the main directory, tablespace, and
+ * pg_wal directory.
+ *
+ * pg_basebackup writes the main data directory to an archive file named
+ * base.tar, the pg_wal directory to pg_wal.tar, and the tablespace
+ * directory to <tablespaceoid>.tar, each followed by a compression type
+ * extension such as .gz, .lz4, or .zst.
+ */
+ if (strncmp("base", relpath, 4) == 0)
+ suffix = relpath + 4;
+ else if (strncmp("pg_wal", relpath, 6) == 0)
+ suffix = relpath + 6;
+ else
+ {
+ /* Expected a <tablespaceoid>.tar file here. */
+ uint64 num = strtoul(relpath, &suffix, 10);
+
+ /*
+ * Report an error if we didn't consume at least one character, if the
+ * result is 0, or if the value is too large to be a valid OID.
+ */
+ if (suffix == relpath || num == 0 || num > OID_MAX)
+ report_backup_error(context,
+ "file \"%s\" is not expected in a tar format backup",
+ relpath);
+ tblspc_oid = (Oid) num;
+ }
+
+ /* Now, check the compression type of the tar */
+ if (strcmp(suffix, ".tar") == 0)
+ compress_algorithm = PG_COMPRESSION_NONE;
+ else if (strcmp(suffix, ".tgz") == 0)
+ compress_algorithm = PG_COMPRESSION_GZIP;
+ else if (strcmp(suffix, ".tar.gz") == 0)
+ compress_algorithm = PG_COMPRESSION_GZIP;
+ else if (strcmp(suffix, ".tar.lz4") == 0)
+ compress_algorithm = PG_COMPRESSION_LZ4;
+ else if (strcmp(suffix, ".tar.zst") == 0)
+ compress_algorithm = PG_COMPRESSION_ZSTD;
+ else
+ {
+ report_backup_error(context,
+ "file \"%s\" is not expected in a tar format backup",
+ relpath);
+ return;
+ }
+
+ /*
+ * Ignore WALs, as reading and verification will be handled through
+ * pg_waldump.
+ */
+ if (strncmp("pg_wal", relpath, 6) == 0)
+ return;
+
+ /*
+ * Append the information to the list for complete verification at a later
+ * stage.
+ */
+ tar = pg_malloc(sizeof(tar_file));
+ tar->relpath = pstrdup(relpath);
+ tar->tblspc_oid = tblspc_oid;
+ tar->compress_algorithm = compress_algorithm;
+
+ simple_ptr_list_append(tarfiles, tar);
+
+ /* Update statistics for progress report, if necessary */
+ if (show_progress)
+ total_size += sb.st_size;
+}
+
+/*
+ * Verification of a single tar file content.
+ *
+ * It reads a given tar archive in predefined chunks and passes it to the
+ * streamer, which initiates routines for decompression (if necessary) and then
+ * verifies each member within the tar file.
+ */
+static void
+verify_tar_file(verifier_context *context, char *relpath, char *fullpath,
+ astreamer *streamer)
+{
+ int fd;
+ int rc;
+ char *buffer;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+
+ /* Open the target file. */
+ if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
+ {
+ report_backup_error(context, "could not open file \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+ /* Perform the reads */
+ while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
+ {
+ astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);
+
+ /* Report progress */
+ done_size += rc;
+ progress_report(false);
+ }
+
+ if (rc < 0)
+ report_backup_error(context, "could not read file \"%s\": %m",
+ relpath);
+
+ /* Close the file. */
+ if (close(fd) != 0)
+ report_backup_error(context, "could not close file \"%s\": %m",
+ relpath);
+}
+
/*
* Verify file and its size entry in the manifest.
*/
@@ -1044,6 +1329,41 @@ find_backup_format(verifier_context *context)
return result;
}
+/*
+ * Create the chain of archive streamers needed to verify the contents of the
+ * provided tar file.
+ */
+static astreamer *
+create_archive_verifier(verifier_context *context, char *archive_name,
+ Oid tblspc_oid, pg_compress_algorithm compress_algo)
+{
+ astreamer *streamer = NULL;
+
+ /* Should be here only for tar backup */
+ Assert(context->format == 't');
+
+ /*
+ * To verify the contents of the tar, the initial step is to parse its
+ * content.
+ */
+ streamer = astreamer_verify_content_new(streamer, context, archive_name,
+ tblspc_oid);
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /*
+ * If the tar is compressed, we must perform the appropriate decompression
+ * operation before proceeding with the verification of its contents.
+ */
+ if (compress_algo == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compress_algo == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compress_algo == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ return streamer;
+}
+
/*
* Print a progress report based on the global variables.
*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 80031ad4dbc..be7438af346 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -18,6 +18,7 @@
#include "common/hashfn_unstable.h"
#include "common/logging.h"
#include "common/parse_manifest.h"
+#include "fe_utils/astreamer.h"
#include "fe_utils/simple_list.h"
/*
@@ -123,4 +124,9 @@ extern void report_fatal_error(const char *pg_restrict fmt,...)
extern bool should_ignore_relpath(verifier_context *context,
const char *relpath);
+extern astreamer *astreamer_verify_content_new(astreamer *next,
+ verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
+
#endif /* PG_VERIFYBACKUP_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e9ebddde24d..2b155586f8c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3330,6 +3330,7 @@ astreamer_plain_writer
astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
+astreamer_verify
astreamer_zstd_frame
bgworker_main_type
bh_node_type
@@ -3951,6 +3952,7 @@ substitute_phv_relids_context
subxids_array_status
symbol
tablespaceinfo
+tar_file
td_entry
teSection
temp_tablespaces_extra
--
2.18.0
v14-0012-pg_verifybackup-Tests-and-document.patchapplication/x-patch; name=v14-0012-pg_verifybackup-Tests-and-document.patchDownload
From aeed7ab45f6c15abae8e4935b61c3659a9ba139e Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 12 Sep 2024 15:35:49 +0530
Subject: [PATCH v14 12/12] pg_verifybackup: Tests and document
----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the previous patch that implements tar format support.
----
---
doc/src/sgml/ref/pg_verifybackup.sgml | 45 ++-
src/bin/pg_verifybackup/t/002_algorithm.pl | 34 ++-
src/bin/pg_verifybackup/t/003_corruption.pl | 267 +++++++++++++++++-
src/bin/pg_verifybackup/t/004_options.pl | 2 +-
src/bin/pg_verifybackup/t/008_untar.pl | 71 ++---
src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +---
6 files changed, 352 insertions(+), 115 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index a3f167f9f6e..8380178ca49 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -34,8 +34,12 @@ PostgreSQL documentation
integrity of a database cluster backup taken using
<command>pg_basebackup</command> against a
<literal>backup_manifest</literal> generated by the server at the time
- of the backup. The backup must be stored in the "plain"
- format; a "tar" format backup can be checked after extracting it.
+ of the backup. The backup may be stored either in the "plain" or the "tar"
+ format; this includes tar-format backups compressed with any algorithm
+ supported by <application>pg_basebackup</application>. However, at present,
+ <literal>WAL</literal> verification is supported only for plain-format
+ backups. Therefore, if the backup is stored in tar-format, the
+ <literal>-n, --no-parse-wal</literal> option should be used.
</para>
<para>
@@ -168,6 +172,43 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-F <replaceable class="parameter">format</replaceable></option></term>
+ <term><option>--format=<replaceable class="parameter">format</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the format of the backup. <replaceable>format</replaceable>
+ can be one of the following:
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>p</literal></term>
+ <term><literal>plain</literal></term>
+ <listitem>
+ <para>
+ Backup consists of plain files with the same layout as the
+ source server's data directory and tablespaces.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>t</literal></term>
+ <term><literal>tar</literal></term>
+ <listitem>
+ <para>
+ Backup consists of tar files. A valid backup includes the main data
+ directory in a file named <filename>base.tar</filename>, the WAL
+ files in <filename>pg_wal.tar</filename>, and separate tar files for
+ each tablespace, named after the tablespace's OID, followed by the
+ compression extension.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist></para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n</option></term>
<term><option>--no-parse-wal</option></term>
diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl
index fb2a1fd7c4e..4959d5bd0b9 100644
--- a/src/bin/pg_verifybackup/t/002_algorithm.pl
+++ b/src/bin/pg_verifybackup/t/002_algorithm.pl
@@ -14,24 +14,35 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
-for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512))
+sub test_checksums
{
- my $backup_path = $primary->backup_dir . '/' . $algorithm;
+ my ($format, $algorithm) = @_;
+ my $backup_path = $primary->backup_dir . '/' . $format . '/' . $algorithm;
my @backup = (
'pg_basebackup', '-D', $backup_path,
'--manifest-checksums', $algorithm, '--no-sync', '-cfast');
my @verify = ('pg_verifybackup', '-e', $backup_path);
+ if ($format eq 'tar')
+ {
+ # Add switch to get a tar-format backup
+ push @backup, ('-F', 't');
+
+ # Add switch to skip WAL verification, which is not yet supported for
+ # tar-format backups
+ push @verify, ('-n');
+ }
+
# A backup with a bogus algorithm should fail.
if ($algorithm eq 'bogus')
{
$primary->command_fails(\@backup,
- "backup fails with algorithm \"$algorithm\"");
- next;
+ "$format format backup fails with algorithm \"$algorithm\"");
+ return;
}
# A backup with a valid algorithm should work.
- $primary->command_ok(\@backup, "backup ok with algorithm \"$algorithm\"");
+ $primary->command_ok(\@backup, "$format format backup ok with algorithm \"$algorithm\"");
# We expect each real checksum algorithm to be mentioned on every line of
# the backup manifest file except the first and last; for simplicity, we
@@ -39,7 +50,7 @@ for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512))
# is none, we just check that the manifest exists.
if ($algorithm eq 'none')
{
- ok(-f "$backup_path/backup_manifest", "backup manifest exists");
+ ok(-f "$backup_path/backup_manifest", "$format format backup manifest exists");
}
else
{
@@ -52,10 +63,19 @@ for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512))
# Make sure that it verifies OK.
$primary->command_ok(\@verify,
- "verify backup with algorithm \"$algorithm\"");
+ "verify $format format backup with algorithm \"$algorithm\"");
# Remove backup immediately to save disk space.
rmtree($backup_path);
}
+# Do the check
+for my $format (qw(plain tar))
+{
+ for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512))
+ {
+ test_checksums($format, $algorithm);
+ }
+}
+
done_testing();
diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl
index ae91e043384..c953bbc20d8 100644
--- a/src/bin/pg_verifybackup/t/003_corruption.pl
+++ b/src/bin/pg_verifybackup/t/003_corruption.pl
@@ -11,6 +11,8 @@ use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+my $tar = $ENV{TAR};
+
my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
@@ -32,62 +34,73 @@ EOM
my @scenario = (
{
'name' => 'extra_file',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_extra_file,
'fails_like' =>
qr/extra_file.*present on disk but not in the manifest/
},
{
'name' => 'extra_tablespace_file',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_extra_tablespace_file,
'fails_like' =>
qr/extra_ts_file.*present on disk but not in the manifest/
},
{
'name' => 'missing_file',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_missing_file,
'fails_like' =>
qr/pg_xact\/0000.*present in the manifest but not on disk/
},
{
'name' => 'missing_tablespace',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_missing_tablespace,
'fails_like' =>
qr/pg_tblspc.*present in the manifest but not on disk/
},
{
'name' => 'append_to_file',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_append_to_file,
'fails_like' => qr/has size \d+ on disk but size \d+ in the manifest/
},
{
'name' => 'truncate_file',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_truncate_file,
'fails_like' => qr/has size 0 on disk but size \d+ in the manifest/
},
{
'name' => 'replace_file',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_replace_file,
'fails_like' => qr/checksum mismatch for file/
},
{
'name' => 'system_identifier',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_system_identifier,
'fails_like' =>
qr/manifest system identifier is .*, but control file has/
},
{
'name' => 'bad_manifest',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_bad_manifest,
'fails_like' => qr/manifest checksum mismatch/
},
{
'name' => 'open_file_fails',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_open_file_fails,
'fails_like' => qr/could not open file/,
'skip_on_windows' => 1
},
{
'name' => 'open_directory_fails',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_open_directory_fails,
'cleanup' => \&cleanup_open_directory_fails,
'fails_like' => qr/could not open directory/,
@@ -95,10 +108,78 @@ my @scenario = (
},
{
'name' => 'search_directory_fails',
+ 'backup_format' => 'p',
'mutilate' => \&mutilate_search_directory_fails,
'cleanup' => \&cleanup_search_directory_fails,
'fails_like' => qr/could not stat file or directory/,
'skip_on_windows' => 1
+ },
+ {
+ 'name' => 'tar_backup_unexpected_file',
+ 'backup_format' => 't',
+ 'mutilate' => \&mutilate_extra_file,
+ 'fails_like' =>
+ qr/file "extra_file" is not expected in a tar format backup/
+ },
+ {
+ 'name' => 'tar_backup_extra_file',
+ 'backup_format' => 't',
+ 'mutilate' => \&mutilate_tar_backup_extra_file,
+ 'fails_like' =>
+ qr/extra_tar_member_file.*present on disk but not in the manifest/
+ },
+ {
+ 'name' => 'tar_extra_tablespace_file',
+ 'backup_format' => 't',
+ 'mutilate' => \&mutilate_tar_extra_tablespace_file,
+ 'fails_like' =>
+ qr/extra_ts_member_file.*present on disk but not in the manifest/
+ },
+ {
+ 'name' => 'tar_backup_missing_file',
+ 'backup_format' => 't',
+ 'mutilate' => \&mutilate_tar_backup_missing_file,
+ 'fails_like' =>
+ qr/pg_xact\/0000.*present in the manifest but not on disk/,
+ },
+ {
+ 'name' => 'tar_missing_tablespace',
+ 'backup_format' => 't',
+ 'mutilate' => \&mutilate_tar_missing_tablespace,
+ 'fails_like' =>
+ qr/pg_tblspc.*present in the manifest but not on disk/
+ },
+ {
+ 'name' => 'tar_missing_tablespace_file',
+ 'backup_format' => 't',
+ 'mutilate' => \&mutilate_tar_missing_tablespace_file,
+ 'fails_like' =>
+ qr/pg_tblspc.*present in the manifest but not on disk/
+ },
+ {
+ 'name' => 'tar_backup_append_to_file',
+ 'backup_format' => 't',
+ 'mutilate' => \&mutilate_tar_backup_append_to_file,
+ 'fails_like' => qr/has size \d+ on disk but size \d+ in the manifest/,
+ },
+ {
+ 'name' => 'tar_backup_truncate_file',
+ 'backup_format' => 't',
+ 'mutilate' => \&mutilate_tar_backup_truncate_file,
+ 'fails_like' => qr/has size 0 on disk but size \d+ in the manifest/,
+ },
+ {
+ 'name' => 'tar_backup_replace_file',
+ 'backup_format' => 't',
+ 'mutilate' => \&mutilate_tar_backup_replace_file,
+ 'fails_like' => qr/checksum mismatch for file/,
+ },
+ {
+ 'name' => 'tar_backup_system_identifier',
+ 'backup_format' => 't',
+ 'mutilate' => \&mutilate_system_identifier,
+ 'fails_like' =>
+ qr/manifest system identifier is .*, but control file has/
});
for my $scenario (@scenario)
@@ -111,29 +192,42 @@ for my $scenario (@scenario)
if ($scenario->{'skip_on_windows'}
&& ($windows_os || $Config::Config{osname} eq 'cygwin'));
+ # Skip tests for tar-format backups if tar is not available.
+ skip "no tar program available", 4
+ if ($scenario->{'backup_format'} eq 't' && (!defined $tar || $tar eq ''));
+
# Take a backup and check that it verifies OK.
my $backup_path = $primary->backup_dir . '/' . $name;
my $backup_ts_path = PostgreSQL::Test::Utils::tempdir_short();
+
+ my @backup = (
+ 'pg_basebackup', '-D', $backup_path, '--no-sync', '-cfast',
+ '-T', "${source_ts_path}=${backup_ts_path}");
+ my @verify = ('pg_verifybackup', $backup_path);
+
+ if ($scenario->{'backup_format'} eq 't')
+ {
+ # Add switch to get a tar-format backup
+ push @backup, ('-F', 't');
+
+ # Add switch to skip WAL verification, which is not yet supported
+ # for tar-format backups
+ push @verify, ('-n');
+ }
+
# The tablespace map parameter confuses Msys2, which tries to mangle
# it. Tell it not to.
# See https://www.msys2.org/wiki/Porting/#filesystem-namespaces
local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix;
- $primary->command_ok(
- [
- 'pg_basebackup', '-D', $backup_path, '--no-sync', '-cfast',
- '-T', "${source_ts_path}=${backup_ts_path}"
- ],
- "base backup ok");
- command_ok([ 'pg_verifybackup', $backup_path ],
- "intact backup verified");
+
+ $primary->command_ok( \@backup, "base backup ok");
+ command_ok(\@verify, "intact backup verified");
# Mutilate the backup in some way.
$scenario->{'mutilate'}->($backup_path);
# Now check that the backup no longer verifies.
- command_fails_like(
- [ 'pg_verifybackup', $backup_path ],
- $scenario->{'fails_like'},
+ command_fails_like(\@verify, $scenario->{'fails_like'},
"corrupt backup fails verification: $name");
# Run cleanup hook, if provided.
@@ -260,6 +354,7 @@ sub mutilate_system_identifier
$backup_path . '/backup_manifest')
or BAIL_OUT "could not copy manifest to $backup_path";
$node->teardown_node(fail_ok => 1);
+ $node->clean_node();
return;
}
@@ -316,4 +411,154 @@ sub cleanup_search_directory_fails
return;
}
+# Unpack the tar file, perform the specified file operation, and then repack
+# the modified content into the same tar file at the same location.
+sub mutilate_base_tar
+{
+ my ($backup_path, $archive, $op) = @_;
+
+ my $tmpdir = "$backup_path/tmpdir";
+ mkdir($tmpdir) || die "$!";
+
+ # Extract the archive
+ system_or_bail($tar, '-xf', "$backup_path/$archive", '-C', "$tmpdir");
+ unlink("$backup_path/$archive") || die "$!";
+
+ if ($op eq 'add')
+ {
+ if ($archive eq 'base.tar')
+ {
+ create_extra_file($tmpdir, 'extra_tar_member_file');
+ }
+ else
+ {
+ # must be a tablespace archive
+ my ($catvdir) = grep { $_ ne '.' && $_ ne '..' }
+ slurp_dir("$tmpdir");
+ my ($tsdboid) = grep { $_ ne '.' && $_ ne '..' }
+ slurp_dir("$tmpdir/$catvdir");
+ create_extra_file($tmpdir,
+ "$catvdir/$tsdboid/extra_ts_member_file");
+ }
+ }
+ elsif ($op eq 'delete')
+ {
+ if ($archive eq 'base.tar')
+ {
+ mutilate_missing_file($tmpdir);
+ }
+ else
+ {
+ # must be a tablespace archive
+ my ($catvdir) = grep { $_ ne '.' && $_ ne '..' }
+ slurp_dir("$tmpdir");
+ my ($tsdboid) = grep { $_ ne '.' && $_ ne '..' }
+ slurp_dir("$tmpdir/$catvdir");
+ my ($reloid) = grep { $_ ne '.' && $_ ne '..' }
+ slurp_dir("$tmpdir/$catvdir/$tsdboid");
+ my $pathname = "$tmpdir/$catvdir/$tsdboid/$reloid";
+ unlink($pathname) || die "$pathname: $!";
+ }
+ }
+ elsif ($op eq 'append')
+ {
+ mutilate_append_to_file($tmpdir);
+ }
+ elsif ($op eq 'truncate')
+ {
+ mutilate_truncate_file($tmpdir);
+ }
+ elsif ($op eq 'replace')
+ {
+ mutilate_replace_file($tmpdir);
+ }
+ else
+ {
+ die "mutilate_tar_backup: \"$op\" invalid operation";
+ }
+
+
+ # Navigate to the extracted location and list the files.
+ chdir("$tmpdir") || die "$!";
+ my @files = glob("*");
+ # Repack the extracted content
+ system_or_bail($tar, '-cf', "$backup_path/$archive", @files);
+ chdir($backup_path) || die "$!";
+ rmtree("$tmpdir") || die "$!";
+}
+
+# Add a file to the main directory archive.
+sub mutilate_tar_backup_extra_file
+{
+ my ($backup_path) = @_;
+ mutilate_base_tar($backup_path, 'base.tar', 'add');
+ return;
+}
+
+# Add a file to the user-defined tablespace archive.
+sub mutilate_tar_extra_tablespace_file
+{
+ my ($backup_path) = @_;
+ my ($archive) =
+ grep { $_ =~ qr/\d+\.tar/ } slurp_dir("$backup_path");
+ die "tablespace tar backup not found." unless defined $archive;
+ mutilate_base_tar($backup_path, $archive, 'add');
+ return;
+}
+
+# Remove a file from main directory archive.
+sub mutilate_tar_backup_missing_file
+{
+ my ($backup_path) = @_;
+ mutilate_base_tar($backup_path, 'base.tar', 'delete');
+ return;
+}
+
+# Remove the user-defined tablespace archive.
+sub mutilate_tar_missing_tablespace
+{
+ my ($backup_path) = @_;
+ my ($archive) =
+ grep { $_ =~ qr/\d+\.tar/ } slurp_dir("$backup_path");
+ die "tablespace tar backup not found." unless defined $archive;
+ my $pathname = "$backup_path/$archive";
+ unlink($pathname) || die "$pathname: $!";
+ return;
+}
+
+# Remove the files from the user-defined tablespace archive.
+sub mutilate_tar_missing_tablespace_file
+{
+ my ($backup_path) = @_;
+ my ($archive) =
+ grep { $_ =~ qr/\d+\.tar/ } slurp_dir("$backup_path");
+ die "tablespace tar backup not found." unless defined $archive;
+ mutilate_base_tar($backup_path, $archive, 'delete');
+ return;
+}
+
+# Append additional bytes to a file.
+sub mutilate_tar_backup_append_to_file
+{
+ my ($backup_path) = @_;
+ mutilate_base_tar($backup_path, 'base.tar', 'append');
+ return;
+}
+
+# Truncate a file to zero length.
+sub mutilate_tar_backup_truncate_file
+{
+ my ($backup_path) = @_;
+ mutilate_base_tar($backup_path, 'base.tar', 'truncate');
+ return;
+}
+
+# Replace a file's contents
+sub mutilate_tar_backup_replace_file
+{
+ my ($backup_path) = @_;
+ mutilate_base_tar($backup_path, 'base.tar', 'replace');
+ return;
+}
+
done_testing();
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 8ed2214408e..2f197648740 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -108,7 +108,7 @@ unlike(
# Test valid manifest with nonexistent backup directory.
command_fails_like(
[
- 'pg_verifybackup', '-m',
+ 'pg_verifybackup', '-Fp', '-m',
"$backup_path/backup_manifest", "$backup_path/fake"
],
qr/could not open directory/,
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 7a09f3b75b2..e7ec8369362 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -16,6 +16,18 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
+# Create a tablespace directory.
+my $source_ts_path = PostgreSQL::Test::Utils::tempdir_short();
+
+# Create a tablespace with table in it.
+$primary->safe_psql('postgres', qq(
+ CREATE TABLESPACE regress_ts1 LOCATION '$source_ts_path';
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1';
+ CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1;
+ INSERT INTO regress_tbl1 VALUES(generate_series(1,5));));
+my $tsoid = $primary->safe_psql('postgres', qq(
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
+
my $backup_path = $primary->backup_dir . '/server-backup';
my $extract_path = $primary->backup_dir . '/extracted-backup';
@@ -23,39 +35,31 @@ my @test_configuration = (
{
'compression_method' => 'none',
'backup_flags' => [],
- 'backup_archive' => 'base.tar',
+ 'backup_archive' => ['base.tar', "$tsoid.tar"],
'enabled' => 1
},
{
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'server-gzip' ],
- 'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'server-lz4' ],
- 'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => [ '-d', '-m' ],
+ 'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ],
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
});
@@ -86,47 +90,16 @@ for my $tc (@test_configuration)
my $backup_files = join(',',
sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
my $expected_backup_files =
- join(',', sort ('backup_manifest', $tc->{'backup_archive'}));
+ join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} }));
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path],
+ "verify backup, compression $method");
# Cleanup.
- unlink($backup_path . '/backup_manifest');
- unlink($backup_path . '/base.tar');
- unlink($backup_path . '/' . $tc->{'backup_archive'});
+ rmtree($backup_path);
rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 8c076d46dee..6b7d7483f6e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -29,41 +29,30 @@ my @test_configuration = (
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'client-gzip:5' ],
'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'client-lz4:5' ],
'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => ['-d'],
- 'output_file' => 'base.tar',
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:5' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'parallel zstd',
'backup_flags' => [ '--compress', 'client-zstd:workers=3' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1"),
'possibly_unsupported' =>
qr/could not set compression worker count to 3: Unsupported parameter/
@@ -118,40 +107,9 @@ for my $tc (@test_configuration)
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- push @decompress, $backup_path . '/' . $tc->{'output_file'}
- if $tc->{'output_file'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ],
+ "verify backup, compression $method");
# Cleanup.
rmtree($extract_path);
--
2.18.0
On Thu, Sep 12, 2024 at 7:05 AM Amul Sul <sulamul@gmail.com> wrote:
The updated version attached. Thank you for the review !
I have spent a bunch of time on this and have made numerous revisions.
I hope to commit the result, after seeing what you and the
buildfarm think (and anyone else who wishes to offer an opinion).
Changes:
1. I adjusted some documentation wording for clarity.
2. I adjusted quite a few comments.
3. I changed the code to canonicalize pathnames taken from tar files,
so that a backup where tar file names begin with "./" doesn't break
backup verification.
4. I changed the code to use a dedicated buffer of type
ControlFileData instead of buffering the control file in bbs_buffer,
because there's no guarantee that bbs_buffer is sufficiently aligned,
which could result in failures on non-x86 platforms.
5. I changed the way that we validate the length of the control file;
the old code looked like it was checking that the file size was
sizeof(ControlFileData), but in fact the control file is much bigger
than that and its size is given by PG_CONTROL_FILE_SIZE. The old test
passed only because the computed file size was capped at
sizeof(ControlFileData), even though the actual file size was larger. (A
minimal sketch illustrating points 4 and 5 appears at the end of this email.)
6. I fixed things so that we check that the target directory exists
before trying to figure out the backup format, so that cases where the
directory doesn't exist behave the same as before instead of failing
with a different error message.
7. I adjusted the test cases in view of point #3 and point #6.
8. I reverted various refactorings about which I earlier complained,
because they put very small amounts of code into functions which in my
opinion made the code harder to read. I also realized along the way
that (a) you hadn't updated the comments in those functions, or at
least not thoroughly, so they contained some text that was really only
applicable to the plain-format case and (b) some of the error message
really deserved to be different in the plain and tar format cases. In
particular, when there's a problem with an archive member, it seems
good to mention both the name of the archive and the name of the
archive member. Having separate code paths makes that easy and I've
done it in this version. Exception: I didn't update the messages for
failing to initialize the checksum context, because I don't think
those can happen and it doesn't really even make sense to include the
file name in the first place; any hypothetical failure would
presumably be based on which algorithm was picked, not which file you
were planning to use it on. This area could use some cleanup but it's
not this patch's job to make it less weird.
9. I rewrote 003_corruption.pl so that we apply the same tests for tar
and plain format backups without nearly as much code duplication as
you had.
10. I added a few test cases to 004_options.pl, so that we test the -F
option systematically, including what happens with an invalid value.
11. I moved the --format option to the correct place in alphabetical
order in the usage output.
I think that's everything that I changed, but I might have missed
something in putting this list together. Hopefully not.
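
To make points 4 and 5 concrete, here is a minimal sketch of the idea: buffer
at most sizeof(ControlFileData) bytes in a dedicated, properly aligned struct,
count every byte that goes past, and only check the CRC once the on-disk size
has been confirmed to be PG_CONTROL_FILE_SIZE. This is purely illustrative and
is not the patch code; the control_check type and the control_check_feed() and
control_check_finish() helpers are made-up names, and the sketch assumes it is
built as PostgreSQL frontend code so that pg_control.h and the CRC macros are
available.

#include "postgres_fe.h"

#include "catalog/pg_control.h"

/* Aligned, dedicated buffer plus a running byte count. */
typedef struct control_check
{
    ControlFileData control_file;   /* aligned, unlike bbs_buffer */
    uint64      bytes_seen;         /* counts bytes beyond the struct too */
} control_check;

/* Feed one chunk of the archive member's contents. */
static void
control_check_feed(control_check *cc, const char *data, int len)
{
    /* Copy only what fits in the struct; the rest is merely counted. */
    if (cc->bytes_seen < sizeof(ControlFileData))
    {
        uint64      remaining = sizeof(ControlFileData) - cc->bytes_seen;

        memcpy(((char *) &cc->control_file) + cc->bytes_seen,
               data, Min((uint64) len, remaining));
    }

    /* Remember how many bytes we saw, even if we didn't buffer them. */
    cc->bytes_seen += len;
}

/* After the last chunk: check the size first, then the CRC. */
static bool
control_check_finish(control_check *cc)
{
    pg_crc32c   crc;

    /* On disk the file is PG_CONTROL_FILE_SIZE, not sizeof(ControlFileData). */
    if (cc->bytes_seen != PG_CONTROL_FILE_SIZE)
        return false;

    INIT_CRC32C(crc);
    COMP_CRC32C(crc, &cc->control_file, offsetof(ControlFileData, crc));
    FIN_CRC32C(crc);

    return EQ_CRC32C(crc, cc->control_file.crc);
}

The attached patch does the equivalent work inside the astreamer_verify
callbacks, member_copy_control_data() and member_verify_control_data().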
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v15-0001-pg_verifybackup-Verify-tar-format-backups.patchapplication/octet-stream; name=v15-0001-pg_verifybackup-Verify-tar-format-backups.patchDownload
From e8bf51858c4151b957775e0b17b91e3f42b02589 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 14 Aug 2024 10:42:37 +0530
Subject: [PATCH v15] pg_verifybackup: Verify tar-format backups.
This also works for compressed tar-format backups. However, -n must be
used, because we use pg_waldump to verify WAL, and it doesn't yet know
how to verify WAL that is stored inside of a tarfile.
Amul Sul, reviewed by Sravan Kumar and by me, and substantially
revised by me.
---
doc/src/sgml/ref/pg_verifybackup.sgml | 47 +-
src/bin/pg_verifybackup/Makefile | 2 +
src/bin/pg_verifybackup/astreamer_verify.c | 428 +++++++++++++++++
src/bin/pg_verifybackup/meson.build | 1 +
src/bin/pg_verifybackup/pg_verifybackup.c | 433 ++++++++++++++++--
src/bin/pg_verifybackup/pg_verifybackup.h | 7 +
src/bin/pg_verifybackup/t/002_algorithm.pl | 34 +-
src/bin/pg_verifybackup/t/003_corruption.pl | 77 +++-
src/bin/pg_verifybackup/t/004_options.pl | 17 +
src/bin/pg_verifybackup/t/008_untar.pl | 71 +--
src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +-
src/fe_utils/simple_list.c | 19 +
src/include/fe_utils/simple_list.h | 1 +
src/tools/pgindent/typedefs.list | 2 +
14 files changed, 1033 insertions(+), 154 deletions(-)
create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index a3f167f9f6e..53341024cd2 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -34,8 +34,12 @@ PostgreSQL documentation
integrity of a database cluster backup taken using
<command>pg_basebackup</command> against a
<literal>backup_manifest</literal> generated by the server at the time
- of the backup. The backup must be stored in the "plain"
- format; a "tar" format backup can be checked after extracting it.
+ of the backup. The backup may be stored either in the "plain" or the "tar"
+ format; this includes tar-format backups compressed with any algorithm
+ supported by <application>pg_basebackup</application>. However, at present,
+ <literal>WAL</literal> verification is supported only for plain-format
+ backups. Therefore, if the backup is stored in tar-format, the
+ <literal>-n, --no-parse-wal</literal> option should be used.
</para>
<para>
@@ -168,6 +172,45 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-F <replaceable class="parameter">format</replaceable></option></term>
+ <term><option>--format=<replaceable class="parameter">format</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the format of the backup. <replaceable>format</replaceable>
+ can be one of the following:
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>p</literal></term>
+ <term><literal>plain</literal></term>
+ <listitem>
+ <para>
+ Backup consists of plain files with the same layout as the
+ source server's data directory and tablespaces.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>t</literal></term>
+ <term><literal>tar</literal></term>
+ <listitem>
+ <para>
+ Backup consists of tar files, which may be compressed. A valid
+ backup includes the main data directory in a file named
+ <filename>base.tar</filename>, the WAL files in
+ <filename>pg_wal.tar</filename>, and separate tar files for
+ each tablespace, named after the tablespace's OID. If the backup
+ is compressed, the relevant compression extension is added to the
+ end of each file name.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist></para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n</option></term>
<term><option>--no-parse-wal</option></term>
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 7c045f142e8..374d4a8afd1 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -17,10 +17,12 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
# We need libpq only because fe_utils does.
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
OBJS = \
$(WIN32RES) \
+ astreamer_verify.o \
pg_verifybackup.o
all: pg_verifybackup
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
new file mode 100644
index 00000000000..57072fdfe04
--- /dev/null
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -0,0 +1,428 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_verify.c
+ *
+ * Archive streamer for verification of a tar format backup (including
+ * compressed tar format backups).
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * src/bin/pg_verifybackup/astreamer_verify.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "catalog/pg_control.h"
+#include "pg_verifybackup.h"
+
+typedef struct astreamer_verify
+{
+ /* These fields don't change once initialized. */
+ astreamer base;
+ verifier_context *context;
+ char *archive_name;
+ Oid tblspc_oid;
+
+ /* These fields change for each archive member. */
+ manifest_file *mfile;
+ bool verify_checksum;
+ bool verify_control_data;
+ pg_checksum_context *checksum_ctx;
+ uint64 checksum_bytes;
+ ControlFileData control_file;
+ uint64 control_file_bytes;
+} astreamer_verify;
+
+static void astreamer_verify_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_verify_finalize(astreamer *streamer);
+static void astreamer_verify_free(astreamer *streamer);
+
+static void member_verify_header(astreamer *streamer, astreamer_member *member);
+static void member_compute_checksum(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void member_verify_checksum(astreamer *streamer);
+static void member_copy_control_data(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len);
+static void member_verify_control_data(astreamer *streamer);
+static void member_reset_info(astreamer *streamer);
+
+static const astreamer_ops astreamer_verify_ops = {
+ .content = astreamer_verify_content,
+ .finalize = astreamer_verify_finalize,
+ .free = astreamer_verify_free
+};
+
+/*
+ * Create an astreamer that can verify a tar file.
+ */
+astreamer *
+astreamer_verify_content_new(astreamer *next, verifier_context *context,
+ char *archive_name, Oid tblspc_oid)
+{
+ astreamer_verify *streamer;
+
+ streamer = palloc0(sizeof(astreamer_verify));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_verify_ops;
+
+ streamer->base.bbs_next = next;
+ streamer->context = context;
+ streamer->archive_name = archive_name;
+ streamer->tblspc_oid = tblspc_oid;
+
+ if (!context->skip_checksums)
+ streamer->checksum_ctx = pg_malloc(sizeof(pg_checksum_context));
+
+ return &streamer->base;
+}
+
+/*
+ * Main entry point of the archive streamer for verifying tar members.
+ */
+static void
+astreamer_verify_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+ /* Initial setup plus decide which checks to perform. */
+ member_verify_header(streamer, member);
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+ /* Incremental work required to verify file contents. */
+ if (mystreamer->verify_checksum)
+ member_compute_checksum(streamer, member, data, len);
+ if (mystreamer->verify_control_data)
+ member_copy_control_data(streamer, member, data, len);
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+ /* Now we've got all the file data. */
+ if (mystreamer->verify_checksum)
+ member_verify_checksum(streamer);
+ if (mystreamer->verify_control_data)
+ member_verify_control_data(streamer);
+
+ /* Reset for next archive member. */
+ member_reset_info(streamer);
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar file");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_verify stream.
+ */
+static void
+astreamer_verify_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_verify stream.
+ */
+static void
+astreamer_verify_free(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ if (mystreamer->checksum_ctx)
+ pfree(mystreamer->checksum_ctx);
+
+ pfree(streamer);
+}
+
+/*
+ * Prepare to validate the next archive member.
+ */
+static void
+member_verify_header(astreamer *streamer, astreamer_member *member)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_file *m;
+ char pathname[MAXPGPATH];
+
+ /* We are only interested in normal files. */
+ if (member->is_directory || member->is_link)
+ return;
+
+ /*
+ * The backup manifest stores a relative path to the base directory for
+ * files belonging to a tablespace, while the tablespace backup tar
+ * archive does not include this path.
+ *
+ * The pathname taken from the tar file could contain '.' or '..'
+ * references, which we want to remove, so apply canonicalize_path(). It
+ * could also be an absolute pathname, which we want to treat as a
+ * relative path, so prepend "./" if we're not adding a tablespace prefix
+ * to make sure that canonicalize_path() does what we want.
+ */
+ if (OidIsValid(mystreamer->tblspc_oid))
+ snprintf(pathname, MAXPGPATH, "%s/%u/%s",
+ "pg_tblspc", mystreamer->tblspc_oid, member->pathname);
+ else
+ snprintf(pathname, MAXPGPATH, "./%s", member->pathname);
+ canonicalize_path(pathname);
+
+ /* Ignore any files that are listed in the ignore list. */
+ if (should_ignore_relpath(mystreamer->context, pathname))
+ return;
+
+ /* Check whether there's an entry in the manifest hash. */
+ m = manifest_files_lookup(mystreamer->context->manifest->files, pathname);
+ if (m == NULL)
+ {
+ report_backup_error(mystreamer->context,
+ "\"%s\" is present in \"%s\" but not in the manifest",
+ member->pathname, mystreamer->archive_name);
+ return;
+ }
+ mystreamer->mfile = m;
+
+ /* Flag this entry as having been encountered in a tar archive. */
+ m->matched = true;
+
+ /* Check that the size matches. */
+ if (m->size != member->size)
+ {
+ report_backup_error(mystreamer->context,
+ "\"%s\" has size %lld in \"%s\" but size %zu in the manifest",
+ member->pathname, (long long int) member->size,
+ mystreamer->archive_name, m->size);
+ m->bad = true;
+ return;
+ }
+
+ /*
+ * Decide whether we're going to verify the checksum for this file, and
+ * whether we're going to perform the additional validation that we do
+ * only for the control file.
+ */
+ mystreamer->verify_checksum =
+ (!mystreamer->context->skip_checksums && should_verify_checksum(m));
+ mystreamer->verify_control_data =
+ mystreamer->context->manifest->version != 1 &&
+ !m->bad && strcmp(m->pathname, "global/pg_control") == 0;
+
+ /* If we're going to verify the checksum, initialize a checksum context. */
+ if (mystreamer->verify_checksum &&
+ pg_checksum_init(mystreamer->checksum_ctx, m->checksum_type) < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "%s: could not initialize checksum of file \"%s\"",
+ mystreamer->archive_name, m->pathname);
+
+ /*
+ * Checksum verification cannot be performed without proper context
+ * initialization.
+ */
+ mystreamer->verify_checksum = false;
+ }
+}
+
+/*
+ * Computes the checksum incrementally for the received file content.
+ *
+ * Should have a correctly initialized checksum_ctx, which will be used for
+ * incremental checksum computation.
+ */
+static void
+member_compute_checksum(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ pg_checksum_context *checksum_ctx = mystreamer->checksum_ctx;
+ manifest_file *m = mystreamer->mfile;
+
+ Assert(mystreamer->verify_checksum);
+ Assert(m->checksum_type == checksum_ctx->type);
+
+ /*
+ * Update the total count of computed checksum bytes so that we can
+ * cross-check against the file size.
+ */
+ mystreamer->checksum_bytes += len;
+
+ /* Feed these bytes to the checksum calculation. */
+ if (pg_checksum_update(checksum_ctx, (uint8 *) data, len) < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "could not update checksum of file \"%s\"",
+ m->pathname);
+ mystreamer->verify_checksum = false;
+ }
+}
+
+/*
+ * Perform the final computation and checksum verification after the entire
+ * file content has been processed.
+ */
+static void
+member_verify_checksum(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_file *m = mystreamer->mfile;
+ uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
+ int checksumlen;
+
+ Assert(mystreamer->verify_checksum);
+
+ /*
+ * It's unclear how this could fail, but let's check anyway to be safe.
+ */
+ if (mystreamer->checksum_bytes != m->size)
+ {
+ report_backup_error(mystreamer->context,
+ "file \"%s\" in \"%s\" should contain %zu bytes, but read %zu bytes",
+ m->pathname, mystreamer->archive_name,
+ m->size, mystreamer->checksum_bytes);
+ return;
+ }
+
+ /* Get the final checksum. */
+ checksumlen = pg_checksum_final(mystreamer->checksum_ctx, checksumbuf);
+ if (checksumlen < 0)
+ {
+ report_backup_error(mystreamer->context,
+ "could not finalize checksum of file \"%s\"",
+ m->pathname);
+ return;
+ }
+
+ /* And check it against the manifest. */
+ if (checksumlen != m->checksum_length)
+ report_backup_error(mystreamer->context,
+ "file \"%s\" in \"%s\" has checksum of length %d, but expected %d",
+ m->pathname, mystreamer->archive_name,
+ m->checksum_length, checksumlen);
+ else if (memcmp(checksumbuf, m->checksum_payload, checksumlen) != 0)
+ report_backup_error(mystreamer->context,
+ "checksum mismatch for file \"%s\" in \"%s\"",
+ m->pathname, mystreamer->archive_name);
+}
+
+/*
+ * Stores the pg_control file contents into a local buffer; we need the entire
+ * control file data for verification.
+ */
+static void
+member_copy_control_data(astreamer *streamer, astreamer_member *member,
+ const char *data, int len)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ /* Should be here only for control file */
+ Assert(mystreamer->verify_control_data);
+
+ /*
+ * Copy the new data into the control file buffer, but do not overrun the
+ * buffer. Note that the on-disk length of the control file is expected to
+ * be PG_CONTROL_FILE_SIZE, but the part that fits in our buffer is
+ * shorter, just sizeof(ControlFileData).
+ */
+ if (mystreamer->control_file_bytes <= sizeof(ControlFileData))
+ {
+ int remaining;
+
+ remaining = sizeof(ControlFileData) - mystreamer->control_file_bytes;
+ memcpy(((char *) &mystreamer->control_file)
+ + mystreamer->control_file_bytes,
+ data, Min(len, remaining));
+ }
+
+ /* Remember how many bytes we saw, even if we didn't buffer them. */
+ mystreamer->control_file_bytes += len;
+}
+
+/*
+ * Perform the CRC calculation of the pg_control data and then carry out the
+ * final verification of the control file information.
+ */
+static void
+member_verify_control_data(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+ manifest_data *manifest = mystreamer->context->manifest;
+ pg_crc32c crc;
+
+ /* Should be here only for control file */
+ Assert(strcmp(mystreamer->mfile->pathname, "global/pg_control") == 0);
+ Assert(mystreamer->verify_control_data);
+
+ /*
+ * If the control file is not the right length, that's a big problem.
+ *
+ * NB: There is a theoretical overflow risk here from casting to int, but
+ * it isn't likely to be a real problem and this enables us to match the
+ * same format string that pg_rewind uses for this case. Perhaps both this
+ * and pg_rewind should use an unsigned 64-bit value, but for now we don't
+ * worry about it.
+ */
+ if (mystreamer->control_file_bytes != PG_CONTROL_FILE_SIZE)
+ report_fatal_error("unexpected control file size %d, expected %d",
+ (int) mystreamer->control_file_bytes,
+ PG_CONTROL_FILE_SIZE);
+
+ /* Compute the CRC. */
+ INIT_CRC32C(crc);
+ COMP_CRC32C(crc, &mystreamer->control_file,
+ offsetof(ControlFileData, crc));
+ FIN_CRC32C(crc);
+
+ /* Control file contents not meaningful if CRC is bad. */
+ if (!EQ_CRC32C(crc, mystreamer->control_file.crc))
+ report_fatal_error("%s: %s: CRC is incorrect",
+ mystreamer->archive_name,
+ mystreamer->mfile->pathname);
+
+ /* Can't interpret control file if not current version. */
+ if (mystreamer->control_file.pg_control_version != PG_CONTROL_VERSION)
+ report_fatal_error("%s: %s: unexpected control file version",
+ mystreamer->archive_name,
+ mystreamer->mfile->pathname);
+
+ /* System identifiers should match. */
+ if (manifest->system_identifier !=
+ mystreamer->control_file.system_identifier)
+ report_fatal_error("%s: %s: manifest system identifier is %llu, but control file has %llu",
+ mystreamer->archive_name,
+ mystreamer->mfile->pathname,
+ (unsigned long long) manifest->system_identifier,
+ (unsigned long long) mystreamer->control_file.system_identifier);
+}
+
+/*
+ * Reset flags and free memory allocations for member file verification.
+ */
+static void
+member_reset_info(astreamer *streamer)
+{
+ astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+ mystreamer->mfile = NULL;
+ mystreamer->verify_checksum = false;
+ mystreamer->verify_control_data = false;
+ mystreamer->checksum_bytes = 0;
+ mystreamer->control_file_bytes = 0;
+}
diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build
index 7c7d31a0350..0e09d1379d1 100644
--- a/src/bin/pg_verifybackup/meson.build
+++ b/src/bin/pg_verifybackup/meson.build
@@ -1,6 +1,7 @@
# Copyright (c) 2022-2024, PostgreSQL Global Development Group
pg_verifybackup_sources = files(
+ 'astreamer_verify.c',
'pg_verifybackup.c'
)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 3fcfb167217..1504576303b 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -22,6 +22,7 @@
#include "common/parse_manifest.h"
#include "fe_utils/simple_list.h"
#include "getopt_long.h"
+#include "limits.h"
#include "pg_verifybackup.h"
#include "pgtime.h"
@@ -44,6 +45,16 @@
*/
#define READ_CHUNK_SIZE (128 * 1024)
+/*
+ * Tar file information needed for content verification.
+ */
+typedef struct tar_file
+{
+ char *relpath;
+ Oid tblspc_oid;
+ pg_compress_algorithm compress_algorithm;
+} tar_file;
+
static manifest_data *parse_manifest_file(char *manifest_path);
static void verifybackup_version_cb(JsonManifestParseContext *context,
int manifest_version);
@@ -62,12 +73,18 @@ static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3) pg_attribute_noreturn();
-static void verify_backup_directory(verifier_context *context,
- char *relpath, char *fullpath);
-static void verify_backup_file(verifier_context *context,
- char *relpath, char *fullpath);
+static void verify_tar_backup(verifier_context *context, DIR *dir);
+static void verify_plain_backup_directory(verifier_context *context,
+ char *relpath, char *fullpath,
+ DIR *dir);
+static void verify_plain_backup_file(verifier_context *context, char *relpath,
+ char *fullpath);
static void verify_control_file(const char *controlpath,
uint64 manifest_system_identifier);
+static void precheck_tar_backup_file(verifier_context *context, char *relpath,
+ char *fullpath, SimplePtrList *tarfiles);
+static void verify_tar_file(verifier_context *context, char *relpath,
+ char *fullpath, astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
static void verify_backup_checksums(verifier_context *context);
static void verify_file_checksum(verifier_context *context,
@@ -76,6 +93,10 @@ static void verify_file_checksum(verifier_context *context,
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
char *wal_directory);
+static astreamer *create_archive_verifier(verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid,
+ pg_compress_algorithm compress_algo);
static void progress_report(bool finished);
static void usage(void);
@@ -99,6 +120,7 @@ main(int argc, char **argv)
{"exit-on-error", no_argument, NULL, 'e'},
{"ignore", required_argument, NULL, 'i'},
{"manifest-path", required_argument, NULL, 'm'},
+ {"format", required_argument, NULL, 'F'},
{"no-parse-wal", no_argument, NULL, 'n'},
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
@@ -114,6 +136,7 @@ main(int argc, char **argv)
bool quiet = false;
char *wal_directory = NULL;
char *pg_waldump_path = NULL;
+ DIR *dir;
pg_logging_init(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verifybackup"));
@@ -156,7 +179,7 @@ main(int argc, char **argv)
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");
- while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1)
+ while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:", long_options, NULL)) != -1)
{
switch (c)
{
@@ -175,6 +198,15 @@ main(int argc, char **argv)
manifest_path = pstrdup(optarg);
canonicalize_path(manifest_path);
break;
+ case 'F':
+ if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
+ context.format = 'p';
+ else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
+ context.format = 't';
+ else
+ pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"",
+ optarg);
+ break;
case 'n':
no_parse_wal = true;
break;
@@ -264,25 +296,75 @@ main(int argc, char **argv)
context.manifest = parse_manifest_file(manifest_path);
/*
- * Now scan the files in the backup directory. At this stage, we verify
- * that every file on disk is present in the manifest and that the sizes
- * match. We also set the "matched" flag on every manifest entry that
- * corresponds to a file on disk.
+ * If the backup directory cannot be found, treat this as a fatal error.
+ */
+ dir = opendir(context.backup_directory);
+ if (dir == NULL)
+ report_fatal_error("could not open directory \"%s\": %m",
+ context.backup_directory);
+
+ /*
+ * At this point, we know that the backup directory exists, so it's now
+ * reasonable to check for files immediately inside it. Thus, before going
+ * futher, if the user did not specify the backup format, check for
+ * PG_VERSION to distinguish between tar and plain format.
*/
- verify_backup_directory(&context, NULL, context.backup_directory);
+ if (context.format == '\0')
+ {
+ struct stat sb;
+ char *path;
+
+ path = psprintf("%s/%s", context.backup_directory, "PG_VERSION");
+ if (stat(path, &sb) == 0)
+ context.format = 'p';
+ else if (errno != ENOENT)
+ {
+ pg_log_error("could not stat file \"%s\": %m", path);
+ exit(1);
+ }
+ else
+ {
+ /* No PG_VERSION, so assume tar format. */
+ context.format = 't';
+ }
+ pfree(path);
+ }
+
+ /*
+ * XXX: In the future, we should consider enhancing pg_waldump to read
+ * WAL files from an archive.
+ */
+ if (!no_parse_wal && context.format == 't')
+ {
+ pg_log_error("pg_waldump cannot read tar files");
+ pg_log_error_hint("You must use -n or --no-parse-wal when verifying a tar-format backup.");
+ exit(1);
+ }
+
+ /*
+ * Perform the appropriate type of verification based on the
+ * backup format. This will close 'dir'.
+ */
+ if (context.format == 'p')
+ verify_plain_backup_directory(&context, NULL, context.backup_directory,
+ dir);
+ else
+ verify_tar_backup(&context, dir);
/*
* The "matched" flag should now be set on every entry in the hash table.
* Any entries for which the bit is not set are files mentioned in the
- * manifest that don't exist on disk.
+ * manifest that don't exist on disk (or in the relevant tar files).
*/
report_extra_backup_files(&context);
/*
- * Now do the expensive work of verifying file checksums, unless we were
- * told to skip it.
+ * If this is a tar-format backup, checksums were already verified above;
+ * but if it's a plain-format backup, we postpone it until this point,
+ * since the earlier checks can be performed just by knowing which files
+ * are present, without needing to read all of them.
*/
- if (!context.skip_checksums)
+ if (context.format == 'p' && !context.skip_checksums)
verify_backup_checksums(&context);
/*
@@ -517,35 +599,27 @@ verifybackup_per_wal_range_cb(JsonManifestParseContext *context,
}
/*
- * Verify one directory.
+ * Verify one directory of a plain-format backup.
*
* 'relpath' is NULL if we are to verify the top-level backup directory,
* and otherwise the relative path to the directory that is to be verified.
*
* 'fullpath' is the backup directory with 'relpath' appended; i.e. the actual
* filesystem path at which it can be found.
+ *
+ * 'dir' is an open directory handle, or NULL if the caller wants us to
+ * open it. If the caller chooses to pass a handle, we'll close it when
+ * we're done with it.
*/
static void
-verify_backup_directory(verifier_context *context, char *relpath,
- char *fullpath)
+verify_plain_backup_directory(verifier_context *context, char *relpath,
+ char *fullpath, DIR *dir)
{
- DIR *dir;
struct dirent *dirent;
- dir = opendir(fullpath);
- if (dir == NULL)
+ /* Open the directory unless the caller did it. */
+ if (dir == NULL && ((dir = opendir(fullpath)) == NULL))
{
- /*
- * If even the toplevel backup directory cannot be found, treat this
- * as a fatal error.
- */
- if (relpath == NULL)
- report_fatal_error("could not open directory \"%s\": %m", fullpath);
-
- /*
- * Otherwise, treat this as a non-fatal error, but ignore any further
- * errors related to this path and anything beneath it.
- */
report_backup_error(context,
"could not open directory \"%s\": %m", fullpath);
simple_string_list_append(&context->ignore_list, relpath);
@@ -570,7 +644,7 @@ verify_backup_directory(verifier_context *context, char *relpath,
newrelpath = psprintf("%s/%s", relpath, filename);
if (!should_ignore_relpath(context, newrelpath))
- verify_backup_file(context, newrelpath, newfullpath);
+ verify_plain_backup_file(context, newrelpath, newfullpath);
pfree(newfullpath);
pfree(newrelpath);
@@ -587,11 +661,12 @@ verify_backup_directory(verifier_context *context, char *relpath,
/*
* Verify one file (which might actually be a directory or a symlink).
*
- * The arguments to this function have the same meaning as the arguments to
- * verify_backup_directory.
+ * The arguments to this function have the same meaning as the similarly named
+ * arguments to verify_plain_backup_directory.
*/
static void
-verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
+verify_plain_backup_file(verifier_context *context, char *relpath,
+ char *fullpath)
{
struct stat sb;
manifest_file *m;
@@ -614,7 +689,7 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
/* If it's a directory, just recurse. */
if (S_ISDIR(sb.st_mode))
{
- verify_backup_directory(context, relpath, fullpath);
+ verify_plain_backup_directory(context, relpath, fullpath, NULL);
return;
}
@@ -703,6 +778,252 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
pfree(control_file);
}
+/*
+ * Verify tar backup.
+ *
+ * The caller should pass a handle to the target directory, which we will
+ * close when we're done with it.
+ */
+static void
+verify_tar_backup(verifier_context *context, DIR *dir)
+{
+ struct dirent *dirent;
+ SimplePtrList tarfiles = {NULL, NULL};
+ SimplePtrListCell *cell;
+
+ Assert(context->format != 'p');
+
+ progress_report(false);
+
+ /* First pass: scan the directory for tar files. */
+ while (errno = 0, (dirent = readdir(dir)) != NULL)
+ {
+ char *filename = dirent->d_name;
+
+ /* Skip "." and ".." */
+ if (filename[0] == '.' && (filename[1] == '\0'
+ || strcmp(filename, "..") == 0))
+ continue;
+
+ /*
+ * Unless it's something we should ignore, perform prechecks and add
+ * it to the list.
+ */
+ if (!should_ignore_relpath(context, filename))
+ {
+ char *fullpath;
+
+ fullpath = psprintf("%s/%s", context->backup_directory, filename);
+ precheck_tar_backup_file(context, filename, fullpath, &tarfiles);
+ pfree(fullpath);
+ }
+ }
+
+ if (closedir(dir))
+ {
+ report_backup_error(context,
+ "could not close directory \"%s\": %m",
+ context->backup_directory);
+ return;
+ }
+
+ /* Second pass: Perform the final verification of the tar contents. */
+ for (cell = tarfiles.head; cell != NULL; cell = cell->next)
+ {
+ tar_file *tar = (tar_file *) cell->ptr;
+ astreamer *streamer;
+ char *fullpath;
+
+ /*
+ * Prepares the archive streamer stack according to the tar
+ * compression format.
+ */
+ streamer = create_archive_verifier(context,
+ tar->relpath,
+ tar->tblspc_oid,
+ tar->compress_algorithm);
+
+ /* Compute the full pathname to the target file. */
+ fullpath = psprintf("%s/%s", context->backup_directory,
+ tar->relpath);
+
+ /* Invoke the streamer for reading, decompressing, and verifying. */
+ verify_tar_file(context, tar->relpath, fullpath, streamer);
+
+ /* Cleanup. */
+ pfree(tar->relpath);
+ pfree(tar);
+ pfree(fullpath);
+
+ astreamer_finalize(streamer);
+ astreamer_free(streamer);
+ }
+ simple_ptr_list_destroy(&tarfiles);
+
+ progress_report(true);
+}
+
+/*
+ * Preparatory steps for verifying files in tar format backups.
+ *
+ * Carries out basic validation of the tar format backup file, detects the
+ * compression type, and appends that information to the tarfiles list. An
+ * error will be reported if the tar file is inaccessible, or if the file type,
+ * name, or compression type is not as expected.
+ *
+ * The arguments to this function are mostly the same as the
+ * verify_plain_backup_file. The additional argument outputs a list of valid
+ * tar files.
+ */
+static void
+precheck_tar_backup_file(verifier_context *context, char *relpath,
+ char *fullpath, SimplePtrList *tarfiles)
+{
+ struct stat sb;
+ Oid tblspc_oid = InvalidOid;
+ pg_compress_algorithm compress_algorithm;
+ tar_file *tar;
+ char *suffix = NULL;
+
+ /* Should be tar format backup */
+ Assert(context->format == 't');
+
+ /* Get file information */
+ if (stat(fullpath, &sb) != 0)
+ {
+ report_backup_error(context,
+ "could not stat file or directory \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ /* In a tar format backup, we expect only plain files. */
+ if (!S_ISREG(sb.st_mode))
+ {
+ report_backup_error(context,
+ "\"%s\" is not a plain file",
+ relpath);
+ return;
+ }
+
+ /*
+ * We expect tar files for backing up the main directory, tablespace, and
+ * pg_wal directory.
+ *
+ * pg_basebackup writes the main data directory to an archive file named
+ * base.tar, the pg_wal directory to pg_wal.tar, and the tablespace
+ * directory to <tablespaceoid>.tar, each followed by a compression type
+ * extension such as .gz, .lz4, or .zst.
+ */
+ if (strncmp("base", relpath, 4) == 0)
+ suffix = relpath + 4;
+ else if (strncmp("pg_wal", relpath, 6) == 0)
+ suffix = relpath + 6;
+ else
+ {
+ /* Expected a <tablespaceoid>.tar file here. */
+ uint64 num = strtoul(relpath, &suffix, 10);
+
+ /*
+ * Report an error if we didn't consume at least one character, if the
+ * result is 0, or if the value is too large to be a valid OID.
+ */
+ if (suffix == NULL || num <= 0 || num > OID_MAX)
+ report_backup_error(context,
+ "file \"%s\" is not expected in a tar format backup",
+ relpath);
+ tblspc_oid = (Oid) num;
+ }
+
+ /* Now, check the compression type of the tar */
+ if (strcmp(suffix, ".tar") == 0)
+ compress_algorithm = PG_COMPRESSION_NONE;
+ else if (strcmp(suffix, ".tgz") == 0)
+ compress_algorithm = PG_COMPRESSION_GZIP;
+ else if (strcmp(suffix, ".tar.gz") == 0)
+ compress_algorithm = PG_COMPRESSION_GZIP;
+ else if (strcmp(suffix, ".tar.lz4") == 0)
+ compress_algorithm = PG_COMPRESSION_LZ4;
+ else if (strcmp(suffix, ".tar.zst") == 0)
+ compress_algorithm = PG_COMPRESSION_ZSTD;
+ else
+ {
+ report_backup_error(context,
+ "file \"%s\" is not expected in a tar format backup",
+ relpath);
+ return;
+ }
+
+ /*
+ * Ignore WALs, as reading and verification will be handled through
+ * pg_waldump.
+ */
+ if (strncmp("pg_wal", relpath, 6) == 0)
+ return;
+
+ /*
+ * Append the information to the list for complete verification at a later
+ * stage.
+ */
+ tar = pg_malloc(sizeof(tar_file));
+ tar->relpath = pstrdup(relpath);
+ tar->tblspc_oid = tblspc_oid;
+ tar->compress_algorithm = compress_algorithm;
+
+ simple_ptr_list_append(tarfiles, tar);
+
+ /* Update statistics for progress report, if necessary */
+ if (show_progress)
+ total_size += sb.st_size;
+}
+
+/*
+ * Verification of a single tar file content.
+ *
+ * It reads a given tar archive in predefined chunks and passes it to the
+ * streamer, which initiates routines for decompression (if necessary) and then
+ * verifies each member within the tar file.
+ */
+static void
+verify_tar_file(verifier_context *context, char *relpath, char *fullpath,
+ astreamer *streamer)
+{
+ int fd;
+ int rc;
+ char *buffer;
+
+ pg_log_debug("reading \"%s\"", fullpath);
+
+ /* Open the target file. */
+ if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
+ {
+ report_backup_error(context, "could not open file \"%s\": %m",
+ relpath);
+ return;
+ }
+
+ buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+ /* Perform the reads */
+ while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
+ {
+ astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);
+
+ /* Report progress */
+ done_size += rc;
+ progress_report(false);
+ }
+
+ if (rc < 0)
+ report_backup_error(context, "could not read file \"%s\": %m",
+ relpath);
+
+ /* Close the file. */
+ if (close(fd) != 0)
+ report_backup_error(context, "could not close file \"%s\": %m",
+ relpath);
+}
+
/*
* Scan the hash table for entries where the 'matched' flag is not set; report
* that such files are present in the manifest but not on disk.
@@ -830,10 +1151,10 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
/*
* Double-check that we read the expected number of bytes from the file.
- * Normally, a file size mismatch would be caught in verify_backup_file
- * and this check would never be reached, but this provides additional
- * safety and clarity in the event of concurrent modifications or
- * filesystem misbehavior.
+ * Normally, mismatches would be caught in verify_plain_backup_file and
+ * this check would never be reached, but this provides additional safety
+ * and clarity in the event of concurrent modifications or filesystem
+ * misbehavior.
*/
if (bytes_read != m->size)
{
@@ -955,6 +1276,37 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
return false;
}
+/*
+ * Create a chain of archive streamers appropriate for verifying a given
+ * archive.
+ */
+static astreamer *
+create_archive_verifier(verifier_context *context, char *archive_name,
+ Oid tblspc_oid, pg_compress_algorithm compress_algo)
+{
+ astreamer *streamer = NULL;
+
+ /* Should be here only for tar backup */
+ Assert(context->format == 't');
+
+ /* Last step is the actual verification. */
+ streamer = astreamer_verify_content_new(streamer, context, archive_name,
+ tblspc_oid);
+
+ /* Before that we must parse the tar file. */
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /* Before that we must decompress, if archive is compressed. */
+ if (compress_algo == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compress_algo == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compress_algo == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ return streamer;
+}
+
/*
* Print a progress report based on the global variables.
*
@@ -1010,6 +1362,7 @@ usage(void)
printf(_("Usage:\n %s [OPTION]... BACKUPDIR\n\n"), progname);
printf(_("Options:\n"));
printf(_(" -e, --exit-on-error exit immediately on error\n"));
+ printf(_(" -F, --format=p|t backup format (plain, tar)\n"));
printf(_(" -i, --ignore=RELATIVE_PATH ignore indicated path\n"));
printf(_(" -m, --manifest-path=PATH use specified path for manifest\n"));
printf(_(" -n, --no-parse-wal do not try to parse WAL files\n"));
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index d8c566ed587..183b1d5111b 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -18,6 +18,7 @@
#include "common/hashfn_unstable.h"
#include "common/logging.h"
#include "common/parse_manifest.h"
+#include "fe_utils/astreamer.h"
#include "fe_utils/simple_list.h"
/*
@@ -88,6 +89,7 @@ typedef struct verifier_context
manifest_data *manifest;
char *backup_directory;
SimpleStringList ignore_list;
+ char format; /* backup format: p(lain)/t(ar) */
bool skip_checksums;
bool exit_on_error;
bool saw_any_error;
@@ -101,4 +103,9 @@ extern void report_fatal_error(const char *pg_restrict fmt,...)
extern bool should_ignore_relpath(verifier_context *context,
const char *relpath);
+extern astreamer *astreamer_verify_content_new(astreamer *next,
+ verifier_context *context,
+ char *archive_name,
+ Oid tblspc_oid);
+
#endif /* PG_VERIFYBACKUP_H */
diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl
index fb2a1fd7c4e..4959d5bd0b9 100644
--- a/src/bin/pg_verifybackup/t/002_algorithm.pl
+++ b/src/bin/pg_verifybackup/t/002_algorithm.pl
@@ -14,24 +14,35 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
-for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512))
+sub test_checksums
{
- my $backup_path = $primary->backup_dir . '/' . $algorithm;
+ my ($format, $algorithm) = @_;
+ my $backup_path = $primary->backup_dir . '/' . $format . '/' . $algorithm;
my @backup = (
'pg_basebackup', '-D', $backup_path,
'--manifest-checksums', $algorithm, '--no-sync', '-cfast');
my @verify = ('pg_verifybackup', '-e', $backup_path);
+ if ($format eq 'tar')
+ {
+ # Add switch to get a tar-format backup
+ push @backup, ('-F', 't');
+
+ # Add switch to skip WAL verification, which is not yet supported for
+ # tar-format backups
+ push @verify, ('-n');
+ }
+
# A backup with a bogus algorithm should fail.
if ($algorithm eq 'bogus')
{
$primary->command_fails(\@backup,
- "backup fails with algorithm \"$algorithm\"");
- next;
+ "$format format backup fails with algorithm \"$algorithm\"");
+ return;
}
# A backup with a valid algorithm should work.
- $primary->command_ok(\@backup, "backup ok with algorithm \"$algorithm\"");
+ $primary->command_ok(\@backup, "$format format backup ok with algorithm \"$algorithm\"");
# We expect each real checksum algorithm to be mentioned on every line of
# the backup manifest file except the first and last; for simplicity, we
@@ -39,7 +50,7 @@ for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512))
# is none, we just check that the manifest exists.
if ($algorithm eq 'none')
{
- ok(-f "$backup_path/backup_manifest", "backup manifest exists");
+ ok(-f "$backup_path/backup_manifest", "$format format backup manifest exists");
}
else
{
@@ -52,10 +63,19 @@ for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512))
# Make sure that it verifies OK.
$primary->command_ok(\@verify,
- "verify backup with algorithm \"$algorithm\"");
+ "verify $format format backup with algorithm \"$algorithm\"");
# Remove backup immediately to save disk space.
rmtree($backup_path);
}
+# Do the check
+for my $format (qw(plain tar))
+{
+ for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512))
+ {
+ test_checksums($format, $algorithm);
+ }
+}
+
done_testing();
diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl
index ae91e043384..8fe911a3aec 100644
--- a/src/bin/pg_verifybackup/t/003_corruption.pl
+++ b/src/bin/pg_verifybackup/t/003_corruption.pl
@@ -5,12 +5,15 @@
use strict;
use warnings FATAL => 'all';
+use Cwd;
use File::Path qw(rmtree);
use File::Copy;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+my $tar = $ENV{TAR};
+
my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
@@ -34,35 +37,35 @@ my @scenario = (
'name' => 'extra_file',
'mutilate' => \&mutilate_extra_file,
'fails_like' =>
- qr/extra_file.*present on disk but not in the manifest/
+ qr/extra_file.*present (on disk|in "[^"]+") but not in the manifest/
},
{
'name' => 'extra_tablespace_file',
'mutilate' => \&mutilate_extra_tablespace_file,
'fails_like' =>
- qr/extra_ts_file.*present on disk but not in the manifest/
+ qr/extra_ts_file.*present (on disk|in "[^"]+") but not in the manifest/
},
{
'name' => 'missing_file',
'mutilate' => \&mutilate_missing_file,
'fails_like' =>
- qr/pg_xact\/0000.*present in the manifest but not on disk/
+ qr/pg_xact\/0000.*present in the manifest but not (on disk|in "[^"]+")/
},
{
'name' => 'missing_tablespace',
'mutilate' => \&mutilate_missing_tablespace,
'fails_like' =>
- qr/pg_tblspc.*present in the manifest but not on disk/
+ qr/pg_tblspc.*present in the manifest but not (on disk|in "[^"]+")/
},
{
'name' => 'append_to_file',
'mutilate' => \&mutilate_append_to_file,
- 'fails_like' => qr/has size \d+ on disk but size \d+ in the manifest/
+ 'fails_like' => qr/has size \d+ (on disk|in "[^"]+") but size \d+ in the manifest/
},
{
'name' => 'truncate_file',
'mutilate' => \&mutilate_truncate_file,
- 'fails_like' => qr/has size 0 on disk but size \d+ in the manifest/
+ 'fails_like' => qr/has size 0 (on disk|in "[^"]+") but size \d+ in the manifest/
},
{
'name' => 'replace_file',
@@ -84,21 +87,21 @@ my @scenario = (
'name' => 'open_file_fails',
'mutilate' => \&mutilate_open_file_fails,
'fails_like' => qr/could not open file/,
- 'skip_on_windows' => 1
+ 'needs_unix_permissions' => 1
},
{
'name' => 'open_directory_fails',
'mutilate' => \&mutilate_open_directory_fails,
'cleanup' => \&cleanup_open_directory_fails,
'fails_like' => qr/could not open directory/,
- 'skip_on_windows' => 1
+ 'needs_unix_permissions' => 1
},
{
'name' => 'search_directory_fails',
'mutilate' => \&mutilate_search_directory_fails,
'cleanup' => \&cleanup_search_directory_fails,
'fails_like' => qr/could not stat file or directory/,
- 'skip_on_windows' => 1
+ 'needs_unix_permissions' => 1
});
for my $scenario (@scenario)
@@ -108,7 +111,7 @@ for my $scenario (@scenario)
SKIP:
{
skip "unix-style permissions not supported on Windows", 4
- if ($scenario->{'skip_on_windows'}
+ if ($scenario->{'needs_unix_permissions'}
&& ($windows_os || $Config::Config{osname} eq 'cygwin'));
# Take a backup and check that it verifies OK.
@@ -140,7 +143,59 @@ for my $scenario (@scenario)
$scenario->{'cleanup'}->($backup_path)
if exists $scenario->{'cleanup'};
- # Finally, use rmtree to reclaim space.
+ # Turn it into a tar-format backup and see if we can still detect the
+ # same problem, unless the scenario needs UNIX permissions or we don't
+ # have a TAR program available. Note that this destructively modifies
+ # the backup directory.
+ if (!$scenario->{'needs_unix_permissions'} &&
+ defined $tar && $tar ne '')
+ {
+ my $tar_backup_path = $primary->backup_dir . '/tar_' . $name;
+ mkdir($tar_backup_path) || die "mkdir $tar_backup_path: $!";
+
+ # tar and then remove each tablespace. We remove the original files
+ # so that they don't also end up in base.tar.
+ my @tsoid = grep { $_ ne '.' && $_ ne '..' }
+ slurp_dir("$backup_path/pg_tblspc");
+ my $cwd = getcwd;
+ for my $tsoid (@tsoid)
+ {
+ my $tspath = $backup_path . '/pg_tblspc/' . $tsoid;
+
+ chdir($tspath) || die "chdir: $!";
+ command_ok([ $tar, '-cf', "$tar_backup_path/$tsoid.tar", '.' ]);
+ chdir($cwd) || die "chdir: $!";
+ rmtree($tspath);
+ }
+
+ # tar and remove pg_wal
+ chdir($backup_path . '/pg_wal') || die "chdir: $!";
+ command_ok([ $tar, '-cf', "$tar_backup_path/pg_wal.tar", '.' ]);
+ chdir($cwd) || die "chdir: $!";
+ rmtree($backup_path . '/pg_wal');
+
+ # move the backup manifest
+ move($backup_path . '/backup_manifest',
+ $tar_backup_path . '/backup_manifest')
+ or die "could not copy manifest to $tar_backup_path";
+
+ # Construct base.tar with what's left.
+ chdir($backup_path) || die "chdir: $!";
+ command_ok([ $tar, '-cf', "$tar_backup_path/base.tar", '.' ]);
+ chdir($cwd) || die "chdir: $!";
+
+ # Now check that the backup no longer verifies. We must use -n
+ # here, because pg_waldump can't yet read WAL from a tarfile.
+ command_fails_like(
+ [ 'pg_verifybackup', '-n', $tar_backup_path ],
+ $scenario->{'fails_like'},
+ "corrupt backup fails verification: $name");
+
+ # Use rmtree to reclaim space.
+ rmtree($tar_backup_path);
+ }
+
+ # Use rmtree to reclaim space.
rmtree($backup_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 8ed2214408e..9dbb8c1a0ac 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -28,6 +28,23 @@ ok($result, "-q succeeds: exit code 0");
is($stdout, '', "-q succeeds: no stdout");
is($stderr, '', "-q succeeds: no stderr");
+# Should still work if we specify -Fp.
+$primary->command_ok(
+ [ 'pg_verifybackup', '-Fp', $backup_path ],
+ "verifies with -Fp");
+
+# Should not work if we specify -Fy because that's invalid.
+$primary->command_fails_like(
+ [ 'pg_verifybackup', '-Fy', $backup_path ],
+ qr(invalid backup format "y", must be "plain" or "tar"),
+ "does not verify with -Fy");
+
+# Should produce a lengthy list of errors; we test for just one of those.
+$primary->command_fails_like(
+ [ 'pg_verifybackup', '-Ft', '-n', $backup_path ],
+ qr("pg_multixact" is not a plain file),
+ "does not verify with -Ft -n");
+
# Test invalid options
command_fails_like(
[ 'pg_verifybackup', '--progress', '--quiet', $backup_path ],
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 7a09f3b75b2..e7ec8369362 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -16,6 +16,18 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
+# Create a tablespace directory.
+my $source_ts_path = PostgreSQL::Test::Utils::tempdir_short();
+
+# Create a tablespace with table in it.
+$primary->safe_psql('postgres', qq(
+ CREATE TABLESPACE regress_ts1 LOCATION '$source_ts_path';
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1';
+ CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1;
+ INSERT INTO regress_tbl1 VALUES(generate_series(1,5));));
+my $tsoid = $primary->safe_psql('postgres', qq(
+ SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
+
my $backup_path = $primary->backup_dir . '/server-backup';
my $extract_path = $primary->backup_dir . '/extracted-backup';
@@ -23,39 +35,31 @@ my @test_configuration = (
{
'compression_method' => 'none',
'backup_flags' => [],
- 'backup_archive' => 'base.tar',
+ 'backup_archive' => ['base.tar', "$tsoid.tar"],
'enabled' => 1
},
{
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'server-gzip' ],
- 'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'server-lz4' ],
- 'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => [ '-d', '-m' ],
+ 'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ],
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ],
- 'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
+ 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
});
@@ -86,47 +90,16 @@ for my $tc (@test_configuration)
my $backup_files = join(',',
sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
my $expected_backup_files =
- join(',', sort ('backup_manifest', $tc->{'backup_archive'}));
+ join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} }));
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path],
+ "verify backup, compression $method");
# Cleanup.
- unlink($backup_path . '/backup_manifest');
- unlink($backup_path . '/base.tar');
- unlink($backup_path . '/' . $tc->{'backup_archive'});
+ rmtree($backup_path);
rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 8c076d46dee..6b7d7483f6e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -29,41 +29,30 @@ my @test_configuration = (
'compression_method' => 'gzip',
'backup_flags' => [ '--compress', 'client-gzip:5' ],
'backup_archive' => 'base.tar.gz',
- 'decompress_program' => $ENV{'GZIP_PROGRAM'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
},
{
'compression_method' => 'lz4',
'backup_flags' => [ '--compress', 'client-lz4:5' ],
'backup_archive' => 'base.tar.lz4',
- 'decompress_program' => $ENV{'LZ4'},
- 'decompress_flags' => ['-d'],
- 'output_file' => 'base.tar',
'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:5' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
},
{
'compression_method' => 'parallel zstd',
'backup_flags' => [ '--compress', 'client-zstd:workers=3' ],
'backup_archive' => 'base.tar.zst',
- 'decompress_program' => $ENV{'ZSTD'},
- 'decompress_flags' => ['-d'],
'enabled' => check_pg_config("#define USE_ZSTD 1"),
'possibly_unsupported' =>
qr/could not set compression worker count to 3: Unsupported parameter/
@@ -118,40 +107,9 @@ for my $tc (@test_configuration)
is($backup_files, $expected_backup_files,
"found expected backup files, compression $method");
- # Decompress.
- if (exists $tc->{'decompress_program'})
- {
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{ $tc->{'decompress_flags'} }
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- push @decompress, $backup_path . '/' . $tc->{'output_file'}
- if $tc->{'output_file'};
- system_or_bail(@decompress);
- }
-
- SKIP:
- {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accommodate various odd
- # cases. If tar doesn't work the init_from_backup below will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
-
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
-
- # Verify.
- $primary->command_ok(
- [
- 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest",
- '-e', $extract_path
- ],
- "verify backup, compression $method");
- }
+ # Verify tar backup.
+ $primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ],
+ "verify backup, compression $method");
# Cleanup.
rmtree($extract_path);
diff --git a/src/fe_utils/simple_list.c b/src/fe_utils/simple_list.c
index 2d88eb54067..c07e6bd9180 100644
--- a/src/fe_utils/simple_list.c
+++ b/src/fe_utils/simple_list.c
@@ -173,3 +173,22 @@ simple_ptr_list_append(SimplePtrList *list, void *ptr)
list->head = cell;
list->tail = cell;
}
+
+/*
+ * Destroy only pointer list and not the pointed-to element
+ */
+void
+simple_ptr_list_destroy(SimplePtrList *list)
+{
+ SimplePtrListCell *cell;
+
+ cell = list->head;
+ while (cell != NULL)
+ {
+ SimplePtrListCell *next;
+
+ next = cell->next;
+ pg_free(cell);
+ cell = next;
+ }
+}
diff --git a/src/include/fe_utils/simple_list.h b/src/include/fe_utils/simple_list.h
index d42ecded8ed..c83ab6f77e4 100644
--- a/src/include/fe_utils/simple_list.h
+++ b/src/include/fe_utils/simple_list.h
@@ -66,5 +66,6 @@ extern void simple_string_list_destroy(SimpleStringList *list);
extern const char *simple_string_list_not_touched(SimpleStringList *list);
extern void simple_ptr_list_append(SimplePtrList *list, void *ptr);
+extern void simple_ptr_list_destroy(SimplePtrList *list);
#endif /* SIMPLE_LIST_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b6135f03479..5fabb127d7e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3336,6 +3336,7 @@ astreamer_plain_writer
astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
+astreamer_verify
astreamer_zstd_frame
bgworker_main_type
bh_node_type
@@ -3957,6 +3958,7 @@ substitute_phv_relids_context
subxids_array_status
symbol
tablespaceinfo
+tar_file
td_entry
teSection
temp_tablespaces_extra
--
2.39.3 (Apple Git-145)
On Thu, Sep 26, 2024 at 12:18 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Sep 12, 2024 at 7:05 AM Amul Sul <sulamul@gmail.com> wrote:
The updated version attached. Thank you for the review !
I have spent a bunch of time on this and have made numerous revisions.
I hope to commit the result, after seeing what you and the
buildfarm think (and anyone else who wishes to offer an opinion).
Changes:
Thank you, Robert. The code changes look much better now.
A few minor comments:
+ each tablespace, named after the tablespace's OID. If the backup
+ is compressed, the relevant compression extension is added to the
+ end of each file name.
I am a bit unsure about the last line, especially the use of the word
"added." I feel like it's implying that we're adding something, which
isn't true.
--
Typo: futher
--
The addition of simple_ptr_list_destroy will be part of a separate
commit, correct?
Regards,
Amul
On Fri, Sep 27, 2024 at 2:07 AM Amul Sul <sulamul@gmail.com> wrote:
Thank you, Robert. The code changes look much better now.
Cool.
A few minor comments:
+ each tablespace, named after the tablespace's OID. If the backup
+ is compressed, the relevant compression extension is added to the
+ end of each file name.
I am a bit unsure about the last line, especially the use of the word
"added." I feel like it's implying that we're adding something, which
isn't true.
If you add .gz to the end of 16904.tar, you get 16904.tar.gz. This
seems like correct English to me.
Typo: futher
OK, thanks.
The addition of simple_ptr_list_destroy will be part of a separate
commit, correct?
To me, it doesn't seem worth splitting that out into a separate commit.
--
Robert Haas
EDB: http://www.enterprisedb.com
The 32-bit buildfarm animals don't like this too much [1]:
astreamer_verify.c: In function 'member_verify_checksum':
astreamer_verify.c:297:68: error: format '%zu' expects argument of type 'size_t', but argument 6 has type 'uint64' {aka 'long long unsigned int'} [-Werror=format=]
297 | "file \"%s\" in \"%s\" should contain %zu bytes, but read %zu bytes",
| ~~^
| |
| unsigned int
| %llu
298 | m->pathname, mystreamer->archive_name,
299 | m->size, mystreamer->checksum_bytes);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
| |
| uint64 {aka long long unsigned int}
Now, manifest_file.size is in fact a size_t, so %zu is the correct
format spec for it. But astreamer_verify.checksum_bytes is declared
uint64. This leads me to question whether size_t was the correct
choice for manifest_file.size. If it is, is it OK to use size_t
for checksum_bytes as well? If not, your best bet is likely
to write %lld and add an explicit cast to long long, as we do in
dozens of other places. I see that these messages are intended to be
translatable, so INT64_FORMAT is not usable here.
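For concreteness, a minimal standalone sketch of that spelling (not
project code; the variable names are stand-ins for the fields named in
the warning above):

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	uint64_t	checksum_bytes = 8192;	/* stand-in for astreamer_verify.checksum_bytes */
	size_t		manifest_size = 8192;	/* stand-in for manifest_file.size */

	/* %zu matches size_t; the uint64 value needs an explicit cast */
	printf("should contain %zu bytes, but read %llu bytes\n",
		   manifest_size, (unsigned long long) checksum_bytes);
	return 0;
}

The format string is then identical on 32- and 64-bit platforms, so the
translated catalogs stay portable.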
Aside from that, I'm unimpressed with expending a five-line comment
at line 376 to justify casting control_file_bytes to int, when you
could similarly cast it to long long, avoiding the need to justify
something that's not even in tune with project style.
regards, tom lane
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2024-09-28%2006%3A03%3A02
Piling on a bit ... Coverity reported the following issues in
this new code. I have not analyzed them to see if they're
real problems.
________________________________________________________________________________________________________
*** CID 1620458: Resource leaks (RESOURCE_LEAK)
/srv/coverity/git/pgsql-git/postgresql/src/bin/pg_verifybackup/pg_verifybackup.c: 1025 in verify_tar_file()
1019 relpath);
1020
1021 /* Close the file. */
1022 if (close(fd) != 0)
1023 report_backup_error(context, "could not close file \"%s\": %m",
1024 relpath);
CID 1620458: Resource leaks (RESOURCE_LEAK)
Variable "buffer" going out of scope leaks the storage it points to.
1025 }
1026
1027 /*
1028 * Scan the hash table for entries where the 'matched' flag is not set; report
1029 * that such files are present in the manifest but not on disk.
1030 */
________________________________________________________________________________________________________
*** CID 1620457: Memory - illegal accesses (OVERRUN)
/srv/coverity/git/pgsql-git/postgresql/src/bin/pg_verifybackup/astreamer_verify.c: 349 in member_copy_control_data()
343 */
344 if (mystreamer->control_file_bytes <= sizeof(ControlFileData))
345 {
346 int remaining;
347
348 remaining = sizeof(ControlFileData) - mystreamer->control_file_bytes;
CID 1620457: Memory - illegal accesses (OVERRUN)
Overrunning array of 296 bytes at byte offset 296 by dereferencing pointer "(char *)&mystreamer->control_file + mystreamer->control_file_bytes".
349 memcpy(((char *) &mystreamer->control_file)
350 + mystreamer->control_file_bytes,
351 data, Min(len, remaining));
352 }
353
354 /* Remember how many bytes we saw, even if we didn't buffer them. */
________________________________________________________________________________________________________
*** CID 1620456: Null pointer dereferences (FORWARD_NULL)
/srv/coverity/git/pgsql-git/postgresql/src/bin/pg_verifybackup/pg_verifybackup.c: 939 in precheck_tar_backup_file()
933 "file \"%s\" is not expected in a tar format backup",
934 relpath);
935 tblspc_oid = (Oid) num;
936 }
937
938 /* Now, check the compression type of the tar */
CID 1620456: Null pointer dereferences (FORWARD_NULL)
Passing null pointer "suffix" to "strcmp", which dereferences it.
939 if (strcmp(suffix, ".tar") == 0)
940 compress_algorithm = PG_COMPRESSION_NONE;
941 else if (strcmp(suffix, ".tgz") == 0)
942 compress_algorithm = PG_COMPRESSION_GZIP;
943 else if (strcmp(suffix, ".tar.gz") == 0)
944 compress_algorithm = PG_COMPRESSION_GZIP;
regards, tom lane
On Sat, Sep 28, 2024 at 6:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Now, manifest_file.size is in fact a size_t, so %zu is the correct
format spec for it. But astreamer_verify.checksum_bytes is declared
uint64. This leads me to question whether size_t was the correct
choice for manifest_file.size. If it is, is it OK to use size_t
for checksum_bytes as well? If not, your best bet is likely
to write %lld and add an explicit cast to long long, as we do in
dozens of other places. I see that these messages are intended to be
translatable, so INT64_FORMAT is not usable here.
It seems that manifest_file.size is size_t because parse_manifest.h is
using size_t for json_manifest_per_file_callback. What's happening is
that json_manifest_finalize_file() is parsing the file size
information out of the manifest. It uses strtoul to do that and
assigns the result to a size_t. I don't think I had any principled
reason for making that decision; size_t is, I think, the size of an
object in memory, and this is not that. This is just a string in a
JSON file that represents an integer which will hopefully turn out to
be the size of the file on disk. I guess I don't know what type I
should be using here. Most things in PostgreSQL use a type like uint32
or uint64, but technically what we're going to be comparing against in
the end is probably an off_t produced by stat(), but the return value
of strtoul() or strtoull() is unsigned long or unsigned long long,
which is not any of those things. If you have a suggestion here, I'm
all ears.
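To make the hazard concrete, a standalone sketch (not project code; the
6GB size string is hypothetical):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int
main(void)
{
	const char *manifest_size_str = "6442450944";	/* hypothetical 6GB file */
	size_t		as_size_t = (size_t) strtoull(manifest_size_str, NULL, 10);
	uint64_t	as_uint64 = (uint64_t) strtoull(manifest_size_str, NULL, 10);

	/* on a 32-bit build, the size_t copy silently wraps to 2147483648 */
	printf("size_t: %llu, uint64: %llu\n",
		   (unsigned long long) as_size_t,
		   (unsigned long long) as_uint64);
	return 0;
}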
Aside from that, I'm unimpressed with expending a five-line comment
at line 376 to justify casting control_file_bytes to int, when you
could similarly cast it to long long, avoiding the need to justify
something that's not even in tune with project style.
I don't know what you mean by this. The comment explains that I used
%d here because that's what pg_rewind does, and %d corresponds to int,
not long long. If you think it should be some other way, you can
change it, and perhaps you'd like to change pg_rewind to match while
you're at it. But the fact that there's a comment here explaining the
reasoning is a feature, not a bug. It's weird to me to get criticized
for failing to follow project style when I literally copied something
that already exists.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sun, Sep 29, 2024 at 1:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
*** CID 1620458: Resource leaks (RESOURCE_LEAK)
/srv/coverity/git/pgsql-git/postgresql/src/bin/pg_verifybackup/pg_verifybackup.c: 1025 in verify_tar_file()
1019 relpath);
1020
1021 /* Close the file. */
1022 if (close(fd) != 0)
1023 report_backup_error(context, "could not close file \"%s\": %m",
1024 relpath);
CID 1620458: Resource leaks (RESOURCE_LEAK)
Variable "buffer" going out of scope leaks the storage it points to.
1025 }
1026
1027 /*
1028 * Scan the hash table for entries where the 'matched' flag is not set; report
1029 * that such files are present in the manifest but not on disk.
1030 */
This looks like a real leak. It can only happen once per tarfile when
verifying a tar backup so it can never add up to much, but it makes
sense to fix it.
*** CID 1620457: Memory - illegal accesses (OVERRUN)
/srv/coverity/git/pgsql-git/postgresql/src/bin/pg_verifybackup/astreamer_verify.c: 349 in member_copy_control_data()
343 */
344 if (mystreamer->control_file_bytes <= sizeof(ControlFileData))
345 {
346 int remaining;
347
348 remaining = sizeof(ControlFileData) - mystreamer->control_file_bytes;
CID 1620457: Memory - illegal accesses (OVERRUN)
Overrunning array of 296 bytes at byte offset 296 by dereferencing pointer "(char *)&mystreamer->control_file + mystreamer->control_file_bytes".
349 memcpy(((char *) &mystreamer->control_file)
350 + mystreamer->control_file_bytes,
351 data, Min(len, remaining));
352 }
353
354 /* Remember how many bytes we saw, even if we didn't buffer them. */
I think this might be complaining about a potential zero-length copy.
Seems like perhaps the <= sizeof(ControlFileData) test should actually
be < sizeof(ControlFileData).
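To spell out the case Coverity seems to be worried about, a standalone
sketch (not the patch's code):

#include <string.h>

int
main(void)
{
	char		buf[8];
	size_t		filled = sizeof(buf);	/* buffer already full */

	if (filled <= sizeof(buf))	/* still true when the buffer is full ... */
	{
		size_t		remaining = sizeof(buf) - filled;	/* ... so this is 0 */

		/* zero-length copy whose destination is one past the end of buf */
		memcpy(buf + filled, "spill", remaining);
	}
	return 0;
}

Using "<" means the branch is simply not taken once the buffer is full.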
*** CID 1620456: Null pointer dereferences (FORWARD_NULL)
/srv/coverity/git/pgsql-git/postgresql/src/bin/pg_verifybackup/pg_verifybackup.c: 939 in precheck_tar_backup_file()
933 "file \"%s\" is not expected in a tar format backup",
934 relpath);
935 tblspc_oid = (Oid) num;
936 }
937
938 /* Now, check the compression type of the tar */
CID 1620456: Null pointer dereferences (FORWARD_NULL)
Passing null pointer "suffix" to "strcmp", which dereferences it.
939 if (strcmp(suffix, ".tar") == 0)
940 compress_algorithm = PG_COMPRESSION_NONE;
941 else if (strcmp(suffix, ".tgz") == 0)
942 compress_algorithm = PG_COMPRESSION_GZIP;
943 else if (strcmp(suffix, ".tar.gz") == 0)
944 compress_algorithm = PG_COMPRESSION_GZIP;
This one is happening, I believe, because report_backup_error()
doesn't perform a non-local exit, but we have a bit of code here that
acts like it does.
Patch attached.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
0001-Fix-issues-reported-by-Coverity.patchapplication/octet-stream; name=0001-Fix-issues-reported-by-Coverity.patchDownload
From d1a925bf02d1f4cb4d4f736ef32b4c821dec976d Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 30 Sep 2024 11:16:15 -0400
Subject: [PATCH] Fix issues reported by Coverity.
---
src/bin/pg_verifybackup/astreamer_verify.c | 2 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 5 +++++
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
index 57072fdfe04..eb17dfbd95f 100644
--- a/src/bin/pg_verifybackup/astreamer_verify.c
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -341,7 +341,7 @@ member_copy_control_data(astreamer *streamer, astreamer_member *member,
* be PG_CONTROL_FILE_SIZE, but the part that fits in our buffer is
* shorter, just sizeof(ControlFileData).
*/
- if (mystreamer->control_file_bytes <= sizeof(ControlFileData))
+ if (mystreamer->control_file_bytes < sizeof(ControlFileData))
{
int remaining;
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index a9d41a6b838..32467a1ba09 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -929,9 +929,12 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
* result is 0, or if the value is too large to be a valid OID.
*/
if (suffix == NULL || num <= 0 || num > OID_MAX)
+ {
report_backup_error(context,
"file \"%s\" is not expected in a tar format backup",
relpath);
+ return;
+ }
tblspc_oid = (Oid) num;
}
@@ -1014,6 +1017,8 @@ verify_tar_file(verifier_context *context, char *relpath, char *fullpath,
progress_report(false);
}
+ pg_free(buffer);
+
if (rc < 0)
report_backup_error(context, "could not read file \"%s\": %m",
relpath);
--
2.39.3 (Apple Git-145)
Robert Haas <robertmhaas@gmail.com> writes:
On Sat, Sep 28, 2024 at 6:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Now, manifest_file.size is in fact a size_t, so %zu is the correct
format spec for it. But astreamer_verify.checksum_bytes is declared
uint64. This leads me to question whether size_t was the correct
choice for manifest_file.size.
It seems that manifest_file.size is size_t because parse_manifest.h is
using size_t for json_manifest_per_file_callback. What's happening is
that json_manifest_finalize_file() is parsing the file size
information out of the manifest. It uses strtoul to do that and
assigns the result to a size_t. I don't think I had any principled
reason for making that decision; size_t is, I think, the size of an
object in memory, and this is not that.
Correct, size_t is defined to measure in-memory object sizes. It's
the argument type of malloc(), the result type of sizeof(), etc.
It does not restrict the size of disk files.
This is just a string in a
JSON file that represents an integer which will hopefully turn out to
be the size of the file on disk. I guess I don't know what type I
should be using here. Most things in PostgreSQL use a type like uint32
or uint64, but technically what we're going to be comparing against in
the end is probably an off_t produced by stat(), but the return value
of strtoul() or strtoull() is unsigned long or unsigned long long,
which is not any of those things. If you have a suggestion here, I'm
all ears.
I don't know if it's realistic to expect that this code might be used
to process JSON blobs exceeding 4GB. But if it is, I'd be inclined to
use uint64 and strtoull for these purposes, if only to avoid
cross-platform hazards with varying sizeof(long) and sizeof(size_t).
Um, wait ... we do have strtou64(), so you should use that.
Aside from that, I'm unimpressed with expending a five-line comment
at line 376 to justify casting control_file_bytes to int,
I don't know what you mean by this.
I mean that we have a widely-used, better solution. If you copied
this from someplace else, the someplace else could stand to be
improved too.
regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes:
On Sun, Sep 29, 2024 at 1:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
CID 1620458: Resource leaks (RESOURCE_LEAK)
Variable "buffer" going out of scope leaks the storage it points to.
This looks like a real leak. It can only happen once per tarfile when
verifying a tar backup so it can never add up to much, but it makes
sense to fix it.
+1
CID 1620457: Memory - illegal accesses (OVERRUN)
Overrunning array of 296 bytes at byte offset 296 by dereferencing pointer "(char *)&mystreamer->control_file + mystreamer->control_file_bytes".
I think this might be complaining about a potential zero-length copy.
Seems like perhaps the <= sizeof(ControlFileData) test should actually
be < sizeof(ControlFileData).
That's clearly an improvement, but I was wondering if we should also
change "len" and "remaining" to be unsigned (probably size_t).
Coverity might be unhappy about the off-the-end array reference,
but perhaps it's also concerned about what happens if len is negative.
CID 1620456: Null pointer dereferences (FORWARD_NULL)
Passing null pointer "suffix" to "strcmp", which dereferences it.
This one is happening, I believe, because report_backup_error()
doesn't perform a non-local exit, but we have a bit of code here that
acts like it does.
Check.
Patch attached.
WFM, modulo the suggestion about changing data types.
regards, tom lane
On Mon, Sep 30, 2024 at 11:24 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
This is just a string in a
JSON file that represents an integer which will hopefully turn out to
be the size of the file on disk. I guess I don't know what type I
should be using here. Most things in PostgreSQL use a type like uint32
or uint64, but technically what we're going to be comparing against in
the end is probably an off_t produced by stat(), but the return value
of strtoul() or strtoull() is unsigned long or unsigned long long,
which is not any of those things. If you have a suggestion here, I'm
all ears.
I don't know if it's realistic to expect that this code might be used
to process JSON blobs exceeding 4GB. But if it is, I'd be inclined to
use uint64 and strtoull for these purposes, if only to avoid
cross-platform hazards with varying sizeof(long) and sizeof(size_t).
Um, wait ... we do have strtou64(), so you should use that.
The thing we should be worried about is not how large a JSON blob
might be, but rather how large any file that appears in the data
directory might be. So uint32 is out; and I think I hear you voting
for uint64 over size_t. But then how do you think we should print
that? Cast to unsigned long long and use %llu?
Aside from that, I'm unimpressed with expending a five-line comment
at line 376 to justify casting control_file_bytes to int,
I don't know what you mean by this.
I mean that we have a widely-used, better solution. If you copied
this from someplace else, the someplace else could stand to be
improved too.
I don't understand what you think the widely-used, better solution is
here. As far as I can see, there are two goods here, between which one
must choose. One can decide to use the same error message string, and
I hope we can agree that's good, because I've been criticized in the
past when I have done otherwise, as have many others. The other good
is to use the most appropriate data type. One cannot have both of
those things in this instance, unless one goes and fixes the other
code also, but such a change had no business being part of this patch.
If the issue had been serious and likely to occur in real life, I
would have probably fixed it in a preparatory patch, but it isn't, so
I settled for adding a comment.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Sep 30, 2024 at 11:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
WFM, modulo the suggestion about changing data types.
I would prefer not to make the data type change here because it has
quite a few tentacles. If I change member_copy_control_data() then I
have to change astreamer_verify_content() which means changing the
astreamer interface which means adjusting all of the other astreamers.
That can certainly be done, but it's quite possible it might provoke
some other Coverity warning. Since this is a length, it might've been
better to use an unsigned data type, but there's no reason that I can
see why it should be size_t specifically: the origin of the value
could be either the return value of read(), which is ssize_t not
size_t, or the number of bytes returned by a decompression library or
the number of bytes present in a protocol message. Trying to make
things fit better here is just likely to make them fit worse someplace
else.
"You are in a maze of twisty little data types, all alike."
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Sep 30, 2024 at 11:24 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Um, wait ... we do have strtou64(), so you should use that.
The thing we should be worried about is not how large a JSON blob
might be, but rather how large any file that appears in the data
directory might be. So uint32 is out; and I think I hear you voting
for uint64 over size_t.
Yes. size_t might only be 32 bits.
But then how do you think we should print
that? Cast to unsigned long long and use %llu?
Our two standard solutions are to do that or to use UINT64_FORMAT.
But UINT64_FORMAT is problematic in translatable strings because
then the .po files would become platform-specific, so long long
is what to use in that case. For a non-translated format string
you can do either.
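A standalone contrast of the two spellings, for reference (the
UINT64_FORMAT definition here is just a stand-in for the
platform-dependent macro):

#include <stdio.h>
#include <inttypes.h>

#define UINT64_FORMAT "%" PRIu64	/* stand-in; varies by platform */

int
main(void)
{
	uint64_t	bytes = 42;

	/* fine for untranslated output, but the format text differs per platform */
	printf("read " UINT64_FORMAT " bytes\n", bytes);

	/* the form for translatable messages: identical everywhere */
	printf("read %llu bytes\n", (unsigned long long) bytes);
	return 0;
}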
I don't understand what you think the widely-used, better solution is
here.
What we just said above.
regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Sep 30, 2024 at 11:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
WFM, modulo the suggestion about changing data types.
I would prefer not to make the data type change here because it has
quite a few tentacles.
I see your point for the function's "len" argument, but perhaps it's
worth doing
- int remaining;
+ size_t remaining;
remaining = sizeof(ControlFileData) - mystreamer->control_file_bytes;
memcpy(((char *) &mystreamer->control_file)
+ mystreamer->control_file_bytes,
- data, Min(len, remaining));
+ data, Min((size_t) len, remaining));
This is enough to ensure the Min() remains safe.
regards, tom lane
On Mon, Sep 30, 2024 at 6:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Sep 30, 2024 at 11:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
WFM, modulo the suggestion about changing data types.
I would prefer not to make the data type change here because it has
quite a few tentacles.
I see your point for the function's "len" argument, but perhaps it's
worth doing
- int remaining;
+ size_t remaining;
remaining = sizeof(ControlFileData) - mystreamer->control_file_bytes;
memcpy(((char *) &mystreamer->control_file)
+ mystreamer->control_file_bytes,
- data, Min(len, remaining));
+ data, Min((size_t) len, remaining));
This is enough to ensure the Min() remains safe.
OK, done!
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Sep 30, 2024 at 6:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
But then how do you think we should print
that? Cast to unsigned long long and use %llu?
Our two standard solutions are to do that or to use UINT64_FORMAT.
But UINT64_FORMAT is problematic in translatable strings because
then the .po files would become platform-specific, so long long
is what to use in that case. For a non-translated format string
you can do either.
Here is an attempt at cleaning this up. I'm far from convinced that
it's fully correct; my local compiler (clang version 15.0.7) doesn't
seem fussed about conflating size_t with uint64, not even with -Wall
-Werror. I don't suppose you have a fussier compiler locally that you
can use to test this?
I don't understand what you think the widely-used, better solution is
here.
What we just said above.
Respectfully, if you'd just said in your first email about this "I
understand that you were trying to be consistent with a format string
somewhere else, but I don't think that's a good reason to do it this
way, so please use %llu and insert a cast," I would have just said
"fine, no problem" and I wouldn't have been irritated at all. But you
seem determined to deny the existence of the argument that I made
instead of just disagreeing with it, and that's actually pretty
frustrating. I feel like you've wasted my time and your own to no
purpose, and made me feel stupid in the process, over something that
barely even matters. Anyone who has a control file bigger than 2GB has
... a lot of issues.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v1-0001-Try-to-use-uint64-to-size_t-for-file-sizes.patch
From d0f79c4a9cf980a564a68cbc61a4100cab8ad28c Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 1 Oct 2024 10:23:27 -0400
Subject: [PATCH v1] Try to use uint64 to size_t for file sizes.
---
src/bin/pg_combinebackup/load_manifest.c | 4 ++--
src/bin/pg_combinebackup/load_manifest.h | 2 +-
src/bin/pg_combinebackup/write_manifest.c | 5 +++--
src/bin/pg_combinebackup/write_manifest.h | 2 +-
src/bin/pg_verifybackup/astreamer_verify.c | 13 ++++++++-----
src/bin/pg_verifybackup/pg_verifybackup.c | 16 +++++++++-------
src/bin/pg_verifybackup/pg_verifybackup.h | 2 +-
src/common/parse_manifest.c | 4 ++--
src/include/common/parse_manifest.h | 2 +-
9 files changed, 28 insertions(+), 22 deletions(-)
diff --git a/src/bin/pg_combinebackup/load_manifest.c b/src/bin/pg_combinebackup/load_manifest.c
index be8e6273fcb..3a3ad6c2474 100644
--- a/src/bin/pg_combinebackup/load_manifest.c
+++ b/src/bin/pg_combinebackup/load_manifest.c
@@ -60,7 +60,7 @@ static void combinebackup_version_cb(JsonManifestParseContext *context,
static void combinebackup_system_identifier_cb(JsonManifestParseContext *context,
uint64 manifest_system_identifier);
static void combinebackup_per_file_cb(JsonManifestParseContext *context,
- const char *pathname, size_t size,
+ const char *pathname, uint64 size,
pg_checksum_type checksum_type,
int checksum_length,
uint8 *checksum_payload);
@@ -267,7 +267,7 @@ combinebackup_system_identifier_cb(JsonManifestParseContext *context,
*/
static void
combinebackup_per_file_cb(JsonManifestParseContext *context,
- const char *pathname, size_t size,
+ const char *pathname, uint64 size,
pg_checksum_type checksum_type,
int checksum_length, uint8 *checksum_payload)
{
diff --git a/src/bin/pg_combinebackup/load_manifest.h b/src/bin/pg_combinebackup/load_manifest.h
index a96ae12eb8e..8e657179af0 100644
--- a/src/bin/pg_combinebackup/load_manifest.h
+++ b/src/bin/pg_combinebackup/load_manifest.h
@@ -23,7 +23,7 @@ typedef struct manifest_file
{
uint32 status; /* hash status */
const char *pathname;
- size_t size;
+ uint64 size;
pg_checksum_type checksum_type;
int checksum_length;
uint8 *checksum_payload;
diff --git a/src/bin/pg_combinebackup/write_manifest.c b/src/bin/pg_combinebackup/write_manifest.c
index 369d6d2071c..6fea07e7c64 100644
--- a/src/bin/pg_combinebackup/write_manifest.c
+++ b/src/bin/pg_combinebackup/write_manifest.c
@@ -74,7 +74,7 @@ create_manifest_writer(char *directory, uint64 system_identifier)
*/
void
add_file_to_manifest(manifest_writer *mwriter, const char *manifest_path,
- size_t size, time_t mtime,
+ uint64 size, time_t mtime,
pg_checksum_type checksum_type,
int checksum_length,
uint8 *checksum_payload)
@@ -104,7 +104,8 @@ add_file_to_manifest(manifest_writer *mwriter, const char *manifest_path,
appendStringInfoString(&mwriter->buf, "\", ");
}
- appendStringInfo(&mwriter->buf, "\"Size\": %zu, ", size);
+ appendStringInfo(&mwriter->buf, "\"Size\": %llu, ",
+ (unsigned long long) size);
appendStringInfoString(&mwriter->buf, "\"Last-Modified\": \"");
enlargeStringInfo(&mwriter->buf, 128);
diff --git a/src/bin/pg_combinebackup/write_manifest.h b/src/bin/pg_combinebackup/write_manifest.h
index ebc4f9441ad..d2becaba1f9 100644
--- a/src/bin/pg_combinebackup/write_manifest.h
+++ b/src/bin/pg_combinebackup/write_manifest.h
@@ -23,7 +23,7 @@ extern manifest_writer *create_manifest_writer(char *directory,
uint64 system_identifier);
extern void add_file_to_manifest(manifest_writer *mwriter,
const char *manifest_path,
- size_t size, time_t mtime,
+ uint64 size, time_t mtime,
pg_checksum_type checksum_type,
int checksum_length,
uint8 *checksum_payload);
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
index f7ecdc1f655..a442b2849fc 100644
--- a/src/bin/pg_verifybackup/astreamer_verify.c
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -207,9 +207,11 @@ member_verify_header(astreamer *streamer, astreamer_member *member)
if (m->size != member->size)
{
report_backup_error(mystreamer->context,
- "\"%s\" has size %lld in \"%s\" but size %zu in the manifest",
- member->pathname, (long long int) member->size,
- mystreamer->archive_name, m->size);
+ "\"%s\" has size %llu in \"%s\" but size %llu in the manifest",
+ member->pathname,
+ (unsigned long long) member->size,
+ mystreamer->archive_name,
+ (unsigned long long) m->size);
m->bad = true;
return;
}
@@ -294,9 +296,10 @@ member_verify_checksum(astreamer *streamer)
if (mystreamer->checksum_bytes != m->size)
{
report_backup_error(mystreamer->context,
- "file \"%s\" in \"%s\" should contain %zu bytes, but read %zu bytes",
+ "file \"%s\" in \"%s\" should contain %llu bytes, but read %llu bytes",
m->pathname, mystreamer->archive_name,
- m->size, mystreamer->checksum_bytes);
+ (unsigned long long) m->size,
+ (unsigned long long) mystreamer->checksum_bytes);
return;
}
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 32467a1ba09..0719cb89783 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -61,7 +61,7 @@ static void verifybackup_version_cb(JsonManifestParseContext *context,
static void verifybackup_system_identifier(JsonManifestParseContext *context,
uint64 manifest_system_identifier);
static void verifybackup_per_file_cb(JsonManifestParseContext *context,
- const char *pathname, size_t size,
+ const char *pathname, uint64 size,
pg_checksum_type checksum_type,
int checksum_length,
uint8 *checksum_payload);
@@ -547,7 +547,7 @@ verifybackup_system_identifier(JsonManifestParseContext *context,
*/
static void
verifybackup_per_file_cb(JsonManifestParseContext *context,
- const char *pathname, size_t size,
+ const char *pathname, uint64 size,
pg_checksum_type checksum_type,
int checksum_length, uint8 *checksum_payload)
{
@@ -719,8 +719,9 @@ verify_plain_backup_file(verifier_context *context, char *relpath,
if (m->size != sb.st_size)
{
report_backup_error(context,
- "\"%s\" has size %lld on disk but size %zu in the manifest",
- relpath, (long long int) sb.st_size, m->size);
+ "\"%s\" has size %llu on disk but size %llu in the manifest",
+ relpath, (unsigned long long) sb.st_size,
+ (unsigned long long) m->size);
m->bad = true;
}
@@ -1101,7 +1102,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
const char *relpath = m->pathname;
int fd;
int rc;
- size_t bytes_read = 0;
+ uint64 bytes_read = 0;
uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH];
int checksumlen;
@@ -1164,8 +1165,9 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
if (bytes_read != m->size)
{
report_backup_error(context,
- "file \"%s\" should contain %zu bytes, but read %zu bytes",
- relpath, m->size, bytes_read);
+ "file \"%s\" should contain %llu bytes, but read %llu bytes",
+ relpath, (unsigned long long) m->size,
+ (unsigned long long) bytes_read);
return;
}
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 183b1d5111b..2f864fb0f3f 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -29,7 +29,7 @@ typedef struct manifest_file
{
uint32 status; /* hash status */
const char *pathname;
- size_t size;
+ uint64 size;
pg_checksum_type checksum_type;
int checksum_length;
uint8 *checksum_payload;
diff --git a/src/common/parse_manifest.c b/src/common/parse_manifest.c
index 5a7b491e9a9..ad2d0fd808f 100644
--- a/src/common/parse_manifest.c
+++ b/src/common/parse_manifest.c
@@ -650,7 +650,7 @@ static void
json_manifest_finalize_file(JsonManifestParseState *parse)
{
JsonManifestParseContext *context = parse->context;
- size_t size;
+ uint64 size;
char *ep;
int checksum_string_length;
pg_checksum_type checksum_type;
@@ -688,7 +688,7 @@ json_manifest_finalize_file(JsonManifestParseState *parse)
}
/* Parse size. */
- size = strtoul(parse->size, &ep, 10);
+ size = strtou64(parse->size, &ep, 10);
if (*ep)
json_manifest_parse_failure(parse->context,
"file size is not an integer");
diff --git a/src/include/common/parse_manifest.h b/src/include/common/parse_manifest.h
index ee571a568a1..1b8bc447e44 100644
--- a/src/include/common/parse_manifest.h
+++ b/src/include/common/parse_manifest.h
@@ -28,7 +28,7 @@ typedef void (*json_manifest_system_identifier_callback) (JsonManifestParseConte
uint64 manifest_system_identifier);
typedef void (*json_manifest_per_file_callback) (JsonManifestParseContext *,
const char *pathname,
- size_t size, pg_checksum_type checksum_type,
+ uint64 size, pg_checksum_type checksum_type,
int checksum_length, uint8 *checksum_payload);
typedef void (*json_manifest_per_wal_range_callback) (JsonManifestParseContext *,
TimeLineID tli,
--
2.39.3 (Apple Git-145)
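For readers skimming the diff: the strtoul-to-strtou64 change in
parse_manifest.c matters on platforms where unsigned long is only 32
bits. A tiny standalone sketch (not from the patch; PG's strtou64()
macro expands to strtoull() on such platforms):

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	const char *size_str = "5000000000";	/* a file larger than 4GB */
	char	   *ep;

	/*
	 * On a platform where unsigned long is 32 bits, strtoul() would clamp
	 * this to ULONG_MAX and set errno to ERANGE; strtoull() parses it
	 * correctly.
	 */
	unsigned long long size = strtoull(size_str, &ep, 10);

	printf("parsed size: %llu (trailing text: \"%s\")\n", size, ep);
	return 0;
}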
Robert Haas <robertmhaas@gmail.com> writes:
Here is an attempt at cleaning this up. I'm far from convinced that
it's fully correct; my local compiler (clang version 15.0.7) doesn't
seem fussed about conflating size_t with uint64, not even with -Wall
-Werror. I don't suppose you have a fussier compiler locally that you
can use to test this?
Sure, I can try it on mamba's host. It's slow though ...
Respectfully, if you'd just said in your first email about this "I
understand that you were trying to be consistent with a format string
somewhere else, but I don't think that's a good reason to do it this
way, so please use %llu and insert a cast," I would have just said
"fine, no problem" and I wouldn't have been irritated at all.
I apologize for rubbing you the wrong way on this. It was not my
intent. (But, in fact, I had not realized you copied that code
from somewhere else.)
regards, tom lane
On Tue, Oct 1, 2024 at 10:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Sure, I can try it on mamba's host. It's slow though ...
OK, thanks.
I apologize for rubbing you the wrong way on this. It was not my
intent. (But, in fact, I had not realized you copied that code
from somewhere else.)
That's good to hear, but I'm still slightly puzzled because you
started by complaining about the comment, and the comment, as I read
it, says that it's copied. So either the comment isn't as clear as I
think it is, or you didn't read it before complaining about it. We
don't have to keep going back and forth on this. I'm happy to have you
change it in any way that you feel suitable, with or without adjusting
pg_rewind to match. There wouldn't have been a lengthy comment here if
I'd been certain what the right thing to do was; and I wasn't. If you
are, great!
--
Robert Haas
EDB: http://www.enterprisedb.com
I wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Here is an attempt at cleaning this up. I'm far from convinced that
it's fully correct; my local compiler (clang version 15.0.7) doesn't
seem fussed about conflating size_t with uint64, not even with -Wall
-Werror. I don't suppose you have a fussier compiler locally that you
can use to test this?
Sure, I can try it on mamba's host. It's slow though ...
Yes, mamba thinks this is OK.
regards, tom lane
On Tue, Oct 1, 2024 at 1:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yes, mamba thinks this is OK.
Committed.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Oct 1, 2024 at 1:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yes, mamba thinks this is OK.
Committed.
Sadly, it seems adder[1] is even pickier than mamba:
../pgsql/src/backend/backup/basebackup_incremental.c: In function ‘CreateIncrementalBackupInfo’:
../pgsql/src/backend/backup/basebackup_incremental.c:179:30: error: assignment to ‘json_manifest_per_file_callback’ {aka ‘void (*)(JsonManifestParseContext *, const char *, long long unsigned int, pg_checksum_type, int, unsigned char *)’} from incompatible pointer type ‘void (*)(JsonManifestParseContext *, const char *, size_t, pg_checksum_type, int, uint8 *)’ {aka ‘void (*)(JsonManifestParseContext *, const char *, unsigned int, pg_checksum_type, int, unsigned char *)’} [-Wincompatible-pointer-types]
179 | context->per_file_cb = manifest_process_file;
| ^
regards, tom lane
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-10-02%2014%3A09%3A58
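The warning is easy to reproduce with a reduced example (hypothetical
names, not the real manifest code); whether the compiler complains
depends on whether size_t and the 64-bit type happen to share an
underlying type on the platform:

#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the callback typedef after the patch. */
typedef void (*per_file_callback) (const char *pathname, uint64_t size);

/* An implementation that was not updated and still takes size_t. */
static void
old_per_file_cb(const char *pathname, size_t size)
{
	(void) pathname;
	(void) size;
}

int
main(void)
{
	per_file_callback cb;

	/*
	 * On a 32-bit platform, size_t is "unsigned int" while uint64_t is
	 * "long long unsigned int", so this assignment draws
	 * -Wincompatible-pointer-types, just as adder reported; where the two
	 * resolve to the same type, it compiles silently.
	 */
	cb = old_per_file_cb;
	(void) cb;
	return 0;
}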
I wrote:
Sadly, it seems adder[1] is even pickier than mamba:
Nope, it was my testing error: I supposed that this patch only
affected pg_combinebackup and pg_verifybackup, so I only
recompiled those modules, not the whole tree. But there's one
more place with a json_manifest_per_file_callback callback :-(.
I pushed a quick-hack fix.
regards, tom lane
On Wed, Oct 2, 2024 at 8:30 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nope, it was my testing error: I supposed that this patch only
affected pg_combinebackup and pg_verifybackup, so I only
recompiled those modules, not the whole tree. But there's one
more place with a json_manifest_per_file_callback callback :-(.
I pushed a quick-hack fix.
I should have realized that was there, considering that it was I who
added it and not very long ago.
Thanks for fixing it.
--
Robert Haas
EDB: http://www.enterprisedb.com