WIP/PoC for parallel backup
Hi Hackers,
I have been looking into adding a parallel backup feature to pg_basebackup.
Currently pg_basebackup sends a BASE_BACKUP command to take a full backup;
the server scans PGDATA and sends the files back to pg_basebackup. In
general, the server takes the following steps on a BASE_BACKUP command:
- do pg_start_backup
- scan PGDATA, create and send a header containing information about
tablespaces
- send each tablespace to pg_basebackup
- and then do pg_stop_backup
All these steps are executed sequentially by a single process. The idea I
am working on is to separate these steps into multiple commands in the
replication grammar, and to add worker processes to pg_basebackup so that
they can copy the contents of PGDATA in parallel.
The command line interface syntax would be like:
pg_basebackup --jobs=WORKERS
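For example (the target directory here is just illustrative):
pg_basebackup -D /path/to/backup --jobs=4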
Replication commands:
- BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA.
If the PARALLEL option is given, the server will only do pg_start_backup,
scan PGDATA, and send back a list of file names.
- SEND_FILES_CONTENTS (file1, file2, ...) - returns the files in the given
list. Each worker will send this command with its share of the file list,
and the server will stream those files back to that worker.
- STOP_BACKUP
When all workers finish, pg_basebackup will send the STOP_BACKUP command.
pg_basebackup can start by sending the "BASE_BACKUP PARALLEL" command and
getting a list of filenames from the server in response. It should then
divide this list as per the --jobs parameter (this division can be based on
file sizes). Each worker process will issue a SEND_FILES_CONTENTS
(file1, file2, ...) command, and in response the server will send the files
in the given list back to the requesting worker process.
Once all the files are copied, pg_basebackup will send the STOP_BACKUP
command. A similar idea was discussed by Robert on the incremental backup
thread a while ago. This is similar to that, but instead of START_BACKUP
and SEND_FILE_LIST, I have combined them into BASE_BACKUP PARALLEL.
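To sketch the client-side flow, each worker would run something like the
following (build_send_files_command() and receive_files() are illustrative
helper names, not part of the attached POC):

static int
worker_fetch(SimpleStringList *myfiles)
{
    PGconn *worker_conn;
    char   *cmd;

    /* each worker opens its own replication connection */
    worker_conn = GetConnection();
    if (worker_conn == NULL)
        return 1;

    /* e.g. SEND_FILES_CONTENTS ('base/1/1245', 'base/1/1247', ...) */
    cmd = build_send_files_command(myfiles);
    if (PQsendQuery(worker_conn, cmd) == 0)
    {
        pg_log_error("could not send file list: %s",
                     PQerrorMessage(worker_conn));
        return 1;
    }

    /* consume the resulting CopyOut stream into the target directory */
    receive_files(worker_conn);

    PQfinish(worker_conn);
    return 0;
}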
I have done a basic proof of concept (POC), which is also attached. I
would appreciate some input on this. So far, I am simply dividing the list
equally and assigning the pieces to worker processes. I intend to fine-tune
this by taking file sizes into consideration. Further, to add tar format
support, I am considering that each worker process handles all files
belonging to one tablespace in its list (i.e. creates and copies a tar
file) before it processes the next tablespace. As a result, this will
create tar files that are disjoint with respect to tablespace data. For
example:
Say tablespace t1 has 20 files, tablespace t2 has 10, and we have 5 worker
processes. Ignoring all other factors for the sake of this example, each
worker process will get a group of 4 files of t1 and 2 files of t2. Each
process will create 2 tar files, one for t1 containing 4 files and another
for t2 containing 2 files.
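As a sketch of the size-aware division I have in mind (the FileEntry struct
with a size field is hypothetical; the attached POC only divides by count),
a simple greedy scheme would hand each file to the least-loaded worker:

static void
divide_by_size(FileEntry *files, int nfiles,
               SimpleStringList **worker_files, int nworkers)
{
    int64  *load = palloc0(sizeof(int64) * nworkers);
    int     i;

    /* 'files' is assumed to be sorted by decreasing size */
    for (i = 0; i < nfiles; i++)
    {
        int     target = 0;
        int     w;

        /* pick the worker with the smallest total size so far */
        for (w = 1; w < nworkers; w++)
            if (load[w] < load[target])
                target = w;

        simple_string_list_append(worker_files[target], files[i].path);
        load[target] += files[i].size;
    }
    pfree(load);
}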
Regards,
Asif
Attachments:
0001-Initial-POC-on-parallel-backup.patch (application/octet-stream)
From ffa6d0946af34d78e59eb5b82f1572f2537fffeb Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 21 Aug 2019 18:35:45 +0500
Subject: [PATCH] Initial POC on parallel backup
---
src/backend/replication/basebackup.c | 765 +++++++++++++++++--------
src/backend/replication/repl_gram.y | 53 +-
src/backend/replication/repl_scanner.l | 4 +
src/bin/pg_basebackup/pg_basebackup.c | 285 ++++++++-
src/include/nodes/replnodes.h | 8 +
5 files changed, 865 insertions(+), 250 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index c91f66dcbe..9cbee408ff 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -52,11 +52,26 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ bool parallel;
} basebackup_options;
+typedef struct
+{
+ bool isdir;
+ char *path;
+} pathinfo;
+
+#define MAKE_PATHINFO(a, b) \
+ do { \
+ pi = palloc0(sizeof(pathinfo)); \
+ pi->isdir = a; \
+ pi->path = pstrdup(b); \
+ } while(0)
static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks);
+static int64 sendDir_(const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks, List **files);
static bool sendFile(const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
@@ -74,12 +89,18 @@ static int compareWalFileNames(const ListCell *a, const ListCell *b);
static void throttle(size_t increment);
static bool is_checksummed_file(const char *fullpath, const char *filename);
+static void StopBackup(basebackup_options *opt);
+static void SendBackupFileList(List *tablespaces);
+static void SendFilesContents(List *files, bool missing_ok);
+static void includeWALFiles(basebackup_options *opt, XLogRecPtr endptr, TimeLineID endtli);
+
/* Was the backup currently in-progress initiated in recovery mode? */
static bool backup_started_in_recovery = false;
/* Relative path of temporary statistics directory */
static char *statrelpath = NULL;
+#define TMP_BACKUP_LABEL_FILE BACKUP_LABEL_FILE".tmp"
/*
* Size of each block sent into the tar stream for larger files.
*/
@@ -305,6 +326,33 @@ perform_base_backup(basebackup_options *opt)
throttling_counter = -1;
}
+ /*
+ * In parallel mode, we will not be closing the backup or sending the files right away.
+ * Instead we will only send the list of file names in the $PGDATA directory.
+ */
+ if (opt->parallel)
+ {
+ /* Save backup label into a temp file for now, so STOP_BACKUP can send it to pg_basebackup later on. */
+ FILE *fp = AllocateFile(TMP_BACKUP_LABEL_FILE, "w");
+ if (!fp)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m",
+ TMP_BACKUP_LABEL_FILE)));
+ if (fwrite(labelfile->data, labelfile->len, 1, fp) != 1 ||
+ fflush(fp) != 0 ||
+ pg_fsync(fileno(fp)) != 0 ||
+ ferror(fp) ||
+ FreeFile(fp))
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ TMP_BACKUP_LABEL_FILE)));
+
+ SendBackupFileList(tablespaces);
+ return;
+ }
+
/* Send off our tablespaces one by one */
foreach(lc, tablespaces)
{
@@ -367,234 +415,8 @@ perform_base_backup(basebackup_options *opt)
if (opt->includewal)
- {
- /*
- * We've left the last tar file "open", so we can now append the
- * required WAL files to it.
- */
- char pathbuf[MAXPGPATH];
- XLogSegNo segno;
- XLogSegNo startsegno;
- XLogSegNo endsegno;
- struct stat statbuf;
- List *historyFileList = NIL;
- List *walFileList = NIL;
- char firstoff[MAXFNAMELEN];
- char lastoff[MAXFNAMELEN];
- DIR *dir;
- struct dirent *de;
- ListCell *lc;
- TimeLineID tli;
-
- /*
- * I'd rather not worry about timelines here, so scan pg_wal and
- * include all WAL files in the range between 'startptr' and 'endptr',
- * regardless of the timeline the file is stamped with. If there are
- * some spurious WAL files belonging to timelines that don't belong in
- * this server's history, they will be included too. Normally there
- * shouldn't be such files, but if there are, there's little harm in
- * including them.
- */
- XLByteToSeg(startptr, startsegno, wal_segment_size);
- XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
- XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
- XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
-
- dir = AllocateDir("pg_wal");
- while ((de = ReadDir(dir, "pg_wal")) != NULL)
- {
- /* Does it look like a WAL segment, and is it in the range? */
- if (IsXLogFileName(de->d_name) &&
- strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
- strcmp(de->d_name + 8, lastoff + 8) <= 0)
- {
- walFileList = lappend(walFileList, pstrdup(de->d_name));
- }
- /* Does it look like a timeline history file? */
- else if (IsTLHistoryFileName(de->d_name))
- {
- historyFileList = lappend(historyFileList, pstrdup(de->d_name));
- }
- }
- FreeDir(dir);
-
- /*
- * Before we go any further, check that none of the WAL segments we
- * need were removed.
- */
- CheckXLogRemoved(startsegno, ThisTimeLineID);
-
- /*
- * Sort the WAL filenames. We want to send the files in order from
- * oldest to newest, to reduce the chance that a file is recycled
- * before we get a chance to send it over.
- */
- list_sort(walFileList, compareWalFileNames);
-
- /*
- * There must be at least one xlog file in the pg_wal directory, since
- * we are doing backup-including-xlog.
- */
- if (walFileList == NIL)
- ereport(ERROR,
- (errmsg("could not find any WAL files")));
-
- /*
- * Sanity check: the first and last segment should cover startptr and
- * endptr, with no gaps in between.
- */
- XLogFromFileName((char *) linitial(walFileList),
- &tli, &segno, wal_segment_size);
- if (segno != startsegno)
- {
- char startfname[MAXFNAMELEN];
+ includeWALFiles(opt, endptr, endtli);
- XLogFileName(startfname, ThisTimeLineID, startsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", startfname)));
- }
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- XLogSegNo currsegno = segno;
- XLogSegNo nextsegno = segno + 1;
-
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
- if (!(nextsegno == segno || currsegno == segno))
- {
- char nextfname[MAXFNAMELEN];
-
- XLogFileName(nextfname, ThisTimeLineID, nextsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", nextfname)));
- }
- }
- if (segno != endsegno)
- {
- char endfname[MAXFNAMELEN];
-
- XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", endfname)));
- }
-
- /* Ok, we have everything we need. Send the WAL files. */
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- FILE *fp;
- char buf[TAR_SEND_SIZE];
- size_t cnt;
- pgoff_t len = 0;
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
-
- fp = AllocateFile(pathbuf, "rb");
- if (fp == NULL)
- {
- int save_errno = errno;
-
- /*
- * Most likely reason for this is that the file was already
- * removed by a checkpoint, so check for that to get a better
- * error message.
- */
- CheckXLogRemoved(segno, tli);
-
- errno = save_errno;
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not open file \"%s\": %m", pathbuf)));
- }
-
- if (fstat(fileno(fp), &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- pathbuf)));
- if (statbuf.st_size != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* send the WAL file itself */
- _tarWriteHeader(pathbuf, NULL, &statbuf, false);
-
- while ((cnt = fread(buf, 1,
- Min(sizeof(buf), wal_segment_size - len),
- fp)) > 0)
- {
- CheckXLogRemoved(segno, tli);
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
-
- len += cnt;
- throttle(cnt);
-
- if (len == wal_segment_size)
- break;
- }
-
- if (len != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* wal_segment_size is a multiple of 512, so no need for padding */
-
- FreeFile(fp);
-
- /*
- * Mark file as archived, otherwise files can get archived again
- * after promotion of a new node. This is in line with
- * walreceiver.c always doing an XLogArchiveForceDone() after a
- * complete segment.
- */
- StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(pathbuf, "");
- }
-
- /*
- * Send timeline history files too. Only the latest timeline history
- * file is required for recovery, and even that only if there happens
- * to be a timeline switch in the first WAL segment that contains the
- * checkpoint record, or if we're taking a base backup from a standby
- * server and the target timeline changes while the backup is taken.
- * But they are small and highly useful for debugging purposes, so
- * better include them all, always.
- */
- foreach(lc, historyFileList)
- {
- char *fname = lfirst(lc);
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
-
- if (lstat(pathbuf, &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m", pathbuf)));
-
- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
-
- /* unconditionally mark file as archived */
- StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "");
- }
-
- /* Send CopyDone message for the last tar file */
- pq_putemptymessage('c');
- }
SendXlogRecPtrResult(endptr, endtli);
if (total_checksum_failures)
@@ -638,6 +460,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_maxrate = false;
bool o_tablespace_map = false;
bool o_noverify_checksums = false;
+ bool o_parallel = false;
MemSet(opt, 0, sizeof(*opt));
foreach(lopt, options)
@@ -726,6 +549,16 @@ parse_basebackup_options(List *options, basebackup_options *opt)
noverify_checksums = true;
o_noverify_checksums = true;
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ if (o_parallel)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+
+ opt->parallel = true;
+ o_parallel = true;
+ }
else
elog(ERROR, "option \"%s\" not recognized",
defel->defname);
@@ -760,7 +593,12 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg, false);
}
- perform_base_backup(&opt);
+ if (cmd->cmdtag == SEND_FILES_CONTENT)
+ SendFilesContents(cmd->backupfiles, true);
+ else if (cmd->cmdtag == STOP_BACKUP)
+ StopBackup(&opt);
+ else
+ perform_base_backup(&opt);
}
static void
@@ -1004,9 +842,16 @@ sendTablespace(char *path, bool sizeonly)
* information in the tar file. If not, we can skip that
* as it will be sent separately in the tablespace_map file.
*/
+
+static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks)
+{
+ return sendDir_(path, basepathlen, sizeonly, tablespaces, sendtblspclinks, NULL);
+}
+
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
- bool sendtblspclinks)
+sendDir_(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
+ bool sendtblspclinks, List **files)
{
DIR *dir;
struct dirent *de;
@@ -1160,6 +1005,15 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
+
+ if (files != NULL)
+ {
+ pathinfo *pi;
+
+ MAKE_PATHINFO(true, pathbuf);
+ *files = lappend(*files, pi);
+ }
+
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
excludeFound = true;
break;
@@ -1197,6 +1051,13 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
sizeonly);
+ if (files != NULL)
+ {
+ pathinfo *pi;
+
+ MAKE_PATHINFO(true, pathbuf);
+ *files = lappend(*files, pi);
+ }
continue; /* don't recurse into pg_wal */
}
@@ -1282,13 +1143,29 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces, sendtblspclinks);
+ {
+ if (files != NULL)
+ {
+ pathinfo *pi;
+
+ MAKE_PATHINFO(true, pathbuf);
+ *files = lappend(*files, pi);
+ }
+ size += sendDir_(pathbuf, basepathlen, sizeonly, tablespaces, sendtblspclinks, files);
+ }
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
- if (!sizeonly)
+ if (files != NULL)
+ {
+ pathinfo *pi;
+
+ MAKE_PATHINFO(false, pathbuf);
+ *files = lappend(*files, pi);
+ }
+ else if (!sizeonly)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid);
@@ -1711,3 +1588,427 @@ throttle(size_t increment)
*/
throttled_last = GetCurrentTimestamp();
}
+
+
+static void
+StopBackup(basebackup_options *opt)
+{
+ TimeLineID endtli;
+ XLogRecPtr endptr;
+ char *labelfile;
+ struct stat statbuf;
+ int r;
+ StringInfoData buf;
+
+ /* Disable throttling. */
+ throttling_counter = -1;
+
+ /* send backup_label.tmp and pg_control files */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ /* ... and pg_control after everything else. */
+ if (lstat(TMP_BACKUP_LABEL_FILE, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ TMP_BACKUP_LABEL_FILE)));
+ sendFile(TMP_BACKUP_LABEL_FILE, BACKUP_LABEL_FILE, &statbuf, false, InvalidOid);
+
+ /* read backup_label file into buffer, we need it for do_pg_stop_backup */
+ FILE *lfp = AllocateFile(TMP_BACKUP_LABEL_FILE, "r");
+ if (!lfp)
+ {
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read file \"%s\": %m",
+ TMP_BACKUP_LABEL_FILE)));
+ }
+
+ labelfile = palloc(statbuf.st_size + 1);
+ r = fread(labelfile, statbuf.st_size, 1, lfp);
+ labelfile[statbuf.st_size] = '\0';
+
+
+ /* ... and pg_control after everything else. */
+ if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ XLOG_CONTROL_FILE)));
+ sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
+
+ pq_putemptymessage('c'); /* CopyDone */
+
+ /* stop backup */
+ endptr = do_pg_stop_backup(labelfile, !opt->nowait, &endtli);
+
+ /*
+ * FIXME: opt->includewal is not available here, so we just call it unconditionally; we should add an
+ * includewal option to the STOP_BACKUP command that pg_basebackup sends.
+ */
+
+ // if (opt->includewal)
+ includeWALFiles(opt, endptr, endtli);
+
+ /* send ending wal record. */
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+static void
+SendBackupFileList(List *tablespaces)
+{
+ StringInfoData buf;
+ ListCell *lc;
+
+ List *files = NIL;
+ foreach(lc, tablespaces)
+ {
+ tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
+ if (ti->path == NULL)
+ sendDir_(".", 1, false, NIL, true, &files);
+ else
+ sendDir_(ti->path, 1, false, NIL, true, &files);
+ }
+
+ // add backup label file
+ pathinfo *pi;
+ MAKE_PATHINFO(false, TMP_BACKUP_LABEL_FILE);
+ files = lcons(pi, files);
+
+ /* Construct and send the directory information */
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 2); /* 2 fields */
+
+ /* First field - isdirectory */
+ pq_sendstring(&buf, "isDir");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, INT4OID); /* type oid */
+ pq_sendint16(&buf, 4); /* typlen */
+ pq_sendint32(&buf, 0); /* typmod */
+ pq_sendint16(&buf, 0); /* format code */
+
+ /* Second field - file path */
+ pq_sendstring(&buf, "path");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ pq_endmessage(&buf);
+
+ foreach(lc, files)
+ {
+ pathinfo *pi = (pathinfo *) lfirst(lc);
+ char *path = pi->path;
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 2); /* number of columns */
+
+ int32 isdir = pi->isdir ? 1 : 0;
+ send_int8_string(&buf, isdir);
+
+ Size len = strlen(path);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, path, len);
+
+ pq_endmessage(&buf);
+ }
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+ }
+
+static void
+SendFilesContents(List *files, bool missing_ok)
+{
+ StringInfoData buf;
+ ListCell *lc;
+
+ /* Disable throttling. */
+ throttling_counter = -1;
+
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ foreach(lc, files)
+ {
+ Value *strval = lfirst(lc);
+ char *pathbuf = (char *) strVal(strval);
+
+ // send file
+ struct stat statbuf;
+ if (lstat(pathbuf, &statbuf) != 0)
+ {
+ if (errno != ENOENT)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file or directory \"%s\": %m",
+ pathbuf)));
+
+ /* If the file went away while scanning, it's not an error. */
+ continue;
+ }
+
+ /*
+ * TODO: perhaps create directory entry in the tar file, to avoid the need of manually creating
+ * directories in pg_basebackup.c
+ */
+// if (S_ISDIR(statbuf.st_mode))
+// {
+// bool skip_this_dir = false;
+// ListCell *lc;
+//
+// /*
+// * Store a directory entry in the tar file so we can get the
+// * permissions right.
+// */
+//
+// _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+// }
+ sendFile(pathbuf, pathbuf, &statbuf, true, InvalidOid);
+ }
+
+ pq_putemptymessage('c'); /* CopyDone */
+ return;
+}
+
+static void
+includeWALFiles(basebackup_options *opt, XLogRecPtr endptr, TimeLineID endtli)
+{
+ /*
+ * We've left the last tar file "open", so we can now append the
+ * required WAL files to it.
+ */
+ char pathbuf[MAXPGPATH];
+ XLogSegNo segno;
+ XLogSegNo startsegno;
+ XLogSegNo endsegno;
+ struct stat statbuf;
+ List *historyFileList = NIL;
+ List *walFileList = NIL;
+ char firstoff[MAXFNAMELEN];
+ char lastoff[MAXFNAMELEN];
+ DIR *dir;
+ struct dirent *de;
+ ListCell *lc;
+ TimeLineID tli;
+
+ /*
+ * I'd rather not worry about timelines here, so scan pg_wal and
+ * include all WAL files in the range between 'startptr' and 'endptr',
+ * regardless of the timeline the file is stamped with. If there are
+ * some spurious WAL files belonging to timelines that don't belong in
+ * this server's history, they will be included too. Normally there
+ * shouldn't be such files, but if there are, there's little harm in
+ * including them.
+ */
+ XLByteToSeg(startptr, startsegno, wal_segment_size);
+ XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
+ XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
+ XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
+
+ dir = AllocateDir("pg_wal");
+ while ((de = ReadDir(dir, "pg_wal")) != NULL)
+ {
+ /* Does it look like a WAL segment, and is it in the range? */
+ if (IsXLogFileName(de->d_name) &&
+ strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
+ strcmp(de->d_name + 8, lastoff + 8) <= 0)
+ {
+ walFileList = lappend(walFileList, pstrdup(de->d_name));
+ }
+ /* Does it look like a timeline history file? */
+ else if (IsTLHistoryFileName(de->d_name))
+ {
+ historyFileList = lappend(historyFileList, pstrdup(de->d_name));
+ }
+ }
+ FreeDir(dir);
+
+ /*
+ * Before we go any further, check that none of the WAL segments we
+ * need were removed.
+ */
+ CheckXLogRemoved(startsegno, ThisTimeLineID);
+
+ /*
+ * Sort the WAL filenames. We want to send the files in order from
+ * oldest to newest, to reduce the chance that a file is recycled
+ * before we get a chance to send it over.
+ */
+ list_sort(walFileList, compareWalFileNames);
+
+ /*
+ * There must be at least one xlog file in the pg_wal directory, since
+ * we are doing backup-including-xlog.
+ */
+ if (walFileList == NIL)
+ ereport(ERROR,
+ (errmsg("could not find any WAL files")));
+
+ /*
+ * Sanity check: the first and last segment should cover startptr and
+ * endptr, with no gaps in between.
+ */
+ XLogFromFileName((char *) linitial(walFileList),
+ &tli, &segno, wal_segment_size);
+ if (segno != startsegno)
+ {
+ char startfname[MAXFNAMELEN];
+
+ XLogFileName(startfname, ThisTimeLineID, startsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", startfname)));
+ }
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ XLogSegNo currsegno = segno;
+ XLogSegNo nextsegno = segno + 1;
+
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+ if (!(nextsegno == segno || currsegno == segno))
+ {
+ char nextfname[MAXFNAMELEN];
+
+ XLogFileName(nextfname, ThisTimeLineID, nextsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", nextfname)));
+ }
+ }
+ if (segno != endsegno)
+ {
+ char endfname[MAXFNAMELEN];
+
+ XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", endfname)));
+ }
+
+ /* Ok, we have everything we need. Send the WAL files. */
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ FILE *fp;
+ char buf[TAR_SEND_SIZE];
+ size_t cnt;
+ pgoff_t len = 0;
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+
+ fp = AllocateFile(pathbuf, "rb");
+ if (fp == NULL)
+ {
+ int save_errno = errno;
+
+ /*
+ * Most likely reason for this is that the file was already
+ * removed by a checkpoint, so check for that to get a better
+ * error message.
+ */
+ CheckXLogRemoved(segno, tli);
+
+ errno = save_errno;
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m", pathbuf)));
+ }
+
+ if (fstat(fileno(fp), &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ pathbuf)));
+ if (statbuf.st_size != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* send the WAL file itself */
+ _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+
+ while ((cnt = fread(buf, 1,
+ Min(sizeof(buf), wal_segment_size - len),
+ fp)) > 0)
+ {
+ CheckXLogRemoved(segno, tli);
+ /* Send the chunk as a CopyData message */
+ if (pq_putmessage('d', buf, cnt))
+ ereport(ERROR,
+ (errmsg("base backup could not send data, aborting backup")));
+
+ len += cnt;
+ throttle(cnt);
+
+ if (len == wal_segment_size)
+ break;
+ }
+
+ if (len != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* wal_segment_size is a multiple of 512, so no need for padding */
+
+ FreeFile(fp);
+
+ /*
+ * Mark file as archived, otherwise files can get archived again
+ * after promotion of a new node. This is in line with
+ * walreceiver.c always doing an XLogArchiveForceDone() after a
+ * complete segment.
+ */
+ StatusFilePath(pathbuf, walFileName, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+
+ /*
+ * Send timeline history files too. Only the latest timeline history
+ * file is required for recovery, and even that only if there happens
+ * to be a timeline switch in the first WAL segment that contains the
+ * checkpoint record, or if we're taking a base backup from a standby
+ * server and the target timeline changes while the backup is taken.
+ * But they are small and highly useful for debugging purposes, so
+ * better include them all, always.
+ */
+ foreach(lc, historyFileList)
+ {
+ char *fname = lfirst(lc);
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
+
+ if (lstat(pathbuf, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m", pathbuf)));
+
+ sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
+
+ /* unconditionally mark file as archived */
+ StatusFilePath(pathbuf, fname, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+
+ /* Send CopyDone message for the last tar file */
+ pq_putemptymessage('c');
+}
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index c4e11cc4e8..56b6934e43 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -87,6 +87,10 @@ static SQLCmd *make_sqlcmd(void);
%token K_EXPORT_SNAPSHOT
%token K_NOEXPORT_SNAPSHOT
%token K_USE_SNAPSHOT
+%token K_PARALLEL
+%token K_START_BACKUP
+%token K_SEND_FILES_CONTENT
+%token K_STOP_BACKUP
%type <node> command
%type <node> base_backup start_replication start_logical_replication
@@ -102,6 +106,8 @@ static SQLCmd *make_sqlcmd(void);
%type <boolval> opt_temporary
%type <list> create_slot_opt_list
%type <defelt> create_slot_opt
+%type <list> backup_files backup_files_list
+%type <node> backup_file
%%
@@ -155,13 +161,29 @@ var_name: IDENT { $$ = $1; }
/*
* BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT]
- * [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS]
+ * [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS] [PARALLEL]
*/
base_backup:
K_BASE_BACKUP base_backup_opt_list
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
+ cmd->cmdtag = BASE_BACKUP;
+ $$ = (Node *) cmd;
+ }
+ | K_SEND_FILES_CONTENT backup_files
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = NIL;
+ cmd->cmdtag = SEND_FILES_CONTENT;
+ cmd->backupfiles = $2;
+ $$ = (Node *) cmd;
+ }
+ | K_STOP_BACKUP
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = NIL;
+ cmd->cmdtag = STOP_BACKUP;
$$ = (Node *) cmd;
}
;
@@ -214,6 +236,35 @@ base_backup_opt:
$$ = makeDefElem("noverify_checksums",
(Node *)makeInteger(true), -1);
}
+ | K_PARALLEL
+ {
+ $$ = makeDefElem("parallel",
+ (Node *)makeInteger(true), -1);
+ }
+ ;
+
+backup_files:
+ '(' backup_files_list ')'
+ {
+ $$ = $2;
+ }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+backup_files_list:
+ backup_file
+ {
+ $$ = list_make1($1);
+ }
+ | backup_files_list ',' backup_file
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+backup_file:
+ SCONST { $$ = (Node *) makeString($1); }
;
create_replication_slot:
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 380faeb5f6..87a38046c0 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -107,6 +107,10 @@ EXPORT_SNAPSHOT { return K_EXPORT_SNAPSHOT; }
NOEXPORT_SNAPSHOT { return K_NOEXPORT_SNAPSHOT; }
USE_SNAPSHOT { return K_USE_SNAPSHOT; }
WAIT { return K_WAIT; }
+PARALLEL { return K_PARALLEL; }
+START_BACKUP { return K_START_BACKUP; }
+SEND_FILES_CONTENT { return K_SEND_FILES_CONTENT; }
+STOP_BACKUP { return K_STOP_BACKUP; }
"," { return ','; }
";" { return ';'; }
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 9207109ba3..ed58d06316 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -40,6 +40,7 @@
#include "receivelog.h"
#include "replication/basebackup.h"
#include "streamutil.h"
+#include "fe_utils/simple_list.h"
#define ERRCODE_DATA_CORRUPTED "XX001"
@@ -105,6 +106,7 @@ static bool temp_replication_slot = true;
static bool create_slot = false;
static bool no_slot = false;
static bool verify_checksums = true;
+static int numWorkers = 1;
static bool success = false;
static bool made_new_pgdata = false;
@@ -114,6 +116,9 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
+PGconn **conn_list = NULL;
+int *worker_process;
+
/* Progress counters */
static uint64 totalsize;
static uint64 totaldone;
@@ -157,6 +162,11 @@ static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
static const char *get_tablespace_mapping(const char *dir);
static void tablespace_list_append(const char *arg);
+static void ParallelBackupEnd(void);
+static int getFiles(SimpleStringList *files);
+static SimpleStringList** divideFilesList(SimpleStringList *files, int numFiles);
+static void create_workers_and_fetch(SimpleStringList **worker_files);
+
static void
cleanup_directories_atexit(void)
@@ -355,6 +365,7 @@ usage(void)
printf(_(" --no-slot prevent creation of temporary replication slot\n"));
printf(_(" --no-verify-checksums\n"
" do not verify checksums\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to back up\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nConnection options:\n"));
printf(_(" -d, --dbname=CONNSTR connection string\n"));
@@ -1477,6 +1488,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
*/
snprintf(filename, sizeof(filename), "%s/%s", current_path,
copybuf);
+
if (filename[strlen(filename) - 1] == '/')
{
/*
@@ -1867,7 +1879,23 @@ BaseBackup(void)
fprintf(stderr, "\n");
}
- basebkp =
+ if (numWorkers > 1)
+ {
+ basebkp =
+ psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s %s",
+ escaped_label,
+ showprogress ? "PROGRESS" : "",
+ includewal == FETCH_WAL ? "WAL" : "",
+ fastcheckpoint ? "FAST" : "",
+ includewal == NO_WAL ? "" : "NOWAIT",
+ maxrate_clause ? maxrate_clause : "",
+ format == 't' ? "TABLESPACE_MAP" : "",
+ verify_checksums ? "" : "NOVERIFY_CHECKSUMS",
+ (numWorkers > 1) ? "PARALLEL" : "");
+ }
+ else
+ {
+ basebkp =
psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
escaped_label,
showprogress ? "PROGRESS" : "",
@@ -1877,6 +1905,8 @@ BaseBackup(void)
maxrate_clause ? maxrate_clause : "",
format == 't' ? "TABLESPACE_MAP" : "",
verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ }
+
if (PQsendQuery(conn, basebkp) == 0)
{
@@ -1982,24 +2012,87 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /*
- * Start receiving chunks
- */
- for (i = 0; i < PQntuples(res); i++)
+ if (numWorkers > 1)
{
- if (format == 't')
- ReceiveTarFile(conn, res, i);
- else
- ReceiveAndUnpackTarFile(conn, res, i);
- } /* Loop over all tablespaces */
+ SimpleStringList files = {NULL, NULL};
+ SimpleStringList **worker_files;
- if (showprogress)
- {
- progress_report(PQntuples(res), NULL, true);
- if (isatty(fileno(stderr)))
- fprintf(stderr, "\n"); /* Need to move to next line */
+ /*
+ * Get the header
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get backup header: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
+ int num_files = 0;
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ bool isdir = atoi(PQgetvalue(res, i, 0));
+ const char *path = PQgetvalue(res, i, 1);
+
+ /* create directories while traversing */
+ if (isdir)
+ {
+ bool created;
+ bool found;
+ char current_path[MAXPGPATH];
+
+ if (includewal == STREAM_WAL &&
+ (pg_str_endswith(path, "/pg_wal") ||
+ pg_str_endswith(path, "/pg_xlog") ||
+ pg_str_endswith(path, "/archive_status")))
+ continue;
+
+ snprintf(current_path, sizeof(current_path), "%s/%s", basedir, path + 2);
+ verify_dir_is_empty_or_create(current_path, &created, &found);
+ }
+
+ else
+ {
+ num_files++;
+ simple_string_list_append(&files, path);
+ }
+ }
+
+ res = PQgetResult(conn); //NoData
+ res = PQgetResult(conn); //CopyDone
+
+ worker_files = divideFilesList(&files, num_files);
+ create_workers_and_fetch(worker_files);
+
+ pg_log_info("total files in $PGDATA: %d", num_files);
+
+ ParallelBackupEnd();
}
+ else
+ {
+ /*
+ * Start receiving chunks
+ */
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ if (format == 't')
+ ReceiveTarFile(conn, res, i);
+ else
+ ReceiveAndUnpackTarFile(conn, res, i);
+ } /* Loop over all tablespaces */
+ if (showprogress)
+ {
+ progress_report(PQntuples(res), NULL, true);
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\n"); /* Need to move to next line */
+ }
+ }
PQclear(res);
/*
@@ -2195,6 +2288,7 @@ main(int argc, char **argv)
{"waldir", required_argument, NULL, 1},
{"no-slot", no_argument, NULL, 2},
{"no-verify-checksums", no_argument, NULL, 3},
+ {"jobs", required_argument, NULL, 'j'},
{NULL, 0, NULL, 0}
};
int c;
@@ -2222,7 +2316,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvPj:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2363,6 +2457,9 @@ main(int argc, char **argv)
case 3:
verify_checksums = false;
break;
+ case 'j': /* number of jobs */
+ numWorkers = atoi(optarg);
+ break;
default:
/*
@@ -2477,6 +2574,22 @@ main(int argc, char **argv)
}
}
+ if (numWorkers <= 0)
+ {
+ pg_log_error("invalid number of parallel jobs");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (numWorkers > 1 && format != 'p')
+ {
+ pg_log_error("parallel jobs are only supported in plain format");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
#ifndef HAVE_LIBZ
if (compresslevel != 0)
{
@@ -2545,7 +2658,145 @@ main(int argc, char **argv)
}
BaseBackup();
-
success = true;
return 0;
}
+
+static void
+ParallelBackupEnd(void)
+{
+ PGresult *res = NULL;
+ int i = 0;
+ char *basebkp;
+
+ basebkp = psprintf("STOP_BACKUP"); /* FIXME: add "WAL" to the command, to handle -X FETCH command option. */
+
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not execute STOP_BACKUP: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /* receive backup_label and pg_control files */
+ ReceiveAndUnpackTarFile(conn, res, i);
+ PQclear(res);
+}
+
+static int
+getFiles(SimpleStringList *files)
+{
+ SimpleStringListCell *cell;
+ PGresult *res = NULL;
+ int i = 0;
+
+ PQExpBuffer buf;
+ buf = createPQExpBuffer();
+
+ /* build query in form of: SEND_FILES_CONTENT ('base/1/1245/32683', 'base/1/1245/32683', ...) */
+ appendPQExpBuffer(buf, "SEND_FILES_CONTENT ( ");
+ for (cell = files->head; cell; cell = cell->next)
+ {
+ char *str = cell->val; /* leading "./" is stripped below */
+
+ if (str == NULL)
+ continue;
+
+ if (str[0] == '.' && str[1] == '/')
+ str += 2;
+
+ i++;
+ if (cell != files->tail)
+ appendPQExpBuffer(buf, "'%s' ,", str);
+ else
+ appendPQExpBuffer(buf, "'%s'", str);
+ }
+ appendPQExpBufferStr(buf, " )");
+
+ PGconn *worker_conn = GetConnection();
+ if (!worker_conn)
+ return 1;
+
+
+ if (PQsendQuery(worker_conn, buf->data) == 0)
+ {
+ pg_log_error("could not send file list: %s",
+ PQerrorMessage(worker_conn));
+ return 1;
+ }
+ destroyPQExpBuffer(buf);
+
+// if (format == 't')
+// ReceiveTarFile(conn1, res, i);
+// else
+ ReceiveAndUnpackTarFile(worker_conn, res, i);
+
+ res = PQgetResult(worker_conn); //NoData
+ res = PQgetResult(worker_conn); //CopyDone
+
+ PQclear(res);
+ PQfinish(worker_conn);
+
+ return 0;
+}
+
+static SimpleStringList**
+divideFilesList(SimpleStringList *files, int numFiles)
+{
+ SimpleStringList **worker_files;
+ SimpleStringListCell *cell;
+ int file_per_worker = (numFiles / numWorkers) + 1;
+ int cnt = 0, i = 0;
+
+ /* init worker_files */
+ worker_files = (SimpleStringList **) palloc0(sizeof(SimpleStringList *) * numWorkers);
+ for (i = 0; i < numWorkers; i++)
+ worker_files[i] = (SimpleStringList*) palloc0(sizeof(SimpleStringList));
+
+ /* copy file to worker_files[] */
+ i = 0;
+ for (cell = files->head; cell; cell = cell->next)
+ {
+ if (i >= file_per_worker)
+ {
+ printf("%d files for worker %d\n", i, cnt);
+ cnt++;
+ i = 0;
+ }
+
+ simple_string_list_append(worker_files[cnt], cell->val);
+ i++;
+ }
+
+ return worker_files;
+}
+
+
+static void
+create_workers_and_fetch(SimpleStringList **worker_files)
+{
+ worker_process = (int*) palloc(sizeof(int) * numWorkers);
+ int i;
+ for (i = 0; i < numWorkers; i++)
+ {
+ worker_process[i] = fork();
+ if (worker_process[i] == 0)
+ {
+ /* in child process */
+ _exit(getFiles(worker_files[i]));
+ }
+ else if (worker_process[i] < 0)
+ {
+ pg_log_error("could not create background process: %m");
+ exit(1);
+ }
+
+ pg_log_info("process (%d) created", worker_process[i]);
+ /*
+ * Else we are in the parent process and all is well.
+ */
+ }
+
+ /* block until all worker processes have exited */
+ while (waitpid(-1, NULL, 0) > 0)
+ ;
+}
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index 1e3ed4e19f..b4127864c2 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -23,6 +23,12 @@ typedef enum ReplicationKind
REPLICATION_KIND_LOGICAL
} ReplicationKind;
+typedef enum BackupCmdTag
+{
+ BASE_BACKUP,
+ SEND_FILES_CONTENT,
+ STOP_BACKUP
+} BackupCmdTag;
/* ----------------------
* IDENTIFY_SYSTEM command
@@ -42,6 +48,8 @@ typedef struct BaseBackupCmd
{
NodeTag type;
List *options;
+ BackupCmdTag cmdtag;
+ List *backupfiles;
} BaseBackupCmd;
--
2.20.1 (Apple Git-117)
Hi Asif
Interesting proposal. The bulk of the work in a backup is transferring
files from the source data directory to the destination. Your patch is
breaking this task down into multiple sets of files and transferring each
set in parallel. This seems correct; however, your patch is also creating a
new process to handle each set. Is that necessary? I think we should try to
achieve this using multiple asynchronous libpq connections from a single
pg_basebackup process, that is, using the PQconnectStartParams() interface
instead of PQconnectdbParams(), which is currently used by pg_basebackup.
On the server side, it may still result in multiple backend processes per
connection, and an attempt should be made to avoid that as well, but it
seems complicated. What do you think?
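For reference, a minimal sketch of that interface, with the select()/poll()
loop elided:

PGconn *
start_async_connection(const char *const *keywords,
                       const char *const *values)
{
    PGconn *conn = PQconnectStartParams(keywords, values, 0);

    if (conn == NULL || PQstatus(conn) == CONNECTION_BAD)
        return NULL;

    /*
     * Instead of blocking like PQconnectdbParams(), the caller waits on
     * PQsocket(conn) and calls PQconnectPoll(conn) until it returns
     * PGRES_POLLING_OK (or PGRES_POLLING_FAILED).
     */
    return conn;
}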
Asim
On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:
[...] I think we should try to achieve this using multiple asynchronous
libpq connections from a single pg_basebackup process. [...] What do you
think?
The main question is what we really want to solve here. What is the
bottleneck, and which hardware do we want to saturate? I am asking because
multiple pieces of hardware are involved while taking a backup
(network/CPU/disk). If we have already saturated the disk, there is no need
to add parallelism, because we will be blocked on disk I/O anyway. I
implemented parallel backup in a separate application and got wonderful
results. I just skimmed through the code and have some reservations:
creating a separate process only for copying data is overkill. There are
two options: one is non-blocking calls, or you can have some worker
threads. But before doing that we need to find the pg_basebackup
bottleneck; after that, we can see the best way to solve it. Some numbers
may help to understand the actual benefit.
--
Ibrar Ahmed
On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:
[...] I think we should try to achieve this using multiple asynchronous
libpq connections from a single pg_basebackup process. [...] What do you
think?
Thanks Asim for the feedback. This is a good suggestion. The main idea I
wanted to discuss is the design where we can open multiple backend
connections to get the data instead of a single connection.
On the client side we can have multiple approaches: one is to use
asynchronous APIs (as suggested by you), and the other is to decide
between multi-process and multi-thread. The main point is that we can
extract a lot of performance benefit by using multiple connections, and I
built this POC to float the idea of how parallel backup can work, since
the core logic of getting the files using multiple connections will remain
the same whether we use an asynchronous, multi-process, or multi-threaded
approach.
I am going to address the division of files to be distributed evenly among
multiple workers based on file sizes; that should produce some concrete
numbers and also allow us to gauge the benefits of the async versus
multi-process/thread approaches on the client side.
Regards,
Asif
Greetings,
* Asif Rehman (asifr.rehman@gmail.com) wrote:
[...] The main point was we can extract a lot of performance benefit by
using multiple connections, and I built this POC to float the idea of how
parallel backup can work. [...] I am going to address the division of
files to be distributed evenly among multiple workers based on file sizes.
I would expect you to quickly want to support compression on the server
side, before the data is sent across the network, and possibly
encryption, and so it'd likely make sense to just have independent
processes and connections through which to do that.
Thanks,
Stephen
On Fri, Aug 23, 2019 at 10:26 PM Stephen Frost <sfrost@snowman.net> wrote:
[...] it'd likely make sense to just have independent processes and
connections through which to do that.

+1 for compression and encryption, but I think parallelism will give us
the benefit with and without compression.
--
Ibrar Ahmed
On Fri, 23 Aug 2019 at 10:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
[...] I would expect you to quickly want to support compression on the
server side, before the data is sent across the network, and possibly
encryption, and so it'd likely make sense to just have independent
processes and connections through which to do that.
It would be interesting to see the benefits of compression (before the
data is transferred over the network) on top of parallelism, since there
is also some overhead associated with performing the compression. I agree
with your suggestion of trying to add parallelism first and then trying
compression before the data is sent across the network.
Greetings,
* Ahsan Hadi (ahsan.hadi@gmail.com) wrote:
On Fri, 23 Aug 2019 at 10:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
I would expect you to quickly want to support compression on the server
side, before the data is sent across the network, and possibly
encryption, and so it'd likely make sense to just have independent
processes and connections through which to do that.It would be interesting to see the benefits of compression (before the data
is transferred over the network) on top of parallelism. Since there is also
some overhead associated with performing the compression. I agree with your
suggestion of trying to add parallelism first and then try compression
before the data is sent across the network.
You're welcome to take a look at pgbackrest for insight, and to play with
it, regarding compression-before-transfer, how best to split up the files
and order them, encryption, et al. We've put quite a bit of effort into
figuring all of that out.
Thanks!
Stephen
On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
- BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
If the parallel option is there, then it will only do pg_start_backup, scan PGDATA, and send a list of file names.
So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
list of file names. I don't think that's a good approach. It's too
confusing to have one replication command that returns totally
different things depending on whether some option is given.
- SEND_FILES_CONTENTS (file1, file2,...) - returns the files in the given list.
pg_basebackup will then send back a list of filenames in this command. These commands will be sent by each worker, and that worker will be getting the said files.
Seems reasonable, but I think you should just pass one file name and
use the command multiple times, once per file.
- STOP_BACKUP
when all workers finish, pg_basebackup will send the STOP_BACKUP command.
This also seems reasonable, but surely the matching command should
then be called START_BACKUP, not BASEBACKUP PARALLEL.
I have done a basic proof of concept (POC), which is also attached. I would appreciate some input on this. So far, I am simply dividing the list equally and assigning the pieces to worker processes. I intend to fine-tune this by taking file sizes into consideration. Further, to add tar format support, I am considering that each worker process handles all files belonging to one tablespace in its list (i.e. creates and copies a tar file) before it processes the next tablespace. As a result, this will create tar files that are disjoint with respect to tablespace data. For example:
Instead of doing this, I suggest that you should just maintain a list
of all the files that need to be fetched and have each worker pull a
file from the head of the list and fetch it when it finishes receiving
the previous file. That way, if some connections go faster or slower
than others, the distribution of work ends up fairly even. If you
instead pre-distribute the work, you're guessing what's going to
happen in the future instead of just waiting to see what actually does
happen. Guessing isn't intrinsically bad, but guessing when you could
be sure of doing the right thing *is* bad.
If you want to be really fancy, you could start by sorting the files
in descending order of size, so that big files are fetched before
small ones. Since the largest possible file is 1GB and any database
where this feature is important is probably hundreds or thousands of
GB, this may not be very important. I suggest not worrying about it
for v1.
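With that scheme, the per-worker logic reduces to something like this
(next_file() and fetch_file() are hypothetical helpers; popping the list
head must be synchronized among the workers):

static void
worker_loop(PGconn *conn, FileList *pending)
{
    const char *path;

    /* take the next file from the head of the shared list */
    while ((path = next_file(pending)) != NULL)
        fetch_file(conn, path);   /* issues SEND_FILES_CONTENTS for it */
}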
Say tablespace t1 has 20 files, tablespace t2 has 10, and we have 5 worker processes. Ignoring all other factors for the sake of this example, each worker process will get a group of 4 files of t1 and 2 files of t2. Each process will create 2 tar files, one for t1 containing 4 files and another for t2 containing 2 files.
This is one of several possible approaches. If we're doing a
plain-format backup in parallel, we can just write each file where it
needs to go and call it good. But, with a tar-format backup, what
should we do? I can see three options:
1. Error! Tar format parallel backups are not supported.
2. Write multiple tar files. The user might reasonably expect that
they're going to end up with the same files at the end of the backup
regardless of whether they do it in parallel. A user with this
expectation will be disappointed.
3. Write one tar file. In this design, the workers have to take turns
writing to the tar file, so you need some synchronization around that.
Perhaps you'd have N threads that read and buffer a file, and N+1
buffers. Then you have one additional thread that reads the complete
files from the buffers and writes them to the tar file. There's
obviously some possibility that the writer won't be able to keep up
and writing the backup will therefore be slower than it would be with
approach (2).
There's probably also a possibility that approach (2) would thrash the
disk head back and forth between multiple files that are all being
written at the same time, and approach (3) will therefore win by not
thrashing the disk head. But, since spinning media are becoming less
and less popular and are likely to have multiple disk heads under the
hood when they are used, this is probably not too likely.
I think your choice to go with approach (2) is probably reasonable,
but I'm not sure whether everyone will agree.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Robert,
Thanks for the feedback. Please see the comments below:
On Tue, Sep 24, 2019 at 10:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
>> If the parallel option is there, then it will only do pg_start_backup,
>> scans PGDATA and sends a list of file names.
>
> So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
> tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
> list of file names. I don't think that's a good approach. It's too
> confusing to have one replication command that returns totally
> different things depending on whether some option is given.
Sure. I will add a separate command (START_BACKUP) for parallel.
>> - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given
>> list.
>> pg_basebackup will then send back a list of filenames in this command.
>> This command will be sent by each worker and that worker will be
>> getting the said files.
>
> Seems reasonable, but I think you should just pass one file name and
> use the command multiple times, once per file.
I considered this approach initially; however, I adopted the current
strategy to avoid multiple round trips between the server and clients and
to save on query processing time by issuing a single command rather than
multiple ones. Further, fetching multiple files at once will also aid in
supporting the tar format by utilising the existing ReceiveTarFile()
function, making it possible to create one tarball per tablespace per
worker.
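
For illustration, a worker's single round trip might look roughly like the
sketch below. The exact spelling and quoting of the SEND_FILES_CONTENT
command is defined by the WIP grammar in repl_gram.y, so the command
string built here is an assumption, not the final syntax.

#include <stdio.h>
#include <libpq-fe.h>

static int
fetch_batch(PGconn *conn, const char **paths, int npaths)
{
	char		cmd[8192];
	int			off;
	int			i;
	PGresult   *res;
	char	   *copybuf;

	/* Build one command naming the whole batch (quoting is assumed). */
	off = snprintf(cmd, sizeof(cmd), "SEND_FILES_CONTENT (");
	for (i = 0; i < npaths && off < (int) sizeof(cmd); i++)
		off += snprintf(cmd + off, sizeof(cmd) - off, "%s'%s'",
						i ? ", " : " ", paths[i]);
	snprintf(cmd + off, sizeof(cmd) - off, " )");

	res = PQexec(conn, cmd);
	if (PQresultStatus(res) != PGRES_COPY_OUT)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		PQclear(res);
		return 1;
	}
	PQclear(res);

	/* Drain the COPY stream; a real worker would unpack files here. */
	while (PQgetCopyData(conn, &copybuf, 0) > 0)
		PQfreemem(copybuf);

	res = PQgetResult(conn);	/* consume command completion */
	PQclear(res);
	return 0;
}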
>> - STOP_BACKUP
>> when all workers finish then, pg_basebackup will send STOP_BACKUP
>> command.
>
> This also seems reasonable, but surely the matching command should
> then be called START_BACKUP, not BASE_BACKUP PARALLEL.
>
>> I have done a basic proof of concept (POC), which is also attached. I
>> would appreciate some input on this. So far, I am simply dividing the
>> list equally and assigning them to worker processes. I intend to
>> fine-tune this by taking into consideration file sizes. Further, to add
>> tar format support, I am considering that each worker process processes
>> all files belonging to a tablespace in its list (i.e. creates and
>> copies a tar file) before it processes the next tablespace. As a
>> result, this will create tar files that are disjoint with respect to
>> tablespace data. For example:
>
> Instead of doing this, I suggest that you should just maintain a list
> of all the files that need to be fetched and have each worker pull a
> file from the head of the list and fetch it when it finishes receiving
> the previous file. That way, if some connections go faster or slower
> than others, the distribution of work ends up fairly even. If you
> instead pre-distribute the work, you're guessing what's going to
> happen in the future instead of just waiting to see what actually does
> happen. Guessing isn't intrinsically bad, but guessing when you could
> be sure of doing the right thing *is* bad.
>
> If you want to be really fancy, you could start by sorting the files
> in descending order of size, so that big files are fetched before
> small ones. Since the largest possible file is 1GB and any database
> where this feature is important is probably hundreds or thousands of
> GB, this may not be very important. I suggest not worrying about it
> for v1.
Ideally, I would like to support the tar format as well, which would be
much easier to implement when fetching multiple files at once, since that
would allow the existing functionality to be reused without much change.

Your idea of sorting the files in descending order of size seems very
appealing. I think we can do this and have the files divided among the
workers one by one, i.e. the first file in the list goes to worker 1, the
second to worker 2, and so on.
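
As a sketch of that division (mine, loosely paralleling the patch's
compareFileSize() and divideFilesList()): sort descending by size, then
deal the files out round-robin. The fileinfo struct here stands in for
the patch's pathinfo.

#include <stdlib.h>

typedef struct fileinfo
{
	const char *path;
	long		size;
} fileinfo;

static int
cmp_size_desc(const void *a, const void *b)
{
	long		sa = ((const fileinfo *) a)->size;
	long		sb = ((const fileinfo *) b)->size;

	return (sa < sb) - (sa > sb);	/* larger files sort first */
}

static void
assign_round_robin(fileinfo *files, int nfiles, int nworkers,
				   void (*assign) (int worker, const fileinfo *f))
{
	int			i;

	qsort(files, nfiles, sizeof(fileinfo), cmp_size_desc);
	for (i = 0; i < nfiles; i++)
		assign(i % nworkers, &files[i]);	/* file i -> worker i mod N */
}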
>> Say, tablespace t1 has 20 files and we have 5 worker processes and
>> tablespace t2 has 10. Ignoring all other factors for the sake of this
>> example, each worker process will get a group of 4 files of t1 and 2
>> files of t2. Each process will create 2 tar files, one for t1 containing
>> 4 files and another for t2 containing 2 files.
>
> This is one of several possible approaches. If we're doing a
> plain-format backup in parallel, we can just write each file where it
> needs to go and call it good. But, with a tar-format backup, what
> should we do? I can see three options:
>
> 1. Error! Tar format parallel backups are not supported.
>
> 2. Write multiple tar files. The user might reasonably expect that
> they're going to end up with the same files at the end of the backup
> regardless of whether they do it in parallel. A user with this
> expectation will be disappointed.
>
> 3. Write one tar file. In this design, the workers have to take turns
> writing to the tar file, so you need some synchronization around that.
> Perhaps you'd have N threads that read and buffer a file, and N+1
> buffers. Then you have one additional thread that reads the complete
> files from the buffers and writes them to the tar file. There's
> obviously some possibility that the writer won't be able to keep up
> and writing the backup will therefore be slower than it would be with
> approach (2).
>
> There's probably also a possibility that approach (2) would thrash the
> disk head back and forth between multiple files that are all being
> written at the same time, and approach (3) will therefore win by not
> thrashing the disk head. But, since spinning media are becoming less
> and less popular and are likely to have multiple disk heads under the
> hood when they are used, this is probably not too likely.
>
> I think your choice to go with approach (2) is probably reasonable,
> but I'm not sure whether everyone will agree.
Yes, for the tar format support, approach (2) is what I had in mind.
I'm currently working on the implementation and will share the patch
in a couple of days.
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Hi Asif,
I was looking at the patch and tried compiling it. However, I got a few
errors and warnings. Fixed those in the attached patch.
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Attachments:
0001-Initial-POC-on-parallel-backup_fix_errors_warnings_delta.patchtext/x-patch; charset=US-ASCII; name=0001-Initial-POC-on-parallel-backup_fix_errors_warnings_delta.patchDownload
commit 0d7433e44123b486b48f2071b24f1eaef46f4849
Author: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
Date: Thu Oct 3 12:58:55 2019 +0530
Fix errors and warnings.
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ef55bd0..32ed160 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -346,6 +346,7 @@ perform_base_backup(basebackup_options *opt)
{
/* save backup label into temp file for now. So stop backup can send it to pg_basebackup later on. */
FILE *fp = AllocateFile(TMP_BACKUP_LABEL_FILE, "w");
+
if (!fp)
ereport(ERROR,
(errcode_for_file_access(),
@@ -1627,8 +1628,8 @@ StopBackup(basebackup_options *opt)
XLogRecPtr endptr;
char *labelfile;
struct stat statbuf;
- int r;
StringInfoData buf;
+ FILE *lfp;
/* Disable throttling. */
throttling_counter = -1;
@@ -1648,7 +1649,7 @@ StopBackup(basebackup_options *opt)
sendFile(TMP_BACKUP_LABEL_FILE, BACKUP_LABEL_FILE, &statbuf, false, InvalidOid);
/* read backup_label file into buffer, we need it for do_pg_stop_backup */
- FILE *lfp = AllocateFile(TMP_BACKUP_LABEL_FILE, "r");
+ lfp = AllocateFile(TMP_BACKUP_LABEL_FILE, "r");
if (!lfp)
{
ereport(ERROR,
@@ -1658,7 +1659,7 @@ StopBackup(basebackup_options *opt)
}
labelfile = palloc(statbuf.st_size + 1);
- r = fread(labelfile, statbuf.st_size, 1, lfp);
+ fread(labelfile, statbuf.st_size, 1, lfp);
labelfile[statbuf.st_size] = '\0';
@@ -1692,6 +1693,7 @@ SendBackupFileList(List *tablespaces)
{
StringInfoData buf;
ListCell *lc;
+ pathinfo *pi;
List *files = NIL;
foreach(lc, tablespaces)
@@ -1704,7 +1706,6 @@ SendBackupFileList(List *tablespaces)
}
// add backup label file
- pathinfo *pi;
MAKE_PATHINFO(false, TMP_BACKUP_LABEL_FILE);
files = lcons(pi, files);
@@ -1736,15 +1737,15 @@ SendBackupFileList(List *tablespaces)
{
pathinfo *pi = (pathinfo *) lfirst(lc);
char *path = pi->path;
+ Size len;
/* Send one datarow message */
pq_beginmessage(&buf, 'D');
pq_sendint16(&buf, 2); /* number of columns */
- int32 isdir = pi->isdir ? 1 : 0;
- send_int8_string(&buf, isdir);
+ send_int8_string(&buf, (pi->isdir ? 1 : 0));
- Size len = strlen(path);
+ len = strlen(path);
pq_sendint32(&buf, len);
pq_sendbytes(&buf, path, len);
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 1637735..a316cc6 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1864,6 +1864,7 @@ BaseBackup(void)
{
SimpleStringList files = {NULL, NULL};
SimpleStringList **worker_files;
+ int num_files;
/*
* Get the header
@@ -1881,7 +1882,7 @@ BaseBackup(void)
exit(1);
}
- int num_files = 0;
+ num_files = 0;
for (i = 0; i < PQntuples(res); i++)
{
bool isdir = atoi(PQgetvalue(res, i, 0));
@@ -2537,6 +2538,7 @@ getFiles(SimpleStringList *files)
SimpleStringListCell *cell;
PGresult *res = NULL;
int i = 0;
+ PGconn *worker_conn;
PQExpBuffer buf;
buf = createPQExpBuffer();
@@ -2561,7 +2563,7 @@ getFiles(SimpleStringList *files)
}
appendPQExpBufferStr(buf, " )");
- PGconn *worker_conn = GetConnection();
+ worker_conn = GetConnection();
if (!worker_conn)
return 1;
@@ -2623,9 +2625,10 @@ divideFilesList(SimpleStringList *files, int numFiles)
static void
create_workers_and_fetch(SimpleStringList **worker_files)
{
+ int i;
+
worker_process = (int*) palloc(sizeof(int) * numWorkers);
- int status;
- int pid, i;
+
for (i = 0; i < numWorkers; i++)
{
worker_process[i] = fork();
On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
>>> - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given
>>> list.
>>> pg_basebackup will then send back a list of filenames in this command.
>>> This command will be sent by each worker and that worker will be
>>> getting the said files.
>>
>> Seems reasonable, but I think you should just pass one file name and
>> use the command multiple times, once per file.
>
> I considered this approach initially; however, I adopted the current
> strategy to avoid multiple round trips between the server and clients
> and to save on query processing time by issuing a single command rather
> than multiple ones. Further, fetching multiple files at once will also
> aid in supporting the tar format by utilising the existing
> ReceiveTarFile() function, making it possible to create one tarball per
> tablespace per worker.
I think that sending multiple filenames on a line could save some time
when there are lots of very small files, because then the round-trip
overhead could be significant.
However, if you've got mostly big files, I think this is going to be a
loser. It'll be fine if you're able to divide the work exactly evenly,
but that's pretty hard to do, because some workers may succeed in
copying the data faster than others for a variety of reasons: some
data is in memory, some data has to be read from disk, different data
may need to be read from different disks that run at different speeds,
not all the network connections may run at the same speed. Remember
that the backup's not done until the last worker finishes, and so
there may well be a significant advantage in terms of overall speed in
putting some energy into making sure that they finish as close to each
other in time as possible.
To put that another way, the first time all the workers except one get
done while the last one still has 10GB of data to copy, somebody's
going to be unhappy.
> Ideally, I would like to support the tar format as well, which would be
> much easier to implement when fetching multiple files at once, since
> that would allow the existing functionality to be reused without much
> change.
I think we should just have the client generate the tarfile. It'll
require duplicating some code, but it's not actually that much code or
that complicated from what I can see.
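
For a rough sense of how much code that duplication amounts to: emitting
one ustar member header on the client is about the sketch below (my own
minimal version, not the server's _tarWriteHeader(); a real client still
has to pad each member to a 512-byte boundary and finish the archive with
two zero blocks).

#include <stdio.h>
#include <string.h>

static void
write_tar_header(FILE *out, const char *name, long size, unsigned mode)
{
	char		h[512];
	unsigned	sum = 0;
	int			i;

	memset(h, 0, sizeof(h));
	snprintf(h, 100, "%s", name);
	snprintf(h + 100, 8, "%07o", mode);
	snprintf(h + 108, 8, "%07o", 0);	/* uid */
	snprintf(h + 116, 8, "%07o", 0);	/* gid */
	snprintf(h + 124, 12, "%011lo", size);
	snprintf(h + 136, 12, "%011lo", 0L);	/* mtime */
	h[156] = '0';				/* typeflag: regular file */
	memcpy(h + 257, "ustar", 6);	/* magic, NUL-terminated */
	memcpy(h + 263, "00", 2);	/* version */

	/* the checksum is computed with its own field set to spaces */
	memset(h + 148, ' ', 8);
	for (i = 0; i < 512; i++)
		sum += (unsigned char) h[i];
	snprintf(h + 148, 8, "%06o", sum);
	h[155] = ' ';

	fwrite(h, 1, 512, out);
}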
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Oct 3, 2019 at 6:40 PM Robert Haas <robertmhaas@gmail.com> wrote:
> To put that another way, the first time all the workers except one get
> done while the last one still has 10GB of data to copy, somebody's
> going to be unhappy.
I have updated the patch (attached) to include tablespace support, tar
format support, and all the other base backup options, so that they work
in parallel mode as well. As previously suggested, I have removed
BASE_BACKUP [PARALLEL] and added START_BACKUP instead to start the
backup. The tar format will write multiple tar files depending upon the
number of workers specified. I have also made all the commands
(START_BACKUP/SEND_FILES_CONTENT/STOP_BACKUP) accept the
base_backup_opt_list, so the command-line options can be passed to these
commands as well. Since the command-line options don't change once the
backup starts, I went this way instead of storing them in shared state.
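
For illustration, a parallel session would then issue commands roughly
like the following (option spellings follow base_backup_opt_list, and the
exact file-list quoting is per the grammar, so treat these strings as
approximate):

START_BACKUP LABEL 'mybackup' FAST MAX_RATE 32768
SEND_FILES_CONTENT ( './base/1/1259', './global/1262' ) MAX_RATE 32768 WORKER 1
STOP_BACKUP NOWAIT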
The START_BACKUP command will now return a list of files sorted in
descending order of size, so the larger files are at the top of the list.
These files are assigned to the workers one by one, so the larger files
are copied before the others.
Based on my understanding, your main concern is that the files won't be
distributed fairly, i.e. one worker might get a big file and take more
time while the others finish early with smaller files. In this approach I
have created a list of files in descending order of size, so all the big
files come at the top. The maximum file size in PG is 1GB, so if we have
four workers picking up files from the list one by one, the worst-case
scenario is that one worker gets a 1GB file while the others get smaller
files. With this approach of sorting the files by size in descending
order and handing them out to the workers one by one, there is a very
high likelihood of the work being distributed evenly. Does this address
your concerns?
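
A back-of-the-envelope bound for this scheme (my own argument, for
illustration): with the list sorted in descending order and dealt out
round-robin to N workers, worker k receives files k, k+N, k+2N, and so
on. Within each round of N consecutive files, the difference between any
two workers' shares is at most the gap between that round's largest and
smallest file, and summing those gaps over the rounds telescopes to at
most the size of the single largest file. So the most- and least-loaded
workers differ by at most ~1GB of assigned bytes: on a 400GB cluster with
4 workers, roughly 100GB each, with a worst case around 101GB for one of
them. (This bounds the assigned bytes, not the elapsed time, which is the
part Robert's dynamic-pull suggestion addresses.)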
Furthermore, the patch also includes a regression test. Since
t/010_pg_basebackup.pl tests base backup comprehensively, I have
duplicated it as t/040_pg_basebackup_parallel.pl and added the parallel
option to all of its tests, to make sure parallel mode works as expected.
The one thing that differs from base backup is file checksum reporting.
In parallel mode, the total number of checksum failures is not reported
correctly; however, the backup is still aborted whenever a checksum
failure occurs. This is because the processes do not maintain any shared
state. I assume that reporting the total number of failures is less
important than noticing a failure and aborting.
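
If an accurate total is ever wanted, one possible approach (an assumption
on my part, not something the patch does) is to have each forked worker
report its local failure count to the parent over a pipe; run_worker() is
a hypothetical stand-in for the worker's copy loop:

#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

extern long run_worker(void);	/* hypothetical: returns its local count */

static long
collect_checksum_failures(int nworkers)
{
	int			fd[2];
	long		total = 0;
	long		cnt;
	int			i;

	if (pipe(fd) < 0)
		exit(1);

	for (i = 0; i < nworkers; i++)
	{
		if (fork() == 0)
		{
			cnt = run_worker();
			/* a single write of sizeof(long) <= PIPE_BUF is atomic */
			write(fd[1], &cnt, sizeof(cnt));
			_exit(0);
		}
	}

	for (i = 0; i < nworkers; i++)
	{
		if (read(fd[0], &cnt, sizeof(cnt)) == sizeof(cnt))
			total += cnt;
	}
	while (wait(NULL) > 0)
		;
	return total;
}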
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Attachments:
0001-parallel-backup.patchapplication/octet-stream; name=0001-parallel-backup.patchDownload
From 8c29c68ff24413d8d01478080d9741b0b231d848 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Thu, 3 Oct 2019 23:41:55 +0500
Subject: [PATCH] parallel backup
---
src/backend/access/transam/xlog.c | 2 +-
src/backend/replication/basebackup.c | 1078 +++++++++++++----
src/backend/replication/repl_gram.y | 58 +
src/backend/replication/repl_scanner.l | 5 +
src/bin/pg_basebackup/pg_basebackup.c | 360 +++++-
.../t/040_pg_basebackup_parallel.pl | 571 +++++++++
src/include/nodes/replnodes.h | 9 +
src/include/replication/basebackup.h | 2 +-
8 files changed, 1797 insertions(+), 288 deletions(-)
create mode 100644 src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 790e2c8714..3dc2ebd7dc 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10477,7 +10477,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
ti->oid = pstrdup(de->d_name);
ti->path = pstrdup(buflinkpath.data);
ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+ ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
if (tablespaces)
*tablespaces = lappend(*tablespaces, ti);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d0f210de8c..fe906dbfdf 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -52,11 +52,31 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ int32 worker;
} basebackup_options;
+typedef struct
+{
+ char path[MAXPGPATH];
+ bool isdir;
+ int32 size;
+} pathinfo;
+
+#define STORE_PATHINFO(_filenames, _path, _isdir, _size) \
+ do { \
+ if (_filenames != NULL) { \
+ pathinfo *pi = palloc0(sizeof(pathinfo)); \
+ strlcpy(pi->path, _path, sizeof(pi->path)); \
+ pi->isdir = _isdir; \
+ pi->size = _size; \
+ *_filenames = lappend(*_filenames, pi); \
+ } \
+ } while(0)
static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks);
+static int64 sendDir_(const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks, List **files);
static bool sendFile(const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
@@ -71,15 +91,26 @@ static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
+static int compareFileSize(const ListCell *a, const ListCell *b);
static void throttle(size_t increment);
static bool is_checksummed_file(const char *fullpath, const char *filename);
+static void StartBackup(basebackup_options *opt);
+static void StopBackup(basebackup_options *opt);
+static void SendBackupFileList(basebackup_options *opt, List *tablespaces);
+static void SendFilesContents(basebackup_options *opt, List *filenames, bool missing_ok);
+static void include_wal_files(XLogRecPtr endptr, TimeLineID endtli);
+static void setup_throttle(int maxrate);
+static char *readfile(const char *readfilename, bool missing_ok);
+
/* Was the backup currently in-progress initiated in recovery mode? */
static bool backup_started_in_recovery = false;
/* Relative path of temporary statistics directory */
static char *statrelpath = NULL;
+#define BACKUP_LABEL_FILE_TMP BACKUP_LABEL_FILE ".tmp"
+#define TABLESPACE_MAP_TMP TABLESPACE_MAP ".tmp"
/*
* Size of each block sent into the tar stream for larger files.
*/
@@ -192,6 +223,14 @@ static const char *const excludeFiles[] =
BACKUP_LABEL_FILE,
TABLESPACE_MAP,
+ /*
+ * Skip backup_label.tmp or tablespace_map.tmp files. These are temporary
+ * and are injected into the backup by SendFilesList and
+ * SendFilesContents, will be removed after as well.
+ */
+ BACKUP_LABEL_FILE_TMP,
+ TABLESPACE_MAP_TMP,
+
"postmaster.pid",
"postmaster.opts",
@@ -294,28 +333,7 @@ perform_base_backup(basebackup_options *opt)
SendBackupHeader(tablespaces);
/* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
-
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
-
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ setup_throttle(opt->maxrate);
/* Send off our tablespaces one by one */
foreach(lc, tablespaces)
@@ -357,7 +375,7 @@ perform_base_backup(basebackup_options *opt)
sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
}
else
- sendTablespace(ti->path, false);
+ sendTablespace(ti->path, false, NULL);
/*
* If we're including WAL, and this is the main data directory we
@@ -384,227 +402,7 @@ perform_base_backup(basebackup_options *opt)
* We've left the last tar file "open", so we can now append the
* required WAL files to it.
*/
- char pathbuf[MAXPGPATH];
- XLogSegNo segno;
- XLogSegNo startsegno;
- XLogSegNo endsegno;
- struct stat statbuf;
- List *historyFileList = NIL;
- List *walFileList = NIL;
- char firstoff[MAXFNAMELEN];
- char lastoff[MAXFNAMELEN];
- DIR *dir;
- struct dirent *de;
- ListCell *lc;
- TimeLineID tli;
-
- /*
- * I'd rather not worry about timelines here, so scan pg_wal and
- * include all WAL files in the range between 'startptr' and 'endptr',
- * regardless of the timeline the file is stamped with. If there are
- * some spurious WAL files belonging to timelines that don't belong in
- * this server's history, they will be included too. Normally there
- * shouldn't be such files, but if there are, there's little harm in
- * including them.
- */
- XLByteToSeg(startptr, startsegno, wal_segment_size);
- XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
- XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
- XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
-
- dir = AllocateDir("pg_wal");
- while ((de = ReadDir(dir, "pg_wal")) != NULL)
- {
- /* Does it look like a WAL segment, and is it in the range? */
- if (IsXLogFileName(de->d_name) &&
- strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
- strcmp(de->d_name + 8, lastoff + 8) <= 0)
- {
- walFileList = lappend(walFileList, pstrdup(de->d_name));
- }
- /* Does it look like a timeline history file? */
- else if (IsTLHistoryFileName(de->d_name))
- {
- historyFileList = lappend(historyFileList, pstrdup(de->d_name));
- }
- }
- FreeDir(dir);
-
- /*
- * Before we go any further, check that none of the WAL segments we
- * need were removed.
- */
- CheckXLogRemoved(startsegno, ThisTimeLineID);
-
- /*
- * Sort the WAL filenames. We want to send the files in order from
- * oldest to newest, to reduce the chance that a file is recycled
- * before we get a chance to send it over.
- */
- list_sort(walFileList, compareWalFileNames);
-
- /*
- * There must be at least one xlog file in the pg_wal directory, since
- * we are doing backup-including-xlog.
- */
- if (walFileList == NIL)
- ereport(ERROR,
- (errmsg("could not find any WAL files")));
-
- /*
- * Sanity check: the first and last segment should cover startptr and
- * endptr, with no gaps in between.
- */
- XLogFromFileName((char *) linitial(walFileList),
- &tli, &segno, wal_segment_size);
- if (segno != startsegno)
- {
- char startfname[MAXFNAMELEN];
-
- XLogFileName(startfname, ThisTimeLineID, startsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", startfname)));
- }
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- XLogSegNo currsegno = segno;
- XLogSegNo nextsegno = segno + 1;
-
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
- if (!(nextsegno == segno || currsegno == segno))
- {
- char nextfname[MAXFNAMELEN];
-
- XLogFileName(nextfname, ThisTimeLineID, nextsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", nextfname)));
- }
- }
- if (segno != endsegno)
- {
- char endfname[MAXFNAMELEN];
-
- XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", endfname)));
- }
-
- /* Ok, we have everything we need. Send the WAL files. */
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- FILE *fp;
- char buf[TAR_SEND_SIZE];
- size_t cnt;
- pgoff_t len = 0;
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
-
- fp = AllocateFile(pathbuf, "rb");
- if (fp == NULL)
- {
- int save_errno = errno;
-
- /*
- * Most likely reason for this is that the file was already
- * removed by a checkpoint, so check for that to get a better
- * error message.
- */
- CheckXLogRemoved(segno, tli);
-
- errno = save_errno;
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not open file \"%s\": %m", pathbuf)));
- }
-
- if (fstat(fileno(fp), &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- pathbuf)));
- if (statbuf.st_size != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* send the WAL file itself */
- _tarWriteHeader(pathbuf, NULL, &statbuf, false);
-
- while ((cnt = fread(buf, 1,
- Min(sizeof(buf), wal_segment_size - len),
- fp)) > 0)
- {
- CheckXLogRemoved(segno, tli);
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
-
- len += cnt;
- throttle(cnt);
-
- if (len == wal_segment_size)
- break;
- }
-
- CHECK_FREAD_ERROR(fp, pathbuf);
-
- if (len != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* wal_segment_size is a multiple of 512, so no need for padding */
-
- FreeFile(fp);
-
- /*
- * Mark file as archived, otherwise files can get archived again
- * after promotion of a new node. This is in line with
- * walreceiver.c always doing an XLogArchiveForceDone() after a
- * complete segment.
- */
- StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(pathbuf, "");
- }
-
- /*
- * Send timeline history files too. Only the latest timeline history
- * file is required for recovery, and even that only if there happens
- * to be a timeline switch in the first WAL segment that contains the
- * checkpoint record, or if we're taking a base backup from a standby
- * server and the target timeline changes while the backup is taken.
- * But they are small and highly useful for debugging purposes, so
- * better include them all, always.
- */
- foreach(lc, historyFileList)
- {
- char *fname = lfirst(lc);
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
-
- if (lstat(pathbuf, &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m", pathbuf)));
-
- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
-
- /* unconditionally mark file as archived */
- StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "");
- }
+ include_wal_files(endptr, endtli);
/* Send CopyDone message for the last tar file */
pq_putemptymessage('c');
@@ -637,6 +435,24 @@ compareWalFileNames(const ListCell *a, const ListCell *b)
return strcmp(fna + 8, fnb + 8);
}
+/*
+ * list_sort comparison function, to compare size attribute of pathinfo
+ * in descending order.
+ */
+static int
+compareFileSize(const ListCell *a, const ListCell *b)
+{
+ pathinfo *fna = (pathinfo *) lfirst(a);
+ pathinfo *fnb = (pathinfo *) lfirst(b);
+
+ if (fna->size > fnb->size)
+ return -1;
+ if (fna->size < fnb->size)
+ return 1;
+ return 0;
+}
+
/*
* Parse the base backup options passed down by the parser
*/
@@ -652,8 +468,10 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_maxrate = false;
bool o_tablespace_map = false;
bool o_noverify_checksums = false;
+ bool o_worker = false;
MemSet(opt, 0, sizeof(*opt));
+ opt->worker = -1;
foreach(lopt, options)
{
DefElem *defel = (DefElem *) lfirst(lopt);
@@ -740,6 +558,16 @@ parse_basebackup_options(List *options, basebackup_options *opt)
noverify_checksums = true;
o_noverify_checksums = true;
}
+ else if (strcmp(defel->defname, "worker") == 0)
+ {
+ if (o_worker)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+
+ opt->worker = intVal(defel->arg);
+ o_worker = true;
+ }
else
elog(ERROR, "option \"%s\" not recognized",
defel->defname);
@@ -774,7 +602,26 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg, false);
}
- perform_base_backup(&opt);
+ switch (cmd->cmdtag)
+ {
+ case BASE_BACKUP:
+ perform_base_backup(&opt);
+ break;
+ case START_BACKUP:
+ StartBackup(&opt);
+ break;
+ case SEND_FILES_CONTENT:
+ SendFilesContents(&opt, cmd->backupfiles, true);
+ break;
+ case STOP_BACKUP:
+ StopBackup(&opt);
+ break;
+
+ default:
+ elog(ERROR, "unrecognized replication command tag: %u",
+ cmd->cmdtag);
+ break;
+ }
}
static void
@@ -968,7 +815,7 @@ sendFileWithContent(const char *filename, const char *content)
* Only used to send auxiliary tablespaces, not PGDATA.
*/
int64
-sendTablespace(char *path, bool sizeonly)
+sendTablespace(char *path, bool sizeonly, List **files)
{
int64 size;
char pathbuf[MAXPGPATH];
@@ -997,11 +844,11 @@ sendTablespace(char *path, bool sizeonly)
return 0;
}
+ STORE_PATHINFO(files, pathbuf, true, -1);
size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
sizeonly);
-
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true);
+ size += sendDir_(pathbuf, strlen(path), sizeonly, NIL, true, files);
return size;
}
@@ -1019,8 +866,16 @@ sendTablespace(char *path, bool sizeonly)
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
- bool sendtblspclinks)
+sendDir(const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks)
+{
+ return sendDir_(path, basepathlen, sizeonly, tablespaces, sendtblspclinks, NULL);
+}
+
+/* Same as sendDir(), except that it also returns a list of filenames in PGDATA */
+static int64
+sendDir_(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
+ bool sendtblspclinks, List **files)
{
DIR *dir;
struct dirent *de;
@@ -1174,6 +1029,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
+
+ STORE_PATHINFO(files, pathbuf, true, -1);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
excludeFound = true;
break;
@@ -1190,6 +1047,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
+
+ STORE_PATHINFO(files, pathbuf, true, -1);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
continue;
}
@@ -1211,6 +1070,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
sizeonly);
+ STORE_PATHINFO(files, pathbuf, true, -1);
+ STORE_PATHINFO(files, "./pg_wal/archive_status", true, -1);
+
continue; /* don't recurse into pg_wal */
}
@@ -1240,6 +1102,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
+ STORE_PATHINFO(files, pathbuf, false, statbuf.st_size);
size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
&statbuf, sizeonly);
#else
@@ -1266,6 +1129,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
*/
size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
sizeonly);
+ STORE_PATHINFO(files, pathbuf, true, -1);
+
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1296,13 +1161,15 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces, sendtblspclinks);
+ size += sendDir_(pathbuf, basepathlen, sizeonly, tablespaces, sendtblspclinks, files);
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
- if (!sizeonly)
+ STORE_PATHINFO(files, pathbuf, false, statbuf.st_size);
+
+ if (!sizeonly && files == NULL)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid);
@@ -1743,3 +1610,710 @@ throttle(size_t increment)
*/
throttled_last = GetCurrentTimestamp();
}
+
+/*
+ * In parallel mode, pg_stop_backup() is not called, nor are the files sent
+ * right away. Upon receiving the START_BACKUP command, it sends out a list of
+ * files in $PGDATA.
+ */
+static void
+StartBackup(basebackup_options *opt)
+{
+ TimeLineID starttli;
+ StringInfo labelfile;
+ StringInfo tblspc_map_file = NULL;
+ int datadirpathlen;
+ List *tablespaces = NIL;
+
+ datadirpathlen = strlen(DataDir);
+
+ backup_started_in_recovery = RecoveryInProgress();
+
+ labelfile = makeStringInfo();
+ tblspc_map_file = makeStringInfo();
+
+ total_checksum_failures = 0;
+
+ startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
+ labelfile, &tablespaces,
+ tblspc_map_file,
+ opt->progress, opt->sendtblspcmapfile);
+
+ /*
+ * Once do_pg_start_backup has been called, ensure that any failure causes
+ * us to abort the backup so we don't "leak" a backup counter. For this
+ * reason, *all* functionality between do_pg_start_backup() and the end of
+ * do_pg_stop_backup() should be inside the error cleanup block!
+ */
+
+ PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+ {
+ tablespaceinfo *ti;
+ FILE *fp;
+
+ SendXlogRecPtrResult(startptr, starttli);
+
+ /*
+ * Calculate the relative path of temporary statistics directory in
+ * order to skip the files which are located in that directory later.
+ */
+ if (is_absolute_path(pgstat_stat_directory) &&
+ strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
+ else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory);
+ else
+ statrelpath = pgstat_stat_directory;
+
+ /* Add a node for the base directory at the end */
+ ti = palloc0(sizeof(tablespaceinfo));
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
+ tablespaces = lappend(tablespaces, ti);
+
+ /* Send tablespace header */
+ SendBackupHeader(tablespaces);
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ /*
+ * backup_label and tablespace_map are stored into temp files for
+ * use at a later stage, i.e. during STOP_BACKUP or while
+ * transferring files to the client.
+ */
+ fp = AllocateFile(BACKUP_LABEL_FILE_TMP, "w");
+ if (!fp)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m",
+ BACKUP_LABEL_FILE_TMP)));
+ if (fwrite(labelfile->data, labelfile->len, 1, fp) != 1 ||
+ fflush(fp) != 0 ||
+ pg_fsync(fileno(fp)) != 0 ||
+ ferror(fp) ||
+ FreeFile(fp))
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ BACKUP_LABEL_FILE_TMP)));
+
+ if (opt->sendtblspcmapfile && tblspc_map_file->len > 0)
+ {
+ fp = AllocateFile(TABLESPACE_MAP_TMP, "w");
+ if (!fp)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m",
+ TABLESPACE_MAP_TMP)));
+ if (fwrite(tblspc_map_file->data, tblspc_map_file->len, 1, fp) != 1 ||
+ fflush(fp) != 0 ||
+ pg_fsync(fileno(fp)) != 0 ||
+ ferror(fp) ||
+ FreeFile(fp))
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ TABLESPACE_MAP_TMP)));
+ }
+
+ /* send out the list of files in $PGDATA */
+ SendBackupFileList(opt, tablespaces);
+ }
+ PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+}
+
+/*
+ * StopBackup() - ends a parallel backup
+ *
+ * The function is called in parallel mode. It ends a parallel backup session
+ * established by the START_BACKUP command.
+ */
+static void
+StopBackup(basebackup_options *opt)
+{
+ TimeLineID endtli;
+ XLogRecPtr endptr;
+ struct stat statbuf;
+ StringInfoData buf;
+ char *labelfile;
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ /* read backup_label file into buffer, we need it for do_pg_stop_backup */
+ labelfile = readfile(BACKUP_LABEL_FILE_TMP, false);
+
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ /* ... and pg_control after everything else. */
+ if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ XLOG_CONTROL_FILE)));
+ sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
+
+ /* stop backup */
+ endptr = do_pg_stop_backup(labelfile, !opt->nowait, &endtli);
+
+ if (opt->includewal)
+ include_wal_files(endptr, endtli);
+
+ pq_putemptymessage('c'); /* CopyDone */
+ SendXlogRecPtrResult(endptr, endtli);
+
+ unlink(BACKUP_LABEL_FILE_TMP);
+ unlink(TABLESPACE_MAP_TMP);
+}
+
+/*
+ * SendBackupFileList() - sends a list of filenames in PGDATA
+ *
+ * The function collects a list of filenames necessary for a full backup and
+ * sends this list to the client.
+ */
+static void
+SendBackupFileList(basebackup_options *opt, List *tablespaces)
+{
+ StringInfoData buf;
+ ListCell *lc;
+ ListCell *flc;
+
+ foreach(lc, tablespaces)
+ {
+ List *filenames = NULL;
+ tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
+
+ if (ti->path == NULL)
+ sendDir_(".", 1, false, NIL, !opt->sendtblspcmapfile, &filenames);
+ else
+ sendTablespace(ti->path, false, &filenames);
+
+ /* sort the files in descending order, based on file size */
+ list_sort(filenames, compareFileSize);
+
+ /* Construct and send the list of filenames */
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 3); /* 3 fields */
+
+ /* First field - file path */
+ pq_sendstring(&buf, "path");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Second field - is_dir */
+ pq_sendstring(&buf, "isdir");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, BOOLOID);
+ pq_sendint16(&buf, 1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ foreach(flc, filenames)
+ {
+ pathinfo *pi = (pathinfo *) lfirst(flc);
+ Size len;
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 3); /* number of columns */
+
+ /* send file name */
+ len = strlen(pi->path);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, pi->path, len);
+
+ /* send isdir */
+ pq_sendint32(&buf, 1);
+ pq_sendbytes(&buf, pi->isdir ? "t" : "f", 1);
+
+ /* send size */
+ send_int8_string(&buf, pi->size);
+
+ pq_endmessage(&buf);
+ }
+
+ list_free_deep(filenames);
+ }
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * SendFilesContents() - sends the actual files to the caller
+ *
+ * The function sends the files over to the caller using the COPY protocol.
+ */
+static void
+SendFilesContents(basebackup_options *opt, List *filenames, bool missing_ok)
+{
+ StringInfoData buf;
+ char *labelfile;
+ ListCell *lc;
+ char startxlogfilename[MAXFNAMELEN];
+ bool basetablespace = true;
+ int basepathlen = 1;
+ char ch;
+ uint32 hi,
+ lo;
+
+ if (list_length(filenames) <= 0)
+ return;
+
+ total_checksum_failures = 0;
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ /*
+ * LABEL is reused here to identify the tablespace path on the server. It is
+ * empty in the case of the 'base' tablespace.
+ */
+ if (is_absolute_path(opt->label))
+ {
+ basepathlen = strlen(opt->label);
+ basetablespace = false;
+ }
+
+ /* retrieve the backup start location from the backup_label file. */
+ labelfile = readfile(BACKUP_LABEL_FILE_TMP, false);
+ if (sscanf(labelfile, "START WAL LOCATION: %X/%X (file %24s)%c",
+ &hi, &lo, startxlogfilename,
+ &ch) != 4 || ch != '\n')
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE_TMP)));
+ startptr = ((uint64) hi) << 32 | lo;
+
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ if (opt->worker == 0 && basetablespace) /* 'base' tablespace */
+ {
+ /* Send BACKUP_LABEL_FILE file */
+ sendFileWithContent(BACKUP_LABEL_FILE, labelfile);
+
+ /* Send TABLESPACE_MAP file */
+ if (opt->sendtblspcmapfile)
+ {
+ char *mapfile = readfile(TABLESPACE_MAP_TMP, true);
+
+ if (mapfile)
+ {
+ sendFileWithContent(TABLESPACE_MAP, mapfile);
+ pfree(mapfile);
+ }
+ }
+ }
+
+ foreach(lc, filenames)
+ {
+ struct stat statbuf;
+ char *pathbuf;
+
+ pathbuf = (char *) strVal(lfirst(lc));
+ if (lstat(pathbuf, &statbuf) != 0)
+ {
+ if (errno != ENOENT)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file or directory \"%s\": %m",
+ pathbuf)));
+
+ /* If the file went away while scanning, it's not an error. */
+ continue;
+ }
+
+ /* Allow symbolic links in pg_tblspc only */
+ if (strstr(pathbuf, "./pg_tblspc") != NULL &&
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ char linkpath[MAXPGPATH];
+ int rllen;
+
+ rllen = readlink(pathbuf, linkpath, sizeof(linkpath));
+ if (rllen < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read symbolic link \"%s\": %m",
+ pathbuf)));
+ if (rllen >= sizeof(linkpath))
+ ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("symbolic link \"%s\" target is too long",
+ pathbuf)));
+ linkpath[rllen] = '\0';
+
+ _tarWriteHeader(pathbuf, linkpath, &statbuf, false);
+ }
+ else if (S_ISDIR(statbuf.st_mode))
+ {
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
+ else if (
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ /*
+ * If symlink, write it as a directory. file symlinks only allowed
+ * in pg_tblspc
+ */
+ statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
+ else
+ {
+ /* send file to client */
+ sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true, InvalidOid);
+ }
+ }
+
+ pq_putemptymessage('c'); /* CopyDone */
+
+ /*
+ * Check for checksum failures. If there are failures across multiple
+ * processes, the total checksum failure count may not be reported, but it
+ * will still error out, terminating the backup.
+ */
+ if (total_checksum_failures)
+ {
+ if (total_checksum_failures > 1)
+ ereport(WARNING,
+ (errmsg("%lld total checksum verification failures", total_checksum_failures)));
+
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("checksum verification failure during base backup")));
+ }
+}
+
+static void
+include_wal_files(XLogRecPtr endptr, TimeLineID endtli)
+{
+ /*
+ * We've left the last tar file "open", so we can now append the required
+ * WAL files to it.
+ */
+ char pathbuf[MAXPGPATH];
+ XLogSegNo segno;
+ XLogSegNo startsegno;
+ XLogSegNo endsegno;
+ struct stat statbuf;
+ List *historyFileList = NIL;
+ List *walFileList = NIL;
+ char firstoff[MAXFNAMELEN];
+ char lastoff[MAXFNAMELEN];
+ DIR *dir;
+ struct dirent *de;
+ ListCell *lc;
+ TimeLineID tli;
+
+ /*
+ * I'd rather not worry about timelines here, so scan pg_wal and include
+ * all WAL files in the range between 'startptr' and 'endptr', regardless
+ * of the timeline the file is stamped with. If there are some spurious
+ * WAL files belonging to timelines that don't belong in this server's
+ * history, they will be included too. Normally there shouldn't be such
+ * files, but if there are, there's little harm in including them.
+ */
+ XLByteToSeg(startptr, startsegno, wal_segment_size);
+ XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
+ XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
+ XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
+
+ dir = AllocateDir("pg_wal");
+ while ((de = ReadDir(dir, "pg_wal")) != NULL)
+ {
+ /* Does it look like a WAL segment, and is it in the range? */
+ if (IsXLogFileName(de->d_name) &&
+ strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
+ strcmp(de->d_name + 8, lastoff + 8) <= 0)
+ {
+ walFileList = lappend(walFileList, pstrdup(de->d_name));
+ }
+ /* Does it look like a timeline history file? */
+ else if (IsTLHistoryFileName(de->d_name))
+ {
+ historyFileList = lappend(historyFileList, pstrdup(de->d_name));
+ }
+ }
+ FreeDir(dir);
+
+ /*
+ * Before we go any further, check that none of the WAL segments we need
+ * were removed.
+ */
+ CheckXLogRemoved(startsegno, ThisTimeLineID);
+
+ /*
+ * Sort the WAL filenames. We want to send the files in order from oldest
+ * to newest, to reduce the chance that a file is recycled before we get a
+ * chance to send it over.
+ */
+ list_sort(walFileList, compareWalFileNames);
+
+ /*
+ * There must be at least one xlog file in the pg_wal directory, since we
+ * are doing backup-including-xlog.
+ */
+ if (walFileList == NIL)
+ ereport(ERROR,
+ (errmsg("could not find any WAL files")));
+
+ /*
+ * Sanity check: the first and last segment should cover startptr and
+ * endptr, with no gaps in between.
+ */
+ XLogFromFileName((char *) linitial(walFileList),
+ &tli, &segno, wal_segment_size);
+ if (segno != startsegno)
+ {
+ char startfname[MAXFNAMELEN];
+
+ XLogFileName(startfname, ThisTimeLineID, startsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", startfname)));
+ }
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ XLogSegNo currsegno = segno;
+ XLogSegNo nextsegno = segno + 1;
+
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+ if (!(nextsegno == segno || currsegno == segno))
+ {
+ char nextfname[MAXFNAMELEN];
+
+ XLogFileName(nextfname, ThisTimeLineID, nextsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", nextfname)));
+ }
+ }
+ if (segno != endsegno)
+ {
+ char endfname[MAXFNAMELEN];
+
+ XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", endfname)));
+ }
+
+ /* Ok, we have everything we need. Send the WAL files. */
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ FILE *fp;
+ char buf[TAR_SEND_SIZE];
+ size_t cnt;
+ pgoff_t len = 0;
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+
+ fp = AllocateFile(pathbuf, "rb");
+ if (fp == NULL)
+ {
+ int save_errno = errno;
+
+ /*
+ * Most likely reason for this is that the file was already
+ * removed by a checkpoint, so check for that to get a better
+ * error message.
+ */
+ CheckXLogRemoved(segno, tli);
+
+ errno = save_errno;
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m", pathbuf)));
+ }
+
+ if (fstat(fileno(fp), &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ pathbuf)));
+ if (statbuf.st_size != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* send the WAL file itself */
+ _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+
+ while ((cnt = fread(buf, 1,
+ Min(sizeof(buf), wal_segment_size - len),
+ fp)) > 0)
+ {
+ CheckXLogRemoved(segno, tli);
+ /* Send the chunk as a CopyData message */
+ if (pq_putmessage('d', buf, cnt))
+ ereport(ERROR,
+ (errmsg("base backup could not send data, aborting backup")));
+
+ len += cnt;
+ throttle(cnt);
+
+ if (len == wal_segment_size)
+ break;
+ }
+
+ if (len != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* wal_segment_size is a multiple of 512, so no need for padding */
+
+ FreeFile(fp);
+
+ /*
+ * Mark file as archived, otherwise files can get archived again after
+ * promotion of a new node. This is in line with walreceiver.c always
+ * doing an XLogArchiveForceDone() after a complete segment.
+ */
+ StatusFilePath(pathbuf, walFileName, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+
+ /*
+ * Send timeline history files too. Only the latest timeline history file
+ * is required for recovery, and even that only if there happens to be a
+ * timeline switch in the first WAL segment that contains the checkpoint
+ * record, or if we're taking a base backup from a standby server and the
+ * target timeline changes while the backup is taken. But they are small
+ * and highly useful for debugging purposes, so better include them all,
+ * always.
+ */
+ foreach(lc, historyFileList)
+ {
+ char *fname = lfirst(lc);
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
+
+ if (lstat(pathbuf, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m", pathbuf)));
+
+ sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
+
+ /* unconditionally mark file as archived */
+ StatusFilePath(pathbuf, fname, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+}
+
+/*
+ * Setup and activate network throttling, if client requested it
+ */
+static void
+setup_throttle(int maxrate)
+{
+ /* Setup and activate network throttling, if client requested it */
+ if (maxrate > 0)
+ {
+ throttling_sample =
+ (int64) maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
+
+ /* Enable throttling. */
+ throttling_counter = 0;
+
+ /* The 'real data' starts now (header was ignored). */
+ throttled_last = GetCurrentTimestamp();
+ }
+ else
+ {
+ /* Disable throttling. */
+ throttling_counter = -1;
+ }
+}
+
+static char *
+readfile(const char *readfilename, bool missing_ok)
+{
+ struct stat statbuf;
+ FILE *fp;
+ char *data;
+ int r;
+
+ if (stat(readfilename, &statbuf))
+ {
+ if (errno == ENOENT && missing_ok)
+ return NULL;
+
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ readfilename)));
+ }
+
+ fp = AllocateFile(readfilename, "r");
+ if (!fp)
+ {
+ if (errno == ENOENT && missing_ok)
+ return NULL;
+
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m", readfilename)));
+ }
+
+ data = palloc(statbuf.st_size + 1);
+ r = fread(data, statbuf.st_size, 1, fp);
+ data[statbuf.st_size] = '\0';
+
+ /* Close the file */
+ if (r != 1 || ferror(fp) || FreeFile(fp))
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read file \"%s\": %m",
+ readfilename)));
+
+ return data;
+}
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index c4e11cc4e8..88e384bf3c 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -87,6 +87,10 @@ static SQLCmd *make_sqlcmd(void);
%token K_EXPORT_SNAPSHOT
%token K_NOEXPORT_SNAPSHOT
%token K_USE_SNAPSHOT
+%token K_START_BACKUP
+%token K_SEND_FILES_CONTENT
+%token K_STOP_BACKUP
+%token K_WORKER
%type <node> command
%type <node> base_backup start_replication start_logical_replication
@@ -102,6 +106,8 @@ static SQLCmd *make_sqlcmd(void);
%type <boolval> opt_temporary
%type <list> create_slot_opt_list
%type <defelt> create_slot_opt
+%type <list> backup_files backup_files_list
+%type <node> backup_file
%%
@@ -162,6 +168,29 @@ base_backup:
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
+ cmd->cmdtag = BASE_BACKUP;
+ $$ = (Node *) cmd;
+ }
+ | K_START_BACKUP base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = START_BACKUP;
+ $$ = (Node *) cmd;
+ }
+ | K_SEND_FILES_CONTENT backup_files base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $3;
+ cmd->cmdtag = SEND_FILES_CONTENT;
+ cmd->backupfiles = $2;
+ $$ = (Node *) cmd;
+ }
+ | K_STOP_BACKUP base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = STOP_BACKUP;
$$ = (Node *) cmd;
}
;
@@ -214,6 +243,35 @@ base_backup_opt:
$$ = makeDefElem("noverify_checksums",
(Node *)makeInteger(true), -1);
}
+ | K_WORKER UCONST
+ {
+ $$ = makeDefElem("worker",
+ (Node *)makeInteger($2), -1);
+ }
+ ;
+
+backup_files:
+ '(' backup_files_list ')'
+ {
+ $$ = $2;
+ }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+backup_files_list:
+ backup_file
+ {
+ $$ = list_make1($1);
+ }
+ | backup_files_list ',' backup_file
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+backup_file:
+ SCONST { $$ = (Node *) makeString($1); }
;
create_replication_slot:
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 380faeb5f6..4836828c39 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -107,6 +107,11 @@ EXPORT_SNAPSHOT { return K_EXPORT_SNAPSHOT; }
NOEXPORT_SNAPSHOT { return K_NOEXPORT_SNAPSHOT; }
USE_SNAPSHOT { return K_USE_SNAPSHOT; }
WAIT { return K_WAIT; }
+START_BACKUP { return K_START_BACKUP; }
+SEND_FILES_CONTENT { return K_SEND_FILES_CONTENT; }
+STOP_BACKUP { return K_STOP_BACKUP; }
+WORKER { return K_WORKER; }
+
"," { return ','; }
";" { return ';'; }
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 55ef13926d..5139dcbe03 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -41,6 +41,7 @@
#include "receivelog.h"
#include "replication/basebackup.h"
#include "streamutil.h"
+#include "fe_utils/simple_list.h"
#define ERRCODE_DATA_CORRUPTED "XX001"
@@ -57,6 +58,15 @@ typedef struct TablespaceList
TablespaceListCell *tail;
} TablespaceList;
+typedef struct WorkerFiles
+{
+ int num_files;
+ char *tspath;
+ SimpleStringList *worker_files;
+} WorkerFiles;
+
+
/*
* pg_xlog has been renamed to pg_wal in version 10. This version number
* should be compared with PQserverVersion().
@@ -110,6 +120,10 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
+static int numWorkers = 1;
+static PGresult *tablespacehdr;
+static SimpleOidList workerspid = {NULL, NULL};
+
/* Progress counters */
static uint64 totalsize_kb;
static uint64 totaldone;
@@ -141,7 +155,7 @@ static void usage(void);
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
static void progress_report(int tablespacenum, const char *filename, bool force);
-static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
+static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum, int worker);
static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
static void BaseBackup(void);
@@ -151,6 +165,10 @@ static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
static const char *get_tablespace_mapping(const char *dir);
static void tablespace_list_append(const char *arg);
+static void ParallelBackupEnd(void);
+static int ReceiveFiles(WorkerFiles * workerFiles, int worker);
+static void create_workers_and_fetch(WorkerFiles * workerFiles);
+static int simple_list_length(SimpleStringList *list);
static void
cleanup_directories_atexit(void)
@@ -349,6 +367,7 @@ usage(void)
printf(_(" --no-slot prevent creation of temporary replication slot\n"));
printf(_(" --no-verify-checksums\n"
" do not verify checksums\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to backup\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nConnection options:\n"));
printf(_(" -d, --dbname=CONNSTR connection string\n"));
@@ -921,7 +940,7 @@ writeTarData(
* No attempt to inspect or validate the contents of the file is done.
*/
static void
-ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
+ReceiveTarFile(PGconn *conn, PGresult *res, int rownum, int worker)
{
char filename[MAXPGPATH];
char *copybuf = NULL;
@@ -978,7 +997,10 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
#ifdef HAVE_LIBZ
if (compresslevel != 0)
{
- snprintf(filename, sizeof(filename), "%s/base.tar.gz", basedir);
+ if (numWorkers > 1)
+ snprintf(filename, sizeof(filename), "%s/base.%d.tar.gz", basedir, worker);
+ else
+ snprintf(filename, sizeof(filename), "%s/base.tar.gz", basedir);
ztarfile = gzopen(filename, "wb");
if (gzsetparams(ztarfile, compresslevel,
Z_DEFAULT_STRATEGY) != Z_OK)
@@ -991,7 +1013,10 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
else
#endif
{
- snprintf(filename, sizeof(filename), "%s/base.tar", basedir);
+ if (numWorkers > 1)
+ snprintf(filename, sizeof(filename), "%s/base.%d.tar", basedir, worker);
+ else
+ snprintf(filename, sizeof(filename), "%s/base.tar", basedir);
tarfile = fopen(filename, "wb");
}
}
@@ -1004,8 +1029,12 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
#ifdef HAVE_LIBZ
if (compresslevel != 0)
{
- snprintf(filename, sizeof(filename), "%s/%s.tar.gz", basedir,
- PQgetvalue(res, rownum, 0));
+ if (numWorkers > 1)
+ snprintf(filename, sizeof(filename), "%s/%s.%d.tar.gz", basedir,
+ PQgetvalue(res, rownum, 0), worker);
+ else
+ snprintf(filename, sizeof(filename), "%s/%s.tar.gz", basedir,
+ PQgetvalue(res, rownum, 0));
ztarfile = gzopen(filename, "wb");
if (gzsetparams(ztarfile, compresslevel,
Z_DEFAULT_STRATEGY) != Z_OK)
@@ -1018,8 +1047,12 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
else
#endif
{
- snprintf(filename, sizeof(filename), "%s/%s.tar", basedir,
- PQgetvalue(res, rownum, 0));
+ if (numWorkers > 1)
+ snprintf(filename, sizeof(filename), "%s/%s.%d.tar", basedir,
+ PQgetvalue(res, rownum, 0), worker);
+ else
+ snprintf(filename, sizeof(filename), "%s/%s.tar", basedir,
+ PQgetvalue(res, rownum, 0));
tarfile = fopen(filename, "wb");
}
}
@@ -1475,6 +1508,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
*/
snprintf(filename, sizeof(filename), "%s/%s", current_path,
copybuf);
+
if (filename[strlen(filename) - 1] == '/')
{
/*
@@ -1486,21 +1520,14 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
* Directory
*/
filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */
+
+ /*
+ * In parallel mode, we create directories before fetching
+ * files, so it's OK if a directory already exists.
+ */
if (mkdir(filename, pg_dir_create_mode) != 0)
{
- /*
- * When streaming WAL, pg_wal (or pg_xlog for pre-9.6
- * clusters) will have been created by the wal
- * receiver process. Also, when the WAL directory
- * location was specified, pg_wal (or pg_xlog) has
- * already been created as a symbolic link before
- * starting the actual backup. So just ignore creation
- * failures on related directories.
- */
- if (!((pg_str_endswith(filename, "/pg_wal") ||
- pg_str_endswith(filename, "/pg_xlog") ||
- pg_str_endswith(filename, "/archive_status")) &&
- errno == EEXIST))
+ if (errno != EEXIST)
{
pg_log_error("could not create directory \"%s\": %m",
filename);
@@ -1528,8 +1555,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
* can map them too.)
*/
filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */
-
mapped_tblspc_path = get_tablespace_mapping(&copybuf[157]);
+
if (symlink(mapped_tblspc_path, filename) != 0)
{
pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
@@ -1716,7 +1743,8 @@ BaseBackup(void)
}
basebkp =
- psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
+ psprintf("%s LABEL '%s' %s %s %s %s %s %s %s",
+ (numWorkers > 1) ? "START_BACKUP" : "BASE_BACKUP",
escaped_label,
showprogress ? "PROGRESS" : "",
includewal == FETCH_WAL ? "WAL" : "",
@@ -1830,20 +1858,102 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /*
- * Start receiving chunks
- */
- for (i = 0; i < PQntuples(res); i++)
+ if (numWorkers > 1)
{
- if (format == 't')
- ReceiveTarFile(conn, res, i);
- else
- ReceiveAndUnpackTarFile(conn, res, i);
- } /* Loop over all tablespaces */
+ WorkerFiles *workerFiles = palloc0(sizeof(WorkerFiles) * tablespacecount);
+
+ tablespacehdr = res;
+
+ for (i = 0; i < tablespacecount; i++)
+ {
+ bool basetablespace;
+
+ workerFiles[i].worker_files = palloc0(sizeof(SimpleStringList) * numWorkers);
+
+ /*
+ * Get the header
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get backup header: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
+ basetablespace = PQgetisnull(tablespacehdr, i, 0);
+ workerFiles[i].tspath = PQgetvalue(tablespacehdr, i, 1);
+ workerFiles[i].num_files = 0;
+
+ for (int j = 0; j < PQntuples(res); j++)
+ {
+ const char *path = PQgetvalue(res, j, 0);
+ bool isdir = PQgetvalue(res, j, 1)[0] == 't';
+
+ if (format == 'p' && isdir)
+ {
+ char dirpath[MAXPGPATH];
+
+ if (basetablespace)
+ snprintf(dirpath, sizeof(dirpath), "%s/%s", basedir, path);
+ else
+ {
+ const char *tspath = PQgetvalue(tablespacehdr, i, 1);
+
+ snprintf(dirpath, sizeof(dirpath), "%s/%s",
+ get_tablespace_mapping(tspath), (path + strlen(tspath) + 1));
+ }
+
+ if (pg_mkdir_p(dirpath, pg_dir_create_mode) != 0)
+ {
+ if (errno != EEXIST)
+ {
+ pg_log_error("could not create directory \"%s\": %m",
+ dirpath);
+ exit(1);
+ }
+ }
+ }
+
+ workerFiles[i].num_files++;
+ simple_string_list_append(&workerFiles[i].worker_files[j % numWorkers], path);
+ }
+ PQclear(res);
+ }
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data: %s", PQerrorMessage(conn));
+ exit(1);
+ }
+
+ res = PQgetResult(conn);
+ create_workers_and_fetch(workerFiles);
+ ParallelBackupEnd();
+ }
+ else
+ {
+ /*
+ * Start receiving chunks
+ */
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ if (format == 't')
+ ReceiveTarFile(conn, res, i, 0);
+ else
+ ReceiveAndUnpackTarFile(conn, res, i);
+ } /* Loop over all tablespaces */
+ }
if (showprogress)
{
- progress_report(PQntuples(res), NULL, true);
+ progress_report(PQntuples(tablespacehdr), NULL, true);
if (isatty(fileno(stderr)))
fprintf(stderr, "\n"); /* Need to move to next line */
}
@@ -2043,6 +2153,7 @@ main(int argc, char **argv)
{"waldir", required_argument, NULL, 1},
{"no-slot", no_argument, NULL, 2},
{"no-verify-checksums", no_argument, NULL, 3},
+ {"jobs", required_argument, NULL, 'j'},
{NULL, 0, NULL, 0}
};
int c;
@@ -2070,7 +2181,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvPj:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2211,6 +2322,9 @@ main(int argc, char **argv)
case 3:
verify_checksums = false;
break;
+ case 'j': /* number of jobs */
+ numWorkers = atoi(optarg);
+ break;
default:
/*
@@ -2325,6 +2439,14 @@ main(int argc, char **argv)
}
}
+ if (numWorkers <= 0)
+ {
+ pg_log_error("invalid number of parallel jobs");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
#ifndef HAVE_LIBZ
if (compresslevel != 0)
{
@@ -2397,3 +2519,173 @@ main(int argc, char **argv)
success = true;
return 0;
}
+
+static void
+ParallelBackupEnd(void)
+{
+ PGresult *res = NULL;
+ char *basebkp;
+
+ basebkp = psprintf("STOP_BACKUP %s %s",
+ includewal == FETCH_WAL ? "WAL" : "",
+ includewal == NO_WAL ? "" : "NOWAIT");
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not execute STOP BACKUP \"%s\"",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /* receive pg_control and wal files */
+ if (format == 't')
+ ReceiveTarFile(conn, res, tablespacecount, numWorkers);
+ else
+ ReceiveAndUnpackTarFile(conn, res, tablespacecount);
+
+ PQclear(res);
+}
+
+static int
+ReceiveFiles(WorkerFiles * workerFiles, int worker)
+{
+ SimpleStringListCell *cell;
+ PGresult *res = NULL;
+ PGconn *worker_conn;
+ int i;
+
+ worker_conn = GetConnection();
+ for (i = 0; i < tablespacecount; i++)
+ {
+ SimpleStringList *files = &workerFiles[i].worker_files[worker];
+ PQExpBuffer buf = createPQExpBuffer();
+
+ if (simple_list_length(files) <= 0)
+ continue;
+
+ /*
+ * Build the query in the form: SEND_FILES_CONTENT ('base/1/1245/32683',
+ * ...) [options]
+ */
+ appendPQExpBuffer(buf, "SEND_FILES_CONTENT (");
+ for (cell = files->head; cell; cell = cell->next)
+ {
+ if (cell != files->tail)
+ appendPQExpBuffer(buf, "'%s' ,", cell->val);
+ else
+ appendPQExpBuffer(buf, "'%s'", cell->val);
+ }
+ appendPQExpBufferStr(buf, " )");
+
+ /*
+ * Add backup options to the command. We are reusing the LABEL here to
+ * keep the original tablespace path on the server.
+ */
+ appendPQExpBuffer(buf, " LABEL '%s' WORKER %u %s %s",
+ workerFiles[i].tspath,
+ worker,
+ format == 't' ? "TABLESPACE_MAP" : "",
+ verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ if (maxrate > 0)
+ appendPQExpBuffer(buf, " MAX_RATE %u", maxrate);
+
+ if (!worker_conn)
+ return 1;
+
+ if (PQsendQuery(worker_conn, buf->data) == 0)
+ {
+ pg_log_error("could not send files list \"%s\"",
+ PQerrorMessage(worker_conn));
+ return 1;
+ }
+ destroyPQExpBuffer(buf);
+
+ if (format == 't')
+ ReceiveTarFile(worker_conn, tablespacehdr, i, worker);
+ else
+ ReceiveAndUnpackTarFile(worker_conn, tablespacehdr, i);
+
+ res = PQgetResult(worker_conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data stream: %s",
+ PQerrorMessage(worker_conn));
+ exit(1);
+ }
+
+ res = PQgetResult(worker_conn);
+ }
+
+ PQclear(res);
+ PQfinish(worker_conn);
+
+ return 0;
+}
+
+static void
+create_workers_and_fetch(WorkerFiles * workerFiles)
+{
+ int status;
+ int pid,
+ i;
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ pid = fork();
+ if (pid == 0)
+ {
+ /* in child process */
+ _exit(ReceiveFiles(workerFiles, i));
+ }
+ else if (pid < 0)
+ {
+ pg_log_error("could not create backup worker: %m");
+ exit(1);
+ }
+
+ simple_oid_list_append(&workerspid, pid);
+ if (verbose)
+ pg_log_info("backup worker (%d) created", pid);
+
+ /*
+ * Else we are in the parent process and all is well.
+ */
+ }
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ pid = waitpid(-1, &status, 0);
+
+ if (WIFEXITED(status) && WEXITSTATUS(status) == EXIT_FAILURE)
+ {
+ SimpleOidListCell *cell;
+
+ pg_log_error("backup worker (%d) failed with code %d", pid, WEXITSTATUS(status));
+
+ /* error. kill other workers and exit. */
+ for (cell = workerspid.head; cell; cell = cell->next)
+ {
+ if (pid != cell->val)
+ {
+ kill(cell->val, SIGTERM);
+ pg_log_error("backup worker killed %d", cell->val);
+ }
+ }
+
+ exit(1);
+ }
+ }
+}
+
+
+static int
+simple_list_length(SimpleStringList *list)
+{
+ int len = 0;
+ SimpleStringListCell *cell;
+
+ for (cell = list->head; cell; cell = cell->next, len++)
+ ;
+
+ return len;
+}
diff --git a/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
new file mode 100644
index 0000000000..6c31214f3d
--- /dev/null
+++ b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
@@ -0,0 +1,571 @@
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Basename qw(basename dirname);
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 106;
+
+program_help_ok('pg_basebackup');
+program_version_ok('pg_basebackup');
+program_options_handling_ok('pg_basebackup');
+
+my $tempdir = TestLib::tempdir;
+
+my $node = get_new_node('main');
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Initialize node without replication settings
+$node->init(extra => ['--data-checksums']);
+$node->start;
+my $pgdata = $node->data_dir;
+
+$node->command_fails(['pg_basebackup'],
+ 'pg_basebackup needs target directory specified');
+
+# Some Windows ANSI code pages may reject this filename, in which case we
+# quietly proceed without this bit of test coverage.
+if (open my $badchars, '>>', "$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
+{
+ print $badchars "test backup of file with non-UTF8 name\n";
+ close $badchars;
+}
+
+$node->set_replication_conf();
+system_or_bail 'pg_ctl', '-D', $pgdata, 'reload';
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup" ],
+ 'pg_basebackup fails because of WAL configuration');
+
+ok(!-d "$tempdir/backup", 'backup directory was cleaned up');
+
+# Create a backup directory that is not empty so the next command will fail
+# but leave the data directory behind
+mkdir("$tempdir/backup")
+ or BAIL_OUT("unable to create $tempdir/backup");
+append_to_file("$tempdir/backup/dir-not-empty.txt", "Some data");
+
+$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/backup", '-n' ],
+ 'failing run with no-clean option');
+
+ok(-d "$tempdir/backup", 'backup directory was created and left behind');
+rmtree("$tempdir/backup");
+
+open my $conf, '>>', "$pgdata/postgresql.conf";
+print $conf "max_replication_slots = 10\n";
+print $conf "max_wal_senders = 10\n";
+print $conf "wal_level = replica\n";
+close $conf;
+$node->restart;
+
+# Write some files to test that they are not copied.
+foreach my $filename (
+ qw(backup_label tablespace_map postgresql.auto.conf.tmp current_logfiles.tmp)
+ )
+{
+ open my $file, '>>', "$pgdata/$filename";
+ print $file "DONOTCOPY";
+ close $file;
+}
+
+# Connect to a database to create global/pg_internal.init. If this is removed
+# the test to ensure global/pg_internal.init is not copied will return a false
+# positive.
+$node->safe_psql('postgres', 'SELECT 1;');
+
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'unlogged main fork in base');
+
+# Create files that look like temporary relations to ensure they are ignored.
+my $postgresOid = $node->safe_psql('postgres',
+ q{select oid from pg_database where datname = 'postgres'});
+
+my @tempRelationFiles =
+ qw(t999_999 t9999_999.1 t999_9999_vm t99999_99999_vm.1);
+
+foreach my $filename (@tempRelationFiles)
+{
+ append_to_file("$pgdata/base/$postgresOid/$filename", 'TEMP_RELATION');
+}
+
+# Run base backup in parallel mode.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup", '-X', 'none', "-j 4" ],
+ 'pg_basebackup runs');
+ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
+
+# Permissions on backup should be default
+SKIP:
+{
+ skip "unix-style permissions not supported on Windows", 1
+ if ($windows_os);
+
+ ok(check_mode_recursive("$tempdir/backup", 0700, 0600),
+ "check backup dir permissions");
+}
+
+# Only archive_status directory should be copied in pg_wal/.
+is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/pg_wal/")) ],
+ [ sort qw(. .. archive_status) ],
+ 'no WAL files copied');
+
+# Contents of these directories should not be copied.
+foreach my $dirname (
+ qw(pg_dynshmem pg_notify pg_replslot pg_serial pg_snapshots pg_stat_tmp pg_subtrans)
+ )
+{
+ is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/$dirname/")) ],
+ [ sort qw(. ..) ],
+ "contents of $dirname/ not copied");
+}
+
+# These files should not be copied.
+foreach my $filename (
+ qw(postgresql.auto.conf.tmp postmaster.opts postmaster.pid tablespace_map current_logfiles.tmp
+ global/pg_internal.init))
+{
+ ok(!-f "$tempdir/backup/$filename", "$filename not copied");
+}
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+ 'unlogged init fork in backup');
+ok( !-f "$tempdir/backup/$baseUnloggedPath",
+ 'unlogged main fork not in backup');
+
+# Temp relations should not be copied.
+foreach my $filename (@tempRelationFiles)
+{
+ ok( !-f "$tempdir/backup/base/$postgresOid/$filename",
+ "base/$postgresOid/$filename not copied");
+}
+
+# Make sure existing backup_label was ignored.
+isnt(slurp_file("$tempdir/backup/backup_label"),
+ 'DONOTCOPY', 'existing backup_label not copied');
+rmtree("$tempdir/backup");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup2", '--waldir',
+ "$tempdir/xlog2", "-j 4"
+ ],
+ 'separate xlog directory');
+ok(-f "$tempdir/backup2/PG_VERSION", 'backup was created');
+ok(-d "$tempdir/xlog2/", 'xlog directory was created');
+rmtree("$tempdir/backup2");
+rmtree("$tempdir/xlog2");
+
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup", '-Ft', "-j 4"],
+ 'tar format');
+foreach my $filename (
+ qw(base.0.tar base.1.tar base.2.tar base.3.tar))
+{
+ ok(!-f "$tempdir/backup/$filename", "backup $filename tar created");
+}
+
+rmtree("$tempdir/tarbackup");
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T=/foo" ],
+ '-T with empty old directory fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=" ],
+ '-T with empty new directory fails');
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4",
+ "-T/foo=/bar=/baz"
+ ],
+ '-T with multiple = fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo=/bar" ],
+ '-T with old directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=bar" ],
+ '-T with new directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo" ],
+ '-T with invalid format fails');
+
+# Tar format doesn't support filenames longer than 100 bytes.
+my $superlongname = "superlongname_" . ("x" x 100);
+my $superlongpath = "$pgdata/$superlongname";
+
+open my $file, '>', "$superlongpath"
+ or die "unable to create file $superlongpath";
+close $file;
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/tarbackup_l1", '-Ft', "-j 4" ],
+ 'pg_basebackup tar with long name fails');
+unlink "$pgdata/$superlongname";
+
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+ skip "symlinks not supported on Windows", 18 if ($windows_os);
+
+ # Move pg_replslot out of $pgdata and create a symlink to it.
+ $node->stop;
+
+ # Set umask so test directories and files are created with group permissions
+ umask(0027);
+
+ # Enable group permissions on PGDATA
+ chmod_recursive("$pgdata", 0750, 0640);
+
+ rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
+ or BAIL_OUT "could not move $pgdata/pg_replslot";
+ symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot")
+ or BAIL_OUT "could not symlink to $pgdata/pg_replslot";
+
+ $node->start;
+
+ # Create a temporary directory in the system location and symlink it
+ # to our physical temp location. That way we can use shorter names
+ # for the tablespace directories, which hopefully won't run afoul of
+ # the 99 character length limit.
+ my $shorter_tempdir = TestLib::tempdir_short . "/tempdir";
+ symlink "$tempdir", $shorter_tempdir;
+
+ mkdir "$tempdir/tblspc1";
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc1 LOCATION '$shorter_tempdir/tblspc1';");
+ $node->safe_psql('postgres',
+ "CREATE TABLE test1 (a int) TABLESPACE tblspc1;");
+ $node->command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup2", '-Ft', "-j 4" ],
+ 'tar format with tablespaces');
+ ok(-f "$tempdir/tarbackup2/base.0.tar", 'backup tar was created');
+ my @tblspc_tars = glob "$tempdir/tarbackup2/[0-9]*.tar";
+ is(scalar(@tblspc_tars), 3, 'three tablespace tar files were created');
+ rmtree("$tempdir/tarbackup2");
+
+ # Create an unlogged table to test that forks other than init are not copied.
+ $node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;'
+ );
+
+ my $tblspc1UnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('tblspc1_unlogged')});
+
+ # Make sure main and init forks exist
+ ok( -f "$pgdata/${tblspc1UnloggedPath}_init",
+ 'unlogged init fork in tablespace');
+ ok(-f "$pgdata/$tblspc1UnloggedPath", 'unlogged main fork in tablespace');
+
+ # Create files that look like temporary relations to ensure they are ignored
+ # in a tablespace.
+ my @tempRelationFiles = qw(t888_888 t888888_888888_vm.1);
+ my $tblSpc1Id = basename(
+ dirname(
+ dirname(
+ $node->safe_psql(
+ 'postgres', q{select pg_relation_filepath('test1')}))));
+
+ foreach my $filename (@tempRelationFiles)
+ {
+ append_to_file(
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ 'TEMP_RELATION');
+ }
+
+ $node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4" ],
+ 'plain format with tablespaces fails without tablespace mapping');
+
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tblspc1=$tempdir/tbackup/tblspc1"
+ ],
+ 'plain format with tablespaces succeeds with tablespace mapping');
+ ok(-d "$tempdir/tbackup/tblspc1", 'tablespace was relocated');
+ opendir(my $dh, "$pgdata/pg_tblspc") or die;
+ ok( ( grep {
+ -l "$tempdir/backup1/pg_tblspc/$_"
+ and readlink "$tempdir/backup1/pg_tblspc/$_" eq
+ "$tempdir/tbackup/tblspc1"
+ } readdir($dh)),
+ "tablespace symlink was updated");
+ closedir $dh;
+
+ # Group access should be enabled on all backup files
+ ok(check_mode_recursive("$tempdir/backup1", 0750, 0640),
+ "check backup dir permissions");
+
+ # Unlogged relation forks other than init should not be copied
+ my ($tblspc1UnloggedBackupPath) =
+ $tblspc1UnloggedPath =~ /[^\/]*\/[^\/]*\/[^\/]*$/g;
+
+ ok(-f "$tempdir/tbackup/tblspc1/${tblspc1UnloggedBackupPath}_init",
+ 'unlogged init fork in tablespace backup');
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
+ 'unlogged main fork not in tablespace backup');
+
+ # Temp relations should not be copied.
+ foreach my $filename (@tempRelationFiles)
+ {
+ ok( !-f "$tempdir/tbackup/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ "[tblspc1]/$postgresOid/$filename not copied");
+
+ # Also remove temp relation files or tablespace drop will fail.
+ my $filepath =
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename";
+
+ unlink($filepath)
+ or BAIL_OUT("unable to unlink $filepath");
+ }
+
+ ok( -d "$tempdir/backup1/pg_replslot",
+ 'pg_replslot symlink copied as directory');
+ rmtree("$tempdir/backup1");
+
+ mkdir "$tempdir/tbl=spc2";
+ $node->safe_psql('postgres', "DROP TABLE test1;");
+ $node->safe_psql('postgres', "DROP TABLE tblspc1_unlogged;");
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc1;");
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc2 LOCATION '$shorter_tempdir/tbl=spc2';");
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup3", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tbl\\=spc2=$tempdir/tbackup/tbl\\=spc2"
+ ],
+ 'mapping tablespace with = sign in path');
+ ok(-d "$tempdir/tbackup/tbl=spc2",
+ 'tablespace with = sign was relocated');
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc2;");
+ rmtree("$tempdir/backup3");
+
+ mkdir "$tempdir/$superlongname";
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc3 LOCATION '$tempdir/$superlongname';");
+ $node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/tarbackup_l3", '-Ft' , '-j 4'],
+ 'pg_basebackup tar with long symlink target');
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc3;");
+ rmtree("$tempdir/tarbackup_l3");
+}
+
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backupR", '-R' , '-j 4'],
+ 'pg_basebackup -R runs');
+ok(-f "$tempdir/backupR/postgresql.auto.conf", 'postgresql.auto.conf exists');
+ok(-f "$tempdir/backupR/standby.signal", 'standby.signal was created');
+my $recovery_conf = slurp_file "$tempdir/backupR/postgresql.auto.conf";
+rmtree("$tempdir/backupR");
+
+my $port = $node->port;
+like(
+ $recovery_conf,
+ qr/^primary_conninfo = '.*port=$port.*'\n/m,
+ 'postgresql.auto.conf sets primary_conninfo');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxd" , "-j 4"],
+ 'pg_basebackup runs in default xlog mode');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxd/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxd");
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxf", '-X', 'fetch' , "-j 4"],
+ 'pg_basebackup -X fetch runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxf");
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs", '-X', 'stream' , "-j 4"],
+ 'pg_basebackup -X stream runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxs/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxs");
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxst", '-X', 'stream', '-Ft' , "-j 4"],
+ 'pg_basebackup -X stream runs in tar mode');
+ok(-f "$tempdir/backupxst/pg_wal.tar", "tar file was created");
+rmtree("$tempdir/backupxst");
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupnoslot", '-X',
+ 'stream', '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup -X stream runs with --no-slot');
+rmtree("$tempdir/backupnoslot");
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_sl_fail", '-X',
+ 'stream', '-S',
+ 'slot0',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with nonexistent replication slot');
+#
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C' , '-j 4'],
+ 'pg_basebackup -C fails without slot name');
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_slot", '-C',
+ '-S', 'slot0',
+ '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with -C -S --no-slot');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup -C runs');
+rmtree("$tempdir/backupxs_slot");
+
+is( $node->safe_psql(
+ 'postgres',
+ q{SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ 'slot0',
+ 'replication slot was created');
+isnt(
+ $node->safe_psql(
+ 'postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ '',
+ 'restart LSN of new slot is not null');
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot1", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup fails with -C -S and a previously existing slot');
+
+$node->safe_psql('postgres',
+ q{SELECT * FROM pg_create_physical_replication_slot('slot1')});
+my $lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+is($lsn, '', 'restart LSN of new slot is null');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/fail", '-S', 'slot1', '-X', 'none', '-j 4'],
+ 'pg_basebackup with replication slot fails without WAL streaming');
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl", '-X',
+ 'stream', '-S', 'slot1', '-j 4'
+ ],
+ 'pg_basebackup -X stream with replication slot runs');
+$lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+like($lsn, qr!^0/[0-9A-Z]{7,8}$!, 'restart LSN of slot has advanced');
+rmtree("$tempdir/backupxs_sl");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl_R", '-X',
+ 'stream', '-S', 'slot1', '-R',
+ '-j 4'
+ ],
+ 'pg_basebackup with replication slot and -R runs');
+like(
+ slurp_file("$tempdir/backupxs_sl_R/postgresql.auto.conf"),
+ qr/^primary_slot_name = 'slot1'\n/m,
+ 'recovery conf file sets primary_slot_name');
+
+my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
+is($checksum, 'on', 'checksums are enabled');
+rmtree("$tempdir/backupxs_sl_R");
+
+# create tables to corrupt and get their relfilenodes
+my $file_corrupt1 = $node->safe_psql('postgres',
+ q{SELECT a INTO corrupt1 FROM generate_series(1,10000) AS a; ALTER TABLE corrupt1 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt1')}
+);
+my $file_corrupt2 = $node->safe_psql('postgres',
+ q{SELECT b INTO corrupt2 FROM generate_series(1,2) AS b; ALTER TABLE corrupt2 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt2')}
+);
+
+# set page header and block sizes
+my $pageheader_size = 24;
+my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');
+
+# induce corruption
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*checksum verification failed/s],
+ 'pg_basebackup reports checksum mismatch');
+rmtree("$tempdir/backup_corrupt");
+
+# induce further corruption in 5 more blocks
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+for my $i (1 .. 5)
+{
+ my $offset = $pageheader_size + $i * $block_size;
+ seek($file, $offset, 0);
+ syswrite($file, "\0\0\0\0\0\0\0\0\0");
+}
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*further.*failures.*will.not.be.reported/s],
+ 'pg_basebackup does not report more than 5 checksum mismatches');
+rmtree("$tempdir/backup_corrupt2");
+
+# induce corruption in a second file
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt2";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+#$node->command_checks_all(
+# [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt3", '-j 4'],
+# 1,
+# [qr{^$}],
+# [qr/^WARNING.*checksum verification failed/s],
+# 'pg_basebackup correctly report the total number of checksum mismatches');
+#rmtree("$tempdir/backup_corrupt3");
+
+# do not verify checksums, should return ok
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup_corrupt4", '--no-verify-checksums',
+ '-j 4'
+ ],
+ 'pg_basebackup with -k does not report checksum mismatch');
+rmtree("$tempdir/backup_corrupt4");
+
+$node->safe_psql('postgres', "DROP TABLE corrupt1;");
+$node->safe_psql('postgres', "DROP TABLE corrupt2;");
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index 1e3ed4e19f..f92d593e2e 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -23,6 +23,13 @@ typedef enum ReplicationKind
REPLICATION_KIND_LOGICAL
} ReplicationKind;
+typedef enum BackupCmdTag
+{
+ BASE_BACKUP,
+ START_BACKUP,
+ SEND_FILES_CONTENT,
+ STOP_BACKUP
+} BackupCmdTag;
/* ----------------------
* IDENTIFY_SYSTEM command
@@ -42,6 +49,8 @@ typedef struct BaseBackupCmd
{
NodeTag type;
List *options;
+ BackupCmdTag cmdtag;
+ List *backupfiles;
} BaseBackupCmd;
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index 503a5b9f0b..9e792af99d 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -31,6 +31,6 @@ typedef struct
extern void SendBaseBackup(BaseBackupCmd *cmd);
-extern int64 sendTablespace(char *path, bool sizeonly);
+extern int64 sendTablespace(char *path, bool sizeonly, List **files);
#endif /* _BASEBACKUP_H */
--
2.21.0 (Apple Git-122)
On Fri, Oct 4, 2019 at 7:02 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
Based on my understanding, your main concern is that the files won't be distributed fairly, i.e. one worker might get a big file and take more time while others get done early with smaller files? In this approach I have created a list of files in descending order based on their sizes, so all the big files will come at the top. The maximum file size in PG is 1GB, so if we have four workers picking files from the list one by one, the worst-case scenario is that one worker gets a file of 1GB to process while the others get files of smaller sizes. However, with this approach of ordering files by descending size and handing them out to workers one by one, there is a very high likelihood of the work being distributed evenly. Does this address your concerns?
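(For illustration, the scheme described above amounts to roughly the following; this is a sketch only, not code from the patch. BackupFile is a hypothetical struct, and the round-robin assignment matches the patch's j % numWorkers logic.)

/*
 * Illustrative sketch of the described distribution scheme: sort the
 * file list by size, largest first, then deal files out to the workers
 * one by one so that per-worker byte counts stay close together.
 */
#include <stdlib.h>
#include "fe_utils/simple_list.h"

typedef struct BackupFile		/* hypothetical; for illustration only */
{
	const char *name;
	size_t		size;
} BackupFile;

static int
filesize_desc_cmp(const void *a, const void *b)
{
	const BackupFile *fa = (const BackupFile *) a;
	const BackupFile *fb = (const BackupFile *) b;

	if (fa->size != fb->size)
		return (fa->size < fb->size) ? 1 : -1;
	return 0;
}

static void
assign_files(BackupFile *files, int nfiles,
			 SimpleStringList *worker_lists, int nworkers)
{
	int			i;

	qsort(files, nfiles, sizeof(BackupFile), filesize_desc_cmp);

	/* round-robin over the sorted list, like the patch's j % numWorkers */
	for (i = 0; i < nfiles; i++)
		simple_string_list_append(&worker_lists[i % nworkers],
								  files[i].name);
}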
Somewhat, but I'm not sure it's good enough. There are lots of reasons
why two processes that are started at the same time with the same
amount of work might not finish at the same time.
I'm also not particularly excited about having the server do the
sorting based on file size. Seems like that ought to be the client's
job, if the client needs the sorting.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Thanks Asif for the patch. I am opting to review this. The patch is a
bit big, so here are some very initial comments to make the review
process easier.

1) The patch seems to be doing a lot of code shuffling. I think it would
be easier to review if you could break out the cleanup changes into a
separate patch.
Examples:
a: setup_throttle
b: include_wal_files

2) As I can see, this patch basically has three major phases:
a) Introducing new commands like START_BACKUP, SEND_FILES_CONTENT and
STOP_BACKUP.
b) Implementation of the actual parallel backup.
c) Test cases
I would suggest breaking these out as three separate patches; that
would be nice and will make reviewing the patch easier.

3) In your patch you are preparing the backup manifest (a file giving
information about the data files). Robert Haas submitted the backup
manifests patch on another thread [1], and I think we should use that
patch to get the backup manifest for parallel backup.

Further, I will continue to review the patch, but meanwhile, if you can
break up the patches, the review process will be easier.

[1]: /messages/by-id/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
Thanks,
On Fri, Oct 4, 2019 at 4:32 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

On Thu, Oct 3, 2019 at 6:40 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

- SEND_FILES_CONTENTS (file1, file2,...) - returns the files in the given
list. pg_basebackup will then send back a list of filenames in this
command. This command will be sent by each worker, and that worker will
receive the said files.

Seems reasonable, but I think you should just pass one file name and
use the command multiple times, once per file.

I considered this approach initially; however, I adopted the current
strategy to avoid multiple round trips between the server and clients and
to save on query processing time by issuing a single command rather than
multiple ones. Furthermore, fetching multiple files at once will also aid
in supporting the tar format by utilising the existing ReceiveTarFile()
function, and will make it possible to create a tarball per tablespace per
worker.

I think that sending multiple filenames on a line could save some time
when there are lots of very small files, because then the round-trip
overhead could be significant.

However, if you've got mostly big files, I think this is going to be a
loser. It'll be fine if you're able to divide the work exactly evenly,
but that's pretty hard to do, because some workers may succeed in
copying the data faster than others for a variety of reasons: some
data is in memory, some data has to be read from disk, different data
may need to be read from different disks that run at different speeds,
and not all the network connections may run at the same speed. Remember
that the backup's not done until the last worker finishes, so there may
well be a significant advantage in terms of overall speed in putting some
energy into making sure that they finish as close to each other in time
as possible.

To put that another way, the first time all the workers except one get
done while the last one still has 10GB of data to copy, somebody's
going to be unhappy.

I have updated the patch (see the attached patch) to include tablespace
support, tar format support, and all other base backup options, so that
they work in parallel mode as well. As previously suggested, I have removed
BASE_BACKUP [PARALLEL] and added START_BACKUP instead to start the backup.
The tar format will write multiple tar files depending upon the number of
workers specified. I have also made all commands
(START_BACKUP/SEND_FILES_CONTENT/STOP_BACKUP) accept the
base_backup_opt_list, so that the command-line options can also be
provided to these commands. Since the command-line options don't change
once the backup initiates, I went this way instead of storing them in
shared state.

The START_BACKUP command will now return a list of files sorted in
descending order of file size. This way, the larger files will be at the
top of the list; hence these files will be assigned to workers one by one,
so that the larger files are copied before other files.

Based on my understanding, your main concern is that the files won't be
distributed fairly, i.e. one worker might get a big file and take more time
while others get done early with smaller files? In this approach I have
created a list of files in descending order based on their sizes, so all
the big files will come at the top. The maximum file size in PG is 1GB, so
if we have four workers picking files from the list one by one, the
worst-case scenario is that one worker gets a file of 1GB to process while
the others get files of smaller sizes. However, with this approach of
ordering files by descending size and handing them out to workers one by
one, there is a very high likelihood of the work being distributed evenly.
Does this address your concerns?

Furthermore, the patch also includes a regression test. As the
t/010_pg_basebackup.pl test case tests base backup comprehensively, I have
duplicated it to t/040_pg_basebackup_parallel.pl and added the parallel
option to all of its tests, to make sure parallel mode works as expected.
The one thing that differs from base backup is the file checksum
reporting. In parallel mode, the total number of checksum failures is not
reported correctly; however, the backup will be aborted whenever a checksum
failure occurs. This is because the processes are not maintaining any
shared state. I assume that noticing the failure and aborting matters more
than reporting the total number of failures.

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
--
Rushabh Lathia
On Mon, Oct 7, 2019 at 1:52 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:

Thanks Asif for the patch. I am opting to review this. The patch is a
bit big, so here are some very initial comments to make the review
process easier.

Thanks Rushabh for reviewing the patch.

1) The patch seems to be doing a lot of code shuffling. I think it would
be easier to review if you could break out the cleanup changes into a
separate patch.
Examples:
a: setup_throttle
b: include_wal_files

2) As I can see, this patch basically has three major phases:
a) Introducing new commands like START_BACKUP, SEND_FILES_CONTENT and
STOP_BACKUP.
b) Implementation of the actual parallel backup.
c) Test cases
I would suggest breaking these out as three separate patches; that
would be nice and will make reviewing the patch easier.

Sure, why not. I will break them into multiple patches.

3) In your patch you are preparing the backup manifest (a file giving
information about the data files). Robert Haas submitted the backup
manifests patch on another thread [1], and I think we should use that
patch to get the backup manifest for parallel backup.

Sure. Though the backup manifest patch calculates and includes the
checksums of backup files, and that is done while each file is being
transferred to the frontend. The manifest file itself is copied at the
very end of the backup. In parallel backup, I need the list of filenames
before file contents are transferred, in order to divide them among
multiple workers. For that, the manifest file has to be available when
START_BACKUP is called.

That means the backup manifest should support being created, excluding
the checksums, during START_BACKUP().

I also need the directory information, for two reasons:
- In plain format, the base path has to exist before we can write a file.
We can extract the base path from the file name, but doing that for all
files does not seem a good idea.
- A base backup does not include the contents of some directories, but
those directories, although empty, are still expected in PGDATA.

I can make these changes part of parallel backup (which would be on top of
the backup manifest patch), or these changes can be done as part of the
manifest patch and then parallel backup can use them.

Robert, what do you suggest?
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
Sure. Though the backup manifest patch calculates and includes the
checksums of backup files, and that is done while each file is being
transferred to the frontend. The manifest file itself is copied at the
very end of the backup. In parallel backup, I need the list of filenames
before file contents are transferred, in order to divide them among
multiple workers. For that, the manifest file has to be available when
START_BACKUP is called.

That means the backup manifest should support being created, excluding
the checksums, during START_BACKUP().

I also need the directory information, for two reasons:
- In plain format, the base path has to exist before we can write a file.
We can extract the base path from the file name, but doing that for all
files does not seem a good idea.
- A base backup does not include the contents of some directories, but
those directories, although empty, are still expected in PGDATA.

I can make these changes part of parallel backup (which would be on top of
the backup manifest patch), or these changes can be done as part of the
manifest patch and then parallel backup can use them.

Robert, what do you suggest?
I think we should probably not use backup manifests here, actually. I
initially thought that would be a good idea, but after further thought
it seems like it just complicates the code to no real benefit. I
suggest that the START_BACKUP command just return a result set, like a
query, with perhaps four columns: file name, file type ('d' for
directory or 'f' for file), file size, file mtime. pg_basebackup will
ignore the mtime, but some other tools might find that useful
information.
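For illustration, a client could consume such a result set along these
lines (a sketch only: the column layout follows the suggestion above,
while the command spelling and the LABEL option are assumptions):

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

/* Sketch: walk the proposed (filename, type, size, mtime) result set. */
static void
list_backup_files(PGconn *conn)
{
	PGresult   *res = PQexec(conn, "START_BACKUP LABEL 'pb'");
	int			i;

	if (PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		fprintf(stderr, "START_BACKUP failed: %s", PQerrorMessage(conn));
		PQclear(res);
		return;
	}

	for (i = 0; i < PQntuples(res); i++)
	{
		const char *name = PQgetvalue(res, i, 0);
		char		type = PQgetvalue(res, i, 1)[0];	/* 'd' or 'f' */
		long		size = atol(PQgetvalue(res, i, 2));

		/* column 3 (mtime) is ignored here, as pg_basebackup would do */
		if (type == 'd')
			printf("dir  %s\n", name);
		else
			printf("file %s (%ld bytes)\n", name, size);
	}
	PQclear(res);
}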
I wonder if we should also split START_BACKUP (which should enter
non-exclusive backup mode) from GET_FILE_LIST, in case some other
client program wants to use one of those but not the other. I think
that's probably a good idea, but not sure.
I still think that the files should be requested one at a time, not a
huge long list in a single command.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Oct 7, 2019 at 6:05 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:

Sure. Though the backup manifest patch calculates and includes the
checksums of backup files, and that is done while each file is being
transferred to the frontend. The manifest file itself is copied at the
very end of the backup. In parallel backup, I need the list of filenames
before file contents are transferred, in order to divide them among
multiple workers. For that, the manifest file has to be available when
START_BACKUP is called.

That means the backup manifest should support being created, excluding
the checksums, during START_BACKUP().

I also need the directory information, for two reasons:
- In plain format, the base path has to exist before we can write a file.
We can extract the base path from the file name, but doing that for all
files does not seem a good idea.
- A base backup does not include the contents of some directories, but
those directories, although empty, are still expected in PGDATA.

I can make these changes part of parallel backup (which would be on top of
the backup manifest patch), or these changes can be done as part of the
manifest patch and then parallel backup can use them.

Robert, what do you suggest?

I think we should probably not use backup manifests here, actually. I
initially thought that would be a good idea, but after further thought
it seems like it just complicates the code to no real benefit.

Okay.

I suggest that the START_BACKUP command just return a result set, like a
query, with perhaps four columns: file name, file type ('d' for
directory or 'f' for file), file size, file mtime. pg_basebackup will
ignore the mtime, but some other tools might find that useful
information.

Yes, the current patch already returns the result set. I will add the
additional information.

I wonder if we should also split START_BACKUP (which should enter
non-exclusive backup mode) from GET_FILE_LIST, in case some other
client program wants to use one of those but not the other. I think
that's probably a good idea, but not sure.

Currently pg_basebackup does not enter exclusive backup mode, and other
tools have to use the pg_start_backup() and pg_stop_backup() functions to
achieve that. Since we are breaking the backup into multiple commands, I
believe it would be a good idea to have this option. I will include it in
the next revision of this patch.

I still think that the files should be requested one at a time, not as a
huge long list in a single command.

Sure, will make the change.
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
On Mon, Oct 7, 2019 at 6:06 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:

Sure. Though the backup manifest patch calculates and includes the
checksums of backup files, and that is done while each file is being
transferred to the frontend. The manifest file itself is copied at the
very end of the backup. In parallel backup, I need the list of filenames
before file contents are transferred, in order to divide them among
multiple workers. For that, the manifest file has to be available when
START_BACKUP is called.

That means the backup manifest should support being created, excluding
the checksums, during START_BACKUP().

I also need the directory information, for two reasons:
- In plain format, the base path has to exist before we can write a file.
We can extract the base path from the file name, but doing that for all
files does not seem a good idea.
- A base backup does not include the contents of some directories, but
those directories, although empty, are still expected in PGDATA.

I can make these changes part of parallel backup (which would be on top of
the backup manifest patch), or these changes can be done as part of the
manifest patch and then parallel backup can use them.

Robert, what do you suggest?

I think we should probably not use backup manifests here, actually. I
initially thought that would be a good idea, but after further thought
it seems like it just complicates the code to no real benefit. I
suggest that the START_BACKUP command just return a result set, like a
query, with perhaps four columns: file name, file type ('d' for
directory or 'f' for file), file size, file mtime. pg_basebackup will
ignore the mtime, but some other tools might find that useful
information.

I wonder if we should also split START_BACKUP (which should enter
non-exclusive backup mode) from GET_FILE_LIST, in case some other
client program wants to use one of those but not the other. I think
that's probably a good idea, but not sure.

I still think that the files should be requested one at a time, not as a
huge long list in a single command.

What about having an API to get a single file or a list of files? We would
use a single file in our application, and other tools can get the benefit
of the list of files.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Ibrar Ahmed
On Mon, Oct 7, 2019 at 9:43 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
What about having an API to get a single file or a list of files? We would
use a single file in our application, and other tools can get the benefit
of the list of files.
That sounds a bit speculative to me. Who is to say that anyone will
find that useful? I mean, I think it's fine and good to build the
functionality that we need in a way that maximizes the likelihood that
other tools can reuse that functionality, and I think we should do
that. But I don't think it's smart to build functionality that we
don't really need in the hope that somebody else will find it useful
unless we're pretty sure that they actually will. I don't see that as
being the case here; YMMV.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Oct 7, 2019 at 6:35 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Mon, Oct 7, 2019 at 6:05 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com>
wrote:Sure. Though the backup manifest patch calculates and includes the
checksum of backup files and is done
while the file is being transferred to the frontend-end. The manifest
file itself is copied at the
very end of the backup. In parallel backup, I need the list of
filenames before file contents are transferred, in
order to divide them into multiple workers. For that, the manifest file
has to be available when START_BACKUP
is called.
That means, backup manifest should support its creation while excluding
the checksum during START_BACKUP().
I also need the directory information, for two reasons:
- In plain format, the base path has to exist before we can write a file.
We could extract the base path from each filename, but doing that for
every file does not seem like a good idea.
- A base backup does not include the contents of some directories, but those
directories, although empty, are still expected in PGDATA.
I can make these changes part of the parallel backup patch (which would
sit on top of the backup manifest patch), or they can be done as part of
the manifest patch and parallel backup can then use them.
Robert, what do you suggest?
I think we should probably not use backup manifests here, actually. I
initially thought that would be a good idea, but after further thought
it seems like it just complicates the code to no real benefit.

Okay.
I
suggest that the START_BACKUP command just return a result set, like a
query, with perhaps four columns: file name, file type ('d' for
directory or 'f' for file), file size, file mtime. pg_basebackup will
ignore the mtime, but some other tools might find that useful
information.

Yes, the current patch already returns a result set; I will add the
additional information.

I wonder if we should also split START_BACKUP (which should enter
non-exclusive backup mode) from GET_FILE_LIST, in case some other
client program wants to use one of those but not the other. I think
that's probably a good idea, but I'm not sure.

Currently pg_basebackup does not enter exclusive backup mode, and other
tools have to use the pg_start_backup() and pg_stop_backup() functions to
achieve that. Since we are breaking the backup into multiple commands, I
believe it would be a good idea to have this option. I will include it in
the next revision of this patch.

I still think that the files should be requested one at a time, not as
one huge list in a single command.

Sure, I will make that change.
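(For context, the SQL-level route that external tools take today looks
roughly like this; the signatures shown are the PostgreSQL 12-era ones,
included only to contrast with the proposed replication commands:

-- enter non-exclusive backup mode (fast = false, exclusive = false)
SELECT pg_start_backup('mybackup', false, false);

-- ... copy the data directory ...

-- leave backup mode; returns the stop LSN plus the backup_label and
-- tablespace_map contents, which the tool must write out itself
SELECT lsn, labelfile, spcmapfile FROM pg_stop_backup(false);
)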
I have refactored the functionality into multiple smaller patches in order
to make the review process easier. I have divided the code into backend
changes and pg_basebackup changes. The
backend replication system now supports the following commands (an
illustrative command sequence is sketched just after the list):
- START_BACKUP
- SEND_FILE_LIST
- SEND_FILES_CONTENT
- STOP_BACKUP
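To make the flow concrete, here is a hypothetical sequence of replication
commands a parallel client might issue; the option spellings (LABEL,
NOWAIT, the quoting of file names) are assumptions based on the attached
patches, not settled grammar:

START_BACKUP LABEL 'b1' NOWAIT
    -- enters non-exclusive backup mode; returns the start WAL position,
    -- the tablespace header, and the backup_label contents
SEND_FILE_LIST
    -- returns one result set per tablespace: (filename, type, size, mtime)
SEND_FILES_CONTENT ('base/1/1249', ...)
    -- issued by each worker for the files assigned to it
STOP_BACKUP LABEL '<backup_label contents>'
    -- issued once all workers have finished; returns the stop WAL position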
START_BACKUP no longer returns the list of files; SEND_FILE_LIST is used
for that instead. START_BACKUP now calls pg_start_backup and returns the
starting WAL position, the tablespace header information, and the
contents of the backup label file.
Initially I was using temp files to store the backup_label content, but
that turned out to be a bad idea because there can be multiple
non-exclusive backups running. The backup label information is needed by
stop_backup, so pg_basebackup will send it back as part of STOP_BACKUP.
SEND_FILE_LIST returns the list of files, as a result set with four
columns (filename, type, size, mtime).
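A rough sketch of how a client could consume that result set with libpq;
the BackupFile struct is an assumption modelled on the attached
pg_basebackup patch, and the column order is as described above:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <libpq-fe.h>

typedef struct
{
	char		name[1024];
	char		type;		/* 'd' = directory, 'f' = file */
	long		size;
	time_t		mtime;
} BackupFile;

/* Read one tablespace's file list; returns the number of rows, or -1. */
static int
read_file_list(PGconn *conn, BackupFile *files, int maxfiles)
{
	PGresult   *res = PQgetResult(conn);
	int			ntuples;

	if (res == NULL || PQresultStatus(res) != PGRES_TUPLES_OK)
		return -1;

	ntuples = PQntuples(res);
	for (int i = 0; i < ntuples && i < maxfiles; i++)
	{
		snprintf(files[i].name, sizeof(files[i].name), "%s",
				 PQgetvalue(res, i, 0));
		files[i].type = PQgetvalue(res, i, 1)[0];
		files[i].size = atol(PQgetvalue(res, i, 2));
		files[i].mtime = (time_t) atol(PQgetvalue(res, i, 3));
	}
	PQclear(res);
	return ntuples;
}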
SEND_FILES_CONTENT can now return a single file or multiple files, as
required. Not much change is needed to support both, so I believe it will
be more usable this way if other tools want to take advantage of it.
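And a hedged sketch of what a per-worker fetch loop could look like once
one-at-a-time fetching is in place; the command spelling and the CopyOut
framing are assumptions modelled on how BASE_BACKUP streams data today:

#include <stdio.h>
#include <libpq-fe.h>

/* Fetch a single file over a worker's replication connection. */
static int
fetch_one_file(PGconn *conn, const char *filename, FILE *out)
{
	char		cmd[2048];
	char	   *buf;
	int			len;
	PGresult   *res;

	snprintf(cmd, sizeof(cmd), "SEND_FILES_CONTENT ('%s')", filename);
	if (!PQsendQuery(conn, cmd))
		return -1;

	res = PQgetResult(conn);
	if (PQresultStatus(res) != PGRES_COPY_OUT)
	{
		PQclear(res);
		return -1;
	}
	PQclear(res);

	/* The server streams the file as CopyData messages. */
	while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
	{
		fwrite(buf, 1, len, out);
		PQfreemem(buf);
	}

	/* -1 means copy-done; a real client would then drain PQgetResult(). */
	return (len == -1) ? 0 : -1;
}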
As per Robert's suggestion, I am currently working on changing
pg_basebackup to fetch files one by one. However, that is not complete
yet, and the attached patch still uses the old method of multi-file
fetching to test the backend commands. I will send an updated patch
containing the one-by-one fetching.
I wanted to share the backend patch to get some feedback in the meantime.
Thanks,
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Attachments:
0001-Refactor-some-basebackup-code-to-increase-reusabilit.patchapplication/octet-stream; name=0001-Refactor-some-basebackup-code-to-increase-reusabilit.patchDownload
From 2ee6bc8d60ab73f8426166ca1864a062e9a51431 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 9 Oct 2019 12:39:41 +0500
Subject: [PATCH 1/4] Refactor some basebackup code to increase reusability, in
anticipation of adding parallel backup
---
src/backend/access/transam/xlog.c | 191 +++++-----
src/backend/replication/basebackup.c | 512 ++++++++++++++-------------
src/include/access/xlog.h | 2 +
3 files changed, 372 insertions(+), 333 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0ff9af53fe..577feccea5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10282,10 +10282,6 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
{
bool gotUniqueStartpoint = false;
- DIR *tblspcdir;
- struct dirent *de;
- tablespaceinfo *ti;
- int datadirpathlen;
/*
* Force an XLOG file switch before the checkpoint, to ensure that the
@@ -10411,93 +10407,8 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
if (exclusive)
tblspcmapfile = makeStringInfo();
- datadirpathlen = strlen(DataDir);
-
/* Collect information about all tablespaces */
- tblspcdir = AllocateDir("pg_tblspc");
- while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
- {
- char fullpath[MAXPGPATH + 10];
- char linkpath[MAXPGPATH];
- char *relpath = NULL;
- int rllen;
- StringInfoData buflinkpath;
- char *s = linkpath;
-
- /* Skip special stuff */
- if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
- continue;
-
- snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
-
-#if defined(HAVE_READLINK) || defined(WIN32)
- rllen = readlink(fullpath, linkpath, sizeof(linkpath));
- if (rllen < 0)
- {
- ereport(WARNING,
- (errmsg("could not read symbolic link \"%s\": %m",
- fullpath)));
- continue;
- }
- else if (rllen >= sizeof(linkpath))
- {
- ereport(WARNING,
- (errmsg("symbolic link \"%s\" target is too long",
- fullpath)));
- continue;
- }
- linkpath[rllen] = '\0';
-
- /*
- * Add the escape character '\\' before newline in a string to
- * ensure that we can distinguish between the newline in the
- * tablespace path and end of line while reading tablespace_map
- * file during archive recovery.
- */
- initStringInfo(&buflinkpath);
-
- while (*s)
- {
- if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
- appendStringInfoChar(&buflinkpath, '\\');
- appendStringInfoChar(&buflinkpath, *s++);
- }
-
- /*
- * Relpath holds the relative path of the tablespace directory
- * when it's located within PGDATA, or NULL if it's located
- * elsewhere.
- */
- if (rllen > datadirpathlen &&
- strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
- IS_DIR_SEP(linkpath[datadirpathlen]))
- relpath = linkpath + datadirpathlen + 1;
-
- ti = palloc(sizeof(tablespaceinfo));
- ti->oid = pstrdup(de->d_name);
- ti->path = pstrdup(buflinkpath.data);
- ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
-
- if (tablespaces)
- *tablespaces = lappend(*tablespaces, ti);
-
- appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
-
- pfree(buflinkpath.data);
-#else
-
- /*
- * If the platform does not have symbolic links, it should not be
- * possible to have tablespaces - clearly somebody else created
- * them. Warn about it and ignore.
- */
- ereport(WARNING,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("tablespaces are not supported on this platform")));
-#endif
- }
- FreeDir(tblspcdir);
+ collectTablespaces(tablespaces, tblspcmapfile, infotbssize, needtblspcmapfile);
/*
* Construct backup label file
@@ -12261,3 +12172,103 @@ XLogRequestWalReceiverReply(void)
{
doRequestWalReceiverReply = true;
}
+
+/*
+ * Collect information about all tablespaces.
+ */
+void
+collectTablespaces(List **tablespaces, StringInfo tblspcmapfile, bool infotbssize, bool needtblspcmapfile)
+{
+ DIR *tblspcdir;
+ struct dirent *de;
+ tablespaceinfo *ti;
+ int datadirpathlen;
+
+ datadirpathlen = strlen(DataDir);
+
+ /* Collect information about all tablespaces */
+ tblspcdir = AllocateDir("pg_tblspc");
+ while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
+ {
+ char fullpath[MAXPGPATH + 10];
+ char linkpath[MAXPGPATH];
+ char *relpath = NULL;
+ int rllen;
+ StringInfoData buflinkpath;
+ char *s = linkpath;
+
+ /* Skip special stuff */
+ if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
+ continue;
+
+ snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
+
+#if defined(HAVE_READLINK) || defined(WIN32)
+ rllen = readlink(fullpath, linkpath, sizeof(linkpath));
+ if (rllen < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not read symbolic link \"%s\": %m",
+ fullpath)));
+ continue;
+ }
+ else if (rllen >= sizeof(linkpath))
+ {
+ ereport(WARNING,
+ (errmsg("symbolic link \"%s\" target is too long",
+ fullpath)));
+ continue;
+ }
+ linkpath[rllen] = '\0';
+
+ /*
+ * Add the escape character '\\' before newline in a string to
+ * ensure that we can distinguish between the newline in the
+ * tablespace path and end of line while reading tablespace_map
+ * file during archive recovery.
+ */
+ initStringInfo(&buflinkpath);
+
+ while (*s)
+ {
+ if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
+ appendStringInfoChar(&buflinkpath, '\\');
+ appendStringInfoChar(&buflinkpath, *s++);
+ }
+
+ /*
+ * Relpath holds the relative path of the tablespace directory
+ * when it's located within PGDATA, or NULL if it's located
+ * elsewhere.
+ */
+ if (rllen > datadirpathlen &&
+ strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
+ IS_DIR_SEP(linkpath[datadirpathlen]))
+ relpath = linkpath + datadirpathlen + 1;
+
+ ti = palloc(sizeof(tablespaceinfo));
+ ti->oid = pstrdup(de->d_name);
+ ti->path = pstrdup(buflinkpath.data);
+ ti->rpath = relpath ? pstrdup(relpath) : NULL;
+ ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
+
+ if (tablespaces)
+ *tablespaces = lappend(*tablespaces, ti);
+
+ appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
+
+ pfree(buflinkpath.data);
+#else
+
+ /*
+ * If the platform does not have symbolic links, it should not be
+ * possible to have tablespaces - clearly somebody else created
+ * them. Warn about it and ignore.
+ */
+ ereport(WARNING,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("tablespaces are not supported on this platform")));
+#endif
+ }
+ FreeDir(tblspcdir);
+}
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d0f210de8c..a05a97ded2 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -68,10 +68,12 @@ static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void base_backup_cleanup(int code, Datum arg);
static void perform_base_backup(basebackup_options *opt);
+static void include_wal_files(XLogRecPtr endptr, TimeLineID endtli);
static void parse_basebackup_options(List *options, basebackup_options *opt);
static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
static void throttle(size_t increment);
+static void setup_throttle(int maxrate);
static bool is_checksummed_file(const char *fullpath, const char *filename);
/* Was the backup currently in-progress initiated in recovery mode? */
@@ -294,28 +296,7 @@ perform_base_backup(basebackup_options *opt)
SendBackupHeader(tablespaces);
/* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
-
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
-
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ setup_throttle(opt->maxrate);
/* Send off our tablespaces one by one */
foreach(lc, tablespaces)
@@ -384,227 +365,7 @@ perform_base_backup(basebackup_options *opt)
* We've left the last tar file "open", so we can now append the
* required WAL files to it.
*/
- char pathbuf[MAXPGPATH];
- XLogSegNo segno;
- XLogSegNo startsegno;
- XLogSegNo endsegno;
- struct stat statbuf;
- List *historyFileList = NIL;
- List *walFileList = NIL;
- char firstoff[MAXFNAMELEN];
- char lastoff[MAXFNAMELEN];
- DIR *dir;
- struct dirent *de;
- ListCell *lc;
- TimeLineID tli;
-
- /*
- * I'd rather not worry about timelines here, so scan pg_wal and
- * include all WAL files in the range between 'startptr' and 'endptr',
- * regardless of the timeline the file is stamped with. If there are
- * some spurious WAL files belonging to timelines that don't belong in
- * this server's history, they will be included too. Normally there
- * shouldn't be such files, but if there are, there's little harm in
- * including them.
- */
- XLByteToSeg(startptr, startsegno, wal_segment_size);
- XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
- XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
- XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
-
- dir = AllocateDir("pg_wal");
- while ((de = ReadDir(dir, "pg_wal")) != NULL)
- {
- /* Does it look like a WAL segment, and is it in the range? */
- if (IsXLogFileName(de->d_name) &&
- strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
- strcmp(de->d_name + 8, lastoff + 8) <= 0)
- {
- walFileList = lappend(walFileList, pstrdup(de->d_name));
- }
- /* Does it look like a timeline history file? */
- else if (IsTLHistoryFileName(de->d_name))
- {
- historyFileList = lappend(historyFileList, pstrdup(de->d_name));
- }
- }
- FreeDir(dir);
-
- /*
- * Before we go any further, check that none of the WAL segments we
- * need were removed.
- */
- CheckXLogRemoved(startsegno, ThisTimeLineID);
-
- /*
- * Sort the WAL filenames. We want to send the files in order from
- * oldest to newest, to reduce the chance that a file is recycled
- * before we get a chance to send it over.
- */
- list_sort(walFileList, compareWalFileNames);
-
- /*
- * There must be at least one xlog file in the pg_wal directory, since
- * we are doing backup-including-xlog.
- */
- if (walFileList == NIL)
- ereport(ERROR,
- (errmsg("could not find any WAL files")));
-
- /*
- * Sanity check: the first and last segment should cover startptr and
- * endptr, with no gaps in between.
- */
- XLogFromFileName((char *) linitial(walFileList),
- &tli, &segno, wal_segment_size);
- if (segno != startsegno)
- {
- char startfname[MAXFNAMELEN];
-
- XLogFileName(startfname, ThisTimeLineID, startsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", startfname)));
- }
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- XLogSegNo currsegno = segno;
- XLogSegNo nextsegno = segno + 1;
-
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
- if (!(nextsegno == segno || currsegno == segno))
- {
- char nextfname[MAXFNAMELEN];
-
- XLogFileName(nextfname, ThisTimeLineID, nextsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", nextfname)));
- }
- }
- if (segno != endsegno)
- {
- char endfname[MAXFNAMELEN];
-
- XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", endfname)));
- }
-
- /* Ok, we have everything we need. Send the WAL files. */
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- FILE *fp;
- char buf[TAR_SEND_SIZE];
- size_t cnt;
- pgoff_t len = 0;
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
-
- fp = AllocateFile(pathbuf, "rb");
- if (fp == NULL)
- {
- int save_errno = errno;
-
- /*
- * Most likely reason for this is that the file was already
- * removed by a checkpoint, so check for that to get a better
- * error message.
- */
- CheckXLogRemoved(segno, tli);
-
- errno = save_errno;
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not open file \"%s\": %m", pathbuf)));
- }
-
- if (fstat(fileno(fp), &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- pathbuf)));
- if (statbuf.st_size != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* send the WAL file itself */
- _tarWriteHeader(pathbuf, NULL, &statbuf, false);
-
- while ((cnt = fread(buf, 1,
- Min(sizeof(buf), wal_segment_size - len),
- fp)) > 0)
- {
- CheckXLogRemoved(segno, tli);
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
-
- len += cnt;
- throttle(cnt);
-
- if (len == wal_segment_size)
- break;
- }
-
- CHECK_FREAD_ERROR(fp, pathbuf);
-
- if (len != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* wal_segment_size is a multiple of 512, so no need for padding */
-
- FreeFile(fp);
-
- /*
- * Mark file as archived, otherwise files can get archived again
- * after promotion of a new node. This is in line with
- * walreceiver.c always doing an XLogArchiveForceDone() after a
- * complete segment.
- */
- StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(pathbuf, "");
- }
-
- /*
- * Send timeline history files too. Only the latest timeline history
- * file is required for recovery, and even that only if there happens
- * to be a timeline switch in the first WAL segment that contains the
- * checkpoint record, or if we're taking a base backup from a standby
- * server and the target timeline changes while the backup is taken.
- * But they are small and highly useful for debugging purposes, so
- * better include them all, always.
- */
- foreach(lc, historyFileList)
- {
- char *fname = lfirst(lc);
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
-
- if (lstat(pathbuf, &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m", pathbuf)));
-
- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
-
- /* unconditionally mark file as archived */
- StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "");
- }
+ include_wal_files(endptr, endtli);
/* Send CopyDone message for the last tar file */
pq_putemptymessage('c');
@@ -1743,3 +1504,268 @@ throttle(size_t increment)
*/
throttled_last = GetCurrentTimestamp();
}
+
+/*
+ * Append the required WAL files to the backup tar file. It assumes that the
+ * last tar file is "open" and the WALs will be appended to it.
+ */
+static void
+include_wal_files(XLogRecPtr endptr, TimeLineID endtli)
+{
+ /*
+ * We've left the last tar file "open", so we can now append the
+ * required WAL files to it.
+ */
+ char pathbuf[MAXPGPATH];
+ XLogSegNo segno;
+ XLogSegNo startsegno;
+ XLogSegNo endsegno;
+ struct stat statbuf;
+ List *historyFileList = NIL;
+ List *walFileList = NIL;
+ char firstoff[MAXFNAMELEN];
+ char lastoff[MAXFNAMELEN];
+ DIR *dir;
+ struct dirent *de;
+ ListCell *lc;
+ TimeLineID tli;
+
+ /*
+ * I'd rather not worry about timelines here, so scan pg_wal and
+ * include all WAL files in the range between 'startptr' and 'endptr',
+ * regardless of the timeline the file is stamped with. If there are
+ * some spurious WAL files belonging to timelines that don't belong in
+ * this server's history, they will be included too. Normally there
+ * shouldn't be such files, but if there are, there's little harm in
+ * including them.
+ */
+ XLByteToSeg(startptr, startsegno, wal_segment_size);
+ XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
+ XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
+ XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
+
+ dir = AllocateDir("pg_wal");
+ while ((de = ReadDir(dir, "pg_wal")) != NULL)
+ {
+ /* Does it look like a WAL segment, and is it in the range? */
+ if (IsXLogFileName(de->d_name) &&
+ strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
+ strcmp(de->d_name + 8, lastoff + 8) <= 0)
+ {
+ walFileList = lappend(walFileList, pstrdup(de->d_name));
+ }
+ /* Does it look like a timeline history file? */
+ else if (IsTLHistoryFileName(de->d_name))
+ {
+ historyFileList = lappend(historyFileList, pstrdup(de->d_name));
+ }
+ }
+ FreeDir(dir);
+
+ /*
+ * Before we go any further, check that none of the WAL segments we
+ * need were removed.
+ */
+ CheckXLogRemoved(startsegno, ThisTimeLineID);
+
+ /*
+ * Sort the WAL filenames. We want to send the files in order from
+ * oldest to newest, to reduce the chance that a file is recycled
+ * before we get a chance to send it over.
+ */
+ list_sort(walFileList, compareWalFileNames);
+
+ /*
+ * There must be at least one xlog file in the pg_wal directory, since
+ * we are doing backup-including-xlog.
+ */
+ if (walFileList == NIL)
+ ereport(ERROR,
+ (errmsg("could not find any WAL files")));
+
+ /*
+ * Sanity check: the first and last segment should cover startptr and
+ * endptr, with no gaps in between.
+ */
+ XLogFromFileName((char *) linitial(walFileList),
+ &tli, &segno, wal_segment_size);
+ if (segno != startsegno)
+ {
+ char startfname[MAXFNAMELEN];
+
+ XLogFileName(startfname, ThisTimeLineID, startsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", startfname)));
+ }
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ XLogSegNo currsegno = segno;
+ XLogSegNo nextsegno = segno + 1;
+
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+ if (!(nextsegno == segno || currsegno == segno))
+ {
+ char nextfname[MAXFNAMELEN];
+
+ XLogFileName(nextfname, ThisTimeLineID, nextsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", nextfname)));
+ }
+ }
+ if (segno != endsegno)
+ {
+ char endfname[MAXFNAMELEN];
+
+ XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", endfname)));
+ }
+
+ /* Ok, we have everything we need. Send the WAL files. */
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ FILE *fp;
+ char buf[TAR_SEND_SIZE];
+ size_t cnt;
+ pgoff_t len = 0;
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+
+ fp = AllocateFile(pathbuf, "rb");
+ if (fp == NULL)
+ {
+ int save_errno = errno;
+
+ /*
+ * Most likely reason for this is that the file was already
+ * removed by a checkpoint, so check for that to get a better
+ * error message.
+ */
+ CheckXLogRemoved(segno, tli);
+
+ errno = save_errno;
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m", pathbuf)));
+ }
+
+ if (fstat(fileno(fp), &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ pathbuf)));
+ if (statbuf.st_size != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* send the WAL file itself */
+ _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+
+ while ((cnt = fread(buf, 1,
+ Min(sizeof(buf), wal_segment_size - len),
+ fp)) > 0)
+ {
+ CheckXLogRemoved(segno, tli);
+ /* Send the chunk as a CopyData message */
+ if (pq_putmessage('d', buf, cnt))
+ ereport(ERROR,
+ (errmsg("base backup could not send data, aborting backup")));
+
+ len += cnt;
+ throttle(cnt);
+
+ if (len == wal_segment_size)
+ break;
+ }
+
+ CHECK_FREAD_ERROR(fp, pathbuf);
+
+ if (len != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* wal_segment_size is a multiple of 512, so no need for padding */
+
+ FreeFile(fp);
+
+ /*
+ * Mark file as archived, otherwise files can get archived again
+ * after promotion of a new node. This is in line with
+ * walreceiver.c always doing an XLogArchiveForceDone() after a
+ * complete segment.
+ */
+ StatusFilePath(pathbuf, walFileName, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+
+ /*
+ * Send timeline history files too. Only the latest timeline history
+ * file is required for recovery, and even that only if there happens
+ * to be a timeline switch in the first WAL segment that contains the
+ * checkpoint record, or if we're taking a base backup from a standby
+ * server and the target timeline changes while the backup is taken.
+ * But they are small and highly useful for debugging purposes, so
+ * better include them all, always.
+ */
+ foreach(lc, historyFileList)
+ {
+ char *fname = lfirst(lc);
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
+
+ if (lstat(pathbuf, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m", pathbuf)));
+
+ sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
+
+ /* unconditionally mark file as archived */
+ StatusFilePath(pathbuf, fname, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+}
+
+/*
+ * Setup and activate network throttling, if client requested it
+ */
+static void
+setup_throttle(int maxrate)
+{
+ /* Setup and activate network throttling, if client requested it */
+ if (maxrate > 0)
+ {
+ throttling_sample =
+ (int64) maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
+
+ /* Enable throttling. */
+ throttling_counter = 0;
+
+ /* The 'real data' starts now (header was ignored). */
+ throttled_last = GetCurrentTimestamp();
+ }
+ else
+ {
+ /* Disable throttling. */
+ throttling_counter = -1;
+ }
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..5b0aa8ae85 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -350,6 +350,8 @@ extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
bool needtblspcmapfile);
extern XLogRecPtr do_pg_stop_backup(char *labelfile, bool waitforarchive,
TimeLineID *stoptli_p);
+extern void collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
+ bool infotbssize, bool needtblspcmapfile);
extern void do_pg_abort_backup(void);
extern SessionBackupState get_backup_status(void);
--
2.21.0 (Apple Git-122)
0004-parallel-backup-testcase.patchapplication/octet-stream; name=0004-parallel-backup-testcase.patchDownload
From 35fd5cf5701010cebacd1be3ebd1d9f7af841d3a Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Sun, 13 Oct 2019 21:54:23 +0500
Subject: [PATCH 4/4] parallel backup - testcase
---
.../t/040_pg_basebackup_parallel.pl | 571 ++++++++++++++++++
1 file changed, 571 insertions(+)
create mode 100644 src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
diff --git a/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
new file mode 100644
index 0000000000..6c31214f3d
--- /dev/null
+++ b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
@@ -0,0 +1,571 @@
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Basename qw(basename dirname);
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 106;
+
+program_help_ok('pg_basebackup');
+program_version_ok('pg_basebackup');
+program_options_handling_ok('pg_basebackup');
+
+my $tempdir = TestLib::tempdir;
+
+my $node = get_new_node('main');
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Initialize node without replication settings
+$node->init(extra => ['--data-checksums']);
+$node->start;
+my $pgdata = $node->data_dir;
+
+$node->command_fails(['pg_basebackup'],
+ 'pg_basebackup needs target directory specified');
+
+# Some Windows ANSI code pages may reject this filename, in which case we
+# quietly proceed without this bit of test coverage.
+if (open my $badchars, '>>', "$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
+{
+ print $badchars "test backup of file with non-UTF8 name\n";
+ close $badchars;
+}
+
+$node->set_replication_conf();
+system_or_bail 'pg_ctl', '-D', $pgdata, 'reload';
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup" ],
+ 'pg_basebackup fails because of WAL configuration');
+
+ok(!-d "$tempdir/backup", 'backup directory was cleaned up');
+
+# Create a backup directory that is not empty so the next command will fail
+# but leave the data directory behind
+mkdir("$tempdir/backup")
+ or BAIL_OUT("unable to create $tempdir/backup");
+append_to_file("$tempdir/backup/dir-not-empty.txt", "Some data");
+
+$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/backup", '-n' ],
+ 'failing run with no-clean option');
+
+ok(-d "$tempdir/backup", 'backup directory was created and left behind');
+rmtree("$tempdir/backup");
+
+open my $conf, '>>', "$pgdata/postgresql.conf";
+print $conf "max_replication_slots = 10\n";
+print $conf "max_wal_senders = 10\n";
+print $conf "wal_level = replica\n";
+close $conf;
+$node->restart;
+
+# Write some files to test that they are not copied.
+foreach my $filename (
+ qw(backup_label tablespace_map postgresql.auto.conf.tmp current_logfiles.tmp)
+ )
+{
+ open my $file, '>>', "$pgdata/$filename";
+ print $file "DONOTCOPY";
+ close $file;
+}
+
+# Connect to a database to create global/pg_internal.init. If this is removed
+# the test to ensure global/pg_internal.init is not copied will return a false
+# positive.
+$node->safe_psql('postgres', 'SELECT 1;');
+
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'unlogged main fork in base');
+
+# Create files that look like temporary relations to ensure they are ignored.
+my $postgresOid = $node->safe_psql('postgres',
+ q{select oid from pg_database where datname = 'postgres'});
+
+my @tempRelationFiles =
+ qw(t999_999 t9999_999.1 t999_9999_vm t99999_99999_vm.1);
+
+foreach my $filename (@tempRelationFiles)
+{
+ append_to_file("$pgdata/base/$postgresOid/$filename", 'TEMP_RELATION');
+}
+
+# Run base backup in parallel mode.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup", '-X', 'none', "-j 4" ],
+ 'pg_basebackup runs');
+ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
+
+# Permissions on backup should be default
+SKIP:
+{
+ skip "unix-style permissions not supported on Windows", 1
+ if ($windows_os);
+
+ ok(check_mode_recursive("$tempdir/backup", 0700, 0600),
+ "check backup dir permissions");
+}
+
+# Only archive_status directory should be copied in pg_wal/.
+is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/pg_wal/")) ],
+ [ sort qw(. .. archive_status) ],
+ 'no WAL files copied');
+
+# Contents of these directories should not be copied.
+foreach my $dirname (
+ qw(pg_dynshmem pg_notify pg_replslot pg_serial pg_snapshots pg_stat_tmp pg_subtrans)
+ )
+{
+ is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/$dirname/")) ],
+ [ sort qw(. ..) ],
+ "contents of $dirname/ not copied");
+}
+
+# These files should not be copied.
+foreach my $filename (
+ qw(postgresql.auto.conf.tmp postmaster.opts postmaster.pid tablespace_map current_logfiles.tmp
+ global/pg_internal.init))
+{
+ ok(!-f "$tempdir/backup/$filename", "$filename not copied");
+}
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+ 'unlogged init fork in backup');
+ok( !-f "$tempdir/backup/$baseUnloggedPath",
+ 'unlogged main fork not in backup');
+
+# Temp relations should not be copied.
+foreach my $filename (@tempRelationFiles)
+{
+ ok( !-f "$tempdir/backup/base/$postgresOid/$filename",
+ "base/$postgresOid/$filename not copied");
+}
+
+# Make sure existing backup_label was ignored.
+isnt(slurp_file("$tempdir/backup/backup_label"),
+ 'DONOTCOPY', 'existing backup_label not copied');
+rmtree("$tempdir/backup");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup2", '--waldir',
+ "$tempdir/xlog2", "-j 4"
+ ],
+ 'separate xlog directory');
+ok(-f "$tempdir/backup2/PG_VERSION", 'backup was created');
+ok(-d "$tempdir/xlog2/", 'xlog directory was created');
+rmtree("$tempdir/backup2");
+rmtree("$tempdir/xlog2");
+
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup", '-Ft', "-j 4"],
+ 'tar format');
+foreach my $filename (
+ qw(base.0.tar base.1.tar base.2.tar base.3.tar))
+{
+ ok(-f "$tempdir/tarbackup/$filename", "backup $filename tar created");
+}
+
+rmtree("$tempdir/tarbackup");
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T=/foo" ],
+ '-T with empty old directory fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=" ],
+ '-T with empty new directory fails');
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4",
+ "-T/foo=/bar=/baz"
+ ],
+ '-T with multiple = fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo=/bar" ],
+ '-T with old directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=bar" ],
+ '-T with new directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo" ],
+ '-T with invalid format fails');
+
+# Tar format doesn't support filenames longer than 100 bytes.
+my $superlongname = "superlongname_" . ("x" x 100);
+my $superlongpath = "$pgdata/$superlongname";
+
+open my $file, '>', "$superlongpath"
+ or die "unable to create file $superlongpath";
+close $file;
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/tarbackup_l1", '-Ft', "-j 4" ],
+ 'pg_basebackup tar with long name fails');
+unlink "$pgdata/$superlongname";
+
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+ skip "symlinks not supported on Windows", 18 if ($windows_os);
+
+ # Move pg_replslot out of $pgdata and create a symlink to it.
+ $node->stop;
+
+ # Set umask so test directories and files are created with group permissions
+ umask(0027);
+
+ # Enable group permissions on PGDATA
+ chmod_recursive("$pgdata", 0750, 0640);
+
+ rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
+ or BAIL_OUT "could not move $pgdata/pg_replslot";
+ symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot")
+ or BAIL_OUT "could not symlink to $pgdata/pg_replslot";
+
+ $node->start;
+
+ # Create a temporary directory in the system location and symlink it
+ # to our physical temp location. That way we can use shorter names
+ # for the tablespace directories, which hopefully won't run afoul of
+ # the 99 character length limit.
+ my $shorter_tempdir = TestLib::tempdir_short . "/tempdir";
+ symlink "$tempdir", $shorter_tempdir;
+
+ mkdir "$tempdir/tblspc1";
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc1 LOCATION '$shorter_tempdir/tblspc1';");
+ $node->safe_psql('postgres',
+ "CREATE TABLE test1 (a int) TABLESPACE tblspc1;");
+ $node->command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup2", '-Ft', "-j 4" ],
+ 'tar format with tablespaces');
+ ok(-f "$tempdir/tarbackup2/base.0.tar", 'backup tar was created');
+ my @tblspc_tars = glob "$tempdir/tarbackup2/[0-9]*.tar";
+ is(scalar(@tblspc_tars), 3, 'three tablespace tars were created');
+ rmtree("$tempdir/tarbackup2");
+
+ # Create an unlogged table to test that forks other than init are not copied.
+ $node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;'
+ );
+
+ my $tblspc1UnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('tblspc1_unlogged')});
+
+ # Make sure main and init forks exist
+ ok( -f "$pgdata/${tblspc1UnloggedPath}_init",
+ 'unlogged init fork in tablespace');
+ ok(-f "$pgdata/$tblspc1UnloggedPath", 'unlogged main fork in tablespace');
+
+ # Create files that look like temporary relations to ensure they are ignored
+ # in a tablespace.
+ my @tempRelationFiles = qw(t888_888 t888888_888888_vm.1);
+ my $tblSpc1Id = basename(
+ dirname(
+ dirname(
+ $node->safe_psql(
+ 'postgres', q{select pg_relation_filepath('test1')}))));
+
+ foreach my $filename (@tempRelationFiles)
+ {
+ append_to_file(
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ 'TEMP_RELATION');
+ }
+
+ $node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4" ],
+ 'plain format with tablespaces fails without tablespace mapping');
+
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tblspc1=$tempdir/tbackup/tblspc1"
+ ],
+ 'plain format with tablespaces succeeds with tablespace mapping');
+ ok(-d "$tempdir/tbackup/tblspc1", 'tablespace was relocated');
+ opendir(my $dh, "$pgdata/pg_tblspc") or die;
+ ok( ( grep {
+ -l "$tempdir/backup1/pg_tblspc/$_"
+ and readlink "$tempdir/backup1/pg_tblspc/$_" eq
+ "$tempdir/tbackup/tblspc1"
+ } readdir($dh)),
+ "tablespace symlink was updated");
+ closedir $dh;
+
+ # Group access should be enabled on all backup files
+ ok(check_mode_recursive("$tempdir/backup1", 0750, 0640),
+ "check backup dir permissions");
+
+ # Unlogged relation forks other than init should not be copied
+ my ($tblspc1UnloggedBackupPath) =
+ $tblspc1UnloggedPath =~ /[^\/]*\/[^\/]*\/[^\/]*$/g;
+
+ ok(-f "$tempdir/tbackup/tblspc1/${tblspc1UnloggedBackupPath}_init",
+ 'unlogged init fork in tablespace backup');
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
+ 'unlogged main fork not in tablespace backup');
+
+ # Temp relations should not be copied.
+ foreach my $filename (@tempRelationFiles)
+ {
+ ok( !-f "$tempdir/tbackup/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ "[tblspc1]/$postgresOid/$filename not copied");
+
+ # Also remove temp relation files or tablespace drop will fail.
+ my $filepath =
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename";
+
+ unlink($filepath)
+ or BAIL_OUT("unable to unlink $filepath");
+ }
+
+ ok( -d "$tempdir/backup1/pg_replslot",
+ 'pg_replslot symlink copied as directory');
+ rmtree("$tempdir/backup1");
+
+ mkdir "$tempdir/tbl=spc2";
+ $node->safe_psql('postgres', "DROP TABLE test1;");
+ $node->safe_psql('postgres', "DROP TABLE tblspc1_unlogged;");
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc1;");
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc2 LOCATION '$shorter_tempdir/tbl=spc2';");
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup3", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tbl\\=spc2=$tempdir/tbackup/tbl\\=spc2"
+ ],
+ 'mapping tablespace with = sign in path');
+ ok(-d "$tempdir/tbackup/tbl=spc2",
+ 'tablespace with = sign was relocated');
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc2;");
+ rmtree("$tempdir/backup3");
+
+ mkdir "$tempdir/$superlongname";
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc3 LOCATION '$tempdir/$superlongname';");
+ $node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/tarbackup_l3", '-Ft' , '-j 4'],
+ 'pg_basebackup tar with long symlink target');
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc3;");
+ rmtree("$tempdir/tarbackup_l3");
+}
+
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backupR", '-R' , '-j 4'],
+ 'pg_basebackup -R runs');
+ok(-f "$tempdir/backupR/postgresql.auto.conf", 'postgresql.auto.conf exists');
+ok(-f "$tempdir/backupR/standby.signal", 'standby.signal was created');
+my $recovery_conf = slurp_file "$tempdir/backupR/postgresql.auto.conf";
+rmtree("$tempdir/backupR");
+
+my $port = $node->port;
+like(
+ $recovery_conf,
+ qr/^primary_conninfo = '.*port=$port.*'\n/m,
+ 'postgresql.auto.conf sets primary_conninfo');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxd" , "-j 4"],
+ 'pg_basebackup runs in default xlog mode');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxd/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxd");
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxf", '-X', 'fetch' , "-j 4"],
+ 'pg_basebackup -X fetch runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxf");
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs", '-X', 'stream' , "-j 4"],
+ 'pg_basebackup -X stream runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxs/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxs");
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxst", '-X', 'stream', '-Ft' , "-j 4"],
+ 'pg_basebackup -X stream runs in tar mode');
+ok(-f "$tempdir/backupxst/pg_wal.tar", "tar file was created");
+rmtree("$tempdir/backupxst");
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupnoslot", '-X',
+ 'stream', '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup -X stream runs with --no-slot');
+rmtree("$tempdir/backupnoslot");
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_sl_fail", '-X',
+ 'stream', '-S',
+ 'slot0',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with nonexistent replication slot');
+#
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C' , '-j 4'],
+ 'pg_basebackup -C fails without slot name');
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_slot", '-C',
+ '-S', 'slot0',
+ '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with -C -S --no-slot');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup -C runs');
+rmtree("$tempdir/backupxs_slot");
+
+is( $node->safe_psql(
+ 'postgres',
+ q{SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ 'slot0',
+ 'replication slot was created');
+isnt(
+ $node->safe_psql(
+ 'postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ '',
+ 'restart LSN of new slot is not null');
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot1", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup fails with -C -S and a previously existing slot');
+
+$node->safe_psql('postgres',
+ q{SELECT * FROM pg_create_physical_replication_slot('slot1')});
+my $lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+is($lsn, '', 'restart LSN of new slot is null');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/fail", '-S', 'slot1', '-X', 'none', '-j 4'],
+ 'pg_basebackup with replication slot fails without WAL streaming');
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl", '-X',
+ 'stream', '-S', 'slot1', '-j 4'
+ ],
+ 'pg_basebackup -X stream with replication slot runs');
+$lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+like($lsn, qr!^0/[0-9A-Z]{7,8}$!, 'restart LSN of slot has advanced');
+rmtree("$tempdir/backupxs_sl");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl_R", '-X',
+ 'stream', '-S', 'slot1', '-R',
+ '-j 4'
+ ],
+ 'pg_basebackup with replication slot and -R runs');
+like(
+ slurp_file("$tempdir/backupxs_sl_R/postgresql.auto.conf"),
+ qr/^primary_slot_name = 'slot1'\n/m,
+ 'recovery conf file sets primary_slot_name');
+
+my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
+is($checksum, 'on', 'checksums are enabled');
+rmtree("$tempdir/backupxs_sl_R");
+
+# create tables to corrupt and get their relfilenodes
+my $file_corrupt1 = $node->safe_psql('postgres',
+ q{SELECT a INTO corrupt1 FROM generate_series(1,10000) AS a; ALTER TABLE corrupt1 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt1')}
+);
+my $file_corrupt2 = $node->safe_psql('postgres',
+ q{SELECT b INTO corrupt2 FROM generate_series(1,2) AS b; ALTER TABLE corrupt2 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt2')}
+);
+
+# set page header and block sizes
+my $pageheader_size = 24;
+my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');
+
+# induce corruption
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*checksum verification failed/s],
+ 'pg_basebackup reports checksum mismatch');
+rmtree("$tempdir/backup_corrupt");
+
+# induce further corruption in 5 more blocks
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+for my $i (1 .. 5)
+{
+ my $offset = $pageheader_size + $i * $block_size;
+ seek($file, $offset, 0);
+ syswrite($file, "\0\0\0\0\0\0\0\0\0");
+}
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*further.*failures.*will.not.be.reported/s],
+ 'pg_basebackup does not report more than 5 checksum mismatches');
+rmtree("$tempdir/backup_corrupt2");
+
+# induce corruption in a second file
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt2";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+#$node->command_checks_all(
+# [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt3", '-j 4'],
+# 1,
+# [qr{^$}],
+# [qr/^WARNING.*checksum verification failed/s],
+# 'pg_basebackup correctly report the total number of checksum mismatches');
+#rmtree("$tempdir/backup_corrupt3");
+
+# do not verify checksums, should return ok
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup_corrupt4", '--no-verify-checksums',
+ '-j 4'
+ ],
+ 'pg_basebackup with -k does not report checksum mismatch');
+rmtree("$tempdir/backup_corrupt4");
+
+$node->safe_psql('postgres', "DROP TABLE corrupt1;");
+$node->safe_psql('postgres', "DROP TABLE corrupt2;");
--
2.21.0 (Apple Git-122)
0003-pg_basebackup-changes-for-parallel-backup.patchapplication/octet-stream; name=0003-pg_basebackup-changes-for-parallel-backup.patchDownload
From 245b10802490fafeba7b17779e5c2860fbc1181c Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Mon, 14 Oct 2019 17:28:58 +0500
Subject: [PATCH 3/4] pg_basebackup changes for parallel backup.
---
src/bin/pg_basebackup/pg_basebackup.c | 583 ++++++++++++++++++++++++--
1 file changed, 548 insertions(+), 35 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 55ef13926d..311c1f94ca 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -41,6 +41,7 @@
#include "receivelog.h"
#include "replication/basebackup.h"
#include "streamutil.h"
+#include "fe_utils/simple_list.h"
#define ERRCODE_DATA_CORRUPTED "XX001"
@@ -57,6 +58,37 @@ typedef struct TablespaceList
TablespaceListCell *tail;
} TablespaceList;
+typedef struct
+{
+ char name[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+} BackupFile;
+
+typedef struct
+{
+ Oid tblspcOid;
+ char *tablespace; /* tablespace name or NULL if 'base' tablespace */
+ int numFiles; /* number of files */
+ BackupFile *backupFiles; /* list of files in tablespace */
+} TablespaceInfo;
+
+typedef struct
+{
+ int tablespacecount;
+ int numWorkers;
+
+ char xlogstart[64];
+ char *backup_label;
+ char *tablespace_map;
+
+ TablespaceInfo *tsInfo;
+ SimpleStringList **worker_files;
+} BackupInfo;
+
+static BackupInfo *backupInfo = NULL;
+
/*
* pg_xlog has been renamed to pg_wal in version 10. This version number
* should be compared with PQserverVersion().
@@ -110,6 +142,10 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
+static int numWorkers = 1;
+static PGresult *tablespacehdr;
+static SimpleOidList workerspid = {NULL, NULL};
+
/* Progress counters */
static uint64 totalsize_kb;
static uint64 totaldone;
@@ -141,7 +177,7 @@ static void usage(void);
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
static void progress_report(int tablespacenum, const char *filename, bool force);
-static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
+static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum, int worker);
static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
static void BaseBackup(void);
@@ -151,6 +187,16 @@ static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
static const char *get_tablespace_mapping(const char *dir);
static void tablespace_list_append(const char *arg);
+static void ParallelBackupEnd(void);
+static void GetBackupFilesList(PGconn *conn, BackupInfo *binfo);
+static int ReceiveFiles(BackupInfo *backupInfo, int worker);
+static int compareFileSize(const void *a, const void *b);
+static void create_workers_and_fetch(BackupInfo *backupInfo);
+static void read_label_tblspcmap(PGconn *conn, char **backup_label, char **tablespace_map);
+static void create_backup_dirs(bool basetablespace, char *tablespace, char *name);
+static void writefile(char *path, char *buf);
+static int simple_list_length(SimpleStringList *list);
+
static void
cleanup_directories_atexit(void)
@@ -349,6 +395,7 @@ usage(void)
printf(_(" --no-slot prevent creation of temporary replication slot\n"));
printf(_(" --no-verify-checksums\n"
" do not verify checksums\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to backup\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nConnection options:\n"));
printf(_(" -d, --dbname=CONNSTR connection string\n"));
@@ -921,7 +968,7 @@ writeTarData(
* No attempt to inspect or validate the contents of the file is done.
*/
static void
-ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
+ReceiveTarFile(PGconn *conn, PGresult *res, int rownum, int worker)
{
char filename[MAXPGPATH];
char *copybuf = NULL;
@@ -978,7 +1025,10 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
#ifdef HAVE_LIBZ
if (compresslevel != 0)
{
- snprintf(filename, sizeof(filename), "%s/base.tar.gz", basedir);
+ if (numWorkers > 1)
+ snprintf(filename, sizeof(filename), "%s/base.%d.tar.gz", basedir, worker);
+ else
+ snprintf(filename, sizeof(filename), "%s/base.tar.gz", basedir);
ztarfile = gzopen(filename, "wb");
if (gzsetparams(ztarfile, compresslevel,
Z_DEFAULT_STRATEGY) != Z_OK)
@@ -991,7 +1041,10 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
else
#endif
{
- snprintf(filename, sizeof(filename), "%s/base.tar", basedir);
+ if (numWorkers > 1)
+ snprintf(filename, sizeof(filename), "%s/base.%d.tar", basedir, worker);
+ else
+ snprintf(filename, sizeof(filename), "%s/base.tar", basedir);
tarfile = fopen(filename, "wb");
}
}
@@ -1004,8 +1057,12 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
#ifdef HAVE_LIBZ
if (compresslevel != 0)
{
- snprintf(filename, sizeof(filename), "%s/%s.tar.gz", basedir,
- PQgetvalue(res, rownum, 0));
+ if (numWorkers > 1)
+ snprintf(filename, sizeof(filename), "%s/%s.%d.tar.gz", basedir,
+ PQgetvalue(res, rownum, 0), worker);
+ else
+ snprintf(filename, sizeof(filename), "%s/%s.tar.gz", basedir,
+ PQgetvalue(res, rownum, 0));
ztarfile = gzopen(filename, "wb");
if (gzsetparams(ztarfile, compresslevel,
Z_DEFAULT_STRATEGY) != Z_OK)
@@ -1018,8 +1075,12 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
else
#endif
{
- snprintf(filename, sizeof(filename), "%s/%s.tar", basedir,
- PQgetvalue(res, rownum, 0));
+ if (numWorkers > 1)
+ snprintf(filename, sizeof(filename), "%s/%s.%d.tar", basedir,
+ PQgetvalue(res, rownum, 0), worker);
+ else
+ snprintf(filename, sizeof(filename), "%s/%s.tar", basedir,
+ PQgetvalue(res, rownum, 0));
tarfile = fopen(filename, "wb");
}
}
@@ -1082,6 +1143,45 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
MemSet(zerobuf, 0, sizeof(zerobuf));
+ if (numWorkers > 1 && basetablespace && worker == 0)
+ {
+ char header[512];
+ int padding;
+ int len;
+
+ /* add backup_label and tablespace_map files to the tar */
+ len = strlen(backupInfo->backup_label);
+ tarCreateHeader(header,
+ "backup_label",
+ NULL,
+ len,
+ pg_file_create_mode, 04000, 02000,
+ time(NULL));
+
+ padding = ((len + 511) & ~511) - len;
+ WRITE_TAR_DATA(header, sizeof(header));
+ WRITE_TAR_DATA(backupInfo->backup_label, len);
+ if (padding)
+ WRITE_TAR_DATA(zerobuf, padding);
+
+ if (backupInfo->tablespace_map)
+ {
+ len = strlen(backupInfo->tablespace_map);
+ tarCreateHeader(header,
+ "tablespace_map",
+ NULL,
+ len,
+ pg_file_create_mode, 04000, 02000,
+ time(NULL));
+
+ padding = ((len + 511) & ~511) - len;
+ WRITE_TAR_DATA(header, sizeof(header));
+ WRITE_TAR_DATA(backupInfo->tablespace_map, len);
+ if (padding)
+ WRITE_TAR_DATA(zerobuf, padding);
+ }
+ }
+
if (basetablespace && writerecoveryconf)
{
char header[512];
@@ -1475,6 +1575,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
*/
snprintf(filename, sizeof(filename), "%s/%s", current_path,
copybuf);
+
if (filename[strlen(filename) - 1] == '/')
{
/*
@@ -1486,21 +1587,14 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
* Directory
*/
filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */
+
+ /*
+ * In parallel mode, we create directories before fetching
+ * files, so it's OK if a directory already exists.
+ */
if (mkdir(filename, pg_dir_create_mode) != 0)
{
- /*
- * When streaming WAL, pg_wal (or pg_xlog for pre-9.6
- * clusters) will have been created by the wal
- * receiver process. Also, when the WAL directory
- * location was specified, pg_wal (or pg_xlog) has
- * already been created as a symbolic link before
- * starting the actual backup. So just ignore creation
- * failures on related directories.
- */
- if (!((pg_str_endswith(filename, "/pg_wal") ||
- pg_str_endswith(filename, "/pg_xlog") ||
- pg_str_endswith(filename, "/archive_status")) &&
- errno == EEXIST))
+ if (errno != EEXIST)
{
pg_log_error("could not create directory \"%s\": %m",
filename);
@@ -1528,8 +1622,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
* can map them too.)
*/
filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */
-
mapped_tblspc_path = get_tablespace_mapping(&copybuf[157]);
+
if (symlink(mapped_tblspc_path, filename) != 0)
{
pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
@@ -1716,7 +1810,8 @@ BaseBackup(void)
}
basebkp =
- psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
+ psprintf("%s LABEL '%s' %s %s %s %s %s %s %s",
+ (numWorkers > 1) ? "START_BACKUP" : "BASE_BACKUP",
escaped_label,
showprogress ? "PROGRESS" : "",
includewal == FETCH_WAL ? "WAL" : "",
@@ -1774,7 +1869,7 @@ BaseBackup(void)
/*
* Get the header
*/
- res = PQgetResult(conn);
+ tablespacehdr = res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
pg_log_error("could not get backup header: %s",
@@ -1830,20 +1925,62 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /*
- * Start receiving chunks
- */
- for (i = 0; i < PQntuples(res); i++)
+ if (numWorkers > 1)
{
- if (format == 't')
- ReceiveTarFile(conn, res, i);
- else
- ReceiveAndUnpackTarFile(conn, res, i);
- } /* Loop over all tablespaces */
+ backupInfo = palloc0(sizeof(BackupInfo));
+
+ backupInfo->tablespacecount = tablespacecount;
+ backupInfo->numWorkers = numWorkers;
+ strlcpy(backupInfo->xlogstart, xlogstart, sizeof(backupInfo->xlogstart));
+ read_label_tblspcmap(conn, &backupInfo->backup_label, &backupInfo->tablespace_map);
+
+ /* retrieve the list of backup files from the server */
+ GetBackupFilesList(conn, backupInfo);
+
+ /*
+ * Add backup_label to the backup (for tar format, ReceiveTarFile() will
+ * take care of it).
+ */
+ if (format == 'p')
+ writefile("backup_label", backupInfo->backup_label);
+
+ /*
+ * The backup file list is already in descending order; distribute it
+ * to the workers round-robin.
+ */
+ backupInfo->worker_files = palloc0(sizeof(SimpleStringList) * tablespacecount);
+ for (i = 0; i < backupInfo->tablespacecount; i++)
+ {
+ TablespaceInfo *curTsInfo = &backupInfo->tsInfo[i];
+
+ backupInfo->worker_files[i] = palloc0(sizeof(SimpleStringList) * numWorkers);
+ for (int j = 0; j < curTsInfo->numFiles; j++)
+ {
+ simple_string_list_append(&backupInfo->worker_files[i][j % numWorkers],
+ curTsInfo->backupFiles[j].name);
+ }
+ }
+
+ create_workers_and_fetch(backupInfo);
+ ParallelBackupEnd();
+ }
+ else
+ {
+ /*
+ * Start receiving chunks
+ */
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ if (format == 't')
+ ReceiveTarFile(conn, res, i, 0);
+ else
+ ReceiveAndUnpackTarFile(conn, res, i);
+ } /* Loop over all tablespaces */
+ }
if (showprogress)
{
- progress_report(PQntuples(res), NULL, true);
+ progress_report(PQntuples(tablespacehdr), NULL, true);
if (isatty(fileno(stderr)))
fprintf(stderr, "\n"); /* Need to move to next line */
}
@@ -2043,6 +2180,7 @@ main(int argc, char **argv)
{"waldir", required_argument, NULL, 1},
{"no-slot", no_argument, NULL, 2},
{"no-verify-checksums", no_argument, NULL, 3},
+ {"jobs", required_argument, NULL, 'j'},
{NULL, 0, NULL, 0}
};
int c;
@@ -2070,7 +2208,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvPj:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2211,6 +2349,9 @@ main(int argc, char **argv)
case 3:
verify_checksums = false;
break;
+ case 'j': /* number of jobs */
+ numWorkers = atoi(optarg);
+ break;
default:
/*
@@ -2325,6 +2466,14 @@ main(int argc, char **argv)
}
}
+ if (numWorkers <= 0)
+ {
+ pg_log_error("invalid number of parallel jobs");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
#ifndef HAVE_LIBZ
if (compresslevel != 0)
{
@@ -2397,3 +2546,367 @@ main(int argc, char **argv)
success = true;
return 0;
}
+
+static void
+ParallelBackupEnd(void)
+{
+ PGresult *res = NULL;
+ char *basebkp;
+
+ basebkp = psprintf("STOP_BACKUP LABEL '%s' %s %s",
+ backupInfo->backup_label,
+ includewal == FETCH_WAL ? "WAL" : "",
+ includewal == NO_WAL ? "" : "NOWAIT");
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not execute STOP BACKUP \"%s\"",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /* receive pg_control and wal files */
+ if (format == 't')
+ ReceiveTarFile(conn, res, tablespacecount, numWorkers);
+ else
+ ReceiveAndUnpackTarFile(conn, res, tablespacecount);
+
+ PQclear(res);
+}
+
+static void
+GetBackupFilesList(PGconn *conn, BackupInfo *backupInfo)
+{
+ int i;
+ PGresult *res = NULL;
+ char *basebkp;
+
+ backupInfo->tsInfo = palloc0(sizeof(TablespaceInfo) * backupInfo->tablespacecount);
+ TablespaceInfo *tsInfo = backupInfo->tsInfo;
+
+ /*
+ * Get list of files.
+ */
+ basebkp = psprintf("SEND_FILE_LIST %s",
+ format == 't' ? "TABLESPACE_MAP" : "");
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not send replication command \"%s\": %s",
+ "SEND_FILE_LIST", PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /*
+ * The list of files is grouped by tablespaces, and we want to fetch them
+ * in the same order.
+ */
+ for (i = 0; i < backupInfo->tablespacecount; i++)
+ {
+ bool basetablespace;
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get backup header: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
+ basetablespace = PQgetisnull(tablespacehdr, i, 0);
+ tsInfo[i].tblspcOid = atol(PQgetvalue(tablespacehdr, i, 0));
+ tsInfo[i].tablespace = PQgetvalue(tablespacehdr, i, 1);
+ tsInfo[i].numFiles = PQntuples(res);
+ tsInfo[i].backupFiles =
+ palloc0(sizeof(BackupFile) * tsInfo[i].numFiles);
+
+ for (int j = 0; j < tsInfo[i].numFiles; j++)
+ {
+ char *name = PQgetvalue(res, j, 0);
+ char type = PQgetvalue(res, j, 1)[0];
+ int32 size = atol(PQgetvalue(res, j, 2));
+ time_t mtime = atol(PQgetvalue(res, j, 3));
+
+ /*
+ * In 'plain' format, create backup directories first.
+ */
+ if (format == 'p' && type == 'd')
+ create_backup_dirs(basetablespace, tsInfo[i].tablespace, name);
+
+ strlcpy(tsInfo[i].backupFiles[j].name, name, MAXPGPATH);
+ tsInfo[i].backupFiles[j].type = type;
+ tsInfo[i].backupFiles[j].size = size;
+ tsInfo[i].backupFiles[j].mtime = mtime;
+ }
+
+ /* sort files in descending order, based on size */
+ qsort(tsInfo[i].backupFiles, tsInfo[i].numFiles,
+ sizeof(BackupFile), &compareFileSize);
+ PQclear(res);
+ }
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data: %s", PQerrorMessage(conn));
+ exit(1);
+ }
+ res = PQgetResult(conn);
+}
+
+static int
+ReceiveFiles(BackupInfo *backupInfo, int worker)
+{
+ SimpleStringListCell *cell;
+ PGresult *res = NULL;
+ PGconn *worker_conn;
+ int i;
+
+ worker_conn = GetConnection();
+ for (i = 0; i < backupInfo->tablespacecount; i++)
+ {
+ TablespaceInfo *curTsInfo = &backupInfo->tsInfo[i];
+ SimpleStringList *files = &backupInfo->worker_files[i][worker];
+ PQExpBuffer buf = createPQExpBuffer();
+
+ if (simple_list_length(files) <= 0)
+ continue;
+
+
+ /*
+ * Build the query in the form: SEND_FILES_CONTENT ('base/1/1245/32683',
+ * 'base/1/1245/32684', ...) [options]
+ */
+ appendPQExpBuffer(buf, "SEND_FILES_CONTENT (");
+ for (cell = files->head; cell; cell = cell->next)
+ {
+ if (cell != files->tail)
+ appendPQExpBuffer(buf, "'%s' ,", cell->val);
+ else
+ appendPQExpBuffer(buf, "'%s'", cell->val);
+ }
+ appendPQExpBufferStr(buf, ")");
+
+ /*
+ * Add backup options to the command. We are reusing the LABEL here to
+ * keep the original tablespace path on the server.
+ */
+ appendPQExpBuffer(buf, " LABEL '%s' LSN '%s' %s %s",
+ curTsInfo->tablespace,
+ backupInfo->xlogstart,
+ format == 't' ? "TABLESPACE_MAP" : "",
+ verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ if (maxrate > 0)
+ appendPQExpBuffer(buf, " MAX_RATE %u", maxrate);
+
+ if (!worker_conn)
+ return 1;
+
+ if (PQsendQuery(worker_conn, buf->data) == 0)
+ {
+ pg_log_error("could not send files list \"%s\"",
+ PQerrorMessage(worker_conn));
+ return 1;
+ }
+
+ destroyPQExpBuffer(buf);
+ if (format == 't')
+ ReceiveTarFile(worker_conn, tablespacehdr, i, worker);
+ else
+ ReceiveAndUnpackTarFile(worker_conn, tablespacehdr, i);
+
+ res = PQgetResult(worker_conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data stream: %s",
+ PQerrorMessage(worker_conn));
+ exit(1);
+ }
+
+ res = PQgetResult(worker_conn);
+ }
+
+ PQclear(res);
+ PQfinish(worker_conn);
+
+ return 0;
+}
+
+/* qsort comparator for BackupFile (sorts in descending order of size) */
+static int
+compareFileSize(const void *a, const void *b)
+{
+ const BackupFile *v1 = (BackupFile *) a;
+ const BackupFile *v2 = (BackupFile *) b;
+
+ if (v1->size > v2->size)
+ return -1;
+ if (v1->size < v2->size)
+ return 1;
+
+ return 0;
+}
+
+static void
+create_workers_and_fetch(BackupInfo *backupInfo)
+{
+ int status;
+ int pid,
+ i;
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ pid = fork();
+ if (pid == 0)
+ {
+ /* in child process */
+ _exit(ReceiveFiles(backupInfo, i));
+ }
+ else if (pid < 0)
+ {
+ pg_log_error("could not create backup worker: %m");
+ exit(1);
+ }
+
+ simple_oid_list_append(&workerspid, pid);
+ if (verbose)
+ pg_log_info("backup worker (%d) created", pid);
+
+ /*
+ * Else we are in the parent process and all is well.
+ */
+ }
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ pid = waitpid(-1, &status, 0);
+
+ if (WIFEXITED(status) && WEXITSTATUS(status) == EXIT_FAILURE)
+ {
+ SimpleOidListCell *cell;
+
+ pg_log_error("backup worker (%d) failed with code %d", pid, WEXITSTATUS(status));
+
+ /* error. kill other workers and exit. */
+ for (cell = workerspid.head; cell; cell = cell->next)
+ {
+ if (pid != cell->val)
+ {
+ kill(cell->val, SIGTERM);
+ pg_log_error("backup worker killed %d", cell->val);
+ }
+ }
+
+ exit(1);
+ }
+ }
+}
+
+static void
+read_label_tblspcmap(PGconn *conn, char **backuplabel, char **tblspc_map)
+{
+ PGresult *res = NULL;
+
+ Assert(backuplabel != NULL);
+ Assert(tblspc_map != NULL);
+
+ /*
+ * Get Backup label and tablespace map data.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get data: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
+ *backuplabel = PQgetvalue(res, 0, 0); /* backup_label */
+ if (!PQgetisnull(res, 0, 1))
+ *tblspc_map = PQgetvalue(res, 0, 1); /* tablespace_map */
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ res = PQgetResult(conn);
+ PQclear(res);
+}
+
+static void
+create_backup_dirs(bool basetablespace, char *tablespace, char *name)
+{
+ char dirpath[MAXPGPATH];
+
+ Assert(name != NULL);
+
+ if (basetablespace)
+ snprintf(dirpath, sizeof(dirpath), "%s/%s", basedir, name);
+ else
+ {
+ Assert(tablespace != NULL);
+ snprintf(dirpath, sizeof(dirpath), "%s/%s",
+ get_tablespace_mapping(tablespace), (name + strlen(tablespace) + 1));
+ }
+
+ if (pg_mkdir_p(dirpath, pg_dir_create_mode) != 0)
+ {
+ if (errno != EEXIST)
+ {
+ pg_log_error("could not create directory \"%s\": %m",
+ dirpath);
+ exit(1);
+ }
+ }
+}
+
+static void
+writefile(char *path, char *buf)
+{
+ FILE *f;
+ char pathbuf[MAXPGPATH];
+
+ snprintf(pathbuf, MAXPGPATH, "%s/%s", basedir, path);
+ f = fopen(pathbuf, "w");
+ if (f == NULL)
+ {
+ pg_log_error("could not open file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fwrite(buf, strlen(buf), 1, f) != 1)
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fclose(f))
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+}
+
+static int
+simple_list_length(SimpleStringList *list)
+{
+ int len = 0;
+ SimpleStringListCell *cell;
+
+ for (cell = list->head; cell; cell = cell->next, len++)
+ ;
+
+ return len;
+}
--
2.21.0 (Apple Git-122)
0002-backend-changes-for-parallel-backup.patch
From 3e62b74a0e8d22df942f625a343d1d6254ad1b08 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Sun, 13 Oct 2019 22:59:28 +0500
Subject: [PATCH 2/4] backend changes for parallel backup
---
src/backend/replication/basebackup.c | 589 ++++++++++++++++++++++++-
src/backend/replication/repl_gram.y | 72 +++
src/backend/replication/repl_scanner.l | 7 +
src/include/nodes/replnodes.h | 10 +
src/include/replication/basebackup.h | 2 +-
5 files changed, 670 insertions(+), 10 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index a05a97ded2..cc262e49b8 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -41,6 +41,7 @@
#include "utils/ps_status.h"
#include "utils/relcache.h"
#include "utils/timestamp.h"
+#include "utils/pg_lsn.h"
typedef struct
@@ -52,11 +53,34 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ bool exclusive;
+ XLogRecPtr lsn;
} basebackup_options;
+typedef struct
+{
+ char name[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+} BackupFile;
+
+#define STORE_BACKUPFILE(_backupfiles, _name, _type, _size, _mtime) \
+ do { \
+ if (_backupfiles != NULL) { \
+ BackupFile *file = palloc0(sizeof(BackupFile)); \
+ strlcpy(file->name, _name, sizeof(file->name)); \
+ file->type = _type; \
+ file->size = _size; \
+ file->mtime = _mtime; \
+ *_backupfiles = lappend(*_backupfiles, file); \
+ } \
+ } while(0)
static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks);
+static int64 sendDir_(const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks, List **files);
static bool sendFile(const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
@@ -76,6 +100,12 @@ static void throttle(size_t increment);
static void setup_throttle(int maxrate);
static bool is_checksummed_file(const char *fullpath, const char *filename);
+static void StartBackup(basebackup_options *opt);
+static void StopBackup(basebackup_options *opt);
+static void SendFileList(basebackup_options *opt);
+static void SendFilesContents(basebackup_options *opt, List *filenames, bool missing_ok);
+static char *readfile(const char *readfilename, bool missing_ok);
+
/* Was the backup currently in-progress initiated in recovery mode? */
static bool backup_started_in_recovery = false;
@@ -338,7 +368,7 @@ perform_base_backup(basebackup_options *opt)
sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
}
else
- sendTablespace(ti->path, false);
+ sendTablespace(ti->path, false, NULL);
/*
* If we're including WAL, and this is the main data directory we
@@ -413,6 +443,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_maxrate = false;
bool o_tablespace_map = false;
bool o_noverify_checksums = false;
+ bool o_exclusive = false;
+ bool o_lsn = false;
MemSet(opt, 0, sizeof(*opt));
foreach(lopt, options)
@@ -501,6 +533,30 @@ parse_basebackup_options(List *options, basebackup_options *opt)
noverify_checksums = true;
o_noverify_checksums = true;
}
+ else if (strcmp(defel->defname, "exclusive") == 0)
+ {
+ if (o_exclusive)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+
+ opt->exclusive = intVal(defel->arg);
+ o_exclusive = true;
+ }
+ else if (strcmp(defel->defname, "lsn") == 0)
+ {
+ bool have_error = false;
+ char *lsn;
+
+ if (o_lsn)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+
+ lsn = strVal(defel->arg);
+ opt->lsn = pg_lsn_in_internal(lsn, &have_error);
+ o_lsn = true;
+ }
else
elog(ERROR, "option \"%s\" not recognized",
defel->defname);
@@ -535,7 +591,29 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg, false);
}
- perform_base_backup(&opt);
+ switch (cmd->cmdtag)
+ {
+ case BASE_BACKUP:
+ perform_base_backup(&opt);
+ break;
+ case START_BACKUP:
+ StartBackup(&opt);
+ break;
+ case SEND_FILE_LIST:
+ SendFileList(&opt);
+ break;
+ case SEND_FILES_CONTENT:
+ SendFilesContents(&opt, cmd->backupfiles, true);
+ break;
+ case STOP_BACKUP:
+ StopBackup(&opt);
+ break;
+
+ default:
+ elog(ERROR, "unrecognized replication command tag: %u",
+ cmd->cmdtag);
+ break;
+ }
}
static void
@@ -678,6 +756,61 @@ SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
pq_puttextmessage('C', "SELECT");
}
+/*
+ * Send a single resultset containing backup label and tablespace map
+ */
+static void
+SendStartBackupResult(StringInfo labelfile, StringInfo tblspc_map_file)
+{
+ StringInfoData buf;
+ Size len;
+
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 2); /* 2 fields */
+
+ /* Field headers */
+ pq_sendstring(&buf, "label");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ pq_sendstring(&buf, "tablespacemap");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ /* Data row */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 2); /* number of columns */
+
+ len = labelfile->len;
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, labelfile->data, len);
+
+ if (tblspc_map_file)
+ {
+ len = tblspc_map_file->len;
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, tblspc_map_file->data, len);
+ }
+ else
+ {
+ pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
+ }
+
+ pq_endmessage(&buf);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
/*
* Inject a file with given name and content in the output tar stream.
*/
@@ -729,7 +862,7 @@ sendFileWithContent(const char *filename, const char *content)
* Only used to send auxiliary tablespaces, not PGDATA.
*/
int64
-sendTablespace(char *path, bool sizeonly)
+sendTablespace(char *path, bool sizeonly, List **files)
{
int64 size;
char pathbuf[MAXPGPATH];
@@ -758,11 +891,11 @@ sendTablespace(char *path, bool sizeonly)
return 0;
}
+ STORE_BACKUPFILE(files, pathbuf, 'd', -1, statbuf.st_mtime);
size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
sizeonly);
-
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true);
+ size += sendDir_(pathbuf, strlen(path), sizeonly, NIL, true, files);
return size;
}
@@ -780,8 +913,16 @@ sendTablespace(char *path, bool sizeonly)
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
- bool sendtblspclinks)
+sendDir(const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks)
+{
+ return sendDir_(path, basepathlen, sizeonly, tablespaces, sendtblspclinks, NULL);
+}
+
+/* Same as sendDir(), except that it also returns a list of filenames in PGDATA */
+static int64
+sendDir_(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
+ bool sendtblspclinks, List **files)
{
DIR *dir;
struct dirent *de;
@@ -935,6 +1076,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
+
+ STORE_BACKUPFILE(files, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
excludeFound = true;
break;
@@ -951,6 +1094,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
+
+ STORE_BACKUPFILE(files, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
continue;
}
@@ -972,6 +1117,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
sizeonly);
+ STORE_BACKUPFILE(files, pathbuf, 'd', -1, statbuf.st_mtime);
+ STORE_BACKUPFILE(files, "./pg_wal/archive_status", 'd', -1, statbuf.st_mtime);
+
continue; /* don't recurse into pg_wal */
}
@@ -1001,6 +1149,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
+ STORE_BACKUPFILE(files, pathbuf, 'l', statbuf.st_size, statbuf.st_mtime);
size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
&statbuf, sizeonly);
#else
@@ -1027,6 +1176,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
*/
size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
sizeonly);
+ STORE_BACKUPFILE(files, pathbuf, 'd', -1, statbuf.st_mtime);
+
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1057,13 +1208,15 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces, sendtblspclinks);
+ size += sendDir_(pathbuf, basepathlen, sizeonly, tablespaces, sendtblspclinks, files);
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
- if (!sizeonly)
+ STORE_BACKUPFILE(files, pathbuf, 'f', statbuf.st_size, statbuf.st_mtime);
+
+ if (!sizeonly && files == NULL)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid);
@@ -1769,3 +1922,421 @@ setup_throttle(int maxrate)
throttling_counter = -1;
}
}
+
+/*
+ * StartBackup - prepare to start an online backup.
+ *
+ * This function calls do_pg_start_backup() and sends back the starting
+ * checkpoint location, the available tablespaces, and the contents of the
+ * backup_label and tablespace_map files.
+ */
+static void
+StartBackup(basebackup_options *opt)
+{
+ TimeLineID starttli;
+ StringInfo labelfile;
+ StringInfo tblspc_map_file = NULL;
+ int datadirpathlen;
+ List *tablespaces = NIL;
+
+ datadirpathlen = strlen(DataDir);
+
+ backup_started_in_recovery = RecoveryInProgress();
+
+ labelfile = makeStringInfo();
+ tblspc_map_file = makeStringInfo();
+
+ total_checksum_failures = 0;
+
+ startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
+ opt->exclusive? NULL : labelfile, &tablespaces,
+ tblspc_map_file,
+ opt->progress, opt->sendtblspcmapfile);
+
+ /*
+ * Once do_pg_start_backup has been called, ensure that any failure causes
+ * us to abort the backup so we don't "leak" a backup counter. For this
+ * reason, *all* functionality between do_pg_start_backup() and the end of
+ * do_pg_stop_backup() should be inside the error cleanup block!
+ */
+
+ PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+ {
+ tablespaceinfo *ti;
+
+ SendXlogRecPtrResult(startptr, starttli);
+
+ /*
+ * Calculate the relative path of temporary statistics directory in
+ * order to skip the files which are located in that directory later.
+ */
+ if (is_absolute_path(pgstat_stat_directory) &&
+ strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
+ else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory);
+ else
+ statrelpath = pgstat_stat_directory;
+
+ /* Add a node for the base directory at the end */
+ ti = palloc0(sizeof(tablespaceinfo));
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
+ tablespaces = lappend(tablespaces, ti);
+
+ /* Send tablespace header */
+ SendBackupHeader(tablespaces);
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ /*
+ * In exclusive mode, pg_start_backup creates backup_label and
+ * tablespace_map files and does not return their contents in *labelfile
+ * and *tblspcmapfile. So we read them from these files to return them
+ * to the frontend.
+ *
+ * In non-exclusive mode, the contents of these files are available in
+ * *labelfile and *tblspcmapfile and are returned directly.
+ */
+ if (opt->exclusive)
+ {
+ resetStringInfo(labelfile);
+ resetStringInfo(tblspc_map_file);
+
+ appendStringInfoString(labelfile, readfile(BACKUP_LABEL_FILE, false));
+ if (opt->sendtblspcmapfile)
+ appendStringInfoString(tblspc_map_file, readfile(TABLESPACE_MAP, false));
+ }
+
+ if ((tblspc_map_file && tblspc_map_file->len <= 0) ||
+ !opt->sendtblspcmapfile)
+ tblspc_map_file = NULL;
+
+ /* send backup_label and tablespace_map to frontend */
+ SendStartBackupResult(labelfile, tblspc_map_file);
+ }
+ PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+}
+
+/*
+ * StopBackup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out the
+ * pg_control file, optionally the WAL segments, and the ending WAL location.
+ */
+static void
+StopBackup(basebackup_options *opt)
+{
+ TimeLineID endtli;
+ XLogRecPtr endptr;
+ struct stat statbuf;
+ StringInfoData buf;
+ char *labelfile = NULL;
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ /* ... and pg_control after everything else. */
+ if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ XLOG_CONTROL_FILE)));
+ sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
+
+ /* stop backup */
+ if (!opt->exclusive)
+ labelfile = (char *) opt->label;
+ endptr = do_pg_stop_backup(labelfile, !opt->nowait, &endtli);
+
+ if (opt->includewal)
+ include_wal_files(endptr, endtli);
+
+ pq_putemptymessage('c'); /* CopyDone */
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+/*
+ * SendFileList() - sends a list of filenames to frontend
+ *
+ * The function collects a list of filenames necessary for a complete backup and
+ * sends this list to the client.
+ */
+static void
+SendFileList(basebackup_options *opt)
+{
+ StringInfoData buf;
+ ListCell *lc;
+ List *tablespaces = NIL;
+ StringInfo tblspc_map_file = NULL;
+
+ tblspc_map_file = makeStringInfo();
+ collectTablespaces(&tablespaces, tblspc_map_file, false, false);
+
+ /* Add a node for the base directory at the end */
+ tablespaceinfo *ti = palloc0(sizeof(tablespaceinfo));
+ tablespaces = lappend(tablespaces, ti);
+
+ foreach(lc, tablespaces)
+ {
+ List *backupFiles = NULL;
+ tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
+
+ if (ti->path == NULL)
+ sendDir_(".", 1, false, NIL, !opt->sendtblspcmapfile, &backupFiles);
+ else
+ sendTablespace(ti->path, false, &backupFiles);
+
+ /* Construct and send the list of filenames */
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 4); /* n field */
+
+ /* First field - file name */
+ pq_sendstring(&buf, "filename");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Second field - type */
+ pq_sendstring(&buf, "type");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, CHAROID);
+ pq_sendint16(&buf, 1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Fourth field - mtime */
+ pq_sendstring(&buf, "mtime");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ foreach(lc, backupFiles)
+ {
+ BackupFile *backupFile = (BackupFile *) lfirst(lc);
+ Size len;
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 4); /* number of columns */
+
+ /* send file name */
+ len = strlen(backupFile->name);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, backupFile->name, len);
+
+ /* send type */
+ pq_sendint32(&buf, 1);
+ pq_sendbyte(&buf, backupFile->type);
+
+ /* send size */
+ send_int8_string(&buf, backupFile->size);
+
+ /* send mtime */
+ send_int8_string(&buf, backupFile->mtime);
+
+ pq_endmessage(&buf);
+ }
+
+ pfree(backupFiles);
+ }
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * SendFilesContents() - sends the actual files to the caller
+ *
+ * The function sends out the given file(s) over to the caller using the COPY
+ * protocol.
+ */
+static void
+SendFilesContents(basebackup_options *opt, List *filenames, bool missing_ok)
+{
+ StringInfoData buf;
+ ListCell *lc;
+ bool basetablespace = true;
+ int basepathlen = 1;
+
+ if (list_length(filenames) <= 0)
+ return;
+
+ total_checksum_failures = 0;
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ /*
+ * LABEL is reused here to identify the tablespace path on the server. It's
+ * empty in the case of the 'base' tablespace.
+ */
+ if (is_absolute_path(opt->label))
+ {
+ basepathlen = strlen(opt->label);
+ basetablespace = false;
+ }
+
+ /* set backup start location. */
+ startptr = opt->lsn;
+
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ foreach(lc, filenames)
+ {
+ struct stat statbuf;
+ char *pathbuf;
+
+ pathbuf = (char *) strVal(lfirst(lc));
+ if (lstat(pathbuf, &statbuf) != 0)
+ {
+ if (errno != ENOENT)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file or directory \"%s\": %m",
+ pathbuf)));
+
+ /* If the file went away while scanning, it's not an error. */
+ continue;
+ }
+
+ /* Allow symbolic links in pg_tblspc only */
+ if (strstr(pathbuf, "./pg_tblspc") != NULL &&
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ char linkpath[MAXPGPATH];
+ int rllen;
+
+ rllen = readlink(pathbuf, linkpath, sizeof(linkpath));
+ if (rllen < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read symbolic link \"%s\": %m",
+ pathbuf)));
+ if (rllen >= sizeof(linkpath))
+ ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("symbolic link \"%s\" target is too long",
+ pathbuf)));
+ linkpath[rllen] = '\0';
+
+ _tarWriteHeader(pathbuf, linkpath, &statbuf, false);
+ }
+ else if (S_ISDIR(statbuf.st_mode))
+ {
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
+ else if (
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ /*
+ * If symlink, write it as a directory. File symlinks are only allowed
+ * in pg_tblspc.
+ */
+ statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
+ else
+ {
+ /* send file to client */
+ sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true, InvalidOid);
+ }
+ }
+
+ pq_putemptymessage('c'); /* CopyDone */
+
+ /*
+ * Check for checksum failures. If there are failures across multiple
+ * processes, it may not report the total checksum count, but it will
+ * error out, terminating the backup.
+ */
+ if (total_checksum_failures)
+ {
+ if (total_checksum_failures > 1)
+ ereport(WARNING,
+ (errmsg("%lld total checksum verification failures", total_checksum_failures)));
+
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("checksum verification failure during base backup")));
+ }
+}
+
+static char *
+readfile(const char *readfilename, bool missing_ok)
+{
+ struct stat statbuf;
+ FILE *fp;
+ char *data;
+ int r;
+
+ if (stat(readfilename, &statbuf))
+ {
+ if (errno == ENOENT && missing_ok)
+ return NULL;
+
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ readfilename)));
+ }
+
+ fp = AllocateFile(readfilename, "r");
+ if (!fp)
+ {
+ if (errno == ENOENT && missing_ok)
+ return NULL;
+
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m", readfilename)));
+ }
+
+ data = palloc(statbuf.st_size + 1);
+ r = fread(data, statbuf.st_size, 1, fp);
+ data[statbuf.st_size] = '\0';
+
+ /* Close the file */
+ if (r != 1 || ferror(fp) || FreeFile(fp))
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read file \"%s\": %m",
+ readfilename)));
+
+ return data;
+}
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index c4e11cc4e8..bba437c785 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -87,6 +87,12 @@ static SQLCmd *make_sqlcmd(void);
%token K_EXPORT_SNAPSHOT
%token K_NOEXPORT_SNAPSHOT
%token K_USE_SNAPSHOT
+%token K_START_BACKUP
+%token K_SEND_FILE_LIST
+%token K_SEND_FILES_CONTENT
+%token K_STOP_BACKUP
+%token K_EXCLUSIVE
+%token K_LSN
%type <node> command
%type <node> base_backup start_replication start_logical_replication
@@ -102,6 +108,8 @@ static SQLCmd *make_sqlcmd(void);
%type <boolval> opt_temporary
%type <list> create_slot_opt_list
%type <defelt> create_slot_opt
+%type <list> backup_files backup_files_list
+%type <node> backup_file
%%
@@ -162,6 +170,36 @@ base_backup:
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
+ cmd->cmdtag = BASE_BACKUP;
+ $$ = (Node *) cmd;
+ }
+ | K_START_BACKUP base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = START_BACKUP;
+ $$ = (Node *) cmd;
+ }
+ | K_SEND_FILE_LIST base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = SEND_FILE_LIST;
+ $$ = (Node *) cmd;
+ }
+ | K_SEND_FILES_CONTENT backup_files base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $3;
+ cmd->cmdtag = SEND_FILES_CONTENT;
+ cmd->backupfiles = $2;
+ $$ = (Node *) cmd;
+ }
+ | K_STOP_BACKUP base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = STOP_BACKUP;
$$ = (Node *) cmd;
}
;
@@ -214,6 +252,40 @@ base_backup_opt:
$$ = makeDefElem("noverify_checksums",
(Node *)makeInteger(true), -1);
}
+ | K_EXCLUSIVE
+ {
+ $$ = makeDefElem("exclusive",
+ (Node *)makeInteger(true), -1);
+ }
+ | K_LSN SCONST
+ {
+ $$ = makeDefElem("lsn",
+ (Node *)makeString($2), -1);
+ }
+ ;
+
+backup_files:
+ '(' backup_files_list ')'
+ {
+ $$ = $2;
+ }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+backup_files_list:
+ backup_file
+ {
+ $$ = list_make1($1);
+ }
+ | backup_files_list ',' backup_file
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+backup_file:
+ SCONST { $$ = (Node *) makeString($1); }
;
create_replication_slot:
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 380faeb5f6..f97fe804ff 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -107,6 +107,13 @@ EXPORT_SNAPSHOT { return K_EXPORT_SNAPSHOT; }
NOEXPORT_SNAPSHOT { return K_NOEXPORT_SNAPSHOT; }
USE_SNAPSHOT { return K_USE_SNAPSHOT; }
WAIT { return K_WAIT; }
+START_BACKUP { return K_START_BACKUP; }
+SEND_FILE_LIST { return K_SEND_FILE_LIST; }
+SEND_FILES_CONTENT { return K_SEND_FILES_CONTENT; }
+STOP_BACKUP { return K_STOP_BACKUP; }
+EXCLUSIVE { return K_EXCLUSIVE; }
+LSN { return K_LSN; }
+
"," { return ','; }
";" { return ';'; }
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index 1e3ed4e19f..1a224122a2 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -23,6 +23,14 @@ typedef enum ReplicationKind
REPLICATION_KIND_LOGICAL
} ReplicationKind;
+typedef enum BackupCmdTag
+{
+ BASE_BACKUP,
+ START_BACKUP,
+ SEND_FILE_LIST,
+ SEND_FILES_CONTENT,
+ STOP_BACKUP
+} BackupCmdTag;
/* ----------------------
* IDENTIFY_SYSTEM command
@@ -42,6 +50,8 @@ typedef struct BaseBackupCmd
{
NodeTag type;
List *options;
+ BackupCmdTag cmdtag;
+ List *backupfiles;
} BaseBackupCmd;
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index 503a5b9f0b..9e792af99d 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -31,6 +31,6 @@ typedef struct
extern void SendBaseBackup(BaseBackupCmd *cmd);
-extern int64 sendTablespace(char *path, bool sizeonly);
+extern int64 sendTablespace(char *path, bool sizeonly, List **files);
#endif /* _BASEBACKUP_H */
--
2.21.0 (Apple Git-122)
I quickly tried to have a look at your 0001-refactor patch.
Here are some comments:
1. The patch fails to compile.
Sorry if I am missing something, but I am not able to understand why, in the
new function collectTablespaces(), you have added an extra parameter NULL
while calling sendTablespace(); it fails the compilation:
+ ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
-Wno-unused-command-line-argument -g -g -O0 -Wall -Werror
-I../../../../src/include -c -o xlog.o xlog.c -MMD -MP -MF .deps/xlog.Po
xlog.c:12253:59: error: too many arguments to function call, expected 2,
have 3
ti->size = infotbssize ? sendTablespace(fullpath, true,
NULL) : -1;
~~~~~~~~~~~~~~ ^~~~
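I am guessing the signature change was meant to come in a later patch; for
this patch to compile on its own, the prototype would already have to look
something like this (my guess at the intended split):

/* guess: the refactoring patch itself would need the updated prototype */
extern int64 sendTablespace(char *path, bool sizeonly, List **files);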
2. I think the patch needs to be run through pgindent. It does not follow
the 80-column width.
e.g.
e.g.
+void
+collectTablespaces(List **tablespaces, StringInfo tblspcmapfile, bool
infotbssize, bool needtblspcmapfile)
+{
3. The comments in the re-factored code appear to be redundant. For
example, the following comment:
/* Setup and activate network throttling, if client requested it */
appears thrice in the code: before calling setup_throttle(), in the
prologue of the function setup_throttle(), and above the if() in that
function. Similarly, the comment:
/* Collect information about all tablespaces */
appears in collectTablespaces().
4. In function include_wal_files(), why is the TimeLineID parameter
(endtli) needed? I don't see it being used in the function at all. I think
you can safely get rid of it:
+include_wal_files(XLogRecPtr endptr, TimeLineID endtli)
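Something like this is all that seems needed (just a sketch of the
prototype change I have in mind; the call site would drop the second
argument accordingly):

/* sketch: unused TimeLineID argument removed */
static void include_wal_files(XLogRecPtr endptr);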
Regards,
Jeevan Ladhe
On Wed, Oct 16, 2019 at 6:49 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Mon, Oct 7, 2019 at 6:35 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Mon, Oct 7, 2019 at 6:05 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com>
wrote:

Sure. Though the backup manifest patch calculates and includes the
checksum of backup files, and this is done while the file is being
transferred to the frontend. The manifest
file itself is copied at the
very end of the backup. In parallel backup, I need the list of
filenames before file contents are transferred, in
order to divide them into multiple workers. For that, the manifest
file has to be available when START_BACKUP
is called.
That means the backup manifest should support its creation while
excluding the checksums during START_BACKUP().
I also need the directory information, for two reasons:
- In plain format, the base path has to exist before we can write the
file. We can extract the base path from the file name, but doing that
for all files does not seem like a good idea.
- Base backup does not include the contents of some directories, but
those directories, although empty, are still expected in PGDATA.
I can make these changes part of parallel backup (which would be on
top of the backup manifest patch), or they can be done as part of the
manifest patch and parallel backup can then use them. Robert, what do
you suggest?
I think we should probably not use backup manifests here, actually. I
initially thought that would be a good idea, but after further thought
it seems like it just complicates the code to no real benefit.

Okay.

I suggest that the START_BACKUP command just return a result set, like a
query, with perhaps four columns: file name, file type ('d' for
directory or 'f' for file), file size, file mtime. pg_basebackup will
ignore the mtime, but some other tools might find that useful
information.

Yes, the current patch already returns the result set; I will add the
additional information.

I wonder if we should also split START_BACKUP (which should enter
non-exclusive backup mode) from GET_FILE_LIST, in case some other
client program wants to use one of those but not the other. I think
that's probably a good idea, but not sure.

Currently pg_basebackup does not enter exclusive backup mode, and other
tools have to use the pg_start_backup() and pg_stop_backup() functions
to achieve that. Since we are breaking the backup into multiple
commands, I believe it would be a good idea to have this option. I will
include it in the next revision of this patch.

I still think that the files should be requested one at a time, not a
huge long list in a single command.

Sure, will make the change.
I have refactored the functionality into multiple smaller patches in order
to make the review process easier. I have divided the code into backend
changes and pg_basebackup changes. The backend replication system now
supports the following commands:
- START_BACKUP
- SEND_FILE_LIST
- SEND_FILES_CONTENT
- STOP_BACKUP

START_BACKUP will no longer return the list of files; SEND_FILE_LIST is
used for that. START_BACKUP now calls pg_start_backup and returns the
starting WAL position, the tablespace header information, and the content
of the backup label file. Initially I was using tmp files to store the
backup_label content, but that turned out to be a bad idea, because there
can be multiple non-exclusive backups running. The backup label
information is needed by stop_backup, so pg_basebackup will send it as
part of STOP_BACKUP.

SEND_FILE_LIST will return the list of files, as a result set having four
columns (filename, type, size, mtime).

SEND_FILES_CONTENT can now return a single file or multiple files, as
required. There is not much change required to support both, so I believe
the command will be more useable this way if other tools want to utilise
it.

As per Robert's suggestion, I am currently working on making changes in
pg_basebackup to fetch files one by one. However, that's not complete yet,
and the attached patch still uses the old method of multi-file fetching to
test the backend commands. I will send an updated patch containing the
changes for fetching files one by one. I wanted to share the backend patch
to get some feedback in the meantime.
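To illustrate how a client is expected to drive these commands, here is a
rough libpq sketch (untested, just to show the intended flow; the helper
name list_backup_files is made up, and error handling and the actual file
transfer are elided):

#include <stdio.h>
#include "libpq-fe.h"

/*
 * Rough sketch: print the file list returned by SEND_FILE_LIST over a
 * replication connection (conninfo must include replication=true).
 * SEND_FILE_LIST returns one result set per tablespace.
 */
static void
list_backup_files(PGconn *conn)
{
    PGresult   *res;

    if (PQsendQuery(conn, "SEND_FILE_LIST") == 0)
    {
        fprintf(stderr, "could not send command: %s", PQerrorMessage(conn));
        return;
    }

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
        {
            for (int i = 0; i < PQntuples(res); i++)
                printf("%s type=%s size=%s mtime=%s\n",
                       PQgetvalue(res, i, 0),   /* filename */
                       PQgetvalue(res, i, 1),   /* type: 'd', 'f' or 'l' */
                       PQgetvalue(res, i, 2),   /* size in bytes */
                       PQgetvalue(res, i, 3));  /* mtime */
        }
        PQclear(res);
    }
}

A worker would then issue SEND_FILES_CONTENT for its share of this list and
consume the resulting COPY stream, much as pg_basebackup does today for
BASE_BACKUP.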
Thanks,
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
On Thu, Oct 17, 2019 at 1:33 AM Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
wrote:
Thanks Jeevan. Some changes that should have been part of the 2nd patch
were left in the 1st. I have fixed that, and the above-mentioned issues as
well.
Attached are the updated patches.
Thanks,
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Attachments:
0001-Refactor-some-basebackup-code-to-increase-reusabilit.patch
From f5bdfddd5efba1b66ab30c7220ae6b62b312337a Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 9 Oct 2019 12:39:41 +0500
Subject: [PATCH 1/4] Refactor some basebackup code to increase reusability, in
anticipation of adding parallel backup
---
src/backend/access/transam/xlog.c | 192 +++++-----
src/backend/replication/basebackup.c | 512 ++++++++++++++-------------
src/include/access/xlog.h | 2 +
3 files changed, 371 insertions(+), 335 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0ff9af53fe..54a430d041 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10282,10 +10282,6 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
{
bool gotUniqueStartpoint = false;
- DIR *tblspcdir;
- struct dirent *de;
- tablespaceinfo *ti;
- int datadirpathlen;
/*
* Force an XLOG file switch before the checkpoint, to ensure that the
@@ -10411,93 +10407,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
if (exclusive)
tblspcmapfile = makeStringInfo();
- datadirpathlen = strlen(DataDir);
-
- /* Collect information about all tablespaces */
- tblspcdir = AllocateDir("pg_tblspc");
- while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
- {
- char fullpath[MAXPGPATH + 10];
- char linkpath[MAXPGPATH];
- char *relpath = NULL;
- int rllen;
- StringInfoData buflinkpath;
- char *s = linkpath;
-
- /* Skip special stuff */
- if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
- continue;
-
- snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
-
-#if defined(HAVE_READLINK) || defined(WIN32)
- rllen = readlink(fullpath, linkpath, sizeof(linkpath));
- if (rllen < 0)
- {
- ereport(WARNING,
- (errmsg("could not read symbolic link \"%s\": %m",
- fullpath)));
- continue;
- }
- else if (rllen >= sizeof(linkpath))
- {
- ereport(WARNING,
- (errmsg("symbolic link \"%s\" target is too long",
- fullpath)));
- continue;
- }
- linkpath[rllen] = '\0';
-
- /*
- * Add the escape character '\\' before newline in a string to
- * ensure that we can distinguish between the newline in the
- * tablespace path and end of line while reading tablespace_map
- * file during archive recovery.
- */
- initStringInfo(&buflinkpath);
-
- while (*s)
- {
- if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
- appendStringInfoChar(&buflinkpath, '\\');
- appendStringInfoChar(&buflinkpath, *s++);
- }
-
- /*
- * Relpath holds the relative path of the tablespace directory
- * when it's located within PGDATA, or NULL if it's located
- * elsewhere.
- */
- if (rllen > datadirpathlen &&
- strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
- IS_DIR_SEP(linkpath[datadirpathlen]))
- relpath = linkpath + datadirpathlen + 1;
-
- ti = palloc(sizeof(tablespaceinfo));
- ti->oid = pstrdup(de->d_name);
- ti->path = pstrdup(buflinkpath.data);
- ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
-
- if (tablespaces)
- *tablespaces = lappend(*tablespaces, ti);
-
- appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
-
- pfree(buflinkpath.data);
-#else
-
- /*
- * If the platform does not have symbolic links, it should not be
- * possible to have tablespaces - clearly somebody else created
- * them. Warn about it and ignore.
- */
- ereport(WARNING,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("tablespaces are not supported on this platform")));
-#endif
- }
- FreeDir(tblspcdir);
+ collectTablespaces(tablespaces, tblspcmapfile, infotbssize, needtblspcmapfile);
/*
* Construct backup label file
@@ -12261,3 +12171,103 @@ XLogRequestWalReceiverReply(void)
{
doRequestWalReceiverReply = true;
}
+
+/*
+ * Collect information about all tablespaces.
+ */
+void
+collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
+ bool infotbssize, bool needtblspcmapfile)
+{
+ DIR *tblspcdir;
+ struct dirent *de;
+ tablespaceinfo *ti;
+ int datadirpathlen;
+
+ datadirpathlen = strlen(DataDir);
+
+ tblspcdir = AllocateDir("pg_tblspc");
+ while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
+ {
+ char fullpath[MAXPGPATH + 10];
+ char linkpath[MAXPGPATH];
+ char *relpath = NULL;
+ int rllen;
+ StringInfoData buflinkpath;
+ char *s = linkpath;
+
+ /* Skip special stuff */
+ if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
+ continue;
+
+ snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
+
+#if defined(HAVE_READLINK) || defined(WIN32)
+ rllen = readlink(fullpath, linkpath, sizeof(linkpath));
+ if (rllen < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not read symbolic link \"%s\": %m",
+ fullpath)));
+ continue;
+ }
+ else if (rllen >= sizeof(linkpath))
+ {
+ ereport(WARNING,
+ (errmsg("symbolic link \"%s\" target is too long",
+ fullpath)));
+ continue;
+ }
+ linkpath[rllen] = '\0';
+
+ /*
+ * Add the escape character '\\' before newline in a string to
+ * ensure that we can distinguish between the newline in the
+ * tablespace path and end of line while reading tablespace_map
+ * file during archive recovery.
+ */
+ initStringInfo(&buflinkpath);
+
+ while (*s)
+ {
+ if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
+ appendStringInfoChar(&buflinkpath, '\\');
+ appendStringInfoChar(&buflinkpath, *s++);
+ }
+
+ /*
+ * Relpath holds the relative path of the tablespace directory
+ * when it's located within PGDATA, or NULL if it's located
+ * elsewhere.
+ */
+ if (rllen > datadirpathlen &&
+ strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
+ IS_DIR_SEP(linkpath[datadirpathlen]))
+ relpath = linkpath + datadirpathlen + 1;
+
+ ti = palloc(sizeof(tablespaceinfo));
+ ti->oid = pstrdup(de->d_name);
+ ti->path = pstrdup(buflinkpath.data);
+ ti->rpath = relpath ? pstrdup(relpath) : NULL;
+ ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+
+ if (tablespaces)
+ *tablespaces = lappend(*tablespaces, ti);
+
+ appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
+
+ pfree(buflinkpath.data);
+#else
+
+ /*
+ * If the platform does not have symbolic links, it should not be
+ * possible to have tablespaces - clearly somebody else created
+ * them. Warn about it and ignore.
+ */
+ ereport(WARNING,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("tablespaces are not supported on this platform")));
+#endif
+ }
+ FreeDir(tblspcdir);
+}
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d0f210de8c..5f25f5848d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -68,10 +68,12 @@ static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void base_backup_cleanup(int code, Datum arg);
static void perform_base_backup(basebackup_options *opt);
+static void include_wal_files(XLogRecPtr endptr);
static void parse_basebackup_options(List *options, basebackup_options *opt);
static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
static void throttle(size_t increment);
+static void setup_throttle(int maxrate);
static bool is_checksummed_file(const char *fullpath, const char *filename);
/* Was the backup currently in-progress initiated in recovery mode? */
@@ -293,29 +295,7 @@ perform_base_backup(basebackup_options *opt)
/* Send tablespace header */
SendBackupHeader(tablespaces);
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
-
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
-
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ setup_throttle(opt->maxrate);
/* Send off our tablespaces one by one */
foreach(lc, tablespaces)
@@ -384,227 +364,7 @@ perform_base_backup(basebackup_options *opt)
* We've left the last tar file "open", so we can now append the
* required WAL files to it.
*/
- char pathbuf[MAXPGPATH];
- XLogSegNo segno;
- XLogSegNo startsegno;
- XLogSegNo endsegno;
- struct stat statbuf;
- List *historyFileList = NIL;
- List *walFileList = NIL;
- char firstoff[MAXFNAMELEN];
- char lastoff[MAXFNAMELEN];
- DIR *dir;
- struct dirent *de;
- ListCell *lc;
- TimeLineID tli;
-
- /*
- * I'd rather not worry about timelines here, so scan pg_wal and
- * include all WAL files in the range between 'startptr' and 'endptr',
- * regardless of the timeline the file is stamped with. If there are
- * some spurious WAL files belonging to timelines that don't belong in
- * this server's history, they will be included too. Normally there
- * shouldn't be such files, but if there are, there's little harm in
- * including them.
- */
- XLByteToSeg(startptr, startsegno, wal_segment_size);
- XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
- XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
- XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
-
- dir = AllocateDir("pg_wal");
- while ((de = ReadDir(dir, "pg_wal")) != NULL)
- {
- /* Does it look like a WAL segment, and is it in the range? */
- if (IsXLogFileName(de->d_name) &&
- strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
- strcmp(de->d_name + 8, lastoff + 8) <= 0)
- {
- walFileList = lappend(walFileList, pstrdup(de->d_name));
- }
- /* Does it look like a timeline history file? */
- else if (IsTLHistoryFileName(de->d_name))
- {
- historyFileList = lappend(historyFileList, pstrdup(de->d_name));
- }
- }
- FreeDir(dir);
-
- /*
- * Before we go any further, check that none of the WAL segments we
- * need were removed.
- */
- CheckXLogRemoved(startsegno, ThisTimeLineID);
-
- /*
- * Sort the WAL filenames. We want to send the files in order from
- * oldest to newest, to reduce the chance that a file is recycled
- * before we get a chance to send it over.
- */
- list_sort(walFileList, compareWalFileNames);
-
- /*
- * There must be at least one xlog file in the pg_wal directory, since
- * we are doing backup-including-xlog.
- */
- if (walFileList == NIL)
- ereport(ERROR,
- (errmsg("could not find any WAL files")));
-
- /*
- * Sanity check: the first and last segment should cover startptr and
- * endptr, with no gaps in between.
- */
- XLogFromFileName((char *) linitial(walFileList),
- &tli, &segno, wal_segment_size);
- if (segno != startsegno)
- {
- char startfname[MAXFNAMELEN];
-
- XLogFileName(startfname, ThisTimeLineID, startsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", startfname)));
- }
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- XLogSegNo currsegno = segno;
- XLogSegNo nextsegno = segno + 1;
-
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
- if (!(nextsegno == segno || currsegno == segno))
- {
- char nextfname[MAXFNAMELEN];
-
- XLogFileName(nextfname, ThisTimeLineID, nextsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", nextfname)));
- }
- }
- if (segno != endsegno)
- {
- char endfname[MAXFNAMELEN];
-
- XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", endfname)));
- }
-
- /* Ok, we have everything we need. Send the WAL files. */
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- FILE *fp;
- char buf[TAR_SEND_SIZE];
- size_t cnt;
- pgoff_t len = 0;
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
-
- fp = AllocateFile(pathbuf, "rb");
- if (fp == NULL)
- {
- int save_errno = errno;
-
- /*
- * Most likely reason for this is that the file was already
- * removed by a checkpoint, so check for that to get a better
- * error message.
- */
- CheckXLogRemoved(segno, tli);
-
- errno = save_errno;
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not open file \"%s\": %m", pathbuf)));
- }
-
- if (fstat(fileno(fp), &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- pathbuf)));
- if (statbuf.st_size != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* send the WAL file itself */
- _tarWriteHeader(pathbuf, NULL, &statbuf, false);
-
- while ((cnt = fread(buf, 1,
- Min(sizeof(buf), wal_segment_size - len),
- fp)) > 0)
- {
- CheckXLogRemoved(segno, tli);
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
-
- len += cnt;
- throttle(cnt);
-
- if (len == wal_segment_size)
- break;
- }
-
- CHECK_FREAD_ERROR(fp, pathbuf);
-
- if (len != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* wal_segment_size is a multiple of 512, so no need for padding */
-
- FreeFile(fp);
-
- /*
- * Mark file as archived, otherwise files can get archived again
- * after promotion of a new node. This is in line with
- * walreceiver.c always doing an XLogArchiveForceDone() after a
- * complete segment.
- */
- StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(pathbuf, "");
- }
-
- /*
- * Send timeline history files too. Only the latest timeline history
- * file is required for recovery, and even that only if there happens
- * to be a timeline switch in the first WAL segment that contains the
- * checkpoint record, or if we're taking a base backup from a standby
- * server and the target timeline changes while the backup is taken.
- * But they are small and highly useful for debugging purposes, so
- * better include them all, always.
- */
- foreach(lc, historyFileList)
- {
- char *fname = lfirst(lc);
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
-
- if (lstat(pathbuf, &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m", pathbuf)));
-
- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
-
- /* unconditionally mark file as archived */
- StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "");
- }
+ include_wal_files(endptr);
/* Send CopyDone message for the last tar file */
pq_putemptymessage('c');
@@ -1743,3 +1503,267 @@ throttle(size_t increment)
*/
throttled_last = GetCurrentTimestamp();
}
+
+/*
+ * Append the required WAL files to the backup tar file. It assumes that the
+ * last tar file is "open" and the WALs will be appended to it.
+ */
+static void
+include_wal_files(XLogRecPtr endptr)
+{
+ /*
+ * We've left the last tar file "open", so we can now append the
+ * required WAL files to it.
+ */
+ char pathbuf[MAXPGPATH];
+ XLogSegNo segno;
+ XLogSegNo startsegno;
+ XLogSegNo endsegno;
+ struct stat statbuf;
+ List *historyFileList = NIL;
+ List *walFileList = NIL;
+ char firstoff[MAXFNAMELEN];
+ char lastoff[MAXFNAMELEN];
+ DIR *dir;
+ struct dirent *de;
+ ListCell *lc;
+ TimeLineID tli;
+
+ /*
+ * I'd rather not worry about timelines here, so scan pg_wal and
+ * include all WAL files in the range between 'startptr' and 'endptr',
+ * regardless of the timeline the file is stamped with. If there are
+ * some spurious WAL files belonging to timelines that don't belong in
+ * this server's history, they will be included too. Normally there
+ * shouldn't be such files, but if there are, there's little harm in
+ * including them.
+ */
+ XLByteToSeg(startptr, startsegno, wal_segment_size);
+ XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
+ XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
+ XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
+
+ dir = AllocateDir("pg_wal");
+ while ((de = ReadDir(dir, "pg_wal")) != NULL)
+ {
+ /* Does it look like a WAL segment, and is it in the range? */
+ if (IsXLogFileName(de->d_name) &&
+ strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
+ strcmp(de->d_name + 8, lastoff + 8) <= 0)
+ {
+ walFileList = lappend(walFileList, pstrdup(de->d_name));
+ }
+ /* Does it look like a timeline history file? */
+ else if (IsTLHistoryFileName(de->d_name))
+ {
+ historyFileList = lappend(historyFileList, pstrdup(de->d_name));
+ }
+ }
+ FreeDir(dir);
+
+ /*
+ * Before we go any further, check that none of the WAL segments we
+ * need were removed.
+ */
+ CheckXLogRemoved(startsegno, ThisTimeLineID);
+
+ /*
+ * Sort the WAL filenames. We want to send the files in order from
+ * oldest to newest, to reduce the chance that a file is recycled
+ * before we get a chance to send it over.
+ */
+ list_sort(walFileList, compareWalFileNames);
+
+ /*
+ * There must be at least one xlog file in the pg_wal directory, since
+ * we are doing backup-including-xlog.
+ */
+ if (walFileList == NIL)
+ ereport(ERROR,
+ (errmsg("could not find any WAL files")));
+
+ /*
+ * Sanity check: the first and last segment should cover startptr and
+ * endptr, with no gaps in between.
+ */
+ XLogFromFileName((char *) linitial(walFileList),
+ &tli, &segno, wal_segment_size);
+ if (segno != startsegno)
+ {
+ char startfname[MAXFNAMELEN];
+
+ XLogFileName(startfname, ThisTimeLineID, startsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", startfname)));
+ }
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ XLogSegNo currsegno = segno;
+ XLogSegNo nextsegno = segno + 1;
+
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+ if (!(nextsegno == segno || currsegno == segno))
+ {
+ char nextfname[MAXFNAMELEN];
+
+ XLogFileName(nextfname, ThisTimeLineID, nextsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", nextfname)));
+ }
+ }
+ if (segno != endsegno)
+ {
+ char endfname[MAXFNAMELEN];
+
+ XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", endfname)));
+ }
+
+ /* Ok, we have everything we need. Send the WAL files. */
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ FILE *fp;
+ char buf[TAR_SEND_SIZE];
+ size_t cnt;
+ pgoff_t len = 0;
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+
+ fp = AllocateFile(pathbuf, "rb");
+ if (fp == NULL)
+ {
+ int save_errno = errno;
+
+ /*
+ * Most likely reason for this is that the file was already
+ * removed by a checkpoint, so check for that to get a better
+ * error message.
+ */
+ CheckXLogRemoved(segno, tli);
+
+ errno = save_errno;
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m", pathbuf)));
+ }
+
+ if (fstat(fileno(fp), &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ pathbuf)));
+ if (statbuf.st_size != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* send the WAL file itself */
+ _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+
+ while ((cnt = fread(buf, 1,
+ Min(sizeof(buf), wal_segment_size - len),
+ fp)) > 0)
+ {
+ CheckXLogRemoved(segno, tli);
+ /* Send the chunk as a CopyData message */
+ if (pq_putmessage('d', buf, cnt))
+ ereport(ERROR,
+ (errmsg("base backup could not send data, aborting backup")));
+
+ len += cnt;
+ throttle(cnt);
+
+ if (len == wal_segment_size)
+ break;
+ }
+
+ CHECK_FREAD_ERROR(fp, pathbuf);
+
+ if (len != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* wal_segment_size is a multiple of 512, so no need for padding */
+
+ FreeFile(fp);
+
+ /*
+ * Mark file as archived, otherwise files can get archived again
+ * after promotion of a new node. This is in line with
+ * walreceiver.c always doing an XLogArchiveForceDone() after a
+ * complete segment.
+ */
+ StatusFilePath(pathbuf, walFileName, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+
+ /*
+ * Send timeline history files too. Only the latest timeline history
+ * file is required for recovery, and even that only if there happens
+ * to be a timeline switch in the first WAL segment that contains the
+ * checkpoint record, or if we're taking a base backup from a standby
+ * server and the target timeline changes while the backup is taken.
+ * But they are small and highly useful for debugging purposes, so
+ * better include them all, always.
+ */
+ foreach(lc, historyFileList)
+ {
+ char *fname = lfirst(lc);
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
+
+ if (lstat(pathbuf, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m", pathbuf)));
+
+ sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
+
+ /* unconditionally mark file as archived */
+ StatusFilePath(pathbuf, fname, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+}
+
+/*
+ * Setup and activate network throttling, if client requested it
+ */
+static void
+setup_throttle(int maxrate)
+{
+ if (maxrate > 0)
+ {
+ throttling_sample =
+ (int64) maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
+
+ /* Enable throttling. */
+ throttling_counter = 0;
+
+ /* The 'real data' starts now (header was ignored). */
+ throttled_last = GetCurrentTimestamp();
+ }
+ else
+ {
+ /* Disable throttling. */
+ throttling_counter = -1;
+ }
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..5b0aa8ae85 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -350,6 +350,8 @@ extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
bool needtblspcmapfile);
extern XLogRecPtr do_pg_stop_backup(char *labelfile, bool waitforarchive,
TimeLineID *stoptli_p);
+extern void collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
+ bool infotbssize, bool needtblspcmapfile);
extern void do_pg_abort_backup(void);
extern SessionBackupState get_backup_status(void);
--
2.21.0 (Apple Git-122)
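As an aside on the refactored setup_throttle() above: the two derived values
fully determine the pacing. Here is a minimal standalone sketch of that
arithmetic. It assumes the stock THROTTLING_FREQUENCY of 8 samples per second
and USECS_PER_SEC of 1000000 from basebackup.c; the maxrate value is just an
example:

#include <stdio.h>
#include <stdint.h>

/*
 * Illustration of the values computed by setup_throttle(). The constants
 * mirror the server's THROTTLING_FREQUENCY (8) and USECS_PER_SEC; maxrate
 * is the client's MAX_RATE option in kB/s.
 */
int
main(void)
{
	int			maxrate = 32768;	/* e.g. --max-rate=32M */
	int64_t		throttling_sample = (int64_t) maxrate * 1024 / 8;
	int64_t		elapsed_min_unit = 1000000 / 8;

	/* The sender sleeps whenever this many bytes go out in less time. */
	printf("sample: %lld bytes, minimum interval: %lld usec\n",
		   (long long) throttling_sample, (long long) elapsed_min_unit);
	return 0;
}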
Attachment: 0002-backend-changes-for-parallel-backup.patch (application/octet-stream)
From b60e132d16ddd026b48fc095a285458725ec74ca Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Sun, 13 Oct 2019 22:59:28 +0500
Subject: [PATCH 2/4] backend changes for parallel backup
---
src/backend/access/transam/xlog.c | 2 +-
src/backend/replication/basebackup.c | 589 ++++++++++++++++++++++++-
src/backend/replication/repl_gram.y | 72 +++
src/backend/replication/repl_scanner.l | 7 +
src/include/nodes/replnodes.h | 10 +
src/include/replication/basebackup.h | 2 +-
6 files changed, 671 insertions(+), 11 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 54a430d041..eafa531389 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12249,7 +12249,7 @@ collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
ti->oid = pstrdup(de->d_name);
ti->path = pstrdup(buflinkpath.data);
ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+ ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
if (tablespaces)
*tablespaces = lappend(*tablespaces, ti);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 5f25f5848d..e77e0114e1 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -41,6 +41,7 @@
#include "utils/ps_status.h"
#include "utils/relcache.h"
#include "utils/timestamp.h"
+#include "utils/pg_lsn.h"
typedef struct
@@ -52,11 +53,34 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ bool exclusive;
+ XLogRecPtr lsn;
} basebackup_options;
+typedef struct
+{
+ char name[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+} BackupFile;
+
+#define STORE_BACKUPFILE(_backupfiles, _name, _type, _size, _mtime) \
+ do { \
+ if (_backupfiles != NULL) { \
+ BackupFile *file = palloc0(sizeof(BackupFile)); \
+ strlcpy(file->name, _name, sizeof(file->name)); \
+ file->type = _type; \
+ file->size = _size; \
+ file->mtime = _mtime; \
+ *_backupfiles = lappend(*_backupfiles, file); \
+ } \
+ } while(0)
static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks);
+static int64 sendDir_(const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks, List **files);
static bool sendFile(const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
@@ -76,6 +100,12 @@ static void throttle(size_t increment);
static void setup_throttle(int maxrate);
static bool is_checksummed_file(const char *fullpath, const char *filename);
+static void StartBackup(basebackup_options *opt);
+static void StopBackup(basebackup_options *opt);
+static void SendFileList(basebackup_options *opt);
+static void SendFilesContents(basebackup_options *opt, List *filenames, bool missing_ok);
+static char *readfile(const char *readfilename, bool missing_ok);
+
/* Was the backup currently in-progress initiated in recovery mode? */
static bool backup_started_in_recovery = false;
@@ -337,7 +367,7 @@ perform_base_backup(basebackup_options *opt)
sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
}
else
- sendTablespace(ti->path, false);
+ sendTablespace(ti->path, false, NULL);
/*
* If we're including WAL, and this is the main data directory we
@@ -412,6 +442,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_maxrate = false;
bool o_tablespace_map = false;
bool o_noverify_checksums = false;
+ bool o_exclusive = false;
+ bool o_lsn = false;
MemSet(opt, 0, sizeof(*opt));
foreach(lopt, options)
@@ -500,6 +532,30 @@ parse_basebackup_options(List *options, basebackup_options *opt)
noverify_checksums = true;
o_noverify_checksums = true;
}
+ else if (strcmp(defel->defname, "exclusive") == 0)
+ {
+ if (o_exclusive)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+
+ opt->exclusive = intVal(defel->arg);
+ o_exclusive = true;
+ }
+ else if (strcmp(defel->defname, "lsn") == 0)
+ {
+ bool have_error = false;
+ char *lsn;
+
+ if (o_lsn)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+
+ lsn = strVal(defel->arg);
+ opt->lsn = pg_lsn_in_internal(lsn, &have_error);
+ o_lsn = true;
+ }
else
elog(ERROR, "option \"%s\" not recognized",
defel->defname);
@@ -534,7 +590,29 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg, false);
}
- perform_base_backup(&opt);
+ switch (cmd->cmdtag)
+ {
+ case BASE_BACKUP:
+ perform_base_backup(&opt);
+ break;
+ case START_BACKUP:
+ StartBackup(&opt);
+ break;
+ case SEND_FILE_LIST:
+ SendFileList(&opt);
+ break;
+ case SEND_FILES_CONTENT:
+ SendFilesContents(&opt, cmd->backupfiles, true);
+ break;
+ case STOP_BACKUP:
+ StopBackup(&opt);
+ break;
+
+ default:
+ elog(ERROR, "unrecognized replication command tag: %u",
+ cmd->cmdtag);
+ break;
+ }
}
static void
@@ -677,6 +755,61 @@ SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
pq_puttextmessage('C', "SELECT");
}
+/*
+ * Send a single resultset containing backup label and tablespace map
+ */
+static void
+SendStartBackupResult(StringInfo labelfile, StringInfo tblspc_map_file)
+{
+ StringInfoData buf;
+ Size len;
+
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 2); /* 2 fields */
+
+ /* Field headers */
+ pq_sendstring(&buf, "label");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ pq_sendstring(&buf, "tablespacemap");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ /* Data row */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 2); /* number of columns */
+
+ len = labelfile->len;
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, labelfile->data, len);
+
+ if (tblspc_map_file)
+ {
+ len = tblspc_map_file->len;
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, tblspc_map_file->data, len);
+ }
+ else
+ {
+ pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
+ }
+
+ pq_endmessage(&buf);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
/*
* Inject a file with given name and content in the output tar stream.
*/
@@ -728,7 +861,7 @@ sendFileWithContent(const char *filename, const char *content)
* Only used to send auxiliary tablespaces, not PGDATA.
*/
int64
-sendTablespace(char *path, bool sizeonly)
+sendTablespace(char *path, bool sizeonly, List **files)
{
int64 size;
char pathbuf[MAXPGPATH];
@@ -757,11 +890,11 @@ sendTablespace(char *path, bool sizeonly)
return 0;
}
+ STORE_BACKUPFILE(files, pathbuf, 'd', -1, statbuf.st_mtime);
size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
sizeonly);
-
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true);
+ size += sendDir_(pathbuf, strlen(path), sizeonly, NIL, true, files);
return size;
}
@@ -779,8 +912,16 @@ sendTablespace(char *path, bool sizeonly)
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
- bool sendtblspclinks)
+sendDir(const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks)
+{
+ return sendDir_(path, basepathlen, sizeonly, tablespaces, sendtblspclinks, NULL);
+}
+
+/* Same as sendDir(), except that it also returns a list of filenames in PGDATA */
+static int64
+sendDir_(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
+ bool sendtblspclinks, List **files)
{
DIR *dir;
struct dirent *de;
@@ -934,6 +1075,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
+
+ STORE_BACKUPFILE(files, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
excludeFound = true;
break;
@@ -950,6 +1093,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
+
+ STORE_BACKUPFILE(files, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
continue;
}
@@ -971,6 +1116,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
sizeonly);
+ STORE_BACKUPFILE(files, pathbuf, 'd', -1, statbuf.st_mtime);
+ STORE_BACKUPFILE(files, "./pg_wal/archive_status", 'd', -1, statbuf.st_mtime);
+
continue; /* don't recurse into pg_wal */
}
@@ -1000,6 +1148,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
+ STORE_BACKUPFILE(files, pathbuf, 'l', statbuf.st_size, statbuf.st_mtime);
size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
&statbuf, sizeonly);
#else
@@ -1026,6 +1175,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
*/
size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
sizeonly);
+ STORE_BACKUPFILE(files, pathbuf, 'd', -1, statbuf.st_mtime);
+
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1056,13 +1207,15 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces, sendtblspclinks);
+ size += sendDir_(pathbuf, basepathlen, sizeonly, tablespaces, sendtblspclinks, files);
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
- if (!sizeonly)
+ STORE_BACKUPFILE(files, pathbuf, 'f', statbuf.st_size, statbuf.st_mtime);
+
+ if (!sizeonly && files == NULL)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid);
@@ -1767,3 +1920,421 @@ setup_throttle(int maxrate)
throttling_counter = -1;
}
}
+
+/*
+ * StartBackup - prepare to start an online backup.
+ *
+ * This function calls do_pg_start_backup() and sends back the starting
+ * checkpoint location, the available tablespaces, and the contents of the
+ * backup_label and tablespace_map files.
+ */
+static void
+StartBackup(basebackup_options *opt)
+{
+ TimeLineID starttli;
+ StringInfo labelfile;
+ StringInfo tblspc_map_file = NULL;
+ int datadirpathlen;
+ List *tablespaces = NIL;
+
+ datadirpathlen = strlen(DataDir);
+
+ backup_started_in_recovery = RecoveryInProgress();
+
+ labelfile = makeStringInfo();
+ tblspc_map_file = makeStringInfo();
+
+ total_checksum_failures = 0;
+
+ startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
+ opt->exclusive? NULL : labelfile, &tablespaces,
+ tblspc_map_file,
+ opt->progress, opt->sendtblspcmapfile);
+
+ /*
+ * Once do_pg_start_backup has been called, ensure that any failure causes
+ * us to abort the backup so we don't "leak" a backup counter. For this
+ * reason, *all* functionality between do_pg_start_backup() and the end of
+ * do_pg_stop_backup() should be inside the error cleanup block!
+ */
+
+ PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+ {
+ tablespaceinfo *ti;
+
+ SendXlogRecPtrResult(startptr, starttli);
+
+ /*
+ * Calculate the relative path of temporary statistics directory in
+ * order to skip the files which are located in that directory later.
+ */
+ if (is_absolute_path(pgstat_stat_directory) &&
+ strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
+ else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory);
+ else
+ statrelpath = pgstat_stat_directory;
+
+ /* Add a node for the base directory at the end */
+ ti = palloc0(sizeof(tablespaceinfo));
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
+ tablespaces = lappend(tablespaces, ti);
+
+ /* Send tablespace header */
+ SendBackupHeader(tablespaces);
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ /*
+ * In exclusive mode, pg_start_backup creates the backup_label and
+ * tablespace_map files but does not return their contents in *labelfile
+ * and *tblspcmapfile, so we read them back from those files to return
+ * them to the frontend.
+ *
+ * In non-exclusive mode, the contents of these files are available in
+ * *labelfile and *tblspcmapfile and are returned directly.
+ */
+ if (opt->exclusive)
+ {
+ resetStringInfo(labelfile);
+ resetStringInfo(tblspc_map_file);
+
+ appendStringInfoString(labelfile, readfile(BACKUP_LABEL_FILE, false));
+ if (opt->sendtblspcmapfile)
+ appendStringInfoString(tblspc_map_file, readfile(TABLESPACE_MAP, false));
+ }
+
+ if ((tblspc_map_file && tblspc_map_file->len <= 0) ||
+ !opt->sendtblspcmapfile)
+ tblspc_map_file = NULL;
+
+ /* send backup_label and tablespace_map to frontend */
+ SendStartBackupResult(labelfile, tblspc_map_file);
+ }
+ PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+}
+
+/*
+ * StopBackup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out pg_control
+ * file, optionaly WAL segments and ending WAL location.
+ */
+static void
+StopBackup(basebackup_options *opt)
+{
+ TimeLineID endtli;
+ XLogRecPtr endptr;
+ struct stat statbuf;
+ StringInfoData buf;
+ char *labelfile = NULL;
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ /* ... and pg_control after everything else. */
+ if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ XLOG_CONTROL_FILE)));
+ sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
+
+ /* stop backup */
+ if (!opt->exclusive)
+ labelfile = (char *) opt->label;
+ endptr = do_pg_stop_backup(labelfile, !opt->nowait, &endtli);
+
+ if (opt->includewal)
+ include_wal_files(endptr);
+
+ pq_putemptymessage('c'); /* CopyDone */
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+/*
+ * SendFileList() - sends a list of filenames to frontend
+ *
+ * The function collects the list of filenames necessary for a complete
+ * backup and sends this list to the client.
+ */
+static void
+SendFileList(basebackup_options *opt)
+{
+ StringInfoData buf;
+ ListCell *lc;
+ List *tablespaces = NIL;
+ StringInfo tblspc_map_file = NULL;
+
+ tblspc_map_file = makeStringInfo();
+ collectTablespaces(&tablespaces, tblspc_map_file, false, false);
+
+ /* Add a node for the base directory at the end */
+ tablespaceinfo *ti = palloc0(sizeof(tablespaceinfo));
+ tablespaces = lappend(tablespaces, ti);
+
+ foreach(lc, tablespaces)
+ {
+ List *backupFiles = NULL;
+ tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
+
+ if (ti->path == NULL)
+ sendDir_(".", 1, false, NIL, !opt->sendtblspcmapfile, &backupFiles);
+ else
+ sendTablespace(ti->path, false, &backupFiles);
+
+ /* Construct and send the list of filenames */
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 4); /* 4 fields */
+
+ /* First field - file name */
+ pq_sendstring(&buf, "filename");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Second field - type */
+ pq_sendstring(&buf, "type");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, CHAROID);
+ pq_sendint16(&buf, 1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Fourth field - mtime */
+ pq_sendstring(&buf, "mtime");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ foreach(lc, backupFiles)
+ {
+ BackupFile *backupFile = (BackupFile *) lfirst(lc);
+ Size len;
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 4); /* number of columns */
+
+ /* send file name */
+ len = strlen(backupFile->name);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, backupFile->name, len);
+
+ /* send type */
+ pq_sendint32(&buf, 1);
+ pq_sendbyte(&buf, backupFile->type);
+
+ /* send size */
+ send_int8_string(&buf, backupFile->size);
+
+ /* send mtime */
+ send_int8_string(&buf, backupFile->mtime);
+
+ pq_endmessage(&buf);
+ }
+
+ pfree(backupFiles);
+ }
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * SendFilesContents() - sends the actual files to the caller
+ *
+ * The function sends out the given file(s) over to the caller using the COPY
+ * protocol.
+ */
+static void
+SendFilesContents(basebackup_options *opt, List *filenames, bool missing_ok)
+{
+ StringInfoData buf;
+ ListCell *lc;
+ bool basetablespace = true;
+ int basepathlen = 1;
+
+ if (list_length(filenames) <= 0)
+ return;
+
+ total_checksum_failures = 0;
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ /*
+ * LABEL is reused here to identify the tablespace path on the server. It is
+ * empty in the case of the 'base' tablespace.
+ */
+ if (is_absolute_path(opt->label))
+ {
+ basepathlen = strlen(opt->label);
+ basetablespace = false;
+ }
+
+ /* set backup start location. */
+ startptr = opt->lsn;
+
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ foreach(lc, filenames)
+ {
+ struct stat statbuf;
+ char *pathbuf;
+
+ pathbuf = (char *) strVal(lfirst(lc));
+ if (lstat(pathbuf, &statbuf) != 0)
+ {
+ if (errno != ENOENT)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file or directory \"%s\": %m",
+ pathbuf)));
+
+ /* If the file went away while scanning, it's not an error. */
+ continue;
+ }
+
+ /* Allow symbolic links in pg_tblspc only */
+ if (strstr(pathbuf, "./pg_tblspc") != NULL &&
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ char linkpath[MAXPGPATH];
+ int rllen;
+
+ rllen = readlink(pathbuf, linkpath, sizeof(linkpath));
+ if (rllen < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read symbolic link \"%s\": %m",
+ pathbuf)));
+ if (rllen >= sizeof(linkpath))
+ ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("symbolic link \"%s\" target is too long",
+ pathbuf)));
+ linkpath[rllen] = '\0';
+
+ _tarWriteHeader(pathbuf, linkpath, &statbuf, false);
+ }
+ else if (S_ISDIR(statbuf.st_mode))
+ {
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
+ else if (
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ /*
+ * If symlink, write it as a directory. File symlinks are only
+ * allowed in pg_tblspc.
+ */
+ statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
+ else
+ {
+ /* send file to client */
+ sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true, InvalidOid);
+ }
+ }
+
+ pq_putemptymessage('c'); /* CopyDone */
+
+ /*
+ * Check for checksum failures. If there are failures across multiple
+ * processes, the total checksum failure count may not be reported, but
+ * the backup will still error out and terminate.
+ */
+ if (total_checksum_failures)
+ {
+ if (total_checksum_failures > 1)
+ ereport(WARNING,
+ (errmsg("%lld total checksum verification failures", total_checksum_failures)));
+
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("checksum verification failure during base backup")));
+ }
+}
+
+static char *
+readfile(const char *readfilename, bool missing_ok)
+{
+ struct stat statbuf;
+ FILE *fp;
+ char *data;
+ int r;
+
+ if (stat(readfilename, &statbuf))
+ {
+ if (errno == ENOENT && missing_ok)
+ return NULL;
+
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ readfilename)));
+ }
+
+ fp = AllocateFile(readfilename, "r");
+ if (!fp)
+ {
+ if (errno == ENOENT && missing_ok)
+ return NULL;
+
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m", readfilename)));
+ }
+
+ data = palloc(statbuf.st_size + 1);
+ r = fread(data, statbuf.st_size, 1, fp);
+ data[statbuf.st_size] = '\0';
+
+ /* Close the file */
+ if (r != 1 || ferror(fp) || FreeFile(fp))
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read file \"%s\": %m",
+ readfilename)));
+
+ return data;
+}
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index c4e11cc4e8..bba437c785 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -87,6 +87,12 @@ static SQLCmd *make_sqlcmd(void);
%token K_EXPORT_SNAPSHOT
%token K_NOEXPORT_SNAPSHOT
%token K_USE_SNAPSHOT
+%token K_START_BACKUP
+%token K_SEND_FILE_LIST
+%token K_SEND_FILES_CONTENT
+%token K_STOP_BACKUP
+%token K_EXCLUSIVE
+%token K_LSN
%type <node> command
%type <node> base_backup start_replication start_logical_replication
@@ -102,6 +108,8 @@ static SQLCmd *make_sqlcmd(void);
%type <boolval> opt_temporary
%type <list> create_slot_opt_list
%type <defelt> create_slot_opt
+%type <list> backup_files backup_files_list
+%type <node> backup_file
%%
@@ -162,6 +170,36 @@ base_backup:
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
+ cmd->cmdtag = BASE_BACKUP;
+ $$ = (Node *) cmd;
+ }
+ | K_START_BACKUP base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = START_BACKUP;
+ $$ = (Node *) cmd;
+ }
+ | K_SEND_FILE_LIST base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = SEND_FILE_LIST;
+ $$ = (Node *) cmd;
+ }
+ | K_SEND_FILES_CONTENT backup_files base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $3;
+ cmd->cmdtag = SEND_FILES_CONTENT;
+ cmd->backupfiles = $2;
+ $$ = (Node *) cmd;
+ }
+ | K_STOP_BACKUP base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = STOP_BACKUP;
$$ = (Node *) cmd;
}
;
@@ -214,6 +252,40 @@ base_backup_opt:
$$ = makeDefElem("noverify_checksums",
(Node *)makeInteger(true), -1);
}
+ | K_EXCLUSIVE
+ {
+ $$ = makeDefElem("exclusive",
+ (Node *)makeInteger(true), -1);
+ }
+ | K_LSN SCONST
+ {
+ $$ = makeDefElem("lsn",
+ (Node *)makeString($2), -1);
+ }
+ ;
+
+backup_files:
+ '(' backup_files_list ')'
+ {
+ $$ = $2;
+ }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+backup_files_list:
+ backup_file
+ {
+ $$ = list_make1($1);
+ }
+ | backup_files_list ',' backup_file
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+backup_file:
+ SCONST { $$ = (Node *) makeString($1); }
;
create_replication_slot:
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 380faeb5f6..f97fe804ff 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -107,6 +107,13 @@ EXPORT_SNAPSHOT { return K_EXPORT_SNAPSHOT; }
NOEXPORT_SNAPSHOT { return K_NOEXPORT_SNAPSHOT; }
USE_SNAPSHOT { return K_USE_SNAPSHOT; }
WAIT { return K_WAIT; }
+START_BACKUP { return K_START_BACKUP; }
+SEND_FILE_LIST { return K_SEND_FILE_LIST; }
+SEND_FILES_CONTENT { return K_SEND_FILES_CONTENT; }
+STOP_BACKUP { return K_STOP_BACKUP; }
+EXCLUSIVE { return K_EXCLUSIVE; }
+LSN { return K_LSN; }
+
"," { return ','; }
";" { return ';'; }
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index 1e3ed4e19f..1a224122a2 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -23,6 +23,14 @@ typedef enum ReplicationKind
REPLICATION_KIND_LOGICAL
} ReplicationKind;
+typedef enum BackupCmdTag
+{
+ BASE_BACKUP,
+ START_BACKUP,
+ SEND_FILE_LIST,
+ SEND_FILES_CONTENT,
+ STOP_BACKUP
+} BackupCmdTag;
/* ----------------------
* IDENTIFY_SYSTEM command
@@ -42,6 +50,8 @@ typedef struct BaseBackupCmd
{
NodeTag type;
List *options;
+ BackupCmdTag cmdtag;
+ List *backupfiles;
} BaseBackupCmd;
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index 503a5b9f0b..9e792af99d 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -31,6 +31,6 @@ typedef struct
extern void SendBaseBackup(BaseBackupCmd *cmd);
-extern int64 sendTablespace(char *path, bool sizeonly);
+extern int64 sendTablespace(char *path, bool sizeonly, List **files);
#endif /* _BASEBACKUP_H */
--
2.21.0 (Apple Git-122)
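To make the new replication grammar concrete, here is a rough libpq sketch of
the conversation a parallel client would have with the walsender. The
connection string, file names and LSN are purely illustrative; START_BACKUP
and SEND_FILE_LIST actually return multiple result sets and SEND_FILES_CONTENT
returns a COPY stream, so real code needs PQgetResult()/PQgetCopyData() loops
and error handling, all elided here:

#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("replication=true dbname=postgres");
	PGresult   *res;

	/* Leader: begin the backup and note the returned start LSN. */
	res = PQexec(conn, "START_BACKUP LABEL 'parallel base backup'");
	PQclear(res);

	/* Leader: fetch the file list to divide among the workers. */
	res = PQexec(conn, "SEND_FILE_LIST");
	PQclear(res);

	/* Worker (on its own connection): request its share of the files. */
	res = PQexec(conn,
				 "SEND_FILES_CONTENT ('./PG_VERSION', './global/pg_control') "
				 "LSN '0/2000028'");
	PQclear(res);

	/* Leader: finish once every worker is done. */
	res = PQexec(conn, "STOP_BACKUP");
	PQclear(res);
	PQfinish(conn);
	return 0;
}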
Attachment: 0004-parallel-backup-testcase.patch (application/octet-stream)
From 2915c35ccfa0c1de0fcbd35c03ad0a9cd0f4997b Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Sun, 13 Oct 2019 21:54:23 +0500
Subject: [PATCH 4/4] parallel backup - testcase
---
.../t/040_pg_basebackup_parallel.pl | 571 ++++++++++++++++++
1 file changed, 571 insertions(+)
create mode 100644 src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
diff --git a/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
new file mode 100644
index 0000000000..6c31214f3d
--- /dev/null
+++ b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
@@ -0,0 +1,571 @@
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Basename qw(basename dirname);
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 106;
+
+program_help_ok('pg_basebackup');
+program_version_ok('pg_basebackup');
+program_options_handling_ok('pg_basebackup');
+
+my $tempdir = TestLib::tempdir;
+
+my $node = get_new_node('main');
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Initialize node without replication settings
+$node->init(extra => ['--data-checksums']);
+$node->start;
+my $pgdata = $node->data_dir;
+
+$node->command_fails(['pg_basebackup'],
+ 'pg_basebackup needs target directory specified');
+
+# Some Windows ANSI code pages may reject this filename, in which case we
+# quietly proceed without this bit of test coverage.
+if (open my $badchars, '>>', "$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
+{
+ print $badchars "test backup of file with non-UTF8 name\n";
+ close $badchars;
+}
+
+$node->set_replication_conf();
+system_or_bail 'pg_ctl', '-D', $pgdata, 'reload';
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup" ],
+ 'pg_basebackup fails because of WAL configuration');
+
+ok(!-d "$tempdir/backup", 'backup directory was cleaned up');
+
+# Create a backup directory that is not empty so the next command will fail
+# but leave the data directory behind
+mkdir("$tempdir/backup")
+ or BAIL_OUT("unable to create $tempdir/backup");
+append_to_file("$tempdir/backup/dir-not-empty.txt", "Some data");
+
+$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/backup", '-n' ],
+ 'failing run with no-clean option');
+
+ok(-d "$tempdir/backup", 'backup directory was created and left behind');
+rmtree("$tempdir/backup");
+
+open my $conf, '>>', "$pgdata/postgresql.conf";
+print $conf "max_replication_slots = 10\n";
+print $conf "max_wal_senders = 10\n";
+print $conf "wal_level = replica\n";
+close $conf;
+$node->restart;
+
+# Write some files to test that they are not copied.
+foreach my $filename (
+ qw(backup_label tablespace_map postgresql.auto.conf.tmp current_logfiles.tmp)
+ )
+{
+ open my $file, '>>', "$pgdata/$filename";
+ print $file "DONOTCOPY";
+ close $file;
+}
+
+# Connect to a database to create global/pg_internal.init. If this is removed
+# the test to ensure global/pg_internal.init is not copied will return a false
+# positive.
+$node->safe_psql('postgres', 'SELECT 1;');
+
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'unlogged main fork in base');
+
+# Create files that look like temporary relations to ensure they are ignored.
+my $postgresOid = $node->safe_psql('postgres',
+ q{select oid from pg_database where datname = 'postgres'});
+
+my @tempRelationFiles =
+ qw(t999_999 t9999_999.1 t999_9999_vm t99999_99999_vm.1);
+
+foreach my $filename (@tempRelationFiles)
+{
+ append_to_file("$pgdata/base/$postgresOid/$filename", 'TEMP_RELATION');
+}
+
+# Run base backup in parallel mode.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup", '-X', 'none', "-j 4" ],
+ 'pg_basebackup runs');
+ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
+
+# Permissions on backup should be default
+SKIP:
+{
+ skip "unix-style permissions not supported on Windows", 1
+ if ($windows_os);
+
+ ok(check_mode_recursive("$tempdir/backup", 0700, 0600),
+ "check backup dir permissions");
+}
+
+# Only archive_status directory should be copied in pg_wal/.
+is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/pg_wal/")) ],
+ [ sort qw(. .. archive_status) ],
+ 'no WAL files copied');
+
+# Contents of these directories should not be copied.
+foreach my $dirname (
+ qw(pg_dynshmem pg_notify pg_replslot pg_serial pg_snapshots pg_stat_tmp pg_subtrans)
+ )
+{
+ is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/$dirname/")) ],
+ [ sort qw(. ..) ],
+ "contents of $dirname/ not copied");
+}
+
+# These files should not be copied.
+foreach my $filename (
+ qw(postgresql.auto.conf.tmp postmaster.opts postmaster.pid tablespace_map current_logfiles.tmp
+ global/pg_internal.init))
+{
+ ok(!-f "$tempdir/backup/$filename", "$filename not copied");
+}
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+ 'unlogged init fork in backup');
+ok( !-f "$tempdir/backup/$baseUnloggedPath",
+ 'unlogged main fork not in backup');
+
+# Temp relations should not be copied.
+foreach my $filename (@tempRelationFiles)
+{
+ ok( !-f "$tempdir/backup/base/$postgresOid/$filename",
+ "base/$postgresOid/$filename not copied");
+}
+
+# Make sure existing backup_label was ignored.
+isnt(slurp_file("$tempdir/backup/backup_label"),
+ 'DONOTCOPY', 'existing backup_label not copied');
+rmtree("$tempdir/backup");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup2", '--waldir',
+ "$tempdir/xlog2", "-j 4"
+ ],
+ 'separate xlog directory');
+ok(-f "$tempdir/backup2/PG_VERSION", 'backup was created');
+ok(-d "$tempdir/xlog2/", 'xlog directory was created');
+rmtree("$tempdir/backup2");
+rmtree("$tempdir/xlog2");
+
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup", '-Ft', "-j 4"],
+ 'tar format');
+foreach my $filename (
+ qw(base.0.tar base.1.tar base.2.tar base.3.tar))
+{
+ ok(!-f "$tempdir/backup/$filename", "backup $filename tar created");
+}
+
+rmtree("$tempdir/tarbackup");
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T=/foo" ],
+ '-T with empty old directory fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=" ],
+ '-T with empty new directory fails');
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4",
+ "-T/foo=/bar=/baz"
+ ],
+ '-T with multiple = fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo=/bar" ],
+ '-T with old directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=bar" ],
+ '-T with new directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo" ],
+ '-T with invalid format fails');
+
+# Tar format doesn't support filenames longer than 100 bytes.
+my $superlongname = "superlongname_" . ("x" x 100);
+my $superlongpath = "$pgdata/$superlongname";
+
+open my $file, '>', "$superlongpath"
+ or die "unable to create file $superlongpath";
+close $file;
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/tarbackup_l1", '-Ft', "-j 4" ],
+ 'pg_basebackup tar with long name fails');
+unlink "$pgdata/$superlongname";
+
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+ skip "symlinks not supported on Windows", 18 if ($windows_os);
+
+ # Move pg_replslot out of $pgdata and create a symlink to it.
+ $node->stop;
+
+ # Set umask so test directories and files are created with group permissions
+ umask(0027);
+
+ # Enable group permissions on PGDATA
+ chmod_recursive("$pgdata", 0750, 0640);
+
+ rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
+ or BAIL_OUT "could not move $pgdata/pg_replslot";
+ symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot")
+ or BAIL_OUT "could not symlink to $pgdata/pg_replslot";
+
+ $node->start;
+
+ # Create a temporary directory in the system location and symlink it
+ # to our physical temp location. That way we can use shorter names
+ # for the tablespace directories, which hopefully won't run afoul of
+ # the 99 character length limit.
+ my $shorter_tempdir = TestLib::tempdir_short . "/tempdir";
+ symlink "$tempdir", $shorter_tempdir;
+
+ mkdir "$tempdir/tblspc1";
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc1 LOCATION '$shorter_tempdir/tblspc1';");
+ $node->safe_psql('postgres',
+ "CREATE TABLE test1 (a int) TABLESPACE tblspc1;");
+ $node->command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup2", '-Ft', "-j 4" ],
+ 'tar format with tablespaces');
+ ok(-f "$tempdir/tarbackup2/base.0.tar", 'backup tar was created');
+ my @tblspc_tars = glob "$tempdir/tarbackup2/[0-9]*.tar";
+ is(scalar(@tblspc_tars), 3, 'tablespace tars were created');
+ rmtree("$tempdir/tarbackup2");
+
+ # Create an unlogged table to test that forks other than init are not copied.
+ $node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;'
+ );
+
+ my $tblspc1UnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('tblspc1_unlogged')});
+
+ # Make sure main and init forks exist
+ ok( -f "$pgdata/${tblspc1UnloggedPath}_init",
+ 'unlogged init fork in tablespace');
+ ok(-f "$pgdata/$tblspc1UnloggedPath", 'unlogged main fork in tablespace');
+
+ # Create files that look like temporary relations to ensure they are ignored
+ # in a tablespace.
+ my @tempRelationFiles = qw(t888_888 t888888_888888_vm.1);
+ my $tblSpc1Id = basename(
+ dirname(
+ dirname(
+ $node->safe_psql(
+ 'postgres', q{select pg_relation_filepath('test1')}))));
+
+ foreach my $filename (@tempRelationFiles)
+ {
+ append_to_file(
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ 'TEMP_RELATION');
+ }
+
+ $node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4" ],
+ 'plain format with tablespaces fails without tablespace mapping');
+
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tblspc1=$tempdir/tbackup/tblspc1"
+ ],
+ 'plain format with tablespaces succeeds with tablespace mapping');
+ ok(-d "$tempdir/tbackup/tblspc1", 'tablespace was relocated');
+ opendir(my $dh, "$pgdata/pg_tblspc") or die;
+ ok( ( grep {
+ -l "$tempdir/backup1/pg_tblspc/$_"
+ and readlink "$tempdir/backup1/pg_tblspc/$_" eq
+ "$tempdir/tbackup/tblspc1"
+ } readdir($dh)),
+ "tablespace symlink was updated");
+ closedir $dh;
+
+ # Group access should be enabled on all backup files
+ ok(check_mode_recursive("$tempdir/backup1", 0750, 0640),
+ "check backup dir permissions");
+
+ # Unlogged relation forks other than init should not be copied
+ my ($tblspc1UnloggedBackupPath) =
+ $tblspc1UnloggedPath =~ /[^\/]*\/[^\/]*\/[^\/]*$/g;
+
+ ok(-f "$tempdir/tbackup/tblspc1/${tblspc1UnloggedBackupPath}_init",
+ 'unlogged init fork in tablespace backup');
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
+ 'unlogged main fork not in tablespace backup');
+
+ # Temp relations should not be copied.
+ foreach my $filename (@tempRelationFiles)
+ {
+ ok( !-f "$tempdir/tbackup/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ "[tblspc1]/$postgresOid/$filename not copied");
+
+ # Also remove temp relation files or tablespace drop will fail.
+ my $filepath =
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename";
+
+ unlink($filepath)
+ or BAIL_OUT("unable to unlink $filepath");
+ }
+
+ ok( -d "$tempdir/backup1/pg_replslot",
+ 'pg_replslot symlink copied as directory');
+ rmtree("$tempdir/backup1");
+
+ mkdir "$tempdir/tbl=spc2";
+ $node->safe_psql('postgres', "DROP TABLE test1;");
+ $node->safe_psql('postgres', "DROP TABLE tblspc1_unlogged;");
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc1;");
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc2 LOCATION '$shorter_tempdir/tbl=spc2';");
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup3", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tbl\\=spc2=$tempdir/tbackup/tbl\\=spc2"
+ ],
+ 'mapping tablespace with = sign in path');
+ ok(-d "$tempdir/tbackup/tbl=spc2",
+ 'tablespace with = sign was relocated');
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc2;");
+ rmtree("$tempdir/backup3");
+
+ mkdir "$tempdir/$superlongname";
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc3 LOCATION '$tempdir/$superlongname';");
+ $node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/tarbackup_l3", '-Ft' , '-j 4'],
+ 'pg_basebackup tar with long symlink target');
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc3;");
+ rmtree("$tempdir/tarbackup_l3");
+}
+
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backupR", '-R' , '-j 4'],
+ 'pg_basebackup -R runs');
+ok(-f "$tempdir/backupR/postgresql.auto.conf", 'postgresql.auto.conf exists');
+ok(-f "$tempdir/backupR/standby.signal", 'standby.signal was created');
+my $recovery_conf = slurp_file "$tempdir/backupR/postgresql.auto.conf";
+rmtree("$tempdir/backupR");
+
+my $port = $node->port;
+like(
+ $recovery_conf,
+ qr/^primary_conninfo = '.*port=$port.*'\n/m,
+ 'postgresql.auto.conf sets primary_conninfo');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxd" , "-j 4"],
+ 'pg_basebackup runs in default xlog mode');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxd/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxd");
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxf", '-X', 'fetch' , "-j 4"],
+ 'pg_basebackup -X fetch runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxf");
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs", '-X', 'stream' , "-j 4"],
+ 'pg_basebackup -X stream runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxs/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxs");
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxst", '-X', 'stream', '-Ft' , "-j 4"],
+ 'pg_basebackup -X stream runs in tar mode');
+ok(-f "$tempdir/backupxst/pg_wal.tar", "tar file was created");
+rmtree("$tempdir/backupxst");
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupnoslot", '-X',
+ 'stream', '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup -X stream runs with --no-slot');
+rmtree("$tempdir/backupnoslot");
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_sl_fail", '-X',
+ 'stream', '-S',
+ 'slot0',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with nonexistent replication slot');
+#
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C' , '-j 4'],
+ 'pg_basebackup -C fails without slot name');
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_slot", '-C',
+ '-S', 'slot0',
+ '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with -C -S --no-slot');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup -C runs');
+rmtree("$tempdir/backupxs_slot");
+
+is( $node->safe_psql(
+ 'postgres',
+ q{SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ 'slot0',
+ 'replication slot was created');
+isnt(
+ $node->safe_psql(
+ 'postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ '',
+ 'restart LSN of new slot is not null');
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot1", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup fails with -C -S and a previously existing slot');
+
+$node->safe_psql('postgres',
+ q{SELECT * FROM pg_create_physical_replication_slot('slot1')});
+my $lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+is($lsn, '', 'restart LSN of new slot is null');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/fail", '-S', 'slot1', '-X', 'none', '-j 4'],
+ 'pg_basebackup with replication slot fails without WAL streaming');
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl", '-X',
+ 'stream', '-S', 'slot1', '-j 4'
+ ],
+ 'pg_basebackup -X stream with replication slot runs');
+$lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+like($lsn, qr!^0/[0-9A-Z]{7,8}$!, 'restart LSN of slot has advanced');
+rmtree("$tempdir/backupxs_sl");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl_R", '-X',
+ 'stream', '-S', 'slot1', '-R',
+ '-j 4'
+ ],
+ 'pg_basebackup with replication slot and -R runs');
+like(
+ slurp_file("$tempdir/backupxs_sl_R/postgresql.auto.conf"),
+ qr/^primary_slot_name = 'slot1'\n/m,
+ 'recovery conf file sets primary_slot_name');
+
+my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
+is($checksum, 'on', 'checksums are enabled');
+rmtree("$tempdir/backupxs_sl_R");
+
+# create tables to corrupt and get their relfilenodes
+my $file_corrupt1 = $node->safe_psql('postgres',
+ q{SELECT a INTO corrupt1 FROM generate_series(1,10000) AS a; ALTER TABLE corrupt1 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt1')}
+);
+my $file_corrupt2 = $node->safe_psql('postgres',
+ q{SELECT b INTO corrupt2 FROM generate_series(1,2) AS b; ALTER TABLE corrupt2 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt2')}
+);
+
+# set page header and block sizes
+my $pageheader_size = 24;
+my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');
+
+# induce corruption
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*checksum verification failed/s],
+ 'pg_basebackup reports checksum mismatch');
+rmtree("$tempdir/backup_corrupt");
+
+# induce further corruption in 5 more blocks
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+for my $i (1 .. 5)
+{
+ my $offset = $pageheader_size + $i * $block_size;
+ seek($file, $offset, 0);
+ syswrite($file, "\0\0\0\0\0\0\0\0\0");
+}
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*further.*failures.*will.not.be.reported/s],
+ 'pg_basebackup does not report more than 5 checksum mismatches');
+rmtree("$tempdir/backup_corrupt2");
+
+# induce corruption in a second file
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt2";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+#$node->command_checks_all(
+# [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt3", '-j 4'],
+# 1,
+# [qr{^$}],
+# [qr/^WARNING.*checksum verification failed/s],
+# 'pg_basebackup correctly reports the total number of checksum mismatches');
+#rmtree("$tempdir/backup_corrupt3");
+
+# do not verify checksums, should return ok
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup_corrupt4", '--no-verify-checksums',
+ '-j 4'
+ ],
+ 'pg_basebackup with -k does not report checksum mismatch');
+rmtree("$tempdir/backup_corrupt4");
+
+$node->safe_psql('postgres', "DROP TABLE corrupt1;");
+$node->safe_psql('postgres', "DROP TABLE corrupt2;");
--
2.21.0 (Apple Git-122)
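For anyone wanting to exercise just this test, it should be runnable in
isolation with something like
make -C src/bin/pg_basebackup check PROVE_TESTS=t/040_pg_basebackup_parallel.pl
(assuming a --enable-tap-tests build); that command line is indicative only
and not part of the patch.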
Attachment: 0003-pg_basebackup-changes-for-parallel-backup.patch (application/octet-stream)
From 5c12e8fe83ba0fe2a7f24e1e84263fa112469390 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Mon, 14 Oct 2019 17:28:58 +0500
Subject: [PATCH 3/4] pg_basebackup changes for parallel backup.
---
src/bin/pg_basebackup/pg_basebackup.c | 583 ++++++++++++++++++++++++--
1 file changed, 548 insertions(+), 35 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 55ef13926d..311c1f94ca 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -41,6 +41,7 @@
#include "receivelog.h"
#include "replication/basebackup.h"
#include "streamutil.h"
+#include "fe_utils/simple_list.h"
#define ERRCODE_DATA_CORRUPTED "XX001"
@@ -57,6 +58,37 @@ typedef struct TablespaceList
TablespaceListCell *tail;
} TablespaceList;
+typedef struct
+{
+ char name[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+} BackupFile;
+
+typedef struct
+{
+ Oid tblspcOid;
+ char *tablespace; /* tablespace name or NULL if 'base' tablespace */
+ int numFiles; /* number of files */
+ BackupFile *backupFiles; /* list of files in tablespace */
+} TablespaceInfo;
+
+typedef struct
+{
+ int tablespacecount;
+ int numWorkers;
+
+ char xlogstart[64];
+ char *backup_label;
+ char *tablespace_map;
+
+ TablespaceInfo *tsInfo;
+ SimpleStringList **worker_files;
+} BackupInfo;
+
+static BackupInfo *backupInfo = NULL;
+
/*
* pg_xlog has been renamed to pg_wal in version 10. This version number
* should be compared with PQserverVersion().
@@ -110,6 +142,10 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
+static int numWorkers = 1;
+static PGresult *tablespacehdr;
+static SimpleOidList workerspid = {NULL, NULL};
+
/* Progress counters */
static uint64 totalsize_kb;
static uint64 totaldone;
@@ -141,7 +177,7 @@ static void usage(void);
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
static void progress_report(int tablespacenum, const char *filename, bool force);
-static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
+static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum, int worker);
static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
static void BaseBackup(void);
@@ -151,6 +187,16 @@ static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
static const char *get_tablespace_mapping(const char *dir);
static void tablespace_list_append(const char *arg);
+static void ParallelBackupEnd(void);
+static void GetBackupFilesList(PGconn *conn, BackupInfo *binfo);
+static int ReceiveFiles(BackupInfo *backupInfo, int worker);
+static int compareFileSize(const void *a, const void *b);
+static void create_workers_and_fetch(BackupInfo *backupInfo);
+static void read_label_tblspcmap(PGconn *conn, char **backup_label, char **tablespace_map);
+static void create_backup_dirs(bool basetablespace, char *tablespace, char *name);
+static void writefile(char *path, char *buf);
+static int simple_list_length(SimpleStringList *list);
+
static void
cleanup_directories_atexit(void)
@@ -349,6 +395,7 @@ usage(void)
printf(_(" --no-slot prevent creation of temporary replication slot\n"));
printf(_(" --no-verify-checksums\n"
" do not verify checksums\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to backup\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nConnection options:\n"));
printf(_(" -d, --dbname=CONNSTR connection string\n"));
@@ -921,7 +968,7 @@ writeTarData(
* No attempt to inspect or validate the contents of the file is done.
*/
static void
-ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
+ReceiveTarFile(PGconn *conn, PGresult *res, int rownum, int worker)
{
char filename[MAXPGPATH];
char *copybuf = NULL;
@@ -978,7 +1025,10 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
#ifdef HAVE_LIBZ
if (compresslevel != 0)
{
- snprintf(filename, sizeof(filename), "%s/base.tar.gz", basedir);
+ if (numWorkers > 1)
+ snprintf(filename, sizeof(filename), "%s/base.%d.tar.gz", basedir, worker);
+ else
+ snprintf(filename, sizeof(filename), "%s/base.tar.gz", basedir);
ztarfile = gzopen(filename, "wb");
if (gzsetparams(ztarfile, compresslevel,
Z_DEFAULT_STRATEGY) != Z_OK)
@@ -991,7 +1041,10 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
else
#endif
{
- snprintf(filename, sizeof(filename), "%s/base.tar", basedir);
+ if (numWorkers > 1)
+ snprintf(filename, sizeof(filename), "%s/base.%d.tar", basedir, worker);
+ else
+ snprintf(filename, sizeof(filename), "%s/base.tar", basedir);
tarfile = fopen(filename, "wb");
}
}
@@ -1004,8 +1057,12 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
#ifdef HAVE_LIBZ
if (compresslevel != 0)
{
- snprintf(filename, sizeof(filename), "%s/%s.tar.gz", basedir,
- PQgetvalue(res, rownum, 0));
+ if (numWorkers > 1)
+ snprintf(filename, sizeof(filename), "%s/%s.%d.tar.gz", basedir,
+ PQgetvalue(res, rownum, 0), worker);
+ else
+ snprintf(filename, sizeof(filename), "%s/%s.tar.gz", basedir,
+ PQgetvalue(res, rownum, 0));
ztarfile = gzopen(filename, "wb");
if (gzsetparams(ztarfile, compresslevel,
Z_DEFAULT_STRATEGY) != Z_OK)
@@ -1018,8 +1075,12 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
else
#endif
{
- snprintf(filename, sizeof(filename), "%s/%s.tar", basedir,
- PQgetvalue(res, rownum, 0));
+ if (numWorkers > 1)
+ snprintf(filename, sizeof(filename), "%s/%s.%d.tar", basedir,
+ PQgetvalue(res, rownum, 0), worker);
+ else
+ snprintf(filename, sizeof(filename), "%s/%s.tar", basedir,
+ PQgetvalue(res, rownum, 0));
tarfile = fopen(filename, "wb");
}
}
@@ -1082,6 +1143,45 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
MemSet(zerobuf, 0, sizeof(zerobuf));
+ if (numWorkers > 1 && basetablespace && worker == 0)
+ {
+ char header[512];
+ int padding;
+ int len;
+
+ /* add backup_label and tablespace_map files to the tar */
+ len = strlen(backupInfo->backup_label);
+ tarCreateHeader(header,
+ "backup_label",
+ NULL,
+ len,
+ pg_file_create_mode, 04000, 02000,
+ time(NULL));
+
+ padding = ((len + 511) & ~511) - len;
+ WRITE_TAR_DATA(header, sizeof(header));
+ WRITE_TAR_DATA(backupInfo->backup_label, len);
+ if (padding)
+ WRITE_TAR_DATA(zerobuf, padding);
+
+ if (backupInfo->tablespace_map)
+ {
+ len = strlen(backupInfo->tablespace_map);
+ tarCreateHeader(header,
+ "tablespace_map",
+ NULL,
+ len,
+ pg_file_create_mode, 04000, 02000,
+ time(NULL));
+
+ padding = ((len + 511) & ~511) - len;
+ WRITE_TAR_DATA(header, sizeof(header));
+ WRITE_TAR_DATA(backupInfo->tablespace_map, len);
+ if (padding)
+ WRITE_TAR_DATA(zerobuf, padding);
+ }
+ }
+
if (basetablespace && writerecoveryconf)
{
char header[512];
@@ -1475,6 +1575,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
*/
snprintf(filename, sizeof(filename), "%s/%s", current_path,
copybuf);
+
if (filename[strlen(filename) - 1] == '/')
{
/*
@@ -1486,21 +1587,14 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
* Directory
*/
filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */
+
+ /*
+ * In parallel mode, we create directories before fetching
+ * files so its Ok if a directory already exist.
+ */
if (mkdir(filename, pg_dir_create_mode) != 0)
{
- /*
- * When streaming WAL, pg_wal (or pg_xlog for pre-9.6
- * clusters) will have been created by the wal
- * receiver process. Also, when the WAL directory
- * location was specified, pg_wal (or pg_xlog) has
- * already been created as a symbolic link before
- * starting the actual backup. So just ignore creation
- * failures on related directories.
- */
- if (!((pg_str_endswith(filename, "/pg_wal") ||
- pg_str_endswith(filename, "/pg_xlog") ||
- pg_str_endswith(filename, "/archive_status")) &&
- errno == EEXIST))
+ if (errno != EEXIST)
{
pg_log_error("could not create directory \"%s\": %m",
filename);
@@ -1528,8 +1622,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
* can map them too.)
*/
filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */
-
mapped_tblspc_path = get_tablespace_mapping(&copybuf[157]);
+
if (symlink(mapped_tblspc_path, filename) != 0)
{
pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
@@ -1716,7 +1810,8 @@ BaseBackup(void)
}
basebkp =
- psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
+ psprintf("%s LABEL '%s' %s %s %s %s %s %s %s",
+ (numWorkers > 1) ? "START_BACKUP" : "BASE_BACKUP",
escaped_label,
showprogress ? "PROGRESS" : "",
includewal == FETCH_WAL ? "WAL" : "",
@@ -1774,7 +1869,7 @@ BaseBackup(void)
/*
* Get the header
*/
- res = PQgetResult(conn);
+ tablespacehdr = res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
pg_log_error("could not get backup header: %s",
@@ -1830,20 +1925,62 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /*
- * Start receiving chunks
- */
- for (i = 0; i < PQntuples(res); i++)
+ if (numWorkers > 1)
{
- if (format == 't')
- ReceiveTarFile(conn, res, i);
- else
- ReceiveAndUnpackTarFile(conn, res, i);
- } /* Loop over all tablespaces */
+ backupInfo = palloc0(sizeof(BackupInfo));
+
+ backupInfo->tablespacecount = tablespacecount;
+ backupInfo->numWorkers = numWorkers;
+ strlcpy(backupInfo->xlogstart, xlogstart, sizeof(backupInfo->xlogstart));
+ read_label_tblspcmap(conn, &backupInfo->backup_label, &backupInfo->tablespace_map);
+
+ /* retrive backup files from server. **/
+ GetBackupFilesList(conn, backupInfo);
+
+ /*
+ * add backup_label in backup, (for tar format, ReceiveTarFile() will
+ * takecare of it).
+ */
+ if (format == 'p')
+ writefile("backup_label", backupInfo->backup_label);
+
+ /*
+ * The backup files list is already in descending order, distribute it
+ * to workers.
+ */
+ backupInfo->worker_files = palloc0(sizeof(SimpleStringList) * tablespacecount);
+ for (i = 0; i < backupInfo->tablespacecount; i++)
+ {
+ TablespaceInfo *curTsInfo = &backupInfo->tsInfo[i];
+
+ backupInfo->worker_files[i] = palloc0(sizeof(SimpleStringList) * numWorkers);
+ for (int j = 0; j < curTsInfo->numFiles; j++)
+ {
+ simple_string_list_append(&backupInfo->worker_files[i][j % numWorkers],
+ curTsInfo->backupFiles[j].name);
+ }
+ }
+
+ create_workers_and_fetch(backupInfo);
+ ParallelBackupEnd();
+ }
+ else
+ {
+ /*
+ * Start receiving chunks
+ */
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ if (format == 't')
+ ReceiveTarFile(conn, res, i, 0);
+ else
+ ReceiveAndUnpackTarFile(conn, res, i);
+ } /* Loop over all tablespaces */
+ }
if (showprogress)
{
- progress_report(PQntuples(res), NULL, true);
+ progress_report(PQntuples(tablespacehdr), NULL, true);
if (isatty(fileno(stderr)))
fprintf(stderr, "\n"); /* Need to move to next line */
}
@@ -2043,6 +2180,7 @@ main(int argc, char **argv)
{"waldir", required_argument, NULL, 1},
{"no-slot", no_argument, NULL, 2},
{"no-verify-checksums", no_argument, NULL, 3},
+ {"jobs", required_argument, NULL, 'j'},
{NULL, 0, NULL, 0}
};
int c;
@@ -2070,7 +2208,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvPj:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2211,6 +2349,9 @@ main(int argc, char **argv)
case 3:
verify_checksums = false;
break;
+ case 'j': /* number of jobs */
+ numWorkers = atoi(optarg);
+ break;
default:
/*
@@ -2325,6 +2466,14 @@ main(int argc, char **argv)
}
}
+ if (numWorkers <= 0)
+ {
+ pg_log_error("invalid number of parallel jobs");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
#ifndef HAVE_LIBZ
if (compresslevel != 0)
{
@@ -2397,3 +2546,367 @@ main(int argc, char **argv)
success = true;
return 0;
}
+
+static void
+ParallelBackupEnd(void)
+{
+ PGresult *res = NULL;
+ char *basebkp;
+
+ basebkp = psprintf("STOP_BACKUP LABEL '%s' %s %s",
+ backupInfo->backup_label,
+ includewal == FETCH_WAL ? "WAL" : "",
+ includewal == NO_WAL ? "" : "NOWAIT");
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not execute STOP BACKUP \"%s\"",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /* receive pg_control and wal files */
+ if (format == 't')
+ ReceiveTarFile(conn, res, tablespacecount, numWorkers);
+ else
+ ReceiveAndUnpackTarFile(conn, res, tablespacecount);
+
+ PQclear(res);
+}
+
+static void
+GetBackupFilesList(PGconn *conn, BackupInfo *backupInfo)
+{
+ int i;
+ PGresult *res = NULL;
+ char *basebkp;
+
+ backupInfo->tsInfo = palloc0(sizeof(TablespaceInfo) * backupInfo->tablespacecount);
+ TablespaceInfo *tsInfo = backupInfo->tsInfo;
+
+ /*
+ * Get list of files.
+ */
+ basebkp = psprintf("SEND_FILE_LIST %s",
+ format == 't' ? "TABLESPACE_MAP" : "");
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not send replication command \"%s\": %s",
+ "SEND_FILE_LIST", PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /*
+ * The list of files is grouped by tablespaces, and we want to fetch them
+ * in the same order.
+ */
+ for (i = 0; i < backupInfo->tablespacecount; i++)
+ {
+ bool basetablespace;
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get backup header: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
+ basetablespace = PQgetisnull(tablespacehdr, i, 0);
+ tsInfo[i].tblspcOid = atol(PQgetvalue(tablespacehdr, i, 0));
+ tsInfo[i].tablespace = PQgetvalue(tablespacehdr, i, 1);
+ tsInfo[i].numFiles = PQntuples(res);
+ tsInfo[i].backupFiles =
+ palloc0(sizeof(BackupFile) * tsInfo[i].numFiles);
+
+ for (int j = 0; j < tsInfo[i].numFiles; j++)
+ {
+ char *name = PQgetvalue(res, j, 0);
+ char type = PQgetvalue(res, j, 1)[0];
+ int32 size = atol(PQgetvalue(res, j, 2));
+ time_t mtime = atol(PQgetvalue(res, j, 3));
+
+ /*
+ * In 'plain' format, create backup directories first.
+ */
+ if (format == 'p' && type == 'd')
+ create_backup_dirs(basetablespace, tsInfo[i].tablespace, name);
+
+ strlcpy(tsInfo[i].backupFiles[j].name, name, MAXPGPATH);
+ tsInfo[i].backupFiles[j].type = type;
+ tsInfo[i].backupFiles[j].size = size;
+ tsInfo[i].backupFiles[j].mtime = mtime;
+ }
+
+ /* sort files in descending order, based on size */
+ qsort(tsInfo[i].backupFiles, tsInfo[i].numFiles,
+ sizeof(BackupFile), &compareFileSize);
+ PQclear(res);
+ }
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data: %s", PQerrorMessage(conn));
+ exit(1);
+ }
+ res = PQgetResult(conn);
+}
+
+static int
+ReceiveFiles(BackupInfo *backupInfo, int worker)
+{
+ SimpleStringListCell *cell;
+ PGresult *res = NULL;
+ PGconn *worker_conn;
+ int i;
+
+ worker_conn = GetConnection();
+ for (i = 0; i < backupInfo->tablespacecount; i++)
+ {
+ TablespaceInfo *curTsInfo = &backupInfo->tsInfo[i];
+ SimpleStringList *files = &backupInfo->worker_files[i][worker];
+ PQExpBuffer buf = createPQExpBuffer();
+
+ if (simple_list_length(files) <= 0)
+ continue;
+
+
+ /*
+ * build query in form of: SEND_FILES_CONTENT ('base/1/1245/32683',
+ * 'base/1/1245/32683', ...) [options]
+ */
+ appendPQExpBuffer(buf, "SEND_FILES_CONTENT (");
+ for (cell = files->head; cell; cell = cell->next)
+ {
+ if (cell != files->tail)
+ appendPQExpBuffer(buf, "'%s' ,", cell->val);
+ else
+ appendPQExpBuffer(buf, "'%s'", cell->val);
+ }
+ appendPQExpBufferStr(buf, ")");
+
+ /*
+ * Add backup options to the command. we are reusing the LABEL here to
+ * keep the original tablespace path on the server.
+ */
+ appendPQExpBuffer(buf, " LABEL '%s' LSN '%s' %s %s",
+ curTsInfo->tablespace,
+ backupInfo->xlogstart,
+ format == 't' ? "TABLESPACE_MAP" : "",
+ verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ if (maxrate > 0)
+ appendPQExpBuffer(buf, " MAX_RATE %u", maxrate);
+
+ if (!worker_conn)
+ return 1;
+
+ if (PQsendQuery(worker_conn, buf->data) == 0)
+ {
+ pg_log_error("could not send files list \"%s\"",
+ PQerrorMessage(worker_conn));
+ return 1;
+ }
+
+ destroyPQExpBuffer(buf);
+ if (format == 't')
+ ReceiveTarFile(worker_conn, tablespacehdr, i, worker);
+ else
+ ReceiveAndUnpackTarFile(worker_conn, tablespacehdr, i);
+
+ res = PQgetResult(worker_conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data stream: %s",
+ PQerrorMessage(worker_conn));
+ exit(1);
+ }
+
+ res = PQgetResult(worker_conn);
+ }
+
+ PQclear(res);
+ PQfinish(worker_conn);
+
+ return 0;
+}
+
+/* qsort comparator for BackupFile (sort descending order) */
+static int
+compareFileSize(const void *a, const void *b)
+{
+ const BackupFile *v1 = (BackupFile *) a;
+ const BackupFile *v2 = (BackupFile *) b;
+
+ if (v1->size > v2->size)
+ return -1;
+ if (v1->size < v2->size)
+ return 1;
+
+ return 0;
+}
+
+static void
+create_workers_and_fetch(BackupInfo *backupInfo)
+{
+ int status;
+ int pid,
+ i;
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ pid = fork();
+ if (pid == 0)
+ {
+ /* in child process */
+ _exit(ReceiveFiles(backupInfo, i));
+ }
+ else if (pid < 0)
+ {
+ pg_log_error("could not create backup worker: %m");
+ exit(1);
+ }
+
+ simple_oid_list_append(&workerspid, pid);
+ if (verbose)
+ pg_log_info("backup worker (%d) created", pid);
+
+ /*
+ * Else we are in the parent process and all is well.
+ */
+ }
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ pid = waitpid(-1, &status, 0);
+
+ if (WIFEXITED(status) && WEXITSTATUS(status) == EXIT_FAILURE)
+ {
+ SimpleOidListCell *cell;
+
+ pg_log_error("backup worker (%d) failed with code %d", pid, WEXITSTATUS(status));
+
+ /* error. kill other workers and exit. */
+ for (cell = workerspid.head; cell; cell = cell->next)
+ {
+ if (pid != cell->val)
+ {
+ kill(cell->val, SIGTERM);
+ pg_log_error("backup worker killed %d", cell->val);
+ }
+ }
+
+ exit(1);
+ }
+ }
+}
+
+static void
+read_label_tblspcmap(PGconn *conn, char **backuplabel, char **tblspc_map)
+{
+ PGresult *res = NULL;
+
+ Assert(backuplabel != NULL);
+ Assert(tblspc_map != NULL);
+
+ /*
+ * Get Backup label and tablespace map data.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get data: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
+ *backuplabel = PQgetvalue(res, 0, 0); /* backup_label */
+ if (!PQgetisnull(res, 0, 1))
+ *tblspc_map = PQgetvalue(res, 0, 1); /* tablespae_map */
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ res = PQgetResult(conn);
+ PQclear(res);
+}
+
+static void
+create_backup_dirs(bool basetablespace, char *tablespace, char *name)
+{
+ char dirpath[MAXPGPATH];
+
+ Assert(name != NULL);
+
+ if (basetablespace)
+ snprintf(dirpath, sizeof(dirpath), "%s/%s", basedir, name);
+ else
+ {
+ Assert(tablespace != NULL);
+ snprintf(dirpath, sizeof(dirpath), "%s/%s",
+ get_tablespace_mapping(tablespace), (name + strlen(tablespace) + 1));
+ }
+
+ if (pg_mkdir_p(dirpath, pg_dir_create_mode) != 0)
+ {
+ if (errno != EEXIST)
+ {
+ pg_log_error("could not create directory \"%s\": %m",
+ dirpath);
+ exit(1);
+ }
+ }
+}
+
+static void
+writefile(char *path, char *buf)
+{
+ FILE *f;
+ char pathbuf[MAXPGPATH];
+
+ snprintf(pathbuf, MAXPGPATH, "%s/%s", basedir, path);
+ f = fopen(pathbuf, "w");
+ if (f == NULL)
+ {
+ pg_log_error("could not open file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fwrite(buf, strlen(buf), 1, f) != 1)
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fclose(f))
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+}
+
+static int
+simple_list_length(SimpleStringList *list)
+{
+ int len = 0;
+ SimpleStringListCell *cell;
+
+ for (cell = list->head; cell; cell = cell->next, len++)
+ ;
+
+ return len;
+}
--
2.21.0 (Apple Git-122)
On Thu, Oct 17, 2019 at 10:51 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Attached are the updated patches.
I had a quick look over these changes and they look good overall.
However, here are a few review comments I caught while glancing through
patches 0002 and 0003.
--- 0002 patch
1.
Can the lsn option be renamed to start-wal-location? That will be
clearer too.
2.
+typedef struct
+{
+ char name[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+} BackupFile;
I think it will be good if we keep this structure in a common place so that
the client can also use it.
3.
+ SEND_FILE_LIST,
+ SEND_FILES_CONTENT,
Can the above two commands be renamed to SEND_BACKUP_MANIFEST and
SEND_BACKUP_FILE respectively?
The reason behind the first name change is that we are not getting only a
file list here; we are getting a few more details along with it. The second
keeps the naming in line with START_BACKUP/STOP_BACKUP/SEND_BACKUP_MANIFEST.
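With those renames, the replication-command flow for one worker would read
naturally end to end (the label, file names, and LSN below are only
illustrative):

    START_BACKUP LABEL 'backup1' NOWAIT
    SEND_BACKUP_MANIFEST
    SEND_BACKUP_FILE ('base/1/1245', 'base/1/1246') LSN '0/2000028'
    STOP_BACKUP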
4.
Typos:
non-exlusive => non-exclusive
retured => returned
optionaly => optionally
nessery => necessary
totoal => total
--- 0003 patch
1.
+static int
+simple_list_length(SimpleStringList *list)
+{
+ int len = 0;
+ SimpleStringListCell *cell;
+
+ for (cell = list->head; cell; cell = cell->next, len++)
+ ;
+
+ return len;
+}
I think it will be good if it goes to simple_list.c. That will help in other
usages as well.
2.
Please revert these unnecessary changes:
@@ -1475,6 +1575,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res,
int rownum)
*/
snprintf(filename, sizeof(filename), "%s/%s", current_path,
copybuf);
+
if (filename[strlen(filename) - 1] == '/')
{
/*
@@ -1528,8 +1622,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res,
int rownum)
* can map them too.)
*/
filename[strlen(filename) - 1] = '\0'; /* Remove
trailing slash */
-
mapped_tblspc_path =
get_tablespace_mapping(&copybuf[157]);
+
if (symlink(mapped_tblspc_path, filename) != 0)
{
pg_log_error("could not create symbolic link from
\"%s\" to \"%s\": %m",
3.
Typos:
retrive => retrieve
takecare => take care
tablespae => tablespace
4.
The ParallelBackupEnd() function does not do anything for parallelism. Would
it be better to just rename it EndBackup()?
5.
To pass a tablespace path to the server in SEND_FILES_CONTENT, you are
reusing the LABEL option, which seems odd. How about adding a new option
for that?
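For instance, the worker currently sends something along these lines, with
LABEL carrying the tablespace path; a dedicated option (TABLESPACE_PATH is
just an illustrative name) would make the intent explicit:

    SEND_FILES_CONTENT ('pg_tblspc/16384/f1', 'pg_tblspc/16384/f2')
        LABEL 'pg_tblspc/16384' LSN '0/2000028'

    SEND_FILES_CONTENT ('pg_tblspc/16384/f1', 'pg_tblspc/16384/f2')
        TABLESPACE_PATH 'pg_tblspc/16384' LSN '0/2000028'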
6.
It will be good if we have some comments explaining what the function is
actually doing in its prologue. For functions like:
GetBackupFilesList()
ReceiveFiles()
create_workers_and_fetch()
Thanks
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
On Fri, Oct 18, 2019 at 4:12 PM Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> [...]
I had a detailed discussion with Robert Haas at PostgreConf Europe about
parallel backup. We discussed the current state of the patch and what needs
to be done to get it committed.
- The current patch uses processes to implement parallelism. There are many
reasons to use threads instead. To start with, as this is a client utility,
it makes more sense to use threads. The data needs to be shared among the
worker threads and the main thread, and handling that is simpler than
interprocess communication.
- Fetching a single file versus multiple files was also discussed. We
concluded that we need to benchmark whether disk I/O is a bottleneck and
whether parallel writing gives us any benefit. These benchmarks need to be
run on different hardware and different networks to identify the real
bottlenecks. In general, we agreed that we can start with fetching one file
at a time, but that will be revisited after the benchmarks are done.
- There is also an ongoing debate in this thread on whether we should have
one single tar file for all files or one tar file per thread. I really want
a single tar file, because the main purpose of the tar format is to reduce
the management of multiple files, whereas with one file per thread we end
up with many tar files. Therefore we need one master thread that is
responsible for writing the tar file, while all the other threads receive
data from the network and stream it to the master thread (a rough sketch of
that hand-off follows this list). This also supports the idea of a
thread-based model rather than a process-based approach, because it would
require too much data sharing between processes. If we cannot achieve this,
then we can disable the tar option for parallel backup in the first
version.
- For data sharing, we need to avoid unnecessary locking; a more suitable
algorithm for solving the reader-writer problem is required.
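To make the master-writer idea concrete, here is a minimal sketch of the
chunk hand-off, assuming worker threads that receive data from the network
and a single master thread that owns the tar file. The names (Chunk,
ChunkQueue, tar_writer) are illustrative and not taken from the posted
patches; error handling is elided:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Chunk
{
	struct Chunk *next;
	size_t		len;
	char		data[8192];		/* one network chunk */
} Chunk;

typedef struct
{
	Chunk	   *head;
	Chunk	   *tail;
	bool		done;			/* set (then broadcast on 'more') when
								 * all workers have finished */
	pthread_mutex_t lock;
	pthread_cond_t more;
} ChunkQueue;

/* Workers call this for every chunk received from the network. */
static void
enqueue_chunk(ChunkQueue *q, Chunk *c)
{
	c->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = c;
	else
		q->head = c;
	q->tail = c;
	pthread_cond_signal(&q->more);
	pthread_mutex_unlock(&q->lock);
}

/* The master thread is the only writer of the single tar file. */
static void *
tar_writer(void *arg)
{
	ChunkQueue *q = (ChunkQueue *) arg;
	FILE	   *tarfile = fopen("base.tar", "wb");

	for (;;)
	{
		Chunk	   *c;

		pthread_mutex_lock(&q->lock);
		while (q->head == NULL && !q->done)
			pthread_cond_wait(&q->more, &q->lock);
		c = q->head;
		if (c)
		{
			q->head = c->next;
			if (q->head == NULL)
				q->tail = NULL;
		}
		pthread_mutex_unlock(&q->lock);

		if (c == NULL)
			break;				/* queue drained and workers done */

		fwrite(c->data, 1, c->len, tarfile);
		free(c);
	}

	fclose(tarfile);
	return NULL;
}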
--
Ibrar Ahmed
On Thu, Oct 24, 2019 at 3:21 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
> [...]
> - The current patch uses processes to implement parallelism. There are
> many reasons to use threads instead. To start with, as this is a client
> utility, it makes more sense to use threads. The data needs to be shared
> among the worker threads and the main thread, and handling that is
> simpler than interprocess communication.
Yes, I agree. I have already converted the code to use threads instead of
processes; this avoids the overhead of interprocess communication. With a
single-file fetching strategy, communication between competing workers is
required, and in a multiprocess application that means IPC. The current
approach of multiple threads avoids this overhead.
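As a rough sketch (the actual converted code may differ), the fork()-based
create_workers_and_fetch() from the posted patch maps onto pthreads like
this, reusing the same per-worker ReceiveFiles() body and the patch's
BackupInfo, numWorkers, pg_malloc, and pg_log_error:

#include <pthread.h>
#include <stdint.h>
#include <string.h>

typedef struct
{
	BackupInfo *binfo;
	int			worker;
} WorkerArg;

static void *
worker_main(void *arg)
{
	WorkerArg  *wa = (WorkerArg *) arg;

	/* same per-worker body as in the fork()-based version */
	return (void *) (intptr_t) ReceiveFiles(wa->binfo, wa->worker);
}

static void
create_workers_and_fetch(BackupInfo *binfo)
{
	pthread_t  *threads = pg_malloc(numWorkers * sizeof(pthread_t));
	WorkerArg  *args = pg_malloc(numWorkers * sizeof(WorkerArg));

	for (int i = 0; i < numWorkers; i++)
	{
		int			rc;

		args[i].binfo = binfo;
		args[i].worker = i;
		rc = pthread_create(&threads[i], NULL, worker_main, &args[i]);
		if (rc != 0)
		{
			pg_log_error("could not create backup worker thread: %s",
						 strerror(rc));
			exit(1);
		}
	}

	for (int i = 0; i < numWorkers; i++)
	{
		void	   *ret;

		pthread_join(threads[i], &ret);
		if ((intptr_t) ret != 0)
		{
			pg_log_error("backup worker %d failed", i);
			exit(1);
		}
	}
}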
> - Fetching a single file versus multiple files was also discussed. We
> concluded that we need to benchmark whether disk I/O is a bottleneck and
> whether parallel writing gives us any benefit. [...] In general, we
> agreed that we can start with fetching one file at a time, but that will
> be revisited after the benchmarks are done.
I'll share the updated patch in the next couple of days. After that, I'll
work on benchmarking it in the different environments I have available.
> - There is also an ongoing debate in this thread on whether we should
> have one single tar file for all files or one tar file per thread. [...]
> If we cannot achieve this, then we can disable the tar option for
> parallel backup in the first version.
I am in favour of disabling the tar format for the first version of
parallel backup.
> - For data sharing, we need to avoid unnecessary locking; a more suitable
> algorithm for solving the reader-writer problem is required.
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
On Thu, Oct 24, 2019 at 4:24 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> [...]
I have updated the patch to include the changes suggested by Jeevan. This
patch also implements thread workers instead of processes and fetches a
single file at a time. The tar format has been disabled for the first
version of parallel backup.
Conversion from the previous process-based implementation to the current
thread-based one required slight modification of the data structures, the
addition of a few new functions, and progress-reporting functionality.
The core data structure remains intact: the tablespace-based file listing
is still maintained. However, we now also keep a flat list of all files
(holding pointers to the FileInfo structures, so there is no duplication of
data) so that workers can walk it sequentially without doing much work
inside the critical section. The current scope of the critical section for
worker threads is limited to incrementing the file index within that list;
a rough sketch of the scheme is below.
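As a sketch of that critical section (the FileCursor name and field layout
below are illustrative; the actual structures in the patch may differ):

#include <pthread.h>

typedef struct
{
	FileInfo  **files;			/* flat list pointing into the
								 * per-tablespace lists */
	int			numFiles;
	int			nextIndex;		/* next file to hand out */
	pthread_mutex_t lock;
} FileCursor;

/*
 * Workers call this to claim the next file to fetch; the critical
 * section is just the index read-and-increment.
 */
static FileInfo *
next_file(FileCursor *cur)
{
	FileInfo   *f = NULL;

	pthread_mutex_lock(&cur->lock);
	if (cur->nextIndex < cur->numFiles)
		f = cur->files[cur->nextIndex++];
	pthread_mutex_unlock(&cur->lock);

	return f;
}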
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Attachments:
0005-parallel-backup-testcase_v3.patchapplication/octet-stream; name=0005-parallel-backup-testcase_v3.patchDownload
From 30e3c102ad5782d3c814455824be9a53c93c3e9a Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Sun, 13 Oct 2019 21:54:23 +0500
Subject: [PATCH 5/5] parallel backup - testcase
---
.../t/040_pg_basebackup_parallel.pl | 567 ++++++++++++++++++
1 file changed, 567 insertions(+)
create mode 100644 src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
diff --git a/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
new file mode 100644
index 0000000000..2dac7bc82a
--- /dev/null
+++ b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
@@ -0,0 +1,567 @@
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Basename qw(basename dirname);
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 95;
+
+program_help_ok('pg_basebackup');
+program_version_ok('pg_basebackup');
+program_options_handling_ok('pg_basebackup');
+
+my $tempdir = TestLib::tempdir;
+
+my $node = get_new_node('main');
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Initialize node without replication settings
+$node->init(extra => ['--data-checksums']);
+$node->start;
+my $pgdata = $node->data_dir;
+
+$node->command_fails(['pg_basebackup'],
+ 'pg_basebackup needs target directory specified');
+
+# Some Windows ANSI code pages may reject this filename, in which case we
+# quietly proceed without this bit of test coverage.
+if (open my $badchars, '>>', "$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
+{
+ print $badchars "test backup of file with non-UTF8 name\n";
+ close $badchars;
+}
+
+$node->set_replication_conf();
+system_or_bail 'pg_ctl', '-D', $pgdata, 'reload';
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup" ],
+ 'pg_basebackup fails because of WAL configuration');
+
+ok(!-d "$tempdir/backup", 'backup directory was cleaned up');
+
+# Create a backup directory that is not empty so the next command will fail
+# but leave the data directory behind
+mkdir("$tempdir/backup")
+ or BAIL_OUT("unable to create $tempdir/backup");
+append_to_file("$tempdir/backup/dir-not-empty.txt", "Some data");
+
+$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/backup", '-n' ],
+ 'failing run with no-clean option');
+
+ok(-d "$tempdir/backup", 'backup directory was created and left behind');
+rmtree("$tempdir/backup");
+
+open my $conf, '>>', "$pgdata/postgresql.conf";
+print $conf "max_replication_slots = 10\n";
+print $conf "max_wal_senders = 10\n";
+print $conf "wal_level = replica\n";
+close $conf;
+$node->restart;
+
+# Write some files to test that they are not copied.
+foreach my $filename (
+ qw(backup_label tablespace_map postgresql.auto.conf.tmp current_logfiles.tmp)
+ )
+{
+ open my $file, '>>', "$pgdata/$filename";
+ print $file "DONOTCOPY";
+ close $file;
+}
+
+# Connect to a database to create global/pg_internal.init. If this is removed
+# the test to ensure global/pg_internal.init is not copied will return a false
+# positive.
+$node->safe_psql('postgres', 'SELECT 1;');
+
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'unlogged main fork in base');
+
+# Create files that look like temporary relations to ensure they are ignored.
+my $postgresOid = $node->safe_psql('postgres',
+ q{select oid from pg_database where datname = 'postgres'});
+
+my @tempRelationFiles =
+ qw(t999_999 t9999_999.1 t999_9999_vm t99999_99999_vm.1);
+
+foreach my $filename (@tempRelationFiles)
+{
+ append_to_file("$pgdata/base/$postgresOid/$filename", 'TEMP_RELATION');
+}
+
+# Run base backup in parallel mode.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup", '-X', 'none', "-j 4" ],
+ 'pg_basebackup runs');
+ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
+
+# Permissions on backup should be default
+SKIP:
+{
+ skip "unix-style permissions not supported on Windows", 1
+ if ($windows_os);
+
+ ok(check_mode_recursive("$tempdir/backup", 0700, 0600),
+ "check backup dir permissions");
+}
+
+# Only archive_status directory should be copied in pg_wal/.
+is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/pg_wal/")) ],
+ [ sort qw(. .. archive_status) ],
+ 'no WAL files copied');
+
+# Contents of these directories should not be copied.
+foreach my $dirname (
+ qw(pg_dynshmem pg_notify pg_replslot pg_serial pg_snapshots pg_stat_tmp pg_subtrans)
+ )
+{
+ is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/$dirname/")) ],
+ [ sort qw(. ..) ],
+ "contents of $dirname/ not copied");
+}
+
+# These files should not be copied.
+foreach my $filename (
+ qw(postgresql.auto.conf.tmp postmaster.opts postmaster.pid tablespace_map current_logfiles.tmp
+ global/pg_internal.init))
+{
+ ok(!-f "$tempdir/backup/$filename", "$filename not copied");
+}
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+ 'unlogged init fork in backup');
+ok( !-f "$tempdir/backup/$baseUnloggedPath",
+ 'unlogged main fork not in backup');
+
+# Temp relations should not be copied.
+foreach my $filename (@tempRelationFiles)
+{
+ ok( !-f "$tempdir/backup/base/$postgresOid/$filename",
+ "base/$postgresOid/$filename not copied");
+}
+
+# Make sure existing backup_label was ignored.
+isnt(slurp_file("$tempdir/backup/backup_label"),
+ 'DONOTCOPY', 'existing backup_label not copied');
+rmtree("$tempdir/backup");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup2", '--waldir',
+ "$tempdir/xlog2", "-j 4"
+ ],
+ 'separate xlog directory');
+ok(-f "$tempdir/backup2/PG_VERSION", 'backup was created');
+ok(-d "$tempdir/xlog2/", 'xlog directory was created');
+rmtree("$tempdir/backup2");
+rmtree("$tempdir/xlog2");
+
+$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/tarbackup", '-Ft', "-j 4"],
+ 'tar format');
+
+rmtree("$tempdir/tarbackup");
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T=/foo" ],
+ '-T with empty old directory fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=" ],
+ '-T with empty new directory fails');
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4",
+ "-T/foo=/bar=/baz"
+ ],
+ '-T with multiple = fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo=/bar" ],
+ '-T with old directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=bar" ],
+ '-T with new directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo" ],
+ '-T with invalid format fails');
+
+# Tar format doesn't support filenames longer than 100 bytes.
+#my $superlongname = "superlongname_" . ("x" x 100);
+#my $superlongpath = "$pgdata/$superlongname";
+#
+#open my $file, '>', "$superlongpath"
+# or die "unable to create file $superlongpath";
+#close $file;
+#$node->command_fails(
+# [ 'pg_basebackup', '-D', "$tempdir/tarbackup_l1", '-Ft', "-j 4" ],
+# 'pg_basebackup tar with long name fails');
+#unlink "$pgdata/$superlongname";
+my $file;
+
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+ skip "symlinks not supported on Windows", 18 if ($windows_os);
+
+ # Move pg_replslot out of $pgdata and create a symlink to it.
+ $node->stop;
+
+ # Set umask so test directories and files are created with group permissions
+ umask(0027);
+
+ # Enable group permissions on PGDATA
+ chmod_recursive("$pgdata", 0750, 0640);
+
+ rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
+ or BAIL_OUT "could not move $pgdata/pg_replslot";
+ symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot")
+ or BAIL_OUT "could not symlink to $pgdata/pg_replslot";
+
+ $node->start;
+
+# # Create a temporary directory in the system location and symlink it
+# # to our physical temp location. That way we can use shorter names
+# # for the tablespace directories, which hopefully won't run afoul of
+# # the 99 character length limit.
+ my $shorter_tempdir = TestLib::tempdir_short . "/tempdir";
+ symlink "$tempdir", $shorter_tempdir;
+
+ mkdir "$tempdir/tblspc1";
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc1 LOCATION '$shorter_tempdir/tblspc1';");
+ $node->safe_psql('postgres',
+ "CREATE TABLE test1 (a int) TABLESPACE tblspc1;");
+# $node->command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup2", '-Ft', "-j 4" ],
+# 'tar format with tablespaces');
+# ok(-f "$tempdir/tarbackup2/base.0.tar", 'backup tar was created');
+# my @tblspc_tars = glob "$tempdir/tarbackup2/[0-9]*.tar";
+# is(scalar(@tblspc_tars), 1, 'one tablespace tar was created');
+# rmtree("$tempdir/tarbackup2");
+
+ # Create an unlogged table to test that forks other than init are not copied.
+ $node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;'
+ );
+
+ my $tblspc1UnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('tblspc1_unlogged')});
+
+ # Make sure main and init forks exist
+ ok( -f "$pgdata/${tblspc1UnloggedPath}_init",
+ 'unlogged init fork in tablespace');
+ ok(-f "$pgdata/$tblspc1UnloggedPath", 'unlogged main fork in tablespace');
+
+ # Create files that look like temporary relations to ensure they are ignored
+ # in a tablespace.
+ my @tempRelationFiles = qw(t888_888 t888888_888888_vm.1);
+ my $tblSpc1Id = basename(
+ dirname(
+ dirname(
+ $node->safe_psql(
+ 'postgres', q{select pg_relation_filepath('test1')}))));
+
+ foreach my $filename (@tempRelationFiles)
+ {
+ append_to_file(
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ 'TEMP_RELATION');
+ }
+
+ $node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4" ],
+ 'plain format with tablespaces fails without tablespace mapping');
+
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tblspc1=$tempdir/tbackup/tblspc1"
+ ],
+ 'plain format with tablespaces succeeds with tablespace mapping');
+ ok(-d "$tempdir/tbackup/tblspc1", 'tablespace was relocated');
+ opendir(my $dh, "$pgdata/pg_tblspc") or die;
+ ok( ( grep {
+ -l "$tempdir/backup1/pg_tblspc/$_"
+ and readlink "$tempdir/backup1/pg_tblspc/$_" eq
+ "$tempdir/tbackup/tblspc1"
+ } readdir($dh)),
+ "tablespace symlink was updated");
+ closedir $dh;
+
+ # Group access should be enabled on all backup files
+ ok(check_mode_recursive("$tempdir/backup1", 0750, 0640),
+ "check backup dir permissions");
+
+ # Unlogged relation forks other than init should not be copied
+ my ($tblspc1UnloggedBackupPath) =
+ $tblspc1UnloggedPath =~ /[^\/]*\/[^\/]*\/[^\/]*$/g;
+
+ ok(-f "$tempdir/tbackup/tblspc1/${tblspc1UnloggedBackupPath}_init",
+ 'unlogged init fork in tablespace backup');
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
+ 'unlogged main fork not in tablespace backup');
+
+ # Temp relations should not be copied.
+ foreach my $filename (@tempRelationFiles)
+ {
+ ok( !-f "$tempdir/tbackup/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ "[tblspc1]/$postgresOid/$filename not copied");
+
+ # Also remove temp relation files or tablespace drop will fail.
+ my $filepath =
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename";
+
+ unlink($filepath)
+ or BAIL_OUT("unable to unlink $filepath");
+ }
+
+ ok( -d "$tempdir/backup1/pg_replslot",
+ 'pg_replslot symlink copied as directory');
+ rmtree("$tempdir/backup1");
+
+ mkdir "$tempdir/tbl=spc2";
+ $node->safe_psql('postgres', "DROP TABLE test1;");
+ $node->safe_psql('postgres', "DROP TABLE tblspc1_unlogged;");
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc1;");
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc2 LOCATION '$shorter_tempdir/tbl=spc2';");
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup3", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tbl\\=spc2=$tempdir/tbackup/tbl\\=spc2"
+ ],
+ 'mapping tablespace with = sign in path');
+ ok(-d "$tempdir/tbackup/tbl=spc2",
+ 'tablespace with = sign was relocated');
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc2;");
+ rmtree("$tempdir/backup3");
+
+# mkdir "$tempdir/$superlongname";
+# $node->safe_psql('postgres',
+# "CREATE TABLESPACE tblspc3 LOCATION '$tempdir/$superlongname';");
+# $node->command_ok(
+# [ 'pg_basebackup', '-D', "$tempdir/tarbackup_l3", '-Ft' , '-j 4'],
+# 'pg_basebackup tar with long symlink target');
+# $node->safe_psql('postgres', "DROP TABLESPACE tblspc3;");
+# rmtree("$tempdir/tarbackup_l3");
+}
+
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backupR", '-R' , '-j 4'],
+ 'pg_basebackup -R runs');
+ok(-f "$tempdir/backupR/postgresql.auto.conf", 'postgresql.auto.conf exists');
+ok(-f "$tempdir/backupR/standby.signal", 'standby.signal was created');
+my $recovery_conf = slurp_file "$tempdir/backupR/postgresql.auto.conf";
+rmtree("$tempdir/backupR");
+
+my $port = $node->port;
+like(
+ $recovery_conf,
+ qr/^primary_conninfo = '.*port=$port.*'\n/m,
+ 'postgresql.auto.conf sets primary_conninfo');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxd" , "-j 4"],
+ 'pg_basebackup runs in default xlog mode');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxd/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxd");
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxf", '-X', 'fetch' , "-j 4"],
+ 'pg_basebackup -X fetch runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxf");
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs", '-X', 'stream' , "-j 4"],
+ 'pg_basebackup -X stream runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxs/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxs");
+#$node->command_ok(
+# [ 'pg_basebackup', '-D', "$tempdir/backupxst", '-X', 'stream', '-Ft' , "-j 4"],
+# 'pg_basebackup -X stream runs in tar mode');
+#ok(-f "$tempdir/backupxst/pg_wal.tar", "tar file was created");
+#rmtree("$tempdir/backupxst");
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupnoslot", '-X',
+ 'stream', '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup -X stream runs with --no-slot');
+rmtree("$tempdir/backupnoslot");
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_sl_fail", '-X',
+ 'stream', '-S',
+ 'slot0',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with nonexistent replication slot');
+#
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C' , '-j 4'],
+ 'pg_basebackup -C fails without slot name');
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_slot", '-C',
+ '-S', 'slot0',
+ '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with -C -S --no-slot');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup -C runs');
+rmtree("$tempdir/backupxs_slot");
+
+is( $node->safe_psql(
+ 'postgres',
+ q{SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ 'slot0',
+ 'replication slot was created');
+isnt(
+ $node->safe_psql(
+ 'postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ '',
+ 'restart LSN of new slot is not null');
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot1", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup fails with -C -S and a previously existing slot');
+
+$node->safe_psql('postgres',
+ q{SELECT * FROM pg_create_physical_replication_slot('slot1')});
+my $lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+is($lsn, '', 'restart LSN of new slot is null');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/fail", '-S', 'slot1', '-X', 'none', '-j 4'],
+ 'pg_basebackup with replication slot fails without WAL streaming');
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl", '-X',
+ 'stream', '-S', 'slot1', '-j 4'
+ ],
+ 'pg_basebackup -X stream with replication slot runs');
+$lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+like($lsn, qr!^0/[0-9A-Z]{7,8}$!, 'restart LSN of slot has advanced');
+rmtree("$tempdir/backupxs_sl");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl_R", '-X',
+ 'stream', '-S', 'slot1', '-R',
+ '-j 4'
+ ],
+ 'pg_basebackup with replication slot and -R runs');
+like(
+ slurp_file("$tempdir/backupxs_sl_R/postgresql.auto.conf"),
+ qr/^primary_slot_name = 'slot1'\n/m,
+ 'recovery conf file sets primary_slot_name');
+
+my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
+is($checksum, 'on', 'checksums are enabled');
+rmtree("$tempdir/backupxs_sl_R");
+
+# create tables to corrupt and get their relfilenodes
+my $file_corrupt1 = $node->safe_psql('postgres',
+ q{SELECT a INTO corrupt1 FROM generate_series(1,10000) AS a; ALTER TABLE corrupt1 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt1')}
+);
+my $file_corrupt2 = $node->safe_psql('postgres',
+ q{SELECT b INTO corrupt2 FROM generate_series(1,2) AS b; ALTER TABLE corrupt2 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt2')}
+);
+
+# set page header and block sizes
+my $pageheader_size = 24;
+my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');
+
+# induce corruption
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*checksum verification failed/s],
+ 'pg_basebackup reports checksum mismatch');
+rmtree("$tempdir/backup_corrupt");
+
+# induce further corruption in 5 more blocks
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+for my $i (1 .. 5)
+{
+ my $offset = $pageheader_size + $i * $block_size;
+ seek($file, $offset, 0);
+ syswrite($file, "\0\0\0\0\0\0\0\0\0");
+}
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*further.*failures.*will.not.be.reported/s],
+ 'pg_basebackup does not report more than 5 checksum mismatches');
+rmtree("$tempdir/backup_corrupt2");
+
+# induce corruption in a second file
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt2";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+#$node->command_checks_all(
+# [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt3", '-j 4'],
+# 1,
+# [qr{^$}],
+# [qr/^WARNING.*checksum verification failed/s],
+# 'pg_basebackup correctly report the total number of checksum mismatches');
+#rmtree("$tempdir/backup_corrupt3");
+
+# do not verify checksums, should return ok
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup_corrupt4", '--no-verify-checksums',
+ '-j 4'
+ ],
+ 'pg_basebackup with -k does not report checksum mismatch');
+rmtree("$tempdir/backup_corrupt4");
+
+$node->safe_psql('postgres', "DROP TABLE corrupt1;");
+$node->safe_psql('postgres', "DROP TABLE corrupt2;");
--
2.21.0 (Apple Git-122)
0003-add-exclusive-backup-option-in-parallel-backup_v3.patchapplication/octet-stream; name=0003-add-exclusive-backup-option-in-parallel-backup_v3.patchDownload
From bf494ca68028e22fdd8984170bc7c41801e4b82a Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Sun, 27 Oct 2019 23:27:03 +0500
Subject: [PATCH 3/5] add 'exclusive' backup option in parallel backup
---
src/backend/replication/basebackup.c | 99 ++++++++++++++++++++++++--
src/backend/replication/repl_gram.y | 6 ++
src/backend/replication/repl_scanner.l | 1 +
3 files changed, 99 insertions(+), 7 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 4a382c4558..1c657e247a 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -55,6 +55,7 @@ typedef struct
bool sendtblspcmapfile;
const char *tablespace_path;
XLogRecPtr wal_location;
+ bool exclusive;
} basebackup_options;
typedef struct
@@ -104,6 +105,7 @@ static void StartBackup(basebackup_options *opt);
static void StopBackup(basebackup_options *opt);
static void SendBackupManifest(basebackup_options *opt);
static void SendBackupFiles(basebackup_options *opt, List *filenames, bool missing_ok);
+static char *readfile(const char *readfilename, bool missing_ok);
/* Was the backup currently in-progress initiated in recovery mode? */
static bool backup_started_in_recovery = false;
@@ -256,7 +258,14 @@ static const char *const noChecksumFiles[] = {
static void
base_backup_cleanup(int code, Datum arg)
{
- do_pg_abort_backup();
+ bool exclusive = DatumGetBool(arg);
+
+ /*
+	 * do_pg_abort_backup() is only for non-exclusive backups; an exclusive
+	 * backup is terminated by calling pg_stop_backup().
+ */
+ if (!exclusive)
+ do_pg_abort_backup();
}
/*
@@ -443,6 +452,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_noverify_checksums = false;
bool o_tablespace_path = false;
bool o_wal_location = false;
+ bool o_exclusive = false;
MemSet(opt, 0, sizeof(*opt));
foreach(lopt, options)
@@ -554,6 +564,16 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->wal_location = pg_lsn_in_internal(wal_location, &have_error);
o_wal_location = true;
}
+ else if (strcmp(defel->defname, "exclusive") == 0)
+ {
+ if (o_exclusive)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+
+ opt->exclusive = intVal(defel->arg);
+ o_exclusive = true;
+ }
else
elog(ERROR, "option \"%s\" not recognized",
defel->defname);
@@ -1944,7 +1964,7 @@ StartBackup(basebackup_options *opt)
total_checksum_failures = 0;
startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
- labelfile, &tablespaces,
+ opt->exclusive? NULL : labelfile, &tablespaces,
tblspc_map_file,
opt->progress, opt->sendtblspcmapfile);
@@ -1955,7 +1975,7 @@ StartBackup(basebackup_options *opt)
* do_pg_stop_backup() should be inside the error cleanup block!
*/
- PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+ PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) BoolGetDatum(opt->exclusive));
{
tablespaceinfo *ti;
@@ -1984,6 +2004,25 @@ StartBackup(basebackup_options *opt)
/* Setup and activate network throttling, if client requested it */
setup_throttle(opt->maxrate);
+ /*
+ * In exclusive mode, pg_start_backup creates backup_label and
+ * tablespace_map files and does not return their contents in
+ * *labelfile and *tblspcmapfile. So we read them from these files to
+ * return to frontend.
+ *
+ * In non-exclusive mode, contents of these files are available in
+ * *labelfile and *tblspcmapfile and are returned directly.
+ */
+ if (opt->exclusive)
+ {
+ resetStringInfo(labelfile);
+ resetStringInfo(tblspc_map_file);
+
+ appendStringInfoString(labelfile, readfile(BACKUP_LABEL_FILE, false));
+ if (opt->sendtblspcmapfile)
+ appendStringInfoString(tblspc_map_file, readfile(TABLESPACE_MAP, false));
+ }
+
if ((tblspc_map_file && tblspc_map_file->len <= 0) ||
!opt->sendtblspcmapfile)
tblspc_map_file = NULL;
@@ -1991,14 +2030,14 @@ StartBackup(basebackup_options *opt)
/* send backup_label and tablespace_map to frontend */
SendStartBackupResult(labelfile, tblspc_map_file);
}
- PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+ PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) BoolGetDatum(opt->exclusive));
}
/*
* StopBackup() - ends an online backup
*
* The function is called at the end of an online backup. It sends out pg_control
- * file, optionaly WAL segments and ending WAL location.
+ * file, optionally WAL segments and ending WAL location.
*/
static void
StopBackup(basebackup_options *opt)
@@ -2009,7 +2048,7 @@ StopBackup(basebackup_options *opt)
StringInfoData buf;
char *labelfile = NULL;
- PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+ PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) BoolGetDatum(opt->exclusive));
{
/* Setup and activate network throttling, if client requested it */
setup_throttle(opt->maxrate);
@@ -2028,6 +2067,8 @@ StopBackup(basebackup_options *opt)
sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
/* stop backup */
+ if (!opt->exclusive)
+ labelfile = (char *) opt->label;
endptr = do_pg_stop_backup(labelfile, !opt->nowait, &endtli);
if (opt->includewal)
@@ -2036,7 +2077,7 @@ StopBackup(basebackup_options *opt)
pq_putemptymessage('c'); /* CopyDone */
SendXlogRecPtrResult(endptr, endtli);
}
- PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+ PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) BoolGetDatum(opt->exclusive));
}
/*
@@ -2271,3 +2312,47 @@ SendBackupFiles(basebackup_options *opt, List *filenames, bool missing_ok)
errmsg("checksum verification failure during base backup")));
}
}
+
+static char *
+readfile(const char *readfilename, bool missing_ok)
+{
+ struct stat statbuf;
+ FILE *fp;
+ char *data;
+ int r;
+
+ if (stat(readfilename, &statbuf))
+ {
+ if (errno == ENOENT && missing_ok)
+ return NULL;
+
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ readfilename)));
+ }
+
+ fp = AllocateFile(readfilename, "r");
+ if (!fp)
+ {
+ if (errno == ENOENT && missing_ok)
+ return NULL;
+
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m", readfilename)));
+ }
+
+ data = palloc(statbuf.st_size + 1);
+ r = fread(data, statbuf.st_size, 1, fp);
+ data[statbuf.st_size] = '\0';
+
+ /* Close the file */
+ if (r != 1 || ferror(fp) || FreeFile(fp))
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read file \"%s\": %m",
+ readfilename)));
+
+ return data;
+}
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 9e2499814b..94c6aafbed 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -93,6 +93,7 @@ static SQLCmd *make_sqlcmd(void);
%token K_STOP_BACKUP
%token K_START_WAL_LOCATION
%token K_TABLESPACE_PATH
+%token K_EXCLUSIVE
%type <node> command
%type <node> base_backup start_replication start_logical_replication
@@ -262,6 +263,11 @@ base_backup_opt:
$$ = makeDefElem("tablespace_path",
(Node *)makeString($2), -1);
}
+ | K_EXCLUSIVE
+ {
+ $$ = makeDefElem("exclusive",
+ (Node *)makeInteger(true), -1);
+ }
;
backup_files:
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 7a1bb54da8..ad0dd04cb1 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -113,6 +113,7 @@ SEND_BACKUP_FILES { return K_SEND_BACKUP_FILES; }
STOP_BACKUP { return K_STOP_BACKUP; }
START_WAL_LOCATION { return K_START_WAL_LOCATION; }
TABLESPACE_PATH { return K_TABLESPACE_PATH; }
+EXCLUSIVE { return K_EXCLUSIVE; }
"," { return ','; }
--
2.21.0 (Apple Git-122)
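To make the new option concrete, here is a minimal, hedged libpq sketch of a client
driving the exclusive mode end to end. It assumes the replication grammar added above
(START_BACKUP ... EXCLUSIVE returning a row whose first column is the backup_label
contents, and a matching STOP_BACKUP), and it deliberately ignores the COPY stream
carrying pg_control and WAL that precedes STOP_BACKUP's final result, which
pg_basebackup consumes with ReceiveAndUnpackTarFile().

    #include <stdio.h>
    #include "libpq-fe.h"

    int
    main(void)
    {
        /* These commands require a replication connection. */
        PGconn     *conn = PQconnectdb("dbname=postgres replication=database");
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /*
         * Begin an exclusive backup; per the patch, the server replies with
         * one row whose first column is the backup_label contents.
         */
        res = PQexec(conn, "START_BACKUP LABEL 'demo' EXCLUSIVE");
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("backup_label:\n%s\n", PQgetvalue(res, 0, 0));
        PQclear(res);

        /*
         * SEND_BACKUP_MANIFEST and SEND_BACKUP_FILES would go here; the
         * CopyData stream they return is omitted in this sketch.
         */

        /* End the backup; pg_control and the ending WAL location follow. */
        res = PQexec(conn, "STOP_BACKUP LABEL 'demo' EXCLUSIVE");
        PQclear(res);

        PQfinish(conn);
        return 0;
    }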
Attachment: 0004-pg_basebackup-changes-for-parallel-backup_v3.patch (application/octet-stream)
From 3ef5c3f40137ed15039f95d4e6e9487fa6edc3c7 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Mon, 14 Oct 2019 17:28:58 +0500
Subject: [PATCH 4/5] pg_basebackup changes for parallel backup.
---
src/bin/pg_basebackup/pg_basebackup.c | 710 ++++++++++++++++++++++++--
1 file changed, 672 insertions(+), 38 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index a9d162a7da..1dff398c11 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -19,6 +19,7 @@
#include <sys/wait.h>
#include <signal.h>
#include <time.h>
+#include <pthread.h>
#ifdef HAVE_SYS_SELECT_H
#include <sys/select.h>
#endif
@@ -41,6 +42,7 @@
#include "receivelog.h"
#include "replication/basebackup.h"
#include "streamutil.h"
+#include "fe_utils/simple_list.h"
#define ERRCODE_DATA_CORRUPTED "XX001"
@@ -57,6 +59,57 @@ typedef struct TablespaceList
TablespaceListCell *tail;
} TablespaceList;
+typedef struct
+{
+ char name[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+
+ int tsIndex; /* index of tsInfo this file belongs to. */
+} BackupFile;
+
+typedef struct
+{
+ Oid tblspcOid;
+ char *tablespace; /* tablespace name or NULL if 'base' tablespace */
+ int numFiles; /* number of files */
+ BackupFile *backupFiles; /* list of files in a tablespace */
+} TablespaceInfo;
+
+typedef struct
+{
+ int tablespacecount;
+ int totalfiles;
+ int numWorkers;
+
+ char xlogstart[64];
+ char *backup_label;
+ char *tablespace_map;
+
+ TablespaceInfo *tsInfo;
+ BackupFile **files; /* list of BackupFile pointers */
+ int fileIndex; /* index of file to be fetched */
+
+ PGconn **workerConns;
+} BackupInfo;
+
+typedef struct
+{
+ BackupInfo *backupInfo;
+ uint64 bytesRead;
+
+ int workerid;
+ pthread_t worker;
+
+ bool terminated;
+} WorkerState;
+
+BackupInfo *backupInfo = NULL;
+WorkerState *workers = NULL;
+
+static pthread_mutex_t fetch_mutex = PTHREAD_MUTEX_INITIALIZER;
+
/*
* pg_xlog has been renamed to pg_wal in version 10. This version number
* should be compared with PQserverVersion().
@@ -110,6 +163,9 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
+static int numWorkers = 1;
+static PGresult *tablespacehdr;
+
/* Progress counters */
static uint64 totalsize_kb;
static uint64 totaldone;
@@ -140,9 +196,10 @@ static PQExpBuffer recoveryconfcontents = NULL;
static void usage(void);
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
static void progress_report(int tablespacenum, const char *filename, bool force);
+static void workers_progress_report(uint64 totalBytesRead, const char *filename, bool force);
static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
-static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
+static int ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
static void BaseBackup(void);
static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
@@ -151,6 +208,17 @@ static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
static const char *get_tablespace_mapping(const char *dir);
static void tablespace_list_append(const char *arg);
+static void ParallelBackupRun(BackupInfo *backupInfo);
+static void StopBackup(BackupInfo *backupInfo);
+static void GetBackupManifest(PGconn *conn, BackupInfo *backupInfo);
+static int GetBackupFile(WorkerState *wstate);
+static BackupFile *getNextFile(BackupInfo *backupInfo);
+static int compareFileSize(const void *a, const void *b);
+static void read_label_tblspcmap(PGconn *conn, char **backup_label, char **tablespace_map);
+static void create_backup_dirs(bool basetablespace, char *tablespace, char *name);
+static void writefile(char *path, char *buf);
+static void *workerRun(void *arg);
+
static void
cleanup_directories_atexit(void)
@@ -202,6 +270,17 @@ cleanup_directories_atexit(void)
static void
disconnect_atexit(void)
{
+ /* close worker connections */
+ if (backupInfo && backupInfo->workerConns != NULL)
+ {
+ int i;
+ for (i = 0; i < numWorkers; i++)
+ {
+ if (backupInfo->workerConns[i] != NULL)
+ PQfinish(backupInfo->workerConns[i]);
+ }
+ }
+
if (conn != NULL)
PQfinish(conn);
}
@@ -349,6 +428,7 @@ usage(void)
printf(_(" --no-slot prevent creation of temporary replication slot\n"));
printf(_(" --no-verify-checksums\n"
" do not verify checksums\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to backup\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nConnection options:\n"));
printf(_(" -d, --dbname=CONNSTR connection string\n"));
@@ -695,6 +775,93 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
}
}
+/*
+ * Print a progress report for the worker threads. If verbose output
+ * is enabled, also print the current file name.
+ *
+ * A progress report is written at most once per second, unless the
+ * force parameter is set to true.
+ */
+static void
+workers_progress_report(uint64 totalBytesRead, const char *filename, bool force)
+{
+ int percent;
+ char totalBytesRead_str[32];
+ char totalsize_str[32];
+ pg_time_t now;
+
+ if (!showprogress)
+ return;
+
+ now = time(NULL);
+ if (now == last_progress_report && !force)
+ return; /* Max once per second */
+
+ last_progress_report = now;
+ percent = totalsize_kb ? (int) ((totalBytesRead / 1024) * 100 / totalsize_kb) : 0;
+
+ /*
+ * Avoid overflowing past 100% or the full size. This may make the total
+ * size number change as we approach the end of the backup (the estimate
+ * will always be wrong if WAL is included), but that's better than having
+ * the done column be bigger than the total.
+ */
+ if (percent > 100)
+ percent = 100;
+ if (totalBytesRead / 1024 > totalsize_kb)
+ totalsize_kb = totalBytesRead / 1024;
+
+ /*
+ * Separate step to keep platform-dependent format code out of
+ * translatable strings. And we only test for INT64_FORMAT availability
+ * in snprintf, not fprintf.
+ */
+ snprintf(totalBytesRead_str, sizeof(totalBytesRead_str), INT64_FORMAT,
+ totalBytesRead / 1024);
+ snprintf(totalsize_str, sizeof(totalsize_str), INT64_FORMAT, totalsize_kb);
+
+#define VERBOSE_FILENAME_LENGTH 35
+
+ if (verbose)
+ {
+ if (!filename)
+
+ /*
+ * No filename given, so clear the status line (used for last
+ * call)
+ */
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied %*s"),
+ (int) strlen(totalsize_str),
+ totalBytesRead_str, totalsize_str,
+ percent,
+ VERBOSE_FILENAME_LENGTH + 5, "");
+ else
+ {
+ bool truncate = (strlen(filename) > VERBOSE_FILENAME_LENGTH);
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied, current file (%s%-*.*s)"),
+ (int) strlen(totalsize_str), totalBytesRead_str, totalsize_str,
+ percent,
+ /* Prefix with "..." if we do leading truncation */
+ truncate ? "..." : "",
+ truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
+ truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
+ /* Truncate filename at beginning if it's too long */
+ truncate ? filename + strlen(filename) - VERBOSE_FILENAME_LENGTH + 3 : filename);
+ }
+ }
+ else
+ {
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied"),
+ (int) strlen(totalsize_str),
+ totalBytesRead_str, totalsize_str,
+ percent);
+ }
+
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\r");
+ else
+ fprintf(stderr, "\n");
+}
/*
* Print a progress report based on the global variables. If verbose output
@@ -711,7 +878,7 @@ progress_report(int tablespacenum, const char *filename, bool force)
char totalsize_str[32];
pg_time_t now;
- if (!showprogress)
+ if (!showprogress || numWorkers > 1)
return;
now = time(NULL);
@@ -1381,7 +1548,7 @@ get_tablespace_mapping(const char *dir)
* specified directory. If it's for another tablespace, it will be restored
* in the original or mapped directory.
*/
-static void
+static int
ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
{
char current_path[MAXPGPATH];
@@ -1392,6 +1559,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
bool basetablespace;
char *copybuf = NULL;
FILE *file = NULL;
+ int readBytes = 0;
basetablespace = PQgetisnull(res, rownum, 0);
if (basetablespace)
@@ -1455,7 +1623,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
pg_log_error("invalid tar block header size: %d", r);
exit(1);
}
- totaldone += 512;
+ readBytes += 512;
current_len_left = read_tar_number(©buf[124], 12);
@@ -1486,21 +1654,14 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
* Directory
*/
filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */
+
+ /*
+ * In parallel mode, we create directories before fetching
+ * files, so it is OK if a directory already exists.
+ */
if (mkdir(filename, pg_dir_create_mode) != 0)
{
- /*
- * When streaming WAL, pg_wal (or pg_xlog for pre-9.6
- * clusters) will have been created by the wal
- * receiver process. Also, when the WAL directory
- * location was specified, pg_wal (or pg_xlog) has
- * already been created as a symbolic link before
- * starting the actual backup. So just ignore creation
- * failures on related directories.
- */
- if (!((pg_str_endswith(filename, "/pg_wal") ||
- pg_str_endswith(filename, "/pg_xlog") ||
- pg_str_endswith(filename, "/archive_status")) &&
- errno == EEXIST))
+ if (errno != EEXIST)
{
pg_log_error("could not create directory \"%s\": %m",
filename);
@@ -1585,7 +1746,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
*/
fclose(file);
file = NULL;
- totaldone += r;
+ readBytes += r;
continue;
}
@@ -1594,7 +1755,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
pg_log_error("could not write to file \"%s\": %m", filename);
exit(1);
}
- totaldone += r;
+ readBytes += r;
+ totaldone = readBytes;
progress_report(rownum, filename, false);
current_len_left -= r;
@@ -1622,13 +1784,11 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
if (copybuf != NULL)
PQfreemem(copybuf);
- if (basetablespace && writerecoveryconf)
- WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
-
/*
* No data is synced here, everything is done for all tablespaces at the
* end.
*/
+ return readBytes;
}
@@ -1716,7 +1876,8 @@ BaseBackup(void)
}
basebkp =
- psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
+ psprintf("%s LABEL '%s' %s %s %s %s %s %s %s",
+ (numWorkers > 1) ? "START_BACKUP" : "BASE_BACKUP",
escaped_label,
showprogress ? "PROGRESS" : "",
includewal == FETCH_WAL ? "WAL" : "",
@@ -1774,7 +1935,7 @@ BaseBackup(void)
/*
* Get the header
*/
- res = PQgetResult(conn);
+ tablespacehdr = res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
pg_log_error("could not get backup header: %s",
@@ -1830,24 +1991,74 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /*
- * Start receiving chunks
- */
- for (i = 0; i < PQntuples(res); i++)
+ if (numWorkers > 1)
{
- if (format == 't')
- ReceiveTarFile(conn, res, i);
- else
- ReceiveAndUnpackTarFile(conn, res, i);
- } /* Loop over all tablespaces */
+ int j = 0,
+ k = 0;
- if (showprogress)
+ backupInfo = palloc0(sizeof(BackupInfo));
+ backupInfo->workerConns = (PGconn **) palloc0(sizeof(PGconn *) * numWorkers);
+ backupInfo->tablespacecount = tablespacecount;
+ backupInfo->numWorkers = numWorkers;
+ strlcpy(backupInfo->xlogstart, xlogstart, sizeof(backupInfo->xlogstart));
+
+ read_label_tblspcmap(conn, &backupInfo->backup_label, &backupInfo->tablespace_map);
+
+ /* Retrieve the backup manifest from the server. */
+ GetBackupManifest(conn, backupInfo);
+
+ /*
+ * Add backup_label to the backup (for tar format, ReceiveTarFile()
+ * will take care of it).
+ */
+ if (format == 'p')
+ writefile("backup_label", backupInfo->backup_label);
+
+ /*
+ * Flatten the file list to avoid unnecessary locking and to enable
+ * sequential access (an array of BackupFile structure pointers).
+ */
+ backupInfo->files =
+ (BackupFile **) palloc0(sizeof(BackupFile *) * backupInfo->totalfiles);
+ for (i = 0; i < backupInfo->tablespacecount; i++)
+ {
+ TablespaceInfo *curTsInfo = &backupInfo->tsInfo[i];
+
+ for (j = 0; j < curTsInfo->numFiles; j++)
+ {
+ backupInfo->files[k] = &curTsInfo->backupFiles[j];
+ k++;
+ }
+ }
+
+ ParallelBackupRun(backupInfo);
+ StopBackup(backupInfo);
+ }
+ else
{
- progress_report(PQntuples(res), NULL, true);
- if (isatty(fileno(stderr)))
- fprintf(stderr, "\n"); /* Need to move to next line */
+ /*
+ * Start receiving chunks
+ */
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ if (format == 't')
+ ReceiveTarFile(conn, res, i);
+ else
+ ReceiveAndUnpackTarFile(conn, res, i);
+ } /* Loop over all tablespaces */
+
+ if (showprogress)
+ {
+ progress_report(PQntuples(tablespacehdr), NULL, true);
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\n"); /* Need to move to next line */
+ }
}
+ /* Write recovery contents */
+ if (format == 'p' && writerecoveryconf)
+ WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
+
PQclear(res);
/*
@@ -2043,6 +2254,7 @@ main(int argc, char **argv)
{"waldir", required_argument, NULL, 1},
{"no-slot", no_argument, NULL, 2},
{"no-verify-checksums", no_argument, NULL, 3},
+ {"jobs", required_argument, NULL, 'j'},
{NULL, 0, NULL, 0}
};
int c;
@@ -2070,7 +2282,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvPj:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2211,6 +2423,9 @@ main(int argc, char **argv)
case 3:
verify_checksums = false;
break;
+ case 'j': /* number of jobs */
+ numWorkers = atoi(optarg);
+ break;
default:
/*
@@ -2325,6 +2540,22 @@ main(int argc, char **argv)
}
}
+ if (numWorkers <= 0)
+ {
+ pg_log_error("invalid number of parallel jobs");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (format != 'p' && numWorkers > 1)
+ {
+ pg_log_error("parallel jobs are only supported with 'plain' format");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
#ifndef HAVE_LIBZ
if (compresslevel != 0)
{
@@ -2397,3 +2628,406 @@ main(int argc, char **argv)
success = true;
return 0;
}
+
+/*
+ * Thread worker
+ */
+static void *
+workerRun(void *arg)
+{
+ WorkerState *wstate = (WorkerState *) arg;
+
+ GetBackupFile(wstate);
+
+ wstate->terminated = true;
+ return NULL;
+}
+
+/*
+ * Runs the worker threads and updates progress until all workers have
+ * terminated/completed.
+ */
+static void
+ParallelBackupRun(BackupInfo *backupInfo)
+{
+ int status,
+ i;
+ bool threadsActive = true;
+ uint64 totalBytes = 0;
+
+ workers = (WorkerState *) palloc0(sizeof(WorkerState) * numWorkers);
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ WorkerState *worker = &workers[i];
+
+ worker->backupInfo = backupInfo;
+ worker->workerid = i;
+ worker->bytesRead = 0;
+ worker->terminated = false;
+
+ backupInfo->workerConns[i] = GetConnection();
+ status = pthread_create(&worker->worker, NULL, workerRun, worker);
+ if (status != 0)
+ {
+ pg_log_error("failed to create thread: %m");
+ exit(1);
+ }
+
+ if (verbose)
+ pg_log_info("backup worker (%d) created, %d", i, status);
+ }
+
+ /*
+ * This is the main thread, which updates progress. It waits for workers
+ * to complete and gathers their updated status on every loop iteration.
+ */
+ while (threadsActive)
+ {
+ char *filename = NULL;
+
+ threadsActive = false;
+ totalBytes = 0;
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ WorkerState *worker = &workers[i];
+
+ totalBytes += worker->bytesRead;
+ threadsActive |= !worker->terminated;
+ }
+
+ if (backupInfo->fileIndex < backupInfo->totalfiles)
+ filename = backupInfo->files[backupInfo->fileIndex]->name;
+
+ workers_progress_report(totalBytes, filename, false);
+ pg_usleep(100000);
+ }
+
+ if (showprogress)
+ {
+ workers_progress_report(totalBytes, NULL, true);
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\n"); /* Need to move to next line */
+ }
+}
+
+/*
+ * Take the system out of backup mode.
+ */
+static void
+StopBackup(BackupInfo *backupInfo)
+{
+ PGresult *res = NULL;
+ char *basebkp;
+
+ basebkp = psprintf("STOP_BACKUP LABEL '%s' %s %s",
+ backupInfo->backup_label,
+ includewal == FETCH_WAL ? "WAL" : "",
+ includewal == NO_WAL ? "" : "NOWAIT");
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not execute STOP BACKUP \"%s\"",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /* receive pg_control and wal files */
+ ReceiveAndUnpackTarFile(conn, res, tablespacecount);
+ PQclear(res);
+}
+
+/*
+ * Retrieve the backup manifest from the server and populate the
+ * TablespaceInfo structs to keep track of tablespaces and their files.
+ */
+static void
+GetBackupManifest(PGconn *conn, BackupInfo *backupInfo)
+{
+ int i;
+ PGresult *res = NULL;
+ char *basebkp;
+ TablespaceInfo *tsInfo;
+
+ backupInfo->tsInfo = palloc0(sizeof(TablespaceInfo) * backupInfo->tablespacecount);
+ tsInfo = backupInfo->tsInfo;
+
+ /*
+ * Get list of files.
+ */
+ basebkp = psprintf("SEND_BACKUP_MANIFEST %s",
+ format == 't' ? "TABLESPACE_MAP" : "");
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not send replication command \"%s\": %s",
+ "SEND_BACKUP_MANIFEST", PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /*
+ * The list of files is grouped by tablespaces, and we want to fetch them
+ * in the same order.
+ */
+ for (i = 0; i < backupInfo->tablespacecount; i++)
+ {
+ bool basetablespace;
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get backup header: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
+ basetablespace = PQgetisnull(tablespacehdr, i, 0);
+ tsInfo[i].tblspcOid = atol(PQgetvalue(tablespacehdr, i, 0));
+ tsInfo[i].tablespace = PQgetvalue(tablespacehdr, i, 1);
+ tsInfo[i].numFiles = PQntuples(res);
+ tsInfo[i].backupFiles = palloc0(sizeof(BackupFile) * tsInfo[i].numFiles);
+
+ /* keep count of all files in backup */
+ backupInfo->totalfiles += tsInfo[i].numFiles;
+
+ for (int j = 0; j < tsInfo[i].numFiles; j++)
+ {
+ char *name = PQgetvalue(res, j, 0);
+ char type = PQgetvalue(res, j, 1)[0];
+ int32 size = atol(PQgetvalue(res, j, 2));
+ time_t mtime = atol(PQgetvalue(res, j, 3));
+
+ /*
+ * In 'plain' format, create backup directories first.
+ */
+ if (format == 'p' && type == 'd')
+ create_backup_dirs(basetablespace, tsInfo[i].tablespace, name);
+
+ strlcpy(tsInfo[i].backupFiles[j].name, name, MAXPGPATH);
+ tsInfo[i].backupFiles[j].type = type;
+ tsInfo[i].backupFiles[j].size = size;
+ tsInfo[i].backupFiles[j].mtime = mtime;
+ tsInfo[i].backupFiles[j].tsIndex = i;
+ }
+
+ /* sort files in descending order, based on size */
+ qsort(tsInfo[i].backupFiles, tsInfo[i].numFiles,
+ sizeof(BackupFile), &compareFileSize);
+ PQclear(res);
+ }
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data: %s", PQerrorMessage(conn));
+ exit(1);
+ }
+ res = PQgetResult(conn);
+}
+
+/*
+ * Retrive and write backup file from the server. The file list is provided by
+ * worker state. It pulls a single file from this list and writes it to the
+ * backup directory.
+ */
+static int
+GetBackupFile(WorkerState *wstate)
+{
+ PGresult *res = NULL;
+ PGconn *worker_conn = NULL;
+ BackupFile *fetchFile = NULL;
+ BackupInfo *backupInfo = NULL;
+
+ backupInfo = wstate->backupInfo;
+ worker_conn = backupInfo->workerConns[wstate->workerid];
+ while ((fetchFile = getNextFile(backupInfo)) != NULL)
+ {
+ PQExpBuffer buf = createPQExpBuffer();
+ TablespaceInfo *curTsInfo = &backupInfo->tsInfo[fetchFile->tsIndex];
+
+
+ /*
+ * Build the query in the form: SEND_BACKUP_FILES ('base/1/1245/32683',
+ * 'base/1/1245/32684', ...) [options]
+ */
+ appendPQExpBuffer(buf, "SEND_BACKUP_FILES ( '%s' )", fetchFile->name);
+
+ /* add options */
+ appendPQExpBuffer(buf, " TABLESPACE_PATH '%s' START_WAL_LOCATION '%s' %s %s",
+ curTsInfo->tablespace,
+ backupInfo->xlogstart,
+ format == 't' ? "TABLESPACE_MAP" : "",
+ verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ if (maxrate > 0)
+ appendPQExpBuffer(buf, " MAX_RATE %u", maxrate);
+
+ if (!worker_conn)
+ return 1;
+
+ if (PQsendQuery(worker_conn, buf->data) == 0)
+ {
+ pg_log_error("could not send files list \"%s\"",
+ PQerrorMessage(worker_conn));
+ return 1;
+ }
+
+ destroyPQExpBuffer(buf);
+
+ /* process file contents, also count bytesRead for progress */
+ wstate->bytesRead +=
+ ReceiveAndUnpackTarFile(worker_conn, tablespacehdr, fetchFile->tsIndex);
+
+ res = PQgetResult(worker_conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data stream: %s",
+ PQerrorMessage(worker_conn));
+ exit(1);
+ }
+
+ res = PQgetResult(worker_conn);
+ }
+
+ PQclear(res);
+ return 0;
+}
+
+/*
+ * Increment fileIndex under the mutex and store the previous value in a
+ * local variable, so that concurrent workers can never fetch the same
+ * index twice or skip over files.
+ */
+static BackupFile *
+getNextFile(BackupInfo *backupInfo)
+{
+ int fileIndex = 0;
+
+ pthread_mutex_lock(&fetch_mutex);
+ fileIndex = backupInfo->fileIndex++;
+ pthread_mutex_unlock(&fetch_mutex);
+
+ if (fileIndex >= backupInfo->totalfiles)
+ return NULL;
+
+ return backupInfo->files[fileIndex];
+}
+
+/* qsort comparator for BackupFile (sort descending order) */
+static int
+compareFileSize(const void *a, const void *b)
+{
+ const BackupFile *v1 = (BackupFile *) a;
+ const BackupFile *v2 = (BackupFile *) b;
+
+ if (v1->size > v2->size)
+ return -1;
+ if (v1->size < v2->size)
+ return 1;
+
+ return 0;
+}
+
+static void
+read_label_tblspcmap(PGconn *conn, char **backuplabel, char **tblspc_map)
+{
+ PGresult *res = NULL;
+
+ Assert(backuplabel != NULL);
+ Assert(tblspc_map != NULL);
+
+ /*
+ * Get Backup label and tablespace map data.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get data: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
+ *backuplabel = PQgetvalue(res, 0, 0); /* backup_label */
+ if (!PQgetisnull(res, 0, 1))
+ *tblspc_map = PQgetvalue(res, 0, 1); /* tablespace_map */
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ res = PQgetResult(conn);
+ PQclear(res);
+}
+
+/*
+ * Create backup directories, taking care of the tablespace path. If a
+ * tablespace mapping (with -T) is given, the directory is created at
+ * the mapped path.
+ */
+static void
+create_backup_dirs(bool basetablespace, char *tablespace, char *name)
+{
+ char dirpath[MAXPGPATH];
+
+ Assert(name != NULL);
+
+ if (basetablespace)
+ snprintf(dirpath, sizeof(dirpath), "%s/%s", basedir, name);
+ else
+ {
+ Assert(tablespace != NULL);
+ snprintf(dirpath, sizeof(dirpath), "%s/%s",
+ get_tablespace_mapping(tablespace), (name + strlen(tablespace) + 1));
+ }
+
+ if (pg_mkdir_p(dirpath, pg_dir_create_mode) != 0)
+ {
+ if (errno != EEXIST)
+ {
+ pg_log_error("could not create directory \"%s\": %m",
+ dirpath);
+ exit(1);
+ }
+ }
+}
+
+/*
+ * General function for writing to a file; creates one if it doesn't exist
+ */
+static void
+writefile(char *path, char *buf)
+{
+ FILE *f;
+ char pathbuf[MAXPGPATH];
+
+ snprintf(pathbuf, MAXPGPATH, "%s/%s", basedir, path);
+ f = fopen(pathbuf, "w");
+ if (f == NULL)
+ {
+ pg_log_error("could not open file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fwrite(buf, strlen(buf), 1, f) != 1)
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fclose(f))
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+}
--
2.21.0 (Apple Git-122)
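Stripped of the libpq plumbing, the worker scheduling in the patch above reduces to a
mutex-protected fetch-and-increment over the flattened file array. The following
self-contained sketch (with printf standing in for the actual SEND_BACKUP_FILES
transfer) shows the pattern that getNextFile() and workerRun() implement; file names
and counts are placeholders.

    #include <pthread.h>
    #include <stdio.h>

    #define NFILES   10
    #define NWORKERS 3

    static pthread_mutex_t fetch_mutex = PTHREAD_MUTEX_INITIALIZER;
    static int file_index = 0;
    static const char *files[NFILES] = {
        "f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9"
    };

    /*
     * As in getNextFile(): take the current index under the mutex so no
     * two workers ever fetch the same file and none is skipped.
     */
    static const char *
    get_next_file(void)
    {
        int     idx;

        pthread_mutex_lock(&fetch_mutex);
        idx = file_index++;
        pthread_mutex_unlock(&fetch_mutex);

        return (idx < NFILES) ? files[idx] : NULL;
    }

    static void *
    worker_run(void *arg)
    {
        int         id = *(int *) arg;
        const char *name;

        /* Stand-in for the per-file SEND_BACKUP_FILES round trip. */
        while ((name = get_next_file()) != NULL)
            printf("worker %d fetched %s\n", id, name);
        return NULL;
    }

    int
    main(void)
    {
        pthread_t   threads[NWORKERS];
        int         ids[NWORKERS];
        int         i;

        for (i = 0; i < NWORKERS; i++)
        {
            ids[i] = i;
            pthread_create(&threads[i], NULL, worker_run, &ids[i]);
        }
        for (i = 0; i < NWORKERS; i++)
            pthread_join(threads[i], NULL);
        return 0;
    }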
Attachment: 0001-Refactor-some-basebackup-code-to-increase-reusabilit_v3.patch (application/octet-stream)
From 3c0dc234ab7d4fd88261a0cf16727a7ebbf1e69e Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 9 Oct 2019 12:39:41 +0500
Subject: [PATCH 1/5] Refactor some basebackup code to increase reusability, in
anticipation of adding parallel backup
---
src/backend/access/transam/xlog.c | 192 +++++-----
src/backend/replication/basebackup.c | 512 ++++++++++++++-------------
src/include/access/xlog.h | 2 +
3 files changed, 371 insertions(+), 335 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2e3cc51006..aa7d82a045 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10286,10 +10286,6 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
{
bool gotUniqueStartpoint = false;
- DIR *tblspcdir;
- struct dirent *de;
- tablespaceinfo *ti;
- int datadirpathlen;
/*
* Force an XLOG file switch before the checkpoint, to ensure that the
@@ -10415,93 +10411,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
if (exclusive)
tblspcmapfile = makeStringInfo();
- datadirpathlen = strlen(DataDir);
-
- /* Collect information about all tablespaces */
- tblspcdir = AllocateDir("pg_tblspc");
- while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
- {
- char fullpath[MAXPGPATH + 10];
- char linkpath[MAXPGPATH];
- char *relpath = NULL;
- int rllen;
- StringInfoData buflinkpath;
- char *s = linkpath;
-
- /* Skip special stuff */
- if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
- continue;
-
- snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
-
-#if defined(HAVE_READLINK) || defined(WIN32)
- rllen = readlink(fullpath, linkpath, sizeof(linkpath));
- if (rllen < 0)
- {
- ereport(WARNING,
- (errmsg("could not read symbolic link \"%s\": %m",
- fullpath)));
- continue;
- }
- else if (rllen >= sizeof(linkpath))
- {
- ereport(WARNING,
- (errmsg("symbolic link \"%s\" target is too long",
- fullpath)));
- continue;
- }
- linkpath[rllen] = '\0';
-
- /*
- * Add the escape character '\\' before newline in a string to
- * ensure that we can distinguish between the newline in the
- * tablespace path and end of line while reading tablespace_map
- * file during archive recovery.
- */
- initStringInfo(&buflinkpath);
-
- while (*s)
- {
- if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
- appendStringInfoChar(&buflinkpath, '\\');
- appendStringInfoChar(&buflinkpath, *s++);
- }
-
- /*
- * Relpath holds the relative path of the tablespace directory
- * when it's located within PGDATA, or NULL if it's located
- * elsewhere.
- */
- if (rllen > datadirpathlen &&
- strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
- IS_DIR_SEP(linkpath[datadirpathlen]))
- relpath = linkpath + datadirpathlen + 1;
-
- ti = palloc(sizeof(tablespaceinfo));
- ti->oid = pstrdup(de->d_name);
- ti->path = pstrdup(buflinkpath.data);
- ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
-
- if (tablespaces)
- *tablespaces = lappend(*tablespaces, ti);
-
- appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
-
- pfree(buflinkpath.data);
-#else
-
- /*
- * If the platform does not have symbolic links, it should not be
- * possible to have tablespaces - clearly somebody else created
- * them. Warn about it and ignore.
- */
- ereport(WARNING,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("tablespaces are not supported on this platform")));
-#endif
- }
- FreeDir(tblspcdir);
+ collectTablespaces(tablespaces, tblspcmapfile, infotbssize, needtblspcmapfile);
/*
* Construct backup label file
@@ -12277,3 +12187,103 @@ XLogRequestWalReceiverReply(void)
{
doRequestWalReceiverReply = true;
}
+
+/*
+ * Collect information about all tablespaces.
+ */
+void
+collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
+ bool infotbssize, bool needtblspcmapfile)
+{
+ DIR *tblspcdir;
+ struct dirent *de;
+ tablespaceinfo *ti;
+ int datadirpathlen;
+
+ datadirpathlen = strlen(DataDir);
+
+ tblspcdir = AllocateDir("pg_tblspc");
+ while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
+ {
+ char fullpath[MAXPGPATH + 10];
+ char linkpath[MAXPGPATH];
+ char *relpath = NULL;
+ int rllen;
+ StringInfoData buflinkpath;
+ char *s = linkpath;
+
+ /* Skip special stuff */
+ if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
+ continue;
+
+ snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
+
+#if defined(HAVE_READLINK) || defined(WIN32)
+ rllen = readlink(fullpath, linkpath, sizeof(linkpath));
+ if (rllen < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not read symbolic link \"%s\": %m",
+ fullpath)));
+ continue;
+ }
+ else if (rllen >= sizeof(linkpath))
+ {
+ ereport(WARNING,
+ (errmsg("symbolic link \"%s\" target is too long",
+ fullpath)));
+ continue;
+ }
+ linkpath[rllen] = '\0';
+
+ /*
+ * Add the escape character '\\' before newline in a string to
+ * ensure that we can distinguish between the newline in the
+ * tablespace path and end of line while reading tablespace_map
+ * file during archive recovery.
+ */
+ initStringInfo(&buflinkpath);
+
+ while (*s)
+ {
+ if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
+ appendStringInfoChar(&buflinkpath, '\\');
+ appendStringInfoChar(&buflinkpath, *s++);
+ }
+
+ /*
+ * Relpath holds the relative path of the tablespace directory
+ * when it's located within PGDATA, or NULL if it's located
+ * elsewhere.
+ */
+ if (rllen > datadirpathlen &&
+ strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
+ IS_DIR_SEP(linkpath[datadirpathlen]))
+ relpath = linkpath + datadirpathlen + 1;
+
+ ti = palloc(sizeof(tablespaceinfo));
+ ti->oid = pstrdup(de->d_name);
+ ti->path = pstrdup(buflinkpath.data);
+ ti->rpath = relpath ? pstrdup(relpath) : NULL;
+ ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+
+ if (tablespaces)
+ *tablespaces = lappend(*tablespaces, ti);
+
+ appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
+
+ pfree(buflinkpath.data);
+#else
+
+ /*
+ * If the platform does not have symbolic links, it should not be
+ * possible to have tablespaces - clearly somebody else created
+ * them. Warn about it and ignore.
+ */
+ ereport(WARNING,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("tablespaces are not supported on this platform")));
+#endif
+ }
+ FreeDir(tblspcdir);
+}
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d0f210de8c..5f25f5848d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -68,10 +68,12 @@ static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void base_backup_cleanup(int code, Datum arg);
static void perform_base_backup(basebackup_options *opt);
+static void include_wal_files(XLogRecPtr endptr);
static void parse_basebackup_options(List *options, basebackup_options *opt);
static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
static void throttle(size_t increment);
+static void setup_throttle(int maxrate);
static bool is_checksummed_file(const char *fullpath, const char *filename);
/* Was the backup currently in-progress initiated in recovery mode? */
@@ -293,29 +295,7 @@ perform_base_backup(basebackup_options *opt)
/* Send tablespace header */
SendBackupHeader(tablespaces);
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
-
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
-
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ setup_throttle(opt->maxrate);
/* Send off our tablespaces one by one */
foreach(lc, tablespaces)
@@ -384,227 +364,7 @@ perform_base_backup(basebackup_options *opt)
* We've left the last tar file "open", so we can now append the
* required WAL files to it.
*/
- char pathbuf[MAXPGPATH];
- XLogSegNo segno;
- XLogSegNo startsegno;
- XLogSegNo endsegno;
- struct stat statbuf;
- List *historyFileList = NIL;
- List *walFileList = NIL;
- char firstoff[MAXFNAMELEN];
- char lastoff[MAXFNAMELEN];
- DIR *dir;
- struct dirent *de;
- ListCell *lc;
- TimeLineID tli;
-
- /*
- * I'd rather not worry about timelines here, so scan pg_wal and
- * include all WAL files in the range between 'startptr' and 'endptr',
- * regardless of the timeline the file is stamped with. If there are
- * some spurious WAL files belonging to timelines that don't belong in
- * this server's history, they will be included too. Normally there
- * shouldn't be such files, but if there are, there's little harm in
- * including them.
- */
- XLByteToSeg(startptr, startsegno, wal_segment_size);
- XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
- XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
- XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
-
- dir = AllocateDir("pg_wal");
- while ((de = ReadDir(dir, "pg_wal")) != NULL)
- {
- /* Does it look like a WAL segment, and is it in the range? */
- if (IsXLogFileName(de->d_name) &&
- strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
- strcmp(de->d_name + 8, lastoff + 8) <= 0)
- {
- walFileList = lappend(walFileList, pstrdup(de->d_name));
- }
- /* Does it look like a timeline history file? */
- else if (IsTLHistoryFileName(de->d_name))
- {
- historyFileList = lappend(historyFileList, pstrdup(de->d_name));
- }
- }
- FreeDir(dir);
-
- /*
- * Before we go any further, check that none of the WAL segments we
- * need were removed.
- */
- CheckXLogRemoved(startsegno, ThisTimeLineID);
-
- /*
- * Sort the WAL filenames. We want to send the files in order from
- * oldest to newest, to reduce the chance that a file is recycled
- * before we get a chance to send it over.
- */
- list_sort(walFileList, compareWalFileNames);
-
- /*
- * There must be at least one xlog file in the pg_wal directory, since
- * we are doing backup-including-xlog.
- */
- if (walFileList == NIL)
- ereport(ERROR,
- (errmsg("could not find any WAL files")));
-
- /*
- * Sanity check: the first and last segment should cover startptr and
- * endptr, with no gaps in between.
- */
- XLogFromFileName((char *) linitial(walFileList),
- &tli, &segno, wal_segment_size);
- if (segno != startsegno)
- {
- char startfname[MAXFNAMELEN];
-
- XLogFileName(startfname, ThisTimeLineID, startsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", startfname)));
- }
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- XLogSegNo currsegno = segno;
- XLogSegNo nextsegno = segno + 1;
-
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
- if (!(nextsegno == segno || currsegno == segno))
- {
- char nextfname[MAXFNAMELEN];
-
- XLogFileName(nextfname, ThisTimeLineID, nextsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", nextfname)));
- }
- }
- if (segno != endsegno)
- {
- char endfname[MAXFNAMELEN];
-
- XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", endfname)));
- }
-
- /* Ok, we have everything we need. Send the WAL files. */
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- FILE *fp;
- char buf[TAR_SEND_SIZE];
- size_t cnt;
- pgoff_t len = 0;
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
-
- fp = AllocateFile(pathbuf, "rb");
- if (fp == NULL)
- {
- int save_errno = errno;
-
- /*
- * Most likely reason for this is that the file was already
- * removed by a checkpoint, so check for that to get a better
- * error message.
- */
- CheckXLogRemoved(segno, tli);
-
- errno = save_errno;
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not open file \"%s\": %m", pathbuf)));
- }
-
- if (fstat(fileno(fp), &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- pathbuf)));
- if (statbuf.st_size != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* send the WAL file itself */
- _tarWriteHeader(pathbuf, NULL, &statbuf, false);
-
- while ((cnt = fread(buf, 1,
- Min(sizeof(buf), wal_segment_size - len),
- fp)) > 0)
- {
- CheckXLogRemoved(segno, tli);
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
-
- len += cnt;
- throttle(cnt);
-
- if (len == wal_segment_size)
- break;
- }
-
- CHECK_FREAD_ERROR(fp, pathbuf);
-
- if (len != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* wal_segment_size is a multiple of 512, so no need for padding */
-
- FreeFile(fp);
-
- /*
- * Mark file as archived, otherwise files can get archived again
- * after promotion of a new node. This is in line with
- * walreceiver.c always doing an XLogArchiveForceDone() after a
- * complete segment.
- */
- StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(pathbuf, "");
- }
-
- /*
- * Send timeline history files too. Only the latest timeline history
- * file is required for recovery, and even that only if there happens
- * to be a timeline switch in the first WAL segment that contains the
- * checkpoint record, or if we're taking a base backup from a standby
- * server and the target timeline changes while the backup is taken.
- * But they are small and highly useful for debugging purposes, so
- * better include them all, always.
- */
- foreach(lc, historyFileList)
- {
- char *fname = lfirst(lc);
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
-
- if (lstat(pathbuf, &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m", pathbuf)));
-
- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
-
- /* unconditionally mark file as archived */
- StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "");
- }
+ include_wal_files(endptr);
/* Send CopyDone message for the last tar file */
pq_putemptymessage('c');
@@ -1743,3 +1503,267 @@ throttle(size_t increment)
*/
throttled_last = GetCurrentTimestamp();
}
+
+/*
+ * Append the required WAL files to the backup tar file. It assumes that
+ * the last tar file has been left "open", so the WAL files can be
+ * appended to it.
+ */
+static void
+include_wal_files(XLogRecPtr endptr)
+{
+ /*
+ * We've left the last tar file "open", so we can now append the
+ * required WAL files to it.
+ */
+ char pathbuf[MAXPGPATH];
+ XLogSegNo segno;
+ XLogSegNo startsegno;
+ XLogSegNo endsegno;
+ struct stat statbuf;
+ List *historyFileList = NIL;
+ List *walFileList = NIL;
+ char firstoff[MAXFNAMELEN];
+ char lastoff[MAXFNAMELEN];
+ DIR *dir;
+ struct dirent *de;
+ ListCell *lc;
+ TimeLineID tli;
+
+ /*
+ * I'd rather not worry about timelines here, so scan pg_wal and
+ * include all WAL files in the range between 'startptr' and 'endptr',
+ * regardless of the timeline the file is stamped with. If there are
+ * some spurious WAL files belonging to timelines that don't belong in
+ * this server's history, they will be included too. Normally there
+ * shouldn't be such files, but if there are, there's little harm in
+ * including them.
+ */
+ XLByteToSeg(startptr, startsegno, wal_segment_size);
+ XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
+ XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
+ XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
+
+ dir = AllocateDir("pg_wal");
+ while ((de = ReadDir(dir, "pg_wal")) != NULL)
+ {
+ /* Does it look like a WAL segment, and is it in the range? */
+ if (IsXLogFileName(de->d_name) &&
+ strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
+ strcmp(de->d_name + 8, lastoff + 8) <= 0)
+ {
+ walFileList = lappend(walFileList, pstrdup(de->d_name));
+ }
+ /* Does it look like a timeline history file? */
+ else if (IsTLHistoryFileName(de->d_name))
+ {
+ historyFileList = lappend(historyFileList, pstrdup(de->d_name));
+ }
+ }
+ FreeDir(dir);
+
+ /*
+ * Before we go any further, check that none of the WAL segments we
+ * need were removed.
+ */
+ CheckXLogRemoved(startsegno, ThisTimeLineID);
+
+ /*
+ * Sort the WAL filenames. We want to send the files in order from
+ * oldest to newest, to reduce the chance that a file is recycled
+ * before we get a chance to send it over.
+ */
+ list_sort(walFileList, compareWalFileNames);
+
+ /*
+ * There must be at least one xlog file in the pg_wal directory, since
+ * we are doing backup-including-xlog.
+ */
+ if (walFileList == NIL)
+ ereport(ERROR,
+ (errmsg("could not find any WAL files")));
+
+ /*
+ * Sanity check: the first and last segment should cover startptr and
+ * endptr, with no gaps in between.
+ */
+ XLogFromFileName((char *) linitial(walFileList),
+ &tli, &segno, wal_segment_size);
+ if (segno != startsegno)
+ {
+ char startfname[MAXFNAMELEN];
+
+ XLogFileName(startfname, ThisTimeLineID, startsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", startfname)));
+ }
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ XLogSegNo currsegno = segno;
+ XLogSegNo nextsegno = segno + 1;
+
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+ if (!(nextsegno == segno || currsegno == segno))
+ {
+ char nextfname[MAXFNAMELEN];
+
+ XLogFileName(nextfname, ThisTimeLineID, nextsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", nextfname)));
+ }
+ }
+ if (segno != endsegno)
+ {
+ char endfname[MAXFNAMELEN];
+
+ XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", endfname)));
+ }
+
+ /* Ok, we have everything we need. Send the WAL files. */
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ FILE *fp;
+ char buf[TAR_SEND_SIZE];
+ size_t cnt;
+ pgoff_t len = 0;
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+
+ fp = AllocateFile(pathbuf, "rb");
+ if (fp == NULL)
+ {
+ int save_errno = errno;
+
+ /*
+ * Most likely reason for this is that the file was already
+ * removed by a checkpoint, so check for that to get a better
+ * error message.
+ */
+ CheckXLogRemoved(segno, tli);
+
+ errno = save_errno;
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m", pathbuf)));
+ }
+
+ if (fstat(fileno(fp), &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ pathbuf)));
+ if (statbuf.st_size != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* send the WAL file itself */
+ _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+
+ while ((cnt = fread(buf, 1,
+ Min(sizeof(buf), wal_segment_size - len),
+ fp)) > 0)
+ {
+ CheckXLogRemoved(segno, tli);
+ /* Send the chunk as a CopyData message */
+ if (pq_putmessage('d', buf, cnt))
+ ereport(ERROR,
+ (errmsg("base backup could not send data, aborting backup")));
+
+ len += cnt;
+ throttle(cnt);
+
+ if (len == wal_segment_size)
+ break;
+ }
+
+ CHECK_FREAD_ERROR(fp, pathbuf);
+
+ if (len != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* wal_segment_size is a multiple of 512, so no need for padding */
+
+ FreeFile(fp);
+
+ /*
+ * Mark file as archived, otherwise files can get archived again
+ * after promotion of a new node. This is in line with
+ * walreceiver.c always doing an XLogArchiveForceDone() after a
+ * complete segment.
+ */
+ StatusFilePath(pathbuf, walFileName, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+
+ /*
+ * Send timeline history files too. Only the latest timeline history
+ * file is required for recovery, and even that only if there happens
+ * to be a timeline switch in the first WAL segment that contains the
+ * checkpoint record, or if we're taking a base backup from a standby
+ * server and the target timeline changes while the backup is taken.
+ * But they are small and highly useful for debugging purposes, so
+ * better include them all, always.
+ */
+ foreach(lc, historyFileList)
+ {
+ char *fname = lfirst(lc);
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
+
+ if (lstat(pathbuf, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m", pathbuf)));
+
+ sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
+
+ /* unconditionally mark file as archived */
+ StatusFilePath(pathbuf, fname, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+}
+
+/*
+ * Setup and activate network throttling, if client requested it
+ */
+static void
+setup_throttle(int maxrate)
+{
+ if (maxrate > 0)
+ {
+ throttling_sample =
+ (int64) maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
+
+ /* Enable throttling. */
+ throttling_counter = 0;
+
+ /* The 'real data' starts now (header was ignored). */
+ throttled_last = GetCurrentTimestamp();
+ }
+ else
+ {
+ /* Disable throttling. */
+ throttling_counter = -1;
+ }
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..5b0aa8ae85 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -350,6 +350,8 @@ extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
bool needtblspcmapfile);
extern XLogRecPtr do_pg_stop_backup(char *labelfile, bool waitforarchive,
TimeLineID *stoptli_p);
+extern void collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
+ bool infotbssize, bool needtblspcmapfile);
extern void do_pg_abort_backup(void);
extern SessionBackupState get_backup_status(void);
--
2.21.0 (Apple Git-122)
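As a side note on the extracted setup_throttle(): the arithmetic is easy to check in
isolation. The sketch below assumes THROTTLING_FREQUENCY is 8 and USECS_PER_SEC is
1000000, matching the definitions the backend code relies on, and prints the
per-interval byte budget and minimum interval duration for a hypothetical MAX_RATE.

    #include <stdio.h>
    #include <stdint.h>

    /* Assumed to match the backend definitions used by setup_throttle(). */
    #define THROTTLING_FREQUENCY 8
    #define USECS_PER_SEC        1000000

    int
    main(void)
    {
        int         maxrate = 1024;     /* hypothetical MAX_RATE, in kB/s */

        /* Bytes the sender may transfer per throttling interval ... */
        int64_t     throttling_sample =
            (int64_t) maxrate * 1024 / THROTTLING_FREQUENCY;

        /* ... and the minimum duration of that interval, in microseconds. */
        int64_t     elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;

        printf("send at most %lld bytes per interval of at least %lld us\n",
               (long long) throttling_sample, (long long) elapsed_min_unit);
        return 0;
    }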
Attachment: 0002-backend-changes-for-parallel-backup_v3.patch (application/octet-stream)
From ffb578517e75d81f175cdbb86a6d3f62e971ccda Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Sun, 13 Oct 2019 22:59:28 +0500
Subject: [PATCH 2/5] backend changes for parallel backup
---
src/backend/access/transam/xlog.c | 2 +-
src/backend/replication/basebackup.c | 522 ++++++++++++++++++++++++-
src/backend/replication/repl_gram.y | 72 ++++
src/backend/replication/repl_scanner.l | 7 +
src/include/nodes/replnodes.h | 10 +
src/include/replication/basebackup.h | 2 +-
6 files changed, 604 insertions(+), 11 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index aa7d82a045..842b317c8d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12265,7 +12265,7 @@ collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
ti->oid = pstrdup(de->d_name);
ti->path = pstrdup(buflinkpath.data);
ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+ ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
if (tablespaces)
*tablespaces = lappend(*tablespaces, ti);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 5f25f5848d..4a382c4558 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -41,6 +41,7 @@
#include "utils/ps_status.h"
#include "utils/relcache.h"
#include "utils/timestamp.h"
+#include "utils/pg_lsn.h"
typedef struct
@@ -52,11 +53,34 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ const char *tablespace_path;
+ XLogRecPtr wal_location;
} basebackup_options;
+typedef struct
+{
+ char name[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+} BackupFile;
+
+#define STORE_BACKUPFILE(_backupfiles, _name, _type, _size, _mtime) \
+ do { \
+ if (_backupfiles != NULL) { \
+ BackupFile *file = palloc0(sizeof(BackupFile)); \
+ strlcpy(file->name, _name, sizeof(file->name)); \
+ file->type = _type; \
+ file->size = _size; \
+ file->mtime = _mtime; \
+ *_backupfiles = lappend(*_backupfiles, file); \
+ } \
+ } while(0)
static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks);
+static int64 sendDir_(const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks, List **files);
static bool sendFile(const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
@@ -76,6 +100,11 @@ static void throttle(size_t increment);
static void setup_throttle(int maxrate);
static bool is_checksummed_file(const char *fullpath, const char *filename);
+static void StartBackup(basebackup_options *opt);
+static void StopBackup(basebackup_options *opt);
+static void SendBackupManifest(basebackup_options *opt);
+static void SendBackupFiles(basebackup_options *opt, List *filenames, bool missing_ok);
+
/* Was the backup currently in-progress initiated in recovery mode? */
static bool backup_started_in_recovery = false;
@@ -337,7 +366,7 @@ perform_base_backup(basebackup_options *opt)
sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
}
else
- sendTablespace(ti->path, false);
+ sendTablespace(ti->path, false, NULL);
/*
* If we're including WAL, and this is the main data directory we
@@ -412,6 +441,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_maxrate = false;
bool o_tablespace_map = false;
bool o_noverify_checksums = false;
+ bool o_tablespace_path = false;
+ bool o_wal_location = false;
MemSet(opt, 0, sizeof(*opt));
foreach(lopt, options)
@@ -500,6 +531,29 @@ parse_basebackup_options(List *options, basebackup_options *opt)
noverify_checksums = true;
o_noverify_checksums = true;
}
+ else if (strcmp(defel->defname, "tablespace_path") == 0)
+ {
+ if (o_tablespace_path)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->tablespace_path = strVal(defel->arg);
+ o_tablespace_path = true;
+ }
+ else if (strcmp(defel->defname, "start_wal_location") == 0)
+ {
+ bool have_error = false;
+ char *wal_location;
+
+ if (o_wal_location)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+
+ wal_location = strVal(defel->arg);
+ opt->wal_location = pg_lsn_in_internal(wal_location, &have_error);
+ o_wal_location = true;
+ }
else
elog(ERROR, "option \"%s\" not recognized",
defel->defname);
@@ -534,7 +588,29 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg, false);
}
- perform_base_backup(&opt);
+ switch (cmd->cmdtag)
+ {
+ case BASE_BACKUP:
+ perform_base_backup(&opt);
+ break;
+ case START_BACKUP:
+ StartBackup(&opt);
+ break;
+ case SEND_BACKUP_MANIFEST:
+ SendBackupManifest(&opt);
+ break;
+ case SEND_BACKUP_FILES:
+ SendBackupFiles(&opt, cmd->backupfiles, true);
+ break;
+ case STOP_BACKUP:
+ StopBackup(&opt);
+ break;
+
+ default:
+ elog(ERROR, "unrecognized replication command tag: %u",
+ cmd->cmdtag);
+ break;
+ }
}
static void
@@ -677,6 +753,61 @@ SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
pq_puttextmessage('C', "SELECT");
}
+/*
+ * Send a single resultset containing backup label and tablespace map
+ */
+static void
+SendStartBackupResult(StringInfo labelfile, StringInfo tblspc_map_file)
+{
+ StringInfoData buf;
+ Size len;
+
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 2); /* 2 fields */
+
+ /* Field headers */
+ pq_sendstring(&buf, "label");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ pq_sendstring(&buf, "tablespacemap");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ /* Data row */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 2); /* number of columns */
+
+ len = labelfile->len;
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, labelfile->data, len);
+
+ if (tblspc_map_file)
+ {
+ len = tblspc_map_file->len;
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, tblspc_map_file->data, len);
+ }
+ else
+ {
+ pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
+ }
+
+ pq_endmessage(&buf);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
/*
* Inject a file with given name and content in the output tar stream.
*/
@@ -728,7 +859,7 @@ sendFileWithContent(const char *filename, const char *content)
* Only used to send auxiliary tablespaces, not PGDATA.
*/
int64
-sendTablespace(char *path, bool sizeonly)
+sendTablespace(char *path, bool sizeonly, List **files)
{
int64 size;
char pathbuf[MAXPGPATH];
@@ -757,11 +888,11 @@ sendTablespace(char *path, bool sizeonly)
return 0;
}
+ STORE_BACKUPFILE(files, pathbuf, 'd', -1, statbuf.st_mtime);
size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
sizeonly);
-
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true);
+ size += sendDir_(pathbuf, strlen(path), sizeonly, NIL, true, files);
return size;
}
@@ -779,8 +910,16 @@ sendTablespace(char *path, bool sizeonly)
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
- bool sendtblspclinks)
+sendDir(const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks)
+{
+ return sendDir_(path, basepathlen, sizeonly, tablespaces, sendtblspclinks, NULL);
+}
+
+/* Same as sendDir(), except that it also returns a list of filenames in PGDATA */
+static int64
+sendDir_(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
+ bool sendtblspclinks, List **files)
{
DIR *dir;
struct dirent *de;
@@ -934,6 +1073,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
+
+ STORE_BACKUPFILE(files, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
excludeFound = true;
break;
@@ -950,6 +1091,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
+
+ STORE_BACKUPFILE(files, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
continue;
}
@@ -971,6 +1114,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
sizeonly);
+ STORE_BACKUPFILE(files, pathbuf, 'd', -1, statbuf.st_mtime);
+ STORE_BACKUPFILE(files, "./pg_wal/archive_status", 'd', -1, statbuf.st_mtime);
+
continue; /* don't recurse into pg_wal */
}
@@ -1000,6 +1146,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
+ STORE_BACKUPFILE(files, pathbuf, 'l', statbuf.st_size, statbuf.st_mtime);
size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
&statbuf, sizeonly);
#else
@@ -1026,6 +1173,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
*/
size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
sizeonly);
+ STORE_BACKUPFILE(files, pathbuf, 'd', -1, statbuf.st_mtime);
+
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1056,13 +1205,15 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces, sendtblspclinks);
+ size += sendDir_(pathbuf, basepathlen, sizeonly, tablespaces, sendtblspclinks, files);
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
- if (!sizeonly)
+ STORE_BACKUPFILE(files, pathbuf, 'f', statbuf.st_size, statbuf.st_mtime);
+
+ if (!sizeonly && files == NULL)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid);
@@ -1767,3 +1918,356 @@ setup_throttle(int maxrate)
throttling_counter = -1;
}
}
+
+/*
+ * StartBackup - prepare to start an online backup.
+ *
+ * This function calls do_pg_start_backup() and sends back the starting WAL
+ * location, available tablespaces, and the backup_label and tablespace_map contents.
+ */
+static void
+StartBackup(basebackup_options *opt)
+{
+ TimeLineID starttli;
+ StringInfo labelfile;
+ StringInfo tblspc_map_file = NULL;
+ int datadirpathlen;
+ List *tablespaces = NIL;
+
+ datadirpathlen = strlen(DataDir);
+
+ backup_started_in_recovery = RecoveryInProgress();
+
+ labelfile = makeStringInfo();
+ tblspc_map_file = makeStringInfo();
+
+ total_checksum_failures = 0;
+
+ startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
+ labelfile, &tablespaces,
+ tblspc_map_file,
+ opt->progress, opt->sendtblspcmapfile);
+
+ /*
+ * Once do_pg_start_backup has been called, ensure that any failure causes
+ * us to abort the backup so we don't "leak" a backup counter. For this
+ * reason, *all* functionality between do_pg_start_backup() and the end of
+ * do_pg_stop_backup() should be inside the error cleanup block!
+ */
+
+ PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+ {
+ tablespaceinfo *ti;
+
+ SendXlogRecPtrResult(startptr, starttli);
+
+ /*
+ * Calculate the relative path of temporary statistics directory in
+ * order to skip the files which are located in that directory later.
+ */
+ if (is_absolute_path(pgstat_stat_directory) &&
+ strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
+ else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory);
+ else
+ statrelpath = pgstat_stat_directory;
+
+ /* Add a node for the base directory at the end */
+ ti = palloc0(sizeof(tablespaceinfo));
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
+ tablespaces = lappend(tablespaces, ti);
+
+ /* Send tablespace header */
+ SendBackupHeader(tablespaces);
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ if ((tblspc_map_file && tblspc_map_file->len <= 0) ||
+ !opt->sendtblspcmapfile)
+ tblspc_map_file = NULL;
+
+ /* send backup_label and tablespace_map to frontend */
+ SendStartBackupResult(labelfile, tblspc_map_file);
+ }
+ PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+}
+
+/*
+ * StopBackup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out the
+ * pg_control file, optionally WAL segments, and the ending WAL location.
+ */
+static void
+StopBackup(basebackup_options *opt)
+{
+ TimeLineID endtli;
+ XLogRecPtr endptr;
+ struct stat statbuf;
+ StringInfoData buf;
+ char *labelfile = NULL;
+
+ PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+ {
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ /* ... and pg_control after everything else. */
+ if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ XLOG_CONTROL_FILE)));
+ sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
+
+ /* stop backup */
+ endptr = do_pg_stop_backup(labelfile, !opt->nowait, &endtli);
+
+ if (opt->includewal)
+ include_wal_files(endptr);
+
+ pq_putemptymessage('c'); /* CopyDone */
+ SendXlogRecPtrResult(endptr, endtli);
+ }
+ PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+}
+
+/*
+ * SendBackupManifest() - sends a list of filenames to the frontend
+ *
+ * The function collects the list of filenames necessary for a complete
+ * backup and sends it to the client.
+ */
+static void
+SendBackupManifest(basebackup_options *opt)
+{
+ StringInfoData buf;
+ ListCell *lc;
+ List *tablespaces = NIL;
+ StringInfo tblspc_map_file = NULL;
+
+ tblspc_map_file = makeStringInfo();
+ collectTablespaces(&tablespaces, tblspc_map_file, false, false);
+
+ /* Add a node for the base directory at the end */
+ tablespaceinfo *ti = palloc0(sizeof(tablespaceinfo));
+ tablespaces = lappend(tablespaces, ti);
+
+ foreach(lc, tablespaces)
+ {
+ List *backupFiles = NULL;
+ tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
+
+ if (ti->path == NULL)
+ sendDir_(".", 1, false, NIL, !opt->sendtblspcmapfile, &backupFiles);
+ else
+ sendTablespace(ti->path, false, &backupFiles);
+
+ /* Construct and send the list of filenames */
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 4); /* 4 fields */
+
+ /* First field - file name */
+ pq_sendstring(&buf, "filename");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Second field - type (file/directory/link) */
+ pq_sendstring(&buf, "type");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, CHAROID);
+ pq_sendint16(&buf, 1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Fourth field - mtime */
+ pq_sendstring(&buf, "mtime");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ foreach(lc, backupFiles)
+ {
+ BackupFile *backupFile = (BackupFile *) lfirst(lc);
+ Size len;
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 4); /* number of columns */
+
+ /* send file name */
+ len = strlen(backupFile->name);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, backupFile->name, len);
+
+ /* send type */
+ pq_sendint32(&buf, 1);
+ pq_sendbyte(&buf, backupFile->type);
+
+ /* send size */
+ send_int8_string(&buf, backupFile->size);
+
+ /* send mtime */
+ send_int8_string(&buf, backupFile->mtime);
+
+ pq_endmessage(&buf);
+ }
+
+ pfree(backupFiles);
+ }
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * SendBackupFiles() - sends the actual files to the caller
+ *
+ * The function sends out the given file(s) over to the caller using the COPY
+ * protocol.
+ */
+static void
+SendBackupFiles(basebackup_options *opt, List *filenames, bool missing_ok)
+{
+ StringInfoData buf;
+ ListCell *lc;
+ bool basetablespace = true;
+ int basepathlen = 1;
+
+ if (list_length(filenames) <= 0)
+ return;
+
+ total_checksum_failures = 0;
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ if (is_absolute_path(opt->tablespace_path))
+ {
+ basepathlen = strlen(opt->tablespace_path);
+ basetablespace = false;
+ }
+
+ /* set backup start location. */
+ startptr = opt->wal_location;
+
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ foreach(lc, filenames)
+ {
+ struct stat statbuf;
+ char *pathbuf;
+
+ pathbuf = (char *) strVal(lfirst(lc));
+ if (lstat(pathbuf, &statbuf) != 0)
+ {
+ if (errno != ENOENT)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file or directory \"%s\": %m",
+ pathbuf)));
+
+ /* If the file went away while scanning, it's not an error. */
+ continue;
+ }
+
+ /* Allow symbolic links in pg_tblspc only */
+ if (strstr(pathbuf, "./pg_tblspc") != NULL &&
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ char linkpath[MAXPGPATH];
+ int rllen;
+
+ rllen = readlink(pathbuf, linkpath, sizeof(linkpath));
+ if (rllen < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read symbolic link \"%s\": %m",
+ pathbuf)));
+ if (rllen >= sizeof(linkpath))
+ ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("symbolic link \"%s\" target is too long",
+ pathbuf)));
+ linkpath[rllen] = '\0';
+
+ _tarWriteHeader(pathbuf, linkpath, &statbuf, false);
+ }
+ else if (S_ISDIR(statbuf.st_mode))
+ {
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
+ else if (
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ /*
+ * If it's a symlink, write it as a directory; file symlinks are only
+ * allowed in pg_tblspc.
+ */
+ statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
+ else
+ {
+ /* send file to client */
+ sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true, InvalidOid);
+ }
+ }
+
+ pq_putemptymessage('c'); /* CopyDone */
+
+ /*
+ * Check for checksum failures. If there are failures across multiple
+ * processes, the total failure count may not be reported, but it will
+ * error out, terminating the backup.
+ */
+ if (total_checksum_failures)
+ {
+ if (total_checksum_failures > 1)
+ ereport(WARNING,
+ (errmsg("%lld total checksum verification failures", total_checksum_failures)));
+
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("checksum verification failure during base backup")));
+ }
+}
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index c4e11cc4e8..9e2499814b 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -87,6 +87,12 @@ static SQLCmd *make_sqlcmd(void);
%token K_EXPORT_SNAPSHOT
%token K_NOEXPORT_SNAPSHOT
%token K_USE_SNAPSHOT
+%token K_START_BACKUP
+%token K_SEND_BACKUP_MANIFEST
+%token K_SEND_BACKUP_FILES
+%token K_STOP_BACKUP
+%token K_START_WAL_LOCATION
+%token K_TABLESPACE_PATH
%type <node> command
%type <node> base_backup start_replication start_logical_replication
@@ -102,6 +108,8 @@ static SQLCmd *make_sqlcmd(void);
%type <boolval> opt_temporary
%type <list> create_slot_opt_list
%type <defelt> create_slot_opt
+%type <list> backup_files backup_files_list
+%type <node> backup_file
%%
@@ -162,6 +170,36 @@ base_backup:
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
+ cmd->cmdtag = BASE_BACKUP;
+ $$ = (Node *) cmd;
+ }
+ | K_START_BACKUP base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = START_BACKUP;
+ $$ = (Node *) cmd;
+ }
+ | K_SEND_BACKUP_MANIFEST base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = SEND_BACKUP_MANIFEST;
+ $$ = (Node *) cmd;
+ }
+ | K_SEND_BACKUP_FILES backup_files base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $3;
+ cmd->cmdtag = SEND_BACKUP_FILES;
+ cmd->backupfiles = $2;
+ $$ = (Node *) cmd;
+ }
+ | K_STOP_BACKUP base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = STOP_BACKUP;
$$ = (Node *) cmd;
}
;
@@ -214,6 +252,40 @@ base_backup_opt:
$$ = makeDefElem("noverify_checksums",
(Node *)makeInteger(true), -1);
}
+ | K_START_WAL_LOCATION SCONST
+ {
+ $$ = makeDefElem("start_wal_location",
+ (Node *)makeString($2), -1);
+ }
+ | K_TABLESPACE_PATH SCONST
+ {
+ $$ = makeDefElem("tablespace_path",
+ (Node *)makeString($2), -1);
+ }
+ ;
+
+backup_files:
+ '(' backup_files_list ')'
+ {
+ $$ = $2;
+ }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+backup_files_list:
+ backup_file
+ {
+ $$ = list_make1($1);
+ }
+ | backup_files_list ',' backup_file
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+backup_file:
+ SCONST { $$ = (Node *) makeString($1); }
;
create_replication_slot:
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 380faeb5f6..7a1bb54da8 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -107,6 +107,13 @@ EXPORT_SNAPSHOT { return K_EXPORT_SNAPSHOT; }
NOEXPORT_SNAPSHOT { return K_NOEXPORT_SNAPSHOT; }
USE_SNAPSHOT { return K_USE_SNAPSHOT; }
WAIT { return K_WAIT; }
+START_BACKUP { return K_START_BACKUP; }
+SEND_BACKUP_MANIFEST { return K_SEND_BACKUP_MANIFEST; }
+SEND_BACKUP_FILES { return K_SEND_BACKUP_FILES; }
+STOP_BACKUP { return K_STOP_BACKUP; }
+START_WAL_LOCATION { return K_START_WAL_LOCATION; }
+TABLESPACE_PATH { return K_TABLESPACE_PATH; }
+
"," { return ','; }
";" { return ';'; }
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index 1e3ed4e19f..bc47446176 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -23,6 +23,14 @@ typedef enum ReplicationKind
REPLICATION_KIND_LOGICAL
} ReplicationKind;
+typedef enum BackupCmdTag
+{
+ BASE_BACKUP,
+ START_BACKUP,
+ SEND_BACKUP_MANIFEST,
+ SEND_BACKUP_FILES,
+ STOP_BACKUP
+} BackupCmdTag;
/* ----------------------
* IDENTIFY_SYSTEM command
@@ -42,6 +50,8 @@ typedef struct BaseBackupCmd
{
NodeTag type;
List *options;
+ BackupCmdTag cmdtag;
+ List *backupfiles;
} BaseBackupCmd;
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index 503a5b9f0b..9e792af99d 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -31,6 +31,6 @@ typedef struct
extern void SendBaseBackup(BaseBackupCmd *cmd);
-extern int64 sendTablespace(char *path, bool sizeonly);
+extern int64 sendTablespace(char *path, bool sizeonly, List **files);
#endif /* _BASEBACKUP_H */
--
2.21.0 (Apple Git-122)
On Mon, Oct 28, 2019 at 10:03 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I have updated the patch to include the changes suggested by Jeevan. This
> patch also implements thread workers instead of processes and fetches a
> single file at a time. The tar format has been disabled for the first
> version of parallel backup.
Looking at 0001-0003:
It's not clear to me what the purpose of the start WAL location is
supposed to be. As far as I can see, SendBackupFiles() stores it in a
variable which is then used for exactly nothing, and nothing else uses
it. It seems like that would be part of a potential incremental
backup feature, but I don't see what it's got to do with parallel full
backup.
The tablespace_path option appears entirely unused, and I don't know
why that should be necessary here, either.
STORE_BACKUPFILE() seems like maybe it should be a function rather
than a macro, and also probably be renamed, because it doesn't store
files and the argument's not necessarily a file.
SendBackupManifest() does not send a backup manifest in the sense
contemplated by the email thread on that subject. It sends a file
list. That seems like the right idea - IMHO, anyway - but you need to
do a thorough renaming.
I think it would be fine to decide that this facility won't support
exclusive-mode backup.
I don't think much of having both sendDir() and sendDir_(). The latter
name is inconsistent with any naming convention we have, and there
seems to be no reason not to just add an argument to sendDir() and
change the callers.
I think we should rename - perhaps as a preparatory patch - the
sizeonly flag to dryrun, or something like that.
The resource cleanup does not look right. You've included calls to
PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, 0) in both StartBackup()
and StopBackup(), but what happens if there is an error or even a
clean shutdown of the connection in between? I think that there needs
to be some change here to ensure that a walsender will always call
base_backup_cleanup() when it exits; I think that'd probably remove
the need for any PG_ENSURE_ERROR_CLEANUP calls at all, including ones
we have already. This might also be something that could be done as a
separate, preparatory refactoring patch.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Oct 28, 2019 at 8:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Oct 28, 2019 at 10:03 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> I have updated the patch to include the changes suggested by Jeevan.
>> This patch also implements thread workers instead of processes and
>> fetches a single file at a time. The tar format has been disabled for
>> the first version of parallel backup.
> Looking at 0001-0003:
>
> It's not clear to me what the purpose of the start WAL location is
> supposed to be. As far as I can see, SendBackupFiles() stores it in a
> variable which is then used for exactly nothing, and nothing else uses
> it. It seems like that would be part of a potential incremental
> backup feature, but I don't see what it's got to do with parallel full
> backup.
'startptr' is used by sendFile() during checksum verification. Since
SendBackupFiles() uses sendFile(), we have to set a valid WAL location.
> The tablespace_path option appears entirely unused, and I don't know
> why that should be necessary here, either.
This is to calculate the basepathlen. We need to exclude the tablespace
location (or base path) from the filename before it is sent to the client
with the sendFile() call. I added this option primarily to avoid performing
string manipulation on the filename to extract the tablespace location and
then calculate the basepathlen.

Alternatively, we can do it by extracting the base path from the received
filename. What do you suggest?
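For illustration, a rough sketch of that alternative could look like the
following. This is a hypothetical helper, not part of the attached patches;
it assumes files under PGDATA arrive with a "./" prefix, while tablespace
files arrive as absolute paths containing TABLESPACE_VERSION_DIRECTORY:

    /*
     * Hypothetical sketch: derive basepathlen from the filename itself
     * instead of passing a TABLESPACE_PATH option.
     */
    static int
    deduce_basepathlen(const char *filename)
    {
        const char *ver;

        /* A relative name ("./...") means the file lives in PGDATA. */
        if (!is_absolute_path(filename))
            return 1;

        /*
         * For an absolute path, everything before the
         * TABLESPACE_VERSION_DIRECTORY component is the tablespace
         * base path.
         */
        ver = strstr(filename, TABLESPACE_VERSION_DIRECTORY);
        if (ver == NULL)
            return (int) strlen(filename);    /* no version dir found */

        return (int) (ver - filename) - 1;    /* drop the trailing '/' */
    }

The result would then feed the existing sendFile(pathbuf, pathbuf +
basepathlen + 1, ...) call, the same way the patch uses TABLESPACE_PATH.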
> STORE_BACKUPFILE() seems like maybe it should be a function rather
> than a macro, and also probably be renamed, because it doesn't store
> files and the argument's not necessarily a file.
Sure.
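Something like this, perhaps (a minimal sketch of the macro rewritten as a
function; the name is provisional):

    /*
     * Collect one backup item (file, directory, or link) into the list,
     * if the caller asked for one. Replaces the STORE_BACKUPFILE macro.
     */
    static void
    collect_backup_item(List **backupfiles, const char *name, char type,
                        int32 size, time_t mtime)
    {
        BackupFile *file;

        if (backupfiles == NULL)
            return;

        file = palloc0(sizeof(BackupFile));
        strlcpy(file->name, name, sizeof(file->name));
        file->type = type;
        file->size = size;
        file->mtime = mtime;
        *backupfiles = lappend(*backupfiles, file);
    }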
> SendBackupManifest() does not send a backup manifest in the sense
> contemplated by the email thread on that subject. It sends a file
> list. That seems like the right idea - IMHO, anyway - but you need to
> do a thorough renaming.
I'm considering the following command names:

START_BACKUP
- Starts the backup process.

SEND_BACKUP_FILELIST (instead of SEND_BACKUP_MANIFEST)
- Sends the list of all files to be backed up, along with file information
  (filename, file type (directory/file/link), file size and file mtime)
  for each file.

SEND_BACKUP_FILES
- Sends one or more files to the client.

STOP_BACKUP
- Stops the backup process.

I'll update the function names accordingly after your confirmation. Of
course, suggestions for better names are welcome.
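To make the proposed flow concrete, a backup session under this naming would
look roughly like this (illustrative only; option spellings follow the
attached patches, and the file names and LSN are made up):

    START_BACKUP LABEL 'b1' PROGRESS
        -- returns the starting WAL location, the tablespace header, and
        -- the backup_label / tablespace_map contents
    SEND_BACKUP_FILELIST LABEL 'b1'
        -- returns one result set per tablespace:
        -- filename, type ('f'/'d'/'l'), size, mtime
    SEND_BACKUP_FILES ('base/1/1259', 'base/1/1249') START_WAL_LOCATION '0/2000028'
        -- issued by each worker on its own connection; the requested
        -- files are streamed back using the COPY protocol
    STOP_BACKUP NOWAIT
        -- sends pg_control, optionally WAL, and the ending WAL location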
> I think it would be fine to decide that this facility won't support
> exclusive-mode backup.
Sure. Will drop this patch.
> I don't think much of having both sendDir() and sendDir_(). The latter
> name is inconsistent with any naming convention we have, and there
> seems to be no reason not to just add an argument to sendDir() and
> change the callers.
>
> I think we should rename - perhaps as a preparatory patch - the
> sizeonly flag to dryrun, or something like that.
Sure, will take care of it.
> The resource cleanup does not look right. You've included calls to
> PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, 0) in both StartBackup()
> and StopBackup(), but what happens if there is an error or even a
> clean shutdown of the connection in between? I think that there needs
> to be some change here to ensure that a walsender will always call
> base_backup_cleanup() when it exits; I think that'd probably remove
> the need for any PG_ENSURE_ERROR_CLEANUP calls at all, including ones
> we have already. This might also be something that could be done as a
> separate, preparatory refactoring patch.
You're right. I didn't handle this case properly. I will remove the
PG_ENSURE_ERROR_CLEANUP calls and replace them with a before_shmem_exit
handler. This way, whenever the backend process exits, base_backup_cleanup
will be called:
- If it exits before calling do_pg_stop_backup, base_backup_cleanup will
  take care of the cleanup.
- Otherwise, in case of a clean shutdown (after calling do_pg_stop_backup),
  base_backup_cleanup will simply return without doing anything.
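In code, the intended change reduces to a one-time registration right after
do_pg_start_backup() (this is what the attached 0001 patch does):

    /*
     * Register the cleanup callback once; the walsender then calls it on
     * any exit path. After a successful do_pg_stop_backup() the callback
     * finds no backup in progress and simply returns.
     */
    before_shmem_exit(base_backup_cleanup, (Datum) 0);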
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
On Wed, Oct 30, 2019 at 7:16 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
The updated patches are attached.
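As a quick orientation for the attached 0005 patch, each worker thread runs
roughly the loop below. Names such as WorkerState, BackupFile, getNextFile()
and fetch_mutex come from that patch; fetchBackupFile() is a hypothetical
stand-in for the patch's GetBackupFile() fetch path, and error handling is
elided:

    static void *
    workerRun(void *arg)
    {
        WorkerState *wstate = (WorkerState *) arg;

        for (;;)
        {
            BackupFile *file;

            /* Serialize access to the shared file index. */
            pthread_mutex_lock(&fetch_mutex);
            file = getNextFile(wstate->backupInfo);
            pthread_mutex_unlock(&fetch_mutex);

            if (file == NULL)
                break;          /* no more files to fetch */

            /*
             * Issue SEND_BACKUP_FILES for this one file on the worker's
             * own connection and unpack the reply.
             */
            fetchBackupFile(wstate, file);  /* hypothetical wrapper */
        }

        wstate->terminated = true;
        return NULL;
    }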
Thanks,
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Attachments:
0002-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb_v4.patchapplication/octet-stream; name=0002-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb_v4.patchDownload
From bc21821fde87bbcca31dfecf3fd482e38967a586 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 30 Oct 2019 16:45:28 +0500
Subject: [PATCH 2/6] Rename sizeonly to dryrun for few functions in
basebackup.
---
src/backend/replication/basebackup.c | 44 ++++++++++++++--------------
src/include/replication/basebackup.h | 2 +-
2 files changed, 23 insertions(+), 23 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index c640748c35..9442486b66 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -55,15 +55,15 @@ typedef struct
} basebackup_options;
-static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
+static int64 sendDir(const char *path, int basepathlen, bool dryrun,
List *tablespaces, bool sendtblspclinks);
static bool sendFile(const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
static int64 _tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly);
+ struct stat *statbuf, bool dryrun);
static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly);
+ bool dryrun);
static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void base_backup_cleanup(int code, Datum arg);
@@ -960,13 +960,13 @@ sendFileWithContent(const char *filename, const char *content)
/*
* Include the tablespace directory pointed to by 'path' in the output tar
- * stream. If 'sizeonly' is true, we just calculate a total length and return
+ * stream. If 'dryrun' is true, we just calculate a total length and return
* it, without actually sending anything.
*
* Only used to send auxiliary tablespaces, not PGDATA.
*/
int64
-sendTablespace(char *path, bool sizeonly)
+sendTablespace(char *path, bool dryrun)
{
int64 size;
char pathbuf[MAXPGPATH];
@@ -996,17 +996,17 @@ sendTablespace(char *path, bool sizeonly)
}
size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
- sizeonly);
+ dryrun);
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true);
+ size += sendDir(pathbuf, strlen(path), dryrun, NIL, true);
return size;
}
/*
* Include all files from the given directory in the output tar stream. If
- * 'sizeonly' is true, we just calculate a total length and return it, without
+ * 'dryrun' is true, we just calculate a total length and return it, without
* actually sending anything.
*
* Omit any directory in the tablespaces list, to avoid backing up
@@ -1017,7 +1017,7 @@ sendTablespace(char *path, bool sizeonly)
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
+sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
bool sendtblspclinks)
{
DIR *dir;
@@ -1172,7 +1172,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
excludeFound = true;
break;
}
@@ -1188,7 +1188,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
continue;
}
@@ -1200,14 +1200,14 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(pathbuf, "./pg_wal") == 0)
{
/* If pg_wal is a symlink, write it as a directory anyway */
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
/*
* Also send archive_status directory (by hackishly reusing
* statbuf from above ...).
*/
size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
- sizeonly);
+ dryrun);
continue; /* don't recurse into pg_wal */
}
@@ -1239,7 +1239,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
linkpath[rllen] = '\0';
size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
- &statbuf, sizeonly);
+ &statbuf, dryrun);
#else
/*
@@ -1263,7 +1263,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
* permissions right.
*/
size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ dryrun);
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1294,17 +1294,17 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces, sendtblspclinks);
+ size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks);
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
- if (!sizeonly)
+ if (!dryrun)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid);
- if (sent || sizeonly)
+ if (sent || dryrun)
{
/* Add size, rounded up to 512byte block */
size += ((statbuf.st_size + 511) & ~511);
@@ -1613,12 +1613,12 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
static int64
_tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly)
+ struct stat *statbuf, bool dryrun)
{
char h[512];
enum tarError rc;
- if (!sizeonly)
+ if (!dryrun)
{
rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size,
statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
@@ -1655,7 +1655,7 @@ _tarWriteHeader(const char *filename, const char *linktarget,
*/
static int64
_tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly)
+ bool dryrun)
{
/* If symlink, write it as a directory anyway */
#ifndef WIN32
@@ -1665,7 +1665,7 @@ _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
#endif
statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
- return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, sizeonly);
+ return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, dryrun);
}
/*
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index 503a5b9f0b..b55917b9b6 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -31,6 +31,6 @@ typedef struct
extern void SendBaseBackup(BaseBackupCmd *cmd);
-extern int64 sendTablespace(char *path, bool sizeonly);
+extern int64 sendTablespace(char *path, bool dryrun);
#endif /* _BASEBACKUP_H */
--
2.21.0 (Apple Git-122)
0001-remove-PG_ENSURE_ERROR_CLEANUP-macro-from-basebackup_v4.patchapplication/octet-stream; name=0001-remove-PG_ENSURE_ERROR_CLEANUP-macro-from-basebackup_v4.patchDownload
From 3f37a9014ecc43679a1bffc1d3f54d2a266512c1 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 30 Oct 2019 10:21:38 +0500
Subject: [PATCH 1/6] remove PG_ENSURE_ERROR_CLEANUP macro from basebackup.
Register base_backup_cleanup as a before_shmem_exit handler. This makes
sure the call is always made when the walsender exits.
---
src/backend/replication/basebackup.c | 182 +++++++++++++--------------
1 file changed, 90 insertions(+), 92 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d0f210de8c..c640748c35 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -244,6 +244,8 @@ perform_base_backup(basebackup_options *opt)
StringInfo tblspc_map_file = NULL;
int datadirpathlen;
List *tablespaces = NIL;
+ ListCell *lc;
+ tablespaceinfo *ti;
datadirpathlen = strlen(DataDir);
@@ -262,121 +264,117 @@ perform_base_backup(basebackup_options *opt)
/*
* Once do_pg_start_backup has been called, ensure that any failure causes
* us to abort the backup so we don't "leak" a backup counter. For this
- * reason, *all* functionality between do_pg_start_backup() and the end of
- * do_pg_stop_backup() should be inside the error cleanup block!
+ * reason, register base_backup_cleanup as a before_shmem_exit handler. This
+ * makes sure the call is always made when the process exits. On success,
+ * do_pg_stop_backup will have taken the system out of backup mode and this
+ * callback will have no effect; otherwise the required cleanup will be done
+ * in any case.
*/
+ before_shmem_exit(base_backup_cleanup, (Datum) 0);
- PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
- {
- ListCell *lc;
- tablespaceinfo *ti;
-
- SendXlogRecPtrResult(startptr, starttli);
+ SendXlogRecPtrResult(startptr, starttli);
- /*
- * Calculate the relative path of temporary statistics directory in
- * order to skip the files which are located in that directory later.
- */
- if (is_absolute_path(pgstat_stat_directory) &&
- strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
- statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
- else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
- statrelpath = psprintf("./%s", pgstat_stat_directory);
- else
- statrelpath = pgstat_stat_directory;
-
- /* Add a node for the base directory at the end */
- ti = palloc0(sizeof(tablespaceinfo));
- ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
- tablespaces = lappend(tablespaces, ti);
+ /*
+ * Calculate the relative path of temporary statistics directory in
+ * order to skip the files which are located in that directory later.
+ */
+ if (is_absolute_path(pgstat_stat_directory) &&
+ strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
+ else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory);
+ else
+ statrelpath = pgstat_stat_directory;
- /* Send tablespace header */
- SendBackupHeader(tablespaces);
+ /* Add a node for the base directory at the end */
+ ti = palloc0(sizeof(tablespaceinfo));
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
+ tablespaces = lappend(tablespaces, ti);
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+ /* Send tablespace header */
+ SendBackupHeader(tablespaces);
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
+ /* Setup and activate network throttling, if client requested it */
+ if (opt->maxrate > 0)
+ {
+ throttling_sample =
+ (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
- /* Send off our tablespaces one by one */
- foreach(lc, tablespaces)
- {
- tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
- StringInfoData buf;
+ /* Enable throttling. */
+ throttling_counter = 0;
- /* Send CopyOutResponse message */
- pq_beginmessage(&buf, 'H');
- pq_sendbyte(&buf, 0); /* overall format */
- pq_sendint16(&buf, 0); /* natts */
- pq_endmessage(&buf);
+ /* The 'real data' starts now (header was ignored). */
+ throttled_last = GetCurrentTimestamp();
+ }
+ else
+ {
+ /* Disable throttling. */
+ throttling_counter = -1;
+ }
- if (ti->path == NULL)
- {
- struct stat statbuf;
+ /* Send off our tablespaces one by one */
+ foreach(lc, tablespaces)
+ {
+ tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
+ StringInfoData buf;
- /* In the main tar, include the backup_label first... */
- sendFileWithContent(BACKUP_LABEL_FILE, labelfile->data);
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
- /*
- * Send tablespace_map file if required and then the bulk of
- * the files.
- */
- if (tblspc_map_file && opt->sendtblspcmapfile)
- {
- sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data);
- sendDir(".", 1, false, tablespaces, false);
- }
- else
- sendDir(".", 1, false, tablespaces, true);
+ if (ti->path == NULL)
+ {
+ struct stat statbuf;
- /* ... and pg_control after everything else. */
- if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- XLOG_CONTROL_FILE)));
- sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
- }
- else
- sendTablespace(ti->path, false);
+ /* In the main tar, include the backup_label first... */
+ sendFileWithContent(BACKUP_LABEL_FILE, labelfile->data);
/*
- * If we're including WAL, and this is the main data directory we
- * don't terminate the tar stream here. Instead, we will append
- * the xlog files below and terminate it then. This is safe since
- * the main data directory is always sent *last*.
+ * Send tablespace_map file if required and then the bulk of
+ * the files.
*/
- if (opt->includewal && ti->path == NULL)
+ if (tblspc_map_file && opt->sendtblspcmapfile)
{
- Assert(lnext(tablespaces, lc) == NULL);
+ sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data);
+ sendDir(".", 1, false, tablespaces, false);
}
else
- pq_putemptymessage('c'); /* CopyDone */
+ sendDir(".", 1, false, tablespaces, true);
+
+ /* ... and pg_control after everything else. */
+ if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ XLOG_CONTROL_FILE)));
+ sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
}
+ else
+ sendTablespace(ti->path, false);
- endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
+ /*
+ * If we're including WAL, and this is the main data directory we
+ * don't terminate the tar stream here. Instead, we will append
+ * the xlog files below and terminate it then. This is safe since
+ * the main data directory is always sent *last*.
+ */
+ if (opt->includewal && ti->path == NULL)
+ {
+ Assert(lnext(tablespaces, lc) == NULL);
+ }
+ else
+ pq_putemptymessage('c'); /* CopyDone */
}
- PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+ endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
if (opt->includewal)
{
--
2.21.0 (Apple Git-122)
0005-pg_basebackup-changes-for-parallel-backup_v4.patchapplication/octet-stream; name=0005-pg_basebackup-changes-for-parallel-backup_v4.patchDownload
From 16b77550d4e4e185b6bb45176301212db0edb09b Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Mon, 14 Oct 2019 17:28:58 +0500
Subject: [PATCH 5/6] pg_basebackup changes for parallel backup.
---
src/bin/pg_basebackup/pg_basebackup.c | 710 ++++++++++++++++++++++++--
1 file changed, 672 insertions(+), 38 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index a9d162a7da..9dd7c62933 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -19,6 +19,7 @@
#include <sys/wait.h>
#include <signal.h>
#include <time.h>
+#include <pthread.h>
#ifdef HAVE_SYS_SELECT_H
#include <sys/select.h>
#endif
@@ -41,6 +42,7 @@
#include "receivelog.h"
#include "replication/basebackup.h"
#include "streamutil.h"
+#include "fe_utils/simple_list.h"
#define ERRCODE_DATA_CORRUPTED "XX001"
@@ -57,6 +59,57 @@ typedef struct TablespaceList
TablespaceListCell *tail;
} TablespaceList;
+typedef struct
+{
+ char path[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+
+ int tsIndex; /* index of tsInfo this file belongs to. */
+} BackupFile;
+
+typedef struct
+{
+ Oid tblspcOid;
+ char *tablespace; /* tablespace name or NULL if 'base' tablespace */
+ int numFiles; /* number of files */
+ BackupFile *backupFiles; /* list of files in a tablespace */
+} TablespaceInfo;
+
+typedef struct
+{
+ int tablespacecount;
+ int totalfiles;
+ int numWorkers;
+
+ char xlogstart[64];
+ char *backup_label;
+ char *tablespace_map;
+
+ TablespaceInfo *tsInfo;
+ BackupFile **files; /* list of BackupFile pointers */
+ int fileIndex; /* index of file to be fetched */
+
+ PGconn **workerConns;
+} BackupInfo;
+
+typedef struct
+{
+ BackupInfo *backupInfo;
+ uint64 bytesRead;
+
+ int workerid;
+ pthread_t worker;
+
+ bool terminated;
+} WorkerState;
+
+BackupInfo *backupInfo = NULL;
+WorkerState *workers = NULL;
+
+static pthread_mutex_t fetch_mutex = PTHREAD_MUTEX_INITIALIZER;
+
/*
* pg_xlog has been renamed to pg_wal in version 10. This version number
* should be compared with PQserverVersion().
@@ -110,6 +163,9 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
+static int numWorkers = 1;
+static PGresult *tablespacehdr;
+
/* Progress counters */
static uint64 totalsize_kb;
static uint64 totaldone;
@@ -140,9 +196,10 @@ static PQExpBuffer recoveryconfcontents = NULL;
static void usage(void);
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
static void progress_report(int tablespacenum, const char *filename, bool force);
+static void workers_progress_report(uint64 totalBytesRead, const char *filename, bool force);
static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
-static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
+static int ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
static void BaseBackup(void);
static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
@@ -151,6 +208,17 @@ static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
static const char *get_tablespace_mapping(const char *dir);
static void tablespace_list_append(const char *arg);
+static void ParallelBackupRun(BackupInfo *backupInfo);
+static void StopBackup(BackupInfo *backupInfo);
+static void GetBackupFileList(PGconn *conn, BackupInfo *backupInfo);
+static int GetBackupFile(WorkerState *wstate);
+static BackupFile *getNextFile(BackupInfo *backupInfo);
+static int compareFileSize(const void *a, const void *b);
+static void read_label_tblspcmap(PGconn *conn, char **backup_label, char **tablespace_map);
+static void create_backup_dirs(bool basetablespace, char *tablespace, char *name);
+static void writefile(char *path, char *buf);
+static void *workerRun(void *arg);
+
static void
cleanup_directories_atexit(void)
@@ -202,6 +270,17 @@ cleanup_directories_atexit(void)
static void
disconnect_atexit(void)
{
+ /* close worker connections */
+ if (backupInfo && backupInfo->workerConns != NULL)
+ {
+ int i;
+ for (i = 0; i < numWorkers; i++)
+ {
+ if (backupInfo->workerConns[i] != NULL)
+ PQfinish(backupInfo->workerConns[i]);
+ }
+ }
+
if (conn != NULL)
PQfinish(conn);
}
@@ -349,6 +428,7 @@ usage(void)
printf(_(" --no-slot prevent creation of temporary replication slot\n"));
printf(_(" --no-verify-checksums\n"
" do not verify checksums\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to back up\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nConnection options:\n"));
printf(_(" -d, --dbname=CONNSTR connection string\n"));
@@ -695,6 +775,93 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
}
}
+/*
+ * Print a progress report of worker threads. If verbose output
+ * is enabled, also print the current file name.
+ *
+ * Progress report is written at maximum once per second, unless the
+ * force parameter is set to true.
+ */
+static void
+workers_progress_report(uint64 totalBytesRead, const char *filename, bool force)
+{
+ int percent;
+ char totalBytesRead_str[32];
+ char totalsize_str[32];
+ pg_time_t now;
+
+ if (!showprogress)
+ return;
+
+ now = time(NULL);
+ if (now == last_progress_report && !force)
+ return; /* Max once per second */
+
+ last_progress_report = now;
+ percent = totalsize_kb ? (int) ((totalBytesRead / 1024) * 100 / totalsize_kb) : 0;
+
+ /*
+ * Avoid overflowing past 100% or the full size. This may make the total
+ * size number change as we approach the end of the backup (the estimate
+ * will always be wrong if WAL is included), but that's better than having
+ * the done column be bigger than the total.
+ */
+ if (percent > 100)
+ percent = 100;
+ if (totalBytesRead / 1024 > totalsize_kb)
+ totalsize_kb = totalBytesRead / 1024;
+
+ /*
+ * Separate step to keep platform-dependent format code out of
+ * translatable strings. And we only test for INT64_FORMAT availability
+ * in snprintf, not fprintf.
+ */
+ snprintf(totalBytesRead_str, sizeof(totalBytesRead_str), INT64_FORMAT,
+ totalBytesRead / 1024);
+ snprintf(totalsize_str, sizeof(totalsize_str), INT64_FORMAT, totalsize_kb);
+
+#define VERBOSE_FILENAME_LENGTH 35
+
+ if (verbose)
+ {
+ if (!filename)
+
+ /*
+ * No filename given, so clear the status line (used for last
+ * call)
+ */
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied %*s"),
+ (int) strlen(totalsize_str),
+ totalBytesRead_str, totalsize_str,
+ percent,
+ VERBOSE_FILENAME_LENGTH + 5, "");
+ else
+ {
+ bool truncate = (strlen(filename) > VERBOSE_FILENAME_LENGTH);
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied, current file (%s%-*.*s)"),
+ (int) strlen(totalsize_str), totalBytesRead_str, totalsize_str,
+ percent,
+ /* Prefix with "..." if we do leading truncation */
+ truncate ? "..." : "",
+ truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
+ truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
+ /* Truncate filename at beginning if it's too long */
+ truncate ? filename + strlen(filename) - VERBOSE_FILENAME_LENGTH + 3 : filename);
+ }
+ }
+ else
+ {
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied"),
+ (int) strlen(totalsize_str),
+ totalBytesRead_str, totalsize_str,
+ percent);
+ }
+
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\r");
+ else
+ fprintf(stderr, "\n");
+}
/*
* Print a progress report based on the global variables. If verbose output
@@ -711,7 +878,7 @@ progress_report(int tablespacenum, const char *filename, bool force)
char totalsize_str[32];
pg_time_t now;
- if (!showprogress)
+ if (!showprogress || numWorkers > 1)
return;
now = time(NULL);
@@ -1381,7 +1548,7 @@ get_tablespace_mapping(const char *dir)
* specified directory. If it's for another tablespace, it will be restored
* in the original or mapped directory.
*/
-static void
+static int
ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
{
char current_path[MAXPGPATH];
@@ -1392,6 +1559,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
bool basetablespace;
char *copybuf = NULL;
FILE *file = NULL;
+ int readBytes = 0;
basetablespace = PQgetisnull(res, rownum, 0);
if (basetablespace)
@@ -1455,7 +1623,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
pg_log_error("invalid tar block header size: %d", r);
exit(1);
}
- totaldone += 512;
+ readBytes += 512;
current_len_left = read_tar_number(©buf[124], 12);
@@ -1486,21 +1654,14 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
* Directory
*/
filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */
+
+ /*
+ * In parallel mode, directories are created before fetching files,
+ * so it is OK if a directory already exists.
+ */
if (mkdir(filename, pg_dir_create_mode) != 0)
{
- /*
- * When streaming WAL, pg_wal (or pg_xlog for pre-9.6
- * clusters) will have been created by the wal
- * receiver process. Also, when the WAL directory
- * location was specified, pg_wal (or pg_xlog) has
- * already been created as a symbolic link before
- * starting the actual backup. So just ignore creation
- * failures on related directories.
- */
- if (!((pg_str_endswith(filename, "/pg_wal") ||
- pg_str_endswith(filename, "/pg_xlog") ||
- pg_str_endswith(filename, "/archive_status")) &&
- errno == EEXIST))
+ if (errno != EEXIST)
{
pg_log_error("could not create directory \"%s\": %m",
filename);
@@ -1585,7 +1746,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
*/
fclose(file);
file = NULL;
- totaldone += r;
+ readBytes += r;
continue;
}
@@ -1594,7 +1755,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
pg_log_error("could not write to file \"%s\": %m", filename);
exit(1);
}
- totaldone += r;
+ readBytes += r;
+ totaldone = readBytes;
progress_report(rownum, filename, false);
current_len_left -= r;
@@ -1622,13 +1784,11 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
if (copybuf != NULL)
PQfreemem(copybuf);
- if (basetablespace && writerecoveryconf)
- WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
-
/*
* No data is synced here, everything is done for all tablespaces at the
* end.
*/
+ return readBytes;
}
@@ -1716,7 +1876,8 @@ BaseBackup(void)
}
basebkp =
- psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
+ psprintf("%s LABEL '%s' %s %s %s %s %s %s %s",
+ (numWorkers > 1) ? "START_BACKUP" : "BASE_BACKUP",
escaped_label,
showprogress ? "PROGRESS" : "",
includewal == FETCH_WAL ? "WAL" : "",
@@ -1774,7 +1935,7 @@ BaseBackup(void)
/*
* Get the header
*/
- res = PQgetResult(conn);
+ tablespacehdr = res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
pg_log_error("could not get backup header: %s",
@@ -1830,24 +1991,74 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /*
- * Start receiving chunks
- */
- for (i = 0; i < PQntuples(res); i++)
+ if (numWorkers > 1)
{
- if (format == 't')
- ReceiveTarFile(conn, res, i);
- else
- ReceiveAndUnpackTarFile(conn, res, i);
- } /* Loop over all tablespaces */
+ int j = 0,
+ k = 0;
- if (showprogress)
+ backupInfo = palloc0(sizeof(BackupInfo));
+ backupInfo->workerConns = (PGconn **) palloc0(sizeof(PGconn *) * numWorkers);
+ backupInfo->tablespacecount = tablespacecount;
+ backupInfo->numWorkers = numWorkers;
+ strlcpy(backupInfo->xlogstart, xlogstart, sizeof(backupInfo->xlogstart));
+
+ read_label_tblspcmap(conn, &backupInfo->backup_label, &backupInfo->tablespace_map);
+
+ /* Retrieve the backup file list from the server. */
+ GetBackupFileList(conn, backupInfo);
+
+ /*
+ * Add backup_label to the backup. (For tar format, ReceiveTarFile()
+ * takes care of it.)
+ */
+ if (format == 'p')
+ writefile("backup_label", backupInfo->backup_label);
+
+ /*
+ * Flatten the file list to avoid unnecessary locking and to allow
+ * sequential access (an array of BackupFile structure pointers).
+ */
+ backupInfo->files =
+ (BackupFile **) palloc0(sizeof(BackupFile *) * backupInfo->totalfiles);
+ for (i = 0; i < backupInfo->tablespacecount; i++)
+ {
+ TablespaceInfo *curTsInfo = &backupInfo->tsInfo[i];
+
+ for (j = 0; j < curTsInfo->numFiles; j++)
+ {
+ backupInfo->files[k] = &curTsInfo->backupFiles[j];
+ k++;
+ }
+ }
+
+ ParallelBackupRun(backupInfo);
+ StopBackup(backupInfo);
+ }
+ else
{
- progress_report(PQntuples(res), NULL, true);
- if (isatty(fileno(stderr)))
- fprintf(stderr, "\n"); /* Need to move to next line */
+ /*
+ * Start receiving chunks
+ */
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ if (format == 't')
+ ReceiveTarFile(conn, res, i);
+ else
+ ReceiveAndUnpackTarFile(conn, res, i);
+ } /* Loop over all tablespaces */
+
+ if (showprogress)
+ {
+ progress_report(PQntuples(tablespacehdr), NULL, true);
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\n"); /* Need to move to next line */
+ }
}
+ /* Write recovery contents */
+ if (format == 'p' && writerecoveryconf)
+ WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
+
PQclear(res);
/*
@@ -2043,6 +2254,7 @@ main(int argc, char **argv)
{"waldir", required_argument, NULL, 1},
{"no-slot", no_argument, NULL, 2},
{"no-verify-checksums", no_argument, NULL, 3},
+ {"jobs", required_argument, NULL, 'j'},
{NULL, 0, NULL, 0}
};
int c;
@@ -2070,7 +2282,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvPj:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2211,6 +2423,9 @@ main(int argc, char **argv)
case 3:
verify_checksums = false;
break;
+ case 'j': /* number of jobs */
+ numWorkers = atoi(optarg);
+ break;
default:
/*
@@ -2325,6 +2540,22 @@ main(int argc, char **argv)
}
}
+ if (numWorkers <= 0)
+ {
+ pg_log_error("invalid number of parallel jobs");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (format != 'p' && numWorkers > 1)
+ {
+ pg_log_error("parallel jobs are only supported with 'plain' format");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
#ifndef HAVE_LIBZ
if (compresslevel != 0)
{
@@ -2397,3 +2628,406 @@ main(int argc, char **argv)
success = true;
return 0;
}
+
+/*
+ * Thread worker
+ */
+static void *
+workerRun(void *arg)
+{
+ WorkerState *wstate = (WorkerState *) arg;
+
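+ /*
+ * Pull files off the shared backup file list until none remain;
+ * GetBackupFile() also accumulates bytesRead for the progress reporter.
+ */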
+ GetBackupFile(wstate);
+
+ wstate->terminated = true;
+ return NULL;
+}
+
+/*
+ * Runs the worker threads and updates progress until all workers have
+ * terminated/completed.
+ */
+static void
+ParallelBackupRun(BackupInfo *backupInfo)
+{
+ int status,
+ i;
+ bool threadsActive = true;
+ uint64 totalBytes = 0;
+
+ workers = (WorkerState *) palloc0(sizeof(WorkerState) * numWorkers);
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ WorkerState *worker = &workers[i];
+
+ worker->backupInfo = backupInfo;
+ worker->workerid = i;
+ worker->bytesRead = 0;
+ worker->terminated = false;
+
+ backupInfo->workerConns[i] = GetConnection();
+ status = pthread_create(&worker->worker, NULL, workerRun, worker);
+ if (status != 0)
+ {
+ pg_log_error("failed to create thread: %m");
+ exit(1);
+ }
+
+ if (verbose)
+ pg_log_info("backup worker (%d) created, %d", i, status);
+ }
+
+ /*
+ * This is the main thread, which updates progress. It waits for the
+ * workers to complete, gathering updated status on each iteration.
+ */
+ while (threadsActive)
+ {
+ char *filename = NULL;
+
+ threadsActive = false;
+ totalBytes = 0;
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ WorkerState *worker = &workers[i];
+
+ totalBytes += worker->bytesRead;
+ threadsActive |= !worker->terminated;
+ }
+
+ if (backupInfo->fileIndex < backupInfo->totalfiles)
+ filename = backupInfo->files[backupInfo->fileIndex]->path;
+
+ workers_progress_report(totalBytes, filename, false);
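+ /* Poll worker status roughly every 100ms. */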
+ pg_usleep(100000);
+ }
+
+ if (showprogress)
+ {
+ workers_progress_report(totalBytes, NULL, true);
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\n"); /* Need to move to next line */
+ }
+}
+
+/*
+ * Take the system out of backup mode.
+ */
+static void
+StopBackup(BackupInfo *backupInfo)
+{
+ PGresult *res = NULL;
+ char *basebkp;
+
+ basebkp = psprintf("STOP_BACKUP LABEL '%s' %s %s",
+ backupInfo->backup_label,
+ includewal == FETCH_WAL ? "WAL" : "",
+ includewal == NO_WAL ? "" : "NOWAIT");
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not execute STOP BACKUP \"%s\"",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /* receive pg_control and wal files */
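+ /*
+ * Note: res is NULL at this point; PQgetisnull() treats that as a null
+ * field, so ReceiveAndUnpackTarFile() unpacks this final stream as the
+ * base tablespace under basedir.
+ */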
+ ReceiveAndUnpackTarFile(conn, res, tablespacecount);
+ PQclear(res);
+}
+
+/*
+ * Retrieve the backup file list from the server and populate the
+ * TablespaceInfo structs to keep track of tablespaces and their files.
+ */
+static void
+GetBackupFileList(PGconn *conn, BackupInfo *backupInfo)
+{
+ int i;
+ PGresult *res = NULL;
+ char *basebkp;
+
+ backupInfo->tsInfo = palloc0(sizeof(TablespaceInfo) * backupInfo->tablespacecount);
+ TablespaceInfo *tsInfo = backupInfo->tsInfo;
+
+ /*
+ * Get list of files.
+ */
+ basebkp = psprintf("SEND_BACKUP_FILELIST %s",
+ format == 't' ? "TABLESPACE_MAP" : "");
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not send replication command \"%s\": %s",
+ "SEND_BACKUP_FILELIST", PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /*
+ * The list of files is grouped by tablespaces, and we want to fetch them
+ * in the same order.
+ */
+ for (i = 0; i < backupInfo->tablespacecount; i++)
+ {
+ bool basetablespace;
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get backup header: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
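+ /*
+ * tablespacehdr is the tablespace header resultset saved from the
+ * earlier START_BACKUP response; row i describes this tablespace.
+ */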
+ basetablespace = PQgetisnull(tablespacehdr, i, 0);
+ tsInfo[i].tblspcOid = atol(PQgetvalue(tablespacehdr, i, 0));
+ tsInfo[i].tablespace = PQgetvalue(tablespacehdr, i, 1);
+ tsInfo[i].numFiles = PQntuples(res);
+ tsInfo[i].backupFiles = palloc0(sizeof(BackupFile) * tsInfo[i].numFiles);
+
+ /* keep count of all files in backup */
+ backupInfo->totalfiles += tsInfo[i].numFiles;
+
+ for (int j = 0; j < tsInfo[i].numFiles; j++)
+ {
+ char *path = PQgetvalue(res, j, 0);
+ char type = PQgetvalue(res, j, 1)[0];
+ int32 size = atol(PQgetvalue(res, j, 2));
+ time_t mtime = atol(PQgetvalue(res, j, 3));
+
+ /*
+ * In 'plain' format, create backup directories first.
+ */
+ if (format == 'p' && type == 'd')
+ create_backup_dirs(basetablespace, tsInfo[i].tablespace, path);
+
+ strlcpy(tsInfo[i].backupFiles[j].path, path, MAXPGPATH);
+ tsInfo[i].backupFiles[j].type = type;
+ tsInfo[i].backupFiles[j].size = size;
+ tsInfo[i].backupFiles[j].mtime = mtime;
+ tsInfo[i].backupFiles[j].tsIndex = i;
+ }
+
+ /* sort files in descending order, based on size */
+ qsort(tsInfo[i].backupFiles, tsInfo[i].numFiles,
+ sizeof(BackupFile), &compareFileSize);
+ PQclear(res);
+ }
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data: %s", PQerrorMessage(conn));
+ exit(1);
+ }
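+ /* Drain the final NULL result so the connection returns to idle. */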
+ res = PQgetResult(conn);
+}
+
+/*
+ * Retrieve backup files from the server and write them out. The shared file
+ * list is reached through the worker state; the worker pulls one file at a
+ * time from the list and writes it into the backup directory.
+ */
+static int
+GetBackupFile(WorkerState *wstate)
+{
+ PGresult *res = NULL;
+ PGconn *worker_conn = NULL;
+ BackupFile *fetchFile = NULL;
+ BackupInfo *backupInfo = NULL;
+
+ backupInfo = wstate->backupInfo;
+ worker_conn = backupInfo->workerConns[wstate->workerid];
+ while ((fetchFile = getNextFile(backupInfo)) != NULL)
+ {
+ PQExpBuffer buf = createPQExpBuffer();
+ TablespaceInfo *curTsInfo = &backupInfo->tsInfo[fetchFile->tsIndex];
+
+ /*
+ * Build a query of the form:
+ * SEND_BACKUP_FILES ( 'base/1/1245/32683', ... ) [options]
+ */
+ appendPQExpBuffer(buf, "SEND_BACKUP_FILES ( '%s' )", fetchFile->path);
+
+ /* add options */
+ appendPQExpBuffer(buf, " TABLESPACE_PATH '%s' START_WAL_LOCATION '%s' %s %s",
+ curTsInfo->tablespace,
+ backupInfo->xlogstart,
+ format == 't' ? "TABLESPACE_MAP" : "",
+ verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ if (maxrate > 0)
+ appendPQExpBuffer(buf, " MAX_RATE %u", maxrate);
+
+ if (!worker_conn)
+ return 1;
+
+ if (PQsendQuery(worker_conn, buf->data) == 0)
+ {
+ pg_log_error("could not send files list \"%s\"",
+ PQerrorMessage(worker_conn));
+ return 1;
+ }
+
+ destroyPQExpBuffer(buf);
+
+ /* process file contents, also count bytesRead for progress */
+ wstate->bytesRead +=
+ ReceiveAndUnpackTarFile(worker_conn, tablespacehdr, fetchFile->tsIndex);
+
+ res = PQgetResult(worker_conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data stream: %s",
+ PQerrorMessage(worker_conn));
+ exit(1);
+ }
+
+ res = PQgetResult(worker_conn);
+ }
+
+ PQclear(res);
+ return 0;
+}
+
+/*
+ * Fetch-and-increment the shared fileIndex under a mutex, keeping the
+ * pre-increment value in a local variable so that two workers can never
+ * claim the same file or skip over one.
+ */
+static BackupFile *
+getNextFile(BackupInfo *backupInfo)
+{
+ int fileIndex = 0;
+
+ pthread_mutex_lock(&fetch_mutex);
+ fileIndex = backupInfo->fileIndex++;
+ pthread_mutex_unlock(&fetch_mutex);
+
+ if (fileIndex >= backupInfo->totalfiles)
+ return NULL;
+
+ return backupInfo->files[fileIndex];
+}
+
+/* qsort comparator for BackupFile (sorts by size, descending) */
+static int
+compareFileSize(const void *a, const void *b)
+{
+ const BackupFile *v1 = (BackupFile *) a;
+ const BackupFile *v2 = (BackupFile *) b;
+
+ if (v1->size > v2->size)
+ return -1;
+ if (v1->size < v2->size)
+ return 1;
+
+ return 0;
+}
+
+static void
+read_label_tblspcmap(PGconn *conn, char **backuplabel, char **tblspc_map)
+{
+ PGresult *res = NULL;
+
+ Assert(backuplabel != NULL);
+ Assert(tblspc_map != NULL);
+
+ /*
+ * Get backup_label and tablespace_map data.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get data: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
+ /* Copy the values out; the PGresult they point into is not retained. */
+ *backuplabel = pg_strdup(PQgetvalue(res, 0, 0)); /* backup_label */
+ if (!PQgetisnull(res, 0, 1))
+ *tblspc_map = pg_strdup(PQgetvalue(res, 0, 1)); /* tablespace_map */
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ res = PQgetResult(conn);
+ PQclear(res);
+}
+
+/*
+ * Create backup directories, taking care of the tablespace path. If a
+ * tablespace mapping (-T) is given, the directory is created under the
+ * mapped path.
+ */
+static void
+create_backup_dirs(bool basetablespace, char *tablespace, char *name)
+{
+ char dirpath[MAXPGPATH];
+
+ Assert(name != NULL);
+
+ if (basetablespace)
+ snprintf(dirpath, sizeof(dirpath), "%s/%s", basedir, name);
+ else
+ {
+ Assert(tablespace != NULL);
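+ /*
+ * Hypothetical example: with -T /mnt/ts1=/backup/ts1, a server
+ * entry under "/mnt/ts1/..." is created under "/backup/ts1/...".
+ */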
+ snprintf(dirpath, sizeof(dirpath), "%s/%s",
+ get_tablespace_mapping(tablespace), (name + strlen(tablespace) + 1));
+ }
+
+ if (pg_mkdir_p(dirpath, pg_dir_create_mode) != 0)
+ {
+ if (errno != EEXIST)
+ {
+ pg_log_error("could not create directory \"%s\": %m",
+ dirpath);
+ exit(1);
+ }
+ }
+}
+
+/*
+ * General function for writing to a file; creates one if it doesn't exist
+ */
+static void
+writefile(char *path, char *buf)
+{
+ FILE *f;
+ char pathbuf[MAXPGPATH];
+
+ snprintf(pathbuf, MAXPGPATH, "%s/%s", basedir, path);
+ f = fopen(pathbuf, "w");
+ if (f == NULL)
+ {
+ pg_log_error("could not open file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fwrite(buf, strlen(buf), 1, f) != 1)
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fclose(f))
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+}
--
2.21.0 (Apple Git-122)
0004-backend-changes-for-parallel-backup_v4.patchapplication/octet-stream; name=0004-backend-changes-for-parallel-backup_v4.patchDownload
From 42818d0ebcdfa119e27e95ed6428cc7026a38143 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Sun, 13 Oct 2019 22:59:28 +0500
Subject: [PATCH 4/6] backend changes for parallel backup
---
src/backend/access/transam/xlog.c | 2 +-
src/backend/replication/basebackup.c | 526 ++++++++++++++++++++++++-
src/backend/replication/repl_gram.y | 72 ++++
src/backend/replication/repl_scanner.l | 7 +
src/include/nodes/replnodes.h | 10 +
src/include/replication/basebackup.h | 2 +-
6 files changed, 605 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index aa7d82a045..842b317c8d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12265,7 +12265,7 @@ collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
ti->oid = pstrdup(de->d_name);
ti->path = pstrdup(buflinkpath.data);
ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+ ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
if (tablespaces)
*tablespaces = lappend(*tablespaces, ti);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b8e3daf711..a0a6e816b0 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -41,6 +41,7 @@
#include "utils/ps_status.h"
#include "utils/relcache.h"
#include "utils/timestamp.h"
+#include "utils/pg_lsn.h"
typedef struct
@@ -52,11 +53,21 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ const char *tablespace_path;
+ XLogRecPtr wal_location;
} basebackup_options;
+typedef struct
+{
+ char path[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+} BackupFile;
+
static int64 sendDir(const char *path, int basepathlen, bool dryrun,
- List *tablespaces, bool sendtblspclinks);
+ List *tablespaces, bool sendtblspclinks, List **filelist);
static bool sendFile(const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
@@ -76,6 +87,13 @@ static void throttle(size_t increment);
static void setup_throttle(int maxrate);
static bool is_checksummed_file(const char *fullpath, const char *filename);
+static void StartBackup(basebackup_options *opt);
+static void StopBackup(basebackup_options *opt);
+static void SendBackupFileList(basebackup_options *opt);
+static void SendBackupFiles(basebackup_options *opt, List *filenames, bool missing_ok);
+static void addToBackupFileList(List **filelist, char *path, char type, int32 size,
+ time_t mtime);
+
/* Was the backup currently in-progress initiated in recovery mode? */
static bool backup_started_in_recovery = false;
@@ -290,7 +308,7 @@ perform_base_backup(basebackup_options *opt)
/* Add a node for the base directory at the end */
ti = palloc0(sizeof(tablespaceinfo));
- ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true, NULL) : -1;
tablespaces = lappend(tablespaces, ti);
/* Send tablespace header */
@@ -324,10 +342,10 @@ perform_base_backup(basebackup_options *opt)
if (tblspc_map_file && opt->sendtblspcmapfile)
{
sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data);
- sendDir(".", 1, false, tablespaces, false);
+ sendDir(".", 1, false, tablespaces, false, NULL);
}
else
- sendDir(".", 1, false, tablespaces, true);
+ sendDir(".", 1, false, tablespaces, true, NULL);
/* ... and pg_control after everything else. */
if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
@@ -338,7 +356,7 @@ perform_base_backup(basebackup_options *opt)
sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
}
else
- sendTablespace(ti->path, false);
+ sendTablespace(ti->path, false, NULL);
/*
* If we're including WAL, and this is the main data directory we
@@ -410,6 +428,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_maxrate = false;
bool o_tablespace_map = false;
bool o_noverify_checksums = false;
+ bool o_tablespace_path = false;
+ bool o_wal_location = false;
MemSet(opt, 0, sizeof(*opt));
foreach(lopt, options)
@@ -498,6 +518,29 @@ parse_basebackup_options(List *options, basebackup_options *opt)
noverify_checksums = true;
o_noverify_checksums = true;
}
+ else if (strcmp(defel->defname, "tablespace_path") == 0)
+ {
+ if (o_tablespace_path)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->tablespace_path = strVal(defel->arg);
+ o_tablespace_path = true;
+ }
+ else if (strcmp(defel->defname, "start_wal_location") == 0)
+ {
+ bool have_error = false;
+ char *wal_location;
+
+ if (o_wal_location)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+
+ wal_location = strVal(defel->arg);
+ opt->wal_location = pg_lsn_in_internal(wal_location, &have_error);
+ o_wal_location = true;
+ }
else
elog(ERROR, "option \"%s\" not recognized",
defel->defname);
@@ -532,7 +575,29 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg, false);
}
- perform_base_backup(&opt);
+ switch (cmd->cmdtag)
+ {
+ case BASE_BACKUP:
+ perform_base_backup(&opt);
+ break;
+ case START_BACKUP:
+ StartBackup(&opt);
+ break;
+ case SEND_BACKUP_FILELIST:
+ SendBackupFileList(&opt);
+ break;
+ case SEND_BACKUP_FILES:
+ SendBackupFiles(&opt, cmd->backupfiles, true);
+ break;
+ case STOP_BACKUP:
+ StopBackup(&opt);
+ break;
+
+ default:
+ elog(ERROR, "unrecognized replication command tag: %u",
+ cmd->cmdtag);
+ break;
+ }
}
static void
@@ -675,6 +740,61 @@ SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
pq_puttextmessage('C', "SELECT");
}
+/*
+ * Send a single resultset containing backup label and tablespace map
+ */
+static void
+SendStartBackupResult(StringInfo labelfile, StringInfo tblspc_map_file)
+{
+ StringInfoData buf;
+ Size len;
+
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 2); /* 2 fields */
+
+ /* Field headers */
+ pq_sendstring(&buf, "label");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ pq_sendstring(&buf, "tablespacemap");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ /* Data row */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 2); /* number of columns */
+
+ len = labelfile->len;
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, labelfile->data, len);
+
+ if (tblspc_map_file)
+ {
+ len = tblspc_map_file->len;
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, tblspc_map_file->data, len);
+ }
+ else
+ {
+ pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
+ }
+
+ pq_endmessage(&buf);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
/*
* Inject a file with given name and content in the output tar stream.
*/
@@ -726,7 +846,7 @@ sendFileWithContent(const char *filename, const char *content)
* Only used to send auxiliary tablespaces, not PGDATA.
*/
int64
-sendTablespace(char *path, bool dryrun)
+sendTablespace(char *path, bool dryrun, List **filelist)
{
int64 size;
char pathbuf[MAXPGPATH];
@@ -755,11 +875,11 @@ sendTablespace(char *path, bool dryrun)
return 0;
}
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
dryrun);
-
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), dryrun, NIL, true);
+ size += sendDir(pathbuf, strlen(path), dryrun, NIL, true, filelist);
return size;
}
@@ -778,7 +898,7 @@ sendTablespace(char *path, bool dryrun)
*/
static int64
sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
- bool sendtblspclinks)
+ bool sendtblspclinks, List **filelist)
{
DIR *dir;
struct dirent *de;
@@ -932,6 +1052,8 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
+
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
excludeFound = true;
break;
@@ -948,6 +1070,8 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
+
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
continue;
}
@@ -969,6 +1093,10 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
dryrun);
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
+ addToBackupFileList(filelist, "./pg_wal/archive_status", 'd', -1,
+ statbuf.st_mtime);
+
continue; /* don't recurse into pg_wal */
}
@@ -998,6 +1126,7 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
+ addToBackupFileList(filelist, pathbuf, 'l', statbuf.st_size, statbuf.st_mtime);
size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
&statbuf, dryrun);
#else
@@ -1024,6 +1153,7 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
*/
size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
dryrun);
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1054,13 +1184,15 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks);
+ size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks, filelist);
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
- if (!dryrun)
+ addToBackupFileList(filelist, pathbuf, 'f', statbuf.st_size, statbuf.st_mtime);
+
+ if (!dryrun && filelist == NULL)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid);
@@ -1765,3 +1897,373 @@ setup_throttle(int maxrate)
throttling_counter = -1;
}
}
+
+/*
+ * StartBackup - prepare to start an online backup.
+ *
+ * This function calls do_pg_start_backup() and sends back the starting
+ * checkpoint location, the available tablespaces, and the contents of the
+ * backup_label and tablespace_map files.
+ */
+static void
+StartBackup(basebackup_options *opt)
+{
+ TimeLineID starttli;
+ StringInfo labelfile;
+ StringInfo tblspc_map_file = NULL;
+ int datadirpathlen;
+ List *tablespaces = NIL;
+ tablespaceinfo *ti;
+
+ datadirpathlen = strlen(DataDir);
+
+ backup_started_in_recovery = RecoveryInProgress();
+
+ labelfile = makeStringInfo();
+ tblspc_map_file = makeStringInfo();
+
+ total_checksum_failures = 0;
+
+ startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
+ labelfile, &tablespaces,
+ tblspc_map_file,
+ opt->progress, opt->sendtblspcmapfile);
+
+ /*
+ * Once do_pg_start_backup has been called, ensure that any failure causes
+ * us to abort the backup so we don't "leak" a backup counter. For this
+ * reason, register base_backup_cleanup as a before_shmem_exit handler,
+ * which guarantees the call is made when the process exits. On success,
+ * do_pg_stop_backup will have taken the system out of backup mode and the
+ * callback has no effect; otherwise it performs the required cleanup.
+ */
+ before_shmem_exit(base_backup_cleanup, (Datum) 0);
+
+ SendXlogRecPtrResult(startptr, starttli);
+
+ /*
+ * Calculate the relative path of the temporary statistics directory so
+ * that its files can be skipped later.
+ */
+ if (is_absolute_path(pgstat_stat_directory) &&
+ strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
+ else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory);
+ else
+ statrelpath = pgstat_stat_directory;
+
+ /* Add a node for the base directory at the end */
+ ti = palloc0(sizeof(tablespaceinfo));
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true, NULL) : -1;
+ tablespaces = lappend(tablespaces, ti);
+
+ /* Send tablespace header */
+ SendBackupHeader(tablespaces);
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ if ((tblspc_map_file && tblspc_map_file->len <= 0) ||
+ !opt->sendtblspcmapfile)
+ tblspc_map_file = NULL;
+
+ /* send backup_label and tablespace_map to frontend */
+ SendStartBackupResult(labelfile, tblspc_map_file);
+}
+
+/*
+ * StopBackup() - ends an online backup
+ *
+ * This function is called at the end of an online backup. It sends out the
+ * pg_control file, optionally the WAL segments, and the ending WAL location.
+ */
+static void
+StopBackup(basebackup_options *opt)
+{
+ TimeLineID endtli;
+ XLogRecPtr endptr;
+ struct stat statbuf;
+ StringInfoData buf;
+ char *labelfile = NULL;
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ /* ... and pg_control after everything else. */
+ if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ XLOG_CONTROL_FILE)));
+ sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
+
+ /* stop backup */
+ labelfile = (char *) opt->label;
+ endptr = do_pg_stop_backup(labelfile, !opt->nowait, &endtli);
+
+ if (opt->includewal)
+ include_wal_files(endptr);
+
+ pq_putemptymessage('c'); /* CopyDone */
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+/*
+ * SendBackupFileList() - sends a list of filenames to frontend
+ *
+ * The function collects the list of filenames necessary for a complete
+ * backup and sends this list to the client.
+ */
+static void
+SendBackupFileList(basebackup_options *opt)
+{
+ StringInfoData buf;
+ ListCell *lc;
+ List *tablespaces = NIL;
+ StringInfo tblspc_map_file = NULL;
+
+ tblspc_map_file = makeStringInfo();
+ collectTablespaces(&tablespaces, tblspc_map_file, false, false);
+
+ /* Add a node for the base directory at the end */
+ tablespaceinfo *ti = palloc0(sizeof(tablespaceinfo));
+ tablespaces = lappend(tablespaces, ti);
+
+ foreach(lc, tablespaces)
+ {
+ List *filelist = NULL;
+ tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
+
+ if (ti->path == NULL)
+ sendDir(".", 1, true, NIL, !opt->sendtblspcmapfile, &filelist);
+ else
+ sendTablespace(ti->path, true, &filelist);
+
+ /* Construct and send the list of filenames */
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 4); /* n field */
+
+ /* First field - file name */
+ pq_sendstring(&buf, "filename");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Second field - file type ('f', 'd', or 'l') */
+ pq_sendstring(&buf, "type");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, CHAROID);
+ pq_sendint16(&buf, 1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Fourth field - mtime */
+ pq_sendstring(&buf, "mtime");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
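+ /*
+ * Each data row is (filename, type, size, mtime); for example a row
+ * might read ("base/1/1249", 'f', 8192, 1571000000) -- illustrative
+ * values, not taken from a real run.
+ */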
+ foreach(lc, filelist)
+ {
+ BackupFile *backupFile = (BackupFile *) lfirst(lc);
+ Size len;
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 4); /* number of columns */
+
+ /* send file name */
+ len = strlen(backupFile->path);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, backupFile->path, len);
+
+ /* send type */
+ pq_sendint32(&buf, 1);
+ pq_sendbyte(&buf, backupFile->type);
+
+ /* send size */
+ send_int8_string(&buf, backupFile->size);
+
+ /* send mtime */
+ send_int8_string(&buf, backupFile->mtime);
+
+ pq_endmessage(&buf);
+ }
+
+ list_free_deep(filelist);
+ }
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * SendBackupFiles() - sends the actual files to the caller
+ *
+ * The function sends the given file(s) to the caller using the COPY
+ * protocol.
+ */
+static void
+SendBackupFiles(basebackup_options *opt, List *filenames, bool missing_ok)
+{
+ StringInfoData buf;
+ ListCell *lc;
+ bool basetablespace = true;
+ int basepathlen = 1;
+
+ if (list_length(filenames) <= 0)
+ return;
+
+ total_checksum_failures = 0;
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ if (is_absolute_path(opt->tablespace_path))
+ {
+ basepathlen = strlen(opt->tablespace_path);
+ basetablespace = false;
+ }
+
+ /* set backup start location. */
+ startptr = opt->wal_location;
+
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ foreach(lc, filenames)
+ {
+ struct stat statbuf;
+ char *pathbuf;
+
+ pathbuf = (char *) strVal(lfirst(lc));
+ if (lstat(pathbuf, &statbuf) != 0)
+ {
+ if (errno != ENOENT)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file or directory \"%s\": %m",
+ pathbuf)));
+
+ /* If the file went away while scanning, it's not an error. */
+ continue;
+ }
+
+ /* Allow symbolic links in pg_tblspc only */
+ if (strstr(pathbuf, "./pg_tblspc") != NULL &&
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ char linkpath[MAXPGPATH];
+ int rllen;
+
+ rllen = readlink(pathbuf, linkpath, sizeof(linkpath));
+ if (rllen < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read symbolic link \"%s\": %m",
+ pathbuf)));
+ if (rllen >= sizeof(linkpath))
+ ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("symbolic link \"%s\" target is too long",
+ pathbuf)));
+ linkpath[rllen] = '\0';
+
+ _tarWriteHeader(pathbuf, linkpath, &statbuf, false);
+ }
+ else if (S_ISDIR(statbuf.st_mode))
+ {
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
+ else if (
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ /*
+ * If it's a symlink, write it as a directory. File symlinks are
+ * only allowed in pg_tblspc.
+ */
+ statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
+ else
+ {
+ /* send file to client */
+ sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true, InvalidOid);
+ }
+ }
+
+ pq_putemptymessage('c'); /* CopyDone */
+
+ /*
+ * Check for checksum failures. When failures occur across multiple
+ * worker processes the total failure count may be underreported, but
+ * the backup will still error out and terminate.
+ */
+ if (total_checksum_failures)
+ {
+ if (total_checksum_failures > 1)
+ ereport(WARNING,
+ (errmsg("%lld total checksum verification failures", total_checksum_failures)));
+
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("checksum verification failure during base backup")));
+ }
+}
+
+/*
+ * Construct a BackupFile entry and add to the list.
+ */
+static void
+addToBackupFileList(List **filelist, char *path, char type, int32 size,
+ time_t mtime)
+{
+ BackupFile *backupFile;
+
+ if (filelist)
+ {
+ backupFile = (BackupFile *) palloc0(sizeof(BackupFile));
+ strlcpy(backupFile->path, path, sizeof(backupFile->path));
+ backupFile->type = type;
+ backupFile->size = size;
+ backupFile->mtime = mtime;
+
+ *filelist = lappend(*filelist, backupFile);
+ }
+}
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index c4e11cc4e8..5619837ebe 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -87,6 +87,12 @@ static SQLCmd *make_sqlcmd(void);
%token K_EXPORT_SNAPSHOT
%token K_NOEXPORT_SNAPSHOT
%token K_USE_SNAPSHOT
+%token K_START_BACKUP
+%token K_SEND_BACKUP_FILELIST
+%token K_SEND_BACKUP_FILES
+%token K_STOP_BACKUP
+%token K_START_WAL_LOCATION
+%token K_TABLESPACE_PATH
%type <node> command
%type <node> base_backup start_replication start_logical_replication
@@ -102,6 +108,8 @@ static SQLCmd *make_sqlcmd(void);
%type <boolval> opt_temporary
%type <list> create_slot_opt_list
%type <defelt> create_slot_opt
+%type <list> backup_files backup_files_list
+%type <node> backup_file
%%
@@ -162,6 +170,36 @@ base_backup:
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
+ cmd->cmdtag = BASE_BACKUP;
+ $$ = (Node *) cmd;
+ }
+ | K_START_BACKUP base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = START_BACKUP;
+ $$ = (Node *) cmd;
+ }
+ | K_SEND_BACKUP_FILELIST base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = SEND_BACKUP_FILELIST;
+ $$ = (Node *) cmd;
+ }
+ | K_SEND_BACKUP_FILES backup_files base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $3;
+ cmd->cmdtag = SEND_BACKUP_FILES;
+ cmd->backupfiles = $2;
+ $$ = (Node *) cmd;
+ }
+ | K_STOP_BACKUP base_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = STOP_BACKUP;
$$ = (Node *) cmd;
}
;
@@ -214,6 +252,40 @@ base_backup_opt:
$$ = makeDefElem("noverify_checksums",
(Node *)makeInteger(true), -1);
}
+ | K_START_WAL_LOCATION SCONST
+ {
+ $$ = makeDefElem("start_wal_location",
+ (Node *)makeString($2), -1);
+ }
+ | K_TABLESPACE_PATH SCONST
+ {
+ $$ = makeDefElem("tablespace_path",
+ (Node *)makeString($2), -1);
+ }
+ ;
+
+backup_files:
+ '(' backup_files_list ')'
+ {
+ $$ = $2;
+ }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+backup_files_list:
+ backup_file
+ {
+ $$ = list_make1($1);
+ }
+ | backup_files_list ',' backup_file
+ {
+ $$ = lappend($1, $3);
+ }
+ ;
+
+backup_file:
+ SCONST { $$ = (Node *) makeString($1); }
;
create_replication_slot:
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 380faeb5f6..c57ff02d39 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -107,6 +107,13 @@ EXPORT_SNAPSHOT { return K_EXPORT_SNAPSHOT; }
NOEXPORT_SNAPSHOT { return K_NOEXPORT_SNAPSHOT; }
USE_SNAPSHOT { return K_USE_SNAPSHOT; }
WAIT { return K_WAIT; }
+START_BACKUP { return K_START_BACKUP; }
+SEND_BACKUP_FILELIST { return K_SEND_BACKUP_FILELIST; }
+SEND_BACKUP_FILES { return K_SEND_BACKUP_FILES; }
+STOP_BACKUP { return K_STOP_BACKUP; }
+START_WAL_LOCATION { return K_START_WAL_LOCATION; }
+TABLESPACE_PATH { return K_TABLESPACE_PATH; }
+
"," { return ','; }
";" { return ';'; }
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index 1e3ed4e19f..3685f260b5 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -23,6 +23,14 @@ typedef enum ReplicationKind
REPLICATION_KIND_LOGICAL
} ReplicationKind;
+typedef enum BackupCmdTag
+{
+ BASE_BACKUP,
+ START_BACKUP,
+ SEND_BACKUP_FILELIST,
+ SEND_BACKUP_FILES,
+ STOP_BACKUP
+} BackupCmdTag;
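+
+/*
+ * In this PoC, a parallel client is expected to drive the new commands in
+ * roughly this order (grammar in repl_gram.y):
+ *
+ *   START_BACKUP LABEL 'label' [options]
+ *   SEND_BACKUP_FILELIST
+ *   SEND_BACKUP_FILES ( 'file1', 'file2', ... ) [options]  (one per worker)
+ *   STOP_BACKUP LABEL 'label'
+ */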
/* ----------------------
* IDENTIFY_SYSTEM command
@@ -42,6 +50,8 @@ typedef struct BaseBackupCmd
{
NodeTag type;
List *options;
+ BackupCmdTag cmdtag;
+ List *backupfiles;
} BaseBackupCmd;
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index b55917b9b6..5202e4160b 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -31,6 +31,6 @@ typedef struct
extern void SendBaseBackup(BaseBackupCmd *cmd);
-extern int64 sendTablespace(char *path, bool dryrun);
+extern int64 sendTablespace(char *path, bool dryrun, List **filelist);
#endif /* _BASEBACKUP_H */
--
2.21.0 (Apple Git-122)
0003-Refactor-some-basebackup-code-to-increase-reusabilit_v4.patchapplication/octet-stream; name=0003-Refactor-some-basebackup-code-to-increase-reusabilit_v4.patchDownload
From 21866fd73f852c4064c3e588f62964fe1bd52440 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 9 Oct 2019 12:39:41 +0500
Subject: [PATCH 3/6] Refactor some basebackup code to increase reusability, in
anticipation of adding parallel backup
---
src/backend/access/transam/xlog.c | 192 +++++-----
src/backend/replication/basebackup.c | 512 ++++++++++++++-------------
src/include/access/xlog.h | 2 +
3 files changed, 371 insertions(+), 335 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2e3cc51006..aa7d82a045 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10286,10 +10286,6 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
{
bool gotUniqueStartpoint = false;
- DIR *tblspcdir;
- struct dirent *de;
- tablespaceinfo *ti;
- int datadirpathlen;
/*
* Force an XLOG file switch before the checkpoint, to ensure that the
@@ -10415,93 +10411,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
if (exclusive)
tblspcmapfile = makeStringInfo();
- datadirpathlen = strlen(DataDir);
-
- /* Collect information about all tablespaces */
- tblspcdir = AllocateDir("pg_tblspc");
- while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
- {
- char fullpath[MAXPGPATH + 10];
- char linkpath[MAXPGPATH];
- char *relpath = NULL;
- int rllen;
- StringInfoData buflinkpath;
- char *s = linkpath;
-
- /* Skip special stuff */
- if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
- continue;
-
- snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
-
-#if defined(HAVE_READLINK) || defined(WIN32)
- rllen = readlink(fullpath, linkpath, sizeof(linkpath));
- if (rllen < 0)
- {
- ereport(WARNING,
- (errmsg("could not read symbolic link \"%s\": %m",
- fullpath)));
- continue;
- }
- else if (rllen >= sizeof(linkpath))
- {
- ereport(WARNING,
- (errmsg("symbolic link \"%s\" target is too long",
- fullpath)));
- continue;
- }
- linkpath[rllen] = '\0';
-
- /*
- * Add the escape character '\\' before newline in a string to
- * ensure that we can distinguish between the newline in the
- * tablespace path and end of line while reading tablespace_map
- * file during archive recovery.
- */
- initStringInfo(&buflinkpath);
-
- while (*s)
- {
- if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
- appendStringInfoChar(&buflinkpath, '\\');
- appendStringInfoChar(&buflinkpath, *s++);
- }
-
- /*
- * Relpath holds the relative path of the tablespace directory
- * when it's located within PGDATA, or NULL if it's located
- * elsewhere.
- */
- if (rllen > datadirpathlen &&
- strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
- IS_DIR_SEP(linkpath[datadirpathlen]))
- relpath = linkpath + datadirpathlen + 1;
-
- ti = palloc(sizeof(tablespaceinfo));
- ti->oid = pstrdup(de->d_name);
- ti->path = pstrdup(buflinkpath.data);
- ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
-
- if (tablespaces)
- *tablespaces = lappend(*tablespaces, ti);
-
- appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
-
- pfree(buflinkpath.data);
-#else
-
- /*
- * If the platform does not have symbolic links, it should not be
- * possible to have tablespaces - clearly somebody else created
- * them. Warn about it and ignore.
- */
- ereport(WARNING,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("tablespaces are not supported on this platform")));
-#endif
- }
- FreeDir(tblspcdir);
+ collectTablespaces(tablespaces, tblspcmapfile, infotbssize, needtblspcmapfile);
/*
* Construct backup label file
@@ -12277,3 +12187,103 @@ XLogRequestWalReceiverReply(void)
{
doRequestWalReceiverReply = true;
}
+
+/*
+ * Collect information about all tablespaces.
+ */
+void
+collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
+ bool infotbssize, bool needtblspcmapfile)
+{
+ DIR *tblspcdir;
+ struct dirent *de;
+ tablespaceinfo *ti;
+ int datadirpathlen;
+
+ datadirpathlen = strlen(DataDir);
+
+ tblspcdir = AllocateDir("pg_tblspc");
+ while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
+ {
+ char fullpath[MAXPGPATH + 10];
+ char linkpath[MAXPGPATH];
+ char *relpath = NULL;
+ int rllen;
+ StringInfoData buflinkpath;
+ char *s = linkpath;
+
+ /* Skip special stuff */
+ if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
+ continue;
+
+ snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
+
+#if defined(HAVE_READLINK) || defined(WIN32)
+ rllen = readlink(fullpath, linkpath, sizeof(linkpath));
+ if (rllen < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not read symbolic link \"%s\": %m",
+ fullpath)));
+ continue;
+ }
+ else if (rllen >= sizeof(linkpath))
+ {
+ ereport(WARNING,
+ (errmsg("symbolic link \"%s\" target is too long",
+ fullpath)));
+ continue;
+ }
+ linkpath[rllen] = '\0';
+
+ /*
+ * Add the escape character '\\' before newline in a string to
+ * ensure that we can distinguish between the newline in the
+ * tablespace path and end of line while reading tablespace_map
+ * file during archive recovery.
+ */
+ initStringInfo(&buflinkpath);
+
+ while (*s)
+ {
+ if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
+ appendStringInfoChar(&buflinkpath, '\\');
+ appendStringInfoChar(&buflinkpath, *s++);
+ }
+
+ /*
+ * Relpath holds the relative path of the tablespace directory
+ * when it's located within PGDATA, or NULL if it's located
+ * elsewhere.
+ */
+ if (rllen > datadirpathlen &&
+ strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
+ IS_DIR_SEP(linkpath[datadirpathlen]))
+ relpath = linkpath + datadirpathlen + 1;
+
+ ti = palloc(sizeof(tablespaceinfo));
+ ti->oid = pstrdup(de->d_name);
+ ti->path = pstrdup(buflinkpath.data);
+ ti->rpath = relpath ? pstrdup(relpath) : NULL;
+ ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+
+ if (tablespaces)
+ *tablespaces = lappend(*tablespaces, ti);
+
+ appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
+
+ pfree(buflinkpath.data);
+#else
+
+ /*
+ * If the platform does not have symbolic links, it should not be
+ * possible to have tablespaces - clearly somebody else created
+ * them. Warn about it and ignore.
+ */
+ ereport(WARNING,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("tablespaces are not supported on this platform")));
+#endif
+ }
+ FreeDir(tblspcdir);
+}
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 9442486b66..b8e3daf711 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -68,10 +68,12 @@ static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void base_backup_cleanup(int code, Datum arg);
static void perform_base_backup(basebackup_options *opt);
+static void include_wal_files(XLogRecPtr endptr);
static void parse_basebackup_options(List *options, basebackup_options *opt);
static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
static void throttle(size_t increment);
+static void setup_throttle(int maxrate);
static bool is_checksummed_file(const char *fullpath, const char *filename);
/* Was the backup currently in-progress initiated in recovery mode? */
@@ -294,29 +296,7 @@ perform_base_backup(basebackup_options *opt)
/* Send tablespace header */
SendBackupHeader(tablespaces);
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
-
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
-
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ setup_throttle(opt->maxrate);
/* Send off our tablespaces one by one */
foreach(lc, tablespaces)
@@ -382,227 +362,7 @@ perform_base_backup(basebackup_options *opt)
* We've left the last tar file "open", so we can now append the
* required WAL files to it.
*/
- char pathbuf[MAXPGPATH];
- XLogSegNo segno;
- XLogSegNo startsegno;
- XLogSegNo endsegno;
- struct stat statbuf;
- List *historyFileList = NIL;
- List *walFileList = NIL;
- char firstoff[MAXFNAMELEN];
- char lastoff[MAXFNAMELEN];
- DIR *dir;
- struct dirent *de;
- ListCell *lc;
- TimeLineID tli;
-
- /*
- * I'd rather not worry about timelines here, so scan pg_wal and
- * include all WAL files in the range between 'startptr' and 'endptr',
- * regardless of the timeline the file is stamped with. If there are
- * some spurious WAL files belonging to timelines that don't belong in
- * this server's history, they will be included too. Normally there
- * shouldn't be such files, but if there are, there's little harm in
- * including them.
- */
- XLByteToSeg(startptr, startsegno, wal_segment_size);
- XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
- XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
- XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
-
- dir = AllocateDir("pg_wal");
- while ((de = ReadDir(dir, "pg_wal")) != NULL)
- {
- /* Does it look like a WAL segment, and is it in the range? */
- if (IsXLogFileName(de->d_name) &&
- strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
- strcmp(de->d_name + 8, lastoff + 8) <= 0)
- {
- walFileList = lappend(walFileList, pstrdup(de->d_name));
- }
- /* Does it look like a timeline history file? */
- else if (IsTLHistoryFileName(de->d_name))
- {
- historyFileList = lappend(historyFileList, pstrdup(de->d_name));
- }
- }
- FreeDir(dir);
-
- /*
- * Before we go any further, check that none of the WAL segments we
- * need were removed.
- */
- CheckXLogRemoved(startsegno, ThisTimeLineID);
-
- /*
- * Sort the WAL filenames. We want to send the files in order from
- * oldest to newest, to reduce the chance that a file is recycled
- * before we get a chance to send it over.
- */
- list_sort(walFileList, compareWalFileNames);
-
- /*
- * There must be at least one xlog file in the pg_wal directory, since
- * we are doing backup-including-xlog.
- */
- if (walFileList == NIL)
- ereport(ERROR,
- (errmsg("could not find any WAL files")));
-
- /*
- * Sanity check: the first and last segment should cover startptr and
- * endptr, with no gaps in between.
- */
- XLogFromFileName((char *) linitial(walFileList),
- &tli, &segno, wal_segment_size);
- if (segno != startsegno)
- {
- char startfname[MAXFNAMELEN];
-
- XLogFileName(startfname, ThisTimeLineID, startsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", startfname)));
- }
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- XLogSegNo currsegno = segno;
- XLogSegNo nextsegno = segno + 1;
-
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
- if (!(nextsegno == segno || currsegno == segno))
- {
- char nextfname[MAXFNAMELEN];
-
- XLogFileName(nextfname, ThisTimeLineID, nextsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", nextfname)));
- }
- }
- if (segno != endsegno)
- {
- char endfname[MAXFNAMELEN];
-
- XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", endfname)));
- }
-
- /* Ok, we have everything we need. Send the WAL files. */
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- FILE *fp;
- char buf[TAR_SEND_SIZE];
- size_t cnt;
- pgoff_t len = 0;
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
-
- fp = AllocateFile(pathbuf, "rb");
- if (fp == NULL)
- {
- int save_errno = errno;
-
- /*
- * Most likely reason for this is that the file was already
- * removed by a checkpoint, so check for that to get a better
- * error message.
- */
- CheckXLogRemoved(segno, tli);
-
- errno = save_errno;
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not open file \"%s\": %m", pathbuf)));
- }
-
- if (fstat(fileno(fp), &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- pathbuf)));
- if (statbuf.st_size != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* send the WAL file itself */
- _tarWriteHeader(pathbuf, NULL, &statbuf, false);
-
- while ((cnt = fread(buf, 1,
- Min(sizeof(buf), wal_segment_size - len),
- fp)) > 0)
- {
- CheckXLogRemoved(segno, tli);
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
-
- len += cnt;
- throttle(cnt);
-
- if (len == wal_segment_size)
- break;
- }
-
- CHECK_FREAD_ERROR(fp, pathbuf);
-
- if (len != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* wal_segment_size is a multiple of 512, so no need for padding */
-
- FreeFile(fp);
-
- /*
- * Mark file as archived, otherwise files can get archived again
- * after promotion of a new node. This is in line with
- * walreceiver.c always doing an XLogArchiveForceDone() after a
- * complete segment.
- */
- StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(pathbuf, "");
- }
-
- /*
- * Send timeline history files too. Only the latest timeline history
- * file is required for recovery, and even that only if there happens
- * to be a timeline switch in the first WAL segment that contains the
- * checkpoint record, or if we're taking a base backup from a standby
- * server and the target timeline changes while the backup is taken.
- * But they are small and highly useful for debugging purposes, so
- * better include them all, always.
- */
- foreach(lc, historyFileList)
- {
- char *fname = lfirst(lc);
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
-
- if (lstat(pathbuf, &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m", pathbuf)));
-
- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
-
- /* unconditionally mark file as archived */
- StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "");
- }
+ include_wal_files(endptr);
/* Send CopyDone message for the last tar file */
pq_putemptymessage('c');
@@ -1741,3 +1501,267 @@ throttle(size_t increment)
*/
throttled_last = GetCurrentTimestamp();
}
+
+/*
+ * Append the required WAL files to the backup tar file. It assumes that the
+ * last tar file is "open" and the WALs will be appended to it.
+ */
+static void
+include_wal_files(XLogRecPtr endptr)
+{
+ /*
+ * We've left the last tar file "open", so we can now append the
+ * required WAL files to it.
+ */
+ char pathbuf[MAXPGPATH];
+ XLogSegNo segno;
+ XLogSegNo startsegno;
+ XLogSegNo endsegno;
+ struct stat statbuf;
+ List *historyFileList = NIL;
+ List *walFileList = NIL;
+ char firstoff[MAXFNAMELEN];
+ char lastoff[MAXFNAMELEN];
+ DIR *dir;
+ struct dirent *de;
+ ListCell *lc;
+ TimeLineID tli;
+
+ /*
+ * I'd rather not worry about timelines here, so scan pg_wal and
+ * include all WAL files in the range between 'startptr' and 'endptr',
+ * regardless of the timeline the file is stamped with. If there are
+ * some spurious WAL files belonging to timelines that don't belong in
+ * this server's history, they will be included too. Normally there
+ * shouldn't be such files, but if there are, there's little harm in
+ * including them.
+ */
+ XLByteToSeg(startptr, startsegno, wal_segment_size);
+ XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
+ XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
+ XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
+
+ dir = AllocateDir("pg_wal");
+ while ((de = ReadDir(dir, "pg_wal")) != NULL)
+ {
+ /* Does it look like a WAL segment, and is it in the range? */
+ if (IsXLogFileName(de->d_name) &&
+ strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
+ strcmp(de->d_name + 8, lastoff + 8) <= 0)
+ {
+ walFileList = lappend(walFileList, pstrdup(de->d_name));
+ }
+ /* Does it look like a timeline history file? */
+ else if (IsTLHistoryFileName(de->d_name))
+ {
+ historyFileList = lappend(historyFileList, pstrdup(de->d_name));
+ }
+ }
+ FreeDir(dir);
+
+ /*
+ * Before we go any further, check that none of the WAL segments we
+ * need were removed.
+ */
+ CheckXLogRemoved(startsegno, ThisTimeLineID);
+
+ /*
+ * Sort the WAL filenames. We want to send the files in order from
+ * oldest to newest, to reduce the chance that a file is recycled
+ * before we get a chance to send it over.
+ */
+ list_sort(walFileList, compareWalFileNames);
+
+ /*
+ * There must be at least one xlog file in the pg_wal directory, since
+ * we are doing backup-including-xlog.
+ */
+ if (walFileList == NIL)
+ ereport(ERROR,
+ (errmsg("could not find any WAL files")));
+
+ /*
+ * Sanity check: the first and last segment should cover startptr and
+ * endptr, with no gaps in between.
+ */
+ XLogFromFileName((char *) linitial(walFileList),
+ &tli, &segno, wal_segment_size);
+ if (segno != startsegno)
+ {
+ char startfname[MAXFNAMELEN];
+
+ XLogFileName(startfname, ThisTimeLineID, startsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", startfname)));
+ }
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ XLogSegNo currsegno = segno;
+ XLogSegNo nextsegno = segno + 1;
+
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+ if (!(nextsegno == segno || currsegno == segno))
+ {
+ char nextfname[MAXFNAMELEN];
+
+ XLogFileName(nextfname, ThisTimeLineID, nextsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", nextfname)));
+ }
+ }
+ if (segno != endsegno)
+ {
+ char endfname[MAXFNAMELEN];
+
+ XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", endfname)));
+ }
+
+ /* Ok, we have everything we need. Send the WAL files. */
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ FILE *fp;
+ char buf[TAR_SEND_SIZE];
+ size_t cnt;
+ pgoff_t len = 0;
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+
+ fp = AllocateFile(pathbuf, "rb");
+ if (fp == NULL)
+ {
+ int save_errno = errno;
+
+ /*
+ * Most likely reason for this is that the file was already
+ * removed by a checkpoint, so check for that to get a better
+ * error message.
+ */
+ CheckXLogRemoved(segno, tli);
+
+ errno = save_errno;
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m", pathbuf)));
+ }
+
+ if (fstat(fileno(fp), &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ pathbuf)));
+ if (statbuf.st_size != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* send the WAL file itself */
+ _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+
+ while ((cnt = fread(buf, 1,
+ Min(sizeof(buf), wal_segment_size - len),
+ fp)) > 0)
+ {
+ CheckXLogRemoved(segno, tli);
+ /* Send the chunk as a CopyData message */
+ if (pq_putmessage('d', buf, cnt))
+ ereport(ERROR,
+ (errmsg("base backup could not send data, aborting backup")));
+
+ len += cnt;
+ throttle(cnt);
+
+ if (len == wal_segment_size)
+ break;
+ }
+
+ CHECK_FREAD_ERROR(fp, pathbuf);
+
+ if (len != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* wal_segment_size is a multiple of 512, so no need for padding */
+
+ FreeFile(fp);
+
+ /*
+ * Mark file as archived, otherwise files can get archived again
+ * after promotion of a new node. This is in line with
+ * walreceiver.c always doing an XLogArchiveForceDone() after a
+ * complete segment.
+ */
+ StatusFilePath(pathbuf, walFileName, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+
+ /*
+ * Send timeline history files too. Only the latest timeline history
+ * file is required for recovery, and even that only if there happens
+ * to be a timeline switch in the first WAL segment that contains the
+ * checkpoint record, or if we're taking a base backup from a standby
+ * server and the target timeline changes while the backup is taken.
+ * But they are small and highly useful for debugging purposes, so
+ * better include them all, always.
+ */
+ foreach(lc, historyFileList)
+ {
+ char *fname = lfirst(lc);
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
+
+ if (lstat(pathbuf, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m", pathbuf)));
+
+ sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
+
+ /* unconditionally mark file as archived */
+ StatusFilePath(pathbuf, fname, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+}
+
+/*
+ * Setup and activate network throttling, if client requested it
+ */
+static void
+setup_throttle(int maxrate)
+{
+ if (maxrate > 0)
+ {
+ throttling_sample =
+ (int64) maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
+
+ /* Enable throttling. */
+ throttling_counter = 0;
+
+ /* The 'real data' starts now (header was ignored). */
+ throttled_last = GetCurrentTimestamp();
+ }
+ else
+ {
+ /* Disable throttling. */
+ throttling_counter = -1;
+ }
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..5b0aa8ae85 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -350,6 +350,8 @@ extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
bool needtblspcmapfile);
extern XLogRecPtr do_pg_stop_backup(char *labelfile, bool waitforarchive,
TimeLineID *stoptli_p);
+extern void collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
+ bool infotbssize, bool needtblspcmapfile);
extern void do_pg_abort_backup(void);
extern SessionBackupState get_backup_status(void);
--
2.21.0 (Apple Git-122)
0006-parallel-backup-testcase_v4.patch (application/octet-stream)
From 0ce590c1a4027408989988a74f9e7d45bbeb8875 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Sun, 13 Oct 2019 21:54:23 +0500
Subject: [PATCH 6/6] parallel backup - testcase
---
.../t/040_pg_basebackup_parallel.pl | 527 ++++++++++++++++++
1 file changed, 527 insertions(+)
create mode 100644 src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
diff --git a/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
new file mode 100644
index 0000000000..4ec4c1e0f6
--- /dev/null
+++ b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
@@ -0,0 +1,527 @@
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Basename qw(basename dirname);
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 95;
+
+program_help_ok('pg_basebackup');
+program_version_ok('pg_basebackup');
+program_options_handling_ok('pg_basebackup');
+
+my $tempdir = TestLib::tempdir;
+
+my $node = get_new_node('main');
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Initialize node without replication settings
+$node->init(extra => ['--data-checksums']);
+$node->start;
+my $pgdata = $node->data_dir;
+
+$node->command_fails(['pg_basebackup'],
+ 'pg_basebackup needs target directory specified');
+
+# Some Windows ANSI code pages may reject this filename, in which case we
+# quietly proceed without this bit of test coverage.
+if (open my $badchars, '>>', "$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
+{
+ print $badchars "test backup of file with non-UTF8 name\n";
+ close $badchars;
+}
+
+$node->set_replication_conf();
+system_or_bail 'pg_ctl', '-D', $pgdata, 'reload';
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup" ],
+ 'pg_basebackup fails because of WAL configuration');
+
+ok(!-d "$tempdir/backup", 'backup directory was cleaned up');
+
+# Create a backup directory that is not empty so the next command will fail
+# but leave the data directory behind
+mkdir("$tempdir/backup")
+ or BAIL_OUT("unable to create $tempdir/backup");
+append_to_file("$tempdir/backup/dir-not-empty.txt", "Some data");
+
+$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/backup", '-n' ],
+ 'failing run with no-clean option');
+
+ok(-d "$tempdir/backup", 'backup directory was created and left behind');
+rmtree("$tempdir/backup");
+
+open my $conf, '>>', "$pgdata/postgresql.conf";
+print $conf "max_replication_slots = 10\n";
+print $conf "max_wal_senders = 10\n";
+print $conf "wal_level = replica\n";
+close $conf;
+$node->restart;
+
+# Write some files to test that they are not copied.
+foreach my $filename (
+ qw(backup_label tablespace_map postgresql.auto.conf.tmp current_logfiles.tmp)
+ )
+{
+ open my $file, '>>', "$pgdata/$filename";
+ print $file "DONOTCOPY";
+ close $file;
+}
+
+# Connect to a database to create global/pg_internal.init. If this is removed
+# the test to ensure global/pg_internal.init is not copied will return a false
+# positive.
+$node->safe_psql('postgres', 'SELECT 1;');
+
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'unlogged main fork in base');
+
+# Create files that look like temporary relations to ensure they are ignored.
+my $postgresOid = $node->safe_psql('postgres',
+ q{select oid from pg_database where datname = 'postgres'});
+
+my @tempRelationFiles =
+ qw(t999_999 t9999_999.1 t999_9999_vm t99999_99999_vm.1);
+
+foreach my $filename (@tempRelationFiles)
+{
+ append_to_file("$pgdata/base/$postgresOid/$filename", 'TEMP_RELATION');
+}
+
+# Run base backup in parallel mode.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup", '-X', 'none', "-j 4" ],
+ 'pg_basebackup runs');
+ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
+
+# Permissions on backup should be default
+SKIP:
+{
+ skip "unix-style permissions not supported on Windows", 1
+ if ($windows_os);
+
+ ok(check_mode_recursive("$tempdir/backup", 0700, 0600),
+ "check backup dir permissions");
+}
+
+# Only archive_status directory should be copied in pg_wal/.
+is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/pg_wal/")) ],
+ [ sort qw(. .. archive_status) ],
+ 'no WAL files copied');
+
+# Contents of these directories should not be copied.
+foreach my $dirname (
+ qw(pg_dynshmem pg_notify pg_replslot pg_serial pg_snapshots pg_stat_tmp pg_subtrans)
+ )
+{
+ is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/$dirname/")) ],
+ [ sort qw(. ..) ],
+ "contents of $dirname/ not copied");
+}
+
+# These files should not be copied.
+foreach my $filename (
+ qw(postgresql.auto.conf.tmp postmaster.opts postmaster.pid tablespace_map current_logfiles.tmp
+ global/pg_internal.init))
+{
+ ok(!-f "$tempdir/backup/$filename", "$filename not copied");
+}
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+ 'unlogged init fork in backup');
+ok( !-f "$tempdir/backup/$baseUnloggedPath",
+ 'unlogged main fork not in backup');
+
+# Temp relations should not be copied.
+foreach my $filename (@tempRelationFiles)
+{
+ ok( !-f "$tempdir/backup/base/$postgresOid/$filename",
+ "base/$postgresOid/$filename not copied");
+}
+
+# Make sure existing backup_label was ignored.
+isnt(slurp_file("$tempdir/backup/backup_label"),
+ 'DONOTCOPY', 'existing backup_label not copied');
+rmtree("$tempdir/backup");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup2", '--waldir',
+ "$tempdir/xlog2", "-j 4"
+ ],
+ 'separate xlog directory');
+ok(-f "$tempdir/backup2/PG_VERSION", 'backup was created');
+ok(-d "$tempdir/xlog2/", 'xlog directory was created');
+rmtree("$tempdir/backup2");
+rmtree("$tempdir/xlog2");
+
+$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/tarbackup", '-Ft', "-j 4"],
+ 'tar format');
+
+rmtree("$tempdir/tarbackup");
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T=/foo" ],
+ '-T with empty old directory fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=" ],
+ '-T with empty new directory fails');
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4",
+ "-T/foo=/bar=/baz"
+ ],
+ '-T with multiple = fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo=/bar" ],
+ '-T with old directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=bar" ],
+ '-T with new directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo" ],
+ '-T with invalid format fails');
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+ skip "symlinks not supported on Windows", 18 if ($windows_os);
+
+ # Move pg_replslot out of $pgdata and create a symlink to it.
+ $node->stop;
+
+ # Set umask so test directories and files are created with group permissions
+ umask(0027);
+
+ # Enable group permissions on PGDATA
+ chmod_recursive("$pgdata", 0750, 0640);
+
+ rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
+ or BAIL_OUT "could not move $pgdata/pg_replslot";
+ symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot")
+ or BAIL_OUT "could not symlink to $pgdata/pg_replslot";
+
+ $node->start;
+
+	# Create a temporary directory in the system location and symlink it
+	# to our physical temp location. That way we can use shorter names
+	# for the tablespace directories, which hopefully won't run afoul of
+	# the 99 character length limit.
+ my $shorter_tempdir = TestLib::tempdir_short . "/tempdir";
+ symlink "$tempdir", $shorter_tempdir;
+
+ mkdir "$tempdir/tblspc1";
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc1 LOCATION '$shorter_tempdir/tblspc1';");
+ $node->safe_psql('postgres',
+ "CREATE TABLE test1 (a int) TABLESPACE tblspc1;");
+
+ # Create an unlogged table to test that forks other than init are not copied.
+ $node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;'
+ );
+
+ my $tblspc1UnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('tblspc1_unlogged')});
+
+ # Make sure main and init forks exist
+ ok( -f "$pgdata/${tblspc1UnloggedPath}_init",
+ 'unlogged init fork in tablespace');
+ ok(-f "$pgdata/$tblspc1UnloggedPath", 'unlogged main fork in tablespace');
+
+ # Create files that look like temporary relations to ensure they are ignored
+ # in a tablespace.
+ my @tempRelationFiles = qw(t888_888 t888888_888888_vm.1);
+ my $tblSpc1Id = basename(
+ dirname(
+ dirname(
+ $node->safe_psql(
+ 'postgres', q{select pg_relation_filepath('test1')}))));
+
+ foreach my $filename (@tempRelationFiles)
+ {
+ append_to_file(
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ 'TEMP_RELATION');
+ }
+
+ $node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4" ],
+ 'plain format with tablespaces fails without tablespace mapping');
+
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tblspc1=$tempdir/tbackup/tblspc1"
+ ],
+ 'plain format with tablespaces succeeds with tablespace mapping');
+ ok(-d "$tempdir/tbackup/tblspc1", 'tablespace was relocated');
+ opendir(my $dh, "$pgdata/pg_tblspc") or die;
+ ok( ( grep {
+ -l "$tempdir/backup1/pg_tblspc/$_"
+ and readlink "$tempdir/backup1/pg_tblspc/$_" eq
+ "$tempdir/tbackup/tblspc1"
+ } readdir($dh)),
+ "tablespace symlink was updated");
+ closedir $dh;
+
+ # Group access should be enabled on all backup files
+ ok(check_mode_recursive("$tempdir/backup1", 0750, 0640),
+ "check backup dir permissions");
+
+ # Unlogged relation forks other than init should not be copied
+ my ($tblspc1UnloggedBackupPath) =
+ $tblspc1UnloggedPath =~ /[^\/]*\/[^\/]*\/[^\/]*$/g;
+
+ ok(-f "$tempdir/tbackup/tblspc1/${tblspc1UnloggedBackupPath}_init",
+ 'unlogged init fork in tablespace backup');
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
+ 'unlogged main fork not in tablespace backup');
+
+ # Temp relations should not be copied.
+ foreach my $filename (@tempRelationFiles)
+ {
+ ok( !-f "$tempdir/tbackup/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ "[tblspc1]/$postgresOid/$filename not copied");
+
+ # Also remove temp relation files or tablespace drop will fail.
+ my $filepath =
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename";
+
+ unlink($filepath)
+ or BAIL_OUT("unable to unlink $filepath");
+ }
+
+ ok( -d "$tempdir/backup1/pg_replslot",
+ 'pg_replslot symlink copied as directory');
+ rmtree("$tempdir/backup1");
+
+ mkdir "$tempdir/tbl=spc2";
+ $node->safe_psql('postgres', "DROP TABLE test1;");
+ $node->safe_psql('postgres', "DROP TABLE tblspc1_unlogged;");
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc1;");
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc2 LOCATION '$shorter_tempdir/tbl=spc2';");
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup3", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tbl\\=spc2=$tempdir/tbackup/tbl\\=spc2"
+ ],
+ 'mapping tablespace with = sign in path');
+ ok(-d "$tempdir/tbackup/tbl=spc2",
+ 'tablespace with = sign was relocated');
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc2;");
+ rmtree("$tempdir/backup3");
+}
+
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backupR", '-R' , '-j 4'],
+ 'pg_basebackup -R runs');
+ok(-f "$tempdir/backupR/postgresql.auto.conf", 'postgresql.auto.conf exists');
+ok(-f "$tempdir/backupR/standby.signal", 'standby.signal was created');
+my $recovery_conf = slurp_file "$tempdir/backupR/postgresql.auto.conf";
+rmtree("$tempdir/backupR");
+
+my $port = $node->port;
+like(
+ $recovery_conf,
+ qr/^primary_conninfo = '.*port=$port.*'\n/m,
+ 'postgresql.auto.conf sets primary_conninfo');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxd" , "-j 4"],
+ 'pg_basebackup runs in default xlog mode');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxd/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxd");
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxf", '-X', 'fetch' , "-j 4"],
+ 'pg_basebackup -X fetch runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxf");
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs", '-X', 'stream' , "-j 4"],
+ 'pg_basebackup -X stream runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxs/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxs");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupnoslot", '-X',
+ 'stream', '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup -X stream runs with --no-slot');
+rmtree("$tempdir/backupnoslot");
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_sl_fail", '-X',
+ 'stream', '-S',
+ 'slot0',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with nonexistent replication slot');
+#
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C' , '-j 4'],
+ 'pg_basebackup -C fails without slot name');
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_slot", '-C',
+ '-S', 'slot0',
+ '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with -C -S --no-slot');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup -C runs');
+rmtree("$tempdir/backupxs_slot");
+
+is( $node->safe_psql(
+ 'postgres',
+ q{SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ 'slot0',
+ 'replication slot was created');
+isnt(
+ $node->safe_psql(
+ 'postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ '',
+ 'restart LSN of new slot is not null');
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot1", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup fails with -C -S and a previously existing slot');
+
+$node->safe_psql('postgres',
+ q{SELECT * FROM pg_create_physical_replication_slot('slot1')});
+my $lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+is($lsn, '', 'restart LSN of new slot is null');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/fail", '-S', 'slot1', '-X', 'none', '-j 4'],
+ 'pg_basebackup with replication slot fails without WAL streaming');
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl", '-X',
+ 'stream', '-S', 'slot1', '-j 4'
+ ],
+ 'pg_basebackup -X stream with replication slot runs');
+$lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+like($lsn, qr!^0/[0-9A-Z]{7,8}$!, 'restart LSN of slot has advanced');
+rmtree("$tempdir/backupxs_sl");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl_R", '-X',
+ 'stream', '-S', 'slot1', '-R',
+ '-j 4'
+ ],
+ 'pg_basebackup with replication slot and -R runs');
+like(
+ slurp_file("$tempdir/backupxs_sl_R/postgresql.auto.conf"),
+ qr/^primary_slot_name = 'slot1'\n/m,
+ 'recovery conf file sets primary_slot_name');
+
+my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
+is($checksum, 'on', 'checksums are enabled');
+rmtree("$tempdir/backupxs_sl_R");
+
+# create tables to corrupt and get their relfilenodes
+my $file_corrupt1 = $node->safe_psql('postgres',
+ q{SELECT a INTO corrupt1 FROM generate_series(1,10000) AS a; ALTER TABLE corrupt1 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt1')}
+);
+my $file_corrupt2 = $node->safe_psql('postgres',
+ q{SELECT b INTO corrupt2 FROM generate_series(1,2) AS b; ALTER TABLE corrupt2 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt2')}
+);
+
+# set page header and block sizes
+my $pageheader_size = 24;
+my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');
+
+# induce corruption
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+my $file;
+open $file, '+<', "$pgdata/$file_corrupt1";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*checksum verification failed/s],
+ 'pg_basebackup reports checksum mismatch');
+rmtree("$tempdir/backup_corrupt");
+
+# induce further corruption in 5 more blocks
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+for my $i (1 .. 5)
+{
+ my $offset = $pageheader_size + $i * $block_size;
+ seek($file, $offset, 0);
+ syswrite($file, "\0\0\0\0\0\0\0\0\0");
+}
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*further.*failures.*will.not.be.reported/s],
+ 'pg_basebackup does not report more than 5 checksum mismatches');
+rmtree("$tempdir/backup_corrupt2");
+
+# induce corruption in a second file
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt2";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+# do not verify checksums, should return ok
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup_corrupt4", '--no-verify-checksums',
+ '-j 4'
+ ],
+ 'pg_basebackup with -k does not report checksum mismatch');
+rmtree("$tempdir/backup_corrupt4");
+
+$node->safe_psql('postgres', "DROP TABLE corrupt1;");
+$node->safe_psql('postgres', "DROP TABLE corrupt2;");
--
2.21.0 (Apple Git-122)
On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> 'startptr' is used by sendFile() during checksum verification. Since
> SendBackupFiles() is using sendFile we have to set a valid WAL location.

Ugh, global variables.
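Something along these lines would avoid the global -- a rough, untested
sketch (the struct and function names here are invented, not from the
patch): carry the backup start LSN in an explicit context instead of
file-scope state, so SEND_BACKUP_FILES does not have to fake up global
state before calling into the checksum-verification path.

#include "postgres.h"
#include "access/xlogdefs.h"

typedef struct SendFileContext
{
	XLogRecPtr	start_lsn;			/* backup start location */
	bool		verify_checksums;	/* client asked for verification */
} SendFileContext;

/*
 * A page whose LSN is newer than the backup start may legitimately be
 * torn while we read it, so checksum verification must skip it.
 */
static bool
skip_checksum_for_page(const SendFileContext *cxt, XLogRecPtr page_lsn)
{
	return !cxt->verify_checksums || page_lsn >= cxt->start_lsn;
}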
Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
STOP_BACKUP all using the same base_backup_opt_list production as
BASE_BACKUP? Presumably most of those options are not applicable to
most of those commands, and the productions should therefore be
separated.
You should add docs, too. I wouldn't have to guess what some of this
stuff was for if you wrote documentation explaining what this stuff
was for. :-)
>> The tablespace_path option appears entirely unused, and I don't know
>> why that should be necessary here, either.
>
> This is to calculate the basepathlen. We need to exclude the tablespace
> location (or base path) from the filename before it is sent to the client
> with the sendFile call. I added this option primarily to avoid performing
> string manipulation on the filename to extract the tablespace location and
> then calculate the basepathlen.
>
> Alternatively, we can do it by extracting the base path from the received
> filename. What do you suggest?
I don't think the server needs any information from the client in
order to be able to exclude the tablespace location from the pathname.
Whatever it needs to know, it should be able to figure out, just as it
would in a non-parallel backup.
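FWIW, a rough, untested sketch of what that server-side derivation could
look like (it leans on the fact that absolute paths handed to
SEND_BACKUP_FILES contain the TABLESPACE_VERSION_DIRECTORY component,
while PGDATA-relative paths start with "./"; the function name is
invented):

/*
 * Sketch only: compute how much of 'path' to strip before writing the
 * tar member name, with no client-supplied tablespace_path option.
 */
static int
backup_base_path_length(const char *path)
{
	if (is_absolute_path(path))
	{
		/* tablespace file: strip everything before PG_<ver>_<catver> */
		const char *ver = strstr(path, TABLESPACE_VERSION_DIRECTORY);

		if (ver != NULL && ver > path)
			return (int) (ver - path - 1);	/* also drop the '/' */
	}

	/* relative to PGDATA, e.g. "./base/1/1234": strip "./" */
	return 1;
}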
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Nov 1, 2019 at 8:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> 'startptr' is used by sendFile() during checksum verification. Since
>> SendBackupFiles() is using sendFile we have to set a valid WAL location.
>
> Ugh, global variables.
>
> Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
> STOP_BACKUP all using the same base_backup_opt_list production as
> BASE_BACKUP? Presumably most of those options are not applicable to
> most of those commands, and the productions should therefore be
> separated.

Are you expecting something like the attached patch? Basically I have
reorganised the grammar rules so each command can have the options
required by it.

I was feeling a bit reluctant about this change because it may add some
unwanted grammar rules to the replication grammar. Since these commands
use the same options as base backup, maybe we could throw an error
inside the relevant functions on unwanted options?
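If it helps, here is the kind of check I mean -- a rough, untested
sketch for the server side, next to parse_basebackup_options() (the
helper name and the allowed-option arrays are made up for
illustration):

/*
 * Reject options that a given replication command does not support,
 * after parsing them with the shared base_backup_opt_list production.
 * 'allowed' is a NULL-terminated array of option names valid for the
 * command.
 */
static void
validate_backup_options(List *options, const char *cmdname,
						const char *const *allowed)
{
	ListCell   *lopt;

	foreach(lopt, options)
	{
		DefElem    *defel = (DefElem *) lfirst(lopt);
		bool		found = false;

		for (int i = 0; allowed[i] != NULL; i++)
		{
			if (strcmp(defel->defname, allowed[i]) == 0)
			{
				found = true;
				break;
			}
		}

		if (!found)
			ereport(ERROR,
					(errcode(ERRCODE_SYNTAX_ERROR),
					 errmsg("option \"%s\" is not supported by %s",
							defel->defname, cmdname)));
	}
}

StartBackup() would then pass something like { "label", "fast",
"progress", "tablespace_map", NULL } as its allowed set.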
> You should add docs, too. I wouldn't have to guess what some of this
> stuff was for if you wrote documentation explaining what this stuff
> was for. :-)

Yes, I will add it in the next patch.

>>> The tablespace_path option appears entirely unused, and I don't know
>>> why that should be necessary here, either.
>>
>> This is to calculate the basepathlen. We need to exclude the tablespace
>> location (or base path) from the filename before it is sent to the client
>> with the sendFile call. I added this option primarily to avoid performing
>> string manipulation on the filename to extract the tablespace location
>> and then calculate the basepathlen.
>>
>> Alternatively, we can do it by extracting the base path from the received
>> filename. What do you suggest?
>
> I don't think the server needs any information from the client in
> order to be able to exclude the tablespace location from the pathname.
> Whatever it needs to know, it should be able to figure out, just as it
> would in a non-parallel backup.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Attachments:
repl_grammar.patch (application/octet-stream)
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 5619837ebe..f94961132e 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -99,7 +99,13 @@ static SQLCmd *make_sqlcmd(void);
create_replication_slot drop_replication_slot identify_system
timeline_history show sql_cmd
%type <list> base_backup_opt_list
+ start_backup_opt_list stop_backup_opt_list
+ send_backup_files_opt_list send_backup_filelist
%type <defelt> base_backup_opt
+ backup_opt_label backup_opt_progress backup_opt_maxrate
+ backup_opt_fast backup_opt_tsmap backup_opt_wal backup_opt_nowait
+ backup_opt_chksum backup_opt_wal_loc backup_opt_tspath
+ start_backup_opt stop_backup_opt send_backup_filelist_opt send_backup_files_opt
%type <uintval> opt_timeline
%type <list> plugin_options plugin_opt_list
%type <defelt> plugin_opt_elem
@@ -173,21 +179,21 @@ base_backup:
cmd->cmdtag = BASE_BACKUP;
$$ = (Node *) cmd;
}
- | K_START_BACKUP base_backup_opt_list
+ | K_START_BACKUP start_backup_opt_list
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
cmd->cmdtag = START_BACKUP;
$$ = (Node *) cmd;
}
- | K_SEND_BACKUP_FILELIST base_backup_opt_list
+ | K_SEND_BACKUP_FILELIST send_backup_filelist
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
cmd->cmdtag = SEND_BACKUP_FILELIST;
$$ = (Node *) cmd;
}
- | K_SEND_BACKUP_FILES backup_files base_backup_opt_list
+ | K_SEND_BACKUP_FILES backup_files send_backup_files_opt_list
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $3;
@@ -195,7 +201,7 @@ base_backup:
cmd->backupfiles = $2;
$$ = (Node *) cmd;
}
- | K_STOP_BACKUP base_backup_opt_list
+ | K_STOP_BACKUP stop_backup_opt_list
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
@@ -204,6 +210,34 @@ base_backup:
}
;
+start_backup_opt_list:
+ start_backup_opt_list start_backup_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+stop_backup_opt_list:
+ stop_backup_opt_list stop_backup_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+send_backup_filelist:
+ send_backup_filelist send_backup_filelist_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+send_backup_files_opt_list:
+ send_backup_files_opt_list send_backup_files_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
base_backup_opt_list:
base_backup_opt_list base_backup_opt
{ $$ = lappend($1, $2); }
@@ -211,59 +245,114 @@ base_backup_opt_list:
{ $$ = NIL; }
;
+start_backup_opt:
+ backup_opt_label { $$ = $1; }
+ | backup_opt_maxrate { $$ = $1; }
+ | backup_opt_progress { $$ = $1; }
+ | backup_opt_tsmap { $$ = $1; }
+ ;
+
+stop_backup_opt:
+ backup_opt_label { $$ = $1; }
+ | backup_opt_maxrate { $$ = $1; }
+ | backup_opt_wal { $$ = $1; }
+ | backup_opt_nowait { $$ = $1; }
+ ;
+
+send_backup_filelist_opt:
+ backup_opt_tsmap { $$ = $1; }
+ ;
+
+send_backup_files_opt:
+ backup_opt_maxrate { $$ = $1; }
+ | backup_opt_chksum { $$ = $1; }
+ | backup_opt_wal_loc { $$ = $1; }
+ | backup_opt_tspath { $$ = $1; }
+ ;
+
base_backup_opt:
- K_LABEL SCONST
- {
- $$ = makeDefElem("label",
- (Node *)makeString($2), -1);
- }
- | K_PROGRESS
- {
- $$ = makeDefElem("progress",
- (Node *)makeInteger(true), -1);
- }
- | K_FAST
- {
- $$ = makeDefElem("fast",
- (Node *)makeInteger(true), -1);
- }
- | K_WAL
- {
- $$ = makeDefElem("wal",
- (Node *)makeInteger(true), -1);
- }
- | K_NOWAIT
- {
- $$ = makeDefElem("nowait",
- (Node *)makeInteger(true), -1);
- }
- | K_MAX_RATE UCONST
- {
- $$ = makeDefElem("max_rate",
- (Node *)makeInteger($2), -1);
- }
- | K_TABLESPACE_MAP
- {
- $$ = makeDefElem("tablespace_map",
- (Node *)makeInteger(true), -1);
- }
- | K_NOVERIFY_CHECKSUMS
- {
- $$ = makeDefElem("noverify_checksums",
- (Node *)makeInteger(true), -1);
- }
- | K_START_WAL_LOCATION SCONST
- {
- $$ = makeDefElem("start_wal_location",
- (Node *)makeString($2), -1);
- }
- | K_TABLESPACE_PATH SCONST
- {
- $$ = makeDefElem("tablespace_path",
- (Node *)makeString($2), -1);
- }
+ backup_opt_label { $$ = $1; }
+ | backup_opt_maxrate { $$ = $1; }
+ | backup_opt_fast { $$ = $1; }
+ | backup_opt_progress { $$ = $1; }
+ | backup_opt_tsmap { $$ = $1; }
+ | backup_opt_wal { $$ = $1; }
+ | backup_opt_nowait { $$ = $1; }
+ | backup_opt_chksum { $$ = $1; }
+ | backup_opt_wal_loc { $$ = $1; }
+ | backup_opt_tspath { $$ = $1; }
;
+backup_opt_label:
+ K_LABEL SCONST
+ {
+ $$ = makeDefElem("label",
+ (Node *)makeString($2), -1);
+ };
+
+backup_opt_progress:
+ K_PROGRESS
+ {
+ $$ = makeDefElem("progress",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_fast:
+ K_FAST
+ {
+ $$ = makeDefElem("fast",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_wal:
+ K_WAL
+ {
+ $$ = makeDefElem("wal",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_nowait:
+ K_NOWAIT
+ {
+ $$ = makeDefElem("nowait",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_maxrate:
+ K_MAX_RATE UCONST
+ {
+ $$ = makeDefElem("max_rate",
+ (Node *)makeInteger($2), -1);
+ };
+
+backup_opt_tsmap:
+ K_TABLESPACE_MAP
+ {
+ $$ = makeDefElem("tablespace_map",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_chksum:
+ K_NOVERIFY_CHECKSUMS
+ {
+ $$ = makeDefElem("noverify_checksums",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_wal_loc:
+ K_START_WAL_LOCATION SCONST
+ {
+ $$ = makeDefElem("start_wal_location",
+ (Node *)makeString($2), -1);
+ };
+
+backup_opt_tspath:
+ K_TABLESPACE_PATH SCONST
+ {
+ $$ = makeDefElem("tablespace_path",
+ (Node *)makeString($2), -1);
+ };
+
backup_files:
'(' backup_files_list ')'
{
On Mon, Nov 4, 2019 at 6:08 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> On Fri, Nov 1, 2019 at 8:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>>> 'startptr' is used by sendFile() during checksum verification. Since
>>> SendBackupFiles() is using sendFile we have to set a valid WAL location.
>>
>> Ugh, global variables.
>>
>> Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
>> STOP_BACKUP all using the same base_backup_opt_list production as
>> BASE_BACKUP? Presumably most of those options are not applicable to
>> most of those commands, and the productions should therefore be
>> separated.
>
> Are you expecting something like the attached patch? Basically I have
> reorganised the grammar rules so each command can have the options
> required by it.
>
> I was feeling a bit reluctant about this change because it may add some
> unwanted grammar rules to the replication grammar. Since these commands
> use the same options as base backup, maybe we could throw an error
> inside the relevant functions on unwanted options?
>
>> You should add docs, too. I wouldn't have to guess what some of this
>> stuff was for if you wrote documentation explaining what this stuff
>> was for. :-)
>
> Yes, I will add it in the next patch.
>
>>>> The tablespace_path option appears entirely unused, and I don't know
>>>> why that should be necessary here, either.
>>>
>>> This is to calculate the basepathlen. We need to exclude the tablespace
>>> location (or base path) from the filename before it is sent to the client
>>> with the sendFile call. I added this option primarily to avoid performing
>>> string manipulation on the filename to extract the tablespace location
>>> and then calculate the basepathlen.
>>>
>>> Alternatively, we can do it by extracting the base path from the received
>>> filename. What do you suggest?
>>
>> I don't think the server needs any information from the client in
>> order to be able to exclude the tablespace location from the pathname.
>> Whatever it needs to know, it should be able to figure out, just as it
>> would in a non-parallel backup.
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
I have updated the replication grammar with some new rules to
differentiate the option productions for base backup and the newly
added commands. I have also created a separate patch to include the
documentation changes.
The current syntax is as below; a rough client-side sketch follows the list:
- START_BACKUP [ LABEL 'label' ] [ PROGRESS ] [ FAST ] [ TABLESPACE_MAP ]
- STOP_BACKUP [ LABEL 'label' ] [ WAL ] [ NOWAIT ]
- SEND_BACKUP_FILELIST
- SEND_BACKUP_FILES ( 'FILE' [, ...] ) [ MAX_RATE rate ] [ NOVERIFY_CHECKSUMS ] [ START_WAL_LOCATION ]
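Here is a minimal, untested libpq sketch of driving these commands over
a replication connection, just to make the flow concrete. The
connection string, the example file name, and the LSN literal are
illustrative only; real code would open one connection per worker and
write the COPY stream into the output file:

#include <stdio.h>
#include <stdlib.h>
#include "libpq-fe.h"

int
main(void)
{
	/* replication=true opens a physical replication connection */
	PGconn	   *conn = PQconnectdb("host=localhost replication=true");
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		exit(1);
	}

	/* put the cluster into backup mode; the start LSN comes back */
	res = PQexec(conn, "START_BACKUP LABEL 'parallel backup' FAST");
	PQclear(res);

	/* one row per file: path, type, size, mtime */
	res = PQexec(conn, "SEND_BACKUP_FILELIST");
	for (int i = 0; i < PQntuples(res); i++)
		printf("%s type=%s size=%s\n", PQgetvalue(res, i, 0),
			   PQgetvalue(res, i, 1), PQgetvalue(res, i, 2));
	PQclear(res);

	/*
	 * Each worker issues its own SEND_BACKUP_FILES for its slice of the
	 * list; the file contents arrive as a COPY OUT stream.
	 */
	res = PQexec(conn, "SEND_BACKUP_FILES ('base/1/1259') "
				 "START_WAL_LOCATION '0/2000028'");
	if (PQresultStatus(res) == PGRES_COPY_OUT)
	{
		char	   *buf;
		int			len;

		while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
		{
			/* append buf[0..len) to the worker's output here */
			PQfreemem(buf);
		}
		PQclear(PQgetResult(conn));		/* consume the final result */
	}
	PQclear(res);

	/* leave backup mode; the end LSN comes back */
	res = PQexec(conn, "STOP_BACKUP NOWAIT");
	PQclear(res);

	PQfinish(conn);
	return 0;
}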
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Attachments:
0004-backend-changes-for-parallel-backup.patch (application/octet-stream)
From 2f71ddec4a9e75538af61aafc1a5bc85642d4139 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Sun, 13 Oct 2019 22:59:28 +0500
Subject: [PATCH 4/7] backend changes for parallel backup
---
src/backend/access/transam/xlog.c | 2 +-
src/backend/replication/basebackup.c | 552 ++++++++++++++++++++++++-
src/backend/replication/repl_gram.y | 217 ++++++++--
src/backend/replication/repl_scanner.l | 7 +
src/include/nodes/replnodes.h | 10 +
src/include/replication/basebackup.h | 2 +-
6 files changed, 740 insertions(+), 50 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 451fe6c0d1..445aad291e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12279,7 +12279,7 @@ collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
ti->oid = pstrdup(de->d_name);
ti->path = pstrdup(buflinkpath.data);
ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+ ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
if (tablespaces)
*tablespaces = lappend(*tablespaces, ti);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b679f36021..57e0b7a0ab 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -41,6 +41,7 @@
#include "utils/ps_status.h"
#include "utils/relcache.h"
#include "utils/timestamp.h"
+#include "utils/pg_lsn.h"
typedef struct
{
@@ -51,11 +52,21 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ const char *tablespace_path;
+ XLogRecPtr wal_location;
} basebackup_options;
+typedef struct
+{
+ char path[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+} BackupFile;
+
static int64 sendDir(const char *path, int basepathlen, bool dryrun,
- List *tablespaces, bool sendtblspclinks);
+ List *tablespaces, bool sendtblspclinks, List **filelist);
static bool sendFile(const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
@@ -75,6 +86,13 @@ static void throttle(size_t increment);
static void setup_throttle(int maxrate);
static bool is_checksummed_file(const char *fullpath, const char *filename);
+static void StartBackup(basebackup_options *opt);
+static void StopBackup(basebackup_options *opt);
+static void SendBackupFileList(void);
+static void SendBackupFiles(basebackup_options *opt, List *filenames, bool missing_ok);
+static void addToBackupFileList(List **filelist, char *path, char type, int32 size,
+ time_t mtime);
+
/* Was the backup currently in-progress initiated in recovery mode? */
static bool backup_started_in_recovery = false;
@@ -289,7 +307,7 @@ perform_base_backup(basebackup_options *opt)
/* Add a node for the base directory at the end */
ti = palloc0(sizeof(tablespaceinfo));
- ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true, NULL) : -1;
tablespaces = lappend(tablespaces, ti);
/* Send tablespace header */
@@ -323,10 +341,10 @@ perform_base_backup(basebackup_options *opt)
if (tblspc_map_file && opt->sendtblspcmapfile)
{
sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data);
- sendDir(".", 1, false, tablespaces, false);
+ sendDir(".", 1, false, tablespaces, false, NULL);
}
else
- sendDir(".", 1, false, tablespaces, true);
+ sendDir(".", 1, false, tablespaces, true, NULL);
/* ... and pg_control after everything else. */
if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
@@ -337,7 +355,7 @@ perform_base_backup(basebackup_options *opt)
sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
}
else
- sendTablespace(ti->path, false);
+ sendTablespace(ti->path, false, NULL);
/*
* If we're including WAL, and this is the main data directory we
@@ -409,6 +427,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_maxrate = false;
bool o_tablespace_map = false;
bool o_noverify_checksums = false;
+ bool o_tablespace_path = false;
+ bool o_wal_location = false;
MemSet(opt, 0, sizeof(*opt));
foreach(lopt, options)
@@ -497,12 +517,33 @@ parse_basebackup_options(List *options, basebackup_options *opt)
noverify_checksums = true;
o_noverify_checksums = true;
}
+ else if (strcmp(defel->defname, "tablespace_path") == 0)
+ {
+ if (o_tablespace_path)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->tablespace_path = strVal(defel->arg);
+ o_tablespace_path = true;
+ }
+ else if (strcmp(defel->defname, "start_wal_location") == 0)
+ {
+ bool have_error = false;
+ char *wal_location;
+
+ if (o_wal_location)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+
+ wal_location = strVal(defel->arg);
+ opt->wal_location = pg_lsn_in_internal(wal_location, &have_error);
+ o_wal_location = true;
+ }
else
elog(ERROR, "option \"%s\" not recognized",
defel->defname);
}
- if (opt->label == NULL)
- opt->label = "base backup";
}
@@ -520,6 +561,15 @@ SendBaseBackup(BaseBackupCmd *cmd)
parse_basebackup_options(cmd->options, &opt);
+ /* default value for label, if not specified. */
+ if (opt.label == NULL)
+ {
+ if (cmd->cmdtag == BASE_BACKUP)
+ opt.label = "base backup";
+ else
+ opt.label = "start backup";
+ }
+
WalSndSetState(WALSNDSTATE_BACKUP);
if (update_process_title)
@@ -531,7 +581,29 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg, false);
}
- perform_base_backup(&opt);
+ switch (cmd->cmdtag)
+ {
+ case BASE_BACKUP:
+ perform_base_backup(&opt);
+ break;
+ case START_BACKUP:
+ StartBackup(&opt);
+ break;
+ case SEND_BACKUP_FILELIST:
+ SendBackupFileList();
+ break;
+ case SEND_BACKUP_FILES:
+ SendBackupFiles(&opt, cmd->backupfiles, true);
+ break;
+ case STOP_BACKUP:
+ StopBackup(&opt);
+ break;
+
+ default:
+ elog(ERROR, "unrecognized replication command tag: %u",
+ cmd->cmdtag);
+ break;
+ }
}
static void
@@ -674,6 +746,61 @@ SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
pq_puttextmessage('C', "SELECT");
}
+/*
+ * Send a single resultset containing backup label and tablespace map
+ */
+static void
+SendStartBackupResult(StringInfo labelfile, StringInfo tblspc_map_file)
+{
+ StringInfoData buf;
+ Size len;
+
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 2); /* 2 fields */
+
+ /* Field headers */
+ pq_sendstring(&buf, "label");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ pq_sendstring(&buf, "tablespacemap");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ /* Data row */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 2); /* number of columns */
+
+ len = labelfile->len;
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, labelfile->data, len);
+
+ if (tblspc_map_file)
+ {
+ len = tblspc_map_file->len;
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, tblspc_map_file->data, len);
+ }
+ else
+ {
+ pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
+ }
+
+ pq_endmessage(&buf);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
/*
* Inject a file with given name and content in the output tar stream.
*/
@@ -725,7 +852,7 @@ sendFileWithContent(const char *filename, const char *content)
* Only used to send auxiliary tablespaces, not PGDATA.
*/
int64
-sendTablespace(char *path, bool dryrun)
+sendTablespace(char *path, bool dryrun, List **filelist)
{
int64 size;
char pathbuf[MAXPGPATH];
@@ -754,11 +881,11 @@ sendTablespace(char *path, bool dryrun)
return 0;
}
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
dryrun);
-
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), dryrun, NIL, true);
+ size += sendDir(pathbuf, strlen(path), dryrun, NIL, true, filelist);
return size;
}
@@ -777,7 +904,7 @@ sendTablespace(char *path, bool dryrun)
*/
static int64
sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
- bool sendtblspclinks)
+ bool sendtblspclinks, List **filelist)
{
DIR *dir;
struct dirent *de;
@@ -931,6 +1058,8 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
+
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
excludeFound = true;
break;
@@ -947,6 +1076,8 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
+
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
continue;
}
@@ -968,6 +1099,10 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
dryrun);
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
+ addToBackupFileList(filelist, "./pg_wal/archive_status", 'd', -1,
+ statbuf.st_mtime);
+
continue; /* don't recurse into pg_wal */
}
@@ -997,6 +1132,7 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
+ addToBackupFileList(filelist, pathbuf, 'l', statbuf.st_size, statbuf.st_mtime);
size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
&statbuf, dryrun);
#else
@@ -1023,6 +1159,7 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
*/
size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
dryrun);
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1053,13 +1190,15 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks);
+ size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks, filelist);
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
- if (!dryrun)
+ addToBackupFileList(filelist, pathbuf, 'f', statbuf.st_size, statbuf.st_mtime);
+
+ if (!dryrun && filelist == NULL)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid);
@@ -1764,3 +1903,388 @@ setup_throttle(int maxrate)
throttling_counter = -1;
}
}
+
+/*
+ * StartBackup - prepare to start an online backup.
+ *
+ * This function calls do_pg_start_backup() and sends back the starting
+ * checkpoint location, the available tablespaces, and the contents of the
+ * backup_label and tablespace_map files.
+ */
+static void
+StartBackup(basebackup_options *opt)
+{
+ TimeLineID starttli;
+ StringInfo labelfile;
+ StringInfo tblspc_map_file = NULL;
+ int datadirpathlen;
+ List *tablespaces = NIL;
+ tablespaceinfo *ti;
+
+ datadirpathlen = strlen(DataDir);
+
+ backup_started_in_recovery = RecoveryInProgress();
+
+ labelfile = makeStringInfo();
+ tblspc_map_file = makeStringInfo();
+
+ total_checksum_failures = 0;
+
+ startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
+ labelfile, &tablespaces,
+ tblspc_map_file,
+ opt->progress, opt->sendtblspcmapfile);
+
+ /*
+ * Once do_pg_start_backup has been called, ensure that any failure causes
+ * us to abort the backup so we don't "leak" a backup counter. For this
+ * reason, register base_backup_cleanup with before_shmem_exit handler. This
+ * will make sure that the call is always made when the process exits. On
+ * success, do_pg_stop_backup will have taken the system out of backup mode
+ * and this callback will have no effect; otherwise the required cleanup
+ * will be done in any case.
+ */
+ before_shmem_exit(base_backup_cleanup, (Datum) 0);
+
+ SendXlogRecPtrResult(startptr, starttli);
+
+ /*
+ * Calculate the relative path of temporary statistics directory in
+ * order to skip the files which are located in that directory later.
+ */
+ if (is_absolute_path(pgstat_stat_directory) &&
+ strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
+ else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory);
+ else
+ statrelpath = pgstat_stat_directory;
+
+ /* Add a node for the base directory at the end */
+ ti = palloc0(sizeof(tablespaceinfo));
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true, NULL) : -1;
+ tablespaces = lappend(tablespaces, ti);
+
+ /* Send tablespace header */
+ SendBackupHeader(tablespaces);
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ if ((tblspc_map_file && tblspc_map_file->len <= 0) ||
+ !opt->sendtblspcmapfile)
+ tblspc_map_file = NULL;
+
+ /* send backup_label and tablespace_map to frontend */
+ SendStartBackupResult(labelfile, tblspc_map_file);
+}
+
+/*
+ * StopBackup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out pg_control
+ * file, optionally WAL segments, and the ending WAL location.
+ */
+static void
+StopBackup(basebackup_options *opt)
+{
+ TimeLineID endtli;
+ XLogRecPtr endptr;
+ struct stat statbuf;
+ StringInfoData buf;
+ char *labelfile = NULL;
+
+ if (get_backup_status() != SESSION_BACKUP_NON_EXCLUSIVE)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("non-exclusive backup is not in progress")));
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ /* ... and pg_control after everything else. */
+ if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ XLOG_CONTROL_FILE)));
+ sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
+
+ /* stop backup */
+ labelfile = (char *) opt->label;
+ endptr = do_pg_stop_backup(labelfile, !opt->nowait, &endtli);
+
+ if (opt->includewal)
+ include_wal_files(endptr);
+
+ pq_putemptymessage('c'); /* CopyDone */
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+/*
+ * SendBackupFileList() - sends a list of filenames to frontend
+ *
+ * The function collects a list of filenames necessary for a complete backup and
+ * sends this list to the client.
+ */
+static void
+SendBackupFileList(void)
+{
+ StringInfoData buf;
+	ListCell   *lc, *lc2;
+ List *tablespaces = NIL;
+ StringInfo tblspc_map_file = NULL;
+ tablespaceinfo *ti;
+
+ tblspc_map_file = makeStringInfo();
+ collectTablespaces(&tablespaces, tblspc_map_file, false, false);
+
+ /* Add a node for the base directory at the end */
+ ti = palloc0(sizeof(tablespaceinfo));
+ tablespaces = lappend(tablespaces, ti);
+
+ foreach(lc, tablespaces)
+ {
+ List *filelist = NULL;
+ tablespaceinfo *ti;
+
+ ti = (tablespaceinfo *) lfirst(lc);
+ if (ti->path == NULL)
+ sendDir(".", 1, true, NIL, true, &filelist);
+ else
+ sendTablespace(ti->path, true, &filelist);
+
+ /* Construct and send the list of filenames */
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 4); /* n field */
+
+ /* First field - file name */
+ pq_sendstring(&buf, "path");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+		/* Second field - type */
+ pq_sendstring(&buf, "type");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, CHAROID);
+ pq_sendint16(&buf, 1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+		/* Fourth field - mtime */
+ pq_sendstring(&buf, "mtime");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+		foreach(lc2, filelist)	/* don't reuse 'lc'; the outer loop needs it */
+ {
+			BackupFile *backupFile = (BackupFile *) lfirst(lc2);
+ Size len;
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 4); /* number of columns */
+
+ /* send path */
+ len = strlen(backupFile->path);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, backupFile->path, len);
+
+ /* send type */
+ pq_sendint32(&buf, 1);
+ pq_sendbyte(&buf, backupFile->type);
+
+ /* send size */
+ send_int8_string(&buf, backupFile->size);
+
+ /* send mtime */
+ send_int8_string(&buf, backupFile->mtime);
+
+ pq_endmessage(&buf);
+ }
+
+ if (filelist)
+ pfree(filelist);
+ }
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * SendBackupFiles() - sends the actual files to the caller
+ *
+ * The function sends out the given file(s) over to the caller using the COPY
+ * protocol.
+ */
+static void
+SendBackupFiles(basebackup_options *opt, List *filenames, bool missing_ok)
+{
+ StringInfoData buf;
+ ListCell *lc;
+ int basepathlen = 1;
+
+ if (list_length(filenames) <= 0)
+ return;
+
+ total_checksum_failures = 0;
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ /* set backup start location. */
+ startptr = opt->wal_location;
+
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ foreach(lc, filenames)
+ {
+ struct stat statbuf;
+ char *pathbuf;
+
+ pathbuf = (char *) strVal(lfirst(lc));
+ if (is_absolute_path(pathbuf))
+ {
+ char *basepath;
+
+ /*
+ * 'pathbuf' points to the tablespace location, but we only want to
+ * include the version directory in it that belongs to us.
+ */
+ basepath = strstr(pathbuf, TABLESPACE_VERSION_DIRECTORY);
+ if (basepath)
+ basepathlen = basepath - pathbuf - 1;
+ }
+
+ if (lstat(pathbuf, &statbuf) != 0)
+ {
+ if (errno != ENOENT)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file or directory \"%s\": %m",
+ pathbuf)));
+
+ /* If the file went away while scanning, it's not an error. */
+ continue;
+ }
+
+ /* Allow symbolic links in pg_tblspc only */
+ if (strstr(pathbuf, "./pg_tblspc") != NULL &&
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ char linkpath[MAXPGPATH];
+ int rllen;
+
+ rllen = readlink(pathbuf, linkpath, sizeof(linkpath));
+ if (rllen < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read symbolic link \"%s\": %m",
+ pathbuf)));
+ if (rllen >= sizeof(linkpath))
+ ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("symbolic link \"%s\" target is too long",
+ pathbuf)));
+ linkpath[rllen] = '\0';
+
+ _tarWriteHeader(pathbuf, linkpath, &statbuf, false);
+ }
+ else if (S_ISDIR(statbuf.st_mode))
+ {
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
+ else if (
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ /*
+ * If symlink, write it as a directory. file symlinks only allowed
+ * in pg_tblspc
+ */
+ statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
+ else
+ {
+ /* send file to client */
+ sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true, InvalidOid);
+ }
+ }
+
+ pq_putemptymessage('c'); /* CopyDone */
+
+ /*
+ * Check for checksum failures. If there are failures across multiple
+ * processes, it may not report the total checksum failure count, but it
+ * will error out, terminating the backup.
+ */
+ if (total_checksum_failures)
+ {
+ if (total_checksum_failures > 1)
+ ereport(WARNING,
+ (errmsg("%lld total checksum verification failures", total_checksum_failures)));
+
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("checksum verification failure during base backup")));
+ }
+}
+
+/*
+ * Construct a BackupFile entry and add to the list.
+ */
+static void
+addToBackupFileList(List **filelist, char *path, char type, int32 size,
+ time_t mtime)
+{
+ BackupFile *backupFile;
+
+ if (filelist)
+ {
+ backupFile = (BackupFile *) palloc0(sizeof(BackupFile));
+ strlcpy(backupFile->path, path, sizeof(backupFile->path));
+ backupFile->type = type;
+ backupFile->size = size;
+ backupFile->mtime = mtime;
+
+ *filelist = lappend(*filelist, backupFile);
+ }
+}
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index c4e11cc4e8..9a652ff556 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -87,13 +87,25 @@ static SQLCmd *make_sqlcmd(void);
%token K_EXPORT_SNAPSHOT
%token K_NOEXPORT_SNAPSHOT
%token K_USE_SNAPSHOT
+%token K_START_BACKUP
+%token K_SEND_BACKUP_FILELIST
+%token K_SEND_BACKUP_FILES
+%token K_STOP_BACKUP
+%token K_START_WAL_LOCATION
+%token K_TABLESPACE_PATH
%type <node> command
%type <node> base_backup start_replication start_logical_replication
create_replication_slot drop_replication_slot identify_system
timeline_history show sql_cmd
%type <list> base_backup_opt_list
+ start_backup_opt_list stop_backup_opt_list
+ send_backup_files_opt_list
%type <defelt> base_backup_opt
+ backup_opt_label backup_opt_progress backup_opt_maxrate
+ backup_opt_fast backup_opt_tsmap backup_opt_wal backup_opt_nowait
+ backup_opt_chksum backup_opt_wal_loc backup_opt_tspath
+ start_backup_opt stop_backup_opt send_backup_files_opt
%type <uintval> opt_timeline
%type <list> plugin_options plugin_opt_list
%type <defelt> plugin_opt_elem
@@ -102,6 +114,8 @@ static SQLCmd *make_sqlcmd(void);
%type <boolval> opt_temporary
%type <list> create_slot_opt_list
%type <defelt> create_slot_opt
+%type <list> backup_files backup_files_list
+%type <node> backup_file
%%
@@ -162,10 +176,61 @@ base_backup:
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
+ cmd->cmdtag = BASE_BACKUP;
$$ = (Node *) cmd;
}
+ | K_START_BACKUP start_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = START_BACKUP;
+ $$ = (Node *) cmd;
+ }
+ | K_SEND_BACKUP_FILELIST
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = NIL;
+ cmd->cmdtag = SEND_BACKUP_FILELIST;
+ $$ = (Node *) cmd;
+ }
+ | K_SEND_BACKUP_FILES backup_files send_backup_files_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $3;
+ cmd->cmdtag = SEND_BACKUP_FILES;
+ cmd->backupfiles = $2;
+ $$ = (Node *) cmd;
+ }
+ | K_STOP_BACKUP stop_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = STOP_BACKUP;
+ $$ = (Node *) cmd;
+ }
+ ;
+
+start_backup_opt_list:
+ start_backup_opt_list start_backup_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
;
+stop_backup_opt_list:
+ stop_backup_opt_list stop_backup_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+send_backup_files_opt_list:
+ send_backup_files_opt_list send_backup_files_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
base_backup_opt_list:
base_backup_opt_list base_backup_opt
{ $$ = lappend($1, $2); }
@@ -173,49 +238,133 @@ base_backup_opt_list:
{ $$ = NIL; }
;
+start_backup_opt:
+ backup_opt_label { $$ = $1; }
+ | backup_opt_fast { $$ = $1; }
+ | backup_opt_progress { $$ = $1; }
+ | backup_opt_tsmap { $$ = $1; }
+ ;
+
+stop_backup_opt:
+ backup_opt_label { $$ = $1; }
+ | backup_opt_wal { $$ = $1; }
+ | backup_opt_nowait { $$ = $1; }
+ ;
+
+send_backup_files_opt:
+ backup_opt_maxrate { $$ = $1; }
+ | backup_opt_chksum { $$ = $1; }
+ | backup_opt_wal_loc { $$ = $1; }
+ | backup_opt_tspath { $$ = $1; }
+ ;
+
base_backup_opt:
- K_LABEL SCONST
- {
- $$ = makeDefElem("label",
- (Node *)makeString($2), -1);
- }
- | K_PROGRESS
- {
- $$ = makeDefElem("progress",
- (Node *)makeInteger(true), -1);
- }
- | K_FAST
- {
- $$ = makeDefElem("fast",
- (Node *)makeInteger(true), -1);
- }
- | K_WAL
- {
- $$ = makeDefElem("wal",
- (Node *)makeInteger(true), -1);
- }
- | K_NOWAIT
- {
- $$ = makeDefElem("nowait",
- (Node *)makeInteger(true), -1);
- }
- | K_MAX_RATE UCONST
+ backup_opt_label { $$ = $1; }
+ | backup_opt_maxrate { $$ = $1; }
+ | backup_opt_fast { $$ = $1; }
+ | backup_opt_progress { $$ = $1; }
+ | backup_opt_tsmap { $$ = $1; }
+ | backup_opt_wal { $$ = $1; }
+ | backup_opt_nowait { $$ = $1; }
+ | backup_opt_chksum { $$ = $1; }
+ | backup_opt_wal_loc { $$ = $1; }
+ | backup_opt_tspath { $$ = $1; }
+ ;
+
+backup_opt_label:
+ K_LABEL SCONST
+ {
+ $$ = makeDefElem("label",
+ (Node *)makeString($2), -1);
+ };
+
+backup_opt_progress:
+ K_PROGRESS
+ {
+ $$ = makeDefElem("progress",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_fast:
+ K_FAST
+ {
+ $$ = makeDefElem("fast",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_wal:
+ K_WAL
+ {
+ $$ = makeDefElem("wal",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_nowait:
+ K_NOWAIT
+ {
+ $$ = makeDefElem("nowait",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_maxrate:
+ K_MAX_RATE UCONST
+ {
+ $$ = makeDefElem("max_rate",
+ (Node *)makeInteger($2), -1);
+ };
+
+backup_opt_tsmap:
+ K_TABLESPACE_MAP
+ {
+ $$ = makeDefElem("tablespace_map",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_chksum:
+ K_NOVERIFY_CHECKSUMS
+ {
+ $$ = makeDefElem("noverify_checksums",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_wal_loc:
+ K_START_WAL_LOCATION SCONST
+ {
+ $$ = makeDefElem("start_wal_location",
+ (Node *)makeString($2), -1);
+ };
+
+backup_opt_tspath:
+ K_TABLESPACE_PATH SCONST
+ {
+ $$ = makeDefElem("tablespace_path",
+ (Node *)makeString($2), -1);
+ };
+
+backup_files:
+ '(' backup_files_list ')'
{
- $$ = makeDefElem("max_rate",
- (Node *)makeInteger($2), -1);
+ $$ = $2;
}
- | K_TABLESPACE_MAP
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+backup_files_list:
+ backup_file
{
- $$ = makeDefElem("tablespace_map",
- (Node *)makeInteger(true), -1);
+ $$ = list_make1($1);
}
- | K_NOVERIFY_CHECKSUMS
+ | backup_files_list ',' backup_file
{
- $$ = makeDefElem("noverify_checksums",
- (Node *)makeInteger(true), -1);
+ $$ = lappend($1, $3);
}
;
+backup_file:
+ SCONST { $$ = (Node *) makeString($1); }
+ ;
+
create_replication_slot:
/* CREATE_REPLICATION_SLOT slot TEMPORARY PHYSICAL RESERVE_WAL */
K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_PHYSICAL create_slot_opt_list
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 380faeb5f6..c57ff02d39 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -107,6 +107,13 @@ EXPORT_SNAPSHOT { return K_EXPORT_SNAPSHOT; }
NOEXPORT_SNAPSHOT { return K_NOEXPORT_SNAPSHOT; }
USE_SNAPSHOT { return K_USE_SNAPSHOT; }
WAIT { return K_WAIT; }
+START_BACKUP { return K_START_BACKUP; }
+SEND_BACKUP_FILELIST { return K_SEND_BACKUP_FILELIST; }
+SEND_BACKUP_FILES { return K_SEND_BACKUP_FILES; }
+STOP_BACKUP { return K_STOP_BACKUP; }
+START_WAL_LOCATION { return K_START_WAL_LOCATION; }
+TABLESPACE_PATH { return K_TABLESPACE_PATH; }
+
"," { return ','; }
";" { return ';'; }
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index 1e3ed4e19f..3685f260b5 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -23,6 +23,14 @@ typedef enum ReplicationKind
REPLICATION_KIND_LOGICAL
} ReplicationKind;
+typedef enum BackupCmdTag
+{
+ BASE_BACKUP,
+ START_BACKUP,
+ SEND_BACKUP_FILELIST,
+ SEND_BACKUP_FILES,
+ STOP_BACKUP
+} BackupCmdTag;
/* ----------------------
* IDENTIFY_SYSTEM command
@@ -42,6 +50,8 @@ typedef struct BaseBackupCmd
{
NodeTag type;
List *options;
+ BackupCmdTag cmdtag;
+ List *backupfiles;
} BaseBackupCmd;
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index b55917b9b6..5202e4160b 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -31,6 +31,6 @@ typedef struct
extern void SendBaseBackup(BaseBackupCmd *cmd);
-extern int64 sendTablespace(char *path, bool dryrun);
+extern int64 sendTablespace(char *path, bool dryrun, List **filelist);
#endif /* _BASEBACKUP_H */
--
2.21.0 (Apple Git-122.2)
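To make the new grammar concrete, here is a minimal libpq sketch of the
command sequence a client would issue. This is an illustration only, not
part of the attached patches: the connection string, file path, and WAL
location are placeholder values, and in the real client STOP_BACKUP's
LABEL carries the backup_label contents rather than a label name.

    #include <stdio.h>
    #include "libpq-fe.h"

    int
    main(void)
    {
        PGconn     *conn;
        PGresult   *res;
        char       *buf;
        int         len;

        /* placeholder conninfo; the commands need a replication connection */
        conn = PQconnectdb("host=localhost replication=true dbname=postgres");
        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* put the server into backup mode */
        res = PQexec(conn, "START_BACKUP LABEL 'example' FAST");
        PQclear(res);

        /* fetch the per-tablespace file list, to be divided among workers */
        res = PQexec(conn, "SEND_BACKUP_FILELIST");
        PQclear(res);

        /* a worker requests files (on its own connection in the real client) */
        PQsendQuery(conn, "SEND_BACKUP_FILES ( 'base/1/12345' ) "
                    "START_WAL_LOCATION '0/2000028'");
        res = PQgetResult(conn);    /* expect PGRES_COPY_OUT */
        PQclear(res);
        while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
            PQfreemem(buf);         /* the tar stream would be unpacked here */
        while ((res = PQgetResult(conn)) != NULL)
            PQclear(res);

        /* take the server out of backup mode */
        res = PQexec(conn, "STOP_BACKUP LABEL 'example' NOWAIT");
        PQclear(res);

        PQfinish(conn);
        return 0;
    }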
Attachment: 0005-pg_basebackup-changes-for-parallel-backup.patch
From 6e55a5e2ee030b204c80ca575e0647b44c698310 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Mon, 14 Oct 2019 17:28:58 +0500
Subject: [PATCH 5/7] pg_basebackup changes for parallel backup.
---
src/bin/pg_basebackup/pg_basebackup.c | 737 ++++++++++++++++++++++++--
1 file changed, 690 insertions(+), 47 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index a9d162a7da..41dba42f06 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -19,6 +19,7 @@
#include <sys/wait.h>
#include <signal.h>
#include <time.h>
+#include <pthread.h>
#ifdef HAVE_SYS_SELECT_H
#include <sys/select.h>
#endif
@@ -41,6 +42,7 @@
#include "receivelog.h"
#include "replication/basebackup.h"
#include "streamutil.h"
+#include "fe_utils/simple_list.h"
#define ERRCODE_DATA_CORRUPTED "XX001"
@@ -57,6 +59,57 @@ typedef struct TablespaceList
TablespaceListCell *tail;
} TablespaceList;
+typedef struct
+{
+ char path[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+
+ int tsIndex; /* index of tsInfo this file belongs to. */
+} BackupFile;
+
+typedef struct
+{
+ Oid tblspcOid;
+ char *tablespace; /* tablespace name or NULL if 'base' tablespace */
+ int numFiles; /* number of files */
+ BackupFile *backupFiles; /* list of files in a tablespace */
+} TablespaceInfo;
+
+typedef struct
+{
+ int tablespacecount;
+ int totalfiles;
+ int numWorkers;
+
+ char xlogstart[64];
+ char *backup_label;
+ char *tablespace_map;
+
+ TablespaceInfo *tsInfo;
+ BackupFile **files; /* list of BackupFile pointers */
+ int fileIndex; /* index of file to be fetched */
+
+ PGconn **workerConns;
+} BackupInfo;
+
+typedef struct
+{
+ BackupInfo *backupInfo;
+ uint64 bytesRead;
+
+ int workerid;
+ pthread_t worker;
+
+ bool terminated;
+} WorkerState;
+
+BackupInfo *backupInfo = NULL;
+WorkerState *workers = NULL;
+
+static pthread_mutex_t fetch_mutex = PTHREAD_MUTEX_INITIALIZER;
+
/*
* pg_xlog has been renamed to pg_wal in version 10. This version number
* should be compared with PQserverVersion().
@@ -110,6 +163,9 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
+static int numWorkers = 1;
+static PGresult *tablespacehdr;
+
/* Progress counters */
static uint64 totalsize_kb;
static uint64 totaldone;
@@ -140,9 +196,10 @@ static PQExpBuffer recoveryconfcontents = NULL;
static void usage(void);
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
static void progress_report(int tablespacenum, const char *filename, bool force);
+static void workers_progress_report(uint64 totalBytesRead, const char *filename, bool force);
static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
-static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
+static int ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
static void BaseBackup(void);
static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
@@ -151,6 +208,17 @@ static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
static const char *get_tablespace_mapping(const char *dir);
static void tablespace_list_append(const char *arg);
+static void ParallelBackupRun(BackupInfo *backupInfo);
+static void StopBackup(BackupInfo *backupInfo);
+static void GetBackupFileList(PGconn *conn, BackupInfo *backupInfo);
+static int GetBackupFile(WorkerState *wstate);
+static BackupFile *getNextFile(BackupInfo *backupInfo);
+static int compareFileSize(const void *a, const void *b);
+static void read_label_tblspcmap(PGconn *conn, char **backup_label, char **tablespace_map);
+static void create_backup_dirs(bool basetablespace, char *tablespace, char *name);
+static void writefile(char *path, char *buf);
+static void *workerRun(void *arg);
+
static void
cleanup_directories_atexit(void)
@@ -202,6 +270,17 @@ cleanup_directories_atexit(void)
static void
disconnect_atexit(void)
{
+ /* close worker connections */
+ if (backupInfo && backupInfo->workerConns != NULL)
+ {
+ int i;
+ for (i = 0; i < numWorkers; i++)
+ {
+ if (backupInfo->workerConns[i] != NULL)
+ PQfinish(backupInfo->workerConns[i]);
+ }
+ }
+
if (conn != NULL)
PQfinish(conn);
}
@@ -349,6 +428,7 @@ usage(void)
printf(_(" --no-slot prevent creation of temporary replication slot\n"));
printf(_(" --no-verify-checksums\n"
" do not verify checksums\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to backup\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nConnection options:\n"));
printf(_(" -d, --dbname=CONNSTR connection string\n"));
@@ -695,6 +775,93 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
}
}
+/*
+ * Print a progress report of worker threads. If verbose output
+ * is enabled, also print the current file name.
+ *
+ * Progress report is written at maximum once per second, unless the
+ * force parameter is set to true.
+ */
+static void
+workers_progress_report(uint64 totalBytesRead, const char *filename, bool force)
+{
+ int percent;
+ char totalBytesRead_str[32];
+ char totalsize_str[32];
+ pg_time_t now;
+
+ if (!showprogress)
+ return;
+
+ now = time(NULL);
+ if (now == last_progress_report && !force)
+ return; /* Max once per second */
+
+ last_progress_report = now;
+ percent = totalsize_kb ? (int) ((totalBytesRead / 1024) * 100 / totalsize_kb) : 0;
+
+ /*
+ * Avoid overflowing past 100% or the full size. This may make the total
+ * size number change as we approach the end of the backup (the estimate
+ * will always be wrong if WAL is included), but that's better than having
+ * the done column be bigger than the total.
+ */
+ if (percent > 100)
+ percent = 100;
+ if (totalBytesRead / 1024 > totalsize_kb)
+ totalsize_kb = totalBytesRead / 1024;
+
+ /*
+ * Separate step to keep platform-dependent format code out of
+ * translatable strings. And we only test for INT64_FORMAT availability
+ * in snprintf, not fprintf.
+ */
+ snprintf(totalBytesRead_str, sizeof(totalBytesRead_str), INT64_FORMAT,
+ totalBytesRead / 1024);
+ snprintf(totalsize_str, sizeof(totalsize_str), INT64_FORMAT, totalsize_kb);
+
+#define VERBOSE_FILENAME_LENGTH 35
+
+ if (verbose)
+ {
+ if (!filename)
+
+ /*
+ * No filename given, so clear the status line (used for last
+ * call)
+ */
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied %*s"),
+ (int) strlen(totalsize_str),
+ totalBytesRead_str, totalsize_str,
+ percent,
+ VERBOSE_FILENAME_LENGTH + 5, "");
+ else
+ {
+ bool truncate = (strlen(filename) > VERBOSE_FILENAME_LENGTH);
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied, current file (%s%-*.*s)"),
+ (int) strlen(totalsize_str), totalBytesRead_str, totalsize_str,
+ percent,
+ /* Prefix with "..." if we do leading truncation */
+ truncate ? "..." : "",
+ truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
+ truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
+ /* Truncate filename at beginning if it's too long */
+ truncate ? filename + strlen(filename) - VERBOSE_FILENAME_LENGTH + 3 : filename);
+ }
+ }
+ else
+ {
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied"),
+ (int) strlen(totalsize_str),
+ totalBytesRead_str, totalsize_str,
+ percent);
+ }
+
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\r");
+ else
+ fprintf(stderr, "\n");
+}
/*
* Print a progress report based on the global variables. If verbose output
@@ -711,7 +878,7 @@ progress_report(int tablespacenum, const char *filename, bool force)
char totalsize_str[32];
pg_time_t now;
- if (!showprogress)
+ if (!showprogress || numWorkers > 1)
return;
now = time(NULL);
@@ -1381,7 +1548,7 @@ get_tablespace_mapping(const char *dir)
* specified directory. If it's for another tablespace, it will be restored
* in the original or mapped directory.
*/
-static void
+static int
ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
{
char current_path[MAXPGPATH];
@@ -1392,6 +1559,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
bool basetablespace;
char *copybuf = NULL;
FILE *file = NULL;
+ int readBytes = 0;
basetablespace = PQgetisnull(res, rownum, 0);
if (basetablespace)
@@ -1455,7 +1623,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
pg_log_error("invalid tar block header size: %d", r);
exit(1);
}
- totaldone += 512;
+ readBytes += 512;
current_len_left = read_tar_number(©buf[124], 12);
@@ -1486,21 +1654,14 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
* Directory
*/
filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */
+
+ /*
+ * In parallel mode, we create directories before fetching
+ * files, so it's OK if a directory already exists.
+ */
if (mkdir(filename, pg_dir_create_mode) != 0)
{
- /*
- * When streaming WAL, pg_wal (or pg_xlog for pre-9.6
- * clusters) will have been created by the wal
- * receiver process. Also, when the WAL directory
- * location was specified, pg_wal (or pg_xlog) has
- * already been created as a symbolic link before
- * starting the actual backup. So just ignore creation
- * failures on related directories.
- */
- if (!((pg_str_endswith(filename, "/pg_wal") ||
- pg_str_endswith(filename, "/pg_xlog") ||
- pg_str_endswith(filename, "/archive_status")) &&
- errno == EEXIST))
+ if (errno != EEXIST)
{
pg_log_error("could not create directory \"%s\": %m",
filename);
@@ -1585,7 +1746,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
*/
fclose(file);
file = NULL;
- totaldone += r;
+ readBytes += r;
continue;
}
@@ -1594,7 +1755,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
pg_log_error("could not write to file \"%s\": %m", filename);
exit(1);
}
- totaldone += r;
+ readBytes += r;
+ totaldone = readBytes;
progress_report(rownum, filename, false);
current_len_left -= r;
@@ -1622,13 +1784,11 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
if (copybuf != NULL)
PQfreemem(copybuf);
- if (basetablespace && writerecoveryconf)
- WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
-
/*
* No data is synced here, everything is done for all tablespaces at the
* end.
*/
+ return readBytes;
}
@@ -1715,16 +1875,29 @@ BaseBackup(void)
fprintf(stderr, "\n");
}
- basebkp =
- psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
- escaped_label,
- showprogress ? "PROGRESS" : "",
- includewal == FETCH_WAL ? "WAL" : "",
- fastcheckpoint ? "FAST" : "",
- includewal == NO_WAL ? "" : "NOWAIT",
- maxrate_clause ? maxrate_clause : "",
- format == 't' ? "TABLESPACE_MAP" : "",
- verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ if (numWorkers <= 1)
+ {
+ basebkp =
+ psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
+ escaped_label,
+ showprogress ? "PROGRESS" : "",
+ includewal == FETCH_WAL ? "WAL" : "",
+ fastcheckpoint ? "FAST" : "",
+ includewal == NO_WAL ? "" : "NOWAIT",
+ maxrate_clause ? maxrate_clause : "",
+ format == 't' ? "TABLESPACE_MAP" : "",
+ verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ }
+ else
+ {
+ basebkp =
+ psprintf("START_BACKUP LABEL '%s' %s %s %s %s",
+ escaped_label,
+ showprogress ? "PROGRESS" : "",
+ fastcheckpoint ? "FAST" : "",
+ maxrate_clause ? maxrate_clause : "",
+ format == 't' ? "TABLESPACE_MAP" : "");
+ }
if (PQsendQuery(conn, basebkp) == 0)
{
@@ -1774,7 +1947,7 @@ BaseBackup(void)
/*
* Get the header
*/
- res = PQgetResult(conn);
+ tablespacehdr = res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
pg_log_error("could not get backup header: %s",
@@ -1830,24 +2003,74 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /*
- * Start receiving chunks
- */
- for (i = 0; i < PQntuples(res); i++)
+ if (numWorkers > 1)
{
- if (format == 't')
- ReceiveTarFile(conn, res, i);
- else
- ReceiveAndUnpackTarFile(conn, res, i);
- } /* Loop over all tablespaces */
+ int j = 0,
+ k = 0;
- if (showprogress)
+ backupInfo = palloc0(sizeof(BackupInfo));
+ backupInfo->workerConns = (PGconn **) palloc0(sizeof(PGconn *) * numWorkers);
+ backupInfo->tablespacecount = tablespacecount;
+ backupInfo->numWorkers = numWorkers;
+ strlcpy(backupInfo->xlogstart, xlogstart, sizeof(backupInfo->xlogstart));
+
+ read_label_tblspcmap(conn, &backupInfo->backup_label, &backupInfo->tablespace_map);
+
+ /* Retrieve the backup file list from the server. */
+ GetBackupFileList(conn, backupInfo);
+
+ /*
+ * Add backup_label to the backup (for tar format, ReceiveTarFile()
+ * will take care of it).
+ */
+ if (format == 'p')
+ writefile("backup_label", backupInfo->backup_label);
+
+ /*
+ * Flatten the file list to avoid unnecessary locking and to enable
+ * sequential access (creating an array of BackupFile structure pointers).
+ */
+ backupInfo->files =
+ (BackupFile **) palloc0(sizeof(BackupFile *) * backupInfo->totalfiles);
+ for (i = 0; i < backupInfo->tablespacecount; i++)
+ {
+ TablespaceInfo *curTsInfo = &backupInfo->tsInfo[i];
+
+ for (j = 0; j < curTsInfo->numFiles; j++)
+ {
+ backupInfo->files[k] = &curTsInfo->backupFiles[j];
+ k++;
+ }
+ }
+
+ ParallelBackupRun(backupInfo);
+ StopBackup(backupInfo);
+ }
+ else
{
- progress_report(PQntuples(res), NULL, true);
- if (isatty(fileno(stderr)))
- fprintf(stderr, "\n"); /* Need to move to next line */
+ /*
+ * Start receiving chunks
+ */
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ if (format == 't')
+ ReceiveTarFile(conn, res, i);
+ else
+ ReceiveAndUnpackTarFile(conn, res, i);
+ } /* Loop over all tablespaces */
+
+ if (showprogress)
+ {
+ progress_report(PQntuples(tablespacehdr), NULL, true);
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\n"); /* Need to move to next line */
+ }
}
+ /* Write recovery contents */
+ if (format == 'p' && writerecoveryconf)
+ WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
+
PQclear(res);
/*
@@ -2043,6 +2266,7 @@ main(int argc, char **argv)
{"waldir", required_argument, NULL, 1},
{"no-slot", no_argument, NULL, 2},
{"no-verify-checksums", no_argument, NULL, 3},
+ {"jobs", required_argument, NULL, 'j'},
{NULL, 0, NULL, 0}
};
int c;
@@ -2070,7 +2294,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvPj:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2211,6 +2435,9 @@ main(int argc, char **argv)
case 3:
verify_checksums = false;
break;
+ case 'j': /* number of jobs */
+ numWorkers = atoi(optarg);
+ break;
default:
/*
@@ -2325,6 +2552,22 @@ main(int argc, char **argv)
}
}
+ if (numWorkers <= 0)
+ {
+ pg_log_error("invalid number of parallel jobs");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (format != 'p' && numWorkers > 1)
+ {
+ pg_log_error("parallel jobs are only supported with 'plain' format");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
#ifndef HAVE_LIBZ
if (compresslevel != 0)
{
@@ -2397,3 +2640,403 @@ main(int argc, char **argv)
success = true;
return 0;
}
+
+/*
+ * Thread worker
+ */
+static void *
+workerRun(void *arg)
+{
+ WorkerState *wstate = (WorkerState *) arg;
+
+ GetBackupFile(wstate);
+
+ wstate->terminated = true;
+ return NULL;
+}
+
+/*
+ * Runs the worker threads and updates progress until all workers have
+ * terminated/completed.
+ */
+static void
+ParallelBackupRun(BackupInfo *backupInfo)
+{
+ int status,
+ i;
+ bool threadsActive = true;
+ uint64 totalBytes = 0;
+
+ workers = (WorkerState *) palloc0(sizeof(WorkerState) * numWorkers);
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ WorkerState *worker = &workers[i];
+
+ worker->backupInfo = backupInfo;
+ worker->workerid = i;
+ worker->bytesRead = 0;
+ worker->terminated = false;
+
+ backupInfo->workerConns[i] = GetConnection();
+ status = pthread_create(&worker->worker, NULL, workerRun, worker);
+ if (status != 0)
+ {
+ pg_log_error("failed to create thread: %m");
+ exit(1);
+ }
+
+ if (verbose)
+ pg_log_info("backup worker (%d) created, %d", i, status);
+ }
+
+ /*
+ * This is the main thread for updating progress. It waits for workers to
+ * complete and gets updated status during every loop iteration.
+ */
+ while (threadsActive)
+ {
+ char *filename = NULL;
+
+ threadsActive = false;
+ totalBytes = 0;
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ WorkerState *worker = &workers[i];
+
+ totalBytes += worker->bytesRead;
+ threadsActive |= !worker->terminated;
+ }
+
+ if (backupInfo->fileIndex < backupInfo->totalfiles)
+ filename = backupInfo->files[backupInfo->fileIndex]->path;
+
+ workers_progress_report(totalBytes, filename, false);
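+ /* sleep for 100 ms between progress polls */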
+ pg_usleep(100000);
+ }
+
+ if (showprogress)
+ {
+ workers_progress_report(totalBytes, NULL, true);
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\n"); /* Need to move to next line */
+ }
+}
+
+/*
+ * Take the system out of backup mode.
+ */
+static void
+StopBackup(BackupInfo *backupInfo)
+{
+ PGresult *res = NULL;
+ char *basebkp;
+
+ basebkp = psprintf("STOP_BACKUP LABEL '%s' %s %s",
+ backupInfo->backup_label,
+ includewal == FETCH_WAL ? "WAL" : "",
+ includewal == NO_WAL ? "" : "NOWAIT");
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not execute STOP BACKUP \"%s\"",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /* receive pg_control and wal files */
+ ReceiveAndUnpackTarFile(conn, res, tablespacecount);
+ PQclear(res);
+}
+
+/*
+ * Retrieve the backup file list from the server and populate the
+ * TablespaceInfo structs to keep track of tablespaces and their files.
+ */
+static void
+GetBackupFileList(PGconn *conn, BackupInfo *backupInfo)
+{
+ TablespaceInfo *tsInfo;
+ PGresult *res = NULL;
+ char *basebkp;
+ int i;
+
+ backupInfo->tsInfo = palloc0(sizeof(TablespaceInfo) * backupInfo->tablespacecount);
+ tsInfo = backupInfo->tsInfo;
+
+ /*
+ * Get list of files.
+ */
+ basebkp = psprintf("SEND_BACKUP_FILELIST");
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not send replication command \"%s\": %s",
+ "SEND_BACKUP_FILELIST", PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /*
+ * The list of files is grouped by tablespaces, and we want to fetch them
+ * in the same order.
+ */
+ for (i = 0; i < backupInfo->tablespacecount; i++)
+ {
+ bool basetablespace;
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get backup header: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
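+ /* a NULL OID in the first column of the header marks the base tablespace */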
+ basetablespace = PQgetisnull(tablespacehdr, i, 0);
+ tsInfo[i].tblspcOid = atol(PQgetvalue(tablespacehdr, i, 0));
+ tsInfo[i].tablespace = PQgetvalue(tablespacehdr, i, 1);
+ tsInfo[i].numFiles = PQntuples(res);
+ tsInfo[i].backupFiles = palloc0(sizeof(BackupFile) * tsInfo[i].numFiles);
+
+ /* keep count of all files in backup */
+ backupInfo->totalfiles += tsInfo[i].numFiles;
+
+ for (int j = 0; j < tsInfo[i].numFiles; j++)
+ {
+ char *path = PQgetvalue(res, j, 0);
+ char type = PQgetvalue(res, j, 1)[0];
+ int32 size = atol(PQgetvalue(res, j, 2));
+ time_t mtime = atol(PQgetvalue(res, j, 3));
+
+ /*
+ * In 'plain' format, create backup directories first.
+ */
+ if (format == 'p' && type == 'd')
+ create_backup_dirs(basetablespace, tsInfo[i].tablespace, path);
+
+ strlcpy(tsInfo[i].backupFiles[j].path, path, MAXPGPATH);
+ tsInfo[i].backupFiles[j].type = type;
+ tsInfo[i].backupFiles[j].size = size;
+ tsInfo[i].backupFiles[j].mtime = mtime;
+ tsInfo[i].backupFiles[j].tsIndex = i;
+ }
+
+ /* sort files in descending order, based on size */
+ qsort(tsInfo[i].backupFiles, tsInfo[i].numFiles,
+ sizeof(BackupFile), &compareFileSize);
+ PQclear(res);
+ }
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data: %s", PQerrorMessage(conn));
+ exit(1);
+ }
+ res = PQgetResult(conn);
+}
+
+/*
+ * Retrieve backup files from the server and write them out. The file list
+ * is provided by the worker state; each iteration pulls a single file from
+ * this list and writes it to the backup directory.
+ */
+static int
+GetBackupFile(WorkerState *wstate)
+{
+ PGresult *res = NULL;
+ PGconn *worker_conn = NULL;
+ BackupFile *fetchFile = NULL;
+ BackupInfo *backupInfo = NULL;
+
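+ /* each worker thread streams its files over its own dedicated connection */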
+ backupInfo = wstate->backupInfo;
+ worker_conn = backupInfo->workerConns[wstate->workerid];
+ while ((fetchFile = getNextFile(backupInfo)) != NULL)
+ {
+ PQExpBuffer buf = createPQExpBuffer();
+
+ /*
+ * Build the query in the form:
+ * SEND_BACKUP_FILES ('base/1/1245/32683', ...) [options]
+ */
+ appendPQExpBuffer(buf, "SEND_BACKUP_FILES ( '%s' )", fetchFile->path);
+
+ /* add options */
+ appendPQExpBuffer(buf, " START_WAL_LOCATION '%s' %s %s",
+ backupInfo->xlogstart,
+ format == 't' ? "TABLESPACE_MAP" : "",
+ verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ if (maxrate > 0)
+ appendPQExpBuffer(buf, " MAX_RATE %u", maxrate);
+
+ if (!worker_conn)
+ return 1;
+
+ if (PQsendQuery(worker_conn, buf->data) == 0)
+ {
+ pg_log_error("could not send files list \"%s\"",
+ PQerrorMessage(worker_conn));
+ return 1;
+ }
+
+ destroyPQExpBuffer(buf);
+
+ /* process file contents, also count bytesRead for progress */
+ wstate->bytesRead +=
+ ReceiveAndUnpackTarFile(worker_conn, tablespacehdr, fetchFile->tsIndex);
+
+ res = PQgetResult(worker_conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data stream: %s",
+ PQerrorMessage(worker_conn));
+ exit(1);
+ }
+
+ res = PQgetResult(worker_conn);
+ }
+
+ PQclear(res);
+ return 0;
+}
+
+/*
+ * Fetch and increment fileIndex under a mutex, storing the old value in
+ * a local variable, so that two workers can never claim the same file or
+ * accidentally skip one.
+ */
+static BackupFile*
+getNextFile(BackupInfo *backupInfo)
+{
+ int fileIndex = 0;
+
+ pthread_mutex_lock(&fetch_mutex);
+ fileIndex = backupInfo->fileIndex++;
+ pthread_mutex_unlock(&fetch_mutex);
+
+ if (fileIndex >= backupInfo->totalfiles)
+ return NULL;
+
+ return backupInfo->files[fileIndex];
+}
+
+/* qsort comparator for BackupFile (sort descending order) */
+static int
+compareFileSize(const void *a, const void *b)
+{
+ const BackupFile *v1 = (BackupFile *) a;
+ const BackupFile *v2 = (BackupFile *) b;
+
+ if (v1->size > v2->size)
+ return -1;
+ if (v1->size < v2->size)
+ return 1;
+
+ return 0;
+}
+
+static void
+read_label_tblspcmap(PGconn *conn, char **backuplabel, char **tblspc_map)
+{
+ PGresult *res = NULL;
+
+ Assert(backuplabel != NULL);
+ Assert(tblspc_map != NULL);
+
+ /*
+ * Get the backup_label and tablespace_map contents.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get data: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
+ *backuplabel = PQgetvalue(res, 0, 0); /* backup_label */
+ if (!PQgetisnull(res, 0, 1))
+ *tblspc_map = PQgetvalue(res, 0, 1); /* tablespace_map */
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ res = PQgetResult(conn);
+ PQclear(res);
+}
+
+/*
+ * Create backup directories while taking care of the tablespace path. If a
+ * tablespace mapping (with -T) is given, the directory will be created at
+ * the mapped path.
+ */
+static void
+create_backup_dirs(bool basetablespace, char *tablespace, char *name)
+{
+ char dirpath[MAXPGPATH];
+
+ Assert(name != NULL);
+
+ if (basetablespace)
+ snprintf(dirpath, sizeof(dirpath), "%s/%s", basedir, name);
+ else
+ {
+ Assert(tablespace != NULL);
+ snprintf(dirpath, sizeof(dirpath), "%s/%s",
+ get_tablespace_mapping(tablespace), (name + strlen(tablespace) + 1));
+ }
+
+ if (pg_mkdir_p(dirpath, pg_dir_create_mode) != 0)
+ {
+ if (errno != EEXIST)
+ {
+ pg_log_error("could not create directory \"%s\": %m",
+ dirpath);
+ exit(1);
+ }
+ }
+}
+
+/*
+ * General function for writing to a file; creates one if it doesn't exist
+ */
+static void
+writefile(char *path, char *buf)
+{
+ FILE *f;
+ char pathbuf[MAXPGPATH];
+
+ snprintf(pathbuf, MAXPGPATH, "%s/%s", basedir, path);
+ f = fopen(pathbuf, "w");
+ if (f == NULL)
+ {
+ pg_log_error("could not open file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fwrite(buf, strlen(buf), 1, f) != 1)
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fclose(f))
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+}
--
2.21.0 (Apple Git-122.2)
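The core of the worker scheduling above is getNextFile(): a shared cursor
over the size-sorted file array, advanced under a mutex so that no file is
fetched twice or skipped. A self-contained sketch of that pattern (plain C
with pthreads; the counts and the printf payload are made up for
illustration):

    #include <pthread.h>
    #include <stdio.h>

    #define NFILES   10
    #define NWORKERS 3

    static pthread_mutex_t fetch_mutex = PTHREAD_MUTEX_INITIALIZER;
    static int  file_index = 0;     /* shared cursor over the file array */

    /* fetch-and-increment under the mutex, as getNextFile() does */
    static int
    next_file(void)
    {
        int         idx;

        pthread_mutex_lock(&fetch_mutex);
        idx = file_index++;
        pthread_mutex_unlock(&fetch_mutex);

        return (idx < NFILES) ? idx : -1;
    }

    static void *
    worker(void *arg)
    {
        int         id = *(int *) arg;
        int         idx;

        /* each worker pulls the next unclaimed file until none remain */
        while ((idx = next_file()) >= 0)
            printf("worker %d fetches file %d\n", id, idx);

        return NULL;
    }

    int
    main(void)
    {
        pthread_t   threads[NWORKERS];
        int         ids[NWORKERS];

        for (int i = 0; i < NWORKERS; i++)
        {
            ids[i] = i;
            pthread_create(&threads[i], NULL, worker, &ids[i]);
        }
        for (int i = 0; i < NWORKERS; i++)
            pthread_join(threads[i], NULL);

        return 0;
    }

Sorting each tablespace's files by decreasing size before this loop means
the largest files are claimed first, which tends to even out worker finish
times near the end of the backup.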
Attachment: 0001-remove-PG_ENSURE_ERROR_CLEANUP-macro-from-basebackup.patch
From 15979957334de59b21082ce1029cad240e937d99 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 30 Oct 2019 10:21:38 +0500
Subject: [PATCH 1/7] remove PG_ENSURE_ERROR_CLEANUP macro from basebackup.
Register base_backup_cleanup with the before_shmem_exit handler. This makes
sure the call is always made when the WAL sender exits.
---
src/backend/replication/basebackup.c | 182 +++++++++++++--------------
1 file changed, 90 insertions(+), 92 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 1fa4551eff..71a8b4fb4c 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -243,6 +243,8 @@ perform_base_backup(basebackup_options *opt)
StringInfo tblspc_map_file = NULL;
int datadirpathlen;
List *tablespaces = NIL;
+ ListCell *lc;
+ tablespaceinfo *ti;
datadirpathlen = strlen(DataDir);
@@ -261,121 +263,117 @@ perform_base_backup(basebackup_options *opt)
/*
* Once do_pg_start_backup has been called, ensure that any failure causes
* us to abort the backup so we don't "leak" a backup counter. For this
- * reason, *all* functionality between do_pg_start_backup() and the end of
- * do_pg_stop_backup() should be inside the error cleanup block!
+ * reason, register base_backup_cleanup with the before_shmem_exit handler.
+ * This makes sure the call is always made when the process exits. On
+ * success, do_pg_stop_backup will have taken the system out of backup mode
+ * and this callback will have no effect; otherwise, the required cleanup
+ * will be done in any case.
*/
+ before_shmem_exit(base_backup_cleanup, (Datum) 0);
- PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
- {
- ListCell *lc;
- tablespaceinfo *ti;
-
- SendXlogRecPtrResult(startptr, starttli);
+ SendXlogRecPtrResult(startptr, starttli);
- /*
- * Calculate the relative path of temporary statistics directory in
- * order to skip the files which are located in that directory later.
- */
- if (is_absolute_path(pgstat_stat_directory) &&
- strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
- statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
- else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
- statrelpath = psprintf("./%s", pgstat_stat_directory);
- else
- statrelpath = pgstat_stat_directory;
-
- /* Add a node for the base directory at the end */
- ti = palloc0(sizeof(tablespaceinfo));
- ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
- tablespaces = lappend(tablespaces, ti);
+ /*
+ * Calculate the relative path of temporary statistics directory in
+ * order to skip the files which are located in that directory later.
+ */
+ if (is_absolute_path(pgstat_stat_directory) &&
+ strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
+ else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory);
+ else
+ statrelpath = pgstat_stat_directory;
- /* Send tablespace header */
- SendBackupHeader(tablespaces);
+ /* Add a node for the base directory at the end */
+ ti = palloc0(sizeof(tablespaceinfo));
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
+ tablespaces = lappend(tablespaces, ti);
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+ /* Send tablespace header */
+ SendBackupHeader(tablespaces);
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
+ /* Setup and activate network throttling, if client requested it */
+ if (opt->maxrate > 0)
+ {
+ throttling_sample =
+ (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
- /* Send off our tablespaces one by one */
- foreach(lc, tablespaces)
- {
- tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
- StringInfoData buf;
+ /* Enable throttling. */
+ throttling_counter = 0;
- /* Send CopyOutResponse message */
- pq_beginmessage(&buf, 'H');
- pq_sendbyte(&buf, 0); /* overall format */
- pq_sendint16(&buf, 0); /* natts */
- pq_endmessage(&buf);
+ /* The 'real data' starts now (header was ignored). */
+ throttled_last = GetCurrentTimestamp();
+ }
+ else
+ {
+ /* Disable throttling. */
+ throttling_counter = -1;
+ }
- if (ti->path == NULL)
- {
- struct stat statbuf;
+ /* Send off our tablespaces one by one */
+ foreach(lc, tablespaces)
+ {
+ tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
+ StringInfoData buf;
- /* In the main tar, include the backup_label first... */
- sendFileWithContent(BACKUP_LABEL_FILE, labelfile->data);
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
- /*
- * Send tablespace_map file if required and then the bulk of
- * the files.
- */
- if (tblspc_map_file && opt->sendtblspcmapfile)
- {
- sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data);
- sendDir(".", 1, false, tablespaces, false);
- }
- else
- sendDir(".", 1, false, tablespaces, true);
+ if (ti->path == NULL)
+ {
+ struct stat statbuf;
- /* ... and pg_control after everything else. */
- if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- XLOG_CONTROL_FILE)));
- sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
- }
- else
- sendTablespace(ti->path, false);
+ /* In the main tar, include the backup_label first... */
+ sendFileWithContent(BACKUP_LABEL_FILE, labelfile->data);
/*
- * If we're including WAL, and this is the main data directory we
- * don't terminate the tar stream here. Instead, we will append
- * the xlog files below and terminate it then. This is safe since
- * the main data directory is always sent *last*.
+ * Send tablespace_map file if required and then the bulk of
+ * the files.
*/
- if (opt->includewal && ti->path == NULL)
+ if (tblspc_map_file && opt->sendtblspcmapfile)
{
- Assert(lnext(tablespaces, lc) == NULL);
+ sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data);
+ sendDir(".", 1, false, tablespaces, false);
}
else
- pq_putemptymessage('c'); /* CopyDone */
+ sendDir(".", 1, false, tablespaces, true);
+
+ /* ... and pg_control after everything else. */
+ if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ XLOG_CONTROL_FILE)));
+ sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
}
+ else
+ sendTablespace(ti->path, false);
- endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
+ /*
+ * If we're including WAL, and this is the main data directory we
+ * don't terminate the tar stream here. Instead, we will append
+ * the xlog files below and terminate it then. This is safe since
+ * the main data directory is always sent *last*.
+ */
+ if (opt->includewal && ti->path == NULL)
+ {
+ Assert(lnext(tablespaces, lc) == NULL);
+ }
+ else
+ pq_putemptymessage('c'); /* CopyDone */
}
- PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+ endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
if (opt->includewal)
{
--
2.21.0 (Apple Git-122.2)
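The net effect of this patch is that the cleanup callback stays registered
for the life of the process and must therefore be a no-op once
do_pg_stop_backup has run; that is what lets multiple commands run between
backup start and stop in the parallel case. A plain-C analogue of the
always-registered, idempotent cleanup pattern (using atexit instead of
before_shmem_exit; the flag and messages are invented for illustration):

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdbool.h>

    static bool backup_in_progress = false;

    /*
     * Idempotent cleanup: once the backup has been stopped normally this
     * is a no-op, so it is safe to register it unconditionally.
     */
    static void
    backup_cleanup(void)
    {
        if (backup_in_progress)
        {
            backup_in_progress = false;
            fprintf(stderr, "aborting backup on abnormal exit\n");
        }
    }

    int
    main(void)
    {
        atexit(backup_cleanup);     /* analogue of before_shmem_exit() */

        backup_in_progress = true;  /* do_pg_start_backup() */
        /* ... send files; an early exit() here would trigger cleanup ... */
        backup_in_progress = false; /* do_pg_stop_backup() */

        return 0;
    }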
Attachment: 0002-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb.patch
From f24c0bfcae411b38d689523d0329d830d9aa191f Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 30 Oct 2019 16:45:28 +0500
Subject: [PATCH 2/7] Rename sizeonly to dryrun for a few functions in
basebackup.
---
src/backend/replication/basebackup.c | 44 ++++++++++++++--------------
src/include/replication/basebackup.h | 2 +-
2 files changed, 23 insertions(+), 23 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 71a8b4fb4c..267163ed29 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -54,15 +54,15 @@ typedef struct
} basebackup_options;
-static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
+static int64 sendDir(const char *path, int basepathlen, bool dryrun,
List *tablespaces, bool sendtblspclinks);
static bool sendFile(const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
static int64 _tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly);
+ struct stat *statbuf, bool dryrun);
static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly);
+ bool dryrun);
static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void base_backup_cleanup(int code, Datum arg);
@@ -959,13 +959,13 @@ sendFileWithContent(const char *filename, const char *content)
/*
* Include the tablespace directory pointed to by 'path' in the output tar
- * stream. If 'sizeonly' is true, we just calculate a total length and return
+ * stream. If 'dryrun' is true, we just calculate a total length and return
* it, without actually sending anything.
*
* Only used to send auxiliary tablespaces, not PGDATA.
*/
int64
-sendTablespace(char *path, bool sizeonly)
+sendTablespace(char *path, bool dryrun)
{
int64 size;
char pathbuf[MAXPGPATH];
@@ -995,17 +995,17 @@ sendTablespace(char *path, bool sizeonly)
}
size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
- sizeonly);
+ dryrun);
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true);
+ size += sendDir(pathbuf, strlen(path), dryrun, NIL, true);
return size;
}
/*
* Include all files from the given directory in the output tar stream. If
- * 'sizeonly' is true, we just calculate a total length and return it, without
+ * 'dryrun' is true, we just calculate a total length and return it, without
* actually sending anything.
*
* Omit any directory in the tablespaces list, to avoid backing up
@@ -1016,7 +1016,7 @@ sendTablespace(char *path, bool sizeonly)
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
+sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
bool sendtblspclinks)
{
DIR *dir;
@@ -1171,7 +1171,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
excludeFound = true;
break;
}
@@ -1187,7 +1187,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
continue;
}
@@ -1199,14 +1199,14 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(pathbuf, "./pg_wal") == 0)
{
/* If pg_wal is a symlink, write it as a directory anyway */
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
/*
* Also send archive_status directory (by hackishly reusing
* statbuf from above ...).
*/
size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
- sizeonly);
+ dryrun);
continue; /* don't recurse into pg_wal */
}
@@ -1238,7 +1238,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
linkpath[rllen] = '\0';
size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
- &statbuf, sizeonly);
+ &statbuf, dryrun);
#else
/*
@@ -1262,7 +1262,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
* permissions right.
*/
size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ dryrun);
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1293,17 +1293,17 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces, sendtblspclinks);
+ size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks);
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
- if (!sizeonly)
+ if (!dryrun)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid);
- if (sent || sizeonly)
+ if (sent || dryrun)
{
/* Add size, rounded up to 512byte block */
size += ((statbuf.st_size + 511) & ~511);
@@ -1612,12 +1612,12 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
static int64
_tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly)
+ struct stat *statbuf, bool dryrun)
{
char h[512];
enum tarError rc;
- if (!sizeonly)
+ if (!dryrun)
{
rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size,
statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
@@ -1654,7 +1654,7 @@ _tarWriteHeader(const char *filename, const char *linktarget,
*/
static int64
_tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly)
+ bool dryrun)
{
/* If symlink, write it as a directory anyway */
#ifndef WIN32
@@ -1664,7 +1664,7 @@ _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
#endif
statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
- return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, sizeonly);
+ return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, dryrun);
}
/*
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index 503a5b9f0b..b55917b9b6 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -31,6 +31,6 @@ typedef struct
extern void SendBaseBackup(BaseBackupCmd *cmd);
-extern int64 sendTablespace(char *path, bool sizeonly);
+extern int64 sendTablespace(char *path, bool dryrun);
#endif /* _BASEBACKUP_H */
--
2.21.0 (Apple Git-122.2)
Attachment: 0003-Refactor-some-basebackup-code-to-increase-reusabilit.patch
From 0d9d46b8c0a21eec50a5ee4f2855e17f453ae03a Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 9 Oct 2019 12:39:41 +0500
Subject: [PATCH 3/7] Refactor some basebackup code to increase reusability, in
anticipation of adding parallel backup
---
src/backend/access/transam/xlog.c | 192 +++++-----
src/backend/replication/basebackup.c | 512 ++++++++++++++-------------
src/include/access/xlog.h | 2 +
3 files changed, 371 insertions(+), 335 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3b766e66b9..451fe6c0d1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10300,10 +10300,6 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
{
bool gotUniqueStartpoint = false;
- DIR *tblspcdir;
- struct dirent *de;
- tablespaceinfo *ti;
- int datadirpathlen;
/*
* Force an XLOG file switch before the checkpoint, to ensure that the
@@ -10429,93 +10425,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
if (exclusive)
tblspcmapfile = makeStringInfo();
- datadirpathlen = strlen(DataDir);
-
- /* Collect information about all tablespaces */
- tblspcdir = AllocateDir("pg_tblspc");
- while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
- {
- char fullpath[MAXPGPATH + 10];
- char linkpath[MAXPGPATH];
- char *relpath = NULL;
- int rllen;
- StringInfoData buflinkpath;
- char *s = linkpath;
-
- /* Skip special stuff */
- if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
- continue;
-
- snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
-
-#if defined(HAVE_READLINK) || defined(WIN32)
- rllen = readlink(fullpath, linkpath, sizeof(linkpath));
- if (rllen < 0)
- {
- ereport(WARNING,
- (errmsg("could not read symbolic link \"%s\": %m",
- fullpath)));
- continue;
- }
- else if (rllen >= sizeof(linkpath))
- {
- ereport(WARNING,
- (errmsg("symbolic link \"%s\" target is too long",
- fullpath)));
- continue;
- }
- linkpath[rllen] = '\0';
-
- /*
- * Add the escape character '\\' before newline in a string to
- * ensure that we can distinguish between the newline in the
- * tablespace path and end of line while reading tablespace_map
- * file during archive recovery.
- */
- initStringInfo(&buflinkpath);
-
- while (*s)
- {
- if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
- appendStringInfoChar(&buflinkpath, '\\');
- appendStringInfoChar(&buflinkpath, *s++);
- }
-
- /*
- * Relpath holds the relative path of the tablespace directory
- * when it's located within PGDATA, or NULL if it's located
- * elsewhere.
- */
- if (rllen > datadirpathlen &&
- strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
- IS_DIR_SEP(linkpath[datadirpathlen]))
- relpath = linkpath + datadirpathlen + 1;
-
- ti = palloc(sizeof(tablespaceinfo));
- ti->oid = pstrdup(de->d_name);
- ti->path = pstrdup(buflinkpath.data);
- ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
-
- if (tablespaces)
- *tablespaces = lappend(*tablespaces, ti);
-
- appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
-
- pfree(buflinkpath.data);
-#else
-
- /*
- * If the platform does not have symbolic links, it should not be
- * possible to have tablespaces - clearly somebody else created
- * them. Warn about it and ignore.
- */
- ereport(WARNING,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("tablespaces are not supported on this platform")));
-#endif
- }
- FreeDir(tblspcdir);
+ collectTablespaces(tablespaces, tblspcmapfile, infotbssize, needtblspcmapfile);
/*
* Construct backup label file
@@ -12291,3 +12201,103 @@ XLogRequestWalReceiverReply(void)
{
doRequestWalReceiverReply = true;
}
+
+/*
+ * Collect information about all tablespaces.
+ */
+void
+collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
+ bool infotbssize, bool needtblspcmapfile)
+{
+ DIR *tblspcdir;
+ struct dirent *de;
+ tablespaceinfo *ti;
+ int datadirpathlen;
+
+ datadirpathlen = strlen(DataDir);
+
+ tblspcdir = AllocateDir("pg_tblspc");
+ while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
+ {
+ char fullpath[MAXPGPATH + 10];
+ char linkpath[MAXPGPATH];
+ char *relpath = NULL;
+ int rllen;
+ StringInfoData buflinkpath;
+ char *s = linkpath;
+
+ /* Skip special stuff */
+ if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
+ continue;
+
+ snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
+
+#if defined(HAVE_READLINK) || defined(WIN32)
+ rllen = readlink(fullpath, linkpath, sizeof(linkpath));
+ if (rllen < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not read symbolic link \"%s\": %m",
+ fullpath)));
+ continue;
+ }
+ else if (rllen >= sizeof(linkpath))
+ {
+ ereport(WARNING,
+ (errmsg("symbolic link \"%s\" target is too long",
+ fullpath)));
+ continue;
+ }
+ linkpath[rllen] = '\0';
+
+ /*
+ * Add the escape character '\\' before newline in a string to
+ * ensure that we can distinguish between the newline in the
+ * tablespace path and end of line while reading tablespace_map
+ * file during archive recovery.
+ */
+ initStringInfo(&buflinkpath);
+
+ while (*s)
+ {
+ if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
+ appendStringInfoChar(&buflinkpath, '\\');
+ appendStringInfoChar(&buflinkpath, *s++);
+ }
+
+ /*
+ * Relpath holds the relative path of the tablespace directory
+ * when it's located within PGDATA, or NULL if it's located
+ * elsewhere.
+ */
+ if (rllen > datadirpathlen &&
+ strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
+ IS_DIR_SEP(linkpath[datadirpathlen]))
+ relpath = linkpath + datadirpathlen + 1;
+
+ ti = palloc(sizeof(tablespaceinfo));
+ ti->oid = pstrdup(de->d_name);
+ ti->path = pstrdup(buflinkpath.data);
+ ti->rpath = relpath ? pstrdup(relpath) : NULL;
+ ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+
+ if (tablespaces)
+ *tablespaces = lappend(*tablespaces, ti);
+
+ appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
+
+ pfree(buflinkpath.data);
+#else
+
+ /*
+ * If the platform does not have symbolic links, it should not be
+ * possible to have tablespaces - clearly somebody else created
+ * them. Warn about it and ignore.
+ */
+ ereport(WARNING,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("tablespaces are not supported on this platform")));
+#endif
+ }
+ FreeDir(tblspcdir);
+}
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 267163ed29..b679f36021 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -67,10 +67,12 @@ static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void base_backup_cleanup(int code, Datum arg);
static void perform_base_backup(basebackup_options *opt);
+static void include_wal_files(XLogRecPtr endptr);
static void parse_basebackup_options(List *options, basebackup_options *opt);
static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
static void throttle(size_t increment);
+static void setup_throttle(int maxrate);
static bool is_checksummed_file(const char *fullpath, const char *filename);
/* Was the backup currently in-progress initiated in recovery mode? */
@@ -293,29 +295,7 @@ perform_base_backup(basebackup_options *opt)
/* Send tablespace header */
SendBackupHeader(tablespaces);
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
-
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
-
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ setup_throttle(opt->maxrate);
/* Send off our tablespaces one by one */
foreach(lc, tablespaces)
@@ -381,227 +361,7 @@ perform_base_backup(basebackup_options *opt)
* We've left the last tar file "open", so we can now append the
* required WAL files to it.
*/
- char pathbuf[MAXPGPATH];
- XLogSegNo segno;
- XLogSegNo startsegno;
- XLogSegNo endsegno;
- struct stat statbuf;
- List *historyFileList = NIL;
- List *walFileList = NIL;
- char firstoff[MAXFNAMELEN];
- char lastoff[MAXFNAMELEN];
- DIR *dir;
- struct dirent *de;
- ListCell *lc;
- TimeLineID tli;
-
- /*
- * I'd rather not worry about timelines here, so scan pg_wal and
- * include all WAL files in the range between 'startptr' and 'endptr',
- * regardless of the timeline the file is stamped with. If there are
- * some spurious WAL files belonging to timelines that don't belong in
- * this server's history, they will be included too. Normally there
- * shouldn't be such files, but if there are, there's little harm in
- * including them.
- */
- XLByteToSeg(startptr, startsegno, wal_segment_size);
- XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
- XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
- XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
-
- dir = AllocateDir("pg_wal");
- while ((de = ReadDir(dir, "pg_wal")) != NULL)
- {
- /* Does it look like a WAL segment, and is it in the range? */
- if (IsXLogFileName(de->d_name) &&
- strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
- strcmp(de->d_name + 8, lastoff + 8) <= 0)
- {
- walFileList = lappend(walFileList, pstrdup(de->d_name));
- }
- /* Does it look like a timeline history file? */
- else if (IsTLHistoryFileName(de->d_name))
- {
- historyFileList = lappend(historyFileList, pstrdup(de->d_name));
- }
- }
- FreeDir(dir);
-
- /*
- * Before we go any further, check that none of the WAL segments we
- * need were removed.
- */
- CheckXLogRemoved(startsegno, ThisTimeLineID);
-
- /*
- * Sort the WAL filenames. We want to send the files in order from
- * oldest to newest, to reduce the chance that a file is recycled
- * before we get a chance to send it over.
- */
- list_sort(walFileList, compareWalFileNames);
-
- /*
- * There must be at least one xlog file in the pg_wal directory, since
- * we are doing backup-including-xlog.
- */
- if (walFileList == NIL)
- ereport(ERROR,
- (errmsg("could not find any WAL files")));
-
- /*
- * Sanity check: the first and last segment should cover startptr and
- * endptr, with no gaps in between.
- */
- XLogFromFileName((char *) linitial(walFileList),
- &tli, &segno, wal_segment_size);
- if (segno != startsegno)
- {
- char startfname[MAXFNAMELEN];
-
- XLogFileName(startfname, ThisTimeLineID, startsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", startfname)));
- }
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- XLogSegNo currsegno = segno;
- XLogSegNo nextsegno = segno + 1;
-
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
- if (!(nextsegno == segno || currsegno == segno))
- {
- char nextfname[MAXFNAMELEN];
-
- XLogFileName(nextfname, ThisTimeLineID, nextsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", nextfname)));
- }
- }
- if (segno != endsegno)
- {
- char endfname[MAXFNAMELEN];
-
- XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", endfname)));
- }
-
- /* Ok, we have everything we need. Send the WAL files. */
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- FILE *fp;
- char buf[TAR_SEND_SIZE];
- size_t cnt;
- pgoff_t len = 0;
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
-
- fp = AllocateFile(pathbuf, "rb");
- if (fp == NULL)
- {
- int save_errno = errno;
-
- /*
- * Most likely reason for this is that the file was already
- * removed by a checkpoint, so check for that to get a better
- * error message.
- */
- CheckXLogRemoved(segno, tli);
-
- errno = save_errno;
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not open file \"%s\": %m", pathbuf)));
- }
-
- if (fstat(fileno(fp), &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- pathbuf)));
- if (statbuf.st_size != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* send the WAL file itself */
- _tarWriteHeader(pathbuf, NULL, &statbuf, false);
-
- while ((cnt = fread(buf, 1,
- Min(sizeof(buf), wal_segment_size - len),
- fp)) > 0)
- {
- CheckXLogRemoved(segno, tli);
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
-
- len += cnt;
- throttle(cnt);
-
- if (len == wal_segment_size)
- break;
- }
-
- CHECK_FREAD_ERROR(fp, pathbuf);
-
- if (len != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* wal_segment_size is a multiple of 512, so no need for padding */
-
- FreeFile(fp);
-
- /*
- * Mark file as archived, otherwise files can get archived again
- * after promotion of a new node. This is in line with
- * walreceiver.c always doing an XLogArchiveForceDone() after a
- * complete segment.
- */
- StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(pathbuf, "");
- }
-
- /*
- * Send timeline history files too. Only the latest timeline history
- * file is required for recovery, and even that only if there happens
- * to be a timeline switch in the first WAL segment that contains the
- * checkpoint record, or if we're taking a base backup from a standby
- * server and the target timeline changes while the backup is taken.
- * But they are small and highly useful for debugging purposes, so
- * better include them all, always.
- */
- foreach(lc, historyFileList)
- {
- char *fname = lfirst(lc);
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
-
- if (lstat(pathbuf, &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m", pathbuf)));
-
- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
-
- /* unconditionally mark file as archived */
- StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "");
- }
+ include_wal_files(endptr);
/* Send CopyDone message for the last tar file */
pq_putemptymessage('c');
@@ -1740,3 +1500,267 @@ throttle(size_t increment)
*/
throttled_last = GetCurrentTimestamp();
}
+
+/*
+ * Append the required WAL files to the backup tar file. It assumes that the
+ * last tar file is "open" and the WALs will be appended to it.
+ */
+static void
+include_wal_files(XLogRecPtr endptr)
+{
+ /*
+ * We've left the last tar file "open", so we can now append the
+ * required WAL files to it.
+ */
+ char pathbuf[MAXPGPATH];
+ XLogSegNo segno;
+ XLogSegNo startsegno;
+ XLogSegNo endsegno;
+ struct stat statbuf;
+ List *historyFileList = NIL;
+ List *walFileList = NIL;
+ char firstoff[MAXFNAMELEN];
+ char lastoff[MAXFNAMELEN];
+ DIR *dir;
+ struct dirent *de;
+ ListCell *lc;
+ TimeLineID tli;
+
+ /*
+ * I'd rather not worry about timelines here, so scan pg_wal and
+ * include all WAL files in the range between 'startptr' and 'endptr',
+ * regardless of the timeline the file is stamped with. If there are
+ * some spurious WAL files belonging to timelines that don't belong in
+ * this server's history, they will be included too. Normally there
+ * shouldn't be such files, but if there are, there's little harm in
+ * including them.
+ */
+ XLByteToSeg(startptr, startsegno, wal_segment_size);
+ XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
+ XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
+ XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
+
+ dir = AllocateDir("pg_wal");
+ while ((de = ReadDir(dir, "pg_wal")) != NULL)
+ {
+ /* Does it look like a WAL segment, and is it in the range? */
+ if (IsXLogFileName(de->d_name) &&
+ strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
+ strcmp(de->d_name + 8, lastoff + 8) <= 0)
+ {
+ walFileList = lappend(walFileList, pstrdup(de->d_name));
+ }
+ /* Does it look like a timeline history file? */
+ else if (IsTLHistoryFileName(de->d_name))
+ {
+ historyFileList = lappend(historyFileList, pstrdup(de->d_name));
+ }
+ }
+ FreeDir(dir);
+
+ /*
+ * Before we go any further, check that none of the WAL segments we
+ * need were removed.
+ */
+ CheckXLogRemoved(startsegno, ThisTimeLineID);
+
+ /*
+ * Sort the WAL filenames. We want to send the files in order from
+ * oldest to newest, to reduce the chance that a file is recycled
+ * before we get a chance to send it over.
+ */
+ list_sort(walFileList, compareWalFileNames);
+
+ /*
+ * There must be at least one xlog file in the pg_wal directory, since
+ * we are doing backup-including-xlog.
+ */
+ if (walFileList == NIL)
+ ereport(ERROR,
+ (errmsg("could not find any WAL files")));
+
+ /*
+ * Sanity check: the first and last segment should cover startptr and
+ * endptr, with no gaps in between.
+ */
+ XLogFromFileName((char *) linitial(walFileList),
+ &tli, &segno, wal_segment_size);
+ if (segno != startsegno)
+ {
+ char startfname[MAXFNAMELEN];
+
+ XLogFileName(startfname, ThisTimeLineID, startsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", startfname)));
+ }
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ XLogSegNo currsegno = segno;
+ XLogSegNo nextsegno = segno + 1;
+
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+ if (!(nextsegno == segno || currsegno == segno))
+ {
+ char nextfname[MAXFNAMELEN];
+
+ XLogFileName(nextfname, ThisTimeLineID, nextsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", nextfname)));
+ }
+ }
+ if (segno != endsegno)
+ {
+ char endfname[MAXFNAMELEN];
+
+ XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", endfname)));
+ }
+
+ /* Ok, we have everything we need. Send the WAL files. */
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ FILE *fp;
+ char buf[TAR_SEND_SIZE];
+ size_t cnt;
+ pgoff_t len = 0;
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+
+ fp = AllocateFile(pathbuf, "rb");
+ if (fp == NULL)
+ {
+ int save_errno = errno;
+
+ /*
+ * Most likely reason for this is that the file was already
+ * removed by a checkpoint, so check for that to get a better
+ * error message.
+ */
+ CheckXLogRemoved(segno, tli);
+
+ errno = save_errno;
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m", pathbuf)));
+ }
+
+ if (fstat(fileno(fp), &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ pathbuf)));
+ if (statbuf.st_size != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* send the WAL file itself */
+ _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+
+ while ((cnt = fread(buf, 1,
+ Min(sizeof(buf), wal_segment_size - len),
+ fp)) > 0)
+ {
+ CheckXLogRemoved(segno, tli);
+ /* Send the chunk as a CopyData message */
+ if (pq_putmessage('d', buf, cnt))
+ ereport(ERROR,
+ (errmsg("base backup could not send data, aborting backup")));
+
+ len += cnt;
+ throttle(cnt);
+
+ if (len == wal_segment_size)
+ break;
+ }
+
+ CHECK_FREAD_ERROR(fp, pathbuf);
+
+ if (len != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* wal_segment_size is a multiple of 512, so no need for padding */
+
+ FreeFile(fp);
+
+ /*
+ * Mark file as archived, otherwise files can get archived again
+ * after promotion of a new node. This is in line with
+ * walreceiver.c always doing an XLogArchiveForceDone() after a
+ * complete segment.
+ */
+ StatusFilePath(pathbuf, walFileName, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+
+ /*
+ * Send timeline history files too. Only the latest timeline history
+ * file is required for recovery, and even that only if there happens
+ * to be a timeline switch in the first WAL segment that contains the
+ * checkpoint record, or if we're taking a base backup from a standby
+ * server and the target timeline changes while the backup is taken.
+ * But they are small and highly useful for debugging purposes, so
+ * better include them all, always.
+ */
+ foreach(lc, historyFileList)
+ {
+ char *fname = lfirst(lc);
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
+
+ if (lstat(pathbuf, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m", pathbuf)));
+
+ sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
+
+ /* unconditionally mark file as archived */
+ StatusFilePath(pathbuf, fname, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+}
+
+/*
+ * Setup and activate network throttling, if client requested it
+ */
+static void
+setup_throttle(int maxrate)
+{
+ if (maxrate > 0)
+ {
+ throttling_sample =
+ (int64) maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
+
+ /* Enable throttling. */
+ throttling_counter = 0;
+
+ /* The 'real data' starts now (header was ignored). */
+ throttled_last = GetCurrentTimestamp();
+ }
+ else
+ {
+ /* Disable throttling. */
+ throttling_counter = -1;
+ }
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..5b0aa8ae85 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -350,6 +350,8 @@ extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
bool needtblspcmapfile);
extern XLogRecPtr do_pg_stop_backup(char *labelfile, bool waitforarchive,
TimeLineID *stoptli_p);
+extern void collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
+ bool infotbssize, bool needtblspcmapfile);
extern void do_pg_abort_backup(void);
extern SessionBackupState get_backup_status(void);
--
2.21.0 (Apple Git-122.2)
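As a quick sanity check of the throttle math that the new setup_throttle()
encapsulates, here is a standalone worked example (not part of the patch; it
assumes THROTTLING_FREQUENCY is 8 and USECS_PER_SEC is 1000000, as in the
current sources):

#include <stdio.h>
#include <stdint.h>

/* Worked example of the throttle parameters for MAX_RATE 1024 (kB/s). */
int
main(void)
{
	int			maxrate = 1024;	/* kB/s, as requested by the client */
	int64_t		throttling_sample = (int64_t) maxrate * 1024 / 8;
	int64_t		elapsed_min_unit = 1000000 / 8;

	/*
	 * Prints "131072 bytes per 125000 us": the sender sleeps whenever
	 * more than 128 kB have gone out in less than 125 ms.
	 */
	printf("%lld bytes per %lld us\n",
		   (long long) throttling_sample, (long long) elapsed_min_unit);
	return 0;
}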
Attachment: 0006-parallel-backup-testcase.patch (application/octet-stream)
From 0fe5f785e1c426abf660215279ae28ff5b6e156a Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Sun, 13 Oct 2019 21:54:23 +0500
Subject: [PATCH 6/7] parallel backup - testcase
---
.../t/040_pg_basebackup_parallel.pl | 527 ++++++++++++++++++
1 file changed, 527 insertions(+)
create mode 100644 src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
diff --git a/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
new file mode 100644
index 0000000000..4ec4c1e0f6
--- /dev/null
+++ b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
@@ -0,0 +1,527 @@
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Basename qw(basename dirname);
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 95;
+
+program_help_ok('pg_basebackup');
+program_version_ok('pg_basebackup');
+program_options_handling_ok('pg_basebackup');
+
+my $tempdir = TestLib::tempdir;
+
+my $node = get_new_node('main');
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Initialize node without replication settings
+$node->init(extra => ['--data-checksums']);
+$node->start;
+my $pgdata = $node->data_dir;
+
+$node->command_fails(['pg_basebackup'],
+ 'pg_basebackup needs target directory specified');
+
+# Some Windows ANSI code pages may reject this filename, in which case we
+# quietly proceed without this bit of test coverage.
+if (open my $badchars, '>>', "$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
+{
+ print $badchars "test backup of file with non-UTF8 name\n";
+ close $badchars;
+}
+
+$node->set_replication_conf();
+system_or_bail 'pg_ctl', '-D', $pgdata, 'reload';
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup" ],
+ 'pg_basebackup fails because of WAL configuration');
+
+ok(!-d "$tempdir/backup", 'backup directory was cleaned up');
+
+# Create a backup directory that is not empty so the next command will fail
+# but leave the data directory behind
+mkdir("$tempdir/backup")
+ or BAIL_OUT("unable to create $tempdir/backup");
+append_to_file("$tempdir/backup/dir-not-empty.txt", "Some data");
+
+$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/backup", '-n' ],
+ 'failing run with no-clean option');
+
+ok(-d "$tempdir/backup", 'backup directory was created and left behind');
+rmtree("$tempdir/backup");
+
+open my $conf, '>>', "$pgdata/postgresql.conf";
+print $conf "max_replication_slots = 10\n";
+print $conf "max_wal_senders = 10\n";
+print $conf "wal_level = replica\n";
+close $conf;
+$node->restart;
+
+# Write some files to test that they are not copied.
+foreach my $filename (
+ qw(backup_label tablespace_map postgresql.auto.conf.tmp current_logfiles.tmp)
+ )
+{
+ open my $file, '>>', "$pgdata/$filename";
+ print $file "DONOTCOPY";
+ close $file;
+}
+
+# Connect to a database to create global/pg_internal.init. If this is removed
+# the test to ensure global/pg_internal.init is not copied will return a false
+# positive.
+$node->safe_psql('postgres', 'SELECT 1;');
+
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'unlogged main fork in base');
+
+# Create files that look like temporary relations to ensure they are ignored.
+my $postgresOid = $node->safe_psql('postgres',
+ q{select oid from pg_database where datname = 'postgres'});
+
+my @tempRelationFiles =
+ qw(t999_999 t9999_999.1 t999_9999_vm t99999_99999_vm.1);
+
+foreach my $filename (@tempRelationFiles)
+{
+ append_to_file("$pgdata/base/$postgresOid/$filename", 'TEMP_RELATION');
+}
+
+# Run base backup in parallel mode.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup", '-X', 'none', "-j 4" ],
+ 'pg_basebackup runs');
+ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
+
+# Permissions on backup should be default
+SKIP:
+{
+ skip "unix-style permissions not supported on Windows", 1
+ if ($windows_os);
+
+ ok(check_mode_recursive("$tempdir/backup", 0700, 0600),
+ "check backup dir permissions");
+}
+
+# Only archive_status directory should be copied in pg_wal/.
+is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/pg_wal/")) ],
+ [ sort qw(. .. archive_status) ],
+ 'no WAL files copied');
+
+# Contents of these directories should not be copied.
+foreach my $dirname (
+ qw(pg_dynshmem pg_notify pg_replslot pg_serial pg_snapshots pg_stat_tmp pg_subtrans)
+ )
+{
+ is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/$dirname/")) ],
+ [ sort qw(. ..) ],
+ "contents of $dirname/ not copied");
+}
+
+# These files should not be copied.
+foreach my $filename (
+ qw(postgresql.auto.conf.tmp postmaster.opts postmaster.pid tablespace_map current_logfiles.tmp
+ global/pg_internal.init))
+{
+ ok(!-f "$tempdir/backup/$filename", "$filename not copied");
+}
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+ 'unlogged init fork in backup');
+ok( !-f "$tempdir/backup/$baseUnloggedPath",
+ 'unlogged main fork not in backup');
+
+# Temp relations should not be copied.
+foreach my $filename (@tempRelationFiles)
+{
+ ok( !-f "$tempdir/backup/base/$postgresOid/$filename",
+ "base/$postgresOid/$filename not copied");
+}
+
+# Make sure existing backup_label was ignored.
+isnt(slurp_file("$tempdir/backup/backup_label"),
+ 'DONOTCOPY', 'existing backup_label not copied');
+rmtree("$tempdir/backup");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup2", '--waldir',
+ "$tempdir/xlog2", "-j 4"
+ ],
+ 'separate xlog directory');
+ok(-f "$tempdir/backup2/PG_VERSION", 'backup was created');
+ok(-d "$tempdir/xlog2/", 'xlog directory was created');
+rmtree("$tempdir/backup2");
+rmtree("$tempdir/xlog2");
+
+$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/tarbackup", '-Ft', "-j 4"],
+ 'tar format');
+
+rmtree("$tempdir/tarbackup");
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T=/foo" ],
+ '-T with empty old directory fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=" ],
+ '-T with empty new directory fails');
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4",
+ "-T/foo=/bar=/baz"
+ ],
+ '-T with multiple = fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo=/bar" ],
+ '-T with old directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=bar" ],
+ '-T with new directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo" ],
+ '-T with invalid format fails');
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+ skip "symlinks not supported on Windows", 18 if ($windows_os);
+
+ # Move pg_replslot out of $pgdata and create a symlink to it.
+ $node->stop;
+
+ # Set umask so test directories and files are created with group permissions
+ umask(0027);
+
+ # Enable group permissions on PGDATA
+ chmod_recursive("$pgdata", 0750, 0640);
+
+ rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
+ or BAIL_OUT "could not move $pgdata/pg_replslot";
+ symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot")
+ or BAIL_OUT "could not symlink to $pgdata/pg_replslot";
+
+ $node->start;
+
+	# Create a temporary directory in the system location and symlink it
+	# to our physical temp location. That way we can use shorter names
+	# for the tablespace directories, which hopefully won't run afoul of
+	# the 99 character length limit.
+ my $shorter_tempdir = TestLib::tempdir_short . "/tempdir";
+ symlink "$tempdir", $shorter_tempdir;
+
+ mkdir "$tempdir/tblspc1";
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc1 LOCATION '$shorter_tempdir/tblspc1';");
+ $node->safe_psql('postgres',
+ "CREATE TABLE test1 (a int) TABLESPACE tblspc1;");
+
+ # Create an unlogged table to test that forks other than init are not copied.
+ $node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;'
+ );
+
+ my $tblspc1UnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('tblspc1_unlogged')});
+
+ # Make sure main and init forks exist
+ ok( -f "$pgdata/${tblspc1UnloggedPath}_init",
+ 'unlogged init fork in tablespace');
+ ok(-f "$pgdata/$tblspc1UnloggedPath", 'unlogged main fork in tablespace');
+
+ # Create files that look like temporary relations to ensure they are ignored
+ # in a tablespace.
+ my @tempRelationFiles = qw(t888_888 t888888_888888_vm.1);
+ my $tblSpc1Id = basename(
+ dirname(
+ dirname(
+ $node->safe_psql(
+ 'postgres', q{select pg_relation_filepath('test1')}))));
+
+ foreach my $filename (@tempRelationFiles)
+ {
+ append_to_file(
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ 'TEMP_RELATION');
+ }
+
+ $node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4" ],
+ 'plain format with tablespaces fails without tablespace mapping');
+
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tblspc1=$tempdir/tbackup/tblspc1"
+ ],
+ 'plain format with tablespaces succeeds with tablespace mapping');
+ ok(-d "$tempdir/tbackup/tblspc1", 'tablespace was relocated');
+ opendir(my $dh, "$pgdata/pg_tblspc") or die;
+ ok( ( grep {
+ -l "$tempdir/backup1/pg_tblspc/$_"
+ and readlink "$tempdir/backup1/pg_tblspc/$_" eq
+ "$tempdir/tbackup/tblspc1"
+ } readdir($dh)),
+ "tablespace symlink was updated");
+ closedir $dh;
+
+ # Group access should be enabled on all backup files
+ ok(check_mode_recursive("$tempdir/backup1", 0750, 0640),
+ "check backup dir permissions");
+
+ # Unlogged relation forks other than init should not be copied
+ my ($tblspc1UnloggedBackupPath) =
+ $tblspc1UnloggedPath =~ /[^\/]*\/[^\/]*\/[^\/]*$/g;
+
+ ok(-f "$tempdir/tbackup/tblspc1/${tblspc1UnloggedBackupPath}_init",
+ 'unlogged init fork in tablespace backup');
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
+ 'unlogged main fork not in tablespace backup');
+
+ # Temp relations should not be copied.
+ foreach my $filename (@tempRelationFiles)
+ {
+ ok( !-f "$tempdir/tbackup/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ "[tblspc1]/$postgresOid/$filename not copied");
+
+ # Also remove temp relation files or tablespace drop will fail.
+ my $filepath =
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename";
+
+ unlink($filepath)
+ or BAIL_OUT("unable to unlink $filepath");
+ }
+
+ ok( -d "$tempdir/backup1/pg_replslot",
+ 'pg_replslot symlink copied as directory');
+ rmtree("$tempdir/backup1");
+
+ mkdir "$tempdir/tbl=spc2";
+ $node->safe_psql('postgres', "DROP TABLE test1;");
+ $node->safe_psql('postgres', "DROP TABLE tblspc1_unlogged;");
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc1;");
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc2 LOCATION '$shorter_tempdir/tbl=spc2';");
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup3", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tbl\\=spc2=$tempdir/tbackup/tbl\\=spc2"
+ ],
+ 'mapping tablespace with = sign in path');
+ ok(-d "$tempdir/tbackup/tbl=spc2",
+ 'tablespace with = sign was relocated');
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc2;");
+ rmtree("$tempdir/backup3");
+}
+
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backupR", '-R' , '-j 4'],
+ 'pg_basebackup -R runs');
+ok(-f "$tempdir/backupR/postgresql.auto.conf", 'postgresql.auto.conf exists');
+ok(-f "$tempdir/backupR/standby.signal", 'standby.signal was created');
+my $recovery_conf = slurp_file "$tempdir/backupR/postgresql.auto.conf";
+rmtree("$tempdir/backupR");
+
+my $port = $node->port;
+like(
+ $recovery_conf,
+ qr/^primary_conninfo = '.*port=$port.*'\n/m,
+ 'postgresql.auto.conf sets primary_conninfo');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxd" , "-j 4"],
+ 'pg_basebackup runs in default xlog mode');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxd/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxd");
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxf", '-X', 'fetch' , "-j 4"],
+ 'pg_basebackup -X fetch runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxf");
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs", '-X', 'stream' , "-j 4"],
+ 'pg_basebackup -X stream runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxs/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxs");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupnoslot", '-X',
+ 'stream', '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup -X stream runs with --no-slot');
+rmtree("$tempdir/backupnoslot");
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_sl_fail", '-X',
+ 'stream', '-S',
+ 'slot0',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with nonexistent replication slot');
+#
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C' , '-j 4'],
+ 'pg_basebackup -C fails without slot name');
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_slot", '-C',
+ '-S', 'slot0',
+ '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with -C -S --no-slot');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup -C runs');
+rmtree("$tempdir/backupxs_slot");
+
+is( $node->safe_psql(
+ 'postgres',
+ q{SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ 'slot0',
+ 'replication slot was created');
+isnt(
+ $node->safe_psql(
+ 'postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ '',
+ 'restart LSN of new slot is not null');
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot1", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup fails with -C -S and a previously existing slot');
+
+$node->safe_psql('postgres',
+ q{SELECT * FROM pg_create_physical_replication_slot('slot1')});
+my $lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+is($lsn, '', 'restart LSN of new slot is null');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/fail", '-S', 'slot1', '-X', 'none', '-j 4'],
+ 'pg_basebackup with replication slot fails without WAL streaming');
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl", '-X',
+ 'stream', '-S', 'slot1', '-j 4'
+ ],
+ 'pg_basebackup -X stream with replication slot runs');
+$lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+like($lsn, qr!^0/[0-9A-Z]{7,8}$!, 'restart LSN of slot has advanced');
+rmtree("$tempdir/backupxs_sl");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl_R", '-X',
+ 'stream', '-S', 'slot1', '-R',
+ '-j 4'
+ ],
+ 'pg_basebackup with replication slot and -R runs');
+like(
+ slurp_file("$tempdir/backupxs_sl_R/postgresql.auto.conf"),
+ qr/^primary_slot_name = 'slot1'\n/m,
+ 'recovery conf file sets primary_slot_name');
+
+my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
+is($checksum, 'on', 'checksums are enabled');
+rmtree("$tempdir/backupxs_sl_R");
+
+# create tables to corrupt and get their relfilenodes
+my $file_corrupt1 = $node->safe_psql('postgres',
+ q{SELECT a INTO corrupt1 FROM generate_series(1,10000) AS a; ALTER TABLE corrupt1 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt1')}
+);
+my $file_corrupt2 = $node->safe_psql('postgres',
+ q{SELECT b INTO corrupt2 FROM generate_series(1,2) AS b; ALTER TABLE corrupt2 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt2')}
+);
+
+# set page header and block sizes
+my $pageheader_size = 24;
+my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');
+
+# induce corruption
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+my $file;
+open $file, '+<', "$pgdata/$file_corrupt1";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*checksum verification failed/s],
+ 'pg_basebackup reports checksum mismatch');
+rmtree("$tempdir/backup_corrupt");
+
+# induce further corruption in 5 more blocks
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+for my $i (1 .. 5)
+{
+ my $offset = $pageheader_size + $i * $block_size;
+ seek($file, $offset, 0);
+ syswrite($file, "\0\0\0\0\0\0\0\0\0");
+}
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*further.*failures.*will.not.be.reported/s],
+ 'pg_basebackup does not report more than 5 checksum mismatches');
+rmtree("$tempdir/backup_corrupt2");
+
+# induce corruption in a second file
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt2";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+# do not verify checksums, should return ok
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup_corrupt4", '--no-verify-checksums',
+ '-j 4'
+ ],
+ 'pg_basebackup with -k does not report checksum mismatch');
+rmtree("$tempdir/backup_corrupt4");
+
+$node->safe_psql('postgres', "DROP TABLE corrupt1;");
+$node->safe_psql('postgres', "DROP TABLE corrupt2;");
--
2.21.0 (Apple Git-122.2)
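A small aside on the corruption arithmetic in the test above: each write
lands pageheader_size bytes past a block boundary, i.e. just after the
24-byte page header, so the stored page checksum no longer matches the page
contents. A standalone check of the offsets, assuming the default 8 kB block
size (illustration only, not part of the patch):

#include <stdio.h>

/*
 * Offsets produced by the test's corruption loop, assuming
 * block_size = 8192 and pageheader_size = 24.
 */
int
main(void)
{
	const long	block_size = 8192;
	const long	pageheader_size = 24;

	for (int i = 1; i <= 5; i++)
		printf("block %d: corrupt at byte offset %ld\n",
			   i, pageheader_size + i * block_size);
	return 0;
}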
Attachment: 0007-parallel-backup-documentation.patch (application/octet-stream)
From fa4fe2ed932ddef90ff2e4cff1e42715139f8d4c Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Thu, 7 Nov 2019 16:52:40 +0500
Subject: [PATCH 7/7] parallel backup documentation
---
doc/src/sgml/protocol.sgml | 386 ++++++++++++++++++++++++++++
doc/src/sgml/ref/pg_basebackup.sgml | 20 ++
2 files changed, 406 insertions(+)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 80275215e0..22d620c346 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2700,6 +2700,392 @@ The commands accepted in replication mode are:
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>START_BACKUP</literal>
+ [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ]
+ [ <literal>PROGRESS</literal> ]
+ [ <literal>FAST</literal> ]
+ [ <literal>TABLESPACE_MAP</literal> ]
+
+ <indexterm><primary>START_BACKUP</primary></indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Instructs the server to prepare for performing an on-line backup. The following
+ options are accepted:
+ <variablelist>
+ <varlistentry>
+ <term><literal>LABEL</literal> <replaceable>'label'</replaceable></term>
+ <listitem>
+ <para>
+ Sets the label of the backup. If none is specified, a backup label
+ of <literal>start backup</literal> will be used. The quoting rules
+ for the label are the same as a standard SQL string with
+ <xref linkend="guc-standard-conforming-strings"/> turned on.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>PROGRESS</literal></term>
+ <listitem>
+ <para>
+ Request information required to generate a progress report. This will
+ send back an approximate size in the header of each tablespace, which
+ can be used to calculate how far along the stream is done. This is
+ calculated by enumerating all the file sizes once before the transfer
+ is even started, and might as such have a negative impact on the
+ performance. In particular, it might take longer before the first data
+ is streamed. Since the database files can change during the backup,
+ the size is only approximate and might both grow and shrink between
+ the time of approximation and the sending of the actual files.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>FAST</literal></term>
+ <listitem>
+ <para>
+ Request a fast checkpoint.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>TABLESPACE_MAP</literal></term>
+ <listitem>
+ <para>
+ Include information about symbolic links present in the directory
+ <filename>pg_tblspc</filename> in a file named
+ <filename>tablespace_map</filename>. The tablespace map file includes
+ each symbolic link name as it exists in the directory
+ <filename>pg_tblspc/</filename> and the full path of that symbolic link.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ In response to this command, the server will send out three result sets.
+ </para>
+ <para>
+ The first ordinary result set contains the starting position of the
+ backup, in a single row with two columns. The first column contains
+ the start position given in XLogRecPtr format, and the second column
+ contains the corresponding timeline ID.
+ </para>
+
+ <para>
+ The second ordinary result set has one row for each tablespace.
+ The fields in this row are:
+ <variablelist>
+ <varlistentry>
+ <term><literal>spcoid</literal> (<type>oid</type>)</term>
+ <listitem>
+ <para>
+ The OID of the tablespace, or null if it's the base
+ directory.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>spclocation</literal> (<type>text</type>)</term>
+ <listitem>
+ <para>
+ The full path of the tablespace directory, or null
+ if it's the base directory.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>size</literal> (<type>int8</type>)</term>
+ <listitem>
+ <para>
+ The approximate size of the tablespace, in kilobytes (1024 bytes),
+ if a progress report has been requested; otherwise it is null.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ The final result set will be sent in a single row with two columns. The
+ first column contains the contents of the <filename>backup_label</filename> file,
+ and the second column contains the contents of the <filename>tablespace_map</filename> file.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>STOP_BACKUP</literal>
+ [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ]
+ [ <literal>WAL</literal> ]
+ [ <literal>NOWAIT</literal> ]
+
+ <indexterm><primary>STOP_BACKUP</primary></indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Instructs the server to finish the on-line backup. The following
+ options are accepted:
+ <variablelist>
+ <varlistentry>
+ <term><literal>LABEL</literal> <replaceable>'label'</replaceable></term>
+ <listitem>
+ <para>
+ Provides the contents of the backup_label file to the backup. The contents
+ are the same as those returned by <command>START_BACKUP</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>WAL</literal></term>
+ <listitem>
+ <para>
+ Include the necessary WAL segments in the backup. This will include
+ all the files between start and stop backup in the
+ <filename>pg_wal</filename> directory of the base directory tar
+ file.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>NOWAIT</literal></term>
+ <listitem>
+ <para>
+ By default, the backup will wait until the last required WAL
+ segment has been archived, or emit a warning if log archiving is
+ not enabled. Specifying <literal>NOWAIT</literal> disables both
+ the waiting and the warning, leaving the client responsible for
+ ensuring the required log is available.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ In response to this command, the server will send one or more CopyResponse
+ results followed by a single result set containing the WAL end position of
+ the backup. The CopyResponse results contain <filename>pg_control</filename> and,
+ if <command>STOP_BACKUP</command> is run with the WAL option, the WAL files.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>SEND_BACKUP_FILELIST</literal>
+ <indexterm><primary>SEND_BACKUP_FILELIST</primary></indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Instructs the server to return a list of the files and directories available
+ in the data directory. In response to this command, the server will send one
+ result set per tablespace. The result sets consist of the following fields:
+ </para>
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>path</literal> (<type>text</type>)</term>
+ <listitem>
+ <para>
+ The path and name of the file. For a tablespace, it is an absolute
+ path on the database server; for the <filename>base</filename>
+ tablespace, it is relative to $PGDATA.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>type</literal> (<type>char</type>)</term>
+ <listitem>
+ <para>
+ A single character identifying the type of file.
+ <itemizedlist spacing="compact" mark="bullet">
+ <listitem>
+ <para>
+ <literal>'f'</literal> - Regular file. Can be any relation or
+ non-relation file in $PGDATA.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>'d'</literal> - Directory.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>'l'</literal> - Symbolic link.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>size</literal> (<type>int8</type>)</term>
+ <listitem>
+ <para>
+ The approximate size of the file, in kilobytes (1024 bytes). It is null if
+ the type is 'd' or 'l'.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>mtime</literal> (<type>int8</type>)</term>
+ <listitem>
+ <para>
+ The last modification time of the file or directory, as seconds since the Epoch.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <para>
+ This list will contain all files and directories in $PGDATA, regardless of
+ whether they are PostgreSQL files or other files added to the same directory.
+ The only excluded files are:
+ <itemizedlist spacing="compact" mark="bullet">
+ <listitem>
+ <para>
+ <filename>postmaster.pid</filename>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>postmaster.opts</filename>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>pg_internal.init</filename> (found in multiple directories)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Various temporary files and directories created during the operation
+ of the PostgreSQL server, such as any file or directory beginning
+ with <filename>pgsql_tmp</filename> and temporary relations.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Unlogged relations, except for the init fork which is required to
+ recreate the (empty) unlogged relation on recovery.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>pg_wal</filename>, including subdirectories. If the backup is run
+ with WAL files included, a synthesized version of <filename>pg_wal</filename> will be
+ included, but it will only contain the files necessary for the
+ backup to work, not the rest of the contents.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>pg_dynshmem</filename>, <filename>pg_notify</filename>,
+ <filename>pg_replslot</filename>, <filename>pg_serial</filename>,
+ <filename>pg_snapshots</filename>, <filename>pg_stat_tmp</filename>, and
+ <filename>pg_subtrans</filename> are copied as empty directories (even if
+ they are symbolic links).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Files other than regular files and directories, such as symbolic
+ links (other than for the directories listed above) and special
+ device files, are skipped. (Symbolic links
+ in <filename>pg_tblspc</filename> are maintained.)
+ </para>
+ </listitem>
+ </itemizedlist>
+ Owner, group, and file mode are set if the underlying file system on the server
+ supports it.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>SEND_BACKUP_FILES ( <replaceable class="parameter">'FILE'</replaceable> [, ...] )</literal>
+ [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ]
+ [ <literal>NOVERIFY_CHECKSUMS</literal> ]
+ [ <literal>START_WAL_LOCATION</literal> ]
+
+ <indexterm><primary>SEND_BACKUP_FILES</primary></indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Instructs the server to send the contents of the requested FILE(s).
+ </para>
+
+ <para>
+ A clause of the form <literal>SEND_BACKUP_FILES ( 'FILE', 'FILE', ... ) [OPTIONS]</literal>
+ is accepted, where one or more files can be requested.
+ </para>
+
+ <para>
+ In response to this command, one or more CopyResponse results will be sent,
+ one for each FILE requested. The data in the CopyResponse results will be
+ a tar format (following the “ustar interchange format” specified in the
+ POSIX 1003.1-2008 standard) dump of the requested files, except that
+ the two trailing blocks of zeroes specified in the standard are omitted.
+ </para>
+
+ <para>
+ The following options are accepted:
+ <variablelist>
+ <varlistentry>
+ <term><literal>MAX_RATE</literal> <replaceable>rate</replaceable></term>
+ <listitem>
+ <para>
+ Limit (throttle) the maximum amount of data transferred from server
+ to client per unit of time. The expected unit is kilobytes per second.
+ If this option is specified, the value must either be equal to zero
+ or it must fall within the range from 32 kB through 1 GB (inclusive).
+ If zero is passed or the option is not specified, no restriction is
+ imposed on the transfer.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>NOVERIFY_CHECKSUMS</literal></term>
+ <listitem>
+ <para>
+ By default, checksums are verified during a base backup if they are
+ enabled. Specifying <literal>NOVERIFY_CHECKSUMS</literal> disables
+ this verification.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>START_WAL_LOCATION</literal></term>
+ <listitem>
+ <para>
+ The starting WAL position, as returned by the <command>START_BACKUP</command>
+ command, in XLogRecPtr format.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index fc9e222f8d..339e68bda7 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -536,6 +536,26 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><option>-j <replaceable class="parameter">n</replaceable></option></term>
+ <term><option>--jobs=<replaceable class="parameter">n</replaceable></option></term>
+ <listitem>
+ <para>
+ Create <replaceable class="parameter">n</replaceable> threads to copy
+ backup files from the database server. <application>pg_basebackup</application>
+ will open <replaceable class="parameter">n</replaceable> + 1 connections
+ to the database server. Therefore, the server must be configured with
+ <xref linkend="guc-max-wal-senders"/> set high enough to accommodate all
+ connections.
+ </para>
+
+ <para>
+ Parallel mode only works with the plain format.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
--
2.21.0 (Apple Git-122.2)
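To make the docs above concrete: with these patches the client-side
invocation would look like "pg_basebackup -D /path/to/backup -Fp -j 4"
(plain format required), and each worker drives the server with the new
replication commands. Below is a rough libpq sketch of that command flow,
offered as an illustration only: it assumes a patched server, the label
'demo' and the requested file name are arbitrary examples, and error
handling is kept minimal.

#include <stdio.h>
#include <libpq-fe.h>

static void
run_command(PGconn *conn, const char *cmd)
{
	PGresult   *res;

	/*
	 * These commands may return several result sets and/or CopyOut data,
	 * so drain PQgetResult() until it returns NULL.
	 */
	if (!PQsendQuery(conn, cmd))
	{
		fprintf(stderr, "%s failed: %s", cmd, PQerrorMessage(conn));
		return;
	}
	while ((res = PQgetResult(conn)) != NULL)
	{
		if (PQresultStatus(res) == PGRES_COPY_OUT)
		{
			char	   *buf;

			/* Drain the tar stream; a real worker would unpack it. */
			while (PQgetCopyData(conn, &buf, 0) > 0)
				PQfreemem(buf);
		}
		PQclear(res);
	}
}

int
main(void)
{
	/* host, port, user and dbname are taken from the environment here */
	PGconn	   *conn = PQconnectdb("replication=true");

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		return 1;
	}

	run_command(conn, "START_BACKUP LABEL 'demo' FAST");
	run_command(conn, "SEND_BACKUP_FILELIST");

	/* Each worker would request its own slice of the file list. */
	run_command(conn, "SEND_BACKUP_FILES ('global/pg_control') MAX_RATE 1024");

	run_command(conn, "STOP_BACKUP WAL NOWAIT");

	PQfinish(conn);
	return 0;
}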
On Tue, Nov 12, 2019 at 5:07 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

On Mon, Nov 4, 2019 at 6:08 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

On Fri, Nov 1, 2019 at 8:53 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com> wrote:

'startptr' is used by sendFile() during checksum verification. Since
SendBackupFiles() is using sendFile, we have to set a valid WAL location.

Ugh, global variables.

Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
STOP_BACKUP all using the same base_backup_opt_list production as
BASE_BACKUP? Presumably most of those options are not applicable to
most of those commands, and the productions should therefore be
separated.

Are you expecting something like the attached patch? Basically I have
reorganised the grammar rules so each command can have the options
required by it.

I was feeling a bit reluctant about this change because it may add some
unwanted grammar rules to the replication grammar. Since these commands
use the same options as base backup, maybe we could throw an error inside
the relevant functions on unwanted options?

You should add docs, too. I wouldn't have to guess what some of this
stuff was for if you wrote documentation explaining what this stuff
was for. :-)

Yes, I will add it in the next patch.

The tablespace_path option appears entirely unused, and I don't know
why that should be necessary here, either.

This is to calculate the basepathlen. We need to exclude the tablespace
location (or base path) from the filename before it is sent to the client
with the sendFile call. I added this option primarily to avoid performing
string manipulation on the filename to extract the tablespace location and
then calculate the basepathlen. Alternatively, we can do it by extracting
the base path from the received filename. What do you suggest?

I don't think the server needs any information from the client in
order to be able to exclude the tablespace location from the pathname.
Whatever it needs to know, it should be able to figure out, just as it
would in a non-parallel backup.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

I have updated the replication grammar with some new rules to
differentiate the options production for base backup and the newly added
commands. I have also created a separate patch to include the
documentation changes. The current syntax is as below:

- START_BACKUP [ LABEL 'label' ] [ PROGRESS ] [ FAST ] [ TABLESPACE_MAP ]
- STOP_BACKUP [ LABEL 'label' ] [ WAL ] [ NOWAIT ]
- SEND_BACKUP_FILELIST
- SEND_BACKUP_FILES ( 'FILE' [, ...] ) [ MAX_RATE rate ] [ NOVERIFY_CHECKSUMS ] [ START_WAL_LOCATION ]

Sorry, I sent the wrong patches. Please see the correct version of the
patches (_v6).

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Attachments:
Attachment: 0001-remove-PG_ENSURE_ERROR_CLEANUP-macro-from-basebackup_v6.patch (application/octet-stream)
From 97d7f929ffd5832437c332976c9252b2d9fefe5b Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 30 Oct 2019 10:21:38 +0500
Subject: [PATCH 1/7] remove PG_ENSURE_ERROR_CLEANUP macro from basebackup.
register base_backup_cleanup with a before_shmem_exit handler. This makes
sure the callback is always invoked when the walsender exits.
---
src/backend/replication/basebackup.c | 182 +++++++++++++--------------
1 file changed, 90 insertions(+), 92 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 1fa4551eff..71a8b4fb4c 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -243,6 +243,8 @@ perform_base_backup(basebackup_options *opt)
StringInfo tblspc_map_file = NULL;
int datadirpathlen;
List *tablespaces = NIL;
+ ListCell *lc;
+ tablespaceinfo *ti;
datadirpathlen = strlen(DataDir);
@@ -261,121 +263,117 @@ perform_base_backup(basebackup_options *opt)
/*
* Once do_pg_start_backup has been called, ensure that any failure causes
* us to abort the backup so we don't "leak" a backup counter. For this
- * reason, *all* functionality between do_pg_start_backup() and the end of
- * do_pg_stop_backup() should be inside the error cleanup block!
+ * reason, register base_backup_cleanup() as a before_shmem_exit handler.
+ * This makes sure the callback is always invoked when the process exits.
+ * On success, do_pg_stop_backup() will already have taken the system out
+ * of backup mode, so the callback has no effect; otherwise it performs
+ * the required cleanup.
*/
+ before_shmem_exit(base_backup_cleanup, (Datum) 0);
- PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
- {
- ListCell *lc;
- tablespaceinfo *ti;
-
- SendXlogRecPtrResult(startptr, starttli);
+ SendXlogRecPtrResult(startptr, starttli);
- /*
- * Calculate the relative path of temporary statistics directory in
- * order to skip the files which are located in that directory later.
- */
- if (is_absolute_path(pgstat_stat_directory) &&
- strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
- statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
- else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
- statrelpath = psprintf("./%s", pgstat_stat_directory);
- else
- statrelpath = pgstat_stat_directory;
-
- /* Add a node for the base directory at the end */
- ti = palloc0(sizeof(tablespaceinfo));
- ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
- tablespaces = lappend(tablespaces, ti);
+ /*
+ * Calculate the relative path of temporary statistics directory in
+ * order to skip the files which are located in that directory later.
+ */
+ if (is_absolute_path(pgstat_stat_directory) &&
+ strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
+ else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory);
+ else
+ statrelpath = pgstat_stat_directory;
- /* Send tablespace header */
- SendBackupHeader(tablespaces);
+ /* Add a node for the base directory at the end */
+ ti = palloc0(sizeof(tablespaceinfo));
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
+ tablespaces = lappend(tablespaces, ti);
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+ /* Send tablespace header */
+ SendBackupHeader(tablespaces);
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
+ /* Setup and activate network throttling, if client requested it */
+ if (opt->maxrate > 0)
+ {
+ throttling_sample =
+ (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
- /* Send off our tablespaces one by one */
- foreach(lc, tablespaces)
- {
- tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
- StringInfoData buf;
+ /* Enable throttling. */
+ throttling_counter = 0;
- /* Send CopyOutResponse message */
- pq_beginmessage(&buf, 'H');
- pq_sendbyte(&buf, 0); /* overall format */
- pq_sendint16(&buf, 0); /* natts */
- pq_endmessage(&buf);
+ /* The 'real data' starts now (header was ignored). */
+ throttled_last = GetCurrentTimestamp();
+ }
+ else
+ {
+ /* Disable throttling. */
+ throttling_counter = -1;
+ }
- if (ti->path == NULL)
- {
- struct stat statbuf;
+ /* Send off our tablespaces one by one */
+ foreach(lc, tablespaces)
+ {
+ tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
+ StringInfoData buf;
- /* In the main tar, include the backup_label first... */
- sendFileWithContent(BACKUP_LABEL_FILE, labelfile->data);
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
- /*
- * Send tablespace_map file if required and then the bulk of
- * the files.
- */
- if (tblspc_map_file && opt->sendtblspcmapfile)
- {
- sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data);
- sendDir(".", 1, false, tablespaces, false);
- }
- else
- sendDir(".", 1, false, tablespaces, true);
+ if (ti->path == NULL)
+ {
+ struct stat statbuf;
- /* ... and pg_control after everything else. */
- if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- XLOG_CONTROL_FILE)));
- sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
- }
- else
- sendTablespace(ti->path, false);
+ /* In the main tar, include the backup_label first... */
+ sendFileWithContent(BACKUP_LABEL_FILE, labelfile->data);
/*
- * If we're including WAL, and this is the main data directory we
- * don't terminate the tar stream here. Instead, we will append
- * the xlog files below and terminate it then. This is safe since
- * the main data directory is always sent *last*.
+ * Send tablespace_map file if required and then the bulk of
+ * the files.
*/
- if (opt->includewal && ti->path == NULL)
+ if (tblspc_map_file && opt->sendtblspcmapfile)
{
- Assert(lnext(tablespaces, lc) == NULL);
+ sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data);
+ sendDir(".", 1, false, tablespaces, false);
}
else
- pq_putemptymessage('c'); /* CopyDone */
+ sendDir(".", 1, false, tablespaces, true);
+
+ /* ... and pg_control after everything else. */
+ if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ XLOG_CONTROL_FILE)));
+ sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
}
+ else
+ sendTablespace(ti->path, false);
- endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
+ /*
+ * If we're including WAL, and this is the main data directory we
+ * don't terminate the tar stream here. Instead, we will append
+ * the xlog files below and terminate it then. This is safe since
+ * the main data directory is always sent *last*.
+ */
+ if (opt->includewal && ti->path == NULL)
+ {
+ Assert(lnext(tablespaces, lc) == NULL);
+ }
+ else
+ pq_putemptymessage('c'); /* CopyDone */
}
- PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+ endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
if (opt->includewal)
{
--
2.21.0 (Apple Git-122.2)
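The behavioural point of this patch, as a standalone analogy in plain C
(not backend code, added purely for illustration): a scoped guard such as
PG_ENSURE_ERROR_CLEANUP only protects the wrapped block, while an exit-time
hook like before_shmem_exit() also fires if the process dies later, which
is what a backup spanning several separate commands needs.

#include <stdio.h>
#include <stdlib.h>

static int	backup_in_progress = 0;

static void
backup_cleanup(void)
{
	/* Fires on any process exit, like a before_shmem_exit callback. */
	if (backup_in_progress)
		fprintf(stderr, "aborting orphaned backup\n");
}

int
main(void)
{
	atexit(backup_cleanup);		/* register once, for the whole process */

	backup_in_progress = 1;		/* analogous to START_BACKUP */
	/* ... many commands may run here; an error exit is still covered ... */
	backup_in_progress = 0;		/* analogous to STOP_BACKUP */
	return 0;
}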
Attachment: 0003-Refactor-some-basebackup-code-to-increase-reusabilit_v6.patch (application/octet-stream)
From b7d3df86224b4b3b10014d2b4a9d0d7a873f0772 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 9 Oct 2019 12:39:41 +0500
Subject: [PATCH 3/7] Refactor some basebackup code to increase reusability, in
anticipation of adding parallel backup
---
src/backend/access/transam/xlog.c | 192 +++++-----
src/backend/replication/basebackup.c | 512 ++++++++++++++-------------
src/include/access/xlog.h | 2 +
3 files changed, 371 insertions(+), 335 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3b766e66b9..451fe6c0d1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10300,10 +10300,6 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
{
bool gotUniqueStartpoint = false;
- DIR *tblspcdir;
- struct dirent *de;
- tablespaceinfo *ti;
- int datadirpathlen;
/*
* Force an XLOG file switch before the checkpoint, to ensure that the
@@ -10429,93 +10425,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
if (exclusive)
tblspcmapfile = makeStringInfo();
- datadirpathlen = strlen(DataDir);
-
- /* Collect information about all tablespaces */
- tblspcdir = AllocateDir("pg_tblspc");
- while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
- {
- char fullpath[MAXPGPATH + 10];
- char linkpath[MAXPGPATH];
- char *relpath = NULL;
- int rllen;
- StringInfoData buflinkpath;
- char *s = linkpath;
-
- /* Skip special stuff */
- if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
- continue;
-
- snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
-
-#if defined(HAVE_READLINK) || defined(WIN32)
- rllen = readlink(fullpath, linkpath, sizeof(linkpath));
- if (rllen < 0)
- {
- ereport(WARNING,
- (errmsg("could not read symbolic link \"%s\": %m",
- fullpath)));
- continue;
- }
- else if (rllen >= sizeof(linkpath))
- {
- ereport(WARNING,
- (errmsg("symbolic link \"%s\" target is too long",
- fullpath)));
- continue;
- }
- linkpath[rllen] = '\0';
-
- /*
- * Add the escape character '\\' before newline in a string to
- * ensure that we can distinguish between the newline in the
- * tablespace path and end of line while reading tablespace_map
- * file during archive recovery.
- */
- initStringInfo(&buflinkpath);
-
- while (*s)
- {
- if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
- appendStringInfoChar(&buflinkpath, '\\');
- appendStringInfoChar(&buflinkpath, *s++);
- }
-
- /*
- * Relpath holds the relative path of the tablespace directory
- * when it's located within PGDATA, or NULL if it's located
- * elsewhere.
- */
- if (rllen > datadirpathlen &&
- strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
- IS_DIR_SEP(linkpath[datadirpathlen]))
- relpath = linkpath + datadirpathlen + 1;
-
- ti = palloc(sizeof(tablespaceinfo));
- ti->oid = pstrdup(de->d_name);
- ti->path = pstrdup(buflinkpath.data);
- ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
-
- if (tablespaces)
- *tablespaces = lappend(*tablespaces, ti);
-
- appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
-
- pfree(buflinkpath.data);
-#else
-
- /*
- * If the platform does not have symbolic links, it should not be
- * possible to have tablespaces - clearly somebody else created
- * them. Warn about it and ignore.
- */
- ereport(WARNING,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("tablespaces are not supported on this platform")));
-#endif
- }
- FreeDir(tblspcdir);
+ collectTablespaces(tablespaces, tblspcmapfile, infotbssize, needtblspcmapfile);
/*
* Construct backup label file
@@ -12291,3 +12201,103 @@ XLogRequestWalReceiverReply(void)
{
doRequestWalReceiverReply = true;
}
+
+/*
+ * Collect information about all tablespaces.
+ */
+void
+collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
+ bool infotbssize, bool needtblspcmapfile)
+{
+ DIR *tblspcdir;
+ struct dirent *de;
+ tablespaceinfo *ti;
+ int datadirpathlen;
+
+ datadirpathlen = strlen(DataDir);
+
+ tblspcdir = AllocateDir("pg_tblspc");
+ while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
+ {
+ char fullpath[MAXPGPATH + 10];
+ char linkpath[MAXPGPATH];
+ char *relpath = NULL;
+ int rllen;
+ StringInfoData buflinkpath;
+ char *s = linkpath;
+
+ /* Skip special stuff */
+ if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
+ continue;
+
+ snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
+
+#if defined(HAVE_READLINK) || defined(WIN32)
+ rllen = readlink(fullpath, linkpath, sizeof(linkpath));
+ if (rllen < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not read symbolic link \"%s\": %m",
+ fullpath)));
+ continue;
+ }
+ else if (rllen >= sizeof(linkpath))
+ {
+ ereport(WARNING,
+ (errmsg("symbolic link \"%s\" target is too long",
+ fullpath)));
+ continue;
+ }
+ linkpath[rllen] = '\0';
+
+ /*
+ * Add the escape character '\\' before newline in a string to
+ * ensure that we can distinguish between the newline in the
+ * tablespace path and end of line while reading tablespace_map
+ * file during archive recovery.
+ */
+ initStringInfo(&buflinkpath);
+
+ while (*s)
+ {
+ if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
+ appendStringInfoChar(&buflinkpath, '\\');
+ appendStringInfoChar(&buflinkpath, *s++);
+ }
+
+ /*
+ * Relpath holds the relative path of the tablespace directory
+ * when it's located within PGDATA, or NULL if it's located
+ * elsewhere.
+ */
+ if (rllen > datadirpathlen &&
+ strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
+ IS_DIR_SEP(linkpath[datadirpathlen]))
+ relpath = linkpath + datadirpathlen + 1;
+
+ ti = palloc(sizeof(tablespaceinfo));
+ ti->oid = pstrdup(de->d_name);
+ ti->path = pstrdup(buflinkpath.data);
+ ti->rpath = relpath ? pstrdup(relpath) : NULL;
+ ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+
+ if (tablespaces)
+ *tablespaces = lappend(*tablespaces, ti);
+
+ appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
+
+ pfree(buflinkpath.data);
+#else
+
+ /*
+ * If the platform does not have symbolic links, it should not be
+ * possible to have tablespaces - clearly somebody else created
+ * them. Warn about it and ignore.
+ */
+ ereport(WARNING,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("tablespaces are not supported on this platform")));
+#endif
+ }
+ FreeDir(tblspcdir);
+}
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 267163ed29..b679f36021 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -67,10 +67,12 @@ static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void base_backup_cleanup(int code, Datum arg);
static void perform_base_backup(basebackup_options *opt);
+static void include_wal_files(XLogRecPtr endptr);
static void parse_basebackup_options(List *options, basebackup_options *opt);
static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
static void throttle(size_t increment);
+static void setup_throttle(int maxrate);
static bool is_checksummed_file(const char *fullpath, const char *filename);
/* Was the backup currently in-progress initiated in recovery mode? */
@@ -293,29 +295,7 @@ perform_base_backup(basebackup_options *opt)
/* Send tablespace header */
SendBackupHeader(tablespaces);
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
-
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
-
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ setup_throttle(opt->maxrate);
/* Send off our tablespaces one by one */
foreach(lc, tablespaces)
@@ -381,227 +361,7 @@ perform_base_backup(basebackup_options *opt)
* We've left the last tar file "open", so we can now append the
* required WAL files to it.
*/
- char pathbuf[MAXPGPATH];
- XLogSegNo segno;
- XLogSegNo startsegno;
- XLogSegNo endsegno;
- struct stat statbuf;
- List *historyFileList = NIL;
- List *walFileList = NIL;
- char firstoff[MAXFNAMELEN];
- char lastoff[MAXFNAMELEN];
- DIR *dir;
- struct dirent *de;
- ListCell *lc;
- TimeLineID tli;
-
- /*
- * I'd rather not worry about timelines here, so scan pg_wal and
- * include all WAL files in the range between 'startptr' and 'endptr',
- * regardless of the timeline the file is stamped with. If there are
- * some spurious WAL files belonging to timelines that don't belong in
- * this server's history, they will be included too. Normally there
- * shouldn't be such files, but if there are, there's little harm in
- * including them.
- */
- XLByteToSeg(startptr, startsegno, wal_segment_size);
- XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
- XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
- XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
-
- dir = AllocateDir("pg_wal");
- while ((de = ReadDir(dir, "pg_wal")) != NULL)
- {
- /* Does it look like a WAL segment, and is it in the range? */
- if (IsXLogFileName(de->d_name) &&
- strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
- strcmp(de->d_name + 8, lastoff + 8) <= 0)
- {
- walFileList = lappend(walFileList, pstrdup(de->d_name));
- }
- /* Does it look like a timeline history file? */
- else if (IsTLHistoryFileName(de->d_name))
- {
- historyFileList = lappend(historyFileList, pstrdup(de->d_name));
- }
- }
- FreeDir(dir);
-
- /*
- * Before we go any further, check that none of the WAL segments we
- * need were removed.
- */
- CheckXLogRemoved(startsegno, ThisTimeLineID);
-
- /*
- * Sort the WAL filenames. We want to send the files in order from
- * oldest to newest, to reduce the chance that a file is recycled
- * before we get a chance to send it over.
- */
- list_sort(walFileList, compareWalFileNames);
-
- /*
- * There must be at least one xlog file in the pg_wal directory, since
- * we are doing backup-including-xlog.
- */
- if (walFileList == NIL)
- ereport(ERROR,
- (errmsg("could not find any WAL files")));
-
- /*
- * Sanity check: the first and last segment should cover startptr and
- * endptr, with no gaps in between.
- */
- XLogFromFileName((char *) linitial(walFileList),
- &tli, &segno, wal_segment_size);
- if (segno != startsegno)
- {
- char startfname[MAXFNAMELEN];
-
- XLogFileName(startfname, ThisTimeLineID, startsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", startfname)));
- }
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- XLogSegNo currsegno = segno;
- XLogSegNo nextsegno = segno + 1;
-
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
- if (!(nextsegno == segno || currsegno == segno))
- {
- char nextfname[MAXFNAMELEN];
-
- XLogFileName(nextfname, ThisTimeLineID, nextsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", nextfname)));
- }
- }
- if (segno != endsegno)
- {
- char endfname[MAXFNAMELEN];
-
- XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", endfname)));
- }
-
- /* Ok, we have everything we need. Send the WAL files. */
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- FILE *fp;
- char buf[TAR_SEND_SIZE];
- size_t cnt;
- pgoff_t len = 0;
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
-
- fp = AllocateFile(pathbuf, "rb");
- if (fp == NULL)
- {
- int save_errno = errno;
-
- /*
- * Most likely reason for this is that the file was already
- * removed by a checkpoint, so check for that to get a better
- * error message.
- */
- CheckXLogRemoved(segno, tli);
-
- errno = save_errno;
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not open file \"%s\": %m", pathbuf)));
- }
-
- if (fstat(fileno(fp), &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- pathbuf)));
- if (statbuf.st_size != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* send the WAL file itself */
- _tarWriteHeader(pathbuf, NULL, &statbuf, false);
-
- while ((cnt = fread(buf, 1,
- Min(sizeof(buf), wal_segment_size - len),
- fp)) > 0)
- {
- CheckXLogRemoved(segno, tli);
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
-
- len += cnt;
- throttle(cnt);
-
- if (len == wal_segment_size)
- break;
- }
-
- CHECK_FREAD_ERROR(fp, pathbuf);
-
- if (len != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* wal_segment_size is a multiple of 512, so no need for padding */
-
- FreeFile(fp);
-
- /*
- * Mark file as archived, otherwise files can get archived again
- * after promotion of a new node. This is in line with
- * walreceiver.c always doing an XLogArchiveForceDone() after a
- * complete segment.
- */
- StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(pathbuf, "");
- }
-
- /*
- * Send timeline history files too. Only the latest timeline history
- * file is required for recovery, and even that only if there happens
- * to be a timeline switch in the first WAL segment that contains the
- * checkpoint record, or if we're taking a base backup from a standby
- * server and the target timeline changes while the backup is taken.
- * But they are small and highly useful for debugging purposes, so
- * better include them all, always.
- */
- foreach(lc, historyFileList)
- {
- char *fname = lfirst(lc);
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
-
- if (lstat(pathbuf, &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m", pathbuf)));
-
- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
-
- /* unconditionally mark file as archived */
- StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "");
- }
+ include_wal_files(endptr);
/* Send CopyDone message for the last tar file */
pq_putemptymessage('c');
@@ -1740,3 +1500,267 @@ throttle(size_t increment)
*/
throttled_last = GetCurrentTimestamp();
}
+
+/*
+ * Append the required WAL files to the backup tar file. It assumes that the
+ * last tar file is "open" and the WALs will be appended to it.
+ */
+static void
+include_wal_files(XLogRecPtr endptr)
+{
+ /*
+ * We've left the last tar file "open", so we can now append the
+ * required WAL files to it.
+ */
+ char pathbuf[MAXPGPATH];
+ XLogSegNo segno;
+ XLogSegNo startsegno;
+ XLogSegNo endsegno;
+ struct stat statbuf;
+ List *historyFileList = NIL;
+ List *walFileList = NIL;
+ char firstoff[MAXFNAMELEN];
+ char lastoff[MAXFNAMELEN];
+ DIR *dir;
+ struct dirent *de;
+ ListCell *lc;
+ TimeLineID tli;
+
+ /*
+ * I'd rather not worry about timelines here, so scan pg_wal and
+ * include all WAL files in the range between 'startptr' and 'endptr',
+ * regardless of the timeline the file is stamped with. If there are
+ * some spurious WAL files belonging to timelines that don't belong in
+ * this server's history, they will be included too. Normally there
+ * shouldn't be such files, but if there are, there's little harm in
+ * including them.
+ */
+ XLByteToSeg(startptr, startsegno, wal_segment_size);
+ XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
+ XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
+ XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
+
+ dir = AllocateDir("pg_wal");
+ while ((de = ReadDir(dir, "pg_wal")) != NULL)
+ {
+ /* Does it look like a WAL segment, and is it in the range? */
+ if (IsXLogFileName(de->d_name) &&
+ strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
+ strcmp(de->d_name + 8, lastoff + 8) <= 0)
+ {
+ walFileList = lappend(walFileList, pstrdup(de->d_name));
+ }
+ /* Does it look like a timeline history file? */
+ else if (IsTLHistoryFileName(de->d_name))
+ {
+ historyFileList = lappend(historyFileList, pstrdup(de->d_name));
+ }
+ }
+ FreeDir(dir);
+
+ /*
+ * Before we go any further, check that none of the WAL segments we
+ * need were removed.
+ */
+ CheckXLogRemoved(startsegno, ThisTimeLineID);
+
+ /*
+ * Sort the WAL filenames. We want to send the files in order from
+ * oldest to newest, to reduce the chance that a file is recycled
+ * before we get a chance to send it over.
+ */
+ list_sort(walFileList, compareWalFileNames);
+
+ /*
+ * There must be at least one xlog file in the pg_wal directory, since
+ * we are doing backup-including-xlog.
+ */
+ if (walFileList == NIL)
+ ereport(ERROR,
+ (errmsg("could not find any WAL files")));
+
+ /*
+ * Sanity check: the first and last segment should cover startptr and
+ * endptr, with no gaps in between.
+ */
+ XLogFromFileName((char *) linitial(walFileList),
+ &tli, &segno, wal_segment_size);
+ if (segno != startsegno)
+ {
+ char startfname[MAXFNAMELEN];
+
+ XLogFileName(startfname, ThisTimeLineID, startsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", startfname)));
+ }
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ XLogSegNo currsegno = segno;
+ XLogSegNo nextsegno = segno + 1;
+
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+ if (!(nextsegno == segno || currsegno == segno))
+ {
+ char nextfname[MAXFNAMELEN];
+
+ XLogFileName(nextfname, ThisTimeLineID, nextsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", nextfname)));
+ }
+ }
+ if (segno != endsegno)
+ {
+ char endfname[MAXFNAMELEN];
+
+ XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", endfname)));
+ }
+
+ /* Ok, we have everything we need. Send the WAL files. */
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ FILE *fp;
+ char buf[TAR_SEND_SIZE];
+ size_t cnt;
+ pgoff_t len = 0;
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+
+ fp = AllocateFile(pathbuf, "rb");
+ if (fp == NULL)
+ {
+ int save_errno = errno;
+
+ /*
+ * Most likely reason for this is that the file was already
+ * removed by a checkpoint, so check for that to get a better
+ * error message.
+ */
+ CheckXLogRemoved(segno, tli);
+
+ errno = save_errno;
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m", pathbuf)));
+ }
+
+ if (fstat(fileno(fp), &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ pathbuf)));
+ if (statbuf.st_size != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* send the WAL file itself */
+ _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+
+ while ((cnt = fread(buf, 1,
+ Min(sizeof(buf), wal_segment_size - len),
+ fp)) > 0)
+ {
+ CheckXLogRemoved(segno, tli);
+ /* Send the chunk as a CopyData message */
+ if (pq_putmessage('d', buf, cnt))
+ ereport(ERROR,
+ (errmsg("base backup could not send data, aborting backup")));
+
+ len += cnt;
+ throttle(cnt);
+
+ if (len == wal_segment_size)
+ break;
+ }
+
+ CHECK_FREAD_ERROR(fp, pathbuf);
+
+ if (len != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* wal_segment_size is a multiple of 512, so no need for padding */
+
+ FreeFile(fp);
+
+ /*
+ * Mark file as archived, otherwise files can get archived again
+ * after promotion of a new node. This is in line with
+ * walreceiver.c always doing an XLogArchiveForceDone() after a
+ * complete segment.
+ */
+ StatusFilePath(pathbuf, walFileName, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+
+ /*
+ * Send timeline history files too. Only the latest timeline history
+ * file is required for recovery, and even that only if there happens
+ * to be a timeline switch in the first WAL segment that contains the
+ * checkpoint record, or if we're taking a base backup from a standby
+ * server and the target timeline changes while the backup is taken.
+ * But they are small and highly useful for debugging purposes, so
+ * better include them all, always.
+ */
+ foreach(lc, historyFileList)
+ {
+ char *fname = lfirst(lc);
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
+
+ if (lstat(pathbuf, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m", pathbuf)));
+
+ sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
+
+ /* unconditionally mark file as archived */
+ StatusFilePath(pathbuf, fname, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+}
+
+/*
+ * Setup and activate network throttling, if client requested it
+ */
+static void
+setup_throttle(int maxrate)
+{
+ if (maxrate > 0)
+ {
+ throttling_sample =
+ (int64) maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
+
+ /* Enable throttling. */
+ throttling_counter = 0;
+
+ /* The 'real data' starts now (header was ignored). */
+ throttled_last = GetCurrentTimestamp();
+ }
+ else
+ {
+ /* Disable throttling. */
+ throttling_counter = -1;
+ }
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..5b0aa8ae85 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -350,6 +350,8 @@ extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
bool needtblspcmapfile);
extern XLogRecPtr do_pg_stop_backup(char *labelfile, bool waitforarchive,
TimeLineID *stoptli_p);
+extern void collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
+ bool infotbssize, bool needtblspcmapfile);
extern void do_pg_abort_backup(void);
extern SessionBackupState get_backup_status(void);
--
2.21.0 (Apple Git-122.2)
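The point of this refactoring is that the tablespace scan becomes callable
outside do_pg_start_backup(). As a usage sketch (backend context assumed,
with DataDir set; this mirrors the call the refactored do_pg_start_backup()
now makes):

    List       *tablespaces = NIL;
    StringInfo  tblspcmapfile = makeStringInfo();

    /* fills "tablespaces" and appends "<oid> <path>" lines to the map file */
    collectTablespaces(&tablespaces, tblspcmapfile,
                       false,   /* infotbssize: skip size estimation */
                       true);   /* needtblspcmapfile: escape newlines */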
0005-pg_basebackup-changes-for-parallel-backup_v6.patch
From d4663820abbec2944e5c65500e574691a251e170 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Mon, 14 Oct 2019 17:28:58 +0500
Subject: [PATCH 5/7] pg_basebackup changes for parallel backup.
---
src/bin/pg_basebackup/pg_basebackup.c | 736 ++++++++++++++++++++++++--
1 file changed, 689 insertions(+), 47 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index a9d162a7da..f63c106130 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -19,6 +19,7 @@
#include <sys/wait.h>
#include <signal.h>
#include <time.h>
+#include <pthread.h>
#ifdef HAVE_SYS_SELECT_H
#include <sys/select.h>
#endif
@@ -41,6 +42,7 @@
#include "receivelog.h"
#include "replication/basebackup.h"
#include "streamutil.h"
+#include "fe_utils/simple_list.h"
#define ERRCODE_DATA_CORRUPTED "XX001"
@@ -57,6 +59,57 @@ typedef struct TablespaceList
TablespaceListCell *tail;
} TablespaceList;
+typedef struct
+{
+ char path[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+
+ int tsIndex; /* index of tsInfo this file belongs to. */
+} BackupFile;
+
+typedef struct
+{
+ Oid tblspcOid;
+ char *tablespace; /* tablespace name or NULL if 'base' tablespace */
+ int numFiles; /* number of files */
+ BackupFile *backupFiles; /* list of files in a tablespace */
+} TablespaceInfo;
+
+typedef struct
+{
+ int tablespacecount;
+ int totalfiles;
+ int numWorkers;
+
+ char xlogstart[64];
+ char *backup_label;
+ char *tablespace_map;
+
+ TablespaceInfo *tsInfo;
+ BackupFile **files; /* list of BackupFile pointers */
+ int fileIndex; /* index of file to be fetched */
+
+ PGconn **workerConns;
+} BackupInfo;
+
+typedef struct
+{
+ BackupInfo *backupInfo;
+ uint64 bytesRead;
+
+ int workerid;
+ pthread_t worker;
+
+ bool terminated;
+} WorkerState;
+
+BackupInfo *backupInfo = NULL;
+WorkerState *workers = NULL;
+
+static pthread_mutex_t fetch_mutex = PTHREAD_MUTEX_INITIALIZER;
+
/*
* pg_xlog has been renamed to pg_wal in version 10. This version number
* should be compared with PQserverVersion().
@@ -110,6 +163,9 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
+static int numWorkers = 1;
+static PGresult *tablespacehdr;
+
/* Progress counters */
static uint64 totalsize_kb;
static uint64 totaldone;
@@ -140,9 +196,10 @@ static PQExpBuffer recoveryconfcontents = NULL;
static void usage(void);
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
static void progress_report(int tablespacenum, const char *filename, bool force);
+static void workers_progress_report(uint64 totalBytesRead, const char *filename, bool force);
static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
-static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
+static int ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
static void BaseBackup(void);
static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
@@ -151,6 +208,17 @@ static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
static const char *get_tablespace_mapping(const char *dir);
static void tablespace_list_append(const char *arg);
+static void ParallelBackupRun(BackupInfo *backupInfo);
+static void StopBackup(BackupInfo *backupInfo);
+static void GetBackupFileList(PGconn *conn, BackupInfo *backupInfo);
+static int GetBackupFile(WorkerState *wstate);
+static BackupFile *getNextFile(BackupInfo *backupInfo);
+static int compareFileSize(const void *a, const void *b);
+static void read_label_tblspcmap(PGconn *conn, char **backup_label, char **tablespace_map);
+static void create_backup_dirs(bool basetablespace, char *tablespace, char *name);
+static void writefile(char *path, char *buf);
+static void *workerRun(void *arg);
+
static void
cleanup_directories_atexit(void)
@@ -202,6 +270,17 @@ cleanup_directories_atexit(void)
static void
disconnect_atexit(void)
{
+ /* close worker connections */
+ if (backupInfo && backupInfo->workerConns != NULL)
+ {
+ int i;
+ for (i = 0; i < numWorkers; i++)
+ {
+ if (backupInfo->workerConns[i] != NULL)
+ PQfinish(backupInfo->workerConns[i]);
+ }
+ }
+
if (conn != NULL)
PQfinish(conn);
}
@@ -349,6 +428,7 @@ usage(void)
printf(_(" --no-slot prevent creation of temporary replication slot\n"));
printf(_(" --no-verify-checksums\n"
" do not verify checksums\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to backup\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nConnection options:\n"));
printf(_(" -d, --dbname=CONNSTR connection string\n"));
@@ -695,6 +775,93 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
}
}
+/*
+ * Print a progress report for the worker threads. If verbose output
+ * is enabled, also print the current file name.
+ *
+ * Progress report is written at maximum once per second, unless the
+ * force parameter is set to true.
+ */
+static void
+workers_progress_report(uint64 totalBytesRead, const char *filename, bool force)
+{
+ int percent;
+ char totalBytesRead_str[32];
+ char totalsize_str[32];
+ pg_time_t now;
+
+ if (!showprogress)
+ return;
+
+ now = time(NULL);
+ if (now == last_progress_report && !force)
+ return; /* Max once per second */
+
+ last_progress_report = now;
+ percent = totalsize_kb ? (int) ((totalBytesRead / 1024) * 100 / totalsize_kb) : 0;
+
+ /*
+ * Avoid overflowing past 100% or the full size. This may make the total
+ * size number change as we approach the end of the backup (the estimate
+ * will always be wrong if WAL is included), but that's better than having
+ * the done column be bigger than the total.
+ */
+ if (percent > 100)
+ percent = 100;
+ if (totalBytesRead / 1024 > totalsize_kb)
+ totalsize_kb = totalBytesRead / 1024;
+
+ /*
+ * Separate step to keep platform-dependent format code out of
+ * translatable strings. And we only test for INT64_FORMAT availability
+ * in snprintf, not fprintf.
+ */
+ snprintf(totalBytesRead_str, sizeof(totalBytesRead_str), INT64_FORMAT,
+ totalBytesRead / 1024);
+ snprintf(totalsize_str, sizeof(totalsize_str), INT64_FORMAT, totalsize_kb);
+
+#define VERBOSE_FILENAME_LENGTH 35
+
+ if (verbose)
+ {
+ if (!filename)
+
+ /*
+ * No filename given, so clear the status line (used for last
+ * call)
+ */
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied %*s"),
+ (int) strlen(totalsize_str),
+ totalBytesRead_str, totalsize_str,
+ percent,
+ VERBOSE_FILENAME_LENGTH + 5, "");
+ else
+ {
+ bool truncate = (strlen(filename) > VERBOSE_FILENAME_LENGTH);
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied, current file (%s%-*.*s)"),
+ (int) strlen(totalsize_str), totalBytesRead_str, totalsize_str,
+ percent,
+ /* Prefix with "..." if we do leading truncation */
+ truncate ? "..." : "",
+ truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
+ truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
+ /* Truncate filename at beginning if it's too long */
+ truncate ? filename + strlen(filename) - VERBOSE_FILENAME_LENGTH + 3 : filename);
+ }
+ }
+ else
+ {
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied"),
+ (int) strlen(totalsize_str),
+ totalBytesRead_str, totalsize_str,
+ percent);
+ }
+
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\r");
+ else
+ fprintf(stderr, "\n");
+}
/*
* Print a progress report based on the global variables. If verbose output
@@ -711,7 +878,7 @@ progress_report(int tablespacenum, const char *filename, bool force)
char totalsize_str[32];
pg_time_t now;
- if (!showprogress)
+ if (!showprogress || numWorkers > 1)
return;
now = time(NULL);
@@ -1381,7 +1548,7 @@ get_tablespace_mapping(const char *dir)
* specified directory. If it's for another tablespace, it will be restored
* in the original or mapped directory.
*/
-static void
+static int
ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
{
char current_path[MAXPGPATH];
@@ -1392,6 +1559,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
bool basetablespace;
char *copybuf = NULL;
FILE *file = NULL;
+ int readBytes = 0;
basetablespace = PQgetisnull(res, rownum, 0);
if (basetablespace)
@@ -1455,7 +1623,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
pg_log_error("invalid tar block header size: %d", r);
exit(1);
}
- totaldone += 512;
+ readBytes += 512;
current_len_left = read_tar_number(&copybuf[124], 12);
@@ -1486,21 +1654,14 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
* Directory
*/
filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */
+
+ /*
+ * In parallel mode, we create directories before fetching
+ * files, so it's OK if a directory already exists.
+ */
if (mkdir(filename, pg_dir_create_mode) != 0)
{
- /*
- * When streaming WAL, pg_wal (or pg_xlog for pre-9.6
- * clusters) will have been created by the wal
- * receiver process. Also, when the WAL directory
- * location was specified, pg_wal (or pg_xlog) has
- * already been created as a symbolic link before
- * starting the actual backup. So just ignore creation
- * failures on related directories.
- */
- if (!((pg_str_endswith(filename, "/pg_wal") ||
- pg_str_endswith(filename, "/pg_xlog") ||
- pg_str_endswith(filename, "/archive_status")) &&
- errno == EEXIST))
+ if (errno != EEXIST)
{
pg_log_error("could not create directory \"%s\": %m",
filename);
@@ -1585,7 +1746,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
*/
fclose(file);
file = NULL;
- totaldone += r;
+ readBytes += r;
continue;
}
@@ -1594,7 +1755,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
pg_log_error("could not write to file \"%s\": %m", filename);
exit(1);
}
- totaldone += r;
+ readBytes += r;
+ totaldone = readBytes;
progress_report(rownum, filename, false);
current_len_left -= r;
@@ -1622,13 +1784,11 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
if (copybuf != NULL)
PQfreemem(copybuf);
- if (basetablespace && writerecoveryconf)
- WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
-
/*
* No data is synced here, everything is done for all tablespaces at the
* end.
*/
+ return readBytes;
}
@@ -1715,16 +1875,28 @@ BaseBackup(void)
fprintf(stderr, "\n");
}
- basebkp =
- psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
- escaped_label,
- showprogress ? "PROGRESS" : "",
- includewal == FETCH_WAL ? "WAL" : "",
- fastcheckpoint ? "FAST" : "",
- includewal == NO_WAL ? "" : "NOWAIT",
- maxrate_clause ? maxrate_clause : "",
- format == 't' ? "TABLESPACE_MAP" : "",
- verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ if (numWorkers <= 1)
+ {
+ basebkp =
+ psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
+ escaped_label,
+ showprogress ? "PROGRESS" : "",
+ includewal == FETCH_WAL ? "WAL" : "",
+ fastcheckpoint ? "FAST" : "",
+ includewal == NO_WAL ? "" : "NOWAIT",
+ maxrate_clause ? maxrate_clause : "",
+ format == 't' ? "TABLESPACE_MAP" : "",
+ verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ }
+ else
+ {
+ basebkp =
+ psprintf("START_BACKUP LABEL '%s' %s %s %s",
+ escaped_label,
+ showprogress ? "PROGRESS" : "",
+ fastcheckpoint ? "FAST" : "",
+ format == 't' ? "TABLESPACE_MAP" : "");
+ }
if (PQsendQuery(conn, basebkp) == 0)
{
@@ -1774,7 +1946,7 @@ BaseBackup(void)
/*
* Get the header
*/
- res = PQgetResult(conn);
+ tablespacehdr = res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
pg_log_error("could not get backup header: %s",
@@ -1830,24 +2002,74 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /*
- * Start receiving chunks
- */
- for (i = 0; i < PQntuples(res); i++)
+ if (numWorkers > 1)
{
- if (format == 't')
- ReceiveTarFile(conn, res, i);
- else
- ReceiveAndUnpackTarFile(conn, res, i);
- } /* Loop over all tablespaces */
+ int j = 0,
+ k = 0;
- if (showprogress)
+ backupInfo = palloc0(sizeof(BackupInfo));
+ backupInfo->workerConns = (PGconn **) palloc0(sizeof(PGconn *) * numWorkers);
+ backupInfo->tablespacecount = tablespacecount;
+ backupInfo->numWorkers = numWorkers;
+ strlcpy(backupInfo->xlogstart, xlogstart, sizeof(backupInfo->xlogstart));
+
+ read_label_tblspcmap(conn, &backupInfo->backup_label, &backupInfo->tablespace_map);
+
+ /* Retrieve the backup file list from the server. */
+ GetBackupFileList(conn, backupInfo);
+
+ /*
+ * Add backup_label to the backup. (For tar format, ReceiveTarFile()
+ * will take care of it.)
+ */
+ if (format == 'p')
+ writefile("backup_label", backupInfo->backup_label);
+
+ /*
+ * Flatten the file list to avoid unnecessary locking and to allow
+ * sequential access (an array of BackupFile structure pointers).
+ */
+ backupInfo->files =
+ (BackupFile **) palloc0(sizeof(BackupFile *) * backupInfo->totalfiles);
+ for (i = 0; i < backupInfo->tablespacecount; i++)
+ {
+ TablespaceInfo *curTsInfo = &backupInfo->tsInfo[i];
+
+ for (j = 0; j < curTsInfo->numFiles; j++)
+ {
+ backupInfo->files[k] = &curTsInfo->backupFiles[j];
+ k++;
+ }
+ }
+
+ ParallelBackupRun(backupInfo);
+ StopBackup(backupInfo);
+ }
+ else
{
- progress_report(PQntuples(res), NULL, true);
- if (isatty(fileno(stderr)))
- fprintf(stderr, "\n"); /* Need to move to next line */
+ /*
+ * Start receiving chunks
+ */
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ if (format == 't')
+ ReceiveTarFile(conn, res, i);
+ else
+ ReceiveAndUnpackTarFile(conn, res, i);
+ } /* Loop over all tablespaces */
+
+ if (showprogress)
+ {
+ progress_report(PQntuples(tablespacehdr), NULL, true);
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\n"); /* Need to move to next line */
+ }
}
+ /* Write recovery contents */
+ if (format == 'p' && writerecoveryconf)
+ WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
+
PQclear(res);
/*
@@ -2043,6 +2265,7 @@ main(int argc, char **argv)
{"waldir", required_argument, NULL, 1},
{"no-slot", no_argument, NULL, 2},
{"no-verify-checksums", no_argument, NULL, 3},
+ {"jobs", required_argument, NULL, 'j'},
{NULL, 0, NULL, 0}
};
int c;
@@ -2070,7 +2293,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvPj:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2211,6 +2434,9 @@ main(int argc, char **argv)
case 3:
verify_checksums = false;
break;
+ case 'j': /* number of jobs */
+ numWorkers = atoi(optarg);
+ break;
default:
/*
@@ -2325,6 +2551,22 @@ main(int argc, char **argv)
}
}
+ if (numWorkers <= 0)
+ {
+ pg_log_error("invalid number of parallel jobs");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (format != 'p' && numWorkers > 1)
+ {
+ pg_log_error("parallel jobs are only supported with 'plain' format");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
#ifndef HAVE_LIBZ
if (compresslevel != 0)
{
@@ -2397,3 +2639,403 @@ main(int argc, char **argv)
success = true;
return 0;
}
+
+/*
+ * Thread worker
+ */
+static void *
+workerRun(void *arg)
+{
+ WorkerState *wstate = (WorkerState *) arg;
+
+ GetBackupFile(wstate);
+
+ wstate->terminated = true;
+ return NULL;
+}
+
+/*
+ * Run the worker threads and update the progress report until all workers
+ * have terminated.
+ */
+static void
+ParallelBackupRun(BackupInfo *backupInfo)
+{
+ int status,
+ i;
+ bool threadsActive = true;
+ uint64 totalBytes = 0;
+
+ workers = (WorkerState *) palloc0(sizeof(WorkerState) * numWorkers);
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ WorkerState *worker = &workers[i];
+
+ worker->backupInfo = backupInfo;
+ worker->workerid = i;
+ worker->bytesRead = 0;
+ worker->terminated = false;
+
+ backupInfo->workerConns[i] = GetConnection();
+ status = pthread_create(&worker->worker, NULL, workerRun, worker);
+ if (status != 0)
+ {
+ pg_log_error("failed to create thread: %m");
+ exit(1);
+ }
+
+ if (verbose)
+ pg_log_info("backup worker (%d) created, %d", i, status);
+ }
+
+ /*
+ * This is the main loop for updating progress. It waits for the workers
+ * to complete, collecting updated status on every iteration.
+ */
+ while(threadsActive)
+ {
+ char *filename = NULL;
+
+ threadsActive = false;
+ totalBytes = 0;
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ WorkerState *worker = &workers[i];
+
+ totalBytes += worker->bytesRead;
+ threadsActive |= !worker->terminated;
+ }
+
+ if (backupInfo->fileIndex < backupInfo->totalfiles)
+ filename = backupInfo->files[backupInfo->fileIndex]->path;
+
+ workers_progress_report(totalBytes, filename, false);
+ pg_usleep(100000);
+ }
+
+ if (showprogress)
+ {
+ workers_progress_report(totalBytes, NULL, true);
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\n"); /* Need to move to next line */
+ }
+}
+
+/*
+ * Take the system out of backup mode.
+ */
+static void
+StopBackup(BackupInfo *backupInfo)
+{
+ PGresult *res = NULL;
+ char *basebkp;
+
+ basebkp = psprintf("STOP_BACKUP LABEL '%s' %s %s",
+ backupInfo->backup_label,
+ includewal == FETCH_WAL ? "WAL" : "",
+ includewal == NO_WAL ? "" : "NOWAIT");
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not execute STOP BACKUP \"%s\"",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /* receive pg_control and wal files */
+ ReceiveAndUnpackTarFile(conn, res, tablespacecount);
+ PQclear(res);
+}
+
+/*
+ * Retrieve the backup file list from the server and populate the
+ * TablespaceInfo structs to keep track of each tablespace's files.
+ */
+static void
+GetBackupFileList(PGconn *conn, BackupInfo *backupInfo)
+{
+ TablespaceInfo *tsInfo;
+ PGresult *res = NULL;
+ char *basebkp;
+ int i;
+
+ backupInfo->tsInfo = palloc0(sizeof(TablespaceInfo) * backupInfo->tablespacecount);
+ tsInfo = backupInfo->tsInfo;
+
+ /*
+ * Get list of files.
+ */
+ basebkp = psprintf("SEND_BACKUP_FILELIST");
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not send replication command \"%s\": %s",
+ "SEND_BACKUP_FILELIST", PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /*
+ * The list of files is grouped by tablespaces, and we want to fetch them
+ * in the same order.
+ */
+ for (i = 0; i < backupInfo->tablespacecount; i++)
+ {
+ bool basetablespace;
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get backup header: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
+ basetablespace = PQgetisnull(tablespacehdr, i, 0);
+ tsInfo[i].tblspcOid = atol(PQgetvalue(tablespacehdr, i, 0));
+ tsInfo[i].tablespace = PQgetvalue(tablespacehdr, i, 1);
+ tsInfo[i].numFiles = PQntuples(res);
+ tsInfo[i].backupFiles = palloc0(sizeof(BackupFile) * tsInfo[i].numFiles);
+
+ /* keep count of all files in backup */
+ backupInfo->totalfiles += tsInfo[i].numFiles;
+
+ for (int j = 0; j < tsInfo[i].numFiles; j++)
+ {
+ char *path = PQgetvalue(res, j, 0);
+ char type = PQgetvalue(res, j, 1)[0];
+ int32 size = atol(PQgetvalue(res, j, 2));
+ time_t mtime = atol(PQgetvalue(res, j, 3));
+
+ /*
+ * In 'plain' format, create backup directories first.
+ */
+ if (format == 'p' && type == 'd')
+ create_backup_dirs(basetablespace, tsInfo[i].tablespace, path);
+
+ strlcpy(tsInfo[i].backupFiles[j].path, path, MAXPGPATH);
+ tsInfo[i].backupFiles[j].type = type;
+ tsInfo[i].backupFiles[j].size = size;
+ tsInfo[i].backupFiles[j].mtime = mtime;
+ tsInfo[i].backupFiles[j].tsIndex = i;
+ }
+
+ /* sort files in descending order, based on size */
+ qsort(tsInfo[i].backupFiles, tsInfo[i].numFiles,
+ sizeof(BackupFile), &compareFileSize);
+ PQclear(res);
+ }
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data: %s", PQerrorMessage(conn));
+ exit(1);
+ }
+ res = PQgetResult(conn);
+}
+
+/*
+ * Retrieve backup files from the server and write them out. The file list
+ * is provided via the worker state; files are pulled from it one at a time
+ * and written to the backup directory.
+ */
+static int
+GetBackupFile(WorkerState *wstate)
+{
+ PGresult *res = NULL;
+ PGconn *worker_conn = NULL;
+ BackupFile *fetchFile = NULL;
+ BackupInfo *backupInfo = NULL;
+
+ backupInfo = wstate->backupInfo;
+ worker_conn = backupInfo->workerConns[wstate->workerid];
+ while ((fetchFile = getNextFile(backupInfo)) != NULL)
+ {
+ PQExpBuffer buf = createPQExpBuffer();
+
+ /*
+ * build query in form of: SEND_BACKUP_FILES ('base/1/1245/32683',
+ * 'base/1/1245/32683', ...) [options]
+ */
+ appendPQExpBuffer(buf, "SEND_BACKUP_FILES ( '%s' )", fetchFile->path);
+
+ /* add options */
+ appendPQExpBuffer(buf, " START_WAL_LOCATION '%s' %s %s",
+ backupInfo->xlogstart,
+ format == 't' ? "TABLESPACE_MAP" : "",
+ verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ if (maxrate > 0)
+ appendPQExpBuffer(buf, " MAX_RATE %u", maxrate);
+
+ if (!worker_conn)
+ return 1;
+
+ if (PQsendQuery(worker_conn, buf->data) == 0)
+ {
+ pg_log_error("could not send files list \"%s\"",
+ PQerrorMessage(worker_conn));
+ return 1;
+ }
+
+ destroyPQExpBuffer(buf);
+
+ /* process file contents, also count bytesRead for progress */
+ wstate->bytesRead +=
+ ReceiveAndUnpackTarFile(worker_conn, tablespacehdr, fetchFile->tsIndex);
+
+ res = PQgetResult(worker_conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data stream: %s",
+ PQerrorMessage(worker_conn));
+ exit(1);
+ }
+
+ res = PQgetResult(worker_conn);
+ }
+
+ PQclear(res);
+ return 0;
+}
+
+/*
+ * Fetch and increment fileIndex under the mutex so that concurrent workers
+ * never pick up the same file twice or accidentally skip one.
+ */
+static BackupFile*
+getNextFile(BackupInfo *backupInfo)
+{
+ int fileIndex = 0;
+
+ pthread_mutex_lock(&fetch_mutex);
+ fileIndex = backupInfo->fileIndex++;
+ pthread_mutex_unlock(&fetch_mutex);
+
+ if (fileIndex >= backupInfo->totalfiles)
+ return NULL;
+
+ return backupInfo->files[fileIndex];
+}
+
+/* qsort comparator for BackupFile (sort descending order) */
+static int
+compareFileSize(const void *a, const void *b)
+{
+ const BackupFile *v1 = (BackupFile *) a;
+ const BackupFile *v2 = (BackupFile *) b;
+
+ if (v1->size > v2->size)
+ return -1;
+ if (v1->size < v2->size)
+ return 1;
+
+ return 0;
+}
+
+static void
+read_label_tblspcmap(PGconn *conn, char **backuplabel, char **tblspc_map)
+{
+ PGresult *res = NULL;
+
+ Assert(backuplabel != NULL);
+ Assert(tblspc_map != NULL);
+
+ /*
+ * Get Backup label and tablespace map data.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get data: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
+ *backuplabel = PQgetvalue(res, 0, 0); /* backup_label */
+ if (!PQgetisnull(res, 0, 1))
+ *tblspc_map = PQgetvalue(res, 0, 1); /* tablespace_map */
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ res = PQgetResult(conn);
+ PQclear(res);
+}
+
+/*
+ * Create backup directories, taking the tablespace path into account. If a
+ * tablespace mapping (with -T) is given, the directory is created at the mapped
+ * path.
+ */
+static void
+create_backup_dirs(bool basetablespace, char *tablespace, char *name)
+{
+ char dirpath[MAXPGPATH];
+
+ Assert(name != NULL);
+
+ if (basetablespace)
+ snprintf(dirpath, sizeof(dirpath), "%s/%s", basedir, name);
+ else
+ {
+ Assert(tablespace != NULL);
+ snprintf(dirpath, sizeof(dirpath), "%s/%s",
+ get_tablespace_mapping(tablespace), (name + strlen(tablespace) + 1));
+ }
+
+ if (pg_mkdir_p(dirpath, pg_dir_create_mode) != 0)
+ {
+ if (errno != EEXIST)
+ {
+ pg_log_error("could not create directory \"%s\": %m",
+ dirpath);
+ exit(1);
+ }
+ }
+}
+
+/*
+ * General function for writing to a file; creates one if it doesn't exist
+ */
+static void
+writefile(char *path, char *buf)
+{
+ FILE *f;
+ char pathbuf[MAXPGPATH];
+
+ snprintf(pathbuf, MAXPGPATH, "%s/%s", basedir, path);
+ f = fopen(pathbuf, "w");
+ if (f == NULL)
+ {
+ pg_log_error("could not open file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fwrite(buf, strlen(buf), 1, f) != 1)
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fclose(f))
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+}
--
2.21.0 (Apple Git-122.2)
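To make the interaction concrete, the replication-command sequence in this
version is sketched below with plain libpq calls. The label, file name and
LSN are placeholders; the real strings are built in BaseBackup(),
GetBackupFile() and StopBackup():

    /* once, on the main connection */
    PQsendQuery(conn, "START_BACKUP LABEL 'fullbackup' FAST");
    /* ... returns backup_label and tablespace_map contents ... */

    PQsendQuery(conn, "SEND_BACKUP_FILELIST");
    /* ... one result set per tablespace: (path, type, size, mtime) ... */

    /* on each worker connection, repeated until the shared list is empty */
    PQsendQuery(wconn, "SEND_BACKUP_FILES ( 'base/1/2610' ) "
                       "START_WAL_LOCATION '0/2000028'");

    /* once, on the main connection, after all workers have joined */
    PQsendQuery(conn, "STOP_BACKUP LABEL '<backup_label contents>' NOWAIT");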
0002-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb_v6.patch
From 3e02fe0d50c1dfdcb708e105af475ed53877efd0 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 30 Oct 2019 16:45:28 +0500
Subject: [PATCH 2/7] Rename sizeonly to dryrun for few functions in
basebackup.
---
src/backend/replication/basebackup.c | 44 ++++++++++++++--------------
src/include/replication/basebackup.h | 2 +-
2 files changed, 23 insertions(+), 23 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 71a8b4fb4c..267163ed29 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -54,15 +54,15 @@ typedef struct
} basebackup_options;
-static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
+static int64 sendDir(const char *path, int basepathlen, bool dryrun,
List *tablespaces, bool sendtblspclinks);
static bool sendFile(const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
static int64 _tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly);
+ struct stat *statbuf, bool dryrun);
static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly);
+ bool dryrun);
static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void base_backup_cleanup(int code, Datum arg);
@@ -959,13 +959,13 @@ sendFileWithContent(const char *filename, const char *content)
/*
* Include the tablespace directory pointed to by 'path' in the output tar
- * stream. If 'sizeonly' is true, we just calculate a total length and return
+ * stream. If 'dryrun' is true, we just calculate a total length and return
* it, without actually sending anything.
*
* Only used to send auxiliary tablespaces, not PGDATA.
*/
int64
-sendTablespace(char *path, bool sizeonly)
+sendTablespace(char *path, bool dryrun)
{
int64 size;
char pathbuf[MAXPGPATH];
@@ -995,17 +995,17 @@ sendTablespace(char *path, bool sizeonly)
}
size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
- sizeonly);
+ dryrun);
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true);
+ size += sendDir(pathbuf, strlen(path), dryrun, NIL, true);
return size;
}
/*
* Include all files from the given directory in the output tar stream. If
- * 'sizeonly' is true, we just calculate a total length and return it, without
+ * 'dryrun' is true, we just calculate a total length and return it, without
* actually sending anything.
*
* Omit any directory in the tablespaces list, to avoid backing up
@@ -1016,7 +1016,7 @@ sendTablespace(char *path, bool sizeonly)
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
+sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
bool sendtblspclinks)
{
DIR *dir;
@@ -1171,7 +1171,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
excludeFound = true;
break;
}
@@ -1187,7 +1187,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
continue;
}
@@ -1199,14 +1199,14 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(pathbuf, "./pg_wal") == 0)
{
/* If pg_wal is a symlink, write it as a directory anyway */
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
/*
* Also send archive_status directory (by hackishly reusing
* statbuf from above ...).
*/
size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
- sizeonly);
+ dryrun);
continue; /* don't recurse into pg_wal */
}
@@ -1238,7 +1238,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
linkpath[rllen] = '\0';
size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
- &statbuf, sizeonly);
+ &statbuf, dryrun);
#else
/*
@@ -1262,7 +1262,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
* permissions right.
*/
size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ dryrun);
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1293,17 +1293,17 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces, sendtblspclinks);
+ size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks);
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
- if (!sizeonly)
+ if (!dryrun)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid);
- if (sent || sizeonly)
+ if (sent || dryrun)
{
/* Add size, rounded up to 512byte block */
size += ((statbuf.st_size + 511) & ~511);
@@ -1612,12 +1612,12 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
static int64
_tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly)
+ struct stat *statbuf, bool dryrun)
{
char h[512];
enum tarError rc;
- if (!sizeonly)
+ if (!dryrun)
{
rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size,
statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
@@ -1654,7 +1654,7 @@ _tarWriteHeader(const char *filename, const char *linktarget,
*/
static int64
_tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly)
+ bool dryrun)
{
/* If symlink, write it as a directory anyway */
#ifndef WIN32
@@ -1664,7 +1664,7 @@ _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
#endif
statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
- return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, sizeonly);
+ return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, dryrun);
}
/*
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index 503a5b9f0b..b55917b9b6 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -31,6 +31,6 @@ typedef struct
extern void SendBaseBackup(BaseBackupCmd *cmd);
-extern int64 sendTablespace(char *path, bool sizeonly);
+extern int64 sendTablespace(char *path, bool dryrun);
#endif /* _BASEBACKUP_H */
--
2.21.0 (Apple Git-122.2)
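The rename is mechanical, but the semantics are worth spelling out: with
dryrun=true the traversal only accumulates the 512-byte-rounded sizes that
would be streamed and sends nothing on the wire. A sketch, backend context
assumed (signature as of this patch, before 0004 adds a file-list argument):

    /* estimate the streamed size of PGDATA without sending a byte */
    int64       estimate = sendDir(".", 1, true /* dryrun */,
                                   tablespaces, true /* sendtblspclinks */);

    /* the identical call with dryrun = false actually streams the data */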
0004-backend-changes-for-parallel-backup_v6.patch
From e49af53b950e2cfa5a38ad4a5db21651f1374c59 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Sun, 13 Oct 2019 22:59:28 +0500
Subject: [PATCH 4/7] backend changes for parallel backup
---
src/backend/access/transam/xlog.c | 2 +-
src/backend/replication/basebackup.c | 541 ++++++++++++++++++++++++-
src/backend/replication/repl_gram.y | 206 ++++++++--
src/backend/replication/repl_scanner.l | 6 +
src/include/nodes/replnodes.h | 10 +
src/include/replication/basebackup.h | 2 +-
6 files changed, 717 insertions(+), 50 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 451fe6c0d1..445aad291e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12279,7 +12279,7 @@ collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
ti->oid = pstrdup(de->d_name);
ti->path = pstrdup(buflinkpath.data);
ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+ ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
if (tablespaces)
*tablespaces = lappend(*tablespaces, ti);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b679f36021..e55b156092 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -41,6 +41,7 @@
#include "utils/ps_status.h"
#include "utils/relcache.h"
#include "utils/timestamp.h"
+#include "utils/pg_lsn.h"
typedef struct
{
@@ -51,11 +52,20 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ XLogRecPtr wal_location;
} basebackup_options;
+typedef struct
+{
+ char path[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+} BackupFile;
+
static int64 sendDir(const char *path, int basepathlen, bool dryrun,
- List *tablespaces, bool sendtblspclinks);
+ List *tablespaces, bool sendtblspclinks, List **filelist);
static bool sendFile(const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
@@ -75,6 +85,13 @@ static void throttle(size_t increment);
static void setup_throttle(int maxrate);
static bool is_checksummed_file(const char *fullpath, const char *filename);
+static void StartBackup(basebackup_options *opt);
+static void StopBackup(basebackup_options *opt);
+static void SendBackupFileList(void);
+static void SendBackupFiles(basebackup_options *opt, List *filenames, bool missing_ok);
+static void addToBackupFileList(List **filelist, char *path, char type, int32 size,
+ time_t mtime);
+
/* Was the backup currently in-progress initiated in recovery mode? */
static bool backup_started_in_recovery = false;
@@ -289,7 +306,7 @@ perform_base_backup(basebackup_options *opt)
/* Add a node for the base directory at the end */
ti = palloc0(sizeof(tablespaceinfo));
- ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true, NULL) : -1;
tablespaces = lappend(tablespaces, ti);
/* Send tablespace header */
@@ -323,10 +340,10 @@ perform_base_backup(basebackup_options *opt)
if (tblspc_map_file && opt->sendtblspcmapfile)
{
sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data);
- sendDir(".", 1, false, tablespaces, false);
+ sendDir(".", 1, false, tablespaces, false, NULL);
}
else
- sendDir(".", 1, false, tablespaces, true);
+ sendDir(".", 1, false, tablespaces, true, NULL);
/* ... and pg_control after everything else. */
if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
@@ -337,7 +354,7 @@ perform_base_backup(basebackup_options *opt)
sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
}
else
- sendTablespace(ti->path, false);
+ sendTablespace(ti->path, false, NULL);
/*
* If we're including WAL, and this is the main data directory we
@@ -409,6 +426,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_maxrate = false;
bool o_tablespace_map = false;
bool o_noverify_checksums = false;
+ bool o_wal_location = false;
MemSet(opt, 0, sizeof(*opt));
foreach(lopt, options)
@@ -497,12 +515,24 @@ parse_basebackup_options(List *options, basebackup_options *opt)
noverify_checksums = true;
o_noverify_checksums = true;
}
+ else if (strcmp(defel->defname, "start_wal_location") == 0)
+ {
+ bool have_error = false;
+ char *wal_location;
+
+ if (o_wal_location)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+
+ wal_location = strVal(defel->arg);
+ opt->wal_location = pg_lsn_in_internal(wal_location, &have_error);
+ o_wal_location = true;
+ }
else
elog(ERROR, "option \"%s\" not recognized",
defel->defname);
}
- if (opt->label == NULL)
- opt->label = "base backup";
}
@@ -520,6 +550,15 @@ SendBaseBackup(BaseBackupCmd *cmd)
parse_basebackup_options(cmd->options, &opt);
+ /* default value for label, if not specified. */
+ if (opt.label == NULL)
+ {
+ if (cmd->cmdtag == BASE_BACKUP)
+ opt.label = "base backup";
+ else
+ opt.label = "start backup";
+ }
+
WalSndSetState(WALSNDSTATE_BACKUP);
if (update_process_title)
@@ -531,7 +570,29 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg, false);
}
- perform_base_backup(&opt);
+ switch (cmd->cmdtag)
+ {
+ case BASE_BACKUP:
+ perform_base_backup(&opt);
+ break;
+ case START_BACKUP:
+ StartBackup(&opt);
+ break;
+ case SEND_BACKUP_FILELIST:
+ SendBackupFileList();
+ break;
+ case SEND_BACKUP_FILES:
+ SendBackupFiles(&opt, cmd->backupfiles, true);
+ break;
+ case STOP_BACKUP:
+ StopBackup(&opt);
+ break;
+
+ default:
+ elog(ERROR, "unrecognized replication command tag: %u",
+ cmd->cmdtag);
+ break;
+ }
}
static void
@@ -674,6 +735,61 @@ SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
pq_puttextmessage('C', "SELECT");
}
+/*
+ * Send a single resultset containing backup label and tablespace map
+ */
+static void
+SendStartBackupResult(StringInfo labelfile, StringInfo tblspc_map_file)
+{
+ StringInfoData buf;
+ Size len;
+
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 2); /* 2 fields */
+
+ /* Field headers */
+ pq_sendstring(&buf, "label");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ pq_sendstring(&buf, "tablespacemap");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ /* Data row */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 2); /* number of columns */
+
+ len = labelfile->len;
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, labelfile->data, len);
+
+ if (tblspc_map_file)
+ {
+ len = tblspc_map_file->len;
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, tblspc_map_file->data, len);
+ }
+ else
+ {
+ pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
+ }
+
+ pq_endmessage(&buf);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
/*
* Inject a file with given name and content in the output tar stream.
*/
@@ -725,7 +841,7 @@ sendFileWithContent(const char *filename, const char *content)
* Only used to send auxiliary tablespaces, not PGDATA.
*/
int64
-sendTablespace(char *path, bool dryrun)
+sendTablespace(char *path, bool dryrun, List **filelist)
{
int64 size;
char pathbuf[MAXPGPATH];
@@ -754,11 +870,11 @@ sendTablespace(char *path, bool dryrun)
return 0;
}
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
dryrun);
-
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), dryrun, NIL, true);
+ size += sendDir(pathbuf, strlen(path), dryrun, NIL, true, filelist);
return size;
}
@@ -777,7 +893,7 @@ sendTablespace(char *path, bool dryrun)
*/
static int64
sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
- bool sendtblspclinks)
+ bool sendtblspclinks, List **filelist)
{
DIR *dir;
struct dirent *de;
@@ -931,6 +1047,8 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
+
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
excludeFound = true;
break;
@@ -947,6 +1065,8 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
+
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
continue;
}
@@ -968,6 +1088,10 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
dryrun);
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
+ addToBackupFileList(filelist, "./pg_wal/archive_status", 'd', -1,
+ statbuf.st_mtime);
+
continue; /* don't recurse into pg_wal */
}
@@ -997,6 +1121,7 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
+ addToBackupFileList(filelist, pathbuf, 'l', statbuf.st_size, statbuf.st_mtime);
size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
&statbuf, dryrun);
#else
@@ -1023,6 +1148,7 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
*/
size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
dryrun);
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1053,13 +1179,15 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks);
+ size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks, filelist);
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
- if (!dryrun)
+ addToBackupFileList(filelist, pathbuf, 'f', statbuf.st_size, statbuf.st_mtime);
+
+ if (!dryrun && filelist == NULL)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid);
@@ -1764,3 +1892,388 @@ setup_throttle(int maxrate)
throttling_counter = -1;
}
}
+
+/*
+ * StartBackup - prepare to start an online backup.
+ *
+ * This function calls do_pg_start_backup() and sends back starting checkpoint,
+ * available tablespaces, content of backup_label and tablespace_map files.
+ */
+static void
+StartBackup(basebackup_options *opt)
+{
+ TimeLineID starttli;
+ StringInfo labelfile;
+ StringInfo tblspc_map_file = NULL;
+ int datadirpathlen;
+ List *tablespaces = NIL;
+ tablespaceinfo *ti;
+
+ datadirpathlen = strlen(DataDir);
+
+ backup_started_in_recovery = RecoveryInProgress();
+
+ labelfile = makeStringInfo();
+ tblspc_map_file = makeStringInfo();
+
+ total_checksum_failures = 0;
+
+ startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
+ labelfile, &tablespaces,
+ tblspc_map_file,
+ opt->progress, opt->sendtblspcmapfile);
+
+ /*
+ * Once do_pg_start_backup has been called, ensure that any failure causes
+ * us to abort the backup so we don't "leak" a backup counter. For this
+ * reason, register base_backup_cleanup with the before_shmem_exit handler.
+ * This makes sure that the call is always made when the process exits. On
+ * success, do_pg_stop_backup will have taken the system out of backup mode
+ * and this callback will have no effect; otherwise the required cleanup
+ * will be done in any case.
+ */
+ before_shmem_exit(base_backup_cleanup, (Datum) 0);
+
+ SendXlogRecPtrResult(startptr, starttli);
+
+ /*
+ * Calculate the relative path of temporary statistics directory in
+ * order to skip the files which are located in that directory later.
+ */
+ if (is_absolute_path(pgstat_stat_directory) &&
+ strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
+ else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory);
+ else
+ statrelpath = pgstat_stat_directory;
+
+ /* Add a node for the base directory at the end */
+ ti = palloc0(sizeof(tablespaceinfo));
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true, NULL) : -1;
+ tablespaces = lappend(tablespaces, ti);
+
+ /* Send tablespace header */
+ SendBackupHeader(tablespaces);
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ if ((tblspc_map_file && tblspc_map_file->len <= 0) ||
+ !opt->sendtblspcmapfile)
+ tblspc_map_file = NULL;
+
+ /* send backup_label and tablespace_map to frontend */
+ SendStartBackupResult(labelfile, tblspc_map_file);
+}
+
+/*
+ * StopBackup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out pg_control
+ * file, optionaly WAL segments and ending WAL location.
+ */
+static void
+StopBackup(basebackup_options *opt)
+{
+ TimeLineID endtli;
+ XLogRecPtr endptr;
+ struct stat statbuf;
+ StringInfoData buf;
+ char *labelfile = NULL;
+
+ if (get_backup_status() != SESSION_BACKUP_NON_EXCLUSIVE)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("non-exclusive backup is not in progress")));
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ /* ... and pg_control after everything else. */
+ if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ XLOG_CONTROL_FILE)));
+ sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
+
+ /* stop backup */
+ labelfile = (char *) opt->label;
+ endptr = do_pg_stop_backup(labelfile, !opt->nowait, &endtli);
+
+ if (opt->includewal)
+ include_wal_files(endptr);
+
+ pq_putemptymessage('c'); /* CopyDone */
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+/*
+ * SendBackupFileList() - sends a list of filenames to frontend
+ *
+ * The function collects a list of filenames, necessary for a complete backup and
+ * sends this list to the client.
+ */
+static void
+SendBackupFileList(void)
+{
+ StringInfoData buf;
+ ListCell *lc;
+ List *tablespaces = NIL;
+ StringInfo tblspc_map_file = NULL;
+ tablespaceinfo *ti;
+
+ tblspc_map_file = makeStringInfo();
+ collectTablespaces(&tablespaces, tblspc_map_file, false, false);
+
+ /* Add a node for the base directory at the end */
+ ti = palloc0(sizeof(tablespaceinfo));
+ tablespaces = lappend(tablespaces, ti);
+
+ foreach(lc, tablespaces)
+ {
+ List *filelist = NULL;
+ tablespaceinfo *ti;
+
+ ti = (tablespaceinfo *) lfirst(lc);
+ if (ti->path == NULL)
+ sendDir(".", 1, true, NIL, true, &filelist);
+ else
+ sendTablespace(ti->path, true, &filelist);
+
+ /* Construct and send the list of filenames */
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 4); /* n field */
+
+ /* First field - file name */
+ pq_sendstring(&buf, "path");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Second field - is_dir */
+ pq_sendstring(&buf, "type");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, CHAROID);
+ pq_sendint16(&buf, 1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - mtime */
+ pq_sendstring(&buf, "mtime");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ foreach(lc, filelist)
+ {
+ BackupFile *backupFile = (BackupFile *) lfirst(lc);
+ Size len;
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 4); /* number of columns */
+
+ /* send path */
+ len = strlen(backupFile->path);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, backupFile->path, len);
+
+ /* send type */
+ pq_sendint32(&buf, 1);
+ pq_sendbyte(&buf, backupFile->type);
+
+ /* send size */
+ send_int8_string(&buf, backupFile->size);
+
+ /* send mtime */
+ send_int8_string(&buf, backupFile->mtime);
+
+ pq_endmessage(&buf);
+ }
+
+ if (filelist)
+ pfree(filelist);
+ }
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * SendBackupFiles() - sends the actual files to the caller
+ *
+ * The function sends out the given file(s) over to the caller using the COPY
+ * protocol.
+ */
+static void
+SendBackupFiles(basebackup_options *opt, List *filenames, bool missing_ok)
+{
+ StringInfoData buf;
+ ListCell *lc;
+ int basepathlen = 1;
+
+ if (list_length(filenames) <= 0)
+ return;
+
+ total_checksum_failures = 0;
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ /* set backup start location. */
+ startptr = opt->wal_location;
+
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ foreach(lc, filenames)
+ {
+ struct stat statbuf;
+ char *pathbuf;
+
+ pathbuf = (char *) strVal(lfirst(lc));
+ if (is_absolute_path(pathbuf))
+ {
+ char *basepath;
+
+ /*
+ * 'pathbuf' points to the tablespace location, but we only want to
+ * include the version directory in it that belongs to us.
+ */
+ basepath = strstr(pathbuf, TABLESPACE_VERSION_DIRECTORY);
+ if (basepath)
+ basepathlen = basepath - pathbuf - 1;
+ }
+
+ if (lstat(pathbuf, &statbuf) != 0)
+ {
+ if (errno != ENOENT)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file or directory \"%s\": %m",
+ pathbuf)));
+
+ /* If the file went away while scanning, it's not an error. */
+ continue;
+ }
+
+ /* Allow symbolic links in pg_tblspc only */
+ if (strstr(pathbuf, "./pg_tblspc") != NULL &&
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ char linkpath[MAXPGPATH];
+ int rllen;
+
+ rllen = readlink(pathbuf, linkpath, sizeof(linkpath));
+ if (rllen < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read symbolic link \"%s\": %m",
+ pathbuf)));
+ if (rllen >= sizeof(linkpath))
+ ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("symbolic link \"%s\" target is too long",
+ pathbuf)));
+ linkpath[rllen] = '\0';
+
+ _tarWriteHeader(pathbuf, linkpath, &statbuf, false);
+ }
+ else if (S_ISDIR(statbuf.st_mode))
+ {
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
+ else if (
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ /*
+ * If symlink, write it as a directory. file symlinks only allowed
+ * in pg_tblspc
+ */
+ statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
+ else
+ {
+ /* send file to client */
+ sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true, InvalidOid);
+ }
+ }
+
+ pq_putemptymessage('c'); /* CopyDone */
+
+ /*
+ * Check for checksum failures. If there are failures across multiple
+ * processes, it may not report the total checksum count, but it will
+ * error out, terminating the backup.
+ */
+ if (total_checksum_failures)
+ {
+ if (total_checksum_failures > 1)
+ ereport(WARNING,
+ (errmsg("%lld total checksum verification failures", total_checksum_failures)));
+
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("checksum verification failure during base backup")));
+ }
+}
+
+/*
+ * Construct a BackupFile entry and add to the list.
+ */
+static void
+addToBackupFileList(List **filelist, char *path, char type, int32 size,
+ time_t mtime)
+{
+ BackupFile *backupFile;
+
+ if (filelist)
+ {
+ backupFile = (BackupFile *) palloc0(sizeof(BackupFile));
+ strlcpy(backupFile->path, path, sizeof(backupFile->path));
+ backupFile->type = type;
+ backupFile->size = size;
+ backupFile->mtime = mtime;
+
+ *filelist = lappend(*filelist, backupFile);
+ }
+}
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index c4e11cc4e8..225c35efdb 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -87,13 +87,24 @@ static SQLCmd *make_sqlcmd(void);
%token K_EXPORT_SNAPSHOT
%token K_NOEXPORT_SNAPSHOT
%token K_USE_SNAPSHOT
+%token K_START_BACKUP
+%token K_SEND_BACKUP_FILELIST
+%token K_SEND_BACKUP_FILES
+%token K_STOP_BACKUP
+%token K_START_WAL_LOCATION
%type <node> command
%type <node> base_backup start_replication start_logical_replication
create_replication_slot drop_replication_slot identify_system
timeline_history show sql_cmd
%type <list> base_backup_opt_list
+ start_backup_opt_list stop_backup_opt_list
+ send_backup_files_opt_list
%type <defelt> base_backup_opt
+ backup_opt_label backup_opt_progress backup_opt_maxrate
+ backup_opt_fast backup_opt_tsmap backup_opt_wal backup_opt_nowait
+ backup_opt_chksum backup_opt_wal_loc
+ start_backup_opt stop_backup_opt send_backup_files_opt
%type <uintval> opt_timeline
%type <list> plugin_options plugin_opt_list
%type <defelt> plugin_opt_elem
@@ -102,6 +113,8 @@ static SQLCmd *make_sqlcmd(void);
%type <boolval> opt_temporary
%type <list> create_slot_opt_list
%type <defelt> create_slot_opt
+%type <list> backup_files backup_files_list
+%type <node> backup_file
%%
@@ -162,10 +175,61 @@ base_backup:
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
+ cmd->cmdtag = BASE_BACKUP;
+ $$ = (Node *) cmd;
+ }
+ | K_START_BACKUP start_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = START_BACKUP;
+ $$ = (Node *) cmd;
+ }
+ | K_SEND_BACKUP_FILELIST
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = NIL;
+ cmd->cmdtag = SEND_BACKUP_FILELIST;
+ $$ = (Node *) cmd;
+ }
+ | K_SEND_BACKUP_FILES backup_files send_backup_files_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $3;
+ cmd->cmdtag = SEND_BACKUP_FILES;
+ cmd->backupfiles = $2;
+ $$ = (Node *) cmd;
+ }
+ | K_STOP_BACKUP stop_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = STOP_BACKUP;
$$ = (Node *) cmd;
}
;
+start_backup_opt_list:
+ start_backup_opt_list start_backup_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+stop_backup_opt_list:
+ stop_backup_opt_list stop_backup_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+send_backup_files_opt_list:
+ send_backup_files_opt_list send_backup_files_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
base_backup_opt_list:
base_backup_opt_list base_backup_opt
{ $$ = lappend($1, $2); }
@@ -173,49 +237,123 @@ base_backup_opt_list:
{ $$ = NIL; }
;
+start_backup_opt:
+ backup_opt_label { $$ = $1; }
+ | backup_opt_progress { $$ = $1; }
+ | backup_opt_fast { $$ = $1; }
+ | backup_opt_tsmap { $$ = $1; }
+ ;
+
+stop_backup_opt:
+ backup_opt_label { $$ = $1; }
+ | backup_opt_wal { $$ = $1; }
+ | backup_opt_nowait { $$ = $1; }
+ ;
+
+send_backup_files_opt:
+ backup_opt_maxrate { $$ = $1; }
+ | backup_opt_chksum { $$ = $1; }
+ | backup_opt_wal_loc { $$ = $1; }
+ ;
+
base_backup_opt:
- K_LABEL SCONST
- {
- $$ = makeDefElem("label",
- (Node *)makeString($2), -1);
- }
- | K_PROGRESS
- {
- $$ = makeDefElem("progress",
- (Node *)makeInteger(true), -1);
- }
- | K_FAST
- {
- $$ = makeDefElem("fast",
- (Node *)makeInteger(true), -1);
- }
- | K_WAL
- {
- $$ = makeDefElem("wal",
- (Node *)makeInteger(true), -1);
- }
- | K_NOWAIT
- {
- $$ = makeDefElem("nowait",
- (Node *)makeInteger(true), -1);
- }
- | K_MAX_RATE UCONST
+ backup_opt_label { $$ = $1; }
+ | backup_opt_progress { $$ = $1; }
+ | backup_opt_fast { $$ = $1; }
+ | backup_opt_wal { $$ = $1; }
+ | backup_opt_nowait { $$ = $1; }
+ | backup_opt_maxrate { $$ = $1; }
+ | backup_opt_tsmap { $$ = $1; }
+ | backup_opt_chksum { $$ = $1; }
+ ;
+
+backup_opt_label:
+ K_LABEL SCONST
+ {
+ $$ = makeDefElem("label",
+ (Node *)makeString($2), -1);
+ };
+
+backup_opt_progress:
+ K_PROGRESS
+ {
+ $$ = makeDefElem("progress",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_fast:
+ K_FAST
+ {
+ $$ = makeDefElem("fast",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_wal:
+ K_WAL
+ {
+ $$ = makeDefElem("wal",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_nowait:
+ K_NOWAIT
+ {
+ $$ = makeDefElem("nowait",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_maxrate:
+ K_MAX_RATE UCONST
+ {
+ $$ = makeDefElem("max_rate",
+ (Node *)makeInteger($2), -1);
+ };
+
+backup_opt_tsmap:
+ K_TABLESPACE_MAP
+ {
+ $$ = makeDefElem("tablespace_map",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_chksum:
+ K_NOVERIFY_CHECKSUMS
+ {
+ $$ = makeDefElem("noverify_checksums",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_wal_loc:
+ K_START_WAL_LOCATION SCONST
+ {
+ $$ = makeDefElem("start_wal_location",
+ (Node *)makeString($2), -1);
+ };
+
+backup_files:
+ '(' backup_files_list ')'
{
- $$ = makeDefElem("max_rate",
- (Node *)makeInteger($2), -1);
+ $$ = $2;
}
- | K_TABLESPACE_MAP
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+backup_files_list:
+ backup_file
{
- $$ = makeDefElem("tablespace_map",
- (Node *)makeInteger(true), -1);
+ $$ = list_make1($1);
}
- | K_NOVERIFY_CHECKSUMS
+ | backup_files_list ',' backup_file
{
- $$ = makeDefElem("noverify_checksums",
- (Node *)makeInteger(true), -1);
+ $$ = lappend($1, $3);
}
;
+backup_file:
+ SCONST { $$ = (Node *) makeString($1); }
+ ;
+
create_replication_slot:
/* CREATE_REPLICATION_SLOT slot TEMPORARY PHYSICAL RESERVE_WAL */
K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_PHYSICAL create_slot_opt_list
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 380faeb5f6..0a88639239 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -107,6 +107,12 @@ EXPORT_SNAPSHOT { return K_EXPORT_SNAPSHOT; }
NOEXPORT_SNAPSHOT { return K_NOEXPORT_SNAPSHOT; }
USE_SNAPSHOT { return K_USE_SNAPSHOT; }
WAIT { return K_WAIT; }
+START_BACKUP { return K_START_BACKUP; }
+SEND_BACKUP_FILELIST { return K_SEND_BACKUP_FILELIST; }
+SEND_BACKUP_FILES { return K_SEND_BACKUP_FILES; }
+STOP_BACKUP { return K_STOP_BACKUP; }
+START_WAL_LOCATION { return K_START_WAL_LOCATION; }
+
"," { return ','; }
";" { return ';'; }
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index 1e3ed4e19f..3685f260b5 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -23,6 +23,14 @@ typedef enum ReplicationKind
REPLICATION_KIND_LOGICAL
} ReplicationKind;
+typedef enum BackupCmdTag
+{
+ BASE_BACKUP,
+ START_BACKUP,
+ SEND_BACKUP_FILELIST,
+ SEND_BACKUP_FILES,
+ STOP_BACKUP
+} BackupCmdTag;
/* ----------------------
* IDENTIFY_SYSTEM command
@@ -42,6 +50,8 @@ typedef struct BaseBackupCmd
{
NodeTag type;
List *options;
+ BackupCmdTag cmdtag;
+ List *backupfiles;
} BaseBackupCmd;
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index b55917b9b6..5202e4160b 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -31,6 +31,6 @@ typedef struct
extern void SendBaseBackup(BaseBackupCmd *cmd);
-extern int64 sendTablespace(char *path, bool dryrun);
+extern int64 sendTablespace(char *path, bool dryrun, List **filelist);
#endif /* _BASEBACKUP_H */
--
2.21.0 (Apple Git-122.2)
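To make the client-side flow concrete, here is a rough libpq sketch -- not
taken from the attached patches, with connection setup and error handling
simplified -- of what a pg_basebackup worker process could do for each file in
its share of the list, over an established physical replication connection,
using the grammar introduced in patch 0004:

#include <stdio.h>
#include "libpq-fe.h"

/*
 * Sketch: fetch one file with SEND_BACKUP_FILES over a replication
 * connection.  'startlsn' is the start position returned by START_BACKUP,
 * e.g. "0/2000028".
 */
static int
fetch_one_file(PGconn *conn, const char *path, const char *startlsn)
{
	char		query[2048];	/* arbitrary buffer size for this sketch */
	PGresult   *res;
	char	   *copybuf;
	int			len;

	snprintf(query, sizeof(query),
			 "SEND_BACKUP_FILES ('%s') START_WAL_LOCATION '%s'",
			 path, startlsn);

	res = PQexec(conn, query);
	if (PQresultStatus(res) != PGRES_COPY_OUT)
	{
		fprintf(stderr, "SEND_BACKUP_FILES failed: %s", PQerrorMessage(conn));
		PQclear(res);
		return -1;
	}
	PQclear(res);

	/* Consume the CopyOut stream; each chunk is tar-format file data. */
	while ((len = PQgetCopyData(conn, &copybuf, 0)) > 0)
	{
		/* ... write copybuf[0 .. len-1] to the local backup here ... */
		PQfreemem(copybuf);
	}

	/* len == -1 marks end-of-copy; drain the remaining results. */
	while ((res = PQgetResult(conn)) != NULL)
		PQclear(res);

	return 0;
}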
0006-parallel-backup-testcase_v6.patch (application/octet-stream)
From a8c2207928c415421433a80492b2a1f7f7b3930e Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Sun, 13 Oct 2019 21:54:23 +0500
Subject: [PATCH 6/7] parallel backup - testcase
---
.../t/040_pg_basebackup_parallel.pl | 527 ++++++++++++++++++
1 file changed, 527 insertions(+)
create mode 100644 src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
diff --git a/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
new file mode 100644
index 0000000000..4ec4c1e0f6
--- /dev/null
+++ b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
@@ -0,0 +1,527 @@
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Basename qw(basename dirname);
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 95;
+
+program_help_ok('pg_basebackup');
+program_version_ok('pg_basebackup');
+program_options_handling_ok('pg_basebackup');
+
+my $tempdir = TestLib::tempdir;
+
+my $node = get_new_node('main');
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Initialize node without replication settings
+$node->init(extra => ['--data-checksums']);
+$node->start;
+my $pgdata = $node->data_dir;
+
+$node->command_fails(['pg_basebackup'],
+ 'pg_basebackup needs target directory specified');
+
+# Some Windows ANSI code pages may reject this filename, in which case we
+# quietly proceed without this bit of test coverage.
+if (open my $badchars, '>>', "$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
+{
+ print $badchars "test backup of file with non-UTF8 name\n";
+ close $badchars;
+}
+
+$node->set_replication_conf();
+system_or_bail 'pg_ctl', '-D', $pgdata, 'reload';
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup" ],
+ 'pg_basebackup fails because of WAL configuration');
+
+ok(!-d "$tempdir/backup", 'backup directory was cleaned up');
+
+# Create a backup directory that is not empty so the next command will fail
+# but leave the data directory behind
+mkdir("$tempdir/backup")
+ or BAIL_OUT("unable to create $tempdir/backup");
+append_to_file("$tempdir/backup/dir-not-empty.txt", "Some data");
+
+$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/backup", '-n' ],
+ 'failing run with no-clean option');
+
+ok(-d "$tempdir/backup", 'backup directory was created and left behind');
+rmtree("$tempdir/backup");
+
+open my $conf, '>>', "$pgdata/postgresql.conf";
+print $conf "max_replication_slots = 10\n";
+print $conf "max_wal_senders = 10\n";
+print $conf "wal_level = replica\n";
+close $conf;
+$node->restart;
+
+# Write some files to test that they are not copied.
+foreach my $filename (
+ qw(backup_label tablespace_map postgresql.auto.conf.tmp current_logfiles.tmp)
+ )
+{
+ open my $file, '>>', "$pgdata/$filename";
+ print $file "DONOTCOPY";
+ close $file;
+}
+
+# Connect to a database to create global/pg_internal.init. If this is removed
+# the test to ensure global/pg_internal.init is not copied will return a false
+# positive.
+$node->safe_psql('postgres', 'SELECT 1;');
+
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'unlogged main fork in base');
+
+# Create files that look like temporary relations to ensure they are ignored.
+my $postgresOid = $node->safe_psql('postgres',
+ q{select oid from pg_database where datname = 'postgres'});
+
+my @tempRelationFiles =
+ qw(t999_999 t9999_999.1 t999_9999_vm t99999_99999_vm.1);
+
+foreach my $filename (@tempRelationFiles)
+{
+ append_to_file("$pgdata/base/$postgresOid/$filename", 'TEMP_RELATION');
+}
+
+# Run base backup in parallel mode.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup", '-X', 'none', "-j 4" ],
+ 'pg_basebackup runs');
+ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
+
+# Permissions on backup should be default
+SKIP:
+{
+ skip "unix-style permissions not supported on Windows", 1
+ if ($windows_os);
+
+ ok(check_mode_recursive("$tempdir/backup", 0700, 0600),
+ "check backup dir permissions");
+}
+
+# Only archive_status directory should be copied in pg_wal/.
+is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/pg_wal/")) ],
+ [ sort qw(. .. archive_status) ],
+ 'no WAL files copied');
+
+# Contents of these directories should not be copied.
+foreach my $dirname (
+ qw(pg_dynshmem pg_notify pg_replslot pg_serial pg_snapshots pg_stat_tmp pg_subtrans)
+ )
+{
+ is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/$dirname/")) ],
+ [ sort qw(. ..) ],
+ "contents of $dirname/ not copied");
+}
+
+# These files should not be copied.
+foreach my $filename (
+ qw(postgresql.auto.conf.tmp postmaster.opts postmaster.pid tablespace_map current_logfiles.tmp
+ global/pg_internal.init))
+{
+ ok(!-f "$tempdir/backup/$filename", "$filename not copied");
+}
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+ 'unlogged init fork in backup');
+ok( !-f "$tempdir/backup/$baseUnloggedPath",
+ 'unlogged main fork not in backup');
+
+# Temp relations should not be copied.
+foreach my $filename (@tempRelationFiles)
+{
+ ok( !-f "$tempdir/backup/base/$postgresOid/$filename",
+ "base/$postgresOid/$filename not copied");
+}
+
+# Make sure existing backup_label was ignored.
+isnt(slurp_file("$tempdir/backup/backup_label"),
+ 'DONOTCOPY', 'existing backup_label not copied');
+rmtree("$tempdir/backup");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup2", '--waldir',
+ "$tempdir/xlog2", "-j 4"
+ ],
+ 'separate xlog directory');
+ok(-f "$tempdir/backup2/PG_VERSION", 'backup was created');
+ok(-d "$tempdir/xlog2/", 'xlog directory was created');
+rmtree("$tempdir/backup2");
+rmtree("$tempdir/xlog2");
+
+$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/tarbackup", '-Ft', "-j 4"],
+ 'tar format');
+
+rmtree("$tempdir/tarbackup");
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T=/foo" ],
+ '-T with empty old directory fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=" ],
+ '-T with empty new directory fails');
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4",
+ "-T/foo=/bar=/baz"
+ ],
+ '-T with multiple = fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo=/bar" ],
+ '-T with old directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=bar" ],
+ '-T with new directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo" ],
+ '-T with invalid format fails');
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+ skip "symlinks not supported on Windows", 18 if ($windows_os);
+
+ # Move pg_replslot out of $pgdata and create a symlink to it.
+ $node->stop;
+
+ # Set umask so test directories and files are created with group permissions
+ umask(0027);
+
+ # Enable group permissions on PGDATA
+ chmod_recursive("$pgdata", 0750, 0640);
+
+ rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
+ or BAIL_OUT "could not move $pgdata/pg_replslot";
+ symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot")
+ or BAIL_OUT "could not symlink to $pgdata/pg_replslot";
+
+ $node->start;
+
+	# Create a temporary directory in the system location and symlink it
+	# to our physical temp location. That way we can use shorter names
+	# for the tablespace directories, which hopefully won't run afoul of
+	# the 99 character length limit.
+ my $shorter_tempdir = TestLib::tempdir_short . "/tempdir";
+ symlink "$tempdir", $shorter_tempdir;
+
+ mkdir "$tempdir/tblspc1";
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc1 LOCATION '$shorter_tempdir/tblspc1';");
+ $node->safe_psql('postgres',
+ "CREATE TABLE test1 (a int) TABLESPACE tblspc1;");
+
+ # Create an unlogged table to test that forks other than init are not copied.
+ $node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;'
+ );
+
+ my $tblspc1UnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('tblspc1_unlogged')});
+
+ # Make sure main and init forks exist
+ ok( -f "$pgdata/${tblspc1UnloggedPath}_init",
+ 'unlogged init fork in tablespace');
+ ok(-f "$pgdata/$tblspc1UnloggedPath", 'unlogged main fork in tablespace');
+
+ # Create files that look like temporary relations to ensure they are ignored
+ # in a tablespace.
+ my @tempRelationFiles = qw(t888_888 t888888_888888_vm.1);
+ my $tblSpc1Id = basename(
+ dirname(
+ dirname(
+ $node->safe_psql(
+ 'postgres', q{select pg_relation_filepath('test1')}))));
+
+ foreach my $filename (@tempRelationFiles)
+ {
+ append_to_file(
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ 'TEMP_RELATION');
+ }
+
+ $node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4" ],
+ 'plain format with tablespaces fails without tablespace mapping');
+
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tblspc1=$tempdir/tbackup/tblspc1"
+ ],
+ 'plain format with tablespaces succeeds with tablespace mapping');
+ ok(-d "$tempdir/tbackup/tblspc1", 'tablespace was relocated');
+ opendir(my $dh, "$pgdata/pg_tblspc") or die;
+ ok( ( grep {
+ -l "$tempdir/backup1/pg_tblspc/$_"
+ and readlink "$tempdir/backup1/pg_tblspc/$_" eq
+ "$tempdir/tbackup/tblspc1"
+ } readdir($dh)),
+ "tablespace symlink was updated");
+ closedir $dh;
+
+ # Group access should be enabled on all backup files
+ ok(check_mode_recursive("$tempdir/backup1", 0750, 0640),
+ "check backup dir permissions");
+
+ # Unlogged relation forks other than init should not be copied
+ my ($tblspc1UnloggedBackupPath) =
+ $tblspc1UnloggedPath =~ /[^\/]*\/[^\/]*\/[^\/]*$/g;
+
+ ok(-f "$tempdir/tbackup/tblspc1/${tblspc1UnloggedBackupPath}_init",
+ 'unlogged init fork in tablespace backup');
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
+ 'unlogged main fork not in tablespace backup');
+
+ # Temp relations should not be copied.
+ foreach my $filename (@tempRelationFiles)
+ {
+ ok( !-f "$tempdir/tbackup/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ "[tblspc1]/$postgresOid/$filename not copied");
+
+ # Also remove temp relation files or tablespace drop will fail.
+ my $filepath =
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename";
+
+ unlink($filepath)
+ or BAIL_OUT("unable to unlink $filepath");
+ }
+
+ ok( -d "$tempdir/backup1/pg_replslot",
+ 'pg_replslot symlink copied as directory');
+ rmtree("$tempdir/backup1");
+
+ mkdir "$tempdir/tbl=spc2";
+ $node->safe_psql('postgres', "DROP TABLE test1;");
+ $node->safe_psql('postgres', "DROP TABLE tblspc1_unlogged;");
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc1;");
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc2 LOCATION '$shorter_tempdir/tbl=spc2';");
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup3", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tbl\\=spc2=$tempdir/tbackup/tbl\\=spc2"
+ ],
+ 'mapping tablespace with = sign in path');
+ ok(-d "$tempdir/tbackup/tbl=spc2",
+ 'tablespace with = sign was relocated');
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc2;");
+ rmtree("$tempdir/backup3");
+}
+
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backupR", '-R' , '-j 4'],
+ 'pg_basebackup -R runs');
+ok(-f "$tempdir/backupR/postgresql.auto.conf", 'postgresql.auto.conf exists');
+ok(-f "$tempdir/backupR/standby.signal", 'standby.signal was created');
+my $recovery_conf = slurp_file "$tempdir/backupR/postgresql.auto.conf";
+rmtree("$tempdir/backupR");
+
+my $port = $node->port;
+like(
+ $recovery_conf,
+ qr/^primary_conninfo = '.*port=$port.*'\n/m,
+ 'postgresql.auto.conf sets primary_conninfo');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxd" , "-j 4"],
+ 'pg_basebackup runs in default xlog mode');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxd/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxd");
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxf", '-X', 'fetch' , "-j 4"],
+ 'pg_basebackup -X fetch runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxf");
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs", '-X', 'stream' , "-j 4"],
+ 'pg_basebackup -X stream runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxs/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxs");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupnoslot", '-X',
+ 'stream', '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup -X stream runs with --no-slot');
+rmtree("$tempdir/backupnoslot");
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_sl_fail", '-X',
+ 'stream', '-S',
+ 'slot0',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with nonexistent replication slot');
+#
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C' , '-j 4'],
+ 'pg_basebackup -C fails without slot name');
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_slot", '-C',
+ '-S', 'slot0',
+ '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with -C -S --no-slot');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup -C runs');
+rmtree("$tempdir/backupxs_slot");
+
+is( $node->safe_psql(
+ 'postgres',
+ q{SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ 'slot0',
+ 'replication slot was created');
+isnt(
+ $node->safe_psql(
+ 'postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ '',
+ 'restart LSN of new slot is not null');
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot1", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup fails with -C -S and a previously existing slot');
+
+$node->safe_psql('postgres',
+ q{SELECT * FROM pg_create_physical_replication_slot('slot1')});
+my $lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+is($lsn, '', 'restart LSN of new slot is null');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/fail", '-S', 'slot1', '-X', 'none', '-j 4'],
+ 'pg_basebackup with replication slot fails without WAL streaming');
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl", '-X',
+ 'stream', '-S', 'slot1', '-j 4'
+ ],
+ 'pg_basebackup -X stream with replication slot runs');
+$lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+like($lsn, qr!^0/[0-9A-Z]{7,8}$!, 'restart LSN of slot has advanced');
+rmtree("$tempdir/backupxs_sl");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl_R", '-X',
+ 'stream', '-S', 'slot1', '-R',
+ '-j 4'
+ ],
+ 'pg_basebackup with replication slot and -R runs');
+like(
+ slurp_file("$tempdir/backupxs_sl_R/postgresql.auto.conf"),
+ qr/^primary_slot_name = 'slot1'\n/m,
+ 'recovery conf file sets primary_slot_name');
+
+my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
+is($checksum, 'on', 'checksums are enabled');
+rmtree("$tempdir/backupxs_sl_R");
+
+# create tables to corrupt and get their relfilenodes
+my $file_corrupt1 = $node->safe_psql('postgres',
+ q{SELECT a INTO corrupt1 FROM generate_series(1,10000) AS a; ALTER TABLE corrupt1 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt1')}
+);
+my $file_corrupt2 = $node->safe_psql('postgres',
+ q{SELECT b INTO corrupt2 FROM generate_series(1,2) AS b; ALTER TABLE corrupt2 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt2')}
+);
+
+# set page header and block sizes
+my $pageheader_size = 24;
+my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');
+
+# induce corruption
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+my $file;
+open $file, '+<', "$pgdata/$file_corrupt1";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*checksum verification failed/s],
+ 'pg_basebackup reports checksum mismatch');
+rmtree("$tempdir/backup_corrupt");
+
+# induce further corruption in 5 more blocks
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+for my $i (1 .. 5)
+{
+ my $offset = $pageheader_size + $i * $block_size;
+ seek($file, $offset, 0);
+ syswrite($file, "\0\0\0\0\0\0\0\0\0");
+}
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*further.*failures.*will.not.be.reported/s],
+ 'pg_basebackup does not report more than 5 checksum mismatches');
+rmtree("$tempdir/backup_corrupt2");
+
+# induce corruption in a second file
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt2";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+# do not verify checksums, should return ok
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup_corrupt4", '--no-verify-checksums',
+ '-j 4'
+ ],
+ 'pg_basebackup with -k does not report checksum mismatch');
+rmtree("$tempdir/backup_corrupt4");
+
+$node->safe_psql('postgres', "DROP TABLE corrupt1;");
+$node->safe_psql('postgres', "DROP TABLE corrupt2;");
--
2.21.0 (Apple Git-122.2)
0007-parallel-backup-documentation_v6.patch (application/octet-stream)
From d81aad9f5e18004dc721cf0dcc108d9e797a0c20 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Thu, 7 Nov 2019 16:52:40 +0500
Subject: [PATCH 7/7] parallel backup documentation
---
doc/src/sgml/protocol.sgml | 386 ++++++++++++++++++++++++++++
doc/src/sgml/ref/pg_basebackup.sgml | 20 ++
2 files changed, 406 insertions(+)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 80275215e0..22d620c346 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2700,6 +2700,392 @@ The commands accepted in replication mode are:
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>START_BACKUP</literal>
+ [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ]
+ [ <literal>PROGRESS</literal> ]
+ [ <literal>FAST</literal> ]
+ [ <literal>TABLESPACE_MAP</literal> ]
+
+ <indexterm><primary>START_BACKUP</primary></indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Instructs the server to prepare for performing an on-line backup. The following
+ options are accepted:
+ <variablelist>
+ <varlistentry>
+ <term><literal>LABEL</literal> <replaceable>'label'</replaceable></term>
+ <listitem>
+ <para>
+ Sets the label of the backup. If none is specified, a backup label
+ of <literal>start backup</literal> will be used. The quoting rules
+ for the label are the same as a standard SQL string with
+ <xref linkend="guc-standard-conforming-strings"/> turned on.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>PROGRESS</literal></term>
+ <listitem>
+ <para>
+ Request information required to generate a progress report. This will
+ send back an approximate size in the header of each tablespace, which
+ can be used to calculate how far along the stream is done. This is
+ calculated by enumerating all the file sizes once before the transfer
+ is even started, and might as such have a negative impact on the
+ performance. In particular, it might take longer before the first data
+ is streamed. Since the database files can change during the backup,
+ the size is only approximate and might both grow and shrink between
+ the time of approximation and the sending of the actual files.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>FAST</literal></term>
+ <listitem>
+ <para>
+ Request a fast checkpoint.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>TABLESPACE_MAP</literal></term>
+ <listitem>
+ <para>
+ Include information about symbolic links present in the directory
+ <filename>pg_tblspc</filename> in a file named
+ <filename>tablespace_map</filename>. The tablespace map file includes
+ each symbolic link name as it exists in the directory
+ <filename>pg_tblspc/</filename> and the full path of that symbolic link.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ In response to this command, the server will send out three result sets.
+ </para>
+ <para>
+ The first ordinary result set contains the starting position of the
+ backup, in a single row with two columns. The first column contains
+ the start position given in XLogRecPtr format, and the second column
+ contains the corresponding timeline ID.
+ </para>
+
+ <para>
+ The second ordinary result set has one row for each tablespace.
+ The fields in this row are:
+ <variablelist>
+ <varlistentry>
+ <term><literal>spcoid</literal> (<type>oid</type>)</term>
+ <listitem>
+ <para>
+ The OID of the tablespace, or null if it's the base
+ directory.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>spclocation</literal> (<type>text</type>)</term>
+ <listitem>
+ <para>
+ The full path of the tablespace directory, or null
+ if it's the base directory.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>size</literal> (<type>int8</type>)</term>
+ <listitem>
+ <para>
+ The approximate size of the tablespace, in kilobytes (1024 bytes),
+ if progress report has been requested; otherwise it's null.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ The final result set will be sent in a single row with two columns. The
+ first column contains the data of <filename>backup_label</filename> file,
+ and the second column contains the data of <filename>tablespace_map</filename>.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>STOP_BACKUP</literal>
+ [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ]
+ [ <literal>WAL</literal> ]
+ [ <literal>NOWAIT</literal> ]
+
+ <indexterm><primary>STOP_BACKUP</primary></indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Instructs the server to finish performing the on-line backup. The following
+ options are accepted:
+ <variablelist>
+ <varlistentry>
+ <term><literal>LABEL</literal> <replaceable>'label'</replaceable></term>
+ <listitem>
+ <para>
+ Provides the content of the backup_label file to the backup. The content
+ is the same as that returned by <command>START_BACKUP</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>WAL</literal></term>
+ <listitem>
+ <para>
+ Include the necessary WAL segments in the backup. This will include
+ all the files between start and stop backup in the
+ <filename>pg_wal</filename> directory of the base directory tar
+ file.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>NOWAIT</literal></term>
+ <listitem>
+ <para>
+ By default, the backup will wait until the last required WAL
+ segment has been archived, or emit a warning if log archiving is
+ not enabled. Specifying <literal>NOWAIT</literal> disables both
+ the waiting and the warning, leaving the client responsible for
+ ensuring the required log is available.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ In response to this command, the server will send one or more CopyResponse
+ results, followed by a single result set containing the WAL end position of
+ the backup. The CopyResponse results contain <filename>pg_control</filename>
+ and, if <command>STOP_BACKUP</command> is run with the WAL option, the
+ required WAL files.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>SEND_BACKUP_FILELIST</literal>
+ <indexterm><primary>SEND_BACKUP_FILELIST</primary></indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Instructs the server to return a list of the files and directories
+ available in the data directory. In response to this command, the server
+ will send one result set per tablespace. The result sets consist of the
+ following fields:
+ </para>
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>path</literal> (<type>text</type>)</term>
+ <listitem>
+ <para>
+ The path and name of the file. For a tablespace, this is an absolute
+ path on the database server; for the <filename>base</filename>
+ directory, it is relative to $PGDATA.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>type</literal> (<type>char</type>)</term>
+ <listitem>
+ <para>
+ A single character, identifing the type of file.
+ <itemizedlist spacing="compact" mark="bullet">
+ <listitem>
+ <para>
+ <literal>'f'</literal> - Regular file. Can be any relation or
+ non-relation file in $PGDATA.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>'d'</literal> - Directory.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>'l'</literal> - Symbolic link.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>size</literal> (<type>int8</type>)</term>
+ <listitem>
+ <para>
+ The approximate size of the file, in kilobytes (1024 bytes). It's null if
+ type is 'd' or 'l'.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>mtime</literal> (<type>int8</type>)</term>
+ <listitem>
+ <para>
+ The file or directory last modification time, as seconds since the Epoch.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <para>
+ This list will contain all files and directories in the $PGDATA, regardless of
+ whether they are PostgreSQL files or other files added to the same directory.
+ The only excluded files are:
+ <itemizedlist spacing="compact" mark="bullet">
+ <listitem>
+ <para>
+ <filename>postmaster.pid</filename>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>postmaster.opts</filename>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>pg_internal.init</filename> (found in multiple directories)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Various temporary files and directories created during the operation
+ of the PostgreSQL server, such as any file or directory beginning
+ with <filename>pgsql_tmp</filename> and temporary relations.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Unlogged relations, except for the init fork which is required to
+ recreate the (empty) unlogged relation on recovery.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>pg_wal</filename>, including subdirectories. If the backup is run
+ with WAL files included, a synthesized version of <filename>pg_wal</filename> will be
+ included, but it will only contain the files necessary for the
+ backup to work, not the rest of the contents.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>pg_dynshmem</filename>, <filename>pg_notify</filename>,
+ <filename>pg_replslot</filename>, <filename>pg_serial</filename>,
+ <filename>pg_snapshots</filename>, <filename>pg_stat_tmp</filename>, and
+ <filename>pg_subtrans</filename> are copied as empty directories (even if
+ they are symbolic links).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Files other than regular files and directories, such as symbolic
+ links (other than for the directories listed above) and special
+ device files, are skipped. (Symbolic links
+ in <filename>pg_tblspc</filename> are maintained.)
+ </para>
+ </listitem>
+ </itemizedlist>
+ Owner, group, and file mode are set if the underlying file system on the server
+ supports it.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>SEND_BACKUP_FILES ( <replaceable class="parameter">'FILE'</replaceable> [, ...] )</literal>
+ [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ]
+ [ <literal>NOVERIFY_CHECKSUMS</literal> ]
+ [ <literal>START_WAL_LOCATION</literal> <replaceable>'lsn'</replaceable> ]
+
+ <indexterm><primary>SEND_BACKUP_FILES</primary></indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Instructs the server to send the contents of the requested FILE(s).
+ </para>
+
+ <para>
+ A clause of the form <literal>SEND_BACKUP_FILES ( 'FILE', 'FILE', ... ) [OPTIONS]</literal>
+ is accepted, where one or more FILE(s) can be requested.
+ </para>
+
+ <para>
+ In response to this command, one or more CopyResponse results will be sent,
+ one for each FILE requested. The data in the CopyResponse results will be
+ a tar format (following the “ustar interchange format” specified in the
+ POSIX 1003.1-2008 standard) dump of the requested file's contents, except that
+ the two trailing blocks of zeroes specified in the standard are omitted.
+ </para>
+
+ <para>
+ The following options are accepted:
+ <variablelist>
+ <varlistentry>
+ <term><literal>MAX_RATE</literal> <replaceable>rate</replaceable></term>
+ <listitem>
+ <para>
+ Limit (throttle) the maximum amount of data transferred from server
+ to client per unit of time. The expected unit is kilobytes per second.
+ If this option is specified, the value must either be equal to zero
+ or it must fall within the range from 32 kB through 1 GB (inclusive).
+ If zero is passed or the option is not specified, no restriction is
+ imposed on the transfer.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>NOVERIFY_CHECKSUMS</literal></term>
+ <listitem>
+ <para>
+ By default, checksums are verified during a base backup if they are
+ enabled. Specifying <literal>NOVERIFY_CHECKSUMS</literal> disables
+ this verification.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>START_WAL_LOCATION</literal></term>
+ <listitem>
+ <para>
+ The starting WAL position, as returned by the
+ <command>START_BACKUP</command> command, in XLogRecPtr format.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index fc9e222f8d..339e68bda7 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -536,6 +536,26 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><option>-j <replaceable class="parameter">n</replaceable></option></term>
+ <term><option>--jobs=<replaceable class="parameter">n</replaceable></option></term>
+ <listitem>
+ <para>
+ Create <replaceable class="parameter">n</replaceable> threads to copy
+ backup files from the database server. <application>pg_basebackup</application>
+ will open <replaceable class="parameter">n</replaceable>+1 connections
+ to the database. Therefore, the server must be configured with
+ <xref linkend="guc-max-wal-senders"/> set high enough to accommodate all
+ connections.
+ </para>
+
+ <para>
+ Parallel mode only works with the plain format.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
--
2.21.0 (Apple Git-122.2)
On Wed, Nov 13, 2019 at 7:04 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
Sorry, I sent the wrong patches. Please see the correct version of the
patches (_v6).
Review comments on these patches:
1.
+ XLogRecPtr wal_location;
Looking at the other field names in basebackup_options structure, let's use
wallocation instead. Or better startwallocation to be precise.
2.
+ int32 size;
Should we use size_t here?
3.
I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
Can't we return the file list with START_BACKUP itself?
4.
+ else if (
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ /*
+ * If symlink, write it as a directory. file symlinks only allowed
+ * in pg_tblspc
+ */
+ statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
In normal backup mode, we skip special files that are not regular files,
directories, or symlinks inside pg_tblspc. But in your patch, the above code
treats them as directories. Should parallel backup skip such special files
too?
5.
Please keep header file inclusions in alphabetical order in basebackup.c and
pg_basebackup.c
6.
+ /*
+ * build query in form of: SEND_BACKUP_FILES ('base/1/1245/32683',
+ * 'base/1/1245/32683', ...) [options]
+ */
Please update these comments as we fetch one file at a time.
7.
+backup_file:
+ SCONST { $$ = (Node *) makeString($1); }
+ ;
+
Instead of having this rule with only one constant terminal, we can use
SCONST directly in backup_files_list. However, I don't see any issue with
this approach either, just trying to reduce the rules.
8.
Please indent code within 80 char limit at all applicable places.
9.
Please fix following typos:
identifing => identifying
optionaly => optionally
structre => structure
progrsss => progress
Retrive => Retrieve
direcotries => directories
=====
The other mail thread, related to backup manifest [1], creates a
backup_manifest file and sends it to the client, with an optional
checksum and other details including filename, file size, mtime, etc.
There is a patch on the same thread which then validates the backup too.
Since this patch too gets a file list from the server and has similar
details (except the checksum), can parallel backup somehow use the
backup-manifest infrastructure from that patch?
When the parallel backup is in use, will there be a backup_manifest file
created too? I am just visualizing what will be the scenario when both these
features are checked-in.
[1]: /messages/by-id/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Thanks
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
On Wed, Nov 27, 2019 at 3:38 AM Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
Can't we return the file list with START_BACKUP itself?
I had the same thought, but I think it's better to keep them separate.
Somebody might want to use the SEND_BACKUP_FILELIST command for
something other than a backup (I actually think it should be called
just SEND_FILE_LIST). Somebody might want to start a backup without
getting a file list because they're going to copy the files at the FS
level. Somebody might want to get a list of files to process after
somebody else has started the backup on another connection. Or maybe
nobody wants to do any of those things, but it doesn't seem to cost us
much of anything to split the commands, so I think we should.
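To make the flow this split enables concrete, here is a minimal,
hypothetical libpq sketch of one worker connection (not part of the
patches; command spellings follow the proposed grammar, and error
handling plus the final PQgetResult drain are omitted):

#include <stdio.h>
#include "libpq-fe.h"

/* Fetch an assigned subset of files over a dedicated replication connection. */
static void
fetch_files(const char *conninfo,     /* must include replication=true */
            const char *files_clause, /* e.g. "'base/1/1245', 'base/1/1247'" */
            const char *start_lsn)    /* start LSN returned by START_BACKUP */
{
    PGconn     *conn = PQconnectdb(conninfo);
    PGresult   *res;
    char        command[8192];

    snprintf(command, sizeof(command),
             "SEND_FILES (%s) START_WAL_LOCATION '%s'",
             files_clause, start_lsn);

    res = PQexec(conn, command);
    if (PQresultStatus(res) == PGRES_COPY_OUT)
    {
        char   *buf;
        int     len;

        /* Each requested file arrives as tar-format COPY data. */
        while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
        {
            /* write buf[0..len) into the local backup directory here */
            PQfreemem(buf);
        }
    }
    PQclear(res);
    PQfinish(conn);
}

The coordinator, meanwhile, would issue START_BACKUP and the file-list
command on its own connection, hand a slice of the list to each worker,
and send STOP_BACKUP once all workers have finished.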
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Nov 27, 2019 at 1:38 PM Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:
4.
In normal backup mode, we skip special files that are not regular files,
directories, or symlinks inside pg_tblspc. But in your patch, the above code
treats them as directories. Should parallel backup skip such special files
too?
Yeah, going through the code again, I found it a little bit inconsistent. In
fact, the SendBackupFiles function is supposed to send the files that were
requested of it. However, it currently performs these tasks:
1) If the requested file is a directory, it returns a TAR directory entry.
2) If the requested file is a symlink inside pg_tblspc, it returns the link
path.
3) And, as you pointed out above, if the requested file is a symlink outside
pg_tblspc but inside PGDATA, it returns a TAR directory entry.
I think that this function should not take care of any of the above. Instead,
it should be the client (i.e. pg_basebackup) managing it. SendBackupFiles
should only send the regular files and ignore a request of any other kind,
be it a directory or a symlink; see the sketch below.
Any thoughts?
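For instance, a minimal client-side sketch of that division of labor
(hypothetical code, not from the patches; make_local_directory() and
make_local_symlink() are assumed helpers):

#include <stdbool.h>

typedef struct
{
    char path[1024];    /* "path" column from SEND_BACKUP_FILELIST */
    char type;          /* 'f' regular file, 'd' directory, 'l' symlink */
} FileListEntry;

/*
 * Returns true when the entry still needs to be fetched with
 * SEND_BACKUP_FILES; directories and symlinks are materialized locally
 * by the client instead.
 */
static bool
handle_filelist_entry(const FileListEntry *entry)
{
    switch (entry->type)
    {
        case 'd':
            make_local_directory(entry->path);
            return false;
        case 'l':
            make_local_symlink(entry->path);
            return false;
        default:
            return true;    /* 'f': queue for a SEND_BACKUP_FILES request */
    }
}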
The other mail thread, related to backup manifest [1], creates a
backup_manifest file and sends it to the client, with an optional
checksum and other details including filename, file size, mtime, etc.
Since this patch too gets a file list from the server and has similar
details (except the checksum), can parallel backup somehow use the
backup-manifest infrastructure from that patch?
This was discussed earlier in the thread, and as Robert suggested, it would
complicate the
code to no real benefit.
When the parallel backup is in use, will there be a backup_manifest file
created too? I am just visualizing what will be the scenario when both
these
features are checked-in.
Yes, I think it should. Since the full backup will have a manifest file,
there is no
reason for parallel backup to not support it.
I'll share the updated patch in the next couple of days.
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
On Thu, Nov 28, 2019 at 12:57 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Nov 27, 2019 at 3:38 AM Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
I am still not sure why we need SEND_BACKUP_FILELIST as a separate
command. Can't we return the file list with START_BACKUP itself?
I had the same thought, but I think it's better to keep them separate.
Somebody might want to use the SEND_BACKUP_FILELIST command for
something other than a backup (I actually think it should be called
just SEND_FILE_LIST)
Sure. Thanks for the recommendation. To keep the function names in sync,
I intend to do the following renamings:
- SEND_BACKUP_FILES --> SEND_FILES
- SEND_BACKUP_FILELIST --> SEND_FILE_LIST
Somebody might want to start a backup without
getting a file list because they're going to copy the files at the FS
level. Somebody might want to get a list of files to process after
somebody else has started the backup on another connection. Or maybe
nobody wants to do any of those things, but it doesn't seem to cost us
much of anything to split the commands, so I think we should.
+1
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
On Tue, Dec 10, 2019 at 7:34 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Thu, Nov 28, 2019 at 12:57 AM Robert Haas <robertmhaas@gmail.com>
wrote:
... it doesn't seem to cost us much of anything to split the commands, so I
think we should.
+1
I have updated the patches (v7 attached) and have taken care of all the
issues pointed out by Jeevan; additionally, I ran pgindent on each patch.
Furthermore, the command names have been renamed as suggested and I have
simplified the SendFiles function. The client can only request regular
files; any other kind, such as directories or symlinks, will be skipped,
and the client will be responsible for taking care of them.
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Attachments:
0007-parallel-backup-documentation_v7.patch (application/octet-stream)
From 63952eafd3d2dbda70535048dbed2815fc75c3d0 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Thu, 7 Nov 2019 16:52:40 +0500
Subject: [PATCH 7/7] parallel backup documentation
---
doc/src/sgml/protocol.sgml | 386 ++++++++++++++++++++++++++++
doc/src/sgml/ref/pg_basebackup.sgml | 20 ++
2 files changed, 406 insertions(+)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 80275215e0..d582209229 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2700,6 +2700,392 @@ The commands accepted in replication mode are:
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>START_BACKUP</literal>
+ [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ]
+ [ <literal>PROGRESS</literal> ]
+ [ <literal>FAST</literal> ]
+ [ <literal>TABLESPACE_MAP</literal> ]
+
+ <indexterm><primary>START_BACKUP</primary></indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Instructs the server to prepare for performing an on-line backup. The following
+ options are accepted:
+ <variablelist>
+ <varlistentry>
+ <term><literal>LABEL</literal> <replaceable>'label'</replaceable></term>
+ <listitem>
+ <para>
+ Sets the label of the backup. If none is specified, a backup label
+ of <literal>start backup</literal> will be used. The quoting rules
+ for the label are the same as a standard SQL string with
+ <xref linkend="guc-standard-conforming-strings"/> turned on.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>PROGRESS</literal></term>
+ <listitem>
+ <para>
+ Request information required to generate a progress report. This will
+ send back an approximate size in the header of each tablespace, which
+ can be used to calculate how far along the stream is done. This is
+ calculated by enumerating all the file sizes once before the transfer
+ is even started, and might as such have a negative impact on the
+ performance. In particular, it might take longer before the first data
+ is streamed. Since the database files can change during the backup,
+ the size is only approximate and might both grow and shrink between
+ the time of approximation and the sending of the actual files.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>FAST</literal></term>
+ <listitem>
+ <para>
+ Request a fast checkpoint.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>TABLESPACE_MAP</literal></term>
+ <listitem>
+ <para>
+ Include information about symbolic links present in the directory
+ <filename>pg_tblspc</filename> in a file named
+ <filename>tablespace_map</filename>. The tablespace map file includes
+ each symbolic link name as it exists in the directory
+ <filename>pg_tblspc/</filename> and the full path of that symbolic link.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ In response to this command, the server will send out three result sets.
+ </para>
+ <para>
+ The first ordinary result set contains the starting position of the
+ backup, in a single row with two columns. The first column contains
+ the start position given in XLogRecPtr format, and the second column
+ contains the corresponding timeline ID.
+ </para>
+
+ <para>
+ The second ordinary result set has one row for each tablespace.
+ The fields in this row are:
+ <variablelist>
+ <varlistentry>
+ <term><literal>spcoid</literal> (<type>oid</type>)</term>
+ <listitem>
+ <para>
+ The OID of the tablespace, or null if it's the base
+ directory.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>spclocation</literal> (<type>text</type>)</term>
+ <listitem>
+ <para>
+ The full path of the tablespace directory, or null
+ if it's the base directory.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>size</literal> (<type>int8</type>)</term>
+ <listitem>
+ <para>
+ The approximate size of the tablespace, in kilobytes (1024 bytes),
+ if progress report has been requested; otherwise it's null.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ The final result set contains a single row with two columns. The
+ first column contains the data of the <filename>backup_label</filename> file,
+ and the second column contains the data of the <filename>tablespace_map</filename> file.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>STOP_BACKUP</literal>
+ [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ]
+ [ <literal>WAL</literal> ]
+ [ <literal>NOWAIT</literal> ]
+
+ <indexterm><primary>STOP_BACKUP</primary></indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Instructs the server to finish performing the on-line backup. The following
+ options are accepted:
+ <variablelist>
+ <varlistentry>
+ <term><literal>LABEL</literal> <replaceable>'label'</replaceable></term>
+ <listitem>
+ <para>
+ Provides the content of the backup_label file to the backup. The content
+ is the same as that returned by <command>START_BACKUP</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>WAL</literal></term>
+ <listitem>
+ <para>
+ Include the necessary WAL segments in the backup. This will include
+ all the files between start and stop backup in the
+ <filename>pg_wal</filename> directory of the base directory tar
+ file.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>NOWAIT</literal></term>
+ <listitem>
+ <para>
+ By default, the backup will wait until the last required WAL
+ segment has been archived, or emit a warning if log archiving is
+ not enabled. Specifying <literal>NOWAIT</literal> disables both
+ the waiting and the warning, leaving the client responsible for
+ ensuring the required log is available.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ In response to this command, the server will send one or more CopyResponse
+ results followed by a single result set containing the WAL end position of
+ the backup. The CopyResponse results contain <filename>pg_control</filename>
+ and, if <command>STOP_BACKUP</command> is run with the WAL option, the WAL files.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>SEND_FILE_LIST</literal>
+ <indexterm><primary>SEND_FILE_LIST</primary></indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Instructs the server to return a list of the files and directories available
+ in the data directory. In response to this command, the server will send one
+ result set per tablespace. The result sets consist of the following fields:
+ </para>
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>path</literal> (<type>text</type>)</term>
+ <listitem>
+ <para>
+ The path and name of the file. For a tablespace, it is an absolute
+ path on the database server; for the <filename>base</filename>
+ tablespace, it is relative to $PGDATA.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>type</literal> (<type>char</type>)</term>
+ <listitem>
+ <para>
+ A single character, identifying the type of file.
+ <itemizedlist spacing="compact" mark="bullet">
+ <listitem>
+ <para>
+ <literal>'f'</literal> - Regular file. Can be any relation or
+ non-relation file in $PGDATA.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>'d'</literal> - Directory.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>'l'</literal> - Symbolic link.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>size</literal> (<type>int8</type>)</term>
+ <listitem>
+ <para>
+ The approximate size of the file, in kilobytes (1024 bytes). It's null if
+ type is 'd' or 'l'.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>mtime</literal> (<type>int8</type>)</term>
+ <listitem>
+ <para>
+ The last modification time of the file or directory, in seconds since the Epoch.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <para>
+ This list will contain all files and directories in the $PGDATA, regardless of
+ whether they are PostgreSQL files or other files added to the same directory.
+ The only excluded files are:
+ <itemizedlist spacing="compact" mark="bullet">
+ <listitem>
+ <para>
+ <filename>postmaster.pid</filename>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>postmaster.opts</filename>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>pg_internal.init</filename> (found in multiple directories)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Various temporary files and directories created during the operation
+ of the PostgreSQL server, such as any file or directory beginning
+ with <filename>pgsql_tmp</filename> and temporary relations.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Unlogged relations, except for the init fork which is required to
+ recreate the (empty) unlogged relation on recovery.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>pg_wal</filename>, including subdirectories. If the backup is run
+ with WAL files included, a synthesized version of <filename>pg_wal</filename> will be
+ included, but it will only contain the files necessary for the
+ backup to work, not the rest of the contents.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>pg_dynshmem</filename>, <filename>pg_notify</filename>,
+ <filename>pg_replslot</filename>, <filename>pg_serial</filename>,
+ <filename>pg_snapshots</filename>, <filename>pg_stat_tmp</filename>, and
+ <filename>pg_subtrans</filename> are copied as empty directories (even if
+ they are symbolic links).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Files other than regular files and directories, such as symbolic
+ links (other than for the directories listed above) and special
+ device files, are skipped. (Symbolic links
+ in <filename>pg_tblspc</filename> are maintained.)
+ </para>
+ </listitem>
+ </itemizedlist>
+ Owner, group, and file mode are set if the underlying file system on the server
+ supports it.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>SEND_FILES ( <replaceable class="parameter">'FILE'</replaceable> [, ...] )</literal>
+ [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ]
+ [ <literal>NOVERIFY_CHECKSUMS</literal> ]
+ [ <literal>START_WAL_LOCATION</literal> <replaceable>'lsn'</replaceable> ]
+
+ <indexterm><primary>SEND_FILES</primary></indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Instructs the server to send the contents of the requested FILE(s).
+ </para>
+
+ <para>
+ A clause of the form <literal>SEND_FILES ( 'FILE', 'FILE', ... ) [OPTIONS]</literal>
+ is accepted where one or more FILE(s) can be requested.
+ </para>
+
+ <para>
+ In response to this command, one or more CopyResponse results will be sent,
+ one for each FILE requested. The data in the CopyResponse results will be
+ a tar format (following the “ustar interchange format” specified in the
+ POSIX 1003.1-2008 standard) dump of the requested files, except that
+ the two trailing blocks of zeroes specified in the standard are omitted.
+ </para>
+
+ <para>
+ The following options are accepted:
+ <variablelist>
+ <varlistentry>
+ <term><literal>MAX_RATE</literal> <replaceable>rate</replaceable></term>
+ <listitem>
+ <para>
+ Limit (throttle) the maximum amount of data transferred from server
+ to client per unit of time. The expected unit is kilobytes per second.
+ If this option is specified, the value must either be equal to zero
+ or it must fall within the range from 32 kB through 1 GB (inclusive).
+ If zero is passed or the option is not specified, no restriction is
+ imposed on the transfer.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>NOVERIFY_CHECKSUMS</literal></term>
+ <listitem>
+ <para>
+ By default, checksums are verified during a base backup if they are
+ enabled. Specifying <literal>NOVERIFY_CHECKSUMS</literal> disables
+ this verification.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>START_WAL_LOCATION</literal> <replaceable>'lsn'</replaceable></term>
+ <listitem>
+ <para>
+ The starting WAL position, as returned by the
+ <command>START_BACKUP</command> command, in XLogRecPtr format.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index fc9e222f8d..339e68bda7 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -536,6 +536,26 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><option>-j <replaceable class="parameter">n</replaceable></option></term>
+ <term><option>--jobs=<replaceable class="parameter">n</replaceable></option></term>
+ <listitem>
+ <para>
+ Create <replaceable class="parameter">n</replaceable> threads to copy
+ backup files from the database server. <application>pg_basebackup</application>
+ will open <replaceable class="parameter">n</replaceable>+1 connections
+ to the database. Therefore, the server must be configured with
+ <xref linkend="guc-max-wal-senders"/> set high enough to accommodate all
+ connections.
+ </para>
+
+ <para>
+ Parallel mode only works with the plain format.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
--
2.21.0 (Apple Git-122.2)
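As a usage sketch (assuming these patches are applied), a plain-format
parallel backup could then be taken with, for example:

pg_basebackup -h localhost -D /path/to/backup -Fp --jobs=4

With four worker threads this opens five connections (n+1), so
max_wal_senders on the server must allow at least 5.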
0004-Parallel-Backup-Backend-Replication-commands_v7.patch (application/octet-stream)
From 6ccc7e8970e3ccf9183627692c50a235e05c8c68 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Sun, 13 Oct 2019 22:59:28 +0500
Subject: [PATCH 4/7] Parallel Backup - Backend Replication commands
This feature adds four new replication commands to the backend replication
system, to help facilitate taking a full backup in parallel using multiple
connections.
- START_BACKUP [ LABEL 'label' ] [ PROGRESS ] [ FAST ] [ TABLESPACE_MAP ]
This command instructs the server to get prepared for performing an
online backup.
- STOP_BACKUP [ LABEL 'label' ] [ WAL ] [ NOWAIT ]
This command instructs the server that online backup is finished. It
will bring the system out of backup mode.
- SEND_FILE_LIST
Instruct the server to return a list of files and directories that
are available in $PGDATA directory.
- SEND_FILES ( 'FILE' [, ...] ) [ MAX_RATE rate ] [ NOVERIFY_CHECKSUMS ]
[ START_WAL_LOCATION ]
Instructs the server to send the contents of the requested FILE(s).
---
src/backend/access/transam/xlog.c | 2 +-
src/backend/replication/basebackup.c | 503 ++++++++++++++++++++++++-
src/backend/replication/repl_gram.y | 201 ++++++++--
src/backend/replication/repl_scanner.l | 6 +
src/include/nodes/replnodes.h | 10 +
src/include/replication/basebackup.h | 2 +-
6 files changed, 674 insertions(+), 50 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c20dc447f1..f8d9e0655a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12288,7 +12288,7 @@ collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
ti->oid = pstrdup(de->d_name);
ti->path = pstrdup(buflinkpath.data);
ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+ ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
if (tablespaces)
*tablespaces = lappend(*tablespaces, ti);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index dee590f16a..30d06d72a9 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -38,6 +38,7 @@
#include "storage/ipc.h"
#include "storage/reinit.h"
#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
#include "utils/ps_status.h"
#include "utils/relcache.h"
#include "utils/timestamp.h"
@@ -51,11 +52,20 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ XLogRecPtr startwallocation;
} basebackup_options;
+typedef struct
+{
+ char path[MAXPGPATH];
+ char type;
+ size_t size;
+ time_t mtime;
+} BackupFile;
+
static int64 sendDir(const char *path, int basepathlen, bool dryrun,
- List *tablespaces, bool sendtblspclinks);
+ List *tablespaces, bool sendtblspclinks, List **filelist);
static bool sendFile(const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
@@ -75,6 +85,13 @@ static void throttle(size_t increment);
static void setup_throttle(int maxrate);
static bool is_checksummed_file(const char *fullpath, const char *filename);
+static void StartBackup(basebackup_options *opt);
+static void StopBackup(basebackup_options *opt);
+static void SendFileList(void);
+static void SendFiles(basebackup_options *opt, List *filenames, bool missing_ok);
+static void addToBackupFileList(List **filelist, char *path, char type, size_t size,
+ time_t mtime);
+
/* Was the backup currently in-progress initiated in recovery mode? */
static bool backup_started_in_recovery = false;
@@ -289,7 +306,7 @@ perform_base_backup(basebackup_options *opt)
/* Add a node for the base directory at the end */
ti = palloc0(sizeof(tablespaceinfo));
- ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true, NULL) : -1;
tablespaces = lappend(tablespaces, ti);
/* Send tablespace header */
@@ -323,10 +340,10 @@ perform_base_backup(basebackup_options *opt)
if (tblspc_map_file && opt->sendtblspcmapfile)
{
sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data);
- sendDir(".", 1, false, tablespaces, false);
+ sendDir(".", 1, false, tablespaces, false, NULL);
}
else
- sendDir(".", 1, false, tablespaces, true);
+ sendDir(".", 1, false, tablespaces, true, NULL);
/* ... and pg_control after everything else. */
if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
@@ -337,7 +354,7 @@ perform_base_backup(basebackup_options *opt)
sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
}
else
- sendTablespace(ti->path, false);
+ sendTablespace(ti->path, false, NULL);
/*
* If we're including WAL, and this is the main data directory we
@@ -409,6 +426,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_maxrate = false;
bool o_tablespace_map = false;
bool o_noverify_checksums = false;
+ bool o_startwallocation = false;
MemSet(opt, 0, sizeof(*opt));
foreach(lopt, options)
@@ -497,12 +515,24 @@ parse_basebackup_options(List *options, basebackup_options *opt)
noverify_checksums = true;
o_noverify_checksums = true;
}
+ else if (strcmp(defel->defname, "start_wal_location") == 0)
+ {
+ bool have_error = false;
+ char *startwallocation;
+
+ if (o_startwallocation)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+
+ startwallocation = strVal(defel->arg);
+ opt->startwallocation = pg_lsn_in_internal(startwallocation, &have_error);
+ o_startwallocation = true;
+ }
else
elog(ERROR, "option \"%s\" not recognized",
defel->defname);
}
- if (opt->label == NULL)
- opt->label = "base backup";
}
@@ -520,6 +550,15 @@ SendBaseBackup(BaseBackupCmd *cmd)
parse_basebackup_options(cmd->options, &opt);
+ /* default value for label, if not specified. */
+ if (opt.label == NULL)
+ {
+ if (cmd->cmdtag == BASE_BACKUP)
+ opt.label = "base backup";
+ else
+ opt.label = "start backup";
+ }
+
WalSndSetState(WALSNDSTATE_BACKUP);
if (update_process_title)
@@ -531,7 +570,29 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg, false);
}
- perform_base_backup(&opt);
+ switch (cmd->cmdtag)
+ {
+ case BASE_BACKUP:
+ perform_base_backup(&opt);
+ break;
+ case START_BACKUP:
+ StartBackup(&opt);
+ break;
+ case SEND_FILE_LIST:
+ SendFileList();
+ break;
+ case SEND_FILES:
+ SendFiles(&opt, cmd->backupfiles, true);
+ break;
+ case STOP_BACKUP:
+ StopBackup(&opt);
+ break;
+
+ default:
+ elog(ERROR, "unrecognized replication command tag: %u",
+ cmd->cmdtag);
+ break;
+ }
}
static void
@@ -674,6 +735,61 @@ SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
pq_puttextmessage('C', "SELECT");
}
+/*
+ * Send a single result set containing the backup label and tablespace map
+ */
+static void
+SendStartBackupResult(StringInfo labelfile, StringInfo tblspc_map_file)
+{
+ StringInfoData buf;
+ Size len;
+
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 2); /* 2 fields */
+
+ /* Field headers */
+ pq_sendstring(&buf, "label");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ pq_sendstring(&buf, "tablespacemap");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ /* Data row */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 2); /* number of columns */
+
+ len = labelfile->len;
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, labelfile->data, len);
+
+ if (tblspc_map_file)
+ {
+ len = tblspc_map_file->len;
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, tblspc_map_file->data, len);
+ }
+ else
+ {
+ pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
+ }
+
+ pq_endmessage(&buf);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
/*
* Inject a file with given name and content in the output tar stream.
*/
@@ -725,7 +841,7 @@ sendFileWithContent(const char *filename, const char *content)
* Only used to send auxiliary tablespaces, not PGDATA.
*/
int64
-sendTablespace(char *path, bool dryrun)
+sendTablespace(char *path, bool dryrun, List **filelist)
{
int64 size;
char pathbuf[MAXPGPATH];
@@ -754,11 +870,11 @@ sendTablespace(char *path, bool dryrun)
return 0;
}
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
dryrun);
-
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), dryrun, NIL, true);
+ size += sendDir(pathbuf, strlen(path), dryrun, NIL, true, filelist);
return size;
}
@@ -777,7 +893,7 @@ sendTablespace(char *path, bool dryrun)
*/
static int64
sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
- bool sendtblspclinks)
+ bool sendtblspclinks, List **filelist)
{
DIR *dir;
struct dirent *de;
@@ -931,6 +1047,8 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
+
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
excludeFound = true;
break;
@@ -947,6 +1065,8 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
+
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
continue;
}
@@ -968,6 +1088,10 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
dryrun);
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
+ addToBackupFileList(filelist, "./pg_wal/archive_status", 'd', -1,
+ statbuf.st_mtime);
+
continue; /* don't recurse into pg_wal */
}
@@ -997,6 +1121,7 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
+ addToBackupFileList(filelist, pathbuf, 'l', statbuf.st_size, statbuf.st_mtime);
size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
&statbuf, dryrun);
#else
@@ -1023,6 +1148,7 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
*/
size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
dryrun);
+ addToBackupFileList(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1053,13 +1179,15 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks);
+ size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks, filelist);
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
- if (!dryrun)
+ addToBackupFileList(filelist, pathbuf, 'f', statbuf.st_size, statbuf.st_mtime);
+
+ if (!dryrun && filelist == NULL)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid);
@@ -1762,3 +1890,350 @@ setup_throttle(int maxrate)
throttling_counter = -1;
}
}
+
+/*
+ * StartBackup - prepare to start an online backup.
+ *
+ * This function calls do_pg_start_backup() and sends back the starting
+ * checkpoint, the available tablespaces, and the contents of the
+ * backup_label and tablespace_map files.
+ */
+static void
+StartBackup(basebackup_options *opt)
+{
+ TimeLineID starttli;
+ StringInfo labelfile;
+ StringInfo tblspc_map_file = NULL;
+ int datadirpathlen;
+ List *tablespaces = NIL;
+ tablespaceinfo *ti;
+
+ datadirpathlen = strlen(DataDir);
+
+ backup_started_in_recovery = RecoveryInProgress();
+
+ labelfile = makeStringInfo();
+ tblspc_map_file = makeStringInfo();
+
+ total_checksum_failures = 0;
+
+ startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
+ labelfile, &tablespaces,
+ tblspc_map_file,
+ opt->progress, opt->sendtblspcmapfile);
+
+ /*
+ * Once do_pg_start_backup has been called, ensure that any failure causes
+ * us to abort the backup so we don't "leak" a backup counter. For this
+ * reason, register base_backup_cleanup with the before_shmem_exit handler.
+ * This will make sure that the call is always made when the process exits.
+ * On success, do_pg_stop_backup will have taken the system out of backup
+ * mode and this callback will have no effect; otherwise, the required
+ * cleanup will be done in any case.
+ */
+ before_shmem_exit(base_backup_cleanup, (Datum) 0);
+
+ SendXlogRecPtrResult(startptr, starttli);
+
+ /*
+ * Calculate the relative path of temporary statistics directory in order
+ * to skip the files which are located in that directory later.
+ */
+ if (is_absolute_path(pgstat_stat_directory) &&
+ strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
+ else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory);
+ else
+ statrelpath = pgstat_stat_directory;
+
+ /* Add a node for the base directory at the end */
+ ti = palloc0(sizeof(tablespaceinfo));
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true, NULL) : -1;
+ tablespaces = lappend(tablespaces, ti);
+
+ /* Send tablespace header */
+ SendBackupHeader(tablespaces);
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ if ((tblspc_map_file && tblspc_map_file->len <= 0) ||
+ !opt->sendtblspcmapfile)
+ tblspc_map_file = NULL;
+
+ /* send backup_label and tablespace_map to frontend */
+ SendStartBackupResult(labelfile, tblspc_map_file);
+}
+
+/*
+ * StopBackup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out pg_control
+ * file, optionally WAL segments and ending WAL location.
+ */
+static void
+StopBackup(basebackup_options *opt)
+{
+ TimeLineID endtli;
+ XLogRecPtr endptr;
+ struct stat statbuf;
+ StringInfoData buf;
+ char *labelfile = NULL;
+
+ if (get_backup_status() != SESSION_BACKUP_NON_EXCLUSIVE)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("non-exclusive backup is not in progress")));
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ /* ... and pg_control after everything else. */
+ if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ XLOG_CONTROL_FILE)));
+ sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
+
+ /* stop backup */
+ labelfile = (char *) opt->label;
+ endptr = do_pg_stop_backup(labelfile, !opt->nowait, &endtli);
+
+ if (opt->includewal)
+ include_wal_files(endptr);
+
+ pq_putemptymessage('c'); /* CopyDone */
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+/*
+ * SendFileList() - sends a list of filenames to frontend
+ *
+ * The function collects a list of the filenames necessary for a complete
+ * backup and sends this list to the client.
+ */
+static void
+SendFileList(void)
+{
+ StringInfoData buf;
+ ListCell *lc;
+ List *tablespaces = NIL;
+ StringInfo tblspc_map_file = NULL;
+ tablespaceinfo *ti;
+
+ tblspc_map_file = makeStringInfo();
+ collectTablespaces(&tablespaces, tblspc_map_file, false, false);
+
+ /* Add a node for the base directory at the end */
+ ti = palloc0(sizeof(tablespaceinfo));
+ tablespaces = lappend(tablespaces, ti);
+
+ foreach(lc, tablespaces)
+ {
+ List *filelist = NULL;
+ tablespaceinfo *ti;
+
+ ti = (tablespaceinfo *) lfirst(lc);
+ if (ti->path == NULL)
+ sendDir(".", 1, true, NIL, true, &filelist);
+ else
+ sendTablespace(ti->path, true, &filelist);
+
+ /* Construct and send the list of filenames */
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 4); /* 4 fields */
+
+ /* First field - file name */
+ pq_sendstring(&buf, "path");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Second field - type */
+ pq_sendstring(&buf, "type");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, CHAROID);
+ pq_sendint16(&buf, 1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Fourth field - mtime */
+ pq_sendstring(&buf, "mtime");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ foreach(lc, filelist)
+ {
+ BackupFile *backupFile = (BackupFile *) lfirst(lc);
+ Size len;
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 4); /* number of columns */
+
+ /* send path */
+ len = strlen(backupFile->path);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, backupFile->path, len);
+
+ /* send type */
+ pq_sendint32(&buf, 1);
+ pq_sendbyte(&buf, backupFile->type);
+
+ /* send size */
+ send_int8_string(&buf, backupFile->size);
+
+ /* send mtime */
+ send_int8_string(&buf, backupFile->mtime);
+
+ pq_endmessage(&buf);
+ }
+
+ if (filelist)
+ pfree(filelist);
+ }
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * SendFiles() - sends the actual files to the caller
+ *
+ * The function sends out the given file(s) over to the caller using the COPY
+ * protocol. It only entertains regular files; any other kind, such as
+ * directories or symlinks, will be ignored.
+ */
+static void
+SendFiles(basebackup_options *opt, List *filenames, bool missing_ok)
+{
+ StringInfoData buf;
+ ListCell *lc;
+ int basepathlen = 1;
+
+ if (list_length(filenames) <= 0)
+ return;
+
+ total_checksum_failures = 0;
+
+ /* Setup and activate network throttling, if client requested it */
+ setup_throttle(opt->maxrate);
+
+ /* set backup start location. */
+ startptr = opt->startwallocation;
+
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ foreach(lc, filenames)
+ {
+ struct stat statbuf;
+ char *pathbuf;
+
+ pathbuf = (char *) strVal(lfirst(lc));
+ if (is_absolute_path(pathbuf))
+ {
+ char *basepath;
+
+ /*
+ * 'pathbuf' points to the tablespace location, but we only want
+ * to include the version directory in it that belongs to us.
+ */
+ basepath = strstr(pathbuf, TABLESPACE_VERSION_DIRECTORY);
+ if (basepath)
+ basepathlen = basepath - pathbuf - 1;
+ }
+
+ if (lstat(pathbuf, &statbuf) != 0)
+ {
+ if (errno != ENOENT)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file or directory \"%s\": %m",
+ pathbuf)));
+
+ /* If the file went away while scanning, it's not an error. */
+ continue;
+ }
+
+ /*
+ * Only entertain requests for regular files; skip any directories or
+ * special files.
+ */
+ if (S_ISREG(statbuf.st_mode))
+ {
+ /* send file to client */
+ sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true, InvalidOid);
+ }
+ else
+ ereport(WARNING,
+ (errmsg("skipping special file or directory \"%s\"", pathbuf)));
+ }
+
+ pq_putemptymessage('c'); /* CopyDone */
+
+ /*
+ * Check for checksum failures. If there are failures across multiple
+ * processes, it may not report the total checksum count, but it will
+ * error out, terminating the backup.
+ */
+ if (total_checksum_failures)
+ {
+ if (total_checksum_failures > 1)
+ ereport(WARNING,
+ (errmsg("%lld total checksum verification failures", total_checksum_failures)));
+
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("checksum verification failure during base backup")));
+ }
+}
+
+/*
+ * Construct a BackupFile entry and add to the list.
+ */
+static void
+addToBackupFileList(List **filelist, char *path, char type, size_t size,
+ time_t mtime)
+{
+ BackupFile *backupFile;
+
+ if (filelist)
+ {
+ backupFile = (BackupFile *) palloc0(sizeof(BackupFile));
+ strlcpy(backupFile->path, path, sizeof(backupFile->path));
+ backupFile->type = type;
+ backupFile->size = size;
+ backupFile->mtime = mtime;
+
+ *filelist = lappend(*filelist, backupFile);
+ }
+}
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index c4e11cc4e8..0aa781ebdc 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -87,13 +87,24 @@ static SQLCmd *make_sqlcmd(void);
%token K_EXPORT_SNAPSHOT
%token K_NOEXPORT_SNAPSHOT
%token K_USE_SNAPSHOT
+%token K_START_BACKUP
+%token K_SEND_FILE_LIST
+%token K_SEND_FILES
+%token K_STOP_BACKUP
+%token K_START_WAL_LOCATION
%type <node> command
%type <node> base_backup start_replication start_logical_replication
create_replication_slot drop_replication_slot identify_system
timeline_history show sql_cmd
%type <list> base_backup_opt_list
+ start_backup_opt_list stop_backup_opt_list
+ send_backup_files_opt_list
%type <defelt> base_backup_opt
+ backup_opt_label backup_opt_progress backup_opt_maxrate
+ backup_opt_fast backup_opt_tsmap backup_opt_wal backup_opt_nowait
+ backup_opt_chksum backup_opt_wal_loc
+ start_backup_opt stop_backup_opt send_backup_files_opt
%type <uintval> opt_timeline
%type <list> plugin_options plugin_opt_list
%type <defelt> plugin_opt_elem
@@ -102,6 +113,7 @@ static SQLCmd *make_sqlcmd(void);
%type <boolval> opt_temporary
%type <list> create_slot_opt_list
%type <defelt> create_slot_opt
+%type <list> backup_files backup_files_list
%%
@@ -162,10 +174,61 @@ base_backup:
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
+ cmd->cmdtag = BASE_BACKUP;
+ $$ = (Node *) cmd;
+ }
+ | K_START_BACKUP start_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = START_BACKUP;
+ $$ = (Node *) cmd;
+ }
+ | K_SEND_FILE_LIST
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = NIL;
+ cmd->cmdtag = SEND_FILE_LIST;
+ $$ = (Node *) cmd;
+ }
+ | K_SEND_FILES backup_files send_backup_files_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $3;
+ cmd->cmdtag = SEND_FILES;
+ cmd->backupfiles = $2;
+ $$ = (Node *) cmd;
+ }
+ | K_STOP_BACKUP stop_backup_opt_list
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = STOP_BACKUP;
$$ = (Node *) cmd;
}
;
+start_backup_opt_list:
+ start_backup_opt_list start_backup_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+stop_backup_opt_list:
+ stop_backup_opt_list stop_backup_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+send_backup_files_opt_list:
+ send_backup_files_opt_list send_backup_files_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
base_backup_opt_list:
base_backup_opt_list base_backup_opt
{ $$ = lappend($1, $2); }
@@ -173,46 +236,116 @@ base_backup_opt_list:
{ $$ = NIL; }
;
+start_backup_opt:
+ backup_opt_label { $$ = $1; }
+ | backup_opt_progress { $$ = $1; }
+ | backup_opt_fast { $$ = $1; }
+ | backup_opt_tsmap { $$ = $1; }
+ ;
+
+stop_backup_opt:
+ backup_opt_label { $$ = $1; }
+ | backup_opt_wal { $$ = $1; }
+ | backup_opt_nowait { $$ = $1; }
+ ;
+
+send_backup_files_opt:
+ backup_opt_maxrate { $$ = $1; }
+ | backup_opt_chksum { $$ = $1; }
+ | backup_opt_wal_loc { $$ = $1; }
+ ;
+
base_backup_opt:
- K_LABEL SCONST
- {
- $$ = makeDefElem("label",
- (Node *)makeString($2), -1);
- }
- | K_PROGRESS
- {
- $$ = makeDefElem("progress",
- (Node *)makeInteger(true), -1);
- }
- | K_FAST
- {
- $$ = makeDefElem("fast",
- (Node *)makeInteger(true), -1);
- }
- | K_WAL
- {
- $$ = makeDefElem("wal",
- (Node *)makeInteger(true), -1);
- }
- | K_NOWAIT
- {
- $$ = makeDefElem("nowait",
- (Node *)makeInteger(true), -1);
- }
- | K_MAX_RATE UCONST
+ backup_opt_label { $$ = $1; }
+ | backup_opt_progress { $$ = $1; }
+ | backup_opt_fast { $$ = $1; }
+ | backup_opt_wal { $$ = $1; }
+ | backup_opt_nowait { $$ = $1; }
+ | backup_opt_maxrate { $$ = $1; }
+ | backup_opt_tsmap { $$ = $1; }
+ | backup_opt_chksum { $$ = $1; }
+ ;
+
+backup_opt_label:
+ K_LABEL SCONST
+ {
+ $$ = makeDefElem("label",
+ (Node *)makeString($2), -1);
+ };
+
+backup_opt_progress:
+ K_PROGRESS
+ {
+ $$ = makeDefElem("progress",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_fast:
+ K_FAST
+ {
+ $$ = makeDefElem("fast",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_wal:
+ K_WAL
+ {
+ $$ = makeDefElem("wal",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_nowait:
+ K_NOWAIT
+ {
+ $$ = makeDefElem("nowait",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_maxrate:
+ K_MAX_RATE UCONST
+ {
+ $$ = makeDefElem("max_rate",
+ (Node *)makeInteger($2), -1);
+ };
+
+backup_opt_tsmap:
+ K_TABLESPACE_MAP
+ {
+ $$ = makeDefElem("tablespace_map",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_chksum:
+ K_NOVERIFY_CHECKSUMS
+ {
+ $$ = makeDefElem("noverify_checksums",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_wal_loc:
+ K_START_WAL_LOCATION SCONST
+ {
+ $$ = makeDefElem("start_wal_location",
+ (Node *)makeString($2), -1);
+ };
+
+backup_files:
+ '(' backup_files_list ')'
{
- $$ = makeDefElem("max_rate",
- (Node *)makeInteger($2), -1);
+ $$ = $2;
}
- | K_TABLESPACE_MAP
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+backup_files_list:
+ SCONST
{
- $$ = makeDefElem("tablespace_map",
- (Node *)makeInteger(true), -1);
+ $$ = list_make1(makeString($1));
}
- | K_NOVERIFY_CHECKSUMS
+ | backup_files_list ',' SCONST
{
- $$ = makeDefElem("noverify_checksums",
- (Node *)makeInteger(true), -1);
+ $$ = lappend($1, makeString($3));
}
;
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 380faeb5f6..d2e2dfe1e9 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -107,6 +107,12 @@ EXPORT_SNAPSHOT { return K_EXPORT_SNAPSHOT; }
NOEXPORT_SNAPSHOT { return K_NOEXPORT_SNAPSHOT; }
USE_SNAPSHOT { return K_USE_SNAPSHOT; }
WAIT { return K_WAIT; }
+START_BACKUP { return K_START_BACKUP; }
+SEND_FILE_LIST { return K_SEND_FILE_LIST; }
+SEND_FILES { return K_SEND_FILES; }
+STOP_BACKUP { return K_STOP_BACKUP; }
+START_WAL_LOCATION { return K_START_WAL_LOCATION; }
+
"," { return ','; }
";" { return ';'; }
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index 1e3ed4e19f..eac4802c7e 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -23,6 +23,14 @@ typedef enum ReplicationKind
REPLICATION_KIND_LOGICAL
} ReplicationKind;
+typedef enum BackupCmdTag
+{
+ BASE_BACKUP,
+ START_BACKUP,
+ SEND_FILE_LIST,
+ SEND_FILES,
+ STOP_BACKUP
+} BackupCmdTag;
/* ----------------------
* IDENTIFY_SYSTEM command
@@ -42,6 +50,8 @@ typedef struct BaseBackupCmd
{
NodeTag type;
List *options;
+ BackupCmdTag cmdtag;
+ List *backupfiles;
} BaseBackupCmd;
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index b55917b9b6..5202e4160b 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -31,6 +31,6 @@ typedef struct
extern void SendBaseBackup(BaseBackupCmd *cmd);
-extern int64 sendTablespace(char *path, bool dryrun);
+extern int64 sendTablespace(char *path, bool dryrun, List **filelist);
#endif /* _BASEBACKUP_H */
--
2.21.0 (Apple Git-122.2)
0003-Refactor-some-basebackup-code-to-increase-reusabilit_v7.patch (application/octet-stream)
From 896fbef37a1feec87b770ea16514ce2b319adb39 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 9 Oct 2019 12:39:41 +0500
Subject: [PATCH 3/7] Refactor some basebackup code to increase reusability.
This commit adds a new function, collectTablespaces, and moves the code
that collects tablespace information from do_pg_start_backup into it.
This does not introduce any functional change.
---
src/backend/access/transam/xlog.c | 192 +++++-----
src/backend/replication/basebackup.c | 510 ++++++++++++++-------------
src/include/access/xlog.h | 2 +
3 files changed, 369 insertions(+), 335 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6bc1a6b46d..c20dc447f1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10309,10 +10309,6 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
{
bool gotUniqueStartpoint = false;
- DIR *tblspcdir;
- struct dirent *de;
- tablespaceinfo *ti;
- int datadirpathlen;
/*
* Force an XLOG file switch before the checkpoint, to ensure that the
@@ -10438,93 +10434,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
if (exclusive)
tblspcmapfile = makeStringInfo();
- datadirpathlen = strlen(DataDir);
-
- /* Collect information about all tablespaces */
- tblspcdir = AllocateDir("pg_tblspc");
- while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
- {
- char fullpath[MAXPGPATH + 10];
- char linkpath[MAXPGPATH];
- char *relpath = NULL;
- int rllen;
- StringInfoData buflinkpath;
- char *s = linkpath;
-
- /* Skip special stuff */
- if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
- continue;
-
- snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
-
-#if defined(HAVE_READLINK) || defined(WIN32)
- rllen = readlink(fullpath, linkpath, sizeof(linkpath));
- if (rllen < 0)
- {
- ereport(WARNING,
- (errmsg("could not read symbolic link \"%s\": %m",
- fullpath)));
- continue;
- }
- else if (rllen >= sizeof(linkpath))
- {
- ereport(WARNING,
- (errmsg("symbolic link \"%s\" target is too long",
- fullpath)));
- continue;
- }
- linkpath[rllen] = '\0';
-
- /*
- * Add the escape character '\\' before newline in a string to
- * ensure that we can distinguish between the newline in the
- * tablespace path and end of line while reading tablespace_map
- * file during archive recovery.
- */
- initStringInfo(&buflinkpath);
-
- while (*s)
- {
- if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
- appendStringInfoChar(&buflinkpath, '\\');
- appendStringInfoChar(&buflinkpath, *s++);
- }
-
- /*
- * Relpath holds the relative path of the tablespace directory
- * when it's located within PGDATA, or NULL if it's located
- * elsewhere.
- */
- if (rllen > datadirpathlen &&
- strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
- IS_DIR_SEP(linkpath[datadirpathlen]))
- relpath = linkpath + datadirpathlen + 1;
-
- ti = palloc(sizeof(tablespaceinfo));
- ti->oid = pstrdup(de->d_name);
- ti->path = pstrdup(buflinkpath.data);
- ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
-
- if (tablespaces)
- *tablespaces = lappend(*tablespaces, ti);
-
- appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
-
- pfree(buflinkpath.data);
-#else
-
- /*
- * If the platform does not have symbolic links, it should not be
- * possible to have tablespaces - clearly somebody else created
- * them. Warn about it and ignore.
- */
- ereport(WARNING,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("tablespaces are not supported on this platform")));
-#endif
- }
- FreeDir(tblspcdir);
+ collectTablespaces(tablespaces, tblspcmapfile, infotbssize, needtblspcmapfile);
/*
* Construct backup label file
@@ -12300,3 +12210,103 @@ XLogRequestWalReceiverReply(void)
{
doRequestWalReceiverReply = true;
}
+
+/*
+ * Collect information about all tablespaces.
+ */
+void
+collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
+ bool infotbssize, bool needtblspcmapfile)
+{
+ DIR *tblspcdir;
+ struct dirent *de;
+ tablespaceinfo *ti;
+ int datadirpathlen;
+
+ datadirpathlen = strlen(DataDir);
+
+ tblspcdir = AllocateDir("pg_tblspc");
+ while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
+ {
+ char fullpath[MAXPGPATH + 10];
+ char linkpath[MAXPGPATH];
+ char *relpath = NULL;
+ int rllen;
+ StringInfoData buflinkpath;
+ char *s = linkpath;
+
+ /* Skip special stuff */
+ if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
+ continue;
+
+ snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
+
+#if defined(HAVE_READLINK) || defined(WIN32)
+ rllen = readlink(fullpath, linkpath, sizeof(linkpath));
+ if (rllen < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not read symbolic link \"%s\": %m",
+ fullpath)));
+ continue;
+ }
+ else if (rllen >= sizeof(linkpath))
+ {
+ ereport(WARNING,
+ (errmsg("symbolic link \"%s\" target is too long",
+ fullpath)));
+ continue;
+ }
+ linkpath[rllen] = '\0';
+
+ /*
+ * Add the escape character '\\' before newline in a string to
+ * ensure that we can distinguish between the newline in the
+ * tablespace path and end of line while reading tablespace_map
+ * file during archive recovery.
+ */
+ initStringInfo(&buflinkpath);
+
+ while (*s)
+ {
+ if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
+ appendStringInfoChar(&buflinkpath, '\\');
+ appendStringInfoChar(&buflinkpath, *s++);
+ }
+
+ /*
+ * Relpath holds the relative path of the tablespace directory
+ * when it's located within PGDATA, or NULL if it's located
+ * elsewhere.
+ */
+ if (rllen > datadirpathlen &&
+ strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
+ IS_DIR_SEP(linkpath[datadirpathlen]))
+ relpath = linkpath + datadirpathlen + 1;
+
+ ti = palloc(sizeof(tablespaceinfo));
+ ti->oid = pstrdup(de->d_name);
+ ti->path = pstrdup(buflinkpath.data);
+ ti->rpath = relpath ? pstrdup(relpath) : NULL;
+ ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+
+ if (tablespaces)
+ *tablespaces = lappend(*tablespaces, ti);
+
+ appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
+
+ pfree(buflinkpath.data);
+#else
+
+ /*
+ * If the platform does not have symbolic links, it should not be
+ * possible to have tablespaces - clearly somebody else created
+ * them. Warn about it and ignore.
+ */
+ ereport(WARNING,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("tablespaces are not supported on this platform")));
+#endif
+ }
+ FreeDir(tblspcdir);
+}
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 5774117089..dee590f16a 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -67,10 +67,12 @@ static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void base_backup_cleanup(int code, Datum arg);
static void perform_base_backup(basebackup_options *opt);
+static void include_wal_files(XLogRecPtr endptr);
static void parse_basebackup_options(List *options, basebackup_options *opt);
static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
static void throttle(size_t increment);
+static void setup_throttle(int maxrate);
static bool is_checksummed_file(const char *fullpath, const char *filename);
/* Was the backup currently in-progress initiated in recovery mode? */
@@ -293,29 +295,7 @@ perform_base_backup(basebackup_options *opt)
/* Send tablespace header */
SendBackupHeader(tablespaces);
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
-
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
-
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ setup_throttle(opt->maxrate);
/* Send off our tablespaces one by one */
foreach(lc, tablespaces)
@@ -381,227 +361,7 @@ perform_base_backup(basebackup_options *opt)
* We've left the last tar file "open", so we can now append the
* required WAL files to it.
*/
- char pathbuf[MAXPGPATH];
- XLogSegNo segno;
- XLogSegNo startsegno;
- XLogSegNo endsegno;
- struct stat statbuf;
- List *historyFileList = NIL;
- List *walFileList = NIL;
- char firstoff[MAXFNAMELEN];
- char lastoff[MAXFNAMELEN];
- DIR *dir;
- struct dirent *de;
- ListCell *lc;
- TimeLineID tli;
-
- /*
- * I'd rather not worry about timelines here, so scan pg_wal and
- * include all WAL files in the range between 'startptr' and 'endptr',
- * regardless of the timeline the file is stamped with. If there are
- * some spurious WAL files belonging to timelines that don't belong in
- * this server's history, they will be included too. Normally there
- * shouldn't be such files, but if there are, there's little harm in
- * including them.
- */
- XLByteToSeg(startptr, startsegno, wal_segment_size);
- XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
- XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
- XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
-
- dir = AllocateDir("pg_wal");
- while ((de = ReadDir(dir, "pg_wal")) != NULL)
- {
- /* Does it look like a WAL segment, and is it in the range? */
- if (IsXLogFileName(de->d_name) &&
- strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
- strcmp(de->d_name + 8, lastoff + 8) <= 0)
- {
- walFileList = lappend(walFileList, pstrdup(de->d_name));
- }
- /* Does it look like a timeline history file? */
- else if (IsTLHistoryFileName(de->d_name))
- {
- historyFileList = lappend(historyFileList, pstrdup(de->d_name));
- }
- }
- FreeDir(dir);
-
- /*
- * Before we go any further, check that none of the WAL segments we
- * need were removed.
- */
- CheckXLogRemoved(startsegno, ThisTimeLineID);
-
- /*
- * Sort the WAL filenames. We want to send the files in order from
- * oldest to newest, to reduce the chance that a file is recycled
- * before we get a chance to send it over.
- */
- list_sort(walFileList, compareWalFileNames);
-
- /*
- * There must be at least one xlog file in the pg_wal directory, since
- * we are doing backup-including-xlog.
- */
- if (walFileList == NIL)
- ereport(ERROR,
- (errmsg("could not find any WAL files")));
-
- /*
- * Sanity check: the first and last segment should cover startptr and
- * endptr, with no gaps in between.
- */
- XLogFromFileName((char *) linitial(walFileList),
- &tli, &segno, wal_segment_size);
- if (segno != startsegno)
- {
- char startfname[MAXFNAMELEN];
-
- XLogFileName(startfname, ThisTimeLineID, startsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", startfname)));
- }
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- XLogSegNo currsegno = segno;
- XLogSegNo nextsegno = segno + 1;
-
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
- if (!(nextsegno == segno || currsegno == segno))
- {
- char nextfname[MAXFNAMELEN];
-
- XLogFileName(nextfname, ThisTimeLineID, nextsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", nextfname)));
- }
- }
- if (segno != endsegno)
- {
- char endfname[MAXFNAMELEN];
-
- XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", endfname)));
- }
-
- /* Ok, we have everything we need. Send the WAL files. */
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- FILE *fp;
- char buf[TAR_SEND_SIZE];
- size_t cnt;
- pgoff_t len = 0;
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
-
- fp = AllocateFile(pathbuf, "rb");
- if (fp == NULL)
- {
- int save_errno = errno;
-
- /*
- * Most likely reason for this is that the file was already
- * removed by a checkpoint, so check for that to get a better
- * error message.
- */
- CheckXLogRemoved(segno, tli);
-
- errno = save_errno;
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not open file \"%s\": %m", pathbuf)));
- }
-
- if (fstat(fileno(fp), &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- pathbuf)));
- if (statbuf.st_size != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* send the WAL file itself */
- _tarWriteHeader(pathbuf, NULL, &statbuf, false);
-
- while ((cnt = fread(buf, 1,
- Min(sizeof(buf), wal_segment_size - len),
- fp)) > 0)
- {
- CheckXLogRemoved(segno, tli);
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
-
- len += cnt;
- throttle(cnt);
-
- if (len == wal_segment_size)
- break;
- }
-
- CHECK_FREAD_ERROR(fp, pathbuf);
-
- if (len != wal_segment_size)
- {
- CheckXLogRemoved(segno, tli);
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("unexpected WAL file size \"%s\"", walFileName)));
- }
-
- /* wal_segment_size is a multiple of 512, so no need for padding */
-
- FreeFile(fp);
-
- /*
- * Mark file as archived, otherwise files can get archived again
- * after promotion of a new node. This is in line with
- * walreceiver.c always doing an XLogArchiveForceDone() after a
- * complete segment.
- */
- StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(pathbuf, "");
- }
-
- /*
- * Send timeline history files too. Only the latest timeline history
- * file is required for recovery, and even that only if there happens
- * to be a timeline switch in the first WAL segment that contains the
- * checkpoint record, or if we're taking a base backup from a standby
- * server and the target timeline changes while the backup is taken.
- * But they are small and highly useful for debugging purposes, so
- * better include them all, always.
- */
- foreach(lc, historyFileList)
- {
- char *fname = lfirst(lc);
-
- snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
-
- if (lstat(pathbuf, &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m", pathbuf)));
-
- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
-
- /* unconditionally mark file as archived */
- StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "");
- }
+ include_wal_files(endptr);
/* Send CopyDone message for the last tar file */
pq_putemptymessage('c');
@@ -1740,3 +1500,265 @@ throttle(size_t increment)
*/
throttled_last = GetCurrentTimestamp();
}
+
+/*
+ * Append the required WAL files to the backup tar file. It assumes that the
+ * last tar file is "open" and the WALs will be appended to it.
+ */
+static void
+include_wal_files(XLogRecPtr endptr)
+{
+ /*
+ * We've left the last tar file "open", so we can now append the required
+ * WAL files to it.
+ */
+ char pathbuf[MAXPGPATH];
+ XLogSegNo segno;
+ XLogSegNo startsegno;
+ XLogSegNo endsegno;
+ struct stat statbuf;
+ List *historyFileList = NIL;
+ List *walFileList = NIL;
+ char firstoff[MAXFNAMELEN];
+ char lastoff[MAXFNAMELEN];
+ DIR *dir;
+ struct dirent *de;
+ ListCell *lc;
+ TimeLineID tli;
+
+ /*
+ * I'd rather not worry about timelines here, so scan pg_wal and include
+ * all WAL files in the range between 'startptr' and 'endptr', regardless
+ * of the timeline the file is stamped with. If there are some spurious
+ * WAL files belonging to timelines that don't belong in this server's
+ * history, they will be included too. Normally there shouldn't be such
+ * files, but if there are, there's little harm in including them.
+ */
+ XLByteToSeg(startptr, startsegno, wal_segment_size);
+ XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
+ XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
+ XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
+
+ dir = AllocateDir("pg_wal");
+ while ((de = ReadDir(dir, "pg_wal")) != NULL)
+ {
+ /* Does it look like a WAL segment, and is it in the range? */
+ if (IsXLogFileName(de->d_name) &&
+ strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
+ strcmp(de->d_name + 8, lastoff + 8) <= 0)
+ {
+ walFileList = lappend(walFileList, pstrdup(de->d_name));
+ }
+ /* Does it look like a timeline history file? */
+ else if (IsTLHistoryFileName(de->d_name))
+ {
+ historyFileList = lappend(historyFileList, pstrdup(de->d_name));
+ }
+ }
+ FreeDir(dir);
+
+ /*
+ * Before we go any further, check that none of the WAL segments we need
+ * were removed.
+ */
+ CheckXLogRemoved(startsegno, ThisTimeLineID);
+
+ /*
+ * Sort the WAL filenames. We want to send the files in order from oldest
+ * to newest, to reduce the chance that a file is recycled before we get a
+ * chance to send it over.
+ */
+ list_sort(walFileList, compareWalFileNames);
+
+ /*
+ * There must be at least one xlog file in the pg_wal directory, since we
+ * are doing backup-including-xlog.
+ */
+ if (walFileList == NIL)
+ ereport(ERROR,
+ (errmsg("could not find any WAL files")));
+
+ /*
+ * Sanity check: the first and last segment should cover startptr and
+ * endptr, with no gaps in between.
+ */
+ XLogFromFileName((char *) linitial(walFileList),
+ &tli, &segno, wal_segment_size);
+ if (segno != startsegno)
+ {
+ char startfname[MAXFNAMELEN];
+
+ XLogFileName(startfname, ThisTimeLineID, startsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", startfname)));
+ }
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ XLogSegNo currsegno = segno;
+ XLogSegNo nextsegno = segno + 1;
+
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+ if (!(nextsegno == segno || currsegno == segno))
+ {
+ char nextfname[MAXFNAMELEN];
+
+ XLogFileName(nextfname, ThisTimeLineID, nextsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", nextfname)));
+ }
+ }
+ if (segno != endsegno)
+ {
+ char endfname[MAXFNAMELEN];
+
+ XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", endfname)));
+ }
+
+ /* Ok, we have everything we need. Send the WAL files. */
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ FILE *fp;
+ char buf[TAR_SEND_SIZE];
+ size_t cnt;
+ pgoff_t len = 0;
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+
+ fp = AllocateFile(pathbuf, "rb");
+ if (fp == NULL)
+ {
+ int save_errno = errno;
+
+ /*
+ * Most likely reason for this is that the file was already
+ * removed by a checkpoint, so check for that to get a better
+ * error message.
+ */
+ CheckXLogRemoved(segno, tli);
+
+ errno = save_errno;
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m", pathbuf)));
+ }
+
+ if (fstat(fileno(fp), &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ pathbuf)));
+ if (statbuf.st_size != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* send the WAL file itself */
+ _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+
+ while ((cnt = fread(buf, 1,
+ Min(sizeof(buf), wal_segment_size - len),
+ fp)) > 0)
+ {
+ CheckXLogRemoved(segno, tli);
+ /* Send the chunk as a CopyData message */
+ if (pq_putmessage('d', buf, cnt))
+ ereport(ERROR,
+ (errmsg("base backup could not send data, aborting backup")));
+
+ len += cnt;
+ throttle(cnt);
+
+ if (len == wal_segment_size)
+ break;
+ }
+
+ CHECK_FREAD_ERROR(fp, pathbuf);
+
+ if (len != wal_segment_size)
+ {
+ CheckXLogRemoved(segno, tli);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("unexpected WAL file size \"%s\"", walFileName)));
+ }
+
+ /* wal_segment_size is a multiple of 512, so no need for padding */
+
+ FreeFile(fp);
+
+ /*
+ * Mark file as archived, otherwise files can get archived again after
+ * promotion of a new node. This is in line with walreceiver.c always
+ * doing an XLogArchiveForceDone() after a complete segment.
+ */
+ StatusFilePath(pathbuf, walFileName, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+
+ /*
+ * Send timeline history files too. Only the latest timeline history file
+ * is required for recovery, and even that only if there happens to be a
+ * timeline switch in the first WAL segment that contains the checkpoint
+ * record, or if we're taking a base backup from a standby server and the
+ * target timeline changes while the backup is taken. But they are small
+ * and highly useful for debugging purposes, so better include them all,
+ * always.
+ */
+ foreach(lc, historyFileList)
+ {
+ char *fname = lfirst(lc);
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", fname);
+
+ if (lstat(pathbuf, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m", pathbuf)));
+
+ sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
+
+ /* unconditionally mark file as archived */
+ StatusFilePath(pathbuf, fname, ".done");
+ sendFileWithContent(pathbuf, "");
+ }
+}
+
+/*
+ * Setup and activate network throttling, if client requested it
+ */
+static void
+setup_throttle(int maxrate)
+{
+ if (maxrate > 0)
+ {
+ throttling_sample =
+ (int64) maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
+
+ /* Enable throttling. */
+ throttling_counter = 0;
+
+ /* The 'real data' starts now (header was ignored). */
+ throttled_last = GetCurrentTimestamp();
+ }
+ else
+ {
+ /* Disable throttling. */
+ throttling_counter = -1;
+ }
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 9b588c87a5..6d27ab9444 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -349,6 +349,8 @@ extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
bool needtblspcmapfile);
extern XLogRecPtr do_pg_stop_backup(char *labelfile, bool waitforarchive,
TimeLineID *stoptli_p);
+extern void collectTablespaces(List **tablespaces, StringInfo tblspcmapfile,
+ bool infotbssize, bool needtblspcmapfile);
extern void do_pg_abort_backup(void);
extern SessionBackupState get_backup_status(void);
--
2.21.0 (Apple Git-122.2)
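To make the refactoring easier to review, here is a minimal sketch of how
the extracted helpers are meant to compose (the caller SendFileList below
is hypothetical; only collectTablespaces(), setup_throttle() and
include_wal_files() come from the patch):

    /* Hypothetical replication-command handler built on the new helpers. */
    static void
    SendFileList(basebackup_options *opt)
    {
        List       *tablespaces = NIL;
        StringInfo  tblspcmapfile = makeStringInfo();

        /* The same tablespace scan do_pg_start_backup() now delegates to. */
        collectTablespaces(&tablespaces, tblspcmapfile, true, false);

        /* Honor MAX_RATE if given; a maxrate of 0 disables throttling. */
        setup_throttle(opt->maxrate);

        /* ... send the collected list to the client here ... */
    }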
Attachment: 0006-parallel-backup-testcase_v7.patch (application/octet-stream)
From 7e1c4c88a1f6ac6c323fee035569fa3f246a3972 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Sun, 13 Oct 2019 21:54:23 +0500
Subject: [PATCH 6/7] parallel backup - testcase
---
.../t/040_pg_basebackup_parallel.pl | 527 ++++++++++++++++++
1 file changed, 527 insertions(+)
create mode 100644 src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
diff --git a/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
new file mode 100644
index 0000000000..4ec4c1e0f6
--- /dev/null
+++ b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
@@ -0,0 +1,527 @@
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Basename qw(basename dirname);
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 95;
+
+program_help_ok('pg_basebackup');
+program_version_ok('pg_basebackup');
+program_options_handling_ok('pg_basebackup');
+
+my $tempdir = TestLib::tempdir;
+
+my $node = get_new_node('main');
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Initialize node without replication settings
+$node->init(extra => ['--data-checksums']);
+$node->start;
+my $pgdata = $node->data_dir;
+
+$node->command_fails(['pg_basebackup'],
+ 'pg_basebackup needs target directory specified');
+
+# Some Windows ANSI code pages may reject this filename, in which case we
+# quietly proceed without this bit of test coverage.
+if (open my $badchars, '>>', "$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
+{
+ print $badchars "test backup of file with non-UTF8 name\n";
+ close $badchars;
+}
+
+$node->set_replication_conf();
+system_or_bail 'pg_ctl', '-D', $pgdata, 'reload';
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup" ],
+ 'pg_basebackup fails because of WAL configuration');
+
+ok(!-d "$tempdir/backup", 'backup directory was cleaned up');
+
+# Create a backup directory that is not empty so the next command will fail
+# but leave the data directory behind
+mkdir("$tempdir/backup")
+ or BAIL_OUT("unable to create $tempdir/backup");
+append_to_file("$tempdir/backup/dir-not-empty.txt", "Some data");
+
+$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/backup", '-n' ],
+ 'failing run with no-clean option');
+
+ok(-d "$tempdir/backup", 'backup directory was created and left behind');
+rmtree("$tempdir/backup");
+
+open my $conf, '>>', "$pgdata/postgresql.conf";
+print $conf "max_replication_slots = 10\n";
+print $conf "max_wal_senders = 10\n";
+print $conf "wal_level = replica\n";
+close $conf;
+$node->restart;
+
+# Write some files to test that they are not copied.
+foreach my $filename (
+ qw(backup_label tablespace_map postgresql.auto.conf.tmp current_logfiles.tmp)
+ )
+{
+ open my $file, '>>', "$pgdata/$filename";
+ print $file "DONOTCOPY";
+ close $file;
+}
+
+# Connect to a database to create global/pg_internal.init. If this is removed
+# the test to ensure global/pg_internal.init is not copied will return a false
+# positive.
+$node->safe_psql('postgres', 'SELECT 1;');
+
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'unlogged main fork in base');
+
+# Create files that look like temporary relations to ensure they are ignored.
+my $postgresOid = $node->safe_psql('postgres',
+ q{select oid from pg_database where datname = 'postgres'});
+
+my @tempRelationFiles =
+ qw(t999_999 t9999_999.1 t999_9999_vm t99999_99999_vm.1);
+
+foreach my $filename (@tempRelationFiles)
+{
+ append_to_file("$pgdata/base/$postgresOid/$filename", 'TEMP_RELATION');
+}
+
+# Run base backup in parallel mode.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup", '-X', 'none', "-j 4" ],
+ 'pg_basebackup runs');
+ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
+
+# Permissions on backup should be default
+SKIP:
+{
+ skip "unix-style permissions not supported on Windows", 1
+ if ($windows_os);
+
+ ok(check_mode_recursive("$tempdir/backup", 0700, 0600),
+ "check backup dir permissions");
+}
+
+# Only archive_status directory should be copied in pg_wal/.
+is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/pg_wal/")) ],
+ [ sort qw(. .. archive_status) ],
+ 'no WAL files copied');
+
+# Contents of these directories should not be copied.
+foreach my $dirname (
+ qw(pg_dynshmem pg_notify pg_replslot pg_serial pg_snapshots pg_stat_tmp pg_subtrans)
+ )
+{
+ is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/$dirname/")) ],
+ [ sort qw(. ..) ],
+ "contents of $dirname/ not copied");
+}
+
+# These files should not be copied.
+foreach my $filename (
+ qw(postgresql.auto.conf.tmp postmaster.opts postmaster.pid tablespace_map current_logfiles.tmp
+ global/pg_internal.init))
+{
+ ok(!-f "$tempdir/backup/$filename", "$filename not copied");
+}
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+ 'unlogged init fork in backup');
+ok( !-f "$tempdir/backup/$baseUnloggedPath",
+ 'unlogged main fork not in backup');
+
+# Temp relations should not be copied.
+foreach my $filename (@tempRelationFiles)
+{
+ ok( !-f "$tempdir/backup/base/$postgresOid/$filename",
+ "base/$postgresOid/$filename not copied");
+}
+
+# Make sure existing backup_label was ignored.
+isnt(slurp_file("$tempdir/backup/backup_label"),
+ 'DONOTCOPY', 'existing backup_label not copied');
+rmtree("$tempdir/backup");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup2", '--waldir',
+ "$tempdir/xlog2", "-j 4"
+ ],
+ 'separate xlog directory');
+ok(-f "$tempdir/backup2/PG_VERSION", 'backup was created');
+ok(-d "$tempdir/xlog2/", 'xlog directory was created');
+rmtree("$tempdir/backup2");
+rmtree("$tempdir/xlog2");
+
+$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/tarbackup", '-Ft', "-j 4"],
+ 'tar format');
+
+rmtree("$tempdir/tarbackup");
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T=/foo" ],
+ '-T with empty old directory fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=" ],
+ '-T with empty new directory fails');
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4",
+ "-T/foo=/bar=/baz"
+ ],
+ '-T with multiple = fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo=/bar" ],
+ '-T with old directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=bar" ],
+ '-T with new directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo" ],
+ '-T with invalid format fails');
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+ skip "symlinks not supported on Windows", 18 if ($windows_os);
+
+ # Move pg_replslot out of $pgdata and create a symlink to it.
+ $node->stop;
+
+ # Set umask so test directories and files are created with group permissions
+ umask(0027);
+
+ # Enable group permissions on PGDATA
+ chmod_recursive("$pgdata", 0750, 0640);
+
+ rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
+ or BAIL_OUT "could not move $pgdata/pg_replslot";
+ symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot")
+ or BAIL_OUT "could not symlink to $pgdata/pg_replslot";
+
+ $node->start;
+
+ # Create a temporary directory in the system location and symlink it
+ # to our physical temp location. That way we can use shorter names
+ # for the tablespace directories, which hopefully won't run afoul of
+ # the 99 character length limit.
+ my $shorter_tempdir = TestLib::tempdir_short . "/tempdir";
+ symlink "$tempdir", $shorter_tempdir;
+
+ mkdir "$tempdir/tblspc1";
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc1 LOCATION '$shorter_tempdir/tblspc1';");
+ $node->safe_psql('postgres',
+ "CREATE TABLE test1 (a int) TABLESPACE tblspc1;");
+
+ # Create an unlogged table to test that forks other than init are not copied.
+ $node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;'
+ );
+
+ my $tblspc1UnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('tblspc1_unlogged')});
+
+ # Make sure main and init forks exist
+ ok( -f "$pgdata/${tblspc1UnloggedPath}_init",
+ 'unlogged init fork in tablespace');
+ ok(-f "$pgdata/$tblspc1UnloggedPath", 'unlogged main fork in tablespace');
+
+ # Create files that look like temporary relations to ensure they are ignored
+ # in a tablespace.
+ my @tempRelationFiles = qw(t888_888 t888888_888888_vm.1);
+ my $tblSpc1Id = basename(
+ dirname(
+ dirname(
+ $node->safe_psql(
+ 'postgres', q{select pg_relation_filepath('test1')}))));
+
+ foreach my $filename (@tempRelationFiles)
+ {
+ append_to_file(
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ 'TEMP_RELATION');
+ }
+
+ $node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4" ],
+ 'plain format with tablespaces fails without tablespace mapping');
+
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tblspc1=$tempdir/tbackup/tblspc1"
+ ],
+ 'plain format with tablespaces succeeds with tablespace mapping');
+ ok(-d "$tempdir/tbackup/tblspc1", 'tablespace was relocated');
+ opendir(my $dh, "$pgdata/pg_tblspc") or die;
+ ok( ( grep {
+ -l "$tempdir/backup1/pg_tblspc/$_"
+ and readlink "$tempdir/backup1/pg_tblspc/$_" eq
+ "$tempdir/tbackup/tblspc1"
+ } readdir($dh)),
+ "tablespace symlink was updated");
+ closedir $dh;
+
+ # Group access should be enabled on all backup files
+ ok(check_mode_recursive("$tempdir/backup1", 0750, 0640),
+ "check backup dir permissions");
+
+ # Unlogged relation forks other than init should not be copied
+ my ($tblspc1UnloggedBackupPath) =
+ $tblspc1UnloggedPath =~ /[^\/]*\/[^\/]*\/[^\/]*$/g;
+
+ ok(-f "$tempdir/tbackup/tblspc1/${tblspc1UnloggedBackupPath}_init",
+ 'unlogged init fork in tablespace backup');
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
+ 'unlogged main fork not in tablespace backup');
+
+ # Temp relations should not be copied.
+ foreach my $filename (@tempRelationFiles)
+ {
+ ok( !-f "$tempdir/tbackup/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ "[tblspc1]/$postgresOid/$filename not copied");
+
+ # Also remove temp relation files or tablespace drop will fail.
+ my $filepath =
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename";
+
+ unlink($filepath)
+ or BAIL_OUT("unable to unlink $filepath");
+ }
+
+ ok( -d "$tempdir/backup1/pg_replslot",
+ 'pg_replslot symlink copied as directory');
+ rmtree("$tempdir/backup1");
+
+ mkdir "$tempdir/tbl=spc2";
+ $node->safe_psql('postgres', "DROP TABLE test1;");
+ $node->safe_psql('postgres', "DROP TABLE tblspc1_unlogged;");
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc1;");
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc2 LOCATION '$shorter_tempdir/tbl=spc2';");
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup3", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tbl\\=spc2=$tempdir/tbackup/tbl\\=spc2"
+ ],
+ 'mapping tablespace with = sign in path');
+ ok(-d "$tempdir/tbackup/tbl=spc2",
+ 'tablespace with = sign was relocated');
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc2;");
+ rmtree("$tempdir/backup3");
+}
+
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backupR", '-R' , '-j 4'],
+ 'pg_basebackup -R runs');
+ok(-f "$tempdir/backupR/postgresql.auto.conf", 'postgresql.auto.conf exists');
+ok(-f "$tempdir/backupR/standby.signal", 'standby.signal was created');
+my $recovery_conf = slurp_file "$tempdir/backupR/postgresql.auto.conf";
+rmtree("$tempdir/backupR");
+
+my $port = $node->port;
+like(
+ $recovery_conf,
+ qr/^primary_conninfo = '.*port=$port.*'\n/m,
+ 'postgresql.auto.conf sets primary_conninfo');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxd" , "-j 4"],
+ 'pg_basebackup runs in default xlog mode');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxd/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxd");
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxf", '-X', 'fetch' , "-j 4"],
+ 'pg_basebackup -X fetch runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxf");
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs", '-X', 'stream' , "-j 4"],
+ 'pg_basebackup -X stream runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxs/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxs");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupnoslot", '-X',
+ 'stream', '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup -X stream runs with --no-slot');
+rmtree("$tempdir/backupnoslot");
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_sl_fail", '-X',
+ 'stream', '-S',
+ 'slot0',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with nonexistent replication slot');
+#
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C' , '-j 4'],
+ 'pg_basebackup -C fails without slot name');
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_slot", '-C',
+ '-S', 'slot0',
+ '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with -C -S --no-slot');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup -C runs');
+rmtree("$tempdir/backupxs_slot");
+
+is( $node->safe_psql(
+ 'postgres',
+ q{SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ 'slot0',
+ 'replication slot was created');
+isnt(
+ $node->safe_psql(
+ 'postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ '',
+ 'restart LSN of new slot is not null');
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot1", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup fails with -C -S and a previously existing slot');
+
+$node->safe_psql('postgres',
+ q{SELECT * FROM pg_create_physical_replication_slot('slot1')});
+my $lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+is($lsn, '', 'restart LSN of new slot is null');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/fail", '-S', 'slot1', '-X', 'none', '-j 4'],
+ 'pg_basebackup with replication slot fails without WAL streaming');
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl", '-X',
+ 'stream', '-S', 'slot1', '-j 4'
+ ],
+ 'pg_basebackup -X stream with replication slot runs');
+$lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+like($lsn, qr!^0/[0-9A-Z]{7,8}$!, 'restart LSN of slot has advanced');
+rmtree("$tempdir/backupxs_sl");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl_R", '-X',
+ 'stream', '-S', 'slot1', '-R',
+ '-j 4'
+ ],
+ 'pg_basebackup with replication slot and -R runs');
+like(
+ slurp_file("$tempdir/backupxs_sl_R/postgresql.auto.conf"),
+ qr/^primary_slot_name = 'slot1'\n/m,
+ 'recovery conf file sets primary_slot_name');
+
+my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
+is($checksum, 'on', 'checksums are enabled');
+rmtree("$tempdir/backupxs_sl_R");
+
+# create tables to corrupt and get their relfilenodes
+my $file_corrupt1 = $node->safe_psql('postgres',
+ q{SELECT a INTO corrupt1 FROM generate_series(1,10000) AS a; ALTER TABLE corrupt1 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt1')}
+);
+my $file_corrupt2 = $node->safe_psql('postgres',
+ q{SELECT b INTO corrupt2 FROM generate_series(1,2) AS b; ALTER TABLE corrupt2 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt2')}
+);
+
+# set page header and block sizes
+my $pageheader_size = 24;
+my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');
+
+# induce corruption
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+my $file;
+open $file, '+<', "$pgdata/$file_corrupt1";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*checksum verification failed/s],
+ 'pg_basebackup reports checksum mismatch');
+rmtree("$tempdir/backup_corrupt");
+
+# induce further corruption in 5 more blocks
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+for my $i (1 .. 5)
+{
+ my $offset = $pageheader_size + $i * $block_size;
+ seek($file, $offset, 0);
+ syswrite($file, "\0\0\0\0\0\0\0\0\0");
+}
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*further.*failures.*will.not.be.reported/s],
+ 'pg_basebackup does not report more than 5 checksum mismatches');
+rmtree("$tempdir/backup_corrupt2");
+
+# induce corruption in a second file
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt2";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+# do not verify checksums, should return ok
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup_corrupt4", '--no-verify-checksums',
+ '-j 4'
+ ],
+ 'pg_basebackup with -k does not report checksum mismatch');
+rmtree("$tempdir/backup_corrupt4");
+
+$node->safe_psql('postgres', "DROP TABLE corrupt1;");
+$node->safe_psql('postgres', "DROP TABLE corrupt2;");
--
2.21.0 (Apple Git-122.2)
Attachment: 0005-Parallel-Backup-pg_basebackup_v7.patch (application/octet-stream)
From bcfe471e924a88cffda223577335ca0c878a3c9c Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Mon, 14 Oct 2019 17:28:58 +0500
Subject: [PATCH 5/7] Parallel Backup - pg_basebackup
Implements the replication commands added to the backend replication
system and adds support for --jobs=NUM in pg_basebackup, so that a full
backup can be taken in parallel over multiple connections. The utility
first collects a list of files from the server; worker threads then copy
the files (one by one) over the COPY protocol.
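
Roughly, the per-connection conversation looks like this (an illustrative
sketch; the command spellings come from this patch series, while the
label, file path and LSN values are made up):

    (main connection)     START_BACKUP LABEL 'b1' FAST
    (main connection)     SEND_FILE_LIST
    (worker connections)  SEND_FILES ( 'base/1/1245/32683' )
                              START_WAL_LOCATION '0/2000028'
    (main connection)     STOP_BACKUP LABEL '<backup_label contents>' NOWAIT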
---
src/bin/pg_basebackup/pg_basebackup.c | 766 ++++++++++++++++++++++++--
1 file changed, 735 insertions(+), 31 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 16886fbe71..c8a36a0c12 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -13,6 +13,7 @@
#include "postgres_fe.h"
+#include <pthread.h>
#include <unistd.h>
#include <dirent.h>
#include <sys/stat.h>
@@ -85,12 +86,65 @@ typedef struct UnpackTarState
const char *mapped_tblspc_path;
pgoff_t current_len_left;
int current_padding;
+ size_t current_bytes_read;
FILE *file;
} UnpackTarState;
typedef void (*WriteDataCallback) (size_t nbytes, char *buf,
void *callback_data);
+typedef struct
+{
+ char path[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+
+ int tsIndex; /* index of tsInfo this file belongs to. */
+} BackupFile;
+
+typedef struct
+{
+ Oid tblspcOid;
+ char *tablespace; /* tablespace name or NULL if 'base'
+ * tablespace */
+ int numFiles; /* number of files */
+ BackupFile *backupFiles; /* list of files in a tablespace */
+} TablespaceInfo;
+
+typedef struct
+{
+ int tablespacecount;
+ int totalfiles;
+ int numWorkers;
+
+ char xlogstart[64];
+ char *backup_label;
+ char *tablespace_map;
+
+ TablespaceInfo *tsInfo;
+ BackupFile **files; /* list of BackupFile pointers */
+ int fileIndex; /* index of file to be fetched */
+
+ PGconn **workerConns;
+} BackupInfo;
+
+typedef struct
+{
+ BackupInfo *backupInfo;
+ uint64 bytesRead;
+
+ int workerid;
+ pthread_t worker;
+
+ bool terminated;
+} WorkerState;
+
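+/* Shared parallel-backup state; read by the main thread and all workers. */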
+BackupInfo *backupInfo = NULL;
+WorkerState *workers = NULL;
+
+static pthread_mutex_t fetch_mutex = PTHREAD_MUTEX_INITIALIZER;
+
/*
* pg_xlog has been renamed to pg_wal in version 10. This version number
* should be compared with PQserverVersion().
@@ -144,6 +198,9 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
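+/* --jobs worker count, and the tablespace header result shared with workers */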
+static int numWorkers = 1;
+static PGresult *tablespacehdr;
+
/* Progress counters */
static uint64 totalsize_kb;
static uint64 totaldone;
@@ -174,10 +231,11 @@ static PQExpBuffer recoveryconfcontents = NULL;
static void usage(void);
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
static void progress_report(int tablespacenum, const char *filename, bool force);
+static void workers_progress_report(uint64 totalBytesRead, const char *filename, bool force);
static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
-static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
+static int ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
static void ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf,
void *callback_data);
static void BaseBackup(void);
@@ -188,6 +246,18 @@ static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
static const char *get_tablespace_mapping(const char *dir);
static void tablespace_list_append(const char *arg);
+static void ParallelBackupRun(BackupInfo *backupInfo);
+static void StopBackup(BackupInfo *backupInfo);
+static void GetBackupFileList(PGconn *conn, BackupInfo *backupInfo);
+static int GetBackupFile(WorkerState *wstate);
+static BackupFile *getNextFile(BackupInfo *backupInfo);
+static int compareFileSize(const void *a, const void *b);
+static void read_label_tblspcmap(PGconn *conn, char **backup_label, char **tablespace_map);
+static void create_backup_dirs(bool basetablespace, char *tablespace, char *name);
+static void create_tblspc_symlink(char *filename);
+static void writefile(char *path, char *buf);
+static void *workerRun(void *arg);
+
static void
cleanup_directories_atexit(void)
@@ -239,6 +309,18 @@ cleanup_directories_atexit(void)
static void
disconnect_atexit(void)
{
+ /* close worker connections */
+ if (backupInfo && backupInfo->workerConns != NULL)
+ {
+ int i;
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ if (backupInfo->workerConns[i] != NULL)
+ PQfinish(backupInfo->workerConns[i]);
+ }
+ }
+
if (conn != NULL)
PQfinish(conn);
}
@@ -386,6 +468,7 @@ usage(void)
printf(_(" --no-slot prevent creation of temporary replication slot\n"));
printf(_(" --no-verify-checksums\n"
" do not verify checksums\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to backup\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nConnection options:\n"));
printf(_(" -d, --dbname=CONNSTR connection string\n"));
@@ -732,6 +815,94 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
}
}
+/*
+ * Print a progress report of worker threads. If verbose output
+ * is enabled, also print the current file name.
+ *
+ * Progress report is written at maximum once per second, unless the
+ * force parameter is set to true.
+ */
+static void
+workers_progress_report(uint64 totalBytesRead, const char *filename, bool force)
+{
+ int percent;
+ char totalBytesRead_str[32];
+ char totalsize_str[32];
+ pg_time_t now;
+
+ if (!showprogress)
+ return;
+
+ now = time(NULL);
+ if (now == last_progress_report && !force)
+ return; /* Max once per second */
+
+ last_progress_report = now;
+ percent = totalsize_kb ? (int) ((totalBytesRead / 1024) * 100 / totalsize_kb) : 0;
+
+ /*
+ * Avoid overflowing past 100% or the full size. This may make the total
+ * size number change as we approach the end of the backup (the estimate
+ * will always be wrong if WAL is included), but that's better than having
+ * the done column be bigger than the total.
+ */
+ if (percent > 100)
+ percent = 100;
+ if (totalBytesRead / 1024 > totalsize_kb)
+ totalsize_kb = totalBytesRead / 1024;
+
+ /*
+ * Separate step to keep platform-dependent format code out of
+ * translatable strings. And we only test for INT64_FORMAT availability
+ * in snprintf, not fprintf.
+ */
+ snprintf(totalBytesRead_str, sizeof(totalBytesRead_str), INT64_FORMAT,
+ totalBytesRead / 1024);
+ snprintf(totalsize_str, sizeof(totalsize_str), INT64_FORMAT, totalsize_kb);
+
+#define VERBOSE_FILENAME_LENGTH 35
+
+ if (verbose)
+ {
+ if (!filename)
+
+ /*
+ * No filename given, so clear the status line (used for last
+ * call)
+ */
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied %*s"),
+ (int) strlen(totalsize_str),
+ totalBytesRead_str, totalsize_str,
+ percent,
+ VERBOSE_FILENAME_LENGTH + 5, "");
+ else
+ {
+ bool truncate = (strlen(filename) > VERBOSE_FILENAME_LENGTH);
+
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied, current file (%s%-*.*s)"),
+ (int) strlen(totalsize_str), totalBytesRead_str, totalsize_str,
+ percent,
+ /* Prefix with "..." if we do leading truncation */
+ truncate ? "..." : "",
+ truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
+ truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
+ /* Truncate filename at beginning if it's too long */
+ truncate ? filename + strlen(filename) - VERBOSE_FILENAME_LENGTH + 3 : filename);
+ }
+ }
+ else
+ {
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied"),
+ (int) strlen(totalsize_str),
+ totalBytesRead_str, totalsize_str,
+ percent);
+ }
+
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\r");
+ else
+ fprintf(stderr, "\n");
+}
/*
* Print a progress report based on the global variables. If verbose output
@@ -748,7 +919,7 @@ progress_report(int tablespacenum, const char *filename, bool force)
char totalsize_str[32];
pg_time_t now;
- if (!showprogress)
+ if (!showprogress || numWorkers > 1)
return;
now = time(NULL);
@@ -1409,8 +1580,10 @@ get_tablespace_mapping(const char *dir)
canonicalize_path(canon_dir);
for (cell = tablespace_dirs.head; cell; cell = cell->next)
+ {
if (strcmp(canon_dir, cell->old_dir) == 0)
return cell->new_dir;
+ }
return dir;
}
@@ -1425,7 +1598,7 @@ get_tablespace_mapping(const char *dir)
* specified directory. If it's for another tablespace, it will be restored
* in the original or mapped directory.
*/
-static void
+static int
ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
{
UnpackTarState state;
@@ -1456,13 +1629,12 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
exit(1);
}
- if (basetablespace && writerecoveryconf)
- WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
-
/*
* No data is synced here, everything is done for all tablespaces at the
* end.
*/
+
+ return state.current_bytes_read;
}
static void
@@ -1485,6 +1657,7 @@ ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
exit(1);
}
totaldone += 512;
+ state->current_bytes_read += 512;
+ state->current_len_left = read_tar_number(&copybuf[124], 12);
@@ -1616,6 +1789,7 @@ ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
fclose(state->file);
state->file = NULL;
totaldone += r;
+ state->current_bytes_read += r;
return;
}
@@ -1625,6 +1799,7 @@ ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
exit(1);
}
totaldone += r;
+ state->current_bytes_read += r;
progress_report(state->tablespacenum, state->filename, false);
state->current_len_left -= r;
@@ -1724,16 +1899,28 @@ BaseBackup(void)
fprintf(stderr, "\n");
}
- basebkp =
- psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
- escaped_label,
- showprogress ? "PROGRESS" : "",
- includewal == FETCH_WAL ? "WAL" : "",
- fastcheckpoint ? "FAST" : "",
- includewal == NO_WAL ? "" : "NOWAIT",
- maxrate_clause ? maxrate_clause : "",
- format == 't' ? "TABLESPACE_MAP" : "",
- verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ if (numWorkers <= 1)
+ {
+ basebkp =
+ psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
+ escaped_label,
+ showprogress ? "PROGRESS" : "",
+ includewal == FETCH_WAL ? "WAL" : "",
+ fastcheckpoint ? "FAST" : "",
+ includewal == NO_WAL ? "" : "NOWAIT",
+ maxrate_clause ? maxrate_clause : "",
+ format == 't' ? "TABLESPACE_MAP" : "",
+ verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ }
+ else
+ {
+ basebkp =
+ psprintf("START_BACKUP LABEL '%s' %s %s %s",
+ escaped_label,
+ showprogress ? "PROGRESS" : "",
+ fastcheckpoint ? "FAST" : "",
+ format == 't' ? "TABLESPACE_MAP" : "");
+ }
if (PQsendQuery(conn, basebkp) == 0)
{
@@ -1783,7 +1970,7 @@ BaseBackup(void)
/*
* Get the header
*/
- res = PQgetResult(conn);
+ tablespacehdr = res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
pg_log_error("could not get backup header: %s",
@@ -1839,24 +2026,75 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /*
- * Start receiving chunks
- */
- for (i = 0; i < PQntuples(res); i++)
+ if (numWorkers > 1)
{
- if (format == 't')
- ReceiveTarFile(conn, res, i);
- else
- ReceiveAndUnpackTarFile(conn, res, i);
- } /* Loop over all tablespaces */
+ int j = 0,
+ k = 0;
- if (showprogress)
+ backupInfo = palloc0(sizeof(BackupInfo));
+ backupInfo->workerConns = (PGconn **) palloc0(sizeof(PGconn *) * numWorkers);
+ backupInfo->tablespacecount = tablespacecount;
+ backupInfo->numWorkers = numWorkers;
+ strlcpy(backupInfo->xlogstart, xlogstart, sizeof(backupInfo->xlogstart));
+
+ read_label_tblspcmap(conn, &backupInfo->backup_label, &backupInfo->tablespace_map);
+
+ /* retrieve backup file list from the server. */
+ GetBackupFileList(conn, backupInfo);
+
+ /*
+ * Add backup_label to the backup. (For tar format, ReceiveTarFile()
+ * takes care of it.)
+ */
+ if (format == 'p')
+ writefile("backup_label", backupInfo->backup_label);
+
+ /*
+ * Flatten the file list into an array of BackupFile structure pointers,
+ * so that workers can walk it sequentially without unnecessary locking.
+ */
+ backupInfo->files =
+ (BackupFile **) palloc0(sizeof(BackupFile *) * backupInfo->totalfiles);
+ for (i = 0; i < backupInfo->tablespacecount; i++)
+ {
+ TablespaceInfo *curTsInfo = &backupInfo->tsInfo[i];
+
+ for (j = 0; j < curTsInfo->numFiles; j++)
+ {
+ backupInfo->files[k] = &curTsInfo->backupFiles[j];
+ k++;
+ }
+ }
+
+ ParallelBackupRun(backupInfo);
+ StopBackup(backupInfo);
+ }
+ else
{
- progress_report(PQntuples(res), NULL, true);
- if (isatty(fileno(stderr)))
- fprintf(stderr, "\n"); /* Need to move to next line */
+ /*
+ * Start receiving chunks
+ */
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ if (format == 't')
+ ReceiveTarFile(conn, res, i);
+ else
+ ReceiveAndUnpackTarFile(conn, res, i);
+ } /* Loop over all tablespaces */
+
+ if (showprogress)
+ {
+ progress_report(PQntuples(tablespacehdr), NULL, true);
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\n"); /* Need to move to next line */
+ }
}
+ /* Write recovery contents */
+ if (format == 'p' && writerecoveryconf)
+ WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
+
PQclear(res);
/*
@@ -2052,6 +2290,7 @@ main(int argc, char **argv)
{"waldir", required_argument, NULL, 1},
{"no-slot", no_argument, NULL, 2},
{"no-verify-checksums", no_argument, NULL, 3},
+ {"jobs", required_argument, NULL, 'j'},
{NULL, 0, NULL, 0}
};
int c;
@@ -2079,7 +2318,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvPj:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2220,6 +2459,9 @@ main(int argc, char **argv)
case 3:
verify_checksums = false;
break;
+ case 'j': /* number of jobs */
+ numWorkers = atoi(optarg);
+ break;
default:
/*
@@ -2334,6 +2576,22 @@ main(int argc, char **argv)
}
}
+ if (numWorkers <= 0)
+ {
+ pg_log_error("invalid number of parallel jobs");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (format != 'p' && numWorkers > 1)
+ {
+ pg_log_error("parallel jobs are only supported with 'plain' format");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
#ifndef HAVE_LIBZ
if (compresslevel != 0)
{
@@ -2406,3 +2664,449 @@ main(int argc, char **argv)
success = true;
return 0;
}
+
+/*
+ * Thread entry point: fetch files until the shared list is exhausted.
+ */
+static void *
+workerRun(void *arg)
+{
+ WorkerState *wstate = (WorkerState *) arg;
+
+ GetBackupFile(wstate);
+
+ wstate->terminated = true;
+ return NULL;
+}
+
+/*
+ * Runs the worker threads and updates progress until all workers have
+ * terminated/completed.
+ */
+static void
+ParallelBackupRun(BackupInfo *backupInfo)
+{
+ int status,
+ i;
+ bool threadsActive = true;
+ uint64 totalBytes = 0;
+
+ workers = (WorkerState *) palloc0(sizeof(WorkerState) * numWorkers);
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ WorkerState *worker = &workers[i];
+
+ worker->backupInfo = backupInfo;
+ worker->workerid = i;
+ worker->bytesRead = 0;
+ worker->terminated = false;
+
+ backupInfo->workerConns[i] = GetConnection();
+ status = pthread_create(&worker->worker, NULL, workerRun, worker);
+ if (status != 0)
+ {
+ pg_log_error("failed to create thread: %m");
+ exit(1);
+ }
+
+ if (verbose)
+ pg_log_info("backup worker (%d) created, %d", i, status);
+ }
+
+ /*
+ * This is the main thread for updating progress. It waits for workers to
+ * complete and gets updated status during every loop iteration.
+ */
+ while (threadsActive)
+ {
+ char *filename = NULL;
+
+ threadsActive = false;
+ totalBytes = 0;
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ WorkerState *worker = &workers[i];
+
+ totalBytes += worker->bytesRead;
+ threadsActive |= !worker->terminated;
+ }
+
+ if (backupInfo->fileIndex < backupInfo->totalfiles)
+ filename = backupInfo->files[backupInfo->fileIndex]->path;
+
+ workers_progress_report(totalBytes, filename, false);
+ pg_usleep(100000);
+ }
+
+ if (showprogress)
+ {
+ workers_progress_report(totalBytes, NULL, true);
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\n"); /* Need to move to next line */
+ }
+}
+
+/*
+ * Take the system out of backup mode.
+ */
+static void
+StopBackup(BackupInfo *backupInfo)
+{
+ PGresult *res = NULL;
+ char *basebkp;
+
+ basebkp = psprintf("STOP_BACKUP LABEL '%s' %s %s",
+ backupInfo->backup_label,
+ includewal == FETCH_WAL ? "WAL" : "",
+ includewal == NO_WAL ? "" : "NOWAIT");
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not execute STOP BACKUP \"%s\"",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /* receive pg_control and wal files */
+ ReceiveAndUnpackTarFile(conn, res, tablespacecount);
+ PQclear(res);
+}
+
+/*
+ * Retrieve the backup file list from the server and populate the
+ * TablespaceInfo structs that keep track of tablespaces and their files.
+ */
+static void
+GetBackupFileList(PGconn *conn, BackupInfo *backupInfo)
+{
+ TablespaceInfo *tsInfo;
+ PGresult *res = NULL;
+ char *basebkp;
+ int i;
+
+ backupInfo->tsInfo = palloc0(sizeof(TablespaceInfo) * backupInfo->tablespacecount);
+ tsInfo = backupInfo->tsInfo;
+
+ /*
+ * Get list of files.
+ */
+ basebkp = psprintf("SEND_FILE_LIST");
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not send replication command \"%s\": %s",
+ "SEND_FILE_LIST", PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /*
+ * The list of files is grouped by tablespaces, and we want to fetch them
+ * in the same order.
+ */
+ for (i = 0; i < backupInfo->tablespacecount; i++)
+ {
+ bool basetablespace;
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get backup header: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
+ basetablespace = PQgetisnull(tablespacehdr, i, 0);
+ tsInfo[i].tblspcOid = atol(PQgetvalue(tablespacehdr, i, 0));
+ tsInfo[i].tablespace = PQgetvalue(tablespacehdr, i, 1);
+ tsInfo[i].numFiles = PQntuples(res);
+ tsInfo[i].backupFiles = palloc0(sizeof(BackupFile) * tsInfo[i].numFiles);
+
+ /* keep count of all files in backup */
+ backupInfo->totalfiles += tsInfo[i].numFiles;
+
+ for (int j = 0; j < tsInfo[i].numFiles; j++)
+ {
+ char *path = PQgetvalue(res, j, 0);
+ char type = PQgetvalue(res, j, 1)[0];
+ int32 size = atol(PQgetvalue(res, j, 2));
+ time_t mtime = atol(PQgetvalue(res, j, 3));
+
+ /*
+ * In 'plain' format, create backup directories first.
+ */
+ if (format == 'p' && type == 'd')
+ {
+ create_backup_dirs(basetablespace, tsInfo[i].tablespace, path);
+ continue;
+ }
+
+ if (format == 'p' && type == 'l')
+ {
+ create_tblspc_symlink(path);
+ continue;
+ }
+
+ strlcpy(tsInfo[i].backupFiles[j].path, path, MAXPGPATH);
+ tsInfo[i].backupFiles[j].type = type;
+ tsInfo[i].backupFiles[j].size = size;
+ tsInfo[i].backupFiles[j].mtime = mtime;
+ tsInfo[i].backupFiles[j].tsIndex = i;
+ }
+
+ /* sort files in descending order, based on size */
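+ /* Largest first: long transfers start early, which should keep workers balanced. */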
+ qsort(tsInfo[i].backupFiles, tsInfo[i].numFiles,
+ sizeof(BackupFile), &compareFileSize);
+ PQclear(res);
+ }
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data: %s", PQerrorMessage(conn));
+ exit(1);
+ }
+ res = PQgetResult(conn);
+}
+
+/*
+ * Worker loop: claim the next file from the shared list, fetch it from
+ * the server, and write it into the backup directory, until no files
+ * remain.
+ */
+static int
+GetBackupFile(WorkerState *wstate)
+{
+ PGresult *res = NULL;
+ PGconn *worker_conn = NULL;
+ BackupFile *fetchFile = NULL;
+ BackupInfo *backupInfo = NULL;
+
+ backupInfo = wstate->backupInfo;
+ worker_conn = backupInfo->workerConns[wstate->workerid];
+ while ((fetchFile = getNextFile(backupInfo)) != NULL)
+ {
+ PQExpBuffer buf = createPQExpBuffer();
+
+ /*
+ * Fetch a single file from the server. To fetch the file, build a
+ * query in form of:
+ *
+ * SEND_FILES ('base/1/1245/32683') [options]
+ */
+ appendPQExpBuffer(buf, "SEND_FILES ( '%s' )", fetchFile->path);
+
+ /* add options */
+ appendPQExpBuffer(buf, " START_WAL_LOCATION '%s' %s %s",
+ backupInfo->xlogstart,
+ format == 't' ? "TABLESPACE_MAP" : "",
+ verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ if (maxrate > 0)
+ appendPQExpBuffer(buf, " MAX_RATE %u", maxrate);
+
+ if (PQsendQuery(worker_conn, buf->data) == 0)
+ {
+ pg_log_error("could not send replication command \"%s\": %s",
+ buf->data, PQerrorMessage(worker_conn));
+ destroyPQExpBuffer(buf);
+ return 1;
+ }
+
+ destroyPQExpBuffer(buf);
+
+ /* process file contents, also count bytesRead for progress */
+ wstate->bytesRead +=
+ ReceiveAndUnpackTarFile(worker_conn, tablespacehdr, fetchFile->tsIndex);
+
+ res = PQgetResult(worker_conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data stream: %s",
+ PQerrorMessage(worker_conn));
+ exit(1);
+ }
+
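+ /* drain the terminating NULL result before sending the next query */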
+ res = PQgetResult(worker_conn);
+ }
+
+ PQclear(res);
+ return 0;
+}
+
+/*
+ * Atomically fetch and increment fileIndex under a mutex, storing the result
+ * in a local variable so that no two workers ever get the same index and no
+ * file is skipped.
+ */
+static BackupFile *
+getNextFile(BackupInfo *backupInfo)
+{
+ int fileIndex = 0;
+
+ pthread_mutex_lock(&fetch_mutex);
+ fileIndex = backupInfo->fileIndex++;
+ pthread_mutex_unlock(&fetch_mutex);
+
+ if (fileIndex >= backupInfo->totalfiles)
+ return NULL;
+
+ return backupInfo->files[fileIndex];
+}
+
+/* qsort comparator for BackupFile entries (descending order by size) */
+static int
+compareFileSize(const void *a, const void *b)
+{
+ const BackupFile *v1 = (BackupFile *) a;
+ const BackupFile *v2 = (BackupFile *) b;
+
+ if (v1->size > v2->size)
+ return -1;
+ if (v1->size < v2->size)
+ return 1;
+
+ return 0;
+}
+
+static void
+read_label_tblspcmap(PGconn *conn, char **backuplabel, char **tblspc_map)
+{
+ PGresult *res = NULL;
+
+ Assert(backuplabel != NULL);
+ Assert(tblspc_map != NULL);
+
+ /*
+ * Get Backup label and tablespace map data.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get data: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
+ *backuplabel = PQgetvalue(res, 0, 0); /* backup_label */
+ if (!PQgetisnull(res, 0, 1))
+ *tblspc_map = PQgetvalue(res, 0, 1); /* tablespace_map */
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ res = PQgetResult(conn);
+ PQclear(res);
+}
+
+/*
+ * Create backup directories while taking care of tablespace path. If tablespace
+ * mapping (with -T) is given then the directory will be created on the mapped
+ * path.
+ */
+static void
+create_backup_dirs(bool basetablespace, char *tablespace, char *name)
+{
+ char dirpath[MAXPGPATH];
+
+ Assert(name != NULL);
+
+ if (basetablespace)
+ snprintf(dirpath, sizeof(dirpath), "%s/%s", basedir, name);
+ else
+ {
+ Assert(tablespace != NULL);
+ snprintf(dirpath, sizeof(dirpath), "%s/%s",
+ get_tablespace_mapping(tablespace), (name + strlen(tablespace) + 1));
+ }
+
+ if (pg_mkdir_p(dirpath, pg_dir_create_mode) != 0)
+ {
+ if (errno != EEXIST)
+ {
+ pg_log_error("could not create directory \"%s\": %m",
+ dirpath);
+ exit(1);
+ }
+ }
+}
+
+/*
+ * Create a symlink in pg_tblspc and apply any tablespace mapping given on
+ * the command line (--tablespace-mapping).
+ */
+static void
+create_tblspc_symlink(char *filename)
+{
+ int i;
+
+ for (i = 0; i < tablespacecount; i++)
+ {
+ char *tsoid = PQgetvalue(tablespacehdr, i, 0);
+
+ if (strstr(filename, tsoid) != NULL)
+ {
+ char *linkloc = psprintf("%s/%s", basedir, filename);
+ const char *mapped_tblspc_path = get_tablespace_mapping(PQgetvalue(tablespacehdr, i, 1));
+
+#ifdef HAVE_SYMLINK
+ if (symlink(mapped_tblspc_path, linkloc) != 0)
+ {
+ pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
+ linkloc, mapped_tblspc_path);
+ exit(1);
+ }
+#else
+ pg_log_error("symlinks are not supported on this platform");
+ exit(1);
+#endif
+ free(linkloc);
+ break;
+ }
+ }
+}
+
+/*
+ * General function for writing to a file; creates one if it doesn't exist
+ */
+static void
+writefile(char *path, char *buf)
+{
+ FILE *f;
+ char pathbuf[MAXPGPATH];
+
+ snprintf(pathbuf, MAXPGPATH, "%s/%s", basedir, path);
+ f = fopen(pathbuf, "w");
+ if (f == NULL)
+ {
+ pg_log_error("could not open file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fwrite(buf, strlen(buf), 1, f) != 1)
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fclose(f))
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+}
--
2.21.0 (Apple Git-122.2)
0002-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb_v7.patchapplication/octet-stream; name=0002-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb_v7.patchDownload
From 3184318ba43a238d5ca67cb46b0d72e632cd8f96 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 30 Oct 2019 16:45:28 +0500
Subject: [PATCH 2/7] Rename sizeonly to dryrun for few functions in
basebackup.
---
src/backend/replication/basebackup.c | 44 ++++++++++++++--------------
src/include/replication/basebackup.h | 2 +-
2 files changed, 23 insertions(+), 23 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ea87676405..5774117089 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -54,15 +54,15 @@ typedef struct
} basebackup_options;
-static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
+static int64 sendDir(const char *path, int basepathlen, bool dryrun,
List *tablespaces, bool sendtblspclinks);
static bool sendFile(const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
static int64 _tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly);
+ struct stat *statbuf, bool dryrun);
static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly);
+ bool dryrun);
static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void base_backup_cleanup(int code, Datum arg);
@@ -959,13 +959,13 @@ sendFileWithContent(const char *filename, const char *content)
/*
* Include the tablespace directory pointed to by 'path' in the output tar
- * stream. If 'sizeonly' is true, we just calculate a total length and return
+ * stream. If 'dryrun' is true, we just calculate a total length and return
* it, without actually sending anything.
*
* Only used to send auxiliary tablespaces, not PGDATA.
*/
int64
-sendTablespace(char *path, bool sizeonly)
+sendTablespace(char *path, bool dryrun)
{
int64 size;
char pathbuf[MAXPGPATH];
@@ -995,17 +995,17 @@ sendTablespace(char *path, bool sizeonly)
}
size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
- sizeonly);
+ dryrun);
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true);
+ size += sendDir(pathbuf, strlen(path), dryrun, NIL, true);
return size;
}
/*
* Include all files from the given directory in the output tar stream. If
- * 'sizeonly' is true, we just calculate a total length and return it, without
+ * 'dryrun' is true, we just calculate a total length and return it, without
* actually sending anything.
*
* Omit any directory in the tablespaces list, to avoid backing up
@@ -1016,7 +1016,7 @@ sendTablespace(char *path, bool sizeonly)
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
+sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
bool sendtblspclinks)
{
DIR *dir;
@@ -1171,7 +1171,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
excludeFound = true;
break;
}
@@ -1187,7 +1187,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
continue;
}
@@ -1199,14 +1199,14 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(pathbuf, "./pg_wal") == 0)
{
/* If pg_wal is a symlink, write it as a directory anyway */
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
/*
* Also send archive_status directory (by hackishly reusing
* statbuf from above ...).
*/
size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
- sizeonly);
+ dryrun);
continue; /* don't recurse into pg_wal */
}
@@ -1238,7 +1238,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
linkpath[rllen] = '\0';
size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
- &statbuf, sizeonly);
+ &statbuf, dryrun);
#else
/*
@@ -1262,7 +1262,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
* permissions right.
*/
size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ dryrun);
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1293,17 +1293,17 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces, sendtblspclinks);
+ size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks);
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
- if (!sizeonly)
+ if (!dryrun)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid);
- if (sent || sizeonly)
+ if (sent || dryrun)
{
/* Add size, rounded up to 512byte block */
size += ((statbuf.st_size + 511) & ~511);
@@ -1612,12 +1612,12 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
static int64
_tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly)
+ struct stat *statbuf, bool dryrun)
{
char h[512];
enum tarError rc;
- if (!sizeonly)
+ if (!dryrun)
{
rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size,
statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
@@ -1654,7 +1654,7 @@ _tarWriteHeader(const char *filename, const char *linktarget,
*/
static int64
_tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly)
+ bool dryrun)
{
/* If symlink, write it as a directory anyway */
#ifndef WIN32
@@ -1664,7 +1664,7 @@ _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
#endif
statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
- return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, sizeonly);
+ return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, dryrun);
}
/*
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index 503a5b9f0b..b55917b9b6 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -31,6 +31,6 @@ typedef struct
extern void SendBaseBackup(BaseBackupCmd *cmd);
-extern int64 sendTablespace(char *path, bool sizeonly);
+extern int64 sendTablespace(char *path, bool dryrun);
#endif /* _BASEBACKUP_H */
--
2.21.0 (Apple Git-122.2)
0001-removed-PG_ENSURE_ERROR_CLEANUP-macro-from-basebacku_v7.patchapplication/octet-stream; name=0001-removed-PG_ENSURE_ERROR_CLEANUP-macro-from-basebacku_v7.patchDownload
From 3830d0e89293d6f8a87fe5facc9c2c64209a2e3e Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 30 Oct 2019 10:21:38 +0500
Subject: [PATCH 1/7] removed PG_ENSURE_ERROR_CLEANUP macro from basebackup,
instead registered base_backup_cleanup with before_shmem_exit handler. This
will make sure that cleanup is always done when wal sender exits.
---
src/backend/replication/basebackup.c | 182 +++++++++++++--------------
1 file changed, 90 insertions(+), 92 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 1fa4551eff..ea87676405 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -243,6 +243,8 @@ perform_base_backup(basebackup_options *opt)
StringInfo tblspc_map_file = NULL;
int datadirpathlen;
List *tablespaces = NIL;
+ ListCell *lc;
+ tablespaceinfo *ti;
datadirpathlen = strlen(DataDir);
@@ -261,121 +263,117 @@ perform_base_backup(basebackup_options *opt)
/*
* Once do_pg_start_backup has been called, ensure that any failure causes
* us to abort the backup so we don't "leak" a backup counter. For this
- * reason, *all* functionality between do_pg_start_backup() and the end of
- * do_pg_stop_backup() should be inside the error cleanup block!
+ * reason, register base_backup_cleanup with the before_shmem_exit handler.
+ * This makes sure the callback is always invoked when the process exits.
+ * On success, do_pg_stop_backup will have taken the system out of backup
+ * mode and this callback will have no effect; otherwise, the required
+ * cleanup is done in any case.
*/
+ before_shmem_exit(base_backup_cleanup, (Datum) 0);
- PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
- {
- ListCell *lc;
- tablespaceinfo *ti;
-
- SendXlogRecPtrResult(startptr, starttli);
+ SendXlogRecPtrResult(startptr, starttli);
- /*
- * Calculate the relative path of temporary statistics directory in
- * order to skip the files which are located in that directory later.
- */
- if (is_absolute_path(pgstat_stat_directory) &&
- strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
- statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
- else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
- statrelpath = psprintf("./%s", pgstat_stat_directory);
- else
- statrelpath = pgstat_stat_directory;
-
- /* Add a node for the base directory at the end */
- ti = palloc0(sizeof(tablespaceinfo));
- ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
- tablespaces = lappend(tablespaces, ti);
+ /*
+ * Calculate the relative path of temporary statistics directory in order
+ * to skip the files which are located in that directory later.
+ */
+ if (is_absolute_path(pgstat_stat_directory) &&
+ strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
+ else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory);
+ else
+ statrelpath = pgstat_stat_directory;
- /* Send tablespace header */
- SendBackupHeader(tablespaces);
+ /* Add a node for the base directory at the end */
+ ti = palloc0(sizeof(tablespaceinfo));
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
+ tablespaces = lappend(tablespaces, ti);
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+ /* Send tablespace header */
+ SendBackupHeader(tablespaces);
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
+ /* Setup and activate network throttling, if client requested it */
+ if (opt->maxrate > 0)
+ {
+ throttling_sample =
+ (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
- /* Send off our tablespaces one by one */
- foreach(lc, tablespaces)
- {
- tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
- StringInfoData buf;
+ /* Enable throttling. */
+ throttling_counter = 0;
- /* Send CopyOutResponse message */
- pq_beginmessage(&buf, 'H');
- pq_sendbyte(&buf, 0); /* overall format */
- pq_sendint16(&buf, 0); /* natts */
- pq_endmessage(&buf);
+ /* The 'real data' starts now (header was ignored). */
+ throttled_last = GetCurrentTimestamp();
+ }
+ else
+ {
+ /* Disable throttling. */
+ throttling_counter = -1;
+ }
- if (ti->path == NULL)
- {
- struct stat statbuf;
+ /* Send off our tablespaces one by one */
+ foreach(lc, tablespaces)
+ {
+ tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
+ StringInfoData buf;
- /* In the main tar, include the backup_label first... */
- sendFileWithContent(BACKUP_LABEL_FILE, labelfile->data);
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
- /*
- * Send tablespace_map file if required and then the bulk of
- * the files.
- */
- if (tblspc_map_file && opt->sendtblspcmapfile)
- {
- sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data);
- sendDir(".", 1, false, tablespaces, false);
- }
- else
- sendDir(".", 1, false, tablespaces, true);
+ if (ti->path == NULL)
+ {
+ struct stat statbuf;
- /* ... and pg_control after everything else. */
- if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- XLOG_CONTROL_FILE)));
- sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
- }
- else
- sendTablespace(ti->path, false);
+ /* In the main tar, include the backup_label first... */
+ sendFileWithContent(BACKUP_LABEL_FILE, labelfile->data);
/*
- * If we're including WAL, and this is the main data directory we
- * don't terminate the tar stream here. Instead, we will append
- * the xlog files below and terminate it then. This is safe since
- * the main data directory is always sent *last*.
+ * Send tablespace_map file if required and then the bulk of the
+ * files.
*/
- if (opt->includewal && ti->path == NULL)
+ if (tblspc_map_file && opt->sendtblspcmapfile)
{
- Assert(lnext(tablespaces, lc) == NULL);
+ sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data);
+ sendDir(".", 1, false, tablespaces, false);
}
else
- pq_putemptymessage('c'); /* CopyDone */
+ sendDir(".", 1, false, tablespaces, true);
+
+ /* ... and pg_control after everything else. */
+ if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ XLOG_CONTROL_FILE)));
+ sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
}
+ else
+ sendTablespace(ti->path, false);
- endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
+ /*
+ * If we're including WAL, and this is the main data directory we
+ * don't terminate the tar stream here. Instead, we will append the
+ * xlog files below and terminate it then. This is safe since the main
+ * data directory is always sent *last*.
+ */
+ if (opt->includewal && ti->path == NULL)
+ {
+ Assert(lnext(tablespaces, lc) == NULL);
+ }
+ else
+ pq_putemptymessage('c'); /* CopyDone */
}
- PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
+ endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
if (opt->includewal)
{
--
2.21.0 (Apple Git-122.2)
On Thu, Dec 12, 2019 at 10:20 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
I have updated the patches (v7 attached) and have taken care of all issues
pointed out by Jeevan; additionally, I ran pgindent on each patch.
Furthermore, command names have been renamed as suggested and I have
simplified the SendFiles function. The client can only request regular
files; any other kind, such as directories or symlinks, will be skipped,
and the client will be responsible for taking care of those.
Hi,
Patch 0001 of this series conflicts with my recent commit
303640199d0436c5e7acdf50b837a027b5726594; that commit was actually
inspired by some previous study of 0001. That being said, I think 0001
has the wrong idea. There's no reason that I can see why it should be
correct to remove the PG_ENSURE_ERROR_CLEANUP calls from
perform_base_backup(). It's true that if we register a long-lived
before_shmem_exit hook, then the backup will get cleaned up even
without the PG_ENSURE_ERROR_CLEANUP block, but there's also the
question of the warning message. I think that our goal should be to
emit the warning message about a backup being stopped too early if the
user uses either pg_start_backup() or the new START_BACKUP command and
does not end the backup with either pg_stop_backup() or the new
STOP_BACKUP command -- but not if a single command that both starts
and ends a backup, like BASE_BACKUP, is interrupted. To accomplish
that goal in the wake of 303640199d0436c5e7acdf50b837a027b5726594, we
need to temporarily register do_pg_abort_backup() as a
before_shmem_exit() handler using PG_ENSURE_ERROR_CLEANUP() during
commands like BASE_BACKUP() -- and for things like pg_start_backup()
or the new START_BACKUP command, we just need to add a single call to
register_persistent_abort_backup_handler().
So I think you can drop 0001, and then in the patch that actually
introduces START_BACKUP, add the call to
register_persistent_abort_backup_handler() before calling
do_pg_start_backup(). Also in that patch, also adjust the warning text
that do_pg_abort_backup() emits to be more generic e.g. "aborting
backup due to backend exiting while a non-exclusive backup is in
progress".
0003 creates three new functions, moving code from
do_pg_start_backup() to a new function collectTablespaces() and from
perform_base_backup() to new functions setup_throttle() and
include_wal_files(). I'm skeptical about all of these changes. One
general nitpick is that the way these function names are capitalized
and punctuated does not seem to have been chosen very consistently;
how about name_like_this() throughout? A bit more substantively:
- collectTablespaces() is factored out of do_pg_start_backup() so that
it can also be used by SendFileList(), but that means that a client is
going to invoke START_BACKUP, indirectly calling collectTablespaces(),
and then immediately afterward the client is probably going to call
SEND_FILE_LIST, which will again call collectTablespaces(). That does
not appear to be super-great. For one thing, it's duplicate work,
although because SendFileList() is going to pass infotbssize as false,
it's not a lot of duplicated work. Also, what happens if the two calls
to collectTablespaces() return different answers due to concurrent
CREATE/DROP TABLESPACE commands? Maybe it would all work out fine, but
it seems like there is at least the possibility of bugs if different
parts of the backup have different notions of what tablespaces exist.
- setup_throttle() is factored out of perform_base_backup() so that it
can be called in StartBackup() and StopBackup() and SendFiles(). This
seems extremely odd. Why does it make any sense to give the user an
option to activate throttling when *ending* a backup? Why does it make
sense to give the user a chance to enable throttling *both* at the
startup of a backup *and also* for each individual file. If we're
going to support throttling here, it seems like it should be either a
backup-level property or a file-level property, not both.
- include_wal_files() is factored out of perform_base_backup() so that
it can be called by StopBackup(). This seems like a poor design
decision. The idea behind the BASE_BACKUP command is that you run that
one command, and the server sends you everything. The idea in this new
way of doing business is that the client requests the individual files
it wants -- except for the WAL files, which are for some reason not
requested individually but sent all together as part of the
STOP_BACKUP response. It seems like it would be more consistent if the
client were to decide which WAL files it needs and request them one by
one, just as we do with other files.
I think there's a common theme to all of these complaints, which is
that you haven't done enough to move things that are the
responsibility of the backend in the BASE_BACKUP model to the frontend
in this model. I started wondering, for example, whether it might not
be better to have the client rather than the server construct the
tablespace_map file. After all, the client needs to get the list of
files anyway (hence SEND_FILE_LIST) and if it's got that then it knows
almost enough to construct the tablespace map. The only additional
thing it needs is the full pathname to which the link points. But, it
seems that we could fairly easily extend SEND_FILE_LIST to send, for
files that are symbolic links, the target of the link, using a new
column. Or alternatively, using a separate command, so that instead of
just sending a single SEND_FILE_LIST command, the client might first
ask for a tablespace list and then might ask for a list of files
within each tablespace (e.g. LIST_TABLESPACES, then LIST_FILES <oid>
for each tablespace, with 0 for the main tablespace, perhaps). I'm not
sure which way is better.
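To make that concrete, here is a minimal sketch of the client writing
tablespace_map itself, assuming a hypothetical link-target column were
added to the file-list output (the posted patches do not provide one;
the names below are illustrative):

#include <stdio.h>

typedef struct
{
    unsigned int tblspcOid;     /* 0 for the main tablespace */
    const char  *linktarget;    /* symlink target; assumed new column */
} TsMapEntry;

/* Emit tablespace_map in the server's "<oid> <path>" per-line format. */
static void
build_tablespace_map(FILE *mapfile, const TsMapEntry *ts, int nts)
{
    for (int i = 0; i < nts; i++)
    {
        /* the main tablespace has no symlink and is not listed */
        if (ts[i].tblspcOid == 0)
            continue;
        fprintf(mapfile, "%u %s\n", ts[i].tblspcOid, ts[i].linktarget);
    }
}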
Similarly, for throttling, I have a hard time understanding how what
you've got here is going to work reasonably. It looks like each client
is just going to request whatever MAX_RATE the user specifies, but the
result of that will be that the actual transfer rate is probably a
multiple of the specified rate, approximately equal to the specified
rate times the number of clients. That's probably not what the user
wants. You could take the specified rate and divide it by the number
of workers, but limiting each of 4 workers to a quarter of the rate
will probably lead to a combined rate of less than the specified
rate, because if one worker doesn't use all of the bandwidth to which
it's entitled, or even exits earlier than the others, the other
workers don't get to go any faster as a result. Another problem is
that, in the current approach, throttling applies overall to the
entire backup, but in this approach, it is applied separately to each
SEND_FILE command. In the current approach, if one file finishes a
little faster or slower than anticipated, the next file in the tarball
will be sent a little slower or faster to compensate. But in this
approach, each SEND_FILES command is throttled separately, so this
property is lost. Furthermore, while BASE_BACKUP sends data
continuously, this approach naturally involves pauses between
commands. If files are large, that won't matter much, but if they're
small and numerous, it will tend to cause the actual transfer rate to
be less than the throttling rate.
One potential way to solve this problem is... move it to the client
side. Instead of making it the server's job not to send data too fast,
make it the client's job not to receive data too fast. Let the server
backends write as fast as they want, and on the pg_basebackup side,
have the threads coordinate with each other so that they don't read
data faster than the configured rate. That's not quite the same thing,
though, because the server can get ahead by the size of the client's
receive buffers plus whatever data is on the wire. I don't know
whether that's a big enough problem to be worth caring about. If it
is, then I think we need some server infrastructure to "group
throttle" a group of cooperating backends.
A general comment about 0004 is that it seems like you've proceeded by
taking the code from perform_base_backup() and spreading it across
several different functions without, necessarily, as much thought as
is needed there. For instance, StartBackup() looks like just the
beginning of perform_base_backup(). But, why shouldn't it instead look
like pg_start_backup() -- in fact, a simplified version that only
handles the non-exclusive backup case? Is the extra stuff it's doing
really appropriate? I've already complained about the
tablespace-related stuff here and the throttling, but there's more.
Setting statrelpath here will probably break if somebody tries to use
SEND_FILES without first calling START_BACKUP. Sending the
backup_label file here is oddly asymmetric, because that's done by
pg_stop_backup(), not pg_start_backup(). And similarly, StopBackup()
looks like it's just the end of perform_base_backup(), but that's
pretty strange-looking too. Again, I've already complained about
include_wal_files() being part of this, but there's also:
+ /* ... and pg_control after everything else. */
...which (1) is an odd thing to say when this is the first thing this
particular function is to send and (2) is another example of a sloppy
division of labor between client and server; apparently, the client is
supposed to know not to request pg_control, because the server is
going to send it unsolicited. There's no particular reason to have
this a special case. The client could just request it last. And then
the server code wouldn't need a special case, and you wouldn't have
this odd logic split between the client and the server.
Overall, I think this needs a lot more work. The overall idea's not
wrong, but there seem to be a very large number of details which, at
least to me, do not seem to be correct.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Dec 19, 2019 at 10:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Dec 12, 2019 at 10:20 AM Asif Rehman <asifr.rehman@gmail.com>
wrote:
I have updated the patches (v7 attached) and have taken care of all issues
pointed out by Jeevan; additionally, I ran pgindent on each patch.
Furthermore, command names have been renamed as suggested and I have
simplified the SendFiles function. The client can only request regular
files; any other kind, such as directories or symlinks, will be skipped,
and the client will be responsible for taking care of those.
Hi,
Patch 0001 of this series conflicts with my recent commit
303640199d0436c5e7acdf50b837a027b5726594; that commit was actually
inspired by some previous study of 0001. That being said, I think 0001
has the wrong idea. There's no reason that I can see why it should be
correct to remove the PG_ENSURE_ERROR_CLEANUP calls from
perform_base_backup(). It's true that if we register a long-lived
before_shmem_exit hook, then the backup will get cleaned up even
without the PG_ENSURE_ERROR_CLEANUP block, but there's also the
question of the warning message. I think that our goal should be to
emit the warning message about a backup being stopped too early if the
user uses either pg_start_backup() or the new START_BACKUP command and
does not end the backup with either pg_stop_backup() or the new
STOP_BACKUP command -- but not if a single command that both starts
and ends a backup, like BASE_BACKUP, is interrupted. To accomplish
that goal in the wake of 303640199d0436c5e7acdf50b837a027b5726594, we
need to temporarily register do_pg_abort_backup() as a
before_shmem_exit() handler using PG_ENSURE_ERROR_CLEANUP() during
commands like BASE_BACKUP() -- and for things like pg_start_backup()
or the new START_BACKUP command, we just need to add a single call to
register_persistent_abort_backup_handler().
So I think you can drop 0001, and then in the patch that actually
introduces START_BACKUP, add the call to
register_persistent_abort_backup_handler() before calling
do_pg_start_backup(). Also in that patch, also adjust the warning text
that do_pg_abort_backup() emits to be more generic e.g. "aborting
backup due to backend exiting while a non-exclusive backup is in
progress".Sure. will do.
0003 creates three new functions, moving code from
do_pg_start_backup() to a new function collectTablespaces() and from
perform_base_backup() to new functions setup_throttle() and
include_wal_files(). I'm skeptical about all of these changes. One
general nitpick is that the way these function names are capitalized
and punctuated does not seem to have been chosen very consistently;
how about name_like_this() throughout? A bit more substantively:
- collectTablespaces() is factored out of do_pg_start_backup() so that
it can also be used by SendFileList(), but that means that a client is
going to invoke START_BACKUP, indirectly calling collectTablespaces(),
and then immediately afterward the client is probably going to call
SEND_FILE_LIST, which will again call collectTablespaces(). That does
not appear to be super-great. For one thing, it's duplicate work,
although because SendFileList() is going to pass infotbssize as false,
it's not a lot of duplicated work.
I'll remove this duplication by eliminating this call from the START_BACKUP
and SEND_FILE_LIST functions. More about this is explained later in this email.
Also, what happens if the two calls
to collectTablespaces() return different answers due to concurrent
CREATE/DROP TABLESPACE commands? Maybe it would all work out fine, but
it seems like there is at least the possibility of bugs if different
parts of the backup have different notions of what tablespaces exist.
Concurrent CREATE/DROP TABLESPACE commands can happen, and any
discrepancy will be resolved by the WAL files collected for the backup.
I don't think we can do anything when objects are created or dropped
in-between start and stop backup. BASE_BACKUP also relies on the WAL
files to handle such a scenario and does not error out when some relation
files go away.
- setup_throttle() is factored out of perform_base_backup() so that it
can be called in StartBackup() and StopBackup() and SendFiles(). This
seems extremely odd. Why does it make any sense to give the user an
option to activate throttling when *ending* a backup? Why does it make
sense to give the user a chance to enable throttling *both* at the
startup of a backup *and also* for each individual file. If we're
going to support throttling here, it seems like it should be either a
backup-level property or a file-level property, not both.
It's a file-level property only. The throttle functionality relies on
global variables; StartBackup() and StopBackup() call the setup_throttle
function to disable throttling.
I should have been more explicit here by passing -1 to setup_throttle,
indicating that throttling is disabled, instead of using 'opt->maxrate'
(although it defaults to -1 for these functions).
I'll remove the setup_throttle() call from both functions.
- include_wal_files() is factored out of perform_base_backup() so that
it can be called by StopBackup(). This seems like a poor design
decision. The idea behind the BASE_BACKUP command is that you run that
one command, and the server sends you everything. The idea in this new
way of doing business is that the client requests the individual files
it wants -- except for the WAL files, which are for some reason not
requested individually but sent all together as part of the
STOP_BACKUP response. It seems like it would be more consistent if the
client were to decide which WAL files it needs and request them one by
one, just as we do with other files.
As I understand it, you are suggesting adding another command to fetch the
list of WAL files, which would be called by the client after executing stop
backup. Once the client gets that list, it starts requesting the WAL files
one by one.
So I will add a LIST_WAL_FILES command that takes start_lsn and end_lsn as
arguments and returns the list of WAL files between these LSNs. Something
like this:
LIST_WAL_FILES 'start_lsn' 'end_lsn';
I think there's a common theme to all of these complaints, which is
that you haven't done enough to move things that are the
responsibility of the backend in the BASE_BACKUP model to the frontend
in this model. I started wondering, for example, whether it might not
be better to have the client rather than the server construct the
tablespace_map file. After all, the client needs to get the list of
files anyway (hence SEND_FILE_LIST) and if it's got that then it knows
almost enough to construct the tablespace map. The only additional
thing it needs is the full pathname to which the link points. But, it
seems that we could fairly easily extend SEND_FILE_LIST to send, for
files that are symbolic links, the target of the link, using a new
column. Or alternatively, using a separate command, so that instead of
just sending a single SEND_FILE_LIST command, the client might first
ask for a tablespace list and then might ask for a list of files
within each tablespace (e.g. LIST_TABLESPACES, then LIST_FILES <oid>
for each tablespace, with 0 for the main tablespace, perhaps). I'm not
sure which way is better.
do_pg_start_backup is collecting the tablespace information anyway, to
build the tablespace_map for BASE_BACKUP. So returning the same seemed
better than adding a new command for the same information, hence the
multiple calls to collectTablespaces() [to be renamed to
collect_tablespaces].
tablespace_map can be constructed by the client, but then BASE_BACKUP
is returning it as part of the full backup. If clients in parallel mode
are to construct it themselves, these will look like two different
approaches. Perhaps this should be done for BASE_BACKUP as well?
I'll refactor the do_pg_start_backup function to remove the code related
to tablespace information collection (to collect_tablespaces) and
tablespace_map file creation, so that this function does not collect this
information unnecessarily. The perform_base_backup function can collect
and send the tablespace information to the client, and then the client
can construct the tablespace_map file.
I'll add a new command to fetch the list of tablespaces, i.e.
LIST_TABLESPACES, which will return the tablespace information to the
client for parallel mode. And I will refactor the START_BACKUP and
STOP_BACKUP commands so that they only do the specific job of putting the
system in backup mode or taking it out, nothing else. These commands
should only return the start and end LSN to the client.
Similarly, for throttling, I have a hard time understanding how what
you've got here is going to work reasonably. It looks like each client
is just going to request whatever MAX_RATE the user specifies, but the
result of that will be that the actual transfer rate is probably a
multiple of the specified rate, approximately equal to the specified
rate times the number of clients. That's probably not what the user
wants. You could take the specified rate and divide it by the number
of workers, but limiting each of 4 workers to a quarter of the rate
will probably lead to a combined rate of less than the specified
rate, because if one worker doesn't use all of the bandwidth to which
it's entitled, or even exits earlier than the others, the other
workers don't get to go any faster as a result. Another problem is
that, in the current approach, throttling applies overall to the
entire backup, but in this approach, it is applied separately to each
SEND_FILE command. In the current approach, if one file finishes a
little faster or slower than anticipated, the next file in the tarball
will be sent a little slower or faster to compensate. But in this
approach, each SEND_FILES command is throttled separately, so this
property is lost. Furthermore, while BASE_BACKUP sends data
continuously, this approach naturally involves pauses between
commands. If files are large, that won't matter much, but if they're
small and numerous, it will tend to cause the actual transfer rate to
be less than the throttling rate.
One potential way to solve this problem is... move it to the client
side. Instead of making it the server's job not to send data too fast,
make it the client's job not to receive data too fast. Let the server
backends write as fast as they want, and on the pg_basebackup side,
have the threads coordinate with each other so that they don't read
data faster than the configured rate. That's not quite the same thing,
though, because the server can get ahead by the size of the client's
receive buffers plus whatever data is on the wire. I don't know
whether that's a big enough problem to be worth caring about. If it
is, then I think we need some server infrastructure to "group
throttle" a group of cooperating backends.
That was a mistake in my code; maxrate should have been equally divided
amongst all threads. I agree that we should move this to the client side.
When a thread exits, its share should also be equally divided amongst the
remaining threads (i.e. recalculate maxrate for each remaining thread).
Say we have 4 running threads, each allocated 25% of the bandwidth.
Thread 1 exits. We recalculate bandwidth and assign the remaining 3
threads 33.33% each. This solves one problem that you had identified.
However, it doesn't solve the case where one (or more) thread is not
fully consuming its allocated share. I'm not really sure how we can solve
that properly. Suggestions are welcome.
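For illustration, the recalculation I have in mind is roughly the
following sketch (the names here are mine, not from the attached patches):

#include <pthread.h>

static pthread_mutex_t rate_mutex = PTHREAD_MUTEX_INITIALIZER;
static int    active_workers;     /* initialized to --jobs at startup */
static double per_worker_rate;    /* bytes/sec each worker may consume */

/* A worker calls this just before exiting; survivors absorb its share. */
static void
worker_exiting(double total_maxrate)
{
    pthread_mutex_lock(&rate_mutex);
    active_workers--;
    if (active_workers > 0)
        per_worker_rate = total_maxrate / active_workers;
    pthread_mutex_unlock(&rate_mutex);
}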
A general comment about 0004 is that it seems like you've proceeded by
taking the code from perform_base_backup() and spreading it across
several different functions without, necessarily, as much thought as
is needed there. For instance, StartBackup() looks like just the
beginning of perform_base_backup(). But, why shouldn't it instead look
like pg_start_backup() -- in fact, a simplified version that only
handles the non-exclusive backup case? Is the extra stuff it's doing
really appropriate? I've already complained about the
tablespace-related stuff here and the throttling, but there's more.
Setting statrelpath here will probably break if somebody tries to use
SEND_FILES without first calling START_BACKUP. Sending the
backup_label file here is oddly asymmetric, because that's done by
pg_stop_backup(), not pg_start_backup(). And similarly, StopBackup()
looks like it's just the end of perform_base_backup(), but that's
pretty strange-looking too. Again, I've already complained about
include_wal_files() being part of this, but there's also:
+ /* ... and pg_control after everything else. */
...which (1) is an odd thing to say when this is the first thing this
particular function is to send and (2) is another example of a sloppy
division of labor between client and server; apparently, the client is
supposed to know not to request pg_control, because the server is
going to send it unsolicited. There's no particular reason to have
this a special case. The client could just request it last. And then
the server code wouldn't need a special case, and you wouldn't have
this odd logic split between the client and the server.
Overall, I think this needs a lot more work. The overall idea's not
wrong, but there seem to be a very large number of details which, at
least to me, do not seem to be correct.
Thank you, Robert, for the detailed review. I really appreciate your
insights and very precise feedback.
After the changes suggested above, the design at a high level will look
something like this:
=== SEQUENTIAL EXECUTION ===
START_BACKUP [LABEL | FAST]
- Starts backup on the server
- Returns the start LSN to the client
LIST_TABLESPACES
- Sends a list of all tablespaces to the client
Loop over LIST_TABLESPACES
- LIST_FILES [tablespace]
- Sends the file list for the given tablespace
- Create a list of all files
=== PARALLEL EXECUTION ===
Threads loop until the list of files is exhausted
SEND_FILE <file(s)> [CHECKSUM | WAL_START_LOCATION]
- If checksum verification is enabled then WAL_START_LOCATION is required.
- Can request the server to send one or more files, but we are requesting
  one at a time
- Pick the next file from the list of files
- Threads sleep after the list is exhausted
- All threads are sleeping
=== SEQUENTIAL EXECUTION ===
STOP_BACKUP [NOWAIT]
- Stops backup mode
- Returns the end LSN
If --wal-method=fetch then
LIST_WAL_FILES 'start_lsn' 'end_lsn'
- Sends a list of WAL files between the start LSN and end LSN
=== PARALLEL EXECUTION ===
Threads loop until the list of WAL files is exhausted
SEND_FILE <WAL file>
- Can request the server to send one or more files, but we are requesting
  one WAL file at a time
- Pick the next file from the list of WAL files
- Threads terminate and set their status as completed/terminated
=== SEQUENTIAL EXECUTION ===
Cleanup
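To make the flow concrete, a run might issue the commands below (a sketch
only; the label, OID, file names and LSNs are all illustrative):

START_BACKUP LABEL 'nightly' FAST
LIST_TABLESPACES
LIST_FILES
LIST_FILES 16385
(each worker, in parallel, until the file list is exhausted)
SEND_FILE 'base/1/1245' WAL_START_LOCATION '0/2000028' CHECKSUM
STOP_BACKUP
LIST_WAL_FILES '0/2000028' '0/3000060'
(each worker, in parallel, until the WAL list is exhausted)
SEND_FILE 'pg_wal/000000010000000000000002'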
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
On Sat, Jan 4, 2020 at 11:53 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Thu, Dec 19, 2019 at 10:47 PM Robert Haas <robertmhaas@gmail.com>
wrote:On Thu, Dec 12, 2019 at 10:20 AM Asif Rehman <asifr.rehman@gmail.com>
wrote:I have updated the patches (v7 attached) and have taken care of all
issues pointed by Jeevan, additionally
ran the pgindent on each patch. Furthermore, Command names have been
renamed as suggested and I
have simplified the SendFiles function. Client can only request the
regular files, any other kind such as
directories or symlinks will be skipped, the client will be responsible
for taking care of such.
Hi,
Patch 0001 of this series conflicts with my recent commit
303640199d0436c5e7acdf50b837a027b5726594; that commit was actually
inspired by some previous study of 0001. That being said, I think 0001
has the wrong idea. There's no reason that I can see why it should be
correct to remove the PG_ENSURE_ERROR_CLEANUP calls from
perform_base_backup(). It's true that if we register a long-lived
before_shmem_exit hook, then the backup will get cleaned up even
without the PG_ENSURE_ERROR_CLEANUP block, but there's also the
question of the warning message. I think that our goal should be to
emit the warning message about a backup being stopped too early if the
user uses either pg_start_backup() or the new START_BACKUP command and
does not end the backup with either pg_stop_backup() or the new
STOP_BACKUP command -- but not if a single command that both starts
and ends a backup, like BASE_BACKUP, is interrupted. To accomplish
that goal in the wake of 303640199d0436c5e7acdf50b837a027b5726594, we
need to temporarily register do_pg_abort_backup() as a
before_shmem_exit() handler using PG_ENSURE_ERROR_CLEANUP() during
commands like BASE_BACKUP() -- and for things like pg_start_backup()
or the new START_BACKUP command, we just need to add a single call to
register_persistent_abort_backup_handler().So I think you can drop 0001, and then in the patch that actually
introduces START_BACKUP, add the call to
register_persistent_abort_backup_handler() before calling
do_pg_start_backup(). Also in that patch, also adjust the warning text
that do_pg_abort_backup() emits to be more generic e.g. "aborting
backup due to backend exiting while a non-exclusive backup is in
progress".Sure. will do.
0003 creates three new functions, moving code from
do_pg_start_backup() to a new function collectTablespaces() and from
perform_base_backup() to new functions setup_throttle() and
include_wal_files(). I'm skeptical about all of these changes. One
general nitpick is that the way these function names are capitalized
and punctuated does not seem to have been chosen very consistently;
how about name_like_this() throughout? A bit more substantively:- collectTablespaces() is factored out of do_pg_start_backup() so that
it can also be used by SendFileList(), but that means that a client is
going to invoke START_BACKUP, indirectly calling collectTablespaces(),
and then immediately afterward the client is probably going to call
SEND_FILE_LIST, which will again call collectTablespaces(). That does
not appear to be super-great. For one thing, it's duplicate work,
although because SendFileList() is going to pass infotbssize as false,
it's not a lot of duplicated work.I'll remove this duplication by eliminating this call from START_BACKUP and
SEND_FILE_LIST functions. More about this is explained later in this email.Also, what happens if the two calls
to collectTablespaces() return different answers due to concurrent
CREATE/DROP TABLESPACE commands? Maybe it would all work out fine, but
it seems like there is at least the possibility of bugs if different
parts of the backup have different notions of what tablespaces exist.The concurrent CREATE/DROP TABLESPACE commands, it can happen and will
be resolved by the WAL files collected for the backup. I don't think we
can do anything when objects are created or dropped in-between start and
stop backup. BASE_BACKUPalso relies on the WAL files to handle such a
scenario and does not error out when some relation files go away.- setup_throttle() is factored out of perform_base_backup() so that it
can be called in StartBackup() and StopBackup() and SendFiles(). This
seems extremely odd. Why does it make any sense to give the user an
option to activate throttling when *ending* a backup? Why does it make
sense to give the user a chance to enable throttling *both* at the
startup of a backup *and also* for each individual file. If we're
going to support throttling here, it seems like it should be either a
backup-level property or a file-level property, not both.It's a file-level property only. Throttle functionality relies on global
variables. StartBackup() and StopBackup() are calling setup_throttle
function to disable the throttling.I should have been more explicit here by using -1 to setup_throttle,
Illustrating that throttling is disabled, instead of using 'opt->maxrate'.
(Although it defaults to -1 for these functions).I'll remove the setup_throttle() call for both functions.
- include_wal_files() is factored out of perform_base_backup() so that
it can be called by StopBackup(). This seems like a poor design
decision. The idea behind the BASE_BACKUP command is that you run that
one command, and the server sends you everything. The idea in this new
way of doing business is that the client requests the individual files
it wants -- except for the WAL files, which are for some reason not
requested individually but sent all together as part of the
STOP_BACKUP response. It seems like it would be more consistent if the
client were to decide which WAL files it needs and request them one by
one, just as we do with other files.As I understand you are suggesting to add another command to fetch the
list of WAL files which would be called by the client after executing stop
backup. Once the client gets that list, it starts requesting the WAL files
one
by one.So I will add LIST_WAL_FILES command that will take start_lsn and end_lsn
as arguments and return the list of WAL files between these LSNs.Something like this :
LIST_WAL_FILES 'start_lsn' 'end_lsn';I think there's a common theme to all of these complaints, which is
that you haven't done enough to move things that are the
responsibility of the backend in the BASE_BACKUP model to the frontend
in this model. I started wondering, for example, whether it might not
be better to have the client rather than the server construct the
tablespace_map file. After all, the client needs to get the list of
files anyway (hence SEND_FILE_LIST) and if it's got that then it knows
almost enough to construct the tablespace map. The only additional
thing it needs is the full pathname to which the link points. But, it
seems that we could fairly easily extend SEND_FILE_LIST to send, for
files that are symbolic links, the target of the link, using a new
column. Or alternatively, using a separate command, so that instead of
just sending a single SEND_FILE_LIST command, the client might first
ask for a tablespace list and then might ask for a list of files
within each tablespace (e.g. LIST_TABLESPACES, then LIST_FILES <oid>
for each tablespace, with 0 for the main tablespace, perhaps). I'm not
sure which way is better.do_pg_start_backup is collecting the tablespace information anyway to
build the tablespace_map for BASE_BACKUP. So returning the same seemed
better than adding a new command for the same information. hence multiple
calls to the collectTablespaces() [to be renamed to collect_tablespaces].tablespace_map can be constructed by the client, but then BASE_BACKUP
is returning it as part of the full backup. If clients in parallel mode
are to construct this themselves, these will seem like two different
approaches. Perhaps this should be done for BASE_BACKUP as
well?I'll refactor the do_pg_start_backup function to remove the code related
to tablespace information collection (to collect_tablespaces) and
tablespace_map file creation, so that this function does not collect this
information unnecessarily. perform_base_backup function can collect and
send the tablespace information to the client and then the client can
construct the tablespace_map file.I'll add a new command to fetch the list of tablespaces i.e.
LIST_TABLESPACES
which will return the tablespace information to the client for parallel
mode. And will refactor START_BACKUP and STOP_BACKUP commands,
so that they only do the specific job of putting the system in backup mode
or
out of it, nothing else.These commands should only return the start and end
LSN to the client.Similarly, for throttling, I have a hard time understanding how what
you've got here is going to work reasonably. It looks like each client
is just going to request whatever MAX_RATE the user specifies, but the
result of that will be that the actual transfer rate is probably a
multiple of the specified rate, approximately equal to the specified
rate times the number of clients. That's probably not what the user
wants. You could take the specified rate and divide it by the number
of workers, but limiting each of 4 workers to a quarter of the rate
will probably lead to a combined rate of less than than the specified
rate, because if one worker doesn't use all of the bandwidth to which
it's entitled, or even exits earlier than the others, the other
workers don't get to go any faster as a result. Another problem is
that, in the current approach, throttling applies overall to the
entire backup, but in this approach, it is applied separately to each
SEND_FILE command. In the current approach, if one file finishes a
little faster or slower than anticipated, the next file in the tarball
will be sent a little slower or faster to compensate. But in this
approach, each SEND_FILES command is throttled separately, so this
property is lost. Furthermore, while BASEBACKUP sends data
continuously, this approach naturally involves pauses between
commands. If files are large, that won't matter much, but if they're
small and numerous, it will tend to cause the actual transfer rate to
be less than the throttling rate.One potential way to solve this problem is... move it to the client
side. Instead of making it the server's job not to send data too fast,
make it the client's job not to receive data too fast. Let the server
backends write as fast as they want, and on the pg_basebackup side,
have the threads coordinate with each other so that they don't read
data faster than the configured rate. That's not quite the same thing,
though, because the server can get ahead by the size of the client's
receive buffers plus whatever data is on the wire. I don't know
whether that's a big enough problem to be worth caring about. If it
is, then I think we need some server infrastructure to "group
throttle" a group of cooperating backends.That was a mistake in my code. maxrate should've been equally divided
amongst all threads. I agree that we should move this to the client-side.
When a thread exits, its share should also be equally divided amongst
the remaining threads (i.e. recalculate maxrate for each remaining
thread).Say we have 4 running threads with each allocation 25% of the bandwidth.
Thread 1 exits. We recalculate bandwidth and assign the remaining 3 threads
33.33% each. This solves one problem that you had identified. However,
it doesn't solve where one (or more) thread is not fully consuming their
allocated share. I'm not really sure how we can solve it properly.
Suggestions
are welcome.A general comment about 0004 is that it seems like you've proceeded by
taking the code from perform_base_backup() and spreading it across
several different functions without, necessarily, as much thought as
is needed there. For instance, StartBackup() looks like just the
beginning of perform_base_backup(). But, why shouldn't it instead look
like pg_start_backup() -- in fact, a simplified version that only
handles the non-exclusive backup case? Is the extra stuff it's doing
really appropriate? I've already complained about the
tablespace-related stuff here and the throttling, but there's more.
Setting statrelpath here will probably break if somebody tries to use
SEND_FILES without first calling START_BACKUP. Sending the
backup_label file here is oddly asymmetric, because that's done by
pg_stop_backup(), not pg_start_backup(). And similarly, StopBackup()
looks like it's just the end of perform_base_backup(), which is
pretty strange-looking too. Again, I've already complained about
include_wal_files() being part of this, but there's also:

+ /* ... and pg_control after everything else. */
...which (1) is an odd thing to say when this is the first thing this
particular function is going to send, and (2) is another example of a sloppy
division of labor between client and server; apparently, the client is
supposed to know not to request pg_control, because the server is
going to send it unsolicited. There's no particular reason to have
this a special case. The client could just request it last. And then
the server code wouldn't need a special case, and you wouldn't have
this odd logic split between the client and the server.

Overall, I think this needs a lot more work. The overall idea's not
wrong, but there seem to be a very large number of details which, at
least to me, do not seem to be correct.

Thank you Robert for the detailed review. I really appreciate your insights
and very precise feedback.

After the changes suggested above, the design on a high level will look
something like this:

=== SEQUENTIAL EXECUTION ===
START_BACKUP [LABEL | FAST]
- Starts backup on the server
- Returns the start LSN to the client

LIST_TABLESPACES
- Sends a list of all tablespaces to the client

Loop over LIST_TABLESPACES
- LIST_FILES [tablespace]
- Sends file list for the given tablespace
- Create a list of all files

=== PARALLEL EXECUTION ===
Thread loop until the list of files is exhausted (sketched below)
SEND_FILE <file(s)> [CHECKSUM | WAL_START_LOCATION]
- If checksum verification is enabled then WAL_START_LOCATION is required.
- Can request the server to send one or more files, but we request one
file at a time
- Pick the next file from the list of files
- Threads sleep after the list is exhausted
- All threads are sleeping

=== SEQUENTIAL EXECUTION ===
STOP_BACKUP [NOWAIT]
- Stops backup mode
- Returns the end LSN

If --wal-method=fetch then
LIST_WAL_FILES 'start_lsn' 'end_lsn'
- Sends a list of WAL files between the start LSN and end LSN

=== PARALLEL EXECUTION ===
Thread loop until the list of WAL files is exhausted
SEND_FILE <WAL file>
- Can request the server to send one or more files, but we request one
WAL file at a time
- Pick the next file from the list of WAL files
- Threads terminate and set their status as completed/terminated

=== SEQUENTIAL EXECUTION ===
Cleanup
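
To make the parallel phase concrete, here is a minimal sketch of the
per-worker loop on the pg_basebackup side. This is illustrative only;
the names (worker_loop, fetch_file) and the bare linked list are
assumptions for the sketch, not the actual code in the attached patches:

#include <pthread.h>
#include <stddef.h>

typedef struct BackupFile
{
	char				path[1024];
	struct BackupFile  *next;
} BackupFile;

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static BackupFile *file_list;	/* shared list, filled by the main thread */

/* hypothetical: issues SEND_FILES for 'path' on this worker's connection */
extern void fetch_file(void *conn, const char *path);

static void *
worker_loop(void *arg)
{
	void	   *conn = arg;		/* this worker's own replication connection */

	for (;;)
	{
		BackupFile *file;

		/* pop the next file from the shared list under a mutex */
		pthread_mutex_lock(&list_lock);
		file = file_list;
		if (file != NULL)
			file_list = file->next;
		pthread_mutex_unlock(&list_lock);

		if (file == NULL)
			break;				/* list exhausted; worker goes idle */

		/* one SEND_FILES round trip per file */
		fetch_file(conn, file->path);
	}
	return NULL;
}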
Here are the updated patches, taking care of the issues pointed out
earlier. This patch set adds the following commands (with the specified
options); an example session follows the list:
START_BACKUP [LABEL '<label>'] [FAST]
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILES '(' FILE, FILE... ')' [START_WAL_LOCATION 'X/X']
[NOVERIFY_CHECKSUMS]
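
For illustration, a plain-format parallel backup would roughly translate
into the following sequence of commands on the wire (the label, LSN and
file names below are made up):

Main connection:
	START_BACKUP LABEL 'pb1' FAST         -- returns start LSN, e.g. 0/2000028
	LIST_TABLESPACES
	LIST_FILES                            -- file list for the base tablespace
	LIST_FILES 'pg_tblspc/16385'          -- repeated for each tablespace

Worker connections, in parallel (one file per round trip):
	SEND_FILES ( 'base/1/1249' ) START_WAL_LOCATION '0/2000028'
	SEND_FILES ( 'base/1/1259' ) START_WAL_LOCATION '0/2000028'
	...

Main connection, once all workers are idle:
	STOP_BACKUP NOWAIT                    -- returns end LSN and backup_label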
Parallel backup does not make any use of the tablespace map, so I have
removed that option from the above commands. There is a patch pending
to remove exclusive backup; we can further refactor the
do_pg_start_backup function at that time, to remove the tablespace
information and move the creation of the tablespace_map file to the
client.
I have disabled the maxrate option for parallel backup. I intend to send
out a separate patch for it. Robert previously suggested implementing
throttling on the client side. I found the original email thread [1]
where throttling was proposed and added to the server. In that thread,
it was originally implemented on the client side, but per many
suggestions, it was moved to the server side.
So, I have a few suggestions on how we can implement this:
1- Add another option for pg_basebackup (i.e. per-worker-maxrate) where
the user chooses the bandwidth allocation for each worker. This approach
can be implemented on the client side as well as on the server side.
2- Divide maxrate equally among the workers at first, and then let the
main thread keep adjusting it whenever one of the workers finishes. I
believe this would only be possible if we handle throttling on the
client. As I understand it, implementing this will introduce an
additional mutex around the bandwidth-consumption data so that the rate
can be adjusted according to the data received by the threads. A rough
sketch of this approach is shown below.
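
To illustrate suggestion 2, here is a minimal client-side sketch. The
names (recalc_worker_rate, throttle) and the one-second window are
assumptions for the sketch, not code from the attached patches; a real
implementation would sleep for the exact remainder of the window rather
than polling in fixed intervals:

#include <pthread.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t rate_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t maxrate_bytes_per_sec;	/* from --max-rate */
static int		active_workers;
static uint64_t per_worker_rate;		/* maxrate / active_workers */

/* Recompute each worker's share; called when a worker starts (+1) or exits (-1). */
static void
recalc_worker_rate(int delta)
{
	pthread_mutex_lock(&rate_lock);
	active_workers += delta;
	if (active_workers > 0)
		per_worker_rate = maxrate_bytes_per_sec / active_workers;
	pthread_mutex_unlock(&rate_lock);
}

/*
 * Called by a worker after reading 'nbytes' from its connection. Each
 * worker keeps its own window counters, so only per_worker_rate is
 * shared; reading it without the lock is tolerable for a sketch.
 */
static void
throttle(uint64_t *window_bytes, time_t *window_start, uint64_t nbytes)
{
	time_t		now = time(NULL);

	if (now != *window_start)
	{
		*window_start = now;	/* new one-second window */
		*window_bytes = 0;
	}

	*window_bytes += nbytes;
	while (*window_bytes >= per_worker_rate && time(NULL) == *window_start)
		usleep(10 * 1000);		/* share used up; wait out the window */
}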
[1]: /messages/by-id/521B4B29.20009@2ndquadrant.com
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Attachments:
0001-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb.patch
From 4e6639ee5a2d0d519ef3755ba5efb1afbe7e6626 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 30 Oct 2019 16:45:28 +0500
Subject: [PATCH 1/5] Rename sizeonly to dryrun for few functions in
basebackup.
---
src/backend/replication/basebackup.c | 44 ++++++++++++++--------------
src/include/replication/basebackup.h | 2 +-
2 files changed, 23 insertions(+), 23 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index dea8aab45e..e90ca6184b 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -54,15 +54,15 @@ typedef struct
} basebackup_options;
-static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
+static int64 sendDir(const char *path, int basepathlen, bool dryrun,
List *tablespaces, bool sendtblspclinks);
static bool sendFile(const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
static int64 _tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly);
+ struct stat *statbuf, bool dryrun);
static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly);
+ bool dryrun);
static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
@@ -949,13 +949,13 @@ sendFileWithContent(const char *filename, const char *content)
/*
* Include the tablespace directory pointed to by 'path' in the output tar
- * stream. If 'sizeonly' is true, we just calculate a total length and return
+ * stream. If 'dryrun' is true, we just calculate a total length and return
* it, without actually sending anything.
*
* Only used to send auxiliary tablespaces, not PGDATA.
*/
int64
-sendTablespace(char *path, bool sizeonly)
+sendTablespace(char *path, bool dryrun)
{
int64 size;
char pathbuf[MAXPGPATH];
@@ -985,17 +985,17 @@ sendTablespace(char *path, bool sizeonly)
}
size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
- sizeonly);
+ dryrun);
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true);
+ size += sendDir(pathbuf, strlen(path), dryrun, NIL, true);
return size;
}
/*
* Include all files from the given directory in the output tar stream. If
- * 'sizeonly' is true, we just calculate a total length and return it, without
+ * 'dryrun' is true, we just calculate a total length and return it, without
* actually sending anything.
*
* Omit any directory in the tablespaces list, to avoid backing up
@@ -1006,7 +1006,7 @@ sendTablespace(char *path, bool sizeonly)
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
+sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
bool sendtblspclinks)
{
DIR *dir;
@@ -1161,7 +1161,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
excludeFound = true;
break;
}
@@ -1177,7 +1177,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
continue;
}
@@ -1189,14 +1189,14 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(pathbuf, "./pg_wal") == 0)
{
/* If pg_wal is a symlink, write it as a directory anyway */
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
/*
* Also send archive_status directory (by hackishly reusing
* statbuf from above ...).
*/
size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
- sizeonly);
+ dryrun);
continue; /* don't recurse into pg_wal */
}
@@ -1228,7 +1228,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
linkpath[rllen] = '\0';
size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
- &statbuf, sizeonly);
+ &statbuf, dryrun);
#else
/*
@@ -1252,7 +1252,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
* permissions right.
*/
size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ dryrun);
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1283,17 +1283,17 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces, sendtblspclinks);
+ size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks);
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
- if (!sizeonly)
+ if (!dryrun)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? atooid(lastDir + 1) : InvalidOid);
- if (sent || sizeonly)
+ if (sent || dryrun)
{
/* Add size, rounded up to 512byte block */
size += ((statbuf.st_size + 511) & ~511);
@@ -1602,12 +1602,12 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
static int64
_tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly)
+ struct stat *statbuf, bool dryrun)
{
char h[512];
enum tarError rc;
- if (!sizeonly)
+ if (!dryrun)
{
rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size,
statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
@@ -1644,7 +1644,7 @@ _tarWriteHeader(const char *filename, const char *linktarget,
*/
static int64
_tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly)
+ bool dryrun)
{
/* If symlink, write it as a directory anyway */
#ifndef WIN32
@@ -1654,7 +1654,7 @@ _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
#endif
statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
- return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, sizeonly);
+ return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, dryrun);
}
/*
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index 07ed281bd6..e0210def6f 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -31,6 +31,6 @@ typedef struct
extern void SendBaseBackup(BaseBackupCmd *cmd);
-extern int64 sendTablespace(char *path, bool sizeonly);
+extern int64 sendTablespace(char *path, bool dryrun);
#endif /* _BASEBACKUP_H */
--
2.21.1 (Apple Git-122.3)
0002-Refactor-some-backup-code-to-increase-reusability.-T.patch
From a8bc5cd2f3313ef768dac7cd732fd8c4f9ee6fdd Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Mon, 27 Jan 2020 17:48:10 +0500
Subject: [PATCH 2/5] Refactor some backup code to increase reusability. This
commit adds two functions: collect_tablespaces and collect_wal_files. The
code related to collecting tablespace information is moved from
do_pg_start_backup to the collect_tablespaces function. Also, the code to
collect WAL files is moved from perform_base_backup to collect_wal_files.
This does not introduce any functional changes.
---
src/backend/access/transam/xlog.c | 191 ++++++++++++-----------
src/backend/replication/basebackup.c | 217 +++++++++++++++------------
src/include/access/xlog.h | 2 +
3 files changed, 219 insertions(+), 191 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6e09ded597..0a2eb29c75 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10306,10 +10306,6 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
{
bool gotUniqueStartpoint = false;
- DIR *tblspcdir;
- struct dirent *de;
- tablespaceinfo *ti;
- int datadirpathlen;
/*
* Force an XLOG file switch before the checkpoint, to ensure that the
@@ -10435,93 +10431,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
if (exclusive)
tblspcmapfile = makeStringInfo();
- datadirpathlen = strlen(DataDir);
-
- /* Collect information about all tablespaces */
- tblspcdir = AllocateDir("pg_tblspc");
- while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
- {
- char fullpath[MAXPGPATH + 10];
- char linkpath[MAXPGPATH];
- char *relpath = NULL;
- int rllen;
- StringInfoData buflinkpath;
- char *s = linkpath;
-
- /* Skip special stuff */
- if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
- continue;
-
- snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
-
-#if defined(HAVE_READLINK) || defined(WIN32)
- rllen = readlink(fullpath, linkpath, sizeof(linkpath));
- if (rllen < 0)
- {
- ereport(WARNING,
- (errmsg("could not read symbolic link \"%s\": %m",
- fullpath)));
- continue;
- }
- else if (rllen >= sizeof(linkpath))
- {
- ereport(WARNING,
- (errmsg("symbolic link \"%s\" target is too long",
- fullpath)));
- continue;
- }
- linkpath[rllen] = '\0';
-
- /*
- * Add the escape character '\\' before newline in a string to
- * ensure that we can distinguish between the newline in the
- * tablespace path and end of line while reading tablespace_map
- * file during archive recovery.
- */
- initStringInfo(&buflinkpath);
-
- while (*s)
- {
- if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
- appendStringInfoChar(&buflinkpath, '\\');
- appendStringInfoChar(&buflinkpath, *s++);
- }
-
- /*
- * Relpath holds the relative path of the tablespace directory
- * when it's located within PGDATA, or NULL if it's located
- * elsewhere.
- */
- if (rllen > datadirpathlen &&
- strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
- IS_DIR_SEP(linkpath[datadirpathlen]))
- relpath = linkpath + datadirpathlen + 1;
-
- ti = palloc(sizeof(tablespaceinfo));
- ti->oid = pstrdup(de->d_name);
- ti->path = pstrdup(buflinkpath.data);
- ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
-
- if (tablespaces)
- *tablespaces = lappend(*tablespaces, ti);
-
- appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
-
- pfree(buflinkpath.data);
-#else
-
- /*
- * If the platform does not have symbolic links, it should not be
- * possible to have tablespaces - clearly somebody else created
- * them. Warn about it and ignore.
- */
- ereport(WARNING,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("tablespaces are not supported on this platform")));
-#endif
- }
- FreeDir(tblspcdir);
+ collect_tablespaces(tablespaces, tblspcmapfile, infotbssize, needtblspcmapfile);
/*
* Construct backup label file
@@ -12323,3 +12233,102 @@ XLogRequestWalReceiverReply(void)
{
doRequestWalReceiverReply = true;
}
+
+/*
+ * Collect information about all tablespaces.
+ */
+void
+collect_tablespaces(List **tablespaces, StringInfo tblspcmapfile,
+ bool infotbssize, bool needtblspcmapfile)
+{
+ DIR *tblspcdir;
+ struct dirent *de;
+ tablespaceinfo *ti;
+ int datadirpathlen;
+
+ datadirpathlen = strlen(DataDir);
+
+ tblspcdir = AllocateDir("pg_tblspc");
+ while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
+ {
+ char fullpath[MAXPGPATH + 10];
+ char linkpath[MAXPGPATH];
+ char *relpath = NULL;
+ int rllen;
+ StringInfoData buflinkpath;
+ char *s = linkpath;
+
+ /* Skip special stuff */
+ if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
+ continue;
+
+ snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
+
+#if defined(HAVE_READLINK) || defined(WIN32)
+ rllen = readlink(fullpath, linkpath, sizeof(linkpath));
+ if (rllen < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not read symbolic link \"%s\": %m",
+ fullpath)));
+ continue;
+ }
+ else if (rllen >= sizeof(linkpath))
+ {
+ ereport(WARNING,
+ (errmsg("symbolic link \"%s\" target is too long",
+ fullpath)));
+ continue;
+ }
+ linkpath[rllen] = '\0';
+
+ /*
+ * Add the escape character '\\' before newline in a string to ensure
+ * that we can distinguish between the newline in the tablespace path
+ * and end of line while reading tablespace_map file during archive
+ * recovery.
+ */
+ initStringInfo(&buflinkpath);
+
+ while (*s)
+ {
+ if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
+ appendStringInfoChar(&buflinkpath, '\\');
+ appendStringInfoChar(&buflinkpath, *s++);
+ }
+
+ /*
+ * Relpath holds the relative path of the tablespace directory when
+ * it's located within PGDATA, or NULL if it's located elsewhere.
+ */
+ if (rllen > datadirpathlen &&
+ strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
+ IS_DIR_SEP(linkpath[datadirpathlen]))
+ relpath = linkpath + datadirpathlen + 1;
+
+ ti = palloc(sizeof(tablespaceinfo));
+ ti->oid = pstrdup(de->d_name);
+ ti->path = pstrdup(buflinkpath.data);
+ ti->rpath = relpath ? pstrdup(relpath) : NULL;
+ ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+
+ if (tablespaces)
+ *tablespaces = lappend(*tablespaces, ti);
+
+ appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
+
+ pfree(buflinkpath.data);
+#else
+
+ /*
+ * If the platform does not have symbolic links, it should not be
+ * possible to have tablespaces - clearly somebody else created them.
+ * Warn about it and ignore.
+ */
+ ereport(WARNING,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("tablespaces are not supported on this platform")));
+#endif
+ }
+ FreeDir(tblspcdir);
+}
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index e90ca6184b..9583277224 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -66,6 +66,8 @@ static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *sta
static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
+static List *collect_wal_files(XLogRecPtr startptr, XLogRecPtr endptr,
+ List **historyFileList);
static void parse_basebackup_options(List *options, basebackup_options *opt);
static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
@@ -373,112 +375,13 @@ perform_base_backup(basebackup_options *opt)
*/
char pathbuf[MAXPGPATH];
XLogSegNo segno;
- XLogSegNo startsegno;
- XLogSegNo endsegno;
struct stat statbuf;
List *historyFileList = NIL;
List *walFileList = NIL;
- char firstoff[MAXFNAMELEN];
- char lastoff[MAXFNAMELEN];
- DIR *dir;
- struct dirent *de;
ListCell *lc;
TimeLineID tli;
- /*
- * I'd rather not worry about timelines here, so scan pg_wal and
- * include all WAL files in the range between 'startptr' and 'endptr',
- * regardless of the timeline the file is stamped with. If there are
- * some spurious WAL files belonging to timelines that don't belong in
- * this server's history, they will be included too. Normally there
- * shouldn't be such files, but if there are, there's little harm in
- * including them.
- */
- XLByteToSeg(startptr, startsegno, wal_segment_size);
- XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
- XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
- XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
-
- dir = AllocateDir("pg_wal");
- while ((de = ReadDir(dir, "pg_wal")) != NULL)
- {
- /* Does it look like a WAL segment, and is it in the range? */
- if (IsXLogFileName(de->d_name) &&
- strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
- strcmp(de->d_name + 8, lastoff + 8) <= 0)
- {
- walFileList = lappend(walFileList, pstrdup(de->d_name));
- }
- /* Does it look like a timeline history file? */
- else if (IsTLHistoryFileName(de->d_name))
- {
- historyFileList = lappend(historyFileList, pstrdup(de->d_name));
- }
- }
- FreeDir(dir);
-
- /*
- * Before we go any further, check that none of the WAL segments we
- * need were removed.
- */
- CheckXLogRemoved(startsegno, ThisTimeLineID);
-
- /*
- * Sort the WAL filenames. We want to send the files in order from
- * oldest to newest, to reduce the chance that a file is recycled
- * before we get a chance to send it over.
- */
- list_sort(walFileList, compareWalFileNames);
-
- /*
- * There must be at least one xlog file in the pg_wal directory, since
- * we are doing backup-including-xlog.
- */
- if (walFileList == NIL)
- ereport(ERROR,
- (errmsg("could not find any WAL files")));
-
- /*
- * Sanity check: the first and last segment should cover startptr and
- * endptr, with no gaps in between.
- */
- XLogFromFileName((char *) linitial(walFileList),
- &tli, &segno, wal_segment_size);
- if (segno != startsegno)
- {
- char startfname[MAXFNAMELEN];
-
- XLogFileName(startfname, ThisTimeLineID, startsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", startfname)));
- }
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- XLogSegNo currsegno = segno;
- XLogSegNo nextsegno = segno + 1;
-
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
- if (!(nextsegno == segno || currsegno == segno))
- {
- char nextfname[MAXFNAMELEN];
-
- XLogFileName(nextfname, ThisTimeLineID, nextsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", nextfname)));
- }
- }
- if (segno != endsegno)
- {
- char endfname[MAXFNAMELEN];
-
- XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", endfname)));
- }
-
+ walFileList = collect_wal_files(startptr, endptr, &historyFileList);
/* Ok, we have everything we need. Send the WAL files. */
foreach(lc, walFileList)
{
@@ -611,6 +514,120 @@ perform_base_backup(basebackup_options *opt)
}
+/*
+ * construct a list of WAL files to be included in the backup.
+ */
+static List *
+collect_wal_files(XLogRecPtr startptr, XLogRecPtr endptr, List **historyFileList)
+{
+ XLogSegNo segno;
+ XLogSegNo startsegno;
+ XLogSegNo endsegno;
+ List *walFileList = NIL;
+ char firstoff[MAXFNAMELEN];
+ char lastoff[MAXFNAMELEN];
+ DIR *dir;
+ struct dirent *de;
+ ListCell *lc;
+ TimeLineID tli;
+
+ /*
+ * I'd rather not worry about timelines here, so scan pg_wal and include
+ * all WAL files in the range between 'startptr' and 'endptr', regardless
+ * of the timeline the file is stamped with. If there are some spurious
+ * WAL files belonging to timelines that don't belong in this server's
+ * history, they will be included too. Normally there shouldn't be such
+ * files, but if there are, there's little harm in including them.
+ */
+ XLByteToSeg(startptr, startsegno, wal_segment_size);
+ XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
+ XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
+ XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
+
+ dir = AllocateDir("pg_wal");
+ while ((de = ReadDir(dir, "pg_wal")) != NULL)
+ {
+ /* Does it look like a WAL segment, and is it in the range? */
+ if (IsXLogFileName(de->d_name) &&
+ strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
+ strcmp(de->d_name + 8, lastoff + 8) <= 0)
+ {
+ walFileList = lappend(walFileList, pstrdup(de->d_name));
+ }
+ /* Does it look like a timeline history file? */
+ else if (IsTLHistoryFileName(de->d_name))
+ {
+ if (historyFileList)
+ *historyFileList = lappend(*historyFileList, pstrdup(de->d_name));
+ }
+ }
+ FreeDir(dir);
+
+ /*
+ * Before we go any further, check that none of the WAL segments we need
+ * were removed.
+ */
+ CheckXLogRemoved(startsegno, ThisTimeLineID);
+
+ /*
+ * Sort the WAL filenames. We want to send the files in order from oldest
+ * to newest, to reduce the chance that a file is recycled before we get a
+ * chance to send it over.
+ */
+ list_sort(walFileList, compareWalFileNames);
+
+ /*
+ * There must be at least one xlog file in the pg_wal directory, since we
+ * are doing backup-including-xlog.
+ */
+ if (walFileList == NIL)
+ ereport(ERROR,
+ (errmsg("could not find any WAL files")));
+
+ /*
+ * Sanity check: the first and last segment should cover startptr and
+ * endptr, with no gaps in between.
+ */
+ XLogFromFileName((char *) linitial(walFileList),
+ &tli, &segno, wal_segment_size);
+ if (segno != startsegno)
+ {
+ char startfname[MAXFNAMELEN];
+
+ XLogFileName(startfname, ThisTimeLineID, startsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", startfname)));
+ }
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ XLogSegNo currsegno = segno;
+ XLogSegNo nextsegno = segno + 1;
+
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+ if (!(nextsegno == segno || currsegno == segno))
+ {
+ char nextfname[MAXFNAMELEN];
+
+ XLogFileName(nextfname, ThisTimeLineID, nextsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", nextfname)));
+ }
+ }
+ if (segno != endsegno)
+ {
+ char endfname[MAXFNAMELEN];
+
+ XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", endfname)));
+ }
+
+ return walFileList;
+}
+
/*
* list_sort comparison function, to compare log/seg portion of WAL segment
* filenames, ignoring the timeline portion.
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 98b033fc20..22fe35801d 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -350,6 +350,8 @@ extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
extern XLogRecPtr do_pg_stop_backup(char *labelfile, bool waitforarchive,
TimeLineID *stoptli_p);
extern void do_pg_abort_backup(int code, Datum arg);
+extern void collect_tablespaces(List **tablespaces, StringInfo tblspcmapfile,
+ bool infotbssize, bool needtblspcmapfile);
extern void register_persistent_abort_backup_handler(void);
extern SessionBackupState get_backup_status(void);
--
2.21.1 (Apple Git-122.3)
0004-Parallel-Backup-pg_basebackup.patch
From 68e6785cfc6de4b2b3fa338f8ca9782c3161b34b Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Mon, 27 Jan 2020 18:56:21 +0500
Subject: [PATCH 4/5] Parallel Backup - pg_basebackup
Implements the replication commands added in the backend replication
system and adds support for --jobs=NUM in pg_basebackup to take a full
backup in parallel using multiple connections. The utility will collect
a list of files from the server first and then workers will copy files
(one by one) over the COPY protocol. The WAL files are also copied in a
similar manner.
---
src/bin/pg_basebackup/pg_basebackup.c | 1029 +++++++++++++++++++++++--
1 file changed, 964 insertions(+), 65 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 238b671f7a..01f4122754 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -13,6 +13,7 @@
#include "postgres_fe.h"
+#include <pthread.h>
#include <unistd.h>
#include <dirent.h>
#include <sys/stat.h>
@@ -85,12 +86,65 @@ typedef struct UnpackTarState
const char *mapped_tblspc_path;
pgoff_t current_len_left;
int current_padding;
+ size_t current_bytes_read;
FILE *file;
} UnpackTarState;
typedef void (*WriteDataCallback) (size_t nbytes, char *buf,
void *callback_data);
+typedef struct BackupFile
+{
+ char path[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+
+ int tsindex; /* index of tsInfo this file belongs to. */
+ struct BackupFile *next;
+} BackupFile;
+
+typedef enum BackupState
+{
+ PB_FETCH_REL_LIST,
+ PB_FETCH_REL_FILES,
+ PB_FETCH_WAL_LIST,
+ PB_FETCH_WAL_FILES,
+ PB_STOP_BACKUP,
+ PB_BACKUP_COMPLETE
+} BackupState;
+
+typedef struct BackupInfo
+{
+ int totalfiles;
+ uint64 bytes_skipped;
+ char xlogstart[64];
+ char xlogend[64];
+ BackupFile *files; /* list of BackupFile pointers */
+ BackupFile *curr; /* pointer to the file in the list */
+ BackupState backupstate;
+ bool workersdone;
+ int activeworkers;
+} BackupInfo;
+
+typedef struct WorkerState
+{
+ pthread_t worker;
+ int workerid;
+ BackupInfo *backupinfo;
+ PGconn *conn;
+ uint64 bytesread;
+} WorkerState;
+
+BackupInfo *backupinfo = NULL;
+WorkerState *workers = NULL;
+
+/* lock to be used for fetching file from the files list. */
+static pthread_mutex_t fetch_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+/* condition to be used when the files list is filled. */
+static pthread_cond_t data_ready = PTHREAD_COND_INITIALIZER;
+
/*
* pg_xlog has been renamed to pg_wal in version 10. This version number
* should be compared with PQserverVersion().
@@ -144,6 +198,9 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
+static int numWorkers = 1;
+static PGresult *tablespacehdr;
+
/* Progress counters */
static uint64 totalsize_kb;
static uint64 totaldone;
@@ -174,10 +231,12 @@ static PQExpBuffer recoveryconfcontents = NULL;
static void usage(void);
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
static void progress_report(int tablespacenum, const char *filename, bool force);
+static void workers_progress_report(uint64 totalBytesRead,
+ const char *filename, bool force);
static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
-static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
+static int ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
static void ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf,
void *callback_data);
static void BaseBackup(void);
@@ -188,6 +247,21 @@ static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
static const char *get_tablespace_mapping(const char *dir);
static void tablespace_list_append(const char *arg);
+static void *worker_run(void *arg);
+static void create_parallel_workers(BackupInfo *backupInfo);
+static void parallel_backup_run(BackupInfo *backupInfo);
+static void cleanup_workers(void);
+static void stop_backup(void);
+static void get_backup_filelist(PGconn *conn, BackupInfo *backupInfo);
+static void get_wal_filelist(PGconn *conn, BackupInfo *backupInfo,
+ char *xlogstart, char *xlogend);
+static void free_filelist(BackupInfo *backupInfo);
+static int worker_get_files(WorkerState *wstate);
+static int receive_file(PGconn *conn, char *file, int tsIndex);
+static void create_backup_dirs(bool basetablespace, char *tablespace,
+ char *name);
+static void create_tblspc_symlink(char *filename);
+static void writefile(char *path, char *buf);
static void
cleanup_directories_atexit(void)
@@ -239,6 +313,8 @@ cleanup_directories_atexit(void)
static void
disconnect_atexit(void)
{
+ cleanup_workers();
+
if (conn != NULL)
PQfinish(conn);
}
@@ -386,6 +462,7 @@ usage(void)
printf(_(" --no-slot prevent creation of temporary replication slot\n"));
printf(_(" --no-verify-checksums\n"
" do not verify checksums\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to backup\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nConnection options:\n"));
printf(_(" -d, --dbname=CONNSTR connection string\n"));
@@ -732,6 +809,94 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
}
}
+/*
+ * Print a progress report of worker threads. If verbose output
+ * is enabled, also print the current file name.
+ *
+ * Progress report is written at maximum once per second, unless the
+ * force parameter is set to true.
+ */
+static void
+workers_progress_report(uint64 totalBytesRead, const char *filename, bool force)
+{
+ int percent;
+ char totalBytesRead_str[32];
+ char totalsize_str[32];
+ pg_time_t now;
+
+ if (!showprogress)
+ return;
+
+ now = time(NULL);
+ if (now == last_progress_report && !force)
+ return; /* Max once per second */
+
+ last_progress_report = now;
+ percent = totalsize_kb ? (int) ((totalBytesRead / 1024) * 100 / totalsize_kb) : 0;
+
+ /*
+ * Avoid overflowing past 100% or the full size. This may make the total
+ * size number change as we approach the end of the backup (the estimate
+ * will always be wrong if WAL is included), but that's better than having
+ * the done column be bigger than the total.
+ */
+ if (percent > 100)
+ percent = 100;
+ if (totalBytesRead / 1024 > totalsize_kb)
+ totalsize_kb = totalBytesRead / 1024;
+
+ /*
+ * Separate step to keep platform-dependent format code out of
+ * translatable strings. And we only test for INT64_FORMAT availability
+ * in snprintf, not fprintf.
+ */
+ snprintf(totalBytesRead_str, sizeof(totalBytesRead_str), INT64_FORMAT,
+ totalBytesRead / 1024);
+ snprintf(totalsize_str, sizeof(totalsize_str), INT64_FORMAT, totalsize_kb);
+
+#define VERBOSE_FILENAME_LENGTH 35
+
+ if (verbose)
+ {
+ if (!filename)
+
+ /*
+ * No filename given, so clear the status line (used for last
+ * call)
+ */
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied %*s"),
+ (int) strlen(totalsize_str),
+ totalBytesRead_str, totalsize_str,
+ percent,
+ VERBOSE_FILENAME_LENGTH + 5, "");
+ else
+ {
+ bool truncate = (strlen(filename) > VERBOSE_FILENAME_LENGTH);
+
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied, current file (%s%-*.*s)"),
+ (int) strlen(totalsize_str), totalBytesRead_str, totalsize_str,
+ percent,
+ /* Prefix with "..." if we do leading truncation */
+ truncate ? "..." : "",
+ truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
+ truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
+ /* Truncate filename at beginning if it's too long */
+ truncate ? filename + strlen(filename) - VERBOSE_FILENAME_LENGTH + 3 : filename);
+ }
+ }
+ else
+ {
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied"),
+ (int) strlen(totalsize_str),
+ totalBytesRead_str, totalsize_str,
+ percent);
+ }
+
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\r");
+ else
+ fprintf(stderr, "\n");
+}
/*
* Print a progress report based on the global variables. If verbose output
@@ -748,7 +913,7 @@ progress_report(int tablespacenum, const char *filename, bool force)
char totalsize_str[32];
pg_time_t now;
- if (!showprogress)
+ if (!showprogress || numWorkers > 1)
return;
now = time(NULL);
@@ -1425,7 +1590,7 @@ get_tablespace_mapping(const char *dir)
* specified directory. If it's for another tablespace, it will be restored
* in the original or mapped directory.
*/
-static void
+static int
ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
{
UnpackTarState state;
@@ -1456,13 +1621,12 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
exit(1);
}
- if (basetablespace && writerecoveryconf)
- WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
-
/*
* No data is synced here, everything is done for all tablespaces at the
* end.
*/
+
+ return state.current_bytes_read;
}
static void
@@ -1485,6 +1649,7 @@ ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
exit(1);
}
totaldone += 512;
+ state->current_bytes_read += 512;
state->current_len_left = read_tar_number(&copybuf[124], 12);
@@ -1616,6 +1781,7 @@ ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
fclose(state->file);
state->file = NULL;
totaldone += r;
+ state->current_bytes_read += r;
return;
}
@@ -1625,6 +1791,7 @@ ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
exit(1);
}
totaldone += r;
+ state->current_bytes_read += r;
progress_report(state->tablespacenum, state->filename, false);
state->current_len_left -= r;
@@ -1724,16 +1891,26 @@ BaseBackup(void)
fprintf(stderr, "\n");
}
- basebkp =
- psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
- escaped_label,
- showprogress ? "PROGRESS" : "",
- includewal == FETCH_WAL ? "WAL" : "",
- fastcheckpoint ? "FAST" : "",
- includewal == NO_WAL ? "" : "NOWAIT",
- maxrate_clause ? maxrate_clause : "",
- format == 't' ? "TABLESPACE_MAP" : "",
- verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ if (numWorkers <= 1)
+ {
+ basebkp =
+ psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
+ escaped_label,
+ showprogress ? "PROGRESS" : "",
+ includewal == FETCH_WAL ? "WAL" : "",
+ fastcheckpoint ? "FAST" : "",
+ includewal == NO_WAL ? "" : "NOWAIT",
+ maxrate_clause ? maxrate_clause : "",
+ format == 't' ? "TABLESPACE_MAP" : "",
+ verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ }
+ else
+ {
+ basebkp =
+ psprintf("START_BACKUP LABEL '%s' %s",
+ escaped_label,
+ fastcheckpoint ? "FAST" : "");
+ }
if (PQsendQuery(conn, basebkp) == 0)
{
@@ -1780,10 +1957,36 @@ BaseBackup(void)
pg_log_info("write-ahead log start point: %s on timeline %u",
xlogstart, starttli);
+ if (numWorkers > 1)
+ {
+ /*
+ * Finish up the START_BACKUP command execution and make sure we have
+ * CommandComplete.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data for '%s': %s", "START_BACKUP",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ res = PQgetResult(conn);
+
+ basebkp = psprintf("LIST_TABLESPACES %s",
+ showprogress ? "PROGRESS" : "");
+
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not send replication command \"%s\": %s",
+ "LIST_TABLESPACES", PQerrorMessage(conn));
+ exit(1);
+ }
+ }
+
/*
* Get the header
*/
- res = PQgetResult(conn);
+ tablespacehdr = res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
pg_log_error("could not get backup header: %s",
@@ -1839,65 +2042,98 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /*
- * Start receiving chunks
- */
- for (i = 0; i < PQntuples(res); i++)
- {
- if (format == 't')
- ReceiveTarFile(conn, res, i);
- else
- ReceiveAndUnpackTarFile(conn, res, i);
- } /* Loop over all tablespaces */
-
- if (showprogress)
+ if (numWorkers <= 1)
{
- progress_report(PQntuples(res), NULL, true);
- if (isatty(fileno(stderr)))
- fprintf(stderr, "\n"); /* Need to move to next line */
- }
+ /*
+ * Start receiving chunks
+ */
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ if (format == 't')
+ ReceiveTarFile(conn, res, i);
+ else
+ ReceiveAndUnpackTarFile(conn, res, i);
+ } /* Loop over all tablespaces */
- PQclear(res);
+ if (showprogress)
+ {
+ progress_report(PQntuples(tablespacehdr), NULL, true);
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\n"); /* Need to move to next line */
+ }
- /*
- * Get the stop position
- */
- res = PQgetResult(conn);
- if (PQresultStatus(res) != PGRES_TUPLES_OK)
- {
- pg_log_error("could not get write-ahead log end position from server: %s",
- PQerrorMessage(conn));
- exit(1);
- }
- if (PQntuples(res) != 1)
- {
- pg_log_error("no write-ahead log end position returned from server");
- exit(1);
- }
- strlcpy(xlogend, PQgetvalue(res, 0, 0), sizeof(xlogend));
- if (verbose && includewal != NO_WAL)
- pg_log_info("write-ahead log end point: %s", xlogend);
- PQclear(res);
+ PQclear(res);
- res = PQgetResult(conn);
- if (PQresultStatus(res) != PGRES_COMMAND_OK)
- {
- const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+ /*
+ * Get the stop position
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get write-ahead log end position from server: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) != 1)
+ {
+ pg_log_error("no write-ahead log end position returned from server");
+ exit(1);
+ }
+ strlcpy(xlogend, PQgetvalue(res, 0, 0), sizeof(xlogend));
+ if (verbose && includewal != NO_WAL)
+ pg_log_info("write-ahead log end point: %s", xlogend);
+ PQclear(res);
- if (sqlstate &&
- strcmp(sqlstate, ERRCODE_DATA_CORRUPTED) == 0)
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
{
- pg_log_error("checksum error occurred");
- checksum_failure = true;
+ const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+ if (sqlstate &&
+ strcmp(sqlstate, ERRCODE_DATA_CORRUPTED) == 0)
+ {
+ pg_log_error("checksum error occurred");
+ checksum_failure = true;
+ }
+ else
+ {
+ pg_log_error("final receive failed: %s",
+ PQerrorMessage(conn));
+ }
+ exit(1);
}
- else
+ }
+
+ if (numWorkers > 1)
+ {
+ /*
+ * Finish up the LIST_TABLESPACES command execution and make sure we
+ * have CommandComplete.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
{
- pg_log_error("final receive failed: %s",
+ pg_log_error("could not get data for '%s': %s", "LIST_TABLESPACES",
PQerrorMessage(conn));
+ exit(1);
}
- exit(1);
+ res = PQgetResult(conn);
+
+ backupinfo = palloc0(sizeof(BackupInfo));
+ backupinfo->backupstate = PB_FETCH_REL_LIST;
+
+ /* copy starting WAL location */
+ strlcpy(backupinfo->xlogstart, xlogstart, sizeof(backupinfo->xlogstart));
+ create_parallel_workers(backupinfo);
+ parallel_backup_run(backupinfo);
+ /* copy ending WAL location */
+ strlcpy(xlogend, backupinfo->xlogend, sizeof(xlogend));
}
+ /* Write recovery contents */
+ if (format == 'p' && writerecoveryconf)
+ WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
+
if (bgchild > 0)
{
#ifndef WIN32
@@ -2052,6 +2288,7 @@ main(int argc, char **argv)
{"waldir", required_argument, NULL, 1},
{"no-slot", no_argument, NULL, 2},
{"no-verify-checksums", no_argument, NULL, 3},
+ {"jobs", required_argument, NULL, 'j'},
{NULL, 0, NULL, 0}
};
int c;
@@ -2079,7 +2316,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvPj:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2220,6 +2457,9 @@ main(int argc, char **argv)
case 3:
verify_checksums = false;
break;
+ case 'j': /* number of jobs */
+ numWorkers = atoi(optarg);
+ break;
default:
/*
@@ -2334,6 +2574,30 @@ main(int argc, char **argv)
}
}
+ if (numWorkers <= 0)
+ {
+ pg_log_error("invalid number of parallel jobs");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (format != 'p' && numWorkers > 1)
+ {
+ pg_log_error("parallel jobs are only supported with 'plain' format");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (maxrate > 0 && numWorkers > 1)
+ {
+ pg_log_error("--max-rate is not supported with parallel jobs");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
#ifndef HAVE_LIBZ
if (compresslevel != 0)
{
@@ -2406,3 +2670,638 @@ main(int argc, char **argv)
success = true;
return 0;
}
+
+/*
+ * Worker thread function. Added for code readability.
+ */
+static void *
+worker_run(void *arg)
+{
+ WorkerState *wstate = (WorkerState *) arg;
+
+ worker_get_files(wstate);
+
+ return NULL;
+}
+
+/*
+ * Create workers and initialize worker state.
+ */
+static void
+create_parallel_workers(BackupInfo *backupinfo)
+{
+ int status,
+ i;
+
+ workers = (WorkerState *) palloc(sizeof(WorkerState) * numWorkers);
+ backupinfo->activeworkers = 0;
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ WorkerState *worker = &workers[i];
+
+ worker->backupinfo = backupinfo;
+ worker->bytesread = 0;
+ worker->workerid = i;
+ worker->conn = GetConnection();
+ backupinfo->activeworkers++;
+
+ status = pthread_create(&worker->worker, NULL, worker_run, worker);
+ if (status != 0)
+ {
+ pg_log_error("failed to create thread: %m");
+ exit(1);
+ }
+
+ if (verbose)
+ pg_log_info("backup worker (%d) created, %d", i, status);
+ }
+}
+
+/*
+ * This is the main function that controls the workers, assigns tasks and
+ * does cleanup.
+ */
+static void
+parallel_backup_run(BackupInfo *backupinfo)
+{
+ uint64_t totalread = 0;
+
+ while (1)
+ {
+ char *filename = NULL;
+
+ switch (backupinfo->backupstate)
+ {
+ case PB_FETCH_REL_LIST: /* get the list of files to fetch */
+ backupinfo->backupstate = PB_FETCH_REL_FILES;
+ /* retrieve backup file list from the server. */
+ get_backup_filelist(conn, backupinfo);
+ /* unblock any workers waiting on the condition */
+ pthread_cond_broadcast(&data_ready);
+ break;
+ case PB_FETCH_REL_FILES: /* fetch files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_STOP_BACKUP;
+ free_filelist(backupinfo);
+ }
+ break;
+ case PB_FETCH_WAL_LIST: /* get the list of WAL files to fetch */
+ backupinfo->backupstate = PB_FETCH_WAL_FILES;
+ get_wal_filelist(conn, backupinfo, backupinfo->xlogstart, backupinfo->xlogend);
+ /* unblock any workers waiting on the condition */
+ pthread_cond_broadcast(&data_ready);
+ break;
+ case PB_FETCH_WAL_FILES: /* fetch WAL files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_BACKUP_COMPLETE;
+ }
+ break;
+ case PB_STOP_BACKUP:
+
+ /*
+ * All relation files have been fetched, time to stop the
+ * backup, making sure to fetch the WAL files first (if needs
+ * be).
+ */
+ if (includewal == FETCH_WAL)
+ backupinfo->backupstate = PB_FETCH_WAL_LIST;
+ else
+ backupinfo->backupstate = PB_BACKUP_COMPLETE;
+
+ /* get the pg_control file at last. */
+ receive_file(conn, "global/pg_control", tablespacecount - 1);
+ stop_backup();
+ break;
+ case PB_BACKUP_COMPLETE:
+
+ /*
+ * All relation and WAL files, (if needed) have been fetched,
+ * now we can safely stop all workers and finish up.
+ */
+ cleanup_workers();
+ if (showprogress)
+ {
+ workers_progress_report(totalread, NULL, true);
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\n"); /* Need to move to next line */
+ }
+
+ /* nothing more to do here */
+ return;
+ break;
+ default:
+ /* shouldn't come here. */
+ pg_log_error("unexpected backup state: %d",
+ backupinfo->backupstate);
+ exit(1);
+ break;
+ }
+
+ /* update and report progress */
+ totalread = 0;
+ for (int i = 0; i < numWorkers; i++)
+ {
+ WorkerState *worker = &workers[i];
+
+ totalread += worker->bytesread;
+ }
+ totalread += backupinfo->bytes_skipped;
+
+ if (backupinfo->curr != NULL)
+ filename = backupinfo->curr->path;
+
+ workers_progress_report(totalread, filename, false);
+ pg_usleep(100000);
+ }
+}
+
+/*
+ * Wait for the workers to complete the work and free connections.
+ */
+static void
+cleanup_workers(void)
+{
+ /* either non parallel backup */
+ if (!backupinfo)
+ return;
+ /* workers have already been stopped and cleanup has been done. */
+ if (backupinfo->workersdone)
+ return;
+
+ backupinfo->workersdone = true;
+ /* wakeup any workers waiting on the condition */
+ pthread_cond_broadcast(&data_ready);
+
+ for (int i = 0; i < numWorkers; i++)
+ {
+ pthread_join(workers[i].worker, NULL);
+ PQfinish(workers[i].conn);
+ }
+ free_filelist(backupinfo);
+}
+
+/*
+ * Take the system out of backup mode, also adds the backup_label file in
+ * the backup.
+ */
+static void
+stop_backup(void)
+{
+ PGresult *res = NULL;
+ char *basebkp;
+
+ basebkp = psprintf("STOP_BACKUP %s",
+ includewal == NO_WAL ? "" : "NOWAIT");
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not execute STOP BACKUP \"%s\"",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /*
+ * Get the stop position
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get write-ahead log end position from server: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) != 1)
+ {
+ pg_log_error("no write-ahead log end position returned from server");
+ exit(1);
+ }
+
+ /* retrieve the end wal location. */
+ strlcpy(backupinfo->xlogend, PQgetvalue(res, 0, 0),
+ sizeof(backupinfo->xlogend));
+
+ /* retrieve the backup_label file contents and write them to the backup */
+ writefile("backup_label", PQgetvalue(res, 0, 2));
+
+ PQclear(res);
+
+ /*
+ * Finish up the Stop command execution and make sure we have
+ * CommandComplete and ReadyForQuery response.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data %s", PQerrorMessage(conn));
+ exit(1);
+ }
+ res = PQgetResult(conn);
+
+ if (verbose && includewal != NO_WAL)
+ pg_log_info("write-ahead log end point: %s", backupinfo->xlogend);
+}
+
+/*
+ * Retrieves the list of files available in $PGDATA from the server.
+ */
+static void
+get_backup_filelist(PGconn *conn, BackupInfo *backupInfo)
+{
+ PGresult *res = NULL;
+ char *basebkp;
+
+ for (int i = 0; i < tablespacecount; i++)
+ {
+ bool basetablespace;
+ char *tablespace;
+ int numFiles;
+
+ /*
+ * Query server to fetch the file list for given tablespace name. If
+ * the tablespace name is empty, it will fetch files list of 'base'
+ * tablespace.
+ */
+ basetablespace = PQgetisnull(tablespacehdr, i, 0);
+ tablespace = PQgetvalue(tablespacehdr, i, 1);
+
+ basebkp = psprintf("LIST_FILES '%s'",
+ basetablespace ? "" : tablespace);
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not send replication command \"%s\": %s",
+ "LIST_FILES", PQerrorMessage(conn));
+ exit(1);
+ }
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not list backup files: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
+ numFiles = PQntuples(res);
+ for (int j = 0; j < numFiles; j++)
+ {
+ BackupFile *file;
+ char *path = PQgetvalue(res, j, 0);
+ char type = PQgetvalue(res, j, 1)[0];
+ int32 size = atol(PQgetvalue(res, j, 2));
+ time_t mtime = atol(PQgetvalue(res, j, 3));
+
+ /*
+ * In 'plain' format, create backup directories first.
+ */
+ if (format == 'p' && type == 'd')
+ {
+ /*
+ * directory entries are skipped. however, a tar header size
+ * was included for them in totalsize_kb, so we need to add it
+ * for progress reporting purpose.
+ */
+ backupInfo->bytes_skipped += 512;
+ create_backup_dirs(basetablespace, tablespace, path);
+ continue;
+ }
+
+ if (format == 'p' && type == 'l')
+ {
+ /*
+ * symlink entries are skipped. however, a tar header size was
+ * included for them in totalsize_kb, so we need to add it for
+ * progress reporting purpose.
+ */
+ backupInfo->bytes_skipped += 512;
+ create_tblspc_symlink(path);
+ continue;
+ }
+
+ file = (BackupFile *) palloc(sizeof(BackupFile));
+ strlcpy(file->path, path, MAXPGPATH);
+ file->type = type;
+ file->size = size;
+ file->mtime = mtime;
+ file->tsindex = i;
+
+ /* add to the files list */
+ backupInfo->totalfiles++;
+ if (backupInfo->files == NULL)
+ backupInfo->curr = backupInfo->files = file;
+ else
+ {
+ backupInfo->curr->next = file;
+ backupInfo->curr = backupInfo->curr->next;
+ }
+ }
+
+ PQclear(res);
+
+ /*
+ * Finish up the LIST_FILES command execution and make sure we have
+ * CommandComplete.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data for '%s': %s", "LIST_FILES",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ res = PQgetResult(conn);
+ }
+
+ /* point curr to the head of list. */
+ backupInfo->curr = backupInfo->files;
+}
+
+/*
+ * Retrieve WAL file list from the server based on the starting wal location
+ * and ending wal location.
+ */
+static void
+get_wal_filelist(PGconn *conn, BackupInfo *backupInfo, char *xlogstart, char *xlogend)
+{
+ PGresult *res = NULL;
+ char *basebkp;
+ int numWals;
+
+ basebkp = psprintf("LIST_WAL_FILES START_WAL_LOCATION '%s' END_WAL_LOCATION '%s'",
+ xlogstart, xlogend);
+
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not send replication command \"%s\": %s",
+ "LIST_FILES", PQerrorMessage(conn));
+ exit(1);
+ }
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not list wal files: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ numWals = PQntuples(res);
+ for (int i = 0; i < numWals; i++)
+ {
+ BackupFile *file = (BackupFile *) palloc0(sizeof(BackupFile));
+
+ if (backupInfo->files == NULL)
+ {
+ backupInfo->curr = backupInfo->files = file;
+ }
+ else
+ {
+ backupInfo->curr->next = file;
+ backupInfo->curr = file;
+ }
+
+ strlcpy(file->path, PQgetvalue(res, i, 0), MAXPGPATH);
+ file->tsindex = tablespacecount - 1;
+ backupInfo->totalfiles++;
+ }
+
+ /*
+ * Finish up the LIST_WAL_FILES command execution and make sure we have
+ * CommandComplete.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data for '%s': %s", "LIST_WAL_FILES",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ res = PQgetResult(conn);
+
+ /* point curr to the head of list. */
+ backupInfo->curr = backupInfo->files;
+}
+
+/* free files list */
+static void
+free_filelist(BackupInfo *backupInfo)
+{
+ /* free files list */
+ if (backupInfo->files != NULL)
+ {
+ backupInfo->curr = backupInfo->files;
+ while (backupInfo->curr != NULL)
+ {
+ BackupFile *file = backupInfo->curr;
+
+ backupInfo->curr = file->next;
+
+ pfree(file);
+ }
+
+ backupInfo->files = NULL;
+ backupInfo->totalfiles = 0;
+ }
+}
+
+/*
+ * Worker function to process and retrieve the files from the server. If the
+ * files list is empty, it will wait for it to be filled. Otherwise picks the
+ * next file in the list.
+ */
+static int
+worker_get_files(WorkerState *wstate)
+{
+ BackupFile *fetchfile = NULL;
+ BackupInfo *backupinfo = wstate->backupinfo;
+
+ while (!backupinfo->workersdone)
+ {
+ pthread_mutex_lock(&fetch_mutex);
+ if (backupinfo->curr == NULL)
+ {
+ /*
+ * Wait until there is data available in the list to process.
+ * pthread_cond_wait call unlocks the already locked mutex during
+ * the wait state. When the condition is true (a signal is
+ * raised), one of the competing threads acquires the mutex.
+ */
+ backupinfo->activeworkers--;
+ pthread_cond_wait(&data_ready, &fetch_mutex);
+ backupinfo->activeworkers++;
+ }
+
+ fetchfile = backupinfo->curr;
+ if (fetchfile != NULL)
+ {
+ backupinfo->totalfiles--;
+ backupinfo->curr = fetchfile->next;
+ }
+ pthread_mutex_unlock(&fetch_mutex);
+
+ if (fetchfile != NULL)
+ {
+ wstate->bytesread +=
+ receive_file(wstate->conn, fetchfile->path, fetchfile->tsindex);
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * This function fetches the requested file from the server.
+ */
+static int
+receive_file(PGconn *conn, char *file, int tsIndex)
+{
+ PGresult *res = NULL;
+ int bytesread;
+ PQExpBuffer buf = createPQExpBuffer();
+
+ /*
+ * Fetch a single file from the server. To fetch the file, build a query
+ * in form of:
+ *
+ * SEND_FILES ('base/1/1245/32683') [options]
+ */
+ appendPQExpBuffer(buf, "SEND_FILES ( '%s' )", file);
+
+ /* add options */
+ appendPQExpBuffer(buf, " START_WAL_LOCATION '%s' %s",
+ backupinfo->xlogstart,
+ verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ if (!conn)
+ return 1;
+
+ if (PQsendQuery(conn, buf->data) == 0)
+ {
+ pg_log_error("could not send files list \"%s\"",
+ PQerrorMessage(conn));
+ return 1;
+ }
+
+ destroyPQExpBuffer(buf);
+
+ /* process file contents, also count bytesRead for progress */
+ bytesread = ReceiveAndUnpackTarFile(conn, tablespacehdr, tsIndex);
+
+ PQclear(res);
+
+ /*
+ * Finish up the SEND_FILES command execution and make sure we have
+ * CommandComplete.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data for '%s': %s", "SEND_FILES",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ res = PQgetResult(conn);
+ return bytesread;
+}
+
+/*
+ * Create backup directories while taking care of tablespace path. If tablespace
+ * mapping (with -T) is given then the directory will be created on the mapped
+ * path.
+ */
+static void
+create_backup_dirs(bool basetablespace, char *tablespace, char *name)
+{
+ char dirpath[MAXPGPATH];
+
+ Assert(name != NULL);
+
+ if (basetablespace)
+ snprintf(dirpath, sizeof(dirpath), "%s/%s", basedir, name);
+ else
+ {
+ Assert(tablespace != NULL);
+ snprintf(dirpath, sizeof(dirpath), "%s/%s",
+ get_tablespace_mapping(tablespace), (name + strlen(tablespace) + 1));
+ }
+
+ if (pg_mkdir_p(dirpath, pg_dir_create_mode) != 0)
+ {
+ if (errno != EEXIST)
+ {
+ pg_log_error("could not create directory \"%s\": %m",
+ dirpath);
+ exit(1);
+ }
+ }
+}
+
+/*
+ * Create a symlink in pg_tblspc and apply any tablespace mapping given on
+ * the command line (--tablespace-mapping).
+ */
+static void
+create_tblspc_symlink(char *filename)
+{
+ int i;
+
+ for (i = 0; i < tablespacecount; i++)
+ {
+ char *tsoid = PQgetvalue(tablespacehdr, i, 0);
+
+ if (strstr(filename, tsoid) != NULL)
+ {
+ char *linkloc = psprintf("%s/%s", basedir, filename);
+ const char *mapped_tblspc_path = get_tablespace_mapping(PQgetvalue(tablespacehdr, i, 1));
+
+#ifdef HAVE_SYMLINK
+ if (symlink(mapped_tblspc_path, linkloc) != 0)
+ {
+ pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
+ linkloc, mapped_tblspc_path);
+ exit(1);
+ }
+#else
+ pg_log_error("symlinks are not supported on this platform");
+ exit(1);
+#endif
+ free(linkloc);
+ break;
+ }
+ }
+}
+
+/*
+ * General function for writing to a file; creates one if it doesn't exist
+ */
+static void
+writefile(char *path, char *buf)
+{
+ FILE *f;
+ char pathbuf[MAXPGPATH];
+
+ snprintf(pathbuf, MAXPGPATH, "%s/%s", basedir, path);
+ f = fopen(pathbuf, "w");
+ if (f == NULL)
+ {
+ pg_log_error("could not open file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fwrite(buf, strlen(buf), 1, f) != 1)
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fclose(f))
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+}
--
2.21.1 (Apple Git-122.3)
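As an aside, here is a minimal standalone sketch (mine, not extracted from
the patch) of the mutex/condition-variable hand-off that worker_get_files()
above relies on. It re-checks the queue predicate in a while loop, which
also protects against spurious wakeups, something the plain if-check in the
patch does not guard against. All names here are made up for illustration:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct FileNode
{
	const char *path;
	struct FileNode *next;
} FileNode;

static pthread_mutex_t fetch_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t data_ready = PTHREAD_COND_INITIALIZER;
static FileNode *head;
static bool workersdone;

static void *
worker(void *arg)
{
	int			id = *(int *) arg;

	for (;;)
	{
		FileNode   *f;

		pthread_mutex_lock(&fetch_mutex);
		/* Re-check the predicate in a loop; also handles spurious wakeups. */
		while (head == NULL && !workersdone)
			pthread_cond_wait(&data_ready, &fetch_mutex);
		f = head;
		if (f != NULL)
			head = f->next;
		pthread_mutex_unlock(&fetch_mutex);

		if (f == NULL)
			break;				/* list drained and producer is finished */

		/* stand-in for receive_file() */
		printf("worker %d fetched %s\n", id, f->path);
		free(f);
	}
	return NULL;
}

int
main(void)
{
	pthread_t	threads[4];
	int			ids[4];
	const char *paths[] = {"base/1/1245", "base/1/1246", "global/pg_control"};

	for (int i = 0; i < 4; i++)
	{
		ids[i] = i;
		pthread_create(&threads[i], NULL, worker, &ids[i]);
	}

	/* Producer: push the file list, then announce that no more will come. */
	pthread_mutex_lock(&fetch_mutex);
	for (int i = 0; i < 3; i++)
	{
		FileNode   *f = malloc(sizeof(FileNode));

		f->path = paths[i];
		f->next = head;
		head = f;
	}
	workersdone = true;
	pthread_cond_broadcast(&data_ready);
	pthread_mutex_unlock(&fetch_mutex);

	for (int i = 0; i < 4; i++)
		pthread_join(threads[i], NULL);
	return 0;
}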
0005-parallel-backup-testcase.patch (application/octet-stream)
From b659afb30307702886b0663ebb958d0ea4371c01 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Sun, 13 Oct 2019 21:54:23 +0500
Subject: [PATCH 5/5] parallel backup - testcase
---
.../t/040_pg_basebackup_parallel.pl | 527 ++++++++++++++++++
1 file changed, 527 insertions(+)
create mode 100644 src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
diff --git a/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
new file mode 100644
index 0000000000..4ec4c1e0f6
--- /dev/null
+++ b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
@@ -0,0 +1,527 @@
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Basename qw(basename dirname);
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 95;
+
+program_help_ok('pg_basebackup');
+program_version_ok('pg_basebackup');
+program_options_handling_ok('pg_basebackup');
+
+my $tempdir = TestLib::tempdir;
+
+my $node = get_new_node('main');
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Initialize node without replication settings
+$node->init(extra => ['--data-checksums']);
+$node->start;
+my $pgdata = $node->data_dir;
+
+$node->command_fails(['pg_basebackup'],
+ 'pg_basebackup needs target directory specified');
+
+# Some Windows ANSI code pages may reject this filename, in which case we
+# quietly proceed without this bit of test coverage.
+if (open my $badchars, '>>', "$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
+{
+ print $badchars "test backup of file with non-UTF8 name\n";
+ close $badchars;
+}
+
+$node->set_replication_conf();
+system_or_bail 'pg_ctl', '-D', $pgdata, 'reload';
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup" ],
+ 'pg_basebackup fails because of WAL configuration');
+
+ok(!-d "$tempdir/backup", 'backup directory was cleaned up');
+
+# Create a backup directory that is not empty so the next command will fail
+# but leave the data directory behind
+mkdir("$tempdir/backup")
+ or BAIL_OUT("unable to create $tempdir/backup");
+append_to_file("$tempdir/backup/dir-not-empty.txt", "Some data");
+
+$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/backup", '-n' ],
+ 'failing run with no-clean option');
+
+ok(-d "$tempdir/backup", 'backup directory was created and left behind');
+rmtree("$tempdir/backup");
+
+open my $conf, '>>', "$pgdata/postgresql.conf";
+print $conf "max_replication_slots = 10\n";
+print $conf "max_wal_senders = 10\n";
+print $conf "wal_level = replica\n";
+close $conf;
+$node->restart;
+
+# Write some files to test that they are not copied.
+foreach my $filename (
+ qw(backup_label tablespace_map postgresql.auto.conf.tmp current_logfiles.tmp)
+ )
+{
+ open my $file, '>>', "$pgdata/$filename";
+ print $file "DONOTCOPY";
+ close $file;
+}
+
+# Connect to a database to create global/pg_internal.init. If this is removed
+# the test to ensure global/pg_internal.init is not copied will return a false
+# positive.
+$node->safe_psql('postgres', 'SELECT 1;');
+
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'unlogged main fork in base');
+
+# Create files that look like temporary relations to ensure they are ignored.
+my $postgresOid = $node->safe_psql('postgres',
+ q{select oid from pg_database where datname = 'postgres'});
+
+my @tempRelationFiles =
+ qw(t999_999 t9999_999.1 t999_9999_vm t99999_99999_vm.1);
+
+foreach my $filename (@tempRelationFiles)
+{
+ append_to_file("$pgdata/base/$postgresOid/$filename", 'TEMP_RELATION');
+}
+
+# Run base backup in parallel mode.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup", '-X', 'none', "-j 4" ],
+ 'pg_basebackup runs');
+ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
+
+# Permissions on backup should be default
+SKIP:
+{
+ skip "unix-style permissions not supported on Windows", 1
+ if ($windows_os);
+
+ ok(check_mode_recursive("$tempdir/backup", 0700, 0600),
+ "check backup dir permissions");
+}
+
+# Only archive_status directory should be copied in pg_wal/.
+is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/pg_wal/")) ],
+ [ sort qw(. .. archive_status) ],
+ 'no WAL files copied');
+
+# Contents of these directories should not be copied.
+foreach my $dirname (
+ qw(pg_dynshmem pg_notify pg_replslot pg_serial pg_snapshots pg_stat_tmp pg_subtrans)
+ )
+{
+ is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/$dirname/")) ],
+ [ sort qw(. ..) ],
+ "contents of $dirname/ not copied");
+}
+
+# These files should not be copied.
+foreach my $filename (
+ qw(postgresql.auto.conf.tmp postmaster.opts postmaster.pid tablespace_map current_logfiles.tmp
+ global/pg_internal.init))
+{
+ ok(!-f "$tempdir/backup/$filename", "$filename not copied");
+}
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+ 'unlogged init fork in backup');
+ok( !-f "$tempdir/backup/$baseUnloggedPath",
+ 'unlogged main fork not in backup');
+
+# Temp relations should not be copied.
+foreach my $filename (@tempRelationFiles)
+{
+ ok( !-f "$tempdir/backup/base/$postgresOid/$filename",
+ "base/$postgresOid/$filename not copied");
+}
+
+# Make sure existing backup_label was ignored.
+isnt(slurp_file("$tempdir/backup/backup_label"),
+ 'DONOTCOPY', 'existing backup_label not copied');
+rmtree("$tempdir/backup");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup2", '--waldir',
+ "$tempdir/xlog2", "-j 4"
+ ],
+ 'separate xlog directory');
+ok(-f "$tempdir/backup2/PG_VERSION", 'backup was created');
+ok(-d "$tempdir/xlog2/", 'xlog directory was created');
+rmtree("$tempdir/backup2");
+rmtree("$tempdir/xlog2");
+
+$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/tarbackup", '-Ft', "-j 4"],
+ 'tar format');
+
+rmtree("$tempdir/tarbackup");
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T=/foo" ],
+ '-T with empty old directory fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=" ],
+ '-T with empty new directory fails');
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4",
+ "-T/foo=/bar=/baz"
+ ],
+ '-T with multiple = fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo=/bar" ],
+ '-T with old directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=bar" ],
+ '-T with new directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo" ],
+ '-T with invalid format fails');
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+ skip "symlinks not supported on Windows", 18 if ($windows_os);
+
+ # Move pg_replslot out of $pgdata and create a symlink to it.
+ $node->stop;
+
+ # Set umask so test directories and files are created with group permissions
+ umask(0027);
+
+ # Enable group permissions on PGDATA
+ chmod_recursive("$pgdata", 0750, 0640);
+
+ rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
+ or BAIL_OUT "could not move $pgdata/pg_replslot";
+ symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot")
+ or BAIL_OUT "could not symlink to $pgdata/pg_replslot";
+
+ $node->start;
+
+ # Create a temporary directory in the system location and symlink it
+ # to our physical temp location. That way we can use shorter names
+ # for the tablespace directories, which hopefully won't run afoul of
+ # the 99 character length limit.
+ my $shorter_tempdir = TestLib::tempdir_short . "/tempdir";
+ symlink "$tempdir", $shorter_tempdir;
+
+ mkdir "$tempdir/tblspc1";
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc1 LOCATION '$shorter_tempdir/tblspc1';");
+ $node->safe_psql('postgres',
+ "CREATE TABLE test1 (a int) TABLESPACE tblspc1;");
+
+ # Create an unlogged table to test that forks other than init are not copied.
+ $node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;'
+ );
+
+ my $tblspc1UnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('tblspc1_unlogged')});
+
+ # Make sure main and init forks exist
+ ok( -f "$pgdata/${tblspc1UnloggedPath}_init",
+ 'unlogged init fork in tablespace');
+ ok(-f "$pgdata/$tblspc1UnloggedPath", 'unlogged main fork in tablespace');
+
+ # Create files that look like temporary relations to ensure they are ignored
+ # in a tablespace.
+ my @tempRelationFiles = qw(t888_888 t888888_888888_vm.1);
+ my $tblSpc1Id = basename(
+ dirname(
+ dirname(
+ $node->safe_psql(
+ 'postgres', q{select pg_relation_filepath('test1')}))));
+
+ foreach my $filename (@tempRelationFiles)
+ {
+ append_to_file(
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ 'TEMP_RELATION');
+ }
+
+ $node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4" ],
+ 'plain format with tablespaces fails without tablespace mapping');
+
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tblspc1=$tempdir/tbackup/tblspc1"
+ ],
+ 'plain format with tablespaces succeeds with tablespace mapping');
+ ok(-d "$tempdir/tbackup/tblspc1", 'tablespace was relocated');
+ opendir(my $dh, "$pgdata/pg_tblspc") or die;
+ ok( ( grep {
+ -l "$tempdir/backup1/pg_tblspc/$_"
+ and readlink "$tempdir/backup1/pg_tblspc/$_" eq
+ "$tempdir/tbackup/tblspc1"
+ } readdir($dh)),
+ "tablespace symlink was updated");
+ closedir $dh;
+
+ # Group access should be enabled on all backup files
+ ok(check_mode_recursive("$tempdir/backup1", 0750, 0640),
+ "check backup dir permissions");
+
+ # Unlogged relation forks other than init should not be copied
+ my ($tblspc1UnloggedBackupPath) =
+ $tblspc1UnloggedPath =~ /[^\/]*\/[^\/]*\/[^\/]*$/g;
+
+ ok(-f "$tempdir/tbackup/tblspc1/${tblspc1UnloggedBackupPath}_init",
+ 'unlogged init fork in tablespace backup');
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
+ 'unlogged main fork not in tablespace backup');
+
+ # Temp relations should not be copied.
+ foreach my $filename (@tempRelationFiles)
+ {
+ ok( !-f "$tempdir/tbackup/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ "[tblspc1]/$postgresOid/$filename not copied");
+
+ # Also remove temp relation files or tablespace drop will fail.
+ my $filepath =
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename";
+
+ unlink($filepath)
+ or BAIL_OUT("unable to unlink $filepath");
+ }
+
+ ok( -d "$tempdir/backup1/pg_replslot",
+ 'pg_replslot symlink copied as directory');
+ rmtree("$tempdir/backup1");
+
+ mkdir "$tempdir/tbl=spc2";
+ $node->safe_psql('postgres', "DROP TABLE test1;");
+ $node->safe_psql('postgres', "DROP TABLE tblspc1_unlogged;");
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc1;");
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc2 LOCATION '$shorter_tempdir/tbl=spc2';");
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup3", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tbl\\=spc2=$tempdir/tbackup/tbl\\=spc2"
+ ],
+ 'mapping tablespace with = sign in path');
+ ok(-d "$tempdir/tbackup/tbl=spc2",
+ 'tablespace with = sign was relocated');
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc2;");
+ rmtree("$tempdir/backup3");
+}
+
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backupR", '-R' , '-j 4'],
+ 'pg_basebackup -R runs');
+ok(-f "$tempdir/backupR/postgresql.auto.conf", 'postgresql.auto.conf exists');
+ok(-f "$tempdir/backupR/standby.signal", 'standby.signal was created');
+my $recovery_conf = slurp_file "$tempdir/backupR/postgresql.auto.conf";
+rmtree("$tempdir/backupR");
+
+my $port = $node->port;
+like(
+ $recovery_conf,
+ qr/^primary_conninfo = '.*port=$port.*'\n/m,
+ 'postgresql.auto.conf sets primary_conninfo');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxd" , "-j 4"],
+ 'pg_basebackup runs in default xlog mode');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxd/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxd");
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxf", '-X', 'fetch' , "-j 4"],
+ 'pg_basebackup -X fetch runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxf");
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs", '-X', 'stream' , "-j 4"],
+ 'pg_basebackup -X stream runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxs/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxs");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupnoslot", '-X',
+ 'stream', '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup -X stream runs with --no-slot');
+rmtree("$tempdir/backupnoslot");
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_sl_fail", '-X',
+ 'stream', '-S',
+ 'slot0',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with nonexistent replication slot');
+#
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C' , '-j 4'],
+ 'pg_basebackup -C fails without slot name');
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_slot", '-C',
+ '-S', 'slot0',
+ '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with -C -S --no-slot');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup -C runs');
+rmtree("$tempdir/backupxs_slot");
+
+is( $node->safe_psql(
+ 'postgres',
+ q{SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ 'slot0',
+ 'replication slot was created');
+isnt(
+ $node->safe_psql(
+ 'postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ '',
+ 'restart LSN of new slot is not null');
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot1", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup fails with -C -S and a previously existing slot');
+
+$node->safe_psql('postgres',
+ q{SELECT * FROM pg_create_physical_replication_slot('slot1')});
+my $lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+is($lsn, '', 'restart LSN of new slot is null');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/fail", '-S', 'slot1', '-X', 'none', '-j 4'],
+ 'pg_basebackup with replication slot fails without WAL streaming');
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl", '-X',
+ 'stream', '-S', 'slot1', '-j 4'
+ ],
+ 'pg_basebackup -X stream with replication slot runs');
+$lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+like($lsn, qr!^0/[0-9A-Z]{7,8}$!, 'restart LSN of slot has advanced');
+rmtree("$tempdir/backupxs_sl");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl_R", '-X',
+ 'stream', '-S', 'slot1', '-R',
+ '-j 4'
+ ],
+ 'pg_basebackup with replication slot and -R runs');
+like(
+ slurp_file("$tempdir/backupxs_sl_R/postgresql.auto.conf"),
+ qr/^primary_slot_name = 'slot1'\n/m,
+ 'recovery conf file sets primary_slot_name');
+
+my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
+is($checksum, 'on', 'checksums are enabled');
+rmtree("$tempdir/backupxs_sl_R");
+
+# create tables to corrupt and get their relfilenodes
+my $file_corrupt1 = $node->safe_psql('postgres',
+ q{SELECT a INTO corrupt1 FROM generate_series(1,10000) AS a; ALTER TABLE corrupt1 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt1')}
+);
+my $file_corrupt2 = $node->safe_psql('postgres',
+ q{SELECT b INTO corrupt2 FROM generate_series(1,2) AS b; ALTER TABLE corrupt2 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt2')}
+);
+
+# set page header and block sizes
+my $pageheader_size = 24;
+my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');
+
+# induce corruption
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+my $file;
+open $file, '+<', "$pgdata/$file_corrupt1";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*checksum verification failed/s],
+ 'pg_basebackup reports checksum mismatch');
+rmtree("$tempdir/backup_corrupt");
+
+# induce further corruption in 5 more blocks
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+for my $i (1 .. 5)
+{
+ my $offset = $pageheader_size + $i * $block_size;
+ seek($file, $offset, 0);
+ syswrite($file, "\0\0\0\0\0\0\0\0\0");
+}
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*further.*failures.*will.not.be.reported/s],
+ 'pg_basebackup does not report more than 5 checksum mismatches');
+rmtree("$tempdir/backup_corrupt2");
+
+# induce corruption in a second file
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt2";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+# do not verify checksums, should return ok
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup_corrupt4", '--no-verify-checksums',
+ '-j 4'
+ ],
+ 'pg_basebackup with -k does not report checksum mismatch');
+rmtree("$tempdir/backup_corrupt4");
+
+$node->safe_psql('postgres', "DROP TABLE corrupt1;");
+$node->safe_psql('postgres', "DROP TABLE corrupt2;");
--
2.21.1 (Apple Git-122.3)
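A note on why the corruption recipe in the test above works (my reading,
not stated in the patch): the page checksum is stored inside the page
header itself, so zeroing bytes just past the header leaves a stale
checksum that no longer matches the page contents. A tiny standalone
sketch of the offset arithmetic the test relies on; the constants mirror
the test's assumptions (8 kB blocks, 24-byte page header) and are
illustrative, not taken from PostgreSQL headers:

#include <stdio.h>

#define BLCKSZ 8192				/* default block_size */
#define PAGEHEADER_SIZE 24		/* size of the page header assumed by the test */

int
main(void)
{
	/* The test zeroes 9 bytes starting just past the header of each block. */
	for (int i = 0; i <= 5; i++)
		printf("block %d: corrupt bytes [%d, %d)\n",
			   i, PAGEHEADER_SIZE + i * BLCKSZ,
			   PAGEHEADER_SIZE + i * BLCKSZ + 9);
	return 0;
}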
0003-Parallel-Backup-Backend-Replication-commands.patch (application/octet-stream)
From e5998a955fe224babced5f5f5c5caf122872165d Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Mon, 27 Jan 2020 18:32:42 +0500
Subject: [PATCH 3/5] Parallel Backup - Backend Replication commands
This feature adds the following replication commands to the backend replication
system, to help facilitate taking a full backup in parallel using multiple
connections.
- START_BACKUP [LABEL '<label>'] [FAST]
This command instructs the server to prepare for performing an
online backup.
- STOP_BACKUP [NOWAIT]
This command instructs the server that the online backup is finished. It
will bring the system out of backup mode.
- LIST_TABLESPACES [PROGRESS]
This command instructs the server to return a list of tablespaces.
- LIST_FILES [TABLESPACE]
This command instructs the server to return a list of files for a
given tablespace, or for the base tablespace if TABLESPACE is empty.
- LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
This command instructs the server to return a list of WAL files between
the given locations.
- SEND_FILES '(' FILE, FILE... ')' [START_WAL_LOCATION 'X/X']
[NOVERIFY_CHECKSUMS]
Instructs the server to send the contents of the requested FILE(s).
---
src/backend/access/transam/xlog.c | 4 +-
src/backend/replication/basebackup.c | 529 ++++++++++++++++++++++++-
src/backend/replication/repl_gram.y | 265 +++++++++++--
src/backend/replication/repl_scanner.l | 8 +
src/include/nodes/replnodes.h | 12 +
src/include/replication/basebackup.h | 2 +-
6 files changed, 751 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0a2eb29c75..fc294a61a9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11078,7 +11078,7 @@ do_pg_abort_backup(int code, Datum arg)
if (emit_warning)
ereport(WARNING,
- (errmsg("aborting backup due to backend exiting before pg_stop_back up was called")));
+ (errmsg("aborting backup due to backend exiting while a non-exclusive backup is in progress")));
}
/*
@@ -12310,7 +12310,7 @@ collect_tablespaces(List **tablespaces, StringInfo tblspcmapfile,
ti->oid = pstrdup(de->d_name);
ti->path = pstrdup(buflinkpath.data);
ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+ ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
if (tablespaces)
*tablespaces = lappend(*tablespaces, ti);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 9583277224..90f96bf20d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -38,6 +38,8 @@
#include "storage/ipc.h"
#include "storage/reinit.h"
#include "utils/builtins.h"
+#include "utils/memutils.h"
+#include "utils/pg_lsn.h"
#include "utils/ps_status.h"
#include "utils/relcache.h"
#include "utils/timestamp.h"
@@ -51,11 +53,22 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ XLogRecPtr startwallocation;
+ XLogRecPtr endwallocation;
+ char *tablespace;
} basebackup_options;
+typedef struct
+{
+ char path[MAXPGPATH];
+ char type;
+ size_t size;
+ time_t mtime;
+} BackupFile;
+
static int64 sendDir(const char *path, int basepathlen, bool dryrun,
- List *tablespaces, bool sendtblspclinks);
+ List *tablespaces, bool sendtblspclinks, List **filelist);
static bool sendFile(const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
@@ -69,11 +82,27 @@ static void perform_base_backup(basebackup_options *opt);
static List *collect_wal_files(XLogRecPtr startptr, XLogRecPtr endptr,
List **historyFileList);
static void parse_basebackup_options(List *options, basebackup_options *opt);
-static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
+static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli, StringInfo label);
+static void SendFilesHeader(List *files);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
static void throttle(size_t increment);
static bool is_checksummed_file(const char *fullpath, const char *filename);
+static void start_backup(basebackup_options *opt);
+static void stop_backup(basebackup_options *opt);
+static void list_tablespaces(basebackup_options *opt);
+static void list_files(basebackup_options *opt);
+static void list_wal_files(basebackup_options *opt);
+static void send_files(basebackup_options *opt, List *filenames,
+ bool missing_ok);
+static void add_to_filelist(List **filelist, char *path, char type,
+ size_t size, time_t mtime);
+
+/*
+ * Store label file during non-exclusive backups.
+ */
+static StringInfo label_file;
+
/* Was the backup currently in-progress initiated in recovery mode? */
static bool backup_started_in_recovery = false;
@@ -260,7 +289,7 @@ perform_base_backup(basebackup_options *opt)
ListCell *lc;
tablespaceinfo *ti;
- SendXlogRecPtrResult(startptr, starttli);
+ SendXlogRecPtrResult(startptr, starttli, NULL);
/*
* Calculate the relative path of temporary statistics directory in
@@ -276,7 +305,7 @@ perform_base_backup(basebackup_options *opt)
/* Add a node for the base directory at the end */
ti = palloc0(sizeof(tablespaceinfo));
- ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true, NULL) : -1;
tablespaces = lappend(tablespaces, ti);
/* Send tablespace header */
@@ -332,10 +361,10 @@ perform_base_backup(basebackup_options *opt)
if (tblspc_map_file && opt->sendtblspcmapfile)
{
sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data);
- sendDir(".", 1, false, tablespaces, false);
+ sendDir(".", 1, false, tablespaces, false, NULL);
}
else
- sendDir(".", 1, false, tablespaces, true);
+ sendDir(".", 1, false, tablespaces, true, NULL);
/* ... and pg_control after everything else. */
if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
@@ -346,7 +375,7 @@ perform_base_backup(basebackup_options *opt)
sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
}
else
- sendTablespace(ti->path, false);
+ sendTablespace(ti->path, false, NULL);
/*
* If we're including WAL, and this is the main data directory we
@@ -499,7 +528,7 @@ perform_base_backup(basebackup_options *opt)
/* Send CopyDone message for the last tar file */
pq_putemptymessage('c');
}
- SendXlogRecPtrResult(endptr, endtli);
+ SendXlogRecPtrResult(endptr, endtli, NULL);
if (total_checksum_failures)
{
@@ -656,6 +685,9 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_maxrate = false;
bool o_tablespace_map = false;
bool o_noverify_checksums = false;
+ bool o_startwallocation = false;
+ bool o_endwallocation = false;
+ bool o_tablespace = false;
MemSet(opt, 0, sizeof(*opt));
foreach(lopt, options)
@@ -744,12 +776,47 @@ parse_basebackup_options(List *options, basebackup_options *opt)
noverify_checksums = true;
o_noverify_checksums = true;
}
+ else if (strcmp(defel->defname, "start_wal_location") == 0)
+ {
+ bool have_error = false;
+ char *startwallocation;
+
+ if (o_startwallocation)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+
+ startwallocation = strVal(defel->arg);
+ opt->startwallocation = pg_lsn_in_internal(startwallocation, &have_error);
+ o_startwallocation = true;
+ }
+ else if (strcmp(defel->defname, "end_wal_location") == 0)
+ {
+ bool have_error = false;
+ char *endwallocation;
+
+ if (o_endwallocation)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+
+ endwallocation = strVal(defel->arg);
+ opt->endwallocation = pg_lsn_in_internal(endwallocation, &have_error);
+ o_endwallocation = true;
+ }
+ else if (strcmp(defel->defname, "tablespace") == 0)
+ {
+ if (o_tablespace)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->tablespace = strVal(defel->arg);
+ o_tablespace = true;
+ }
else
elog(ERROR, "option \"%s\" not recognized",
defel->defname);
}
- if (opt->label == NULL)
- opt->label = "base backup";
}
@@ -767,6 +834,15 @@ SendBaseBackup(BaseBackupCmd *cmd)
parse_basebackup_options(cmd->options, &opt);
+ /* default value for label, if not specified. */
+ if (opt.label == NULL)
+ {
+ if (cmd->cmdtag == BASE_BACKUP)
+ opt.label = "base backup";
+ else
+ opt.label = "start backup";
+ }
+
WalSndSetState(WALSNDSTATE_BACKUP);
if (update_process_title)
@@ -778,7 +854,34 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg, false);
}
- perform_base_backup(&opt);
+ switch (cmd->cmdtag)
+ {
+ case BASE_BACKUP:
+ perform_base_backup(&opt);
+ break;
+ case START_BACKUP:
+ start_backup(&opt);
+ break;
+ case LIST_TABLESPACES:
+ list_tablespaces(&opt);
+ break;
+ case LIST_FILES:
+ list_files(&opt);
+ break;
+ case SEND_FILES:
+ send_files(&opt, cmd->backupfiles, true);
+ break;
+ case STOP_BACKUP:
+ stop_backup(&opt);
+ break;
+ case LIST_WAL_FILES:
+ list_wal_files(&opt);
+ break;
+ default:
+ elog(ERROR, "unrecognized replication command tag: %u",
+ cmd->cmdtag);
+ break;
+ }
}
static void
@@ -866,18 +969,18 @@ SendBackupHeader(List *tablespaces)
}
/*
- * Send a single resultset containing just a single
- * XLogRecPtr record (in text format)
+ * Send a single resultset containing the XLogRecPtr record (in text
+ * format), the TimeLineID, and the backup label.
*/
static void
-SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
+SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli, StringInfo label)
{
StringInfoData buf;
char str[MAXFNAMELEN];
Size len;
pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 2); /* 2 fields */
+ pq_sendint16(&buf, 3); /* 3 fields */
/* Field headers */
pq_sendstring(&buf, "recptr");
@@ -900,11 +1003,19 @@ SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
pq_sendint16(&buf, -1);
pq_sendint32(&buf, 0);
pq_sendint16(&buf, 0);
+
+ pq_sendstring(&buf, "label");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
pq_endmessage(&buf);
/* Data row */
pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 2); /* number of columns */
+ pq_sendint16(&buf, 3); /* number of columns */
len = snprintf(str, sizeof(str),
"%X/%X", (uint32) (ptr >> 32), (uint32) ptr);
@@ -915,12 +1026,109 @@ SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
pq_sendint32(&buf, len);
pq_sendbytes(&buf, str, len);
+ if (label)
+ {
+ len = label->len;
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, label->data, len);
+ }
+ else
+ {
+ pq_sendint32(&buf, -1); /* NULL */
+ }
+
pq_endmessage(&buf);
/* Send a CommandComplete message */
pq_puttextmessage('C', "SELECT");
}
+
+/*
+ * Sends the resultset containing the file name, type ('f' for a regular
+ * file, 'd' for a directory, 'l' for a symlink), file size and
+ * modification time.
+ */
+static void
+SendFilesHeader(List *files)
+{
+ StringInfoData buf;
+ ListCell *lc;
+
+ /* Construct and send the list of files */
+
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 4); /* n field */
+
+ /* First field - file name */
+ pq_sendstring(&buf, "path");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Second field - type */
+ pq_sendstring(&buf, "type");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, CHAROID);
+ pq_sendint16(&buf, 1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Fourth field - mtime */
+ pq_sendstring(&buf, "mtime");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ foreach(lc, files)
+ {
+ BackupFile *file = (BackupFile *) lfirst(lc);
+ Size len;
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 4); /* number of columns */
+
+ /* send path */
+ len = strlen(file->path);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, file->path, len);
+
+ /* send type */
+ pq_sendint32(&buf, 1);
+ pq_sendbyte(&buf, file->type);
+
+ /* send size */
+ send_int8_string(&buf, file->size);
+
+ /* send mtime */
+ send_int8_string(&buf, file->mtime);
+
+ pq_endmessage(&buf);
+ }
+
+ list_free(files);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
/*
* Inject a file with given name and content in the output tar stream.
*/
@@ -972,7 +1180,7 @@ sendFileWithContent(const char *filename, const char *content)
* Only used to send auxiliary tablespaces, not PGDATA.
*/
int64
-sendTablespace(char *path, bool dryrun)
+sendTablespace(char *path, bool dryrun, List **filelist)
{
int64 size;
char pathbuf[MAXPGPATH];
@@ -1001,11 +1209,11 @@ sendTablespace(char *path, bool dryrun)
return 0;
}
+ add_to_filelist(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
dryrun);
-
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), dryrun, NIL, true);
+ size += sendDir(pathbuf, strlen(path), dryrun, NIL, true, filelist);
return size;
}
@@ -1024,7 +1232,7 @@ sendTablespace(char *path, bool dryrun)
*/
static int64
sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
- bool sendtblspclinks)
+ bool sendtblspclinks, List **filelist)
{
DIR *dir;
struct dirent *de;
@@ -1178,6 +1386,8 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
+
+ add_to_filelist(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
excludeFound = true;
break;
@@ -1194,6 +1404,8 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
+
+ add_to_filelist(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
continue;
}
@@ -1215,6 +1427,10 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
dryrun);
+ add_to_filelist(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
+ add_to_filelist(filelist, "./pg_wal/archive_status", 'd', -1,
+ statbuf.st_mtime);
+
continue; /* don't recurse into pg_wal */
}
@@ -1244,6 +1460,7 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
+ add_to_filelist(filelist, pathbuf, 'l', statbuf.st_size, statbuf.st_mtime);
size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
&statbuf, dryrun);
#else
@@ -1270,6 +1487,7 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
*/
size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
dryrun);
+ add_to_filelist(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1300,13 +1518,15 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks);
+ size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks, filelist);
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
- if (!dryrun)
+ add_to_filelist(filelist, pathbuf, 'f', statbuf.st_size, statbuf.st_mtime);
+
+ if (!dryrun && filelist == NULL)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? atooid(lastDir + 1) : InvalidOid);
@@ -1747,3 +1967,268 @@ throttle(size_t increment)
*/
throttled_last = GetCurrentTimestamp();
}
+
+/*
+ * start_backup - prepare to start an online backup.
+ *
+ * This function calls do_pg_start_backup() and sends back starting checkpoint,
+ * available tablespaces, content of backup_label and tablespace_map files.
+ */
+static void
+start_backup(basebackup_options *opt)
+{
+ TimeLineID starttli;
+ StringInfo tblspc_map_file;
+ MemoryContext oldcontext;
+
+ /* The label file needs to be long-lived, since it is read in stop_backup. */
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+ label_file = makeStringInfo();
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * The tablespace map file is not used, but since this argument is required by
+ * do_pg_start_backup, we have to provide it here.
+ */
+ tblspc_map_file = makeStringInfo();
+
+ register_persistent_abort_backup_handler();
+ startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
+ label_file, NULL, tblspc_map_file, false, false);
+
+ /* send startptr and starttli to frontend */
+ SendXlogRecPtrResult(startptr, starttli, NULL);
+
+ /* free tablespace map buffer. */
+ pfree(tblspc_map_file->data);
+ pfree(tblspc_map_file);
+}
+
+/*
+ * stop_backup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends back the
+ * ending WAL location along with the contents of the backup_label file.
+ */
+static void
+stop_backup(basebackup_options *opt)
+{
+ TimeLineID endtli;
+ XLogRecPtr endptr;
+
+ if (get_backup_status() != SESSION_BACKUP_NON_EXCLUSIVE)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("non-exclusive backup is not in progress")));
+
+ /*
+ * Stop the non-exclusive backup. Return a copy of the backup label so it
+ * can be written to disk by the caller.
+ */
+ endptr = do_pg_stop_backup(label_file->data, !opt->nowait, &endtli);
+ SendXlogRecPtrResult(endptr, endtli, label_file);
+
+ /* Free structures allocated in TopMemoryContext */
+ pfree(label_file->data);
+ pfree(label_file);
+ label_file = NULL;
+}
+
+/*
+ * list_tablespaces() - sends a list of tablespace entries
+ */
+static void
+list_tablespaces(basebackup_options *opt)
+{
+ StringInfo tblspc_map_file;
+ List *tablespaces = NIL;
+ tablespaceinfo *ti;
+
+ tblspc_map_file = makeStringInfo();
+ collect_tablespaces(&tablespaces, tblspc_map_file, opt->progress, false);
+
+ /* Add a node for the base directory at the end */
+ ti = palloc0(sizeof(tablespaceinfo));
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true, NULL) : -1;
+ tablespaces = lappend(tablespaces, ti);
+
+ SendBackupHeader(tablespaces);
+ list_free(tablespaces);
+}
+
+/*
+ * list_files() - sends a list of files available in given tablespace.
+ */
+static void
+list_files(basebackup_options *opt)
+{
+ List *files = NIL;
+ int datadirpathlen;
+
+ datadirpathlen = strlen(DataDir);
+
+ /*
+ * Calculate the relative path of temporary statistics directory in order
+ * to skip the files which are located in that directory later.
+ */
+ if (is_absolute_path(pgstat_stat_directory) &&
+ strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
+ else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory);
+ else
+ statrelpath = pgstat_stat_directory;
+
+ if (strlen(opt->tablespace) > 0)
+ sendTablespace(opt->tablespace, true, &files);
+ else
+ sendDir(".", 1, true, NIL, true, &files);
+
+ SendFilesHeader(files);
+}
+
+/*
+ * list_wal_files() - sends a list of WAL files between start wal location and
+ * end wal location.
+ */
+static void
+list_wal_files(basebackup_options *opt)
+{
+ List *historyFileList = NIL;
+ List *walFileList = NIL;
+ List *files = NIL;
+ ListCell *lc;
+
+ walFileList = collect_wal_files(opt->startwallocation, opt->endwallocation,
+ &historyFileList);
+ foreach(lc, walFileList)
+ {
+ char pathbuf[MAXPGPATH];
+ char *walFileName = (char *) lfirst(lc);
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
+ add_to_filelist(&files, pathbuf, 'f', wal_segment_size, 0);
+ }
+
+ SendFilesHeader(files);
+}
+
+/*
+ * send_files() - sends the actual files to the caller
+ *
+ * The function sends out the given file(s) over to the caller using the COPY
+ * protocol. It only entertains regular files; any other kind, such as
+ * directories or symlinks, will be ignored.
+ */
+static void
+send_files(basebackup_options *opt, List *filenames, bool missing_ok)
+{
+ StringInfoData buf;
+ ListCell *lc;
+ int basepathlen = 0;
+
+ if (list_length(filenames) <= 0)
+ return;
+
+ total_checksum_failures = 0;
+
+ /* Disable throttling. */
+ throttling_counter = -1;
+
+ /* set backup start location. */
+ startptr = opt->startwallocation;
+
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ foreach(lc, filenames)
+ {
+ struct stat statbuf;
+ char *pathbuf;
+
+ pathbuf = (char *) strVal(lfirst(lc));
+ if (is_absolute_path(pathbuf))
+ {
+ char *basepath;
+
+ /*
+ * 'pathbuf' points to the tablespace location, but we only want
+ * to include the version directory in it that belongs to us.
+ */
+ basepath = strstr(pathbuf, TABLESPACE_VERSION_DIRECTORY);
+ if (basepath)
+ basepathlen = basepath - pathbuf - 1;
+ }
+ else if (pathbuf[0] == '.' && pathbuf[1] == '/')
+ basepathlen = 2;
+ else
+ basepathlen = 0;
+
+ if (lstat(pathbuf, &statbuf) != 0)
+ {
+ if (errno != ENOENT)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file or directory \"%s\": %m",
+ pathbuf)));
+
+ /* If the file went away while scanning, it's not an error. */
+ continue;
+ }
+
+ /*
+ * Only entertain requests for regular file, skip any directories or
+ * special files.
+ */
+ if (S_ISREG(statbuf.st_mode))
+ {
+ /* send file to client */
+ sendFile(pathbuf, pathbuf + basepathlen, &statbuf, true, InvalidOid);
+ }
+ else
+ ereport(WARNING,
+ (errmsg("skipping special file or directory \"%s\"", pathbuf)));
+ }
+
+ pq_putemptymessage('c'); /* CopyDone */
+
+ /*
+ * Check for checksum failures. If there are failures across multiple
+ * processes, the total failure count may not be reported accurately, but
+ * the command will still error out, terminating the backup.
+ */
+ if (total_checksum_failures)
+ {
+ if (total_checksum_failures > 1)
+ ereport(WARNING,
+ (errmsg("%lld total checksum verification failures", total_checksum_failures)));
+
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("checksum verification failure during base backup")));
+ }
+}
+
+/*
+ * Construct a BackupFile entry and add to the list.
+ */
+static void
+add_to_filelist(List **filelist, char *path, char type, size_t size,
+ time_t mtime)
+{
+ BackupFile *file;
+
+ if (filelist)
+ {
+ file = (BackupFile *) palloc(sizeof(BackupFile));
+ strlcpy(file->path, path, sizeof(file->path));
+ file->type = type;
+ file->size = size;
+ file->mtime = mtime;
+
+ *filelist = lappend(*filelist, file);
+ }
+}
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 2d96567409..f79e1a504f 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -87,13 +87,28 @@ static SQLCmd *make_sqlcmd(void);
%token K_EXPORT_SNAPSHOT
%token K_NOEXPORT_SNAPSHOT
%token K_USE_SNAPSHOT
+%token K_START_BACKUP
+%token K_LIST_TABLESPACES
+%token K_LIST_FILES
+%token K_SEND_FILES
+%token K_STOP_BACKUP
+%token K_LIST_WAL_FILES
+%token K_START_WAL_LOCATION
+%token K_END_WAL_LOCATION
%type <node> command
%type <node> base_backup start_replication start_logical_replication
create_replication_slot drop_replication_slot identify_system
timeline_history show sql_cmd
-%type <list> base_backup_opt_list
-%type <defelt> base_backup_opt
+%type <list> base_backup_opt_list start_backup_opt_list stop_backup_opt_list
+ list_tablespace_opt_list list_files_opt_list
+ list_wal_files_opt_list send_backup_files_opt_list
+ backup_files backup_files_list
+%type <defelt> base_backup_opt backup_opt_label backup_opt_progress
+ backup_opt_fast backup_opt_wal backup_opt_nowait
+ backup_opt_maxrate backup_opt_tsmap backup_opt_chksum
+ backup_opt_start_wal_loc backup_opt_end_wal_loc
+ backup_opt_tablespace start_backup_opt send_backup_files_opt
%type <uintval> opt_timeline
%type <list> plugin_options plugin_opt_list
%type <defelt> plugin_opt_elem
@@ -153,69 +168,231 @@ var_name: IDENT { $$ = $1; }
{ $$ = psprintf("%s.%s", $1, $3); }
;
-/*
- * BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT]
- * [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS]
- */
base_backup:
+ /*
+ * BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT]
+ * [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS]
+ */
K_BASE_BACKUP base_backup_opt_list
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
+ cmd->cmdtag = BASE_BACKUP;
$$ = (Node *) cmd;
}
- ;
-
-base_backup_opt_list:
- base_backup_opt_list base_backup_opt
- { $$ = lappend($1, $2); }
- | /* EMPTY */
- { $$ = NIL; }
- ;
-
-base_backup_opt:
- K_LABEL SCONST
- {
- $$ = makeDefElem("label",
- (Node *)makeString($2), -1);
- }
- | K_PROGRESS
+ /* START_BACKUP [LABEL '<label>'] [FAST] */
+ | K_START_BACKUP start_backup_opt_list
{
- $$ = makeDefElem("progress",
- (Node *)makeInteger(true), -1);
- }
- | K_FAST
- {
- $$ = makeDefElem("fast",
- (Node *)makeInteger(true), -1);
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = START_BACKUP;
+ $$ = (Node *) cmd;
}
- | K_WAL
+ /* STOP_BACKUP [NOWAIT] */
+ | K_STOP_BACKUP stop_backup_opt_list
{
- $$ = makeDefElem("wal",
- (Node *)makeInteger(true), -1);
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = STOP_BACKUP;
+ $$ = (Node *) cmd;
}
- | K_NOWAIT
+ /* LIST_TABLESPACES [PROGRESS] */
+ | K_LIST_TABLESPACES list_tablespace_opt_list
{
- $$ = makeDefElem("nowait",
- (Node *)makeInteger(true), -1);
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = LIST_TABLESPACES;
+ $$ = (Node *) cmd;
}
- | K_MAX_RATE UCONST
+ /* LIST_FILES [TABLESPACE] */
+ | K_LIST_FILES list_files_opt_list
{
- $$ = makeDefElem("max_rate",
- (Node *)makeInteger($2), -1);
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = LIST_FILES;
+ $$ = (Node *) cmd;
}
- | K_TABLESPACE_MAP
+ /* LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X'] */
+ | K_LIST_WAL_FILES list_wal_files_opt_list
{
- $$ = makeDefElem("tablespace_map",
- (Node *)makeInteger(true), -1);
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = LIST_WAL_FILES;
+ $$ = (Node *) cmd;
}
- | K_NOVERIFY_CHECKSUMS
+ /*
+ * SEND_FILES '(' 'FILE' [, ...] ')' [START_WAL_LOCATION 'X/X']
+ * [NOVERIFY_CHECKSUMS]
+ */
+ | K_SEND_FILES backup_files send_backup_files_opt_list
{
- $$ = makeDefElem("noverify_checksums",
- (Node *)makeInteger(true), -1);
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $3;
+ cmd->cmdtag = SEND_FILES;
+ cmd->backupfiles = $2;
+ $$ = (Node *) cmd;
}
;
+base_backup_opt_list:
+ base_backup_opt_list base_backup_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+base_backup_opt:
+ backup_opt_label { $$ = $1; }
+ | backup_opt_progress { $$ = $1; }
+ | backup_opt_fast { $$ = $1; }
+ | backup_opt_wal { $$ = $1; }
+ | backup_opt_nowait { $$ = $1; }
+ | backup_opt_maxrate { $$ = $1; }
+ | backup_opt_tsmap { $$ = $1; }
+ | backup_opt_chksum { $$ = $1; }
+ ;
+
+start_backup_opt_list:
+ start_backup_opt_list start_backup_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+start_backup_opt:
+ backup_opt_label { $$ = $1; }
+ | backup_opt_fast { $$ = $1; }
+ ;
+
+stop_backup_opt_list:
+ backup_opt_nowait
+ { $$ = list_make1($1); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+list_tablespace_opt_list:
+ backup_opt_progress
+ { $$ = list_make1($1); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+list_files_opt_list:
+ backup_opt_tablespace
+ { $$ = list_make1($1); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+list_wal_files_opt_list:
+ backup_opt_start_wal_loc backup_opt_end_wal_loc
+ { $$ = list_make2($1, $2); }
+ ;
+
+send_backup_files_opt_list:
+ send_backup_files_opt_list send_backup_files_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+backup_files:
+ '(' backup_files_list ')'
+ { $$ = $2; }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+backup_files_list:
+ SCONST
+ { $$ = list_make1(makeString($1)); }
+ | backup_files_list ',' SCONST
+ { $$ = lappend($1, makeString($3)); }
+ ;
+
+send_backup_files_opt:
+ backup_opt_chksum { $$ = $1; }
+ | backup_opt_start_wal_loc { $$ = $1; }
+ ;
+
+backup_opt_label:
+ K_LABEL SCONST
+ {
+ $$ = makeDefElem("label",
+ (Node *)makeString($2), -1);
+ };
+
+backup_opt_progress:
+ K_PROGRESS
+ {
+ $$ = makeDefElem("progress",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_fast:
+ K_FAST
+ {
+ $$ = makeDefElem("fast",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_wal:
+ K_WAL
+ {
+ $$ = makeDefElem("wal",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_nowait:
+ K_NOWAIT
+ {
+ $$ = makeDefElem("nowait",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_maxrate:
+ K_MAX_RATE UCONST
+ {
+ $$ = makeDefElem("max_rate",
+ (Node *)makeInteger($2), -1);
+ };
+
+backup_opt_tsmap:
+ K_TABLESPACE_MAP
+ {
+ $$ = makeDefElem("tablespace_map",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_chksum:
+ K_NOVERIFY_CHECKSUMS
+ {
+ $$ = makeDefElem("noverify_checksums",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_start_wal_loc:
+ K_START_WAL_LOCATION SCONST
+ {
+ $$ = makeDefElem("start_wal_location",
+ (Node *)makeString($2), -1);
+ };
+
+backup_opt_end_wal_loc:
+ K_END_WAL_LOCATION SCONST
+ {
+ $$ = makeDefElem("end_wal_location",
+ (Node *)makeString($2), -1);
+ };
+
+backup_opt_tablespace:
+ SCONST
+ {
+ $$ = makeDefElem("tablespace", //tblspcname?
+ (Node *)makeString($1), -1);
+ };
+
create_replication_slot:
/* CREATE_REPLICATION_SLOT slot TEMPORARY PHYSICAL RESERVE_WAL */
K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_PHYSICAL create_slot_opt_list
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 14c9a1e798..faa00cfd0e 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -107,6 +107,14 @@ EXPORT_SNAPSHOT { return K_EXPORT_SNAPSHOT; }
NOEXPORT_SNAPSHOT { return K_NOEXPORT_SNAPSHOT; }
USE_SNAPSHOT { return K_USE_SNAPSHOT; }
WAIT { return K_WAIT; }
+START_BACKUP { return K_START_BACKUP; }
+LIST_FILES { return K_LIST_FILES; }
+LIST_TABLESPACES { return K_LIST_TABLESPACES; }
+SEND_FILES { return K_SEND_FILES; }
+STOP_BACKUP { return K_STOP_BACKUP; }
+LIST_WAL_FILES { return K_LIST_WAL_FILES; }
+START_WAL_LOCATION { return K_START_WAL_LOCATION; }
+END_WAL_LOCATION { return K_END_WAL_LOCATION; }
"," { return ','; }
";" { return ';'; }
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index 5456141a8a..c046ea39ae 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -23,6 +23,16 @@ typedef enum ReplicationKind
REPLICATION_KIND_LOGICAL
} ReplicationKind;
+typedef enum BackupCmdTag
+{
+ BASE_BACKUP,
+ START_BACKUP,
+ LIST_TABLESPACES,
+ LIST_FILES,
+ LIST_WAL_FILES,
+ SEND_FILES,
+ STOP_BACKUP
+} BackupCmdTag;
/* ----------------------
* IDENTIFY_SYSTEM command
@@ -42,6 +52,8 @@ typedef struct BaseBackupCmd
{
NodeTag type;
List *options;
+ BackupCmdTag cmdtag;
+ List *backupfiles;
} BaseBackupCmd;
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index e0210def6f..3bc85d4c3e 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -31,6 +31,6 @@ typedef struct
extern void SendBaseBackup(BaseBackupCmd *cmd);
-extern int64 sendTablespace(char *path, bool dryrun);
+extern int64 sendTablespace(char *path, bool dryrun, List **filelist);
#endif /* _BASEBACKUP_H */
--
2.21.1 (Apple Git-122.3)
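To make the new grammar concrete, here is a rough, hypothetical libpq
session that a single backup worker might run against a walsender
connection using the commands added above. Error handling is trimmed, the
file path is made up, and connection parameters are assumed to come from
the usual PG* environment variables; a real worker would unpack the tar
stream rather than just count bytes:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* "replication=true" requests a walsender connection. */
	PGconn	   *conn = PQconnectdb("replication=true");
	PGresult   *res;
	char	   *buf;
	long		total = 0;
	int			n;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		return 1;
	}

	/* Enter backup mode; the single result row carries start LSN and TLI. */
	res = PQexec(conn, "START_BACKUP LABEL 'parallel demo' FAST");
	if (PQresultStatus(res) == PGRES_TUPLES_OK)
		printf("start LSN %s, timeline %s\n",
			   PQgetvalue(res, 0, 0), PQgetvalue(res, 0, 1));
	PQclear(res);

	/* List the files of the base tablespace (path, type, size, mtime). */
	res = PQexec(conn, "LIST_FILES");
	if (PQresultStatus(res) == PGRES_TUPLES_OK)
		printf("%d entries reported\n", PQntuples(res));
	PQclear(res);

	/* Fetch one file; its contents arrive as a COPY OUT stream. */
	res = PQexec(conn, "SEND_FILES ('./global/pg_control') NOVERIFY_CHECKSUMS");
	if (PQresultStatus(res) == PGRES_COPY_OUT)
	{
		PQclear(res);
		while ((n = PQgetCopyData(conn, &buf, 0)) > 0)
		{
			total += n;
			PQfreemem(buf);
		}
		res = PQgetResult(conn);	/* consume the CommandComplete */
	}
	PQclear(res);
	printf("received %ld bytes\n", total);

	/* Leave backup mode; the third column holds the backup_label text. */
	res = PQexec(conn, "STOP_BACKUP");
	PQclear(res);
	PQfinish(conn);
	return 0;
}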
Hi Asif,
On Thu, Jan 30, 2020 at 7:10 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
Here are the updated patches, taking care of the issues pointed out
earlier. This patch adds the following commands (with the specified
options):

START_BACKUP [LABEL '<label>'] [FAST]
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILES '(' FILE, FILE... ')' [START_WAL_LOCATION 'X/X']
[NOVERIFY_CHECKSUMS]

Parallel backup is not making any use of the tablespace map, so I have
removed that option from the above commands. There is a patch pending to
remove the exclusive backup; we can further refactor the
do_pg_start_backup function at that time, to remove the tablespace
information and move the creation of the tablespace_map file to the
client.

I have disabled the maxrate option for parallel backup. I intend to send
out a separate patch for it. Robert previously suggested implementing
throttling on the client side. I found the original email thread [1]
where throttling was proposed and added to the server. In that thread, it
was originally implemented on the client side, but per many suggestions,
it was moved to the server side.

So, I have a few suggestions on how we can implement this:

1- Have another option for pg_basebackup (i.e. per-worker-maxrate) where
the user could choose the bandwidth allocation for each worker. This
approach can be implemented on the client side as well as on the server
side.

2- Have the maxrate be divided among workers equally at first, and then
let the main thread keep adjusting it whenever one of the workers
finishes. I believe this would only be possible if we handle throttling
on the client. Also, as I understand it, implementing this will introduce
an additional mutex for handling of bandwidth-consumption data, so that
the rate may be adjusted according to the data received by the threads.
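A rough sketch of what the shared limiter in option 2 might look like on
the client (all names are made up; nothing here is from the attached
patches): a token bucket guarded by a mutex, refilled from the wall clock,
from which each worker takes credit before reading a chunk. The aggregate
rate then tracks --max-rate without a static per-worker split:

#include <pthread.h>
#include <stddef.h>
#include <time.h>

typedef struct
{
	pthread_mutex_t lock;
	double		bytes_per_sec;	/* --max-rate, shared by all workers */
	double		available;		/* current bucket level in bytes */
	struct timespec last_refill;
} RateLimiter;

static void
rate_limiter_init(RateLimiter *rl, double bytes_per_sec)
{
	pthread_mutex_init(&rl->lock, NULL);
	rl->bytes_per_sec = bytes_per_sec;
	rl->available = 0.0;
	clock_gettime(CLOCK_MONOTONIC, &rl->last_refill);
}

static double
elapsed_sec(const struct timespec *a, const struct timespec *b)
{
	return (b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) / 1e9;
}

/* Block until 'want' bytes of bandwidth credit are available. */
static void
throttle_wait(RateLimiter *rl, size_t want)
{
	for (;;)
	{
		struct timespec now;

		clock_gettime(CLOCK_MONOTONIC, &now);
		pthread_mutex_lock(&rl->lock);
		rl->available += rl->bytes_per_sec * elapsed_sec(&rl->last_refill, &now);
		if (rl->available > rl->bytes_per_sec)	/* cap burst at ~1s worth */
			rl->available = rl->bytes_per_sec;
		rl->last_refill = now;
		if (rl->available >= (double) want)
		{
			rl->available -= (double) want;
			pthread_mutex_unlock(&rl->lock);
			return;
		}
		pthread_mutex_unlock(&rl->lock);

		/* Sleep briefly and retry; a timed condvar wait would also work. */
		struct timespec nap = {0, 10 * 1000 * 1000};	/* 10 ms */
		nanosleep(&nap, NULL);
	}
}

Each worker would call throttle_wait(&limiter, len) before every chunk it
reads from its connection.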
[1] /messages/by-id/521B4B29.20009@2ndquadrant.com

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
The latest changes look good to me. However, the patch set is missing the
documentation.
Please add those.
Thanks
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Thanks Jeevan. Here is the documentation patch.
On Mon, Feb 10, 2020 at 6:49 PM Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:
Hi Asif,
On Thu, Jan 30, 2020 at 7:10 PM Asif Rehman <asifr.rehman@gmail.com>
wrote:Here are the the updated patches, taking care of the issues pointed
earlier. This patch adds the following commands (with specified option):START_BACKUP [LABEL '<label>'] [FAST]
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILES '(' FILE, FILE... ')' [START_WAL_LOCATION 'X/X']
[NOVERIFY_CHECKSUMS]Parallel backup is not making any use of tablespace map, so I have
removed that option from the above commands. There is a patch pending
to remove the exclusive backup; we can further refactor the
do_pg_start_backup
function at that time, to remove the tablespace information and move the
creation of tablespace_map file to the client.I have disabled the maxrate option for parallel backup. I intend to send
out a separate patch for it. Robert previously suggested to implement
throttling on the client-side. I found the original email thread [1]
where throttling was proposed and added to the server. In that thread,
it was originally implemented on the client-side, but per many
suggestions,
it was moved to server-side.So, I have a few suggestions on how we can implement this:
1- have another option for pg_basebackup (i.e. per-worker-maxrate) where
the user could choose the bandwidth allocation for each worker. This
approach
can be implemented on the client-side as well as on the server-side.2- have the maxrate, be divided among workers equally at first. and the
let the main thread keep adjusting it whenever one of the workers
finishes.
I believe this would only be possible if we handle throttling on the
client.
Also, as I understand it, implementing this will introduce additional
mutex
for handling of bandwidth consumption data so that rate may be adjusted
according to data received by threads.[1]
/messages/by-id/521B4B29.20009@2ndquadrant.com--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.caThe latest changes look good to me. However, the patch set is missing the
documentation.
Please add those.Thanks
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Attachments:
0006-parallel-backup-documentation.patchapplication/octet-stream; name=0006-parallel-backup-documentation.patchDownload
From 7fd87b7dcb9c626fa6abb3d526de284a93f232c5 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Fri, 14 Feb 2020 17:02:51 +0500
Subject: [PATCH 6/6] parallel backup - documentation
---
doc/src/sgml/protocol.sgml | 366 ++++++++++++++++++++++++++++
doc/src/sgml/ref/pg_basebackup.sgml | 19 ++
2 files changed, 385 insertions(+)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 80275215e0..e332d1ac45 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2700,6 +2700,372 @@ The commands accepted in replication mode are:
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>START_BACKUP</literal>
+ [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ]
+ [ <literal>FAST</literal> ]
+ <indexterm><primary>START_BACKUP</primary></indexterm>
+ </term>
+ <listitem>
+ <para>
+ Instructs the server to prepare for performing an on-line backup. The
+ following options are accepted:
+ <variablelist>
+ <varlistentry>
+ <term><literal>LABEL</literal> <replaceable>'label'</replaceable></term>
+ <listitem>
+ <para>
+ Sets the label of the backup. If none is specified, a backup label
+ of <literal>start backup</literal> will be used. The quoting rules
+ for the label are the same as a standard SQL string with
+ <xref linkend="guc-standard-conforming-strings"/> turned on.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>FAST</literal></term>
+ <listitem>
+ <para>
+ Request a fast checkpoint.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ <para>
+ In response to this command, the server will send out a single result set.
+ The first column contains the start position, given in XLogRecPtr format,
+ and the second column contains the corresponding timeline ID.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>STOP_BACKUP</literal>
+ [ <literal>NOWAIT</literal> ]
+ <indexterm><primary>STOP_BACKUP</primary></indexterm>
+ </term>
+ <listitem>
+ <para>
+ Instructs the server to finish performing the on-line backup.
+ <variablelist>
+ <varlistentry>
+ <term><literal>NOWAIT</literal></term>
+ <listitem>
+ <para>
+ By default, the backup will wait until the last required WAL
+ segment has been archived, or emit a warning if log archiving is
+ not enabled. Specifying <literal>NOWAIT</literal> disables both
+ the waiting and the warning, leaving the client responsible for
+ ensuring the required log is available.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ <para>
+ In response to this command, the server will send out a single result set.
+ The first column contains the end position, given in XLogRecPtr format, the
+ second column contains the corresponding timeline ID, and the third column
+ contains the contents of the <filename>backup_label</filename> file.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>LIST_TABLESPACES</literal>
+ [ <literal>PROGRESS</literal> ]
+ <indexterm><primary>LIST_TABLESPACES</primary></indexterm>
+ </term>
+ <listitem>
+ <para>
+ Instructs the server to return a list of the tablespaces available in the
+ data directory.
+ <variablelist>
+ <varlistentry>
+ <term><literal>PROGRESS</literal></term>
+ <listitem>
+ <para>
+ Request information required to generate a progress report. This will
+ send back an approximate size in the header of each tablespace, which
+ can be used to calculate how far along the stream is done. This is
+ calculated by enumerating all the file sizes once before the transfer
+ is even started, and might as such have a negative impact on the
+ performance. In particular, it might take longer before the first data
+ is streamed. Since the database files can change during the backup,
+ the size is only approximate and might both grow and shrink between
+ the time of approximation and the sending of the actual files.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ <para>
+ In response to this command, the server will send one result set.
+ The result set will have one row for each tablespace. The fields in this
+ row are:
+ <variablelist>
+ <varlistentry>
+ <term><literal>spcoid</literal> (<type>oid</type>)</term>
+ <listitem>
+ <para>
+ The OID of the tablespace, or null if it's the base directory.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>spclocation</literal> (<type>text</type>)</term>
+ <listitem>
+ <para>
+ The full path of the tablespace directory, or null if it's the base
+ directory.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>size</literal> (<type>int8</type>)</term>
+ <listitem>
+ <para>
+ The approximate size of the tablespace, in kilobytes (1024 bytes),
+ if progress report has been requested; otherwise it's null.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>LIST_FILES</literal>
+ [ <literal>TABLESPACE</literal> ]
+ <indexterm><primary>LIST_FILES</primary></indexterm>
+ </term>
+ <listitem>
+ <para>
+ This command instructs the server to return a list of files available
+ in the given tablespace.
+ <variablelist>
+ <varlistentry>
+ <term><literal>TABLESPACE</literal></term>
+ <listitem>
+ <para>
+ Name of the tablespace. If it is empty or not provided, the 'base'
+ tablespace is assumed.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ <para>
+ In response to this command, the server will send out a result set. The
+ fields in this result set are:
+ <variablelist>
+ <varlistentry>
+ <term><literal>path</literal> (<type>text</type>)</term>
+ <listitem>
+ <para>
+ The path and name of the file. For a user-defined tablespace, it is an
+ absolute path on the database server; for the <filename>base</filename>
+ tablespace, it is relative to $PGDATA.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>type</literal> (<type>char</type>)</term>
+ <listitem>
+ <para>
+ A single character, identifying the type of file.
+ <itemizedlist spacing="compact" mark="bullet">
+ <listitem>
+ <para>
+ <literal>'f'</literal> - Regular file. Can be any relation or
+ non-relation file in $PGDATA.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>'d'</literal> - Directory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>'l'</literal> - Symbolic link.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>size</literal> (<type>int8</type>)</term>
+ <listitem>
+ <para>
+ The approximate size of the file, in kilobytes (1024 bytes). It's null
+ if type is 'd' or 'l'.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>mtime</literal> (<type>Int64</type>)</term>
+ <listitem>
+ <para>
+ The file or directory last modification time, as seconds since the Epoch.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ <para>
+ The list will contain all files in the tablespace directory, regardless of whether
+ they are PostgreSQL files or other files added to the same directory. The only
+ excluded files are:
+ <itemizedlist spacing="compact" mark="bullet">
+ <listitem>
+ <para>
+ <filename>postmaster.pid</filename>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>postmaster.opts</filename>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>pg_internal.init</filename> (found in multiple directories)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Various temporary files and directories created during the operation
+ of the PostgreSQL server, such as any file or directory beginning
+ with <filename>pgsql_tmp</filename> and temporary relations.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Unlogged relations, except for the init fork which is required to
+ recreate the (empty) unlogged relation on recovery.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>pg_wal</filename>, including subdirectories. If the backup is run
+ with WAL files included, a synthesized version of <filename>pg_wal</filename>
+ will be included, but it will only contain the files necessary for the backup
+ to work, not the rest of the contents.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>pg_dynshmem</filename>, <filename>pg_notify</filename>,
+ <filename>pg_replslot</filename>, <filename>pg_serial</filename>,
+ <filename>pg_snapshots</filename>, <filename>pg_stat_tmp</filename>, and
+ <filename>pg_subtrans</filename> are copied as empty directories (even if
+ they are symbolic links).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Files other than regular files and directories, such as symbolic
+ links (other than for the directories listed above) and special
+ device files, are skipped. (Symbolic links
+ in <filename>pg_tblspc</filename> are maintained.)
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>LIST_WAL_FILES</literal>
+ <literal>START_WAL_LOCATION</literal> <replaceable class="parameter">X/X</replaceable>
+ <literal>END_WAL_LOCATION</literal> <replaceable class="parameter">X/X</replaceable>
+ <indexterm><primary>LIST_WAL_FILES</primary></indexterm>
+ </term>
+ <listitem>
+ <para>
+ Instructs the server to return a list of the WAL files available in the pg_wal directory.
+ The following options are accepted:
+ <variablelist>
+ <varlistentry>
+ <term><literal>START_WAL_LOCATION</literal></term>
+ <listitem>
+ <para>
+ The starting WAL position, in XLogRecPtr format, as returned when the
+ START_BACKUP command was issued.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>END_WAL_LOCATION</literal></term>
+ <listitem>
+ <para>
+ The ending WAL position, in XLogRecPtr format, as returned when the
+ STOP_BACKUP command was issued.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ <para>
+ In response to this command, the server will send out a result set in which
+ each row represents a WAL file entry. The result set has the same fields as
+ the <literal>LIST_FILES</literal> command.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>SEND_FILES ( <replaceable class="parameter">'FILE'</replaceable> [, ...] )</literal>
+ [ <literal>START_WAL_LOCATION</literal> <replaceable class="parameter">X/X</replaceable> ]
+ [ <literal>NOVERIFY_CHECKSUMS</literal> ]
+ <indexterm><primary>SEND_FILES</primary></indexterm>
+ </term>
+ <listitem>
+ <para>
+ Instructs the server to send the contents of the requested FILE(s).
+ </para>
+ <para>
+ A clause of the form <literal>SEND_FILES ( 'FILE', 'FILE', ... ) [OPTIONS]</literal>
+ is accepted where one or more FILE(s) can be requested.
+ </para>
+ <para>
+ In response to this command, one or more CopyResponse results will be sent,
+ one for each FILE requested. The data in the CopyResponse results will be
+ a tar format (following the “ustar interchange format” specified in the
+ POSIX 1003.1-2008 standard) dump of the tablespace contents, except that
+ the two trailing blocks of zeroes specified in the standard are omitted.
+ </para>
+ <para>
+ The following options are accepted:
+ <variablelist>
+ <varlistentry>
+ <term><literal>START_WAL_LOCATION</literal></term>
+ <listitem>
+ <para>
+ The starting WAL position, in XLogRecPtr format, as returned when the
+ START_BACKUP command was issued.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>NOVERIFY_CHECKSUMS</literal></term>
+ <listitem>
+ <para>
+ By default, checksums are verified during a base backup if they are
+ enabled. Specifying <literal>NOVERIFY_CHECKSUMS</literal> disables
+ this verification.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index fc9e222f8d..3b1d9c9ba6 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -536,6 +536,25 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><option>-j <replaceable class="parameter">n</replaceable></option></term>
+ <term><option>--jobs=<replaceable class="parameter">n</replaceable></option></term>
+ <listitem>
+ <para>
+ Create <replaceable class="parameter">n</replaceable> threads to copy
+ backup files from the database server. <application>pg_basebackup</application>
+ will open <replaceable class="parameter">n</replaceable> + 1 connections
+ to the database. Therefore, the server must be configured with
+ <xref linkend="guc-max-wal-senders"/> set high enough to accommodate all
+ connections.
+ </para>
+ <para>
+ Parallel mode only works with the plain format.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
--
2.21.1 (Apple Git-122.3)
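As a rough illustration of how a client drives these commands, here is a
minimal libpq sketch (not part of the patch set; error handling is reduced
to the bare minimum). It follows the same PQsendQuery/PQgetResult pattern
that pg_basebackup itself uses:

#include <stdio.h>
#include "libpq-fe.h"

int
main(void)
{
    /* Replication commands require a walsender connection. */
    PGconn     *conn = PQconnectdb("dbname=postgres replication=true");
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Enter backup mode; the first result set carries start LSN and TLI. */
    if (PQsendQuery(conn, "START_BACKUP LABEL 'example' FAST") == 0)
    {
        fprintf(stderr, "START_BACKUP failed: %s", PQerrorMessage(conn));
        return 1;
    }
    res = PQgetResult(conn);
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("start LSN: %s, timeline: %s\n",
               PQgetvalue(res, 0, 0), PQgetvalue(res, 0, 1));
    PQclear(res);

    /* Drain the CommandComplete that follows the result set. */
    while ((res = PQgetResult(conn)) != NULL)
        PQclear(res);

    /*
     * A real client would now issue LIST_TABLESPACES, one LIST_FILES per
     * tablespace, hand the combined file list to worker connections that
     * run SEND_FILES ('file', ...) and consume the CopyResponse streams,
     * and finally issue STOP_BACKUP [NOWAIT] on this connection to obtain
     * the end position and the backup_label contents.
     */

    PQfinish(conn);
    return 0;
}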
Hi,
I have created a commitfest entry.
https://commitfest.postgresql.org/27/2472/
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Hi Asif,
I have started testing this feature. I have applied the v6 patch on commit
a069218163704c44a8996e7e98e765c56e2b9c8e (30 Jan).
I got a few observations; please take a look.
*-- if the backup failed, the backup directory is not getting removed.*
[edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=9 -D /tmp/test_bkp/bkp6
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
[edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=8 -D /tmp/test_bkp/bkp6
pg_basebackup: error: directory "/tmp/test_bkp/bkp6" exists but is not empty
*-- giving a large number of jobs leads to a segmentation fault.*
./pg_basebackup -p 5432 --jobs=1000 -D /tmp/t3
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
.
.
.
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: could not fork new
process for connection: Resource temporarily unavailable
could not fork new process for connection: Resource temporarily unavailable
pg_basebackup: error: failed to create thread: Resource temporarily
unavailable
Segmentation fault (core dumped)
--stack-trace
gdb -q -c core.11824 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 --jobs=1000 -D
/tmp/test_bkp/bkp10'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=140503120623360, thread_return=0x0) at
pthread_join.c:46
46 if (INVALID_NOT_TERMINATED_TD_P (pd))
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 pthread_join (threadid=140503120623360, thread_return=0x0) at
pthread_join.c:46
#1 0x0000000000408e21 in cleanup_workers () at pg_basebackup.c:2840
#2 0x0000000000403846 in disconnect_atexit () at pg_basebackup.c:316
#3 0x0000003921235a02 in __run_exit_handlers (status=1) at exit.c:78
#4 exit (status=1) at exit.c:100
#5 0x0000000000408aa6 in create_parallel_workers (backupinfo=0x1a4b8c0) at
pg_basebackup.c:2713
#6 0x0000000000407946 in BaseBackup () at pg_basebackup.c:2127
#7 0x000000000040895c in main (argc=6, argv=0x7ffd566f4718) at
pg_basebackup.c:2668
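The trace suggests that exit() is reached from create_parallel_workers()
before all threads exist, after which the atexit path calls pthread_join()
on pthread_t slots that were never initialized. A minimal sketch of one
possible fix, assuming a hypothetical startedworkers counter that is only
incremented after pthread_create() succeeds (workers is the array from the
patch):

static int  startedworkers = 0;

static void
cleanup_workers_safe(void)
{
    /* Join only the threads that were actually started. */
    for (int i = 0; i < startedworkers; i++)
    {
        pthread_join(workers[i].worker, NULL);
        PQfinish(workers[i].conn);
    }
}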
*-- with a tablespace in the same directory as the data directory,
parallel backup crashed*
[edb@localhost bin]$ ./initdb -D /tmp/data
[edb@localhost bin]$ ./pg_ctl -D /tmp/data -l /tmp/logfile start
[edb@localhost bin]$ mkdir /tmp/ts
[edb@localhost bin]$ ./psql postgres
psql (13devel)
Type "help" for help.
postgres=# create tablespace ts location '/tmp/ts';
CREATE TABLESPACE
postgres=# create table tx (a int) tablespace ts;
CREATE TABLE
postgres=# \q
[edb@localhost bin]$ ./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1
Segmentation fault (core dumped)
--stack-trace
[edb@localhost bin]$ gdb -q -c core.15778 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -j 2 -D /tmp/tts -T
/tmp/ts=/tmp/ts1'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000409442 in get_backup_filelist (conn=0x140cb20,
backupInfo=0x14210a0) at pg_basebackup.c:3000
3000 backupInfo->curr->next = file;
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x0000000000409442 in get_backup_filelist (conn=0x140cb20,
backupInfo=0x14210a0) at pg_basebackup.c:3000
#1 0x0000000000408b56 in parallel_backup_run (backupinfo=0x14210a0) at
pg_basebackup.c:2739
#2 0x0000000000407955 in BaseBackup () at pg_basebackup.c:2128
#3 0x000000000040895c in main (argc=7, argv=0x7ffca2910c58) at
pg_basebackup.c:2668
(gdb)
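Here the crash is in the file-list append at pg_basebackup.c:3000, which
dereferences backupInfo->curr while it can still be NULL. A sketch of a
NULL-safe append over the patch's BackupFile list (append_backup_file is a
hypothetical helper, not in the patch):

static void
append_backup_file(BackupInfo *info, BackupFile *file)
{
    file->next = NULL;
    if (info->files == NULL)
        info->files = file;         /* first entry becomes the head */
    else
        info->curr->next = file;    /* link after the current tail */
    info->curr = file;              /* remember the new tail */
    info->totalfiles++;
}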
Thanks & Regards,
Rajkumar Raghuwanshi
Thanks Rajkumar. I have fixed the above issues and have rebased the patch
to the latest master (b7f64c64).
(V9 of the patches are attached).
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Attachments:
0001-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb_v9.patchapplication/octet-stream; name=0001-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb_v9.patchDownload
From b5b8694e2a61b084508ced9a49d462160d692b58 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Wed, 30 Oct 2019 16:45:28 +0500
Subject: [PATCH 1/6] Rename sizeonly to dryrun for a few functions in
basebackup.
---
src/backend/replication/basebackup.c | 44 ++++++++++++++--------------
src/include/replication/basebackup.h | 2 +-
2 files changed, 23 insertions(+), 23 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 806d013108d..ca074d59ac9 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -55,15 +55,15 @@ typedef struct
} basebackup_options;
-static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
+static int64 sendDir(const char *path, int basepathlen, bool dryrun,
List *tablespaces, bool sendtblspclinks);
static bool sendFile(const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
static int64 _tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly);
+ struct stat *statbuf, bool dryrun);
static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly);
+ bool dryrun);
static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
@@ -1021,13 +1021,13 @@ sendFileWithContent(const char *filename, const char *content)
/*
* Include the tablespace directory pointed to by 'path' in the output tar
- * stream. If 'sizeonly' is true, we just calculate a total length and return
+ * stream. If 'dryrun' is true, we just calculate a total length and return
* it, without actually sending anything.
*
* Only used to send auxiliary tablespaces, not PGDATA.
*/
int64
-sendTablespace(char *path, bool sizeonly)
+sendTablespace(char *path, bool dryrun)
{
int64 size;
char pathbuf[MAXPGPATH];
@@ -1057,17 +1057,17 @@ sendTablespace(char *path, bool sizeonly)
}
size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
- sizeonly);
+ dryrun);
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true);
+ size += sendDir(pathbuf, strlen(path), dryrun, NIL, true);
return size;
}
/*
* Include all files from the given directory in the output tar stream. If
- * 'sizeonly' is true, we just calculate a total length and return it, without
+ * 'dryrun' is true, we just calculate a total length and return it, without
* actually sending anything.
*
* Omit any directory in the tablespaces list, to avoid backing up
@@ -1078,7 +1078,7 @@ sendTablespace(char *path, bool sizeonly)
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
+sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
bool sendtblspclinks)
{
DIR *dir;
@@ -1237,7 +1237,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
excludeFound = true;
break;
}
@@ -1253,7 +1253,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
continue;
}
@@ -1265,14 +1265,14 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(pathbuf, "./pg_wal") == 0)
{
/* If pg_wal is a symlink, write it as a directory anyway */
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
/*
* Also send archive_status directory (by hackishly reusing
* statbuf from above ...).
*/
size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
- sizeonly);
+ dryrun);
continue; /* don't recurse into pg_wal */
}
@@ -1304,7 +1304,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
linkpath[rllen] = '\0';
size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
- &statbuf, sizeonly);
+ &statbuf, dryrun);
#else
/*
@@ -1328,7 +1328,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
* permissions right.
*/
size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ dryrun);
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1359,17 +1359,17 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces, sendtblspclinks);
+ size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks);
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
- if (!sizeonly)
+ if (!dryrun)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? atooid(lastDir + 1) : InvalidOid);
- if (sent || sizeonly)
+ if (sent || dryrun)
{
/* Add size, rounded up to 512byte block */
size += ((statbuf.st_size + 511) & ~511);
@@ -1688,12 +1688,12 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
static int64
_tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly)
+ struct stat *statbuf, bool dryrun)
{
char h[512];
enum tarError rc;
- if (!sizeonly)
+ if (!dryrun)
{
rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size,
statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
@@ -1731,7 +1731,7 @@ _tarWriteHeader(const char *filename, const char *linktarget,
*/
static int64
_tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly)
+ bool dryrun)
{
/* If symlink, write it as a directory anyway */
#ifndef WIN32
@@ -1741,7 +1741,7 @@ _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
#endif
statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
- return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, sizeonly);
+ return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, dryrun);
}
/*
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index 07ed281bd63..e0210def6f3 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -31,6 +31,6 @@ typedef struct
extern void SendBaseBackup(BaseBackupCmd *cmd);
-extern int64 sendTablespace(char *path, bool sizeonly);
+extern int64 sendTablespace(char *path, bool dryrun);
#endif /* _BASEBACKUP_H */
--
2.21.1 (Apple Git-122.3)
0004-Parallel-Backup-pg_basebackup_v9.patchapplication/octet-stream; name=0004-Parallel-Backup-pg_basebackup_v9.patchDownload
From 945cd4b33f3b98bddf849fcca3c2a091248f0142 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Mon, 27 Jan 2020 18:56:21 +0500
Subject: [PATCH 4/6] Parallel Backup - pg_basebackup
Implements the replication commands added in the backend replication
system and adds support for --jobs=NUM in pg_basebackup to take a full
backup in parallel using multiple connections. The utility will collect
a list of files from the server first and then workers will copy files
(one by one) over the COPY protocol. The WAL files are also copied in a
similar manner.
---
src/bin/pg_basebackup/pg_basebackup.c | 1080 +++++++++++++++++++++++--
1 file changed, 1015 insertions(+), 65 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 48bd838803b..7e392889809 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -13,6 +13,7 @@
#include "postgres_fe.h"
+#include <pthread.h>
#include <unistd.h>
#include <dirent.h>
#include <sys/stat.h>
@@ -85,12 +86,65 @@ typedef struct UnpackTarState
const char *mapped_tblspc_path;
pgoff_t current_len_left;
int current_padding;
+ size_t current_bytes_read;
FILE *file;
} UnpackTarState;
typedef void (*WriteDataCallback) (size_t nbytes, char *buf,
void *callback_data);
+typedef struct BackupFile
+{
+ char path[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+
+ int tsindex; /* index of tsInfo this file belongs to. */
+ struct BackupFile *next;
+} BackupFile;
+
+typedef enum BackupState
+{
+ PB_FETCH_REL_LIST,
+ PB_FETCH_REL_FILES,
+ PB_FETCH_WAL_LIST,
+ PB_FETCH_WAL_FILES,
+ PB_STOP_BACKUP,
+ PB_BACKUP_COMPLETE
+} BackupState;
+
+typedef struct BackupInfo
+{
+ int totalfiles;
+ uint64 bytes_skipped;
+ char xlogstart[64];
+ char xlogend[64];
+ BackupFile *files; /* list of BackupFile pointers */
+ BackupFile *curr; /* pointer to the file in the list */
+ BackupState backupstate;
+ bool workersdone;
+ int activeworkers;
+} BackupInfo;
+
+typedef struct WorkerState
+{
+ pthread_t worker;
+ int workerid;
+ BackupInfo *backupinfo;
+ PGconn *conn;
+ uint64 bytesread;
+} WorkerState;
+
+BackupInfo *backupinfo = NULL;
+WorkerState *workers = NULL;
+
+/* lock to be used for fetching a file from the files list. */
+static pthread_mutex_t fetch_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+/* condition to be used when the files list is filled. */
+static pthread_cond_t data_ready = PTHREAD_COND_INITIALIZER;
+
/*
* pg_xlog has been renamed to pg_wal in version 10. This version number
* should be compared with PQserverVersion().
@@ -144,6 +198,9 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
+static int numWorkers = 1;
+static PGresult *tablespacehdr;
+
/* Progress counters */
static uint64 totalsize_kb;
static uint64 totaldone;
@@ -174,10 +231,12 @@ static PQExpBuffer recoveryconfcontents = NULL;
static void usage(void);
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
static void progress_report(int tablespacenum, const char *filename, bool force);
+static void workers_progress_report(uint64 totalBytesRead,
+ const char *filename, bool force);
static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
-static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
+static int ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
static void ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf,
void *callback_data);
static void BaseBackup(void);
@@ -188,6 +247,22 @@ static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
static const char *get_tablespace_mapping(const char *dir);
static void tablespace_list_append(const char *arg);
+static void *worker_run(void *arg);
+static void create_parallel_workers(BackupInfo *backupInfo);
+static void parallel_backup_run(BackupInfo *backupInfo);
+static void cleanup_workers(void);
+static void stop_backup(void);
+static void get_backup_filelist(PGconn *conn, BackupInfo *backupInfo);
+static void get_wal_filelist(PGconn *conn, BackupInfo *backupInfo,
+ char *xlogstart, char *xlogend);
+static void free_filelist(BackupInfo *backupInfo);
+static int worker_get_files(WorkerState *wstate);
+static int receive_file(PGconn *conn, char *file, int tsIndex);
+static void create_backup_dirs(bool basetablespace, char *tablespace,
+ char *name);
+static void create_tblspc_symlink(char *filename);
+static void writefile(char *path, char *buf);
+static int fetch_max_wal_senders(PGconn *conn);
static void
cleanup_directories_atexit(void)
@@ -239,6 +314,8 @@ cleanup_directories_atexit(void)
static void
disconnect_atexit(void)
{
+ cleanup_workers();
+
if (conn != NULL)
PQfinish(conn);
}
@@ -386,6 +463,7 @@ usage(void)
printf(_(" --no-slot prevent creation of temporary replication slot\n"));
printf(_(" --no-verify-checksums\n"
" do not verify checksums\n"));
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to backup\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nConnection options:\n"));
printf(_(" -d, --dbname=CONNSTR connection string\n"));
@@ -733,6 +811,94 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
}
}
+/*
+ * Print a progress report of worker threads. If verbose output
+ * is enabled, also print the current file name.
+ *
+ * Progress report is written at maximum once per second, unless the
+ * force parameter is set to true.
+ */
+static void
+workers_progress_report(uint64 totalBytesRead, const char *filename, bool force)
+{
+ int percent;
+ char totalBytesRead_str[32];
+ char totalsize_str[32];
+ pg_time_t now;
+
+ if (!showprogress)
+ return;
+
+ now = time(NULL);
+ if (now == last_progress_report && !force)
+ return; /* Max once per second */
+
+ last_progress_report = now;
+ percent = totalsize_kb ? (int) ((totalBytesRead / 1024) * 100 / totalsize_kb) : 0;
+
+ /*
+ * Avoid overflowing past 100% or the full size. This may make the total
+ * size number change as we approach the end of the backup (the estimate
+ * will always be wrong if WAL is included), but that's better than having
+ * the done column be bigger than the total.
+ */
+ if (percent > 100)
+ percent = 100;
+ if (totalBytesRead / 1024 > totalsize_kb)
+ totalsize_kb = totalBytesRead / 1024;
+
+ /*
+ * Separate step to keep platform-dependent format code out of
+ * translatable strings. And we only test for INT64_FORMAT availability
+ * in snprintf, not fprintf.
+ */
+ snprintf(totalBytesRead_str, sizeof(totalBytesRead_str), INT64_FORMAT,
+ totalBytesRead / 1024);
+ snprintf(totalsize_str, sizeof(totalsize_str), INT64_FORMAT, totalsize_kb);
+
+#define VERBOSE_FILENAME_LENGTH 35
+
+ if (verbose)
+ {
+ if (!filename)
+
+ /*
+ * No filename given, so clear the status line (used for last
+ * call)
+ */
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied %*s"),
+ (int) strlen(totalsize_str),
+ totalBytesRead_str, totalsize_str,
+ percent,
+ VERBOSE_FILENAME_LENGTH + 5, "");
+ else
+ {
+ bool truncate = (strlen(filename) > VERBOSE_FILENAME_LENGTH);
+
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied, current file (%s%-*.*s)"),
+ (int) strlen(totalsize_str), totalBytesRead_str, totalsize_str,
+ percent,
+ /* Prefix with "..." if we do leading truncation */
+ truncate ? "..." : "",
+ truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
+ truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
+ /* Truncate filename at beginning if it's too long */
+ truncate ? filename + strlen(filename) - VERBOSE_FILENAME_LENGTH + 3 : filename);
+ }
+ }
+ else
+ {
+ fprintf(stderr, _("%*s/%s kB (%d%%) copied"),
+ (int) strlen(totalsize_str),
+ totalBytesRead_str, totalsize_str,
+ percent);
+ }
+
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\r");
+ else
+ fprintf(stderr, "\n");
+}
/*
* Print a progress report based on the global variables. If verbose output
@@ -749,7 +915,7 @@ progress_report(int tablespacenum, const char *filename, bool force)
char totalsize_str[32];
pg_time_t now;
- if (!showprogress)
+ if (!showprogress || numWorkers > 1)
return;
now = time(NULL);
@@ -1439,7 +1605,7 @@ get_tablespace_mapping(const char *dir)
* specified directory. If it's for another tablespace, it will be restored
* in the original or mapped directory.
*/
-static void
+static int
ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
{
UnpackTarState state;
@@ -1470,13 +1636,12 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
exit(1);
}
- if (basetablespace && writerecoveryconf)
- WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
-
/*
* No data is synced here, everything is done for all tablespaces at the
* end.
*/
+
+ return state.current_bytes_read;
}
static void
@@ -1499,6 +1664,7 @@ ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
exit(1);
}
totaldone += 512;
+ state->current_bytes_read += 512;
state->current_len_left = read_tar_number(&copybuf[124], 12);
@@ -1630,6 +1796,7 @@ ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
fclose(state->file);
state->file = NULL;
totaldone += r;
+ state->current_bytes_read += r;
return;
}
@@ -1639,6 +1806,7 @@ ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
exit(1);
}
totaldone += r;
+ state->current_bytes_read += r;
progress_report(state->tablespacenum, state->filename, false);
state->current_len_left -= r;
@@ -1706,6 +1874,24 @@ BaseBackup(void)
exit(1);
}
+ if (numWorkers > 1)
+ {
+ int max_wal_senders = fetch_max_wal_senders(conn);
+
+ /*
+ * In parallel backup mode, pg_basebackup opens numWorkers + 2
+ * connections. One of the two additional connections is used by the
+ * main application while the other one is used if WAL streaming is
+ * enabled (-X Stream).
+ */
+ if (numWorkers + 2 > max_wal_senders)
+ {
+ pg_log_error("number of requested workers exceeds max_wal_senders (currently %d)",
+ max_wal_senders);
+ exit(1);
+ }
+ }
+
/*
* Build contents of configuration file if requested
*/
@@ -1738,16 +1924,26 @@ BaseBackup(void)
fprintf(stderr, "\n");
}
- basebkp =
- psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
- escaped_label,
- showprogress ? "PROGRESS" : "",
- includewal == FETCH_WAL ? "WAL" : "",
- fastcheckpoint ? "FAST" : "",
- includewal == NO_WAL ? "" : "NOWAIT",
- maxrate_clause ? maxrate_clause : "",
- format == 't' ? "TABLESPACE_MAP" : "",
- verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ if (numWorkers <= 1)
+ {
+ basebkp =
+ psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
+ escaped_label,
+ showprogress ? "PROGRESS" : "",
+ includewal == FETCH_WAL ? "WAL" : "",
+ fastcheckpoint ? "FAST" : "",
+ includewal == NO_WAL ? "" : "NOWAIT",
+ maxrate_clause ? maxrate_clause : "",
+ format == 't' ? "TABLESPACE_MAP" : "",
+ verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ }
+ else
+ {
+ basebkp =
+ psprintf("START_BACKUP LABEL '%s' %s",
+ escaped_label,
+ fastcheckpoint ? "FAST" : "");
+ }
if (PQsendQuery(conn, basebkp) == 0)
{
@@ -1794,10 +1990,36 @@ BaseBackup(void)
pg_log_info("write-ahead log start point: %s on timeline %u",
xlogstart, starttli);
+ if (numWorkers > 1)
+ {
+ /*
+ * Finish up the START_BACKUP command execution and make sure we have
+ * CommandComplete.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data for '%s': %s", "START_BACKUP",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ res = PQgetResult(conn);
+
+ basebkp = psprintf("LIST_TABLESPACES %s",
+ showprogress ? "PROGRESS" : "");
+
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not send replication command \"%s\": %s",
+ "LIST_TABLESPACES", PQerrorMessage(conn));
+ exit(1);
+ }
+ }
+
/*
* Get the header
*/
- res = PQgetResult(conn);
+ tablespacehdr = res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
pg_log_error("could not get backup header: %s",
@@ -1853,65 +2075,98 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /*
- * Start receiving chunks
- */
- for (i = 0; i < PQntuples(res); i++)
- {
- if (format == 't')
- ReceiveTarFile(conn, res, i);
- else
- ReceiveAndUnpackTarFile(conn, res, i);
- } /* Loop over all tablespaces */
-
- if (showprogress)
+ if (numWorkers <= 1)
{
- progress_report(PQntuples(res), NULL, true);
- if (isatty(fileno(stderr)))
- fprintf(stderr, "\n"); /* Need to move to next line */
- }
+ /*
+ * Start receiving chunks
+ */
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ if (format == 't')
+ ReceiveTarFile(conn, res, i);
+ else
+ ReceiveAndUnpackTarFile(conn, res, i);
+ } /* Loop over all tablespaces */
- PQclear(res);
+ if (showprogress)
+ {
+ progress_report(PQntuples(tablespacehdr), NULL, true);
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\n"); /* Need to move to next line */
+ }
- /*
- * Get the stop position
- */
- res = PQgetResult(conn);
- if (PQresultStatus(res) != PGRES_TUPLES_OK)
- {
- pg_log_error("could not get write-ahead log end position from server: %s",
- PQerrorMessage(conn));
- exit(1);
- }
- if (PQntuples(res) != 1)
- {
- pg_log_error("no write-ahead log end position returned from server");
- exit(1);
- }
- strlcpy(xlogend, PQgetvalue(res, 0, 0), sizeof(xlogend));
- if (verbose && includewal != NO_WAL)
- pg_log_info("write-ahead log end point: %s", xlogend);
- PQclear(res);
+ PQclear(res);
- res = PQgetResult(conn);
- if (PQresultStatus(res) != PGRES_COMMAND_OK)
- {
- const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+ /*
+ * Get the stop position
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get write-ahead log end position from server: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) != 1)
+ {
+ pg_log_error("no write-ahead log end position returned from server");
+ exit(1);
+ }
+ strlcpy(xlogend, PQgetvalue(res, 0, 0), sizeof(xlogend));
+ if (verbose && includewal != NO_WAL)
+ pg_log_info("write-ahead log end point: %s", xlogend);
+ PQclear(res);
- if (sqlstate &&
- strcmp(sqlstate, ERRCODE_DATA_CORRUPTED) == 0)
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
{
- pg_log_error("checksum error occurred");
- checksum_failure = true;
+ const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+ if (sqlstate &&
+ strcmp(sqlstate, ERRCODE_DATA_CORRUPTED) == 0)
+ {
+ pg_log_error("checksum error occurred");
+ checksum_failure = true;
+ }
+ else
+ {
+ pg_log_error("final receive failed: %s",
+ PQerrorMessage(conn));
+ }
+ exit(1);
}
- else
+ }
+
+ if (numWorkers > 1)
+ {
+ /*
+ * Finish up the LIST_TABLESPACES command execution and make sure we
+ * have CommandComplete.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
{
- pg_log_error("final receive failed: %s",
+ pg_log_error("could not get data for '%s': %s", "LIST_TABLESPACES",
PQerrorMessage(conn));
+ exit(1);
}
- exit(1);
+ res = PQgetResult(conn);
+
+ backupinfo = palloc0(sizeof(BackupInfo));
+ backupinfo->backupstate = PB_FETCH_REL_LIST;
+
+ /* copy starting WAL location */
+ strlcpy(backupinfo->xlogstart, xlogstart, sizeof(backupinfo->xlogstart));
+ create_parallel_workers(backupinfo);
+ parallel_backup_run(backupinfo);
+ /* copy ending WAL location */
+ strlcpy(xlogend, backupinfo->xlogend, sizeof(xlogend));
}
+ /* Write recovery contents */
+ if (format == 'p' && writerecoveryconf)
+ WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
+
if (bgchild > 0)
{
#ifndef WIN32
@@ -2066,6 +2321,7 @@ main(int argc, char **argv)
{"waldir", required_argument, NULL, 1},
{"no-slot", no_argument, NULL, 2},
{"no-verify-checksums", no_argument, NULL, 3},
+ {"jobs", required_argument, NULL, 'j'},
{NULL, 0, NULL, 0}
};
int c;
@@ -2093,7 +2349,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvPj:",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2234,6 +2490,9 @@ main(int argc, char **argv)
case 3:
verify_checksums = false;
break;
+ case 'j': /* number of jobs */
+ numWorkers = atoi(optarg);
+ break;
default:
/*
@@ -2348,6 +2607,30 @@ main(int argc, char **argv)
}
}
+ if (numWorkers <= 0)
+ {
+ pg_log_error("invalid number of parallel jobs");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (format != 'p' && numWorkers > 1)
+ {
+ pg_log_error("parallel jobs are only supported with 'plain' format");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (maxrate > 0 && numWorkers > 1)
+ {
+ pg_log_error("--max-rate is not supported with parallel jobs");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
#ifndef HAVE_LIBZ
if (compresslevel != 0)
{
@@ -2420,3 +2703,670 @@ main(int argc, char **argv)
success = true;
return 0;
}
+
+/*
+ * Worker thread function. Added for code readability.
+ */
+static void *
+worker_run(void *arg)
+{
+ WorkerState *wstate = (WorkerState *) arg;
+
+ worker_get_files(wstate);
+
+ return NULL;
+}
+
+/*
+ * Create workers and initialize worker state.
+ */
+static void
+create_parallel_workers(BackupInfo *backupinfo)
+{
+ int status,
+ i;
+
+ workers = (WorkerState *) palloc(sizeof(WorkerState) * numWorkers);
+ backupinfo->activeworkers = 0;
+
+ for (i = 0; i < numWorkers; i++)
+ {
+ WorkerState *worker = &workers[i];
+
+ worker->backupinfo = backupinfo;
+ worker->bytesread = 0;
+ worker->workerid = i;
+ worker->conn = GetConnection();
+ backupinfo->activeworkers++;
+
+ status = pthread_create(&worker->worker, NULL, worker_run, worker);
+ if (status != 0)
+ {
+ pg_log_error("failed to create thread: %m");
+ exit(1);
+ }
+
+ if (verbose)
+ pg_log_info("backup worker (%d) created, %d", i, status);
+ }
+}
+
+/*
+ * This is the main function that controls the workers, assigns tasks and
+ * does the cleanup.
+ */
+static void
+parallel_backup_run(BackupInfo *backupinfo)
+{
+ uint64_t totalread = 0;
+
+ while (1)
+ {
+ char *filename = NULL;
+
+ switch (backupinfo->backupstate)
+ {
+ case PB_FETCH_REL_LIST: /* get the list of files to fetch */
+ backupinfo->backupstate = PB_FETCH_REL_FILES;
+ /* retrieve backup file list from the server. */
+ get_backup_filelist(conn, backupinfo);
+ /* unblock any workers waiting on the condition */
+ pthread_cond_broadcast(&data_ready);
+ break;
+ case PB_FETCH_REL_FILES: /* fetch files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_STOP_BACKUP;
+ free_filelist(backupinfo);
+ }
+ break;
+ case PB_FETCH_WAL_LIST: /* get the list of WAL files to fetch */
+ backupinfo->backupstate = PB_FETCH_WAL_FILES;
+ get_wal_filelist(conn, backupinfo, backupinfo->xlogstart, backupinfo->xlogend);
+ /* unblock any workers waiting on the condition */
+ pthread_cond_broadcast(&data_ready);
+ break;
+ case PB_FETCH_WAL_FILES: /* fetch WAL files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_BACKUP_COMPLETE;
+ }
+ break;
+ case PB_STOP_BACKUP:
+
+ /*
+ * All relation files have been fetched, time to stop the
+ * backup, making sure to fetch the WAL files first (if need
+ * be).
+ */
+ if (includewal == FETCH_WAL)
+ backupinfo->backupstate = PB_FETCH_WAL_LIST;
+ else
+ backupinfo->backupstate = PB_BACKUP_COMPLETE;
+
+ /* fetch the pg_control file last. */
+ receive_file(conn, "global/pg_control", tablespacecount - 1);
+ stop_backup();
+ break;
+ case PB_BACKUP_COMPLETE:
+
+ /*
+ * All relation and WAL files (if needed) have been fetched;
+ * now we can safely stop all workers and finish up.
+ */
+ cleanup_workers();
+ if (showprogress)
+ {
+ workers_progress_report(totalread, NULL, true);
+ if (isatty(fileno(stderr)))
+ fprintf(stderr, "\n"); /* Need to move to next line */
+ }
+
+ /* nothing more to do here */
+ return;
+ break;
+ default:
+ /* shouldn't come here. */
+ pg_log_error("unexpected backup state: %d",
+ backupinfo->backupstate);
+ exit(1);
+ break;
+ }
+
+ /* update and report progress */
+ totalread = 0;
+ for (int i = 0; i < numWorkers; i++)
+ {
+ WorkerState *worker = &workers[i];
+
+ totalread += worker->bytesread;
+ }
+ totalread += backupinfo->bytes_skipped;
+
+ if (backupinfo->curr != NULL)
+ filename = backupinfo->curr->path;
+
+ workers_progress_report(totalread, filename, false);
+ pg_usleep(100000);
+ }
+}
+
+/*
+ * Wait for the workers to complete the work and free connections.
+ */
+static void
+cleanup_workers(void)
+{
+ /* this is a non-parallel backup; nothing to clean up. */
+ if (!backupinfo)
+ return;
+ /* workers have already been stopped and cleanup has been done. */
+ if (backupinfo->workersdone)
+ return;
+
+ backupinfo->workersdone = true;
+ /* wakeup any workers waiting on the condition */
+ pthread_cond_broadcast(&data_ready);
+
+ for (int i = 0; i < numWorkers; i++)
+ {
+ pthread_join(workers[i].worker, NULL);
+ PQfinish(workers[i].conn);
+ }
+ free_filelist(backupinfo);
+}
+
+/*
+ * Take the system out of backup mode and add the backup_label file to
+ * the backup.
+ */
+static void
+stop_backup(void)
+{
+ PGresult *res = NULL;
+ char *basebkp;
+
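+ /*
+ * When WAL files are included in the backup, there is no need to wait
+ * for them to be archived, so pass NOWAIT.
+ */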
+ basebkp = psprintf("STOP_BACKUP %s",
+ includewal == NO_WAL ? "" : "NOWAIT");
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not execute STOP BACKUP \"%s\"",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /*
+ * Get the stop position
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not get write-ahead log end position from server: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) != 1)
+ {
+ pg_log_error("no write-ahead log end position returned from server");
+ exit(1);
+ }
+
+ /* retrieve the end wal location. */
+ strlcpy(backupinfo->xlogend, PQgetvalue(res, 0, 0),
+ sizeof(backupinfo->xlogend));
+
+ /* retrieve the backup_label file contents and write them to the backup */
+ writefile("backup_label", PQgetvalue(res, 0, 2));
+
+ PQclear(res);
+
+ /*
+ * Finish up the Stop command execution and make sure we have
+ * CommandComplete and ReadyForQuery response.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data %s", PQerrorMessage(conn));
+ exit(1);
+ }
+ res = PQgetResult(conn);
+
+ if (verbose && includewal != NO_WAL)
+ pg_log_info("write-ahead log end point: %s", backupinfo->xlogend);
+}
+
+/*
+ * Retrieves the list of files available in $PGDATA from the server.
+ */
+static void
+get_backup_filelist(PGconn *conn, BackupInfo *backupInfo)
+{
+ PGresult *res = NULL;
+ char *basebkp;
+
+ for (int i = 0; i < tablespacecount; i++)
+ {
+ bool basetablespace;
+ char *tablespace;
+ int numFiles;
+
+ /*
+ * Query the server for the file list of the given tablespace. If the
+ * tablespace name is empty, the file list of the 'base' tablespace is
+ * fetched.
+ */
+ basetablespace = PQgetisnull(tablespacehdr, i, 0);
+ tablespace = PQgetvalue(tablespacehdr, i, 1);
+
+ basebkp = psprintf("LIST_FILES '%s'",
+ basetablespace ? "" : tablespace);
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not send replication command \"%s\": %s",
+ "LIST_FILES", PQerrorMessage(conn));
+ exit(1);
+ }
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not list backup files: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ if (PQntuples(res) < 1)
+ {
+ pg_log_error("no data returned from server");
+ exit(1);
+ }
+
+ numFiles = PQntuples(res);
+ for (int j = 0; j < numFiles; j++)
+ {
+ BackupFile *file;
+ char *path = PQgetvalue(res, j, 0);
+ char type = PQgetvalue(res, j, 1)[0];
+ int32 size = atol(PQgetvalue(res, j, 2));
+ time_t mtime = atol(PQgetvalue(res, j, 3));
+
+ /*
+ * In 'plain' format, create backup directories first.
+ */
+ if (format == 'p' && type == 'd')
+ {
+ /*
+ * Directory entries are skipped; however, a tar header size was
+ * included for them in totalsize_kb, so we need to add it for
+ * progress reporting purposes.
+ */
+ backupInfo->bytes_skipped += 512;
+ create_backup_dirs(basetablespace, tablespace, path);
+ continue;
+ }
+
+ if (format == 'p' && type == 'l')
+ {
+ /*
+ * Symlink entries are skipped; however, a tar header size was
+ * included for them in totalsize_kb, so we need to add it for
+ * progress reporting purposes.
+ */
+ backupInfo->bytes_skipped += 512;
+ create_tblspc_symlink(path);
+ continue;
+ }
+
+ file = (BackupFile *) palloc(sizeof(BackupFile));
+ strlcpy(file->path, path, MAXPGPATH);
+ file->type = type;
+ file->size = size;
+ file->mtime = mtime;
+ file->tsindex = i;
+
+ /* add to the files list */
+ backupInfo->totalfiles++;
+ if (backupInfo->curr == NULL)
+ backupInfo->curr = backupInfo->files = file;
+ else
+ {
+ backupInfo->curr->next = file;
+ backupInfo->curr = backupInfo->curr->next;
+ }
+ }
+
+ PQclear(res);
+
+ /*
+ * Finish up the LIST_FILES command execution and make sure we have
+ * CommandComplete.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data for '%s': %s", "LIST_FILES",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ res = PQgetResult(conn);
+ }
+
+ /* point curr to the head of list. */
+ backupInfo->curr = backupInfo->files;
+}
+
+/*
+ * Retrieve the WAL file list from the server, based on the starting and
+ * ending WAL locations.
+ */
+static void
+get_wal_filelist(PGconn *conn, BackupInfo *backupInfo, char *xlogstart, char *xlogend)
+{
+ PGresult *res = NULL;
+ char *basebkp;
+ int numWals;
+
+ basebkp = psprintf("LIST_WAL_FILES START_WAL_LOCATION '%s' END_WAL_LOCATION '%s'",
+ xlogstart, xlogend);
+
+ if (PQsendQuery(conn, basebkp) == 0)
+ {
+ pg_log_error("could not send replication command \"%s\": %s",
+ "LIST_FILES", PQerrorMessage(conn));
+ exit(1);
+ }
+
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not list wal files: %s",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+
+ numWals = PQntuples(res);
+ for (int i = 0; i < numWals; i++)
+ {
+ BackupFile *file = (BackupFile *) palloc0(sizeof(BackupFile));
+
+ if (backupInfo->curr == NULL)
+ backupInfo->curr = backupInfo->files = file;
+ else
+ {
+ backupInfo->curr->next = file;
+ backupInfo->curr = file;
+ }
+
+ strlcpy(file->path, PQgetvalue(res, i, 0), MAXPGPATH);
+ file->tsindex = tablespacecount - 1;
+ backupInfo->totalfiles++;
+ }
+
+ /*
+ * Finish up the LIST_WAL_FILES command execution and make sure we have
+ * CommandComplete.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data for '%s': %s", "LIST_WAL_FILES",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ res = PQgetResult(conn);
+
+ /* point curr to the head of list. */
+ backupInfo->curr = backupInfo->files;
+}
+
+/* free files list */
+static void
+free_filelist(BackupInfo *backupInfo)
+{
+ /* free files list */
+ if (backupInfo->files != NULL)
+ {
+ backupInfo->curr = backupInfo->files;
+ while (backupInfo->curr != NULL)
+ {
+ BackupFile *file = backupInfo->curr;
+
+ backupInfo->curr = file->next;
+
+ pfree(file);
+ }
+
+ backupInfo->files = NULL;
+ backupInfo->totalfiles = 0;
+ }
+}
+
+/*
+ * Worker function that retrieves files from the server. If the file list
+ * is empty, it waits for the list to be filled; otherwise it picks the
+ * next file in the list.
+ */
+static int
+worker_get_files(WorkerState *wstate)
+{
+ BackupFile *fetchfile = NULL;
+ BackupInfo *backupinfo = wstate->backupinfo;
+
+ while (!backupinfo->workersdone)
+ {
+ pthread_mutex_lock(&fetch_mutex);
+ if (backupinfo->curr == NULL)
+ {
+ /*
+ * Wait until there is data available in the list to process.
+ * pthread_cond_wait unlocks the already-locked mutex while
+ * waiting; when the condition is signalled, one of the competing
+ * threads re-acquires the mutex and re-checks the list.
+ */
+ backupinfo->activeworkers--;
+ pthread_cond_wait(&data_ready, &fetch_mutex);
+ backupinfo->activeworkers++;
+ }
+
+ fetchfile = backupinfo->curr;
+ if (fetchfile != NULL)
+ {
+ backupinfo->totalfiles--;
+ backupinfo->curr = fetchfile->next;
+ }
+ pthread_mutex_unlock(&fetch_mutex);
+
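+ /*
+ * Receive the file contents outside the mutex, so that other workers
+ * can pick up their next file while this one transfers data.
+ */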
+ if (fetchfile != NULL)
+ {
+ wstate->bytesread +=
+ receive_file(wstate->conn, fetchfile->path, fetchfile->tsindex);
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * This function fetches the requested file from the server.
+ */
+static int
+receive_file(PGconn *conn, char *file, int tsIndex)
+{
+ PGresult *res = NULL;
+ int bytesread;
+ PQExpBuffer buf = createPQExpBuffer();
+
+ /*
+ * Fetch a single file from the server. To fetch the file, build a query
+ * in form of:
+ *
+ * SEND_FILES ('base/1/1245/32683') [options]
+ */
+ appendPQExpBuffer(buf, "SEND_FILES ( '%s' )", file);
+
+ /* add options */
+ appendPQExpBuffer(buf, " START_WAL_LOCATION '%s' %s",
+ backupinfo->xlogstart,
+ verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
+ if (!conn)
+ return 1;
+
+ if (PQsendQuery(conn, buf->data) == 0)
+ {
+ pg_log_error("could not send files list \"%s\"",
+ PQerrorMessage(conn));
+ return 1;
+ }
+
+ destroyPQExpBuffer(buf);
+
+ /* process the file contents; also count bytes read for progress reporting */
+ bytesread = ReceiveAndUnpackTarFile(conn, tablespacehdr, tsIndex);
+
+ PQclear(res);
+
+ /*
+ * Finish up the SEND_FILES command execution and make sure we have
+ * CommandComplete.
+ */
+ res = PQgetResult(conn);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("could not get data for '%s': %s", "SEND_FILES",
+ PQerrorMessage(conn));
+ exit(1);
+ }
+ res = PQgetResult(conn);
+ return bytesread;
+}
+
+/*
+ * Create backup directories while taking care of tablespace path. If tablespace
+ * mapping (with -T) is given then the directory will be created on the mapped
+ * path.
+ */
+static void
+create_backup_dirs(bool basetablespace, char *tablespace, char *name)
+{
+ char dirpath[MAXPGPATH];
+
+ Assert(name != NULL);
+
+ if (basetablespace)
+ snprintf(dirpath, sizeof(dirpath), "%s/%s", basedir, name);
+ else
+ {
+ Assert(tablespace != NULL);
+ snprintf(dirpath, sizeof(dirpath), "%s/%s",
+ get_tablespace_mapping(tablespace), (name + strlen(tablespace) + 1));
+ }
+
+ if (pg_mkdir_p(dirpath, pg_dir_create_mode) != 0)
+ {
+ if (errno != EEXIST)
+ {
+ pg_log_error("could not create directory \"%s\": %m",
+ dirpath);
+ exit(1);
+ }
+ }
+}
+
+/*
+ * Create a symlink in pg_tblspc and apply any tablespace mapping given on
+ * the command line (--tablespace-mapping).
+ */
+static void
+create_tblspc_symlink(char *filename)
+{
+ int i;
+
+ for (i = 0; i < tablespacecount; i++)
+ {
+ char *tsoid = PQgetvalue(tablespacehdr, i, 0);
+
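+ /*
+ * Match this tablespace by looking for its OID in the symlink path
+ * (pg_tblspc/<oid>); a simple substring match is enough for this
+ * PoC.
+ */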
+ if (strstr(filename, tsoid) != NULL)
+ {
+ char *linkloc = psprintf("%s/%s", basedir, filename);
+ const char *mapped_tblspc_path = get_tablespace_mapping(PQgetvalue(tablespacehdr, i, 1));
+
+#ifdef HAVE_SYMLINK
+ if (symlink(mapped_tblspc_path, linkloc) != 0)
+ {
+ pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
+ linkloc, mapped_tblspc_path);
+ exit(1);
+ }
+#else
+ pg_log_error("symlinks are not supported on this platform");
+ exit(1);
+#endif
+ free(linkloc);
+ break;
+ }
+ }
+}
+
+/*
+ * General function for writing to a file; creates one if it doesn't exist
+ */
+static void
+writefile(char *path, char *buf)
+{
+ FILE *f;
+ char pathbuf[MAXPGPATH];
+
+ snprintf(pathbuf, MAXPGPATH, "%s/%s", basedir, path);
+ f = fopen(pathbuf, "w");
+ if (f == NULL)
+ {
+ pg_log_error("could not open file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fwrite(buf, strlen(buf), 1, f) != 1)
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+
+ if (fclose(f))
+ {
+ pg_log_error("could not write to file \"%s\": %m", pathbuf);
+ exit(1);
+ }
+}
+
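+/*
+ * Fetch the server's max_wal_senders setting. Each backup worker needs its
+ * own replication connection, so the number of workers cannot exceed the
+ * available WAL sender slots.
+ */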
+static int
+fetch_max_wal_senders(PGconn *conn)
+{
+ PGresult *res;
+ int max_wal_senders;
+
+ /* check connection existence */
+ Assert(conn != NULL);
+
+ res = PQexec(conn, "SHOW max_wal_senders");
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("could not send replication command \"%s\": %s",
+ "SHOW max_wal_senders", PQerrorMessage(conn));
+
+ PQclear(res);
+ return -1;
+ }
+
+ if (PQntuples(res) != 1 || PQnfields(res) < 1)
+ {
+ pg_log_error("could not fetch max_wal_senders: got %d rows and %d fields, expected %d rows and %d or more fields",
+ PQntuples(res), PQnfields(res), 1, 1);
+
+ PQclear(res);
+ return -1;
+ }
+
+ max_wal_senders = atoi(PQgetvalue(res, 0, 0));
+ PQclear(res);
+
+ return max_wal_senders;
+}
--
2.21.1 (Apple Git-122.3)
Attachment: 0002-Refactor-some-backup-code-to-increase-reusability.-T_v9.patch (application/octet-stream)
From 1d41fa411fc02db73a49277779baeb022f3ae82d Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Mon, 27 Jan 2020 17:48:10 +0500
Subject: [PATCH 2/6] Refactor some backup code to increase reusability. This
commit adds two functions: collect_tablespaces and collect_wal_files. The
code that collects tablespace information is moved from do_pg_start_backup
into collect_tablespaces, and the code that collects WAL files is moved from
perform_base_backup into collect_wal_files.
This does not introduce any functional changes.
---
src/backend/access/transam/xlog.c | 191 ++++++++++++-----------
src/backend/replication/basebackup.c | 217 +++++++++++++++------------
src/include/access/xlog.h | 2 +
3 files changed, 219 insertions(+), 191 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4fa446ffa42..f5670141126 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10348,10 +10348,6 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
{
bool gotUniqueStartpoint = false;
- DIR *tblspcdir;
- struct dirent *de;
- tablespaceinfo *ti;
- int datadirpathlen;
/*
* Force an XLOG file switch before the checkpoint, to ensure that the
@@ -10477,8 +10473,6 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
if (exclusive)
tblspcmapfile = makeStringInfo();
- datadirpathlen = strlen(DataDir);
-
/*
* Report that we are now estimating the total backup size
* if we're streaming base backup as requested by pg_basebackup
@@ -10487,91 +10481,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
- /* Collect information about all tablespaces */
- tblspcdir = AllocateDir("pg_tblspc");
- while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
- {
- char fullpath[MAXPGPATH + 10];
- char linkpath[MAXPGPATH];
- char *relpath = NULL;
- int rllen;
- StringInfoData buflinkpath;
- char *s = linkpath;
-
- /* Skip special stuff */
- if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
- continue;
-
- snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
-
-#if defined(HAVE_READLINK) || defined(WIN32)
- rllen = readlink(fullpath, linkpath, sizeof(linkpath));
- if (rllen < 0)
- {
- ereport(WARNING,
- (errmsg("could not read symbolic link \"%s\": %m",
- fullpath)));
- continue;
- }
- else if (rllen >= sizeof(linkpath))
- {
- ereport(WARNING,
- (errmsg("symbolic link \"%s\" target is too long",
- fullpath)));
- continue;
- }
- linkpath[rllen] = '\0';
-
- /*
- * Add the escape character '\\' before newline in a string to
- * ensure that we can distinguish between the newline in the
- * tablespace path and end of line while reading tablespace_map
- * file during archive recovery.
- */
- initStringInfo(&buflinkpath);
-
- while (*s)
- {
- if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
- appendStringInfoChar(&buflinkpath, '\\');
- appendStringInfoChar(&buflinkpath, *s++);
- }
-
- /*
- * Relpath holds the relative path of the tablespace directory
- * when it's located within PGDATA, or NULL if it's located
- * elsewhere.
- */
- if (rllen > datadirpathlen &&
- strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
- IS_DIR_SEP(linkpath[datadirpathlen]))
- relpath = linkpath + datadirpathlen + 1;
-
- ti = palloc(sizeof(tablespaceinfo));
- ti->oid = pstrdup(de->d_name);
- ti->path = pstrdup(buflinkpath.data);
- ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
-
- if (tablespaces)
- *tablespaces = lappend(*tablespaces, ti);
-
- appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
-
- pfree(buflinkpath.data);
-#else
-
- /*
- * If the platform does not have symbolic links, it should not be
- * possible to have tablespaces - clearly somebody else created
- * them. Warn about it and ignore.
- */
- ereport(WARNING,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("tablespaces are not supported on this platform")));
-#endif
- }
- FreeDir(tblspcdir);
+ collect_tablespaces(tablespaces, tblspcmapfile, infotbssize, needtblspcmapfile);
/*
* Construct backup label file
@@ -12390,3 +12300,102 @@ XLogRequestWalReceiverReply(void)
{
doRequestWalReceiverReply = true;
}
+
+/*
+ * Collect information about all tablespaces.
+ */
+void
+collect_tablespaces(List **tablespaces, StringInfo tblspcmapfile,
+ bool infotbssize, bool needtblspcmapfile)
+{
+ DIR *tblspcdir;
+ struct dirent *de;
+ tablespaceinfo *ti;
+ int datadirpathlen;
+
+ datadirpathlen = strlen(DataDir);
+
+ tblspcdir = AllocateDir("pg_tblspc");
+ while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
+ {
+ char fullpath[MAXPGPATH + 10];
+ char linkpath[MAXPGPATH];
+ char *relpath = NULL;
+ int rllen;
+ StringInfoData buflinkpath;
+ char *s = linkpath;
+
+ /* Skip special stuff */
+ if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
+ continue;
+
+ snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
+
+#if defined(HAVE_READLINK) || defined(WIN32)
+ rllen = readlink(fullpath, linkpath, sizeof(linkpath));
+ if (rllen < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not read symbolic link \"%s\": %m",
+ fullpath)));
+ continue;
+ }
+ else if (rllen >= sizeof(linkpath))
+ {
+ ereport(WARNING,
+ (errmsg("symbolic link \"%s\" target is too long",
+ fullpath)));
+ continue;
+ }
+ linkpath[rllen] = '\0';
+
+ /*
+ * Add the escape character '\\' before newline in a string to ensure
+ * that we can distinguish between the newline in the tablespace path
+ * and end of line while reading tablespace_map file during archive
+ * recovery.
+ */
+ initStringInfo(&buflinkpath);
+
+ while (*s)
+ {
+ if ((*s == '\n' || *s == '\r') && needtblspcmapfile)
+ appendStringInfoChar(&buflinkpath, '\\');
+ appendStringInfoChar(&buflinkpath, *s++);
+ }
+
+ /*
+ * Relpath holds the relative path of the tablespace directory when
+ * it's located within PGDATA, or NULL if it's located elsewhere.
+ */
+ if (rllen > datadirpathlen &&
+ strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
+ IS_DIR_SEP(linkpath[datadirpathlen]))
+ relpath = linkpath + datadirpathlen + 1;
+
+ ti = palloc(sizeof(tablespaceinfo));
+ ti->oid = pstrdup(de->d_name);
+ ti->path = pstrdup(buflinkpath.data);
+ ti->rpath = relpath ? pstrdup(relpath) : NULL;
+ ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+
+ if (tablespaces)
+ *tablespaces = lappend(*tablespaces, ti);
+
+ appendStringInfo(tblspcmapfile, "%s %s\n", ti->oid, ti->path);
+
+ pfree(buflinkpath.data);
+#else
+
+ /*
+ * If the platform does not have symbolic links, it should not be
+ * possible to have tablespaces - clearly somebody else created them.
+ * Warn about it and ignore.
+ */
+ ereport(WARNING,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("tablespaces are not supported on this platform")));
+#endif
+ }
+ FreeDir(tblspcdir);
+}
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ca074d59ac9..abc3bad01ee 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -67,6 +67,8 @@ static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *sta
static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
+static List *collect_wal_files(XLogRecPtr startptr, XLogRecPtr endptr,
+ List **historyFileList);
static void parse_basebackup_options(List *options, basebackup_options *opt);
static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
@@ -438,115 +440,16 @@ perform_base_backup(basebackup_options *opt)
*/
char pathbuf[MAXPGPATH];
XLogSegNo segno;
- XLogSegNo startsegno;
- XLogSegNo endsegno;
struct stat statbuf;
List *historyFileList = NIL;
List *walFileList = NIL;
- char firstoff[MAXFNAMELEN];
- char lastoff[MAXFNAMELEN];
- DIR *dir;
- struct dirent *de;
ListCell *lc;
TimeLineID tli;
pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL);
- /*
- * I'd rather not worry about timelines here, so scan pg_wal and
- * include all WAL files in the range between 'startptr' and 'endptr',
- * regardless of the timeline the file is stamped with. If there are
- * some spurious WAL files belonging to timelines that don't belong in
- * this server's history, they will be included too. Normally there
- * shouldn't be such files, but if there are, there's little harm in
- * including them.
- */
- XLByteToSeg(startptr, startsegno, wal_segment_size);
- XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
- XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
- XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
-
- dir = AllocateDir("pg_wal");
- while ((de = ReadDir(dir, "pg_wal")) != NULL)
- {
- /* Does it look like a WAL segment, and is it in the range? */
- if (IsXLogFileName(de->d_name) &&
- strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
- strcmp(de->d_name + 8, lastoff + 8) <= 0)
- {
- walFileList = lappend(walFileList, pstrdup(de->d_name));
- }
- /* Does it look like a timeline history file? */
- else if (IsTLHistoryFileName(de->d_name))
- {
- historyFileList = lappend(historyFileList, pstrdup(de->d_name));
- }
- }
- FreeDir(dir);
-
- /*
- * Before we go any further, check that none of the WAL segments we
- * need were removed.
- */
- CheckXLogRemoved(startsegno, ThisTimeLineID);
-
- /*
- * Sort the WAL filenames. We want to send the files in order from
- * oldest to newest, to reduce the chance that a file is recycled
- * before we get a chance to send it over.
- */
- list_sort(walFileList, compareWalFileNames);
-
- /*
- * There must be at least one xlog file in the pg_wal directory, since
- * we are doing backup-including-xlog.
- */
- if (walFileList == NIL)
- ereport(ERROR,
- (errmsg("could not find any WAL files")));
-
- /*
- * Sanity check: the first and last segment should cover startptr and
- * endptr, with no gaps in between.
- */
- XLogFromFileName((char *) linitial(walFileList),
- &tli, &segno, wal_segment_size);
- if (segno != startsegno)
- {
- char startfname[MAXFNAMELEN];
-
- XLogFileName(startfname, ThisTimeLineID, startsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", startfname)));
- }
- foreach(lc, walFileList)
- {
- char *walFileName = (char *) lfirst(lc);
- XLogSegNo currsegno = segno;
- XLogSegNo nextsegno = segno + 1;
-
- XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
- if (!(nextsegno == segno || currsegno == segno))
- {
- char nextfname[MAXFNAMELEN];
-
- XLogFileName(nextfname, ThisTimeLineID, nextsegno,
- wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", nextfname)));
- }
- }
- if (segno != endsegno)
- {
- char endfname[MAXFNAMELEN];
-
- XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
- ereport(ERROR,
- (errmsg("could not find WAL file \"%s\"", endfname)));
- }
-
+ walFileList = collect_wal_files(startptr, endptr, &historyFileList);
/* Ok, we have everything we need. Send the WAL files. */
foreach(lc, walFileList)
{
@@ -681,6 +584,120 @@ perform_base_backup(basebackup_options *opt)
pgstat_progress_end_command();
}
+/*
+ * Construct a list of WAL files to be included in the backup.
+ */
+static List *
+collect_wal_files(XLogRecPtr startptr, XLogRecPtr endptr, List **historyFileList)
+{
+ XLogSegNo segno;
+ XLogSegNo startsegno;
+ XLogSegNo endsegno;
+ List *walFileList = NIL;
+ char firstoff[MAXFNAMELEN];
+ char lastoff[MAXFNAMELEN];
+ DIR *dir;
+ struct dirent *de;
+ ListCell *lc;
+ TimeLineID tli;
+
+ /*
+ * I'd rather not worry about timelines here, so scan pg_wal and include
+ * all WAL files in the range between 'startptr' and 'endptr', regardless
+ * of the timeline the file is stamped with. If there are some spurious
+ * WAL files belonging to timelines that don't belong in this server's
+ * history, they will be included too. Normally there shouldn't be such
+ * files, but if there are, there's little harm in including them.
+ */
+ XLByteToSeg(startptr, startsegno, wal_segment_size);
+ XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
+ XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
+ XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
+
+ dir = AllocateDir("pg_wal");
+ while ((de = ReadDir(dir, "pg_wal")) != NULL)
+ {
+ /* Does it look like a WAL segment, and is it in the range? */
+ if (IsXLogFileName(de->d_name) &&
+ strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
+ strcmp(de->d_name + 8, lastoff + 8) <= 0)
+ {
+ walFileList = lappend(walFileList, pstrdup(de->d_name));
+ }
+ /* Does it look like a timeline history file? */
+ else if (IsTLHistoryFileName(de->d_name))
+ {
+ if (historyFileList)
+ *historyFileList = lappend(*historyFileList, pstrdup(de->d_name));
+ }
+ }
+ FreeDir(dir);
+
+ /*
+ * Before we go any further, check that none of the WAL segments we need
+ * were removed.
+ */
+ CheckXLogRemoved(startsegno, ThisTimeLineID);
+
+ /*
+ * Sort the WAL filenames. We want to send the files in order from oldest
+ * to newest, to reduce the chance that a file is recycled before we get a
+ * chance to send it over.
+ */
+ list_sort(walFileList, compareWalFileNames);
+
+ /*
+ * There must be at least one xlog file in the pg_wal directory, since we
+ * are doing backup-including-xlog.
+ */
+ if (walFileList == NIL)
+ ereport(ERROR,
+ (errmsg("could not find any WAL files")));
+
+ /*
+ * Sanity check: the first and last segment should cover startptr and
+ * endptr, with no gaps in between.
+ */
+ XLogFromFileName((char *) linitial(walFileList),
+ &tli, &segno, wal_segment_size);
+ if (segno != startsegno)
+ {
+ char startfname[MAXFNAMELEN];
+
+ XLogFileName(startfname, ThisTimeLineID, startsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", startfname)));
+ }
+ foreach(lc, walFileList)
+ {
+ char *walFileName = (char *) lfirst(lc);
+ XLogSegNo currsegno = segno;
+ XLogSegNo nextsegno = segno + 1;
+
+ XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
+ if (!(nextsegno == segno || currsegno == segno))
+ {
+ char nextfname[MAXFNAMELEN];
+
+ XLogFileName(nextfname, ThisTimeLineID, nextsegno,
+ wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", nextfname)));
+ }
+ }
+ if (segno != endsegno)
+ {
+ char endfname[MAXFNAMELEN];
+
+ XLogFileName(endfname, ThisTimeLineID, endsegno, wal_segment_size);
+ ereport(ERROR,
+ (errmsg("could not find WAL file \"%s\"", endfname)));
+ }
+
+ return walFileList;
+}
+
/*
* list_sort comparison function, to compare log/seg portion of WAL segment
* filenames, ignoring the timeline portion.
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 98b033fc208..22fe35801dc 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -350,6 +350,8 @@ extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
extern XLogRecPtr do_pg_stop_backup(char *labelfile, bool waitforarchive,
TimeLineID *stoptli_p);
extern void do_pg_abort_backup(int code, Datum arg);
+extern void collect_tablespaces(List **tablespaces, StringInfo tblspcmapfile,
+ bool infotbssize, bool needtblspcmapfile);
extern void register_persistent_abort_backup_handler(void);
extern SessionBackupState get_backup_status(void);
--
2.21.1 (Apple Git-122.3)
Attachment: 0003-Parallel-Backup-Backend-Replication-commands_v9.patch (application/octet-stream)
From ab91e2c9078bfe42fb9306314304c558a41b7632 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Mon, 27 Jan 2020 18:32:42 +0500
Subject: [PATCH 3/6] Parallel Backup - Backend Replication commands
This feature adds the following replication commands to the backend
replication system, to facilitate taking a full backup in parallel over
multiple connections.
- START_BACKUP [LABEL '<label>'] [FAST]
This command instructs the server to get prepared for performing an
online backup.
- STOP_BACKUP [NOWAIT]
This command instructs the server that online backup is finished. It
will bring the system out of backup mode.
- LIST_TABLESPACES [PROGRESS]
This command instructs the server to return a list of tablespaces.
- LIST_FILES [TABLESPACE]
This command instructs the server to return a list of files for the
given tablespace, or for the base tablespace if TABLESPACE is empty.
- LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
This command instructs the server to return a list of WAL files between
the given locations.
- SEND_FILES '(' FILE, FILE... ')' [START_WAL_LOCATION 'X/X']
[NOVERIFY_CHECKSUMS]
Instructs the server to send the contents of the requested FILE(s).
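For illustration, a client could drive a parallel backup with a command
sequence roughly like this (the LSNs and file names are invented; the
SEND_FILES calls would typically be issued concurrently on several
connections):
  START_BACKUP LABEL 'b1' FAST
  LIST_TABLESPACES
  LIST_FILES
  SEND_FILES ('base/1/1259', 'base/1/2608') START_WAL_LOCATION '0/2000028'
  STOP_BACKUP NOWAIT
  LIST_WAL_FILES START_WAL_LOCATION '0/2000028' END_WAL_LOCATION '0/3000000'
  SEND_FILES ('pg_wal/000000010000000000000002') START_WAL_LOCATION '0/2000028'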
---
src/backend/access/transam/xlog.c | 4 +-
src/backend/replication/basebackup.c | 529 ++++++++++++++++++++++++-
src/backend/replication/repl_gram.y | 265 +++++++++++--
src/backend/replication/repl_scanner.l | 8 +
src/include/nodes/replnodes.h | 12 +
src/include/replication/basebackup.h | 2 +-
6 files changed, 751 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f5670141126..4189b056c88 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11128,7 +11128,7 @@ do_pg_abort_backup(int code, Datum arg)
if (emit_warning)
ereport(WARNING,
- (errmsg("aborting backup due to backend exiting before pg_stop_back up was called")));
+ (errmsg("aborting backup due to backend exiting while a non-exclusive backup is in progress")));
}
/*
@@ -12377,7 +12377,7 @@ collect_tablespaces(List **tablespaces, StringInfo tblspcmapfile,
ti->oid = pstrdup(de->d_name);
ti->path = pstrdup(buflinkpath.data);
ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ? sendTablespace(fullpath, true) : -1;
+ ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
if (tablespaces)
*tablespaces = lappend(*tablespaces, ti);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index abc3bad01ee..a294d77da50 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -39,6 +39,8 @@
#include "storage/ipc.h"
#include "storage/reinit.h"
#include "utils/builtins.h"
+#include "utils/memutils.h"
+#include "utils/pg_lsn.h"
#include "utils/ps_status.h"
#include "utils/relcache.h"
#include "utils/timestamp.h"
@@ -52,11 +54,22 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ XLogRecPtr startwallocation;
+ XLogRecPtr endwallocation;
+ char *tablespace;
} basebackup_options;
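+/*
+ * A single file entry, as collected by sendDir()/sendTablespace() and sent
+ * to the client by SendFilesHeader().
+ */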
+typedef struct
+{
+ char path[MAXPGPATH];
+ char type;
+ size_t size;
+ time_t mtime;
+} BackupFile;
+
static int64 sendDir(const char *path, int basepathlen, bool dryrun,
- List *tablespaces, bool sendtblspclinks);
+ List *tablespaces, bool sendtblspclinks, List **filelist);
static bool sendFile(const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
@@ -70,12 +83,28 @@ static void perform_base_backup(basebackup_options *opt);
static List *collect_wal_files(XLogRecPtr startptr, XLogRecPtr endptr,
List **historyFileList);
static void parse_basebackup_options(List *options, basebackup_options *opt);
-static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
+static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli, StringInfo label);
+static void SendFilesHeader(List *files);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
static void throttle(size_t increment);
static void update_basebackup_progress(int64 delta);
static bool is_checksummed_file(const char *fullpath, const char *filename);
+static void start_backup(basebackup_options *opt);
+static void stop_backup(basebackup_options *opt);
+static void list_tablespaces(basebackup_options *opt);
+static void list_files(basebackup_options *opt);
+static void list_wal_files(basebackup_options *opt);
+static void send_files(basebackup_options *opt, List *filenames,
+ bool missing_ok);
+static void add_to_filelist(List **filelist, char *path, char type,
+ size_t size, time_t mtime);
+
+/*
+ * Store label file during non-exclusive backups.
+ */
+static StringInfo label_file;
+
/* Was the backup currently in-progress initiated in recovery mode? */
static bool backup_started_in_recovery = false;
@@ -303,7 +332,7 @@ perform_base_backup(basebackup_options *opt)
/* Add a node for the base directory at the end */
ti = palloc0(sizeof(tablespaceinfo));
- ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true, NULL) : -1;
tablespaces = lappend(tablespaces, ti);
/*
@@ -336,7 +365,7 @@ perform_base_backup(basebackup_options *opt)
}
/* Send the starting position of the backup */
- SendXlogRecPtrResult(startptr, starttli);
+ SendXlogRecPtrResult(startptr, starttli, NULL);
/* Send tablespace header */
SendBackupHeader(tablespaces);
@@ -391,10 +420,10 @@ perform_base_backup(basebackup_options *opt)
if (tblspc_map_file && opt->sendtblspcmapfile)
{
sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data);
- sendDir(".", 1, false, tablespaces, false);
+ sendDir(".", 1, false, tablespaces, false, NULL);
}
else
- sendDir(".", 1, false, tablespaces, true);
+ sendDir(".", 1, false, tablespaces, true, NULL);
/* ... and pg_control after everything else. */
if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
@@ -405,7 +434,7 @@ perform_base_backup(basebackup_options *opt)
sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
}
else
- sendTablespace(ti->path, false);
+ sendTablespace(ti->path, false, NULL);
/*
* If we're including WAL, and this is the main data directory we
@@ -568,7 +597,7 @@ perform_base_backup(basebackup_options *opt)
/* Send CopyDone message for the last tar file */
pq_putemptymessage('c');
}
- SendXlogRecPtrResult(endptr, endtli);
+ SendXlogRecPtrResult(endptr, endtli, NULL);
if (total_checksum_failures)
{
@@ -726,6 +755,9 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_maxrate = false;
bool o_tablespace_map = false;
bool o_noverify_checksums = false;
+ bool o_startwallocation = false;
+ bool o_endwallocation = false;
+ bool o_tablespace = false;
MemSet(opt, 0, sizeof(*opt));
foreach(lopt, options)
@@ -814,12 +846,47 @@ parse_basebackup_options(List *options, basebackup_options *opt)
noverify_checksums = true;
o_noverify_checksums = true;
}
+ else if (strcmp(defel->defname, "start_wal_location") == 0)
+ {
+ bool have_error = false;
+ char *startwallocation;
+
+ if (o_startwallocation)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+
+ startwallocation = strVal(defel->arg);
+ opt->startwallocation = pg_lsn_in_internal(startwallocation, &have_error);
+ o_startwallocation = true;
+ }
+ else if (strcmp(defel->defname, "end_wal_location") == 0)
+ {
+ bool have_error = false;
+ char *endwallocation;
+
+ if (o_endwallocation)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+
+ endwallocation = strVal(defel->arg);
+ opt->endwallocation = pg_lsn_in_internal(endwallocation, &have_error);
+ o_endwallocation = true;
+ }
+ else if (strcmp(defel->defname, "tablespace") == 0)
+ {
+ if (o_tablespace)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->tablespace = strVal(defel->arg);
+ o_tablespace = true;
+ }
else
elog(ERROR, "option \"%s\" not recognized",
defel->defname);
}
- if (opt->label == NULL)
- opt->label = "base backup";
}
@@ -837,6 +904,15 @@ SendBaseBackup(BaseBackupCmd *cmd)
parse_basebackup_options(cmd->options, &opt);
+ /* default value for label, if not specified. */
+ if (opt.label == NULL)
+ {
+ if (cmd->cmdtag == BASE_BACKUP)
+ opt.label = "base backup";
+ else
+ opt.label = "start backup";
+ }
+
WalSndSetState(WALSNDSTATE_BACKUP);
if (update_process_title)
@@ -848,7 +924,34 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg);
}
- perform_base_backup(&opt);
+ switch (cmd->cmdtag)
+ {
+ case BASE_BACKUP:
+ perform_base_backup(&opt);
+ break;
+ case START_BACKUP:
+ start_backup(&opt);
+ break;
+ case LIST_TABLESPACES:
+ list_tablespaces(&opt);
+ break;
+ case LIST_FILES:
+ list_files(&opt);
+ break;
+ case SEND_FILES:
+ send_files(&opt, cmd->backupfiles, true);
+ break;
+ case STOP_BACKUP:
+ stop_backup(&opt);
+ break;
+ case LIST_WAL_FILES:
+ list_wal_files(&opt);
+ break;
+ default:
+ elog(ERROR, "unrecognized replication command tag: %u",
+ cmd->cmdtag);
+ break;
+ }
}
static void
@@ -936,18 +1039,18 @@ SendBackupHeader(List *tablespaces)
}
/*
- * Send a single resultset containing just a single
- * XLogRecPtr record (in text format)
+ * Send a single result set containing the XLogRecPtr record (in text
+ * format), the TimeLineID, and the backup label.
*/
static void
-SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
+SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli, StringInfo label)
{
StringInfoData buf;
char str[MAXFNAMELEN];
Size len;
pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 2); /* 2 fields */
+ pq_sendint16(&buf, 3); /* 3 fields */
/* Field headers */
pq_sendstring(&buf, "recptr");
@@ -970,11 +1073,19 @@ SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
pq_sendint16(&buf, -1);
pq_sendint32(&buf, 0);
pq_sendint16(&buf, 0);
+
+ pq_sendstring(&buf, "label");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
pq_endmessage(&buf);
/* Data row */
pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 2); /* number of columns */
+ pq_sendint16(&buf, 3); /* number of columns */
len = snprintf(str, sizeof(str),
"%X/%X", (uint32) (ptr >> 32), (uint32) ptr);
@@ -985,12 +1096,109 @@ SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
pq_sendint32(&buf, len);
pq_sendbytes(&buf, str, len);
+ if (label)
+ {
+ len = label->len;
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, label->data, len);
+ }
+ else
+ {
+ pq_sendint32(&buf, -1); /* NULL */
+ }
+
pq_endmessage(&buf);
/* Send a CommandComplete message */
pq_puttextmessage('C', "SELECT");
}
+
+/*
+ * Sends a result set containing the file path, type ('f' for regular file,
+ * 'd' for directory, 'l' for symlink), file size, and modification time.
+ */
+static void
+SendFilesHeader(List *files)
+{
+ StringInfoData buf;
+ ListCell *lc;
+
+ /* Construct and send the list of files */
+
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 4); /* 4 fields */
+
+ /* First field - file name */
+ pq_sendstring(&buf, "path");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Second field - type */
+ pq_sendstring(&buf, "type");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, CHAROID);
+ pq_sendint16(&buf, 1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Fourth field - mtime */
+ pq_sendstring(&buf, "mtime");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ foreach(lc, files)
+ {
+ BackupFile *file = (BackupFile *) lfirst(lc);
+ Size len;
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 4); /* number of columns */
+
+ /* send path */
+ len = strlen(file->path);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, file->path, len);
+
+ /* send type */
+ pq_sendint32(&buf, 1);
+ pq_sendbyte(&buf, file->type);
+
+ /* send size */
+ send_int8_string(&buf, file->size);
+
+ /* send mtime */
+ send_int8_string(&buf, file->mtime);
+
+ pq_endmessage(&buf);
+ }
+
+ list_free(files);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
/*
* Inject a file with given name and content in the output tar stream.
*/
@@ -1044,7 +1252,7 @@ sendFileWithContent(const char *filename, const char *content)
* Only used to send auxiliary tablespaces, not PGDATA.
*/
int64
-sendTablespace(char *path, bool dryrun)
+sendTablespace(char *path, bool dryrun, List **filelist)
{
int64 size;
char pathbuf[MAXPGPATH];
@@ -1073,11 +1281,11 @@ sendTablespace(char *path, bool dryrun)
return 0;
}
+ add_to_filelist(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
dryrun);
-
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), dryrun, NIL, true);
+ size += sendDir(pathbuf, strlen(path), dryrun, NIL, true, filelist);
return size;
}
@@ -1096,7 +1304,7 @@ sendTablespace(char *path, bool dryrun)
*/
static int64
sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
- bool sendtblspclinks)
+ bool sendtblspclinks, List **filelist)
{
DIR *dir;
struct dirent *de;
@@ -1254,6 +1462,8 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
+
+ add_to_filelist(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
excludeFound = true;
break;
@@ -1270,6 +1480,8 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
+
+ add_to_filelist(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
size += _tarWriteDir(pathbuf, basepathlen, &statbuf, dryrun);
continue;
}
@@ -1291,6 +1503,10 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
dryrun);
+ add_to_filelist(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
+ add_to_filelist(filelist, "./pg_wal/archive_status", 'd', -1,
+ statbuf.st_mtime);
+
continue; /* don't recurse into pg_wal */
}
@@ -1320,6 +1536,7 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
+ add_to_filelist(filelist, pathbuf, 'l', statbuf.st_size, statbuf.st_mtime);
size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
&statbuf, dryrun);
#else
@@ -1346,6 +1563,7 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
*/
size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
dryrun);
+ add_to_filelist(filelist, pathbuf, 'd', -1, statbuf.st_mtime);
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1376,13 +1594,15 @@ sendDir(const char *path, int basepathlen, bool dryrun, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks);
+ size += sendDir(pathbuf, basepathlen, dryrun, tablespaces, sendtblspclinks, filelist);
}
else if (S_ISREG(statbuf.st_mode))
{
bool sent = false;
- if (!dryrun)
+ add_to_filelist(filelist, pathbuf, 'f', statbuf.st_size, statbuf.st_mtime);
+
+ if (!dryrun && filelist == NULL)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? atooid(lastDir + 1) : InvalidOid);
@@ -1867,3 +2087,268 @@ update_basebackup_progress(int64 delta)
pgstat_progress_update_multi_param(nparam, index, val);
}
+
+/*
+ * start_backup - prepare to start an online backup.
+ *
+ * This function calls do_pg_start_backup() and sends the starting checkpoint
+ * location (and timeline) back to the client.
+ */
+static void
+start_backup(basebackup_options *opt)
+{
+ TimeLineID starttli;
+ StringInfo tblspc_map_file;
+ MemoryContext oldcontext;
+
+ /* The label file needs to be long-lived, since it is read in stop_backup. */
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+ label_file = makeStringInfo();
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * The tablespace map file is not used here, but since do_pg_start_backup
+ * requires this argument, we have to provide it.
+ */
+ tblspc_map_file = makeStringInfo();
+
+ register_persistent_abort_backup_handler();
+ startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
+ label_file, NULL, tblspc_map_file, false, false);
+
+ /* send startptr and starttli to frontend */
+ SendXlogRecPtrResult(startptr, starttli, NULL);
+
+ /* free the tablespace map buffer. */
+ pfree(tblspc_map_file->data);
+ pfree(tblspc_map_file);
+}
+
+/*
+ * stop_backup() - ends an online backup
+ *
+ * It is called at the end of an online backup. It sends back the ending WAL
+ * location together with the backup label contents.
+ */
+static void
+stop_backup(basebackup_options *opt)
+{
+ TimeLineID endtli;
+ XLogRecPtr endptr;
+
+ if (get_backup_status() != SESSION_BACKUP_NON_EXCLUSIVE)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("non-exclusive backup is not in progress")));
+
+ /*
+ * Stop the non-exclusive backup. Return a copy of the backup label so it
+ * can be written to disk by the caller.
+ */
+ endptr = do_pg_stop_backup(label_file->data, !opt->nowait, &endtli);
+ SendXlogRecPtrResult(endptr, endtli, label_file);
+
+ /* Free structures allocated in TopMemoryContext */
+ pfree(label_file->data);
+ pfree(label_file);
+ label_file = NULL;
+}
+
+/*
+ * list_tablespaces() - sends a list of tablespace entries
+ */
+static void
+list_tablespaces(basebackup_options *opt)
+{
+ StringInfo tblspc_map_file;
+ List *tablespaces = NIL;
+ tablespaceinfo *ti;
+
+ tblspc_map_file = makeStringInfo();
+ collect_tablespaces(&tablespaces, tblspc_map_file, opt->progress, false);
+
+ /* Add a node for the base directory at the end */
+ ti = palloc0(sizeof(tablespaceinfo));
+ ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true, NULL) : -1;
+ tablespaces = lappend(tablespaces, ti);
+
+ SendBackupHeader(tablespaces);
+ list_free(tablespaces);
+}
+
+/*
+ * list_files() - sends a list of files available in the given tablespace.
+ */
+static void
+list_files(basebackup_options *opt)
+{
+ List *files = NIL;
+ int datadirpathlen;
+
+ datadirpathlen = strlen(DataDir);
+
+ /*
+ * Calculate the relative path of temporary statistics directory in order
+ * to skip the files which are located in that directory later.
+ */
+ if (is_absolute_path(pgstat_stat_directory) &&
+ strncmp(pgstat_stat_directory, DataDir, datadirpathlen) == 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory + datadirpathlen + 1);
+ else if (strncmp(pgstat_stat_directory, "./", 2) != 0)
+ statrelpath = psprintf("./%s", pgstat_stat_directory);
+ else
+ statrelpath = pgstat_stat_directory;
+
+ if (strlen(opt->tablespace) > 0)
+ sendTablespace(opt->tablespace, true, &files);
+ else
+ sendDir(".", 1, true, NIL, true, &files);
+
+ SendFilesHeader(files);
+}
+
+/*
+ * list_wal_files() - sends a list of WAL files between the given start and
+ * end WAL locations.
+ */
+static void
+list_wal_files(basebackup_options *opt)
+{
+ List *historyFileList = NIL;
+ List *walFileList = NIL;
+ List *files = NIL;
+ ListCell *lc;
+
+ walFileList = collect_wal_files(opt->startwallocation, opt->endwallocation,
+ &historyFileList);
+ foreach(lc, walFileList)
+ {
+ char pathbuf[MAXPGPATH];
+ char *walFileName = (char *) lfirst(lc);
+
+ snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
+ add_to_filelist(&files, pathbuf, 'f', wal_segment_size, 0);
+ }
+
+ SendFilesHeader(files);
+}
+
+/*
+ * send_files() - sends the actual files to the caller
+ *
+ * The function sends the given file(s) to the caller using the COPY
+ * protocol. Only regular files are entertained; anything else, such as
+ * directories or symlinks, is ignored.
+ */
+static void
+send_files(basebackup_options *opt, List *filenames, bool missing_ok)
+{
+ StringInfoData buf;
+ ListCell *lc;
+ int basepathlen = 0;
+
+ if (list_length(filenames) <= 0)
+ return;
+
+ total_checksum_failures = 0;
+
+ /* Disable throttling. */
+ throttling_counter = -1;
+
+ /* set backup start location. */
+ startptr = opt->startwallocation;
+
+ /* Send CopyOutResponse message */
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+
+ foreach(lc, filenames)
+ {
+ struct stat statbuf;
+ char *pathbuf;
+
+ pathbuf = (char *) strVal(lfirst(lc));
+ if (is_absolute_path(pathbuf))
+ {
+ char *basepath;
+
+ /*
+ * 'pathbuf' points to the tablespace location, but we only want
+ * to include the version directory in it that belongs to us.
+ */
+ basepath = strstr(pathbuf, TABLESPACE_VERSION_DIRECTORY);
+ if (basepath)
+ basepathlen = basepath - pathbuf - 1;
+ }
+ else if (pathbuf[0] == '.' && pathbuf[1] == '/')
+ basepathlen = 2;
+ else
+ basepathlen = 0;
+
+ if (lstat(pathbuf, &statbuf) != 0)
+ {
+ if (errno != ENOENT)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file or directory \"%s\": %m",
+ pathbuf)));
+
+ /* If the file went away while scanning, it's not an error. */
+ continue;
+ }
+
+ /*
+ * Only entertain requests for regular file, skip any directories or
+ * special files.
+ */
+ if (S_ISREG(statbuf.st_mode))
+ {
+ /* send file to client */
+ sendFile(pathbuf, pathbuf + basepathlen, &statbuf, true, InvalidOid);
+ }
+ else
+ ereport(WARNING,
+ (errmsg("skipping special file or directory \"%s\"", pathbuf)));
+ }
+
+ pq_putemptymessage('c'); /* CopyDone */
+
+ /*
+ * Check for checksum failures. With failures spread across multiple
+ * processes, the total failure count may not be reported accurately,
+ * but we will still error out, terminating the backup.
+ */
+ if (total_checksum_failures)
+ {
+ if (total_checksum_failures > 1)
+ ereport(WARNING,
+ (errmsg("%lld total checksum verification failures", total_checksum_failures)));
+
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("checksum verification failure during base backup")));
+ }
+}
+
+/*
+ * Construct a BackupFile entry and add to the list.
+ */
+static void
+add_to_filelist(List **filelist, char *path, char type, size_t size,
+ time_t mtime)
+{
+ BackupFile *file;
+
+ if (filelist)
+ {
+ file = (BackupFile *) palloc(sizeof(BackupFile));
+ strlcpy(file->path, path, sizeof(file->path));
+ file->type = type;
+ file->size = size;
+ file->mtime = mtime;
+
+ *filelist = lappend(*filelist, file);
+ }
+}
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 14fcd532218..16e5402d55d 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -87,13 +87,28 @@ static SQLCmd *make_sqlcmd(void);
%token K_EXPORT_SNAPSHOT
%token K_NOEXPORT_SNAPSHOT
%token K_USE_SNAPSHOT
+%token K_START_BACKUP
+%token K_LIST_TABLESPACES
+%token K_LIST_FILES
+%token K_SEND_FILES
+%token K_STOP_BACKUP
+%token K_LIST_WAL_FILES
+%token K_START_WAL_LOCATION
+%token K_END_WAL_LOCATION
%type <node> command
%type <node> base_backup start_replication start_logical_replication
create_replication_slot drop_replication_slot identify_system
timeline_history show sql_cmd
-%type <list> base_backup_opt_list
-%type <defelt> base_backup_opt
+%type <list> base_backup_opt_list start_backup_opt_list stop_backup_opt_list
+ list_tablespace_opt_list list_files_opt_list
+ list_wal_files_opt_list send_backup_files_opt_list
+ backup_files backup_files_list
+%type <defelt> base_backup_opt backup_opt_label backup_opt_progress
+ backup_opt_fast backup_opt_wal backup_opt_nowait
+ backup_opt_maxrate backup_opt_tsmap backup_opt_chksum
+ backup_opt_start_wal_loc backup_opt_end_wal_loc
+ backup_opt_tablespace start_backup_opt send_backup_files_opt
%type <uintval> opt_timeline
%type <list> plugin_options plugin_opt_list
%type <defelt> plugin_opt_elem
@@ -153,69 +168,231 @@ var_name: IDENT { $$ = $1; }
{ $$ = psprintf("%s.%s", $1, $3); }
;
-/*
- * BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT]
- * [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS]
- */
base_backup:
+ /*
+ * BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT]
+ * [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS]
+ */
K_BASE_BACKUP base_backup_opt_list
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
+ cmd->cmdtag = BASE_BACKUP;
$$ = (Node *) cmd;
}
- ;
-
-base_backup_opt_list:
- base_backup_opt_list base_backup_opt
- { $$ = lappend($1, $2); }
- | /* EMPTY */
- { $$ = NIL; }
- ;
-
-base_backup_opt:
- K_LABEL SCONST
- {
- $$ = makeDefElem("label",
- (Node *)makeString($2), -1);
- }
- | K_PROGRESS
+ /* START_BACKUP [LABEL '<label>'] [FAST] */
+ | K_START_BACKUP start_backup_opt_list
{
- $$ = makeDefElem("progress",
- (Node *)makeInteger(true), -1);
- }
- | K_FAST
- {
- $$ = makeDefElem("fast",
- (Node *)makeInteger(true), -1);
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = START_BACKUP;
+ $$ = (Node *) cmd;
}
- | K_WAL
+ /* STOP_BACKUP [NOWAIT] */
+ | K_STOP_BACKUP stop_backup_opt_list
{
- $$ = makeDefElem("wal",
- (Node *)makeInteger(true), -1);
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = STOP_BACKUP;
+ $$ = (Node *) cmd;
}
- | K_NOWAIT
+ /* LIST_TABLESPACES [PROGRESS] */
+ | K_LIST_TABLESPACES list_tablespace_opt_list
{
- $$ = makeDefElem("nowait",
- (Node *)makeInteger(true), -1);
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = LIST_TABLESPACES;
+ $$ = (Node *) cmd;
}
- | K_MAX_RATE UCONST
+ /* LIST_FILES [TABLESPACE] */
+ | K_LIST_FILES list_files_opt_list
{
- $$ = makeDefElem("max_rate",
- (Node *)makeInteger($2), -1);
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = LIST_FILES;
+ $$ = (Node *) cmd;
}
- | K_TABLESPACE_MAP
+ /* LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X'] */
+ | K_LIST_WAL_FILES list_wal_files_opt_list
{
- $$ = makeDefElem("tablespace_map",
- (Node *)makeInteger(true), -1);
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $2;
+ cmd->cmdtag = LIST_WAL_FILES;
+ $$ = (Node *) cmd;
}
- | K_NOVERIFY_CHECKSUMS
+ /*
+ * SEND_FILES '(' 'FILE' [, ...] ')' [START_WAL_LOCATION 'X/X']
+ * [NOVERIFY_CHECKSUMS]
+ */
+ | K_SEND_FILES backup_files send_backup_files_opt_list
{
- $$ = makeDefElem("noverify_checksums",
- (Node *)makeInteger(true), -1);
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $3;
+ cmd->cmdtag = SEND_FILES;
+ cmd->backupfiles = $2;
+ $$ = (Node *) cmd;
}
;
+base_backup_opt_list:
+ base_backup_opt_list base_backup_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+base_backup_opt:
+ backup_opt_label { $$ = $1; }
+ | backup_opt_progress { $$ = $1; }
+ | backup_opt_fast { $$ = $1; }
+ | backup_opt_wal { $$ = $1; }
+ | backup_opt_nowait { $$ = $1; }
+ | backup_opt_maxrate { $$ = $1; }
+ | backup_opt_tsmap { $$ = $1; }
+ | backup_opt_chksum { $$ = $1; }
+ ;
+
+start_backup_opt_list:
+ start_backup_opt_list start_backup_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+start_backup_opt:
+ backup_opt_label { $$ = $1; }
+ | backup_opt_fast { $$ = $1; }
+ ;
+
+stop_backup_opt_list:
+ backup_opt_nowait
+ { $$ = list_make1($1); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+list_tablespace_opt_list:
+ backup_opt_progress
+ { $$ = list_make1($1); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+list_files_opt_list:
+ backup_opt_tablespace
+ { $$ = list_make1($1); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+list_wal_files_opt_list:
+ backup_opt_start_wal_loc backup_opt_end_wal_loc
+ { $$ = list_make2($1, $2); }
+ ;
+
+send_backup_files_opt_list:
+ send_backup_files_opt_list send_backup_files_opt
+ { $$ = lappend($1, $2); }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+backup_files:
+ '(' backup_files_list ')'
+ { $$ = $2; }
+ | /* EMPTY */
+ { $$ = NIL; }
+ ;
+
+backup_files_list:
+ SCONST
+ { $$ = list_make1(makeString($1)); }
+ | backup_files_list ',' SCONST
+ { $$ = lappend($1, makeString($3)); }
+ ;
+
+send_backup_files_opt:
+ backup_opt_chksum { $$ = $1; }
+ | backup_opt_start_wal_loc { $$ = $1; }
+ ;
+
+backup_opt_label:
+ K_LABEL SCONST
+ {
+ $$ = makeDefElem("label",
+ (Node *)makeString($2), -1);
+ };
+
+backup_opt_progress:
+ K_PROGRESS
+ {
+ $$ = makeDefElem("progress",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_fast:
+ K_FAST
+ {
+ $$ = makeDefElem("fast",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_wal:
+ K_WAL
+ {
+ $$ = makeDefElem("wal",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_nowait:
+ K_NOWAIT
+ {
+ $$ = makeDefElem("nowait",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_maxrate:
+ K_MAX_RATE UCONST
+ {
+ $$ = makeDefElem("max_rate",
+ (Node *)makeInteger($2), -1);
+ };
+
+backup_opt_tsmap:
+ K_TABLESPACE_MAP
+ {
+ $$ = makeDefElem("tablespace_map",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_chksum:
+ K_NOVERIFY_CHECKSUMS
+ {
+ $$ = makeDefElem("noverify_checksums",
+ (Node *)makeInteger(true), -1);
+ };
+
+backup_opt_start_wal_loc:
+ K_START_WAL_LOCATION SCONST
+ {
+ $$ = makeDefElem("start_wal_location",
+ (Node *)makeString($2), -1);
+ };
+
+backup_opt_end_wal_loc:
+ K_END_WAL_LOCATION SCONST
+ {
+ $$ = makeDefElem("end_wal_location",
+ (Node *)makeString($2), -1);
+ };
+
+backup_opt_tablespace:
+ SCONST
+ {
+ $$ = makeDefElem("tablespace", //tblspcname?
+ (Node *)makeString($1), -1);
+ };
+
create_replication_slot:
/* CREATE_REPLICATION_SLOT slot TEMPORARY PHYSICAL RESERVE_WAL */
K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_PHYSICAL create_slot_opt_list
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 14c9a1e798a..faa00cfd0ee 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -107,6 +107,14 @@ EXPORT_SNAPSHOT { return K_EXPORT_SNAPSHOT; }
NOEXPORT_SNAPSHOT { return K_NOEXPORT_SNAPSHOT; }
USE_SNAPSHOT { return K_USE_SNAPSHOT; }
WAIT { return K_WAIT; }
+START_BACKUP { return K_START_BACKUP; }
+LIST_FILES { return K_LIST_FILES; }
+LIST_TABLESPACES { return K_LIST_TABLESPACES; }
+SEND_FILES { return K_SEND_FILES; }
+STOP_BACKUP { return K_STOP_BACKUP; }
+LIST_WAL_FILES { return K_LIST_WAL_FILES; }
+START_WAL_LOCATION { return K_START_WAL_LOCATION; }
+END_WAL_LOCATION { return K_END_WAL_LOCATION; }
"," { return ','; }
";" { return ';'; }
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index 5456141a8ab..c046ea39ae9 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -23,6 +23,16 @@ typedef enum ReplicationKind
REPLICATION_KIND_LOGICAL
} ReplicationKind;
+typedef enum BackupCmdTag
+{
+ BASE_BACKUP,
+ START_BACKUP,
+ LIST_TABLESPACES,
+ LIST_FILES,
+ LIST_WAL_FILES,
+ SEND_FILES,
+ STOP_BACKUP
+} BackupCmdTag;
/* ----------------------
* IDENTIFY_SYSTEM command
@@ -42,6 +52,8 @@ typedef struct BaseBackupCmd
{
NodeTag type;
List *options;
+ BackupCmdTag cmdtag;
+ List *backupfiles;
} BaseBackupCmd;
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index e0210def6f3..3bc85d4c3e2 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -31,6 +31,6 @@ typedef struct
extern void SendBaseBackup(BaseBackupCmd *cmd);
-extern int64 sendTablespace(char *path, bool dryrun);
+extern int64 sendTablespace(char *path, bool dryrun, List **filelist);
#endif /* _BASEBACKUP_H */
--
2.21.1 (Apple Git-122.3)
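To make the new grammar concrete, it accepts command shapes along these lines
(a sketch based on the rules above; the label, LSNs and file paths are made-up
illustrative values, not output from the patch):

START_BACKUP LABEL 'backup_1' FAST
LIST_TABLESPACES PROGRESS
LIST_FILES 'pg_tblspc/16384'
LIST_WAL_FILES START_WAL_LOCATION '0/2000028' END_WAL_LOCATION '0/3000060'
SEND_FILES ('base/1/1259', 'base/1/2610') START_WAL_LOCATION '0/2000028' NOVERIFY_CHECKSUMS
STOP_BACKUP NOWAIT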
Attachment: 0005-parallel-backup-testcase_v9.patch (application/octet-stream)
From 04a98b7ca2e824aac6f1d6fcf9b078c83c6a7ca4 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Sun, 13 Oct 2019 21:54:23 +0500
Subject: [PATCH 5/6] parallel backup - testcase
---
.../t/040_pg_basebackup_parallel.pl | 527 ++++++++++++++++++
1 file changed, 527 insertions(+)
create mode 100644 src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
diff --git a/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
new file mode 100644
index 00000000000..4ec4c1e0f6b
--- /dev/null
+++ b/src/bin/pg_basebackup/t/040_pg_basebackup_parallel.pl
@@ -0,0 +1,527 @@
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Basename qw(basename dirname);
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 95;
+
+program_help_ok('pg_basebackup');
+program_version_ok('pg_basebackup');
+program_options_handling_ok('pg_basebackup');
+
+my $tempdir = TestLib::tempdir;
+
+my $node = get_new_node('main');
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Initialize node without replication settings
+$node->init(extra => ['--data-checksums']);
+$node->start;
+my $pgdata = $node->data_dir;
+
+$node->command_fails(['pg_basebackup'],
+ 'pg_basebackup needs target directory specified');
+
+# Some Windows ANSI code pages may reject this filename, in which case we
+# quietly proceed without this bit of test coverage.
+if (open my $badchars, '>>', "$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
+{
+ print $badchars "test backup of file with non-UTF8 name\n";
+ close $badchars;
+}
+
+$node->set_replication_conf();
+system_or_bail 'pg_ctl', '-D', $pgdata, 'reload';
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup" ],
+ 'pg_basebackup fails because of WAL configuration');
+
+ok(!-d "$tempdir/backup", 'backup directory was cleaned up');
+
+# Create a backup directory that is not empty so the next command will fail
+# but leave the data directory behind
+mkdir("$tempdir/backup")
+ or BAIL_OUT("unable to create $tempdir/backup");
+append_to_file("$tempdir/backup/dir-not-empty.txt", "Some data");
+
+$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/backup", '-n' ],
+ 'failing run with no-clean option');
+
+ok(-d "$tempdir/backup", 'backup directory was created and left behind');
+rmtree("$tempdir/backup");
+
+open my $conf, '>>', "$pgdata/postgresql.conf";
+print $conf "max_replication_slots = 10\n";
+print $conf "max_wal_senders = 10\n";
+print $conf "wal_level = replica\n";
+close $conf;
+$node->restart;
+
+# Write some files to test that they are not copied.
+foreach my $filename (
+ qw(backup_label tablespace_map postgresql.auto.conf.tmp current_logfiles.tmp)
+ )
+{
+ open my $file, '>>', "$pgdata/$filename";
+ print $file "DONOTCOPY";
+ close $file;
+}
+
+# Connect to a database to create global/pg_internal.init. If this is removed
+# the test to ensure global/pg_internal.init is not copied will return a false
+# positive.
+$node->safe_psql('postgres', 'SELECT 1;');
+
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'unlogged main fork in base');
+
+# Create files that look like temporary relations to ensure they are ignored.
+my $postgresOid = $node->safe_psql('postgres',
+ q{select oid from pg_database where datname = 'postgres'});
+
+my @tempRelationFiles =
+ qw(t999_999 t9999_999.1 t999_9999_vm t99999_99999_vm.1);
+
+foreach my $filename (@tempRelationFiles)
+{
+ append_to_file("$pgdata/base/$postgresOid/$filename", 'TEMP_RELATION');
+}
+
+# Run base backup in parallel mode.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup", '-X', 'none', "-j 4" ],
+ 'pg_basebackup runs');
+ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
+
+# Permissions on backup should be default
+SKIP:
+{
+ skip "unix-style permissions not supported on Windows", 1
+ if ($windows_os);
+
+ ok(check_mode_recursive("$tempdir/backup", 0700, 0600),
+ "check backup dir permissions");
+}
+
+# Only archive_status directory should be copied in pg_wal/.
+is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/pg_wal/")) ],
+ [ sort qw(. .. archive_status) ],
+ 'no WAL files copied');
+
+# Contents of these directories should not be copied.
+foreach my $dirname (
+ qw(pg_dynshmem pg_notify pg_replslot pg_serial pg_snapshots pg_stat_tmp pg_subtrans)
+ )
+{
+ is_deeply(
+ [ sort(slurp_dir("$tempdir/backup/$dirname/")) ],
+ [ sort qw(. ..) ],
+ "contents of $dirname/ not copied");
+}
+
+# These files should not be copied.
+foreach my $filename (
+ qw(postgresql.auto.conf.tmp postmaster.opts postmaster.pid tablespace_map current_logfiles.tmp
+ global/pg_internal.init))
+{
+ ok(!-f "$tempdir/backup/$filename", "$filename not copied");
+}
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+ 'unlogged init fork in backup');
+ok( !-f "$tempdir/backup/$baseUnloggedPath",
+ 'unlogged main fork not in backup');
+
+# Temp relations should not be copied.
+foreach my $filename (@tempRelationFiles)
+{
+ ok( !-f "$tempdir/backup/base/$postgresOid/$filename",
+ "base/$postgresOid/$filename not copied");
+}
+
+# Make sure existing backup_label was ignored.
+isnt(slurp_file("$tempdir/backup/backup_label"),
+ 'DONOTCOPY', 'existing backup_label not copied');
+rmtree("$tempdir/backup");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup2", '--waldir',
+ "$tempdir/xlog2", "-j 4"
+ ],
+ 'separate xlog directory');
+ok(-f "$tempdir/backup2/PG_VERSION", 'backup was created');
+ok(-d "$tempdir/xlog2/", 'xlog directory was created');
+rmtree("$tempdir/backup2");
+rmtree("$tempdir/xlog2");
+
+$node->command_fails([ 'pg_basebackup', '-D', "$tempdir/tarbackup", '-Ft', "-j 4"],
+ 'tar format');
+
+rmtree("$tempdir/tarbackup");
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T=/foo" ],
+ '-T with empty old directory fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=" ],
+ '-T with empty new directory fails');
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4",
+ "-T/foo=/bar=/baz"
+ ],
+ '-T with multiple = fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo=/bar" ],
+ '-T with old directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-T/foo=bar" ],
+ '-T with new directory not absolute fails');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-j 4", "-Tfoo" ],
+ '-T with invalid format fails');
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+ skip "symlinks not supported on Windows", 18 if ($windows_os);
+
+ # Move pg_replslot out of $pgdata and create a symlink to it.
+ $node->stop;
+
+ # Set umask so test directories and files are created with group permissions
+ umask(0027);
+
+ # Enable group permissions on PGDATA
+ chmod_recursive("$pgdata", 0750, 0640);
+
+ rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
+ or BAIL_OUT "could not move $pgdata/pg_replslot";
+ symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot")
+ or BAIL_OUT "could not symlink to $pgdata/pg_replslot";
+
+ $node->start;
+
+# # Create a temporary directory in the system location and symlink it
+# # to our physical temp location. That way we can use shorter names
+# # for the tablespace directories, which hopefully won't run afoul of
+# # the 99 character length limit.
+ my $shorter_tempdir = TestLib::tempdir_short . "/tempdir";
+ symlink "$tempdir", $shorter_tempdir;
+
+ mkdir "$tempdir/tblspc1";
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc1 LOCATION '$shorter_tempdir/tblspc1';");
+ $node->safe_psql('postgres',
+ "CREATE TABLE test1 (a int) TABLESPACE tblspc1;");
+
+ # Create an unlogged table to test that forks other than init are not copied.
+ $node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;'
+ );
+
+ my $tblspc1UnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('tblspc1_unlogged')});
+
+ # Make sure main and init forks exist
+ ok( -f "$pgdata/${tblspc1UnloggedPath}_init",
+ 'unlogged init fork in tablespace');
+ ok(-f "$pgdata/$tblspc1UnloggedPath", 'unlogged main fork in tablespace');
+
+ # Create files that look like temporary relations to ensure they are ignored
+ # in a tablespace.
+ my @tempRelationFiles = qw(t888_888 t888888_888888_vm.1);
+ my $tblSpc1Id = basename(
+ dirname(
+ dirname(
+ $node->safe_psql(
+ 'postgres', q{select pg_relation_filepath('test1')}))));
+
+ foreach my $filename (@tempRelationFiles)
+ {
+ append_to_file(
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ 'TEMP_RELATION');
+ }
+
+ $node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4" ],
+ 'plain format with tablespaces fails without tablespace mapping');
+
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tblspc1=$tempdir/tbackup/tblspc1"
+ ],
+ 'plain format with tablespaces succeeds with tablespace mapping');
+ ok(-d "$tempdir/tbackup/tblspc1", 'tablespace was relocated');
+ opendir(my $dh, "$pgdata/pg_tblspc") or die;
+ ok( ( grep {
+ -l "$tempdir/backup1/pg_tblspc/$_"
+ and readlink "$tempdir/backup1/pg_tblspc/$_" eq
+ "$tempdir/tbackup/tblspc1"
+ } readdir($dh)),
+ "tablespace symlink was updated");
+ closedir $dh;
+
+ # Group access should be enabled on all backup files
+ ok(check_mode_recursive("$tempdir/backup1", 0750, 0640),
+ "check backup dir permissions");
+
+ # Unlogged relation forks other than init should not be copied
+ my ($tblspc1UnloggedBackupPath) =
+ $tblspc1UnloggedPath =~ /[^\/]*\/[^\/]*\/[^\/]*$/g;
+
+ ok(-f "$tempdir/tbackup/tblspc1/${tblspc1UnloggedBackupPath}_init",
+ 'unlogged init fork in tablespace backup');
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
+ 'unlogged main fork not in tablespace backup');
+
+ # Temp relations should not be copied.
+ foreach my $filename (@tempRelationFiles)
+ {
+ ok( !-f "$tempdir/tbackup/tblspc1/$tblSpc1Id/$postgresOid/$filename",
+ "[tblspc1]/$postgresOid/$filename not copied");
+
+ # Also remove temp relation files or tablespace drop will fail.
+ my $filepath =
+ "$shorter_tempdir/tblspc1/$tblSpc1Id/$postgresOid/$filename";
+
+ unlink($filepath)
+ or BAIL_OUT("unable to unlink $filepath");
+ }
+
+ ok( -d "$tempdir/backup1/pg_replslot",
+ 'pg_replslot symlink copied as directory');
+ rmtree("$tempdir/backup1");
+
+ mkdir "$tempdir/tbl=spc2";
+ $node->safe_psql('postgres', "DROP TABLE test1;");
+ $node->safe_psql('postgres', "DROP TABLE tblspc1_unlogged;");
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc1;");
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE tblspc2 LOCATION '$shorter_tempdir/tbl=spc2';");
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backup3", '-Fp', "-j 4",
+ "-T$shorter_tempdir/tbl\\=spc2=$tempdir/tbackup/tbl\\=spc2"
+ ],
+ 'mapping tablespace with = sign in path');
+ ok(-d "$tempdir/tbackup/tbl=spc2",
+ 'tablespace with = sign was relocated');
+ $node->safe_psql('postgres', "DROP TABLESPACE tblspc2;");
+ rmtree("$tempdir/backup3");
+}
+
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backupR", '-R' , '-j 4'],
+ 'pg_basebackup -R runs');
+ok(-f "$tempdir/backupR/postgresql.auto.conf", 'postgresql.auto.conf exists');
+ok(-f "$tempdir/backupR/standby.signal", 'standby.signal was created');
+my $recovery_conf = slurp_file "$tempdir/backupR/postgresql.auto.conf";
+rmtree("$tempdir/backupR");
+
+my $port = $node->port;
+like(
+ $recovery_conf,
+ qr/^primary_conninfo = '.*port=$port.*'\n/m,
+ 'postgresql.auto.conf sets primary_conninfo');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxd" , "-j 4"],
+ 'pg_basebackup runs in default xlog mode');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxd/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxd");
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxf", '-X', 'fetch' , "-j 4"],
+ 'pg_basebackup -X fetch runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxf");
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs", '-X', 'stream' , "-j 4"],
+ 'pg_basebackup -X stream runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxs/pg_wal")),
+ 'WAL files copied');
+rmtree("$tempdir/backupxs");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupnoslot", '-X',
+ 'stream', '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup -X stream runs with --no-slot');
+rmtree("$tempdir/backupnoslot");
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_sl_fail", '-X',
+ 'stream', '-S',
+ 'slot0',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with nonexistent replication slot');
+#
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C' , '-j 4'],
+ 'pg_basebackup -C fails without slot name');
+
+$node->command_fails(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backupxs_slot", '-C',
+ '-S', 'slot0',
+ '--no-slot',
+ '-j 4'
+ ],
+ 'pg_basebackup fails with -C -S --no-slot');
+
+$node->command_ok(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup -C runs');
+rmtree("$tempdir/backupxs_slot");
+
+is( $node->safe_psql(
+ 'postgres',
+ q{SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ 'slot0',
+ 'replication slot was created');
+isnt(
+ $node->safe_psql(
+ 'postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot0'}
+ ),
+ '',
+ 'restart LSN of new slot is not null');
+
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/backupxs_slot1", '-C', '-S', 'slot0', '-j 4'],
+ 'pg_basebackup fails with -C -S and a previously existing slot');
+
+$node->safe_psql('postgres',
+ q{SELECT * FROM pg_create_physical_replication_slot('slot1')});
+my $lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+is($lsn, '', 'restart LSN of new slot is null');
+$node->command_fails(
+ [ 'pg_basebackup', '-D', "$tempdir/fail", '-S', 'slot1', '-X', 'none', '-j 4'],
+ 'pg_basebackup with replication slot fails without WAL streaming');
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl", '-X',
+ 'stream', '-S', 'slot1', '-j 4'
+ ],
+ 'pg_basebackup -X stream with replication slot runs');
+$lsn = $node->safe_psql('postgres',
+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot1'}
+);
+like($lsn, qr!^0/[0-9A-Z]{7,8}$!, 'restart LSN of slot has advanced');
+rmtree("$tempdir/backupxs_sl");
+
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D', "$tempdir/backupxs_sl_R", '-X',
+ 'stream', '-S', 'slot1', '-R',
+ '-j 4'
+ ],
+ 'pg_basebackup with replication slot and -R runs');
+like(
+ slurp_file("$tempdir/backupxs_sl_R/postgresql.auto.conf"),
+ qr/^primary_slot_name = 'slot1'\n/m,
+ 'recovery conf file sets primary_slot_name');
+
+my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
+is($checksum, 'on', 'checksums are enabled');
+rmtree("$tempdir/backupxs_sl_R");
+
+# create tables to corrupt and get their relfilenodes
+my $file_corrupt1 = $node->safe_psql('postgres',
+ q{SELECT a INTO corrupt1 FROM generate_series(1,10000) AS a; ALTER TABLE corrupt1 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt1')}
+);
+my $file_corrupt2 = $node->safe_psql('postgres',
+ q{SELECT b INTO corrupt2 FROM generate_series(1,2) AS b; ALTER TABLE corrupt2 SET (autovacuum_enabled=false); SELECT pg_relation_filepath('corrupt2')}
+);
+
+# set page header and block sizes
+my $pageheader_size = 24;
+my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');
+
+# induce corruption
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+my $file;
+open $file, '+<', "$pgdata/$file_corrupt1";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*checksum verification failed/s],
+ 'pg_basebackup reports checksum mismatch');
+rmtree("$tempdir/backup_corrupt");
+
+# induce further corruption in 5 more blocks
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt1";
+for my $i (1 .. 5)
+{
+ my $offset = $pageheader_size + $i * $block_size;
+ seek($file, $offset, 0);
+ syswrite($file, "\0\0\0\0\0\0\0\0\0");
+}
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+ [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2", '-j 4'],
+ 1,
+ [qr{^$}],
+ [qr/^WARNING.*further.*failures.*will.not.be.reported/s],
+ 'pg_basebackup does not report more than 5 checksum mismatches');
+rmtree("$tempdir/backup_corrupt2");
+
+# induce corruption in a second file
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+open $file, '+<', "$pgdata/$file_corrupt2";
+seek($file, $pageheader_size, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+# do not verify checksums, should return ok
+$node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup_corrupt4", '--no-verify-checksums',
+ '-j 4'
+ ],
+ 'pg_basebackup with -k does not report checksum mismatch');
+rmtree("$tempdir/backup_corrupt4");
+
+$node->safe_psql('postgres', "DROP TABLE corrupt1;");
+$node->safe_psql('postgres', "DROP TABLE corrupt2;");
--
2.21.1 (Apple Git-122.3)
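As a side note on running this: assuming a source tree configured with
--enable-tap-tests, the new test file can be exercised on its own with
something like

make -C src/bin/pg_basebackup check PROVE_TESTS='t/040_pg_basebackup_parallel.pl'

(PROVE_TESTS narrows the run to the named test; the path is relative to the
pg_basebackup directory.)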
Attachment: 0006-parallel-backup-documentation_v9.patch (application/octet-stream)
From 9d41ac12788e5bc8b674cc24784d2cfac75d7d14 Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Fri, 14 Feb 2020 17:02:51 +0500
Subject: [PATCH 6/6] parallel backup - documentation
---
doc/src/sgml/protocol.sgml | 366 ++++++++++++++++++++++++++++
doc/src/sgml/ref/pg_basebackup.sgml | 20 ++
2 files changed, 386 insertions(+)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index f139ba02312..8c420a08dc8 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2700,6 +2700,372 @@ The commands accepted in replication mode are:
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>START_BACKUP</literal>
+ [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ]
+ [ <literal>FAST</literal> ]
+ <indexterm><primary>START_BACKUP</primary></indexterm>
+ </term>
+ <listitem>
+ <para>
+ Instructs the server to prepare for performing an on-line backup. The following
+ options are accepted:
+ <variablelist>
+ <varlistentry>
+ <term><literal>LABEL</literal> <replaceable>'label'</replaceable></term>
+ <listitem>
+ <para>
+ Sets the label of the backup. If none is specified, a backup label
+ of <literal>start backup</literal> will be used. The quoting rules
+ for the label are the same as a standard SQL string with
+ <xref linkend="guc-standard-conforming-strings"/> turned on.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>FAST</literal></term>
+ <listitem>
+ <para>
+ Request a fast checkpoint.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ <para>
+ In response to this command, the server will send out a single result set. The
+ first column contains the start position given in XLogRecPtr format, and
+ the second column contains the corresponding timeline ID.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>STOP_BACKUP</literal>
+ [ <literal>NOWAIT</literal> ]
+ <indexterm><primary>STOP_BACKUP</primary></indexterm>
+ </term>
+ <listitem>
+ <para>
+ Instructs the server to finish performing on-line backup.
+ <variablelist>
+ <varlistentry>
+ <term><literal>NOWAIT</literal></term>
+ <listitem>
+ <para>
+ By default, the backup will wait until the last required WAL
+ segment has been archived, or emit a warning if log archiving is
+ not enabled. Specifying <literal>NOWAIT</literal> disables both
+ the waiting and the warning, leaving the client responsible for
+ ensuring the required log is available.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ <para>
+ In response to this command, the server will send out a single result set. The
+ first column contains the end position given in XLogRecPtr format, the
+ second column contains the corresponding timeline ID and the third column
+ contains the contents of the <filename>backup_label</filename> file.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>LIST_TABLESPACES</literal>
+ [ <literal>PROGRESS</literal> ]
+ <indexterm><primary>LIST_TABLESPACES</primary></indexterm>
+ </term>
+ <listitem>
+ <para>
+ Instructs the server to return a list of tablespaces available in the data
+ directory.
+ <variablelist>
+ <varlistentry>
+ <term><literal>PROGRESS</literal></term>
+ <listitem>
+ <para>
+ Request information required to generate a progress report. This will
+ send back an approximate size in the header of each tablespace, which
+ can be used to calculate how far along the stream is done. This is
+ calculated by enumerating all the file sizes once before the transfer
+ is even started, and might as such have a negative impact on the
+ performance. In particular, it might take longer before the first data
+ is streamed. Since the database files can change during the backup,
+ the size is only approximate and might both grow and shrink between
+ the time of approximation and the sending of the actual files.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ <para>
+ In response to this command, the server will send one result set.
+ The result set will have one row for each tablespace. The fields in this
+ row are:
+ <variablelist>
+ <varlistentry>
+ <term><literal>spcoid</literal> (<type>oid</type>)</term>
+ <listitem>
+ <para>
+ The OID of the tablespace, or null if it's the base directory.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>spclocation</literal> (<type>text</type>)</term>
+ <listitem>
+ <para>
+ The full path of the tablespace directory, or null if it's the base
+ directory.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>size</literal> (<type>int8</type>)</term>
+ <listitem>
+ <para>
+ The approximate size of the tablespace, in kilobytes (1024 bytes),
+ if progress report has been requested; otherwise it's null.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>LIST_FILES</literal>
+ [ <literal>TABLESPACE</literal> ]
+ <indexterm><primary>LIST_FILES</primary></indexterm>
+ </term>
+ <listitem>
+ <para>
+ This command instructs the server to return a list of files available
+ in the given tablespace.
+ <variablelist>
+ <varlistentry>
+ <term><literal>TABLESPACE</literal></term>
+ <listitem>
+ <para>
+ Name of the tablespace. If it is empty or not provided, the 'base'
+ tablespace is assumed.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ <para>
+ In response to this command, the server will send out a result set. The fields
+ in this result set are:
+ <variablelist>
+ <varlistentry>
+ <term><literal>path</literal> (<type>text</type>)</term>
+ <listitem>
+ <para>
+ The path and name of the file. For a user-defined tablespace, it is an
+ absolute path on the database server; for the <filename>base</filename>
+ tablespace, it is relative to $PGDATA.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>type</literal> (<type>char</type>)</term>
+ <listitem>
+ <para>
+ A single character, identifying the type of file.
+ <itemizedlist spacing="compact" mark="bullet">
+ <listitem>
+ <para>
+ <literal>'f'</literal> - Regular file. Can be any relation or
+ non-relation file in $PGDATA.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>'d'</literal> - Directory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>'l'</literal> - Symbolic link.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>size</literal> (<type>int8</type>)</term>
+ <listitem>
+ <para>
+ The approximate size of the file, in kilobytes (1024 bytes). It's null
+ if type is 'd' or 'l'.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>mtime</literal> (<type>int8</type>)</term>
+ <listitem>
+ <para>
+ The file or directory last modification time, as seconds since the Epoch.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ <para>
+ The list will contain all files in tablespace directory, regardless of whether
+ they are PostgreSQL files or other files added to the same directory. The only
+ excluded files are:
+ <itemizedlist spacing="compact" mark="bullet">
+ <listitem>
+ <para>
+ <filename>postmaster.pid</filename>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>postmaster.opts</filename>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>pg_internal.init</filename> (found in multiple directories)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Various temporary files and directories created during the operation
+ of the PostgreSQL server, such as any file or directory beginning
+ with <filename>pgsql_tmp</filename> and temporary relations.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Unlogged relations, except for the init fork which is required to
+ recreate the (empty) unlogged relation on recovery.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>pg_wal</filename>, including subdirectories. If the backup is run
+ with WAL files included, a synthesized version of <filename>pg_wal</filename>
+ will be included, but it will only contain the files necessary for the backup
+ to work, not the rest of the contents.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>pg_dynshmem</filename>, <filename>pg_notify</filename>,
+ <filename>pg_replslot</filename>, <filename>pg_serial</filename>,
+ <filename>pg_snapshots</filename>, <filename>pg_stat_tmp</filename>, and
+ <filename>pg_subtrans</filename> are copied as empty directories (even if
+ they are symbolic links).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Files other than regular files and directories, such as symbolic
+ links (other than for the directories listed above) and special
+ device files, are skipped. (Symbolic links
+ in <filename>pg_tblspc</filename> are maintained.)
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>LIST_WAL_FILES</literal>
+ <literal>START_WAL_LOCATION</literal> <replaceable class="parameter">X/X</replaceable>
+ <literal>END_WAL_LOCATION</literal> <replaceable class="parameter">X/X</replaceable>
+ <indexterm><primary>LIST_WAL_FILES</primary></indexterm>
+ </term>
+ <listitem>
+ <para>
+ Instructs the server to return a list of WAL files available in the pg_wal directory.
+ The following options are accepted:
+ <variablelist>
+ <varlistentry>
+ <term><literal>START_WAL_LOCATION</literal></term>
+ <listitem>
+ <para>
+ The starting WAL position, as returned by the START_BACKUP command,
+ in XLogRecPtr format.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>END_WAL_LOCATION</literal></term>
+ <listitem>
+ <para>
+ The ending WAL position, as returned by the STOP_BACKUP command,
+ in XLogRecPtr format.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ <para>
+ In response to this command, the server will send out a result set in which
+ each row corresponds to a WAL file. The result set has the same fields as
+ the <literal>LIST_FILES</literal> command.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>SEND_FILES ( <replaceable class="parameter">'FILE'</replaceable> [, ...] )</literal>
+ [ <literal>START_WAL_LOCATION</literal> <replaceable class="parameter">X/X</replaceable> ]
+ [ <literal>NOVERIFY_CHECKSUMS</literal> ]
+ <indexterm><primary>SEND_FILES</primary></indexterm>
+ </term>
+ <listitem>
+ <para>
+ Instructs the server to send the contents of the requested FILE(s).
+ </para>
+ <para>
+ A clause of the form <literal>SEND_FILES ( 'FILE', 'FILE', ... ) [OPTIONS]</literal>
+ is accepted where one or more FILE(s) can be requested.
+ </para>
+ <para>
+ In response to this command, one or more CopyResponse results will be sent,
+ one for each FILE requested. The data in the CopyResponse results will be
+ a tar format (following the “ustar interchange format” specified in the
+ POSIX 1003.1-2008 standard) dump of the tablespace contents, except that
+ the two trailing blocks of zeroes specified in the standard are omitted.
+ </para>
+ <para>
+ The following options are accepted:
+ <variablelist>
+ <varlistentry>
+ <term><literal>START_WAL_LOCATION</literal></term>
+ <listitem>
+ <para>
+ The starting WAL position, as returned by the START_BACKUP command,
+ in XLogRecPtr format.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>NOVERIFY_CHECKSUMS</literal></term>
+ <listitem>
+ <para>
+ By default, checksums are verified during a base backup if they are
+ enabled. Specifying <literal>NOVERIFY_CHECKSUMS</literal> disables
+ this verification.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 29bf2f9b979..14b5fb1f5d8 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -552,6 +552,26 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><option>-j <replaceable class="parameter">n</replaceable></option></term>
+ <term><option>--jobs=<replaceable class="parameter">n</replaceable></option></term>
+ <listitem>
+ <para>
+ Create <replaceable class="parameter">n</replaceable> threads to copy
+ backup files from the database server. <application>pg_basebackup</application>
+ will open at least <replaceable class="parameter">n</replaceable> + 1
+ connections to the database. An additional connection is made if WAL
+ streaming is enabled. Therefore, the server must be configured with
+ <xref linkend="guc-max-wal-senders"/> set high enough to accommodate all
+ connections.
+ </para>
+ <para>
+ Parallel mode only works with the plain format.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
--
2.21.1 (Apple Git-122.3)
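As a quick usage sketch of the above (the directory path is arbitrary), a
4-way parallel plain-format backup would be taken with:

pg_basebackup -D /tmp/backup -Fp --jobs=4

Per the documentation above, this opens at least 5 connections (4 worker
threads plus one), or 6 when WAL streaming is enabled, so max_wal_senders has
to be sized accordingly.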
Thanks for the patches.
I have verified the reported issues with the new patches; they are fixed now.
I got another observation: if a new slot name is given without the -C option,
it leads to a server crash error.
[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 4 -D /tmp/bkp --slot
test_bkp_slot
pg_basebackup: error: could not send replication command
"START_REPLICATION": ERROR: replication slot "test_bkp_slot" does not exist
pg_basebackup: error: could not list backup files: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing data directory "/tmp/bkp"
Thanks & Regards,
Rajkumar Raghuwanshi
On Fri, Mar 13, 2020 at 9:51 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Wed, Mar 11, 2020 at 2:38 PM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:

Hi Asif,
I have started testing this feature. I have applied the v6 patch on commit
a069218163704c44a8996e7e98e765c56e2b9c8e (30 Jan).
I got a few observations, please take a look.

--if the backup fails, the backup directory is not getting removed:
[edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=9 -D
/tmp/test_bkp/bkp6
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
[edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=8 -D
/tmp/test_bkp/bkp6
pg_basebackup: error: directory "/tmp/test_bkp/bkp6" exists but is not empty

--giving a large number of jobs leads to a segmentation fault:
./pg_basebackup -p 5432 --jobs=1000 -D /tmp/t3
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
.
.
.
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: could not fork new
process for connection: Resource temporarily unavailable
could not fork new process for connection: Resource temporarily unavailable
pg_basebackup: error: failed to create thread: Resource temporarily
unavailable
Segmentation fault (core dumped)

--stack trace:
gdb -q -c core.11824 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 --jobs=1000 -D
/tmp/test_bkp/bkp10'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=140503120623360, thread_return=0x0) at
pthread_join.c:46
46 if (INVALID_NOT_TERMINATED_TD_P (pd))
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 pthread_join (threadid=140503120623360, thread_return=0x0) at
pthread_join.c:46
#1 0x0000000000408e21 in cleanup_workers () at pg_basebackup.c:2840
#2 0x0000000000403846 in disconnect_atexit () at pg_basebackup.c:316
#3 0x0000003921235a02 in __run_exit_handlers (status=1) at exit.c:78
#4 exit (status=1) at exit.c:100
#5 0x0000000000408aa6 in create_parallel_workers (backupinfo=0x1a4b8c0)
at pg_basebackup.c:2713
#6 0x0000000000407946 in BaseBackup () at pg_basebackup.c:2127
#7 0x000000000040895c in main (argc=6, argv=0x7ffd566f4718) at
pg_basebackup.c:2668

--with a tablespace in the same directory as data, parallel_backup crashed:
[edb@localhost bin]$ ./initdb -D /tmp/data
[edb@localhost bin]$ ./pg_ctl -D /tmp/data -l /tmp/logfile start
[edb@localhost bin]$ mkdir /tmp/ts
[edb@localhost bin]$ ./psql postgres
psql (13devel)
Type "help" for help.postgres=# create tablespace ts location '/tmp/ts';
CREATE TABLESPACE
postgres=# create table tx (a int) tablespace ts;
CREATE TABLE
postgres=# \q
[edb@localhost bin]$ ./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1
Segmentation fault (core dumped)

--stack trace:
[edb@localhost bin]$ gdb -q -c core.15778 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -j 2 -D /tmp/tts -T
/tmp/ts=/tmp/ts1'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000409442 in get_backup_filelist (conn=0x140cb20,
backupInfo=0x14210a0) at pg_basebackup.c:3000
3000 backupInfo->curr->next = file;
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x0000000000409442 in get_backup_filelist (conn=0x140cb20,
backupInfo=0x14210a0) at pg_basebackup.c:3000
#1 0x0000000000408b56 in parallel_backup_run (backupinfo=0x14210a0) at
pg_basebackup.c:2739
#2 0x0000000000407955 in BaseBackup () at pg_basebackup.c:2128
#3 0x000000000040895c in main (argc=7, argv=0x7ffca2910c58) at
pg_basebackup.c:2668
(gdb)

Thanks Rajkumar. I have fixed the above issues and have rebased the patch
to the latest master (b7f64c64).
(V9 of the patches are attached).

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
On Mon, Mar 16, 2020 at 11:08 AM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:

> I got another observation: if a new slot name is given without the -C
> option, it leads to a server crash error. [...]

It seems to be an expected behavior. The START_BACKUP command has been
executed, and
pg_basebackup tries to start a WAL streaming process with a non-existent
slot, which results in
an error. So the backup is aborted while terminating all other processes.
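(As a side note, and assuming the slot is meant to be created by the backup
itself, the same invocation with -C should avoid this failure, e.g.:

./pg_basebackup -p 5432 -j 4 -D /tmp/bkp -C --slot test_bkp_slot

Alternatively, the slot can be created beforehand with
pg_create_physical_replication_slot().)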
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
On Mon, Mar 16, 2020 at 11:52 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> It seems to be an expected behavior. The START_BACKUP command has been
> executed, and pg_basebackup tries to start a WAL streaming process with a
> non-existent slot, which results in an error. So the backup is aborted
> while terminating all other processes.

I think the error message can be improved; the current error message makes it
look like the database server crashed. On vanilla PG, the same scenario exits
with code 1:
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp --slot
test_bkp_slot
pg_basebackup: error: could not send replication command
"START_REPLICATION": ERROR: replication slot "test_bkp_slot" does not exist
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory "/tmp/bkp"
Hi Asif,
> Thanks Rajkumar. I have fixed the above issues and have rebased the patch
> to the latest master (b7f64c64). (V9 of the patches are attached).
I had a further review of the patches and here are a few observations:
1.
+/*
+ * stop_backup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out
pg_control
+ * file, optionally WAL segments and ending WAL location.
+ */
The comments seem outdated.
2. With parallel jobs, maxrate is now not supported. Since we are now fetching
data in multiple threads, throttling seems important here. Can you please
explain why you have disabled that?
3. As we are always fetching a single file, and as Robert suggested, let's
rename SEND_FILES to SEND_FILE instead.
4. Does this work on Windows? I mean, does pthread_create() work on Windows?
I ask this as I see that pgbench has its own implementation of
pthread_create() for WIN32 but this patch doesn't (see the shim sketch after
this list).
5. Typos:
tablspace => tablespace
safly => safely
6. parallel_backup_run() needs some comments explaining the PB_* states it
goes through.
7.
+ case PB_FETCH_REL_FILES: /* fetch files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_STOP_BACKUP;
+ free_filelist(backupinfo);
+ }
+ break;
+ case PB_FETCH_WAL_FILES: /* fetch WAL files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_BACKUP_COMPLETE;
+ }
+ break;
Why is free_filelist() not called in the PB_FETCH_WAL_FILES case?
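Regarding point 4 above, a WIN32 pthread_create()/pthread_join() shim is
typically shaped like the sketch below. This is a from-memory illustration,
not pgbench's actual code; the trampoline type and function names here are
made up:

#ifdef WIN32
#include <windows.h>
#include <errno.h>
#include <stdlib.h>

typedef HANDLE pthread_t;

/* carries the pthread-style start routine into CreateThread() */
typedef struct win32_thread_arg
{
    void       *(*start_routine) (void *);
    void       *arg;
} win32_thread_arg;

/* adapter: Windows threads return DWORD, pthread routines return void * */
static DWORD WINAPI
win32_thread_trampoline(LPVOID param)
{
    win32_thread_arg *ta = (win32_thread_arg *) param;
    void       *(*start) (void *) = ta->start_routine;
    void       *arg = ta->arg;

    free(ta);
    (void) start(arg);
    return 0;
}

static int
pthread_create(pthread_t *thread, void *attr,
               void *(*start_routine) (void *), void *arg)
{
    win32_thread_arg *ta = malloc(sizeof(win32_thread_arg));

    if (ta == NULL)
        return ENOMEM;
    ta->start_routine = start_routine;
    ta->arg = arg;
    *thread = CreateThread(NULL, 0, win32_thread_trampoline, ta, 0, NULL);
    if (*thread == NULL)
    {
        free(ta);
        return EAGAIN;
    }
    return 0;
}

static int
pthread_join(pthread_t thread, void **retval)
{
    /* retval is not supported in this sketch */
    if (WaitForSingleObject(thread, INFINITE) != WAIT_OBJECT_0)
        return EINVAL;
    CloseHandle(thread);
    return 0;
}
#endif /* WIN32 */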
Thanks
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Phone: +91 20 66449694
Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb
Hi Asif,
On further testing, I found that when taking a backup with -R, pg_basebackup
crashed. This crash is not consistently reproducible.
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a
text);"
CREATE TABLE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values
('parallel_backup with -R recovery-conf');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
Segmentation fault (core dumped)
The stack trace looks the same as in the earlier reported crash with a
tablespace.
--stack trace
[edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp
-R'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
pg_basebackup.c:3175
3175 backupinfo->curr = fetchfile->next;
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
pg_basebackup.c:3175
#1 0x0000000000408a9e in worker_run (arg=0xc1e458) at pg_basebackup.c:2715
#2 0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at
pthread_create.c:301
#3 0x00000039212e8c4d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb)
Thanks & Regards,
Rajkumar Raghuwanshi
Hi Asif,
In another scenario, the backup data is corrupted for a tablespace. Again,
this is not reproducible every time, but if I run the same set of commands I
get the same error.
[edb@localhost bin]$ ./pg_ctl -D data -l logfile start
waiting for server to start.... done
server started
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /tmp/tblsp
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp
location '/tmp/tblsp';"
CREATE TABLESPACE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb
tablespace tblsp;"
CREATE DATABASE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a
text);"
CREATE TABLE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl values
('parallel_backup with tablespace');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T
/tmp/tblsp=/tmp/tblsp_bkp --jobs 2
[edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555"
start
waiting for server to start.... done
server started
[edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from
pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
oid | spcname | spcowner | spcacl | spcoptions
-------+------------+----------+--------+------------
1663 | pg_default | 10 | |
16384 | tblsp | 10 | |
(2 rows)
[edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
psql: error: could not connect to server: FATAL:
"pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
DETAIL: File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is missing.
[edb@localhost bin]$
[edb@localhost bin]$ ls
data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
[edb@localhost bin]$ ls
/tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
ls: cannot access
/tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or
directory
Thanks & Regards,
Rajkumar Raghuwanshi
Hi Asif,
While testing further, I observed that parallel backup is not able to take a
backup of a standby server.
mkdir /tmp/archive_dir
echo "archive_mode='on'">> data/postgresql.conf
echo "archive_command='cp %p /tmp/archive_dir/%f'">> data/postgresql.conf
./pg_ctl -D data -l logs start
./pg_basebackup -p 5432 -Fp -R -D /tmp/slave
echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">>
/tmp/slave/postgresql.conf
echo "restore_command='cp /tmp/archive_dir/%f %p'">>
/tmp/slave/postgresql.conf
echo "promote_trigger_file='/tmp/failover.log'">> /tmp/slave/postgresql.conf
./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c
[edb@localhost bin]$ ./psql postgres -p 5432 -c "select
pg_is_in_recovery();"
pg_is_in_recovery
-------------------
f
(1 row)
[edb@localhost bin]$ ./psql postgres -p 5433 -c "select
pg_is_in_recovery();"
pg_is_in_recovery
-------------------
t
(1 row)
[edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
pg_basebackup: error: could not list backup files: ERROR: the standby was promoted during online backup
HINT: This means that the backup being taken is corrupt and should not be used. Try taking another online backup.
pg_basebackup: removing data directory "/tmp/bkp_s"
# the same works fine without parallel backup
[edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
[edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
/tmp/bkp_s/PG_VERSION
Thanks & Regards,
Rajkumar Raghuwanshi
On Thu, Mar 19, 2020 at 4:11 PM Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:
Show quoted text
Hi Asif,
In another scenario, backup data is corrupted for a tablespace. Again, this
is not reproducible every time, but if I run the same set of commands I get
the same error.

[edb@localhost bin]$ ./pg_ctl -D data -l logfile start
waiting for server to start.... done
server started
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /tmp/tblsp
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp
location '/tmp/tblsp';"
CREATE TABLESPACE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb
tablespace tblsp;"
CREATE DATABASE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a
text);"
CREATE TABLE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl values
('parallel_backup with tablespace');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T
/tmp/tblsp=/tmp/tblsp_bkp --jobs 2
[edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555"
start
waiting for server to start.... done
server started
[edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from
pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
oid | spcname | spcowner | spcacl | spcoptions
-------+------------+----------+--------+------------
1663 | pg_default | 10 | |
16384 | tblsp | 10 | |
(2 rows)

[edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
psql: error: could not connect to server: FATAL:
"pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
DETAIL: File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is
missing.
[edb@localhost bin]$
[edb@localhost bin]$ ls
data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
[edb@localhost bin]$ ls
/tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
ls: cannot access
/tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or
directory

Thanks & Regards,
Rajkumar Raghuwanshi

On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:

Hi Asif,
On testing further, I found that when taking a backup with -R, pg_basebackup
crashed. This crash is not consistently reproducible.

[edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a
text);"
CREATE TABLE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values
('parallel_backup with -R recovery-conf');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
Segmentation fault (core dumped)

The stack trace looks the same as for the earlier reported crash with a
tablespace.
--stack trace
[edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp
-R'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
pg_basebackup.c:3175
3175 backupinfo->curr = fetchfile->next;
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
pg_basebackup.c:3175
#1 0x0000000000408a9e in worker_run (arg=0xc1e458) at
pg_basebackup.c:2715
#2 0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at
pthread_create.c:301
#3 0x00000039212e8c4d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb)

Thanks & Regards,
Rajkumar Raghuwanshi

On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:

Hi Asif,
Thanks Rajkumar. I have fixed the above issues and have rebased the
patch to the latest master (b7f64c64).
(V9 of the patches is attached.)

I had a further review of the patches and here are a few observations:

1.
+/*
+ * stop_backup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out pg_control
+ * file, optionally WAL segments and ending WAL location.
+ */

These comments seem out-dated.
2. With parallel jobs, maxrate is now not supported. Since we are now
asking for data in multiple threads, throttling seems important here. Can
you please explain why you have disabled it?

3. As we are always fetching a single file and as Robert suggested, let's
rename SEND_FILES to SEND_FILE instead.

4. Does this work on Windows? I mean, does pthread_create() work on
Windows? I ask because pgbench has its own implementation of
pthread_create() for WIN32 but this patch doesn't.

5. Typos:
tablspace => tablespace
safly => safely

6. parallel_backup_run() needs some comments explaining the PB_* states it
goes through.

7.
+        case PB_FETCH_REL_FILES:    /* fetch files from server */
+            if (backupinfo->activeworkers == 0)
+            {
+                backupinfo->backupstate = PB_STOP_BACKUP;
+                free_filelist(backupinfo);
+            }
+            break;
+        case PB_FETCH_WAL_FILES:    /* fetch WAL files from server */
+            if (backupinfo->activeworkers == 0)
+            {
+                backupinfo->backupstate = PB_BACKUP_COMPLETE;
+            }
+            break;

Why is free_filelist() not called in the PB_FETCH_WAL_FILES case?
Thanks
--
Jeevan Chalke
On Wed, Mar 25, 2020 at 12:22 PM Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:
On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:

Hi Asif,
Thanks Rajkumar. I have fixed the above issues and have rebased the
patch to the latest master (b7f64c64).
(V9 of the patches is attached.)

I had a further review of the patches and here are a few observations:

1.
+/*
+ * stop_backup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out pg_control
+ * file, optionally WAL segments and ending WAL location.
+ */

These comments seem out-dated.
Fixed.
2. With parallel jobs, maxrate is now not supported. Since we are now
asking for data in multiple threads, throttling seems important here. Can
you please explain why you have disabled it?

3. As we are always fetching a single file and as Robert suggested, let's
rename SEND_FILES to SEND_FILE instead.
Yes, we are fetching a single file. However, SEND_FILES is still capable of
fetching multiple files in one go; hence the name.
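For what it's worth, a worker's per-file round trip might look roughly like
the sketch below. Only the SEND_FILES spelling comes from this thread; the
libpq usage and the assumption that the contents arrive as a COPY stream
(the way BASE_BACKUP sends data) are illustrative.

#include <stdio.h>
#include <libpq-fe.h>

/* Illustrative worker fetch, not code from the patch. */
static int
fetch_one_file(PGconn *conn, const char *path)
{
    char        command[1024];
    char       *copybuf;
    int         len;
    PGresult   *res;

    snprintf(command, sizeof(command), "SEND_FILES ('%s')", path);
    res = PQexec(conn, command);
    if (PQresultStatus(res) != PGRES_COPY_OUT)
    {
        fprintf(stderr, "SEND_FILES failed: %s", PQerrorMessage(conn));
        PQclear(res);
        return -1;
    }
    PQclear(res);

    /* drain the COPY stream; a real worker writes this to the target file */
    while ((len = PQgetCopyData(conn, &copybuf, 0)) > 0)
        PQfreemem(copybuf);

    /* a real client would also consume PQgetResult() here */
    return (len == -1) ? 0 : -1;    /* -1 = end of COPY, -2 = error */
}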
4. Does this work on Windows? I mean, does pthread_create() work on
Windows? I ask because pgbench has its own implementation of
pthread_create() for WIN32 but this patch doesn't.
The patch is updated to add support for the Windows platform.
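For reference, the pgbench-style shim is roughly the following shape (a
simplified sketch; the patch's actual Windows code may differ):

#ifdef WIN32
#include <windows.h>
#include <stdlib.h>

typedef HANDLE pthread_t;

typedef struct win32_thread_arg
{
    void       *(*start_routine) (void *);
    void       *arg;
} win32_thread_arg;

static DWORD WINAPI
win32_thread_run(LPVOID param)
{
    win32_thread_arg *a = (win32_thread_arg *) param;

    a->start_routine(a->arg);
    free(a);
    return 0;
}

static int
pthread_create(pthread_t *thread, void *attr,
               void *(*start_routine) (void *), void *arg)
{
    win32_thread_arg *a = malloc(sizeof(win32_thread_arg));

    (void) attr;
    if (a == NULL)
        return 1;
    a->start_routine = start_routine;
    a->arg = arg;
    *thread = CreateThread(NULL, 0, win32_thread_run, a, 0, NULL);
    return (*thread == NULL) ? 1 : 0;
}
#endif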
5. Typos:
tablspace => tablespace
safly => safely

Done.
6. parallel_backup_run() needs some comments explaining the PB_* states it
goes through.

Done.

7.
+        case PB_FETCH_REL_FILES:    /* fetch files from server */
+            if (backupinfo->activeworkers == 0)
+            {
+                backupinfo->backupstate = PB_STOP_BACKUP;
+                free_filelist(backupinfo);
+            }
+            break;
+        case PB_FETCH_WAL_FILES:    /* fetch WAL files from server */
+            if (backupinfo->activeworkers == 0)
+            {
+                backupinfo->backupstate = PB_BACKUP_COMPLETE;
+            }
+            break;

Why is free_filelist() not called in the PB_FETCH_WAL_FILES case?

Done.
The corrupted tablespace and crash, reported by Rajkumar, have been fixed.
A pointer variable remained uninitialized, which in turn caused the system
to misbehave.

Attached is the updated set of patches. AFAIK, to complete the parallel
backup feature set, the following sub-features remain:

1- Parallel backup does not work with a standby server. In parallel backup,
the server spawns multiple processes and there is no shared state being
maintained, so there is currently no way to tell the multiple processes
whether the standby was promoted during the backup after START_BACKUP was
called.
2- Throttling. Robert previously suggested that we implement throttling on
the client side. However, I found a previous discussion where it was
advocated that it be added to the backend instead [1]. So it seemed better
to reach a consensus before moving the throttle function to the client;
that's why, for the time being, I have disabled it and have asked for
suggestions on how to move forward.
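If throttling does end up on the client side, the core of it can stay quite
small. A sketch with POSIX timing calls is below (illustrative only; how to
split --max-rate across workers is itself an open design question):

#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Illustrative per-worker throttle, not the patch's code. */
typedef struct
{
    double      rate_bps;       /* this worker's share of --max-rate */
    double      start_time;     /* seconds, set by throttle_init() */
    uint64_t    bytes_done;
} Throttle;

static double
now_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void
throttle_init(Throttle *t, double rate_bps)
{
    t->rate_bps = rate_bps;
    t->start_time = now_seconds();
    t->bytes_done = 0;
}

/* call after writing each chunk; sleeps when ahead of the target rate */
static void
throttle(Throttle *t, uint64_t chunk_bytes)
{
    double      ahead;

    t->bytes_done += chunk_bytes;
    ahead = t->bytes_done / t->rate_bps - (now_seconds() - t->start_time);
    if (ahead > 0)
        usleep((useconds_t) (ahead * 1e6));
}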
It seems to me that we have to maintain a shared state in order to support
taking a backup from a standby. Also, there is a new feature for backup
progress reporting in the backend (pg_stat_progress_basebackup), recently
added via commit e65497df. For parallel backup to update these stats, a
shared state will be required as well.

Since multiple pg_basebackup instances can be running at the same time,
maintaining a shared state can become a little complex, unless we disallow
taking multiple parallel backups.
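To make the idea concrete, here is a rough sketch of what such a
shared-memory slot might carry. None of this exists in the patch yet; all
field names are invented.

#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical shared state, one slot per running parallel backup. A real
 * implementation would place this in shared memory and protect it with a
 * spinlock, or update the counters with atomics.
 */
typedef struct ParallelBackupState
{
    uint64_t    backup_id;      /* token handed out by START_BACKUP */
    int         start_tli;      /* timeline when the backup began */
    bool        promoted;       /* set on promotion; workers must abort */

    /* counters that could feed pg_stat_progress_basebackup */
    int64_t     backup_total;       /* total bytes to stream */
    int64_t     backup_streamed;    /* bytes streamed so far, all workers */
} ParallelBackupState;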
So, proceeding with this patch, I will be working on:
- throttling, to be implemented on the client side.
- adding a shared state to handle backup from a standby.

[1]: /messages/by-id/521B4B29.20009@2ndquadrant.com
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Attachments:
Thanks Asif,
I have re-verified the reported issues. Except for the standby backup, the
others are fixed.
Thanks & Regards,
Rajkumar Raghuwanshi
On Fri, Mar 27, 2020 at 11:04 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Mon, Mar 30, 2020 at 3:44 PM Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:
Thanks Asif,
I have re-verified the reported issues. Except for the standby backup, the
others are fixed.

Yes, as Asif mentioned, he is working on the standby issue and on adding
bandwidth throttling functionality to parallel backup.
It would be good to get some feedback from Robert on Asif's previous email
about the design considerations for standby server support and throttling.
I believe all the other points mentioned by Robert in this thread have been
addressed by Asif, so it would be good to hear about any other concerns
that are not addressed.
Thanks,
-- Ahsan
--
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: ahsan.hadi@highgo.ca
Hi Asif,
My colleague Kashif Zeeshan reported an issue off-list; posting it here,
please take a look.
When executing two backups at the same time, we get FATAL errors due to
max_wal_senders, but instead of exiting, the backup reports completion.
And when trying to start the server from the backup cluster, we get an
error (a defensive check is sketched after the log below).
[edb@localhost bin]$ ./pgbench -i -s 200 -h localhost -p 5432 postgres
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C2000270 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57849"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (3) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (4) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (5) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (7) created
pg_basebackup: write-ahead log end point: 0/C3000050
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C20001C0 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57848"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (3) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (4) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (5) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (7) created
pg_basebackup: write-ahead log end point: 0/C2000348
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed
[edb@localhost bin]$ ./pg_ctl -D /home/edb/Desktop/backup1/ -o "-p 5438"
start
pg_ctl: directory "/home/edb/Desktop/backup1" is not a database cluster
directory
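One way to fail fast instead of finishing a silently incomplete backup
might be a pre-flight check along these lines (an illustrative libpq
sketch, not from the patch; it also cannot see slots taken by other
concurrent clients):

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

/*
 * Clamp --jobs so the workers plus the background WAL receiver fit within
 * max_wal_senders; any later worker connection failure should then be
 * treated as fatal rather than ignored.
 */
static int
clamp_jobs(PGconn *conn, int jobs)
{
    PGresult   *res = PQexec(conn, "SHOW max_wal_senders");
    int         max_senders;

    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        PQclear(res);
        return jobs;            /* cannot tell; keep the user's value */
    }
    max_senders = atoi(PQgetvalue(res, 0, 0));
    PQclear(res);

    /* one sender slot is used by the background WAL receiver */
    if (jobs > max_senders - 1)
    {
        fprintf(stderr,
                "pg_basebackup: reducing jobs from %d to %d (max_wal_senders = %d)\n",
                jobs, max_senders - 1, max_senders);
        jobs = max_senders - 1;
    }
    return jobs;
}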
Thanks & Regards,
Rajkumar Raghuwanshi
On Mon, Mar 30, 2020 at 6:28 PM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
Hi Asif,
The backup failed with errors "error: could not connect to server: could
not look up local user ID 1000: Too many open files" when max_wal_senders
was set to 2000.
The errors were generated for the workers starting from backup worker
(1017). Please note that the backup directory was also not cleaned up after
the backup failed. (A defensive check for this limit is sketched after the
steps below.)
Steps
=======
1) Generate data in DB
./pgbench -i -s 600 -h localhost -p 5432 postgres
2) Set max_wal_senders = 2000 in postgresql.conf.
3) Generate the backup
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D
/home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 1/F1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_58692"
pg_basebackup: backup worker (0) created
….
…..
…..
pg_basebackup: backup worker (1017) created
pg_basebackup: error: could not connect to server: could not look up local
user ID 1000: Too many open files
pg_basebackup: backup worker (1018) created
pg_basebackup: error: could not connect to server: could not look up local
user ID 1000: Too many open files
…
…
…
pg_basebackup: error: could not connect to server: could not look up local
user ID 1000: Too many open files
pg_basebackup: backup worker (1989) created
pg_basebackup: error: could not create file
"/home/edb/Desktop/backup//global/4183": Too many open files
pg_basebackup: error: could not create file
"/home/edb/Desktop/backup//global/3592": Too many open files
pg_basebackup: error: could not create file
"/home/edb/Desktop/backup//global/4177": Too many open files
[edb@localhost bin]$
4) The backup directory is not cleaned
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
base pg_commit_ts pg_logical pg_notify pg_serial pg_stat
pg_subtrans pg_twophase pg_xact
global pg_dynshmem pg_multixact pg_replslot pg_snapshots pg_stat_tmp
pg_tblspc pg_wal
[edb@localhost bin]$
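With worker counts in the thousands the client also hits the per-process
open-file limit, so besides fixing the cleanup path, --jobs could be capped
up front. A POSIX-only illustrative sketch (not from the patch):

#include <stdio.h>
#include <sys/resource.h>

/*
 * Cap the worker count by the open-file limit, assuming roughly one socket
 * plus one open target file per worker and some slack for housekeeping.
 * Windows would need a different mechanism.
 */
static int
cap_jobs_by_fd_limit(int jobs)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return jobs;

    if ((rlim_t) jobs * 2 + 64 > rl.rlim_cur)
    {
        int         fit = (int) ((rl.rlim_cur > 66 ? rl.rlim_cur - 64 : 2) / 2);

        fprintf(stderr,
                "pg_basebackup: open-file limit %ld allows only about %d workers\n",
                (long) rl.rlim_cur, fit);
        return fit;
    }
    return jobs;
}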
Kashif Zeeshan
EnterpriseDB
On Thu, Apr 2, 2020 at 2:58 PM Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:
Hi Asif,
My colleague Kashif Zeeshan reported an issue off-list, posting here,
please take a look.When executing two backups at the same time, getting FATAL error due to
max_wal_senders and instead of exit Backup got completed
And when tried to start the server from the backup cluster, getting error.[edb@localhost bin]$ ./pgbench -i -s 200 -h localhost -p 5432 postgres
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C2000270 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57849"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (3) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (4) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (5) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (7) created
pg_basebackup: write-ahead log end point: 0/C3000050
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D
/home/edb/Desktop/backup1/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C20001C0 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57848"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (3) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (4) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (5) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (7) created
pg_basebackup: write-ahead log end point: 0/C2000348
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed[edb@localhost bin]$ ./pg_ctl -D /home/edb/Desktop/backup1/ -o "-p 5438"
start
pg_ctl: directory "/home/edb/Desktop/backup1" is not a database cluster
directoryThanks & Regards,
Rajkumar RaghuwanshiOn Mon, Mar 30, 2020 at 6:28 PM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
On Mon, Mar 30, 2020 at 3:44 PM Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:Thanks Asif,
I have re-verified reported issue. expect standby backup, others are
fixed.Yes As Asif mentioned he is working on the standby issue and adding
bandwidth throttling functionality to parallel backup.It would be good to get some feedback on Asif previous email from Robert
on the design considerations for stand-by server support and throttling. I
believe all the other points mentioned by Robert in this thread are
addressed by Asif so it would be good to hear about any other concerns that
are not addressed.Thanks,
-- Ahsan
Thanks & Regards,
Rajkumar RaghuwanshiOn Fri, Mar 27, 2020 at 11:04 PM Asif Rehman <asifr.rehman@gmail.com>
wrote:On Wed, Mar 25, 2020 at 12:22 PM Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:Hi Asif,
While testing further I observed parallel backup is not able to take
backup of standby server.mkdir /tmp/archive_dir
echo "archive_mode='on'">> data/postgresql.conf
echo "archive_command='cp %p /tmp/archive_dir/%f'">>
data/postgresql.conf./pg_ctl -D data -l logs start
./pg_basebackup -p 5432 -Fp -R -D /tmp/slaveecho "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">>
/tmp/slave/postgresql.conf
echo "restore_command='cp /tmp/archive_dir/%f %p'">>
/tmp/slave/postgresql.conf
echo "promote_trigger_file='/tmp/failover.log'">>
/tmp/slave/postgresql.conf./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c
[edb@localhost bin]$ ./psql postgres -p 5432 -c "select
pg_is_in_recovery();"
pg_is_in_recovery
-------------------
f
(1 row)[edb@localhost bin]$ ./psql postgres -p 5433 -c "select
pg_is_in_recovery();"
pg_is_in_recovery
-------------------
t
(1 row)*[edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs
6pg_basebackup: error: could not list backup files: ERROR: the standby was
promoted during online backupHINT: This means that the backup being taken
is corrupt and should not be used. Try taking another online
backup.pg_basebackup: removing data directory "/tmp/bkp_s"*#same is working fine without parallel backup
[edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
[edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
/tmp/bkp_s/PG_VERSIONThanks & Regards,
Rajkumar RaghuwanshiOn Thu, Mar 19, 2020 at 4:11 PM Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:Hi Asif,
In another scenarios, bkp data is corrupted for tablespace. again
this is not reproducible everytime,
but If I am running the same set of commands I am getting the same
error.[edb@localhost bin]$ ./pg_ctl -D data -l logfile start
waiting for server to start.... done
server started
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /tmp/tblsp
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace
tblsp location '/tmp/tblsp';"
CREATE TABLESPACE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create database
testdb tablespace tblsp;"
CREATE DATABASE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl
(a text);"
CREATE TABLE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl
values ('parallel_backup with tablespace');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T
/tmp/tblsp=/tmp/tblsp_bkp --jobs 2
[edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p
5555" start
waiting for server to start.... done
server started
[edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from
pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
oid | spcname | spcowner | spcacl | spcoptions
-------+------------+----------+--------+------------
1663 | pg_default | 10 | |
16384 | tblsp | 10 | |
(2 rows)[edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from
testtbl";
psql: error: could not connect to server: FATAL:
"pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
DETAIL: File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is
missing.
[edb@localhost bin]$
[edb@localhost bin]$ ls
data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
[edb@localhost bin]$ ls
/tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
ls: cannot access
/tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or
directory

Thanks & Regards,
Rajkumar Raghuwanshi

On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:

Hi Asif,
On testing further, I found that when taking a backup with -R,
pg_basebackup crashed. This crash is not consistently reproducible.

[edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test
(a text);"
CREATE TABLE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test
values ('parallel_backup with -R recovery-conf');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D
/tmp/test_bkp/bkp -R
Segmentation fault (core dumped)

The stack trace looks the same as for the earlier reported crash with
tablespace.
--stack trace
[edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 -j 2 -D
/tmp/test_bkp/bkp -R'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
pg_basebackup.c:3175
3175 backupinfo->curr = fetchfile->next;
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
pg_basebackup.c:3175
#1 0x0000000000408a9e in worker_run (arg=0xc1e458) at
pg_basebackup.c:2715
#2 0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at
pthread_create.c:301
#3 0x00000039212e8c4d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb)

Thanks & Regards,
Rajkumar Raghuwanshi

On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:

Hi Asif,
Thanks Rajkumar. I have fixed the above issues and have rebased the patch
to the latest master (b7f64c64). (V9 of the patches are attached.)

I had a further review of the patches and here are my few observations:

1.
+/*
+ * stop_backup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out pg_control
+ * file, optionally WAL segments and ending WAL location.
+ */

Comments seem outdated.

Fixed.

2. With parallel jobs, maxrate is now not supported. Since we are now
asking for data in multiple threads, throttling seems important here. Can
you please explain why you have disabled that?

3. As we are always fetching a single file and as Robert suggested, let's
rename SEND_FILES to SEND_FILE instead.

Yes, we are fetching a single file. However, SEND_FILES is still capable
of fetching multiple files in one go, that's why the name.

4. Does this work on Windows? I mean, does pthread_create() work on
Windows? I ask this as I see that pgbench has its own implementation of
pthread_create() for WIN32 but this patch doesn't.

The patch is updated to add support for the Windows platform.

5. Typos:
tablspace => tablespace
safly => safely

Done.

6. parallel_backup_run() needs some comments explaining the states it goes
through (the PB_* states).

7.
+ case PB_FETCH_REL_FILES: /* fetch files from server */
+     if (backupinfo->activeworkers == 0)
+     {
+         backupinfo->backupstate = PB_STOP_BACKUP;
+         free_filelist(backupinfo);
+     }
+     break;
+ case PB_FETCH_WAL_FILES: /* fetch WAL files from server */
+     if (backupinfo->activeworkers == 0)
+     {
+         backupinfo->backupstate = PB_BACKUP_COMPLETE;
+     }
+     break;

Done.

Why is free_filelist() not called in the PB_FETCH_WAL_FILES case?

Done.
The corrupted tablespace and crash, reported by Rajkumar, have been fixed.
A pointer variable remained uninitialized, which in turn caused the system
to misbehave.

Attached is the updated set of patches. AFAIK, to complete the parallel
backup feature set, there remain three sub-features:

1- parallel backup does not work with a standby server. In parallel backup,
the server spawns multiple processes and there is no shared state being
maintained. So currently, there is no way to tell multiple processes if the
standby was promoted during the backup since START_BACKUP was called.

2- throttling. Robert previously suggested that we implement throttling on
the client-side. However, I found a previous discussion where it was
advocated to be added to the backend instead[1]. So, it was better to have
a consensus before moving the throttle function to the client. That's why
for the time being I have disabled it and have asked for suggestions on it
to move forward.

It seems to me that we have to maintain a shared state in order to support
taking backup from standby. Also, there is a new feature recently committed
for backup progress reporting in the backend
(pg_stat_progress_basebackup). This functionality was recently added via
this commit ID: e65497df. For parallel backup to update these stats, a
shared state will be required.

Since multiple pg_basebackup instances can be running at the same time,
maintaining a shared state can become a little complex, unless we disallow
taking multiple parallel backups.

So proceeding on with this patch, I will be working on:
- throttling to be implemented on the client-side.
- adding a shared state to handle backup from the standby.

[1] /messages/by-id/521B4B29.20009@2ndquadrant.com

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

--
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: ahsan.hadi@highgo.ca
--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager
EnterpriseDB Corporation
The Enterprise Postgres Company
On Fri, Mar 27, 2020 at 1:34 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
Yes, we are fetching a single file. However, SEND_FILES is still capable of fetching multiple files in one
go, that's why the name.
I don't see why it should work that way. If we're fetching individual
files, why have an unused capability to fetch multiple files?
1- parallel backup does not work with a standby server. In parallel backup, the server
spawns multiple processes and there is no shared state being maintained. So currently,
no way to tell multiple processes if the standby was promoted during the backup since
the START_BACKUP was called.
Why would you need to do that? As long as the process where
STOP_BACKUP can do the check, that seems good enough.
2- throttling. Robert previously suggested that we implement throttling on the client-side.
However, I found a previous discussion where it was advocated to be added to the
backend instead[1]. So, it was better to have a consensus before moving the throttle
function to the client. That’s why for the time being I have disabled it and have
asked for suggestions on it to move forward.

It seems to me that we have to maintain a shared state in order to support taking backup
from standby. Also, there is a new feature recently committed for backup progress
reporting in the backend (pg_stat_progress_basebackup). This functionality was recently
added via this commit ID: e65497df. For parallel backup to update these stats, a shared
state will be required.
I've come around to the view that a shared state is a good idea and
that throttling on the server-side makes more sense. I'm not clear on
whether we need shared state only for throttling or whether we need it
for more than that. Another possible reason might be for the
progress-reporting stuff that just got added.
Since multiple pg_basebackup can be running at the same time, maintaining a shared state
can become a little complex, unless we disallow taking multiple parallel backups.
I do not see why it would be necessary to disallow taking multiple
parallel backups. You just need to have multiple copies of the shared
state and a way to decide which one to use for any particular backup.
I guess that is a little complex, but only a little.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Apr 2, 2020 at 7:30 AM Kashif Zeeshan <
kashif.zeeshan@enterprisedb.com> wrote:
The backup failed with errors "error: could not connect to server: could
not look up local user ID 1000: Too many open files" when the
max_wal_senders was set to 2000.
The errors were generated for the workers starting from backup worker 1017.
It wasn't the fact that you set max_wal_senders to 2000. It was the fact
that you specified 1990 parallel workers. By so doing, you overloaded the
machine, which is why everything failed. That's to be expected.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Apr 2, 2020 at 4:48 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 2, 2020 at 7:30 AM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:

The backup failed with errors "error: could not connect to server: could
not look up local user ID 1000: Too many open files" when the
max_wal_senders was set to 2000.
The errors were generated for the workers starting from backup worker 1017.

It wasn't the fact that you set max_wal_senders to 2000. It was the fact
that you specified 1990 parallel workers. By so doing, you overloaded the
machine, which is why everything failed. That's to be expected.

Thanks a lot, Robert.
In this case the backup folder was not emptied even though the backup
failed; the cleanup should be done in this case too.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager
EnterpriseDB Corporation
The Enterprise Postgres Company
On Thu, Apr 2, 2020 at 7:55 AM Kashif Zeeshan
<kashif.zeeshan@enterprisedb.com> wrote:
Thanks a lot, Robert.
In this case the backup folder was not emptied even though the backup failed; the cleanup should be done in this case too.
Does it fail to clean up the backup folder in all cases where the
backup failed, or just in this case?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Apr 2, 2020 at 6:23 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 2, 2020 at 7:55 AM Kashif Zeeshan
<kashif.zeeshan@enterprisedb.com> wrote:

Thanks a lot, Robert.
In this case the backup folder was not emptied even though the backup
failed; the cleanup should be done in this case too.
Does it fail to clean up the backup folder in all cases where the
backup failed, or just in this case?
The cleanup is done in the cases I have seen so far with base pg_basebackup
functionality (not including the parallel backup feature), with the message
"pg_basebackup: removing contents of data directory".
A similar case reported by Rajkumar was also fixed for parallel backup,
where the contents of the backup folder were not cleaned up after the error.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager
EnterpriseDB Corporation
The Enterprise Postgres Company
On Thu, Apr 2, 2020 at 9:46 AM Kashif Zeeshan <
kashif.zeeshan@enterprisedb.com> wrote:
Does it fail to clean up the backup folder in all cases where the
backup failed, or just in this case?
The cleanup is done in the cases I have seen so far with base
pg_basebackup functionality (not including the parallel backup feature),
with the message "pg_basebackup: removing contents of data directory".
A similar case reported by Rajkumar was also fixed for parallel backup,
where the contents of the backup folder were not cleaned up after the error.
What I'm saying is that it's unclear whether there's a bug here or whether
it just failed because of the very extreme test scenario you created.
Spawning >1000 processes on a small machine can easily make a lot of things
fail.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Apr 2, 2020 at 4:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 27, 2020 at 1:34 PM Asif Rehman <asifr.rehman@gmail.com>
wrote:

Yes, we are fetching a single file. However, SEND_FILES is still capable
of fetching multiple files in one
go, that's why the name.
I don't see why it should work that way. If we're fetching individual
files, why have an unused capability to fetch multiple files?
Okay, I will rename it and modify the function to send a single file as well.
1- parallel backup does not work with a standby server. In parallel
backup, the server
spawns multiple processes and there is no shared state being maintained.
So currently,
no way to tell multiple processes if the standby was promoted during the
backup since
the START_BACKUP was called.
Why would you need to do that? As long as the process where
STOP_BACKUP can do the check, that seems good enough.
Yes, but the user will get the error only after the STOP_BACKUP, not while
the backup is
in progress. So if the backup is a large one, early error detection would
be much beneficial.
This is the current behavior of non-parallel backup as well.
2- throttling. Robert previously suggested that we implement throttling
on the client-side.
However, I found a previous discussion where it was advocated to be
added to the
backend instead[1].
So, it was better to have a consensus before moving the throttle
function to the client.
That’s why for the time being I have disabled it and have asked for
suggestions on it
to move forward.
It seems to me that we have to maintain a shared state in order to
support taking backup
from standby. Also, there is a new feature recently committed for backup
progress
reporting in the backend (pg_stat_progress_basebackup). This
functionality was recently
added via this commit ID: e65497df. For parallel backup to update these
stats, a shared
state will be required.
I've come around to the view that a shared state is a good idea and
that throttling on the server-side makes more sense. I'm not clear on
whether we need shared state only for throttling or whether we need it
for more than that. Another possible reason might be for the
progress-reporting stuff that just got added.
Okay, then I will add the shared state. And since we are adding the shared
state, we can use
that for throttling, progress-reporting and standby early error checking.
Since multiple pg_basebackup can be running at the same time,
maintaining a shared state
can become a little complex, unless we disallow taking multiple parallel
backups.
I do not see why it would be necessary to disallow taking multiple
parallel backups. You just need to have multiple copies of the shared
state and a way to decide which one to use for any particular backup.
I guess that is a little complex, but only a little.
There are two possible options:
(1) The server may generate a unique ID, i.e. BackupID=<unique_string>, OR
(2) (Preferred option) Use the WAL start location as the BackupID.

This BackupID should be given back as a response to the start backup
command. All client workers must append this ID to all parallel backup
replication commands, so that we can use this identifier to search for
that particular backup. Does that sound good?
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
Why would you need to do that? As long as the process where
STOP_BACKUP can do the check, that seems good enough.

Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
in progress. So if the backup is a large one, early error detection would be much beneficial.
This is the current behavior of non-parallel backup as well.
Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.
TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.
Okay, then I will add the shared state. And since we are adding the shared state, we can use
that for throttling, progress-reporting and standby early error checking.
Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.
There are two possible options:
(1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
(2) (Preferred Option) Use the WAL start location as the BackupID.

This BackupID should be given back as a response to start backup command. All client workers
must append this ID to all parallel backup replication commands. So that we can use this identifier
to search for that particular backup. Does that sound good?
Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.
I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:
START_BACKUP unique_backup_id [options]...
And then others do:
JOIN_BACKUP unique_backup_id
My thought is that you will have a number of shared memory structures
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.
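As a rough sketch of that scheme (the names, slot layout, and omitted
locking here are illustrative assumptions, not code from the patch):

/* Hypothetical fixed array of per-backup shared slots, one per possible
 * WAL sender; an empty backupid marks a free slot. Locking around slot
 * acquisition is omitted for brevity. */
typedef struct
{
    char        backupid[NAMEDATALEN];
    /* ... per-backup shared state (counters, flags, etc.) ... */
} BackupSlot;

static BackupSlot *BackupSlots;     /* shared memory array, max_wal_senders entries */

/* START_BACKUP: claim a free slot and stamp it with the backup ID. */
static BackupSlot *
AllocateBackupSlot(const char *backupid)
{
    for (int i = 0; i < max_wal_senders; i++)
    {
        if (BackupSlots[i].backupid[0] == '\0')
        {
            strlcpy(BackupSlots[i].backupid, backupid, NAMEDATALEN);
            return &BackupSlots[i];
        }
    }
    return NULL;                    /* all slots in use */
}

/* JOIN_BACKUP: find the leader's slot by name. */
static BackupSlot *
FindBackupSlot(const char *backupid)
{
    for (int i = 0; i < max_wal_senders; i++)
    {
        if (strcmp(BackupSlots[i].backupid, backupid) == 0)
            return &BackupSlots[i];
    }
    return NULL;
}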
If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader process's PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.
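For concreteness, a minimal sketch of that ID generation (the buffer
handling is an assumption; pg_strong_random() is the existing API):

/* Hypothetical: derive a 32-character hex backup ID from 16 random bytes. */
char        backupid[33];
uint8       rawid[16];

if (!pg_strong_random(rawid, sizeof(rawid)))
    ereport(ERROR,
            (errmsg("could not generate random backup ID")));

for (int i = 0; i < 16; i++)
    snprintf(&backupid[i * 2], 3, "%02x", rawid[i]);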
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com>
wrote:

Why would you need to do that? As long as the process where
STOP_BACKUP can do the check, that seems good enough.

Yes, but the user will get the error only after the STOP_BACKUP, not
while the backup is
in progress. So if the backup is a large one, early error detection
would be much beneficial.
This is the current behavior of non-parallel backup as well.
Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.

TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.
Non-parallel backup already does the early error checking. I only intended
to make parallel behave the same as non-parallel here. So, I agree with
you that the behavior of parallel backup should be consistent with the
non-parallel one. Please see the code snippet below from
basebackup.c:sendDir()
/*
 * Check if the postmaster has signaled us to exit, and abort with an
 * error in that case. The error handler further up will call
 * do_pg_abort_backup() for us. Also check that if the backup was
 * started while still in recovery, the server wasn't promoted.
 * do_pg_stop_backup() will check that too, but it's better to stop
 * the backup early than continue to the end and fail there.
 */
CHECK_FOR_INTERRUPTS();
if (RecoveryInProgress() != backup_started_in_recovery)
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("the standby was promoted during online backup"),
             errhint("This means that the backup being taken is corrupt "
                     "and should not be used. "
                     "Try taking another online backup.")));
Okay, then I will add the shared state. And since we are adding the
shared state, we can use
that for throttling, progress-reporting and standby early error checking.
Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.

There are two possible options:
(1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
(2) (Preferred Option) Use the WAL start location as the BackupID.

This BackupID should be given back as a response to start backup
command. All client workers
must append this ID to all parallel backup replication commands. So that
we can use this identifier
to search for that particular backup. Does that sound good?
Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.

I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:

START_BACKUP unique_backup_id [options]...
And then others do:
JOIN_BACKUP unique_backup_id
My thought is that you will have a number of shared memory structure
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.

If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader process's PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.
Okay, so if we add another replication command ‘JOIN_BACKUP unique_backup_id’
to let workers find the relevant shared state, there won't be any need to
change the grammar for any other command. START_BACKUP can return the
unique_backup_id in the result set.
I am thinking of the following struct for the shared state:

typedef struct
{
    char        backupid[NAMEDATALEN];      /* hex-encoded random ID; hash key */
    XLogRecPtr  startptr;                   /* backup start WAL location */
    slock_t     lock;                       /* protects throttling_counter */
    int64       throttling_counter;         /* total bytes sent by all workers */
    bool        backup_started_in_recovery; /* was the server in recovery? */
} BackupSharedState;
The shared state structure entries would be maintained by a shared hash
table. There will be one structure per parallel backup. Since a single
parallel backup can engage more than one WAL sender, I think
max_wal_senders might be a little too much; perhaps max_wal_senders/2,
since there will be at least 2 connections per parallel backup?
Alternatively, we can add a new GUC that defines the maximum number of
concurrent parallel backups, e.g. ‘max_concurrent_backups_allowed = 10’
perhaps.

The key would be “backupid=hex_encode(pg_strong_random(16))”.
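A minimal sketch of how that shared hash table could be set up (function
and table names are placeholders; the sizing and hash flags are
assumptions):

/* Hypothetical shared hash table keyed by the backupid string. */
static HTAB *BackupStateHash;

void
BackupStateShmemInit(void)
{
    HASHCTL     info;

    MemSet(&info, 0, sizeof(info));
    info.keysize = NAMEDATALEN;             /* backupid is the key */
    info.entrysize = sizeof(BackupSharedState);

    BackupStateHash = ShmemInitHash("Parallel Backup State",
                                    10, 10, /* BACKUP_HASH_SIZE */
                                    &info, HASH_ELEM);
}

/* JOIN_BACKUP 'backup_id': look up the leader's shared state. */
BackupSharedState *
LookupBackupState(const char *backupid)
{
    char        key[NAMEDATALEN];

    strlcpy(key, backupid, NAMEDATALEN);    /* NUL-terminated key */
    return (BackupSharedState *) hash_search(BackupStateHash, key,
                                             HASH_FIND, NULL);
}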
Checking for Standby Promotion:
At the START_BACKUP command, we initialize
BackupSharedState.backup_started_in_recovery and keep checking it whenever
send_file() is called to send a new file.
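A sketch of what that per-file check might look like, mirroring the
existing sendDir() check quoted earlier (the 'state' pointer to the
attached BackupSharedState is an assumption):

/* Hypothetical early-abort check at the top of send_file(). */
CHECK_FOR_INTERRUPTS();
if (RecoveryInProgress() != state->backup_started_in_recovery)
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("the standby was promoted during online backup")));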
Throttling:
BackupSharedState.throttling_counter - the throttling logic remains the
same as for non-parallel backup, with the exception that multiple threads
will now be updating it. In parallel backup, this counter represents the
overall bytes that have been transferred, and workers sleep once they have
exceeded the limit. Hence, the shared state carries a lock so the
throttling value can be updated atomically.
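For illustration, a sketch of the shared-counter update (names are
assumed; the sleep-duration computation from the existing throttle() in
basebackup.c is elided):

/* Hypothetical: add this worker's bytes under the spinlock and decide
 * whether the throttling sample threshold has been crossed. */
static void
parallel_throttle(BackupSharedState *state, size_t increment)
{
    bool        need_sleep = false;

    SpinLockAcquire(&state->lock);
    state->throttling_counter += increment;
    if (state->throttling_counter >= throttling_sample)
    {
        state->throttling_counter %= throttling_sample;
        need_sleep = true;
    }
    SpinLockRelease(&state->lock);

    if (need_sleep)
    {
        /* wait on the latch just long enough to hold the aggregate
         * transfer rate at maxrate, as the existing throttle() does */
    }
}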
Progress Reporting:
I think we should add progress-reporting for parallel backup as a separate
patch, though. The relevant entries for progress-reporting, such as
‘backup_total’ and ‘backup_streamed’, would then be added to this
structure as well.
Grammar:
There is a change in the result set returned by the START_BACKUP command:
unique_backup_id is added. Additionally, the JOIN_BACKUP replication
command is added, and SEND_FILES has been renamed to SEND_FILE. There are
no other changes to the grammar.
START_BACKUP [LABEL '<label>'] [FAST]
- returns startptr, tli, backup_label, unique_backup_id
STOP_BACKUP [NOWAIT]
- returns startptr, tli, backup_label
JOIN_BACKUP ‘unique_backup_id’
- attaches a shared state identified by ‘unique_backup_id’ to a backend
process.
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
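To make the flow concrete, a hypothetical client-side sequence using libpq
(column position, option spellings, and the file name are illustrative
assumptions based on the grammar above; error checks omitted):

/* Leader: start the backup and pick up the unique_backup_id. */
PGresult   *res = PQexec(leader_conn, "START_BACKUP LABEL 'pb1' FAST");
char       *backupid = pg_strdup(PQgetvalue(res, 0, 3)); /* 4th column */
PQclear(res);

/* Each worker: attach to the leader's shared state, then pull files. */
char        cmd[1024];
snprintf(cmd, sizeof(cmd), "JOIN_BACKUP '%s'", backupid);
PQclear(PQexec(worker_conn, cmd));

snprintf(cmd, sizeof(cmd), "SEND_FILE ('%s')", "base/1/16384");
PQclear(PQexec(worker_conn, cmd));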
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Hi Asif,

When a non-existent slot is used with a tablespace, the correct error is
displayed, but the backup folder is not cleaned up, which leaves a corrupt
backup.
Steps
=======
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=# create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 2 -D /home/edb/Desktop/backup/
-T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command
"START_REPLICATION": ERROR: replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$
backup folder not cleaned
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label global pg_dynshmem pg_ident.conf pg_multixact
pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact
postgresql.conf
base pg_commit_ts pg_hba.conf pg_logical pg_notify
pg_serial pg_stat pg_subtrans pg_twophase pg_wal
postgresql.auto.conf
[edb@localhost bin]$
If the same case is executed without the parallel backup patch, the
backup folder is cleaned up after the error is displayed.
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -T
/home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command
"START_REPLICATION": ERROR: replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory "/home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undone
--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager
EnterpriseDB Corporation
The Enterprise Postgres Company
Asif,
After the recent backup manifest addition, the patches need a rebase, and
a few things need reconsideration, such as making sure that parallel
backup creates the manifest file correctly.
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Hi Asif,

A similar case is when the DB server is shut down while the parallel
backup is in progress: the correct error is displayed, but the backup
folder is not cleaned up, leaving a corrupt backup. I think one bug fix
will solve all these cases where cleanup is not done when a parallel
backup fails.
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57337"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$
When the same case is executed with pg_basebackup without the parallel
backup patch, proper cleanup is done.
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_5590"
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory
"/home/edb/Desktop/backup/"
[edb@localhost bin]$
Thanks
--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager
EnterpriseDB Corporation
The Enterprise Postgres Company
Hi,
Thanks, Kashif and Rajkumar. I have fixed the reported issues.
I have added the shared state as previously described. The new grammar
changes
are as follows:
START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
- This will generate a unique backupid using pg_strong_random(16),
  hex-encode it, and return it in the result set.
- It will also create a shared state and add it to the hashtable. The hash
  table size is set to BACKUP_HASH_SIZE=10; since the hashtable can expand
  dynamically, I think that is a sufficient initial size. max_wal_senders
  is not used, because it can be set to quite a large value.

JOIN_BACKUP 'backup_id'
- finds 'backup_id' in the hashtable and attaches it to the server process.

SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
- renamed SEND_FILES to SEND_FILE
- removed START_WAL_LOCATION from this command because 'startptr' is now
  accessible through the shared state.
There is no change in other commands:
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
The current patches (v11) have been rebased to the latest master. The
backup manifest is enabled by default, so I have disabled it for parallel
backup mode and have added a warning, so that the user is aware of it and
does not expect a manifest in the backup.
On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
kashif.zeeshan@enterprisedb.com> wrote:
On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
kashif.zeeshan@enterprisedb.com> wrote:Hi Asif
When a non-existent slot is used with tablespace then correct error is
displayed but then the backup folder is not cleaned and leaves a corrupt
backup.Steps
=======edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=# create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 2 -D
/home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command
"START_REPLICATION": ERROR: replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$backup folder not cleaned
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label global pg_dynshmem pg_ident.conf pg_multixact
pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact
postgresql.conf
base pg_commit_ts pg_hba.conf pg_logical pg_notify
pg_serial pg_stat pg_subtrans pg_twophase pg_wal
postgresql.auto.conf
[edb@localhost bin]$If the same case is executed without the parallel backup patch then the
backup folder is cleaned after the error is displayed.[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -T
/home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command
"START_REPLICATION": ERROR: replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory "/home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undone

Hi Asif

A similar case is when the DB server is shut down while the parallel
backup is in progress: the correct error is displayed, but the backup
folder is not cleaned up, leaving a corrupt backup. I think one bug fix
will solve all these cases where cleanup is not done when a parallel
backup fails.

[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57337"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$

The same case executed with pg_basebackup without the parallel backup
patch does proper cleanup:

[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_5590"
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory
"/home/edb/Desktop/backup/"
[edb@localhost bin]$

Thanks
On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:

Why would you need to do that? As long as the process where STOP_BACKUP
runs can do the check, that seems good enough.

Yes, but the user will get the error only after STOP_BACKUP, not while
the backup is in progress. So if the backup is a large one, early error
detection would be much more beneficial. This is the current behavior of
non-parallel backup as well.

Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it is
undesirable. If you want to fix that problem, do it on a separate thread
in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.

TBH, fixing this doesn't seem like an urgent problem to me. The current
situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is also
worth considering whether the right fix is to figure out how to make
that case actually work, rather than just making it fail quicker. I
don't currently understand the reason for the prohibition, so I can't
express an intelligent opinion on what the right answer is here, but it
seems like it ought to be investigated before somebody goes and builds a
bunch of infrastructure to make the error more timely.

Non-parallel backup already does the early error checking. I only
intended to make parallel backup behave the same as non-parallel here.
So, I agree with you that the behavior of parallel backup should be
consistent with the non-parallel one. Please see the code snippet below
from basebackup.c:sendDir():
    /*
     * Check if the postmaster has signaled us to exit, and abort with an
     * error in that case. The error handler further up will call
     * do_pg_abort_backup() for us. Also check that if the backup was
     * started while still in recovery, the server wasn't promoted.
     * do_pg_stop_backup() will check that too, but it's better to stop
     * the backup early than continue to the end and fail there.
     */
    CHECK_FOR_INTERRUPTS();
    if (RecoveryInProgress() != backup_started_in_recovery)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("the standby was promoted during online backup"),
                 errhint("This means that the backup being taken is corrupt "
                         "and should not be used. "
                         "Try taking another online backup.")));
Okay, then I will add the shared state. And since we are adding the
shared state, we can use that for throttling, progress reporting, and
early standby-promotion error checking.
Please propose a grammar here for all the new replication commands you
plan to add before going and implementing everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.

There are two possible options:
(1) The server may generate a unique ID, i.e. BackupID=<unique_string>, OR
(2) (preferred option) use the WAL start location as the BackupID.

This BackupID should be returned in the response to the start-backup
command. All client workers must append this ID to all parallel backup
replication commands, so that we can use this identifier to find that
particular backup. Does that sound good?
Using the WAL start location as the backup ID seems like it might be
problematic -- couldn't a single checkpoint end up as the start location
for multiple backups started at the same time? Whether that's possible
now or not, it seems unwise to hard-wire that assumption into the wire
protocol.

I was thinking that perhaps the client should generate a unique backup
ID, e.g. the leader does:

START_BACKUP unique_backup_id [options]...

And then the others do:

JOIN_BACKUP unique_backup_id
My thought is that you will have a number of shared memory structures
equal to max_wal_senders, each one large enough to hold the shared state
for one backup. The shared state will include a
char[NAMEDATALEN-or-something] which will be used to hold the backup ID.
START_BACKUP would allocate one and copy the name into it; JOIN_BACKUP
would search for one by name.

If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID, and
clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader process's PID. I think we should generate it
using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious user
trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A user
with enough access to do that sort of thing can probably cause a lot
worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.

Okay, so if we are to add another replication command 'JOIN_BACKUP
unique_backup_id' to make workers find the relevant shared state, there
won't be any need to change the grammar for any other command. The
START_BACKUP command can return the unique_backup_id in the result set.

I am thinking of the following struct for the shared state:
typedef struct
{
    char        backupid[NAMEDATALEN];
    XLogRecPtr  startptr;
    slock_t     lock;
    int64       throttling_counter;
    bool        backup_started_in_recovery;
} BackupSharedState;
The shared state structure entries would be maintained by a shared hash
table. There will be one structure per parallel backup. Since a single
parallel backup can engage more than one WAL sender, I think
max_wal_senders might be a little too much; perhaps max_wal_senders/2,
since there will be at least 2 connections per parallel backup?
Alternatively, we can add a new GUC that defines the maximum number of
concurrent parallel backups, i.e. 'max_concurrent_backups_allowed = 10'
perhaps, or we can make it user-configurable.

The key would be "backupid = hex_encode(pg_strong_random(16))".
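For what that key generation could look like on the server side, a
minimal sketch (assuming the existing pg_strong_random() and the
hex_encode() helper from encode.c; the function name here is made up):

#include "postgres.h"
#include "utils/builtins.h"     /* hex_encode() */

#define BACKUP_ID_RAW_LEN 16

/* Fill 'backupid' (at least 2 * BACKUP_ID_RAW_LEN + 1 bytes, e.g. a
 * NAMEDATALEN buffer) with a random, hex-encoded identifier. */
static void
generate_backup_id(char *backupid)
{
    uint8       raw[BACKUP_ID_RAW_LEN];

    if (!pg_strong_random(raw, sizeof(raw)))
        ereport(ERROR,
                (errmsg("could not generate random backup id")));

    /* hex_encode() does not NUL-terminate its output. */
    hex_encode((const char *) raw, sizeof(raw), backupid);
    backupid[2 * BACKUP_ID_RAW_LEN] = '\0';
}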
Checking for standby promotion:
At the START_BACKUP command, we initialize
BackupSharedState.backup_started_in_recovery and keep checking it
whenever send_file() is called to send a new file.
Throttling:
BackupSharedState.throttling_counter - the throttling logic remains the
same as for non-parallel backup, with the exception that multiple
threads will now be updating it. So in parallel backup, this will
represent the overall bytes that have been transferred. The workers
would sleep if they have exceeded the limit. Hence, the shared state
carries a lock to update the throttling value atomically.
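Roughly like this (a simplified sketch, not the patch code: only the
counter update is under the spinlock, the sleep happens outside it, and
the interval bookkeeping of basebackup.c's throttle() is elided):

#include "postgres.h"
#include "storage/spin.h"

static void
parallel_throttle(BackupSharedState *state, size_t increment,
                  int64 throttling_sample)
{
    bool        need_sleep = false;

    SpinLockAcquire(&state->lock);
    state->throttling_counter += increment;
    if (state->throttling_counter >= throttling_sample)
    {
        state->throttling_counter %= throttling_sample;
        need_sleep = true;
    }
    SpinLockRelease(&state->lock);

    if (need_sleep)
        pg_usleep(100000L);     /* placeholder: the real code waits out the
                                 * remainder of the current rate interval */
}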
Progress Reporting:
Although I think we should add progress reporting for parallel backup as
a separate patch, the relevant entries for progress reporting, such as
'backup_total' and 'backup_streamed', would then be added to this
structure as well.

Grammar:
There is a change in the result set returned by the START_BACKUP
command: unique_backup_id is added. Additionally, the JOIN_BACKUP
replication command is added, and SEND_FILES has been renamed to
SEND_FILE. There are no other changes to the grammar.

START_BACKUP [LABEL '<label>'] [FAST]
- returns startptr, tli, backup_label, unique_backup_id

STOP_BACKUP [NOWAIT]
- returns startptr, tli, backup_label

JOIN_BACKUP 'unique_backup_id'
- attaches the shared state identified by 'unique_backup_id' to a
backend process.

LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager
EnterpriseDB Corporation
The Enterprise Postgres Company

--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager
EnterpriseDB Corporation
The Enterprise Postgres Company
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Attachments:
On Tue, Apr 7, 2020 at 10:14 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
Hi,

Thanks, Kashif and Rajkumar. I have fixed the reported issues.

I have added the shared state as previously described. The new grammar
changes are as follows:

START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
- This will generate a unique backupid using pg_strong_random(16) and
hex-encode it, which is then returned in the result set.
- It will also create a shared state and add it to the hashtable. The
hash table size is set to BACKUP_HASH_SIZE=10, but since the hashtable
can expand dynamically, I think that is a sufficient initial size.
max_wal_senders is not used, because it can be set to quite a large
value.

JOIN_BACKUP 'backup_id'
- finds 'backup_id' in the hashtable and attaches it to the server
process.

SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
- renamed SEND_FILES to SEND_FILE
- removed START_WAL_LOCATION from this command, because 'startptr' is
now accessible through the shared state.

There is no change in the other commands:
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

The current patches (v11) have been rebased to the latest master. The
backup manifest is enabled by default, so I have disabled it in parallel
backup mode and emit a warning so that the user is aware of it and does
not expect a manifest in the backup.
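For what it's worth, a minimal sketch of how such a backupid-keyed table
could be set up in shared memory (illustrative only; it assumes the
BackupSharedState struct discussed earlier and the dynahash API, and the
function names are made up):

#include "postgres.h"
#include "storage/shmem.h"
#include "utils/hsearch.h"

#define BACKUP_HASH_SIZE 10

static HTAB *BackupStateHash;

/* Run once from shared-memory initialization. */
static void
BackupStateShmemInit(void)
{
    HASHCTL     ctl;

    memset(&ctl, 0, sizeof(ctl));
    ctl.keysize = NAMEDATALEN;              /* backupid is the key */
    ctl.entrysize = sizeof(BackupSharedState);

    /* With only HASH_ELEM, dynahash hashes the key as a string. */
    BackupStateHash = ShmemInitHash("parallel backup state",
                                    BACKUP_HASH_SIZE, BACKUP_HASH_SIZE,
                                    &ctl, HASH_ELEM);
}

/* START_BACKUP could then register its state roughly like this
 * (a real implementation would serialize access with an LWLock). */
static BackupSharedState *
register_backup(const char *backupid)
{
    bool        found;
    BackupSharedState *state;

    state = (BackupSharedState *)
        hash_search(BackupStateHash, backupid, HASH_ENTER, &found);
    if (found)
        ereport(ERROR,
                (errmsg("parallel backup \"%s\" already exists", backupid)));
    return state;
}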
So, are you working on making it work? I don't think a parallel backup
feature should create a backup with no manifest.
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
On Tue, Apr 7, 2020 at 10:03 PM Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:
On Tue, Apr 7, 2020 at 10:14 PM Asif Rehman <asifr.rehman@gmail.com>
wrote:

Hi,

Thanks, Kashif and Rajkumar. I have fixed the reported issues.

I have added the shared state as previously described. The new grammar
changes are as follows:

START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
- This will generate a unique backupid using pg_strong_random(16) and
hex-encode it, which is then returned in the result set.
- It will also create a shared state and add it to the hashtable. The
hash table size is set to BACKUP_HASH_SIZE=10, but since the hashtable
can expand dynamically, I think that is a sufficient initial size.
max_wal_senders is not used, because it can be set to quite a large
value.

JOIN_BACKUP 'backup_id'
- finds 'backup_id' in the hashtable and attaches it to the server
process.

SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
- renamed SEND_FILES to SEND_FILE
- removed START_WAL_LOCATION from this command, because 'startptr' is
now accessible through the shared state.

There is no change in the other commands:
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

The current patches (v11) have been rebased to the latest master. The
backup manifest is enabled by default, so I have disabled it in parallel
backup mode and emit a warning so that the user is aware of it and does
not expect a manifest in the backup.

So, are you working on making it work? I don't think a parallel backup
feature should create a backup with no manifest.
I will; however, parallel backup is already quite a large patch, so I
think we should first agree on the current work before adding backup
manifest and progress-reporting support.
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
On Fri, Apr 3, 2020 at 4:46 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
Non-parallel backup already does the early error checking. I only intended
to make parallel behave the same as non-parallel here. So, I agree with
you that the behavior of parallel backup should be consistent with the
non-parallel one. Please see the code snippet below from
basebackup.c:sendDir()
Oh, OK. So then we need to preserve that behavior, I think. Sorry, I
didn't realize the check was happening there.
I am thinking of the following struct for shared state:
typedef struct
{
char backupid[NAMEDATALEN];
XLogRecPtr startptr;
slock_t lock;
int64 throttling_counter;
bool backup_started_in_recovery;
} BackupSharedState;
Looks broadly reasonable. Can anything other than lock and
throttling_counter change while it's running? If not, how about using
pg_atomic_uint64 for the throttling counter, and dropping lock? If
that gets too complicated it's OK to keep it as you have it.
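The lock-free variant Robert describes might look like this (a sketch
only, assuming port/atomics.h and that nothing else in the struct
changes concurrently; the function name is made up):

#include "postgres.h"
#include "port/atomics.h"

/* In BackupSharedState, instead of slock_t + int64:
 *     pg_atomic_uint64 throttling_counter;
 * initialized once via pg_atomic_init_u64(&counter, 0). Returns true
 * when the caller should sleep; resetting the counter at interval
 * boundaries lock-free needs a compare-exchange loop, omitted here. */
static bool
throttle_account(pg_atomic_uint64 *counter, uint64 increment,
                 uint64 throttling_sample)
{
    return pg_atomic_add_fetch_u64(counter, increment) >= throttling_sample;
}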
The shared state structure entries would be maintained by a shared hash
table. There will be one structure per parallel backup. Since a single
parallel backup can engage more than one WAL sender, I think
max_wal_senders might be a little too much; perhaps max_wal_senders/2,
since there will be at least 2 connections per parallel backup?
Alternatively, we can add a new GUC that defines the maximum number of
concurrent parallel backups, i.e. 'max_concurrent_backups_allowed = 10'
perhaps, or we can make it user-configurable.
I don't think you need a hash table. Linear search should be fine. And I
see no point in dividing max_wal_senders by 2 either. The default is 10.
You'd need to increase that by more than an order of magnitude for a
hash table to be needed, and more than that for the shared memory
consumption to matter.
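A sketch of that array-of-slots lookup (the in_use flag and the
BackupSlot layout are assumptions for illustration, not the patch's
types):

#include "postgres.h"   /* NAMEDATALEN */
#include <string.h>

typedef struct BackupSlot
{
    bool        in_use;
    char        backupid[NAMEDATALEN];
    /* ... remainder of the shared per-backup state ... */
} BackupSlot;

static BackupSlot *
find_backup_slot(BackupSlot *slots, int nslots, const char *backupid)
{
    for (int i = 0; i < nslots; i++)
    {
        if (slots[i].in_use &&
            strcmp(slots[i].backupid, backupid) == 0)
            return &slots[i];
    }
    return NULL;            /* JOIN_BACKUP reports an unknown backup id */
}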
The key would be "backupid = hex_encode(pg_strong_random(16))".
wfm
Progress Reporting:
Although I think we should add progress reporting for parallel backup as
a separate patch, the relevant entries for progress reporting, such as
'backup_total' and 'backup_streamed', would then be added to this
structure as well.
I mean, you can separate it for review if you wish, but it would need
to be committed together.
START_BACKUP [LABEL '<label>'] [FAST]
- returns startptr, tli, backup_label, unique_backup_id
OK. But what if I want to use this interface for a non-parallel backup?
STOP_BACKUP [NOWAIT]
- returns startptr, tli, backup_label
I don't think it makes sense for STOP_BACKUP to return the same values
that START_BACKUP already returned. Presumably STOP_BACKUP should
return the end LSN. It could also return the backup label and
tablespace map files, as the corresponding SQL function does, unless
there's some better way of returning those in this case.
JOIN_BACKUP 'unique_backup_id'
- attaches a shared state identified by 'unique_backup_id' to a backend process.
OK.
LIST_TABLESPACES [PROGRESS]
OK.
LIST_FILES [TABLESPACE]
OK.
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
Why not just LIST_WAL_FILES 'startptr' 'endptr'?
SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
Why parens? That seems useless.
Maybe it would make sense to have SEND_DATA_FILE 'datafilename' and
SEND_WAL_FILE 'walfilename' as separate commands. But not sure.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Apr 7, 2020 at 1:25 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
I will; however, parallel backup is already quite a large patch, so I
think we should first agree on the current work before adding backup
manifest and progress-reporting support.
It's going to be needed for commit, but it may make sense for us to do
more review of what you've got here before we worry about it.
I'm gonna try to find some time for that as soon as I can.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Asif,
Thanks for the new patches.

The patches need to be rebased on head. I am getting a failure while
applying the 0003 patch.
[edb@localhost postgresql]$ git apply
v11/0003-Parallel-Backup-Backend-Replication-commands.patch
error: patch failed: src/backend/storage/ipc/ipci.c:147
error: src/backend/storage/ipc/ipci.c: patch does not apply
I have applied v11 patches on commit -
23ba3b5ee278847e4fad913b80950edb2838fd35 to test further.
pg_basebackup has a new option "--no-estimate-size"; pg_basebackup
crashes when using this option.
[edb@localhost bin]$ ./pg_basebackup -D /tmp/bkp --no-estimate-size --jobs=2
Segmentation fault (core dumped)
--stacktrace
[edb@localhost bin]$ gdb -q -c core.80438 pg_basebackup
Loaded symbols for /lib64/libselinux.so.1
Core was generated by `./pg_basebackup -D /tmp/bkp --no-estimate-size
--jobs=2'.
Program terminated with signal 11, Segmentation fault.
#0 ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value
optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
298 while (ISSPACE (*s))
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value
optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
#1 0x0000003921233b30 in atoi (nptr=<value optimized out>) at atoi.c:28
#2 0x000000000040841e in main (argc=5, argv=0x7ffeaa6fb968) at
pg_basebackup.c:2526
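The backtrace shows atoi() being handed a NULL pointer, which suggests
an option-parsing path that reads optarg when none was supplied (e.g.
the new --no-estimate-size entry falling into an argument-taking case).
If that is the cause, validating the argument before atoi() would turn
the crash into a clean error. A hypothetical sketch, not the patch code
(the helper name and messages are made up):

#include <stdio.h>
#include <stdlib.h>

static int
parse_jobs_arg(const char *arg)
{
    int         n;

    if (arg == NULL)
    {
        fprintf(stderr, "pg_basebackup: no argument supplied for --jobs\n");
        exit(1);
    }
    n = atoi(arg);
    if (n < 1)
    {
        fprintf(stderr, "pg_basebackup: invalid number of jobs: \"%s\"\n",
                arg);
        exit(1);
    }
    return n;
}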
Thanks & Regards,
Rajkumar Raghuwanshi
On Tue, Apr 7, 2020 at 11:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Apr 7, 2020 at 1:25 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

I will; however, parallel backup is already quite a large patch, so I
think we should first agree on the current work before adding backup
manifest and progress-reporting support.

It's going to be needed for commit, but it may make sense for us to do
more review of what you've got here before we worry about it.

I'm gonna try to find some time for that as soon as I can.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Rebased and updated to current master (d025cf88ba). v12 is attached.
Also, changed the grammar for LIST_WAL_FILES and SEND_FILE to:
- LIST_WAL_FILES 'startptr' 'endptr'
- SEND_FILE 'FILE' [NOVERIFY_CHECKSUMS]
On Wed, Apr 8, 2020 at 10:48 AM Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:
Hi Asif,
Thanks for the new patches.

The patches need to be rebased on head. I am getting a failure while
applying the 0003 patch.
[edb@localhost postgresql]$ git apply
v11/0003-Parallel-Backup-Backend-Replication-commands.patch
error: patch failed: src/backend/storage/ipc/ipci.c:147
error: src/backend/storage/ipc/ipci.c: patch does not apply

I have applied v11 patches on commit
23ba3b5ee278847e4fad913b80950edb2838fd35 to test further.

pg_basebackup has a new option "--no-estimate-size"; pg_basebackup
crashes when using this option.

[edb@localhost bin]$ ./pg_basebackup -D /tmp/bkp --no-estimate-size
--jobs=2
Segmentation fault (core dumped)

--stacktrace
[edb@localhost bin]$ gdb -q -c core.80438 pg_basebackup
Loaded symbols for /lib64/libselinux.so.1
Core was generated by `./pg_basebackup -D /tmp/bkp --no-estimate-size
--jobs=2'.
Program terminated with signal 11, Segmentation fault.
#0 ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value
optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
298 while (ISSPACE (*s))
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value
optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
#1 0x0000003921233b30 in atoi (nptr=<value optimized out>) at atoi.c:28
#2 0x000000000040841e in main (argc=5, argv=0x7ffeaa6fb968) at
pg_basebackup.c:2526

Thanks & Regards,
Rajkumar Raghuwanshi

On Tue, Apr 7, 2020 at 11:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Apr 7, 2020 at 1:25 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

I will; however, parallel backup is already quite a large patch, so I
think we should first agree on the current work before adding backup
manifest and progress-reporting support.

It's going to be needed for commit, but it may make sense for us to do
more review of what you've got here before we worry about it.

I'm gonna try to find some time for that as soon as I can.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
Attachments:
parallel_backup_v12.zipapplication/zip; name=parallel_backup_v12.zipDownload