base backup client as auxiliary backend process

Started by Peter Eisentraut, over 6 years ago, 39 messages
#1Peter Eisentraut
peter.eisentraut@2ndquadrant.com
1 attachment(s)

Setting up a standby instance is still quite complicated. You need to
run pg_basebackup with all the right options. You need to make sure
pg_basebackup has the right permissions for the target directories. The
created instance has to be integrated into the operating system's start
scripts. There is this slightly awkward business of the --recovery-conf
option and how it interacts with other features. And you should
probably run pg_basebackup under screen. And then how do you get
notified when it's done? And when it's done, you have to log back in and
finish up. Too many steps.

My idea is that the postmaster can launch a base backup worker, wait
till it's done, then proceed with the rest of the startup. initdb gets
a special option to create a "minimal" data directory with only a few
files, directories, and the usual configuration files. Then you create
a $PGDATA/basebackup.signal, start the postmaster as normal. It sees
the signal file, launches an auxiliary process that runs the base
backup, then proceeds with normal startup in standby mode.
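The signal-file decision described above can be sketched in isolation. This is only an illustration under stated assumptions: the type StartupPlan and the functions signal_file_exists() and choose_startup_plan() are hypothetical names, and the patch itself performs the equivalent stat() check inline in PostmasterMain().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical names; the patch does this check inline in PostmasterMain(). */
typedef enum
{
    STARTUP_NORMAL,             /* no signal file: ordinary startup */
    STARTUP_STANDBY,            /* standby.signal present */
    STARTUP_BASEBACKUP_FIRST    /* basebackup.signal present: fetch data first */
} StartupPlan;

/* Return true if "<datadir>/<name>" exists. */
static bool
signal_file_exists(const char *datadir, const char *name)
{
    char        path[1024];
    struct stat st;
    int         n;

    assert(datadir != NULL && name != NULL);
    n = snprintf(path, sizeof(path), "%s/%s", datadir, name);
    if (n < 0 || (size_t) n >= sizeof(path))
        return false;           /* path too long: treat as absent */
    return stat(path, &st) == 0;
}

/* basebackup.signal takes precedence, since the data directory is still empty. */
StartupPlan
choose_startup_plan(const char *datadir)
{
    if (signal_file_exists(datadir, "basebackup.signal"))
        return STARTUP_BASEBACKUP_FIRST;
    if (signal_file_exists(datadir, "standby.signal"))
        return STARTUP_STANDBY;
    return STARTUP_NORMAL;
}
```

A caller would pass the data directory path, for example choose_startup_plan("data2"), and launch the base backup worker only when STARTUP_BASEBACKUP_FIRST is returned.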

This makes a whole bunch of things much nicer: The connection
information for where to get the base backup from comes from
postgresql.conf, so you only need to specify it in one place.
pg_basebackup is completely out of the picture; no need to deal with
command-line options, --recovery-conf, screen, monitoring for
completion, etc. If something fails, the base backup process can
automatically be restarted (maybe). Operating system integration is
much easier: You only call initdb and then pg_ctl or postgres, as you
are already doing. Automated deployment systems don't need to wait for
pg_basebackup to finish: You only call initdb, then start the server,
and then you're done -- waiting for the base backup to finish can be
done by the regular monitoring system.

Attached is a very hackish patch to implement this. It works like this:

# (assuming you have a primary already running somewhere)
initdb -D data2 --minimal
$EDITOR data2/postgresql.conf # set primary_conninfo
pg_ctl -D data2 start

(Curious side note: If you don’t set primary_conninfo in these steps,
then libpq defaults apply, so the default behavior might end up being
that a given instance attempts to replicate from itself.)

It works for basic cases. It's missing tablespace support, proper
fsyncing, progress reporting, probably more. Those would be pretty
straightforward I think. The interesting bit is the delicate ordering
of the postmaster startup: Normally, the pg_control file is read quite
early, but if starting from a minimal data directory, we need to wait
until the base backup is done. There is also the question what you do
if the base backup fails halfway through. Currently you probably need
to delete the whole data directory and start again with initdb. Better
might be a way to start again and overwrite any existing files, but that
can clearly also be dangerous. All this needs some careful analysis,
but I think it's doable.

Any thoughts?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v1-0001-Base-backup-client-as-auxiliary-backend-process.patch (text/plain; charset=UTF-8)
From 1cf36db2514b04428570496fc9d1145591fda0fc Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Sat, 29 Jun 2019 21:52:21 +0200
Subject: [PATCH v1] Base backup client as auxiliary backend process

---
 src/backend/access/transam/xlog.c             |  14 +-
 src/backend/bootstrap/bootstrap.c             |   9 +
 src/backend/postmaster/pgstat.c               |   6 +
 src/backend/postmaster/postmaster.c           |  95 +++++-
 src/backend/replication/Makefile              |   2 +-
 src/backend/replication/basebackup_client.c   |  33 ++
 .../libpqwalreceiver/libpqwalreceiver.c       | 308 ++++++++++++++++++
 src/bin/initdb/initdb.c                       |  39 ++-
 src/include/access/xlog.h                     |   4 +
 src/include/miscadmin.h                       |   2 +
 src/include/pgstat.h                          |   1 +
 src/include/replication/basebackup_client.h   |   1 +
 src/include/replication/walreceiver.h         |   4 +
 13 files changed, 502 insertions(+), 16 deletions(-)
 create mode 100644 src/backend/replication/basebackup_client.c
 create mode 100644 src/include/replication/basebackup_client.h

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e08320e829..da97970703 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -905,7 +905,6 @@ static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpti, bool report);
 static bool rescanLatestTimeLine(void);
 static void WriteControlFile(void);
-static void ReadControlFile(void);
 static char *str_time(pg_time_t tnow);
 static bool CheckForStandbyTrigger(void);
 
@@ -4572,7 +4571,7 @@ WriteControlFile(void)
 						XLOG_CONTROL_FILE)));
 }
 
-static void
+void
 ReadControlFile(void)
 {
 	pg_crc32c	crc;
@@ -6209,13 +6208,11 @@ StartupXLOG(void)
 	CurrentResourceOwner = AuxProcessResourceOwner;
 
 	/*
-	 * Verify XLOG status looks valid.
+	 * Check that contents look valid.
 	 */
-	if (ControlFile->state < DB_SHUTDOWNED ||
-		ControlFile->state > DB_IN_PRODUCTION ||
-		!XRecOffIsValid(ControlFile->checkPoint))
+	if (!XRecOffIsValid(ControlFile->checkPoint))
 		ereport(FATAL,
-				(errmsg("control file contains invalid data")));
+				(errmsg("control file contains invalid checkpoint location")));
 
 	if (ControlFile->state == DB_SHUTDOWNED)
 	{
@@ -6248,6 +6245,9 @@ StartupXLOG(void)
 		ereport(LOG,
 				(errmsg("database system was interrupted; last known up at %s",
 						str_time(ControlFile->time))));
+	else
+		ereport(FATAL,
+				(errmsg("control file contains invalid database cluster state")));
 
 	/* This is just to allow attaching to startup process with a debugger */
 #ifdef XLOG_REPLAY_DELAY
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 43627ab8f4..57769c160c 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -36,6 +36,7 @@
 #include "postmaster/bgwriter.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
+#include "replication/basebackup_client.h"
 #include "replication/walreceiver.h"
 #include "storage/bufmgr.h"
 #include "storage/bufpage.h"
@@ -326,6 +327,9 @@ AuxiliaryProcessMain(int argc, char *argv[])
 			case StartupProcess:
 				statmsg = pgstat_get_backend_desc(B_STARTUP);
 				break;
+			case BaseBackupProcess:
+				statmsg = pgstat_get_backend_desc(B_BASE_BACKUP);
+				break;
 			case BgWriterProcess:
 				statmsg = pgstat_get_backend_desc(B_BG_WRITER);
 				break;
@@ -451,6 +455,11 @@ AuxiliaryProcessMain(int argc, char *argv[])
 			StartupProcessMain();
 			proc_exit(1);		/* should never return */
 
+		case BaseBackupProcess:
+			/* don't set signals, basebackup has its own agenda */
+			BaseBackupMain();
+			proc_exit(1);		/* should never return */
+
 		case BgWriterProcess:
 			/* don't set signals, bgwriter has its own agenda */
 			BackgroundWriterMain();
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b4f2b28b51..6664932183 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2934,6 +2934,9 @@ pgstat_bestart(void)
 			case StartupProcess:
 				lbeentry.st_backendType = B_STARTUP;
 				break;
+			case BaseBackupProcess:
+				lbeentry.st_backendType = B_BASE_BACKUP;
+				break;
 			case BgWriterProcess:
 				lbeentry.st_backendType = B_BG_WRITER;
 				break;
@@ -4289,6 +4292,9 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_BG_WORKER:
 			backendDesc = "background worker";
 			break;
+		case B_BASE_BACKUP:
+			backendDesc = "base backup";
+			break;
 		case B_BG_WRITER:
 			backendDesc = "background writer";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 688ad439ed..287b233399 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -248,6 +248,7 @@ bool		restart_after_crash = true;
 
 /* PIDs of special child processes; 0 when not running */
 static pid_t StartupPID = 0,
+			BaseBackupPID = 0,
 			BgWriterPID = 0,
 			CheckpointerPID = 0,
 			WalWriterPID = 0,
@@ -539,6 +540,7 @@ static void ShmemBackendArrayRemove(Backend *bn);
 #endif							/* EXEC_BACKEND */
 
 #define StartupDataBase()		StartChildProcess(StartupProcess)
+#define StartBaseBackup()		StartChildProcess(BaseBackupProcess)
 #define StartBackgroundWriter() StartChildProcess(BgWriterProcess)
 #define StartCheckpointer()		StartChildProcess(CheckpointerProcess)
 #define StartWalWriter()		StartChildProcess(WalWriterProcess)
@@ -572,6 +574,8 @@ PostmasterMain(int argc, char *argv[])
 	bool		listen_addr_saved = false;
 	int			i;
 	char	   *output_config_variable = NULL;
+	struct stat stat_buf;
+	bool		basebackup_signal_file_found = false;
 
 	InitProcessGlobals();
 
@@ -877,12 +881,27 @@ PostmasterMain(int argc, char *argv[])
 	/* Verify that DataDir looks reasonable */
 	checkDataDir();
 
-	/* Check that pg_control exists */
-	checkControlFile();
-
 	/* And switch working directory into it */
 	ChangeToDataDir();
 
+	if (stat(BASEBACKUP_SIGNAL_FILE, &stat_buf) == 0)
+	{
+		int         fd;
+
+		fd = BasicOpenFilePerm(BASEBACKUP_SIGNAL_FILE, O_RDWR | PG_BINARY,
+							   S_IRUSR | S_IWUSR);
+		if (fd >= 0)
+		{
+			(void) pg_fsync(fd);
+			close(fd);
+		}
+		basebackup_signal_file_found = true;
+	}
+
+	/* Check that pg_control exists */
+	if (!basebackup_signal_file_found)
+		checkControlFile();
+
 	/*
 	 * Check for invalid combinations of GUC settings.
 	 */
@@ -961,7 +980,8 @@ PostmasterMain(int argc, char *argv[])
 	 * processes will inherit the correct function pointer and not need to
 	 * repeat the test.
 	 */
-	LocalProcessControlFile(false);
+	if (!basebackup_signal_file_found)
+		LocalProcessControlFile(false);
 
 	/*
 	 * Initialize SSL library, if specified.
@@ -1363,6 +1383,33 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STARTING);
 
+	if (basebackup_signal_file_found)
+	{
+		BaseBackupPID = StartBaseBackup();
+
+		/*
+		 * XXX wait until done
+		 */
+		while (BaseBackupPID != 0)
+		{
+			PG_SETMASK(&UnBlockSig);
+			sleep(2);
+			PG_SETMASK(&BlockSig);
+		}
+
+		/*
+		 * Base backup done, now signal standby mode.  XXX Is there a use for
+		 * switching into (non-standby) recovery here?  How would that be
+		 * configured?
+		 */
+		durable_rename(BASEBACKUP_SIGNAL_FILE, STANDBY_SIGNAL_FILE, FATAL);
+
+		/*
+		 * Reread the control file that came in with the base backup.
+		 */
+		ReadControlFile();
+	}
+
 	/*
 	 * We're ready to rock and roll...
 	 */
@@ -2631,6 +2678,8 @@ SIGHUP_handler(SIGNAL_ARGS)
 		SignalChildren(SIGHUP);
 		if (StartupPID != 0)
 			signal_child(StartupPID, SIGHUP);
+		if (BaseBackupPID != 0)
+			signal_child(BaseBackupPID, SIGHUP);
 		if (BgWriterPID != 0)
 			signal_child(BgWriterPID, SIGHUP);
 		if (CheckpointerPID != 0)
@@ -2782,6 +2831,8 @@ pmdie(SIGNAL_ARGS)
 
 			if (StartupPID != 0)
 				signal_child(StartupPID, SIGTERM);
+			if (BaseBackupPID != 0)
+				signal_child(BaseBackupPID, SIGTERM);
 			if (BgWriterPID != 0)
 				signal_child(BgWriterPID, SIGTERM);
 			if (WalReceiverPID != 0)
@@ -2997,6 +3048,22 @@ reaper(SIGNAL_ARGS)
 			continue;
 		}
 
+		/*
+		 * Was it the base backup process?
+		 */
+		if (pid == BaseBackupPID)
+		{
+			BaseBackupPID = 0;
+			if (EXIT_STATUS_0(exitstatus))
+				;
+			else if (EXIT_STATUS_1(exitstatus))
+				elog(FATAL, "base backup failed");
+			else
+				HandleChildCrash(pid, exitstatus,
+								 _("base backup process"));
+			continue;
+		}
+
 		/*
 		 * Was it the bgwriter?  Normal exit can be ignored; we'll start a new
 		 * one at the next iteration of the postmaster's main loop, if
@@ -3516,6 +3583,18 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 		StartupStatus = STARTUP_SIGNALED;
 	}
 
+	/* Take care of the base backup process too */
+	if (pid == BaseBackupPID)
+		BaseBackupPID = 0;
+	else if (BaseBackupPID != 0 && take_action)
+	{
+		ereport(DEBUG2,
+				(errmsg_internal("sending %s to process %d",
+								 (SendStop ? "SIGSTOP" : "SIGQUIT"),
+								 (int) BaseBackupPID)));
+		signal_child(BaseBackupPID, (SendStop ? SIGSTOP : SIGQUIT));
+	}
+
 	/* Take care of the bgwriter too */
 	if (pid == BgWriterPID)
 		BgWriterPID = 0;
@@ -3750,6 +3829,7 @@ PostmasterStateMachine(void)
 		if (CountChildren(BACKEND_TYPE_NORMAL | BACKEND_TYPE_WORKER) == 0 &&
 			StartupPID == 0 &&
 			WalReceiverPID == 0 &&
+			BaseBackupPID == 0 &&
 			BgWriterPID == 0 &&
 			(CheckpointerPID == 0 ||
 			 (!FatalError && Shutdown < ImmediateShutdown)) &&
@@ -3844,6 +3924,7 @@ PostmasterStateMachine(void)
 			/* These other guys should be dead already */
 			Assert(StartupPID == 0);
 			Assert(WalReceiverPID == 0);
+			Assert(BaseBackupPID == 0);
 			Assert(BgWriterPID == 0);
 			Assert(CheckpointerPID == 0);
 			Assert(WalWriterPID == 0);
@@ -4027,6 +4108,8 @@ TerminateChildren(int signal)
 		if (signal == SIGQUIT || signal == SIGKILL)
 			StartupStatus = STARTUP_SIGNALED;
 	}
+	if (BaseBackupPID != 0)
+		signal_child(BaseBackupPID, signal);
 	if (BgWriterPID != 0)
 		signal_child(BgWriterPID, signal);
 	if (CheckpointerPID != 0)
@@ -5400,6 +5483,10 @@ StartChildProcess(AuxProcType type)
 				ereport(LOG,
 						(errmsg("could not fork startup process: %m")));
 				break;
+			case BaseBackupProcess:
+				ereport(LOG,
+						(errmsg("could not fork base backup process: %m")));
+				break;
 			case BgWriterProcess:
 				ereport(LOG,
 						(errmsg("could not fork background writer process: %m")));
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 562b55fbaa..748093100a 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -14,7 +14,7 @@ include $(top_builddir)/src/Makefile.global
 
 override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
 
-OBJS = walsender.o walreceiverfuncs.o walreceiver.o basebackup.o \
+OBJS = walsender.o walreceiverfuncs.o walreceiver.o basebackup.o basebackup_client.o \
 	repl_gram.o slot.o slotfuncs.o syncrep.o syncrep_gram.o
 
 SUBDIRS = logical
diff --git a/src/backend/replication/basebackup_client.c b/src/backend/replication/basebackup_client.c
new file mode 100644
index 0000000000..4a75a6091f
--- /dev/null
+++ b/src/backend/replication/basebackup_client.c
@@ -0,0 +1,33 @@
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "replication/basebackup_client.h"
+#include "replication/walreceiver.h"
+#include "storage/ipc.h"
+#include "utils/guc.h"
+
+void
+BaseBackupMain(void)
+{
+	WalReceiverConn *wrconn = NULL;
+	char	   *err;
+
+	/* Load the libpq-specific functions */
+	load_file("libpqwalreceiver", false);
+	if (WalReceiverFunctions == NULL)
+		elog(ERROR, "libpqwalreceiver didn't initialize correctly");
+
+	/* Establish the connection to the primary */
+	wrconn = walrcv_connect(PrimaryConnInfo, false, cluster_name[0] ? cluster_name : "basebackup", &err);
+	if (!wrconn)
+		ereport(ERROR,
+				(errmsg("could not connect to the primary server: %s", err)));
+
+	walrcv_base_backup(wrconn);
+
+	walrcv_disconnect(wrconn);
+
+	elog(LOG, "base backup completed");
+	proc_exit(0);
+}
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 6eba08a920..b72d849fde 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -17,8 +17,10 @@
 #include "postgres.h"
 
 #include <unistd.h>
+#include <sys/stat.h>
 #include <sys/time.h>
 
+#include "common/string.h"
 #include "libpq-fe.h"
 #include "pqexpbuffer.h"
 #include "access/xlog.h"
@@ -27,6 +29,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "pgtar.h"
 #include "replication/walreceiver.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
@@ -61,6 +64,7 @@ static int	libpqrcv_server_version(WalReceiverConn *conn);
 static void libpqrcv_readtimelinehistoryfile(WalReceiverConn *conn,
 											 TimeLineID tli, char **filename,
 											 char **content, int *len);
+static void libpqrcv_base_backup(WalReceiverConn *conn);
 static bool libpqrcv_startstreaming(WalReceiverConn *conn,
 									const WalRcvStreamOptions *options);
 static void libpqrcv_endstreaming(WalReceiverConn *conn,
@@ -88,6 +92,7 @@ static WalReceiverFunctionsType PQWalReceiverFunctions = {
 	libpqrcv_identify_system,
 	libpqrcv_server_version,
 	libpqrcv_readtimelinehistoryfile,
+	libpqrcv_base_backup,
 	libpqrcv_startstreaming,
 	libpqrcv_endstreaming,
 	libpqrcv_receive,
@@ -356,6 +361,309 @@ libpqrcv_server_version(WalReceiverConn *conn)
 	return PQserverVersion(conn->streamConn);
 }
 
+/*
+ * XXX copied from pg_basebackup.c
+ */
+static void
+ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
+{
+	char		current_path[MAXPGPATH];
+	char		filename[MAXPGPATH];
+	pgoff_t		current_len_left = 0;
+	int			current_padding = 0;
+	char	   *copybuf = NULL;
+	FILE	   *file = NULL;
+
+	strlcpy(current_path, DataDir, sizeof(current_path));
+
+	/*
+	 * Get the COPY data
+	 */
+	res = PQgetResult(conn);
+	if (PQresultStatus(res) != PGRES_COPY_OUT)
+		ereport(ERROR,
+				(errmsg("could not get COPY data stream: %s",
+						PQerrorMessage(conn))));
+
+	while (1)
+	{
+		int			r;
+
+		if (copybuf != NULL)
+		{
+			PQfreemem(copybuf);
+			copybuf = NULL;
+		}
+
+		r = PQgetCopyData(conn, &copybuf, 0);
+
+		if (r == -1)
+		{
+			/*
+			 * End of chunk
+			 */
+			if (file)
+				fclose(file);
+
+			break;
+		}
+		else if (r == -2)
+		{
+			ereport(ERROR,
+					(errmsg("could not read COPY data: %s",
+							PQerrorMessage(conn))));
+		}
+
+		if (file == NULL)
+		{
+			int			filemode;
+
+			/*
+			 * No current file, so this must be the header for a new file
+			 */
+			if (r != 512)
+				ereport(ERROR,
+						(errmsg("invalid tar block header size: %d", r)));
+
+			current_len_left = read_tar_number(&copybuf[124], 12);
+
+			/* Set permissions on the file */
+			filemode = read_tar_number(&copybuf[100], 8);
+
+			/*
+			 * All files are padded up to 512 bytes
+			 */
+			current_padding =
+				((current_len_left + 511) & ~511) - current_len_left;
+
+			/*
+			 * First part of header is zero terminated filename
+			 */
+			snprintf(filename, sizeof(filename), "%s/%s", current_path,
+					 copybuf);
+			if (filename[strlen(filename) - 1] == '/')
+			{
+				/*
+				 * Ends in a slash means directory or symlink to directory
+				 */
+				if (copybuf[156] == '5')
+				{
+					/*
+					 * Directory
+					 */
+					filename[strlen(filename) - 1] = '\0';	/* Remove trailing slash */
+					if (MakePGDirectory(filename) != 0)
+					{
+						if (errno != EEXIST)
+						{
+							elog(ERROR, "could not create directory \"%s\": %m",
+								 filename);
+						}
+					}
+#ifndef WIN32
+					if (chmod(filename, (mode_t) filemode))
+						elog(ERROR, "could not set permissions on directory \"%s\": %m",
+							 filename);
+#endif
+				}
+				else if (copybuf[156] == '2')
+				{
+					/*
+					 * Symbolic link
+					 *
+					 * It's most likely a link in pg_tblspc directory, to the
+					 * location of a tablespace. Apply any tablespace mapping
+					 * given on the command line (--tablespace-mapping). (We
+					 * blindly apply the mapping without checking that the
+					 * link really is inside pg_tblspc. We don't expect there
+					 * to be other symlinks in a data directory, but if there
+					 * are, you can call it an undocumented feature that you
+					 * can map them too.)
+					 */
+#ifdef TODO
+					filename[strlen(filename) - 1] = '\0';	/* Remove trailing slash */
+
+					mapped_tblspc_path = get_tablespace_mapping(&copybuf[157]);
+					if (symlink(mapped_tblspc_path, filename) != 0)
+					{
+						pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
+									 filename, mapped_tblspc_path);
+						exit(1);
+					}
+#endif
+				}
+				else
+				{
+					elog(ERROR, "unrecognized link indicator \"%c\"",
+						 copybuf[156]);
+				}
+				continue;		/* directory or link handled */
+			}
+
+			/*
+			 * regular file
+			 */
+			file = fopen(filename, "wb");
+			if (!file)
+				elog(ERROR, "could not create file \"%s\": %m", filename);
+
+#ifndef WIN32
+			if (chmod(filename, (mode_t) filemode))
+				elog(ERROR, "could not set permissions on file \"%s\": %m",
+					 filename);
+#endif
+
+			if (current_len_left == 0)
+			{
+				/*
+				 * Done with this file, next one will be a new tar header
+				 */
+				fclose(file);
+				file = NULL;
+				continue;
+			}
+		}						/* new file */
+		else
+		{
+			/*
+			 * Continuing blocks in existing file
+			 */
+			if (current_len_left == 0 && r == current_padding)
+			{
+				/*
+				 * Received the padding block for this file, ignore it and
+				 * close the file, then move on to the next tar header.
+				 */
+				fclose(file);
+				file = NULL;
+				continue;
+			}
+
+			if (fwrite(copybuf, r, 1, file) != 1)
+				elog(ERROR, "could not write to file \"%s\": %m", filename);
+
+			current_len_left -= r;
+			if (current_len_left == 0 && current_padding == 0)
+			{
+				/*
+				 * Received the last block, and there is no padding to be
+				 * expected. Close the file and move on to the next tar
+				 * header.
+				 */
+				fclose(file);
+				file = NULL;
+				continue;
+			}
+		}						/* continuing data in existing file */
+	}							/* loop over all data blocks */
+
+	if (file != NULL)
+		elog(ERROR, "COPY stream ended before last file was finished");
+
+	if (copybuf != NULL)
+		PQfreemem(copybuf);
+}
+
+/*
+ * Make base backup from remote and write to local disk.
+ */
+static void
+libpqrcv_base_backup(WalReceiverConn *conn)
+{
+	PGresult   *res;
+
+	elog(LOG, "initiating base backup, waiting for remote checkpoint to complete");
+
+	if (PQsendQuery(conn->streamConn, "BASE_BACKUP") == 0)
+		ereport(ERROR,
+				(errmsg("could not start base backup on remote server: %s",
+						pchomp(PQerrorMessage(conn->streamConn)))));
+
+	/*
+	 * First result set: WAL start position and timeline ID; we skip it.
+	 */
+	res = PQgetResult(conn->streamConn);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("could not start base backup on remote server: %s",
+						pchomp(PQerrorMessage(conn->streamConn)))));
+	}
+
+	ereport(LOG,
+			(errmsg("remote checkpoint completed")));
+
+	PQclear(res);
+
+	/*
+	 * Second result set: tablespace information
+	 */
+	res = PQgetResult(conn->streamConn);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("could not get backup header: %s",
+						pchomp(PQerrorMessage(conn->streamConn)))));
+	}
+	if (PQntuples(res) < 1)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("no data returned from server")));
+	}
+
+	/*
+	 * Start receiving chunks
+	 */
+	for (int i = 0; i < PQntuples(res); i++)
+	{
+		ReceiveAndUnpackTarFile(conn->streamConn, res, i);
+	}
+
+	PQclear(res);
+
+	/*
+	 * Final result set: WAL end position and timeline ID
+	 */
+	res = PQgetResult(conn->streamConn);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("could not get write-ahead log end position from server: %s",
+						pchomp(PQerrorMessage(conn->streamConn)))));
+	}
+	if (PQntuples(res) != 1)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("no write-ahead log end position returned from server")));
+	}
+	PQclear(res);
+
+	res = PQgetResult(conn->streamConn);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+#ifdef TODO
+		const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (sqlstate &&
+			strcmp(sqlstate, ERRCODE_DATA_CORRUPTED) == 0)
+		{
+			elog(ERROR, "checksum error occurred");
+		}
+		else
+#endif
+		{
+			elog(ERROR, "final receive failed: %s",
+				 pchomp(PQerrorMessage(conn->streamConn)));
+		}
+	}
+	PQclear(res);
+}
+
 /*
  * Start streaming WAL data from given streaming options.
  *
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ad5cd4194a..4c4ec78095 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -136,6 +136,7 @@ static char *pwfilename = NULL;
 static char *superuser_password = NULL;
 static const char *authmethodhost = NULL;
 static const char *authmethodlocal = NULL;
+static bool minimal = false;
 static bool debug = false;
 static bool noclean = false;
 static bool do_sync = true;
@@ -2962,6 +2963,22 @@ initialize_data_directory(void)
 	/* Now create all the text config files */
 	setup_config();
 
+	/*
+	 * If minimal data directory requested, write basebackup.signal, and then
+	 * we are done here.
+	 */
+	if (minimal)
+	{
+		char	   *path;
+		char	   *lines[1] = {NULL};
+
+		path = psprintf("%s/basebackup.signal", pg_data);
+		writefile(path, lines);
+		free(path);
+
+		return;
+	}
+
 	/* Bootstrap template1 */
 	bootstrap_template1();
 
@@ -3053,6 +3070,7 @@ main(int argc, char *argv[])
 		{"wal-segsize", required_argument, NULL, 12},
 		{"data-checksums", no_argument, NULL, 'k'},
 		{"allow-group-access", no_argument, NULL, 'g'},
+		{"minimal", no_argument, NULL, 'm'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -3094,7 +3112,7 @@ main(int argc, char *argv[])
 
 	/* process command-line options */
 
-	while ((c = getopt_long(argc, argv, "dD:E:kL:nNU:WA:sST:X:g", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "dD:E:kL:mnNU:WA:sST:X:g", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
@@ -3149,6 +3167,9 @@ main(int argc, char *argv[])
 			case 'L':
 				share_path = pg_strdup(optarg);
 				break;
+			case 'm':
+				minimal = true;
+				break;
 			case 1:
 				locale = pg_strdup(optarg);
 				break;
@@ -3361,9 +3382,19 @@ main(int argc, char *argv[])
 	/* translator: This is a placeholder in a shell command. */
 	appendPQExpBuffer(start_db_cmd, " -l %s start", _("logfile"));
 
-	printf(_("\nSuccess. You can now start the database server using:\n\n"
-			 "    %s\n\n"),
-		   start_db_cmd->data);
+	if (!minimal)
+	{
+		printf(_("\nSuccess. You can now start the database server using:\n\n"
+				 "    %s\n\n"),
+			   start_db_cmd->data);
+	}
+	else
+	{
+		printf(_("\nSo far so good. Now configure the replication connection in\n"
+				 "postgresql.conf, and then start the database server using:\n\n"
+				 "    %s\n\n"),
+			   start_db_cmd->data);
+	}
 
 	destroyPQExpBuffer(start_db_cmd);
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 237f4e0350..5f081923b2 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -299,6 +299,9 @@ extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
 extern void BootStrapXLOG(void);
 extern void LocalProcessControlFile(bool reset);
+#ifndef FRONTEND
+extern void ReadControlFile(void);
+#endif
 extern void StartupXLOG(void);
 extern void ShutdownXLOG(int code, Datum arg);
 extern void InitXLOGAccess(void);
@@ -354,6 +357,7 @@ extern void do_pg_abort_backup(void);
 extern SessionBackupState get_backup_status(void);
 
 /* File path names (all relative to $PGDATA) */
+#define BASEBACKUP_SIGNAL_FILE	"basebackup.signal"
 #define RECOVERY_SIGNAL_FILE	"recovery.signal"
 #define STANDBY_SIGNAL_FILE		"standby.signal"
 #define BACKUP_LABEL_FILE		"backup_label"
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 61a24c2e3c..1f40d33290 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -398,6 +398,7 @@ typedef enum
 	CheckerProcess = 0,
 	BootstrapProcess,
 	StartupProcess,
+	BaseBackupProcess,
 	BgWriterProcess,
 	CheckpointerProcess,
 	WalWriterProcess,
@@ -410,6 +411,7 @@ extern AuxProcType MyAuxProcType;
 
 #define AmBootstrapProcess()		(MyAuxProcType == BootstrapProcess)
 #define AmStartupProcess()			(MyAuxProcType == StartupProcess)
+#define AmBaseBackupProcess()		(MyAuxProcType == BaseBackupProcess)
 #define AmBackgroundWriterProcess() (MyAuxProcType == BgWriterProcess)
 #define AmCheckpointerProcess()		(MyAuxProcType == CheckpointerProcess)
 #define AmWalWriterProcess()		(MyAuxProcType == WalWriterProcess)
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0a3ad3a188..e5c385306e 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -722,6 +722,7 @@ typedef enum BackendType
 	B_AUTOVAC_LAUNCHER,
 	B_AUTOVAC_WORKER,
 	B_BACKEND,
+	B_BASE_BACKUP,
 	B_BG_WORKER,
 	B_BG_WRITER,
 	B_CHECKPOINTER,
diff --git a/src/include/replication/basebackup_client.h b/src/include/replication/basebackup_client.h
new file mode 100644
index 0000000000..dcd10b96f2
--- /dev/null
+++ b/src/include/replication/basebackup_client.h
@@ -0,0 +1 @@
+extern void BaseBackupMain(void);
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 86a8130051..bf895b1494 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -215,6 +215,7 @@ typedef void (*walrcv_readtimelinehistoryfile_fn) (WalReceiverConn *conn,
 												   TimeLineID tli,
 												   char **filename,
 												   char **content, int *size);
+typedef void (*walrcv_base_backup_fn) (WalReceiverConn *conn);
 typedef bool (*walrcv_startstreaming_fn) (WalReceiverConn *conn,
 										  const WalRcvStreamOptions *options);
 typedef void (*walrcv_endstreaming_fn) (WalReceiverConn *conn,
@@ -242,6 +243,7 @@ typedef struct WalReceiverFunctionsType
 	walrcv_identify_system_fn walrcv_identify_system;
 	walrcv_server_version_fn walrcv_server_version;
 	walrcv_readtimelinehistoryfile_fn walrcv_readtimelinehistoryfile;
+	walrcv_base_backup_fn walrcv_base_backup;
 	walrcv_startstreaming_fn walrcv_startstreaming;
 	walrcv_endstreaming_fn walrcv_endstreaming;
 	walrcv_receive_fn walrcv_receive;
@@ -267,6 +269,8 @@ extern PGDLLIMPORT WalReceiverFunctionsType *WalReceiverFunctions;
 	WalReceiverFunctions->walrcv_server_version(conn)
 #define walrcv_readtimelinehistoryfile(conn, tli, filename, content, size) \
 	WalReceiverFunctions->walrcv_readtimelinehistoryfile(conn, tli, filename, content, size)
+#define walrcv_base_backup(conn) \
+	WalReceiverFunctions->walrcv_base_backup(conn)
 #define walrcv_startstreaming(conn, options) \
 	WalReceiverFunctions->walrcv_startstreaming(conn, options)
 #define walrcv_endstreaming(conn, next_tli) \

base-commit: c0faa727507ed34db0d02769d21bbaaf9605e2e4
-- 
2.22.0
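
The tar header arithmetic in ReceiveAndUnpackTarFile() can be checked on its own. The following is a simplified standalone sketch, not the code the patch calls: tar_octal_field() and tar_padding() are hypothetical names, and unlike the real read_tar_number() this version handles only the plain octal encoding. The offsets mirror the ones used in the patch (mode at byte 100, size at byte 124, type flag at byte 156).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAR_BLOCK_SIZE   512
#define TAR_OFF_MODE     100    /* 8-byte octal field */
#define TAR_OFF_SIZE     124    /* 12-byte octal field */
#define TAR_OFF_TYPEFLAG 156    /* '5' = directory, '2' = symbolic link */

/*
 * Parse an octal header field of at most len bytes, stopping at the first
 * character that is not an octal digit.  Simplified: no base-256 support.
 */
uint64_t
tar_octal_field(const char *field, size_t len)
{
    uint64_t    value = 0;

    assert(field != NULL);
    for (size_t i = 0; i < len; i++)
    {
        if (field[i] < '0' || field[i] > '7')
            break;
        value = (value << 3) | (uint64_t) (field[i] - '0');
    }
    return value;
}

/* Number of padding bytes that follow a member of the given size. */
uint64_t
tar_padding(uint64_t size)
{
    uint64_t    rounded = (size + TAR_BLOCK_SIZE - 1) &
                          ~(uint64_t) (TAR_BLOCK_SIZE - 1);

    return rounded - size;
}
```

For a 1000-byte file the size field reads "00000001750" in octal, and the member is followed by 24 bytes of padding so that the next header starts on a 512-byte boundary.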

#2Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#1)
Re: base backup client as auxiliary backend process

On Sun, Jun 30, 2019 at 8:05 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

Attached is a very hackish patch to implement this. It works like this:

# (assuming you have a primary already running somewhere)
initdb -D data2 --minimal
$EDITOR data2/postgresql.conf # set primary_conninfo
pg_ctl -D data2 start

+1, very nice. How about --replica?

FYI, Windows doesn't like your patch:

src/backend/postmaster/postmaster.c(1396): warning C4013: 'sleep'
undefined; assuming extern returning int
[C:\projects\postgresql\postgres.vcxproj]

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.45930
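
For context, sleep() is a POSIX function that the MSVC runtime does not provide, and backend code normally uses pg_usleep() for this purpose. The standalone sketch below shows the same idea outside the PostgreSQL tree; portable_sleep_ms() is a hypothetical name, and unlike the wait loop in the patch it resumes after signal interruptions rather than returning early.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L /* expose nanosleep() under strict C modes */
#endif

#include <assert.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <errno.h>
#include <time.h>
#endif

/*
 * Hypothetical helper: sleep for the given number of milliseconds.
 * Returns 0 on success, -1 on an unexpected error.
 */
int
portable_sleep_ms(long ms)
{
    assert(ms >= 0);
#ifdef _WIN32
    Sleep((DWORD) ms);
    return 0;
#else
    struct timespec req;

    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;

    /* On EINTR, nanosleep() stores the remaining time back into req. */
    while (nanosleep(&req, &req) != 0)
    {
        if (errno != EINTR)
            return -1;
    }
    return 0;
#endif
}
```

In a loop that polls for a child process to exit, returning early on a signal would be harmless, so a simpler variant without the retry would also do.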

--
Thomas Munro
https://enterprisedb.com

In reply to: Thomas Munro (#2)
Re: base backup client as auxiliary backend process

Hello

 Attached is a very hackish patch to implement this. It works like this:

     # (assuming you have a primary already running somewhere)
     initdb -D data2 --minimal
     $EDITOR data2/postgresql.conf # set primary_conninfo
     pg_ctl -D data2 start

+1, very nice. How about --replica?

+1

It also doesn't work with -DEXEC_BACKEND for me.

There is also the question what you do
if the base backup fails halfway through. Currently you probably need
to delete the whole data directory and start again with initdb. Better
might be a way to start again and overwrite any existing files, but that
can clearly also be dangerous.

I think requiring the user to delete the directory and rerun initdb is better than overwriting files.

- We need to check the major version. Base backup can work across different versions, but it would be useless to copy a cluster that we cannot run.
- Base backup silently overwrites the configuration files (pg_hba.conf, postgresql.conf, postgresql.auto.conf) in $PGDATA. This is OK for pg_basebackup but not for a backend process.
- I think we need to start the walreceiver, ideally without interruption during startup replay (if possible).
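
The major-version check from the first point could be sketched roughly as follows. This is only an illustration, not code from the patch: the helper names are hypothetical, and it assumes the integer version format used by PG_VERSION_NUM and returned by libpq's PQserverVersion() (for example 120001 for 12.1).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the patch: reduce a version number in
 * PG_VERSION_NUM format to its major-version part.  From version 10 on the
 * major version is the number divided by 10000 (120001 -> 12); before
 * that, the first two components form the major version (90605 -> 906).
 */
int
major_version(int version_num)
{
    assert(version_num > 0);
    if (version_num >= 100000)
        return version_num / 10000;
    return version_num / 100;
}

/*
 * True if a base backup taken from a server reporting remote_version could
 * be started by binaries of local_version, judged by major version only.
 */
bool
same_major_version(int local_version, int remote_version)
{
    return major_version(local_version) == major_version(remote_version);
}
```

A base backup worker could call something like same_major_version(PG_VERSION_NUM, PQserverVersion(conn)) before copying anything and fail early on a mismatch.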

XXX Is there a use for
* switching into (non-standby) recovery here?

I think not.

regards, Sergei

#4Euler Taveira
euler@timbira.com.br
In reply to: Peter Eisentraut (#1)
Re: base backup client as auxiliary backend process

Em sáb, 29 de jun de 2019 às 17:05, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> escreveu:

Setting up a standby instance is still quite complicated. You need to
run pg_basebackup with all the right options. You need to make sure
Attached is a very hackish patch to implement this. It works like this:

# (assuming you have a primary already running somewhere)
initdb -D data2 --minimal
$EDITOR data2/postgresql.conf # set primary_conninfo
pg_ctl -D data2 start

Great! The main complaints about pg_basebackup usage in TB clusters
are: (a) it can't be restarted and (b) it can't be parallelized.
AFAICS your proposal doesn't solve them. It would be nice if it can be
solved in future releases (using rsync or another in-house tool is as
fragile as using pg_basebackup).

--
Euler Taveira Timbira -
http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

#5Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#1)
Re: base backup client as auxiliary backend process

On Sat, Jun 29, 2019 at 4:05 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

My idea is that the postmaster can launch a base backup worker, wait
till it's done, then proceed with the rest of the startup. initdb gets
a special option to create a "minimal" data directory with only a few
files, directories, and the usual configuration files.

Why do we even have to do that much? Can we remove the need for an
initdb altogether?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#5)
Re: base backup client as auxiliary backend process

Robert Haas <robertmhaas@gmail.com> writes:

On Sat, Jun 29, 2019 at 4:05 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

My idea is that the postmaster can launch a base backup worker, wait
till it's done, then proceed with the rest of the startup. initdb gets
a special option to create a "minimal" data directory with only a few
files, directories, and the usual configuration files.

Why do we even have to do that much? Can we remove the need for an
initdb altogether?

Gotta have config files in place already, no?

regards, tom lane

#7Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#6)
Re: base backup client as auxiliary backend process

On Thu, Jul 11, 2019 at 10:36 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Gotta have config files in place already, no?

Why?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#7)
Re: base backup client as auxiliary backend process

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jul 11, 2019 at 10:36 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Gotta have config files in place already, no?

Why?

How's the postmaster to know that it's supposed to run pg_basebackup
rather than start normally? Where will it get the connection information?
Seem to need configuration data *somewhere*.

regards, tom lane

#9Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#8)
Re: base backup client as auxiliary backend process

On Thu, Jul 11, 2019 at 4:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jul 11, 2019 at 10:36 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Gotta have config files in place already, no?

Why?

How's the postmaster to know that it's supposed to run pg_basebackup
rather than start normally? Where will it get the connection information?
Seem to need configuration data *somewhere*.

Maybe just:

./postgres --replica='connstr' -D createme

?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Robert Haas (#9)
Re: base backup client as auxiliary backend process

On 2019-07-11 22:20, Robert Haas wrote:

On Thu, Jul 11, 2019 at 4:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jul 11, 2019 at 10:36 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Gotta have config files in place already, no?

Why?

How's the postmaster to know that it's supposed to run pg_basebackup
rather than start normally? Where will it get the connection information?
Seem to need configuration data *somewhere*.

Maybe just:

./postgres --replica='connstr' -D createme

What you are describing is of course theoretically possible, but it
doesn't really fit with how existing tooling normally deals with this,
which is one of the problems I want to address.

initdb has all the knowledge of how to create the data *directory*, how
to set permissions, deal with existing and non-empty directories, how to
set up a separate WAL directory. Packaged environments might wrap this
further by using the correct OS users, creating the directory first as
root, then changing owner, etc. This is all logic that we can reuse and
probably don't want to duplicate elsewhere.

Furthermore, we have for the longest time encouraged packagers *not* to
create data directories automatically when a service is started, because
this might store data in places that will be hidden by a later mount.
Keeping this property requires making the initialization of the data
directory a separate step somehow. That step doesn't have to be called
"initdb", it could be a new "pg_mkdirs", but for the reasons described
above, this would create a fair amount of code duplication and not really
gain anything.

Finally, many installations want to have the configuration files under
control of some centralized configuration management system. The way
those want to work is usually: (1) create file system structures, (2)
install configuration files from some templates, (3) start service.
This is of course how setting up a primary works. Having such a system
set up a standby is currently seemingly impossible in an elegant way,
because the order and timing of how things work is all wrong. My
proposed change would fix this because things would be set up in the
same three-step process. (As has been pointed out, this would require
that the base backup does not copy over the configuration files from the
remote, which my patch currently doesn't do correctly.)

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#10)
Re: base backup client as auxiliary backend process

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

On 2019-07-11 22:20, Robert Haas wrote:

On Thu, Jul 11, 2019 at 4:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

How's the postmaster to know that it's supposed to run pg_basebackup
rather than start normally? Where will it get the connection information?
Seem to need configuration data *somewhere*.

Maybe just:

./postgres --replica='connstr' -D createme

What you are describing is of course theoretically possible, but it
doesn't really fit with how existing tooling normally deals with this,
which is one of the problems I want to address.

I don't care for Robert's suggestion for a different reason: it presumes
that all data that can possibly be needed to set up a new replica is
feasible to cram onto the postmaster command line, and always will be.

An immediate counterexample is that's not where you want to be specifying
the password for a replication connection. But even without that sort of
security issue, this approach won't scale. It also does not work even a
little bit nicely for tooling in which the postmaster is not supposed to
be started directly by the user. (Which is to say, all postgres-service
tooling everywhere.)

regards, tom lane

#12Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Peter Eisentraut (#1)
Re: base backup client as auxiliary backend process

Hello.

At Sat, 29 Jun 2019 22:05:22 +0200, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote in <61b8d18d-c922-ac99-b990-a31ba63cdcbb@2ndquadrant.com>

Setting up a standby instance is still quite complicated. You need to
run pg_basebackup with all the right options. You need to make sure
pg_basebackup has the right permissions for the target directories. The
created instance has to be integrated into the operating system's start
scripts. There is this slightly awkward business of the --recovery-conf
option and how it interacts with other features. And you should
probably run pg_basebackup under screen. And then how do you get
notified when it's done. And when it's done you have to log back in and
finish up. Too many steps.

My idea is that the postmaster can launch a base backup worker, wait
till it's done, then proceed with the rest of the startup. initdb gets
a special option to create a "minimal" data directory with only a few
files, directories, and the usual configuration files. Then you create
a $PGDATA/basebackup.signal, start the postmaster as normal. It sees
the signal file, launches an auxiliary process that runs the base
backup, then proceeds with normal startup in standby mode.

This makes a whole bunch of things much nicer: The connection
information for where to get the base backup from comes from
postgresql.conf, so you only need to specify it in one place.
pg_basebackup is completely out of the picture; no need to deal with
command-line options, --recovery-conf, screen, monitoring for
completion, etc. If something fails, the base backup process can
automatically be restarted (maybe). Operating system integration is
much easier: You only call initdb and then pg_ctl or postgres, as you
are already doing. Automated deployment systems don't need to wait for
pg_basebackup to finish: You only call initdb, then start the server,
and then you're done -- waiting for the base backup to finish can be
done by the regular monitoring system.

Attached is a very hackish patch to implement this. It works like this:

# (assuming you have a primary already running somewhere)
initdb -D data2 --minimal
$EDITOR data2/postgresql.conf # set primary_conninfo
pg_ctl -D data2 start

Nice idea!

(Curious side note: If you don’t set primary_conninfo in these steps,
then libpq defaults apply, so the default behavior might end up being
that a given instance attempts to replicate from itself.)

We may be able to have different values for primary and replica
for other settings, too, if we could have sections in the configuration
file; defining, say, a [replica] section would give us more flexibility.
Though it is a bit far from the topic, a dedicated command-line
configuration editor that can find and replace a specified
parameter would eliminate the subtle editing step. It is annoying
to have to find a specific separator in the conf file, then trim, then add
new content.

It works for basic cases. It's missing tablespace support, proper
fsyncing, progress reporting, probably more. Those would be pretty

While a replica is catching up with the master, connections to it are
first accepted and then fail with a FATAL error. I receive inquiries
about that now and then. With the new feature, we would also get FATAL
during the base backup phase. That could alarm users more frequently.

straightforward I think. The interesting bit is the delicate ordering
of the postmaster startup: Normally, the pg_control file is read quite
early, but if starting from a minimal data directory, we need to wait
until the base backup is done. There is also the question what you do
if the base backup fails halfway through. Currently you probably need
to delete the whole data directory and start again with initdb. Better
might be a way to start again and overwrite any existing files, but that
can clearly also be dangerous. All this needs some careful analysis,
but I think it's doable.

Any thoughts?

Just overwriting won't work, since files removed just before
retrying are left behind on the replica. I think it should work
similarly to initdb, that is, remove everything and then retry.
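
To illustrate why overwriting alone is not enough: a retry that keeps the existing files would have to find every file that is present in the data directory but absent from the new backup. A minimal sketch of that comparison follows, assuming both file lists are available as arrays sorted in strcmp() order. The function name and interface are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical illustration: collect the names that appear in "local" but
 * not in "backup".  Both input arrays must be sorted in strcmp() order.
 * "stale" must have room for nlocal entries.  Returns the number of stale
 * names stored.
 */
size_t
find_stale_files(const char *const *local, size_t nlocal,
                 const char *const *backup, size_t nbackup,
                 const char **stale)
{
    size_t      i = 0;
    size_t      j = 0;
    size_t      nstale = 0;

    assert(local != NULL && backup != NULL && stale != NULL);

    while (i < nlocal)
    {
        /* Once the backup list is exhausted, every remaining name is stale */
        int         cmp = (j < nbackup) ? strcmp(local[i], backup[j]) : -1;

        if (cmp < 0)
            stale[nstale++] = local[i++];   /* only in the data directory */
        else if (cmp > 0)
            j++;                            /* only in the new backup */
        else
        {
            i++;                            /* present on both sides */
            j++;
        }
    }
    return nstale;
}
```

Removing everything before the retry, as suggested above, avoids the need for this bookkeeping altogether.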

It's easy if we don't consider reducing startup time: just run
initdb and then start the existing postmaster internally. But melding them
together makes room for reducing the startup time. We could even
redirect read-only queries to the master while setting up the server.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#13Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Peter Eisentraut (#1)
1 attachment(s)
Re: base backup client as auxiliary backend process

Attached is a very hackish patch to implement this. It works like this:

# (assuming you have a primary already running somewhere)
initdb -D data2 --replica
$EDITOR data2/postgresql.conf # set primary_conninfo
pg_ctl -D data2 start

Attached is an updated patch for this. I have changed the initdb option
name per suggestion. The WAL receiver is now started concurrently with
the base backup. There is progress reporting (ps display), fsyncing.
Configuration files are not copied anymore. There is a simple test
suite. Tablespace support is still missing, but it would be
straightforward.

It's still all to be considered experimental, but it's taking shape and
working pretty well.
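
For illustration only, the rule documented for the new EXCLUDE_CONF option in the patch ("files that end in .conf") amounts to a suffix test like the one below. The function name is hypothetical, and this sketch makes no claim about how the patch itself implements the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: report whether a file name ends in ".conf", which is
 * the rule the protocol documentation in the patch gives for EXCLUDE_CONF.
 */
bool
is_conf_file(const char *filename)
{
    static const char suffix[] = ".conf";
    size_t      suffixlen = sizeof(suffix) - 1;
    size_t      namelen;

    assert(filename != NULL);
    namelen = strlen(filename);

    /* Compare only the tail of the name against the suffix */
    return namelen >= suffixlen &&
        strcmp(filename + namelen - suffixlen, suffix) == 0;
}
```

Under this rule postgresql.conf, postgresql.auto.conf and pg_hba.conf would all be skipped, while pg_control would still be copied.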

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v2-0001-Base-backup-client-as-auxiliary-backend-process.patch (text/plain; charset=UTF-8)
From aae4640acbb2a1ae4ff5d2e80abce0798799fe73 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Fri, 30 Aug 2019 20:42:51 +0200
Subject: [PATCH v2] Base backup client as auxiliary backend process

Discussion: https://www.postgresql.org/message-id/flat/61b8d18d-c922-ac99-b990-a31ba63cdcbb@2ndquadrant.com
---
 doc/src/sgml/protocol.sgml                    |  12 +-
 doc/src/sgml/ref/initdb.sgml                  |  17 +
 src/backend/access/transam/xlog.c             |  84 ++--
 src/backend/bootstrap/bootstrap.c             |   9 +
 src/backend/postmaster/pgstat.c               |   6 +
 src/backend/postmaster/postmaster.c           | 114 ++++-
 src/backend/replication/basebackup.c          |  68 +++
 .../libpqwalreceiver/libpqwalreceiver.c       | 419 ++++++++++++++++++
 src/backend/replication/repl_gram.y           |   9 +-
 src/backend/replication/repl_scanner.l        |   1 +
 src/bin/initdb/initdb.c                       |  39 +-
 src/include/access/xlog.h                     |   6 +
 src/include/miscadmin.h                       |   2 +
 src/include/pgstat.h                          |   1 +
 src/include/replication/basebackup.h          |   2 +
 src/include/replication/walreceiver.h         |   4 +
 src/include/utils/guc.h                       |   2 +-
 src/test/recovery/t/018_basebackup.pl         |  29 ++
 18 files changed, 768 insertions(+), 56 deletions(-)
 create mode 100644 src/test/recovery/t/018_basebackup.pl

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index b20f1690a7..81f43b5c00 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2466,7 +2466,7 @@ <title>Streaming Replication Protocol</title>
   </varlistentry>
 
   <varlistentry>
-    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ]
+    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ] [ <literal>EXCLUDE_CONF</literal> ]
      <indexterm><primary>BASE_BACKUP</primary></indexterm>
     </term>
     <listitem>
@@ -2576,6 +2576,16 @@ <title>Streaming Replication Protocol</title>
          </para>
         </listitem>
        </varlistentry>
+
+       <varlistentry>
+        <term><literal>EXCLUDE_CONF</literal></term>
+        <listitem>
+         <para>
+          Do not copy configuration files, that is, files that end in
+          <filename>.conf</filename>.
+         </para>
+        </listitem>
+       </varlistentry>
       </variablelist>
      </para>
      <para>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index da5c8f5307..1261e02d59 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -286,6 +286,23 @@ <title>Options</title>
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-r</option></term>
+      <term><option>--replica</option></term>
+      <listitem>
+       <para>
+        Initialize a data directory for a physical replication replica.  The
+        data directory will not be initialized with a full database system,
+        but will instead only contain a minimal set of files.  A server that
+        is started on this data directory will first fetch a base backup and
+        then switch to standby mode.  The connection information for the base
+        backup has to be configured by setting <xref
+        linkend="guc-primary-conninfo"/>, and other parameters as desired,
+        before the server is started.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-S</option></term>
       <term><option>--sync-only</option></term>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e651a841bb..7ab8ab45f5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -905,8 +905,6 @@ static void CheckRecoveryConsistency(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpt, bool report);
 static bool rescanLatestTimeLine(void);
-static void WriteControlFile(void);
-static void ReadControlFile(void);
 static char *str_time(pg_time_t tnow);
 static bool CheckForStandbyTrigger(void);
 
@@ -4481,7 +4479,7 @@ rescanLatestTimeLine(void)
  * ReadControlFile() verifies they are correct.  We could split out the
  * I/O and compatibility-check functions, but there seems no need currently.
  */
-static void
+void
 WriteControlFile(void)
 {
 	int			fd;
@@ -4573,7 +4571,7 @@ WriteControlFile(void)
 						XLOG_CONTROL_FILE)));
 }
 
-static void
+void
 ReadControlFile(void)
 {
 	pg_crc32c	crc;
@@ -5079,6 +5077,41 @@ XLOGShmemInit(void)
 	InitSharedLatch(&XLogCtl->recoveryWakeupLatch);
 }
 
+void
+InitControlFile(uint64 sysidentifier)
+{
+	char		mock_auth_nonce[MOCK_AUTH_NONCE_LEN];
+
+	/*
+	 * Generate a random nonce. This is used for authentication requests that
+	 * will fail because the user does not exist. The nonce is used to create
+	 * a genuine-looking password challenge for the non-existent user, in lieu
+	 * of an actual stored password.
+	 */
+	if (!pg_strong_random(mock_auth_nonce, MOCK_AUTH_NONCE_LEN))
+		ereport(PANIC,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate secret authorization token")));
+
+	memset(ControlFile, 0, sizeof(ControlFileData));
+	/* Initialize pg_control status fields */
+	ControlFile->system_identifier = sysidentifier;
+	memcpy(ControlFile->mock_authentication_nonce, mock_auth_nonce, MOCK_AUTH_NONCE_LEN);
+	ControlFile->state = DB_SHUTDOWNED;
+	ControlFile->unloggedLSN = FirstNormalUnloggedLSN;
+
+	/* Set important parameter values for use when replaying WAL */
+	ControlFile->MaxConnections = MaxConnections;
+	ControlFile->max_worker_processes = max_worker_processes;
+	ControlFile->max_wal_senders = max_wal_senders;
+	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_locks_per_xact = max_locks_per_xact;
+	ControlFile->wal_level = wal_level;
+	ControlFile->wal_log_hints = wal_log_hints;
+	ControlFile->track_commit_timestamp = track_commit_timestamp;
+	ControlFile->data_checksum_version = bootstrap_data_checksum_version;
+}
+
 /*
  * This func must be called ONCE on system install.  It creates pg_control
  * and the initial XLOG segment.
@@ -5094,7 +5127,6 @@ BootStrapXLOG(void)
 	char	   *recptr;
 	bool		use_existent;
 	uint64		sysidentifier;
-	char		mock_auth_nonce[MOCK_AUTH_NONCE_LEN];
 	struct timeval tv;
 	pg_crc32c	crc;
 
@@ -5115,17 +5147,6 @@ BootStrapXLOG(void)
 	sysidentifier |= ((uint64) tv.tv_usec) << 12;
 	sysidentifier |= getpid() & 0xFFF;
 
-	/*
-	 * Generate a random nonce. This is used for authentication requests that
-	 * will fail because the user does not exist. The nonce is used to create
-	 * a genuine-looking password challenge for the non-existent user, in lieu
-	 * of an actual stored password.
-	 */
-	if (!pg_strong_random(mock_auth_nonce, MOCK_AUTH_NONCE_LEN))
-		ereport(PANIC,
-				(errcode(ERRCODE_INTERNAL_ERROR),
-				 errmsg("could not generate secret authorization token")));
-
 	/* First timeline ID is always 1 */
 	ThisTimeLineID = 1;
 
@@ -5233,30 +5254,12 @@ BootStrapXLOG(void)
 	openLogFile = -1;
 
 	/* Now create pg_control */
-
-	memset(ControlFile, 0, sizeof(ControlFileData));
-	/* Initialize pg_control status fields */
-	ControlFile->system_identifier = sysidentifier;
-	memcpy(ControlFile->mock_authentication_nonce, mock_auth_nonce, MOCK_AUTH_NONCE_LEN);
-	ControlFile->state = DB_SHUTDOWNED;
+	InitControlFile(sysidentifier);
 	ControlFile->time = checkPoint.time;
 	ControlFile->checkPoint = checkPoint.redo;
 	ControlFile->checkPointCopy = checkPoint;
-	ControlFile->unloggedLSN = FirstNormalUnloggedLSN;
-
-	/* Set important parameter values for use when replaying WAL */
-	ControlFile->MaxConnections = MaxConnections;
-	ControlFile->max_worker_processes = max_worker_processes;
-	ControlFile->max_wal_senders = max_wal_senders;
-	ControlFile->max_prepared_xacts = max_prepared_xacts;
-	ControlFile->max_locks_per_xact = max_locks_per_xact;
-	ControlFile->wal_level = wal_level;
-	ControlFile->wal_log_hints = wal_log_hints;
-	ControlFile->track_commit_timestamp = track_commit_timestamp;
-	ControlFile->data_checksum_version = bootstrap_data_checksum_version;
 
 	/* some additional ControlFile fields are set in WriteControlFile() */
-
 	WriteControlFile();
 
 	/* Bootstrap the commit log, too */
@@ -6225,13 +6228,11 @@ StartupXLOG(void)
 	CurrentResourceOwner = AuxProcessResourceOwner;
 
 	/*
-	 * Verify XLOG status looks valid.
+	 * Check that contents look valid.
 	 */
-	if (ControlFile->state < DB_SHUTDOWNED ||
-		ControlFile->state > DB_IN_PRODUCTION ||
-		!XRecOffIsValid(ControlFile->checkPoint))
+	if (!XRecOffIsValid(ControlFile->checkPoint))
 		ereport(FATAL,
-				(errmsg("control file contains invalid data")));
+				(errmsg("control file contains invalid checkpoint location")));
 
 	if (ControlFile->state == DB_SHUTDOWNED)
 	{
@@ -6264,6 +6265,9 @@ StartupXLOG(void)
 		ereport(LOG,
 				(errmsg("database system was interrupted; last known up at %s",
 						str_time(ControlFile->time))));
+	else
+		ereport(FATAL,
+				(errmsg("control file contains invalid database cluster state")));
 
 	/* This is just to allow attaching to startup process with a debugger */
 #ifdef XLOG_REPLAY_DELAY
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 9238fbe98d..a8b1ffd08a 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -36,6 +36,7 @@
 #include "postmaster/bgwriter.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
+#include "replication/basebackup.h"
 #include "replication/walreceiver.h"
 #include "storage/bufmgr.h"
 #include "storage/bufpage.h"
@@ -326,6 +327,9 @@ AuxiliaryProcessMain(int argc, char *argv[])
 			case StartupProcess:
 				statmsg = pgstat_get_backend_desc(B_STARTUP);
 				break;
+			case BaseBackupProcess:
+				statmsg = pgstat_get_backend_desc(B_BASE_BACKUP);
+				break;
 			case BgWriterProcess:
 				statmsg = pgstat_get_backend_desc(B_BG_WRITER);
 				break;
@@ -451,6 +455,11 @@ AuxiliaryProcessMain(int argc, char *argv[])
 			StartupProcessMain();
 			proc_exit(1);		/* should never return */
 
+		case BaseBackupProcess:
+			/* don't set signals, basebackup has its own agenda */
+			BaseBackupMain();
+			proc_exit(1);		/* should never return */
+
 		case BgWriterProcess:
 			/* don't set signals, bgwriter has its own agenda */
 			BackgroundWriterMain();
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d362e7f7d7..79465333bc 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2934,6 +2934,9 @@ pgstat_bestart(void)
 			case StartupProcess:
 				lbeentry.st_backendType = B_STARTUP;
 				break;
+			case BaseBackupProcess:
+				lbeentry.st_backendType = B_BASE_BACKUP;
+				break;
 			case BgWriterProcess:
 				lbeentry.st_backendType = B_BG_WRITER;
 				break;
@@ -4289,6 +4292,9 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_BG_WORKER:
 			backendDesc = "background worker";
 			break;
+		case B_BASE_BACKUP:
+			backendDesc = "base backup";
+			break;
 		case B_BG_WRITER:
 			backendDesc = "background writer";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 62dc93d56b..3096e6ef33 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -116,6 +116,7 @@
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "replication/logicallauncher.h"
+#include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -248,6 +249,7 @@ bool		restart_after_crash = true;
 
 /* PIDs of special child processes; 0 when not running */
 static pid_t StartupPID = 0,
+			BaseBackupPID = 0,
 			BgWriterPID = 0,
 			CheckpointerPID = 0,
 			WalWriterPID = 0,
@@ -539,6 +541,7 @@ static void ShmemBackendArrayRemove(Backend *bn);
 #endif							/* EXEC_BACKEND */
 
 #define StartupDataBase()		StartChildProcess(StartupProcess)
+#define StartBaseBackup()		StartChildProcess(BaseBackupProcess)
 #define StartBackgroundWriter() StartChildProcess(BgWriterProcess)
 #define StartCheckpointer()		StartChildProcess(CheckpointerProcess)
 #define StartWalWriter()		StartChildProcess(WalWriterProcess)
@@ -572,6 +575,8 @@ PostmasterMain(int argc, char *argv[])
 	bool		listen_addr_saved = false;
 	int			i;
 	char	   *output_config_variable = NULL;
+	struct stat stat_buf;
+	bool		basebackup_signal_file_found = false;
 
 	InitProcessGlobals();
 
@@ -877,12 +882,27 @@ PostmasterMain(int argc, char *argv[])
 	/* Verify that DataDir looks reasonable */
 	checkDataDir();
 
-	/* Check that pg_control exists */
-	checkControlFile();
-
 	/* And switch working directory into it */
 	ChangeToDataDir();
 
+	if (stat(BASEBACKUP_SIGNAL_FILE, &stat_buf) == 0)
+	{
+		int         fd;
+
+		fd = BasicOpenFilePerm(STANDBY_SIGNAL_FILE, O_RDWR | PG_BINARY,
+							   S_IRUSR | S_IWUSR);
+		if (fd >= 0)
+		{
+			(void) pg_fsync(fd);
+			close(fd);
+		}
+		basebackup_signal_file_found = true;
+	}
+
+	/* Check that pg_control exists */
+	if (!basebackup_signal_file_found)
+		checkControlFile();
+
 	/*
 	 * Check for invalid combinations of GUC settings.
 	 */
@@ -961,7 +981,8 @@ PostmasterMain(int argc, char *argv[])
 	 * processes will inherit the correct function pointer and not need to
 	 * repeat the test.
 	 */
-	LocalProcessControlFile(false);
+	if (!basebackup_signal_file_found)
+		LocalProcessControlFile(false);
 
 	/*
 	 * Initialize SSL library, if specified.
@@ -1363,6 +1384,39 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STARTING);
 
+	if (basebackup_signal_file_found)
+	{
+		BaseBackupPID = StartBaseBackup();
+
+		/*
+		 * Wait until done.  Start WAL receiver in the meantime, once base
+		 * backup has received the starting position.
+		 */
+		while (BaseBackupPID != 0)
+		{
+			PG_SETMASK(&UnBlockSig);
+			pg_usleep(1000000L);
+			PG_SETMASK(&BlockSig);
+			MaybeStartWalReceiver();
+		}
+
+		/*
+		 * XXX Shut down WAL receiver.  It will be restarted later in xlog.c,
+		 * and that will complain if it's already running.
+		 */
+		ShutdownWalRcv();
+
+		/*
+		 * Base backup done, now signal standby mode.
+		 */
+		durable_rename(BASEBACKUP_SIGNAL_FILE, STANDBY_SIGNAL_FILE, FATAL);
+
+		/*
+		 * Reread the control file that came in with the base backup.
+		 */
+		ReadControlFile();
+	}
+
 	/*
 	 * We're ready to rock and roll...
 	 */
@@ -2631,6 +2685,8 @@ SIGHUP_handler(SIGNAL_ARGS)
 		SignalChildren(SIGHUP);
 		if (StartupPID != 0)
 			signal_child(StartupPID, SIGHUP);
+		if (BaseBackupPID != 0)
+			signal_child(BaseBackupPID, SIGHUP);
 		if (BgWriterPID != 0)
 			signal_child(BgWriterPID, SIGHUP);
 		if (CheckpointerPID != 0)
@@ -2782,6 +2838,8 @@ pmdie(SIGNAL_ARGS)
 
 			if (StartupPID != 0)
 				signal_child(StartupPID, SIGTERM);
+			if (BaseBackupPID != 0)
+				signal_child(BaseBackupPID, SIGTERM);
 			if (BgWriterPID != 0)
 				signal_child(BgWriterPID, SIGTERM);
 			if (WalReceiverPID != 0)
@@ -3012,6 +3070,23 @@ reaper(SIGNAL_ARGS)
 			continue;
 		}
 
+		/*
+		 * Was it the base backup process?
+		 */
+		if (pid == BaseBackupPID)
+		{
+			BaseBackupPID = 0;
+			if (EXIT_STATUS_0(exitstatus))
+				;
+			else if (EXIT_STATUS_1(exitstatus))
+				ereport(FATAL,
+						(errmsg("base backup failed")));
+			else
+				HandleChildCrash(pid, exitstatus,
+								 _("base backup process"));
+			continue;
+		}
+
 		/*
 		 * Was it the bgwriter?  Normal exit can be ignored; we'll start a new
 		 * one at the next iteration of the postmaster's main loop, if
@@ -3531,6 +3606,18 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 		StartupStatus = STARTUP_SIGNALED;
 	}
 
+	/* Take care of the base backup process too */
+	if (pid == BaseBackupPID)
+		BaseBackupPID = 0;
+	else if (BaseBackupPID != 0 && take_action)
+	{
+		ereport(DEBUG2,
+				(errmsg_internal("sending %s to process %d",
+								 (SendStop ? "SIGSTOP" : "SIGQUIT"),
+								 (int) BaseBackupPID)));
+		signal_child(BaseBackupPID, (SendStop ? SIGSTOP : SIGQUIT));
+	}
+
 	/* Take care of the bgwriter too */
 	if (pid == BgWriterPID)
 		BgWriterPID = 0;
@@ -3765,6 +3852,7 @@ PostmasterStateMachine(void)
 		if (CountChildren(BACKEND_TYPE_NORMAL | BACKEND_TYPE_WORKER) == 0 &&
 			StartupPID == 0 &&
 			WalReceiverPID == 0 &&
+			BaseBackupPID == 0 &&
 			BgWriterPID == 0 &&
 			(CheckpointerPID == 0 ||
 			 (!FatalError && Shutdown < ImmediateShutdown)) &&
@@ -3859,6 +3947,7 @@ PostmasterStateMachine(void)
 			/* These other guys should be dead already */
 			Assert(StartupPID == 0);
 			Assert(WalReceiverPID == 0);
+			Assert(BaseBackupPID == 0);
 			Assert(BgWriterPID == 0);
 			Assert(CheckpointerPID == 0);
 			Assert(WalWriterPID == 0);
@@ -4042,6 +4131,8 @@ TerminateChildren(int signal)
 		if (signal == SIGQUIT || signal == SIGKILL)
 			StartupStatus = STARTUP_SIGNALED;
 	}
+	if (BaseBackupPID != 0)
+		signal_child(BaseBackupPID, signal);
 	if (BgWriterPID != 0)
 		signal_child(BgWriterPID, signal);
 	if (CheckpointerPID != 0)
@@ -4867,6 +4958,7 @@ SubPostmasterMain(int argc, char *argv[])
 		strcmp(argv[1], "--forkavlauncher") == 0 ||
 		strcmp(argv[1], "--forkavworker") == 0 ||
 		strcmp(argv[1], "--forkboot") == 0 ||
+		strcmp(argv[1], "--forkbasebackup") == 0 ||
 		strncmp(argv[1], "--forkbgworker=", 15) == 0)
 		PGSharedMemoryReAttach();
 	else
@@ -4906,7 +4998,8 @@ SubPostmasterMain(int argc, char *argv[])
 	 * (re-)read control file, as it contains config. The postmaster will
 	 * already have read this, but this process doesn't know about that.
 	 */
-	LocalProcessControlFile(false);
+	if (strcmp(argv[1], "--forkbasebackup") != 0)
+		LocalProcessControlFile(false);
 
 	/*
 	 * Reload any libraries that were preloaded by the postmaster.  Since we
@@ -4967,7 +5060,8 @@ SubPostmasterMain(int argc, char *argv[])
 		/* And run the backend */
 		BackendRun(&port);		/* does not return */
 	}
-	if (strcmp(argv[1], "--forkboot") == 0)
+	if (strcmp(argv[1], "--forkboot") == 0 ||
+		strcmp(argv[1], "--forkbasebackup") == 0)
 	{
 		/* Restore basic shared memory pointers */
 		InitShmemAccess(UsedShmemSegAddr);
@@ -5371,7 +5465,7 @@ StartChildProcess(AuxProcType type)
 	av[ac++] = "postgres";
 
 #ifdef EXEC_BACKEND
-	av[ac++] = "--forkboot";
+	av[ac++] = (type == BaseBackupProcess) ? "--forkbasebackup" : "--forkboot";
 	av[ac++] = NULL;			/* filled in by postmaster_forkexec */
 #endif
 
@@ -5415,6 +5509,10 @@ StartChildProcess(AuxProcType type)
 				ereport(LOG,
 						(errmsg("could not fork startup process: %m")));
 				break;
+			case BaseBackupProcess:
+				ereport(LOG,
+						(errmsg("could not fork base backup process: %m")));
+				break;
 			case BgWriterProcess:
 				ereport(LOG,
 						(errmsg("could not fork background writer process: %m")));
@@ -5556,7 +5654,7 @@ static void
 MaybeStartWalReceiver(void)
 {
 	if (WalReceiverPID == 0 &&
-		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
+		(pmState == PM_INIT || pmState == PM_STARTUP || pmState == PM_RECOVERY ||
 		 pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
 		Shutdown == NoShutdown)
 	{
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index c91f66dcbe..fcac192d0b 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -29,6 +29,7 @@
 #include "port.h"
 #include "postmaster/syslogger.h"
 #include "replication/basebackup.h"
+#include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
 #include "storage/bufpage.h"
@@ -38,6 +39,7 @@
 #include "storage/ipc.h"
 #include "storage/reinit.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/ps_status.h"
 #include "utils/relcache.h"
 #include "utils/timestamp.h"
@@ -111,6 +113,9 @@ static long long int total_checksum_failures;
 /* Do not verify checksums. */
 static bool noverify_checksums = false;
 
+/* Do not copy config files. */
+static bool exclude_conf = false;
+
 /*
  * The contents of these directories are removed or recreated during server
  * start so they are not included in backups.  The directories themselves are
@@ -638,6 +643,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 	bool		o_maxrate = false;
 	bool		o_tablespace_map = false;
 	bool		o_noverify_checksums = false;
+	bool		o_exclude_conf = false;
 
 	MemSet(opt, 0, sizeof(*opt));
 	foreach(lopt, options)
@@ -726,6 +732,15 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 			noverify_checksums = true;
 			o_noverify_checksums = true;
 		}
+		else if (strcmp(defel->defname, "exclude_conf") == 0)
+		{
+			if (o_exclude_conf)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("duplicate option \"%s\"", defel->defname)));
+			exclude_conf = true;
+			o_exclude_conf = true;
+		}
 		else
 			elog(ERROR, "option \"%s\" not recognized",
 				 defel->defname);
@@ -1135,6 +1150,18 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 			continue;
 		}
 
+		if (exclude_conf)
+		{
+			char	   *dot = strrchr(de->d_name, '.');
+			if (dot && strcmp(dot, ".conf") == 0)
+			{
+				elog(DEBUG2,
+					 "configuration file \"%s\" excluded from backup",
+					 de->d_name);
+				continue;
+			}
+		}
+
 		snprintf(pathbuf, sizeof(pathbuf), "%s/%s", path, de->d_name);
 
 		/* Skip pg_control here to back up it last */
@@ -1711,3 +1738,44 @@ throttle(size_t increment)
 	 */
 	throttled_last = GetCurrentTimestamp();
 }
+
+
+/*
+ * base backup worker process (client) main function
+ */
+void
+BaseBackupMain(void)
+{
+	WalReceiverConn *wrconn = NULL;
+	char	   *err;
+	TimeLineID	primaryTLI;
+	uint64		primary_sysid;
+
+	/* Load the libpq-specific functions */
+	load_file("libpqwalreceiver", false);
+	if (WalReceiverFunctions == NULL)
+		elog(ERROR, "libpqwalreceiver didn't initialize correctly");
+
+	/* Establish the connection to the primary */
+	wrconn = walrcv_connect(PrimaryConnInfo, false, cluster_name[0] ? cluster_name : "basebackup", &err);
+	if (!wrconn)
+		ereport(ERROR,
+				(errmsg("could not connect to the primary server: %s", err)));
+
+	/*
+	 * Get the remote sysid and stick it into the local control file, so that
+	 * the walreceiver is happy.  The control file will later be overwritten
+	 * by the base backup.
+	 */
+	primary_sysid = strtoull(walrcv_identify_system(wrconn, &primaryTLI), NULL, 10);
+	InitControlFile(primary_sysid);
+	WriteControlFile();
+
+	walrcv_base_backup(wrconn);
+
+	walrcv_disconnect(wrconn);
+
+	ereport(LOG,
+			(errmsg("base backup completed")));
+	proc_exit(0);
+}
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 6eba08a920..6d448acacf 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -17,8 +17,14 @@
 #include "postgres.h"
 
 #include <unistd.h>
+#include <sys/stat.h>
 #include <sys/time.h>
 
+#ifdef USE_SYSTEMD
+#include <systemd/sd-daemon.h>
+#endif
+
+#include "common/string.h"
 #include "libpq-fe.h"
 #include "pqexpbuffer.h"
 #include "access/xlog.h"
@@ -27,10 +33,13 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "pgtar.h"
 #include "replication/walreceiver.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/pg_lsn.h"
+#include "utils/ps_status.h"
 #include "utils/tuplestore.h"
 
 PG_MODULE_MAGIC;
@@ -61,6 +70,7 @@ static int	libpqrcv_server_version(WalReceiverConn *conn);
 static void libpqrcv_readtimelinehistoryfile(WalReceiverConn *conn,
 											 TimeLineID tli, char **filename,
 											 char **content, int *len);
+static void libpqrcv_base_backup(WalReceiverConn *conn);
 static bool libpqrcv_startstreaming(WalReceiverConn *conn,
 									const WalRcvStreamOptions *options);
 static void libpqrcv_endstreaming(WalReceiverConn *conn,
@@ -88,6 +98,7 @@ static WalReceiverFunctionsType PQWalReceiverFunctions = {
 	libpqrcv_identify_system,
 	libpqrcv_server_version,
 	libpqrcv_readtimelinehistoryfile,
+	libpqrcv_base_backup,
 	libpqrcv_startstreaming,
 	libpqrcv_endstreaming,
 	libpqrcv_receive,
@@ -356,6 +367,414 @@ libpqrcv_server_version(WalReceiverConn *conn)
 	return PQserverVersion(conn->streamConn);
 }
 
+/*
+ * XXX copied from pg_basebackup.c
+ */
+
+unsigned long long totaldone;
+unsigned long long totalsize_kb;
+int tablespacenum;
+int tablespacecount;
+
+static void
+base_backup_report_progress(void)
+{
+	int			percent;
+	char	   *progress;
+
+	percent = totalsize_kb ? (int) ((totaldone / 1024) * 100 / totalsize_kb) : 0;
+
+	/*
+	 * Avoid overflowing past 100% or the full size. This may make the total
+	 * size number change as we approach the end of the backup (the estimate
+	 * will always be wrong if WAL is included), but that's better than having
+	 * the done column be bigger than the total.
+	 */
+	if (percent > 100)
+		percent = 100;
+	if (totaldone / 1024 > totalsize_kb)
+		totalsize_kb = totaldone / 1024;
+
+	/* Note: no translation of ps status */
+	progress = psprintf((tablespacecount == 1 ?
+						 "%llu/%llu kB (%d%%), %d/%d tablespace" :
+						 "%llu/%llu kB (%d%%), %d/%d tablespaces"),
+						totaldone / 1024,
+						totalsize_kb,
+						percent,
+						tablespacenum,
+						tablespacecount);
+
+	set_ps_display(progress, false);
+#ifdef USE_SYSTEMD
+	sd_pid_notifyf(PostmasterPid, 0, "STATUS=base backup %s", progress);
+#endif
+
+	pfree(progress);
+}
+
+static void
+ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res)
+{
+	char		current_path[MAXPGPATH];
+	char		filename[MAXPGPATH];
+	pgoff_t		current_len_left = 0;
+	int			current_padding = 0;
+	char	   *copybuf = NULL;
+	FILE	   *file = NULL;
+	off_t		flush_offset;
+
+	strlcpy(current_path, DataDir, sizeof(current_path));
+
+	/*
+	 * Get the COPY data
+	 */
+	res = PQgetResult(conn);
+	if (PQresultStatus(res) != PGRES_COPY_OUT)
+		ereport(ERROR,
+				(errmsg("could not get COPY data stream: %s",
+						PQerrorMessage(conn))));
+
+	while (1)
+	{
+		int			r;
+
+		if (copybuf != NULL)
+		{
+			PQfreemem(copybuf);
+			copybuf = NULL;
+		}
+
+		r = PQgetCopyData(conn, &copybuf, 0);
+
+		if (r == -1)
+		{
+			/*
+			 * End of chunk
+			 */
+			if (file)
+				fclose(file);
+
+			break;
+		}
+		else if (r == -2)
+		{
+			ereport(ERROR,
+					(errmsg("could not read COPY data: %s",
+							PQerrorMessage(conn))));
+		}
+
+		if (file == NULL)
+		{
+			int			filemode;
+
+			/*
+			 * No current file, so this must be the header for a new file
+			 */
+			if (r != 512)
+				ereport(ERROR,
+						(errmsg("invalid tar block header size: %d", r)));
+
+			current_len_left = read_tar_number(&copybuf[124], 12);
+
+			/* Set permissions on the file */
+			filemode = read_tar_number(&copybuf[100], 8);
+
+			/*
+			 * All files are padded up to 512 bytes
+			 */
+			current_padding =
+				((current_len_left + 511) & ~511) - current_len_left;
+
+			/*
+			 * First part of header is zero terminated filename
+			 */
+			snprintf(filename, sizeof(filename), "%s/%s", current_path,
+					 copybuf);
+			if (filename[strlen(filename) - 1] == '/')
+			{
+				/*
+				 * Ends in a slash means directory or symlink to directory
+				 */
+				if (copybuf[156] == '5')
+				{
+					/*
+					 * Directory
+					 */
+					filename[strlen(filename) - 1] = '\0';	/* Remove trailing slash */
+					if (MakePGDirectory(filename) != 0)
+					{
+						if (errno != EEXIST)
+							ereport(ERROR,
+									(errcode_for_file_access(),
+									 errmsg("could not create directory \"%s\": %m",
+											filename)));
+					}
+#ifndef WIN32
+					if (chmod(filename, (mode_t) filemode))
+						ereport(ERROR,
+								(errcode_for_file_access(),
+								 errmsg("could not set permissions on directory \"%s\": %m",
+										filename)));
+#endif
+					fsync_fname(filename, true);
+				}
+				else if (copybuf[156] == '2')
+				{
+					/*
+					 * Symbolic link
+					 *
+					 * It's most likely a link in pg_tblspc directory, to the
+					 * location of a tablespace. Apply any tablespace mapping
+					 * given on the command line (--tablespace-mapping). (We
+					 * blindly apply the mapping without checking that the
+					 * link really is inside pg_tblspc. We don't expect there
+					 * to be other symlinks in a data directory, but if there
+					 * are, you can call it an undocumented feature that you
+					 * can map them too.)
+					 */
+#ifdef TODO
+					filename[strlen(filename) - 1] = '\0';	/* Remove trailing slash */
+
+					mapped_tblspc_path = get_tablespace_mapping(&copybuf[157]);
+					if (symlink(mapped_tblspc_path, filename) != 0)
+					{
+						pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
+									 filename, mapped_tblspc_path);
+						exit(1);
+					}
+					fsync_fname(filename, false);
+#endif
+				}
+				else
+				{
+					ereport(ERROR,
+							(errmsg("unrecognized link indicator \"%c\"",
+									copybuf[156])));
+				}
+				continue;		/* directory or link handled */
+			}
+
+			/*
+			 * regular file
+			 */
+			file = fopen(filename, "wb");
+			if (!file)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 (errmsg("could not create file \"%s\": %m", filename))));
+
+			flush_offset = 0;
+
+#ifndef WIN32
+			if (chmod(filename, (mode_t) filemode))
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 (errmsg("could not set permissions on file \"%s\": %m",
+								 filename))));
+#endif
+
+			if (current_len_left == 0)
+			{
+				/*
+				 * Done with this file, next one will be a new tar header
+				 */
+				pg_fsync(fileno(file));
+				fclose(file);
+				file = NULL;
+				continue;
+			}
+		}						/* new file */
+		else
+		{
+			/*
+			 * Continuing blocks in existing file
+			 */
+			if (current_len_left == 0 && r == current_padding)
+			{
+				/*
+				 * Received the padding block for this file, ignore it and
+				 * close the file, then move on to the next tar header.
+				 */
+				pg_fsync(fileno(file));
+				fclose(file);
+				file = NULL;
+				continue;
+			}
+
+			if (fwrite(copybuf, r, 1, file) != 1)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not write to file \"%s\": %m", filename)));
+
+			pg_flush_data(fileno(file), flush_offset, r);
+			flush_offset += r;
+			totaldone += r;
+			base_backup_report_progress();
+
+			current_len_left -= r;
+			if (current_len_left == 0 && current_padding == 0)
+			{
+				/*
+				 * Received the last block, and there is no padding to be
+				 * expected. Close the file and move on to the next tar
+				 * header.
+				 */
+				pg_fsync(fileno(file));
+				fclose(file);
+				file = NULL;
+				continue;
+			}
+		}						/* continuing data in existing file */
+	}							/* loop over all data blocks */
+	base_backup_report_progress();
+
+	if (file != NULL)
+		ereport(ERROR,
+				(errmsg("COPY stream ended before last file was finished")));
+
+	if (copybuf != NULL)
+		PQfreemem(copybuf);
+}
+
+/*
+ * Make base backup from remote and write to local disk.
+ */
+static void
+libpqrcv_base_backup(WalReceiverConn *conn)
+{
+	StringInfoData stmt;
+	PGresult   *res;
+	char		xlogstart[64];
+	TimeLineID	starttli;
+	XLogRecPtr	recptr;
+	bool		error;
+
+	ereport(LOG,
+			(errmsg("initiating base backup, waiting for remote checkpoint to complete")));
+	set_ps_display("waiting for checkpoint", false);
+
+	initStringInfo(&stmt);
+	appendStringInfo(&stmt, "BASE_BACKUP PROGRESS NOWAIT EXCLUDE_CONF");
+	if (cluster_name && cluster_name[0])
+		appendStringInfo(&stmt, " LABEL %s", quote_literal_cstr(cluster_name));
+
+	if (PQsendQuery(conn->streamConn, stmt.data) == 0)
+		ereport(ERROR,
+				(errmsg("could not start base backup on remote server: %s",
+						pchomp(PQerrorMessage(conn->streamConn)))));
+
+	/*
+	 * First result set: WAL start position and timeline ID
+	 */
+	res = PQgetResult(conn->streamConn);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("could not start base backup on remote server: %s",
+						pchomp(PQerrorMessage(conn->streamConn)))));
+	}
+	if (PQntuples(res) != 1)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("server returned unexpected response to BASE_BACKUP command; got %d rows and %d fields, expected %d rows and %d fields",
+						PQntuples(res), PQnfields(res), 1, 2)));
+	}
+
+	ereport(LOG,
+			(errmsg("remote checkpoint completed")));
+
+	strlcpy(xlogstart, PQgetvalue(res, 0, 0), sizeof(xlogstart));
+	starttli = atoi(PQgetvalue(res, 0, 1));
+	PQclear(res);
+	elog(DEBUG1, "write-ahead log start point: %s on timeline %u",
+		 xlogstart, starttli);
+	recptr = pg_lsn_in_internal(xlogstart, &error);
+	if (error)
+		elog(ERROR, "invalid LSN received: %s", xlogstart);
+
+	/*
+	 * Second result set: tablespace information
+	 */
+	res = PQgetResult(conn->streamConn);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("could not get backup header: %s",
+						pchomp(PQerrorMessage(conn->streamConn)))));
+	}
+	if (PQntuples(res) < 1)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("no data returned from server")));
+	}
+
+	totalsize_kb = totaldone = 0;
+	tablespacecount = PQntuples(res);
+	for (int i = 0; i < PQntuples(res); i++)
+	{
+		totalsize_kb += atol(PQgetvalue(res, i, 2));
+	}
+
+	RequestXLogStreaming(starttli, recptr, PrimaryConnInfo, PrimarySlotName);
+
+	/*
+	 * Start receiving chunks
+	 */
+	for (int i = 0; i < PQntuples(res); i++)
+	{
+		tablespacenum = i;
+		ReceiveAndUnpackTarFile(conn->streamConn, res);
+	}
+	tablespacenum++;
+	base_backup_report_progress();
+
+	PQclear(res);
+
+	/*
+	 * Final result set: WAL end position and timeline ID
+	 */
+	res = PQgetResult(conn->streamConn);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("could not get write-ahead log end position from server: %s",
+						pchomp(PQerrorMessage(conn->streamConn)))));
+	}
+	if (PQntuples(res) != 1)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("no write-ahead log end position returned from server")));
+	}
+	PQclear(res);
+
+	res = PQgetResult(conn->streamConn);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+#ifdef TODO
+		const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (sqlstate &&
+			strcmp(sqlstate, ERRCODE_DATA_CORRUPTED) == 0)
+		{
+			elog(ERROR, "checksum error occurred");
+		}
+		else
+#endif
+		{
+			elog(ERROR, "final receive failed: %s",
+				 pchomp(PQerrorMessage(conn->streamConn)));
+		}
+	}
+	PQclear(res);
+}
+
 /*
  * Start streaming WAL data from given streaming options.
  *
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index c4e11cc4e8..8c962bc711 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -78,6 +78,7 @@ static SQLCmd *make_sqlcmd(void);
 %token K_WAL
 %token K_TABLESPACE_MAP
 %token K_NOVERIFY_CHECKSUMS
+%token K_EXCLUDE_CONF
 %token K_TIMELINE
 %token K_PHYSICAL
 %token K_LOGICAL
@@ -154,8 +155,7 @@ var_name:	IDENT	{ $$ = $1; }
 		;
 
 /*
- * BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT]
- * [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS]
+ * BASE_BACKUP [option]...
  */
 base_backup:
 			K_BASE_BACKUP base_backup_opt_list
@@ -214,6 +214,11 @@ base_backup_opt:
 				  $$ = makeDefElem("noverify_checksums",
 								   (Node *)makeInteger(true), -1);
 				}
+			| K_EXCLUDE_CONF
+				{
+				  $$ = makeDefElem("exclude_conf",
+								   (Node *)makeInteger(true), -1);
+				}
 			;
 
 create_replication_slot:
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 380faeb5f6..6a2d8d142b 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -93,6 +93,7 @@ MAX_RATE		{ return K_MAX_RATE; }
 WAL			{ return K_WAL; }
 TABLESPACE_MAP			{ return K_TABLESPACE_MAP; }
 NOVERIFY_CHECKSUMS	{ return K_NOVERIFY_CHECKSUMS; }
+EXCLUDE_CONF			{ return K_EXCLUDE_CONF; }
 TIMELINE			{ return K_TIMELINE; }
 START_REPLICATION	{ return K_START_REPLICATION; }
 CREATE_REPLICATION_SLOT		{ return K_CREATE_REPLICATION_SLOT; }
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 88a261d9bd..4722ad2107 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -136,6 +136,7 @@ static char *pwfilename = NULL;
 static char *superuser_password = NULL;
 static const char *authmethodhost = NULL;
 static const char *authmethodlocal = NULL;
+static bool replica = false;
 static bool debug = false;
 static bool noclean = false;
 static bool do_sync = true;
@@ -2938,6 +2939,22 @@ initialize_data_directory(void)
 	/* Now create all the text config files */
 	setup_config();
 
+	/*
+	 * If data directory for replica requested, write basebackup.signal, and
+	 * then we are done here.
+	 */
+	if (replica)
+	{
+		char	   *path;
+		char	   *lines[1] = {NULL};
+
+		path = psprintf("%s/basebackup.signal", pg_data);
+		writefile(path, lines);
+		free(path);
+
+		return;
+	}
+
 	/* Bootstrap template1 */
 	bootstrap_template1();
 
@@ -3029,6 +3046,7 @@ main(int argc, char *argv[])
 		{"wal-segsize", required_argument, NULL, 12},
 		{"data-checksums", no_argument, NULL, 'k'},
 		{"allow-group-access", no_argument, NULL, 'g'},
+		{"replica", no_argument, NULL, 'r'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -3070,7 +3088,7 @@ main(int argc, char *argv[])
 
 	/* process command-line options */
 
-	while ((c = getopt_long(argc, argv, "dD:E:kL:nNU:WA:sST:X:g", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "dD:E:kL:nNrU:WA:sST:X:g", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
@@ -3116,6 +3134,9 @@ main(int argc, char *argv[])
 			case 'N':
 				do_sync = false;
 				break;
+			case 'r':
+				replica = true;
+				break;
 			case 'S':
 				sync_only = true;
 				break;
@@ -3337,9 +3358,19 @@ main(int argc, char *argv[])
 	/* translator: This is a placeholder in a shell command. */
 	appendPQExpBuffer(start_db_cmd, " -l %s start", _("logfile"));
 
-	printf(_("\nSuccess. You can now start the database server using:\n\n"
-			 "    %s\n\n"),
-		   start_db_cmd->data);
+	if (!replica)
+	{
+		printf(_("\nSuccess. You can now start the database server using:\n\n"
+				 "    %s\n\n"),
+			   start_db_cmd->data);
+	}
+	else
+	{
+		printf(_("\nSo far so good. Now configure the replication connection in\n"
+				 "postgresql.conf, and then start the database server using:\n\n"
+				 "    %s\n\n"),
+			   start_db_cmd->data);
+	}
 
 	destroyPQExpBuffer(start_db_cmd);
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..6796cce0eb 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -299,6 +299,11 @@ extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
 extern void BootStrapXLOG(void);
 extern void LocalProcessControlFile(bool reset);
+#ifndef FRONTEND
+extern void InitControlFile(uint64 sysidentifier);
+extern void WriteControlFile(void);
+extern void ReadControlFile(void);
+#endif
 extern void StartupXLOG(void);
 extern void ShutdownXLOG(int code, Datum arg);
 extern void InitXLOGAccess(void);
@@ -354,6 +359,7 @@ extern void do_pg_abort_backup(void);
 extern SessionBackupState get_backup_status(void);
 
 /* File path names (all relative to $PGDATA) */
+#define BASEBACKUP_SIGNAL_FILE	"basebackup.signal"
 #define RECOVERY_SIGNAL_FILE	"recovery.signal"
 #define STANDBY_SIGNAL_FILE		"standby.signal"
 #define BACKUP_LABEL_FILE		"backup_label"
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index bc6e03fbc7..75efc3cf5f 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -398,6 +398,7 @@ typedef enum
 	CheckerProcess = 0,
 	BootstrapProcess,
 	StartupProcess,
+	BaseBackupProcess,
 	BgWriterProcess,
 	CheckpointerProcess,
 	WalWriterProcess,
@@ -410,6 +411,7 @@ extern AuxProcType MyAuxProcType;
 
 #define AmBootstrapProcess()		(MyAuxProcType == BootstrapProcess)
 #define AmStartupProcess()			(MyAuxProcType == StartupProcess)
+#define AmBaseBackupProcess()		(MyAuxProcType == BaseBackupProcess)
 #define AmBackgroundWriterProcess() (MyAuxProcType == BgWriterProcess)
 #define AmCheckpointerProcess()		(MyAuxProcType == CheckpointerProcess)
 #define AmWalWriterProcess()		(MyAuxProcType == WalWriterProcess)
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fe076d823d..6b6a06ced8 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -721,6 +721,7 @@ typedef enum BackendType
 	B_AUTOVAC_LAUNCHER,
 	B_AUTOVAC_WORKER,
 	B_BACKEND,
+	B_BASE_BACKUP,
 	B_BG_WORKER,
 	B_BG_WRITER,
 	B_CHECKPOINTER,
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index 503a5b9f0b..480165c51c 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -33,4 +33,6 @@ extern void SendBaseBackup(BaseBackupCmd *cmd);
 
 extern int64 sendTablespace(char *path, bool sizeonly);
 
+extern void BaseBackupMain(void);
+
 #endif							/* _BASEBACKUP_H */
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index e12a934966..835c0b8214 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -214,6 +214,7 @@ typedef void (*walrcv_readtimelinehistoryfile_fn) (WalReceiverConn *conn,
 												   TimeLineID tli,
 												   char **filename,
 												   char **content, int *size);
+typedef void (*walrcv_base_backup_fn) (WalReceiverConn *conn);
 typedef bool (*walrcv_startstreaming_fn) (WalReceiverConn *conn,
 										  const WalRcvStreamOptions *options);
 typedef void (*walrcv_endstreaming_fn) (WalReceiverConn *conn,
@@ -241,6 +242,7 @@ typedef struct WalReceiverFunctionsType
 	walrcv_identify_system_fn walrcv_identify_system;
 	walrcv_server_version_fn walrcv_server_version;
 	walrcv_readtimelinehistoryfile_fn walrcv_readtimelinehistoryfile;
+	walrcv_base_backup_fn walrcv_base_backup;
 	walrcv_startstreaming_fn walrcv_startstreaming;
 	walrcv_endstreaming_fn walrcv_endstreaming;
 	walrcv_receive_fn walrcv_receive;
@@ -266,6 +268,8 @@ extern PGDLLIMPORT WalReceiverFunctionsType *WalReceiverFunctions;
 	WalReceiverFunctions->walrcv_server_version(conn)
 #define walrcv_readtimelinehistoryfile(conn, tli, filename, content, size) \
 	WalReceiverFunctions->walrcv_readtimelinehistoryfile(conn, tli, filename, content, size)
+#define walrcv_base_backup(conn) \
+	WalReceiverFunctions->walrcv_base_backup(conn)
 #define walrcv_startstreaming(conn, options) \
 	WalReceiverFunctions->walrcv_startstreaming(conn, options)
 #define walrcv_endstreaming(conn, next_tli) \
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 6791e0cbc2..2e12330b00 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -259,7 +259,7 @@ extern int	temp_file_limit;
 
 extern int	num_temp_buffers;
 
-extern char *cluster_name;
+extern PGDLLIMPORT char *cluster_name;
 extern PGDLLIMPORT char *ConfigFileName;
 extern char *HbaFileName;
 extern char *IdentFileName;
diff --git a/src/test/recovery/t/018_basebackup.pl b/src/test/recovery/t/018_basebackup.pl
new file mode 100644
index 0000000000..99731fc388
--- /dev/null
+++ b/src/test/recovery/t/018_basebackup.pl
@@ -0,0 +1,29 @@
+# Test basebackup worker functionality
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+
+my $node1 = get_new_node('node1');
+$node1->init(allows_streaming => 1);
+$node1->start;
+
+$node1->safe_psql('postgres',
+				  "CREATE TABLE tab_int AS SELECT generate_series(1,1000) AS a");
+
+my $node2 = get_new_node('node2');
+$node2->init(allows_streaming => 1, extra => [ '--replica' ]);
+$node2->append_conf('postgresql.conf', "primary_conninfo = '" . $node1->connstr . "'");
+my $old_mtime = (stat($node2->data_dir . '/postgresql.conf'))[9];
+$node2->start;
+
+$node1->wait_for_catchup($node2, 'replay', $node1->lsn('insert'));
+
+is($node2->safe_psql('postgres', "SELECT count(*) FROM tab_int"),
+   qq(1000),
+   'check content of standby');
+
+my $new_mtime = (stat($node2->data_dir . '/postgresql.conf'))[9];
+is($new_mtime, $old_mtime,
+   'configuration files were not copied');

base-commit: 9684e426954921e8b2bfa367f9e6a4cbbf4ce5ff
-- 
2.22.0
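
The tar handling in ReceiveAndUnpackTarFile() in the patch above depends on two small pieces of arithmetic: reading the member size from the 12-byte octal field at offset 124 of each 512-byte header, and rounding that size up to the next 512-byte boundary to find the padding. A minimal sketch of both follows. The helper names parse_tar_octal and tar_padding are hypothetical, and parse_tar_octal is a simplified stand-in for read_tar_number() that only understands plain octal fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Parse an octal tar header field of at most "len" bytes.  Parsing stops
 * at the first byte that is not an octal digit (normally a NUL or space).
 * Hypothetical, simplified stand-in for read_tar_number().
 */
static uint64_t
parse_tar_octal(const char *field, size_t len)
{
	uint64_t	value = 0;
	size_t		i = 0;

	assert(field != NULL);

	/* skip leading spaces, which some tar writers emit */
	while (i < len && field[i] == ' ')
		i++;

	for (; i < len && field[i] >= '0' && field[i] <= '7'; i++)
		value = (value << 3) | (uint64_t) (field[i] - '0');

	return value;
}

/*
 * Number of padding bytes that follow a tar member of the given size,
 * so that the next header starts on a 512-byte boundary.
 */
static uint64_t
tar_padding(uint64_t size)
{
	return ((size + 511) & ~(uint64_t) 511) - size;
}
```

For a member of 1000 bytes the size field holds "00000001750" and 24 bytes of padding follow; a member whose size is already a multiple of 512 has no padding, which is the case the patch handles with its "current_padding == 0" branch.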

#14Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Eisentraut (#13)
Re: base backup client as auxiliary backend process

On 2019-Aug-30, Peter Eisentraut wrote:

Attached is an updated patch for this. I have changed the initdb option
name per suggestion. The WAL receiver is now started concurrently with
the base backup. There is progress reporting (ps display), fsyncing.
Configuration files are not copied anymore. There is a simple test
suite. Tablespace support is still missing, but it would be
straightforward.

This is an amazing feature. How come we don't have people cramming to
review this?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Alvaro Herrera (#14)
Re: base backup client as auxiliary backend process

Hello, thanks for pinging.

At Wed, 11 Sep 2019 19:15:24 -0300, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in <20190911221524.GA16563@alvherre.pgsql>

On 2019-Aug-30, Peter Eisentraut wrote:

Attached is an updated patch for this. I have changed the initdb option
name per suggestion. The WAL receiver is now started concurrently with
the base backup. There is progress reporting (ps display), fsyncing.
Configuration files are not copied anymore. There is a simple test
suite. Tablespace support is still missing, but it would be
straightforward.

This is an amazing feature. How come we don't have people cramming to
review this?

I love it, too. As for me, the reason I hesitated to review this
is that the patch is said to be experimental. That means 'the details
don't matter, let's discuss its design/outline.' So I wanted to
see what the past reviewers would say about the revised shape before I
stirred up the discussion with a maybe-pointless comment. (And then
I forgot about it.)

I'll take another look at this.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#16Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#13)
Re: base backup client as auxiliary backend process

On Fri, Aug 30, 2019 at 09:10:10PM +0200, Peter Eisentraut wrote:

Attached is a very hackish patch to implement this. It works like this:

# (assuming you have a primary already running somewhere)
initdb -D data2 --replica
$EDITOR data2/postgresql.conf # set primary_conninfo
pg_ctl -D data2 start

Attached is an updated patch for this. I have changed the initdb option
name per suggestion. The WAL receiver is now started concurrently with
the base backup. There is progress reporting (ps display), fsyncing.
Configuration files are not copied anymore. There is a simple test
suite. Tablespace support is still missing, but it would be
straightforward.

I find this idea and this spec neat.

-    * Verify XLOG status looks valid.
+    * Check that contents look valid.
     */
-   if (ControlFile->state < DB_SHUTDOWNED ||
-       ControlFile->state > DB_IN_PRODUCTION ||
-       !XRecOffIsValid(ControlFile->checkPoint))
+   if (!XRecOffIsValid(ControlFile->checkPoint))
             ereport(FATAL,
Doesn't seem like a good idea to me to remove this sanity check for
normal deployments, but actually you moved that down in StartupXLOG().
It seems to me that this is unrelated and could be a separate patch so
that the errors produced are more verbose.  I think that we should also
change that code to use a switch/case on ControlFile->state.
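
A rough sketch of what a switch/case on the control file state could look like, so that each rejected state gets its own message. This is only an illustration under assumptions: the enum is a local copy of the DBState values from pg_control.h, and the function name control_state_is_valid and the message texts are made up here, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copy of the DBState values, for illustration only. */
typedef enum DBState
{
	DB_STARTUP = 0,
	DB_SHUTDOWNED,
	DB_SHUTDOWNED_IN_RECOVERY,
	DB_SHUTDOWNING,
	DB_IN_CRASH_RECOVERY,
	DB_IN_ARCHIVE_RECOVERY,
	DB_IN_PRODUCTION
} DBState;

/*
 * Return true if "state" is acceptable at startup.  Otherwise return
 * false and point *detail at a (hypothetical) message for the caller
 * to report.
 */
static bool
control_state_is_valid(DBState state, const char **detail)
{
	assert(detail != NULL);
	*detail = NULL;

	switch (state)
	{
		case DB_SHUTDOWNED:
		case DB_SHUTDOWNED_IN_RECOVERY:
		case DB_SHUTDOWNING:
		case DB_IN_CRASH_RECOVERY:
		case DB_IN_ARCHIVE_RECOVERY:
		case DB_IN_PRODUCTION:
			return true;
		case DB_STARTUP:
			*detail = "control file was never fully initialized";
			return false;
	}

	/* any value outside the enum range ends up here */
	*detail = "control file contains invalid database cluster state";
	return false;
}
```

The accepted range matches the removed check (DB_SHUTDOWNED through DB_IN_PRODUCTION); the difference is that DB_STARTUP and out-of-range values are reported separately.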

The current defaults of pg_basebackup were chosen so that the backups
taken are reliable and so that monitoring is eased, thanks to
--wal-method=stream and the use of replication slots.
Shouldn't the use of at least a temporary replication slot be mandatory
for the stability of the copy? It seems to me that there is a good
argument for having a second process which streams WAL on top of the
main backup process, and just use a WAL receiver for that.

One problem which is not tackled here is what to do for the tablespace
map. pg_basebackup has its own specific trick for that, and with that
new feature we may want something equivalent? Not something to
consider as a first stage of course.

*/
-static void
+void
WriteControlFile(void)
[...]
-static void
+void
ReadControlFile(void)
[...]
If you begin to publish those routines, it seems to me that there
could be more consolidation with controldata_utils.c, which now
includes a routine to update a control file.

+#ifndef FRONTEND
+extern void InitControlFile(uint64 sysidentifier);
+extern void WriteControlFile(void);
+extern void ReadControlFile(void);
+#endif
It would be nice to avoid that.
-extern char *cluster_name;
+extern PGDLLIMPORT char *cluster_name;
Separate patch here?
+   if (stat(BASEBACKUP_SIGNAL_FILE, &stat_buf) == 0)
+   {
+       int         fd;
+
+       fd = BasicOpenFilePerm(STANDBY_SIGNAL_FILE, O_RDWR | PG_BINARY,
+                              S_IRUSR | S_IWUSR);
+       if (fd >= 0)
+       {
+           (void) pg_fsync(fd);
+           close(fd);
+       }
+       basebackup_signal_file_found = true;
+   }
I would put that in a different routine.
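
As an illustration of the suggestion, the check could be factored out roughly as in the sketch below. It is plain POSIX C rather than backend code: signal_file_present is a hypothetical name, and open()/fsync() stand in for BasicOpenFilePerm()/pg_fsync(). Note that the sketch checks and syncs the same path, whereas the quoted hunk stats one file and opens another; which of the two the patch intends is not settled here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: report whether a signal file exists and, if so,
 * flush it to disk so that its presence survives a crash.
 */
static bool
signal_file_present(const char *path)
{
	struct stat stat_buf;
	int			fd;

	assert(path != NULL);

	if (stat(path, &stat_buf) != 0)
		return false;			/* missing or inaccessible: treat as absent */

	/* Best effort, as in the quoted hunk: open and fsync failures are ignored. */
	fd = open(path, O_RDWR);
	if (fd >= 0)
	{
		(void) fsync(fd);
		close(fd);
	}
	return true;
}
```

A caller would then reduce to a single assignment such as `found = signal_file_present(path);`, which keeps the startup routine free of file-handling details.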
+       /*
+        * Wait until done.  Start WAL receiver in the meantime, once base
+        * backup has received the starting position.
+        */
+       while (BaseBackupPID != 0)
+       {
+           PG_SETMASK(&UnBlockSig);
+           pg_usleep(1000000L);
+           PG_SETMASK(&BlockSig);
+   primary_sysid = strtoull(walrcv_identify_system(wrconn, &primaryTLI), NULL, 10);
No more strtol with base 10 stuff please :)
--
Michael
#17Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Michael Paquier (#16)
1 attachment(s)
Re: base backup client as auxiliary backend process

Updated patch attached.

On 2019-09-18 10:31, Michael Paquier wrote:

-    * Verify XLOG status looks valid.
+    * Check that contents look valid.
*/
-   if (ControlFile->state < DB_SHUTDOWNED ||
-       ControlFile->state > DB_IN_PRODUCTION ||
-       !XRecOffIsValid(ControlFile->checkPoint))
+   if (!XRecOffIsValid(ControlFile->checkPoint))
ereport(FATAL,
Doesn't seem like a good idea to me to remove this sanity check for
normal deployments, but actually you moved that down in StartupXLOG().
It seems to me that this is unrelated and could be a separate patch so
that the errors produced are more verbose.  I think that we should also
change that code to use a switch/case on ControlFile->state.

Done. Yes, this was really a change made to get more precise error
messages during debugging. It could be committed separately.

The current defaults of pg_basebackup have been designed so that the
backups taken are stable and so that monitoring is eased, thanks to
--wal-method=stream and the use of replication slots.
Shouldn't the use of at least a temporary replication slot be mandatory
for the stability of the copy? It seems to me that there is a good
argument for having a second process which streams WAL on top of the
main backup process, and just use a WAL receiver for that.

Is this something that the walreceiver should be doing independent of
this patch?

One problem which is not tackled here is what to do for the tablespace
map. pg_basebackup has its own specific trick for that, and with that
new feature we may want something equivalent? Not something to
consider as a first stage of course.

The updated patch has support for tablespaces without mapping. I'm thinking
about putting the mapping specification into a GUC list somehow.
Shouldn't be too hard.
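
To make the idea concrete, here is a minimal sketch of how a list-valued setting could be consulted. It borrows the OLDDIR=NEWDIR form of pg_basebackup's --tablespace-mapping option. That the GUC would use this format, the comma separator, and the function name map_tablespace_dir are all assumptions for illustration, and the sketch is standalone C rather than backend code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Look up "dir" in a comma-separated list of OLDDIR=NEWDIR pairs and copy
 * the mapped directory into "out".  Returns false if there is no mapping
 * or the result does not fit.  Escaping of '=' or ',' is not handled.
 */
static bool
map_tablespace_dir(const char *list, const char *dir, char *out, size_t outlen)
{
	size_t		dirlen;
	const char *p = list;

	assert(list != NULL && dir != NULL && out != NULL);
	dirlen = strlen(dir);

	while (*p != '\0')
	{
		const char *end = strchr(p, ',');
		size_t		itemlen = end ? (size_t) (end - p) : strlen(p);

		/* Match "dir=" at the start of this item. */
		if (itemlen > dirlen && strncmp(p, dir, dirlen) == 0 && p[dirlen] == '=')
		{
			size_t		newlen = itemlen - dirlen - 1;

			if (newlen == 0 || newlen >= outlen)
				return false;
			memcpy(out, p + dirlen + 1, newlen);
			out[newlen] = '\0';
			return true;
		}
		if (end == NULL)
			break;
		p = end + 1;
	}
	return false;
}
```

With a list such as "/old/ts1=/new/ts1,/old/ts2=/new/ts2", looking up "/old/ts2" yields "/new/ts2", while a directory with no entry returns false and would keep its original location.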

*/
-static void
+void
WriteControlFile(void)
[...]
-static void
+void
ReadControlFile(void)
[...]
If you begin to publish those routines, it seems to me that there
could be more consolidation with controldata_utils.c which includes
now a routine to update a control file.

Hmm, maybe long-term, but it seems too much dangerous surgery for this
patch.

+#ifndef FRONTEND
+extern void InitControlFile(uint64 sysidentifier);
+extern void WriteControlFile(void);
+extern void ReadControlFile(void);
+#endif
It would be nice to avoid that.

Fixed by renaming a function in pg_resetwal.c.

+       /*
+        * Wait until done.  Start WAL receiver in the meantime, once base
+        * backup has received the starting position.
+        */
+       while (BaseBackupPID != 0)
+       {
+           PG_SETMASK(&UnBlockSig);
+           pg_usleep(1000000L);
+           PG_SETMASK(&BlockSig);

+ primary_sysid = strtoull(walrcv_identify_system(wrconn, &primaryTLI), NULL, 10);
No more strtol with base 10 stuff please :)

Hmm, why not? What's the replacement?
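
For context, a bare strtoull(..., NULL, 10) call cannot distinguish a valid number from empty input, trailing garbage, or overflow. The sketch below shows one way to check for all three in standard C. It is only an illustration: parse_uint64 is a hypothetical name, not a PostgreSQL function, and the thread does not say at this point which replacement is meant.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical checked conversion of a decimal string to uint64_t.
 * Returns true and stores the value on success, false otherwise.
 */
static bool
parse_uint64(const char *str, uint64_t *result)
{
	char	   *end;
	unsigned long long val;

	assert(str != NULL && result != NULL);

	/*
	 * strtoull() skips leading whitespace and accepts a sign, so insist on
	 * a leading digit.  This also rejects the empty string.
	 */
	if (*str < '0' || *str > '9')
		return false;

	errno = 0;
	val = strtoull(str, &end, 10);
	if (errno == ERANGE)
		return false;			/* value does not fit */
	if (*end != '\0')
		return false;			/* trailing garbage */

	*result = (uint64_t) val;
	return true;
}
```

A caller would test the return value and report an error on failure instead of silently continuing with zero or a clamped value.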

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v3-0001-Base-backup-client-as-auxiliary-backend-process.patch (text/plain; charset=UTF-8)
From ac34ece7665b62d542653cf12238973a1a45a18b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Mon, 28 Oct 2019 09:23:43 +0100
Subject: [PATCH v3] Base backup client as auxiliary backend process

Discussion: https://www.postgresql.org/message-id/flat/61b8d18d-c922-ac99-b990-a31ba63cdcbb@2ndquadrant.com
---
 doc/src/sgml/protocol.sgml                    |  12 +-
 doc/src/sgml/ref/initdb.sgml                  |  17 +
 src/backend/access/transam/xlog.c             | 184 ++++----
 src/backend/bootstrap/bootstrap.c             |   9 +
 src/backend/postmaster/pgstat.c               |   6 +
 src/backend/postmaster/postmaster.c           | 114 ++++-
 src/backend/replication/basebackup.c          |  70 +++
 .../libpqwalreceiver/libpqwalreceiver.c       | 400 ++++++++++++++++++
 src/backend/replication/repl_gram.y           |   9 +-
 src/backend/replication/repl_scanner.l        |   1 +
 src/backend/storage/file/fd.c                 |  36 +-
 src/bin/initdb/initdb.c                       |  39 +-
 src/bin/pg_resetwal/pg_resetwal.c             |   6 +-
 src/include/access/xlog.h                     |   8 +-
 src/include/miscadmin.h                       |   2 +
 src/include/pgstat.h                          |   1 +
 src/include/replication/basebackup.h          |   2 +
 src/include/replication/walreceiver.h         |   4 +
 src/include/storage/fd.h                      |   2 +-
 src/include/utils/guc.h                       |   2 +-
 src/test/recovery/t/018_basebackup.pl         |  29 ++
 21 files changed, 831 insertions(+), 122 deletions(-)
 create mode 100644 src/test/recovery/t/018_basebackup.pl

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 80275215e0..f54b820edf 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2466,7 +2466,7 @@ <title>Streaming Replication Protocol</title>
   </varlistentry>
 
   <varlistentry>
-    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ]
+    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ] [ <literal>EXCLUDE_CONF</literal> ]
      <indexterm><primary>BASE_BACKUP</primary></indexterm>
     </term>
     <listitem>
@@ -2576,6 +2576,16 @@ <title>Streaming Replication Protocol</title>
          </para>
         </listitem>
        </varlistentry>
+
+       <varlistentry>
+        <term><literal>EXCLUDE_CONF</literal></term>
+        <listitem>
+         <para>
+          Do not copy configuration files, that is, files that end in
+          <filename>.conf</filename>.
+         </para>
+        </listitem>
+       </varlistentry>
       </variablelist>
      </para>
      <para>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index da5c8f5307..1261e02d59 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -286,6 +286,23 @@ <title>Options</title>
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-r</option></term>
+      <term><option>--replica</option></term>
+      <listitem>
+       <para>
+        Initialize a data directory for a physical replication replica.  The
+        data directory will not be initialized with a full database system,
+        but will instead only contain a minimal set of files.  A server that
+        is started on this data directory will first fetch a base backup and
+        then switch to standby mode.  The connection information for the base
+        backup has to be configured by setting <xref
+        linkend="guc-primary-conninfo"/>, and other parameters as desired,
+        before the server is started.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-S</option></term>
       <term><option>--sync-only</option></term>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2e3cc51006..6f49ccdada 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -904,8 +904,6 @@ static void CheckRecoveryConsistency(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpt, bool report);
 static bool rescanLatestTimeLine(void);
-static void WriteControlFile(void);
-static void ReadControlFile(void);
 static char *str_time(pg_time_t tnow);
 static bool CheckForStandbyTrigger(void);
 
@@ -4481,7 +4479,7 @@ rescanLatestTimeLine(void)
  * ReadControlFile() verifies they are correct.  We could split out the
  * I/O and compatibility-check functions, but there seems no need currently.
  */
-static void
+void
 WriteControlFile(void)
 {
 	int			fd;
@@ -4573,7 +4571,7 @@ WriteControlFile(void)
 						XLOG_CONTROL_FILE)));
 }
 
-static void
+void
 ReadControlFile(void)
 {
 	pg_crc32c	crc;
@@ -5079,6 +5077,41 @@ XLOGShmemInit(void)
 	InitSharedLatch(&XLogCtl->recoveryWakeupLatch);
 }
 
+void
+InitControlFile(uint64 sysidentifier)
+{
+	char		mock_auth_nonce[MOCK_AUTH_NONCE_LEN];
+
+	/*
+	 * Generate a random nonce. This is used for authentication requests that
+	 * will fail because the user does not exist. The nonce is used to create
+	 * a genuine-looking password challenge for the non-existent user, in lieu
+	 * of an actual stored password.
+	 */
+	if (!pg_strong_random(mock_auth_nonce, MOCK_AUTH_NONCE_LEN))
+		ereport(PANIC,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate secret authorization token")));
+
+	memset(ControlFile, 0, sizeof(ControlFileData));
+	/* Initialize pg_control status fields */
+	ControlFile->system_identifier = sysidentifier;
+	memcpy(ControlFile->mock_authentication_nonce, mock_auth_nonce, MOCK_AUTH_NONCE_LEN);
+	ControlFile->state = DB_SHUTDOWNED;
+	ControlFile->unloggedLSN = FirstNormalUnloggedLSN;
+
+	/* Set important parameter values for use when replaying WAL */
+	ControlFile->MaxConnections = MaxConnections;
+	ControlFile->max_worker_processes = max_worker_processes;
+	ControlFile->max_wal_senders = max_wal_senders;
+	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_locks_per_xact = max_locks_per_xact;
+	ControlFile->wal_level = wal_level;
+	ControlFile->wal_log_hints = wal_log_hints;
+	ControlFile->track_commit_timestamp = track_commit_timestamp;
+	ControlFile->data_checksum_version = bootstrap_data_checksum_version;
+}
+
 /*
  * This func must be called ONCE on system install.  It creates pg_control
  * and the initial XLOG segment.
@@ -5094,7 +5127,6 @@ BootStrapXLOG(void)
 	char	   *recptr;
 	bool		use_existent;
 	uint64		sysidentifier;
-	char		mock_auth_nonce[MOCK_AUTH_NONCE_LEN];
 	struct timeval tv;
 	pg_crc32c	crc;
 
@@ -5115,17 +5147,6 @@ BootStrapXLOG(void)
 	sysidentifier |= ((uint64) tv.tv_usec) << 12;
 	sysidentifier |= getpid() & 0xFFF;
 
-	/*
-	 * Generate a random nonce. This is used for authentication requests that
-	 * will fail because the user does not exist. The nonce is used to create
-	 * a genuine-looking password challenge for the non-existent user, in lieu
-	 * of an actual stored password.
-	 */
-	if (!pg_strong_random(mock_auth_nonce, MOCK_AUTH_NONCE_LEN))
-		ereport(PANIC,
-				(errcode(ERRCODE_INTERNAL_ERROR),
-				 errmsg("could not generate secret authorization token")));
-
 	/* First timeline ID is always 1 */
 	ThisTimeLineID = 1;
 
@@ -5233,30 +5254,12 @@ BootStrapXLOG(void)
 	openLogFile = -1;
 
 	/* Now create pg_control */
-
-	memset(ControlFile, 0, sizeof(ControlFileData));
-	/* Initialize pg_control status fields */
-	ControlFile->system_identifier = sysidentifier;
-	memcpy(ControlFile->mock_authentication_nonce, mock_auth_nonce, MOCK_AUTH_NONCE_LEN);
-	ControlFile->state = DB_SHUTDOWNED;
+	InitControlFile(sysidentifier);
 	ControlFile->time = checkPoint.time;
 	ControlFile->checkPoint = checkPoint.redo;
 	ControlFile->checkPointCopy = checkPoint;
-	ControlFile->unloggedLSN = FirstNormalUnloggedLSN;
-
-	/* Set important parameter values for use when replaying WAL */
-	ControlFile->MaxConnections = MaxConnections;
-	ControlFile->max_worker_processes = max_worker_processes;
-	ControlFile->max_wal_senders = max_wal_senders;
-	ControlFile->max_prepared_xacts = max_prepared_xacts;
-	ControlFile->max_locks_per_xact = max_locks_per_xact;
-	ControlFile->wal_level = wal_level;
-	ControlFile->wal_log_hints = wal_log_hints;
-	ControlFile->track_commit_timestamp = track_commit_timestamp;
-	ControlFile->data_checksum_version = bootstrap_data_checksum_version;
 
 	/* some additional ControlFile fields are set in WriteControlFile() */
-
 	WriteControlFile();
 
 	/* Bootstrap the commit log, too */
@@ -6231,45 +6234,59 @@ StartupXLOG(void)
 	CurrentResourceOwner = AuxProcessResourceOwner;
 
 	/*
-	 * Verify XLOG status looks valid.
+	 * Check that contents look valid.
 	 */
-	if (ControlFile->state < DB_SHUTDOWNED ||
-		ControlFile->state > DB_IN_PRODUCTION ||
-		!XRecOffIsValid(ControlFile->checkPoint))
+	if (!XRecOffIsValid(ControlFile->checkPoint))
 		ereport(FATAL,
-				(errmsg("control file contains invalid data")));
+				(errmsg("control file contains invalid checkpoint location")));
 
-	if (ControlFile->state == DB_SHUTDOWNED)
+	switch (ControlFile->state)
 	{
-		/* This is the expected case, so don't be chatty in standalone mode */
-		ereport(IsPostmasterEnvironment ? LOG : NOTICE,
-				(errmsg("database system was shut down at %s",
-						str_time(ControlFile->time))));
+		case DB_SHUTDOWNED:
+			/* This is the expected case, so don't be chatty in standalone mode */
+			ereport(IsPostmasterEnvironment ? LOG : NOTICE,
+					(errmsg("database system was shut down at %s",
+							str_time(ControlFile->time))));
+			break;
+
+		case DB_SHUTDOWNED_IN_RECOVERY:
+			ereport(LOG,
+					(errmsg("database system was shut down in recovery at %s",
+							str_time(ControlFile->time))));
+			break;
+
+		case DB_SHUTDOWNING:
+			ereport(LOG,
+					(errmsg("database system shutdown was interrupted; last known up at %s",
+							str_time(ControlFile->time))));
+			break;
+
+		case DB_IN_CRASH_RECOVERY:
+			ereport(LOG,
+					(errmsg("database system was interrupted while in recovery at %s",
+							str_time(ControlFile->time)),
+					 errhint("This probably means that some data is corrupted and"
+							 " you will have to use the last backup for recovery.")));
+			break;
+
+		case DB_IN_ARCHIVE_RECOVERY:
+			ereport(LOG,
+					(errmsg("database system was interrupted while in recovery at log time %s",
+							str_time(ControlFile->checkPointCopy.time)),
+					 errhint("If this has occurred more than once some data might be corrupted"
+							 " and you might need to choose an earlier recovery target.")));
+			break;
+
+		case DB_IN_PRODUCTION:
+			ereport(LOG,
+					(errmsg("database system was interrupted; last known up at %s",
+							str_time(ControlFile->time))));
+			break;
+
+		default:
+			ereport(FATAL,
+					(errmsg("control file contains invalid database cluster state")));
 	}
-	else if (ControlFile->state == DB_SHUTDOWNED_IN_RECOVERY)
-		ereport(LOG,
-				(errmsg("database system was shut down in recovery at %s",
-						str_time(ControlFile->time))));
-	else if (ControlFile->state == DB_SHUTDOWNING)
-		ereport(LOG,
-				(errmsg("database system shutdown was interrupted; last known up at %s",
-						str_time(ControlFile->time))));
-	else if (ControlFile->state == DB_IN_CRASH_RECOVERY)
-		ereport(LOG,
-				(errmsg("database system was interrupted while in recovery at %s",
-						str_time(ControlFile->time)),
-				 errhint("This probably means that some data is corrupted and"
-						 " you will have to use the last backup for recovery.")));
-	else if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
-		ereport(LOG,
-				(errmsg("database system was interrupted while in recovery at log time %s",
-						str_time(ControlFile->checkPointCopy.time)),
-				 errhint("If this has occurred more than once some data might be corrupted"
-						 " and you might need to choose an earlier recovery target.")));
-	else if (ControlFile->state == DB_IN_PRODUCTION)
-		ereport(LOG,
-				(errmsg("database system was interrupted; last known up at %s",
-						str_time(ControlFile->time))));
 
 	/* This is just to allow attaching to startup process with a debugger */
 #ifdef XLOG_REPLAY_DELAY
@@ -6284,24 +6301,31 @@ StartupXLOG(void)
 	 */
 	ValidateXLOGDirectoryStructure();
 
-	/*----------
+	/*
 	 * If we previously crashed, perform a couple of actions:
+	 *
 	 *	- The pg_wal directory may still include some temporary WAL segments
-	 * used when creating a new segment, so perform some clean up to not
-	 * bloat this path.  This is done first as there is no point to sync this
-	 * temporary data.
-	 *	- There might be data which we had written, intending to fsync it,
-	 * but which we had not actually fsync'd yet. Therefore, a power failure
-	 * in the near future might cause earlier unflushed writes to be lost,
-	 * even though more recent data written to disk from here on would be
-	 * persisted.  To avoid that, fsync the entire data directory.
-	 *---------
+	 *    used when creating a new segment, so perform some clean up to not
+	 *    bloat this path.  This is done first as there is no point to sync
+	 *    this temporary data.
+	 *
+	 *	- There might be data which we had written, intending to fsync it, but
+	 *    which we had not actually fsync'd yet.  Therefore, a power failure
+	 *    in the near future might cause earlier unflushed writes to be lost,
+	 *    even though more recent data written to disk from here on would be
+	 *    persisted.  To avoid that, fsync the entire data directory.  Errors
+	 *    are logged but not considered fatal.  Aborting on error would result
+	 *    in failure to start for harmless cases such as read-only files in
+	 *    the data directory, and that's not good either.
+	 *
+	 *    Note that if we previously crashed due to a PANIC on fsync(), we'll
+	 *    be rewriting all changes again during recovery.
 	 */
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
 		RemoveTempXlogFiles();
-		SyncDataDirectory();
+		SyncDataDirectory(true, LOG);
 	}
 
 	/*
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 9238fbe98d..a8b1ffd08a 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -36,6 +36,7 @@
 #include "postmaster/bgwriter.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
+#include "replication/basebackup.h"
 #include "replication/walreceiver.h"
 #include "storage/bufmgr.h"
 #include "storage/bufpage.h"
@@ -326,6 +327,9 @@ AuxiliaryProcessMain(int argc, char *argv[])
 			case StartupProcess:
 				statmsg = pgstat_get_backend_desc(B_STARTUP);
 				break;
+			case BaseBackupProcess:
+				statmsg = pgstat_get_backend_desc(B_BASE_BACKUP);
+				break;
 			case BgWriterProcess:
 				statmsg = pgstat_get_backend_desc(B_BG_WRITER);
 				break;
@@ -451,6 +455,11 @@ AuxiliaryProcessMain(int argc, char *argv[])
 			StartupProcessMain();
 			proc_exit(1);		/* should never return */
 
+		case BaseBackupProcess:
+			/* don't set signals, basebackup has its own agenda */
+			BaseBackupMain();
+			proc_exit(1);		/* should never return */
+
 		case BgWriterProcess:
 			/* don't set signals, bgwriter has its own agenda */
 			BackgroundWriterMain();
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 011076c3e3..5e7907f258 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2934,6 +2934,9 @@ pgstat_bestart(void)
 			case StartupProcess:
 				lbeentry.st_backendType = B_STARTUP;
 				break;
+			case BaseBackupProcess:
+				lbeentry.st_backendType = B_BASE_BACKUP;
+				break;
 			case BgWriterProcess:
 				lbeentry.st_backendType = B_BG_WRITER;
 				break;
@@ -4289,6 +4292,9 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_BG_WORKER:
 			backendDesc = "background worker";
 			break;
+		case B_BASE_BACKUP:
+			backendDesc = "base backup";
+			break;
 		case B_BG_WRITER:
 			backendDesc = "background writer";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 5f30359165..51912bf718 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -116,6 +116,7 @@
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "replication/logicallauncher.h"
+#include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -248,6 +249,7 @@ bool		restart_after_crash = true;
 
 /* PIDs of special child processes; 0 when not running */
 static pid_t StartupPID = 0,
+			BaseBackupPID = 0,
 			BgWriterPID = 0,
 			CheckpointerPID = 0,
 			WalWriterPID = 0,
@@ -539,6 +541,7 @@ static void ShmemBackendArrayRemove(Backend *bn);
 #endif							/* EXEC_BACKEND */
 
 #define StartupDataBase()		StartChildProcess(StartupProcess)
+#define StartBaseBackup()		StartChildProcess(BaseBackupProcess)
 #define StartBackgroundWriter() StartChildProcess(BgWriterProcess)
 #define StartCheckpointer()		StartChildProcess(CheckpointerProcess)
 #define StartWalWriter()		StartChildProcess(WalWriterProcess)
@@ -572,6 +575,8 @@ PostmasterMain(int argc, char *argv[])
 	bool		listen_addr_saved = false;
 	int			i;
 	char	   *output_config_variable = NULL;
+	struct stat stat_buf;
+	bool		basebackup_signal_file_found = false;
 
 	InitProcessGlobals();
 
@@ -886,12 +891,27 @@ PostmasterMain(int argc, char *argv[])
 	/* Verify that DataDir looks reasonable */
 	checkDataDir();
 
-	/* Check that pg_control exists */
-	checkControlFile();
-
 	/* And switch working directory into it */
 	ChangeToDataDir();
 
+	if (stat(BASEBACKUP_SIGNAL_FILE, &stat_buf) == 0)
+	{
+		int         fd;
+
+		fd = BasicOpenFilePerm(STANDBY_SIGNAL_FILE, O_RDWR | PG_BINARY,
+							   S_IRUSR | S_IWUSR);
+		if (fd >= 0)
+		{
+			(void) pg_fsync(fd);
+			close(fd);
+		}
+		basebackup_signal_file_found = true;
+	}
+
+	/* Check that pg_control exists */
+	if (!basebackup_signal_file_found)
+		checkControlFile();
+
 	/*
 	 * Check for invalid combinations of GUC settings.
 	 */
@@ -970,7 +990,8 @@ PostmasterMain(int argc, char *argv[])
 	 * processes will inherit the correct function pointer and not need to
 	 * repeat the test.
 	 */
-	LocalProcessControlFile(false);
+	if (!basebackup_signal_file_found)
+		LocalProcessControlFile(false);
 
 	/*
 	 * Initialize SSL library, if specified.
@@ -1386,6 +1407,39 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STARTING);
 
+	if (basebackup_signal_file_found)
+	{
+		BaseBackupPID = StartBaseBackup();
+
+		/*
+		 * Wait until done.  Start WAL receiver in the meantime, once base
+		 * backup has received the starting position.
+		 */
+		while (BaseBackupPID != 0)
+		{
+			PG_SETMASK(&UnBlockSig);
+			pg_usleep(1000000L);
+			PG_SETMASK(&BlockSig);
+			MaybeStartWalReceiver();
+		}
+
+		/*
+		 * XXX Shut down WAL receiver.  It will be restarted later in xlog.c,
+		 * and that will complain if it's already running.
+		 */
+		ShutdownWalRcv();
+
+		/*
+		 * Base backup done, now signal standby mode.
+		 */
+		durable_rename(BASEBACKUP_SIGNAL_FILE, STANDBY_SIGNAL_FILE, FATAL);
+
+		/*
+		 * Reread the control file that came in with the base backup.
+		 */
+		ReadControlFile();
+	}
+
 	/*
 	 * We're ready to rock and roll...
 	 */
@@ -2665,6 +2719,8 @@ SIGHUP_handler(SIGNAL_ARGS)
 		SignalChildren(SIGHUP);
 		if (StartupPID != 0)
 			signal_child(StartupPID, SIGHUP);
+		if (BaseBackupPID != 0)
+			signal_child(BaseBackupPID, SIGHUP);
 		if (BgWriterPID != 0)
 			signal_child(BgWriterPID, SIGHUP);
 		if (CheckpointerPID != 0)
@@ -2824,6 +2880,8 @@ pmdie(SIGNAL_ARGS)
 
 			if (StartupPID != 0)
 				signal_child(StartupPID, SIGTERM);
+			if (BaseBackupPID != 0)
+				signal_child(BaseBackupPID, SIGTERM);
 			if (BgWriterPID != 0)
 				signal_child(BgWriterPID, SIGTERM);
 			if (WalReceiverPID != 0)
@@ -3062,6 +3120,23 @@ reaper(SIGNAL_ARGS)
 			continue;
 		}
 
+		/*
+		 * Was it the base backup process?
+		 */
+		if (pid == BaseBackupPID)
+		{
+			BaseBackupPID = 0;
+			if (EXIT_STATUS_0(exitstatus))
+				;
+			else if (EXIT_STATUS_1(exitstatus))
+				ereport(FATAL,
+						(errmsg("base backup failed")));
+			else
+				HandleChildCrash(pid, exitstatus,
+								 _("base backup process"));
+			continue;
+		}
+
 		/*
 		 * Was it the bgwriter?  Normal exit can be ignored; we'll start a new
 		 * one at the next iteration of the postmaster's main loop, if
@@ -3583,6 +3658,18 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 		StartupStatus = STARTUP_SIGNALED;
 	}
 
+	/* Take care of the base backup process too */
+	if (pid == BaseBackupPID)
+		BaseBackupPID = 0;
+	else if (BaseBackupPID != 0 && take_action)
+	{
+		ereport(DEBUG2,
+				(errmsg_internal("sending %s to process %d",
+								 (SendStop ? "SIGSTOP" : "SIGQUIT"),
+								 (int) BaseBackupPID)));
+		signal_child(BaseBackupPID, (SendStop ? SIGSTOP : SIGQUIT));
+	}
+
 	/* Take care of the bgwriter too */
 	if (pid == BgWriterPID)
 		BgWriterPID = 0;
@@ -3817,6 +3904,7 @@ PostmasterStateMachine(void)
 		if (CountChildren(BACKEND_TYPE_NORMAL | BACKEND_TYPE_WORKER) == 0 &&
 			StartupPID == 0 &&
 			WalReceiverPID == 0 &&
+			BaseBackupPID == 0 &&
 			BgWriterPID == 0 &&
 			(CheckpointerPID == 0 ||
 			 (!FatalError && Shutdown < ImmediateShutdown)) &&
@@ -3911,6 +3999,7 @@ PostmasterStateMachine(void)
 			/* These other guys should be dead already */
 			Assert(StartupPID == 0);
 			Assert(WalReceiverPID == 0);
+			Assert(BaseBackupPID == 0);
 			Assert(BgWriterPID == 0);
 			Assert(CheckpointerPID == 0);
 			Assert(WalWriterPID == 0);
@@ -4094,6 +4183,8 @@ TerminateChildren(int signal)
 		if (signal == SIGQUIT || signal == SIGKILL)
 			StartupStatus = STARTUP_SIGNALED;
 	}
+	if (BaseBackupPID != 0)
+		signal_child(BgWriterPID, signal);
 	if (BgWriterPID != 0)
 		signal_child(BgWriterPID, signal);
 	if (CheckpointerPID != 0)
@@ -4919,6 +5010,7 @@ SubPostmasterMain(int argc, char *argv[])
 		strcmp(argv[1], "--forkavlauncher") == 0 ||
 		strcmp(argv[1], "--forkavworker") == 0 ||
 		strcmp(argv[1], "--forkboot") == 0 ||
+		strcmp(argv[1], "--forkbasebackup") == 0 ||
 		strncmp(argv[1], "--forkbgworker=", 15) == 0)
 		PGSharedMemoryReAttach();
 	else
@@ -4958,7 +5050,8 @@ SubPostmasterMain(int argc, char *argv[])
 	 * (re-)read control file, as it contains config. The postmaster will
 	 * already have read this, but this process doesn't know about that.
 	 */
-	LocalProcessControlFile(false);
+	if (strcmp(argv[1], "--forkbasebackup") != 0)
+		LocalProcessControlFile(false);
 
 	/*
 	 * Reload any libraries that were preloaded by the postmaster.  Since we
@@ -5019,7 +5112,8 @@ SubPostmasterMain(int argc, char *argv[])
 		/* And run the backend */
 		BackendRun(&port);		/* does not return */
 	}
-	if (strcmp(argv[1], "--forkboot") == 0)
+	if (strcmp(argv[1], "--forkboot") == 0 ||
+		strcmp(argv[1], "--forkbasebackup") == 0)
 	{
 		/* Restore basic shared memory pointers */
 		InitShmemAccess(UsedShmemSegAddr);
@@ -5431,7 +5525,7 @@ StartChildProcess(AuxProcType type)
 	av[ac++] = "postgres";
 
 #ifdef EXEC_BACKEND
-	av[ac++] = "--forkboot";
+	av[ac++] = (type == BaseBackupProcess) ? "--forkbasebackup" : "--forkboot";
 	av[ac++] = NULL;			/* filled in by postmaster_forkexec */
 #endif
 
@@ -5475,6 +5569,10 @@ StartChildProcess(AuxProcType type)
 				ereport(LOG,
 						(errmsg("could not fork startup process: %m")));
 				break;
+			case BaseBackupProcess:
+				ereport(LOG,
+						(errmsg("could not fork base backup process: %m")));
+				break;
 			case BgWriterProcess:
 				ereport(LOG,
 						(errmsg("could not fork background writer process: %m")));
@@ -5616,7 +5714,7 @@ static void
 MaybeStartWalReceiver(void)
 {
 	if (WalReceiverPID == 0 &&
-		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
+		(pmState == PM_INIT || pmState == PM_STARTUP || pmState == PM_RECOVERY ||
 		 pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
 		Shutdown == NoShutdown)
 	{
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d0f210de8c..388525500e 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -29,6 +29,7 @@
 #include "port.h"
 #include "postmaster/syslogger.h"
 #include "replication/basebackup.h"
+#include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
 #include "storage/bufpage.h"
@@ -38,6 +39,7 @@
 #include "storage/ipc.h"
 #include "storage/reinit.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/ps_status.h"
 #include "utils/relcache.h"
 #include "utils/timestamp.h"
@@ -123,6 +125,9 @@ static long long int total_checksum_failures;
 /* Do not verify checksums. */
 static bool noverify_checksums = false;
 
+/* Do not copy config files. */
+static bool exclude_conf = false;
+
 /*
  * The contents of these directories are removed or recreated during server
  * start so they are not included in backups.  The directories themselves are
@@ -652,6 +657,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 	bool		o_maxrate = false;
 	bool		o_tablespace_map = false;
 	bool		o_noverify_checksums = false;
+	bool		o_exclude_conf = false;
 
 	MemSet(opt, 0, sizeof(*opt));
 	foreach(lopt, options)
@@ -740,6 +746,15 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 			noverify_checksums = true;
 			o_noverify_checksums = true;
 		}
+		else if (strcmp(defel->defname, "exclude_conf") == 0)
+		{
+			if (o_exclude_conf)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("duplicate option \"%s\"", defel->defname)));
+			exclude_conf = true;
+			o_exclude_conf = true;
+		}
 		else
 			elog(ERROR, "option \"%s\" not recognized",
 				 defel->defname);
@@ -1149,6 +1164,18 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 			continue;
 		}
 
+		if (exclude_conf)
+		{
+			char	   *dot = strrchr(de->d_name, '.');
+			if (dot && strcmp(dot, ".conf") == 0)
+			{
+				elog(DEBUG2,
+					 "configuration file \"%s\" excluded from backup",
+					 de->d_name);
+				continue;
+			}
+		}
+
 		snprintf(pathbuf, sizeof(pathbuf), "%s/%s", path, de->d_name);
 
 		/* Skip pg_control here to back up it last */
@@ -1743,3 +1770,46 @@ throttle(size_t increment)
 	 */
 	throttled_last = GetCurrentTimestamp();
 }
+
+
+/*
+ * base backup worker process (client) main function
+ */
+void
+BaseBackupMain(void)
+{
+	WalReceiverConn *wrconn = NULL;
+	char	   *err;
+	TimeLineID	primaryTLI;
+	uint64		primary_sysid;
+
+	/* Load the libpq-specific functions */
+	load_file("libpqwalreceiver", false);
+	if (WalReceiverFunctions == NULL)
+		elog(ERROR, "libpqwalreceiver didn't initialize correctly");
+
+	/* Establish the connection to the primary */
+	wrconn = walrcv_connect(PrimaryConnInfo, false, cluster_name[0] ? cluster_name : "basebackup", &err);
+	if (!wrconn)
+		ereport(ERROR,
+				(errmsg("could not connect to the primary server: %s", err)));
+
+	/*
+	 * Get the remote sysid and stick it into the local control file, so that
+	 * the walreceiver is happy.  The control file will later be overwritten
+	 * by the base backup.
+	 */
+	primary_sysid = strtoull(walrcv_identify_system(wrconn, &primaryTLI), NULL, 10);
+	InitControlFile(primary_sysid);
+	WriteControlFile();
+
+	walrcv_base_backup(wrconn);
+
+	walrcv_disconnect(wrconn);
+
+	SyncDataDirectory(false, ERROR);
+
+	ereport(LOG,
+			(errmsg("base backup completed")));
+	proc_exit(0);
+}
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 6eba08a920..e45bce830f 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -17,8 +17,14 @@
 #include "postgres.h"
 
 #include <unistd.h>
+#include <sys/stat.h>
 #include <sys/time.h>
 
+#ifdef USE_SYSTEMD
+#include <systemd/sd-daemon.h>
+#endif
+
+#include "common/string.h"
 #include "libpq-fe.h"
 #include "pqexpbuffer.h"
 #include "access/xlog.h"
@@ -27,10 +33,13 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "pgtar.h"
 #include "replication/walreceiver.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/pg_lsn.h"
+#include "utils/ps_status.h"
 #include "utils/tuplestore.h"
 
 PG_MODULE_MAGIC;
@@ -61,6 +70,7 @@ static int	libpqrcv_server_version(WalReceiverConn *conn);
 static void libpqrcv_readtimelinehistoryfile(WalReceiverConn *conn,
 											 TimeLineID tli, char **filename,
 											 char **content, int *len);
+static void libpqrcv_base_backup(WalReceiverConn *conn);
 static bool libpqrcv_startstreaming(WalReceiverConn *conn,
 									const WalRcvStreamOptions *options);
 static void libpqrcv_endstreaming(WalReceiverConn *conn,
@@ -88,6 +98,7 @@ static WalReceiverFunctionsType PQWalReceiverFunctions = {
 	libpqrcv_identify_system,
 	libpqrcv_server_version,
 	libpqrcv_readtimelinehistoryfile,
+	libpqrcv_base_backup,
 	libpqrcv_startstreaming,
 	libpqrcv_endstreaming,
 	libpqrcv_receive,
@@ -356,6 +367,395 @@ libpqrcv_server_version(WalReceiverConn *conn)
 	return PQserverVersion(conn->streamConn);
 }
 
+/*
+ * XXX copied from pg_basebackup.c
+ */
+
+unsigned long long totaldone;
+unsigned long long totalsize_kb;
+int tablespacenum;
+int tablespacecount;
+
+static void
+base_backup_report_progress(void)
+{
+	int			percent;
+	char	   *progress;
+
+	percent = totalsize_kb ? (int) ((totaldone / 1024) * 100 / totalsize_kb) : 0;
+
+	/*
+	 * Avoid overflowing past 100% or the full size. This may make the total
+	 * size number change as we approach the end of the backup (the estimate
+	 * will always be wrong if WAL is included), but that's better than having
+	 * the done column be bigger than the total.
+	 */
+	if (percent > 100)
+		percent = 100;
+	if (totaldone / 1024 > totalsize_kb)
+		totalsize_kb = totaldone / 1024;
+
+	/* Note: no translation of ps status */
+	progress = psprintf((tablespacecount == 1 ?
+						 "%llu/%llu kB (%d%%), %d/%d tablespace" :
+						 "%llu/%llu kB (%d%%), %d/%d tablespaces"),
+						totaldone / 1024,
+						totalsize_kb,
+						percent,
+						tablespacenum,
+						tablespacecount);
+
+	set_ps_display(progress, false);
+#ifdef USE_SYSTEMD
+	sd_pid_notifyf(PostmasterPid, 0, "STATUS=base backup %s", progress);
+#endif
+
+	pfree(progress);
+}
+
+static void
+ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res)
+{
+	char		current_path[MAXPGPATH];
+	char		filename[MAXPGPATH];
+	pgoff_t		current_len_left = 0;
+	int			current_padding = 0;
+	char	   *copybuf = NULL;
+	FILE	   *file = NULL;
+	off_t		flush_offset;
+
+	strlcpy(current_path, DataDir, sizeof(current_path));
+
+	/*
+	 * Get the COPY data
+	 */
+	res = PQgetResult(conn);
+	if (PQresultStatus(res) != PGRES_COPY_OUT)
+		ereport(ERROR,
+				(errmsg("could not get COPY data stream: %s",
+						PQerrorMessage(conn))));
+
+	while (1)
+	{
+		int			r;
+
+		if (copybuf != NULL)
+		{
+			PQfreemem(copybuf);
+			copybuf = NULL;
+		}
+
+		r = PQgetCopyData(conn, &copybuf, 0);
+
+		if (r == -1)
+		{
+			/*
+			 * End of chunk
+			 */
+			if (file)
+				fclose(file);
+
+			break;
+		}
+		else if (r == -2)
+		{
+			ereport(ERROR,
+					(errmsg("could not read COPY data: %s",
+							PQerrorMessage(conn))));
+		}
+
+		if (file == NULL)
+		{
+			int			filemode;
+
+			/*
+			 * No current file, so this must be the header for a new file
+			 */
+			if (r != 512)
+				ereport(ERROR,
+						(errmsg("invalid tar block header size: %d", r)));
+
+			current_len_left = read_tar_number(&copybuf[124], 12);
+
+			/* Set permissions on the file */
+			filemode = read_tar_number(&copybuf[100], 8);
+
+			/*
+			 * All files are padded up to 512 bytes
+			 */
+			current_padding =
+				((current_len_left + 511) & ~511) - current_len_left;
+
+			/*
+			 * First part of header is zero terminated filename
+			 */
+			snprintf(filename, sizeof(filename), "%s/%s", current_path,
+					 copybuf);
+			if (filename[strlen(filename) - 1] == '/')
+			{
+				/*
+				 * Ends in a slash means directory or symlink to directory
+				 */
+				if (copybuf[156] == '5')
+				{
+					/*
+					 * Directory
+					 */
+					filename[strlen(filename) - 1] = '\0';	/* Remove trailing slash */
+					if (MakePGDirectory(filename) != 0)
+					{
+						if (errno != EEXIST)
+							ereport(ERROR,
+									(errcode_for_file_access(),
+									 errmsg("could not create directory \"%s\": %m",
+											filename)));
+					}
+#ifndef WIN32
+					if (chmod(filename, (mode_t) filemode))
+						ereport(ERROR,
+								(errcode_for_file_access(),
+								 errmsg("could not set permissions on directory \"%s\": %m",
+										filename)));
+#endif
+				}
+				/*
+				 * Symbolic link
+				 */
+				else if (copybuf[156] == '2')
+				{
+					/* TODO: tablespace mapping */
+					const char *tblspc_path = &copybuf[157];
+
+					filename[strlen(filename) - 1] = '\0';	/* Remove trailing slash */
+
+					if (symlink(tblspc_path, filename) != 0)
+						ereport(ERROR,
+								(errcode_for_file_access(),
+								 errmsg("could not create symbolic link from \"%s\" to \"%s\": %m",
+										filename, tblspc_path)));
+				}
+				else
+				{
+					ereport(ERROR,
+							(errmsg("unrecognized link indicator \"%c\"",
+									copybuf[156])));
+				}
+				continue;		/* directory or link handled */
+			}
+
+			/*
+			 * regular file
+			 */
+			file = fopen(filename, "wb");
+			if (!file)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 (errmsg("could not create file \"%s\": %m", filename))));
+
+			flush_offset = 0;
+
+#ifndef WIN32
+			if (chmod(filename, (mode_t) filemode))
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 (errmsg("could not set permissions on file \"%s\": %m",
+								 filename))));
+#endif
+
+			if (current_len_left == 0)
+			{
+				/*
+				 * Done with this file, next one will be a new tar header
+				 */
+				fclose(file);
+				file = NULL;
+				continue;
+			}
+		}						/* new file */
+		else
+		{
+			/*
+			 * Continuing blocks in existing file
+			 */
+			if (current_len_left == 0 && r == current_padding)
+			{
+				/*
+				 * Received the padding block for this file, ignore it and
+				 * close the file, then move on to the next tar header.
+				 */
+				fclose(file);
+				file = NULL;
+				continue;
+			}
+
+			if (fwrite(copybuf, r, 1, file) != 1)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not write to file \"%s\": %m", filename)));
+
+			pg_flush_data(fileno(file), flush_offset, r);
+			flush_offset += r;
+			totaldone += r;
+			base_backup_report_progress();
+
+			current_len_left -= r;
+			if (current_len_left == 0 && current_padding == 0)
+			{
+				/*
+				 * Received the last block, and there is no padding to be
+				 * expected. Close the file and move on to the next tar
+				 * header.
+				 */
+				fclose(file);
+				file = NULL;
+				continue;
+			}
+		}						/* continuing data in existing file */
+	}							/* loop over all data blocks */
+	base_backup_report_progress();
+
+	if (file != NULL)
+		ereport(ERROR,
+				(errmsg("COPY stream ended before last file was finished")));
+
+	if (copybuf != NULL)
+		PQfreemem(copybuf);
+}
+
+/*
+ * Make base backup from remote and write to local disk.
+ */
+static void
+libpqrcv_base_backup(WalReceiverConn *conn)
+{
+	StringInfoData stmt;
+	PGresult   *res;
+	char		xlogstart[64];
+	TimeLineID	starttli;
+	XLogRecPtr	recptr;
+	bool		error;
+
+	ereport(LOG,
+			(errmsg("initiating base backup, waiting for remote checkpoint to complete")));
+	set_ps_display("waiting for checkpoint", false);
+
+	initStringInfo(&stmt);
+	appendStringInfo(&stmt, "BASE_BACKUP PROGRESS NOWAIT EXCLUDE_CONF");
+	if (cluster_name && cluster_name[0])
+		appendStringInfo(&stmt, " LABEL %s", quote_literal_cstr(cluster_name));
+
+	if (PQsendQuery(conn->streamConn, stmt.data) == 0)
+		ereport(ERROR,
+				(errmsg("could not start base backup on remote server: %s",
+						pchomp(PQerrorMessage(conn->streamConn)))));
+
+	/*
+	 * First result set: WAL start position and timeline ID
+	 */
+	res = PQgetResult(conn->streamConn);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("could not start base backup on remote server: %s",
+						pchomp(PQerrorMessage(conn->streamConn)))));
+	}
+	if (PQntuples(res) != 1)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("server returned unexpected response to BASE_BACKUP command; got %d rows and %d fields, expected %d rows and %d fields",
+						PQntuples(res), PQnfields(res), 1, 2)));
+	}
+
+	ereport(LOG,
+			(errmsg("remote checkpoint completed")));
+
+	strlcpy(xlogstart, PQgetvalue(res, 0, 0), sizeof(xlogstart));
+	starttli = atoi(PQgetvalue(res, 0, 1));
+	PQclear(res);
+	elog(DEBUG1, "write-ahead log start point: %s on timeline %u",
+		 xlogstart, starttli);
+	recptr = pg_lsn_in_internal(xlogstart, &error);
+	if (error)
+		elog(ERROR, "invalid LSN received: %s", xlogstart);
+
+	/*
+	 * Second result set: tablespace information
+	 */
+	res = PQgetResult(conn->streamConn);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("could not get backup header: %s",
+						pchomp(PQerrorMessage(conn->streamConn)))));
+	}
+	if (PQntuples(res) < 1)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("no data returned from server")));
+	}
+
+	totalsize_kb = totaldone = 0;
+	tablespacecount = PQntuples(res);
+	for (int i = 0; i < PQntuples(res); i++)
+	{
+		totalsize_kb += atol(PQgetvalue(res, i, 2));
+	}
+
+	RequestXLogStreaming(starttli, recptr, PrimaryConnInfo, PrimarySlotName);
+
+	/*
+	 * Start receiving chunks
+	 */
+	for (int i = 0; i < PQntuples(res); i++)
+	{
+		tablespacenum = i;
+		ReceiveAndUnpackTarFile(conn->streamConn, res);
+	}
+	tablespacenum++;
+	base_backup_report_progress();
+
+	PQclear(res);
+
+	/*
+	 * Final result set: WAL end position and timeline ID
+	 */
+	res = PQgetResult(conn->streamConn);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("could not get write-ahead log end position from server: %s",
+						pchomp(PQerrorMessage(conn->streamConn)))));
+	}
+	if (PQntuples(res) != 1)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("no write-ahead log end position returned from server")));
+	}
+	PQclear(res);
+
+	res = PQgetResult(conn->streamConn);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (sqlstate &&
+			strcmp(sqlstate, "XX001" /*ERRCODE_DATA_CORRUPTED*/) == 0)
+			ereport(ERROR,
+					(errmsg("checksum error occurred")));
+		else
+			ereport(ERROR,
+					(errmsg("final receive failed: %s",
+							pchomp(PQerrorMessage(conn->streamConn)))));
+	}
+	PQclear(res);
+}
+
 /*
  * Start streaming WAL data from given streaming options.
  *
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index c4e11cc4e8..8c962bc711 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -78,6 +78,7 @@ static SQLCmd *make_sqlcmd(void);
 %token K_WAL
 %token K_TABLESPACE_MAP
 %token K_NOVERIFY_CHECKSUMS
+%token K_EXCLUDE_CONF
 %token K_TIMELINE
 %token K_PHYSICAL
 %token K_LOGICAL
@@ -154,8 +155,7 @@ var_name:	IDENT	{ $$ = $1; }
 		;
 
 /*
- * BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT]
- * [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS]
+ * BASE_BACKUP [option]...
  */
 base_backup:
 			K_BASE_BACKUP base_backup_opt_list
@@ -214,6 +214,11 @@ base_backup_opt:
 				  $$ = makeDefElem("noverify_checksums",
 								   (Node *)makeInteger(true), -1);
 				}
+			| K_EXCLUDE_CONF
+				{
+				  $$ = makeDefElem("exclude_conf",
+								   (Node *)makeInteger(true), -1);
+				}
 			;
 
 create_replication_slot:
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 380faeb5f6..6a2d8d142b 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -93,6 +93,7 @@ MAX_RATE		{ return K_MAX_RATE; }
 WAL			{ return K_WAL; }
 TABLESPACE_MAP			{ return K_TABLESPACE_MAP; }
 NOVERIFY_CHECKSUMS	{ return K_NOVERIFY_CHECKSUMS; }
+EXCLUDE_CONF			{ return K_EXCLUDE_CONF; }
 TIMELINE			{ return K_TIMELINE; }
 START_REPLICATION	{ return K_START_REPLICATION; }
 CREATE_REPLICATION_SLOT		{ return K_CREATE_REPLICATION_SLOT; }
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index fe2bb8f859..8d2e971ff7 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3117,21 +3117,14 @@ looks_like_temp_rel_name(const char *name)
  * Other symlinks are presumed to point at files we're not responsible
  * for fsyncing, and might not have privileges to write at all.
  *
- * Errors are logged but not considered fatal; that's because this is used
- * only during database startup, to deal with the possibility that there are
- * issued-but-unsynced writes pending against the data directory.  We want to
- * ensure that such writes reach disk before anything that's done in the new
- * run.  However, aborting on error would result in failure to start for
- * harmless cases such as read-only files in the data directory, and that's
- * not good either.
- *
- * Note that if we previously crashed due to a PANIC on fsync(), we'll be
- * rewriting all changes again during recovery.
+ * If pre_sync is true, issue flush requests to the kernel before starting the
+ * actual fsync calls.  This can be skipped if the caller has already done it
+ * itself.
  *
  * Note we assume we're chdir'd into PGDATA to begin with.
  */
 void
-SyncDataDirectory(void)
+SyncDataDirectory(bool pre_sync, int loglevel)
 {
 	bool		xlog_is_symlink;
 
@@ -3150,7 +3143,7 @@ SyncDataDirectory(void)
 		struct stat st;
 
 		if (lstat("pg_wal", &st) < 0)
-			ereport(LOG,
+			ereport(loglevel,
 					(errcode_for_file_access(),
 					 errmsg("could not stat file \"%s\": %m",
 							"pg_wal")));
@@ -3164,15 +3157,18 @@ SyncDataDirectory(void)
 
 	/*
 	 * If possible, hint to the kernel that we're soon going to fsync the data
-	 * directory and its contents.  Errors in this step are even less
+	 * directory and its contents.  Errors in this step are less
 	 * interesting than normal, so log them only at DEBUG1.
 	 */
+	if (pre_sync)
+	{
 #ifdef PG_FLUSH_DATA_WORKS
-	walkdir(".", pre_sync_fname, false, DEBUG1);
-	if (xlog_is_symlink)
-		walkdir("pg_wal", pre_sync_fname, false, DEBUG1);
-	walkdir("pg_tblspc", pre_sync_fname, true, DEBUG1);
+		walkdir(".", pre_sync_fname, false, DEBUG1);
+		if (xlog_is_symlink)
+			walkdir("pg_wal", pre_sync_fname, false, DEBUG1);
+		walkdir("pg_tblspc", pre_sync_fname, true, DEBUG1);
 #endif
+	}
 
 	/*
 	 * Now we do the fsync()s in the same order.
@@ -3183,10 +3179,10 @@ SyncDataDirectory(void)
 	 * in pg_tblspc, they'll get fsync'd twice.  That's not an expected case
 	 * so we don't worry about optimizing it.
 	 */
-	walkdir(".", datadir_fsync_fname, false, LOG);
+	walkdir(".", datadir_fsync_fname, false, loglevel);
 	if (xlog_is_symlink)
-		walkdir("pg_wal", datadir_fsync_fname, false, LOG);
-	walkdir("pg_tblspc", datadir_fsync_fname, true, LOG);
+		walkdir("pg_wal", datadir_fsync_fname, false, loglevel);
+	walkdir("pg_tblspc", datadir_fsync_fname, true, loglevel);
 }
 
 /*
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 88a261d9bd..4722ad2107 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -136,6 +136,7 @@ static char *pwfilename = NULL;
 static char *superuser_password = NULL;
 static const char *authmethodhost = NULL;
 static const char *authmethodlocal = NULL;
+static bool replica = false;
 static bool debug = false;
 static bool noclean = false;
 static bool do_sync = true;
@@ -2938,6 +2939,22 @@ initialize_data_directory(void)
 	/* Now create all the text config files */
 	setup_config();
 
+	/*
+	 * If data directory for replica requested, write basebackup.signal, and
+	 * then we are done here.
+	 */
+	if (replica)
+	{
+		char	   *path;
+		char	   *lines[1] = {NULL};
+
+		path = psprintf("%s/basebackup.signal", pg_data);
+		writefile(path, lines);
+		free(path);
+
+		return;
+	}
+
 	/* Bootstrap template1 */
 	bootstrap_template1();
 
@@ -3029,6 +3046,7 @@ main(int argc, char *argv[])
 		{"wal-segsize", required_argument, NULL, 12},
 		{"data-checksums", no_argument, NULL, 'k'},
 		{"allow-group-access", no_argument, NULL, 'g'},
+		{"replica", no_argument, NULL, 'r'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -3070,7 +3088,7 @@ main(int argc, char *argv[])
 
 	/* process command-line options */
 
-	while ((c = getopt_long(argc, argv, "dD:E:kL:nNU:WA:sST:X:g", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "dD:E:kL:nNrU:WA:sST:X:g", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
@@ -3116,6 +3134,9 @@ main(int argc, char *argv[])
 			case 'N':
 				do_sync = false;
 				break;
+			case 'r':
+				replica = true;
+				break;
 			case 'S':
 				sync_only = true;
 				break;
@@ -3337,9 +3358,19 @@ main(int argc, char *argv[])
 	/* translator: This is a placeholder in a shell command. */
 	appendPQExpBuffer(start_db_cmd, " -l %s start", _("logfile"));
 
-	printf(_("\nSuccess. You can now start the database server using:\n\n"
-			 "    %s\n\n"),
-		   start_db_cmd->data);
+	if (!replica)
+	{
+		printf(_("\nSuccess. You can now start the database server using:\n\n"
+				 "    %s\n\n"),
+			   start_db_cmd->data);
+	}
+	else
+	{
+		printf(_("\nSo far so good. Now configure the replication connection in\n"
+				 "postgresql.conf, and then start the database server using:\n\n"
+				 "    %s\n\n"),
+			   start_db_cmd->data);
+	}
 
 	destroyPQExpBuffer(start_db_cmd);
 
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index c4ee0168a9..1acf9353d1 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -76,7 +76,7 @@ static int	WalSegSz;
 static int	set_wal_segsize;
 
 static void CheckDataVersion(void);
-static bool ReadControlFile(void);
+static bool read_controlfile(void);
 static void GuessControlValues(void);
 static void PrintControlValues(bool guessed);
 static void PrintNewControlValues(void);
@@ -393,7 +393,7 @@ main(int argc, char *argv[])
 	/*
 	 * Attempt to read the existing pg_control file
 	 */
-	if (!ReadControlFile())
+	if (!read_controlfile())
 		GuessControlValues();
 
 	/*
@@ -578,7 +578,7 @@ CheckDataVersion(void)
  * to the current format.  (Currently we don't do anything of the sort.)
  */
 static bool
-ReadControlFile(void)
+read_controlfile(void)
 {
 	int			fd;
 	int			len;
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..d0d5968dcf 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -127,8 +127,8 @@ extern char *archiveCleanupCommand;
 extern bool recoveryTargetInclusive;
 extern int	recoveryTargetAction;
 extern int	recovery_min_apply_delay;
-extern char *PrimaryConnInfo;
-extern char *PrimarySlotName;
+extern PGDLLIMPORT char *PrimaryConnInfo;
+extern PGDLLIMPORT char *PrimarySlotName;
 
 /* indirectly set via GUC system */
 extern TransactionId recoveryTargetXid;
@@ -299,6 +299,9 @@ extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
 extern void BootStrapXLOG(void);
 extern void LocalProcessControlFile(bool reset);
+extern void InitControlFile(uint64 sysidentifier);
+extern void WriteControlFile(void);
+extern void ReadControlFile(void);
 extern void StartupXLOG(void);
 extern void ShutdownXLOG(int code, Datum arg);
 extern void InitXLOGAccess(void);
@@ -354,6 +357,7 @@ extern void do_pg_abort_backup(void);
 extern SessionBackupState get_backup_status(void);
 
 /* File path names (all relative to $PGDATA) */
+#define BASEBACKUP_SIGNAL_FILE	"basebackup.signal"
 #define RECOVERY_SIGNAL_FILE	"recovery.signal"
 #define STANDBY_SIGNAL_FILE		"standby.signal"
 #define BACKUP_LABEL_FILE		"backup_label"
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index bc6e03fbc7..75efc3cf5f 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -398,6 +398,7 @@ typedef enum
 	CheckerProcess = 0,
 	BootstrapProcess,
 	StartupProcess,
+	BaseBackupProcess,
 	BgWriterProcess,
 	CheckpointerProcess,
 	WalWriterProcess,
@@ -410,6 +411,7 @@ extern AuxProcType MyAuxProcType;
 
 #define AmBootstrapProcess()		(MyAuxProcType == BootstrapProcess)
 #define AmStartupProcess()			(MyAuxProcType == StartupProcess)
+#define AmBaseBackupProcess()		(MyAuxProcType == BaseBackupProcess)
 #define AmBackgroundWriterProcess() (MyAuxProcType == BgWriterProcess)
 #define AmCheckpointerProcess()		(MyAuxProcType == CheckpointerProcess)
 #define AmWalWriterProcess()		(MyAuxProcType == WalWriterProcess)
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fe076d823d..6b6a06ced8 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -721,6 +721,7 @@ typedef enum BackendType
 	B_AUTOVAC_LAUNCHER,
 	B_AUTOVAC_WORKER,
 	B_BACKEND,
+	B_BASE_BACKUP,
 	B_BG_WORKER,
 	B_BG_WRITER,
 	B_CHECKPOINTER,
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index 503a5b9f0b..480165c51c 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -33,4 +33,6 @@ extern void SendBaseBackup(BaseBackupCmd *cmd);
 
 extern int64 sendTablespace(char *path, bool sizeonly);
 
+extern void BaseBackupMain(void);
+
 #endif							/* _BASEBACKUP_H */
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index e12a934966..835c0b8214 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -214,6 +214,7 @@ typedef void (*walrcv_readtimelinehistoryfile_fn) (WalReceiverConn *conn,
 												   TimeLineID tli,
 												   char **filename,
 												   char **content, int *size);
+typedef void (*walrcv_base_backup_fn) (WalReceiverConn *conn);
 typedef bool (*walrcv_startstreaming_fn) (WalReceiverConn *conn,
 										  const WalRcvStreamOptions *options);
 typedef void (*walrcv_endstreaming_fn) (WalReceiverConn *conn,
@@ -241,6 +242,7 @@ typedef struct WalReceiverFunctionsType
 	walrcv_identify_system_fn walrcv_identify_system;
 	walrcv_server_version_fn walrcv_server_version;
 	walrcv_readtimelinehistoryfile_fn walrcv_readtimelinehistoryfile;
+	walrcv_base_backup_fn walrcv_base_backup;
 	walrcv_startstreaming_fn walrcv_startstreaming;
 	walrcv_endstreaming_fn walrcv_endstreaming;
 	walrcv_receive_fn walrcv_receive;
@@ -266,6 +268,8 @@ extern PGDLLIMPORT WalReceiverFunctionsType *WalReceiverFunctions;
 	WalReceiverFunctions->walrcv_server_version(conn)
 #define walrcv_readtimelinehistoryfile(conn, tli, filename, content, size) \
 	WalReceiverFunctions->walrcv_readtimelinehistoryfile(conn, tli, filename, content, size)
+#define walrcv_base_backup(conn) \
+	WalReceiverFunctions->walrcv_base_backup(conn)
 #define walrcv_startstreaming(conn, options) \
 	WalReceiverFunctions->walrcv_startstreaming(conn, options)
 #define walrcv_endstreaming(conn, next_tli) \
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index 625fbc386a..1c57ad901f 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -148,7 +148,7 @@ extern void fsync_fname(const char *fname, bool isdir);
 extern int	durable_rename(const char *oldfile, const char *newfile, int loglevel);
 extern int	durable_unlink(const char *fname, int loglevel);
 extern int	durable_link_or_rename(const char *oldfile, const char *newfile, int loglevel);
-extern void SyncDataDirectory(void);
+extern void SyncDataDirectory(bool pre_sync, int loglevel);
 extern int	data_sync_elevel(int elevel);
 
 /* Filename components */
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 6791e0cbc2..2e12330b00 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -259,7 +259,7 @@ extern int	temp_file_limit;
 
 extern int	num_temp_buffers;
 
-extern char *cluster_name;
+extern PGDLLIMPORT char *cluster_name;
 extern PGDLLIMPORT char *ConfigFileName;
 extern char *HbaFileName;
 extern char *IdentFileName;
diff --git a/src/test/recovery/t/018_basebackup.pl b/src/test/recovery/t/018_basebackup.pl
new file mode 100644
index 0000000000..99731fc388
--- /dev/null
+++ b/src/test/recovery/t/018_basebackup.pl
@@ -0,0 +1,29 @@
+# Test basebackup worker functionality
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+
+my $node1 = get_new_node('node1');
+$node1->init(allows_streaming => 1);
+$node1->start;
+
+$node1->safe_psql('postgres',
+				  "CREATE TABLE tab_int AS SELECT generate_series(1,1000) AS a");
+
+my $node2 = get_new_node('node2');
+$node2->init(allows_streaming => 1, extra => [ '--replica' ]);
+$node2->append_conf('postgresql.conf', "primary_conninfo = '" . $node1->connstr . "'");
+my $old_mtime = (stat($node2->data_dir . '/postgresql.conf'))[9];
+$node2->start;
+
+$node1->wait_for_catchup($node2, 'replay', $node1->lsn('insert'));
+
+is($node2->safe_psql('postgres', "SELECT count(*) FROM tab_int"),
+   qq(1000),
+   'check content of standby');
+
+my $new_mtime = (stat($node2->data_dir . '/postgresql.conf'))[9];
+is($new_mtime, $old_mtime,
+   'configuration files were not copied');

base-commit: 61ecea45e50bcd3b87d4e905719e63e41d6321ce
-- 
2.23.0

#18Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#17)
Re: base backup client as auxiliary backend process

On Mon, Oct 28, 2019 at 09:30:52AM +0100, Peter Eisentraut wrote:

On 2019-09-18 10:31, Michael Paquier wrote:

-    * Verify XLOG status looks valid.
+    * Check that contents look valid.
*/
-   if (ControlFile->state < DB_SHUTDOWNED ||
-       ControlFile->state > DB_IN_PRODUCTION ||
-       !XRecOffIsValid(ControlFile->checkPoint))
+   if (!XRecOffIsValid(ControlFile->checkPoint))
ereport(FATAL,
Doesn't seem like a good idea to me to remove this sanity check for
normal deployments, but actually you moved that down in StartupXLOG().
It seems to me that this is unrelated and could be a separate patch, so
that the errors produced are more verbose.  I think that we should also
change that code to use a switch/case on ControlFile->state.

Done. Yes, this was really a change made to get more precise error messages
during debugging. It could be committed separately.

If you wish to do so now, that's fine by me.
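
To make the switch/case suggestion concrete, here is a minimal standalone
sketch. It assumes a local copy of the DBState values from
catalog/pg_control.h; the function name control_state_is_valid and the
reason strings are hypothetical and are not taken from the patch. It keeps
the behavior of the removed range check (anything below DB_SHUTDOWNED or
above DB_IN_PRODUCTION is rejected) while leaving room for a distinct
message per case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copy of the states in catalog/pg_control.h, for illustration only. */
typedef enum DBState
{
	DB_STARTUP = 0,
	DB_SHUTDOWNED,
	DB_SHUTDOWNED_IN_RECOVERY,
	DB_SHUTDOWNING,
	DB_IN_CRASH_RECOVERY,
	DB_IN_ARCHIVE_RECOVERY,
	DB_IN_PRODUCTION
} DBState;

/*
 * Hypothetical helper: returns true if the state would have passed the old
 * range check.  On failure, *reason points to a static description that a
 * caller could put into its error message.
 */
bool
control_state_is_valid(int state, const char **reason)
{
	assert(reason != NULL);
	*reason = NULL;

	switch (state)
	{
		case DB_SHUTDOWNED:
		case DB_SHUTDOWNED_IN_RECOVERY:
		case DB_SHUTDOWNING:
		case DB_IN_CRASH_RECOVERY:
		case DB_IN_ARCHIVE_RECOVERY:
		case DB_IN_PRODUCTION:
			return true;

		case DB_STARTUP:
			/* below DB_SHUTDOWNED, so the old check rejected it */
			*reason = "control file state is DB_STARTUP";
			return false;

		default:
			/* above DB_IN_PRODUCTION or negative: not a known state */
			*reason = "control file contains an unrecognized state value";
			return false;
	}
}
```

A caller would report *reason in its FATAL message instead of one generic
"invalid data" error for all rejected values.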

The current defaults of pg_basebackup were chosen so that the backups
taken are stable and so that monitoring is easier, thanks to
--wal-method=stream and the use of replication slots.
Shouldn't the use of at least a temporary replication slot be mandatory
for the stability of the copy? It seems to me that there is a good
argument for having a second process which streams WAL on top of the
main backup process, and just use a WAL receiver for that.

Is this something that the walreceiver should be doing independent of this
patch?

There could be an argument for switching a WAL receiver to use a
temporary replication slot by default. Still, it seems to me that
this backup solution suffers from the same set of problems we have
spent years fixing in pg_basebackup: missing WAL files caused by
concurrent checkpoints removing segments that are still needed while
the main data folder and other tablespaces are being copied.

One problem which is not tackled here is what to do for the tablespace
map. pg_basebackup has its own specific trick for that, and with that
new feature we may want something equivalent? Not something to
consider as a first stage of course.

The updated patch has support for tablespaces without mapping. I'm thinking
about putting the mapping specification into a GUC list somehow. Shouldn't be
too hard.

That may become ugly if there are many tablespaces to take care of.
Another idea I can come up with would be to pass the new mapping to
initdb, still this requires an extra intermediate step to store the
new map, and then compare it with the mapping received at BASE_BACKUP
time. But perhaps you are looking for an experience different than
pg_basebackup. The first version of the patch does not actually
require that anyway.

No more strtol with base 10 stuff please :)

Hmm, why not? What's the replacement?

I was referring to this patch:
https://commitfest.postgresql.org/25/2272/
It happens that all our calls of strtol in core use a base of 10. But
please just ignore this part.

ReceiveAndUnpackTarFile() is in both libpqwalreceiver.c and
pg_basebackup.c. It would be nice to refactor that.
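
As a rough idea of what a shared piece could look like, here is a
standalone sketch of the two tar-header computations that
ReceiveAndUnpackTarFile() relies on: reading an octal header field (mode at
offset 100, size at offset 124) and computing the padding up to the next
512-byte block. The names parse_tar_octal and tar_padding are hypothetical;
this is not the existing read_tar_number() from pgtar.h, only an
illustration of the same arithmetic under the assumption of plain octal
fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAR_BLOCK_SIZE 512

/*
 * Hypothetical helper: parse a tar header field of at most "len" bytes that
 * holds an octal number, optionally preceded by spaces.  Parsing stops at
 * the first byte that is not an octal digit (typically NUL or space).
 */
uint64_t
parse_tar_octal(const char *field, size_t len)
{
	uint64_t	result = 0;
	size_t		i = 0;

	assert(field != NULL);

	while (i < len && field[i] == ' ')
		i++;
	while (i < len && field[i] >= '0' && field[i] <= '7')
	{
		result = (result << 3) | (uint64_t) (field[i] - '0');
		i++;
	}
	return result;
}

/*
 * Number of padding bytes that follow a member of "len" bytes, so that the
 * next header starts on a 512-byte boundary.
 */
uint64_t
tar_padding(uint64_t len)
{
	uint64_t	mask = (uint64_t) TAR_BLOCK_SIZE - 1;

	return ((len + mask) & ~mask) - len;
}
```

With helpers like these in a common file, both callers would only keep
their own I/O and error reporting around the header handling.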
--
Michael

#19Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Michael Paquier (#18)
1 attachment(s)
Re: base backup client as auxiliary backend process

On 2019-11-07 05:16, Michael Paquier wrote:

The current defaults of pg_basebackup were chosen so that the backups
taken are stable and so that monitoring is easier, thanks to
--wal-method=stream and the use of replication slots.
Shouldn't the use of at least a temporary replication slot be mandatory
for the stability of the copy? It seems to me that there is a good
argument for having a second process which streams WAL on top of the
main backup process, and just use a WAL receiver for that.

Is this something that the walreceiver should be doing independent of this
patch?

There could be an argument for switching a WAL receiver to use a
temporary replication slot by default. Still, it seems to me that
this backup solution suffers from the same set of problems we have
spent years fixing in pg_basebackup: missing WAL files caused by
concurrent checkpoints removing segments that are still needed while
the main data folder and other tablespaces are being copied.

I looked into this. It seems trivial to make walsender create and use a
temporary replication slot by default if no permanent replication slot
is specified. This is basically the logic that pg_basebackup has but
done server-side. See attached patch for a demonstration. Any reason
not to do that?
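
For reference, the client-side decision that results from this can be
sketched as a small standalone function. The two version constants are the
ones used in the attached patch; the function name
client_creates_temp_slot and the slot_name parameter are hypothetical and
only illustrate the rule: the client asks for a temporary slot itself only
when no permanent slot was named and the server is new enough to support
temporary slots but too old to create one automatically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Temporary replication slots exist in version 10 and newer. */
#define MINIMUM_VERSION_FOR_TEMP_SLOTS 100000
/* With the attached patch, version 13 creates them automatically. */
#define MINIMUM_VERSION_FOR_AUTO_TEMP_SLOTS 130000

/*
 * Hypothetical helper: should the client create a temporary slot itself?
 * server_version uses the PQserverVersion() numbering (e.g. 120000).
 */
bool
client_creates_temp_slot(int server_version, const char *slot_name)
{
	assert(server_version > 0);

	/* A permanent slot was requested explicitly; use that instead. */
	if (slot_name != NULL && slot_name[0] != '\0')
		return false;

	/* Server too old to know about temporary slots. */
	if (server_version < MINIMUM_VERSION_FOR_TEMP_SLOTS)
		return false;

	/* Server creates the temporary slot on its own. */
	if (server_version >= MINIMUM_VERSION_FOR_AUTO_TEMP_SLOTS)
		return false;

	return true;
}
```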

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-walsender-uses-a-temporary-replication-slot-by-defau.patch (text/plain; charset=UTF-8)
From 6ae76011ece6a5900cc06e2350b0ccb930eb41a0 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Sat, 9 Nov 2019 22:10:19 +0100
Subject: [PATCH] walsender uses a temporary replication slot by default

---
 src/backend/replication/walsender.c   | 9 +++++++--
 src/bin/pg_basebackup/pg_basebackup.c | 8 +++++++-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 7f5671504f..c16455f3f0 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -563,6 +563,12 @@ StartReplication(StartReplicationCmd *cmd)
 					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 					 (errmsg("cannot use a logical replication slot for physical replication"))));
 	}
+	else
+	{
+		char *slotname = psprintf("pg_walsender_%d", MyProcPid);
+
+		ReplicationSlotCreate(slotname, false, RS_TEMPORARY);
+	}
 
 	/*
 	 * Select the timeline. If it was given explicitly by the client, use
@@ -703,8 +709,7 @@ StartReplication(StartReplicationCmd *cmd)
 		Assert(streamingDoneSending && streamingDoneReceiving);
 	}
 
-	if (cmd->slotname)
-		ReplicationSlotRelease();
+	ReplicationSlotRelease();
 
 	/*
 	 * Copy is finished now. Send a single-row result set indicating the next
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index a9d162a7da..687bd331dd 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -68,6 +68,11 @@ typedef struct TablespaceList
  */
 #define MINIMUM_VERSION_FOR_TEMP_SLOTS 100000
 
+/*
+ * Version 13 creates temporary slots automatically.
+ */
+#define MINIMUM_VERSION_FOR_AUTO_TEMP_SLOTS 130000
+
 /*
  * Different ways to include WAL
  */
@@ -569,7 +574,8 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
 			 "pg_xlog" : "pg_wal");
 
 	/* Temporary replication slots are only supported in 10 and newer */
-	if (PQserverVersion(conn) < MINIMUM_VERSION_FOR_TEMP_SLOTS)
+	if (PQserverVersion(conn) < MINIMUM_VERSION_FOR_TEMP_SLOTS ||
+		PQserverVersion(conn) >= MINIMUM_VERSION_FOR_AUTO_TEMP_SLOTS)
 		temp_replication_slot = false;
 
 	/*
-- 
2.24.0

In reply to: Peter Eisentraut (#19)
Re: base backup client as auxiliary backend process

Hello

Could you rebase the patch, please? I get errors when applying it. CFbot checks only the latest demonstration patch.

I looked into this. It seems trivial to make walsender create and use a
temporary replication slot by default if no permanent replication slot
is specified. This is basically the logic that pg_basebackup has but
done server-side. See attached patch for a demonstration. Any reason
not to do that?

Seems this would break pg_basebackup --no-slot option?

+          Do not copy configuration files, that is, files that end in
+          <filename>.conf</filename>.

Possibly we need to ignore *.signal files too?

+/*
+ * XXX copied from pg_basebackup.c
+ */
+
+unsigned long long totaldone;
+unsigned long long totalsize_kb;
+int tablespacenum;
+int tablespacecount;

Is a variable declaration in the middle of the file correct for the coding style? Not a problem for me, I just want to clarify.
Shouldn't these be declared "static"?
Also, how about tablespacedone instead of tablespacenum?
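
To illustrate the "static" remark, here is a minimal stand-alone sketch, not taken from the patch. It assumes the progress counters are only needed inside one source file; the function names (progress_begin, progress_add, and so on) are hypothetical and exist only for this example. Declaring the counters "static" gives them internal linkage, so they cannot clash with same-named globals elsewhere in the backend.

```c
#include <assert.h>

/*
 * File-scope progress counters.  "static" gives them internal linkage, so
 * the names stay private to this translation unit.  The variable names
 * follow the review suggestion; the accessor functions are hypothetical.
 */
static unsigned long long totaldone;
static int	tablespacedone;
static int	tablespacecount;

/* Reset the counters before a new base backup starts. */
void
progress_begin(int ntablespaces)
{
	totaldone = 0;
	tablespacedone = 0;
	tablespacecount = ntablespaces;
}

/* Account for nbytes of received data. */
void
progress_add(unsigned long long nbytes)
{
	totaldone += nbytes;
}

/* Mark one more tablespace as completely received. */
void
progress_tablespace_finished(void)
{
	assert(tablespacedone < tablespacecount);
	tablespacedone++;
}

/* Read-only accessors for code outside this file. */
unsigned long long
progress_bytes_done(void)
{
	return totaldone;
}

int
progress_tablespaces_done(void)
{
	return tablespacedone;
}
```

Other files would go through the accessor functions instead of touching the variables directly, which is the usual reason to prefer "static" for file-local state.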

The updated patch has support for tablespaces without mapping. I'm thinking
about putting the mapping specification into a GUC list somehow.
Shouldn't be too hard.

I think we can leave tablespace mapping to pg_basebackup only: a more powerful tool for less common scenarios. Or leave it for a future patch.

regards, Sergei

#21Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Sergei Kornilov (#20)
3 attachment(s)
Re: base backup client as auxiliary backend process

On 2019-11-15 14:52, Sergei Kornilov wrote:

I looked into this. It seems trivial to make walsender create and use a
temporary replication slot by default if no permanent replication slot
is specified. This is basically the logic that pg_basebackup has but
done server-side. See attached patch for a demonstration. Any reason
not to do that?

Seems this would break pg_basebackup --no-slot option?

After thinking about this a bit more, doing the temporary slot stuff on
the walsender side might lead to too many complications in practice.

Here is another patch set that implements the temporary slot use on the
walreceiver side, essentially mirroring what pg_basebackup already does.

I think this patch set might be useful on its own, even without the base
backup stuff to follow.
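
As a rough stand-alone model of the walreceiver-side approach (not the patch code itself): when no permanent slot is configured and the new setting is on, the slot name is derived from the PID of the remote backend, and the slot is created as a temporary physical slot that reserves WAL. The helper names and SLOT_NAME_MAX below are hypothetical and only serve this sketch; the command text follows the CREATE_REPLICATION_SLOT syntax of the replication protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SLOT_NAME_MAX 64		/* stands in for NAMEDATALEN */

/*
 * Hypothetical helper: decide whether a temporary slot is needed and, if
 * so, write its name into slotname.  Returns true when a slot should be
 * created on the remote side.
 */
bool
choose_temp_slot_name(char *slotname, size_t size,
					  bool create_temp_slot, int remote_backend_pid)
{
	assert(slotname != NULL && size > 0);

	/* A configured permanent slot wins, and so does a disabled setting. */
	if (slotname[0] != '\0' || !create_temp_slot)
		return false;

	snprintf(slotname, size, "pg_walreceiver_%d", remote_backend_pid);
	return true;
}

/*
 * Hypothetical helper: format the replication command for a temporary
 * physical slot that reserves WAL immediately.
 */
int
build_create_slot_command(char *cmd, size_t size, const char *slotname)
{
	return snprintf(cmd, size,
					"CREATE_REPLICATION_SLOT \"%s\" TEMPORARY PHYSICAL RESERVE_WAL",
					slotname);
}
```

Using the remote backend PID in the name keeps concurrent walreceivers connected to the same upstream from colliding, and because the slot is temporary it disappears when that remote session ends.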

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Make-lsn-argument-walrcv_create_slot-optional.patch (text/plain; charset=UTF-8)
From 00c816f2ee9b6b8c0668d17a596470a18c6092e1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Fri, 22 Nov 2019 11:04:26 +0100
Subject: [PATCH 1/3] Make lsn argument walrcv_create_slot() optional

Some callers are not using it, so it's wasteful to have to specify it.
---
 src/backend/commands/subscriptioncmds.c                     | 3 +--
 src/backend/replication/libpqwalreceiver/libpqwalreceiver.c | 6 ++++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 5408edcfc2..198aa6f4b1 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -428,7 +428,6 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel)
 	 */
 	if (connect)
 	{
-		XLogRecPtr	lsn;
 		char	   *err;
 		WalReceiverConn *wrconn;
 		List	   *tables;
@@ -479,7 +478,7 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel)
 				Assert(slotname);
 
 				walrcv_create_slot(wrconn, slotname, false,
-								   CRS_NOEXPORT_SNAPSHOT, &lsn);
+								   CRS_NOEXPORT_SNAPSHOT, NULL);
 				ereport(NOTICE,
 						(errmsg("created replication slot \"%s\" on publisher",
 								slotname)));
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 545d2fcd05..befedb811d 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -844,8 +844,10 @@ libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname,
 						slotname, pchomp(PQerrorMessage(conn->streamConn)))));
 	}
 
-	*lsn = DatumGetLSN(DirectFunctionCall1Coll(pg_lsn_in, InvalidOid,
-											   CStringGetDatum(PQgetvalue(res, 0, 1))));
+	if (lsn)
+		*lsn = DatumGetLSN(DirectFunctionCall1Coll(pg_lsn_in, InvalidOid,
+												   CStringGetDatum(PQgetvalue(res, 0, 1))));
+
 	if (!PQgetisnull(res, 0, 2))
 		snapshot = pstrdup(PQgetvalue(res, 0, 2));
 	else

base-commit: 4a0aab14dcb35550b55e623a3c194442c5666084
-- 
2.24.0

0002-Expose-PQbackendPID-through-walreceiver-API.patch (text/plain; charset=UTF-8)
From 3e42ec83d06dacf92063111e9dc1f033ef415e8a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Fri, 22 Nov 2019 11:07:48 +0100
Subject: [PATCH 2/3] Expose PQbackendPID() through walreceiver API

This will be used by a subsequent patch.
---
 .../replication/libpqwalreceiver/libpqwalreceiver.c   | 11 +++++++++++
 src/include/replication/walreceiver.h                 |  4 ++++
 2 files changed, 15 insertions(+)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index befedb811d..ccc31f3cee 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -74,6 +74,7 @@ static char *libpqrcv_create_slot(WalReceiverConn *conn,
 								  bool temporary,
 								  CRSSnapshotAction snapshot_action,
 								  XLogRecPtr *lsn);
+static pid_t libpqrcv_get_backend_pid(WalReceiverConn *conn);
 static WalRcvExecResult *libpqrcv_exec(WalReceiverConn *conn,
 									   const char *query,
 									   const int nRetTypes,
@@ -93,6 +94,7 @@ static WalReceiverFunctionsType PQWalReceiverFunctions = {
 	libpqrcv_receive,
 	libpqrcv_send,
 	libpqrcv_create_slot,
+	libpqrcv_get_backend_pid,
 	libpqrcv_exec,
 	libpqrcv_disconnect
 };
@@ -858,6 +860,15 @@ libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname,
 	return snapshot;
 }
 
+/*
+ * Return PID of remote backend process.
+ */
+static pid_t
+libpqrcv_get_backend_pid(WalReceiverConn *conn)
+{
+	return PQbackendPID(conn->streamConn);
+}
+
 /*
  * Convert tuple query result to tuplestore.
  */
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index e12a934966..39be805172 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -226,6 +226,7 @@ typedef char *(*walrcv_create_slot_fn) (WalReceiverConn *conn,
 										const char *slotname, bool temporary,
 										CRSSnapshotAction snapshot_action,
 										XLogRecPtr *lsn);
+typedef pid_t (*walrcv_get_backend_pid_fn) (WalReceiverConn *conn);
 typedef WalRcvExecResult *(*walrcv_exec_fn) (WalReceiverConn *conn,
 											 const char *query,
 											 const int nRetTypes,
@@ -246,6 +247,7 @@ typedef struct WalReceiverFunctionsType
 	walrcv_receive_fn walrcv_receive;
 	walrcv_send_fn walrcv_send;
 	walrcv_create_slot_fn walrcv_create_slot;
+	walrcv_get_backend_pid_fn walrcv_get_backend_pid;
 	walrcv_exec_fn walrcv_exec;
 	walrcv_disconnect_fn walrcv_disconnect;
 } WalReceiverFunctionsType;
@@ -276,6 +278,8 @@ extern PGDLLIMPORT WalReceiverFunctionsType *WalReceiverFunctions;
 	WalReceiverFunctions->walrcv_send(conn, buffer, nbytes)
 #define walrcv_create_slot(conn, slotname, temporary, snapshot_action, lsn) \
 	WalReceiverFunctions->walrcv_create_slot(conn, slotname, temporary, snapshot_action, lsn)
+#define walrcv_get_backend_pid(conn) \
+	WalReceiverFunctions->walrcv_get_backend_pid(conn)
 #define walrcv_exec(conn, exec, nRetTypes, retTypes) \
 	WalReceiverFunctions->walrcv_exec(conn, exec, nRetTypes, retTypes)
 #define walrcv_disconnect(conn) \
-- 
2.24.0

0003-walreceiver-uses-a-temporary-replication-slot-by-def.patch (text/plain; charset=UTF-8)
From c26defb3c50ceea9308b3441ac65bb28267e9f1c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Fri, 22 Nov 2019 11:09:26 +0100
Subject: [PATCH 3/3] walreceiver uses a temporary replication slot by default

If no permanent replication slot is configured using
primary_slot_name, the walreceiver now creates and uses a temporary
replication slot.  A new setting wal_receiver_create_temp_slot can be
used to disable this behavior, for example, if the remote instance is
out of replication slots.
---
 doc/src/sgml/config.sgml                      | 20 +++++++++++++++++++
 doc/src/sgml/monitoring.sgml                  |  9 ++++++++-
 .../libpqwalreceiver/libpqwalreceiver.c       |  4 ++++
 src/backend/replication/walreceiver.c         | 14 +++++++++++++
 src/backend/utils/misc/guc.c                  |  9 +++++++++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/replication/walreceiver.h         |  1 +
 7 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d4d1fe45cc..af700f1edf 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4126,6 +4126,26 @@ <title>Standby Servers</title>
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-wal-receiver-create-temp-slot" xreflabel="wal_receiver_create_temp_slot">
+      <term><varname>wal_receiver_create_temp_slot</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>wal_receiver_create_temp_slot</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies whether a WAL receiver should create a temporary replication
+        slot on the remote instance when no permanent replication slot to use
+        has been configured (using <xref linkend="guc-primary-slot-name"/>).
+        The default is on.  The only reason to turn this off would be if the
+        remote instance is currently out of available replication slots.  This
+        parameter can only be set in the <filename>postgresql.conf</filename>
+        file or on the server command line.  Changes only take effect when the
+        WAL receiver process starts a new connection.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-wal-receiver-status-interval" xreflabel="wal_receiver_status_interval">
       <term><varname>wal_receiver_status_interval</varname> (<type>integer</type>)
       <indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a3c5f86b7e..eac7aa44b6 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2113,7 +2113,14 @@ <title><structname>pg_stat_wal_receiver</structname> View</title>
     <row>
      <entry><structfield>slot_name</structfield></entry>
      <entry><type>text</type></entry>
-     <entry>Replication slot name used by this WAL receiver</entry>
+     <entry>
+      Replication slot name used by this WAL receiver.  This is only set if a
+      permanent replication slot is set using <xref
+      linkend="guc-primary-slot-name"/>.  Otherwise, the WAL receiver may use
+      a temporary replication slot (determined by <xref
+      linkend="guc-wal-receiver-create-temp-slot"/>), but these are not shown
+      here.
+     </entry>
     </row>
     <row>
      <entry><structfield>sender_host</structfield></entry>
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index ccc31f3cee..e1c4c78217 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -834,6 +834,10 @@ libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname,
 				break;
 		}
 	}
+	else
+	{
+		appendStringInfoString(&cmd, " PHYSICAL RESERVE_WAL");
+	}
 
 	res = libpqrcv_PQexec(conn->streamConn, cmd.data);
 	pfree(cmd.data);
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index f54ae7690d..b11ecc5a12 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -72,6 +72,7 @@
 
 
 /* GUC variables */
+bool		wal_receiver_create_temp_slot;
 int			wal_receiver_status_interval;
 int			wal_receiver_timeout;
 bool		hot_standby_feedback;
@@ -345,6 +346,19 @@ WalReceiverMain(void)
 		 */
 		WalRcvFetchTimeLineHistoryFiles(startpointTLI, primaryTLI);
 
+		/*
+		 * Create temporary replication slot if no slot name is configured,
+		 * unless disabled.  Note that we don't copy the slot name into shared
+		 * memory, since it will go away when this walreceiver session ends.
+		 */
+		if (slotname[0] == '\0' && wal_receiver_create_temp_slot)
+		{
+			snprintf(slotname, sizeof(slotname),
+					 "pg_walreceiver_%d", walrcv_get_backend_pid(wrconn));
+
+			walrcv_create_slot(wrconn, slotname, true, 0, NULL);
+		}
+
 		/*
 		 * Start streaming.
 		 *
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ba4edde71a..4f590ad5af 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1959,6 +1959,15 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"wal_receiver_create_temp_slot", PGC_SIGHUP, REPLICATION_STANDBY,
+			gettext_noop("Sets whether a WAL receiver should create a temporary replication slot if no permanent slot is configured."),
+		},
+		&wal_receiver_create_temp_slot,
+		true,
+		NULL, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 46a06ffacd..501c4e49ac 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -321,6 +321,7 @@
 #max_standby_streaming_delay = 30s	# max delay before canceling queries
 					# when reading streaming WAL;
 					# -1 allows indefinite delay
+#wal_receiver_create_temp_slot = on	# create temp slot if primary_slot_name not set
 #wal_receiver_status_interval = 10s	# send replies at least this often
 					# 0 disables
 #hot_standby_feedback = off		# send info from standby to prevent
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 39be805172..0c06b5c3de 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -23,6 +23,7 @@
 #include "utils/tuplestore.h"
 
 /* user-settable parameters */
+extern bool wal_receiver_create_temp_slot;
 extern int	wal_receiver_status_interval;
 extern int	wal_receiver_timeout;
 extern bool hot_standby_feedback;
-- 
2.24.0

#22Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#21)
Re: base backup client as auxiliary backend process

On Fri, Nov 22, 2019 at 11:21:53AM +0100, Peter Eisentraut wrote:

After thinking about this a bit more, doing the temporary slot stuff on the
walsender side might lead to too many complications in practice.

Here is another patch set that implements the temporary slot use on the
walreceiver side, essentially mirroring what pg_basebackup already does.

I have not looked at the patch, but controlling the generation of the
slot from the client feels much more natural to me. This reuses the
existing interface, which is consistent, and we avoid a new class of
bugs that could arise if the WAL sender had to clean up a slot that it
created itself.
--
Michael

#23Alexandra Wang
lewang@pivotal.io
In reply to: Michael Paquier (#22)
Re: base backup client as auxiliary backend process

Hi Peter,

On Fri, Nov 22, 2019 at 6:22 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

Here is another patch set that implements the temporary slot use on the
walreceiver side, essentially mirroring what pg_basebackup already does.

I think this patch set might be useful on its own, even without the base
backup stuff to follow.

I very much like the idea that every replication connection should have a
replication slot, either a permanent one or a temporary one if the user
didn't specify any. I agree that this patch is useful on its own.

This makes a whole bunch of things much nicer: The connection
information for where to get the base backup from comes from
postgresql.conf, so you only need to specify it in one place.
pg_basebackup is completely out of the picture; no need to deal with
command-line options, --recovery-conf, screen, monitoring for
completion, etc. If something fails, the base backup process can
automatically be restarted (maybe). Operating system integration is
much easier: You only call initdb and then pg_ctl or postgres, as you
are already doing. Automated deployment systems don't need to wait for
pg_basebackup to finish: You only call initdb, then start the server,
and then you're done -- waiting for the base backup to finish can be
done by the regular monitoring system.

Back to the base backup stuff: I don't quite understand all the benefits you
mentioned above. It seems to me the greatest benefit of this patch is that
the postmaster takes care of pg_basebackup itself, which reduces the human
wait between running the pg_basebackup and pg_ctl/postgres commands. Is that
right?

I personally don't mind the --write-recovery-conf option because it helps me
write the primary_conninfo and primary_slot_name GUCs into
postgresql.auto.conf, which to me as a developer is easier than editing
postgresql.conf without automation. Sorry about the dumb question, but what's
so bad about --write-recovery-conf? Are you planning to completely replace
pg_basebackup with this? Is there any use case where a user just needs a
base backup but does not immediately start the backend process?

Also, the patch no longer applies to master and needs a rebase.

--
Alexandra

#24Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Peter Eisentraut (#21)
Re: base backup client as auxiliary backend process

On Fri, 22 Nov 2019 at 19:22, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 2019-11-15 14:52, Sergei Kornilov wrote:

I looked into this. It seems trivial to make walsender create and use a
temporary replication slot by default if no permanent replication slot
is specified. This is basically the logic that pg_basebackup has but
done server-side. See attached patch for a demonstration. Any reason
not to do that?

Seems this would break pg_basebackup --no-slot option?

After thinking about this a bit more, doing the temporary slot stuff on
the walsender side might lead to too many complications in practice.

Here is another patch set that implements the temporary slot use on the
walreceiver side, essentially mirroring what pg_basebackup already does.

I think this patch set might be useful on its own, even without the base
backup stuff to follow.

I agree that these patches are useful on their own, and the 0001 patch and
the 0002 patch look good to me. For the 0003 patch,

+      linkend="guc-primary-slot-name"/>.  Otherwise, the WAL receiver may use
+      a temporary replication slot (determined by <xref
+      linkend="guc-wal-receiver-create-temp-slot"/>), but these are not shown
+      here.

I think it's better to show the temporary slot name in the
pg_stat_wal_receiver view. Otherwise the user would have no idea
which WAL receiver is using the temporary slot.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#25Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Masahiko Sawada (#24)
2 attachment(s)
Re: base backup client as auxiliary backend process

On 2020-01-10 04:32, Masahiko Sawada wrote:

I agree that these patches are useful on their own, and the 0001 patch and

committed 0001

the 0002 patch look good to me. For the 0003 patch,

+      linkend="guc-primary-slot-name"/>.  Otherwise, the WAL receiver may use
+      a temporary replication slot (determined by <xref
+      linkend="guc-wal-receiver-create-temp-slot"/>), but these are not shown
+      here.

I think it's better to show the temporary slot name in the
pg_stat_wal_receiver view. Otherwise the user would have no idea
which WAL receiver is using the temporary slot.

Makes sense. It makes the code a bit more fiddly, but it seems worth
it. New patches attached.
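
The extra fiddliness comes from remembering whether the slot name kept in shared memory belongs to a temporary slot. A simplified stand-alone model of that decision, under the assumption that only two fields matter, might look like the following; SlotState, prepare_slot and SLOT_NAME_MAX are hypothetical names for this sketch and do not appear in the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SLOT_NAME_MAX 64		/* stands in for NAMEDATALEN */

/* Hypothetical stand-in for the slot fields kept in shared memory. */
typedef struct SlotState
{
	char		slotname[SLOT_NAME_MAX];
	bool		is_temp_slot;
} SlotState;

/*
 * Update the state at the start of a new walreceiver connection.
 * Returns true if a temporary slot must be created on the remote side.
 */
bool
prepare_slot(SlotState *state, bool create_temp_slot, int remote_backend_pid)
{
	assert(state != NULL);

	/* A configured permanent slot is left untouched. */
	if (state->slotname[0] != '\0' && !state->is_temp_slot)
		return false;

	if (create_temp_slot)
	{
		/* New connection, new remote PID, hence a fresh slot name. */
		snprintf(state->slotname, sizeof(state->slotname),
				 "pg_walreceiver_%d", remote_backend_pid);
		state->is_temp_slot = true;
		return true;
	}

	/* The setting was turned off: forget any stale temporary slot name. */
	state->slotname[0] = '\0';
	state->is_temp_slot = false;
	return false;
}
```

The case that needs the extra flag is a reconnect: the name left over from the previous run must not be mistaken for a permanent slot, so it is either replaced with a fresh one or cleared when the setting has been switched off in the meantime. The sketch leaves out locking, which the real code needs when it writes to shared memory.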

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v2-0001-Expose-PQbackendPID-through-walreceiver-API.patch (text/plain; charset=UTF-8)
From 2a089e6bd34b04e17f7b2918057d8e8eb04c117f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Sat, 11 Jan 2020 10:22:08 +0100
Subject: [PATCH v2 1/2] Expose PQbackendPID() through walreceiver API

This will be used by a subsequent patch.
---
 .../replication/libpqwalreceiver/libpqwalreceiver.c   | 11 +++++++++++
 src/include/replication/walreceiver.h                 |  4 ++++
 2 files changed, 15 insertions(+)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 658af71fec..b731d3fd04 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -74,6 +74,7 @@ static char *libpqrcv_create_slot(WalReceiverConn *conn,
 								  bool temporary,
 								  CRSSnapshotAction snapshot_action,
 								  XLogRecPtr *lsn);
+static pid_t libpqrcv_get_backend_pid(WalReceiverConn *conn);
 static WalRcvExecResult *libpqrcv_exec(WalReceiverConn *conn,
 									   const char *query,
 									   const int nRetTypes,
@@ -93,6 +94,7 @@ static WalReceiverFunctionsType PQWalReceiverFunctions = {
 	libpqrcv_receive,
 	libpqrcv_send,
 	libpqrcv_create_slot,
+	libpqrcv_get_backend_pid,
 	libpqrcv_exec,
 	libpqrcv_disconnect
 };
@@ -858,6 +860,15 @@ libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname,
 	return snapshot;
 }
 
+/*
+ * Return PID of remote backend process.
+ */
+static pid_t
+libpqrcv_get_backend_pid(WalReceiverConn *conn)
+{
+	return PQbackendPID(conn->streamConn);
+}
+
 /*
  * Convert tuple query result to tuplestore.
  */
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index a276237477..172cfa2862 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -226,6 +226,7 @@ typedef char *(*walrcv_create_slot_fn) (WalReceiverConn *conn,
 										const char *slotname, bool temporary,
 										CRSSnapshotAction snapshot_action,
 										XLogRecPtr *lsn);
+typedef pid_t (*walrcv_get_backend_pid_fn) (WalReceiverConn *conn);
 typedef WalRcvExecResult *(*walrcv_exec_fn) (WalReceiverConn *conn,
 											 const char *query,
 											 const int nRetTypes,
@@ -246,6 +247,7 @@ typedef struct WalReceiverFunctionsType
 	walrcv_receive_fn walrcv_receive;
 	walrcv_send_fn walrcv_send;
 	walrcv_create_slot_fn walrcv_create_slot;
+	walrcv_get_backend_pid_fn walrcv_get_backend_pid;
 	walrcv_exec_fn walrcv_exec;
 	walrcv_disconnect_fn walrcv_disconnect;
 } WalReceiverFunctionsType;
@@ -276,6 +278,8 @@ extern PGDLLIMPORT WalReceiverFunctionsType *WalReceiverFunctions;
 	WalReceiverFunctions->walrcv_send(conn, buffer, nbytes)
 #define walrcv_create_slot(conn, slotname, temporary, snapshot_action, lsn) \
 	WalReceiverFunctions->walrcv_create_slot(conn, slotname, temporary, snapshot_action, lsn)
+#define walrcv_get_backend_pid(conn) \
+	WalReceiverFunctions->walrcv_get_backend_pid(conn)
 #define walrcv_exec(conn, exec, nRetTypes, retTypes) \
 	WalReceiverFunctions->walrcv_exec(conn, exec, nRetTypes, retTypes)
 #define walrcv_disconnect(conn) \
-- 
2.24.1

v2-0002-walreceiver-uses-a-temporary-replication-slot-by-.patch (text/plain; charset=UTF-8)
From c6ebd1275f8e2490c8a1a5dba981bdce53aafe20 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Sat, 11 Jan 2020 10:24:49 +0100
Subject: [PATCH v2 2/2] walreceiver uses a temporary replication slot by
 default

If no permanent replication slot is configured using
primary_slot_name, the walreceiver now creates and uses a temporary
replication slot.  A new setting wal_receiver_create_temp_slot can be
used to disable this behavior, for example, if the remote instance is
out of replication slots.
---
 doc/src/sgml/config.sgml                      | 20 +++++++++
 doc/src/sgml/monitoring.sgml                  |  9 +++-
 .../libpqwalreceiver/libpqwalreceiver.c       |  4 ++
 src/backend/replication/walreceiver.c         | 41 +++++++++++++++++++
 src/backend/utils/misc/guc.c                  |  9 ++++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/replication/walreceiver.h         |  7 ++++
 7 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5d1c90282f..5d45b6f7cb 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4124,6 +4124,26 @@ <title>Standby Servers</title>
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-wal-receiver-create-temp-slot" xreflabel="wal_receiver_create_temp_slot">
+      <term><varname>wal_receiver_create_temp_slot</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>wal_receiver_create_temp_slot</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies whether a WAL receiver should create a temporary replication
+        slot on the remote instance when no permanent replication slot to use
+        has been configured (using <xref linkend="guc-primary-slot-name"/>).
+        The default is on.  The only reason to turn this off would be if the
+        remote instance is currently out of available replication slots.  This
+        parameter can only be set in the <filename>postgresql.conf</filename>
+        file or on the server command line.  Changes only take effect when the
+        WAL receiver process starts a new connection.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-wal-receiver-status-interval" xreflabel="wal_receiver_status_interval">
       <term><varname>wal_receiver_status_interval</varname> (<type>integer</type>)
       <indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dcb58115af..a2f5bbae66 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2117,7 +2117,14 @@ <title><structname>pg_stat_wal_receiver</structname> View</title>
     <row>
      <entry><structfield>slot_name</structfield></entry>
      <entry><type>text</type></entry>
-     <entry>Replication slot name used by this WAL receiver</entry>
+     <entry>
+      Replication slot name used by this WAL receiver.  This is only set if a
+      permanent replication slot is set using <xref
+      linkend="guc-primary-slot-name"/>.  Otherwise, the WAL receiver may use
+      a temporary replication slot (determined by <xref
+      linkend="guc-wal-receiver-create-temp-slot"/>), but these are not shown
+      here.
+     </entry>
     </row>
     <row>
      <entry><structfield>sender_host</structfield></entry>
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index b731d3fd04..e4fd1f9bb6 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -834,6 +834,10 @@ libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname,
 				break;
 		}
 	}
+	else
+	{
+		appendStringInfoString(&cmd, " PHYSICAL RESERVE_WAL");
+	}
 
 	res = libpqrcv_PQexec(conn->streamConn, cmd.data);
 	pfree(cmd.data);
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 77360f1524..b464114333 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -73,6 +73,7 @@
 
 
 /* GUC variables */
+bool		wal_receiver_create_temp_slot;
 int			wal_receiver_status_interval;
 int			wal_receiver_timeout;
 bool		hot_standby_feedback;
@@ -169,6 +170,7 @@ WalReceiverMain(void)
 	char		conninfo[MAXCONNINFO];
 	char	   *tmp_conninfo;
 	char		slotname[NAMEDATALEN];
+	bool		is_temp_slot;
 	XLogRecPtr	startpoint;
 	TimeLineID	startpointTLI;
 	TimeLineID	primaryTLI;
@@ -230,6 +232,7 @@ WalReceiverMain(void)
 	walrcv->ready_to_display = false;
 	strlcpy(conninfo, (char *) walrcv->conninfo, MAXCONNINFO);
 	strlcpy(slotname, (char *) walrcv->slotname, NAMEDATALEN);
+	is_temp_slot = walrcv->is_temp_slot;
 	startpoint = walrcv->receiveStart;
 	startpointTLI = walrcv->receiveStartTLI;
 
@@ -345,6 +348,44 @@ WalReceiverMain(void)
 		 */
 		WalRcvFetchTimeLineHistoryFiles(startpointTLI, primaryTLI);
 
+		/*
+		 * Create temporary replication slot if no slot name is configured or
+		 * the slot from the previous run was temporary, unless
+		 * wal_receiver_create_temp_slot is disabled.  We also need to handle
+		 * the case where the previous run used a temporary slot but
+		 * wal_receiver_create_temp_slot was changed in the meantime.  In that
+		 * case, we delete the old slot name in shared memory.  (This would
+		 * all be a bit easier if we just didn't copy the slot name into
+		 * shared memory, since we won't need it again later, but then we
+		 * can't see the slot name in the stats views.)
+		 */
+		if (slotname[0] == '\0' || is_temp_slot)
+		{
+			bool		changed = false;
+
+			if (wal_receiver_create_temp_slot)
+			{
+				snprintf(slotname, sizeof(slotname),
+						 "pg_walreceiver_%d", walrcv_get_backend_pid(wrconn));
+
+				walrcv_create_slot(wrconn, slotname, true, 0, NULL);
+				changed = true;
+			}
+			else if (slotname[0] != '\0')
+			{
+				slotname[0] = '\0';
+				changed = true;
+			}
+
+			if (changed)
+			{
+				SpinLockAcquire(&walrcv->mutex);
+				strlcpy(walrcv->slotname, slotname, NAMEDATALEN);
+				walrcv->is_temp_slot = wal_receiver_create_temp_slot;
+				SpinLockRelease(&walrcv->mutex);
+			}
+		}
+
 		/*
 		 * Start streaming.
 		 *
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 62285792ec..e5f8a1301f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1969,6 +1969,15 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"wal_receiver_create_temp_slot", PGC_SIGHUP, REPLICATION_STANDBY,
+			gettext_noop("Sets whether a WAL receiver should create a temporary replication slot if no permanent slot is configured."),
+		},
+		&wal_receiver_create_temp_slot,
+		true,
+		NULL, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 087190ce63..e1048c0047 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -321,6 +321,7 @@
 #max_standby_streaming_delay = 30s	# max delay before canceling queries
 					# when reading streaming WAL;
 					# -1 allows indefinite delay
+#wal_receiver_create_temp_slot = on	# create temp slot if primary_slot_name not set
 #wal_receiver_status_interval = 10s	# send replies at least this often
 					# 0 disables
 #hot_standby_feedback = off		# send info from standby to prevent
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 172cfa2862..e08afc6548 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -23,6 +23,7 @@
 #include "utils/tuplestore.h"
 
 /* user-settable parameters */
+extern bool wal_receiver_create_temp_slot;
 extern int	wal_receiver_status_interval;
 extern int	wal_receiver_timeout;
 extern bool hot_standby_feedback;
@@ -121,6 +122,12 @@ typedef struct
 	 */
 	char		slotname[NAMEDATALEN];
 
+	/*
+	 * If it's a temporary replication slot, it needs to be recreated when
+	 * connecting.
+	 */
+	bool		is_temp_slot;
+
 	/* set true once conninfo is ready to display (obfuscated pwds etc) */
 	bool		ready_to_display;
 
-- 
2.24.1
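
For readers following the WalReceiverMain() hunk above, the slot-name decision can be restated as a small standalone C sketch. The helper name choose_slot and its parameters are assumptions made for illustration and are not part of the patch. The sketch leaves out what the real code also does: creating the slot on the primary via walrcv_create_slot() and copying the result into shared memory under the spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NAMEDATALEN 64			/* PostgreSQL's default; assumed here */

/*
 * Hypothetical helper mirroring the decision in WalReceiverMain().
 * Returns true if slotname was modified.  create_temp_slot stands in for
 * the wal_receiver_create_temp_slot GUC, and backend_pid for the value
 * returned by walrcv_get_backend_pid().
 */
static bool
choose_slot(char *slotname, bool is_temp_slot, bool create_temp_slot,
			int backend_pid)
{
	assert(slotname != NULL);

	/* A configured permanent slot is left alone. */
	if (slotname[0] != '\0' && !is_temp_slot)
		return false;

	if (create_temp_slot)
	{
		/* (Re)generate the temporary slot name for this connection. */
		snprintf(slotname, NAMEDATALEN, "pg_walreceiver_%d", backend_pid);
		return true;
	}

	if (slotname[0] != '\0')
	{
		/* Stale temporary name from a previous run: forget it. */
		slotname[0] = '\0';
		return true;
	}

	return false;
}
```

The third branch covers the case described in the patch comment: the previous run used a temporary slot, but wal_receiver_create_temp_slot was turned off in the meantime, so the old name must be cleared.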

#26Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Peter Eisentraut (#25)
Re: base backup client as auxiliary backend process

On Sat, 11 Jan 2020 at 18:52, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 2020-01-10 04:32, Masahiko Sawada wrote:

I agree that these patches are useful on their own, and the 0001 patch and

committed 0001

0002 patch look good to me. For 0003 patch,

+      linkend="guc-primary-slot-name"/>.  Otherwise, the WAL receiver may use
+      a temporary replication slot (determined by <xref
+      linkend="guc-wal-receiver-create-temp-slot"/>), but these are not shown
+      here.

I think it's better to show the temporary slot name in the
pg_stat_wal_receiver view. Otherwise users would have no idea
which WAL receiver is using the temporary slot.

Makes sense. It makes the code a bit more fiddly, but it seems worth
it. New patches attached.

Thank you for updating the patch!

-     <entry>Replication slot name used by this WAL receiver</entry>
+     <entry>
+      Replication slot name used by this WAL receiver.  This is only set if a
+      permanent replication slot is set using <xref
+      linkend="guc-primary-slot-name"/>.  Otherwise, the WAL receiver may use
+      a temporary replication slot (determined by <xref
+      linkend="guc-wal-receiver-create-temp-slot"/>), but these are not shown
+      here.
+     </entry>

Now that the slot name is shown even if it's a temp slot, the above
documentation change needs to be updated. Other changes look good to
me.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#27Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Masahiko Sawada (#26)
Re: base backup client as auxiliary backend process

On 2020-01-14 07:32, Masahiko Sawada wrote:

-     <entry>Replication slot name used by this WAL receiver</entry>
+     <entry>
+      Replication slot name used by this WAL receiver.  This is only set if a
+      permanent replication slot is set using <xref
+      linkend="guc-primary-slot-name"/>.  Otherwise, the WAL receiver may use
+      a temporary replication slot (determined by <xref
+      linkend="guc-wal-receiver-create-temp-slot"/>), but these are not shown
+      here.
+     </entry>

Now that the slot name is shown even if it's a temp slot, the above
documentation change needs to be updated. Other changes look good to
me.

committed, thanks

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#28Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Peter Eisentraut (#27)
Re: base backup client as auxiliary backend process

On Tue, 14 Jan 2020 at 22:58, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 2020-01-14 07:32, Masahiko Sawada wrote:

-     <entry>Replication slot name used by this WAL receiver</entry>
+     <entry>
+      Replication slot name used by this WAL receiver.  This is only set if a
+      permanent replication slot is set using <xref
+      linkend="guc-primary-slot-name"/>.  Otherwise, the WAL receiver may use
+      a temporary replication slot (determined by <xref
+      linkend="guc-wal-receiver-create-temp-slot"/>), but these are not shown
+      here.
+     </entry>

Now that the slot name is shown even if it's a temp slot, the above
documentation change needs to be updated. Other changes look good to
me.

committed, thanks

Thank you for committing these patches.

Could you rebase the main patch that adds base backup client as
auxiliary backend process since the previous version patch (v3)
conflicts with the current HEAD?

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#29Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Masahiko Sawada (#28)
1 attachment(s)
Re: base backup client as auxiliary backend process

On 2020-01-15 01:40, Masahiko Sawada wrote:

Could you rebase the main patch that adds base backup client as
auxiliary backend process since the previous version patch (v3)
conflicts with the current HEAD?

attached

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v4-0001-Base-backup-client-as-auxiliary-backend-process.patch (text/plain; charset=UTF-8)
From 892ba431956c7d936555f758efc874f58b3679e8 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Wed, 15 Jan 2020 16:15:06 +0100
Subject: [PATCH v4] Base backup client as auxiliary backend process

Discussion: https://www.postgresql.org/message-id/flat/61b8d18d-c922-ac99-b990-a31ba63cdcbb@2ndquadrant.com
---
 doc/src/sgml/protocol.sgml                    |  12 +-
 doc/src/sgml/ref/initdb.sgml                  |  17 +
 src/backend/access/transam/xlog.c             | 102 +++--
 src/backend/bootstrap/bootstrap.c             |   9 +
 src/backend/postmaster/pgstat.c               |   6 +
 src/backend/postmaster/postmaster.c           | 114 ++++-
 src/backend/replication/basebackup.c          |  70 +++
 .../libpqwalreceiver/libpqwalreceiver.c       | 400 ++++++++++++++++++
 src/backend/replication/repl_gram.y           |   9 +-
 src/backend/replication/repl_scanner.l        |   1 +
 src/backend/storage/file/fd.c                 |  36 +-
 src/bin/initdb/initdb.c                       |  39 +-
 src/bin/pg_resetwal/pg_resetwal.c             |   6 +-
 src/include/access/xlog.h                     |   8 +-
 src/include/miscadmin.h                       |   2 +
 src/include/pgstat.h                          |   1 +
 src/include/replication/basebackup.h          |   2 +
 src/include/replication/walreceiver.h         |   4 +
 src/include/storage/fd.h                      |   2 +-
 src/include/utils/guc.h                       |   2 +-
 src/test/recovery/t/018_basebackup.pl         |  29 ++
 21 files changed, 783 insertions(+), 88 deletions(-)
 create mode 100644 src/test/recovery/t/018_basebackup.pl
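
As a reading aid for the EXCLUDE_CONF option in the diff below: the filter in sendDir() looks only at the text after the last dot in each directory entry name. The following standalone C sketch restates that test. The helper name is_conf_file is an assumption for illustration; the patch performs the check inline.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if the entry name ends in ".conf",
 * using the same strrchr()/strcmp() test as the patch.
 */
static bool
is_conf_file(const char *name)
{
	const char *dot;

	assert(name != NULL);

	dot = strrchr(name, '.');	/* last dot in the name, or NULL if none */
	return dot != NULL && strcmp(dot, ".conf") == 0;
}
```

Under this rule, postgresql.conf, postgresql.auto.conf, and pg_hba.conf would all be skipped, while postgresql.conf.bak and a directory named conf.d would not. Because the test is applied to every entry name, any file whose name ends in .conf would be skipped, not only the standard configuration files.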

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 80275215e0..f54b820edf 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2466,7 +2466,7 @@ <title>Streaming Replication Protocol</title>
   </varlistentry>
 
   <varlistentry>
-    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ]
+    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ] [ <literal>EXCLUDE_CONF</literal> ]
      <indexterm><primary>BASE_BACKUP</primary></indexterm>
     </term>
     <listitem>
@@ -2576,6 +2576,16 @@ <title>Streaming Replication Protocol</title>
          </para>
         </listitem>
        </varlistentry>
+
+       <varlistentry>
+        <term><literal>EXCLUDE_CONF</literal></term>
+        <listitem>
+         <para>
+          Do not copy configuration files, that is, files that end in
+          <filename>.conf</filename>.
+         </para>
+        </listitem>
+       </varlistentry>
       </variablelist>
      </para>
      <para>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index da5c8f5307..1261e02d59 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -286,6 +286,23 @@ <title>Options</title>
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-r</option></term>
+      <term><option>--replica</option></term>
+      <listitem>
+       <para>
+        Initialize a data directory for a physical replication replica.  The
+        data directory will not be initialized with a full database system,
+        but will instead only contain a minimal set of files.  A server that
+        is started on this data directory will first fetch a base backup and
+        then switch to standby mode.  The connection information for the base
+        backup has to be configured by setting <xref
+        linkend="guc-primary-conninfo"/>, and other parameters as desired,
+        before the server is started.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-S</option></term>
       <term><option>--sync-only</option></term>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7f4f784c0e..36c6cdef82 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -903,8 +903,6 @@ static void CheckRecoveryConsistency(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpt, bool report);
 static bool rescanLatestTimeLine(void);
-static void WriteControlFile(void);
-static void ReadControlFile(void);
 static char *str_time(pg_time_t tnow);
 static bool CheckForStandbyTrigger(void);
 
@@ -4494,7 +4492,7 @@ rescanLatestTimeLine(void)
  * ReadControlFile() verifies they are correct.  We could split out the
  * I/O and compatibility-check functions, but there seems no need currently.
  */
-static void
+void
 WriteControlFile(void)
 {
 	int			fd;
@@ -4585,7 +4583,7 @@ WriteControlFile(void)
 						XLOG_CONTROL_FILE)));
 }
 
-static void
+void
 ReadControlFile(void)
 {
 	pg_crc32c	crc;
@@ -5075,6 +5073,41 @@ XLOGShmemInit(void)
 	InitSharedLatch(&XLogCtl->recoveryWakeupLatch);
 }
 
+void
+InitControlFile(uint64 sysidentifier)
+{
+	char		mock_auth_nonce[MOCK_AUTH_NONCE_LEN];
+
+	/*
+	 * Generate a random nonce. This is used for authentication requests that
+	 * will fail because the user does not exist. The nonce is used to create
+	 * a genuine-looking password challenge for the non-existent user, in lieu
+	 * of an actual stored password.
+	 */
+	if (!pg_strong_random(mock_auth_nonce, MOCK_AUTH_NONCE_LEN))
+		ereport(PANIC,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate secret authorization token")));
+
+	memset(ControlFile, 0, sizeof(ControlFileData));
+	/* Initialize pg_control status fields */
+	ControlFile->system_identifier = sysidentifier;
+	memcpy(ControlFile->mock_authentication_nonce, mock_auth_nonce, MOCK_AUTH_NONCE_LEN);
+	ControlFile->state = DB_SHUTDOWNED;
+	ControlFile->unloggedLSN = FirstNormalUnloggedLSN;
+
+	/* Set important parameter values for use when replaying WAL */
+	ControlFile->MaxConnections = MaxConnections;
+	ControlFile->max_worker_processes = max_worker_processes;
+	ControlFile->max_wal_senders = max_wal_senders;
+	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_locks_per_xact = max_locks_per_xact;
+	ControlFile->wal_level = wal_level;
+	ControlFile->wal_log_hints = wal_log_hints;
+	ControlFile->track_commit_timestamp = track_commit_timestamp;
+	ControlFile->data_checksum_version = bootstrap_data_checksum_version;
+}
+
 /*
  * This func must be called ONCE on system install.  It creates pg_control
  * and the initial XLOG segment.
@@ -5090,7 +5123,6 @@ BootStrapXLOG(void)
 	char	   *recptr;
 	bool		use_existent;
 	uint64		sysidentifier;
-	char		mock_auth_nonce[MOCK_AUTH_NONCE_LEN];
 	struct timeval tv;
 	pg_crc32c	crc;
 
@@ -5111,17 +5143,6 @@ BootStrapXLOG(void)
 	sysidentifier |= ((uint64) tv.tv_usec) << 12;
 	sysidentifier |= getpid() & 0xFFF;
 
-	/*
-	 * Generate a random nonce. This is used for authentication requests that
-	 * will fail because the user does not exist. The nonce is used to create
-	 * a genuine-looking password challenge for the non-existent user, in lieu
-	 * of an actual stored password.
-	 */
-	if (!pg_strong_random(mock_auth_nonce, MOCK_AUTH_NONCE_LEN))
-		ereport(PANIC,
-				(errcode(ERRCODE_INTERNAL_ERROR),
-				 errmsg("could not generate secret authorization token")));
-
 	/* First timeline ID is always 1 */
 	ThisTimeLineID = 1;
 
@@ -5229,30 +5250,12 @@ BootStrapXLOG(void)
 	openLogFile = -1;
 
 	/* Now create pg_control */
-
-	memset(ControlFile, 0, sizeof(ControlFileData));
-	/* Initialize pg_control status fields */
-	ControlFile->system_identifier = sysidentifier;
-	memcpy(ControlFile->mock_authentication_nonce, mock_auth_nonce, MOCK_AUTH_NONCE_LEN);
-	ControlFile->state = DB_SHUTDOWNED;
+	InitControlFile(sysidentifier);
 	ControlFile->time = checkPoint.time;
 	ControlFile->checkPoint = checkPoint.redo;
 	ControlFile->checkPointCopy = checkPoint;
-	ControlFile->unloggedLSN = FirstNormalUnloggedLSN;
-
-	/* Set important parameter values for use when replaying WAL */
-	ControlFile->MaxConnections = MaxConnections;
-	ControlFile->max_worker_processes = max_worker_processes;
-	ControlFile->max_wal_senders = max_wal_senders;
-	ControlFile->max_prepared_xacts = max_prepared_xacts;
-	ControlFile->max_locks_per_xact = max_locks_per_xact;
-	ControlFile->wal_level = wal_level;
-	ControlFile->wal_log_hints = wal_log_hints;
-	ControlFile->track_commit_timestamp = track_commit_timestamp;
-	ControlFile->data_checksum_version = bootstrap_data_checksum_version;
 
 	/* some additional ControlFile fields are set in WriteControlFile() */
-
 	WriteControlFile();
 
 	/* Bootstrap the commit log, too */
@@ -6297,24 +6300,31 @@ StartupXLOG(void)
 	 */
 	ValidateXLOGDirectoryStructure();
 
-	/*----------
+	/*
 	 * If we previously crashed, perform a couple of actions:
+	 *
 	 *	- The pg_wal directory may still include some temporary WAL segments
-	 * used when creating a new segment, so perform some clean up to not
-	 * bloat this path.  This is done first as there is no point to sync this
-	 * temporary data.
-	 *	- There might be data which we had written, intending to fsync it,
-	 * but which we had not actually fsync'd yet. Therefore, a power failure
-	 * in the near future might cause earlier unflushed writes to be lost,
-	 * even though more recent data written to disk from here on would be
-	 * persisted.  To avoid that, fsync the entire data directory.
-	 *---------
+	 *    used when creating a new segment, so perform some clean up to not
+	 *    bloat this path.  This is done first as there is no point to sync
+	 *    this temporary data.
+	 *
+	 *	- There might be data which we had written, intending to fsync it, but
+	 *    which we had not actually fsync'd yet.  Therefore, a power failure
+	 *    in the near future might cause earlier unflushed writes to be lost,
+	 *    even though more recent data written to disk from here on would be
+	 *    persisted.  To avoid that, fsync the entire data directory.  Errors
+	 *    are logged but not considered fatal.  Aborting on error would result
+	 *    in failure to start for harmless cases such as read-only files in
+	 *    the data directory, and that's not good either.
+	 *
+	 *    Note that if we previously crashed due to a PANIC on fsync(), we'll
+	 *    be rewriting all changes again during recovery.
 	 */
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
 		RemoveTempXlogFiles();
-		SyncDataDirectory();
+		SyncDataDirectory(true, LOG);
 	}
 
 	/*
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index bfc629c753..c4d2bff0b3 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -36,6 +36,7 @@
 #include "postmaster/bgwriter.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
+#include "replication/basebackup.h"
 #include "replication/walreceiver.h"
 #include "storage/bufmgr.h"
 #include "storage/bufpage.h"
@@ -326,6 +327,9 @@ AuxiliaryProcessMain(int argc, char *argv[])
 			case StartupProcess:
 				statmsg = pgstat_get_backend_desc(B_STARTUP);
 				break;
+			case BaseBackupProcess:
+				statmsg = pgstat_get_backend_desc(B_BASE_BACKUP);
+				break;
 			case BgWriterProcess:
 				statmsg = pgstat_get_backend_desc(B_BG_WRITER);
 				break;
@@ -451,6 +455,11 @@ AuxiliaryProcessMain(int argc, char *argv[])
 			StartupProcessMain();
 			proc_exit(1);		/* should never return */
 
+		case BaseBackupProcess:
+			/* don't set signals, basebackup has its own agenda */
+			BaseBackupMain();
+			proc_exit(1);		/* should never return */
+
 		case BgWriterProcess:
 			/* don't set signals, bgwriter has its own agenda */
 			BackgroundWriterMain();
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 51c486bebd..f4bb4192b7 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2927,6 +2927,9 @@ pgstat_bestart(void)
 			case StartupProcess:
 				lbeentry.st_backendType = B_STARTUP;
 				break;
+			case BaseBackupProcess:
+				lbeentry.st_backendType = B_BASE_BACKUP;
+				break;
 			case BgWriterProcess:
 				lbeentry.st_backendType = B_BG_WRITER;
 				break;
@@ -4285,6 +4288,9 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_BG_WORKER:
 			backendDesc = "background worker";
 			break;
+		case B_BASE_BACKUP:
+			backendDesc = "base backup";
+			break;
 		case B_BG_WRITER:
 			backendDesc = "background writer";
 			break;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 7a92dac525..07555a55d7 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -116,6 +116,7 @@
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "replication/logicallauncher.h"
+#include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -248,6 +249,7 @@ bool		restart_after_crash = true;
 
 /* PIDs of special child processes; 0 when not running */
 static pid_t StartupPID = 0,
+			BaseBackupPID = 0,
 			BgWriterPID = 0,
 			CheckpointerPID = 0,
 			WalWriterPID = 0,
@@ -539,6 +541,7 @@ static void ShmemBackendArrayRemove(Backend *bn);
 #endif							/* EXEC_BACKEND */
 
 #define StartupDataBase()		StartChildProcess(StartupProcess)
+#define StartBaseBackup()		StartChildProcess(BaseBackupProcess)
 #define StartBackgroundWriter() StartChildProcess(BgWriterProcess)
 #define StartCheckpointer()		StartChildProcess(CheckpointerProcess)
 #define StartWalWriter()		StartChildProcess(WalWriterProcess)
@@ -572,6 +575,8 @@ PostmasterMain(int argc, char *argv[])
 	bool		listen_addr_saved = false;
 	int			i;
 	char	   *output_config_variable = NULL;
+	struct stat stat_buf;
+	bool		basebackup_signal_file_found = false;
 
 	InitProcessGlobals();
 
@@ -886,12 +891,27 @@ PostmasterMain(int argc, char *argv[])
 	/* Verify that DataDir looks reasonable */
 	checkDataDir();
 
-	/* Check that pg_control exists */
-	checkControlFile();
-
 	/* And switch working directory into it */
 	ChangeToDataDir();
 
+	if (stat(BASEBACKUP_SIGNAL_FILE, &stat_buf) == 0)
+	{
+		int         fd;
+
+		fd = BasicOpenFilePerm(STANDBY_SIGNAL_FILE, O_RDWR | PG_BINARY,
+							   S_IRUSR | S_IWUSR);
+		if (fd >= 0)
+		{
+			(void) pg_fsync(fd);
+			close(fd);
+		}
+		basebackup_signal_file_found = true;
+	}
+
+	/* Check that pg_control exists */
+	if (!basebackup_signal_file_found)
+		checkControlFile();
+
 	/*
 	 * Check for invalid combinations of GUC settings.
 	 */
@@ -970,7 +990,8 @@ PostmasterMain(int argc, char *argv[])
 	 * processes will inherit the correct function pointer and not need to
 	 * repeat the test.
 	 */
-	LocalProcessControlFile(false);
+	if (!basebackup_signal_file_found)
+		LocalProcessControlFile(false);
 
 	/*
 	 * Initialize SSL library, if specified.
@@ -1386,6 +1407,39 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STARTING);
 
+	if (basebackup_signal_file_found)
+	{
+		BaseBackupPID = StartBaseBackup();
+
+		/*
+		 * Wait until done.  Start WAL receiver in the meantime, once base
+		 * backup has received the starting position.
+		 */
+		while (BaseBackupPID != 0)
+		{
+			PG_SETMASK(&UnBlockSig);
+			pg_usleep(1000000L);
+			PG_SETMASK(&BlockSig);
+			MaybeStartWalReceiver();
+		}
+
+		/*
+		 * XXX Shut down WAL receiver.  It will be restarted later in xlog.c,
+		 * and that will complain if it's already running.
+		 */
+		ShutdownWalRcv();
+
+		/*
+		 * Base backup done, now signal standby mode.
+		 */
+		durable_rename(BASEBACKUP_SIGNAL_FILE, STANDBY_SIGNAL_FILE, FATAL);
+
+		/*
+		 * Reread the control file that came in with the base backup.
+		 */
+		ReadControlFile();
+	}
+
 	/*
 	 * We're ready to rock and roll...
 	 */
@@ -2665,6 +2719,8 @@ SIGHUP_handler(SIGNAL_ARGS)
 		SignalChildren(SIGHUP);
 		if (StartupPID != 0)
 			signal_child(StartupPID, SIGHUP);
+		if (BaseBackupPID != 0)
+			signal_child(BaseBackupPID, SIGHUP);
 		if (BgWriterPID != 0)
 			signal_child(BgWriterPID, SIGHUP);
 		if (CheckpointerPID != 0)
@@ -2824,6 +2880,8 @@ pmdie(SIGNAL_ARGS)
 
 			if (StartupPID != 0)
 				signal_child(StartupPID, SIGTERM);
+			if (BaseBackupPID != 0)
+				signal_child(BaseBackupPID, SIGTERM);
 			if (BgWriterPID != 0)
 				signal_child(BgWriterPID, SIGTERM);
 			if (WalReceiverPID != 0)
@@ -3062,6 +3120,23 @@ reaper(SIGNAL_ARGS)
 			continue;
 		}
 
+		/*
+		 * Was it the base backup process?
+		 */
+		if (pid == BaseBackupPID)
+		{
+			BaseBackupPID = 0;
+			if (EXIT_STATUS_0(exitstatus))
+				;
+			else if (EXIT_STATUS_1(exitstatus))
+				ereport(FATAL,
+						(errmsg("base backup failed")));
+			else
+				HandleChildCrash(pid, exitstatus,
+								 _("base backup process"));
+			continue;
+		}
+
 		/*
 		 * Was it the bgwriter?  Normal exit can be ignored; we'll start a new
 		 * one at the next iteration of the postmaster's main loop, if
@@ -3583,6 +3658,18 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 		StartupStatus = STARTUP_SIGNALED;
 	}
 
+	/* Take care of the base backup process too */
+	if (pid == BaseBackupPID)
+		BaseBackupPID = 0;
+	else if (BaseBackupPID != 0 && take_action)
+	{
+		ereport(DEBUG2,
+				(errmsg_internal("sending %s to process %d",
+								 (SendStop ? "SIGSTOP" : "SIGQUIT"),
+								 (int) BaseBackupPID)));
+		signal_child(BaseBackupPID, (SendStop ? SIGSTOP : SIGQUIT));
+	}
+
 	/* Take care of the bgwriter too */
 	if (pid == BgWriterPID)
 		BgWriterPID = 0;
@@ -3817,6 +3904,7 @@ PostmasterStateMachine(void)
 		if (CountChildren(BACKEND_TYPE_NORMAL | BACKEND_TYPE_WORKER) == 0 &&
 			StartupPID == 0 &&
 			WalReceiverPID == 0 &&
+			BaseBackupPID == 0 &&
 			BgWriterPID == 0 &&
 			(CheckpointerPID == 0 ||
 			 (!FatalError && Shutdown < ImmediateShutdown)) &&
@@ -3911,6 +3999,7 @@ PostmasterStateMachine(void)
 			/* These other guys should be dead already */
 			Assert(StartupPID == 0);
 			Assert(WalReceiverPID == 0);
+			Assert(BaseBackupPID == 0);
 			Assert(BgWriterPID == 0);
 			Assert(CheckpointerPID == 0);
 			Assert(WalWriterPID == 0);
@@ -4094,6 +4183,8 @@ TerminateChildren(int signal)
 		if (signal == SIGQUIT || signal == SIGKILL)
 			StartupStatus = STARTUP_SIGNALED;
 	}
+	if (BaseBackupPID != 0)
+		signal_child(BaseBackupPID, signal);
 	if (BgWriterPID != 0)
 		signal_child(BgWriterPID, signal);
 	if (CheckpointerPID != 0)
@@ -4919,6 +5010,7 @@ SubPostmasterMain(int argc, char *argv[])
 		strcmp(argv[1], "--forkavlauncher") == 0 ||
 		strcmp(argv[1], "--forkavworker") == 0 ||
 		strcmp(argv[1], "--forkboot") == 0 ||
+		strcmp(argv[1], "--forkbasebackup") == 0 ||
 		strncmp(argv[1], "--forkbgworker=", 15) == 0)
 		PGSharedMemoryReAttach();
 	else
@@ -4958,7 +5050,8 @@ SubPostmasterMain(int argc, char *argv[])
 	 * (re-)read control file, as it contains config. The postmaster will
 	 * already have read this, but this process doesn't know about that.
 	 */
-	LocalProcessControlFile(false);
+	if (strcmp(argv[1], "--forkbasebackup") != 0)
+		LocalProcessControlFile(false);
 
 	/*
 	 * Reload any libraries that were preloaded by the postmaster.  Since we
@@ -5019,7 +5112,8 @@ SubPostmasterMain(int argc, char *argv[])
 		/* And run the backend */
 		BackendRun(&port);		/* does not return */
 	}
-	if (strcmp(argv[1], "--forkboot") == 0)
+	if (strcmp(argv[1], "--forkboot") == 0 ||
+		strcmp(argv[1], "--forkbasebackup") == 0)
 	{
 		/* Restore basic shared memory pointers */
 		InitShmemAccess(UsedShmemSegAddr);
@@ -5431,7 +5525,7 @@ StartChildProcess(AuxProcType type)
 	av[ac++] = "postgres";
 
 #ifdef EXEC_BACKEND
-	av[ac++] = "--forkboot";
+	av[ac++] = (type == BaseBackupProcess) ? "--forkbasebackup" : "--forkboot";
 	av[ac++] = NULL;			/* filled in by postmaster_forkexec */
 #endif
 
@@ -5475,6 +5569,10 @@ StartChildProcess(AuxProcType type)
 				ereport(LOG,
 						(errmsg("could not fork startup process: %m")));
 				break;
+			case BaseBackupProcess:
+				ereport(LOG,
+						(errmsg("could not fork base backup process: %m")));
+				break;
 			case BgWriterProcess:
 				ereport(LOG,
 						(errmsg("could not fork background writer process: %m")));
@@ -5616,7 +5714,7 @@ static void
 MaybeStartWalReceiver(void)
 {
 	if (WalReceiverPID == 0 &&
-		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
+		(pmState == PM_INIT || pmState == PM_STARTUP || pmState == PM_RECOVERY ||
 		 pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
 		Shutdown == NoShutdown)
 	{
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index dea8aab45e..e2f88a2f11 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -29,6 +29,7 @@
 #include "port.h"
 #include "postmaster/syslogger.h"
 #include "replication/basebackup.h"
+#include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
 #include "storage/bufpage.h"
@@ -38,6 +39,7 @@
 #include "storage/ipc.h"
 #include "storage/reinit.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/ps_status.h"
 #include "utils/relcache.h"
 #include "utils/timestamp.h"
@@ -121,6 +123,9 @@ static long long int total_checksum_failures;
 /* Do not verify checksums. */
 static bool noverify_checksums = false;
 
+/* Do not copy config files. */
+static bool exclude_conf = false;
+
 /*
  * The contents of these directories are removed or recreated during server
  * start so they are not included in backups.  The directories themselves are
@@ -639,6 +644,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 	bool		o_maxrate = false;
 	bool		o_tablespace_map = false;
 	bool		o_noverify_checksums = false;
+	bool		o_exclude_conf = false;
 
 	MemSet(opt, 0, sizeof(*opt));
 	foreach(lopt, options)
@@ -727,6 +733,15 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 			noverify_checksums = true;
 			o_noverify_checksums = true;
 		}
+		else if (strcmp(defel->defname, "exclude_conf") == 0)
+		{
+			if (o_exclude_conf)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("duplicate option \"%s\"", defel->defname)));
+			exclude_conf = true;
+			o_exclude_conf = true;
+		}
 		else
 			elog(ERROR, "option \"%s\" not recognized",
 				 defel->defname);
@@ -1136,6 +1151,18 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 			continue;
 		}
 
+		if (exclude_conf)
+		{
+			char	   *dot = strrchr(de->d_name, '.');
+			if (dot && strcmp(dot, ".conf") == 0)
+			{
+				elog(DEBUG2,
+					 "configuration file \"%s\" excluded from backup",
+					 de->d_name);
+				continue;
+			}
+		}
+
 		snprintf(pathbuf, sizeof(pathbuf), "%s/%s", path, de->d_name);
 
 		/* Skip pg_control here to back up it last */
@@ -1730,3 +1757,46 @@ throttle(size_t increment)
 	 */
 	throttled_last = GetCurrentTimestamp();
 }
+
+
+/*
+ * base backup worker process (client) main function
+ */
+void
+BaseBackupMain(void)
+{
+	WalReceiverConn *wrconn = NULL;
+	char	   *err;
+	TimeLineID	primaryTLI;
+	uint64		primary_sysid;
+
+	/* Load the libpq-specific functions */
+	load_file("libpqwalreceiver", false);
+	if (WalReceiverFunctions == NULL)
+		elog(ERROR, "libpqwalreceiver didn't initialize correctly");
+
+	/* Establish the connection to the primary */
+	wrconn = walrcv_connect(PrimaryConnInfo, false, cluster_name[0] ? cluster_name : "basebackup", &err);
+	if (!wrconn)
+		ereport(ERROR,
+				(errmsg("could not connect to the primary server: %s", err)));
+
+	/*
+	 * Get the remote sysid and stick it into the local control file, so that
+	 * the walreceiver is happy.  The control file will later be overwritten
+	 * by the base backup.
+	 */
+	primary_sysid = strtoull(walrcv_identify_system(wrconn, &primaryTLI), NULL, 10);
+	InitControlFile(primary_sysid);
+	WriteControlFile();
+
+	walrcv_base_backup(wrconn);
+
+	walrcv_disconnect(wrconn);
+
+	SyncDataDirectory(false, ERROR);
+
+	ereport(LOG,
+			(errmsg("base backup completed")));
+	proc_exit(0);
+}
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index e4fd1f9bb6..52819d504c 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -17,20 +17,29 @@
 #include "postgres.h"
 
 #include <unistd.h>
+#include <sys/stat.h>
 #include <sys/time.h>
 
+#ifdef USE_SYSTEMD
+#include <systemd/sd-daemon.h>
+#endif
+
 #include "access/xlog.h"
 #include "catalog/pg_type.h"
+#include "common/string.h"
 #include "funcapi.h"
 #include "libpq-fe.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "pgtar.h"
 #include "pqexpbuffer.h"
 #include "replication/walreceiver.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/pg_lsn.h"
+#include "utils/ps_status.h"
 #include "utils/tuplestore.h"
 
 PG_MODULE_MAGIC;
@@ -61,6 +70,7 @@ static int	libpqrcv_server_version(WalReceiverConn *conn);
 static void libpqrcv_readtimelinehistoryfile(WalReceiverConn *conn,
 											 TimeLineID tli, char **filename,
 											 char **content, int *len);
+static void libpqrcv_base_backup(WalReceiverConn *conn);
 static bool libpqrcv_startstreaming(WalReceiverConn *conn,
 									const WalRcvStreamOptions *options);
 static void libpqrcv_endstreaming(WalReceiverConn *conn,
@@ -89,6 +99,7 @@ static WalReceiverFunctionsType PQWalReceiverFunctions = {
 	libpqrcv_identify_system,
 	libpqrcv_server_version,
 	libpqrcv_readtimelinehistoryfile,
+	libpqrcv_base_backup,
 	libpqrcv_startstreaming,
 	libpqrcv_endstreaming,
 	libpqrcv_receive,
@@ -358,6 +369,395 @@ libpqrcv_server_version(WalReceiverConn *conn)
 	return PQserverVersion(conn->streamConn);
 }
 
+/*
+ * XXX copied from pg_basebackup.c
+ */
+
+unsigned long long totaldone;
+unsigned long long totalsize_kb;
+int tablespacenum;
+int tablespacecount;
+
+static void
+base_backup_report_progress(void)
+{
+	int			percent;
+	char	   *progress;
+
+	percent = totalsize_kb ? (int) ((totaldone / 1024) * 100 / totalsize_kb) : 0;
+
+	/*
+	 * Avoid overflowing past 100% or the full size. This may make the total
+	 * size number change as we approach the end of the backup (the estimate
+	 * will always be wrong if WAL is included), but that's better than having
+	 * the done column be bigger than the total.
+	 */
+	if (percent > 100)
+		percent = 100;
+	if (totaldone / 1024 > totalsize_kb)
+		totalsize_kb = totaldone / 1024;
+
+	/* Note: no translation of ps status */
+	progress = psprintf((tablespacecount == 1 ?
+						 "%llu/%llu kB (%d%%), %d/%d tablespace" :
+						 "%llu/%llu kB (%d%%), %d/%d tablespaces"),
+						totaldone / 1024,
+						totalsize_kb,
+						percent,
+						tablespacenum,
+						tablespacecount);
+
+	set_ps_display(progress, false);
+#ifdef USE_SYSTEMD
+	sd_pid_notifyf(PostmasterPid, 0, "STATUS=base backup %s", progress);
+#endif
+
+	pfree(progress);
+}
+
+static void
+ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res)
+{
+	char		current_path[MAXPGPATH];
+	char		filename[MAXPGPATH];
+	pgoff_t		current_len_left = 0;
+	int			current_padding = 0;
+	char	   *copybuf = NULL;
+	FILE	   *file = NULL;
+	off_t		flush_offset;
+
+	strlcpy(current_path, DataDir, sizeof(current_path));
+
+	/*
+	 * Get the COPY data
+	 */
+	res = PQgetResult(conn);
+	if (PQresultStatus(res) != PGRES_COPY_OUT)
+		ereport(ERROR,
+				(errmsg("could not get COPY data stream: %s",
+						PQerrorMessage(conn))));
+
+	while (1)
+	{
+		int			r;
+
+		if (copybuf != NULL)
+		{
+			PQfreemem(copybuf);
+			copybuf = NULL;
+		}
+
+		r = PQgetCopyData(conn, &copybuf, 0);
+
+		if (r == -1)
+		{
+			/*
+			 * End of chunk
+			 */
+			if (file)
+				fclose(file);
+
+			break;
+		}
+		else if (r == -2)
+		{
+			ereport(ERROR,
+					(errmsg("could not read COPY data: %s",
+							PQerrorMessage(conn))));
+		}
+
+		if (file == NULL)
+		{
+			int			filemode;
+
+			/*
+			 * No current file, so this must be the header for a new file
+			 */
+			if (r != 512)
+				ereport(ERROR,
+						(errmsg("invalid tar block header size: %d", r)));
+
+			current_len_left = read_tar_number(&copybuf[124], 12);
+
+			/* Set permissions on the file */
+			filemode = read_tar_number(&copybuf[100], 8);
+
+			/*
+			 * All files are padded up to 512 bytes
+			 */
+			current_padding =
+				((current_len_left + 511) & ~511) - current_len_left;
+
+			/*
+			 * First part of header is zero terminated filename
+			 */
+			snprintf(filename, sizeof(filename), "%s/%s", current_path,
+					 copybuf);
+			if (filename[strlen(filename) - 1] == '/')
+			{
+				/*
+				 * Ends in a slash means directory or symlink to directory
+				 */
+				if (copybuf[156] == '5')
+				{
+					/*
+					 * Directory
+					 */
+					filename[strlen(filename) - 1] = '\0';	/* Remove trailing slash */
+					if (MakePGDirectory(filename) != 0)
+					{
+						if (errno != EEXIST)
+							ereport(ERROR,
+									(errcode_for_file_access(),
+									 errmsg("could not create directory \"%s\": %m",
+											filename)));
+					}
+#ifndef WIN32
+					if (chmod(filename, (mode_t) filemode))
+						ereport(ERROR,
+								(errcode_for_file_access(),
+								 errmsg("could not set permissions on directory \"%s\": %m",
+										filename)));
+#endif
+				}
+				/*
+				 * Symbolic link
+				 */
+				else if (copybuf[156] == '2')
+				{
+					/* TODO: tablespace mapping */
+					const char *tblspc_path = &copybuf[157];
+
+					filename[strlen(filename) - 1] = '\0';	/* Remove trailing slash */
+
+					if (symlink(tblspc_path, filename) != 0)
+						ereport(ERROR,
+								(errcode_for_file_access(),
+								 errmsg("could not create symbolic link from \"%s\" to \"%s\": %m",
+										filename, tblspc_path)));
+				}
+				else
+				{
+					ereport(ERROR,
+							(errmsg("unrecognized link indicator \"%c\"",
+									copybuf[156])));
+				}
+				continue;		/* directory or link handled */
+			}
+
+			/*
+			 * regular file
+			 */
+			file = fopen(filename, "wb");
+			if (!file)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 (errmsg("could not create file \"%s\": %m", filename))));
+
+			flush_offset = 0;
+
+#ifndef WIN32
+			if (chmod(filename, (mode_t) filemode))
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 (errmsg("could not set permissions on file \"%s\": %m",
+								 filename))));
+#endif
+
+			if (current_len_left == 0)
+			{
+				/*
+				 * Done with this file, next one will be a new tar header
+				 */
+				fclose(file);
+				file = NULL;
+				continue;
+			}
+		}						/* new file */
+		else
+		{
+			/*
+			 * Continuing blocks in existing file
+			 */
+			if (current_len_left == 0 && r == current_padding)
+			{
+				/*
+				 * Received the padding block for this file, ignore it and
+				 * close the file, then move on to the next tar header.
+				 */
+				fclose(file);
+				file = NULL;
+				continue;
+			}
+
+			if (fwrite(copybuf, r, 1, file) != 1)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not write to file \"%s\": %m", filename)));
+
+			pg_flush_data(fileno(file), flush_offset, r);
+			flush_offset += r;
+			totaldone += r;
+			base_backup_report_progress();
+
+			current_len_left -= r;
+			if (current_len_left == 0 && current_padding == 0)
+			{
+				/*
+				 * Received the last block, and there is no padding to be
+				 * expected. Close the file and move on to the next tar
+				 * header.
+				 */
+				fclose(file);
+				file = NULL;
+				continue;
+			}
+		}						/* continuing data in existing file */
+	}							/* loop over all data blocks */
+	base_backup_report_progress();
+
+	if (file != NULL)
+		ereport(ERROR,
+				(errmsg("COPY stream ended before last file was finished")));
+
+	if (copybuf != NULL)
+		PQfreemem(copybuf);
+}
+
+/*
+ * Make base backup from remote and write to local disk.
+ */
+static void
+libpqrcv_base_backup(WalReceiverConn *conn)
+{
+	StringInfoData stmt;
+	PGresult   *res;
+	char		xlogstart[64];
+	TimeLineID	starttli;
+	XLogRecPtr	recptr;
+	bool		error;
+
+	ereport(LOG,
+			(errmsg("initiating base backup, waiting for remote checkpoint to complete")));
+	set_ps_display("waiting for checkpoint", false);
+
+	initStringInfo(&stmt);
+	appendStringInfo(&stmt, "BASE_BACKUP PROGRESS NOWAIT EXCLUDE_CONF");
+	if (cluster_name && cluster_name[0])
+		appendStringInfo(&stmt, " LABEL %s", quote_literal_cstr(cluster_name));
+
+	if (PQsendQuery(conn->streamConn, stmt.data) == 0)
+		ereport(ERROR,
+				(errmsg("could not start base backup on remote server: %s",
+						pchomp(PQerrorMessage(conn->streamConn)))));
+
+	/*
+	 * First result set: WAL start position and timeline ID
+	 */
+	res = PQgetResult(conn->streamConn);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("could not start base backup on remote server: %s",
+						pchomp(PQerrorMessage(conn->streamConn)))));
+	}
+	if (PQntuples(res) != 1)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("server returned unexpected response to BASE_BACKUP command; got %d rows and %d fields, expected %d rows and %d fields",
+						PQntuples(res), PQnfields(res), 1, 2)));
+	}
+
+	ereport(LOG,
+			(errmsg("remote checkpoint completed")));
+
+	strlcpy(xlogstart, PQgetvalue(res, 0, 0), sizeof(xlogstart));
+	starttli = atoi(PQgetvalue(res, 0, 1));
+	PQclear(res);
+	elog(DEBUG1, "write-ahead log start point: %s on timeline %u",
+		 xlogstart, starttli);
+	recptr = pg_lsn_in_internal(xlogstart, &error);
+	if (error)
+		elog(ERROR, "invalid LSN received: %s", xlogstart);
+
+	/*
+	 * Second result set: tablespace information
+	 */
+	res = PQgetResult(conn->streamConn);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("could not get backup header: %s",
+						pchomp(PQerrorMessage(conn->streamConn)))));
+	}
+	if (PQntuples(res) < 1)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("no data returned from server")));
+	}
+
+	totalsize_kb = totaldone = 0;
+	tablespacecount = PQntuples(res);
+	for (int i = 0; i < PQntuples(res); i++)
+	{
+		totalsize_kb += atol(PQgetvalue(res, i, 2));
+	}
+
+	RequestXLogStreaming(starttli, recptr, PrimaryConnInfo, PrimarySlotName);
+
+	/*
+	 * Start receiving chunks
+	 */
+	for (int i = 0; i < PQntuples(res); i++)
+	{
+		tablespacenum = i;
+		ReceiveAndUnpackTarFile(conn->streamConn, res);
+	}
+	tablespacenum++;
+	base_backup_report_progress();
+
+	PQclear(res);
+
+	/*
+	 * Final result set: WAL end position and timeline ID
+	 */
+	res = PQgetResult(conn->streamConn);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("could not get write-ahead log end position from server: %s",
+						pchomp(PQerrorMessage(conn->streamConn)))));
+	}
+	if (PQntuples(res) != 1)
+	{
+		PQclear(res);
+		ereport(ERROR,
+				(errmsg("no write-ahead log end position returned from server")));
+	}
+	PQclear(res);
+
+	res = PQgetResult(conn->streamConn);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+		if (sqlstate &&
+			strcmp(sqlstate, "XX001" /*ERRCODE_DATA_CORRUPTED*/) == 0)
+			ereport(ERROR,
+					(errmsg("checksum error occurred")));
+		else
+			ereport(ERROR,
+					(errmsg("final receive failed: %s",
+							pchomp(PQerrorMessage(conn->streamConn)))));
+	}
+	PQclear(res);
+}
+
 /*
  * Start streaming WAL data from given streaming options.
  *
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 2d96567409..0a679ea0ee 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -78,6 +78,7 @@ static SQLCmd *make_sqlcmd(void);
 %token K_WAL
 %token K_TABLESPACE_MAP
 %token K_NOVERIFY_CHECKSUMS
+%token K_EXCLUDE_CONF
 %token K_TIMELINE
 %token K_PHYSICAL
 %token K_LOGICAL
@@ -154,8 +155,7 @@ var_name:	IDENT	{ $$ = $1; }
 		;
 
 /*
- * BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT]
- * [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS]
+ * BASE_BACKUP [option]...
  */
 base_backup:
 			K_BASE_BACKUP base_backup_opt_list
@@ -214,6 +214,11 @@ base_backup_opt:
 				  $$ = makeDefElem("noverify_checksums",
 								   (Node *)makeInteger(true), -1);
 				}
+			| K_EXCLUDE_CONF
+				{
+				  $$ = makeDefElem("exclude_conf",
+								   (Node *)makeInteger(true), -1);
+				}
 			;
 
 create_replication_slot:
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 14c9a1e798..c2e2aced7d 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -93,6 +93,7 @@ MAX_RATE		{ return K_MAX_RATE; }
 WAL			{ return K_WAL; }
 TABLESPACE_MAP			{ return K_TABLESPACE_MAP; }
 NOVERIFY_CHECKSUMS	{ return K_NOVERIFY_CHECKSUMS; }
+EXCLUDE_CONF			{ return K_EXCLUDE_CONF; }
 TIMELINE			{ return K_TIMELINE; }
 START_REPLICATION	{ return K_START_REPLICATION; }
 CREATE_REPLICATION_SLOT		{ return K_CREATE_REPLICATION_SLOT; }
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index fa79b45f63..6dd7e2f938 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3154,21 +3154,14 @@ looks_like_temp_rel_name(const char *name)
  * Other symlinks are presumed to point at files we're not responsible
  * for fsyncing, and might not have privileges to write at all.
  *
- * Errors are logged but not considered fatal; that's because this is used
- * only during database startup, to deal with the possibility that there are
- * issued-but-unsynced writes pending against the data directory.  We want to
- * ensure that such writes reach disk before anything that's done in the new
- * run.  However, aborting on error would result in failure to start for
- * harmless cases such as read-only files in the data directory, and that's
- * not good either.
- *
- * Note that if we previously crashed due to a PANIC on fsync(), we'll be
- * rewriting all changes again during recovery.
+ * If pre_sync is true, issue flush requests to the kernel before starting the
+ * actual fsync calls.  This can be skipped if the caller has already done it
+ * itself.
  *
  * Note we assume we're chdir'd into PGDATA to begin with.
  */
 void
-SyncDataDirectory(void)
+SyncDataDirectory(bool pre_sync, int loglevel)
 {
 	bool		xlog_is_symlink;
 
@@ -3187,7 +3180,7 @@ SyncDataDirectory(void)
 		struct stat st;
 
 		if (lstat("pg_wal", &st) < 0)
-			ereport(LOG,
+			ereport(loglevel,
 					(errcode_for_file_access(),
 					 errmsg("could not stat file \"%s\": %m",
 							"pg_wal")));
@@ -3201,15 +3194,18 @@ SyncDataDirectory(void)
 
 	/*
 	 * If possible, hint to the kernel that we're soon going to fsync the data
-	 * directory and its contents.  Errors in this step are even less
+	 * directory and its contents.  Errors in this step are less
 	 * interesting than normal, so log them only at DEBUG1.
 	 */
+	if (pre_sync)
+	{
 #ifdef PG_FLUSH_DATA_WORKS
-	walkdir(".", pre_sync_fname, false, DEBUG1);
-	if (xlog_is_symlink)
-		walkdir("pg_wal", pre_sync_fname, false, DEBUG1);
-	walkdir("pg_tblspc", pre_sync_fname, true, DEBUG1);
+		walkdir(".", pre_sync_fname, false, DEBUG1);
+		if (xlog_is_symlink)
+			walkdir("pg_wal", pre_sync_fname, false, DEBUG1);
+		walkdir("pg_tblspc", pre_sync_fname, true, DEBUG1);
 #endif
+	}
 
 	/*
 	 * Now we do the fsync()s in the same order.
@@ -3220,10 +3216,10 @@ SyncDataDirectory(void)
 	 * in pg_tblspc, they'll get fsync'd twice.  That's not an expected case
 	 * so we don't worry about optimizing it.
 	 */
-	walkdir(".", datadir_fsync_fname, false, LOG);
+	walkdir(".", datadir_fsync_fname, false, loglevel);
 	if (xlog_is_symlink)
-		walkdir("pg_wal", datadir_fsync_fname, false, LOG);
-	walkdir("pg_tblspc", datadir_fsync_fname, true, LOG);
+		walkdir("pg_wal", datadir_fsync_fname, false, loglevel);
+	walkdir("pg_tblspc", datadir_fsync_fname, true, loglevel);
 }
 
 /*
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ec6d0bdf8e..fc8ca5a65b 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -136,6 +136,7 @@ static char *pwfilename = NULL;
 static char *superuser_password = NULL;
 static const char *authmethodhost = NULL;
 static const char *authmethodlocal = NULL;
+static bool replica = false;
 static bool debug = false;
 static bool noclean = false;
 static bool do_sync = true;
@@ -2935,6 +2936,22 @@ initialize_data_directory(void)
 	/* Now create all the text config files */
 	setup_config();
 
+	/*
+	 * If data directory for replica requested, write basebackup.signal, and
+	 * then we are done here.
+	 */
+	if (replica)
+	{
+		char	   *path;
+		char	   *lines[1] = {NULL};
+
+		path = psprintf("%s/basebackup.signal", pg_data);
+		writefile(path, lines);
+		free(path);
+
+		return;
+	}
+
 	/* Bootstrap template1 */
 	bootstrap_template1();
 
@@ -3026,6 +3043,7 @@ main(int argc, char *argv[])
 		{"wal-segsize", required_argument, NULL, 12},
 		{"data-checksums", no_argument, NULL, 'k'},
 		{"allow-group-access", no_argument, NULL, 'g'},
+		{"replica", no_argument, NULL, 'r'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -3067,7 +3085,7 @@ main(int argc, char *argv[])
 
 	/* process command-line options */
 
-	while ((c = getopt_long(argc, argv, "dD:E:kL:nNU:WA:sST:X:g", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "dD:E:kL:nNrU:WA:sST:X:g", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
@@ -3113,6 +3131,9 @@ main(int argc, char *argv[])
 			case 'N':
 				do_sync = false;
 				break;
+			case 'r':
+				replica = true;
+				break;
 			case 'S':
 				sync_only = true;
 				break;
@@ -3334,9 +3355,19 @@ main(int argc, char *argv[])
 	/* translator: This is a placeholder in a shell command. */
 	appendPQExpBuffer(start_db_cmd, " -l %s start", _("logfile"));
 
-	printf(_("\nSuccess. You can now start the database server using:\n\n"
-			 "    %s\n\n"),
-		   start_db_cmd->data);
+	if (!replica)
+	{
+		printf(_("\nSuccess. You can now start the database server using:\n\n"
+				 "    %s\n\n"),
+			   start_db_cmd->data);
+	}
+	else
+	{
+		printf(_("\nSo far so good. Now configure the replication connection in\n"
+				 "postgresql.conf, and then start the database server using:\n\n"
+				 "    %s\n\n"),
+			   start_db_cmd->data);
+	}
 
 	destroyPQExpBuffer(start_db_cmd);
 
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index f9cfeae264..c9edeb54d3 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -76,7 +76,7 @@ static int	WalSegSz;
 static int	set_wal_segsize;
 
 static void CheckDataVersion(void);
-static bool ReadControlFile(void);
+static bool read_controlfile(void);
 static void GuessControlValues(void);
 static void PrintControlValues(bool guessed);
 static void PrintNewControlValues(void);
@@ -393,7 +393,7 @@ main(int argc, char *argv[])
 	/*
 	 * Attempt to read the existing pg_control file
 	 */
-	if (!ReadControlFile())
+	if (!read_controlfile())
 		GuessControlValues();
 
 	/*
@@ -578,7 +578,7 @@ CheckDataVersion(void)
  * to the current format.  (Currently we don't do anything of the sort.)
  */
 static bool
-ReadControlFile(void)
+read_controlfile(void)
 {
 	int			fd;
 	int			len;
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 98b033fc20..e56d85a96d 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -127,8 +127,8 @@ extern char *archiveCleanupCommand;
 extern bool recoveryTargetInclusive;
 extern int	recoveryTargetAction;
 extern int	recovery_min_apply_delay;
-extern char *PrimaryConnInfo;
-extern char *PrimarySlotName;
+extern PGDLLIMPORT char *PrimaryConnInfo;
+extern PGDLLIMPORT char *PrimarySlotName;
 
 /* indirectly set via GUC system */
 extern TransactionId recoveryTargetXid;
@@ -298,6 +298,9 @@ extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
 extern void BootStrapXLOG(void);
 extern void LocalProcessControlFile(bool reset);
+extern void InitControlFile(uint64 sysidentifier);
+extern void WriteControlFile(void);
+extern void ReadControlFile(void);
 extern void StartupXLOG(void);
 extern void ShutdownXLOG(int code, Datum arg);
 extern void InitXLOGAccess(void);
@@ -354,6 +357,7 @@ extern void register_persistent_abort_backup_handler(void);
 extern SessionBackupState get_backup_status(void);
 
 /* File path names (all relative to $PGDATA) */
+#define BASEBACKUP_SIGNAL_FILE	"basebackup.signal"
 #define RECOVERY_SIGNAL_FILE	"recovery.signal"
 #define STANDBY_SIGNAL_FILE		"standby.signal"
 #define BACKUP_LABEL_FILE		"backup_label"
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 62d64aa0a1..38311b05a1 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -396,6 +396,7 @@ typedef enum
 	CheckerProcess = 0,
 	BootstrapProcess,
 	StartupProcess,
+	BaseBackupProcess,
 	BgWriterProcess,
 	CheckpointerProcess,
 	WalWriterProcess,
@@ -408,6 +409,7 @@ extern AuxProcType MyAuxProcType;
 
 #define AmBootstrapProcess()		(MyAuxProcType == BootstrapProcess)
 #define AmStartupProcess()			(MyAuxProcType == StartupProcess)
+#define AmBaseBackupProcess()		(MyAuxProcType == BaseBackupProcess)
 #define AmBackgroundWriterProcess() (MyAuxProcType == BgWriterProcess)
 #define AmCheckpointerProcess()		(MyAuxProcType == CheckpointerProcess)
 #define AmWalWriterProcess()		(MyAuxProcType == WalWriterProcess)
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 36b530bc27..2f87ef63a6 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -721,6 +721,7 @@ typedef enum BackendType
 	B_AUTOVAC_LAUNCHER,
 	B_AUTOVAC_WORKER,
 	B_BACKEND,
+	B_BASE_BACKUP,
 	B_BG_WORKER,
 	B_BG_WRITER,
 	B_CHECKPOINTER,
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index 07ed281bd6..0764a7b67b 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -33,4 +33,6 @@ extern void SendBaseBackup(BaseBackupCmd *cmd);
 
 extern int64 sendTablespace(char *path, bool sizeonly);
 
+extern void BaseBackupMain(void);
+
 #endif							/* _BASEBACKUP_H */
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index e08afc6548..83fc4b3fb0 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -221,6 +221,7 @@ typedef void (*walrcv_readtimelinehistoryfile_fn) (WalReceiverConn *conn,
 												   TimeLineID tli,
 												   char **filename,
 												   char **content, int *size);
+typedef void (*walrcv_base_backup_fn) (WalReceiverConn *conn);
 typedef bool (*walrcv_startstreaming_fn) (WalReceiverConn *conn,
 										  const WalRcvStreamOptions *options);
 typedef void (*walrcv_endstreaming_fn) (WalReceiverConn *conn,
@@ -249,6 +250,7 @@ typedef struct WalReceiverFunctionsType
 	walrcv_identify_system_fn walrcv_identify_system;
 	walrcv_server_version_fn walrcv_server_version;
 	walrcv_readtimelinehistoryfile_fn walrcv_readtimelinehistoryfile;
+	walrcv_base_backup_fn walrcv_base_backup;
 	walrcv_startstreaming_fn walrcv_startstreaming;
 	walrcv_endstreaming_fn walrcv_endstreaming;
 	walrcv_receive_fn walrcv_receive;
@@ -275,6 +277,8 @@ extern PGDLLIMPORT WalReceiverFunctionsType *WalReceiverFunctions;
 	WalReceiverFunctions->walrcv_server_version(conn)
 #define walrcv_readtimelinehistoryfile(conn, tli, filename, content, size) \
 	WalReceiverFunctions->walrcv_readtimelinehistoryfile(conn, tli, filename, content, size)
+#define walrcv_base_backup(conn) \
+	WalReceiverFunctions->walrcv_base_backup(conn)
 #define walrcv_startstreaming(conn, options) \
 	WalReceiverFunctions->walrcv_startstreaming(conn, options)
 #define walrcv_endstreaming(conn, next_tli) \
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index c6ce7eacf2..5b3153455c 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -148,7 +148,7 @@ extern void fsync_fname(const char *fname, bool isdir);
 extern int	durable_rename(const char *oldfile, const char *newfile, int loglevel);
 extern int	durable_unlink(const char *fname, int loglevel);
 extern int	durable_link_or_rename(const char *oldfile, const char *newfile, int loglevel);
-extern void SyncDataDirectory(void);
+extern void SyncDataDirectory(bool pre_sync, int loglevel);
 extern int	data_sync_elevel(int elevel);
 
 /* Filename components */
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index ce93ace76c..c087c51dbe 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -264,7 +264,7 @@ extern int	temp_file_limit;
 
 extern int	num_temp_buffers;
 
-extern char *cluster_name;
+extern PGDLLIMPORT char *cluster_name;
 extern PGDLLIMPORT char *ConfigFileName;
 extern char *HbaFileName;
 extern char *IdentFileName;
diff --git a/src/test/recovery/t/018_basebackup.pl b/src/test/recovery/t/018_basebackup.pl
new file mode 100644
index 0000000000..99731fc388
--- /dev/null
+++ b/src/test/recovery/t/018_basebackup.pl
@@ -0,0 +1,29 @@
+# Test basebackup worker functionality
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+
+my $node1 = get_new_node('node1');
+$node1->init(allows_streaming => 1);
+$node1->start;
+
+$node1->safe_psql('postgres',
+				  "CREATE TABLE tab_int AS SELECT generate_series(1,1000) AS a");
+
+my $node2 = get_new_node('node2');
+$node2->init(allows_streaming => 1, extra => [ '--replica' ]);
+$node2->append_conf('postgresql.conf', "primary_conninfo = '" . $node1->connstr . "'");
+my $old_mtime = (stat($node2->data_dir . '/postgresql.conf'))[9];
+$node2->start;
+
+$node1->wait_for_catchup($node2, 'replay', $node1->lsn('insert'));
+
+is($node2->safe_psql('postgres', "SELECT count(*) FROM tab_int"),
+   qq(1000),
+   'check content of standby');
+
+my $new_mtime = (stat($node2->data_dir . '/postgresql.conf'))[9];
+is($new_mtime, $old_mtime,
+   'configuration files were not copied');

base-commit: a166d408eb0b35023c169e765f4664c3b114b52e
-- 
2.24.1
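
A note on the tar handling in ReceiveAndUnpackTarFile() above: it relies on
the fixed ustar header layout, with the file mode as an octal string at
offset 100 (8 bytes), the member size as an octal string at offset 124
(12 bytes), and the type flag at offset 156. Member data is then padded with
zero bytes up to the next 512-byte boundary. The following is only a
standalone sketch of that arithmetic. parse_tar_octal() and tar_padding() are
made-up names for illustration; parse_tar_octal() is a simplified stand-in for
read_tar_number() and handles plain octal fields only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAR_BLOCK_SIZE 512

/*
 * Parse an octal number from a tar header field of the given width.
 * Leading spaces are skipped and parsing stops at the first character that
 * is not an octal digit (normally the NUL or space terminator).
 * Assumption: only the plain octal encoding is handled here.
 */
uint64_t
parse_tar_octal(const char *field, size_t width)
{
	uint64_t	value = 0;
	size_t		i = 0;

	assert(field != NULL);

	while (i < width && field[i] == ' ')
		i++;
	for (; i < width && field[i] >= '0' && field[i] <= '7'; i++)
		value = (value << 3) | (uint64_t) (field[i] - '0');

	return value;
}

/* Number of padding bytes that follow a member of length len. */
uint64_t
tar_padding(uint64_t len)
{
	uint64_t	mask = TAR_BLOCK_SIZE - 1;

	return ((len + mask) & ~mask) - len;
}
```

For a 1000-byte member the size field holds octal 1750, and the member is
followed by 24 bytes of padding; a member whose size is a multiple of 512 has
no padding at all, which is the case the receive loop handles separately.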

#30Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Alexandra Wang (#23)
Re: base backup client as auxiliary backend process

On 2020-01-09 11:57, Alexandra Wang wrote:

Back to the base backup stuff, I don't quite understand all the benefits you
mentioned above. It seems to me the greatest benefit with this patch is that
postmaster takes care of pg_basebackup itself, which reduces the human wait in
between running the pg_basebackup and pg_ctl/postgres commands. Is that right?
I personally don't mind the --write-recovery-conf option because it helps me
write the primary_conninfo and primary_slot_name GUCs into
postgresql.auto.conf, which to me as a developer is easier than editing
postgresql.conf without automation. Sorry about the dumb question but what's so
bad about --write-recovery-conf?

Making it easier to automate is one major appeal of my proposal. The
current way of setting up a standby is very difficult to automate correctly.

Are you planning to completely replace
pg_basebackup with this? Is there any use case where a user just needs a
base backup but does not immediately start the backend process?

I'm not planning to replace or change pg_basebackup.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#31Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Peter Eisentraut (#29)
Re: base backup client as auxiliary backend process

On Thu, 16 Jan 2020 at 00:17, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 2020-01-15 01:40, Masahiko Sawada wrote:

Could you rebase the main patch that adds the base backup client as an
auxiliary backend process, since the previous version of the patch (v3)
conflicts with the current HEAD?

attached

Thanks. I tried this patch and briefly looked through it. Here are some comments:

1.
+        /*
+         * Wait until done.  Start WAL receiver in the meantime, once base
+         * backup has received the starting position.
+         */
+        while (BaseBackupPID != 0)
+        {
+            PG_SETMASK(&UnBlockSig);
+            pg_usleep(1000000L);
+            PG_SETMASK(&BlockSig);
+            MaybeStartWalReceiver();
+        }

Since the postmaster is sleeping in this loop, a new connection hangs without
any message, whereas normally a client gets a message like "the database
system is starting up" while the server is not accepting connections. I think
some programs that check the connectivity of a starting PostgreSQL server
might not work well with this. So maybe we want to refuse all new
connections while waiting for the base backup to be taken.

2.
+    initStringInfo(&stmt);
+    appendStringInfo(&stmt, "BASE_BACKUP PROGRESS NOWAIT EXCLUDE_CONF");
+    if (cluster_name && cluster_name[0])

While using this patch I realized that the standby server cannot start
when the master server has a larger value than the default for some GUC
parameters, such as max_connections and max_prepared_transactions.
Unlike a base backup taken with pg_basebackup or other methods, the
database cluster initialized by this feature uses default values for
all configuration parameters regardless of the values on the master. So I
think it's better to include the .conf files, but then we would end up
overwriting the local .conf files. So I thought that the
basebackup process could fetch the .conf files from the master server and
add primary_conninfo to postgresql.auto.conf, but I'm not sure.
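
For reference, the EXCLUDE_CONF filter in the patch is a plain suffix test
applied to each directory entry name in sendDir(). A minimal standalone sketch
of that test follows; name_has_conf_suffix() is a made-up name for
illustration, not something from the patch. It shows that
postgresql.auto.conf is skipped along with postgresql.conf and pg_hba.conf,
so no setting from the master reaches the new standby through this path.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the name would be skipped by the ".conf" test in the
 * patch: the text from the last '.' onward must be exactly ".conf".
 */
bool
name_has_conf_suffix(const char *name)
{
	const char *dot;

	assert(name != NULL);

	dot = strrchr(name, '.');
	return dot != NULL && strcmp(dot, ".conf") == 0;
}
```

A name such as postgresql.conf.bak does not match, because the text after the
last dot is ".bak".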

3.
+    if (stat(BASEBACKUP_SIGNAL_FILE, &stat_buf) == 0)
+    {
+        int         fd;
+
+        fd = BasicOpenFilePerm(STANDBY_SIGNAL_FILE, O_RDWR | PG_BINARY,
+                               S_IRUSR | S_IWUSR);
+        if (fd >= 0)
+        {
+            (void) pg_fsync(fd);
+            close(fd);
+        }
+        basebackup_signal_file_found = true;
+    }
+

Why do we open and just close the file?

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#32Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#25)
Re: base backup client as auxiliary backend process

Hi,

On 2020-01-11 10:52:30 +0100, Peter Eisentraut wrote:

On 2020-01-10 04:32, Masahiko Sawada wrote:

I agreed that these patches are useful on its own and 0001 patch and

committed 0001

Over on -committers, Robert complained:

On 2020-01-23 15:49:37 -0500, Robert Haas wrote:

On Tue, Jan 14, 2020 at 8:57 AM Peter Eisentraut <peter@eisentraut.org> wrote:

walreceiver uses a temporary replication slot by default

If no permanent replication slot is configured using
primary_slot_name, the walreceiver now creates and uses a temporary
replication slot. A new setting wal_receiver_create_temp_slot can be
used to disable this behavior, for example, if the remote instance is
out of replication slots.

Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Discussion: /messages/by-id/CA+fd4k4dM0iEPLxyVyme2RAFsn8SUgrNtBJOu81YqTY4V+nqZA@mail.gmail.com

Neither the commit message for this patch nor any of the comments in
the patch seem to explain why this is a desirable change.

I assume that's probably discussed on the thread that is linked here,
but you shouldn't have to dig through the discussion thread to figure
out what the benefits of a change like this are.

which I fully agree with.

It's not at all clear to me that the potential downsides of this have
been fully thought through. And even if they have, they've not been
documented.

Previously if a standby without a slot was slow receiving WAL,
e.g. because the network bandwidth was insufficient, it'd at some point
just fail because the required WAL is removed. But with this patch that
won't happen - instead the primary will just run out of space. At the
very least this would need to add documentation of this caveat to a few
places.
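
To put rough numbers on that caveat, here is a back-of-the-envelope sketch.
It assumes constant rates and a slot that stays in place the whole time; the
function name and the figures are made up for illustration. While the slot
holds WAL back, the backlog on the primary grows at the WAL generation rate
minus the rate at which the standby receives it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Seconds until WAL retained for a lagging standby fills the free space on
 * the primary, or -1 if the standby keeps up.  Rates are in bytes/second.
 * Assumption: constant rates, and nothing else consumes the free space.
 */
int64_t
seconds_until_wal_fills_disk(int64_t free_bytes,
							 int64_t wal_gen_rate,
							 int64_t standby_recv_rate)
{
	int64_t		backlog_rate;

	assert(free_bytes >= 0);

	backlog_rate = wal_gen_rate - standby_recv_rate;
	if (backlog_rate <= 0)
		return -1;				/* standby keeps up, backlog does not grow */

	return free_bytes / backlog_rate;
}
```

With 100 GiB free, 50 MiB/s of WAL generated and 10 MiB/s received, the
backlog grows by 40 MiB/s and the disk is full after 2560 seconds, a bit
under 43 minutes.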

Perhaps that's worth doing anyway, because it's probably more common for
a standby to just temporarily run behind - but given that this feature
doesn't actually provide any robustness, due to e.g. the possibility of
temporary disconnections or restarts, I'm not sure it's providing all
that much compared to the dangers, for a feature on by default.

Greetings,

Andres Freund

#33Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#29)
Re: base backup client as auxiliary backend process

Hi,

Comment:

- It'd be good to split out the feature-independent refactorings, like
the introduction of InitControlFile(), into their own commit. Right
now it's hard to separate out what should just be moved code, and what
should be behavioural changes. Especially when there's stuff like just
reindenting comments as part of the patch.

@@ -886,12 +891,27 @@ PostmasterMain(int argc, char *argv[])
/* Verify that DataDir looks reasonable */
checkDataDir();

- /* Check that pg_control exists */
- checkControlFile();
-
/* And switch working directory into it */
ChangeToDataDir();

+	if (stat(BASEBACKUP_SIGNAL_FILE, &stat_buf) == 0)
+	{
+		int         fd;
+
+		fd = BasicOpenFilePerm(STANDBY_SIGNAL_FILE, O_RDWR | PG_BINARY,
+							   S_IRUSR | S_IWUSR);
+		if (fd >= 0)
+		{
+			(void) pg_fsync(fd);
+			close(fd);
+		}
+		basebackup_signal_file_found = true;
+	}
+
+	/* Check that pg_control exists */
+	if (!basebackup_signal_file_found)
+		checkControlFile();
+

This should be moved into its own function, rather than open coded in
PostmasterMain(). Imagine how PostmasterMain() would look if all the
check/initialization functions weren't extracted into functions.
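
For illustration, a minimal sketch of what such an extraction could look
like, written as standalone POSIX C rather than against the backend's own
wrappers. The function name signal_file_present() and both of its
parameters are hypothetical; plain open() and fsync() stand in for
BasicOpenFilePerm() and pg_fsync(). Like the hunk above, it stat()s one
path and fsyncs the other.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: report whether the signal file exists.  If it does,
 * also flush the companion file to disk when that file can be opened.
 */
static bool
signal_file_present(const char *signal_path, const char *companion_path)
{
	struct stat stat_buf;
	int			fd;

	/* No signal file: the caller proceeds with the normal startup checks. */
	if (stat(signal_path, &stat_buf) != 0)
		return false;

	/* Best effort, as in the hunk: a failed open is silently ignored. */
	fd = open(companion_path, O_RDWR);
	if (fd >= 0)
	{
		(void) fsync(fd);
		close(fd);
	}
	return true;
}
```

With a helper of this shape, the caller is left with a single assignment
plus the conditional checkControlFile() call.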

/*
* Check for invalid combinations of GUC settings.
*/
@@ -970,7 +990,8 @@ PostmasterMain(int argc, char *argv[])
* processes will inherit the correct function pointer and not need to
* repeat the test.
*/
- LocalProcessControlFile(false);
+ if (!basebackup_signal_file_found)
+ LocalProcessControlFile(false);

/*
* Initialize SSL library, if specified.
@@ -1386,6 +1407,39 @@ PostmasterMain(int argc, char *argv[])
*/
AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STARTING);

+ if (basebackup_signal_file_found)
+ {

This imo *really* should be a separate function.

+		BaseBackupPID = StartBaseBackup();
+
+		/*
+		 * Wait until done.  Start WAL receiver in the meantime, once base
+		 * backup has received the starting position.
+		 */
+		while (BaseBackupPID != 0)
+		{
+			PG_SETMASK(&UnBlockSig);
+			pg_usleep(1000000L);
+			PG_SETMASK(&BlockSig);
+			MaybeStartWalReceiver();
+		}

Is there seriously no better signalling that we can use than just
looping for a couple hours?

Is it actually guaranteed that a compiler wouldn't just load
BaseBackupPID into a register, and never see a change to it done in a
signal handler?

There should be a note mentioning that we'll just FATAL out if the base
backup process fails. Otherwise it's the obvious question reading this
code. Also - we have handling to restart WAL receiver, but there's no
handling for the base backup temporarily failing: Is that just because
it's easy to do in one, but not the other case?

+		/*
+		 * Reread the control file that came in with the base backup.
+		 */
+		ReadControlFile();
+	}

Is it actually rereading? I'm just reading the diff, so maybe I'm missing
something, but you've made LocalProcessControlFile not enter this code
path...

@@ -2824,6 +2880,8 @@ pmdie(SIGNAL_ARGS)

if (StartupPID != 0)
signal_child(StartupPID, SIGTERM);
+			if (BaseBackupPID != 0)
+				signal_child(BaseBackupPID, SIGTERM);
if (BgWriterPID != 0)
signal_child(BgWriterPID, SIGTERM);
if (WalReceiverPID != 0)
@@ -3062,6 +3120,23 @@ reaper(SIGNAL_ARGS)

continue;
}

+		/*
+		 * Was it the base backup process?
+		 */
+		if (pid == BaseBackupPID)
+		{
+			BaseBackupPID = 0;
+			if (EXIT_STATUS_0(exitstatus))
+				;
+			else if (EXIT_STATUS_1(exitstatus))
+				ereport(FATAL,
+						(errmsg("base backup failed")));
+			else
+				HandleChildCrash(pid, exitstatus,
+								 _("base backup process"));
+			continue;
+		}
+

What's the error handling for the case where we shut down either because
of SIGTERM above, or here? Does all the code just deal with that at the
next start? If not, what makes this safe?

+/*
+ * base backup worker process (client) main function
+ */
+void
+BaseBackupMain(void)
+{
+	WalReceiverConn *wrconn = NULL;
+	char	   *err;
+	TimeLineID	primaryTLI;
+	uint64		primary_sysid;
+
+	/* Load the libpq-specific functions */
+	load_file("libpqwalreceiver", false);
+	if (WalReceiverFunctions == NULL)
+		elog(ERROR, "libpqwalreceiver didn't initialize correctly");
+
+	/* Establish the connection to the primary */
+	wrconn = walrcv_connect(PrimaryConnInfo, false, cluster_name[0] ? cluster_name : "basebackup", &err);
+	if (!wrconn)
+		ereport(ERROR,
+				(errmsg("could not connect to the primary server: %s", err)));
+
+	/*
+	 * Get the remote sysid and stick it into the local control file, so that
+	 * the walreceiver is happy.  The control file will later be overwritten
+	 * by the base backup.
+	 */
+	primary_sysid = strtoull(walrcv_identify_system(wrconn, &primaryTLI), NULL, 10);
+	InitControlFile(primary_sysid);
+	WriteControlFile();
+
+	walrcv_base_backup(wrconn);
+
+	walrcv_disconnect(wrconn);
+
+	SyncDataDirectory(false, ERROR);
+
+	ereport(LOG,
+			(errmsg("base backup completed")));
+	proc_exit(0);
+}

So there's no error handling here (as in a sigsetjmp)? Nor any signal
handlers set up, despite
+		case BaseBackupProcess:
+			/* don't set signals, basebackup has its own agenda */
+			BaseBackupMain();
+			proc_exit(1);		/* should never return */
+

You did set up forwarding of things like SIGHUP - but afaict that's not
correctly wired up?

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index e4fd1f9bb6..52819d504c 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -17,20 +17,29 @@
+#include "pgtar.h"
#include "pqexpbuffer.h"
#include "replication/walreceiver.h"
#include "utils/builtins.h"
+#include "utils/guc.h"
#include "utils/memutils.h"
#include "utils/pg_lsn.h"
+#include "utils/ps_status.h"
#include "utils/tuplestore.h"
PG_MODULE_MAGIC;
@@ -61,6 +70,7 @@ static int	libpqrcv_server_version(WalReceiverConn *conn);
static void libpqrcv_readtimelinehistoryfile(WalReceiverConn *conn,
TimeLineID tli, char **filename,
char **content, int *len);
+static void libpqrcv_base_backup(WalReceiverConn *conn);
static bool libpqrcv_startstreaming(WalReceiverConn *conn,
const WalRcvStreamOptions *options);
static void libpqrcv_endstreaming(WalReceiverConn *conn,
@@ -89,6 +99,7 @@ static WalReceiverFunctionsType PQWalReceiverFunctions = {
libpqrcv_identify_system,
libpqrcv_server_version,
libpqrcv_readtimelinehistoryfile,
+	libpqrcv_base_backup,
libpqrcv_startstreaming,
libpqrcv_endstreaming,
libpqrcv_receive,
@@ -358,6 +369,395 @@ libpqrcv_server_version(WalReceiverConn *conn)
return PQserverVersion(conn->streamConn);
}
+/*
+ * XXX copied from pg_basebackup.c
+ */
+
+unsigned long long totaldone;
+unsigned long long totalsize_kb;
+int tablespacenum;
+int tablespacecount;
+
+static void
+base_backup_report_progress(void)
+{

Putting all of this into libpqwalreceiver.c seems like quite a
significant modularity violation. The header says:

* libpqwalreceiver.c
*
* This file contains the libpq-specific parts of walreceiver. It's
* loaded as a dynamic module to avoid linking the main server binary with
* libpq.

which really doesn't agree with all of the new stuff you're putting
here.

--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3154,21 +3154,14 @@ looks_like_temp_rel_name(const char *name)
* Other symlinks are presumed to point at files we're not responsible
* for fsyncing, and might not have privileges to write at all.
*
- * Errors are logged but not considered fatal; that's because this is used
- * only during database startup, to deal with the possibility that there are
- * issued-but-unsynced writes pending against the data directory.  We want to
- * ensure that such writes reach disk before anything that's done in the new
- * run.  However, aborting on error would result in failure to start for
- * harmless cases such as read-only files in the data directory, and that's
- * not good either.
- *
- * Note that if we previously crashed due to a PANIC on fsync(), we'll be
- * rewriting all changes again during recovery.
+ * If pre_sync is true, issue flush requests to the kernel before starting the
+ * actual fsync calls.  This can be skipped if the caller has already done it
+ * itself.
*

Huh, what happened with the previous comments here?

diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index f9cfeae264..c9edeb54d3 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -76,7 +76,7 @@ static int	WalSegSz;
static int	set_wal_segsize;

static void CheckDataVersion(void);
-static bool ReadControlFile(void);
+static bool read_controlfile(void);
static void GuessControlValues(void);
static void PrintControlValues(bool guessed);
static void PrintNewControlValues(void);
@@ -393,7 +393,7 @@ main(int argc, char *argv[])
/*
* Attempt to read the existing pg_control file
*/
- if (!ReadControlFile())
+ if (!read_controlfile())
GuessControlValues();

/*
@@ -578,7 +578,7 @@ CheckDataVersion(void)
* to the current format. (Currently we don't do anything of the sort.)
*/
static bool
-ReadControlFile(void)
+read_controlfile(void)
{
int fd;
int len;

Huh?

Greetings,

Andres Freund

#34Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#32)
Re: base backup client as auxiliary backend process

On Mon, Feb 03, 2020 at 01:37:25AM -0800, Andres Freund wrote:

On 2020-01-23 15:49:37 -0500, Robert Haas wrote:

I assume that's probably discussed on the thread that is linked here,
but you shouldn't have to dig through the discussion thread to figure
out what the benefits of a change like this are.

which I fully agree with.

It's not at all clear to me that the potential downsides of this have
been fully thought through. And even if they have, they've not been
documented.

There is this, and please let me add a reference to another complaint
I had about this commit:
/messages/by-id/20200122055510.GH174860@paquier.xyz
--
Michael

#35Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Andres Freund (#32)
Re: base backup client as auxiliary backend process

On Mon, 3 Feb 2020 at 20:06, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2020-01-11 10:52:30 +0100, Peter Eisentraut wrote:

On 2020-01-10 04:32, Masahiko Sawada wrote:

I agreed that these patches are useful on its own and 0001 patch and

committed 0001

over on -committers Robert complained:

On 2020-01-23 15:49:37 -0500, Robert Haas wrote:

On Tue, Jan 14, 2020 at 8:57 AM Peter Eisentraut <peter@eisentraut.org> wrote:

walreceiver uses a temporary replication slot by default

If no permanent replication slot is configured using
primary_slot_name, the walreceiver now creates and uses a temporary
replication slot. A new setting wal_receiver_create_temp_slot can be
used to disable this behavior, for example, if the remote instance is
out of replication slots.

Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Discussion: /messages/by-id/CA+fd4k4dM0iEPLxyVyme2RAFsn8SUgrNtBJOu81YqTY4V+nqZA@mail.gmail.com

Neither the commit message for this patch nor any of the comments in
the patch seem to explain why this is a desirable change.

I assume that's probably discussed on the thread that is linked here,
but you shouldn't have to dig through the discussion thread to figure
out what the benefits of a change like this are.

which I fully agree with.

It's not at all clear to me that the potential downsides of this have
been fully thought through. And even if they have, they've not been
documented.

Previously if a standby without a slot was slow receiving WAL,
e.g. because the network bandwidth was insufficient, it'd at some point
just fail because the required WAL is removed. But with this patch that
won't happen - instead the primary will just run out of space. At the
very least this would need to add documentation of this caveat to a few
places.

+1 for adding the downsides to the documentation.

It might not happen often, but with this parameter max_replication_slots
needs to be set high enough, because otherwise the standby will fail to
start after a failover when all slots are already in use.

WAL required by the standby can be removed on the primary when the
standby falls far behind, for example when the standby has been stopped
for a long time, or when it is running but delayed for some reason. This
feature prevents WAL removal in the latter case. That is, we can ensure
that required WAL is not removed while replication is running. For the
former case we can use a permanent replication slot. There is a risk of
running out of space, but I personally think this behavior is better for
most cases.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#36Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Andres Freund (#33)
3 attachment(s)
Re: base backup client as auxiliary backend process

On 2020-02-03 13:47, Andres Freund wrote:

Comment:

- It'd be good to split out the feature-independent refactorings, like
the introduction of InitControlFile(), into their own commit. Right
now it's hard to separate out what should just be moved code, and what
should be behavioural changes. Especially when there's stuff like just
reindenting comments as part of the patch.

Agreed. Here are three refactoring patches extracted that seem useful
on their own.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-pg_resetwal-Rename-function-to-avoid-potential-confl.patch (text/plain; charset=UTF-8)
From 49b51b362b4d86103c74057186154a42b9ef335b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Mon, 17 Feb 2020 17:35:48 +0100
Subject: [PATCH 1/3] pg_resetwal: Rename function to avoid potential conflict

ReadControlFile() here conflicts with a function of the same name in
xlog.c.  There is no actual conflict right now, but since
pg_resetwal.c reaches deep inside backend headers, it's possible in
the future.
---
 src/bin/pg_resetwal/pg_resetwal.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index f9cfeae264..c9edeb54d3 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -76,7 +76,7 @@ static int	WalSegSz;
 static int	set_wal_segsize;
 
 static void CheckDataVersion(void);
-static bool ReadControlFile(void);
+static bool read_controlfile(void);
 static void GuessControlValues(void);
 static void PrintControlValues(bool guessed);
 static void PrintNewControlValues(void);
@@ -393,7 +393,7 @@ main(int argc, char *argv[])
 	/*
 	 * Attempt to read the existing pg_control file
 	 */
-	if (!ReadControlFile())
+	if (!read_controlfile())
 		GuessControlValues();
 
 	/*
@@ -578,7 +578,7 @@ CheckDataVersion(void)
  * to the current format.  (Currently we don't do anything of the sort.)
  */
 static bool
-ReadControlFile(void)
+read_controlfile(void)
 {
 	int			fd;
 	int			len;
-- 
2.25.0

0002-Reformat-code-comment.patch (text/plain; charset=UTF-8)
From 91c816533a9e79623576a9303b2a793ee713eaed Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Mon, 17 Feb 2020 17:46:37 +0100
Subject: [PATCH 2/3] Reformat code comment

---
 src/backend/access/transam/xlog.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3813eadfb4..b017fd286f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6297,16 +6297,17 @@ StartupXLOG(void)
 
 	/*----------
 	 * If we previously crashed, perform a couple of actions:
-	 *	- The pg_wal directory may still include some temporary WAL segments
-	 * used when creating a new segment, so perform some clean up to not
-	 * bloat this path.  This is done first as there is no point to sync this
-	 * temporary data.
-	 *	- There might be data which we had written, intending to fsync it,
-	 * but which we had not actually fsync'd yet. Therefore, a power failure
-	 * in the near future might cause earlier unflushed writes to be lost,
-	 * even though more recent data written to disk from here on would be
-	 * persisted.  To avoid that, fsync the entire data directory.
-	 *---------
+	 *
+	 * - The pg_wal directory may still include some temporary WAL segments
+	 *   used when creating a new segment, so perform some clean up to not
+	 *   bloat this path.  This is done first as there is no point to sync
+	 *   this temporary data.
+	 *
+	 * - There might be data which we had written, intending to fsync it, but
+	 *   which we had not actually fsync'd yet.  Therefore, a power failure in
+	 *   the near future might cause earlier unflushed writes to be lost, even
+	 *   though more recent data written to disk from here on would be
+	 *   persisted.  To avoid that, fsync the entire data directory.
 	 */
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
-- 
2.25.0

0003-Factor-out-InitControlFile-from-BootStrapXLOG.patch (text/plain; charset=UTF-8)
From 0cebb19f719f1894140f13739f4a1851a929186d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Mon, 17 Feb 2020 17:58:02 +0100
Subject: [PATCH 3/3] Factor out InitControlFile() from BootStrapXLOG()

Right now this only makes BootStrapXLOG() a bit more manageable, but
in the future there may be external callers.
---
 src/backend/access/transam/xlog.c | 70 +++++++++++++++++--------------
 1 file changed, 39 insertions(+), 31 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b017fd286f..fd527f20c5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -903,6 +903,7 @@ static void CheckRecoveryConsistency(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpt, bool report);
 static bool rescanLatestTimeLine(void);
+static void InitControlFile(uint64 sysidentifier);
 static void WriteControlFile(void);
 static void ReadControlFile(void);
 static char *str_time(pg_time_t tnow);
@@ -4486,12 +4487,49 @@ rescanLatestTimeLine(void)
  * given a preloaded buffer, ReadControlFile() loads the buffer from
  * the pg_control file (during postmaster or standalone-backend startup),
  * and UpdateControlFile() rewrites pg_control after we modify xlog state.
+ * InitControlFile() fills the buffer with initial values.
  *
  * For simplicity, WriteControlFile() initializes the fields of pg_control
  * that are related to checking backend/database compatibility, and
  * ReadControlFile() verifies they are correct.  We could split out the
  * I/O and compatibility-check functions, but there seems no need currently.
  */
+
+static void
+InitControlFile(uint64 sysidentifier)
+{
+	char		mock_auth_nonce[MOCK_AUTH_NONCE_LEN];
+
+	/*
+	 * Generate a random nonce. This is used for authentication requests that
+	 * will fail because the user does not exist. The nonce is used to create
+	 * a genuine-looking password challenge for the non-existent user, in lieu
+	 * of an actual stored password.
+	 */
+	if (!pg_strong_random(mock_auth_nonce, MOCK_AUTH_NONCE_LEN))
+		ereport(PANIC,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate secret authorization token")));
+
+	memset(ControlFile, 0, sizeof(ControlFileData));
+	/* Initialize pg_control status fields */
+	ControlFile->system_identifier = sysidentifier;
+	memcpy(ControlFile->mock_authentication_nonce, mock_auth_nonce, MOCK_AUTH_NONCE_LEN);
+	ControlFile->state = DB_SHUTDOWNED;
+	ControlFile->unloggedLSN = FirstNormalUnloggedLSN;
+
+	/* Set important parameter values for use when replaying WAL */
+	ControlFile->MaxConnections = MaxConnections;
+	ControlFile->max_worker_processes = max_worker_processes;
+	ControlFile->max_wal_senders = max_wal_senders;
+	ControlFile->max_prepared_xacts = max_prepared_xacts;
+	ControlFile->max_locks_per_xact = max_locks_per_xact;
+	ControlFile->wal_level = wal_level;
+	ControlFile->wal_log_hints = wal_log_hints;
+	ControlFile->track_commit_timestamp = track_commit_timestamp;
+	ControlFile->data_checksum_version = bootstrap_data_checksum_version;
+}
+
 static void
 WriteControlFile(void)
 {
@@ -5088,7 +5126,6 @@ BootStrapXLOG(void)
 	char	   *recptr;
 	bool		use_existent;
 	uint64		sysidentifier;
-	char		mock_auth_nonce[MOCK_AUTH_NONCE_LEN];
 	struct timeval tv;
 	pg_crc32c	crc;
 
@@ -5109,17 +5146,6 @@ BootStrapXLOG(void)
 	sysidentifier |= ((uint64) tv.tv_usec) << 12;
 	sysidentifier |= getpid() & 0xFFF;
 
-	/*
-	 * Generate a random nonce. This is used for authentication requests that
-	 * will fail because the user does not exist. The nonce is used to create
-	 * a genuine-looking password challenge for the non-existent user, in lieu
-	 * of an actual stored password.
-	 */
-	if (!pg_strong_random(mock_auth_nonce, MOCK_AUTH_NONCE_LEN))
-		ereport(PANIC,
-				(errcode(ERRCODE_INTERNAL_ERROR),
-				 errmsg("could not generate secret authorization token")));
-
 	/* First timeline ID is always 1 */
 	ThisTimeLineID = 1;
 
@@ -5227,30 +5253,12 @@ BootStrapXLOG(void)
 	openLogFile = -1;
 
 	/* Now create pg_control */
-
-	memset(ControlFile, 0, sizeof(ControlFileData));
-	/* Initialize pg_control status fields */
-	ControlFile->system_identifier = sysidentifier;
-	memcpy(ControlFile->mock_authentication_nonce, mock_auth_nonce, MOCK_AUTH_NONCE_LEN);
-	ControlFile->state = DB_SHUTDOWNED;
+	InitControlFile(sysidentifier);
 	ControlFile->time = checkPoint.time;
 	ControlFile->checkPoint = checkPoint.redo;
 	ControlFile->checkPointCopy = checkPoint;
-	ControlFile->unloggedLSN = FirstNormalUnloggedLSN;
-
-	/* Set important parameter values for use when replaying WAL */
-	ControlFile->MaxConnections = MaxConnections;
-	ControlFile->max_worker_processes = max_worker_processes;
-	ControlFile->max_wal_senders = max_wal_senders;
-	ControlFile->max_prepared_xacts = max_prepared_xacts;
-	ControlFile->max_locks_per_xact = max_locks_per_xact;
-	ControlFile->wal_level = wal_level;
-	ControlFile->wal_log_hints = wal_log_hints;
-	ControlFile->track_commit_timestamp = track_commit_timestamp;
-	ControlFile->data_checksum_version = bootstrap_data_checksum_version;
 
 	/* some additional ControlFile fields are set in WriteControlFile() */
-
 	WriteControlFile();
 
 	/* Bootstrap the commit log, too */
-- 
2.25.0

#37Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Peter Eisentraut (#36)
Re: base backup client as auxiliary backend process

On 2020-02-17 18:42, Peter Eisentraut wrote:

On 2020-02-03 13:47, Andres Freund wrote:

Comment:

- It'd be good to split out the feature-independent refactorings, like
the introduction of InitControlFile(), into their own commit. Right
now it's hard to separate out what should just be moved code, and what
should be behavioural changes. Especially when there's stuff like just
reindenting comments as part of the patch.

Agreed. Here are three refactoring patches extracted that seem useful
on their own.

These have been committed.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#38Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Peter Eisentraut (#37)
Re: base backup client as auxiliary backend process

I have set this patch to "returned with feedback" in the upcoming commit
fest, because I will not be able to finish it.

Unsurprisingly, the sequencing of startup actions in postmaster.c is
extremely tricky and needs more thinking. All the rest worked pretty
well, I thought.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#39Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Eisentraut (#27)
Re: base backup client as auxiliary backend process

On 2020-Jan-14, Peter Eisentraut wrote:

On 2020-01-14 07:32, Masahiko Sawada wrote:

-     <entry>Replication slot name used by this WAL receiver</entry>
+     <entry>
+      Replication slot name used by this WAL receiver.  This is only set if a
+      permanent replication slot is set using <xref
+      linkend="guc-primary-slot-name"/>.  Otherwise, the WAL receiver may use
+      a temporary replication slot (determined by <xref
+      linkend="guc-wal-receiver-create-temp-slot"/>), but these are not shown
+      here.
+     </entry>

Now that the slot name is shown even if it's a temp slot, the above
documentation change needs to be updated. Other changes look good to
me.

committed, thanks

Sergei has just proposed a change in semantics: if primary_slot_name is
specified as well as wal_receiver_create_temp_slot, then a temp slot is
used and it uses the specified name, instead of ignoring the temp-slot
option as currently.

Patch is at /messages/by-id/3109511585392143@myt6-887fb48a9c29.qloud-c.yandex.net

(To clarify: the current semantics if both options are set is that an
existing permanent slot is sought with the given name, and an error is
raised if it doesn't exist.)

What do you think? Preliminarily, I think the proposed semantics are
saner.
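
To spell out the difference, here is a small sketch of the two decision
rules as described above. It is an illustration only: the enum
SlotChoice and both functions are hypothetical names, and the walreceiver
code is not structured this way.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
	SLOT_NONE,					/* stream without a slot */
	SLOT_PERMANENT,				/* use an existing permanent slot */
	SLOT_TEMPORARY				/* create and use a temporary slot */
} SlotChoice;

static bool
slot_name_is_set(const char *primary_slot_name)
{
	return primary_slot_name != NULL && primary_slot_name[0] != '\0';
}

/* Current semantics: a configured name wins and the temp option is ignored. */
static SlotChoice
choose_slot_current(const char *primary_slot_name, bool create_temp_slot)
{
	if (slot_name_is_set(primary_slot_name))
		return SLOT_PERMANENT;
	return create_temp_slot ? SLOT_TEMPORARY : SLOT_NONE;
}

/* Proposed semantics: the temp option wins; a configured name is reused. */
static SlotChoice
choose_slot_proposed(const char *primary_slot_name, bool create_temp_slot)
{
	if (create_temp_slot)
		return SLOT_TEMPORARY;
	return slot_name_is_set(primary_slot_name) ? SLOT_PERMANENT : SLOT_NONE;
}
```

The two rules differ only when both options are set: the current rule
looks for a permanent slot with the given name, the proposed rule creates
a temporary slot under that name.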

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services