pg_basebackup -x/X doesn't play well with archive_mode & wal_keep_segments
Hi,
We've recently observed a case where, after a promotion, a postgres
server suddenly started to archive a large amount of old WAL.
After some digging the problem is this:
pg_basebackup -X creates files in pg_xlog/ without creating the
corresponding .done file. Note that walreceiver *does* create them. The
standby in this case, just like the master, had a significant
wal_keep_segments. RemoveOldXlogFiles() then, during recovery restart
points, calls XLogArchiveCheckDone() which in turn does:
/* Retry creation of the .ready file */
XLogArchiveNotify(xlog);
return false;
if there's neither a .done nor a .ready file present and archive_mode is
enabled. These segments then aren't removed because there's a .ready
present and they're never archived as long as the node is a standby
because we don't do archiving on standbys.
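The check that strands these segments fits in a few lines. The sketch below is a simplified, hypothetical model of the decision order described above. The enum and check_done() are illustrative names, not PostgreSQL code. Archiving off or a .done file means the segment may be removed. A .ready file means it waits for the archiver. Neither means a .ready file is (re)created and the segment is kept.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the per-segment status check (simplified model). */
typedef enum
{
    SEG_REMOVABLE,          /* archiving off, or .done present: may be recycled */
    SEG_KEEP_PENDING,       /* .ready present: waiting for the archiver */
    SEG_KEEP_READY_CREATED  /* neither present: .ready (re)created, segment kept */
} SegmentFate;

static SegmentFate
check_done(bool archive_mode, bool has_done, bool has_ready)
{
    assert(!(has_done && has_ready) || has_done);   /* .done wins if both exist */

    if (!archive_mode)
        return SEG_REMOVABLE;           /* always deletable if archiving is off */
    if (has_done)
        return SEG_REMOVABLE;           /* archiver (or someone) marked it done */
    if (has_ready)
        return SEG_KEEP_PENDING;        /* still queued for archiving */

    /* "Retry creation of the .ready file": notify, then report not done */
    return SEG_KEEP_READY_CREATED;
}
```

A segment copied by pg_basebackup -X onto a standby with archive_mode enabled has the inputs (true, false, false). A .ready file appears, and because no archiver runs on a standby, the segment then stays until promotion.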
Once the node is promoted, the archiver is started and suddenly archives
all these files, which might be months old.
An additional, at first strange, detail is that a lot of the
.ready files had nearly the same timestamps. That turns out to be due to
wal_keep_segments. Initially RemoveOldXlogFiles() doesn't process the
files because wal_keep_segments still requires keeping them. Then at
every checkpoint a couple of segments would become old enough to reach
XLogArchiveCheckDone(), which would then create the .ready marker... But not
all at once :)
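The staggered timestamps follow from the wal_keep_segments arithmetic. The following is a rough sketch under a simplified model. In this model a segment only reaches the removal pass once its number falls below the checkpoint's redo segment minus wal_keep_segments. The helper names are hypothetical, and the real code also accounts for other retention reasons.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model: segments numbered below the result may be considered
 * by the removal pass; everything from the result upwards is retained. */
static uint64_t
removal_cutoff(uint64_t redo_segno, uint64_t wal_keep_segments)
{
    if (redo_segno <= wal_keep_segments)
        return 1;               /* nothing is old enough yet */
    return redo_segno - wal_keep_segments;
}

/* How many additional segments become eligible between two checkpoints. */
static uint64_t
newly_eligible(uint64_t prev_redo_segno, uint64_t redo_segno,
               uint64_t wal_keep_segments)
{
    uint64_t prev = removal_cutoff(prev_redo_segno, wal_keep_segments);
    uint64_t cur = removal_cutoff(redo_segno, wal_keep_segments);

    assert(cur >= prev);        /* redo pointer only moves forward */
    return cur - prev;
}
```

Take wal_keep_segments = 256 with about three segments of WAL generated between checkpoints. Each checkpoint then moves the cutoff by three, so roughly three more copied segments get their .ready marker per checkpoint rather than all of them at once.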
So I think we just need to make pg_basebackup create to .ready
files. Given that the walreceiver and restore_command already
unconditionally do XLogArchiveForceDone() I think we'd follow the
established precedent. Arguably it could make sense to archive files
again on the standby after a promotion as they aren't guaranteed to have
been on the then primary. But we don't have any infrastructure anyway
for that and walsender doesn't do so, so it doesn't seem to make any
sense to do that for pg_basebackup.
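walreceiver and restore_command call XLogArchiveForceDone() to record a segment as already archived. The following is a minimal POSIX sketch of those semantics under stated assumptions. force_done() and status_path() are hypothetical names. The real function works relative to pg_xlog/archive_status and differs in details such as error reporting.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define STATUS_PATH_MAX 1024

/* Hypothetical helper: build "<status_dir>/<xlog><suffix>". */
static void
status_path(char *buf, size_t len, const char *status_dir,
            const char *xlog, const char *suffix)
{
    snprintf(buf, len, "%s/%s%s", status_dir, xlog, suffix);
}

/*
 * Sketch of "force done": make sure <xlog>.done exists, renaming a pending
 * <xlog>.ready if there is one.  Returns false on I/O failure.
 */
static bool
force_done(const char *status_dir, const char *xlog)
{
    char        done[STATUS_PATH_MAX];
    char        ready[STATUS_PATH_MAX];
    FILE       *fp;

    assert(status_dir != NULL && xlog != NULL);

    status_path(done, sizeof(done), status_dir, xlog, ".done");
    if (access(done, F_OK) == 0)
        return true;                        /* already marked as archived */

    status_path(ready, sizeof(ready), status_dir, xlog, ".ready");
    if (access(ready, F_OK) == 0)
        return rename(ready, done) == 0;    /* pending -> archived */

    /* neither exists: create an empty .done marker */
    fp = fopen(done, "w");
    if (fp == NULL)
        return false;
    return fclose(fp) == 0;
}
```

Calling it repeatedly for the same segment is harmless, which matches the "unconditionally" in the text: the first call creates or renames the marker, and later calls return early.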
Independent from this bug, there's also some debatable behaviour about
what happens if a node with a high wal_keep_segments turns on
archive_mode. Suddenly all those old files are archived... I think it
might be a good idea to simply always create .done files when
archive_mode is disabled while a wal segment is finished.
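The proposal can be illustrated with a toy model. All names are hypothetical, and always_mark_done stands for the suggested behaviour. With archive_mode off, the current behaviour leaves no archive_status marker. Enabling archive_mode later then lets the removal pass create .ready files for every retained segment. Writing .done at segment completion instead leaves nothing to archive.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SEGMENTS 1024

typedef enum
{
    MARK_NONE,      /* no archive_status file at all */
    MARK_READY,     /* <seg>.ready: still to be archived */
    MARK_DONE       /* <seg>.done: treated as already archived */
} ArchiveMark;

/* Marker written when a segment is completed (hypothetical model). */
static ArchiveMark
mark_on_finish(bool archive_mode, bool always_mark_done)
{
    if (archive_mode)
        return MARK_READY;
    return always_mark_done ? MARK_DONE : MARK_NONE;
}

/*
 * Segments finished while archive_mode was off, then archive_mode is
 * switched on: every segment without a .done marker eventually gets a
 * .ready file from the removal pass and is archived.
 */
static int
archived_after_enabling(int n_old_segments, bool always_mark_done)
{
    ArchiveMark marks[MAX_SEGMENTS];
    int         archived = 0;

    assert(n_old_segments >= 0 && n_old_segments <= MAX_SEGMENTS);

    for (int i = 0; i < n_old_segments; i++)
        marks[i] = mark_on_finish(false, always_mark_done);

    for (int i = 0; i < n_old_segments; i++)
        if (marks[i] != MARK_DONE)
            archived++;

    return archived;
}
```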
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Dec 5, 2014 at 9:28 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> So I think we just need to make pg_basebackup create to .ready
> files.
s/.ready/.done? If yes, +1.
> Independent from this bug, there's also some debatable behaviour about
> what happens if a node with a high wal_keep_segments turns on
> archive_mode. Suddenly all those old files are archived... I think it
> might be a good idea to simply always create .done files when
> archive_mode is disabled while a wal segment is finished.
+1
Regards,
--
Fujii Masao
Hi,
On 2014-12-05 16:18:02 +0900, Fujii Masao wrote:
> On Fri, Dec 5, 2014 at 9:28 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> So I think we just need to make pg_basebackup create to .ready
>> files.
>
> s/.ready/.done? If yes, +1.
That unfortunately requires changes to both backend and pg_basebackup to
support fetch and stream modes respectively.
I've attached a preliminary patch for this. I'd appreciate feedback. I
plan to commit it in a couple of days, after some more
testing/rereading.
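In fetch mode (-x) the server sends the WAL through the tar stream, so the .done marker has to travel as an additional tar member. In stream mode (-X stream) pg_basebackup writes the segments itself and has to create the marker locally. The sketch below shows the member names involved for one WAL segment. It assumes the pg_xlog/archive_status/<segment>.done layout used by StatusFilePath() and the 24-uppercase-hex segment naming. fetch_mode_entries() is a hypothetical helper, and it leaves out timeline history files, which also get a marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* WAL segment names are 24 uppercase hex digits (timeline, log, segment). */
static bool
is_wal_segment_name(const char *name)
{
    return strlen(name) == 24 && strspn(name, "0123456789ABCDEF") == 24;
}

/* Hypothetical: tar member names emitted for one WAL segment in fetch mode. */
static bool
fetch_mode_entries(const char *seg, char *wal, size_t wal_len,
                   char *done, size_t done_len)
{
    assert(seg != NULL && wal != NULL && done != NULL);

    if (!is_wal_segment_name(seg))
        return false;
    snprintf(wal, wal_len, "pg_xlog/%s", seg);                         /* the segment */
    snprintf(done, done_len, "pg_xlog/archive_status/%s.done", seg);   /* empty marker */
    return true;
}
```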
>> Independent from this bug, there's also some debatable behaviour about
>> what happens if a node with a high wal_keep_segments turns on
>> archive_mode. Suddenly all those old files are archived... I think it
>> might be a good idea to simply always create .done files when
>> archive_mode is disabled while a wal segment is finished.
>
> +1
I tend to think that's a master only change. Agreed?
Greetings,
Andres Freund
Attachments:
0001-Add-pg_string_endswith-as-the-start-of-a-string-help.patch (text/x-patch; charset=us-ascii)
>From 3db116bc5b9465f555957bb11ac6cb8b20c18405 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 31 Dec 2014 14:50:57 +0100
Subject: [PATCH 1/2] Add pg_string_endswith as the start of a string helper
library in src/common.
Backpatch to 9.3 where src/common was introduced, because a bugfix that
needs to be backpatched requires the function. Earlier branches will
have to duplicate the code.
---
src/backend/replication/slot.c | 21 ++-------------------
src/common/Makefile | 2 +-
src/common/string.c | 43 ++++++++++++++++++++++++++++++++++++++++++
src/include/common/string.h | 15 +++++++++++++++
src/tools/msvc/Mkvcbuild.pm | 2 +-
5 files changed, 62 insertions(+), 21 deletions(-)
create mode 100644 src/common/string.c
create mode 100644 src/include/common/string.h
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 937b669..698ca6b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -40,6 +40,7 @@
#include <sys/stat.h>
#include "access/transam.h"
+#include "common/string.h"
#include "miscadmin.h"
#include "replication/slot.h"
#include "storage/fd.h"
@@ -780,24 +781,6 @@ CheckSlotRequirements(void)
}
/*
- * Returns whether the string `str' has the postfix `end'.
- */
-static bool
-string_endswith(const char *str, const char *end)
-{
- size_t slen = strlen(str);
- size_t elen = strlen(end);
-
- /* can't be a postfix if longer */
- if (elen > slen)
- return false;
-
- /* compare the end of the strings */
- str += slen - elen;
- return strcmp(str, end) == 0;
-}
-
-/*
* Flush all replication slots to disk.
*
* This needn't actually be part of a checkpoint, but it's a convenient
@@ -864,7 +847,7 @@ StartupReplicationSlots(void)
continue;
/* we crashed while a slot was being setup or deleted, clean up */
- if (string_endswith(replication_de->d_name, ".tmp"))
+ if (pg_string_endswith(replication_de->d_name, ".tmp"))
{
if (!rmtree(path, true))
{
diff --git a/src/common/Makefile b/src/common/Makefile
index 7edbaaa..e5c345d 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -23,7 +23,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
LIBS += $(PTHREAD_LIBS)
-OBJS_COMMON = exec.o pgfnames.o psprintf.o relpath.o rmtree.o username.o wait_error.o
+OBJS_COMMON = exec.o pgfnames.o psprintf.o relpath.o rmtree.o string.o username.o wait_error.o
OBJS_FRONTEND = $(OBJS_COMMON) fe_memutils.o
diff --git a/src/common/string.c b/src/common/string.c
new file mode 100644
index 0000000..07a2aaf
--- /dev/null
+++ b/src/common/string.c
@@ -0,0 +1,43 @@
+/*-------------------------------------------------------------------------
+ *
+ * string.c
+ * string handling helpers
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/common/string.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include "common/string.h"
+
+
+/*
+ * Returns whether the string `str' has the postfix `end'.
+ */
+bool
+pg_string_endswith(const char *str, const char *end)
+{
+ size_t slen = strlen(str);
+ size_t elen = strlen(end);
+
+ /* can't be a postfix if longer */
+ if (elen > slen)
+ return false;
+
+ /* compare the end of the strings */
+ str += slen - elen;
+ return strcmp(str, end) == 0;
+}
diff --git a/src/include/common/string.h b/src/include/common/string.h
new file mode 100644
index 0000000..3e1a650
--- /dev/null
+++ b/src/include/common/string.h
@@ -0,0 +1,15 @@
+/*
+ * string.h
+ * string handling helpers
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/common/string.h
+ */
+#ifndef COMMON_STRING_H
+#define COMMON_STRING_H
+
+extern bool pg_string_endswith(const char *str, const char *end);
+
+#endif /* COMMON_STRING_H */
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 4506739..4336f2e 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -76,7 +76,7 @@ sub mkvcbuild
push(@pgportfiles, 'rint.c') if ($vsVersion < '12.00');
our @pgcommonallfiles = qw(
- exec.c pgfnames.c psprintf.c relpath.c rmtree.c username.c wait_error.c);
+ exec.c pgfnames.c psprintf.c relpath.c rmtree.c string.c username.c wait_error.c);
our @pgcommonfrontendfiles = (@pgcommonallfiles, qw(fe_memutils.c));
--
2.2.0.rc0.18.ga1ad247
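The helper can be exercised outside the tree. This is a standalone copy of pg_string_endswith() from the patch, with postgres.h / postgres_fe.h replaced by standard headers and an added assert() on its arguments:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns whether the string `str' has the postfix `end'. */
static bool
pg_string_endswith(const char *str, const char *end)
{
    size_t      slen;
    size_t      elen;

    assert(str != NULL && end != NULL);
    slen = strlen(str);
    elen = strlen(end);

    /* can't be a postfix if longer */
    if (elen > slen)
        return false;

    /* compare the end of the strings */
    str += slen - elen;
    return strcmp(str, end) == 0;
}
```

The length check is what makes the helper preferable to the replaced test strcmp(filename + strlen(filename) - 8, "/pg_xlog"): it never points before the start of a string shorter than the suffix.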
0002-Prevent-WAL-files-created-by-pg_basebackup-x-X-from-.patch (text/x-patch; charset=us-ascii)
>From 6cad7b85c4fb36c2c5675bb83070693a0a7b9232 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 31 Dec 2014 15:40:13 +0100
Subject: [PATCH 2/2] Prevent WAL files created by pg_basebackup -x/X from
being archived again.
WAL (and timeline history) files created by pg_basebackup did not
maintain the new base backup's archive status. That's currently not a
problem if the new node is used as a standby, but if that node is
promoted, all still-existing files can get archived again. With a high
wal_keep_segments setting that can happen a significant time later,
which is quite confusing.
Change both the backend (for the -x fetch case) and pg_basebackup
itself to always mark files as .done. That's in line with
walreceiver.c doing so.
Backpatch to 9.1 where pg_basebackup was introduced.
---
src/backend/replication/basebackup.c | 24 ++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 32 +++++++++----
src/bin/pg_basebackup/pg_receivexlog.c | 2 +-
src/bin/pg_basebackup/receivelog.c | 87 ++++++++++++++++++++++++++--------
src/bin/pg_basebackup/receivelog.h | 3 +-
5 files changed, 117 insertions(+), 31 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index fbcecbb..24c3d8d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -471,6 +471,7 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
errmsg("unexpected WAL file size \"%s\"", walFiles[i])));
}
+ /* send the WAL file itself */
_tarWriteHeader(pathbuf, NULL, &statbuf);
while ((cnt = fread(buf, 1, Min(sizeof(buf), XLogSegSize - len), fp)) > 0)
@@ -497,7 +498,17 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
}
/* XLogSegSize is a multiple of 512, so no need for padding */
+
FreeFile(fp);
+
+ /*
+ * Mark file as archived, otherwise files can get archived again
+ * after promotion of a new node. This is in line with
+ * walreceiver.c always doing a XLogArchiveForceDone() after a
+ * complete segment.
+ */
+ StatusFilePath(pathbuf, walFiles[i], ".done");
+ sendFileWithContent(pathbuf, "");
}
/*
@@ -521,6 +532,10 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
errmsg("could not stat file \"%s\": %m", pathbuf)));
sendFile(pathbuf, pathbuf, &statbuf, false);
+
+ /* unconditionally mark file as archived */
+ StatusFilePath(pathbuf, fname, ".done");
+ sendFileWithContent(pathbuf, "");
}
/* Send CopyDone message for the last tar file */
@@ -1021,6 +1036,15 @@ sendDir(char *path, int basepathlen, bool sizeonly, List *tablespaces)
_tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf);
}
size += 512; /* Size of the header just added */
+
+ /*
+ * Also send archive_status directory (by hackishly reusing
+ * statbuf from above ...).
+ */
+ if (!sizeonly)
+ _tarWriteHeader("./pg_xlog/archive_status", NULL, &statbuf);
+ size += 512; /* Size of the header just added */
+
continue; /* don't recurse into pg_xlog */
}
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0470401..fc30a3c 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -25,6 +25,7 @@
#include <zlib.h>
#endif
+#include "common/string.h"
#include "getopt_long.h"
#include "libpq-fe.h"
#include "pqexpbuffer.h"
@@ -370,7 +371,7 @@ LogStreamerMain(logstreamer_param *param)
if (!ReceiveXlogStream(param->bgconn, param->startptr, param->timeline,
param->sysidentifier, param->xlogdir,
reached_end_position, standby_message_timeout,
- NULL, false))
+ NULL, false, true))
/*
* Any errors will already have been reported in the function process,
@@ -394,6 +395,7 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
logstreamer_param *param;
uint32 hi,
lo;
+ char *statusdir;
param = pg_malloc0(sizeof(logstreamer_param));
param->timeline = timeline;
@@ -428,13 +430,24 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
/* Error message already written in GetConnection() */
exit(1);
+ snprintf(param->xlogdir, sizeof(param->xlogdir), "%s/pg_xlog", basedir);
+
/*
- * Always in plain format, so we can write to basedir/pg_xlog. But the
- * directory entry in the tar file may arrive later, so make sure it's
- * created before we start.
+ * Create pg_xlog/archive_status (and thus pg_xlog) so we can write to
+ * basedir/pg_xlog as the directory entry in the tar file may arrive
+ * later.
*/
- snprintf(param->xlogdir, sizeof(param->xlogdir), "%s/pg_xlog", basedir);
- verify_dir_is_empty_or_create(param->xlogdir);
+ statusdir = psprintf("%s/pg_xlog/archive_status", basedir);
+
+ if (pg_mkdir_p(statusdir, S_IRWXU) != 0 && errno != EEXIST)
+ {
+ fprintf(stderr,
+ _("%s: could not create directory \"%s\": %s\n"),
progname, statusdir, strerror(errno));
+ disconnect_and_exit(1);
+ }
+
+ free(statusdir);
/*
* Start a child process and tell it to start streaming. On Unix, this is
@@ -1237,10 +1250,11 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
* log directory location was specified, pg_xlog has
* already been created as a symbolic link before
* starting the actual backup. So just ignore failure
- * on them.
+ * on related directories.
*/
- if ((!streamwal && (strcmp(xlog_dir, "") == 0))
- || strcmp(filename + strlen(filename) - 8, "/pg_xlog") != 0)
+ if (errno != EEXIST ||
+ (!pg_string_endswith(filename, "/pg_xlog") &&
+ !pg_string_endswith(filename, "/archive_status")))
{
fprintf(stderr,
_("%s: could not create directory \"%s\": %s\n"),
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 4658f08..b10da73 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -342,7 +342,7 @@ StreamLog(void)
ReceiveXlogStream(conn, startpos, starttli, NULL, basedir,
stop_streaming, standby_message_timeout, ".partial",
- synchronous);
+ synchronous, false);
PQfinish(conn);
conn = NULL;
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index f0f8760..5ce1c7a 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -37,7 +37,7 @@ static PGresult *HandleCopyStream(PGconn *conn, XLogRecPtr startpos,
uint32 timeline, char *basedir,
stream_stop_callback stream_stop, int standby_message_timeout,
char *partial_suffix, XLogRecPtr *stoppos,
- bool synchronous);
+ bool synchronous, bool mark_done);
static int CopyStreamPoll(PGconn *conn, long timeout_ms);
static int CopyStreamReceive(PGconn *conn, long timeout, char **buffer);
static bool ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
@@ -45,20 +45,40 @@ static bool ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
static bool ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
XLogRecPtr *blockpos, uint32 timeline,
char *basedir, stream_stop_callback stream_stop,
- char *partial_suffix);
+ char *partial_suffix, bool mark_done);
static PGresult *HandleEndOfCopyStream(PGconn *conn, char *copybuf,
XLogRecPtr blockpos, char *basedir, char *partial_suffix,
- XLogRecPtr *stoppos);
+ XLogRecPtr *stoppos, bool mark_done);
static bool CheckCopyStreamStop(PGconn *conn, XLogRecPtr blockpos,
uint32 timeline, char *basedir,
stream_stop_callback stream_stop,
- char *partial_suffix, XLogRecPtr *stoppos);
+ char *partial_suffix, XLogRecPtr *stoppos,
+ bool mark_done);
static long CalculateCopyStreamSleeptime(int64 now, int standby_message_timeout,
int64 last_status);
static bool ReadEndOfStreamingResult(PGresult *res, XLogRecPtr *startpos,
uint32 *timeline);
+static bool
+mark_file_as_archived(const char *basedir, const char *fname)
+{
+ int fd;
+ static char tmppath[MAXPGPATH];
+
+ snprintf(tmppath, sizeof(tmppath), "%s/archive_status/%s.done",
+ basedir, fname);
+
+ fd = open(tmppath, O_WRONLY | O_CREAT | PG_BINARY, S_IRUSR | S_IWUSR);
+ if (fd < 0)
+ {
+ fprintf(stderr, _("%s: could not create archive status file \"%s\": %s\n"),
+ progname, tmppath, strerror(errno));
+ return false;
+ }
+ return close(fd) == 0;
+}
+
/*
* Open a new WAL file in the specified directory.
*
@@ -152,7 +172,7 @@ open_walfile(XLogRecPtr startpoint, uint32 timeline, char *basedir,
* and returns false, otherwise returns true.
*/
static bool
-close_walfile(char *basedir, char *partial_suffix, XLogRecPtr pos)
+close_walfile(char *basedir, char *partial_suffix, XLogRecPtr pos, bool mark_done)
{
off_t currpos;
@@ -206,6 +226,19 @@ close_walfile(char *basedir, char *partial_suffix, XLogRecPtr pos)
_("%s: not renaming \"%s%s\", segment is not complete\n"),
progname, current_walfile_name, partial_suffix);
+ /*
+ * Mark file as archived if requested by the caller - pg_basebackup needs
+ * to do so as files can otherwise get archived again after promotion of a
+ * new node. This is in line with walreceiver.c always doing a
+ * XLogArchiveForceDone() after a complete segment.
+ */
+ if (currpos == XLOG_SEG_SIZE && mark_done)
+ {
+ /* writes error message if failed */
+ if (!mark_file_as_archived(basedir, current_walfile_name))
+ return false;
+ }
+
lastFlushPosition = pos;
return true;
}
@@ -248,7 +281,8 @@ existsTimeLineHistoryFile(char *basedir, TimeLineID tli)
}
static bool
-writeTimeLineHistoryFile(char *basedir, TimeLineID tli, char *filename, char *content)
+writeTimeLineHistoryFile(char *basedir, TimeLineID tli, char *filename,
+ char *content, bool mark_done)
{
int size = strlen(content);
char path[MAXPGPATH];
@@ -327,6 +361,14 @@ writeTimeLineHistoryFile(char *basedir, TimeLineID tli, char *filename, char *co
return false;
}
+ /* Maintain archive_status, check close_walfile() for details. */
+ if (mark_done)
+ {
+ /* writes error message if failed */
+ if (!mark_file_as_archived(basedir, histfname))
+ return false;
+ }
+
return true;
}
@@ -447,7 +489,7 @@ ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
char *sysidentifier, char *basedir,
stream_stop_callback stream_stop,
int standby_message_timeout, char *partial_suffix,
- bool synchronous)
+ bool synchronous, bool mark_done)
{
char query[128];
char slotcmd[128];
@@ -562,7 +604,8 @@ ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
/* Write the history file to disk */
writeTimeLineHistoryFile(basedir, timeline,
PQgetvalue(res, 0, 0),
- PQgetvalue(res, 0, 1));
+ PQgetvalue(res, 0, 1),
+ mark_done);
PQclear(res);
}
@@ -592,7 +635,7 @@ ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
/* Stream the WAL */
res = HandleCopyStream(conn, startpos, timeline, basedir, stream_stop,
standby_message_timeout, partial_suffix,
- &stoppos, synchronous);
+ &stoppos, synchronous, mark_done);
if (res == NULL)
goto error;
@@ -757,7 +800,7 @@ static PGresult *
HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
char *basedir, stream_stop_callback stream_stop,
int standby_message_timeout, char *partial_suffix,
- XLogRecPtr *stoppos, bool synchronous)
+ XLogRecPtr *stoppos, bool synchronous, bool mark_done)
{
char *copybuf = NULL;
int64 last_status = -1;
@@ -775,7 +818,8 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
* Check if we should continue streaming, or abort at this point.
*/
if (!CheckCopyStreamStop(conn, blockpos, timeline, basedir,
- stream_stop, partial_suffix, stoppos))
+ stream_stop, partial_suffix, stoppos,
+ mark_done))
goto error;
now = feGetCurrentTimestamp();
@@ -830,7 +874,8 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
if (r == -2)
{
PGresult *res = HandleEndOfCopyStream(conn, copybuf, blockpos,
- basedir, partial_suffix, stoppos);
+ basedir, partial_suffix,
+ stoppos, mark_done);
if (res == NULL)
goto error;
else
@@ -847,14 +892,16 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
else if (copybuf[0] == 'w')
{
if (!ProcessXLogDataMsg(conn, copybuf, r, &blockpos,
- timeline, basedir, stream_stop, partial_suffix))
+ timeline, basedir, stream_stop,
+ partial_suffix, mark_done))
goto error;
/*
* Check if we should continue streaming, or abort at this point.
*/
if (!CheckCopyStreamStop(conn, blockpos, timeline, basedir,
- stream_stop, partial_suffix, stoppos))
+ stream_stop, partial_suffix, stoppos,
+ mark_done))
goto error;
}
else
@@ -1055,7 +1102,7 @@ static bool
ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
XLogRecPtr *blockpos, uint32 timeline,
char *basedir, stream_stop_callback stream_stop,
- char *partial_suffix)
+ char *partial_suffix, bool mark_done)
{
int xlogoff;
int bytes_left;
@@ -1163,7 +1210,7 @@ ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
/* Did we reach the end of a WAL segment? */
if (*blockpos % XLOG_SEG_SIZE == 0)
{
- if (!close_walfile(basedir, partial_suffix, *blockpos))
+ if (!close_walfile(basedir, partial_suffix, *blockpos, mark_done))
/* Error message written in close_walfile() */
return false;
@@ -1193,7 +1240,7 @@ ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
static PGresult *
HandleEndOfCopyStream(PGconn *conn, char *copybuf,
XLogRecPtr blockpos, char *basedir, char *partial_suffix,
- XLogRecPtr *stoppos)
+ XLogRecPtr *stoppos, bool mark_done)
{
PGresult *res = PQgetResult(conn);
@@ -1204,7 +1251,7 @@ HandleEndOfCopyStream(PGconn *conn, char *copybuf,
*/
if (still_sending)
{
- if (!close_walfile(basedir, partial_suffix, blockpos))
+ if (!close_walfile(basedir, partial_suffix, blockpos, mark_done))
{
/* Error message written in close_walfile() */
PQclear(res);
@@ -1236,11 +1283,11 @@ HandleEndOfCopyStream(PGconn *conn, char *copybuf,
static bool
CheckCopyStreamStop(PGconn *conn, XLogRecPtr blockpos, uint32 timeline,
char *basedir, stream_stop_callback stream_stop,
- char *partial_suffix, XLogRecPtr *stoppos)
+ char *partial_suffix, XLogRecPtr *stoppos, bool mark_done)
{
if (still_sending && stream_stop(blockpos, timeline, false))
{
- if (!close_walfile(basedir, partial_suffix, blockpos))
+ if (!close_walfile(basedir, partial_suffix, blockpos, mark_done))
{
/* Potential error message is written by close_walfile */
return false;
diff --git a/src/bin/pg_basebackup/receivelog.h b/src/bin/pg_basebackup/receivelog.h
index 9dd7005..1f64a74 100644
--- a/src/bin/pg_basebackup/receivelog.h
+++ b/src/bin/pg_basebackup/receivelog.h
@@ -31,6 +31,7 @@ extern bool ReceiveXlogStream(PGconn *conn,
stream_stop_callback stream_stop,
int standby_message_timeout,
char *partial_suffix,
- bool synchronous);
+ bool synchronous,
+ bool mark_done);
#endif /* RECEIVELOG_H */
--
2.2.0.rc0.18.ga1ad247
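The new mark_file_as_archived() helper needs to create an empty <fname>.done file under basedir/archive_status, close the descriptor and report success. A standalone sketch of that job follows. The fsync() before close() is an added assumption, so that the marker survives a crash. The function name, plain stderr messages and SKETCH_MAXPATH are illustrative rather than taken from the patch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define SKETCH_MAXPATH 1024

/* Create basedir/archive_status/<fname>.done; returns false on any error. */
static bool
mark_file_as_archived_sketch(const char *basedir, const char *fname)
{
    char        path[SKETCH_MAXPATH];
    int         fd;

    assert(basedir != NULL && fname != NULL);
    snprintf(path, sizeof(path), "%s/archive_status/%s.done", basedir, fname);

    fd = open(path, O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR);
    if (fd < 0)
    {
        fprintf(stderr, "could not create archive status file \"%s\": %s\n",
                path, strerror(errno));
        return false;
    }

    /* assumption: make the empty marker durable before reporting success */
    if (fsync(fd) != 0)
    {
        fprintf(stderr, "could not fsync file \"%s\": %s\n",
                path, strerror(errno));
        close(fd);
        return false;
    }

    if (close(fd) != 0)
    {
        fprintf(stderr, "could not close file \"%s\": %s\n",
                path, strerror(errno));
        return false;
    }
    return true;
}
```

Returning true only after the descriptor is closed lets callers such as close_walfile() rely on "writes error message if failed" without leaking a file descriptor per completed segment.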
On 2014-12-31 16:32:19 +0100, Andres Freund wrote:
> On 2014-12-05 16:18:02 +0900, Fujii Masao wrote:
>> On Fri, Dec 5, 2014 at 9:28 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>>> So I think we just need to make pg_basebackup create to .ready
>>> files.
>>
>> s/.ready/.done? If yes, +1.
>
> That unfortunately requires changes to both backend and pg_basebackup to
> support fetch and stream modes respectively.
>
> I've attached a preliminary patch for this. I'd appreciate feedback. I
> plan to commit it in a couple of days, after some more
> testing/rereading.
Attached are two updated patches that I am starting to backport
now. I've fixed a couple of minor oversights and tested the patches.
Greetings,
Andres Freund
Attachments:
0001-Add-pg_string_endswith-as-the-start-of-a-string-help.patch (text/x-patch; charset=us-ascii)
>From 7140b41651a5a7a7bfdbe2ed7192bf0a2475e57f Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 3 Jan 2015 15:57:49 +0100
Subject: [PATCH 1/2] Add pg_string_endswith as the start of a string helper
library in src/common.
Backpatch to 9.3 where src/common was introduced, because a bugfix that
needs to be backpatched requires the function. Earlier branches will
have to duplicate the code.
---
src/backend/replication/slot.c | 21 ++-------------------
src/common/Makefile | 2 +-
src/common/string.c | 43 ++++++++++++++++++++++++++++++++++++++++++
src/include/common/string.h | 15 +++++++++++++++
src/tools/msvc/Mkvcbuild.pm | 2 +-
5 files changed, 62 insertions(+), 21 deletions(-)
create mode 100644 src/common/string.c
create mode 100644 src/include/common/string.h
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 937b669..8708616 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -40,6 +40,7 @@
#include <sys/stat.h>
#include "access/transam.h"
+#include "common/string.h"
#include "miscadmin.h"
#include "replication/slot.h"
#include "storage/fd.h"
@@ -780,24 +781,6 @@ CheckSlotRequirements(void)
}
/*
- * Returns whether the string `str' has the postfix `end'.
- */
-static bool
-string_endswith(const char *str, const char *end)
-{
- size_t slen = strlen(str);
- size_t elen = strlen(end);
-
- /* can't be a postfix if longer */
- if (elen > slen)
- return false;
-
- /* compare the end of the strings */
- str += slen - elen;
- return strcmp(str, end) == 0;
-}
-
-/*
* Flush all replication slots to disk.
*
* This needn't actually be part of a checkpoint, but it's a convenient
@@ -864,7 +847,7 @@ StartupReplicationSlots(void)
continue;
/* we crashed while a slot was being setup or deleted, clean up */
- if (string_endswith(replication_de->d_name, ".tmp"))
+ if (pg_str_endswith(replication_de->d_name, ".tmp"))
{
if (!rmtree(path, true))
{
diff --git a/src/common/Makefile b/src/common/Makefile
index 7edbaaa..e5c345d 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -23,7 +23,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
LIBS += $(PTHREAD_LIBS)
-OBJS_COMMON = exec.o pgfnames.o psprintf.o relpath.o rmtree.o username.o wait_error.o
+OBJS_COMMON = exec.o pgfnames.o psprintf.o relpath.o rmtree.o string.o username.o wait_error.o
OBJS_FRONTEND = $(OBJS_COMMON) fe_memutils.o
diff --git a/src/common/string.c b/src/common/string.c
new file mode 100644
index 0000000..27e0743
--- /dev/null
+++ b/src/common/string.c
@@ -0,0 +1,43 @@
+/*-------------------------------------------------------------------------
+ *
+ * string.c
+ * string handling helpers
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/common/string.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include "common/string.h"
+
+
+/*
+ * Returns whether the string `str' has the postfix `end'.
+ */
+bool
+pg_str_endswith(const char *str, const char *end)
+{
+ size_t slen = strlen(str);
+ size_t elen = strlen(end);
+
+ /* can't be a postfix if longer */
+ if (elen > slen)
+ return false;
+
+ /* compare the end of the strings */
+ str += slen - elen;
+ return strcmp(str, end) == 0;
+}
diff --git a/src/include/common/string.h b/src/include/common/string.h
new file mode 100644
index 0000000..0233858
--- /dev/null
+++ b/src/include/common/string.h
@@ -0,0 +1,15 @@
+/*
+ * string.h
+ * string handling helpers
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/common/string.h
+ */
+#ifndef COMMON_STRING_H
+#define COMMON_STRING_H
+
+extern bool pg_str_endswith(const char *str, const char *end);
+
+#endif /* COMMON_STRING_H */
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 4506739..4336f2e 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -76,7 +76,7 @@ sub mkvcbuild
push(@pgportfiles, 'rint.c') if ($vsVersion < '12.00');
our @pgcommonallfiles = qw(
- exec.c pgfnames.c psprintf.c relpath.c rmtree.c username.c wait_error.c);
+ exec.c pgfnames.c psprintf.c relpath.c rmtree.c string.c username.c wait_error.c);
our @pgcommonfrontendfiles = (@pgcommonallfiles, qw(fe_memutils.c));
--
2.2.0.rc0.18.ga1ad247
0002-Prevent-WAL-files-created-by-pg_basebackup-x-X-from-.patch (text/x-patch; charset=us-ascii)
>From c589afe5d61edf574b4f88b75ab1baef97a01e11 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 3 Jan 2015 15:57:49 +0100
Subject: [PATCH 2/2] Prevent WAL files created by pg_basebackup -x/X from
being archived again.
WAL (and timeline history) files created by pg_basebackup did not
maintain the new base backup's archive status. That's currently not a
problem if the new node is used as a standby, but if that node is
promoted, all still-existing files can get archived again. With a high
wal_keep_segments setting that can happen a significant time later,
which is quite confusing.
Change both the backend (for the -x/-X fetch case) and pg_basebackup
(for -X stream) itself to always mark WAL/timeline files included in
the base backup as .done. That's in line with walreceiver.c doing so.
The verbosity of the pg_basebackup changes shows pretty clearly that it
needs some refactoring, but that would result in changes that are not
backpatchable.
Backpatch to 9.1 where pg_basebackup was introduced.
---
src/backend/replication/basebackup.c | 24 +++++++++
src/bin/pg_basebackup/pg_basebackup.c | 34 ++++++++----
src/bin/pg_basebackup/pg_receivexlog.c | 2 +-
src/bin/pg_basebackup/receivelog.c | 97 +++++++++++++++++++++++++++-------
src/bin/pg_basebackup/receivelog.h | 3 +-
5 files changed, 128 insertions(+), 32 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index fbcecbb..24c3d8d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -471,6 +471,7 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
errmsg("unexpected WAL file size \"%s\"", walFiles[i])));
}
+ /* send the WAL file itself */
_tarWriteHeader(pathbuf, NULL, &statbuf);
while ((cnt = fread(buf, 1, Min(sizeof(buf), XLogSegSize - len), fp)) > 0)
@@ -497,7 +498,17 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
}
/* XLogSegSize is a multiple of 512, so no need for padding */
+
FreeFile(fp);
+
+ /*
+ * Mark file as archived, otherwise files can get archived again
+ * after promotion of a new node. This is in line with
+ * walreceiver.c always doing a XLogArchiveForceDone() after a
+ * complete segment.
+ */
+ StatusFilePath(pathbuf, walFiles[i], ".done");
+ sendFileWithContent(pathbuf, "");
}
/*
@@ -521,6 +532,10 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
errmsg("could not stat file \"%s\": %m", pathbuf)));
sendFile(pathbuf, pathbuf, &statbuf, false);
+
+ /* unconditionally mark file as archived */
+ StatusFilePath(pathbuf, fname, ".done");
+ sendFileWithContent(pathbuf, "");
}
/* Send CopyDone message for the last tar file */
@@ -1021,6 +1036,15 @@ sendDir(char *path, int basepathlen, bool sizeonly, List *tablespaces)
_tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf);
}
size += 512; /* Size of the header just added */
+
+ /*
+ * Also send archive_status directory (by hackishly reusing
+ * statbuf from above ...).
+ */
+ if (!sizeonly)
+ _tarWriteHeader("./pg_xlog/archive_status", NULL, &statbuf);
+ size += 512; /* Size of the header just added */
+
continue; /* don't recurse into pg_xlog */
}
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0470401..429976b 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -25,6 +25,7 @@
#include <zlib.h>
#endif
+#include "common/string.h"
#include "getopt_long.h"
#include "libpq-fe.h"
#include "pqexpbuffer.h"
@@ -370,7 +371,7 @@ LogStreamerMain(logstreamer_param *param)
if (!ReceiveXlogStream(param->bgconn, param->startptr, param->timeline,
param->sysidentifier, param->xlogdir,
reached_end_position, standby_message_timeout,
- NULL, false))
+ NULL, false, true))
/*
* Any errors will already have been reported in the function process,
@@ -394,6 +395,7 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
logstreamer_param *param;
uint32 hi,
lo;
+ char *statusdir;
param = pg_malloc0(sizeof(logstreamer_param));
param->timeline = timeline;
@@ -428,13 +430,24 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
/* Error message already written in GetConnection() */
exit(1);
+ snprintf(param->xlogdir, sizeof(param->xlogdir), "%s/pg_xlog", basedir);
+
/*
- * Always in plain format, so we can write to basedir/pg_xlog. But the
- * directory entry in the tar file may arrive later, so make sure it's
- * created before we start.
+ * Create pg_xlog/archive_status (and thus pg_xlog) so we can write to
+ * basedir/pg_xlog as the directory entry in the tar file may arrive
+ * later.
*/
- snprintf(param->xlogdir, sizeof(param->xlogdir), "%s/pg_xlog", basedir);
- verify_dir_is_empty_or_create(param->xlogdir);
+ statusdir = psprintf("%s/pg_xlog/archive_status", basedir);
+
+ if (pg_mkdir_p(statusdir, S_IRWXU) != 0 && errno != EEXIST)
+ {
+ fprintf(stderr,
+ _("%s: could not create directory \"%s\": %s\n"),
+ progname, statusdir, strerror(errno));
+ disconnect_and_exit(1);
+ }
+
+ free(statusdir);
/*
* Start a child process and tell it to start streaming. On Unix, this is
@@ -1236,11 +1249,12 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
* by the wal receiver process. Also, when transaction
* log directory location was specified, pg_xlog has
* already been created as a symbolic link before
- * starting the actual backup. So just ignore failure
- * on them.
+ * starting the actual backup. So just ignore creation
+ * failures on related directories.
*/
- if ((!streamwal && (strcmp(xlog_dir, "") == 0))
- || strcmp(filename + strlen(filename) - 8, "/pg_xlog") != 0)
+ if (!((pg_str_endswith(filename, "/pg_xlog") ||
+ pg_str_endswith(filename, "/archive_status")) &&
+ errno == EEXIST))
{
fprintf(stderr,
_("%s: could not create directory \"%s\": %s\n"),
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 4658f08..b10da73 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -342,7 +342,7 @@ StreamLog(void)
ReceiveXlogStream(conn, startpos, starttli, NULL, basedir,
stop_streaming, standby_message_timeout, ".partial",
- synchronous);
+ synchronous, false);
PQfinish(conn);
conn = NULL;
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index f0f8760..123f445 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -37,7 +37,7 @@ static PGresult *HandleCopyStream(PGconn *conn, XLogRecPtr startpos,
uint32 timeline, char *basedir,
stream_stop_callback stream_stop, int standby_message_timeout,
char *partial_suffix, XLogRecPtr *stoppos,
- bool synchronous);
+ bool synchronous, bool mark_done);
static int CopyStreamPoll(PGconn *conn, long timeout_ms);
static int CopyStreamReceive(PGconn *conn, long timeout, char **buffer);
static bool ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
@@ -45,20 +45,50 @@ static bool ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
static bool ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
XLogRecPtr *blockpos, uint32 timeline,
char *basedir, stream_stop_callback stream_stop,
- char *partial_suffix);
+ char *partial_suffix, bool mark_done);
static PGresult *HandleEndOfCopyStream(PGconn *conn, char *copybuf,
XLogRecPtr blockpos, char *basedir, char *partial_suffix,
- XLogRecPtr *stoppos);
+ XLogRecPtr *stoppos, bool mark_done);
static bool CheckCopyStreamStop(PGconn *conn, XLogRecPtr blockpos,
uint32 timeline, char *basedir,
stream_stop_callback stream_stop,
- char *partial_suffix, XLogRecPtr *stoppos);
+ char *partial_suffix, XLogRecPtr *stoppos,
+ bool mark_done);
static long CalculateCopyStreamSleeptime(int64 now, int standby_message_timeout,
int64 last_status);
static bool ReadEndOfStreamingResult(PGresult *res, XLogRecPtr *startpos,
uint32 *timeline);
+static bool
+mark_file_as_archived(const char *basedir, const char *fname)
+{
+ int fd;
+ static char tmppath[MAXPGPATH];
+
+ snprintf(tmppath, sizeof(tmppath), "%s/archive_status/%s.done",
+ basedir, fname);
+
+ fd = open(tmppath, O_WRONLY | O_CREAT | PG_BINARY, S_IRUSR | S_IWUSR);
+ if (fd < 0)
+ {
+ fprintf(stderr, _("%s: could not create archive status file \"%s\": %s\n"),
+ progname, tmppath, strerror(errno));
+ return false;
+ }
+
+ if (fsync(fd) != 0)
+ {
+ fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
+ progname, tmppath, strerror(errno));
+ return false;
+ }
+
+ close(fd);
+
+ return true;
+}
+
/*
* Open a new WAL file in the specified directory.
*
@@ -152,7 +182,7 @@ open_walfile(XLogRecPtr startpoint, uint32 timeline, char *basedir,
* and returns false, otherwise returns true.
*/
static bool
-close_walfile(char *basedir, char *partial_suffix, XLogRecPtr pos)
+close_walfile(char *basedir, char *partial_suffix, XLogRecPtr pos, bool mark_done)
{
off_t currpos;
@@ -206,6 +236,19 @@ close_walfile(char *basedir, char *partial_suffix, XLogRecPtr pos)
_("%s: not renaming \"%s%s\", segment is not complete\n"),
progname, current_walfile_name, partial_suffix);
+ /*
+ * Mark file as archived if requested by the caller - pg_basebackup needs
+ * to do so as files can otherwise get archived again after promotion of a
+ * new node. This is in line with walreceiver.c always doing a
+ * XLogArchiveForceDone() after a complete segment.
+ */
+ if (currpos == XLOG_SEG_SIZE && mark_done)
+ {
+ /* writes error message if failed */
+ if (!mark_file_as_archived(basedir, current_walfile_name))
+ return false;
+ }
+
lastFlushPosition = pos;
return true;
}
@@ -248,7 +291,8 @@ existsTimeLineHistoryFile(char *basedir, TimeLineID tli)
}
static bool
-writeTimeLineHistoryFile(char *basedir, TimeLineID tli, char *filename, char *content)
+writeTimeLineHistoryFile(char *basedir, TimeLineID tli, char *filename,
+ char *content, bool mark_done)
{
int size = strlen(content);
char path[MAXPGPATH];
@@ -327,6 +371,14 @@ writeTimeLineHistoryFile(char *basedir, TimeLineID tli, char *filename, char *co
return false;
}
+ /* Maintain archive_status, check close_walfile() for details. */
+ if (mark_done)
+ {
+ /* writes error message if failed */
+ if (!mark_file_as_archived(basedir, histfname))
+ return false;
+ }
+
return true;
}
@@ -447,7 +499,7 @@ ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
char *sysidentifier, char *basedir,
stream_stop_callback stream_stop,
int standby_message_timeout, char *partial_suffix,
- bool synchronous)
+ bool synchronous, bool mark_done)
{
char query[128];
char slotcmd[128];
@@ -562,7 +614,8 @@ ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
/* Write the history file to disk */
writeTimeLineHistoryFile(basedir, timeline,
PQgetvalue(res, 0, 0),
- PQgetvalue(res, 0, 1));
+ PQgetvalue(res, 0, 1),
+ mark_done);
PQclear(res);
}
@@ -592,7 +645,7 @@ ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
/* Stream the WAL */
res = HandleCopyStream(conn, startpos, timeline, basedir, stream_stop,
standby_message_timeout, partial_suffix,
- &stoppos, synchronous);
+ &stoppos, synchronous, mark_done);
if (res == NULL)
goto error;
@@ -757,7 +810,7 @@ static PGresult *
HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
char *basedir, stream_stop_callback stream_stop,
int standby_message_timeout, char *partial_suffix,
- XLogRecPtr *stoppos, bool synchronous)
+ XLogRecPtr *stoppos, bool synchronous, bool mark_done)
{
char *copybuf = NULL;
int64 last_status = -1;
@@ -775,7 +828,8 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
* Check if we should continue streaming, or abort at this point.
*/
if (!CheckCopyStreamStop(conn, blockpos, timeline, basedir,
- stream_stop, partial_suffix, stoppos))
+ stream_stop, partial_suffix, stoppos,
+ mark_done))
goto error;
now = feGetCurrentTimestamp();
@@ -830,7 +884,8 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
if (r == -2)
{
PGresult *res = HandleEndOfCopyStream(conn, copybuf, blockpos,
- basedir, partial_suffix, stoppos);
+ basedir, partial_suffix,
+ stoppos, mark_done);
if (res == NULL)
goto error;
else
@@ -847,14 +902,16 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
else if (copybuf[0] == 'w')
{
if (!ProcessXLogDataMsg(conn, copybuf, r, &blockpos,
- timeline, basedir, stream_stop, partial_suffix))
+ timeline, basedir, stream_stop,
+ partial_suffix, true))
goto error;
/*
* Check if we should continue streaming, or abort at this point.
*/
if (!CheckCopyStreamStop(conn, blockpos, timeline, basedir,
- stream_stop, partial_suffix, stoppos))
+ stream_stop, partial_suffix, stoppos,
+ mark_done))
goto error;
}
else
@@ -1055,7 +1112,7 @@ static bool
ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
XLogRecPtr *blockpos, uint32 timeline,
char *basedir, stream_stop_callback stream_stop,
- char *partial_suffix)
+ char *partial_suffix, bool mark_done)
{
int xlogoff;
int bytes_left;
@@ -1163,7 +1220,7 @@ ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
/* Did we reach the end of a WAL segment? */
if (*blockpos % XLOG_SEG_SIZE == 0)
{
- if (!close_walfile(basedir, partial_suffix, *blockpos))
+ if (!close_walfile(basedir, partial_suffix, *blockpos, mark_done))
/* Error message written in close_walfile() */
return false;
@@ -1193,7 +1250,7 @@ ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
static PGresult *
HandleEndOfCopyStream(PGconn *conn, char *copybuf,
XLogRecPtr blockpos, char *basedir, char *partial_suffix,
- XLogRecPtr *stoppos)
+ XLogRecPtr *stoppos, bool mark_done)
{
PGresult *res = PQgetResult(conn);
@@ -1204,7 +1261,7 @@ HandleEndOfCopyStream(PGconn *conn, char *copybuf,
*/
if (still_sending)
{
- if (!close_walfile(basedir, partial_suffix, blockpos))
+ if (!close_walfile(basedir, partial_suffix, blockpos, mark_done))
{
/* Error message written in close_walfile() */
PQclear(res);
@@ -1236,11 +1293,11 @@ HandleEndOfCopyStream(PGconn *conn, char *copybuf,
static bool
CheckCopyStreamStop(PGconn *conn, XLogRecPtr blockpos, uint32 timeline,
char *basedir, stream_stop_callback stream_stop,
- char *partial_suffix, XLogRecPtr *stoppos)
+ char *partial_suffix, XLogRecPtr *stoppos, bool mark_done)
{
if (still_sending && stream_stop(blockpos, timeline, false))
{
- if (!close_walfile(basedir, partial_suffix, blockpos))
+ if (!close_walfile(basedir, partial_suffix, blockpos, mark_done))
{
/* Potential error message is written by close_walfile */
return false;
diff --git a/src/bin/pg_basebackup/receivelog.h b/src/bin/pg_basebackup/receivelog.h
index 9dd7005..1f64a74 100644
--- a/src/bin/pg_basebackup/receivelog.h
+++ b/src/bin/pg_basebackup/receivelog.h
@@ -31,6 +31,7 @@ extern bool ReceiveXlogStream(PGconn *conn,
stream_stop_callback stream_stop,
int standby_message_timeout,
char *partial_suffix,
- bool synchronous);
+ bool synchronous,
+ bool mark_done);
#endif /* RECEIVELOG_H */
--
2.2.0.rc0.18.ga1ad247
On 2015-01-03 16:03:36 +0100, Andres Freund wrote:
On 2014-12-31 16:32:19 +0100, Andres Freund wrote:
On 2014-12-05 16:18:02 +0900, Fujii Masao wrote:
On Fri, Dec 5, 2014 at 9:28 AM, Andres Freund <andres@2ndquadrant.com> wrote:
So I think we just need to make pg_basebackup create to .ready
files.
s/.ready/.done? If yes, +1.
That unfortunately requires changes to both backend and pg_basebackup to
support fetch and stream modes respectively.
I've attached a preliminary patch for this. I'd appreciate feedback. I
plan to commit it in a couple of days, after some more
testing/rereading.
Attached are two updated patches that I am starting to backport
now. I've fixed a couple minor oversights. And tested the patches.
Pushed this after some major pain with backporting. pg_basebackup really
changed heavily since its introduction. And desperately needs some
restructuring.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Jan 4, 2015 at 5:47 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2015-01-03 16:03:36 +0100, Andres Freund wrote:
On 2014-12-31 16:32:19 +0100, Andres Freund wrote:
On 2014-12-05 16:18:02 +0900, Fujii Masao wrote:
On Fri, Dec 5, 2014 at 9:28 AM, Andres Freund <andres@2ndquadrant.com> wrote:
So I think we just need to make pg_basebackup create to .ready
files.
s/.ready/.done? If yes, +1.
That unfortunately requires changes to both backend and pg_basebackup to
support fetch and stream modes respectively.
I've attached a preliminary patch for this. I'd appreciate feedback. I
plan to commit it in a couple of days, after some more
testing/rereading.
Attached are two updated patches that I am starting to backport
now. I've fixed a couple minor oversights. And tested the patches.
Pushed this after some major pain with backporting.
Thanks!
pg_basebackup really
changed heavily since its introduction. And desperately needs some
restructuring.
The patch seems to break pg_receivexlog. I got the following error message
while running pg_receivexlog.
pg_receivexlog: could not create archive status file
"mmm/archive_status/000000010000000000000003.done": No such file or
directory
Regards,
--
Fujii Masao
On 2015-01-05 16:22:56 +0900, Fujii Masao wrote:
On Sun, Jan 4, 2015 at 5:47 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2015-01-03 16:03:36 +0100, Andres Freund wrote:
pg_basebackup really
changed heavily since its introduction. And desperately needs some
restructuring.
The patch seems to break pg_receivexlog. I got the following error message
while running pg_receivexlog.
pg_receivexlog: could not create archive status file
"mmm/archive_status/000000010000000000000003.done": No such file or
directory
Dang. Stupid typo. And my tests didn't catch it, because I had
archive_directory in the target directory :(
At least it's only broken in master :/
Thanks for the catch. Do you have some additional testsuite or did you
catch it manually?
Greetings,
Andres Freund
On Mon, Jan 5, 2015 at 6:22 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2015-01-05 16:22:56 +0900, Fujii Masao wrote:
On Sun, Jan 4, 2015 at 5:47 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2015-01-03 16:03:36 +0100, Andres Freund wrote:
pg_basebackup really
changed heavily since its introduction. And desperately needs some
restructuring.
The patch seems to break pg_receivexlog. I got the following error message
while running pg_receivexlog.
pg_receivexlog: could not create archive status file
"mmm/archive_status/000000010000000000000003.done": No such file or
directory
Dang. Stupid typo. And my tests didn't catch it, because I had
archive_directory in the target directory :(
At least it's only broken in master :/
Thanks for the catch. Do you have some additional testsuite or did you
catch it manually?
Manually... I just tested the tools and options which the patch may affect...
Regards,
--
Fujii Masao
On 2015-01-05 18:49:06 +0900, Fujii Masao wrote:
On Mon, Jan 5, 2015 at 6:22 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2015-01-05 16:22:56 +0900, Fujii Masao wrote:
On Sun, Jan 4, 2015 at 5:47 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2015-01-03 16:03:36 +0100, Andres Freund wrote:
pg_basebackup really
changed heavily since its introduction. And desperately needs some
restructuring.
The patch seems to break pg_receivexlog. I got the following error message
while running pg_receivexlog.
pg_receivexlog: could not create archive status file
"mmm/archive_status/000000010000000000000003.done": No such file or
directory
Dang. Stupid typo. And my tests didn't catch it, because I had
archive_directory in the target directory :(
At least it's only broken in master :/
I've pushed the trivial fix, and verified using my adapted testscript
that it works on all branches.
Thanks for the catch. Do you have some additional testsuite or did you
catch it manually?
Manually... I just tested the tools and options which the patch may affect...
Thanks!
Greetings,
Andres Freund
Hi,
On Mon, Jan 5, 2015 at 4:34 AM, Andres Freund <andres@2ndquadrant.com> wrote:
pg_receivexlog: could not create archive status file
"mmm/archive_status/000000010000000000000003.done": No such file or
directory
Dang. Stupid typo. And my tests didn't catch it, because I had
archive_directory in the target directory :(
I started getting these errors after upgrading from 9.2.8 to 9.2.10.
Is it something critical that requires a version downgrade, or can I just
ignore these errors?
--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA
http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray.ru@gmail.com
On February 12, 2015 8:11:05 PM CET, Sergey Konoplev <gray.ru@gmail.com> wrote:
Hi,
On Mon, Jan 5, 2015 at 4:34 AM, Andres Freund <andres@2ndquadrant.com> wrote:
pg_receivexlog: could not create archive status file
"mmm/archive_status/000000010000000000000003.done": No such file or
directory
Dang. Stupid typo. And my tests didn't catch it, because I had
archive_directory in the target directory :(
I started getting these errors after upgrading from 9.2.8 to 9.2.10.
Is it something critical that requires a version downgrade, or can I just
ignore these errors?
What errors are you getting, and in precisely which circumstances? Are you using pg_receivexlog?
--
Please excuse brevity and formatting - I am writing this on my mobile phone.
Andres Freund http://www.2ndQuadrant.com/
On Thu, Feb 12, 2015 at 11:13 AM, Andres Freund <andres@2ndquadrant.com> wrote:
I started getting these errors after upgrading from 9.2.8 to 9.2.10.
Is it something critical that requires a version downgrade, or can I just
ignore these errors?
What errors are you getting, and in precisely which circumstances? Are you using pg_receivexlog?
Errors like this one
pg_receivexlog: could not create archive status file
"/mnt/archive/wal/archive_status/00000004000002AF000000B7.done": No
such file or directory
pg_receivexlog: disconnected..
on
Linux xyz 3.2.0-76-generic #111-Ubuntu SMP
PostgreSQL 9.2.10
Yes, I use pg_receivexlog. I also use a wrapper/watchdog script around
pg_receivexlog which tracks failures and restarts the latter.
The WAL files' modification times correlate with the pg_receivexlog failures.
postgres@xyz:~$ ls -ltr /mnt/archive/wal/ | tail
-rw------- 1 postgres postgres 16777216 Feb 12 10:58 00000004000002B600000011
-rw------- 1 postgres postgres 16777216 Feb 12 11:02 00000004000002B600000012
-rw------- 1 postgres postgres 16777216 Feb 12 11:06 00000004000002B600000013
-rw------- 1 postgres postgres 16777216 Feb 12 11:11 00000004000002B600000014
-rw------- 1 postgres postgres 16777216 Feb 12 11:15 00000004000002B600000015
-rw------- 1 postgres postgres 16777216 Feb 12 11:19 00000004000002B600000016
-rw------- 1 postgres postgres 16777216 Feb 12 11:23 00000004000002B600000017
-rw------- 1 postgres postgres 16777216 Feb 12 11:27 00000004000002B600000018
-rw------- 1 postgres postgres 16777216 Feb 12 11:30 00000004000002B600000019
-rw------- 1 postgres postgres 16777216 Feb 12 11:32 00000004000002B60000001A.partial
postgres@xyz:~$ cat /var/log/pgcookbook/manage_pitr-wal.log | tail
Thu Feb 12 11:15:18 PST 2015 ERROR manage_pitr.sh: Problem occured
during WAL archiving: pg_receivexlog: could not create archive status
file "/mnt/archive/wal/archive_status/00000004000002B600000015.done":
No such file or directory
pg_receivexlog: disconnected..
Thu Feb 12 11:19:33 PST 2015 ERROR manage_pitr.sh: Problem occured
during WAL archiving: pg_receivexlog: could not create archive status
file "/mnt/archive/wal/archive_status/00000004000002B600000016.done":
No such file or directory
pg_receivexlog: disconnected..
Thu Feb 12 11:23:38 PST 2015 ERROR manage_pitr.sh: Problem occured
during WAL archiving: pg_receivexlog: could not create archive status
file "/mnt/archive/wal/archive_status/00000004000002B600000017.done":
No such file or directory
pg_receivexlog: disconnected..
Thu Feb 12 11:27:32 PST 2015 ERROR manage_pitr.sh: Problem occured
during WAL archiving: pg_receivexlog: could not create archive status
file "/mnt/archive/wal/archive_status/00000004000002B600000018.done":
No such file or directory
pg_receivexlog: disconnected..
Thu Feb 12 11:30:34 PST 2015 ERROR manage_pitr.sh: Problem occured
during WAL archiving: pg_receivexlog: could not create archive status
file "/mnt/archive/wal/archive_status/00000004000002B600000019.done":
No such file or directory
pg_receivexlog: disconnected..
--
Kind regards,
Sergey Konoplev
Hi.
This obviously should not be the case. I'll have a look in a couple of hours. Until then you can likely just work around the problem by creating the archive_status directory.
--
Andres Freund http://www.2ndQuadrant.com/
On Thu, Feb 12, 2015 at 11:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:
This obviously should not be the case. I'll have a look in a couple of hours. Until then you can likely just work around the problem by creating the archive_status directory.
Thank you. Just let me know if you need some extra info or debugging.
--
Kind regards,
Sergey Konoplev
On 2015-02-12 11:44:05 -0800, Sergey Konoplev wrote:
On Thu, Feb 12, 2015 at 11:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:
This obviously should not be the case. I'll have a look in a couple of hours. Until then you can likely just work around the problem by creating the archive_status directory.
Thank you. Just let me know if you need some extra info or debugging.
No need for debugging. It's plain and simply a (cherry-pick) conflict I
resolved wrongly during backpatching. 9.3, 9.4 and master do not have
that problem. That whole fix was quite painful because every single
release had significantly different code :(. pg_basebackup/ is pretty
messy.
I'm not sure why my testsuite didn't trigger that problem. Possibly
because a retry makes things work :(
Somewhat luckily it's 9.2 only (9.3, 9.4 and master look correct, earlier
releases don't have pg_receivexlog) and can quite easily be worked
around by creating the archive_status directory.
If you want to fix it locally, you just need to replace
ReceiveXlogStream(conn, startpos, timeline, NULL, basedir,
stop_streaming, standby_message_timeout, false, true);
with
ReceiveXlogStream(conn, startpos, timeline, NULL, basedir,
stop_streaming, standby_message_timeout, false, false);
Yes, that and pretty much all other functions in that directory have too
many parameters.
Greetings,
Andres Freund
On Thu, Feb 12, 2015 at 4:18 PM, Andres Freund <andres@2ndquadrant.com> wrote:
No need for debugging. It's plain and simply a (cherry-pick) conflict I
resolved wrongly during backpatching. 9.3, 9.4 and master do not have
that problem. That whole fix was quite painful because every single
release had significantly different code :(. pg_basebackup/ is pretty
messy.
I'm not sure why my testsuite didn't trigger that problem. Possibly
because a retry makes things work :(
Somewhat luckily it's 9.2 only (9.3, 9.4 and master look correct, earlier
releases don't have pg_receivexlog) and can quite easily be worked
around by creating the archive_status directory.
The workaround works perfectly for me in this case; I'm going to
upgrade to 9.4 soon anyway.
Thank you, Andres.
--
Kind regards,
Sergey Konoplev
On Fri, Feb 13, 2015 at 9:18 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2015-02-12 11:44:05 -0800, Sergey Konoplev wrote:
On Thu, Feb 12, 2015 at 11:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:
This obviously should not be the case. I'll have a look in a couple of hours. Until then you can likely just work around the problem by creating the archive_status directory.
Thank you. Just let me know if you need some extra info or debugging.
No need for debugging. It's plain and simply a (cherry-pick) conflict I
resolved wrongly during backpatching. 9.3, 9.4 and master do not have
that problem. That whole fix was quite painful because every single
release had significantly different code :(. pg_basebackup/ is pretty
messy.
I'm not sure why my testsuite didn't trigger that problem. Possibly
because a retry makes things work :(
Somewhat luckily it's 9.2 only (9.3, 9.4 and master look correct, earlier
releases don't have pg_receivexlog)
Are you planning to back-patch the fix to 9.2?
Regards,
--
Fujii Masao
On 2015-02-17 12:18:41 +0900, Fujii Masao wrote:
On Fri, Feb 13, 2015 at 9:18 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Somewhat luckily it's 9.2 only (9.3, 9.4 and master look correct, earlier
releases don't have pg_receivexlog)
Are you planning to back-patch the fix to 9.2?
Yes, but I want to look through all versions to make sure there are no
other merge-resolution mistakes lurking.
Greetings,
Andres Freund
On 2015-02-17 12:18:41 +0900, Fujii Masao wrote:
On Fri, Feb 13, 2015 at 9:18 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2015-02-12 11:44:05 -0800, Sergey Konoplev wrote:
On Thu, Feb 12, 2015 at 11:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:
This obviously should not be the case. I'll have a look in a couple of hours. Until then you can likely just work around the problem by creating the archive_status directory.
Thank you. Just let me know if you need some extra info or debugging.
No need for debugging. It's plain and simply a (cherry-pick) conflict I
resolved wrongly during backpatching. 9.3, 9.4 and master do not have
that problem. That whole fix was quite painful because every single
release had significantly different code :(. pg_basebackup/ is pretty
messy.
I'm not sure why my testsuite didn't trigger that problem. Possibly
because a retry makes things work :(
Somewhat luckily it's 9.2 only (9.3, 9.4 and master look correct, earlier
releases don't have pg_receivexlog)
Are you planning to back-patch the fix to 9.2?
Now done. Thanks Sergey, Fujii. And sorry for the 9.2 screwup.
Greetings,
Andres Freund