pg_basebackup from cascading standby after timeline switch

Started by Heikki Linnakangasabout 13 years ago14 messages

hlinnakangas@vmware.com

about 13 years ago

pg_basebackup -x is supposed to include all the required WAL files in
the backup, so that you have everything needed to restore a consistent
database. However, it's not including the timeline history files.
Usually that's not a problem because normally you don't need to follow
any old timelines when restoring, but there is one scenario where it
causes a failure to restore:

Create a master, a standby, and a cascading standby. Kill the master
server, promote the standby to become new master, bumping the timeline.
After the cascading standby has followed the timeline switch (either
through the archive, which also works on 9.2, or directly via streaming
replication which only works on 9.3devel), take a base backup from the
cascading standby using pg_basebackup -x. When you try to start the
server from the new backup (without setting up a restore_command or
streaming replication), you get an error about "unexpected timeline ID 1
in log segment ..."

C 2012-12-17 15:55:25.732 EET 534 LOG: database system was interrupted
while in recovery at log time 2012-12-17 15:55:15 EET
C 2012-12-17 15:55:25.732 EET 534 HINT: If this has occurred more than
once some data might be corrupted and you might need to choose an
earlier recovery target.
C 2012-12-17 15:55:25.732 EET 534 LOG: creating missing WAL directory
"pg_xlog/archive_status"
C 2012-12-17 15:55:25.732 EET 534 LOG: unexpected timeline ID 1 in log
segment 000000020000000000000003, offset 0
C 2012-12-17 15:55:25.732 EET 534 LOG: invalid checkpoint record
C 2012-12-17 15:55:25.733 EET 534 FATAL: could not locate required
checkpoint record
C 2012-12-17 15:55:25.733 EET 534 HINT: If you are not restoring from a
backup, try removing the file
"/home/heikki/pgsql.master/data-standbyC/backup_label".
C 2012-12-17 15:55:25.733 EET 533 LOG: startup process (PID 534) exited
with exit code 1
C 2012-12-17 15:55:25.733 EET 533 LOG: aborting startup due to startup
process failure

The timeline was bumped within the log segment 000000020000000000000003,
so the beginning of the file uses timeline 1, up to the checkpoint
record that changes the timeline. Normally, recovery accepts that
because timeline 1 is an ancestor of timeline 2, but because the backup
does not include the timelime history file, it does not know that.

This does not happen if you run pg_basebackup against the master server,
because in the master it forces an xlog switch, which ensures that the
new xlog file only contains pages with the latest timeline ID. There's
even comments in pg_start_backup explaining that that's the reason for
the xlog switch:

/*
* Force an XLOG file switch before the checkpoint, to ensure that the
* WAL segment the checkpoint is written to doesn't contain pages with
* old timeline IDs. That would otherwise happen if you called
* pg_start_backup() right after restoring from a PITR archive: the
* first WAL segment containing the startup checkpoint has pages in
* the beginning with the old timeline ID. That can cause trouble at
* recovery: we won't have a history file covering the old timeline if
* pg_xlog directory was not included in the base backup and the WAL
* archive was cleared too before starting the backup.
*
* This also ensures that we have emitted a WAL page header that has
* XLP_BKP_REMOVABLE off before we emit the checkpoint record.
* Therefore, if a WAL archiver (such as pglesslog) is trying to
* compress out removable backup blocks, it won't remove any that
* occur after this point.
*
* During recovery, we skip forcing XLOG file switch, which means that
* the backup taken during recovery is not available for the special
* recovery case described above.
*/
if (!backup_started_in_recovery)
RequestXLogSwitch();

I'm not happy with the fact that we just ignore the problem in a backup
taken from a standby, silently giving the user a backup that won't start
up. Why not include the timeline history file in the backup? That seems
like a good idea regardless of this issue. I also wonder if
pg_basebackup should include *all* timeline history files in the backup,
not just the latest one strictly required to restore. They're fairly
small, so our approach has generally been to try to include them all in
the archive, and not try to prune them, so the same might make sense here.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

about 13 years ago

In reply to: Heikki Linnakangas (#1)

Re: pg_basebackup from cascading standby after timeline switch

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

I'm not happy with the fact that we just ignore the problem in a backup
taken from a standby, silently giving the user a backup that won't start
up. Why not include the timeline history file in the backup?

+1. I was not aware that we weren't doing that --- it seems pretty
foolish, especially since as you say they're tiny.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Magnus Hagander

magnus@hagander.net

about 13 years ago

In reply to: Tom Lane (#2)

Re: pg_basebackup from cascading standby after timeline switch

On Mon, Dec 17, 2012 at 5:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

I'm not happy with the fact that we just ignore the problem in a backup
taken from a standby, silently giving the user a backup that won't start
up. Why not include the timeline history file in the backup?

+1. I was not aware that we weren't doing that --- it seems pretty
foolish, especially since as you say they're tiny.

Yeah, +1. That should probably have been a part of the whole
"basebackup from slave" patch, so it can probably be considered a
back-patchable bugfix in itself, no?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Simon Riggs

simon@2ndQuadrant.com

about 13 years ago

In reply to: Heikki Linnakangas (#1)

Re: pg_basebackup from cascading standby after timeline switch

On 17 December 2012 14:16, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

I'm not happy with the fact that we just ignore the problem in a backup
taken from a standby, silently giving the user a backup that won't start up.

That's spooky. I just found a different issue with prmotion during
backup on your other thread.

Why not include the timeline history file in the backup? That seems like a
good idea regardless of this issue.

Yeh

I also wonder if pg_basebackup should
include *all* timeline history files in the backup, not just the latest one
strictly required to restore.

Why? the timeline history file includes the previous timelines already.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

about 13 years ago

In reply to: Simon Riggs (#4)

Re: pg_basebackup from cascading standby after timeline switch

Simon Riggs <simon@2ndQuadrant.com> writes:

On 17 December 2012 14:16, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

I also wonder if pg_basebackup should
include *all* timeline history files in the backup, not just the latest one
strictly required to restore.

Why? the timeline history file includes the previous timelines already.

The original intention was that the WAL archive might contain multiple
timeline files corresponding to various experimental recovery attempts;
furthermore, such files might be hand-annotated (that's why there's a
comment provision). So they would be at least as valuable from an
archival standpoint as the WAL files proper. I think we ought to just
copy all of them, period. Anything less is penny-wise and
pound-foolish.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Simon Riggs

simon@2ndQuadrant.com

about 13 years ago

In reply to: Tom Lane (#5)

Re: pg_basebackup from cascading standby after timeline switch

On 18 December 2012 00:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

On 17 December 2012 14:16, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

I also wonder if pg_basebackup should
include *all* timeline history files in the backup, not just the latest one
strictly required to restore.

Why? the timeline history file includes the previous timelines already.

The original intention was that the WAL archive might contain multiple
timeline files corresponding to various experimental recovery attempts;
furthermore, such files might be hand-annotated (that's why there's a
comment provision). So they would be at least as valuable from an
archival standpoint as the WAL files proper. I think we ought to just
copy all of them, period. Anything less is penny-wise and
pound-foolish.

What I'm saying is that the new history file is created from the old
one, so the latest file includes all the history as a direct copy. The
only thing new is one line of information.

Copying all files grows at O(N^2) with redundancy and will eventually
become a space problem and a performance issue for smaller systems.
There should be some limit to keep this sane, for example, the last 32
history files, or the last 1000 lines of history. Some sane limit.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Fujii Masao

masao.fujii@gmail.com

about 13 years ago

In reply to: Simon Riggs (#6)

Re: pg_basebackup from cascading standby after timeline switch

On Tue, Dec 18, 2012 at 8:09 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 18 December 2012 00:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

On 17 December 2012 14:16, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

I also wonder if pg_basebackup should
include *all* timeline history files in the backup, not just the latest one
strictly required to restore.

Why? the timeline history file includes the previous timelines already.

The original intention was that the WAL archive might contain multiple
timeline files corresponding to various experimental recovery attempts;
furthermore, such files might be hand-annotated (that's why there's a
comment provision). So they would be at least as valuable from an
archival standpoint as the WAL files proper. I think we ought to just
copy all of them, period. Anything less is penny-wise and
pound-foolish.

What I'm saying is that the new history file is created from the old
one, so the latest file includes all the history as a direct copy. The
only thing new is one line of information.

The timeline history file includes only ancestor timelines history. So
the latest one might not include all the history.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

about 13 years ago

In reply to: Fujii Masao (#7)

Re: pg_basebackup from cascading standby after timeline switch

Fujii Masao <masao.fujii@gmail.com> writes:

On Tue, Dec 18, 2012 at 8:09 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

What I'm saying is that the new history file is created from the old
one, so the latest file includes all the history as a direct copy. The
only thing new is one line of information.

The timeline history file includes only ancestor timelines history. So
the latest one might not include all the history.

Indeed. And even if there are a thousand of them, so what? It'd still
be less space than a single WAL segment file.

Better to keep the data than wish we had it later.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Heikki Linnakangas

hlinnakangas@vmware.com

about 13 years ago

In reply to: Magnus Hagander (#3)

2 attachment(s)

Re: pg_basebackup from cascading standby after timeline switch

On 17.12.2012 18:58, Magnus Hagander wrote:

On Mon, Dec 17, 2012 at 5:19 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Heikki Linnakangas<hlinnakangas@vmware.com> writes:

I'm not happy with the fact that we just ignore the problem in a backup
taken from a standby, silently giving the user a backup that won't start
up. Why not include the timeline history file in the backup?

+1. I was not aware that we weren't doing that --- it seems pretty
foolish, especially since as you say they're tiny.

Yeah, +1. That should probably have been a part of the whole
"basebackup from slave" patch, so it can probably be considered a
back-patchable bugfix in itself, no?

Yes, this should be backpatched to 9.2. I came up with the attached.

However, thinking about this some more, there's a another bug in the way
WAL files are included in the backup, when a timeline switch happens.
basebackup.c includes all the WAL files on ThisTimeLineID, but when the
backup is taken from a standby, the standby might've followed a timeline
switch. So it's possible that some of the WAL files should come from
timeline 1, while others should come from timeline 2. This leads to an
error like "requested WAL segment 00000001000000000000000C has already
been removed" in pg_basebackup.

Attached is a script to reproduce that bug, if someone wants to play
with it. It's a bit sensitive to timing, and needs tweaking the paths at
the top.

One solution to that would be to pay more attention to the timelines to
include WAL from. basebackup.c could read the timeline history file, to
see exactly where the timeline switches happened, and then construct the
filename of each WAL segment using the correct timeline id. Another
approach would be to do readdir() on pg_xlog, and include all WAL files,
regardless of timeline IDs, that fall in the right XLogRecPtr range. The
latter seems easier to backpatch.

- Heikki

Attachments:

include-all-tli-files-in-backup-1.patchtext/x-diff; name=include-all-tli-files-in-backup-1.patchDownload

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 65200c1..5c0deaa 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -12,12 +12,13 @@
  */
 #include "postgres.h"
 
+#include <string.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <unistd.h>
 #include <time.h>
 
-#include "access/xlog_internal.h"		/* for pg_start/stop_backup */
+#include "access/xlog_internal.h"
 #include "catalog/pg_type.h"
 #include "lib/stringinfo.h"
 #include "libpq/libpq.h"
@@ -44,6 +45,7 @@ typedef struct
 
 
 static int64 sendDir(char *path, int basepathlen, bool sizeonly);
+static void sendTimeLineHistoryFiles(void);
 static void sendFile(char *readfilename, char *tarfilename,
 		 struct stat * statbuf);
 static void sendFileWithContent(const char *filename, const char *content);
@@ -286,6 +288,27 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
 				break;
 		}
 
+		/*
+		 * Include all timeline history files.
+		 *
+		 * The timeline history files are usually not strictly required to
+		 * restore the backup, but if you take a backup from a standby server,
+		 * and the WAL segment containing the checkpoint record contains WAL
+		 * from an older timeline, recovery will complain on the older
+		 * timeline's ID if there is no timeline history file listing it. This
+		 * can happen if you take a backup right after promoting a standby to
+		 * become new master, and take the backup from a different, cascading
+		 * standby server.
+		 *
+		 * However, even when not strictly required, the timeline history
+		 * files are tiny, and provide a lot of forensic information about the
+		 * recovery history of the database, so it's best to always include
+		 * them all. (If asked to include WAL, that is. Otherwise you need a
+		 * WAL archive to restore anyway, and the timeline history files
+		 * should be present in the archive)
+		 */
+		sendTimeLineHistoryFiles();
+
 		/* Send CopyDone message for the last tar file */
 		pq_putemptymessage('c');
 	}
@@ -726,6 +749,58 @@ sendDir(char *path, int basepathlen, bool sizeonly)
 	return size;
 }
 
+/*
+ * Include all timeline history files from pg_xlog in the output tar stream.
+ */
+static void
+sendTimeLineHistoryFiles(void)
+{
+	DIR		   *dir;
+	struct dirent *de;
+	char		pathbuf[MAXPGPATH];
+	struct stat statbuf;
+
+	dir = AllocateDir("./pg_xlog");
+	while ((de = ReadDir(dir, "./pg_xlog")) != NULL)
+	{
+		CHECK_FOR_INTERRUPTS();
+
+		if (strlen(de->d_name) == 8 + strlen(".history") &&
+			strspn(de->d_name, "0123456789ABCDEF") == 8 &&
+			strcmp(de->d_name + 8, ".history") == 0)
+		{
+			/* It looks like a timeline history file. Include it. */
+			snprintf(pathbuf, MAXPGPATH, "./pg_xlog/%s", de->d_name);
+
+			if (lstat(pathbuf, &statbuf) != 0)
+			{
+				if (errno != ENOENT)
+					ereport(ERROR,
+							(errcode_for_file_access(),
+							 errmsg("could not stat file or directory \"%s\": %m",
+									pathbuf)));
+
+				/* If the file went away while scanning, it's no error. */
+				continue;
+			}
+
+			if (!S_ISREG(statbuf.st_mode))
+			{
+				/*
+				 * Huh? It's named like a timeline history file, but isn't a
+				 * regular file.
+				 */
+				ereport(WARNING,
+						(errmsg("skipping special file \"%s\"", pathbuf)));
+				continue;
+			}
+
+			sendFile(pathbuf, pathbuf + 1, &statbuf);
+		}
+	}
+	FreeDir(dir);
+}
+
 /*****
  * Functions for handling tar file format
  *

recipe12.shapplication/x-sh; name=recipe12.shDownload

#10

Amit kapila

amit.kapila@huawei.com

about 13 years ago

In reply to: Heikki Linnakangas (#9)

Re: pg_basebackup from cascading standby after timeline switch

On Friday, December 21, 2012 6:24 PM Heikki Linnakangas wrote:
On 17.12.2012 18:58, Magnus Hagander wrote:

On Mon, Dec 17, 2012 at 5:19 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Heikki Linnakangas<hlinnakangas@vmware.com> writes:

I'm not happy with the fact that we just ignore the problem in a backup
taken from a standby, silently giving the user a backup that won't start
up. Why not include the timeline history file in the backup?

+1. I was not aware that we weren't doing that --- it seems pretty
foolish, especially since as you say they're tiny.

Yeah, +1. That should probably have been a part of the whole
"basebackup from slave" patch, so it can probably be considered a
back-patchable bugfix in itself, no?

Yes, this should be backpatched to 9.2. I came up with the attached.

One solution to that would be to pay more attention to the timelines to
include WAL from. basebackup.c could read the timeline history file, to
see exactly where the timeline switches happened, and then construct the
filename of each WAL segment using the correct timeline id. Another
approach would be to do readdir() on pg_xlog, and include all WAL files,
regardless of timeline IDs, that fall in the right XLogRecPtr range. The
latter seems easier to backpatch.

I also think approach implemented by you is more better.
One small point, shouldn't it check (walsender_shutdown_requested || walsender_ready_to_stop) during ReadDir of pg_xlog similar to what is done in ReadDir() in SendDir?

With Regards,
Amit Kapila.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Fujii Masao

masao.fujii@gmail.com

about 13 years ago

In reply to: Heikki Linnakangas (#9)

Re: pg_basebackup from cascading standby after timeline switch

On Fri, Dec 21, 2012 at 9:54 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

Yes, this should be backpatched to 9.2. I came up with the attached.

In this patch, if '-X stream' is specified in pg_basebackup, the timeline
history files are not backed up. We should change pg_backup background
process and walsender so that they stream also timeline history files,
for example, by using 'TIMELINE_HISTORY' replication command?
Or basebackup.c should send all timeline history files at the end of backup
even if '-X stream' is specified?

However, thinking about this some more, there's a another bug in the way WAL
files are included in the backup, when a timeline switch happens.
basebackup.c includes all the WAL files on ThisTimeLineID, but when the
backup is taken from a standby, the standby might've followed a timeline
switch. So it's possible that some of the WAL files should come from
timeline 1, while others should come from timeline 2. This leads to an error
like "requested WAL segment 00000001000000000000000C has already been
removed" in pg_basebackup.

Attached is a script to reproduce that bug, if someone wants to play with
it. It's a bit sensitive to timing, and needs tweaking the paths at the top.

One solution to that would be to pay more attention to the timelines to
include WAL from. basebackup.c could read the timeline history file, to see
exactly where the timeline switches happened, and then construct the
filename of each WAL segment using the correct timeline id. Another approach
would be to do readdir() on pg_xlog, and include all WAL files, regardless
of timeline IDs, that fall in the right XLogRecPtr range. The latter seems
easier to backpatch.

The latter sounds good to me. But how all WAL files with different timelines
are shipped in pg_basebackup -X stream mode?

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Heikki Linnakangas

hlinnakangas@vmware.com

about 13 years ago

In reply to: Fujii Masao (#11)

Re: pg_basebackup from cascading standby after timeline switch

On 23.12.2012 15:33, Fujii Masao wrote:

On Fri, Dec 21, 2012 at 9:54 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

Yes, this should be backpatched to 9.2. I came up with the attached.

In this patch, if '-X stream' is specified in pg_basebackup, the timeline
history files are not backed up.

Good point.

We should change pg_backup background
process and walsender so that they stream also timeline history files,
for example, by using 'TIMELINE_HISTORY' replication command?
Or basebackup.c should send all timeline history files at the end of backup
even if '-X stream' is specified?

Perhaps. We should enhance pg_receivexlog to follow timeline switches,
anyway. I was thinking of leaving that as a todo item, but pg_basebackup
-X stream shares the code, so we should implement that now to get that
support into both.

In the problem you reported on the other thread
(http://archives.postgresql.org/message-id/50DB5EA9.7010406@vmware.com),
you also need the timeline history files, but that one didn't use "-X"
at all. Even if we teach pg_basebackup to fetch the timeline history
files in "-X stream" mode, that still leaves the problem on that other
thread.

The simplest solution would be to always include all timeline history
files in the backup, even if -X is not used. Currently, however, pg_xlog
is backed up as an empty directory in that case, but that would no
longer be the case if we start including timeline history files there. I
wonder if that would confuse any existing backup scripts people are using.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Heikki Linnakangas

hlinnakangas@vmware.com

about 13 years ago

In reply to: Heikki Linnakangas (#12)

2 attachment(s)

Re: pg_basebackup from cascading standby after timeline switch

On 27.12.2012 12:06, Heikki Linnakangas wrote:

On 23.12.2012 15:33, Fujii Masao wrote:

On Fri, Dec 21, 2012 at 9:54 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

Yes, this should be backpatched to 9.2. I came up with the attached.

In this patch, if '-X stream' is specified in pg_basebackup, the timeline
history files are not backed up.

Good point.

We should change pg_backup background
process and walsender so that they stream also timeline history files,
for example, by using 'TIMELINE_HISTORY' replication command?
Or basebackup.c should send all timeline history files at the end of
backup
even if '-X stream' is specified?

Perhaps. We should enhance pg_receivexlog to follow timeline switches,
anyway. I was thinking of leaving that as a todo item, but pg_basebackup
-X stream shares the code, so we should implement that now to get that
support into both.

In the problem you reported on the other thread
(http://archives.postgresql.org/message-id/50DB5EA9.7010406@vmware.com),
you also need the timeline history files, but that one didn't use "-X"
at all. Even if we teach pg_basebackup to fetch the timeline history
files in "-X stream" mode, that still leaves the problem on that other
thread.

The simplest solution would be to always include all timeline history
files in the backup, even if -X is not used. Currently, however, pg_xlog
is backed up as an empty directory in that case, but that would no
longer be the case if we start including timeline history files there. I
wonder if that would confuse any existing backup scripts people are using.

This thread has spread out a bit, so here's a summary of the remaining
issues and what I'm going to do about them:

9.2
---

If you take a backup with "pg_basebackup -X fetch", and the timeline
switches while the backup is taken, you currently get an error like
"requested WAL segment 00000001000000000000000C has already been
removed". To fix, let's change the server-side support of "-X fetch" to
include all WAL files between the backup start and end pointers,
regardless of timelines. I'm thinking of doing this by scanning pg_xlog
with readdir(), and sending over any files in that range. Another option
would be to read and parse the timeline history file to figure out the
exact filenames expected, but the readdir() approach seems simpler.

You also need the timeline history files. With "-X fetch", I think we
should always include them in the pg_xlog directory of the backup, along
with the WAL files themselves.

"-X stream" has a similar problem. If timeline changes during backup,
the replication will stop at the timeline switch, and the backup fails.
There isn't much we can do about that, as you can't follow a timeline
switch via streaming replication in 9.2. At best, we could try to detect
the situation and give a better error message.

With plain pg_basebackup, without -X option, you usually need a WAL
archive to restore. The only exception is when you initialize a
streaming standby with plain "pg_basebackup", intending to connect it to
the master right after taking the backup, so that it can stream all the
required WAL at that point. We have a problem with that scenario,
because even if there was no timeline switch while the backup was taken
(if there was, you're screwed the same as with "-X stream"), because of
the issue mentioned in the first post in this thread: the beginning of
the first WAL file contains the old timeline ID. Even though that's not
replayed, because the checkpoint is in the later part of the segment,
recovery still complains if there is a timeline ID in the beginning of
the file that we don't recognize as our ancestor. This could be fixed if
we somehow got the timeline history files in the backup, but I'm worried
that might break people's backup scripts. At the moment, the pg_xlog
directory in the backup is empty when -X is not used, we even spell that
out explicitly in the manual. Including timeline history files would
change that. That might be an issue if you symlink pg_xlog to a
different drive, for example. To make things worse, there is no timeline
history file for timeline 1, so you would not notice when you test your
backup scripts in simple cases.

In summary, in 9.2 I think we should fix "-X fetch" to tolerate a
timeline switch, and include all the timeline history files. The
behavior of other modes would not be changed, and they will fail if a
timeline changes during or just before backup.

Master
------

In master, we can try harder for the "-X stream" case, because you can
replicate a timeline switch, and fetch timeline history files via a
replication connection. Let's teach pg_receivexlog, and "pg_basebackup
-X stream", to use those facilities, so that even if the timeline
changes while the backup is taken, all the history files and WAL files
are correctly included in the backup. I'll start working on a patch to
do that.

That leaves one case not covered: If you take a backup with plain
"pg_basebackup" from a standby, without -X, and the first WAL segment
contains a timeline switch (ie. you take the backup right after a
failover), and you try to recover from it without a WAL archive, it
doesn't work. This is the original issue that started this thread,
except that I used "-x" in my original test case. The problem here is
that even though streaming replication will fetch the timeline history
file when it connects, at the very beginning of recovery, we expect that
we already have the timeline history file corresponding the initial
timeline available, either in pg_xlog or the WAL archive. By the time
streaming replication has connected and fetched the history file, we've
already initialized expectedTLEs to contain just the latest TLI. To fix
that, we should delay calling readTimeLineHistoryFile() until streaming
replication has connected and fetched the file.

Barring objections, I'll commit the attached two patches.
include-wal-files-from-all-timelines-in-base-backup-1.patch is for 9.2
and master, and it fixes the "pg_basebackup -X fetch" case.
delay-reading-timeline-history-file.patch is for master, and it changes
recovery so if a timeline history file for the initial target timeline
is fetched over streaming replication, expectedTLEs is initialized with
the streamed file. That fixes the plain "pg_basebackup" without -X case
on master.

What remains is to teach "pg_receivexlog" and "pg_basebackup -X stream"
to cross timeline changes. I'll start working on a patch for that.

- Heikki

Attachments:

delay-reading-timeline-history-file.patchtext/x-diff; name=delay-reading-timeline-history-file.patchDownload

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c847913..060dc08 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2675,6 +2675,7 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source)
 	char		path[MAXPGPATH];
 	ListCell   *cell;
 	int			fd;
+	List	   *tles;
 
 	/*
 	 * Loop looking for a suitable timeline ID: we might need to read any of
@@ -2685,8 +2686,21 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source)
 	 * to go backwards; this prevents us from picking up the wrong file when a
 	 * parent timeline extends to higher segment numbers than the child we
 	 * want to read.
-	 */
-	foreach(cell, expectedTLEs)
+	 *
+	 * If we haven't read the timeline history file yet, we don't know which
+	 * TLIs to scan, so read it now. We don't save the list in expectedTLEs,
+	 * however, unless we actually find a valid segment. That way if there is
+	 * neither timeline history file nor WAL segment in the archive, and
+	 * streaming replication is set up, we'll read the timeline history file
+	 * streamed from the master when we start streaming, instead of recovering
+	 * with a dummy history generated here.
+	 */
+	if (expectedTLEs)
+		tles = expectedTLEs;
+	else
+		tles = readTimeLineHistory(recoveryTargetTLI);
+
+	foreach(cell, tles)
 	{
 		TimeLineID	tli = ((TimeLineHistoryEntry *) lfirst(cell))->tli;
 
@@ -2699,6 +2713,8 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source)
 			if (fd != -1)
 			{
 				elog(DEBUG1, "got WAL segment from archive");
+				if (!expectedTLEs)
+					expectedTLEs = tles;
 				return fd;
 			}
 		}
@@ -2707,7 +2723,11 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source)
 		{
 			fd = XLogFileRead(segno, emode, tli, XLOG_FROM_PG_XLOG, true);
 			if (fd != -1)
+			{
+				if (!expectedTLEs)
+					expectedTLEs = tles;
 				return fd;
+			}
 		}
 	}
 
@@ -5279,49 +5299,6 @@ StartupXLOG(void)
 	 */
 	readRecoveryCommandFile();
 
-	/* Now we can determine the list of expected TLIs */
-	expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
-
-	/*
-	 * If the location of the checkpoint record is not on the expected
-	 * timeline in the history of the requested timeline, we cannot proceed:
-	 * the backup is not part of the history of the requested timeline.
-	 */
-	if (tliOfPointInHistory(ControlFile->checkPoint, expectedTLEs) !=
-			ControlFile->checkPointCopy.ThisTimeLineID)
-	{
-		XLogRecPtr switchpoint;
-
-		/*
-		 * tliSwitchPoint will throw an error if the checkpoint's timeline
-		 * is not in expectedTLEs at all.
-		 */
-		switchpoint = tliSwitchPoint(ControlFile->checkPointCopy.ThisTimeLineID, expectedTLEs);
-		ereport(FATAL,
-				(errmsg("requested timeline %u is not a child of this server's history",
-						recoveryTargetTLI),
-				 errdetail("Latest checkpoint is at %X/%X on timeline %u, but in the history of the requested timeline, the server forked off from that timeline at %X/%X",
-						   (uint32) (ControlFile->checkPoint >> 32),
-						   (uint32) ControlFile->checkPoint,
-						   ControlFile->checkPointCopy.ThisTimeLineID,
-						   (uint32) (switchpoint >> 32),
-						   (uint32) switchpoint)));
-	}
-
-	/*
-	 * The min recovery point should be part of the requested timeline's
-	 * history, too.
-	 */
-	if (!XLogRecPtrIsInvalid(ControlFile->minRecoveryPoint) &&
-		tliOfPointInHistory(ControlFile->minRecoveryPoint - 1, expectedTLEs) !=
-			ControlFile->minRecoveryPointTLI)
-		ereport(FATAL,
-				(errmsg("requested timeline %u does not contain minimum recovery point %X/%X on timeline %u",
-						recoveryTargetTLI,
-						(uint32) (ControlFile->minRecoveryPoint >> 32),
-						(uint32) ControlFile->minRecoveryPoint,
-						ControlFile->minRecoveryPointTLI)));
-
 	/*
 	 * Save archive_cleanup_command in shared memory so that other processes
 	 * can see it.
@@ -5443,6 +5420,47 @@ StartupXLOG(void)
 		wasShutdown = (record->xl_info == XLOG_CHECKPOINT_SHUTDOWN);
 	}
 
+	/*
+	 * If the location of the checkpoint record is not on the expected
+	 * timeline in the history of the requested timeline, we cannot proceed:
+	 * the backup is not part of the history of the requested timeline.
+	 */
+	Assert(expectedTLEs); /* was initialized by reading checkpoint record */
+	if (tliOfPointInHistory(checkPointLoc, expectedTLEs) !=
+			checkPoint.ThisTimeLineID)
+	{
+		XLogRecPtr switchpoint;
+
+		/*
+		 * tliSwitchPoint will throw an error if the checkpoint's timeline
+		 * is not in expectedTLEs at all.
+		 */
+		switchpoint = tliSwitchPoint(ControlFile->checkPointCopy.ThisTimeLineID, expectedTLEs);
+		ereport(FATAL,
+				(errmsg("requested timeline %u is not a child of this server's history",
+						recoveryTargetTLI),
+				 errdetail("Latest checkpoint is at %X/%X on timeline %u, but in the history of the requested timeline, the server forked off from that timeline at %X/%X",
+						   (uint32) (ControlFile->checkPoint >> 32),
+						   (uint32) ControlFile->checkPoint,
+						   ControlFile->checkPointCopy.ThisTimeLineID,
+						   (uint32) (switchpoint >> 32),
+						   (uint32) switchpoint)));
+	}
+
+	/*
+	 * The min recovery point should be part of the requested timeline's
+	 * history, too.
+	 */
+	if (!XLogRecPtrIsInvalid(ControlFile->minRecoveryPoint) &&
+		tliOfPointInHistory(ControlFile->minRecoveryPoint - 1, expectedTLEs) !=
+			ControlFile->minRecoveryPointTLI)
+		ereport(FATAL,
+				(errmsg("requested timeline %u does not contain minimum recovery point %X/%X on timeline %u",
+						recoveryTargetTLI,
+						(uint32) (ControlFile->minRecoveryPoint >> 32),
+						(uint32) ControlFile->minRecoveryPoint,
+						ControlFile->minRecoveryPointTLI)));
+
 	LastRec = RecPtr = checkPointLoc;
 
 	ereport(DEBUG1,
@@ -9569,13 +9587,24 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					 */
 					if (PrimaryConnInfo)
 					{
-						XLogRecPtr ptr = fetching_ckpt ? RedoStartLSN : RecPtr;
-						TimeLineID tli = tliOfPointInHistory(ptr, expectedTLEs);
+						XLogRecPtr ptr;
+						TimeLineID tli;
 
-						if (curFileTLI > 0 && tli < curFileTLI)
-							elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came from timeline %u",
-								 (uint32) (ptr >> 32), (uint32) ptr,
-								 tli, curFileTLI);
+						if (fetching_ckpt)
+						{
+							ptr = RedoStartLSN;
+							tli = ControlFile->checkPointCopy.ThisTimeLineID;
+						}
+						else
+						{
+							ptr = RecPtr;
+							tli = tliOfPointInHistory(ptr, expectedTLEs);
+
+							if (curFileTLI > 0 && tli < curFileTLI)
+								elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came from timeline %u",
+									 (uint32) (ptr >> 32), (uint32) ptr,
+									 tli, curFileTLI);
+						}
 						curFileTLI = tli;
 						RequestXLogStreaming(curFileTLI, ptr, PrimaryConnInfo);
 					}
@@ -9739,11 +9768,16 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 				{
 					/*
 					 * Great, streamed far enough.  Open the file if it's not
-					 * open already.  Use XLOG_FROM_STREAM so that source info
-					 * is set correctly and XLogReceiptTime isn't changed.
+					 * open already.  Also read the timeline history file if
+					 * we haven't initialized timeline history yet; it should
+					 * be streamed over and present in pg_xlog by now.  Use
+					 * XLOG_FROM_STREAM so that source info is set correctly
+					 * and XLogReceiptTime isn't changed.
 					 */
 					if (readFile < 0)
 					{
+						if (!expectedTLEs)
+							expectedTLEs = readTimeLineHistory(receiveTLI);
 						readFile = XLogFileRead(readSegNo, PANIC,
 												receiveTLI,
 												XLOG_FROM_STREAM, false);
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 326c313..99280d8 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -338,7 +338,7 @@ WalReceiverMain(void)
 		 * ensure that a unique timeline id is chosen in every case, but let's
 		 * avoid the confusion of timeline id collisions where we can.
 		 */
-		WalRcvFetchTimeLineHistoryFiles(startpointTLI + 1, primaryTLI);
+		WalRcvFetchTimeLineHistoryFiles(startpointTLI, primaryTLI);
 
 		/*
 		 * Start streaming.
@@ -627,7 +627,7 @@ WalRcvFetchTimeLineHistoryFiles(TimeLineID first, TimeLineID last)
 
 	for (tli = first; tli <= last; tli++)
 	{
-		if (!existsTimeLineHistory(tli))
+		if (tli != 1 && !existsTimeLineHistory(tli))
 		{
 			char	   *fname;
 			char	   *content;

include-wal-files-from-all-timelines-in-base-backup-1.patchtext/x-diff; name=include-wal-files-from-all-timelines-in-base-backup-1.patchDownload

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6f352fd..30d877b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3456,19 +3456,36 @@ PreallocXlogFiles(XLogRecPtr endptr)
 }
 
 /*
- * Get the log/seg of the latest removed or recycled WAL segment.
- * Returns 0/0 if no WAL segments have been removed since startup.
+ * Throws an error if the given log segment has already been removed or
+ * recycled. The caller should only pass a segment that it knows to have
+ * existed while the server has been running, as this function always
+ * succeeds if no WAL segments have been removed since startup.
+ * 'tli' is only used in the error message.
  */
 void
-XLogGetLastRemoved(uint32 *log, uint32 *seg)
+CheckXLogRemoved(uint32 log, uint32 seg, TimeLineID tli)
 {
 	/* use volatile pointer to prevent code rearrangement */
 	volatile XLogCtlData *xlogctl = XLogCtl;
+	uint32		lastRemovedLog,
+				lastRemovedSeg;
 
 	SpinLockAcquire(&xlogctl->info_lck);
-	*log = xlogctl->lastRemovedLog;
-	*seg = xlogctl->lastRemovedSeg;
+	lastRemovedLog = xlogctl->lastRemovedLog;
+	lastRemovedSeg = xlogctl->lastRemovedSeg;
 	SpinLockRelease(&xlogctl->info_lck);
+
+	if (log < lastRemovedLog ||
+		(log == lastRemovedLog && seg <= lastRemovedSeg))
+	{
+		char		filename[MAXFNAMELEN];
+
+		XLogFileName(filename, tli, log, seg);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("requested WAL segment %s has already been removed",
+						filename)));
+	}
 }
 
 /*
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index bc95215..3f46bfc 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -55,11 +55,10 @@ static void base_backup_cleanup(int code, Datum arg);
 static void perform_base_backup(basebackup_options *opt, DIR *tblspcdir);
 static void parse_basebackup_options(List *options, basebackup_options *opt);
 static void SendXlogRecPtrResult(XLogRecPtr ptr);
+static int compareWalFileNames(const void *a, const void *b);
 
 /*
  * Size of each block sent into the tar stream for larger files.
- *
- * XLogSegSize *MUST* be evenly dividable by this
  */
 #define TAR_SEND_SIZE 32768
 
@@ -219,70 +218,203 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
 	{
 		/*
 		 * We've left the last tar file "open", so we can now append the
-		 * required WAL files to it.
+		 * required WAL files to it. I'd rather not worry about timelines
+		 * here, so include WAL files belonging to any timeline, as long as
+		 * it's in the right WAL range, between 'startptr' and 'endptr'.
 		 */
+		char		pathbuf[MAXPGPATH];
 		uint32		logid,
 					logseg;
+		uint32		startlogid,
+					startlogseg;
 		uint32		endlogid,
 					endlogseg;
 		struct stat statbuf;
+		List	   *historyFileList = NIL;
+		List	   *walFileList = NIL;
+		char	  **walFiles;
+		int			nWalFiles;
+		char		firstoff[MAXFNAMELEN];
+		char		lastoff[MAXFNAMELEN];
+		DIR		   *dir;
+		struct dirent *de;
+		int			i;
+		ListCell   *lc;
+		TimeLineID	tli;
 
-		MemSet(&statbuf, 0, sizeof(statbuf));
-		statbuf.st_mode = S_IRUSR | S_IWUSR;
-#ifndef WIN32
-		statbuf.st_uid = geteuid();
-		statbuf.st_gid = getegid();
-#endif
-		statbuf.st_size = XLogSegSize;
-		statbuf.st_mtime = time(NULL);
-
-		XLByteToSeg(startptr, logid, logseg);
+		XLByteToSeg(startptr, startlogid, startlogseg);
+		XLogFileName(firstoff, ThisTimeLineID, startlogid, startlogseg);
 		XLByteToPrevSeg(endptr, endlogid, endlogseg);
-
-		while (true)
+		XLogFileName(lastoff, ThisTimeLineID, endlogid, endlogseg);
+		/* Read list of eligible WAL files into an array */
+		dir = AllocateDir("pg_xlog");
+		if (!dir)
+			ereport(ERROR,
+					(errmsg("could not open directory \"%s\": %m", "pg_xlog")));
+		while ((de = ReadDir(dir, "pg_xlog")) != NULL)
 		{
-			/* Send another xlog segment */
-			char		fn[MAXPGPATH];
-			int			i;
+			/* Does it look like a WAL segment, and is it in the range? */
+			if (strlen(de->d_name) == 24 &&
+				strspn(de->d_name, "0123456789ABCDEF") == 24 &&
+				strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
+				strcmp(de->d_name + 8, lastoff + 8) <= 0)
+			{
+				walFileList = lappend(walFileList, pstrdup(de->d_name));
+			}
+			/* Does it look like a timeline history file? */
+			else if (strlen(de->d_name) == 8 + strlen(".history") &&
+					 strspn(de->d_name, "0123456789ABCDEF") == 8 &&
+					 strcmp(de->d_name + 8, ".history") == 0)
+			{
+				historyFileList = lappend(historyFileList, pstrdup(de->d_name));
+			}
+		}
+		FreeDir(dir);
 
-			XLogFilePath(fn, ThisTimeLineID, logid, logseg);
-			_tarWriteHeader(fn, NULL, &statbuf);
+		/*
+		 * Before we go any further, check that none of the WAL segments we
+		 * need were removed.
+		 */
+		CheckXLogRemoved(startlogid, startlogseg, ThisTimeLineID);
+
+		/*
+		 * Put the WAL filenames into an array, and sort. We send the files
+		 * in order from oldest to newest, to reduce the chance that a file
+		 * is recycled before we get a chance to send it over.
+		 */
+		nWalFiles = list_length(walFileList);
+		walFiles = palloc(nWalFiles * sizeof(char *));
+		i = 0;
+		foreach(lc, walFileList)
+		{
+			walFiles[i++] = lfirst(lc);
+		}
+		qsort(walFiles, nWalFiles, sizeof(char *), compareWalFileNames);
 
-			/* Send the actual WAL file contents, block-by-block */
-			for (i = 0; i < XLogSegSize / TAR_SEND_SIZE; i++)
+		/*
+		 * Sanity check: the first and last segment should include startptr
+		 * and endptr, with no gaps in between.
+		 */
+		XLogFromFileName(walFiles[0], &tli, &logid, &logseg);
+		if (logid != startlogid || logseg != startlogseg)
+		{
+			char startfname[MAXFNAMELEN];
+			XLogFileName(startfname, ThisTimeLineID, startlogid, startlogseg);
+			ereport(ERROR,
+					(errmsg("could not find WAL file %s", startfname)));
+		}
+		for (i = 0; i < nWalFiles; i++)
+		{
+			int		currlogid = logid,
+					currlogseg = logseg;
+			int		nextlogid = logid,
+					nextlogseg = logseg;
+			NextLogSeg(nextlogid, nextlogseg);
+
+			XLogFromFileName(walFiles[i], &tli, &logid, &logseg);
+			if (!((nextlogid == logid && nextlogseg == logseg) ||
+				  (currlogid == logid && currlogseg == logseg)))
 			{
-				char		buf[TAR_SEND_SIZE];
-				XLogRecPtr	ptr;
+				char nextfname[MAXFNAMELEN];
+				XLogFileName(nextfname, ThisTimeLineID, nextlogid, nextlogseg);
+				ereport(ERROR,
+						(errmsg("could not find WAL file %s", nextfname)));
+			}
+		}
+		if (logid != endlogid || logseg != endlogseg)
+		{
+			char endfname[MAXFNAMELEN];
+			XLogFileName(endfname, ThisTimeLineID, endlogid, endlogseg);
+			ereport(ERROR,
+					(errmsg("could not find WAL file %s", endfname)));
+		}
 
-				ptr.xlogid = logid;
-				ptr.xrecoff = logseg * XLogSegSize + TAR_SEND_SIZE * i;
+		/* Ok, we have everything we need. Send the WAL files. */
+		for (i = 0; i < nWalFiles; i++)
+		{
+			FILE	   *fp;
+			char		buf[TAR_SEND_SIZE];
+			size_t		cnt;
+			pgoff_t		len = 0;
+
+			snprintf(pathbuf, MAXPGPATH, "./pg_xlog/%s", walFiles[i]);
+			XLogFromFileName(walFiles[i], &tli, &logid, &logseg);
 
+			fp = AllocateFile(pathbuf, "rb");
+			if (fp == NULL)
+			{
 				/*
-				 * Some old compilers, e.g. gcc 2.95.3/x86, think that passing
-				 * a struct in the same function as a longjump might clobber a
-				 * variable.  bjm 2011-02-04
-				 * http://lists.apple.com/archives/xcode-users/2003/Dec//msg000
-				 * 51.html
+				 * Most likely reason for this is that the file was already
+				 * removed by a checkpoint, so check for that to get a better
+				 * error message.
 				 */
-				XLogRead(buf, ptr, TAR_SEND_SIZE);
-				if (pq_putmessage('d', buf, TAR_SEND_SIZE))
+				CheckXLogRemoved(logid, logseg, tli);
+
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not open file \"%s\": %m", pathbuf)));
+			}
+
+			if (fstat(fileno(fp), &statbuf) != 0)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not stat file \"%s\": %m",
+								pathbuf)));
+			if (statbuf.st_size != XLogSegSize)
+			{
+				CheckXLogRemoved(logid, logseg, tli);
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("unexpected WAL file size \"%s\"", walFiles[i])));
+			}
+
+			_tarWriteHeader(pathbuf + 1, NULL, &statbuf);
+
+			while ((cnt = fread(buf, 1, Min(sizeof(buf), XLogSegSize - len), fp)) > 0)
+			{
+				CheckXLogRemoved(logid, logseg, tli);
+				/* Send the chunk as a CopyData message */
+				if (pq_putmessage('d', buf, cnt))
 					ereport(ERROR,
 							(errmsg("base backup could not send data, aborting backup")));
+
+				len += cnt;
+				if (len == XLogSegSize)
+					break;
 			}
 
-			/*
-			 * Files are always fixed size, and always end on a 512 byte
-			 * boundary, so padding is never necessary.
-			 */
+			if (len != XLogSegSize)
+			{
+				CheckXLogRemoved(logid, logseg, tli);
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("unexpected WAL file size \"%s\"", walFiles[i])));
+			}
 
+			/* XLogSegSize is a multiple of 512, so no need for padding */
+			FreeFile(fp);
+		}
 
-			/* Advance to the next WAL file */
-			NextLogSeg(logid, logseg);
+		/*
+		 * Send timeline history files too. Only the latest timeline history
+		 * file is required for recovery, and even that only if there happens
+		 * to be a timeline switch in the first WAL segment that contains the
+		 * checkpoint record, or if we're taking a base backup from a standby
+		 * server and the target timeline changes while the backup is taken. 
+		 * But they are small and highly useful for debugging purposes, so
+		 * better include them all, always.
+		 */
+		foreach(lc, historyFileList)
+		{
+			char *fname = lfirst(lc);
+			snprintf(pathbuf, MAXPGPATH, "./pg_xlog/%s", fname);
 
-			/* Have we reached our stop position yet? */
-			if (logid > endlogid ||
-				(logid == endlogid && logseg > endlogseg))
-				break;
+			if (lstat(pathbuf, &statbuf) != 0)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not stat file \"%s\": %m", pathbuf)));
+
+			sendFile(pathbuf, pathbuf + 1, &statbuf, false);
 		}
 
 		/* Send CopyDone message for the last tar file */
@@ -292,6 +424,19 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
 }
 
 /*
+ * qsort comparison function, to compare log/seg portion of WAL segment
+ * filenames, ignoring the timeline portion.
+ */
+static int
+compareWalFileNames(const void *a, const void *b)
+{
+	char *fna = *((char **) a);
+	char *fnb = *((char **) b);
+
+	return strcmp(fna + 8, fnb + 8);
+}
+
+/*
  * Parse the base backup options passed down by the parser
  */
 static void
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 6c27449..5c93146 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -977,8 +977,6 @@ XLogRead(char *buf, XLogRecPtr startptr, Size count)
 	char	   *p;
 	XLogRecPtr	recptr;
 	Size		nbytes;
-	uint32		lastRemovedLog;
-	uint32		lastRemovedSeg;
 	uint32		log;
 	uint32		seg;
 
@@ -1073,19 +1071,8 @@ retry:
 	 * read() succeeds in that case, but the data we tried to read might
 	 * already have been overwritten with new WAL records.
 	 */
-	XLogGetLastRemoved(&lastRemovedLog, &lastRemovedSeg);
 	XLByteToSeg(startptr, log, seg);
-	if (log < lastRemovedLog ||
-		(log == lastRemovedLog && seg <= lastRemovedSeg))
-	{
-		char		filename[MAXFNAMELEN];
-
-		XLogFileName(filename, ThisTimeLineID, log, seg);
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("requested WAL segment %s has already been removed",
-						filename)));
-	}
+	CheckXLogRemoved(log, seg, ThisTimeLineID);
 
 	/*
 	 * During recovery, the currently-open WAL file might be replaced with the
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index ecd3f0f..c21e43a 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -275,7 +275,7 @@ extern int XLogFileInit(uint32 log, uint32 seg,
 extern int	XLogFileOpen(uint32 log, uint32 seg);
 
 
-extern void XLogGetLastRemoved(uint32 *log, uint32 *seg);
+extern void CheckXLogRemoved(uint32 log, uint32 seg, TimeLineID tli);
 extern void XLogSetAsyncXactLSN(XLogRecPtr record);
 
 extern Buffer RestoreBackupBlock(XLogRecPtr lsn, XLogRecord *record,

#14

Greg Stark

stark@mit.edu

about 13 years ago

In reply to: Heikki Linnakangas (#13)

Re: pg_basebackup from cascading standby after timeline switch

On Wed, Jan 2, 2013 at 1:55 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

If you take a backup with "pg_basebackup -X fetch", and the timeline
switches while the backup is taken, you currently get an error like
"requested WAL segment 00000001000000000000000C has already been removed".
To fix, let's change the server-side support of "-X fetch" to include all
WAL files between the backup start and end pointers, regardless of
timelines. I'm thinking of doing this by scanning pg_xlog with readdir(),
and sending over any files in that range. Another option would be to read
and parse the timeline history file to figure out the exact filenames
expected, but the readdir() approach seems simpler.

I'm not clear what you mean by "any files in that range". There could
be other timelines in the archive that aren't relevant to the restore
at all. For example if the database you're requesting a backup from
has previously been restored from an old backup the archive could have
archives from the original timeline as well as the active timeline.

I'm trying to wrap my head around what other combinations are
possible. Is it possible there have been other false starts or
multiple timeline switches during the time the backup was being taken?
At first blush I think not, I think it's only possible for there to be
one timeline switch and it would be when a standby database was being
backed up and is activated while the backup was being taken.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers