Error restoring from a base backup taken from standby

Started by Heikki Linnakangasabout 13 years ago6 messages
#1Heikki Linnakangas
hlinnakangas@vmware.com
2 attachment(s)

(This is different from the other issue related to timeline switches I
just posted about. There's no timeline switch involved in this one.)

If you do "pg_basebackup -x" against a standby server, in some
circumstances the backup fails to restore with error like this:

C 2012-12-17 19:09:44.042 EET 7832 LOG: database system was not
properly shut down; automatic recovery in progress
C 2012-12-17 19:09:44.091 EET 7832 LOG: record with zero length at
0/1764F48
C 2012-12-17 19:09:44.091 EET 7832 LOG: redo is not required
C 2012-12-17 19:09:44.091 EET 7832 FATAL: WAL ends before end of online
backup
C 2012-12-17 19:09:44.091 EET 7832 HINT: All WAL generated while online
backup was taken must be available at recovery.
C 2012-12-17 19:09:44.092 EET 7831 LOG: startup process (PID 7832)
exited with exit code 1
C 2012-12-17 19:09:44.092 EET 7831 LOG: aborting startup due to startup
process failure

I spotted this bug while reading the code, and it took me quite a while
to actually construct a test case to reproduce the bug, so let me begin
by discussing the code where the bug is. You get the above error, "WAL
ends before end of online backup", when you reach the end of WAL before
reaching the backupEndPoint stored in the control file, which originally
comes from the backup_label file. backupEndPoint is only used in a base
backup taken from a standby, in a base backup taken from the master, the
end-of-backup WAL record is used instead to mark the end of backup. In
the xlog redo loop, after replaying each record, we check if we've just
reached backupEndPoint, and clear it from the control file if we have.
Now the problem is, if there are no WAL records after the checkpoint
redo point, we never even enter the redo loop, so backupEndPoint is not
cleared even though it's reached immediately after reading the initial
checkpoint record.

To deal with the similar situation wrt. reaching consistency for hot
standby purposes, we call CheckRecoveryConsistency() before the redo
loop. The straightforward fix is to copy-paste the check for
backupEndPoint to just before the redo loop, next to the
CheckRecoveryConsistency() call. Even better, I think we should move the
backupEndPoint check into CheckRecoveryConsistency(). It's already
responsible for keeping track of whether minRecoveryPoint has been
reached, so it seems like a good idea to do this check there as well.

Attached is a patch for that (for 9.2), as well as a script I used to
reproduce the bug. The script is a bit messy, and requires tweaking the
paths at the top. Anyone spot a problem with this?

- Heikki

Attachments:

fix-end-of-standby-backup-1.patchtext/x-diff; name=fix-end-of-standby-backup-1.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 702487b..1d1e47b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6776,27 +6776,6 @@ StartupXLOG(void)
 				/* Pop the error context stack */
 				error_context_stack = errcontext.previous;
 
-				if (!XLogRecPtrIsInvalid(ControlFile->backupEndPoint) &&
-					XLByteLE(ControlFile->backupEndPoint, EndRecPtr))
-				{
-					/*
-					 * We have reached the end of base backup, the point where
-					 * the minimum recovery point in pg_control indicates. The
-					 * data on disk is now consistent. Reset backupStartPoint
-					 * and backupEndPoint.
-					 */
-					elog(DEBUG1, "end of backup reached");
-
-					LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
-
-					MemSet(&ControlFile->backupStartPoint, 0, sizeof(XLogRecPtr));
-					MemSet(&ControlFile->backupEndPoint, 0, sizeof(XLogRecPtr));
-					ControlFile->backupEndRequired = false;
-					UpdateControlFile();
-
-					LWLockRelease(ControlFileLock);
-				}
-
 				/*
 				 * Update lastReplayedEndRecPtr after this record has been
 				 * successfully replayed.
@@ -7177,6 +7156,34 @@ CheckRecoveryConsistency(void)
 		return;
 
 	/*
+	 * Have we reached the point where our base backup was completed?
+	 */
+	if (!XLogRecPtrIsInvalid(ControlFile->backupEndPoint) &&
+		XLByteLE(ControlFile->backupEndPoint, EndRecPtr))
+	{
+		/*
+		 * We have reached the end of base backup, as indicated by pg_control.
+		 * The data on disk is now consistent. Reset backupStartPoint and
+		 * backupEndPoint, and update minRecoveryPoint to make sure we don't
+		 * allow starting up at an earlier point even if recovery is stopped
+		 * and restarted soon after this.
+		 */
+		elog(DEBUG1, "end of backup reached");
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+		if (XLByteLT(ControlFile->minRecoveryPoint, EndRecPtr))
+			ControlFile->minRecoveryPoint = EndRecPtr;
+
+		MemSet(&ControlFile->backupStartPoint, 0, sizeof(XLogRecPtr));
+		MemSet(&ControlFile->backupEndPoint, 0, sizeof(XLogRecPtr));
+		ControlFile->backupEndRequired = false;
+		UpdateControlFile();
+
+		LWLockRelease(ControlFileLock);
+	}
+
+	/*
 	 * Have we passed our safe starting point? Note that minRecoveryPoint
 	 * is known to be incorrectly set if ControlFile->backupEndRequired,
 	 * until the XLOG_BACKUP_RECORD arrives to advise us of the correct
fix-end-of-standby-backup-1.shapplication/x-sh; name=fix-end-of-standby-backup-1.shDownload
#2Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#1)
Re: Error restoring from a base backup taken from standby

On 17 December 2012 17:39, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

(This is different from the other issue related to timeline switches I just
posted about. There's no timeline switch involved in this one.)

If you do "pg_basebackup -x" against a standby server, in some circumstances
the backup fails to restore with error like this:

C 2012-12-17 19:09:44.042 EET 7832 LOG: database system was not properly
shut down; automatic recovery in progress
C 2012-12-17 19:09:44.091 EET 7832 LOG: record with zero length at
0/1764F48
C 2012-12-17 19:09:44.091 EET 7832 LOG: redo is not required
C 2012-12-17 19:09:44.091 EET 7832 FATAL: WAL ends before end of online
backup
C 2012-12-17 19:09:44.091 EET 7832 HINT: All WAL generated while online
backup was taken must be available at recovery.
C 2012-12-17 19:09:44.092 EET 7831 LOG: startup process (PID 7832) exited
with exit code 1
C 2012-12-17 19:09:44.092 EET 7831 LOG: aborting startup due to startup
process failure

I spotted this bug while reading the code, and it took me quite a while to
actually construct a test case to reproduce the bug, so let me begin by
discussing the code where the bug is. You get the above error, "WAL ends
before end of online backup", when you reach the end of WAL before reaching
the backupEndPoint stored in the control file, which originally comes from
the backup_label file. backupEndPoint is only used in a base backup taken
from a standby, in a base backup taken from the master, the end-of-backup
WAL record is used instead to mark the end of backup. In the xlog redo loop,
after replaying each record, we check if we've just reached backupEndPoint,
and clear it from the control file if we have. Now the problem is, if there
are no WAL records after the checkpoint redo point, we never even enter the
redo loop, so backupEndPoint is not cleared even though it's reached
immediately after reading the initial checkpoint record.

To deal with the similar situation wrt. reaching consistency for hot standby
purposes, we call CheckRecoveryConsistency() before the redo loop. The
straightforward fix is to copy-paste the check for backupEndPoint to just
before the redo loop, next to the CheckRecoveryConsistency() call. Even
better, I think we should move the backupEndPoint check into
CheckRecoveryConsistency(). It's already responsible for keeping track of
whether minRecoveryPoint has been reached, so it seems like a good idea to
do this check there as well.

Attached is a patch for that (for 9.2), as well as a script I used to
reproduce the bug. The script is a bit messy, and requires tweaking the
paths at the top. Anyone spot a problem with this?

Yep. The problem is one of design, not merely a coding error.

The design assumes that the master is up, connected and doing work.
Which is perfectly reasonable, since this is the reason we are doing a
backup from the standby. But obviously not always true, so what we're
saying is that the selection of backupEndPoint is incorrect in the
first place. Putting code in to cope with that poor choice at recovery
seems wrong - we should record the correct location.

Comments in do_pg_start_backup() say this

* We return the current minimum recovery point as the backup end
* location. Note that it can be greater than the exact backup end
* location if the minimum recovery point is updated after the backup of
* pg_control. This is harmless for current uses.

Which would stay wrong if we apply the proposed patch.

I think we should record the most recently received LSN, which I
notice is not the misnamed receivedUpto in xlog.c
Misnamed because it refers to what has been received and written
rather than what has been received.

So I suggest using latestWalEnd if streaming or similar for file access.

In passing, I see another problem as well. We assume that the backup
is still valid if we promote the standby to a master during the
backup, but that isn't documented and so isn't full analysed to see
that is true. I haven't tried to construct a failure case out of that,
as yet.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Simon Riggs (#2)
Re: Error restoring from a base backup taken from standby

On 18.12.2012 00:35, Simon Riggs wrote:

On 17 December 2012 17:39, Heikki Linnakangas<hlinnakangas@vmware.com> wrote:

(This is different from the other issue related to timeline switches I just
posted about. There's no timeline switch involved in this one.)

If you do "pg_basebackup -x" against a standby server, in some circumstances
the backup fails to restore with error like this:

C 2012-12-17 19:09:44.042 EET 7832 LOG: database system was not properly
shut down; automatic recovery in progress
C 2012-12-17 19:09:44.091 EET 7832 LOG: record with zero length at
0/1764F48
C 2012-12-17 19:09:44.091 EET 7832 LOG: redo is not required
C 2012-12-17 19:09:44.091 EET 7832 FATAL: WAL ends before end of online
backup
C 2012-12-17 19:09:44.091 EET 7832 HINT: All WAL generated while online
backup was taken must be available at recovery.
C 2012-12-17 19:09:44.092 EET 7831 LOG: startup process (PID 7832) exited
with exit code 1
C 2012-12-17 19:09:44.092 EET 7831 LOG: aborting startup due to startup
process failure

I spotted this bug while reading the code, and it took me quite a while to
actually construct a test case to reproduce the bug, so let me begin by
discussing the code where the bug is. You get the above error, "WAL ends
before end of online backup", when you reach the end of WAL before reaching
the backupEndPoint stored in the control file, which originally comes from
the backup_label file. backupEndPoint is only used in a base backup taken
from a standby, in a base backup taken from the master, the end-of-backup
WAL record is used instead to mark the end of backup. In the xlog redo loop,
after replaying each record, we check if we've just reached backupEndPoint,
and clear it from the control file if we have. Now the problem is, if there
are no WAL records after the checkpoint redo point, we never even enter the
redo loop, so backupEndPoint is not cleared even though it's reached
immediately after reading the initial checkpoint record.

To deal with the similar situation wrt. reaching consistency for hot standby
purposes, we call CheckRecoveryConsistency() before the redo loop. The
straightforward fix is to copy-paste the check for backupEndPoint to just
before the redo loop, next to the CheckRecoveryConsistency() call. Even
better, I think we should move the backupEndPoint check into
CheckRecoveryConsistency(). It's already responsible for keeping track of
whether minRecoveryPoint has been reached, so it seems like a good idea to
do this check there as well.

Attached is a patch for that (for 9.2), as well as a script I used to
reproduce the bug. The script is a bit messy, and requires tweaking the
paths at the top. Anyone spot a problem with this?

Yep. The problem is one of design, not merely a coding error.

The design assumes that the master is up, connected and doing work.
Which is perfectly reasonable, since this is the reason we are doing a
backup from the standby. But obviously not always true, so what we're
saying is that the selection of backupEndPoint is incorrect in the
first place. Putting code in to cope with that poor choice at recovery
seems wrong - we should record the correct location.

Hmm, pg_controlinfo says:

pg_control version number: 922
Catalog version number: 201204301
Database system identifier: 5823207860608399803
Database cluster state: in crash recovery
pg_control last modified: ti 18. joulukuuta 2012 11.00.58
Latest checkpoint location: 0/1764EE8
Prior checkpoint location: 0/1764EE8
Latest checkpoint's REDO location: 0/1764EE8
Latest checkpoint's TimeLineID: 1
...
Minimum recovery ending location: 0/1764F48
Backup start location: 0/1764EE8
Backup end location: 0/1764F48
...

That seems correct to me. The backup is considered valid after reaching
0/1764F48, which is where the checkpoint record ends. minRecoveryPoint
is set to the same. What do you think it should be set to?

In passing, I see another problem as well. We assume that the backup
is still valid if we promote the standby to a master during the
backup, but that isn't documented and so isn't full analysed to see
that is true.

No, that is taken care of in do_pg_stop_backup:

/*
* Parse the BACKUP FROM line. If we are taking an online backup from the
* standby, we confirm that the standby has not been promoted during the
* backup.
*/
ptr = strstr(remaining, "BACKUP FROM:");
if (!ptr || sscanf(ptr, "BACKUP FROM: %19s\n", backupfrom) != 1)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
if (strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("the standby was promoted during online backup"),
errhint("This means that the backup being taken is corrupt "
"and should not be used. "
"Try taking another online backup.")));

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#3)
Re: Error restoring from a base backup taken from standby

On 18 December 2012 09:18, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

That seems correct to me. The backup is considered valid after reaching
0/1764F48, which is where the checkpoint record ends. minRecoveryPoint is
set to the same. What do you think it should be set to?

I already said?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Simon Riggs (#4)
Re: Error restoring from a base backup taken from standby

On 18.12.2012 11:30, Simon Riggs wrote:

On 18 December 2012 09:18, Heikki Linnakangas<hlinnakangas@vmware.com> wrote:

That seems correct to me. The backup is considered valid after reaching
0/1764F48, which is where the checkpoint record ends. minRecoveryPoint is
set to the same. What do you think it should be set to?

I already said?

In this situation, there are no more WAL records in the standby the
backup was taken from, after the checkpoint record. Using receivedUpto
would yield the same value, and you'd still have the same problem.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Fujii Masao
masao.fujii@gmail.com
In reply to: Heikki Linnakangas (#1)
Re: Error restoring from a base backup taken from standby

On Tue, Dec 18, 2012 at 2:39 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

(This is different from the other issue related to timeline switches I just
posted about. There's no timeline switch involved in this one.)

If you do "pg_basebackup -x" against a standby server, in some circumstances
the backup fails to restore with error like this:

C 2012-12-17 19:09:44.042 EET 7832 LOG: database system was not properly
shut down; automatic recovery in progress
C 2012-12-17 19:09:44.091 EET 7832 LOG: record with zero length at
0/1764F48
C 2012-12-17 19:09:44.091 EET 7832 LOG: redo is not required
C 2012-12-17 19:09:44.091 EET 7832 FATAL: WAL ends before end of online
backup
C 2012-12-17 19:09:44.091 EET 7832 HINT: All WAL generated while online
backup was taken must be available at recovery.
C 2012-12-17 19:09:44.092 EET 7831 LOG: startup process (PID 7832) exited
with exit code 1
C 2012-12-17 19:09:44.092 EET 7831 LOG: aborting startup due to startup
process failure

I spotted this bug while reading the code, and it took me quite a while to
actually construct a test case to reproduce the bug, so let me begin by
discussing the code where the bug is. You get the above error, "WAL ends
before end of online backup", when you reach the end of WAL before reaching
the backupEndPoint stored in the control file, which originally comes from
the backup_label file. backupEndPoint is only used in a base backup taken
from a standby, in a base backup taken from the master, the end-of-backup
WAL record is used instead to mark the end of backup. In the xlog redo loop,
after replaying each record, we check if we've just reached backupEndPoint,
and clear it from the control file if we have. Now the problem is, if there
are no WAL records after the checkpoint redo point, we never even enter the
redo loop, so backupEndPoint is not cleared even though it's reached
immediately after reading the initial checkpoint record.

Good catch!

To deal with the similar situation wrt. reaching consistency for hot standby
purposes, we call CheckRecoveryConsistency() before the redo loop. The
straightforward fix is to copy-paste the check for backupEndPoint to just
before the redo loop, next to the CheckRecoveryConsistency() call. Even
better, I think we should move the backupEndPoint check into
CheckRecoveryConsistency(). It's already responsible for keeping track of
whether minRecoveryPoint has been reached, so it seems like a good idea to
do this check there as well.

Attached is a patch for that (for 9.2), as well as a script I used to
reproduce the bug.

The patch looks good to me.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers