v12 and TimeLine switches and backups/restores

Started by Stephen Frostover 5 years ago7 messages
#1Stephen Frost
sfrost@snowman.net
1 attachment(s)

Greetings,

Among the changes made to PG's recovery in v12 was to set
recovery_target_timeline to be 'latest' by default. That's handy when
you're flipping back and forth between replicas and want to have
everyone follow that game, but it's made doing some basic things like
restoring from a backup problematic.

Specifically, if you take a backup off a primary and, while that backup
is going on, some replica is promoted and drops a .history file into the
WAL repo, that backup is no longer able to be restored with the new
recovery_target_timeline default. What happens is that the restore
process will happily follow the timeline change- even though it happened
before we reached consistency, and then it'll never find the needed
end-of-backup WAL point that would allow us to reach consistency.

Naturally, a primary isn't ever going to do a TL switch, and we already
throw an error during an online backup from a replica if that replica
did a TL switch during the backup, to indicate that the backup isn't
valid.

Attached is an initial draft of a patch to at least give a somewhat
clearer error message when we detect that the user has asked us to
follow a timeline switch to a new timeline before we've reached
consistency (though I had to hack in a check to see if pg_rewind is
being used, since apparently it actually depends on PG following a
timeline switch before reaching consistency...).

Thoughts?

Thanks,

Stephen

Attachments:

error-on-TL-switch-backup-end-of-backup_v1.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
new file mode 100644
index fd93bcf..5dd777f
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** static void SetCurrentChunkStartTime(Tim
*** 896,902 ****
  static void CheckRequiredParameterValues(void);
  static void XLogReportParameters(void);
  static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
! 								TimeLineID prevTLI);
  static void LocalSetXLogInsertAllowed(void);
  static void CreateEndOfRecoveryRecord(void);
  static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
--- 896,902 ----
  static void CheckRequiredParameterValues(void);
  static void XLogReportParameters(void);
  static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
! 								TimeLineID prevTLI, bool pgrewind_replay);
  static void LocalSetXLogInsertAllowed(void);
  static void CreateEndOfRecoveryRecord(void);
  static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
*************** static void xlog_outdesc(StringInfo buf,
*** 946,952 ****
  static void pg_start_backup_callback(int code, Datum arg);
  static void pg_stop_backup_callback(int code, Datum arg);
  static bool read_backup_label(XLogRecPtr *checkPointLoc,
! 							  bool *backupEndRequired, bool *backupFromStandby);
  static bool read_tablespace_map(List **tablespaces);
  
  static void rm_redo_error_callback(void *arg);
--- 946,952 ----
  static void pg_start_backup_callback(int code, Datum arg);
  static void pg_stop_backup_callback(int code, Datum arg);
  static bool read_backup_label(XLogRecPtr *checkPointLoc,
! 							  bool *backupEndRequired, bool *backupFromStandby, bool *pgrewind_replay);
  static bool read_tablespace_map(List **tablespaces);
  
  static void rm_redo_error_callback(void *arg);
*************** StartupXLOG(void)
*** 6319,6324 ****
--- 6319,6325 ----
  	TransactionId oldestActiveXID;
  	bool		backupEndRequired = false;
  	bool		backupFromStandby = false;
+ 	bool		pgrewind_replay = false;
  	DBState		dbstate_at_startup;
  	XLogReaderState *xlogreader;
  	XLogPageReadPrivate private;
*************** StartupXLOG(void)
*** 6506,6512 ****
  	master_image_masked = (char *) palloc(BLCKSZ);
  
  	if (read_backup_label(&checkPointLoc, &backupEndRequired,
! 						  &backupFromStandby))
  	{
  		List	   *tablespaces = NIL;
  
--- 6507,6513 ----
  	master_image_masked = (char *) palloc(BLCKSZ);
  
  	if (read_backup_label(&checkPointLoc, &backupEndRequired,
! 						  &backupFromStandby, &pgrewind_replay))
  	{
  		List	   *tablespaces = NIL;
  
*************** StartupXLOG(void)
*** 7297,7303 ****
  					if (newTLI != ThisTimeLineID)
  					{
  						/* Check that it's OK to switch to this TLI */
! 						checkTimeLineSwitch(EndRecPtr, newTLI, prevTLI);
  
  						/* Following WAL records should be run with new TLI */
  						ThisTimeLineID = newTLI;
--- 7298,7304 ----
  					if (newTLI != ThisTimeLineID)
  					{
  						/* Check that it's OK to switch to this TLI */
! 						checkTimeLineSwitch(EndRecPtr, newTLI, prevTLI, pgrewind_replay);
  
  						/* Following WAL records should be run with new TLI */
  						ThisTimeLineID = newTLI;
*************** UpdateFullPageWrites(void)
*** 9841,9847 ****
   * replay. (Currently, timeline can only change at a shutdown checkpoint).
   */
  static void
! checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI, TimeLineID prevTLI)
  {
  	/* Check that the record agrees on what the current (old) timeline is */
  	if (prevTLI != ThisTimeLineID)
--- 9842,9848 ----
   * replay. (Currently, timeline can only change at a shutdown checkpoint).
   */
  static void
! checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI, TimeLineID prevTLI, bool pgrewind_replay)
  {
  	/* Check that the record agrees on what the current (old) timeline is */
  	if (prevTLI != ThisTimeLineID)
*************** checkTimeLineSwitch(XLogRecPtr lsn, Time
*** 9877,9882 ****
--- 9878,9894 ----
  						(uint32) minRecoveryPoint,
  						minRecoveryPointTLI)));
  
+ 	/*
+ 	 * If we have not yet found the end of a backup that had been performed
+ 	 * then switching to a new timeline is similairly trouble, as there's
+ 	 * zero chance we'll find the end-of-backup WAL record on the new timeline.
+ 	 */
+ 	if (InArchiveRecovery && !reachedConsistency && !pgrewind_replay)
+ 		ereport(PANIC,
+ 				(errmsg("unexpected timeline ID %u in checkpoint record, before finding end-of-backup WAL record on timeline %u",
+ 						newTLI,
+ 						ThisTimeLineID)));
+ 
  	/* Looks good */
  }
  
*************** GetOldestRestartPoint(XLogRecPtr *oldrec
*** 11541,11547 ****
   */
  static bool
  read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired,
! 				  bool *backupFromStandby)
  {
  	char		startxlogfilename[MAXFNAMELEN];
  	TimeLineID	tli_from_walseg,
--- 11553,11559 ----
   */
  static bool
  read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired,
! 				  bool *backupFromStandby, bool *pgrewind_replay)
  {
  	char		startxlogfilename[MAXFNAMELEN];
  	TimeLineID	tli_from_walseg,
*************** read_backup_label(XLogRecPtr *checkPoint
*** 11599,11604 ****
--- 11611,11618 ----
  	{
  		if (strcmp(backuptype, "streamed") == 0)
  			*backupEndRequired = true;
+ 		if (strcmp(backuptype, "pg_rewind") == 0)
+ 			*pgrewind_replay = true;
  	}
  
  	if (fscanf(lfp, "BACKUP FROM: %19s\n", backupfrom) == 1)
#2Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#1)
Re: v12 and TimeLine switches and backups/restores

On Wed, Jul 1, 2020 at 12:12 AM Stephen Frost <sfrost@snowman.net> wrote:

Among the changes made to PG's recovery in v12 was to set
recovery_target_timeline to be 'latest' by default. That's handy when
you're flipping back and forth between replicas and want to have
everyone follow that game, but it's made doing some basic things like
restoring from a backup problematic.

Specifically, if you take a backup off a primary and, while that backup
is going on, some replica is promoted and drops a .history file into the
WAL repo, that backup is no longer able to be restored with the new
recovery_target_timeline default. What happens is that the restore
process will happily follow the timeline change- even though it happened
before we reached consistency, and then it'll never find the needed
end-of-backup WAL point that would allow us to reach consistency.

Ouch. Should we revert that change rather than doing this? Seems like
this might create a lot of problems for people, and they might be
problems that happen rarely enough that it looks like it's working
until it doesn't. What's the fix, if you hit the error? Add
recovery_target_timeline=<the correct timeline> to
postgresql.auto.conf?

Typo: similairly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#2)
Re: v12 and TimeLine switches and backups/restores

Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:

On Wed, Jul 1, 2020 at 12:12 AM Stephen Frost <sfrost@snowman.net> wrote:

Among the changes made to PG's recovery in v12 was to set
recovery_target_timeline to be 'latest' by default. That's handy when
you're flipping back and forth between replicas and want to have
everyone follow that game, but it's made doing some basic things like
restoring from a backup problematic.

Specifically, if you take a backup off a primary and, while that backup
is going on, some replica is promoted and drops a .history file into the
WAL repo, that backup is no longer able to be restored with the new
recovery_target_timeline default. What happens is that the restore
process will happily follow the timeline change- even though it happened
before we reached consistency, and then it'll never find the needed
end-of-backup WAL point that would allow us to reach consistency.

Ouch. Should we revert that change rather than doing this? Seems like
this might create a lot of problems for people, and they might be
problems that happen rarely enough that it looks like it's working
until it doesn't. What's the fix, if you hit the error? Add
recovery_target_timeline=<the correct timeline> to
postgresql.auto.conf?

I don't really think reverting the change to make following the latest
timeline would end up being terribly helpful- an awful lot of systems
are going to be running with that anyway for HA and such, so it seems
like something we just need to deal with. As such, it seems like this
is also something that would need to be back-patched, though I've not
looked at how much effort that'll be (yet), since it probably makes
sense to get agreement on if this approach is the best first.

There's two solutions, really- first would be, as you suggest, configure
PG to stay on the timeline that the backup was taken on, but I suspect
that's often *not* what the user actually wants- what they really want
is to restore an earlier backup (one taken before the TL switch) and
then have PG follow the timeline switch when it comes across it. We're
looking at having pgbackrest automatically pick the correct backup to be
able to make that happen when someone requests timeline-latest (pretty
handy having a repo full of backups that allow us to pick the right one
based on what the user's request is).

There's another option here, though I rejected it, which is that we
could possibly force the restore to ignore a TL switch before reaching
consistency, but if we do that then, sure, we'll finish the restore but
we won't be on the TL that the user asked us to be, and we wouldn't be
able to follow a primary that's on that TL, so ultimately the restore
wouldn't actually be what the user wanted. There's really not an option
to do what the user wanted except to find an earlier backup to restore,
so that's why I'm proposing that if we hit this situation we just PANIC.

Typo: similairly.

Fixed locally.

Thanks!

Stephen

#4Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#3)
Re: v12 and TimeLine switches and backups/restores

On Wed, Jul 1, 2020 at 4:02 PM Stephen Frost <sfrost@snowman.net> wrote:

There's two solutions, really- first would be, as you suggest, configure
PG to stay on the timeline that the backup was taken on, but I suspect
that's often *not* what the user actually wants- what they really want
is to restore an earlier backup (one taken before the TL switch) and
then have PG follow the timeline switch when it comes across it.

It seems, though, that if it IS what the user actually wants, they're
now going to get the wrong behavior by default, and that seems pretty
undesirable.

There's another option here, though I rejected it, which is that we
could possibly force the restore to ignore a TL switch before reaching
consistency, but if we do that then, sure, we'll finish the restore but
we won't be on the TL that the user asked us to be, and we wouldn't be
able to follow a primary that's on that TL, so ultimately the restore
wouldn't actually be what the user wanted. There's really not an option
to do what the user wanted except to find an earlier backup to restore,
so that's why I'm proposing that if we hit this situation we just PANIC.

I'm not sure I really believe this. If someone tries to configure a
backup without inserting a non-default setting of
recovery_target_timeline, is it more likely that they want backup
restoration to fail, or that they want to recover from the timeline
that will let backup restoration succeed? You're arguing for the
former, but my instinct was the latter. Perhaps we need to hear some
other opinions.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#4)
Re: v12 and TimeLine switches and backups/restores

Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:

On Wed, Jul 1, 2020 at 4:02 PM Stephen Frost <sfrost@snowman.net> wrote:

There's two solutions, really- first would be, as you suggest, configure
PG to stay on the timeline that the backup was taken on, but I suspect
that's often *not* what the user actually wants- what they really want
is to restore an earlier backup (one taken before the TL switch) and
then have PG follow the timeline switch when it comes across it.

It seems, though, that if it IS what the user actually wants, they're
now going to get the wrong behavior by default, and that seems pretty
undesirable.

Well, even if we revert the change to the default of target_timeline, it
seems like we should still add the check that I'm proposing, to address
the case where someone explicitly asks for the latest timeline.

There's another option here, though I rejected it, which is that we
could possibly force the restore to ignore a TL switch before reaching
consistency, but if we do that then, sure, we'll finish the restore but
we won't be on the TL that the user asked us to be, and we wouldn't be
able to follow a primary that's on that TL, so ultimately the restore
wouldn't actually be what the user wanted. There's really not an option
to do what the user wanted except to find an earlier backup to restore,
so that's why I'm proposing that if we hit this situation we just PANIC.

I'm not sure I really believe this. If someone tries to configure a
backup without inserting a non-default setting of
recovery_target_timeline, is it more likely that they want backup
restoration to fail, or that they want to recover from the timeline
that will let backup restoration succeed? You're arguing for the
former, but my instinct was the latter. Perhaps we need to hear some
other opinions.

Ultimately depends on if the user is knowledgable regarding what the
default is, or not. I'm going off the expectation that they know what
the default value is and the other argument is that they have no idea
what the default is and just expect the restore to work- which isn't a
wrong position to take, but the entire situation is only going to
happen if there's been a promotion involving a replica in the first
place, and that newly-promoted-replica pushed a .history file into the
same WAL repo that this server is following the WAL from, and if you're
running with replicas and you promote them, you probably do want to be
using a target timeline of 'latest' or your replicas won't follow those
timeline switches.

Changing the default now in a back-patch would actively break such
setups that are working now in a very non-obvious way too, only to be
discovered when a replica is promoted and another replica stops keeping
up because it keeps on its current timeline.

In the above situation, the restore will fail either way from what I've
seen- if we hit end-of-WAL before reaching consistency then we'll PANIC,
or if we come across a SHUTDOWN record, we'll also PANIC, so it's not
like the user is going to get a successful restore that's just
corrupted, thankfully. Catching this earlier with a clearer error
message, as I'm proposing here, seems like it would generally be helpful
though (perhaps with an added HINT: use an earlier backup to restore
from...).

Thanks,

Stephen

#6Michael Banck
michael.banck@credativ.de
In reply to: Stephen Frost (#1)
Re: v12 and TimeLine switches and backups/restores

Hi,

On Wed, Jul 01, 2020 at 12:12:14AM -0400, Stephen Frost wrote:

Specifically, if you take a backup off a primary and, while that backup
is going on, some replica is promoted and drops a .history file into the
WAL repo, that backup is no longer able to be restored with the new
recovery_target_timeline default.

Quick question to grasp the magnitude of this:

If a user takes a backup with pg_basebackup in streaming mode, would
that still be a problem? Or is this "only" a problem for base backups
which go through a wal archive common between primary and standby?

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB M�nchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 M�nchengladbach
Gesch�ftsf�hrung: Dr. Michael Meskes, J�rg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

#7Stephen Frost
sfrost@snowman.net
In reply to: Michael Banck (#6)
Re: v12 and TimeLine switches and backups/restores

Greetings,

* Michael Banck (michael.banck@credativ.de) wrote:

On Wed, Jul 01, 2020 at 12:12:14AM -0400, Stephen Frost wrote:

Specifically, if you take a backup off a primary and, while that backup
is going on, some replica is promoted and drops a .history file into the
WAL repo, that backup is no longer able to be restored with the new
recovery_target_timeline default.

Quick question to grasp the magnitude of this:

If a user takes a backup with pg_basebackup in streaming mode, would
that still be a problem? Or is this "only" a problem for base backups
which go through a wal archive common between primary and standby?

That's a bit complicated to answer.

1) If the pg_basebackup is taken off of the primary, or some
non-promoted replica, and the user fetches WAL during the backup and
does *not* configure a restore_command, then the backup should restore
just fine using the WAL that was fetched/streamed from the primary,
along the original timeline. Of course, that system won't be then able
to follow the new primary that was promoted during the backup.

2) If the pg_basebackup is taken off of the primary, or some other
replica, and the user *does* configure a restore_command, and a
promotion happens during the backup and that former-replica then pushes
a .history file into the repo that the restore_command is configured to
use, then I'm pretty sure this issue would be hit during the restore
(though I haven't specifically tested that, but we do go out and look
for timelines pretty early on).

3) If the pg_basebackup is taken off of the replica that's promoted, the
pg_basebackup will actually fail and error and there won't be a valid
backup in the first place.

Thanks,

Stephen