xlog.c: removing ReadRecPtr and EndRecPtr

Started by Robert Haasabout 4 years ago13 messages

robertmhaas@gmail.com

about 4 years ago

1 attachment(s)

I spent a lot of time trying to figure out why xlog.c has global
variables ReadRecPtr and EndRecPtr instead of just relying on the
eponymous structure members inside the XLogReaderState. I concluded
that the values are the same at most points in the code, and thus that
we could just use xlogreaderstate->{Read,End}RecPtr instead. There are
two places where this wouldn't produce the same results we're getting
today. Both of those appear to be bugs.

The reason why it's generally the case that ReadRecPtr ==
xlogreaderstate->ReadRecPtr and likewise for EndRecPtr is that
ReadRecord() calls XLogReadRecord(), which sets the values inside the
XLogReaderState, and then immediately assigns the values stored there
to the global variables. There's no other code that changes the other
global variables, and the only other code that changes the structure
members is XLogBeginRead(). So the values can be unequal from the time
XLogBeginRead() is called up until the time that the XLogReadRecord()
call inside ReadRecord() returns. In practice, StartupXLOG() always
calls ReadRecord() right after it calls XLogBeginRead(), and
ReadRecord() does not reference either global variable before calling
XLogReadRecord(), so the problem surface is limited to code that runs
underneath XLogReadRecord(). XLogReadRecord() is part of xlogreader.c,
but it uses a callback interface: the callback is XLogPageRead(),
which itself references EndRecPtr, and also calls
WaitForWALToBecomeAvailable(), which in turn calls
rescanLatestTimeLine(), which also references EndRecPtr. So these are
the two problem cases: XLogPageRead(), and rescanLatestTimeLine().

In rescanLatestTimeLine(), the problem is IMHO probably serious enough
to justify a separate commit with back-patching. The problem is that
EndRecPtr is being used here to reject impermissible attempts to
switch to a bad timeline, but if pg_wal starts out empty, EndRecPtr
will be 0 here, which causes the code to fail to detect a case that
should be prohibited. Consider the following test case:

- create a primary
- create standby #1 from the primary
- start standby #1 and promote it
- take a backup from the primary using -Xnone to create standby #2
- clear primary_conninfo on standby #2 and then start it
- copy 00000002.history from standby #1 to standby #2

You get:

2021-11-17 15:34:26.213 EST [7474] LOG: selected new timeline ID: 2

But with the attached patch, you get:

2021-11-17 16:12:01.566 EST [20900] LOG: new timeline 2 forked off
current database system timeline 1 before current recovery point
0/A000060

Had the problem occurred at some later point in the WAL stream rather
than before fetching the very first record, I think everything is
fine; at that point, I think that the global variable EndRecPtr will
be initialized. I'm not entirely sure that it contains exactly the
right value, but it's someplace in the right ballpark, at least.

In XLogPageRead(), the problem is just cosmetic. We're only using
EndRecPtr as an argument to emode_for_corrupt_record(), which is all
about suppressing duplicate complaints about the same LSN. But if the
xlogreader has been repositioned using XLogBeginRead() since the last
call to ReadRecord(), or if there are no preceding calls to
ReadRecord(), then the value of EndRecPtr here is left over from the
previous read position and is not particularly related to the record
we're reading now. xlogreader->EndRecPtr, OTOH, is. This doesn't seem
worth a separate commit to me, or a back-patch, but it seems worth
fixing while I'm cleaning up these global variables.

Review appreciated.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

v1-0001-xlog.c-Remove-global-variables-ReadRecPtr-and-End.patchapplication/octet-stream; name=v1-0001-xlog.c-Remove-global-variables-ReadRecPtr-and-End.patchDownload

From cc87a8093c33f5955de049bbd95a309b43c581a0 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 17 Nov 2021 16:58:11 -0500
Subject: [PATCH v1] xlog.c: Remove global variables ReadRecPtr and EndRecPtr.

---
 src/backend/access/transam/xlog.c | 70 +++++++++++++++----------------
 1 file changed, 34 insertions(+), 36 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1616448368..dd8fd57964 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -850,10 +850,6 @@ typedef struct XLogPageReadPrivate
 static TimestampTz XLogReceiptTime = 0;
 static XLogSource XLogReceiptSource = XLOG_FROM_ANY;
 
-/* State information for XLOG reading */
-static XLogRecPtr ReadRecPtr;	/* start of last record read */
-static XLogRecPtr EndRecPtr;	/* end+1 of last record read */
-
 /*
  * Local copies of equivalent fields in the control file.  When running
  * crash recovery, minRecoveryPoint is set to InvalidXLogRecPtr as we
@@ -924,7 +920,8 @@ static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
 						 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 										bool fetching_ckpt, XLogRecPtr tliRecPtr,
-										TimeLineID replayTLI);
+										TimeLineID replayTLI,
+										XLogRecPtr EndRecPtr);
 static void XLogShutdownWalRcv(void);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
@@ -946,7 +943,8 @@ static bool PerformRecoveryXLogAction(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpt, bool report,
 										TimeLineID replayTLI);
-static bool rescanLatestTimeLine(TimeLineID replayTLI);
+static bool rescanLatestTimeLine(TimeLineID replayTLI,
+								 XLogRecPtr EndRecPtr);
 static void InitControlFile(uint64 sysidentifier);
 static void WriteControlFile(void);
 static void ReadControlFile(void);
@@ -4473,8 +4471,6 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 		char	   *errormsg;
 
 		record = XLogReadRecord(xlogreader, &errormsg);
-		ReadRecPtr = xlogreader->ReadRecPtr;
-		EndRecPtr = xlogreader->EndRecPtr;
 		if (record == NULL)
 		{
 			/*
@@ -4503,7 +4499,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 			 * shouldn't loop anymore in that case.
 			 */
 			if (errormsg)
-				ereport(emode_for_corrupt_record(emode, EndRecPtr),
+				ereport(emode_for_corrupt_record(emode, xlogreader->EndRecPtr),
 						(errmsg_internal("%s", errormsg) /* already translated */ ));
 		}
 
@@ -4521,7 +4517,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 									   wal_segment_size);
 			XLogFileName(fname, xlogreader->seg.ws_tli, segno,
 						 wal_segment_size);
-			ereport(emode_for_corrupt_record(emode, EndRecPtr),
+			ereport(emode_for_corrupt_record(emode, xlogreader->EndRecPtr),
 					(errmsg("unexpected timeline ID %u in log segment %s, offset %u",
 							xlogreader->latestPageTLI,
 							fname,
@@ -4563,9 +4559,9 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 				/* initialize minRecoveryPoint to this record */
 				LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 				ControlFile->state = DB_IN_ARCHIVE_RECOVERY;
-				if (ControlFile->minRecoveryPoint < EndRecPtr)
+				if (ControlFile->minRecoveryPoint < xlogreader->EndRecPtr)
 				{
-					ControlFile->minRecoveryPoint = EndRecPtr;
+					ControlFile->minRecoveryPoint = xlogreader->EndRecPtr;
 					ControlFile->minRecoveryPointTLI = replayTLI;
 				}
 				/* update local copy */
@@ -4620,7 +4616,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
  * one and returns 'true'.
  */
 static bool
-rescanLatestTimeLine(TimeLineID replayTLI)
+rescanLatestTimeLine(TimeLineID replayTLI, XLogRecPtr EndRecPtr)
 {
 	List	   *newExpectedTLEs;
 	bool		found;
@@ -7498,7 +7494,7 @@ StartupXLOG(void)
 		if (checkPoint.redo < RecPtr)
 			XLogCtl->replayEndRecPtr = checkPoint.redo;
 		else
-			XLogCtl->replayEndRecPtr = EndRecPtr;
+			XLogCtl->replayEndRecPtr = xlogreader->EndRecPtr;
 		XLogCtl->replayEndTLI = replayTLI;
 		XLogCtl->lastReplayedEndRecPtr = XLogCtl->replayEndRecPtr;
 		XLogCtl->lastReplayedTLI = XLogCtl->replayEndTLI;
@@ -7554,7 +7550,7 @@ StartupXLOG(void)
 
 			ereport(LOG,
 					(errmsg("redo starts at %X/%X",
-							LSN_FORMAT_ARGS(ReadRecPtr))));
+							LSN_FORMAT_ARGS(xlogreader->ReadRecPtr))));
 
 			/* Prepare to report progress of the redo phase. */
 			if (!StandbyMode)
@@ -7569,7 +7565,7 @@ StartupXLOG(void)
 
 				if (!StandbyMode)
 					ereport_startup_progress("redo in progress, elapsed time: %ld.%02d s, current LSN: %X/%X",
-											 LSN_FORMAT_ARGS(ReadRecPtr));
+											 LSN_FORMAT_ARGS(xlogreader->ReadRecPtr));
 
 #ifdef WAL_DEBUG
 				if (XLOG_DEBUG ||
@@ -7683,7 +7679,8 @@ StartupXLOG(void)
 					if (newReplayTLI != replayTLI)
 					{
 						/* Check that it's OK to switch to this TLI */
-						checkTimeLineSwitch(EndRecPtr, newReplayTLI,
+						checkTimeLineSwitch(xlogreader->EndRecPtr,
+											newReplayTLI,
 											prevReplayTLI, replayTLI);
 
 						/* Following WAL records should be run with new TLI */
@@ -7697,7 +7694,7 @@ StartupXLOG(void)
 				 * so that XLogFlush will update minRecoveryPoint correctly.
 				 */
 				SpinLockAcquire(&XLogCtl->info_lck);
-				XLogCtl->replayEndRecPtr = EndRecPtr;
+				XLogCtl->replayEndRecPtr = xlogreader->EndRecPtr;
 				XLogCtl->replayEndTLI = replayTLI;
 				SpinLockRelease(&XLogCtl->info_lck);
 
@@ -7729,7 +7726,7 @@ StartupXLOG(void)
 				 * successfully replayed.
 				 */
 				SpinLockAcquire(&XLogCtl->info_lck);
-				XLogCtl->lastReplayedEndRecPtr = EndRecPtr;
+				XLogCtl->lastReplayedEndRecPtr = xlogreader->EndRecPtr;
 				XLogCtl->lastReplayedTLI = replayTLI;
 				SpinLockRelease(&XLogCtl->info_lck);
 
@@ -7745,7 +7742,7 @@ StartupXLOG(void)
 				}
 
 				/* Remember this record as the last-applied one */
-				LastRec = ReadRecPtr;
+				LastRec = xlogreader->ReadRecPtr;
 
 				/* Allow read-only connections if we're consistent now */
 				CheckRecoveryConsistency();
@@ -7758,7 +7755,7 @@ StartupXLOG(void)
 					 * (possibly bogus) future WAL segments on the old
 					 * timeline.
 					 */
-					RemoveNonParentXlogFiles(EndRecPtr, replayTLI);
+					RemoveNonParentXlogFiles(xlogreader->EndRecPtr, replayTLI);
 
 					/*
 					 * Wake up any walsenders to notice that we are on a new
@@ -7826,7 +7823,7 @@ StartupXLOG(void)
 
 			ereport(LOG,
 					(errmsg("redo done at %X/%X system usage: %s",
-							LSN_FORMAT_ARGS(ReadRecPtr),
+							LSN_FORMAT_ARGS(xlogreader->ReadRecPtr),
 							pg_rusage_show(&ru0))));
 			xtime = GetLatestXTime();
 			if (xtime)
@@ -7901,7 +7898,7 @@ StartupXLOG(void)
 	 */
 	XLogBeginRead(xlogreader, LastRec);
 	record = ReadRecord(xlogreader, PANIC, false, replayTLI);
-	EndOfLog = EndRecPtr;
+	EndOfLog = xlogreader->EndRecPtr;
 
 	/*
 	 * EndOfLogTLI is the TLI in the filename of the XLOG segment containing
@@ -8010,7 +8007,7 @@ StartupXLOG(void)
 		 * between here and writing the end-of-recovery record.
 		 */
 		writeTimeLineHistory(newTLI, recoveryTargetTLI,
-							 EndRecPtr, reason);
+							 xlogreader->EndRecPtr, reason);
 
 		/*
 		 * Since there might be a partial WAL segment named RECOVERYXLOG, get
@@ -9644,7 +9641,7 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
  * startup process.)
  */
 static void
-RecoveryRestartPoint(const CheckPoint *checkPoint)
+RecoveryRestartPoint(const CheckPoint *checkPoint, XLogReaderState *record)
 {
 	/*
 	 * Also refrain from creating a restartpoint if we have seen any
@@ -9667,8 +9664,8 @@ RecoveryRestartPoint(const CheckPoint *checkPoint)
 	 * work out the next time it wants to perform a restartpoint.
 	 */
 	SpinLockAcquire(&XLogCtl->info_lck);
-	XLogCtl->lastCheckPointRecPtr = ReadRecPtr;
-	XLogCtl->lastCheckPointEndPtr = EndRecPtr;
+	XLogCtl->lastCheckPointRecPtr = record->ReadRecPtr;
+	XLogCtl->lastCheckPointEndPtr = record->EndRecPtr;
 	XLogCtl->lastCheckPoint = *checkPoint;
 	SpinLockRelease(&XLogCtl->info_lck);
 }
@@ -10464,7 +10461,7 @@ xlog_redo(XLogReaderState *record)
 					(errmsg("unexpected timeline ID %u (should be %u) in checkpoint record",
 							checkPoint.ThisTimeLineID, replayTLI)));
 
-		RecoveryRestartPoint(&checkPoint);
+		RecoveryRestartPoint(&checkPoint, record);
 	}
 	else if (info == XLOG_CHECKPOINT_ONLINE)
 	{
@@ -10520,7 +10517,7 @@ xlog_redo(XLogReaderState *record)
 					(errmsg("unexpected timeline ID %u (should be %u) in checkpoint record",
 							checkPoint.ThisTimeLineID, replayTLI)));
 
-		RecoveryRestartPoint(&checkPoint);
+		RecoveryRestartPoint(&checkPoint, record);
 	}
 	else if (info == XLOG_OVERWRITE_CONTRECORD)
 	{
@@ -10686,8 +10683,8 @@ xlog_redo(XLogReaderState *record)
 		if (!fpw)
 		{
 			SpinLockAcquire(&XLogCtl->info_lck);
-			if (XLogCtl->lastFpwDisableRecPtr < ReadRecPtr)
-				XLogCtl->lastFpwDisableRecPtr = ReadRecPtr;
+			if (XLogCtl->lastFpwDisableRecPtr < record->ReadRecPtr)
+				XLogCtl->lastFpwDisableRecPtr = record->ReadRecPtr;
 			SpinLockRelease(&XLogCtl->info_lck);
 		}
 
@@ -12471,7 +12468,8 @@ retry:
 										 private->randAccess,
 										 private->fetching_ckpt,
 										 targetRecPtr,
-										 private->replayTLI))
+										 private->replayTLI,
+										 xlogreader->EndRecPtr))
 		{
 			if (readFile >= 0)
 				close(readFile);
@@ -12581,7 +12579,7 @@ retry:
 		 * errmsg_internal() because the message was already translated.
 		 */
 		if (xlogreader->errormsg_buf[0])
-			ereport(emode_for_corrupt_record(emode, EndRecPtr),
+			ereport(emode_for_corrupt_record(emode, xlogreader->EndRecPtr),
 					(errmsg_internal("%s", xlogreader->errormsg_buf)));
 
 		/* reset any error XLogReaderValidatePageHeader() might have set */
@@ -12636,7 +12634,7 @@ next_record_is_invalid:
 static bool
 WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 							bool fetching_ckpt, XLogRecPtr tliRecPtr,
-							TimeLineID replayTLI)
+							TimeLineID replayTLI, XLogRecPtr EndRecPtr)
 {
 	static TimestampTz last_fail_time = 0;
 	TimestampTz now;
@@ -12759,7 +12757,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					 */
 					if (recoveryTargetTimeLineGoal == RECOVERY_TARGET_TIMELINE_LATEST)
 					{
-						if (rescanLatestTimeLine(replayTLI))
+						if (rescanLatestTimeLine(replayTLI, EndRecPtr))
 						{
 							currentSource = XLOG_FROM_ARCHIVE;
 							break;
@@ -12886,7 +12884,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 						 */
 						if (recoveryTargetTimeLineGoal ==
 							RECOVERY_TARGET_TIMELINE_LATEST)
-							rescanLatestTimeLine(replayTLI);
+							rescanLatestTimeLine(replayTLI, EndRecPtr);
 
 						startWalReceiver = true;
 					}
-- 
2.24.3 (Apple Git-128)

Amul Sul

sulamul@gmail.com

about 4 years ago

In reply to: Robert Haas (#1)

Re: xlog.c: removing ReadRecPtr and EndRecPtr

On Thu, Nov 18, 2021 at 3:31 AM Robert Haas <robertmhaas@gmail.com> wrote:

I spent a lot of time trying to figure out why xlog.c has global
variables ReadRecPtr and EndRecPtr instead of just relying on the
eponymous structure members inside the XLogReaderState. I concluded
that the values are the same at most points in the code, and thus that
we could just use xlogreaderstate->{Read,End}RecPtr instead. There are
two places where this wouldn't produce the same results we're getting
today. Both of those appear to be bugs.

The reason why it's generally the case that ReadRecPtr ==
xlogreaderstate->ReadRecPtr and likewise for EndRecPtr is that
ReadRecord() calls XLogReadRecord(), which sets the values inside the
XLogReaderState, and then immediately assigns the values stored there
to the global variables. There's no other code that changes the other
global variables, and the only other code that changes the structure
members is XLogBeginRead(). So the values can be unequal from the time
XLogBeginRead() is called up until the time that the XLogReadRecord()
call inside ReadRecord() returns. In practice, StartupXLOG() always
calls ReadRecord() right after it calls XLogBeginRead(), and
ReadRecord() does not reference either global variable before calling
XLogReadRecord(), so the problem surface is limited to code that runs
underneath XLogReadRecord(). XLogReadRecord() is part of xlogreader.c,
but it uses a callback interface: the callback is XLogPageRead(),
which itself references EndRecPtr, and also calls
WaitForWALToBecomeAvailable(), which in turn calls
rescanLatestTimeLine(), which also references EndRecPtr. So these are
the two problem cases: XLogPageRead(), and rescanLatestTimeLine().

In rescanLatestTimeLine(), the problem is IMHO probably serious enough
to justify a separate commit with back-patching. The problem is that
EndRecPtr is being used here to reject impermissible attempts to
switch to a bad timeline, but if pg_wal starts out empty, EndRecPtr
will be 0 here, which causes the code to fail to detect a case that
should be prohibited. Consider the following test case:

- create a primary
- create standby #1 from the primary
- start standby #1 and promote it
- take a backup from the primary using -Xnone to create standby #2
- clear primary_conninfo on standby #2 and then start it
- copy 00000002.history from standby #1 to standby #2

You get:

2021-11-17 15:34:26.213 EST [7474] LOG: selected new timeline ID: 2

But with the attached patch, you get:

2021-11-17 16:12:01.566 EST [20900] LOG: new timeline 2 forked off
current database system timeline 1 before current recovery point
0/A000060

Somehow with and without patch I am getting the same log.

Had the problem occurred at some later point in the WAL stream rather
than before fetching the very first record, I think everything is
fine; at that point, I think that the global variable EndRecPtr will
be initialized. I'm not entirely sure that it contains exactly the
right value, but it's someplace in the right ballpark, at least.

Agree, change seems pretty much reasonable.

In XLogPageRead(), the problem is just cosmetic. We're only using
EndRecPtr as an argument to emode_for_corrupt_record(), which is all
about suppressing duplicate complaints about the same LSN. But if the
xlogreader has been repositioned using XLogBeginRead() since the last
call to ReadRecord(), or if there are no preceding calls to
ReadRecord(), then the value of EndRecPtr here is left over from the
previous read position and is not particularly related to the record
we're reading now. xlogreader->EndRecPtr, OTOH, is. This doesn't seem
worth a separate commit to me, or a back-patch, but it seems worth
fixing while I'm cleaning up these global variables.

LGTM.

Regards,
Amul

Robert Haas

robertmhaas@gmail.com

about 4 years ago

In reply to: Amul Sul (#2)

2 attachment(s)

Re: xlog.c: removing ReadRecPtr and EndRecPtr

On Thu, Nov 18, 2021 at 7:30 AM Amul Sul <sulamul@gmail.com> wrote:

Somehow with and without patch I am getting the same log.

Try applying the attached 0001-dubious-test-cast.patch for you and see
if that fails. It does for me. If so, then try applying
0002-fix-the-bug.patch and see if that makes it pass.

Unfortunately, this test case isn't remotely committable as-is, and I
don't know how to make it so. The main problem is that, although you
can start up a server with nothing in pg_wal, no restore_command, and
no archive command, pg_ctl will not believe that it has started. I
worked around that problem by telling pg_ctl to ignore failures, but
it still waits for a long time before timing out, which sucks both
because (1) hackers are impatient and (2) some hackers run extremely
slow buildfarm machines where almost any timeout won't be long enough.
There's a second place where the patch needs to wait for something
also, and that one I've crudely kludged with sleep(10). If anybody
around here who is good at figuring out how to write clever TAP tests
can tell me how to fix this test to be non-stupid, I will happily do
so.

Otherwise, I think I will just need to commit and back-patch the
actual bug fix without a test, and then rebase the rest of the patch I
posted previously over top of those changes for master only.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

0001-dubious-test-case.patchapplication/octet-stream; name=0001-dubious-test-case.patchDownload

From 465ef4d1017171f80d320e56cced8539c3e896eb Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 18 Nov 2021 13:32:30 -0500
Subject: [PATCH 1/2] dubious test case

---
 src/test/recovery/t/027_tli_crosscheck.pl | 52 +++++++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 src/test/recovery/t/027_tli_crosscheck.pl

diff --git a/src/test/recovery/t/027_tli_crosscheck.pl b/src/test/recovery/t/027_tli_crosscheck.pl
new file mode 100644
index 0000000000..79ca1946fe
--- /dev/null
+++ b/src/test/recovery/t/027_tli_crosscheck.pl
@@ -0,0 +1,52 @@
+# Copyright (c) 2021, PostgreSQL Global Development Group
+
+# Test whether the appearance of a new timeline history file while a
+# standby is busy trying to read the very first checkpoint record is
+# properly handled. In this scenario, the timeline branches off before
+# the replay LSN, so the new timeline should be rejected.
+use strict;
+use warnings;
+use File::Copy;
+use FindBin;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 1;
+
+# initialize and start a primary node
+my $alpha = PostgreSQL::Test::Cluster->new('alpha');
+$alpha->init(allows_streaming => 1);
+$alpha->start;
+
+# create a standby, start it up, and promote it to make a second primary
+$alpha->backup('bravo_backup');
+my $bravo = PostgreSQL::Test::Cluster->new('bravo');
+$bravo->init_from_backup($alpha, 'bravo_backup', has_streaming => 1,
+						 has_archiving => 1);
+$bravo->start;
+$bravo->promote;
+
+# setup a second standby, this time with no WAL, no primary conninfo,
+# and no restore_command -- but make sure it starts as a standby all the same.
+$alpha->backup('charlie_backup', 'backup_options' => [ '-Xnone' ]);
+my $charlie = PostgreSQL::Test::Cluster->new('charlie');
+$charlie->init_from_backup($alpha, 'charlie_backup');
+$charlie->set_standby_mode();
+
+# XXX. the start() method won't think that the server has started even when it
+# has, so pretend like faillures are ok
+$charlie->start('fail_ok' => 1);
+
+# copy the timeline history file from the wrong primary to the new standby
+my $tlifilepath = '/pg_wal/00000002.history';
+copy($bravo->data_dir . $tlifilepath, $charlie->data_dir . $tlifilepath)
+	|| die "copy: $!";
+
+# XXX. wait for an amount of time that will be excessive on most buildfarm
+# machines and too short on a few, so that we get random test failures that
+# drive everybody crazy while simultaneously driving up the test runtime
+sleep(10);
+
+# check the logfile
+my $logfile = slurp_file($charlie->logfile());
+ok($logfile =~ qr/new timeline 2 forked off current database system timeline 1 before current recovery point/,
+   "timeline 2 rejected");
-- 
2.24.3 (Apple Git-128)

0002-fix-the-bug.patchapplication/octet-stream; name=0002-fix-the-bug.patchDownload

From 21e8623cc5d63e79feb74f85ed53c0028c506611 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 18 Nov 2021 14:00:43 -0500
Subject: [PATCH 2/2] fix the bug

---
 src/backend/access/transam/xlog.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1616448368..c4b822d98e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -924,7 +924,8 @@ static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
 						 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 										bool fetching_ckpt, XLogRecPtr tliRecPtr,
-										TimeLineID replayTLI);
+										TimeLineID replayTLI,
+										XLogRecPtr replayLSN);
 static void XLogShutdownWalRcv(void);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
@@ -946,7 +947,8 @@ static bool PerformRecoveryXLogAction(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpt, bool report,
 										TimeLineID replayTLI);
-static bool rescanLatestTimeLine(TimeLineID replayTLI);
+static bool rescanLatestTimeLine(TimeLineID replayTLI,
+								 XLogRecPtr replayLSN);
 static void InitControlFile(uint64 sysidentifier);
 static void WriteControlFile(void);
 static void ReadControlFile(void);
@@ -4620,7 +4622,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
  * one and returns 'true'.
  */
 static bool
-rescanLatestTimeLine(TimeLineID replayTLI)
+rescanLatestTimeLine(TimeLineID replayTLI, XLogRecPtr replayLSN)
 {
 	List	   *newExpectedTLEs;
 	bool		found;
@@ -4671,13 +4673,13 @@ rescanLatestTimeLine(TimeLineID replayTLI)
 	 * next timeline was forked off from it *after* the current recovery
 	 * location.
 	 */
-	if (currentTle->end < EndRecPtr)
+	if (currentTle->end < replayLSN)
 	{
 		ereport(LOG,
 				(errmsg("new timeline %u forked off current database system timeline %u before current recovery point %X/%X",
 						newtarget,
 						replayTLI,
-						LSN_FORMAT_ARGS(EndRecPtr))));
+						LSN_FORMAT_ARGS(replayLSN))));
 		return false;
 	}
 
@@ -12471,7 +12473,8 @@ retry:
 										 private->randAccess,
 										 private->fetching_ckpt,
 										 targetRecPtr,
-										 private->replayTLI))
+										 private->replayTLI,
+										 xlogreader->EndRecPtr))
 		{
 			if (readFile >= 0)
 				close(readFile);
@@ -12624,6 +12627,10 @@ next_record_is_invalid:
  * 'tliRecPtr' is the position of the WAL record we're interested in. It is
  * used to decide which timeline to stream the requested WAL from.
  *
+ * 'replayLSN' is the current replay LSN, so that if we scan for new
+ * timelines, we can reject a switch to a timeline that branched off before
+ * this point.
+ *
  * If the record is not immediately available, the function returns false
  * if we're not in standby mode. In standby mode, waits for it to become
  * available.
@@ -12636,7 +12643,7 @@ next_record_is_invalid:
 static bool
 WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 							bool fetching_ckpt, XLogRecPtr tliRecPtr,
-							TimeLineID replayTLI)
+							TimeLineID replayTLI, XLogRecPtr replayLSN)
 {
 	static TimestampTz last_fail_time = 0;
 	TimestampTz now;
@@ -12759,7 +12766,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					 */
 					if (recoveryTargetTimeLineGoal == RECOVERY_TARGET_TIMELINE_LATEST)
 					{
-						if (rescanLatestTimeLine(replayTLI))
+						if (rescanLatestTimeLine(replayTLI, replayLSN))
 						{
 							currentSource = XLOG_FROM_ARCHIVE;
 							break;
@@ -12886,7 +12893,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 						 */
 						if (recoveryTargetTimeLineGoal ==
 							RECOVERY_TARGET_TIMELINE_LATEST)
-							rescanLatestTimeLine(replayTLI);
+							rescanLatestTimeLine(replayTLI, replayLSN);
 
 						startWalReceiver = true;
 					}
-- 
2.24.3 (Apple Git-128)

Alvaro Herrera

alvherre@alvh.no-ip.org

about 4 years ago

In reply to: Robert Haas (#3)

Re: xlog.c: removing ReadRecPtr and EndRecPtr

On 2021-Nov-18, Robert Haas wrote:

Unfortunately, this test case isn't remotely committable as-is, and I
don't know how to make it so. The main problem is that, although you
can start up a server with nothing in pg_wal, no restore_command, and
no archive command, pg_ctl will not believe that it has started.

Would it work to start postmaster directly instad of using pg_ctl, and
then rely on (say) pg_isready?

--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
"El que vive para el futuro es un iluso, y el que vive para el pasado,
un imbécil" (Luis Adler, "Los tripulantes de la noche")

Robert Haas

robertmhaas@gmail.com

about 4 years ago

In reply to: Alvaro Herrera (#4)

Re: xlog.c: removing ReadRecPtr and EndRecPtr

On Thu, Nov 18, 2021 at 2:21 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Would it work to start postmaster directly instad of using pg_ctl, and
then rely on (say) pg_isready?

I *think* that pg_isready would also fail, because the documentation
says "pg_isready returns 0 to the shell if the server is accepting
connections normally, 1 if the server is rejecting connections (for
example during startup) ..." and I believe the "during startup" case
would apply here.

Starting postmaster directly is a thought. Is there any existing
precedent for that approach?

--
Robert Haas
EDB: http://www.enterprisedb.com

Tom Lane

tgl@sss.pgh.pa.us

about 4 years ago

In reply to: Robert Haas (#3)

Re: xlog.c: removing ReadRecPtr and EndRecPtr

Robert Haas <robertmhaas@gmail.com> writes:

There's a second place where the patch needs to wait for something
also, and that one I've crudely kludged with sleep(10). If anybody
around here who is good at figuring out how to write clever TAP tests
can tell me how to fix this test to be non-stupid, I will happily do
so.

As far as that goes, if you conceptualize it as "wait for this text
to appear in the log file", there's prior art in existing TAP tests.
Basically, sleep for some reasonable short period and check the
log file; if not there, repeat until timeout.

I'm a little dubious that this test case is valuable enough to
mess around with a nonstandard postmaster startup protocol, though.
The main reason I dislike that idea is that any fixes we apply to
the TAP tests' normal postmaster-startup code would almost inevitably
miss fixing this test. IIRC there have been security-related fixes in
that area (e.g. where do we put the postmaster's socket), so I find
that prospect pretty scary.

regards, tom lane

Robert Haas

robertmhaas@gmail.com

about 4 years ago

In reply to: Tom Lane (#6)

Re: xlog.c: removing ReadRecPtr and EndRecPtr

On Thu, Nov 18, 2021 at 3:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

There's a second place where the patch needs to wait for something
also, and that one I've crudely kludged with sleep(10). If anybody
around here who is good at figuring out how to write clever TAP tests
can tell me how to fix this test to be non-stupid, I will happily do
so.

As far as that goes, if you conceptualize it as "wait for this text
to appear in the log file", there's prior art in existing TAP tests.
Basically, sleep for some reasonable short period and check the
log file; if not there, repeat until timeout.

Yeah, there's something like that in the form of find_in_log in
019_replslot_init.pl. I thought about copying that, but that didn't
seem great, and I also thought about trying to move into a common
module, which seems maybe better but also more work, and thus not
worth doing unless we have agreement that it's what we should do.

I'm a little dubious that this test case is valuable enough to
mess around with a nonstandard postmaster startup protocol, though.
The main reason I dislike that idea is that any fixes we apply to
the TAP tests' normal postmaster-startup code would almost inevitably
miss fixing this test. IIRC there have been security-related fixes in
that area (e.g. where do we put the postmaster's socket), so I find
that prospect pretty scary.

The problem that I have with the present situation is that the test
coverage of xlog.c is pretty abysmal. It actually doesn't look bad if
you just run a coverage report, but there are a shazillion flag
variables in that file and bugs like this make it quite clear that we
don't come close to testing all the possible combinations. It's really
borderline unmaintainable. I don't know whether there's a specific
individual who wrote most of this code and didn't get the memo that
global variables are best avoided, or whether this is sort of case
where we started over 1 or 2 and then it just gradually ballooned into
the giant mess that it now is, but the present situation is pretty
outrageous. It's taking me weeks of time to figure out how to make
changes that would normally take days, or maybe hours. We clearly need
to try both to get the number of cases under control by eliminating
stupid variables that are almost but not quite the same as something
else, and also get proper test coverage for the things that remain so
that it's possible to modify the code without excesive risk of
shooting ourselves in the foot.

That said, I'm not wedded to this particular test case, either. It's
an extremely specific bug that is unlikely to reappear once squashed,
and making the test case robust enough to avoid having the buildfarm
complain seems fraught.

--
Robert Haas
EDB: http://www.enterprisedb.com

Tom Lane

tgl@sss.pgh.pa.us

about 4 years ago

In reply to: Robert Haas (#7)

Re: xlog.c: removing ReadRecPtr and EndRecPtr

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Nov 18, 2021 at 3:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm a little dubious that this test case is valuable enough to
mess around with a nonstandard postmaster startup protocol, though.

The problem that I have with the present situation is that the test
coverage of xlog.c is pretty abysmal.

Agreed, but this one test case isn't going to move the needle much.
To get to reasonable coverage we're going to need more tests, and
I definitely don't want multiple versions of ad-hoc postmaster startup
code. If we need that, it'd be smarter to extend Cluster.pm so that
the mainline code could do what's needful.

Having said that, it wasn't entirely clear to me why you needed a
weird startup method. Why couldn't you drop the bogus history file
into place *before* starting the charlie postmaster? If you have
to do it after, aren't there race/timing problems anyway?

regards, tom lane

Robert Haas

robertmhaas@gmail.com

about 4 years ago

In reply to: Tom Lane (#8)

Re: xlog.c: removing ReadRecPtr and EndRecPtr

On Thu, Nov 18, 2021 at 4:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Nov 18, 2021 at 3:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm a little dubious that this test case is valuable enough to
mess around with a nonstandard postmaster startup protocol, though.

The problem that I have with the present situation is that the test
coverage of xlog.c is pretty abysmal.

Agreed, but this one test case isn't going to move the needle much.
To get to reasonable coverage we're going to need more tests, and
I definitely don't want multiple versions of ad-hoc postmaster startup
code. If we need that, it'd be smarter to extend Cluster.pm so that
the mainline code could do what's needful.

Perhaps so. I don't have a clear view on what a full set of good tests
would look like, so it's hard for me to guess which needs are general
and which are not.

Having said that, it wasn't entirely clear to me why you needed a
weird startup method. Why couldn't you drop the bogus history file
into place *before* starting the charlie postmaster? If you have
to do it after, aren't there race/timing problems anyway?

Well, I need rescanLatestTimeLine() to be called. I'm not sure that
will happen if the file is there originally -- that sounds like more
of a scan than a rescan, but I haven't poked at that angle. I also
think it doesn't matter whether the file is dropped in or whether it
is restored via restore_command, so having the server restore the file
rather than discover that it is appeared might be another and more
satisfying option, but I also have not tested whether that reproduces
the issue. This has been extremely time-consuming to hunt down.

--
Robert Haas
EDB: http://www.enterprisedb.com

#10

Alvaro Herrera

alvherre@alvh.no-ip.org

about 4 years ago

In reply to: Robert Haas (#5)

Re: xlog.c: removing ReadRecPtr and EndRecPtr

On 2021-Nov-18, Robert Haas wrote:

On Thu, Nov 18, 2021 at 2:21 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Would it work to start postmaster directly instad of using pg_ctl, and
then rely on (say) pg_isready?

I *think* that pg_isready would also fail, because the documentation
says "pg_isready returns 0 to the shell if the server is accepting
connections normally, 1 if the server is rejecting connections (for
example during startup) ..." and I believe the "during startup" case
would apply here.

Hmm, right ... I suppose there are other ways to check, but I'm not sure
that the value of adding this particular test is large enough to justify
such hacks.

I think one possibly useful technique might be Alexander Korotkov's stop
events[1]/messages/by-id/CAPpHfdtSEOHX8dSk9Qp+Z++i4BGQoffKip6JDWngEA+g7Z-XmQ@mail.gmail.com, except that it is designed around having working SQL access
to the server in order to control it. You'd need some frosting on top
in order to control the startup sequence without SQL access.

Starting postmaster directly is a thought. Is there any existing
precedent for that approach?

Not as far as I can see.

[1]: /messages/by-id/CAPpHfdtSEOHX8dSk9Qp+Z++i4BGQoffKip6JDWngEA+g7Z-XmQ@mail.gmail.com

--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
"Entristecido, Wutra (canción de Las Barreras)
echa a Freyr a rodar
y a nosotros al mar"

#11

Amul Sul

sulamul@gmail.com

about 4 years ago

In reply to: Robert Haas (#3)

Re: xlog.c: removing ReadRecPtr and EndRecPtr

On Fri, Nov 19, 2021 at 12:43 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Nov 18, 2021 at 7:30 AM Amul Sul <sulamul@gmail.com> wrote:

Somehow with and without patch I am getting the same log.

Try applying the attached 0001-dubious-test-cast.patch for you and see
if that fails. It does for me. If so, then try applying
0002-fix-the-bug.patch and see if that makes it pass.

Thanks, I can see the reported behavior -- 0001 alone fails & 0002
corrects that.

Unfortunately, this test case isn't remotely committable as-is, and I
don't know how to make it so. The main problem is that, although you
can start up a server with nothing in pg_wal, no restore_command, and
no archive command, pg_ctl will not believe that it has started. I
worked around that problem by telling pg_ctl to ignore failures, but
it still waits for a long time before timing out, which sucks both
because (1) hackers are impatient and (2) some hackers run extremely
slow buildfarm machines where almost any timeout won't be long enough.

Yeah :(

There's a second place where the patch needs to wait for something
also, and that one I've crudely kludged with sleep(10). If anybody
around here who is good at figuring out how to write clever TAP tests
can tell me how to fix this test to be non-stupid, I will happily do
so.

Otherwise, I think I will just need to commit and back-patch the
actual bug fix without a test, and then rebase the rest of the patch I
posted previously over top of those changes for master only.

+1.

Regards,
Amul

#12

Robert Haas

robertmhaas@gmail.com

about 4 years ago

In reply to: Robert Haas (#1)

3 attachment(s)

Re: xlog.c: removing ReadRecPtr and EndRecPtr

On Wed, Nov 17, 2021 at 5:01 PM Robert Haas <robertmhaas@gmail.com> wrote:

In rescanLatestTimeLine(), the problem is IMHO probably serious enough
to justify a separate commit with back-patching.

On second thought, I think it's better not to back-patch this fix. It
turns out that, while it's easy enough to back-patch the proposed
patch, it doesn't make the test case pass in pre-v14 versions. The
test case itself requires some changes to work at all because of the
perl module renaming, but that's not a big deal. The issue is that, in
the back-branches, when starting up the server without any local WAL,
rescanLatestTimeLine() is checking with not only the wrong LSN but
also with the wrong TLI. That got fixed in master by commit
4a92a1c3d1c361ffb031ed05bf65b801241d7cdd even though, rather
unfortunately, the commit message does not say so. So to back-patch
this commit we would need to also back-patch much of that commit. But
that commit depends on the other commits that reduced use of
ThisTimeLineID. Untangling that seems like an undue amount of work and
risk for a corner-case bug fix that was discovered in the lab rather
than in the field and which won't matter anyway if you do things
correctly. So now I'm intending to commit to just to master only.

Attached please find the test case not for commit as
v2-0001-dubious-test-case.patch; the fix, for commit and now with a
proper commit message as
v2-0002-Fix-corner-case-failure-to-detect-improper-timeli.patch; and
the back-ported test case for v14 as v14-0001-dubious-test-case.patch
in case anyone wants to play around with that. (with apologies for
using the idea of a version number in two different and conflicting
senses)

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

v2-0001-dubious-test-case.patchapplication/octet-stream; name=v2-0001-dubious-test-case.patchDownload

From 465ef4d1017171f80d320e56cced8539c3e896eb Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 18 Nov 2021 13:32:30 -0500
Subject: [PATCH v2 1/2] dubious test case

---
 src/test/recovery/t/027_tli_crosscheck.pl | 52 +++++++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 src/test/recovery/t/027_tli_crosscheck.pl

diff --git a/src/test/recovery/t/027_tli_crosscheck.pl b/src/test/recovery/t/027_tli_crosscheck.pl
new file mode 100644
index 0000000000..79ca1946fe
--- /dev/null
+++ b/src/test/recovery/t/027_tli_crosscheck.pl
@@ -0,0 +1,52 @@
+# Copyright (c) 2021, PostgreSQL Global Development Group
+
+# Test whether the appearance of a new timeline history file while a
+# standby is busy trying to read the very first checkpoint record is
+# properly handled. In this scenario, the timeline branches off before
+# the replay LSN, so the new timeline should be rejected.
+use strict;
+use warnings;
+use File::Copy;
+use FindBin;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 1;
+
+# initialize and start a primary node
+my $alpha = PostgreSQL::Test::Cluster->new('alpha');
+$alpha->init(allows_streaming => 1);
+$alpha->start;
+
+# create a standby, start it up, and promote it to make a second primary
+$alpha->backup('bravo_backup');
+my $bravo = PostgreSQL::Test::Cluster->new('bravo');
+$bravo->init_from_backup($alpha, 'bravo_backup', has_streaming => 1,
+						 has_archiving => 1);
+$bravo->start;
+$bravo->promote;
+
+# setup a second standby, this time with no WAL, no primary conninfo,
+# and no restore_command -- but make sure it starts as a standby all the same.
+$alpha->backup('charlie_backup', 'backup_options' => [ '-Xnone' ]);
+my $charlie = PostgreSQL::Test::Cluster->new('charlie');
+$charlie->init_from_backup($alpha, 'charlie_backup');
+$charlie->set_standby_mode();
+
+# XXX. the start() method won't think that the server has started even when it
+# has, so pretend like faillures are ok
+$charlie->start('fail_ok' => 1);
+
+# copy the timeline history file from the wrong primary to the new standby
+my $tlifilepath = '/pg_wal/00000002.history';
+copy($bravo->data_dir . $tlifilepath, $charlie->data_dir . $tlifilepath)
+	|| die "copy: $!";
+
+# XXX. wait for an amount of time that will be excessive on most buildfarm
+# machines and too short on a few, so that we get random test failures that
+# drive everybody crazy while simultaneously driving up the test runtime
+sleep(10);
+
+# check the logfile
+my $logfile = slurp_file($charlie->logfile());
+ok($logfile =~ qr/new timeline 2 forked off current database system timeline 1 before current recovery point/,
+   "timeline 2 rejected");
-- 
2.24.3 (Apple Git-128)

v2-0002-Fix-corner-case-failure-to-detect-improper-timeli.patchapplication/octet-stream; name=v2-0002-Fix-corner-case-failure-to-detect-improper-timeli.patchDownload

From 6cca54db773c07098bce4259c0eac95c67b5ab9e Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 23 Nov 2021 11:08:24 -0500
Subject: [PATCH v2 2/2] Fix corner-case failure to detect improper timeline
 switch.

rescanLatestTimeLine() contains a guard against switching to
a timeline that forked off from the current one prior to the
current recovery point, but that guard does not work if the
timeline switch occurs before the first WAL recod (which must
be the checkpoint record) is read. Without this patch, an
improper timeline switch is therefore possible in such cases.

This happens because rescanLatestTimeLine() relies on the global
variable EndRecPtr to understand the current position of WAL
replay. However, EndRecPtr at this point in the code contains
the endpoint of the last-replayed record, not the startpoint or
endpoint of the record being replayed now. Thus, before any
records have been replayed, it's zero, which causes the sanity
check to always pass.

To fix, pass down the correct timeline explicitly. The
EndRecPtr value we want is the one from the xlogreader, which
will be the starting position of the record we're about to
try to read, rather than the global variable, which is the
ending position of the last record we successfully read.
They're usually the same, but not in the corner case described
here.

Patch by me, reviewed by Amul Sul.

Discussion: http://postgr.es/m/CA+Tgmoao96EuNeSPd+hspRKcsCddu=b1h-QNRuKfY8VmfNQdfg@mail.gmail.com
---
 src/backend/access/transam/xlog.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1616448368..c4b822d98e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -924,7 +924,8 @@ static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
 						 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 										bool fetching_ckpt, XLogRecPtr tliRecPtr,
-										TimeLineID replayTLI);
+										TimeLineID replayTLI,
+										XLogRecPtr replayLSN);
 static void XLogShutdownWalRcv(void);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
@@ -946,7 +947,8 @@ static bool PerformRecoveryXLogAction(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpt, bool report,
 										TimeLineID replayTLI);
-static bool rescanLatestTimeLine(TimeLineID replayTLI);
+static bool rescanLatestTimeLine(TimeLineID replayTLI,
+								 XLogRecPtr replayLSN);
 static void InitControlFile(uint64 sysidentifier);
 static void WriteControlFile(void);
 static void ReadControlFile(void);
@@ -4620,7 +4622,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
  * one and returns 'true'.
  */
 static bool
-rescanLatestTimeLine(TimeLineID replayTLI)
+rescanLatestTimeLine(TimeLineID replayTLI, XLogRecPtr replayLSN)
 {
 	List	   *newExpectedTLEs;
 	bool		found;
@@ -4671,13 +4673,13 @@ rescanLatestTimeLine(TimeLineID replayTLI)
 	 * next timeline was forked off from it *after* the current recovery
 	 * location.
 	 */
-	if (currentTle->end < EndRecPtr)
+	if (currentTle->end < replayLSN)
 	{
 		ereport(LOG,
 				(errmsg("new timeline %u forked off current database system timeline %u before current recovery point %X/%X",
 						newtarget,
 						replayTLI,
-						LSN_FORMAT_ARGS(EndRecPtr))));
+						LSN_FORMAT_ARGS(replayLSN))));
 		return false;
 	}
 
@@ -12471,7 +12473,8 @@ retry:
 										 private->randAccess,
 										 private->fetching_ckpt,
 										 targetRecPtr,
-										 private->replayTLI))
+										 private->replayTLI,
+										 xlogreader->EndRecPtr))
 		{
 			if (readFile >= 0)
 				close(readFile);
@@ -12624,6 +12627,10 @@ next_record_is_invalid:
  * 'tliRecPtr' is the position of the WAL record we're interested in. It is
  * used to decide which timeline to stream the requested WAL from.
  *
+ * 'replayLSN' is the current replay LSN, so that if we scan for new
+ * timelines, we can reject a switch to a timeline that branched off before
+ * this point.
+ *
  * If the record is not immediately available, the function returns false
  * if we're not in standby mode. In standby mode, waits for it to become
  * available.
@@ -12636,7 +12643,7 @@ next_record_is_invalid:
 static bool
 WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 							bool fetching_ckpt, XLogRecPtr tliRecPtr,
-							TimeLineID replayTLI)
+							TimeLineID replayTLI, XLogRecPtr replayLSN)
 {
 	static TimestampTz last_fail_time = 0;
 	TimestampTz now;
@@ -12759,7 +12766,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					 */
 					if (recoveryTargetTimeLineGoal == RECOVERY_TARGET_TIMELINE_LATEST)
 					{
-						if (rescanLatestTimeLine(replayTLI))
+						if (rescanLatestTimeLine(replayTLI, replayLSN))
 						{
 							currentSource = XLOG_FROM_ARCHIVE;
 							break;
@@ -12886,7 +12893,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 						 */
 						if (recoveryTargetTimeLineGoal ==
 							RECOVERY_TARGET_TIMELINE_LATEST)
-							rescanLatestTimeLine(replayTLI);
+							rescanLatestTimeLine(replayTLI, replayLSN);
 
 						startWalReceiver = true;
 					}
-- 
2.24.3 (Apple Git-128)

v14-0001-dubious-test-case.patchapplication/octet-stream; name=v14-0001-dubious-test-case.patchDownload

From fff9a43a673ce85bffefdc7abdcdd15855b15801 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 18 Nov 2021 13:32:30 -0500
Subject: [PATCH v14 1/2] dubious test case

---
 src/test/recovery/t/027_tli_crosscheck.pl | 52 +++++++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 src/test/recovery/t/027_tli_crosscheck.pl

diff --git a/src/test/recovery/t/027_tli_crosscheck.pl b/src/test/recovery/t/027_tli_crosscheck.pl
new file mode 100644
index 0000000000..7bd747aa87
--- /dev/null
+++ b/src/test/recovery/t/027_tli_crosscheck.pl
@@ -0,0 +1,52 @@
+# Copyright (c) 2021, PostgreSQL Global Development Group
+
+# Test whether the appearance of a new timeline history file while a
+# standby is busy trying to read the very first checkpoint record is
+# properly handled. In this scenario, the timeline branches off before
+# the replay LSN, so the new timeline should be rejected.
+use strict;
+use warnings;
+use File::Copy;
+use FindBin;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+# initialize and start a primary node
+my $alpha = PostgresNode->get_new_node('alpha');
+$alpha->init(allows_streaming => 1);
+$alpha->start;
+
+# create a standby, start it up, and promote it to make a second primary
+$alpha->backup('bravo_backup');
+my $bravo = PostgresNode->get_new_node('bravo');
+$bravo->init_from_backup($alpha, 'bravo_backup', has_streaming => 1,
+						 has_archiving => 1);
+$bravo->start;
+$bravo->promote;
+
+# setup a second standby, this time with no WAL, no primary conninfo,
+# and no restore_command -- but make sure it starts as a standby all the same.
+$alpha->backup('charlie_backup', 'backup_options' => [ '-Xnone' ]);
+my $charlie = PostgresNode->get_new_node('charlie');
+$charlie->init_from_backup($alpha, 'charlie_backup');
+$charlie->set_standby_mode();
+
+# XXX. the start() method won't think that the server has started even when it
+# has, so pretend like faillures are ok
+$charlie->start('fail_ok' => 1);
+
+# copy the timeline history file from the wrong primary to the new standby
+my $tlifilepath = '/pg_wal/00000002.history';
+copy($bravo->data_dir . $tlifilepath, $charlie->data_dir . $tlifilepath)
+	|| die "copy: $!";
+
+# XXX. wait for an amount of time that will be excessive on most buildfarm
+# machines and too short on a few, so that we get random test failures that
+# drive everybody crazy while simultaneously driving up the test runtime
+sleep(10);
+
+# check the logfile
+my $logfile = slurp_file($charlie->logfile());
+ok($logfile =~ qr/new timeline 2 forked off current database system timeline 1 before current recovery point/,
+   "timeline 2 rejected");
-- 
2.24.3 (Apple Git-128)

#13

Robert Haas

robertmhaas@gmail.com

about 4 years ago

In reply to: Robert Haas (#12)

Re: xlog.c: removing ReadRecPtr and EndRecPtr

On Tue, Nov 23, 2021 at 1:36 PM Robert Haas <robertmhaas@gmail.com> wrote:

Attached please find the test case not for commit as
v2-0001-dubious-test-case.patch; the fix, for commit and now with a
proper commit message as
v2-0002-Fix-corner-case-failure-to-detect-improper-timeli.patch; and
the back-ported test case for v14 as v14-0001-dubious-test-case.patch
in case anyone wants to play around with that. (with apologies for
using the idea of a version number in two different and conflicting
senses)

OK, I have committed this patch, rebased the original combined patch
over top of that, and committed that too. By my count that means I've
now removed a total of 3 global variables from this file and Amul got
rid of one as well so that makes 4 in total. If we continue at this
brisk pace the code may become understandable and maintainable
sometime prior to the heat death of the universe, which I think would
be rather nice.

--
Robert Haas
EDB: http://www.enterprisedb.com