PANIC during crash recovery of a recently promoted standby

Started by Pavan Deolaseeover 7 years ago21 messages
#1Pavan Deolasee
pavan.deolasee@gmail.com
2 attachment(s)

Hello,

I recently investigated a problem where a standby is promoted to be the new
master. The promoted standby crashes shortly thereafter for whatever
reason. Upon running the crash recovery, the promoted standby (now master)
PANICs with message such as:

PANIC,XX000,"WAL contains references to invalid
pages",,,,,,,,"XLogCheckInvalidPages,
xlogutils.c:242",""

After investigation, I could recreate a reproduction scenario for this
problem. The attached TAP test (thanks Alvaro from converting my bash
script to a TAP test) demonstrates the problem. The test is probably
sensitive to timing, but it reproduces the problem consistently at least at
my end. While the original report was for 9.6, I can reproduce it on the
master and thus it probably affects all supported releases.

Investigations point to a possible bug where we fail to update the
minRecoveryPoint after completing the ongoing restart point upon promotion.
IMV after promotion the new master must always recover to the end of the
WAL to ensure that all changes are applied correctly. But what we've
instead is that minRecoveryPoint remains set to a prior location because of
this:

/*
* Update pg_control, using current time. Check that it still shows
* IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing;
* this is a quick hack to make sure nothing really bad happens if
somehow
* we get here after the end-of-recovery checkpoint.
*/
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
{
ControlFile->checkPoint = lastCheckPointRecPtr;
ControlFile->checkPointCopy = lastCheckPoint;
ControlFile->time = (pg_time_t) time(NULL);

/*
* Ensure minRecoveryPoint is past the checkpoint record. Normally,
* this will have happened already while writing out dirty buffers,
* but not necessarily - e.g. because no buffers were dirtied. We
do
* this because a non-exclusive base backup uses minRecoveryPoint to
* determine which WAL files must be included in the backup, and the
* file (or files) containing the checkpoint record must be
included,
* at a minimum. Note that for an ordinary restart of recovery
there's
* no value in having the minimum recovery point any earlier than
this
* anyway, because redo will begin just after the checkpoint record.
*/
if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
{
ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
ControlFile->minRecoveryPointTLI =
lastCheckPoint.ThisTimeLineID;

/* update local copy */
minRecoveryPoint = ControlFile->minRecoveryPoint;
minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
}
if (flags & CHECKPOINT_IS_SHUTDOWN)
ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
UpdateControlFile();
}
LWLockRelease(ControlFileLock);

After promotion, the minRecoveryPoint is only updated (cleared) when the
first regular checkpoint completes. If a crash happens before that, we will
run the crash recovery with a stale minRecoveryPoint, which results into
the PANIC that we diagnosed. The test case was written to reproduce the
issue as reported to us. Thus the test case TRUNCATEs and extends the table
at hand after promotion. The crash shortly thereafter leaves the pages in
uninitialised state because the shared buffers are not yet flushed to the
disk.

During crash recovery, we see uninitialised pages for the WAL records
written before the promotion. These pages are remembered and we expect to
either see a DROP TABLE or TRUNCATE WAL record before the minRecoveryPoint
is reached. But since the minRecoveryPoint is still pointing to a WAL
location prior to the TRUNCATE operation, crash recovery hits the
minRecoveryPoint before seeing the TRUNCATE WAL record. That results in a
PANIC situation.

I propose that we should always clear the minRecoveryPoint after promotion
to ensure that crash recovery always run to the end if a just-promoted
standby crashes before completing its first regular checkpoint. A WIP patch
is attached.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-A-new-TAP-test-to-test-a-recovery-bug.patchapplication/octet-stream; name=0001-A-new-TAP-test-to-test-a-recovery-bug.patchDownload
From aa6e9cb102730566ddc1928b9f569d1b29ed7841 Mon Sep 17 00:00:00 2001
From: Pavan Deolasee <pavan.deolasee@gmail.com>
Date: Fri, 2 Mar 2018 17:52:11 +0530
Subject: [PATCH 1/2] A new TAP test to test a recovery bug.

When a standby is promoted to be a master, if the "Minimum recovery ending
location" is left unchanged, a crash in the just promoted standby before the
first checkpoint after promotion is complete, can lead to recovery ending
without references to all invalid pages are resolved. This results in a PANIC
situation.
---
 src/test/recovery/t/014_promotion_bug.pl | 72 ++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)
 create mode 100644 src/test/recovery/t/014_promotion_bug.pl

diff --git a/src/test/recovery/t/014_promotion_bug.pl b/src/test/recovery/t/014_promotion_bug.pl
new file mode 100644
index 0000000000..699de66001
--- /dev/null
+++ b/src/test/recovery/t/014_promotion_bug.pl
@@ -0,0 +1,72 @@
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize master node
+my $alpha = get_new_node('alpha');
+$alpha->init;
+$alpha->append_conf("postgresql.conf", <<EOF);
+max_wal_senders=5
+wal_level=replica
+min_wal_size=1GB
+max_wal_size=16GB
+hot_standby=on
+EOF
+
+$alpha->append_conf("pg_hba.conf", <<EOF);
+local	replication	all	trust
+EOF
+
+# Start the master
+$alpha->start;
+
+# setup/start a standby
+$alpha->backup('bkp');
+my $bravo = get_new_node('bravo');
+$bravo->init_from_backup($alpha, 'bkp', has_streaming => 1);
+$bravo->append_conf('postgresql.conf', <<EOF);
+checkpoint_timeout=1h
+checkpoint_completion_target=0.9
+EOF
+$bravo->start;
+
+$alpha->safe_psql('postgres', 'create table test1 (a int)');
+$alpha->safe_psql('postgres', 'insert into test1 select generate_series(1, 10000)');
+
+# take a checkpoint
+$alpha->safe_psql('postgres', 'checkpoint');
+
+# the following vacuum will set visibility map bits and create problematic WAL
+# records
+$alpha->safe_psql('postgres', 'vacuum verbose test1');
+sleep 10;
+
+# now force a checkpoint on the standby. This seems unnecessary but for "some"
+# reason, the previous checkpoint on the master does not reflect on the standby
+# and without an explicit checkpoint, it may start redo recovery from a much
+# older point (which includes even create table and initial page additions
+$bravo->safe_psql('postgres', 'checkpoint');
+
+# here the original runs pg_controldata
+
+# now just use a dummy table and run some operations to move minRecoveryPoint
+# beyond the previous vacuum
+$alpha->safe_psql('postgres', 'create table test2 (a int, b text)');
+$alpha->safe_psql('postgres', 'insert into test2 select generate_series(1,10000), md5(random()::text)');
+$alpha->safe_psql('postgres', 'truncate test2');
+
+$bravo->promote;
+sleep 2;
+
+# fun time.. truncate the table on the promoted standby, vacuum and extend it
+# again
+$bravo->safe_psql('postgres', 'truncate test1');
+$bravo->safe_psql('postgres', 'vacuum verbose test1');
+$bravo->safe_psql('postgres', 'insert into test1 select generate_series(1,1000)');
+#
+# now crash-stop the promoted standby and restart. If timing is correct, you
+# should see a PANIC 
+$bravo->stop('immediate');
+$bravo->start;
-- 
2.14.3 (Apple Git-98)

0002-Ensure-recovery-is-run-to-the-end-upon-promotion-of-.patchapplication/octet-stream; name=0002-Ensure-recovery-is-run-to-the-end-upon-promotion-of-.patchDownload
From 9aaa247b9e9d2aa7c70e3dd95614eac2e2c7f53f Mon Sep 17 00:00:00 2001
From: Pavan Deolasee <pavan.deolasee@gmail.com>
Date: Wed, 9 May 2018 16:37:54 +0530
Subject: [PATCH 2/2] Ensure recovery is run to the end upon promotion of a
 standby.

When a standby is promoted to be a master, if the "Minimum recovery ending
location" is left unchanged, a crash in the just promoted standby before the
first checkpoint after promotion is complete, can lead to recovery ending
without references to all invalid pages are resolved. This results in a PANIC
situation.

We now set minRecoveryPoint to InvalidRecPtr upon promotion, thus forcing
recovery to the end in case of a crash.

Analysis and bug fix by Pavan Deolasee, with help from Alvaro Herrera.
---
 src/backend/access/transam/xlog.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 47a6c4d895..26b1a91fba 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9010,10 +9010,16 @@ CreateEndOfRecoveryRecord(void)
 	/*
 	 * Update the control file so that crash recovery can follow the timeline
 	 * changes to this point.
+	 *
+	 * Set minRecoveryPoint to InvalidXLogRecPtr so that a crash will force
+	 * redo recovery to the end of WAL. Otherwise a crash immediately after
+	 * promotion may lead to recovery to an inconsistent point and in the worst
+	 * case, recovery failing because of invalid page references (see
+	 * src/test/recovery/t/006_promotion_bug.pl).
 	 */
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 	ControlFile->time = (pg_time_t) time(NULL);
-	ControlFile->minRecoveryPoint = recptr;
+	ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
 	ControlFile->minRecoveryPointTLI = ThisTimeLineID;
 	UpdateControlFile();
 	LWLockRelease(ControlFileLock);
-- 
2.14.3 (Apple Git-98)

#2Michael Paquier
michael@paquier.xyz
In reply to: Pavan Deolasee (#1)
Re: PANIC during crash recovery of a recently promoted standby

On Thu, May 10, 2018 at 10:52:12AM +0530, Pavan Deolasee wrote:

I propose that we should always clear the minRecoveryPoint after promotion
to ensure that crash recovery always run to the end if a just-promoted
standby crashes before completing its first regular checkpoint. A WIP patch
is attached.

I have been playing with your patch and upgraded the test to check as
well for cascading standbys. We could use that in the final patch.
That's definitely something to add in the recovery test suite, and the
sleep phases should be replaced by waits on replay and/or flush.

Still, that approach looks sensitive to me. A restart point could be
running while the end-of-recovery record is inserted, so your patch
could update minRecoveryPoint to InvalidXLogRecPtr, and then a restart
point would happily update again the control file's minRecoveryPoint to
lastCheckPointEndPtr because it would see that the former is older than
lastCheckPointEndPtr (let's not forget that InvalidXLogRecPtr is 0), so
you could still crash on invalid pages?

I need to think a bit more about that stuff, but one idea would be to
use a special state in the control file to mark it as ending recovery,
this way we would control race conditions with restart points.
--
Michael

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#2)
Re: PANIC during crash recovery of a recently promoted standby

Michael Paquier wrote:

On Thu, May 10, 2018 at 10:52:12AM +0530, Pavan Deolasee wrote:

I propose that we should always clear the minRecoveryPoint after promotion
to ensure that crash recovery always run to the end if a just-promoted
standby crashes before completing its first regular checkpoint. A WIP patch
is attached.

I have been playing with your patch and upgraded the test to check as
well for cascading standbys. We could use that in the final patch.
That's definitely something to add in the recovery test suite, and the
sleep phases should be replaced by waits on replay and/or flush.

Still, that approach looks sensitive to me. A restart point could be
running while the end-of-recovery record is inserted, so your patch
could update minRecoveryPoint to InvalidXLogRecPtr, and then a restart
point would happily update again the control file's minRecoveryPoint to
lastCheckPointEndPtr because it would see that the former is older than
lastCheckPointEndPtr (let's not forget that InvalidXLogRecPtr is 0), so
you could still crash on invalid pages?

Yeah, I had this exact comment, but I was unable to come up with a test
case that would cause a problem.

I need to think a bit more about that stuff, but one idea would be to
use a special state in the control file to mark it as ending recovery,
this way we would control race conditions with restart points.

Hmm. Can we change the control file in released branches? (It should
be possible to make the new server understand both old and new formats,
but I think this is breaking new ground and it looks easy to introduce
more bugs there.)

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#3)
Re: PANIC during crash recovery of a recently promoted standby

On Fri, May 11, 2018 at 12:09:58PM -0300, Alvaro Herrera wrote:

Yeah, I had this exact comment, but I was unable to come up with a test
case that would cause a problem.

pg_ctl promote would wait for the control file to be updated, so you
cannot use it in the TAP tests to trigger the promotion. Still I think
I found one after waking up? Please note I have not tested it:
- Use a custom trigger file and then trigger promotion with a signal.
- Use a sleep command in recovery_end_command to increase the window, as
what matters is sleeping after CreateEndOfRecoveryRecord updates the
control file.
- Issue a restart point on the standby, which will update the control
file.
- Stop the standby with immediate mode.
- Start the standby, it should see unreferenced pages.

Hmm. Can we change the control file in released branches? (It should
be possible to make the new server understand both old and new formats,
but I think this is breaking new ground and it looks easy to introduce
more bugs there.)

We definitely can't, even if the new value is added at the end of
DBState :(

A couple of wild ideas, not tested, again after waking up:
1) We could also abuse of existing values by using the existing
DB_IN_CRASH_RECOVERY or DB_STARTUP. Still that's not completely true as
the cluster may be open for business as a hot standby.
2) Invent a new special value for XLogRecPtr, normally impossible to
reach, which uses high bits.
--
Michael

#5Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#4)
1 attachment(s)
Re: PANIC during crash recovery of a recently promoted standby

On Sat, May 12, 2018 at 07:41:33AM +0900, Michael Paquier wrote:

pg_ctl promote would wait for the control file to be updated, so you
cannot use it in the TAP tests to trigger the promotion. Still I think
I found one after waking up? Please note I have not tested it:
- Use a custom trigger file and then trigger promotion with a signal.
- Use a sleep command in recovery_end_command to increase the window, as
what matters is sleeping after CreateEndOfRecoveryRecord updates the
control file.
- Issue a restart point on the standby, which will update the control
file.
- Stop the standby with immediate mode.
- Start the standby, it should see unreferenced pages.

I have been looking at that this morning, and actually I have been able
to create a test case where minRecoveryPoint goes backwards using
Pavan's patch. Using a sleep in recovery_end_command has proved to not
be enough so I had to patch the backend with a couple of sleeps and some
processing, mainly:
- One sleep in CreateRestartPoint to make a restart point wait before
updating the control file, which I set at 5s.
- One sleep just after calling CreateEndOfRecoveryRecord, which has been
set at 20s.
- Trigger an asynchronous checkpoint using IPC::Run::start.
- Trigger a promotion with a trigger file and SIGUSR2 to the
postmaster.

The rest of the test is crafted with some magic wait number and adds
some logs to ease the monitoring of the issue. In order to get a crash,
I think that you would need to crash the backend after creating the last
WAL records which generate the invalid page references, and also slow
down the last restart point which makes minRecoveryPoint go backwards,
which is err... Complicated to make deterministic. Still if you apply
the patch attached you would see log entries on the standby as follows:
2018-05-14 12:24:15.065 JST [17352] LOG: selected new timeline ID: 2
2018-05-14 12:24:15.074 JST [17352] LOG: archive recovery complete
2018-05-14 12:24:15.074 JST [17352] WARNING: CreateEndOfRecoveryRecord: minRecoveryPoint is 0/32C4258 before update
2018-05-14 12:24:15.074 JST [17352] WARNING: CreateEndOfRecoveryRecord: minRecoveryPoint is 0/0 after update
2018-05-14 12:24:17.067 JST [17353] WARNING: checkPointCopy.redo =0/30B3D70, lastCheckPoint.redo = 0/31BC208
2018-05-14 12:24:17.067 JST [17353] WARNING: CreateRestartPoint: minRecoveryPoint is 0/0 before update, lastCheckPointEndPtr is 0/31BC2B0
2018-05-14 12:24:17.067 JST [17353] WARNING: CreateRestartPoint: minRecoveryPoint is 0/31BC2B0 after update

So minRecoveryRecord can go definitely go backwards here, which is not
good. Attached is a patch which includes Pavan's fix on top of the test
case I crafted with what is in upthread. You just need to apply it on
HEAD.
--
Michael

Attachments:

min-recovery-point-backwards.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c633e11128..a4fe01112b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2763,7 +2763,7 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 			minRecoveryPoint = newMinRecoveryPoint;
 			minRecoveryPointTLI = newMinRecoveryPointTLI;
 
-			ereport(DEBUG2,
+			ereport(WARNING,
 					(errmsg("updated min recovery point to %X/%X on timeline %u",
 							(uint32) (minRecoveryPoint >> 32),
 							(uint32) minRecoveryPoint,
@@ -7631,6 +7631,9 @@ StartupXLOG(void)
 				}
 			}
 
+			/* the sleep should happen before requesting a checkpoint */
+			pg_usleep(1000000L * 20); /* 10s */
+
 			if (!fast_promoted)
 				RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
 								  CHECKPOINT_IMMEDIATE |
@@ -7643,9 +7646,12 @@ StartupXLOG(void)
 		 * And finally, execute the recovery_end_command, if any.
 		 */
 		if (recoveryEndCommand)
+		{
+			elog(WARNING, "executing recovery_end_command");
 			ExecuteRecoveryCommand(recoveryEndCommand,
 								   "recovery_end_command",
 								   true);
+		}
 	}
 
 	if (ArchiveRecoveryRequested)
@@ -9027,14 +9033,31 @@ CreateEndOfRecoveryRecord(void)
 
 	XLogFlush(recptr);
 
+	//Maybe need to add an extra sleep here, the WAL insertion seems
+	//to be stuck because of the WAL insert lock taken by the restart
+	//point...
+
 	/*
 	 * Update the control file so that crash recovery can follow the timeline
 	 * changes to this point.
+	 *
+	 * Set minRecoveryPoint to InvalidXLogRecPtr so that a crash will force
+	 * redo recovery to the end of WAL. Otherwise a crash immediately after
+	 * promotion may lead to recovery to an inconsistent point and in the worst
+	 * case, recovery failing because of invalid page references (see
+	 * src/test/recovery/t/006_promotion_bug.pl).
 	 */
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	elog(WARNING, "CreateEndOfRecoveryRecord: minRecoveryPoint is %X/%X before update",
+		 (uint32) (ControlFile->minRecoveryPoint >> 32),
+		 (uint32) ControlFile->minRecoveryPoint);
+
 	ControlFile->time = (pg_time_t) time(NULL);
-	ControlFile->minRecoveryPoint = recptr;
+	ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
 	ControlFile->minRecoveryPointTLI = ThisTimeLineID;
+	elog(WARNING, "CreateEndOfRecoveryRecord: minRecoveryPoint is %X/%X after update",
+		 (uint32) (ControlFile->minRecoveryPoint >> 32),
+		 (uint32) ControlFile->minRecoveryPoint);
 	UpdateControlFile();
 	LWLockRelease(ControlFileLock);
 
@@ -9147,6 +9170,7 @@ CreateRestartPoint(int flags)
 	 */
 	if (!RecoveryInProgress())
 	{
+		elog(WARNING, "restart point skipped");
 		ereport(DEBUG2,
 				(errmsg("skipping restartpoint, recovery has already ended")));
 		LWLockRelease(CheckpointLock);
@@ -9170,6 +9194,7 @@ CreateRestartPoint(int flags)
 	if (XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
 		lastCheckPoint.redo <= ControlFile->checkPointCopy.redo)
 	{
+		elog(WARNING, "restart point skipped as same point..");
 		ereport(DEBUG2,
 				(errmsg("skipping restartpoint, already performed at %X/%X",
 						(uint32) (lastCheckPoint.redo >> 32),
@@ -9227,6 +9252,12 @@ CreateRestartPoint(int flags)
 	 */
 	PriorRedoPtr = ControlFile->checkPointCopy.redo;
 
+	/* 5s, needs to be lower than restore_end_command! */
+	elog(WARNING, "waiting at sleep phase for CreateRestartPoint");
+	if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
+		elog(WARNING, "control file uses DB_IN_ARCHIVE_RECOVERY");
+	pg_usleep(1000000L * 5);
+
 	/*
 	 * Update pg_control, using current time.  Check that it still shows
 	 * IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing;
@@ -9234,6 +9265,11 @@ CreateRestartPoint(int flags)
 	 * we get here after the end-of-recovery checkpoint.
 	 */
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	elog(WARNING, "checkPointCopy.redo = %X/%X, lastCheckPoint.redo = %X/%X",
+		 (uint32) (ControlFile->checkPointCopy.redo >> 32),
+		 (uint32) ControlFile->checkPointCopy.redo,
+		 (uint32) (lastCheckPoint.redo >> 32),
+		 (uint32) lastCheckPoint.redo);
 	if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
 		ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
 	{
@@ -9241,6 +9277,12 @@ CreateRestartPoint(int flags)
 		ControlFile->checkPointCopy = lastCheckPoint;
 		ControlFile->time = (pg_time_t) time(NULL);
 
+		elog(WARNING, "CreateRestartPoint: minRecoveryPoint is %X/%X before update, "
+			 "lastCheckPointEndPtr is %X/%X",
+			 (uint32) (ControlFile->minRecoveryPoint >> 32),
+			 (uint32) ControlFile->minRecoveryPoint,
+			 (uint32) (lastCheckPointEndPtr >> 32),
+			 (uint32) lastCheckPointEndPtr);
 		/*
 		 * Ensure minRecoveryPoint is past the checkpoint record.  Normally,
 		 * this will have happened already while writing out dirty buffers,
@@ -9261,6 +9303,9 @@ CreateRestartPoint(int flags)
 			minRecoveryPoint = ControlFile->minRecoveryPoint;
 			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 		}
+		elog(WARNING, "CreateRestartPoint: minRecoveryPoint is %X/%X after update",
+			 (uint32) (ControlFile->minRecoveryPoint >> 32),
+			 (uint32) ControlFile->minRecoveryPoint);
 		if (flags & CHECKPOINT_IS_SHUTDOWN)
 			ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
 		UpdateControlFile();
diff --git a/src/test/recovery/t/014_promotion_bug.pl b/src/test/recovery/t/014_promotion_bug.pl
new file mode 100644
index 0000000000..85b275a40c
--- /dev/null
+++ b/src/test/recovery/t/014_promotion_bug.pl
@@ -0,0 +1,142 @@
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+
+# Initialize master node
+my $alpha = get_new_node('alpha');
+$alpha->init;
+$alpha->append_conf("postgresql.conf", <<EOF);
+max_wal_senders=5
+wal_level=replica
+min_wal_size=1GB
+max_wal_size=16GB
+hot_standby=on
+EOF
+
+$alpha->append_conf("pg_hba.conf", <<EOF);
+local	replication	all	trust
+EOF
+
+# Start the master
+$alpha->start;
+
+# setup/start a standby
+$alpha->backup('bkp');
+my $bravo = get_new_node('bravo');
+$bravo->init_from_backup($alpha, 'bkp', has_streaming => 1);
+$bravo->append_conf('postgresql.conf', <<EOF);
+checkpoint_timeout=1h
+checkpoint_completion_target=0.9
+EOF
+# Sleep at the end of recovery before the control file is updated
+# so as a checkpoint has the time to come in and update the
+# control file once again...
+my $bravo_datadir = $bravo->data_dir;
+my $trigger_file = "$bravo_datadir/bravo_trigger_file";
+$bravo->append_conf('recovery.conf', <<EOF);
+#recovery_end_command = 'sleep 10'
+trigger_file = '$trigger_file'
+EOF
+$bravo->start;
+
+$alpha->safe_psql('postgres', 'create table test1 (a int)');
+$alpha->safe_psql('postgres', 'insert into test1 select generate_series(1, 10000)');
+
+# take a checkpoint
+$alpha->safe_psql('postgres', 'checkpoint');
+
+# the following vacuum will set visibility map bits and create problematic WAL
+# records
+$alpha->safe_psql('postgres', 'vacuum verbose test1');
+sleep 10;
+
+# now force a checkpoint on the standby. This seems unnecessary but for "some"
+# reason, the previous checkpoint on the master does not reflect on the standby
+# and without an explicit checkpoint, it may start redo recovery from a much
+# older point (which includes even create table and initial page additions
+$bravo->safe_psql('postgres', 'checkpoint');
+
+# here the original runs pg_controldata
+
+# now just use a dummy table and run some operations to move minRecoveryPoint
+# beyond the previous vacuum
+$alpha->safe_psql('postgres', 'create table test2 (a int, b text)');
+$alpha->safe_psql('postgres', 'insert into test2 select generate_series(1,10000), md5(random()::text)');
+$alpha->safe_psql('postgres', 'truncate test2');
+# wait here so as last records can replay, allowing next restart point
+# to happen.
+# make next asynchronous restart point able to run and update the control
+# file.
+$alpha->safe_psql('postgres', 'checkpoint');
+# More data to move again minRecoveryPoint.
+$alpha->safe_psql('postgres', 'create table test3 (a int, b text)');
+$alpha->safe_psql('postgres', 'insert into test3 select generate_series(1,10000), md5(random()::text)');
+$alpha->safe_psql('postgres', 'truncate test3');
+sleep 2;
+
+# Generate a first restart point to make the second one going stale.
+# this needs to begin before promotion, so as this code canbe stuck
+# within CreateRestartPoint.
+$bravo->safe_psql('postgres', 'select 1');
+my $standby_checkpoint = IPC::Run::start(
+	[
+	 'psql', '-d', $bravo->connstr('postgres'),
+	 '-c', 'checkpoint;'
+	]);
+sleep 1;
+
+# Send SIGUSR1 to postmaster to trigger promotion. We want this part
+# to be asynchronous, but fetch first the postmaster PID and create the
+# trigger file.
+my $pid_file = "$bravo_datadir/postmaster.pid";
+open my $file, '<', $pid_file;
+my $bravo_pid = <$file>;
+chomp($bravo_pid);
+close $file;
+# trigger file
+open $file, '>', $trigger_file;
+close $file;
+my $ret = TestLib::system_log('pg_ctl', 'kill', 'USR1', $bravo_pid);
+is($ret, 0, "sent SIGUSR1 to bravo's postmaster's PID $bravo_pid");
+# Let time to reach wait after CreateEndOfRecoveryRecord before the
+# next restart point.
+sleep 2;
+
+# Let the time for the end-of-recovery process to finish.  The sleep
+# time should match the previous sleep time in restore_end_command but
+# that's not mandatory either.
+sleep 25;
+
+# Remove useless checkpoint. This is unfortunately too late as this has already
+# finished...
+$standby_checkpoint->kill_kill;
+
+# fun time.. truncate the table on the promoted standby, vacuum and extend it
+# again
+my $current_lsn =
+  $bravo->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
+$bravo->safe_psql('postgres', "SELECT '$current_lsn'::pg_lsn;");
+$bravo->safe_psql('postgres', 'truncate test1');
+$bravo->safe_psql('postgres', 'vacuum verbose test1');
+$bravo->safe_psql('postgres', 'insert into test1 select generate_series(1,1000)');
+$current_lsn =
+  $bravo->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
+$bravo->safe_psql('postgres', "SELECT '$current_lsn'::pg_lsn;");
+
+# Let the previous checkpoint enough time to finish...
+sleep 10;
+
+#
+# now crash-stop the promoted standby and restart. If timing is correct, you
+# should see a PANIC
+$bravo->stop('immediate');
+$bravo->start;
+
+my $psql_out;
+$bravo->psql(
+   'postgres',
+   "SELECT count(*) FROM test1",
+   stdout => \$psql_out);
+is($psql_out, '1000', "Check that table state is correct");
#6Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Alvaro Herrera (#3)
Re: PANIC during crash recovery of a recently promoted standby

On Fri, May 11, 2018 at 8:39 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Michael Paquier wrote:

On Thu, May 10, 2018 at 10:52:12AM +0530, Pavan Deolasee wrote:

I propose that we should always clear the minRecoveryPoint after

promotion

to ensure that crash recovery always run to the end if a just-promoted
standby crashes before completing its first regular checkpoint. A WIP

patch

is attached.

I have been playing with your patch and upgraded the test to check as
well for cascading standbys. We could use that in the final patch.
That's definitely something to add in the recovery test suite, and the
sleep phases should be replaced by waits on replay and/or flush.

Still, that approach looks sensitive to me. A restart point could be
running while the end-of-recovery record is inserted, so your patch
could update minRecoveryPoint to InvalidXLogRecPtr, and then a restart
point would happily update again the control file's minRecoveryPoint to
lastCheckPointEndPtr because it would see that the former is older than
lastCheckPointEndPtr (let's not forget that InvalidXLogRecPtr is 0), so
you could still crash on invalid pages?

Yeah, I had this exact comment, but I was unable to come up with a test
case that would cause a problem.

Looks like I didn't understand Alvaro's comment when he mentioned it to me
off-list. But I now see what Michael and Alvaro mean and that indeed seems
like a problem. I was thinking that the test for (ControlFile->state ==
DB_IN_ARCHIVE_RECOVERY) will ensure that minRecoveryPoint can't be updated
after the standby is promoted. While that's true for a DB_IN_PRODUCTION, the
RestartPoint may finish after we have written end-of-recovery record, but
before we're in production and thus the minRecoveryPoint may again be set.

I need to think a bit more about that stuff, but one idea would be to
use a special state in the control file to mark it as ending recovery,
this way we would control race conditions with restart points.

Hmm. Can we change the control file in released branches? (It should
be possible to make the new server understand both old and new formats,
but I think this is breaking new ground and it looks easy to introduce
more bugs there.)

Can't we just remember that in shared memory state instead of writing to
the control file? So if we've already performed end-of-recovery, we don't
update the minRecoveryPoint when RestartPoint completes.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#7Michael Paquier
michael@paquier.xyz
In reply to: Pavan Deolasee (#6)
1 attachment(s)
Re: PANIC during crash recovery of a recently promoted standby

On Mon, May 14, 2018 at 01:14:22PM +0530, Pavan Deolasee wrote:

Looks like I didn't understand Alvaro's comment when he mentioned it to me
off-list. But I now see what Michael and Alvaro mean and that indeed seems
like a problem. I was thinking that the test for (ControlFile->state ==
DB_IN_ARCHIVE_RECOVERY) will ensure that minRecoveryPoint can't be updated
after the standby is promoted. While that's true for a DB_IN_PRODUCTION, the
RestartPoint may finish after we have written end-of-recovery record, but
before we're in production and thus the minRecoveryPoint may again be set.

Yeah, this has been something I considered as well first, but I was not
confident enough that setting up minRecoveryPoint to InvalidXLogRecPtr
was actually a safe thing for timeline switches.

So I have spent a good portion of today testing and playing with it to
be confident enough that this was right, and I have finished with the
attached. The patch adds a new flag to XLogCtl which marks if the
control file has been updated after the end-of-recovery record has been
written, so as minRecoveryPoint does not get updated because of a
restart point running in parallel.

I have also reworked the test case you sent, removing the manuals sleeps
and replacing them with correct wait points. There is also no point to
wait after promotion as pg_ctl promote implies a wait. Another
important thing is that you need to use wal_log_hints = off to see a
crash, which is something that allows_streaming actually enables.

Comments are welcome.
--
Michael

Attachments:

recovery-panic-michael.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index adbd6a2126..230409ced6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -701,6 +701,15 @@ typedef struct XLogCtlData
 	 */
 	XLogRecPtr	lastFpwDisableRecPtr;
 
+	/*
+	 * Track if end-of-recovery record has been written and that the
+	 * control file has been updated, so as the minimum recovery LSN
+	 * does not get updated anymore.  This way, WAL is replayed up to
+	 * the end if a crash happens before the first post-end-recovery
+	 * checkpoint.
+	 */
+	bool		endOfRecoveryDone;
+
 	slock_t		info_lck;		/* locks shared variables shown above */
 } XLogCtlData;
 
@@ -2707,12 +2716,25 @@ XLogGetReplicationSlotMinimumLSN(void)
 static void
 UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 {
+	bool	endOfRecoveryDone;
+
 	/* Quick check using our local copy of the variable */
 	if (!updateMinRecoveryPoint || (!force && lsn <= minRecoveryPoint))
 		return;
 
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 
+	/*
+	 * If the end of recovery record has already been written and that
+	 * the control file has been accordingly updated, then minRecoveryPoint
+	 * should not be updated anymore.
+	 */
+	SpinLockAcquire(&XLogCtl->info_lck);
+	endOfRecoveryDone = XLogCtl->endOfRecoveryDone;
+	SpinLockRelease(&XLogCtl->info_lck);
+	if (endOfRecoveryDone)
+		return;
+
 	/* update local copy */
 	minRecoveryPoint = ControlFile->minRecoveryPoint;
 	minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
@@ -7032,6 +7054,7 @@ StartupXLOG(void)
 		XLogCtl->recoveryLastXTime = 0;
 		XLogCtl->currentChunkStartTime = 0;
 		XLogCtl->recoveryPause = false;
+		XLogCtl->endOfRecoveryDone = false;
 		SpinLockRelease(&XLogCtl->info_lck);
 
 		/* Also ensure XLogReceiptTime has a sane value */
@@ -9040,10 +9063,21 @@ CreateEndOfRecoveryRecord(void)
 	/*
 	 * Update the control file so that crash recovery can follow the timeline
 	 * changes to this point.
+	 *
+	 * Set minRecoveryPoint to InvalidXLogRecPtr so that a crash will force
+	 * redo recovery to the end of WAL. Otherwise a crash immediately after
+	 * promotion may lead to recovery to an inconsistent point and in the worst
+	 * case, recovery failing because of invalid page references.
+	 *
+	 * endOfRecoveryDone is updated to reflect the fact that minRecoveryPoint
+	 * cannot be updated anymore from this point.
 	 */
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	SpinLockAcquire(&XLogCtl->info_lck);
+	XLogCtl->endOfRecoveryDone = true;
+	SpinLockRelease(&XLogCtl->info_lck);
 	ControlFile->time = (pg_time_t) time(NULL);
-	ControlFile->minRecoveryPoint = recptr;
+	ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
 	ControlFile->minRecoveryPointTLI = ThisTimeLineID;
 	UpdateControlFile();
 	LWLockRelease(ControlFileLock);
@@ -9137,6 +9171,7 @@ CreateRestartPoint(int flags)
 	CheckPoint	lastCheckPoint;
 	XLogRecPtr	PriorRedoPtr;
 	TimestampTz xtime;
+	bool		endOfRecoveryDone;
 
 	/*
 	 * Acquire CheckpointLock to ensure only one restartpoint or checkpoint
@@ -9244,6 +9279,9 @@ CreateRestartPoint(int flags)
 	 * we get here after the end-of-recovery checkpoint.
 	 */
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	SpinLockAcquire(&XLogCtl->info_lck);
+	endOfRecoveryDone = XLogCtl->endOfRecoveryDone;
+	SpinLockRelease(&XLogCtl->info_lck);
 	if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
 		ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
 	{
@@ -9261,8 +9299,12 @@ CreateRestartPoint(int flags)
 		 * at a minimum. Note that for an ordinary restart of recovery there's
 		 * no value in having the minimum recovery point any earlier than this
 		 * anyway, because redo will begin just after the checkpoint record.
+		 * It could be possible that minRecoveryPoint has already been updated
+		 * when generating the end-of-recovery record, in which case we want
+		 * WAL replay to happen up to the end.
 		 */
-		if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
+		if (!endOfRecoveryDone &&
+			ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
 		{
 			ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
 			ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID;
diff --git a/src/test/recovery/t/015_promotion.pl b/src/test/recovery/t/015_promotion.pl
new file mode 100644
index 0000000000..2f932d5acb
--- /dev/null
+++ b/src/test/recovery/t/015_promotion.pl
@@ -0,0 +1,83 @@
+# Test for promotion handling with WAL records generated post-promotion
+# before the first checkpoint is generated.  This test case checks for
+# invalid page references at replay based on the minimum consistent
+# recovery point defined.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+# Initialize primary node
+my $alpha = get_new_node('alpha');
+$alpha->init(allows_streaming => 1);
+# Setting wal_log_hints to off is important to get invalid page
+# references.
+$alpha->append_conf("postgresql.conf", <<EOF);
+wal_log_hints = off
+EOF
+
+# Start the primary
+$alpha->start;
+
+# setup/start a standby
+$alpha->backup('bkp');
+my $bravo = get_new_node('bravo');
+$bravo->init_from_backup($alpha, 'bkp', has_streaming => 1);
+$bravo->append_conf('postgresql.conf', <<EOF);
+checkpoint_timeout=1h
+checkpoint_completion_target=0.9
+EOF
+$bravo->start;
+
+# Dummy table for the upcoming tests.
+$alpha->safe_psql('postgres', 'create table test1 (a int)');
+$alpha->safe_psql('postgres', 'insert into test1 select generate_series(1, 10000)');
+
+# take a checkpoint
+$alpha->safe_psql('postgres', 'checkpoint');
+
+# The following vacuum will set visibility map bits and create
+# problematic WAL records.
+$alpha->safe_psql('postgres', 'vacuum verbose test1');
+# Wait for last record to have been replayed on the standby.
+$alpha->wait_for_catchup($bravo, 'replay',
+						 $alpha->lsn('insert'));
+
+# Now force a checkpoint on the standby. This seems unnecessary but for "some"
+# reason, the previous checkpoint on the primary does not reflect on the standby
+# and without an explicit checkpoint, it may start redo recovery from a much
+# older point, which includes even create table and initial page additions.
+$bravo->safe_psql('postgres', 'checkpoint');
+
+# Now just use a dummy table and run some operations to move minRecoveryPoint
+# beyond the previous vacuum.
+$alpha->safe_psql('postgres', 'create table test2 (a int, b text)');
+$alpha->safe_psql('postgres', 'insert into test2 select generate_series(1,10000), md5(random()::text)');
+$alpha->safe_psql('postgres', 'truncate test2');
+
+# Do the promotion, which reinitializes minRecoveryPoint in the control
+# file so as WAL is replayed up to the end.
+$bravo->promote;
+
+# Truncate the table on the promoted standby, vacuum and extend it
+# again to create new page references.  The first post-recovery checkpoint
+# has not happened yet.
+$bravo->safe_psql('postgres', 'truncate test1');
+$bravo->safe_psql('postgres', 'vacuum verbose test1');
+$bravo->safe_psql('postgres', 'insert into test1 select generate_series(1,1000)');
+
+# Now crash-stop the promoted standby and restart.  This makes sure that
+# replay does not see invalid page references because of an invalid
+# minimum consistent recovery point.
+$bravo->stop('immediate');
+$bravo->start;
+
+# Check state of the table after full crash recovery.  All its data should
+# be here.
+my $psql_out;
+$bravo->psql(
+	'postgres',
+	"SELECT count(*) FROM test1",
+	stdout => \$psql_out);
+is($psql_out, '1000', "Check that table state is correct");
#8Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Michael Paquier (#7)
1 attachment(s)
Re: PANIC during crash recovery of a recently promoted standby

Hello.

At Thu, 24 May 2018 16:57:07 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180524075707.GE15445@paquier.xyz>

On Mon, May 14, 2018 at 01:14:22PM +0530, Pavan Deolasee wrote:

Looks like I didn't understand Alvaro's comment when he mentioned it to me
off-list. But I now see what Michael and Alvaro mean and that indeed seems
like a problem. I was thinking that the test for (ControlFile->state ==
DB_IN_ARCHIVE_RECOVERY) will ensure that minRecoveryPoint can't be updated
after the standby is promoted. While that's true for a DB_IN_PRODUCTION, the
RestartPoint may finish after we have written end-of-recovery record, but
before we're in production and thus the minRecoveryPoint may again be set.

Yeah, this has been something I considered as well first, but I was not
confident enough that setting up minRecoveryPoint to InvalidXLogRecPtr
was actually a safe thing for timeline switches.

So I have spent a good portion of today testing and playing with it to
be confident enough that this was right, and I have finished with the
attached. The patch adds a new flag to XLogCtl which marks if the
control file has been updated after the end-of-recovery record has been
written, so as minRecoveryPoint does not get updated because of a
restart point running in parallel.

I have also reworked the test case you sent, removing the manuals sleeps
and replacing them with correct wait points. There is also no point to
wait after promotion as pg_ctl promote implies a wait. Another
important thing is that you need to use wal_log_hints = off to see a
crash, which is something that allows_streaming actually enables.

Comments are welcome.

As the result of some poking around, my dignosis is different
from yours.

(I believe that) By definition recovery doesn't end until the
end-of-recovery check point ends so from the viewpoint I think it
is wrong to clear ControlFile->minRecoveryPoint before the end.

Invalid-page checking during crash recovery is hamful rather than
useless. It is done by CheckRecoveryConsistency even in crash
recovery against its expectation because there's a case where
minRecoveryPoint is valid but InArchiveRecovery is false. The two
variable there seems to be in contradiction with each other.

The immediate cause of the contradition is that StartXLOG wrongly
initializes local minRecoveryPoint from control file's value even
under crash recovery. updateMinRecoveryPoint also should be
turned off during crash recovery. The case of crash after last
checkpoint end has been treated in UpdateMinRecoveryPoint but
forgot to consider this case, where crash recovery has been
started while control file is still holding valid
minRecoveryPoint.

Finally, the attached patch seems fixing the issue. It passes
015_promotion.pl and the problem doesn't happen with
014_promotion_bug.pl. Also this passes the existing tests
002-014.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Fix-wrong-behavior-during-crash-recovery.patchtext/x-patch; charset=us-asciiDownload
From 192c36bc96135d9108c499d01016f7eb6fc28fd1 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 7 Jun 2018 16:02:41 +0900
Subject: [PATCH] Fix wrong behavior during crash recovery

After being killed before end-of-recovery checkpoint ends, server
restarts in crash recovery mode. However it can wrongly performs
invalid-page checks of WAL records, which is not expected during crash
recovery, then results in PANIC.

This patch prevents the wrong checking by suppressing needless updates
of minRecoveryPoint during crash recovery.
---
 src/backend/access/transam/xlog.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index adbd6a2126..283c26cb6c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -821,8 +821,11 @@ static XLogSource XLogReceiptSource = 0;	/* XLOG_FROM_* code */
 static XLogRecPtr ReadRecPtr;	/* start of last record read */
 static XLogRecPtr EndRecPtr;	/* end+1 of last record read */
 
-static XLogRecPtr minRecoveryPoint; /* local copy of
-									 * ControlFile->minRecoveryPoint */
+/*
+ * local copy of ControlFile->minRecoveryPoint. InvalidXLogRecPtr means we're
+ * in crash recovery
+ */
+static XLogRecPtr minRecoveryPoint = InvalidXLogRecPtr;
 static TimeLineID minRecoveryPointTLI;
 static bool updateMinRecoveryPoint = true;
 
@@ -2717,14 +2720,7 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 	minRecoveryPoint = ControlFile->minRecoveryPoint;
 	minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
-	/*
-	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
-	 * i.e., we're doing crash recovery.  We never modify the control file's
-	 * value in that case, so we can short-circuit future checks here too.
-	 */
-	if (minRecoveryPoint == 0)
-		updateMinRecoveryPoint = false;
-	else if (force || minRecoveryPoint < lsn)
+	if (force || minRecoveryPoint < lsn)
 	{
 		XLogRecPtr	newMinRecoveryPoint;
 		TimeLineID	newMinRecoveryPointTLI;
@@ -6841,6 +6837,7 @@ StartupXLOG(void)
 								ControlFile->checkPointCopy.ThisTimeLineID,
 								recoveryTargetTLI)));
 			ControlFile->state = DB_IN_CRASH_RECOVERY;
+			updateMinRecoveryPoint = false;
 		}
 		ControlFile->checkPoint = checkPointLoc;
 		ControlFile->checkPointCopy = checkPoint;
@@ -6852,6 +6849,10 @@ StartupXLOG(void)
 				ControlFile->minRecoveryPoint = checkPoint.redo;
 				ControlFile->minRecoveryPointTLI = checkPoint.ThisTimeLineID;
 			}
+
+			/* also initialize our local copies */
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 		}
 
 		/*
@@ -6889,10 +6890,6 @@ StartupXLOG(void)
 		/* No need to hold ControlFileLock yet, we aren't up far enough */
 		UpdateControlFile();
 
-		/* initialize our local copy of minRecoveryPoint */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
-
 		/*
 		 * Reset pgstat data, because it may be invalid after recovery.
 		 */
@@ -7858,6 +7855,8 @@ CheckRecoveryConsistency(void)
 	if (XLogRecPtrIsInvalid(minRecoveryPoint))
 		return;
 
+	Assert(InArchiveRecovery);
+
 	/*
 	 * assume that we are called in the startup process, and hence don't need
 	 * a lock to read lastReplayedEndRecPtr
-- 
2.16.3

#9Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro HORIGUCHI (#8)
Re: PANIC during crash recovery of a recently promoted standby

On Thu, Jun 07, 2018 at 07:58:29PM +0900, Kyotaro HORIGUCHI wrote:

Invalid-page checking during crash recovery is hamful rather than
useless. It is done by CheckRecoveryConsistency even in crash
recovery against its expectation because there's a case where
minRecoveryPoint is valid but InArchiveRecovery is false. The two
variable there seems to be in contradiction with each other.

The immediate cause of the contradition is that StartXLOG wrongly
initializes local minRecoveryPoint from control file's value even
under crash recovery. updateMinRecoveryPoint also should be
turned off during crash recovery. The case of crash after last
checkpoint end has been treated in UpdateMinRecoveryPoint but
forgot to consider this case, where crash recovery has been
started while control file is still holding valid
minRecoveryPoint.

Hmm. This patch looks interesting and those issues need a very careful
lookup. This is one of those things which should be fixed as part of
the extra CF, so I am planning to look at it in details very soon,
perhaps by the end of this week...

I have by the way noticed that nothing was registered in the CF app:
https://commitfest.postgresql.org/18/1680/
I have added as well a bullet point in the open item page of v11 for
older bugs.
--
Michael

#10Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro HORIGUCHI (#8)
1 attachment(s)
Re: PANIC during crash recovery of a recently promoted standby

On Thu, Jun 07, 2018 at 07:58:29PM +0900, Kyotaro HORIGUCHI wrote:

(I believe that) By definition recovery doesn't end until the
end-of-recovery check point ends so from the viewpoint I think it
is wrong to clear ControlFile->minRecoveryPoint before the end.

Invalid-page checking during crash recovery is hamful rather than
useless. It is done by CheckRecoveryConsistency even in crash
recovery against its expectation because there's a case where
minRecoveryPoint is valid but InArchiveRecovery is false. The two
variable there seems to be in contradiction with each other.

The immediate cause of the contradition is that StartXLOG wrongly
initializes local minRecoveryPoint from control file's value even
under crash recovery. updateMinRecoveryPoint also should be
turned off during crash recovery. The case of crash after last
checkpoint end has been treated in UpdateMinRecoveryPoint but
forgot to consider this case, where crash recovery has been
started while control file is still holding valid
minRecoveryPoint.

I have been digesting your proposal and I reviewed it, and I think that
what you are proposing here is on the right track. However, the updates
around minRecoveryPoint and minRecoveryPointTLI ought to be more
consistent because that could cause future issues. I have spotted two
bug where I think the problem is not fixed: when replaying a WAL record
XLOG_PARAMETER_CHANGE, minRecoveryPoint and minRecoveryPointTLI would
still get updated from the control file values which can still lead to
failures as CheckRecoveryConsistency could still happily trigger a
PANIC, so I think that we had better maintain those values consistent as
long as crash recovery runs. And XLogNeedsFlush() also has a similar
problem.

Note that there is as well the case where the startup process switches
from crash recovery to archive recovery, in which case the update of the
local copies have to happen once the switch is done. Your patch covers
that with just updating updateMinRecoveryPoint each time crash recovery
happens but that's not completely consistent, but it seems that it also
missed the fact that updateMinRecoveryPoint needs to be switched back to
true as the startup process can update the control file. Actually,
updateMinRecoveryPoint needs to be switched back to true in that case or
the startup process would not be able to update minRecoveryPoint when it
calls XLogFlush for example.

There is the point of trying to get rid of updateMinRecoveryPoint which
has crossed my mind, but that's not wise as it's default value allows
the checkpointer minRecoveryPoint when started, which also has to happen
once the startup process gets out of recovery and tells the postmaster
to start the checkpointer. For backends as well that's a sane default.

There is also no point in taking ControlFileLock when checking if the
local copy of minRecoveryPoint is valid or not, so this can be
bypassed.

The assertion in CheckRecoveryConsistency is definitely a good idea as
this should only be called by the startup process, so we can keep it.

In the attached, I have fixed the issues I found and added the test case
which should be included in the final commit. Its concept is pretty
simple, the idea is to keep the local copy of minRecoveryPoint to
InvalidXLogRecPtr as long as crash recovery runs, and is switched back
to normal if there is a switch to archive recovery after a crash
recovery.

This is not really a complicated patch, and it took a lot of energy from
me the last couple of days per the nature of the many scenarios to think
about... So an extra pair of eyes from another committer would be
welcome. I am letting that cool down for a couple of days now.
--
Michael

Attachments:

recovery-panic-michael-v2.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index adbd6a2126..ee837843f1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -821,6 +821,13 @@ static XLogSource XLogReceiptSource = 0;	/* XLOG_FROM_* code */
 static XLogRecPtr ReadRecPtr;	/* start of last record read */
 static XLogRecPtr EndRecPtr;	/* end+1 of last record read */
 
+/*
+ * Local copies of equivalent fields in the control file.  When running
+ * crash recovery, minRecoveryPoint is set to InvalidXLogRecPtr as we
+ * expect to replay all the WAL available, and updateMinRecoveryPoint is
+ * switched to false to prevent any updates while replaying records.
+ * Those values are kept consistent as long as crash recovery runs.
+ */
 static XLogRecPtr minRecoveryPoint; /* local copy of
 									 * ControlFile->minRecoveryPoint */
 static TimeLineID minRecoveryPointTLI;
@@ -2711,20 +2718,26 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 	if (!updateMinRecoveryPoint || (!force && lsn <= minRecoveryPoint))
 		return;
 
+	/*
+	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
+	 * i.e., we're doing crash recovery.  We never modify the control file's
+	 * value in that case, so we can short-circuit future checks here too.
+	 * The local values of minRecoveryPoint and minRecoveryPointTLI should
+	 * not be updated until crash recovery finishes.
+	 */
+	if (XLogRecPtrIsInvalid(minRecoveryPoint))
+	{
+		updateMinRecoveryPoint = false;
+		return;
+	}
+
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 
 	/* update local copy */
 	minRecoveryPoint = ControlFile->minRecoveryPoint;
 	minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
-	/*
-	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
-	 * i.e., we're doing crash recovery.  We never modify the control file's
-	 * value in that case, so we can short-circuit future checks here too.
-	 */
-	if (minRecoveryPoint == 0)
-		updateMinRecoveryPoint = false;
-	else if (force || minRecoveryPoint < lsn)
+	if (force || minRecoveryPoint < lsn)
 	{
 		XLogRecPtr	newMinRecoveryPoint;
 		TimeLineID	newMinRecoveryPointTLI;
@@ -3110,7 +3123,16 @@ XLogNeedsFlush(XLogRecPtr record)
 	 */
 	if (RecoveryInProgress())
 	{
-		/* Quick exit if already known updated */
+		/*
+		 * An invalid minRecoveryPoint means that we need to recover all the
+		 * WAL, i.e., we're doing crash recovery.  We never modify the control
+		 * file's value in that case, so we can short-circuit future checks
+		 * here too.
+		 */
+		if (XLogRecPtrIsInvalid(minRecoveryPoint))
+			updateMinRecoveryPoint = false;
+
+		/* Quick exit if already known to be updated or cannot be updated */
 		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
 			return false;
 
@@ -3124,20 +3146,8 @@ XLogNeedsFlush(XLogRecPtr record)
 		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 		LWLockRelease(ControlFileLock);
 
-		/*
-		 * An invalid minRecoveryPoint means that we need to recover all the
-		 * WAL, i.e., we're doing crash recovery.  We never modify the control
-		 * file's value in that case, so we can short-circuit future checks
-		 * here too.
-		 */
-		if (minRecoveryPoint == 0)
-			updateMinRecoveryPoint = false;
-
 		/* check again */
-		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
-			return false;
-		else
-			return true;
+		return record > minRecoveryPoint;
 	}
 
 	/* Quick exit if already known flushed */
@@ -4266,6 +4276,12 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 				minRecoveryPoint = ControlFile->minRecoveryPoint;
 				minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
+				/*
+				 * The startup process can update its local copy of
+				 * minRecoveryPoint from that point.
+				 */
+				updateMinRecoveryPoint = true;
+
 				UpdateControlFile();
 				LWLockRelease(ControlFileLock);
 
@@ -6889,9 +6905,26 @@ StartupXLOG(void)
 		/* No need to hold ControlFileLock yet, we aren't up far enough */
 		UpdateControlFile();
 
-		/* initialize our local copy of minRecoveryPoint */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		/*
+		 * Initialize our local copy of minRecoveryPoint.  When doing crash
+		 * recovery we want to replay up to the end of WAL.  Particularly, in
+		 * the case of a promoted standby minRecoveryPoint value in the control
+		 * file is only updated after the first checkpoint.  However, if the
+		 * instance crashes before this first completed is generated then
+		 * recovery will use a stale location causing the startup process
+		 * to think that there are still invalid page references when checking
+		 * the recovery consistency.
+		 */
+		if (InArchiveRecovery)
+		{
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
+		else
+		{
+			minRecoveryPoint = InvalidXLogRecPtr;
+			minRecoveryPointTLI = 0;
+		}
 
 		/*
 		 * Reset pgstat data, because it may be invalid after recovery.
@@ -7858,6 +7891,8 @@ CheckRecoveryConsistency(void)
 	if (XLogRecPtrIsInvalid(minRecoveryPoint))
 		return;
 
+	Assert(InArchiveRecovery);
+
 	/*
 	 * assume that we are called in the startup process, and hence don't need
 	 * a lock to read lastReplayedEndRecPtr
@@ -9946,10 +9981,15 @@ xlog_redo(XLogReaderState *record)
 		 * Update minRecoveryPoint to ensure that if recovery is aborted, we
 		 * recover back up to this point before allowing hot standby again.
 		 * This is important if the max_* settings are decreased, to ensure
-		 * you don't run queries against the WAL preceding the change.
+		 * you don't run queries against the WAL preceding the change. The
+		 * local copies cannot be updated as long as crash recovery is
+		 * happening and we expect all the WAL to be replayed.
 		 */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		if (InArchiveRecovery)
+		{
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
 		if (minRecoveryPoint != 0 && minRecoveryPoint < lsn)
 		{
 			ControlFile->minRecoveryPoint = lsn;
diff --git a/src/test/recovery/t/015_promotion_pages.pl b/src/test/recovery/t/015_promotion_pages.pl
new file mode 100644
index 0000000000..2f932d5acb
--- /dev/null
+++ b/src/test/recovery/t/015_promotion_pages.pl
@@ -0,0 +1,83 @@
+# Test for promotion handling with WAL records generated post-promotion
+# before the first checkpoint is generated.  This test case checks for
+# invalid page references at replay based on the minimum consistent
+# recovery point defined.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+# Initialize primary node
+my $alpha = get_new_node('alpha');
+$alpha->init(allows_streaming => 1);
+# Setting wal_log_hints to off is important to get invalid page
+# references.
+$alpha->append_conf("postgresql.conf", <<EOF);
+wal_log_hints = off
+EOF
+
+# Start the primary
+$alpha->start;
+
+# setup/start a standby
+$alpha->backup('bkp');
+my $bravo = get_new_node('bravo');
+$bravo->init_from_backup($alpha, 'bkp', has_streaming => 1);
+$bravo->append_conf('postgresql.conf', <<EOF);
+checkpoint_timeout=1h
+checkpoint_completion_target=0.9
+EOF
+$bravo->start;
+
+# Dummy table for the upcoming tests.
+$alpha->safe_psql('postgres', 'create table test1 (a int)');
+$alpha->safe_psql('postgres', 'insert into test1 select generate_series(1, 10000)');
+
+# take a checkpoint
+$alpha->safe_psql('postgres', 'checkpoint');
+
+# The following vacuum will set visibility map bits and create
+# problematic WAL records.
+$alpha->safe_psql('postgres', 'vacuum verbose test1');
+# Wait for last record to have been replayed on the standby.
+$alpha->wait_for_catchup($bravo, 'replay',
+						 $alpha->lsn('insert'));
+
+# Now force a checkpoint on the standby. This seems unnecessary but for "some"
+# reason, the previous checkpoint on the primary does not reflect on the standby
+# and without an explicit checkpoint, it may start redo recovery from a much
+# older point, which includes even create table and initial page additions.
+$bravo->safe_psql('postgres', 'checkpoint');
+
+# Now just use a dummy table and run some operations to move minRecoveryPoint
+# beyond the previous vacuum.
+$alpha->safe_psql('postgres', 'create table test2 (a int, b text)');
+$alpha->safe_psql('postgres', 'insert into test2 select generate_series(1,10000), md5(random()::text)');
+$alpha->safe_psql('postgres', 'truncate test2');
+
+# Do the promotion, which reinitializes minRecoveryPoint in the control
+# file so as WAL is replayed up to the end.
+$bravo->promote;
+
+# Truncate the table on the promoted standby, vacuum and extend it
+# again to create new page references.  The first post-recovery checkpoint
+# has not happened yet.
+$bravo->safe_psql('postgres', 'truncate test1');
+$bravo->safe_psql('postgres', 'vacuum verbose test1');
+$bravo->safe_psql('postgres', 'insert into test1 select generate_series(1,1000)');
+
+# Now crash-stop the promoted standby and restart.  This makes sure that
+# replay does not see invalid page references because of an invalid
+# minimum consistent recovery point.
+$bravo->stop('immediate');
+$bravo->start;
+
+# Check state of the table after full crash recovery.  All its data should
+# be here.
+my $psql_out;
+$bravo->psql(
+	'postgres',
+	"SELECT count(*) FROM test1",
+	stdout => \$psql_out);
+is($psql_out, '1000', "Check that table state is correct");
#11Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Michael Paquier (#10)
Re: PANIC during crash recovery of a recently promoted standby

On Fri, Jun 22, 2018 at 9:28 AM, Michael Paquier <michael@paquier.xyz>
wrote:

This is not really a complicated patch, and it took a lot of energy from
me the last couple of days per the nature of the many scenarios to think
about...

Thanks for the efforts. It wasn't an easy bug to chase to begin with. So I
am not surprised there were additional problems that I missed.

So an extra pair of eyes from another committer would be
welcome. I am letting that cool down for a couple of days now.

I am not a committer, so don't know if my pair of eyes count, but FWIW the
patch looks good to me except couple of minor points.

+/*
+ * Local copies of equivalent fields in the control file.  When running
+ * crash recovery, minRecoveryPoint is set to InvalidXLogRecPtr as we
+ * expect to replay all the WAL available, and updateMinRecoveryPoint is
+ * switched to false to prevent any updates while replaying records.
+ * Those values are kept consistent as long as crash recovery runs.
+ */
 static XLogRecPtr minRecoveryPoint; /* local copy of
  * ControlFile->minRecoveryPoint */

The inline comment looks unnecessary now that we have comment at the top.

@@ -4266,6 +4276,12 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr
RecPtr, int emode,
minRecoveryPoint = ControlFile->minRecoveryPoint;
minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;

+ /*
+ * The startup process can update its local copy of
+ * minRecoveryPoint from that point.
+ */

s/that/this

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#12Michael Paquier
michael@paquier.xyz
In reply to: Pavan Deolasee (#11)
Re: PANIC during crash recovery of a recently promoted standby

On Fri, Jun 22, 2018 at 10:08:24AM +0530, Pavan Deolasee wrote:

On Fri, Jun 22, 2018 at 9:28 AM, Michael Paquier <michael@paquier.xyz>
wrote:

So an extra pair of eyes from another committer would be
welcome. I am letting that cool down for a couple of days now.

I am not a committer, so don't know if my pair of eyes count, but FWIW the
patch looks good to me except couple of minor points.

Thanks for grabbing some time, Pavan. Any help is welcome!

+/*
+ * Local copies of equivalent fields in the control file.  When running
+ * crash recovery, minRecoveryPoint is set to InvalidXLogRecPtr as we
+ * expect to replay all the WAL available, and updateMinRecoveryPoint is
+ * switched to false to prevent any updates while replaying records.
+ * Those values are kept consistent as long as crash recovery runs.
+ */
static XLogRecPtr minRecoveryPoint; /* local copy of
* ControlFile->minRecoveryPoint */

The inline comment looks unnecessary now that we have comment at the
top.

I'll fix that.

@@ -4266,6 +4276,12 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr
RecPtr, int emode,
minRecoveryPoint = ControlFile->minRecoveryPoint;
minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;

+ /*
+ * The startup process can update its local copy of
+ * minRecoveryPoint from that point.
+ */

s/that/this

This one too.
--
Michael

#13Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Michael Paquier (#12)
Re: PANIC during crash recovery of a recently promoted standby

Hello, sorry for the absense and I looked the second patch.

At Fri, 22 Jun 2018 13:45:21 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180622044521.GC5215@paquier.xyz>

On Fri, Jun 22, 2018 at 10:08:24AM +0530, Pavan Deolasee wrote:

On Fri, Jun 22, 2018 at 9:28 AM, Michael Paquier <michael@paquier.xyz>
wrote:

So an extra pair of eyes from another committer would be
welcome. I am letting that cool down for a couple of days now.

I am not a committer, so don't know if my pair of eyes count, but FWIW the
patch looks good to me except couple of minor points.

Thanks for grabbing some time, Pavan. Any help is welcome!

in previous mail:

I have spotted two
bug where I think the problem is not fixed: when replaying a WAL record
XLOG_PARAMETER_CHANGE, minRecoveryPoint and minRecoveryPointTLI would
still get updated from the control file values which can still lead to
failures as CheckRecoveryConsistency could still happily trigger a
PANIC, so I think that we had better maintain those values consistent as

The fix of StartupXLOG, CheckRecoveryConsistency, ReadRecrod and
xlog_redo looks (functionally, mendtioned below) fine.

long as crash recovery runs. And XLogNeedsFlush() also has a similar
problem.

Here, on the other hand, this patch turns off
updateMinRecoverypoint if minRecoverPoint is invalid when
RecoveryInProgress() == true. Howerver RecovInProg() == true
means archive recovery is already started and and
minRecoveryPoint *should* be updated t for the
condition. Actually minRecoverypoint is updated just below. If
this is really right thing, I think that some explanation for the
reason is required here.

In xlog_redo there still be "minRecoverypoint != 0", which ought
to be described as "!XLogRecPtrIsInvalid(minRecoveryPoint)". (It
seems the only one. Double negation is a bit uneasy but there are
many instance of this kind of coding.)

# I'll go all-out next week.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#14Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro HORIGUCHI (#13)
Re: PANIC during crash recovery of a recently promoted standby

On Fri, Jun 22, 2018 at 02:34:02PM +0900, Kyotaro HORIGUCHI wrote:

Hello, sorry for the absense and I looked the second patch.

Thanks for the review!

At Fri, 22 Jun 2018 13:45:21 +0900, Michael Paquier
<michael@paquier.xyz> wrote in <20180622044521.GC5215@paquier.xyz>

long as crash recovery runs. And XLogNeedsFlush() also has a similar
problem.

Here, on the other hand, this patch turns off
updateMinRecoverypoint if minRecoverPoint is invalid when
RecoveryInProgress() == true. Howerver RecovInProg() == true
means archive recovery is already started and and
minRecoveryPoint *should* be updated t for the
condition. Actually minRecoverypoint is updated just below. If
this is really right thing, I think that some explanation for the
reason is required here.

LocalRecoveryInProgress is just a local copy of SharedRecoveryInProgress
so RecoveryInProgress also returns true if crash recovery is running.
But perhaps I am missing what you mean? The point here is that redo can
call XLogNeedsFlush, no?

In xlog_redo there still be "minRecoverypoint != 0", which ought
to be described as "!XLogRecPtrIsInvalid(minRecoveryPoint)". (It
seems the only one. Double negation is a bit uneasy but there are
many instance of this kind of coding.)

It is possible to use directly a comparison with InvalidXLogRecPtr
instead of a double negation.

# I'll go all-out next week.

Enjoy your vacations!
--
Michael

#15Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#14)
6 attachment(s)
Re: PANIC during crash recovery of a recently promoted standby

On Fri, Jun 22, 2018 at 03:25:48PM +0900, Michael Paquier wrote:

On Fri, Jun 22, 2018 at 02:34:02PM +0900, Kyotaro HORIGUCHI wrote:

Hello, sorry for the absense and I looked the second patch.

Thanks for the review!

I have been spending some time testing and torturing the patch for all
stable branches, and I have finished with the set of patches attached.

My testing has involved using the TAP suite, where I have actually, and
roughly backported the infrastructure in v10 down to older versions,
which has required to tweak Makefile.global.in and finding out again
that pg_ctl start has switched to the wait mode by default in 10.

I have spent a bit of time testing this on HEAD, 10 and 9.6. For 9.5,
9.4 and 9.3 I have reproduced the failure and tested the patch, but I
lacked time to perform more tests. The patch set for 9.3~9.5 applies
without conflict across the 3 branches. 9.6 has a conflict in a
comment, and v10 had an extra comment conflict.

Feel free to have a look, I am not completely done with this stuff and
I'll work more tomorrow on checking 9.3~9.5.
--
Michael

Attachments:

promote-panic-96.patchtext/x-diff; charset=iso-8859-1Download
From 1ac8e1b4c9d3002594accc447ca6dae9ef32bfe5 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 27 Jun 2018 14:56:14 +0900
Subject: [PATCH] Prevent references to invalid relation pages at
 post-promotion
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

If a standby crashes after promotion before having completed its first
post-recovery checkpoint, then the minimal recovery point which marks
the LSN position where the cluster is able to reach consistency may be
set to a position past the first end-of-recovery checkpoint while all
the WAL available should be replayed.  This leads to the instance
thinking that it contains inconsistent pages, causing a PANIC and a hard
instance crash even if all the WAL available has not been replayed for
certain sets of records replayed. When in crash recovery,
minRecoveryPoint is expected to always be set to InvalidXLogRecPtr,
which forces the recovery to replay all the WAL available, so this
commit makes sure that the local copy of minRecoveryPoint from the
control file is initialized properly and stays as it is while crash
recovery is performed.  Once switching to archive recovery or if crash
recovery finishes, then the local copy minRecoveryPoint can be safely
updated.

The base fix idea comes from Kyotaro Horiguchi, which has been expanded
into a full-fledged patch by me.  The test included in this commit has
been written by Álvaro Herrera and Pavan Deolasee, which I have modified
to be faster and more reliable.

Backpatch down to all supported versions, aka 9.3.

Reported-by: Pavan Deolasee
Reviewed-by: Pavan Deolasee, Kyotaro Horiguchi
Author: Michael Paquier, Kyotaro Horiguchi, Pavan Deolasee, Álvaro
Herrera
Discussion: https://postgr.es/m/CABOikdPOewjNL=05K5CbNMxnNtXnQjhTx2F--4p4ruorCjukbA@mail.gmail.com
---
 src/backend/access/transam/xlog.c | 101 +++++++++++++++++++++---------
 1 file changed, 70 insertions(+), 31 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 248ea9a976..2f670d0d96 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -794,8 +794,14 @@ static XLogSource XLogReceiptSource = 0;		/* XLOG_FROM_* code */
 static XLogRecPtr ReadRecPtr;	/* start of last record read */
 static XLogRecPtr EndRecPtr;	/* end+1 of last record read */
 
-static XLogRecPtr minRecoveryPoint;		/* local copy of
-										 * ControlFile->minRecoveryPoint */
+/*
+ * Local copies of equivalent fields in the control file.  When running
+ * crash recovery, minRecoveryPoint is set to InvalidXLogRecPtr as we
+ * expect to replay all the WAL available, and updateMinRecoveryPoint is
+ * switched to false to prevent any updates while replaying records.
+ * Those values are kept consistent as long as crash recovery runs.
+ */
+static XLogRecPtr minRecoveryPoint;
 static TimeLineID minRecoveryPointTLI;
 static bool updateMinRecoveryPoint = true;
 
@@ -2532,20 +2538,26 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 	if (!updateMinRecoveryPoint || (!force && lsn <= minRecoveryPoint))
 		return;
 
+	/*
+	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
+	 * i.e., we're doing crash recovery.  We never modify the control file's
+	 * value in that case, so we can short-circuit future checks here too. The
+	 * local values of minRecoveryPoint and minRecoveryPointTLI should not be
+	 * updated until crash recovery finishes.
+	 */
+	if (XLogRecPtrIsInvalid(minRecoveryPoint))
+	{
+		updateMinRecoveryPoint = false;
+		return;
+	}
+
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 
 	/* update local copy */
 	minRecoveryPoint = ControlFile->minRecoveryPoint;
 	minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
-	/*
-	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
-	 * i.e., we're doing crash recovery.  We never modify the control file's
-	 * value in that case, so we can short-circuit future checks here too.
-	 */
-	if (minRecoveryPoint == 0)
-		updateMinRecoveryPoint = false;
-	else if (force || minRecoveryPoint < lsn)
+	if (force || minRecoveryPoint < lsn)
 	{
 		XLogRecPtr	newMinRecoveryPoint;
 		TimeLineID	newMinRecoveryPointTLI;
@@ -2930,7 +2942,16 @@ XLogNeedsFlush(XLogRecPtr record)
 	 */
 	if (RecoveryInProgress())
 	{
-		/* Quick exit if already known updated */
+		/*
+		 * An invalid minRecoveryPoint means that we need to recover all the
+		 * WAL, i.e., we're doing crash recovery.  We never modify the control
+		 * file's value in that case, so we can short-circuit future checks
+		 * here too.
+		 */
+		if (XLogRecPtrIsInvalid(minRecoveryPoint))
+			updateMinRecoveryPoint = false;
+
+		/* Quick exit if already known to be updated or cannot be updated */
 		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
 			return false;
 
@@ -2944,20 +2965,8 @@ XLogNeedsFlush(XLogRecPtr record)
 		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 		LWLockRelease(ControlFileLock);
 
-		/*
-		 * An invalid minRecoveryPoint means that we need to recover all the
-		 * WAL, i.e., we're doing crash recovery.  We never modify the control
-		 * file's value in that case, so we can short-circuit future checks
-		 * here too.
-		 */
-		if (minRecoveryPoint == 0)
-			updateMinRecoveryPoint = false;
-
 		/* check again */
-		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
-			return false;
-		else
-			return true;
+		return record > minRecoveryPoint;
 	}
 
 	/* Quick exit if already known flushed */
@@ -4099,6 +4108,12 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 				minRecoveryPoint = ControlFile->minRecoveryPoint;
 				minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
+				/*
+				 * The startup process can update its local copy of
+				 * minRecoveryPoint from this point.
+				 */
+				updateMinRecoveryPoint = true;
+
 				UpdateControlFile();
 				LWLockRelease(ControlFileLock);
 
@@ -6578,9 +6593,26 @@ StartupXLOG(void)
 		/* No need to hold ControlFileLock yet, we aren't up far enough */
 		UpdateControlFile();
 
-		/* initialize our local copy of minRecoveryPoint */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		/*
+		 * Initialize our local copy of minRecoveryPoint.  When doing crash
+		 * recovery we want to replay up to the end of WAL.  Particularly, in
+		 * the case of a promoted standby minRecoveryPoint value in the
+		 * control file is only updated after the first checkpoint.  However,
+		 * if the instance crashes before the first post-recovery checkpoint
+		 * is completed then recovery will use a stale location causing the
+		 * startup process to think that there are still invalid page
+		 * references when checking for data consistency.
+		 */
+		if (InArchiveRecovery)
+		{
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
+		else
+		{
+			minRecoveryPoint = InvalidXLogRecPtr;
+			minRecoveryPointTLI = 0;
+		}
 
 		/*
 		 * Reset pgstat data, because it may be invalid after recovery.
@@ -7520,6 +7552,8 @@ CheckRecoveryConsistency(void)
 	if (XLogRecPtrIsInvalid(minRecoveryPoint))
 		return;
 
+	Assert(InArchiveRecovery);
+
 	/*
 	 * assume that we are called in the startup process, and hence don't need
 	 * a lock to read lastReplayedEndRecPtr
@@ -9582,11 +9616,16 @@ xlog_redo(XLogReaderState *record)
 		 * Update minRecoveryPoint to ensure that if recovery is aborted, we
 		 * recover back up to this point before allowing hot standby again.
 		 * This is important if the max_* settings are decreased, to ensure
-		 * you don't run queries against the WAL preceding the change.
+		 * you don't run queries against the WAL preceding the change. The
+		 * local copies cannot be updated as long as crash recovery is
+		 * happening and we expect all the WAL to be replayed.
 		 */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
-		if (minRecoveryPoint != 0 && minRecoveryPoint < lsn)
+		if (InArchiveRecovery)
+		{
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
+		if (minRecoveryPoint != InvalidXLogRecPtr && minRecoveryPoint < lsn)
 		{
 			ControlFile->minRecoveryPoint = lsn;
 			ControlFile->minRecoveryPointTLI = ThisTimeLineID;
-- 
2.18.0

promote-panic-10.patchtext/x-diff; charset=iso-8859-1Download
From dbeb80fcfdfac7fe250f464c079cfc5cc9bbce3f Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 27 Jun 2018 14:56:14 +0900
Subject: [PATCH] Prevent references to invalid relation pages at
 post-promotion
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

If a standby crashes after promotion before having completed its first
post-recovery checkpoint, then the minimal recovery point which marks
the LSN position where the cluster is able to reach consistency may be
set to a position past the first end-of-recovery checkpoint while all
the WAL available should be replayed.  This leads to the instance
thinking that it contains inconsistent pages, causing a PANIC and a hard
instance crash even if all the WAL available has not been replayed for
certain sets of records replayed. When in crash recovery,
minRecoveryPoint is expected to always be set to InvalidXLogRecPtr,
which forces the recovery to replay all the WAL available, so this
commit makes sure that the local copy of minRecoveryPoint from the
control file is initialized properly and stays as it is while crash
recovery is performed.  Once switching to archive recovery or if crash
recovery finishes, then the local copy minRecoveryPoint can be safely
updated.

The base fix idea comes from Kyotaro Horiguchi, which has been expanded
into a full-fledged patch by me.  The test included in this commit has
been written by Álvaro Herrera and Pavan Deolasee, which I have modified
to be faster and more reliable.

Backpatch down to all supported versions, aka 9.3.

Reported-by: Pavan Deolasee
Reviewed-by: Pavan Deolasee, Kyotaro Horiguchi
Author: Michael Paquier, Kyotaro Horiguchi, Pavan Deolasee, Álvaro
Herrera
Discussion: https://postgr.es/m/CABOikdPOewjNL=05K5CbNMxnNtXnQjhTx2F--4p4ruorCjukbA@mail.gmail.com
---
 src/backend/access/transam/xlog.c          | 101 ++++++++++++++-------
 src/test/recovery/t/013_promotion_pages.pl |  87 ++++++++++++++++++
 2 files changed, 157 insertions(+), 31 deletions(-)
 create mode 100644 src/test/recovery/t/013_promotion_pages.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d3bfe41485..935bdb0a0e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -817,8 +817,14 @@ static XLogSource XLogReceiptSource = 0;	/* XLOG_FROM_* code */
 static XLogRecPtr ReadRecPtr;	/* start of last record read */
 static XLogRecPtr EndRecPtr;	/* end+1 of last record read */
 
-static XLogRecPtr minRecoveryPoint; /* local copy of
-									 * ControlFile->minRecoveryPoint */
+/*
+ * Local copies of equivalent fields in the control file.  When running
+ * crash recovery, minRecoveryPoint is set to InvalidXLogRecPtr as we
+ * expect to replay all the WAL available, and updateMinRecoveryPoint is
+ * switched to false to prevent any updates while replaying records.
+ * Those values are kept consistent as long as crash recovery runs.
+ */
+static XLogRecPtr minRecoveryPoint;
 static TimeLineID minRecoveryPointTLI;
 static bool updateMinRecoveryPoint = true;
 
@@ -2685,20 +2691,26 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 	if (!updateMinRecoveryPoint || (!force && lsn <= minRecoveryPoint))
 		return;
 
+	/*
+	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
+	 * i.e., we're doing crash recovery.  We never modify the control file's
+	 * value in that case, so we can short-circuit future checks here too. The
+	 * local values of minRecoveryPoint and minRecoveryPointTLI should not be
+	 * updated until crash recovery finishes.
+	 */
+	if (XLogRecPtrIsInvalid(minRecoveryPoint))
+	{
+		updateMinRecoveryPoint = false;
+		return;
+	}
+
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 
 	/* update local copy */
 	minRecoveryPoint = ControlFile->minRecoveryPoint;
 	minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
-	/*
-	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
-	 * i.e., we're doing crash recovery.  We never modify the control file's
-	 * value in that case, so we can short-circuit future checks here too.
-	 */
-	if (minRecoveryPoint == 0)
-		updateMinRecoveryPoint = false;
-	else if (force || minRecoveryPoint < lsn)
+	if (force || minRecoveryPoint < lsn)
 	{
 		XLogRecPtr	newMinRecoveryPoint;
 		TimeLineID	newMinRecoveryPointTLI;
@@ -3083,7 +3095,16 @@ XLogNeedsFlush(XLogRecPtr record)
 	 */
 	if (RecoveryInProgress())
 	{
-		/* Quick exit if already known updated */
+		/*
+		 * An invalid minRecoveryPoint means that we need to recover all the
+		 * WAL, i.e., we're doing crash recovery.  We never modify the control
+		 * file's value in that case, so we can short-circuit future checks
+		 * here too.
+		 */
+		if (XLogRecPtrIsInvalid(minRecoveryPoint))
+			updateMinRecoveryPoint = false;
+
+		/* Quick exit if already known to be updated or cannot be updated */
 		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
 			return false;
 
@@ -3097,20 +3118,8 @@ XLogNeedsFlush(XLogRecPtr record)
 		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 		LWLockRelease(ControlFileLock);
 
-		/*
-		 * An invalid minRecoveryPoint means that we need to recover all the
-		 * WAL, i.e., we're doing crash recovery.  We never modify the control
-		 * file's value in that case, so we can short-circuit future checks
-		 * here too.
-		 */
-		if (minRecoveryPoint == 0)
-			updateMinRecoveryPoint = false;
-
 		/* check again */
-		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
-			return false;
-		else
-			return true;
+		return record > minRecoveryPoint;
 	}
 
 	/* Quick exit if already known flushed */
@@ -4258,6 +4267,12 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 				minRecoveryPoint = ControlFile->minRecoveryPoint;
 				minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
+				/*
+				 * The startup process can update its local copy of
+				 * minRecoveryPoint from this point.
+				 */
+				updateMinRecoveryPoint = true;
+
 				UpdateControlFile();
 				LWLockRelease(ControlFileLock);
 
@@ -6837,9 +6852,26 @@ StartupXLOG(void)
 		/* No need to hold ControlFileLock yet, we aren't up far enough */
 		UpdateControlFile();
 
-		/* initialize our local copy of minRecoveryPoint */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		/*
+		 * Initialize our local copy of minRecoveryPoint.  When doing crash
+		 * recovery we want to replay up to the end of WAL.  Particularly, in
+		 * the case of a promoted standby minRecoveryPoint value in the
+		 * control file is only updated after the first checkpoint.  However,
+		 * if the instance crashes before the first post-recovery checkpoint
+		 * is completed then recovery will use a stale location causing the
+		 * startup process to think that there are still invalid page
+		 * references when checking for data consistency.
+		 */
+		if (InArchiveRecovery)
+		{
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
+		else
+		{
+			minRecoveryPoint = InvalidXLogRecPtr;
+			minRecoveryPointTLI = 0;
+		}
 
 		/*
 		 * Reset pgstat data, because it may be invalid after recovery.
@@ -7806,6 +7838,8 @@ CheckRecoveryConsistency(void)
 	if (XLogRecPtrIsInvalid(minRecoveryPoint))
 		return;
 
+	Assert(InArchiveRecovery);
+
 	/*
 	 * assume that we are called in the startup process, and hence don't need
 	 * a lock to read lastReplayedEndRecPtr
@@ -9913,11 +9947,16 @@ xlog_redo(XLogReaderState *record)
 		 * Update minRecoveryPoint to ensure that if recovery is aborted, we
 		 * recover back up to this point before allowing hot standby again.
 		 * This is important if the max_* settings are decreased, to ensure
-		 * you don't run queries against the WAL preceding the change.
+		 * you don't run queries against the WAL preceding the change. The
+		 * local copies cannot be updated as long as crash recovery is
+		 * happening and we expect all the WAL to be replayed.
 		 */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
-		if (minRecoveryPoint != 0 && minRecoveryPoint < lsn)
+		if (InArchiveRecovery)
+		{
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
+		if (minRecoveryPoint != InvalidXLogRecPtr && minRecoveryPoint < lsn)
 		{
 			ControlFile->minRecoveryPoint = lsn;
 			ControlFile->minRecoveryPointTLI = ThisTimeLineID;
diff --git a/src/test/recovery/t/013_promotion_pages.pl b/src/test/recovery/t/013_promotion_pages.pl
new file mode 100644
index 0000000000..48f941b963
--- /dev/null
+++ b/src/test/recovery/t/013_promotion_pages.pl
@@ -0,0 +1,87 @@
+# Test for promotion handling with WAL records generated post-promotion
+# before the first checkpoint is generated.  This test case checks for
+# invalid page references at replay based on the minimum consistent
+# recovery point defined.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+# Initialize primary node
+my $alpha = get_new_node('alpha');
+$alpha->init(allows_streaming => 1);
+# Setting wal_log_hints to off is important to get invalid page
+# references.
+$alpha->append_conf("postgresql.conf", <<EOF);
+wal_log_hints = off
+EOF
+
+# Start the primary
+$alpha->start;
+
+# setup/start a standby
+$alpha->backup('bkp');
+my $bravo = get_new_node('bravo');
+$bravo->init_from_backup($alpha, 'bkp', has_streaming => 1);
+$bravo->append_conf('postgresql.conf', <<EOF);
+checkpoint_timeout=1h
+checkpoint_completion_target=0.9
+EOF
+$bravo->start;
+
+# Dummy table for the upcoming tests.
+$alpha->safe_psql('postgres', 'create table test1 (a int)');
+$alpha->safe_psql('postgres', 'insert into test1 select generate_series(1, 10000)');
+
+# take a checkpoint
+$alpha->safe_psql('postgres', 'checkpoint');
+
+# The following vacuum will set visibility map bits and create
+# problematic WAL records.
+$alpha->safe_psql('postgres', 'vacuum verbose test1');
+# Wait for last record to have been replayed on the standby.
+$alpha->wait_for_catchup($bravo, 'replay',
+						 $alpha->lsn('insert'));
+
+# Now force a checkpoint on the standby. This seems unnecessary but for "some"
+# reason, the previous checkpoint on the primary does not reflect on the standby
+# and without an explicit checkpoint, it may start redo recovery from a much
+# older point, which includes even create table and initial page additions.
+$bravo->safe_psql('postgres', 'checkpoint');
+
+# Now just use a dummy table and run some operations to move minRecoveryPoint
+# beyond the previous vacuum.
+$alpha->safe_psql('postgres', 'create table test2 (a int, b text)');
+$alpha->safe_psql('postgres', 'insert into test2 select generate_series(1,10000), md5(random()::text)');
+$alpha->safe_psql('postgres', 'truncate test2');
+
+# Wait again for all records to be replayed.
+$alpha->wait_for_catchup($bravo, 'replay',
+						 $alpha->lsn('insert'));
+
+# Do the promotion, which reinitializes minRecoveryPoint in the control
+# file so as WAL is replayed up to the end.
+$bravo->promote;
+
+# Truncate the table on the promoted standby, vacuum and extend it
+# again to create new page references.  The first post-recovery checkpoint
+# has not happened yet.
+$bravo->safe_psql('postgres', 'truncate test1');
+$bravo->safe_psql('postgres', 'vacuum verbose test1');
+$bravo->safe_psql('postgres', 'insert into test1 select generate_series(1,1000)');
+
+# Now crash-stop the promoted standby and restart.  This makes sure that
+# replay does not see invalid page references because of an invalid
+# minimum consistent recovery point.
+$bravo->stop('immediate');
+$bravo->start;
+
+# Check state of the table after full crash recovery.  All its data should
+# be here.
+my $psql_out;
+$bravo->psql(
+	'postgres',
+	"SELECT count(*) FROM test1",
+	stdout => \$psql_out);
+is($psql_out, '1000', "Check that table state is correct");
-- 
2.18.0

promote-panic-master.patchtext/x-diff; charset=iso-8859-1Download
From 6e8062e5622b2357eb3c969f5f52e5b037b1ec3a Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 27 Jun 2018 14:32:13 +0900
Subject: [PATCH] Prevent references to invalid relation pages at
 post-promotion
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

If a standby crashes after promotion before having completed its first
post-recovery checkpoint, then the minimal recovery point which marks
the LSN position where the cluster is able to reach consistency may be
set to a position past the first end-of-recovery checkpoint while all
the WAL available should be replayed.  This leads to the instance
thinking that it contains inconsistent pages, causing a PANIC and a hard
instance crash even if all the WAL available has not been replayed for
certain sets of records replayed. When in crash recovery,
minRecoveryPoint is expected to always be set to InvalidXLogRecPtr,
which forces the recovery to replay all the WAL available, so this
commit makes sure that the local copy of minRecoveryPoint from the
control file is initialized properly and stays as it is while crash
recovery is performed.  Once switching to archive recovery or if crash
recovery finishes, then the local copy minRecoveryPoint can be safely
updated.

The base fix idea comes from Kyotaro Horiguchi, which has been expanded
into a full-fledged patch by me.  The test included in this commit has
been written by Álvaro Herrera and Pavan Deolasee, which I have modified
to be faster and more reliable.

Backpatch down to all supported versions, aka 9.3.

Reported-by: Pavan Deolasee
Reviewed-by: Pavan Deolasee, Kyotaro Horiguchi
Author: Michael Paquier, Kyotaro Horiguchi, Pavan Deolasee, Álvaro
Herrera
Discussion: https://postgr.es/m/CABOikdPOewjNL=05K5CbNMxnNtXnQjhTx2F--4p4ruorCjukbA@mail.gmail.com
---
 src/backend/access/transam/xlog.c          | 101 ++++++++++++++-------
 src/test/recovery/t/015_promotion_pages.pl |  87 ++++++++++++++++++
 2 files changed, 157 insertions(+), 31 deletions(-)
 create mode 100644 src/test/recovery/t/015_promotion_pages.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1a419aa49b..8752a6e1ae 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -821,8 +821,14 @@ static XLogSource XLogReceiptSource = 0;	/* XLOG_FROM_* code */
 static XLogRecPtr ReadRecPtr;	/* start of last record read */
 static XLogRecPtr EndRecPtr;	/* end+1 of last record read */
 
-static XLogRecPtr minRecoveryPoint; /* local copy of
-									 * ControlFile->minRecoveryPoint */
+/*
+ * Local copies of equivalent fields in the control file.  When running
+ * crash recovery, minRecoveryPoint is set to InvalidXLogRecPtr as we
+ * expect to replay all the WAL available, and updateMinRecoveryPoint is
+ * switched to false to prevent any updates while replaying records.
+ * Those values are kept consistent as long as crash recovery runs.
+ */
+static XLogRecPtr minRecoveryPoint;
 static TimeLineID minRecoveryPointTLI;
 static bool updateMinRecoveryPoint = true;
 
@@ -2711,20 +2717,26 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 	if (!updateMinRecoveryPoint || (!force && lsn <= minRecoveryPoint))
 		return;
 
+	/*
+	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
+	 * i.e., we're doing crash recovery.  We never modify the control file's
+	 * value in that case, so we can short-circuit future checks here too. The
+	 * local values of minRecoveryPoint and minRecoveryPointTLI should not be
+	 * updated until crash recovery finishes.
+	 */
+	if (XLogRecPtrIsInvalid(minRecoveryPoint))
+	{
+		updateMinRecoveryPoint = false;
+		return;
+	}
+
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 
 	/* update local copy */
 	minRecoveryPoint = ControlFile->minRecoveryPoint;
 	minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
-	/*
-	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
-	 * i.e., we're doing crash recovery.  We never modify the control file's
-	 * value in that case, so we can short-circuit future checks here too.
-	 */
-	if (minRecoveryPoint == 0)
-		updateMinRecoveryPoint = false;
-	else if (force || minRecoveryPoint < lsn)
+	if (force || minRecoveryPoint < lsn)
 	{
 		XLogRecPtr	newMinRecoveryPoint;
 		TimeLineID	newMinRecoveryPointTLI;
@@ -3110,7 +3122,16 @@ XLogNeedsFlush(XLogRecPtr record)
 	 */
 	if (RecoveryInProgress())
 	{
-		/* Quick exit if already known updated */
+		/*
+		 * An invalid minRecoveryPoint means that we need to recover all the
+		 * WAL, i.e., we're doing crash recovery.  We never modify the control
+		 * file's value in that case, so we can short-circuit future checks
+		 * here too.
+		 */
+		if (XLogRecPtrIsInvalid(minRecoveryPoint))
+			updateMinRecoveryPoint = false;
+
+		/* Quick exit if already known to be updated or cannot be updated */
 		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
 			return false;
 
@@ -3124,20 +3145,8 @@ XLogNeedsFlush(XLogRecPtr record)
 		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 		LWLockRelease(ControlFileLock);
 
-		/*
-		 * An invalid minRecoveryPoint means that we need to recover all the
-		 * WAL, i.e., we're doing crash recovery.  We never modify the control
-		 * file's value in that case, so we can short-circuit future checks
-		 * here too.
-		 */
-		if (minRecoveryPoint == 0)
-			updateMinRecoveryPoint = false;
-
 		/* check again */
-		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
-			return false;
-		else
-			return true;
+		return record > minRecoveryPoint;
 	}
 
 	/* Quick exit if already known flushed */
@@ -4269,6 +4278,12 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 				minRecoveryPoint = ControlFile->minRecoveryPoint;
 				minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
+				/*
+				 * The startup process can update its local copy of
+				 * minRecoveryPoint from this point.
+				 */
+				updateMinRecoveryPoint = true;
+
 				UpdateControlFile();
 				LWLockRelease(ControlFileLock);
 
@@ -6892,9 +6907,26 @@ StartupXLOG(void)
 		/* No need to hold ControlFileLock yet, we aren't up far enough */
 		UpdateControlFile();
 
-		/* initialize our local copy of minRecoveryPoint */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		/*
+		 * Initialize our local copy of minRecoveryPoint.  When doing crash
+		 * recovery we want to replay up to the end of WAL.  Particularly, in
+		 * the case of a promoted standby minRecoveryPoint value in the
+		 * control file is only updated after the first checkpoint.  However,
+		 * if the instance crashes before the first post-recovery checkpoint
+		 * is completed then recovery will use a stale location causing the
+		 * startup process to think that there are still invalid page
+		 * references when checking for data consistency.
+		 */
+		if (InArchiveRecovery)
+		{
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
+		else
+		{
+			minRecoveryPoint = InvalidXLogRecPtr;
+			minRecoveryPointTLI = 0;
+		}
 
 		/*
 		 * Reset pgstat data, because it may be invalid after recovery.
@@ -7861,6 +7893,8 @@ CheckRecoveryConsistency(void)
 	if (XLogRecPtrIsInvalid(minRecoveryPoint))
 		return;
 
+	Assert(InArchiveRecovery);
+
 	/*
 	 * assume that we are called in the startup process, and hence don't need
 	 * a lock to read lastReplayedEndRecPtr
@@ -9949,11 +9983,16 @@ xlog_redo(XLogReaderState *record)
 		 * Update minRecoveryPoint to ensure that if recovery is aborted, we
 		 * recover back up to this point before allowing hot standby again.
 		 * This is important if the max_* settings are decreased, to ensure
-		 * you don't run queries against the WAL preceding the change.
+		 * you don't run queries against the WAL preceding the change. The
+		 * local copies cannot be updated as long as crash recovery is
+		 * happening and we expect all the WAL to be replayed.
 		 */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
-		if (minRecoveryPoint != 0 && minRecoveryPoint < lsn)
+		if (InArchiveRecovery)
+		{
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
+		if (minRecoveryPoint != InvalidXLogRecPtr && minRecoveryPoint < lsn)
 		{
 			ControlFile->minRecoveryPoint = lsn;
 			ControlFile->minRecoveryPointTLI = ThisTimeLineID;
diff --git a/src/test/recovery/t/015_promotion_pages.pl b/src/test/recovery/t/015_promotion_pages.pl
new file mode 100644
index 0000000000..48f941b963
--- /dev/null
+++ b/src/test/recovery/t/015_promotion_pages.pl
@@ -0,0 +1,87 @@
+# Test for promotion handling with WAL records generated post-promotion
+# before the first checkpoint is generated.  This test case checks for
+# invalid page references at replay based on the minimum consistent
+# recovery point defined.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+# Initialize primary node
+my $alpha = get_new_node('alpha');
+$alpha->init(allows_streaming => 1);
+# Setting wal_log_hints to off is important to get invalid page
+# references.
+$alpha->append_conf("postgresql.conf", <<EOF);
+wal_log_hints = off
+EOF
+
+# Start the primary
+$alpha->start;
+
+# setup/start a standby
+$alpha->backup('bkp');
+my $bravo = get_new_node('bravo');
+$bravo->init_from_backup($alpha, 'bkp', has_streaming => 1);
+$bravo->append_conf('postgresql.conf', <<EOF);
+checkpoint_timeout=1h
+checkpoint_completion_target=0.9
+EOF
+$bravo->start;
+
+# Dummy table for the upcoming tests.
+$alpha->safe_psql('postgres', 'create table test1 (a int)');
+$alpha->safe_psql('postgres', 'insert into test1 select generate_series(1, 10000)');
+
+# take a checkpoint
+$alpha->safe_psql('postgres', 'checkpoint');
+
+# The following vacuum will set visibility map bits and create
+# problematic WAL records.
+$alpha->safe_psql('postgres', 'vacuum verbose test1');
+# Wait for last record to have been replayed on the standby.
+$alpha->wait_for_catchup($bravo, 'replay',
+						 $alpha->lsn('insert'));
+
+# Now force a checkpoint on the standby. This seems unnecessary but for "some"
+# reason, the previous checkpoint on the primary does not reflect on the standby
+# and without an explicit checkpoint, it may start redo recovery from a much
+# older point, which includes even create table and initial page additions.
+$bravo->safe_psql('postgres', 'checkpoint');
+
+# Now just use a dummy table and run some operations to move minRecoveryPoint
+# beyond the previous vacuum.
+$alpha->safe_psql('postgres', 'create table test2 (a int, b text)');
+$alpha->safe_psql('postgres', 'insert into test2 select generate_series(1,10000), md5(random()::text)');
+$alpha->safe_psql('postgres', 'truncate test2');
+
+# Wait again for all records to be replayed.
+$alpha->wait_for_catchup($bravo, 'replay',
+						 $alpha->lsn('insert'));
+
+# Do the promotion, which reinitializes minRecoveryPoint in the control
+# file so as WAL is replayed up to the end.
+$bravo->promote;
+
+# Truncate the table on the promoted standby, vacuum and extend it
+# again to create new page references.  The first post-recovery checkpoint
+# has not happened yet.
+$bravo->safe_psql('postgres', 'truncate test1');
+$bravo->safe_psql('postgres', 'vacuum verbose test1');
+$bravo->safe_psql('postgres', 'insert into test1 select generate_series(1,1000)');
+
+# Now crash-stop the promoted standby and restart.  This makes sure that
+# replay does not see invalid page references because of an invalid
+# minimum consistent recovery point.
+$bravo->stop('immediate');
+$bravo->start;
+
+# Check state of the table after full crash recovery.  All its data should
+# be here.
+my $psql_out;
+$bravo->psql(
+	'postgres',
+	"SELECT count(*) FROM test1",
+	stdout => \$psql_out);
+is($psql_out, '1000', "Check that table state is correct");
-- 
2.18.0

promote-panic-95.patchtext/x-diff; charset=iso-8859-1Download
From 371094469cb22865f91e326254e1056f3557855b Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 27 Jun 2018 14:56:14 +0900
Subject: [PATCH] Prevent references to invalid relation pages at
 post-promotion
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

If a standby crashes after promotion before having completed its first
post-recovery checkpoint, then the minimal recovery point which marks
the LSN position where the cluster is able to reach consistency may be
set to a position past the first end-of-recovery checkpoint while all
the WAL available should be replayed.  This leads to the instance
thinking that it contains inconsistent pages, causing a PANIC and a hard
instance crash even if all the WAL available has not been replayed for
certain sets of records replayed. When in crash recovery,
minRecoveryPoint is expected to always be set to InvalidXLogRecPtr,
which forces the recovery to replay all the WAL available, so this
commit makes sure that the local copy of minRecoveryPoint from the
control file is initialized properly and stays as it is while crash
recovery is performed.  Once switching to archive recovery or if crash
recovery finishes, then the local copy minRecoveryPoint can be safely
updated.

The base fix idea comes from Kyotaro Horiguchi, which has been expanded
into a full-fledged patch by me.  The test included in this commit has
been written by Álvaro Herrera and Pavan Deolasee, which I have modified
to be faster and more reliable.

Backpatch down to all supported versions, aka 9.3.

Reported-by: Pavan Deolasee
Reviewed-by: Pavan Deolasee, Kyotaro Horiguchi
Author: Michael Paquier, Kyotaro Horiguchi, Pavan Deolasee, Álvaro
Herrera
Discussion: https://postgr.es/m/CABOikdPOewjNL=05K5CbNMxnNtXnQjhTx2F--4p4ruorCjukbA@mail.gmail.com
---
 src/backend/access/transam/xlog.c | 101 +++++++++++++++++++++---------
 1 file changed, 70 insertions(+), 31 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 76dd767ece..7061269871 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -788,8 +788,14 @@ static XLogSource XLogReceiptSource = 0;		/* XLOG_FROM_* code */
 static XLogRecPtr ReadRecPtr;	/* start of last record read */
 static XLogRecPtr EndRecPtr;	/* end+1 of last record read */
 
-static XLogRecPtr minRecoveryPoint;		/* local copy of
-										 * ControlFile->minRecoveryPoint */
+/*
+ * Local copies of equivalent fields in the control file.  When running
+ * crash recovery, minRecoveryPoint is set to InvalidXLogRecPtr as we
+ * expect to replay all the WAL available, and updateMinRecoveryPoint is
+ * switched to false to prevent any updates while replaying records.
+ * Those values are kept consistent as long as crash recovery runs.
+ */
+static XLogRecPtr minRecoveryPoint;
 static TimeLineID minRecoveryPointTLI;
 static bool updateMinRecoveryPoint = true;
 
@@ -2526,20 +2532,26 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 	if (!updateMinRecoveryPoint || (!force && lsn <= minRecoveryPoint))
 		return;
 
+	/*
+	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
+	 * i.e., we're doing crash recovery.  We never modify the control file's
+	 * value in that case, so we can short-circuit future checks here too. The
+	 * local values of minRecoveryPoint and minRecoveryPointTLI should not be
+	 * updated until crash recovery finishes.
+	 */
+	if (XLogRecPtrIsInvalid(minRecoveryPoint))
+	{
+		updateMinRecoveryPoint = false;
+		return;
+	}
+
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 
 	/* update local copy */
 	minRecoveryPoint = ControlFile->minRecoveryPoint;
 	minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
-	/*
-	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
-	 * i.e., we're doing crash recovery.  We never modify the control file's
-	 * value in that case, so we can short-circuit future checks here too.
-	 */
-	if (minRecoveryPoint == 0)
-		updateMinRecoveryPoint = false;
-	else if (force || minRecoveryPoint < lsn)
+	if (force || minRecoveryPoint < lsn)
 	{
 		XLogRecPtr	newMinRecoveryPoint;
 		TimeLineID	newMinRecoveryPointTLI;
@@ -2877,7 +2889,16 @@ XLogNeedsFlush(XLogRecPtr record)
 	 */
 	if (RecoveryInProgress())
 	{
-		/* Quick exit if already known updated */
+		/*
+		 * An invalid minRecoveryPoint means that we need to recover all the
+		 * WAL, i.e., we're doing crash recovery.  We never modify the control
+		 * file's value in that case, so we can short-circuit future checks
+		 * here too.
+		 */
+		if (XLogRecPtrIsInvalid(minRecoveryPoint))
+			updateMinRecoveryPoint = false;
+
+		/* Quick exit if already known to be updated or cannot be updated */
 		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
 			return false;
 
@@ -2891,20 +2912,8 @@ XLogNeedsFlush(XLogRecPtr record)
 		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 		LWLockRelease(ControlFileLock);
 
-		/*
-		 * An invalid minRecoveryPoint means that we need to recover all the
-		 * WAL, i.e., we're doing crash recovery.  We never modify the control
-		 * file's value in that case, so we can short-circuit future checks
-		 * here too.
-		 */
-		if (minRecoveryPoint == 0)
-			updateMinRecoveryPoint = false;
-
 		/* check again */
-		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
-			return false;
-		else
-			return true;
+		return record > minRecoveryPoint;
 	}
 
 	/* Quick exit if already known flushed */
@@ -4046,6 +4055,12 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 				minRecoveryPoint = ControlFile->minRecoveryPoint;
 				minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
+				/*
+				 * The startup process can update its local copy of
+				 * minRecoveryPoint from this point.
+				 */
+				updateMinRecoveryPoint = true;
+
 				UpdateControlFile();
 				LWLockRelease(ControlFileLock);
 
@@ -6530,9 +6545,26 @@ StartupXLOG(void)
 		/* No need to hold ControlFileLock yet, we aren't up far enough */
 		UpdateControlFile();
 
-		/* initialize our local copy of minRecoveryPoint */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		/*
+		 * Initialize our local copy of minRecoveryPoint.  When doing crash
+		 * recovery we want to replay up to the end of WAL.  Particularly, in
+		 * the case of a promoted standby minRecoveryPoint value in the
+		 * control file is only updated after the first checkpoint.  However,
+		 * if the instance crashes before the first post-recovery checkpoint
+		 * is completed then recovery will use a stale location causing the
+		 * startup process to think that there are still invalid page
+		 * references when checking for data consistency.
+		 */
+		if (InArchiveRecovery)
+		{
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
+		else
+		{
+			minRecoveryPoint = InvalidXLogRecPtr;
+			minRecoveryPointTLI = 0;
+		}
 
 		/*
 		 * Reset pgstat data, because it may be invalid after recovery.
@@ -7461,6 +7493,8 @@ CheckRecoveryConsistency(void)
 	if (XLogRecPtrIsInvalid(minRecoveryPoint))
 		return;
 
+	Assert(InArchiveRecovery);
+
 	/*
 	 * assume that we are called in the startup process, and hence don't need
 	 * a lock to read lastReplayedEndRecPtr
@@ -9531,11 +9565,16 @@ xlog_redo(XLogReaderState *record)
 		 * This is particularly important if wal_level was set to 'archive'
 		 * before, and is now 'hot_standby', to ensure you don't run queries
 		 * against the WAL preceding the wal_level change. Same applies to
-		 * decreasing max_* settings.
+		 * decreasing max_* settings.  The local copies cannot be updated as
+		 * long as crash recovery is happening and we expect all the WAL to
+		 * be replayed.
 		 */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
-		if (minRecoveryPoint != 0 && minRecoveryPoint < lsn)
+		if (InArchiveRecovery)
+		{
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
+		if (minRecoveryPoint != InvalidXLogRecPtr && minRecoveryPoint < lsn)
 		{
 			ControlFile->minRecoveryPoint = lsn;
 			ControlFile->minRecoveryPointTLI = ThisTimeLineID;
-- 
2.18.0

promote-panic-94.patchtext/x-diff; charset=iso-8859-1Download
From 78823745b0c8a7b6e0e083f05624ea13aa30ac5b Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 27 Jun 2018 14:56:14 +0900
Subject: [PATCH] Prevent references to invalid relation pages at
 post-promotion
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

If a standby crashes after promotion before having completed its first
post-recovery checkpoint, then the minimal recovery point which marks
the LSN position where the cluster is able to reach consistency may be
set to a position past the first end-of-recovery checkpoint while all
the WAL available should be replayed.  This leads to the instance
thinking that it contains inconsistent pages, causing a PANIC and a hard
instance crash even if all the WAL available has not been replayed for
certain sets of records replayed. When in crash recovery,
minRecoveryPoint is expected to always be set to InvalidXLogRecPtr,
which forces the recovery to replay all the WAL available, so this
commit makes sure that the local copy of minRecoveryPoint from the
control file is initialized properly and stays as it is while crash
recovery is performed.  Once switching to archive recovery or if crash
recovery finishes, then the local copy minRecoveryPoint can be safely
updated.

The base fix idea comes from Kyotaro Horiguchi, which has been expanded
into a full-fledged patch by me.  The test included in this commit has
been written by Álvaro Herrera and Pavan Deolasee, which I have modified
to be faster and more reliable.

Backpatch down to all supported versions, aka 9.3.

Reported-by: Pavan Deolasee
Reviewed-by: Pavan Deolasee, Kyotaro Horiguchi
Author: Michael Paquier, Kyotaro Horiguchi, Pavan Deolasee, Álvaro
Herrera
Discussion: https://postgr.es/m/CABOikdPOewjNL=05K5CbNMxnNtXnQjhTx2F--4p4ruorCjukbA@mail.gmail.com
---
 src/backend/access/transam/xlog.c | 101 +++++++++++++++++++++---------
 1 file changed, 70 insertions(+), 31 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c50c608a1f..6499f6bebe 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -749,8 +749,14 @@ static XLogSource XLogReceiptSource = 0;		/* XLOG_FROM_* code */
 static XLogRecPtr ReadRecPtr;	/* start of last record read */
 static XLogRecPtr EndRecPtr;	/* end+1 of last record read */
 
-static XLogRecPtr minRecoveryPoint;		/* local copy of
-										 * ControlFile->minRecoveryPoint */
+/*
+ * Local copies of equivalent fields in the control file.  When running
+ * crash recovery, minRecoveryPoint is set to InvalidXLogRecPtr as we
+ * expect to replay all the WAL available, and updateMinRecoveryPoint is
+ * switched to false to prevent any updates while replaying records.
+ * Those values are kept consistent as long as crash recovery runs.
+ */
+static XLogRecPtr minRecoveryPoint;
 static TimeLineID minRecoveryPointTLI;
 static bool updateMinRecoveryPoint = true;
 
@@ -2706,20 +2712,26 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 	if (!updateMinRecoveryPoint || (!force && lsn <= minRecoveryPoint))
 		return;
 
+	/*
+	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
+	 * i.e., we're doing crash recovery.  We never modify the control file's
+	 * value in that case, so we can short-circuit future checks here too. The
+	 * local values of minRecoveryPoint and minRecoveryPointTLI should not be
+	 * updated until crash recovery finishes.
+	 */
+	if (XLogRecPtrIsInvalid(minRecoveryPoint))
+	{
+		updateMinRecoveryPoint = false;
+		return;
+	}
+
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 
 	/* update local copy */
 	minRecoveryPoint = ControlFile->minRecoveryPoint;
 	minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
-	/*
-	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
-	 * i.e., we're doing crash recovery.  We never modify the control file's
-	 * value in that case, so we can short-circuit future checks here too.
-	 */
-	if (minRecoveryPoint == 0)
-		updateMinRecoveryPoint = false;
-	else if (force || minRecoveryPoint < lsn)
+	if (force || minRecoveryPoint < lsn)
 	{
 		/* use volatile pointer to prevent code rearrangement */
 		volatile XLogCtlData *xlogctl = XLogCtl;
@@ -3069,7 +3081,16 @@ XLogNeedsFlush(XLogRecPtr record)
 	 */
 	if (RecoveryInProgress())
 	{
-		/* Quick exit if already known updated */
+		/*
+		 * An invalid minRecoveryPoint means that we need to recover all the
+		 * WAL, i.e., we're doing crash recovery.  We never modify the control
+		 * file's value in that case, so we can short-circuit future checks
+		 * here too.
+		 */
+		if (XLogRecPtrIsInvalid(minRecoveryPoint))
+			updateMinRecoveryPoint = false;
+
+		/* Quick exit if already known to be updated or cannot be updated */
 		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
 			return false;
 
@@ -3083,20 +3104,8 @@ XLogNeedsFlush(XLogRecPtr record)
 		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 		LWLockRelease(ControlFileLock);
 
-		/*
-		 * An invalid minRecoveryPoint means that we need to recover all the
-		 * WAL, i.e., we're doing crash recovery.  We never modify the control
-		 * file's value in that case, so we can short-circuit future checks
-		 * here too.
-		 */
-		if (minRecoveryPoint == 0)
-			updateMinRecoveryPoint = false;
-
 		/* check again */
-		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
-			return false;
-		else
-			return true;
+		return record > minRecoveryPoint;
 	}
 
 	/* Quick exit if already known flushed */
@@ -4259,6 +4268,12 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 				minRecoveryPoint = ControlFile->minRecoveryPoint;
 				minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
+				/*
+				 * The startup process can update its local copy of
+				 * minRecoveryPoint from this point.
+				 */
+				updateMinRecoveryPoint = true;
+
 				UpdateControlFile();
 				LWLockRelease(ControlFileLock);
 
@@ -6639,9 +6654,26 @@ StartupXLOG(void)
 		/* No need to hold ControlFileLock yet, we aren't up far enough */
 		UpdateControlFile();
 
-		/* initialize our local copy of minRecoveryPoint */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		/*
+		 * Initialize our local copy of minRecoveryPoint.  When doing crash
+		 * recovery we want to replay up to the end of WAL.  Particularly, in
+		 * the case of a promoted standby minRecoveryPoint value in the
+		 * control file is only updated after the first checkpoint.  However,
+		 * if the instance crashes before the first post-recovery checkpoint
+		 * is completed then recovery will use a stale location causing the
+		 * startup process to think that there are still invalid page
+		 * references when checking for data consistency.
+		 */
+		if (InArchiveRecovery)
+		{
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
+		else
+		{
+			minRecoveryPoint = InvalidXLogRecPtr;
+			minRecoveryPointTLI = 0;
+		}
 
 		/*
 		 * Reset pgstat data, because it may be invalid after recovery.
@@ -7463,6 +7495,8 @@ CheckRecoveryConsistency(void)
 	if (XLogRecPtrIsInvalid(minRecoveryPoint))
 		return;
 
+	Assert(InArchiveRecovery);
+
 	/*
 	 * assume that we are called in the startup process, and hence don't need
 	 * a lock to read lastReplayedEndRecPtr
@@ -9640,11 +9674,16 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
 		 * This is particularly important if wal_level was set to 'archive'
 		 * before, and is now 'hot_standby', to ensure you don't run queries
 		 * against the WAL preceding the wal_level change. Same applies to
-		 * decreasing max_* settings.
+		 * decreasing max_* settings.  The local copies cannot be updated as
+		 * long as crash recovery is happening and we expect all the WAL to
+		 * be replayed.
 		 */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
-		if (minRecoveryPoint != 0 && minRecoveryPoint < lsn)
+		if (InArchiveRecovery)
+		{
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
+		if (minRecoveryPoint != InvalidXLogRecPtr && minRecoveryPoint < lsn)
 		{
 			ControlFile->minRecoveryPoint = lsn;
 			ControlFile->minRecoveryPointTLI = ThisTimeLineID;
-- 
2.18.0

promote-panic-93.patchtext/x-diff; charset=iso-8859-1Download
From f2e4362d3a7da28008e1cb66144856376ca31388 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 27 Jun 2018 14:56:14 +0900
Subject: [PATCH] Prevent references to invalid relation pages at
 post-promotion
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

If a standby crashes after promotion before having completed its first
post-recovery checkpoint, then the minimal recovery point which marks
the LSN position where the cluster is able to reach consistency may be
set to a position past the first end-of-recovery checkpoint while all
the WAL available should be replayed.  This leads to the instance
thinking that it contains inconsistent pages, causing a PANIC and a hard
instance crash even if all the WAL available has not been replayed for
certain sets of records replayed. When in crash recovery,
minRecoveryPoint is expected to always be set to InvalidXLogRecPtr,
which forces the recovery to replay all the WAL available, so this
commit makes sure that the local copy of minRecoveryPoint from the
control file is initialized properly and stays as it is while crash
recovery is performed.  Once switching to archive recovery or if crash
recovery finishes, then the local copy minRecoveryPoint can be safely
updated.

The base fix idea comes from Kyotaro Horiguchi, which has been expanded
into a full-fledged patch by me.  The test included in this commit has
been written by Álvaro Herrera and Pavan Deolasee, which I have modified
to be faster and more reliable.

Backpatch down to all supported versions, aka 9.3.

Reported-by: Pavan Deolasee
Reviewed-by: Pavan Deolasee, Kyotaro Horiguchi
Author: Michael Paquier, Kyotaro Horiguchi, Pavan Deolasee, Álvaro
Herrera
Discussion: https://postgr.es/m/CABOikdPOewjNL=05K5CbNMxnNtXnQjhTx2F--4p4ruorCjukbA@mail.gmail.com
---
 src/backend/access/transam/xlog.c | 101 +++++++++++++++++++++---------
 1 file changed, 70 insertions(+), 31 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f788d70e5a..03de8451e2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -642,8 +642,14 @@ static XLogSource XLogReceiptSource = 0;		/* XLOG_FROM_* code */
 static XLogRecPtr ReadRecPtr;	/* start of last record read */
 static XLogRecPtr EndRecPtr;	/* end+1 of last record read */
 
-static XLogRecPtr minRecoveryPoint;		/* local copy of
-										 * ControlFile->minRecoveryPoint */
+/*
+ * Local copies of equivalent fields in the control file.  When running
+ * crash recovery, minRecoveryPoint is set to InvalidXLogRecPtr as we
+ * expect to replay all the WAL available, and updateMinRecoveryPoint is
+ * switched to false to prevent any updates while replaying records.
+ * Those values are kept consistent as long as crash recovery runs.
+ */
+static XLogRecPtr minRecoveryPoint;
 static TimeLineID minRecoveryPointTLI;
 static bool updateMinRecoveryPoint = true;
 
@@ -1849,20 +1855,26 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 	if (!updateMinRecoveryPoint || (!force && lsn <= minRecoveryPoint))
 		return;
 
+	/*
+	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
+	 * i.e., we're doing crash recovery.  We never modify the control file's
+	 * value in that case, so we can short-circuit future checks here too. The
+	 * local values of minRecoveryPoint and minRecoveryPointTLI should not be
+	 * updated until crash recovery finishes.
+	 */
+	if (XLogRecPtrIsInvalid(minRecoveryPoint))
+	{
+		updateMinRecoveryPoint = false;
+		return;
+	}
+
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 
 	/* update local copy */
 	minRecoveryPoint = ControlFile->minRecoveryPoint;
 	minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
-	/*
-	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
-	 * i.e., we're doing crash recovery.  We never modify the control file's
-	 * value in that case, so we can short-circuit future checks here too.
-	 */
-	if (minRecoveryPoint == 0)
-		updateMinRecoveryPoint = false;
-	else if (force || minRecoveryPoint < lsn)
+	if (force || minRecoveryPoint < lsn)
 	{
 		/* use volatile pointer to prevent code rearrangement */
 		volatile XLogCtlData *xlogctl = XLogCtl;
@@ -2202,7 +2214,16 @@ XLogNeedsFlush(XLogRecPtr record)
 	 */
 	if (RecoveryInProgress())
 	{
-		/* Quick exit if already known updated */
+		/*
+		 * An invalid minRecoveryPoint means that we need to recover all the
+		 * WAL, i.e., we're doing crash recovery.  We never modify the control
+		 * file's value in that case, so we can short-circuit future checks
+		 * here too.
+		 */
+		if (XLogRecPtrIsInvalid(minRecoveryPoint))
+			updateMinRecoveryPoint = false;
+
+		/* Quick exit if already known to be updated or cannot be updated */
 		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
 			return false;
 
@@ -2216,20 +2237,8 @@ XLogNeedsFlush(XLogRecPtr record)
 		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 		LWLockRelease(ControlFileLock);
 
-		/*
-		 * An invalid minRecoveryPoint means that we need to recover all the
-		 * WAL, i.e., we're doing crash recovery.  We never modify the control
-		 * file's value in that case, so we can short-circuit future checks
-		 * here too.
-		 */
-		if (minRecoveryPoint == 0)
-			updateMinRecoveryPoint = false;
-
 		/* check again */
-		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
-			return false;
-		else
-			return true;
+		return record > minRecoveryPoint;
 	}
 
 	/* Quick exit if already known flushed */
@@ -3371,6 +3380,12 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 				minRecoveryPoint = ControlFile->minRecoveryPoint;
 				minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
+				/*
+				 * The startup process can update its local copy of
+				 * minRecoveryPoint from this point.
+				 */
+				updateMinRecoveryPoint = true;
+
 				UpdateControlFile();
 				LWLockRelease(ControlFileLock);
 
@@ -5492,9 +5507,26 @@ StartupXLOG(void)
 		/* No need to hold ControlFileLock yet, we aren't up far enough */
 		UpdateControlFile();
 
-		/* initialize our local copy of minRecoveryPoint */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		/*
+		 * Initialize our local copy of minRecoveryPoint.  When doing crash
+		 * recovery we want to replay up to the end of WAL.  Particularly, in
+		 * the case of a promoted standby minRecoveryPoint value in the
+		 * control file is only updated after the first checkpoint.  However,
+		 * if the instance crashes before the first post-recovery checkpoint
+		 * is completed then recovery will use a stale location causing the
+		 * startup process to think that there are still invalid page
+		 * references when checking for data consistency.
+		 */
+		if (InArchiveRecovery)
+		{
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
+		else
+		{
+			minRecoveryPoint = InvalidXLogRecPtr;
+			minRecoveryPointTLI = 0;
+		}
 
 		/*
 		 * Reset pgstat data, because it may be invalid after recovery.
@@ -6315,6 +6347,8 @@ CheckRecoveryConsistency(void)
 	if (XLogRecPtrIsInvalid(minRecoveryPoint))
 		return;
 
+	Assert(InArchiveRecovery);
+
 	/*
 	 * assume that we are called in the startup process, and hence don't need
 	 * a lock to read lastReplayedEndRecPtr
@@ -8448,11 +8482,16 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
 		 * This is particularly important if wal_level was set to 'archive'
 		 * before, and is now 'hot_standby', to ensure you don't run queries
 		 * against the WAL preceding the wal_level change. Same applies to
-		 * decreasing max_* settings.
+		 * decreasing max_* settings.  The local copies cannot be updated as
+		 * long as crash recovery is happening and we expect all the WAL to
+		 * be replayed.
 		 */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
-		if (minRecoveryPoint != 0 && minRecoveryPoint < lsn)
+		if (InArchiveRecovery)
+		{
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
+		if (minRecoveryPoint != InvalidXLogRecPtr && minRecoveryPoint < lsn)
 		{
 			ControlFile->minRecoveryPoint = lsn;
 			ControlFile->minRecoveryPointTLI = ThisTimeLineID;
-- 
2.18.0

#16Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#15)
5 attachment(s)
Re: PANIC during crash recovery of a recently promoted standby

Adding Heikki and Andres in CC here for awareness..

On Wed, Jun 27, 2018 at 05:29:38PM +0900, Michael Paquier wrote:

I have spent a bit of time testing this on HEAD, 10 and 9.6. For 9.5,
9.4 and 9.3 I have reproduced the failure and tested the patch, but I
lacked time to perform more tests. The patch set for 9.3~9.5 applies
without conflict across the 3 branches. 9.6 has a conflict in a
comment, and v10 had an extra comment conflict.

Feel free to have a look, I am not completely done with this stuff and
I'll work more tomorrow on checking 9.3~9.5.

And I have been able to spend the time I wanted to spend on this patch
series with testing for 9.3 to 9.5. Attached are a couple of patches
you can use to reproduce the failures for all the branches:
- For master and 10, the tests are included in the patch and are
proposed for commit.
- On 9.6, I had to tweak the TAP scripts as pg_ctl start has switched to
use the wait mode by default.
- On 9.5, there is a tweak to src/Makefile.global.in which cleans up
tmp_check, and a couple of GUCs not compatible.
- On 9.4, I had to tweak src/Makefile.global.in so as the temporary
installation path is correct. Again some GUCs had to be tweaked.
- On 9.3, there is no TAP infrastructure, so I tweaked
src/test/recovery/Makefile to be able to run the tests.

I have also created a bash script which emulates what the TAP test does,
which is attached. Because of visibly some timing reasons, I have not
been able to reproduce the problem with it. Anyway, running (and
actually sort of back-porting) the TAP suite so as the problematic test
case can be run is possible with the sets attached and shows the failure
so we can use that.

Thoughts? I would love more input about the patch concept.
--
Michael

Attachments:

promote_panic.bashtext/plain; charset=us-asciiDownload
promote-panic-test-93.tar.gzapplication/gzipDownload
promote-panic-test-94.tar.gzapplication/gzipDownload
�:;4[�[{s�F���������"%K�hK�,)�jmI'����,vI�A��u���_w���W��=%e��������{�Q.�Dt#xN72�<�E<q��}��>��=����}�w��������n��3����n����>�����8�������}��~���q_����z�)�vg^������w���z����p������������������^o��w����������Z��v�����W�������6��c�$`�i�ySq������v�#�W�:�B�j��`�����������p>����u�)k4O�_��_���EdS���jm�o��bD��h��;4�W���Z�9.p{w��������FcxhLaW��_���D
���������������h���l��0>	#{�z��
B[��������go��F�
��dP�[���B��hf�?-`����g-���M�?�=~3je]��d��^���������k��r\0������	����.��1[��~�]$���������S�e�iz���h:�
w];	mp��N�����Mb�S�}��Vg#Y�d[_�w�����K.�V�D��^I;���|����doO�z?���`�=x���W(\��Nt���}��^�����?����1+�����*�/$;	������tZlpp�����:0B& ���eo���Sq+�0Z� ao�0�j�/��q��&B�;�Z�`�d��%s��������c�	�=�g�q��M�t@V�Y�)h���OCf�"v���F��b4>~��>=�
���Bo�I/Z�k�v�j4��'I`h:��6�y��c!C�l���^'��
��cHW'l&��Y^���������Av  t������Te��Y�������
����Y�����J-w�l~
�cQ�����z�����=�Q^g��[A��+�Lu
4�~���`@�}t8�Z�0
]�=����wg���������,4�1{��t
R�`���G�F�_\^��G��X*3	�P��{�p��Ro�h&$ccg�Z��o��g=e'���`��<��X8I��;/�3^�Np ��aC��<�� !���cf���9f��	I�.����x���NL�V��t�����0k&6��������6�����e�4�5oR��.j�{�P�`L8��Y���g'c6��_4�uo�KX������������/��0��0M��l�|*l�)����K�]�1�Y�-��0�GI��4q��)�X����	�*�����O`Qj-�B���B@cs�W��"N��SX��%�8�5���*Z1�@����#hJO�\��X���b�@�]�y�b:7�P��������{������=��bj��
���Zd������52���0��|!q�V�+A���BG ��;��T���&���'p������XV!�j9���zr(��[x>d�������Qp+���N�X���K!s+�)��n�C�?_���gh�
���R.D������%���KkIL[��5�����m"d����7��A�n�h��K��
^�e��@%Y/��3x����O�����4&���xK�SI�TW�B���6w;���:,q@@��;�54��������7����#��
F/��������J�hDm��IB�d>#8t��A�A��.E�{o��gX�t{�=�G��m��0�*U0�4�\����Ji4�)����t�!N������K��4�U�@qp��z��
��]��F8"�km�T��:�#6;tX�������=���M�&7���2�����6D i�<�38sC��E�<`�����/������L/=$�|�����tJ�<G����P�.�(	��A����;��i8*��h��\`�Rp�����!�%C��Q�N
&��e��������c��n37)��:�%�
�6��U��$���p�c�G����8�}'��Y�����}� [�d���,���$>��k��V���kp#_�R�$�8>^$��LE�)5�$���C�b�f���H,�f�6
��"@qhF@�����p��~H)Hf�#�CT���^�)��,+Lc���7,!��C/���j�b`���k�!P���a4��h�?��O p0�7��a����������O�����;��P�������F��2��0�Vo��������(�l(
�J<�&n�����?c:fL����-�Ew.���/ �M>
���.��D�@������U�,����0N�`\3�d
�C��J������" B��#5?�.����`���`{��������������`8�S��C�~bURC���F�m9�?i�]���������C0�}}<:{@������8kF�	[F`���$�5��;��E��i
����a����?�x�&�	?������[���v����$X����\�RBtRh�?<�'������0��D��_�.G3d�B��PW:��8q�*)BCqB�������S��
@��PL�K�d�PK��E�a;�T�
�($���=��$�����	8�8���((B�����D��\1d��3e0
Rn�	�+��L����q�]h��o�J�������?���������J�Oa�����t�([�C���y;qB��h3��[(J�����1�������L��R�����B+9��4��\���������H�����,5w�>g�?��!�_����������C�����\%��P_"��>d�����U���0�`*�rW,,Y�G�13��7B�������Dzl���"'@�$ #��	�����(��E���E�l��w�bi�} ��Pb�~Gz
����/`?��mf�����x���B�lI�W���d
�6�������Q��Vs3��&�t i�9R,(��;������7�i
��n����7��7oV+��/B�B=UT�X-Ia%��^B.�%��9�cg�� ��p�\������Lf#�C�+�9�����(	��Z��0�������vH��+
�*E5����3��U�N�{4�z�fs���g�7�n������,�M|O����9m!���(���pk����Ty���)���j5�\U��0t�b/�1r��G#�>��t_���M�����*��	E4-�:R:����CM�����Pg��]�be��N-u7��DL1uKID�0��i�F�7i�`��
HB��#.J����'�R�;~O�KqNQ����?�3{��ca�E=����S�-M��o���t��T�D�*�zFi:��L���z�&�uJ����0&m����f6a5�%���|�����X�jZi�����zl5-���>wyzV��Q6�Y���z)eD�0����k(�����IA��=�p�1%�A$�B��)e��u��$�>�p)u���m���>��:H��K���@v���[^f�j�
Z�����
��-MTn������d�&Q�";y4����R.\���_F���G����%p�3��ko��<]��>�%�p=>@�<���x��f��b;���.��a��q��q������<�w�udu�=B�,�Q�y������^�-9������%�*M��c������zjq��'W�Qi������T#pi��������4#����"<���3��Z��h�����\�e3�*���bXs�Y�x�
�y�rI�B$��"dV��S�Q���h���Mw\�]`q�)sC'�")q��1s���}L�����X�����O����B	��T�-���rn4�Uc��Q��`����d���C���W-L(����6LFLy�����f
�������5g[�$M5�6�����"��4k������5{K+WT�rV,�����E0S�T��v����~��SW�<��8���S)��C���������^���z*�w
��Kb�Fy�+K��S�z�d|�Z$�G�
R�!��8ic��r��'�������+��e��>u0�F���p�YZ�r�����N|}j%�
PpbD-�idd�E<A�d
X'��]������L�m�����M���
[���:T����ABP�5�P�zP���(�A��
�-�H��8��.h�M��}���'M ������/��>��_Y���h&
��A3�
�2:�����R}jf�*��;'�1�����J�5�h1ukH5]�=d&x�f���TI�����A�5h:	��c���	��4��Le�%�����N�Qg+�@^���a��]~�o�����o�O�~]��f��o�R�����>��*�:�`k�0(���A���R*���Y~, �Re��jb�_�������u�*t�+����������B(�^�S�'�>�����@��u����Q��2m�X�U�?8h9���Bd�*
��R�J�)c#5^��!��tU�~ad�)���������e|�`c�0`�g����O���g��I�?�m0LZ5,+k�o����u������I�����5����m��z������K����c������d�$��bLEw��K�L�#�0Y?�.�������
�W�s��]{�Uq0x�z���������
��e�yi�)98*oQ�3������N��]1�,���cx�Z�����A
o�/�i-�3����/B7+���"���=��!���7��R��nT�^/���C�Z%X�s����A��_�)	�s�B�c���s=u
�d=ft������KUc���Ta��SY����xY��*���)��:�"��z5xk��	���ew�W`f��L��.�oQ�_sebHb��\�����q��H\Q#���N�Q�DZ�j�T��I��t����B7[U���|�
g���2�#�^M�2�(^�F����������3�#���i��,^�4hA��_�|�\fcGG��e������S��V�x������K�L��L��F�������U��:�v�D9��x���L��5g�
(`&��w~��I,
�a �y����{�l+e?��O^V����*`8{	�������B��sWz�H/�S-��^S��Nd?d���x�D������-"X��^��
�dO����*���v��x�UeS���1�rI�_'����6�&�W�}�"�lI�p.�-)��������2X��e����������8��m��J���')SEk��	1�O`�`*��Y��Z���t�zv>�i���A���K�j�S������������aN��'���`��\}��Yq���s\p�x'�qt�G���$~TzG�����	y�%���C~����89,������.�����N�X,�������5���\i��qs��WW�wP4�\�����������w�j�����z��W���KIF���p��Ro"�
p"�J@<��V���U*I�1?�soJ��o�����f�O��Xp*�F�k�=�V����_����/;XK��J���=H/��e���$���Lq�����m�;�V
�\�1�U��!����b����,E�U[�7�x�m���p��Zu������Uxc���e|l�/T]w���fA�s|w7��9��{|��{�
0��e��C���V�P������A���hWb�H-�� ����(���HG�*C�U+�����qe��_�����$$���c�0��I�����T@�%�����������<�JBN����5�C����g�}���Ec�c����K���
R`���	��7|�c��"gV���x?_U���q�t�K��9�E�CK���bt��	�t$n!�����t����mon�V7��kY?S�l��� �~��Lu�	������%	�+�*��l��*
���(�^d���D�NT����E���6Jx��j�s����}�6u��}�CBm�^��+7����%��uKw��JzG5��+0'{[��:�w�b!�w��K	�;�R$��s��+K�?&�=�/e�:(�j��w�8@^T8�CA���L��q���w�����G��h="]F�j.p����7$G��<���p������_k0�9������]-�7KV5G�y�"����E���F�����Q`l������H�����3����n2u������h��x��#I��<���m�p����Av���Thj�����I��*j��5�^����
��(R3��0B%i+�D����)+r��y<`b�k����/w��o�i�Y��]t�!���Po��:�M��_�f!��hI�<���3�Q���Z����F�q�|�%����xi�a>�	Z�hWTBd�&��;X\[����G��H�o���k^������uTqS�{v��{0����H�y8�7�qP�P��yjF����u�n��T��!=�$a\�����]\F������q0n�dr������K7W$���Dqs��S�T�JA�f9#��*�D�)�h�2r-�8��+�����qcE�Lc\J����q&��>��������!U������E��O\��R6u�n�������c^u�xv��G��$f<���yQ���K�K3�1�W�]y�_���Z�P7�����(�~��T��[^Vz�N���]�`������^�+���7�R��2��"����C��1g��)��yx��@	��d(=t^���p0�k4�Hj�
�9M_�����]
I�~):Bqy���b����'��V(�q�UZru�D=��5����i�I�q�#�L�l��t2���I�&U�Og�=��|��6��;�`$5<� �%V9V>��6��*BF��O�A��_��C�:�Q��IDg%K\	��T��j�Z���lg{�F�u]$��"O�;�/x�
���gP��&io��r_�c���%���#�oE(������w���u{�u�Q���XU�|�z���^�h+�����U�W8 y�J����}�������y5z����+yC��
K;�����E�f<���2��"k�t�9_7��MX��fg�;1�	�{��[%�������>�������
j�}���d7��u�gL�N`-�I���w�[��k�����@��#}�2b�
����1E�F�����5��M��X�b�M��<+*s���Z+VUC03��Bl��Lk��Y�Nl5<���j#�M�<��(�>U%��a@u#yK�4��+�]�b����h��A���?�h��mg����j�5��Y��GvBf@�%��M�����H�E��}c�D��@k�hPP���A-j�k��������%(=��pz!����j�=NMnv.@-������Y���@�����a�fd����c�����K�Vy��.b�����hD�j�^�/�*:3�q"38�]�,�1
Z��$c�[L��893����1P���|U�F�Kh�uw�����5�<$��������)��kD��p$)<���������j��e���SA��j}�#��5l����X�+r`���gf����g������N��Z�M|�������5������t��&��X�C'8�!P0����[��BZm1�XF�����v���{��q�eZ
V���o�.�$}8
��4��0Q�^���V����8mV7-)�3Z1;;{m� ��P���4��������t0��%QX������;�!]t'y�#�v���+�N�kPk�_r���(j�HB�����cw��45����}��fC
Fl��h���!P�L�f�H��Li�'����*��4����da.x�����i:����t���e �_;�)�����e'���f����7��R��������������G'�w����G��s���p���TD�^q�����B\�*��}0(o��2}�x�9=���p���So��3�E�j��v���u�q���������G���%�&�
�.��y*)��m�����y���8�#"]����~>����	w�Z��4������U>�4<PY������6q��u�QK�}���R���i��YvU�	G���v�8���)�����[�����Dd����;�LF)�.�$Vd������a|&=-l�&�a��D0�'�p����	*�J�=Q��/�8��E���W	���`�2u�������<:�t ��;m�o�4K0��7"�a=��-/��b&5�p�a{��5�i����S

��"dg���`��r�*�j2�.�N��?�\��5����6�x�o�7��<"?�;A��w�:����e����Z�G�.��� < P��*���
ya�t���!�?:������4�si!o�k�N��NO��wL�w�_�h|\�j�E��2���'c����k��p=q�m��VVh���vDtVO^u�������������m��P��>���}H{�c�K�E��2m� 3F���x����}c;Cd�'n����w�A��|J0i�\�H�C3p���pwC�0��	�������11B��i���05c��]7O���vt��+#)�����A��5�?�-"��iC^���z�
����6�A�\T�O�C���(�� �/��_)~���$�1t�������)��L�����������b���|��NAv�=p~v-$|O������I~S�z�hY�z������Q�!�a�E���yi<qEgA[8�8o_)��X��|4�����L������C������l���|9���	�_�xJ]�K]<L��2�N�7�o�3���'����R)��@���\$Fr��y�s�\��y7ZcUq��KU��7a�N��H9O���}��b�oa��+�G3�1���|��cCMx��v�z��a����_������%;!O&�p�����;ID��c\��YS��{�)%L	{�\�M=����;�I:cBN�Gn��6��=�l2!�����7��K
R})�X��eF]����A�gRm`zY�?&� m���0��'wWA[~���
&�=#2"���!C��$3I��8�q�><B*d���r^��|B)%*��>���4!A�n��2�xq�:r��xI0�#���b���@�*���(<�y1��x���DkMROM���X��'�(�M�X�ff+��2�>#��u!�t1������{DD�_�"D�cc�:C[������8b��=����M��W,+�|�C�o9A]%�Uf�n�&�Vu_�`E�UH6�u�������r��o��Fvbhm��q>�5���L��3$'���NQ�����=����
��O��C�;���{�u�xP�:Z��9�4�{gZ��W���4������.y���1�T0'�����
J.N����t=�6����(:3'��gA��9�h7q`�����aK��Vq��%��*��@)�	��'
j���k��f��8�F�Biu{�R�Ti������f��XqNnD��'������4���*�^YC�K�cSz��A�� ���;?���+���O�����0U5���
�,���(����$�V�f��n��B�.S�ZL��kLfVLu3g��x}[6���U�`�LN��cr���YM���#��E����K�d`�R�3�Qj��%�Z)2�y��V>#����J�2��X3��0�q��k>���b����1��WLTEq?��E[�DN��������D��R�x�I���}��zW�BQ������Z��W���5[`�6�TU�>'AX��L�5*J�����J���MX���))]�j�r�ipganv�`��M�c9�x*��H5�=^'D�
Ni����CNl46��#n�|�t>l:_u(x�_E	��MH�+W"t���E90l��FN����x����+aU���!=���vjP5?y���i����S
���`�����:���]���Yv�h�(''JT��bgJ���������M|5]Y1Q?Y�G���yn������wn"�Fa�mh�5���*6-�
�r@�{���{������������x��)�1"Ow���G�h��<K�O����_�����d���;�pMq��-c�������	./�v�q�@O+�����2g@���P�A�2`��L/��o��$A�sX����~IV�:����j:�lme1���qw�L'�[���������dt�h5��o
�s<��$W�\���l8y��s��1$>�	�'��d6��cBj�����g�
� ��+�G�m�� ^�q�����V4��|��C���o���n����^x��<�1w���Q������F:-}�|T��a+fY��0��Y����5�v�s+���:f�$";�2��!`���
G�A��#:)��r�]Y��7oX��w���w+��w�[��+��2�m��*����6X�V_��*�$�����7���U*i���iC���'��2/`s���'�N��h,���#��������|5v�P�Y��(�����<o(�2����>�
�?O}y[�iI#����F���9V�?�~�k�_�H��Cw���k�n�?����K>����c����g�f�3���|���s��~,��l�$�������wY������8�.�D5z����Q�vD�q%A�g)�+�8��)������<�P3>mD�A�iW}�s���^��	��K��c��=^���]��:����*���H����m���TX9�_��������AWq�4v�i89�z���D���"������n�	/��ZX�!�|C>	��r��_�`������S2���=D�a~pS/� f���`OtR���*����@�v��lG6��gK�:� E�u# E�*�LN����������Uu&���~�RMR�eei�_	/f�
L����%����Zp��R����j7������~��7Oq/^��1�8����(���������ko���s�6VS�DA4#|[3����m�����W��}V�]��r7'�`|6=�����w�h��M ��I�
\�0�%pwX�V��G�g���mM�-�L��
����p��Z����Z-�x��ln�g�kx����r���d�r�
:���q^�3�*�N�9��4X���ql�}`S�3��M��~|>����#=�/���0p�A'}o�(�q�eWHs��	d2�K����@G,����Z�9��� E�n�����3�Af�.�@��.r������G����U/V������=?�#����+�I��;��5IR*����R
�h���?`����������?a����,!wfs��?dW�
�U�'�����d�9�;�h���+�����Em�$�����B�o�C<I.��9��!���H%dQc�_���4�e����r}e��1�P5���)�� b��UhA����l�����`��3�G��I�N�}i�d62��AM��a�\%
���p����y��rK�����]��=�-5@�0t������G��$�8�h���`�v���!Z�'��V��wH�d_e8����,�y!���3K>)X�W�:~��-<���/��u��o�A6r�@����hE;P������rFjLF���-�$hn4u����l�7���y�\����O&q/��>��?���DK���e
�n}HJ��R�����@�vH����|V����0c�@��]K%v�*�K5*-N���4�sJ��d��Z��QH^�
c�\���x!e����F�hb�����LC�P����!��+�US[wq<1�J`��aJU/�J�Q
��: (3�ek�[�>�?���@��B�VE�
j�l�_�z"����F���B`����y�HW�LS�q�j6�i�?�_�����,>��^���
�9����*���Xd�	�#�i�J�:�*�g���`%p�b�B ��������'�0�f��"�IKn����P��KXe�Q�\��k����jF���}J�F���4|f4|&T�����!��L'��5QF��h��F��7�Z]�88a�����O�PH=��q�@��E����0T�.���dBs%��H8�������i�������x1���<������I�tt3�0�uf�w�(_����;����:eLu�2�~���$I6��)��^����EVnA-[����O����;�����	*O'������ �*;IG���/����2T�)e���%�G�<[���`�(z+����W ���<R][��3k2tx�Q��+r_\	LL�"P�H���n��W[q�������O��V��l�D��]Mo�]�Z���W��w��+�x���_�:��B����]�p�M����rYh�"������)����-K��)bS������������OD��"���w
�o����8<=,'���GW}JjK+��4��K���tdwX/�|�Z�^~�
���g"Yx�"_�;W������6�"�sr|�q���3L[�������d���"�n\��C1�M6��i��������7s�����,���������:_���v��<yI�;���C��Ny&�i���B$��>�s�� ���>��5�U���~�f��Vrp9������r�o����<`��O�t��l���#�����.Y}W�|M�f��U���
���<���1?���g�a�A��(K��b�D��n0�P������;q����6�z���fC��nJ��L��i[���#��N��	���������~|s������+�p��4��[����}Q��U�`������.7���:��������n1���A�&�A�5
�Q`����m�S�lz�LKC�Y!H'"����
b~N��pr�jLbJ-� ;��yDr��22�S��`n2g��g��-�u��t�V�f*��U��,o�{�5$ �i���=���x�&(��Ff�0�A?�O5��)�faB���7R�Q��/r���Es�e��4*�Q��L���F������n;�mt����w��8��b����LR��P�w^$�`b���wn�iUU"���
�������MI���������?���_U���ru�.�W�����E�Z�q���j�hS�dT6����#R�_��U��IN���8�0���a[����G<~r�����"��2����e5	#�4.��b>6���"���RS%��;������dv����I�_b�����t�n�N�F^��9�3S����d��o[��j��
#����F�Q�Xjv�=��;imA��9v�H����hri��D& �PH��:g��bF�V0��0��c���4���Y��M���'�W�i'-��;�ez�U��#��tn<d?^NO�W�
=i~�J.��"�l����#�(�ftFi+�Q#a6F'�d=h>n>�����
�����fZ�|�=hn��e-� '�1����;���yA��7'�oV7���b���,U�[x_���d����4�MN �=�}��
��,��D�+�/�7����C�&�#���ODmW��gP����� ���s}~�d�
'b����-T�� ���l<Mk<���<�?���z��dq����.�kd��� ���w��gol�t\�$��lLq{���z
.���{%�\�tYS7�mZ
��8���,����%]9��~7�z�!���(mRIGqBTY�I�%���4/��g�����+q}6.W�
u[��r�Y%�$�X:����}��G��)�4x=���+�����|�8?���F4��J	�L��5�RwM)J��#��v��t�~�BV�`�E;p�+R���4�
�t�����1:���8���0���r;��8s�M����:��	d<������H�]���������xV
g��h��m�>�|K�����Z��_(!�sTX��au��)������>*Y>_O��zJ}=�Hw�
��y	1��k��'��kv������H+�$x�������H���w�U��lPU��O��������o��N��W�'+��o�k�%���L��A)� �K��$"E�������+X�u���uv`���WmdtU��s=�9uj]����Y��K:����+���*��?����rC���j�0Y>�J8_�����M/��`h��x=��OWq�0�eb$��l��9S������������Z$�2R��[������M7j��f�����w� ~r��hD[���n3������~�<j���h�]�$z���D�m������';h�Q�CgyP��~����r1�2���
��1]o�������j�Hg��L����i��	kp#i� s����diG`6�&XO4���?���G�@D �u���Q*j8�W������0�	�zB���@�����j��>��
�s��D�DNE��tq��>�|5`"�G��|Y ����^C=�>���^��5�f�J����W�5��}77 F��w���b�3M���H�r��w�]o���+�[a�u��2�2�'�>lX(_�6T��Q�0$NV7��B���������.��ne��j�	<}sl*5O�;��=�`f0My��1���9��z�S��h\�A�����@)1�[X�!�!
n
FP]DhH��.|?_�v�D|�,e1�@>I%�������;����l@�/�Z�>N����{F�����>�w��EP$�Z�h��i:��
R���	gs/���D ���w%�C%��!��_6`����h����o���&=t.�I�M-�����A�J���S��^��Zn(�:��3����������G4B�}�4��W�2U�#tsPp�����Krx���En��o��k`���P���a��<�s]a��Qq�6���9;a���;;?"�����v��!���~��7{��Ci����e���$�c]����D|�"�(�H��5!�����L�3'�v i�e����]b*o<N#��p�6ujoU���<0~|�9T<�`�W���@oa���.����n$�0����kZ���d}p����;���)�c�����[�H��`�8\����2.� ��Q^��<�~����`
����1&��[��"��|��]�F�!���M��I���.bl�y��-T��N�8|�e������_y%T�~�xBm[���Ym�����-~�A�u�L�������R c������8�&�����kg�g��[�����At��,f6)ze�bzt	�G��s4�N�=N)��P�������H����#���T^���������}7���mTl�U��Wq��8@ko�Z��m���t?�	�49%�w k���kJw�D[QC��g�3P�

E�������Wl.kf{*yc]�l�k�;���BK
����0='h�l�i��y����K�9�*�6#��6������F��@/�e�3�2X����M�C~G�����oy���3��L�q�b$�L�����*�0�����4�!�B�x���n�F��f���?OO_�J�M�F,�i�Y��/F��������n���en_�F�{X����y��+C�������=z��J�x}�������^5g��������@AuO
���!,���1#��\�r� ��f�~�����0�2�&�����t������;�;�pG�yq�tm��y}yxvpzt�����}���Z���YB��8�_�X���I�Zg��5�Z���S1� I�����Y�����8~�v���8�<p�y��B��N%�4kE'�'#��fus8)�P}]����A	�"��p����k�H�u�.��5�~f���kd���c9'q����&�6�{f!���f�9>|����!�I��-{�pIY���9����
Y�h���D��"d|�U�Y�6��q��f�9�����~z��-�d!&�����3�M{���>f#Z3+Zt��8Pz�SVI�.�@���l�u!�3�z���)�^7��J�wO�2��caq[�� ����e����S�!
_iNC��/���dD9�b��7��S�	��H�hY�5��w����<y}H,0����[ha�C:�"l'8|
,p�����-5�H����9�'�r�D�>�Cu��B����9�VZ������p)����w��3OI��&6��%n���e���;��hPs�,xk��g�b���_����C�[Yw�P,^�}�}P8���'�Tg������`����Y��MG������36����������
$���}�����z����?���I���#��������bnvz�����#e��+��L��Y��d���&�u����=��|�Y`w�p*�����5K��J�V��*'?%�aJSq���*t��&tf��	�f������t���;��k�:�Xzv3$��Fd�K9��v����I�PK��z)������P.0sM��l���U������LuB�������V�C�2��'�1�3oCy��b����<#|NHs���k�� 
��x2
U�x���w��/��������6�]���k_�������+GIv�-�r�]��*`�&B;'S?;IU'�T������������e*52���P*��������O,�������\�~�Ku&�p�dgE��&J�a�}����M�FIZ*���pn�L�-�q��&bL(5���Yd2��M��������0��Mm�k��&U�hB�7��E�N;����s��a��]~�!���L5)���I8��fUF���~��-_�0c|p�;����[�08������(�t��xz��%
���bb��M�|@buQ����v��o�LY��7;�e�$��8��%���"=
��
k:����0]�A��!U�P]lf���q\���_�&�A��x���x���|������{����BbQ�:����V��(�4�l������nK>]
T��jB�������|}	9x� *��h���F��yG��<�����V�:9=2�Ai����������BZL��Q������"��,)G�*�9������T{�����������M���G�~���q. �"��{%��8>�~��x��a�����'Oz�v�d�xU����H$����{L��55q��rc`��M4������R�j�Y��Y����d����V�G��������$A&�p2[���c�5�H*\�Q���������)�K'T����A����*�T��P���#�����G�Ct^�u�s��k���-E��*����PEkvp��r���yU@��;��z�� �����Pw���7�<83���n����R^��>S���� ���%��8q�HD6��I��o/K�eWHu�5�m������Z��X]%x���'��]�~��?�v�k����@�t`�����U�w��������7�)u���:x������!c�4�t�^�t$��0�[B��8��
#�w��}B&l����(?7���P�4��&3�����C�{M��K���7�����6���������,,���#�	�>����v��W^`p��������?$md��������p��/���D��
��)owY�l�4���Q��f��M`�`�j^�u��u�=�
s�[�9o[5��#z�����a�)�0
���x�s2�����%���"�T�Zt��ick�Z�Y�l1�8A�]�/���*1���+~\���D��b>�M-�]\ v���l�0���	t-8#�<:��s�`�%h���:I7a_m�������N��~�'7������pj�-�l���2�����������{��^����C�H��a�>~%��aUzp:�Qd&��������bD+�C�/sXi������S�Cs��m��O�^u�0|����G)|oj���W���T�����w����i*����<Cr�p���Vi(c�R���#����#��`������h�U���;������b��^�PJ�"��:��b�����P�z��x%&��Y2������{�T����%{<)���,�_I�P�������_erA���D��Y�BZ/�B��pD���4m�h���-��K�LR(�4��g<�6O�L()�xFQ�|�B����
I��w�p��+�F��o�z�w��I����C2=Eg����U
H
��PY]�������1E;	��l��HUpj	J5=dKs���o^������i��Ag��k<�?<<;3��?X=`�q�
���v��xh����������O_���p�������/_�������S����wg���qtPR�)�����?�Oni@����-����WGoV�
����a|����m����k�j�,����U���e�\��n��o�._BV���,]���v�W���_��+����#��L�T�3���'�''p��-����?j<�Y�n���K	J����9yjF�b{p������#���r�>�O�N������'|������G�! ���W(�5s5,�GQ�������8i6G$��B�N���%�+�"��;?�����'j;z����������!��r�z���b�itL�y_5����,�
|�Nl//��Y����8ut��Uox�������!���B��I�<c$@[�7'ph'I��W��T���iL������\Y�`I�rs�$El	S�(]�ngR��2���?~W����!wr�Hs��cS�����5l<-+HaK^AS�+������:AA�s\��_@H���%�T����T6�:9��w���pe�P����9�e��:�D�w�wW)�l�J�B|	����K���|�
��+eRk�����S9='S�����'@���;�K�����\����
a]�bdg��O ���!�D��,�H�Q�:8�ZPx���H�0�����
@J	�x�����VB���l6v'�G�oR�V7���#>Z3��~�cC�Vg��F�\%��2�������������q|`��b�K*�����6j0dT�����j�p�R��U��]C�mF������CJ���������o~:QSq���
e �N����R���Tv��fs�9%��;H�nKp���������������K������u�y�j]�3�LO-�O����|?�oZ<{������2n�*����x�4J>��$�t�#��Cr+O{�����]h��d��g�W*u��5���k�/���;m�o�����������?��O�U
�6�u5�L���j�����}����b���mt#�n��zp��Z����&�!�=����i=����yaFvAE�a���b,P��c3�H%2��G��M�;���N9�$�`�zj�>�i �����O*x�_��?��
x��Brw����=�����������?!�u~i�����b���~leo\r�	�~��z��'�D^�/��	8�)�Mt�,���{h��d[����aB)6\ED���2iH�87�0,z������^\�[��p��
��|T�x����L�Ka�����2���JE����{�����������B�4hew6���.?(���_��������z�p�Y�-����}~I�����Q���b���0�,��$�H����?����*�y�����O
�	���rd�N�3������4DNk� �Y��Q�oiOQY�9������v�W"�l�����<��\���������/Qq��M)!��8kr(��>i����M�2V�_'���BI�>6w�i�[KZ[���,'j���cw���;����@��^�>5R�P~g��e\��+�a��;���#���s�&��<�k��~V�SgRk��v��E���+�������9��O)��*�R����a���n(M��Yoc
G�L:�SZ��9Br�^�[r����LG�5B=���*�!^\��I���G��XQ�{Q��4���`�5DS���C���r�~c��
�>�z�W�O��h������Z���
�k�5��4y+/��^w���V	�e���r��^8���P!+�
k�RF���&�S�|?�2�� 9��GP��A��,�'��-����a�*��R����m�<>@�����xZ�)�Qg������<���������Qd�w3��9���`KfbC/}�~��B
T��B�y3�jP���*j6�wJ��cU�|#��(4���$�i:7Ut$j^@�qQ�|��n$Qz&�&�8��4,�rXp��U��(,6����Di�A�*���/���VmU���;���g��#Y�(.��(
 ������Y���K$�1�u�gH��u����$���8�jo7��/P���[Q.�r�f��8�C���*�^��o���S�z�]��(������&Co���u^�}��������������3A�������;��[�E�(4<�v�U{��poO|����9��z3�Z�i�=/T4C����/�|�8!�SN�R,��FD
H?�2�h<�'��G����*��T�EO�ct�eA�V������B��bwtQd��D�'��D2"�g��>���I~2E:�(-/���A��9h��p\#&��[�E
a�	��CO��f�C�c6�pE#�bJ�)��S[w��;;
)�����I��h8T������Ky[�Gk�q�?�n�]������qi�oI�
�M.�L��g^3������}�ui�,�F��k���iJ
-V�d����g�#����{���M�Q9FH��5���*~\21^��k��WaZ����1��{b�)�m�O�Vp�����g2Sl��r�$Q+�i2�9���G�h�tw�?�(���'df����c'-����f���(tE,R�S�����5�8�������$��D��I>�9�x�i���W������OtD��0kHa%�����t=V�?�Z�<`�?����������H+Te��������pag�R���8���=�9��I���%�4�#��(���ye�v F�Vka�hTu�t��p
�B�h�����]�����0����R����k�|lT5���	q�$���!+�+�����M[��)��BgV�E;,Z��[��Wlc")9�t7�@��h��2;[�m���yo����u�jL�o� �� �R�{�
�TM�F�?
�"�K:
rI{_�6h����M�I����{������������{<�N��GO���,�]M5s%
������/����Dkr���3ZYG[2J��ED+�
���T���E��,���I�#�9���Z�O�B>�_oo��N�i�RV�������?���8,A���Z��d�Z8�����j��=e�7\����F���79���l�0�2�D�K�0k4O��My��#^��e��j%J��8
V�)��h-Wo����5�Oj��#N�Q?�.*m���VWO6E)Q������c�Z���z�m�o=��>_�m�`�������)K}�������{s2�+>���`�Ap�I�\}���:C�X�z�����[Y��������)��p��]:�����P��Iq���h�Zn?���n�����E�S*\fE���\���\j]5\�3�V?����o>{��b�h����J��W���z_d���e�lp+��~�����VmYbX�B(L�_y����j�5����Br�Xr�Kn������$q��"��� �����
���a����9M�J����(�-Kw�1%d K�x�7����f/�'����F�fZ@0K�e3�(����{�Y����A����7���	�2�w	a�R�[�'��u��<9��R&O�4!����U������BR�����5*��Y�e��_=7�`�`�"�R��c����%��0lh�4{[?��G��c�L'q����1#S���$��G��|@'
�D�i����1����XRFo���z�^z��q��b�v����Q|]�K��������w��E��^��V�v���6��F���6��]��v���.���4�(����D��-�����fw�VnFy|�/;�4JM*<�|�J|Q�&�!8��.
�j;x���Xw��H�/0�PfMb3���$v������8#�wU���a���SJ��B�x�5�I�N���5�I����q�k��^c����X=�������d���Zb�������d	����Y��O{M��L'���Cw���k�n�?������������;.���,��%M��Ka�7����f�q;����E���L�JT��L���J@;�m"�
�K����#���
w�jc��U��C�s��
��1H�
'R�2/�x�n�-��7l�Sh�I���x��JA��6�{�[��j�b�=_�zYD���������Z��������J=������q�n�������������e��d�I�Q�I�Iw������xx�/�#�Wv�����:I'��]�@���F�~��ac��~X���^Q���%P��1�����������������4JH_0%��sy4]!�i��Q�/8Q�����d����CqX\�����!o
Yt���_d�<��
me���������8��nh�x���j2�3s�/����Z������O�}s�j,E���p�GD�s�
~�F�����sd/�#7��S�%>�����X��6�^�5��,��M'c�6Q�����;?������&���KO�S1�2��t��c���$��9E>���|����R}~��hbV]��[C���(2^��	g)a{��2��v�
��J��q�A�.j�n���;���)e�>�]0��|��.K��_����v�|Ij���G���D���G�$e
2�!�Hp�\�e�����8�)��a7ab#zJ�\�&5�?�S������g�����a�c��;�d1"��������	w��iG�G�����PZ\�����^��
�bG��1���xQ#^'(�c��c+f�Pd��O����&����7�10������t����{v ���I�CZ�bz�SL��w��Yj�J� ����U7��u�%��D�������V��M�7���t�}<� RNg��j>�������I*Ob�h��a(�mnG����k���]� �2����Z�������ul�VkC`��L������uwWmQW	TW���)����fC��"��C�%��w�<�y"�e�f��A�`��~����
&�MvGW�%I��C�Ne$l����:��I�HU�&�����6C$���<��3����f�$AO�Ijn2�����|�zL��K����E�#�3� ��NNB�,�1����Q�)��Z�zw�&(
{��'�6FE%���-
�mvqt�4)�k��?���0U��U�n�3�9�

p}"�C����TC3��l�S�	��9so�Q?\&���}����I��w������7a=��&f�������%[��rjF�^�������mY�d�^G3r�_���&���g]w3���Y'4�����B���>��{���>��-\S:�M��U-��`��G�f"9c�fA�#�+�|����B��d���� ����	��G������������?_��|�����������?���|=V@
promote-panic-test-95.tar.gzapplication/gzipDownload
promote-panic-test-96.tar.gzapplication/gzipDownload
#17Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Michael Paquier (#14)
3 attachment(s)
Re: PANIC during crash recovery of a recently promoted standby

Hello.

At Fri, 22 Jun 2018 15:25:48 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180622062548.GE5215@paquier.xyz>

On Fri, Jun 22, 2018 at 02:34:02PM +0900, Kyotaro HORIGUCHI wrote:

Hello, sorry for the absense and I looked the second patch.

Thanks for the review!

At Fri, 22 Jun 2018 13:45:21 +0900, Michael Paquier
<michael@paquier.xyz> wrote in <20180622044521.GC5215@paquier.xyz>

long as crash recovery runs. And XLogNeedsFlush() also has a similar
problem.

Here, on the other hand, this patch turns off
updateMinRecoverypoint if minRecoverPoint is invalid when
RecoveryInProgress() == true. Howerver RecovInProg() == true
means archive recovery is already started and and
minRecoveryPoint *should* be updated t for the
condition. Actually minRecoverypoint is updated just below. If
this is really right thing, I think that some explanation for the
reason is required here.

LocalRecoveryInProgress is just a local copy of SharedRecoveryInProgress
so RecoveryInProgress also returns true if crash recovery is running.
But perhaps I am missing what you mean? The point here is that redo can
call XLogNeedsFlush, no?

My concern at the time was the necessity to turn off
updateMinRecoveryPoint on the fly. (The previous comment seems a
bit confused, sorry.)

When minRecoveryPoint is invalid, there're only two possible
cases. It may be at very beginning of archive reovery or may be
running a crash recovery. In the latter case, we have detected
crash recovery before redo starts. So we can turn off
updateMinRecoveryPoint immediately and no further check is
needed and it is (I think) easier to understand.

In xlog_redo there still be "minRecoverypoint != 0", which ought
to be described as "!XLogRecPtrIsInvalid(minRecoveryPoint)". (It
seems the only one. Double negation is a bit uneasy but there are
many instance of this kind of coding.)

It is possible to use directly a comparison with InvalidXLogRecPtr
instead of a double negation.

I'm not sure whether it is abstraction of invalid value, or just
a short cut of the value. That's right if it's the
latter. (There's several places where invalid LSN is assumed to
be smaller than any valid values in the patch).

the second diff is the difference of the first patch from
promote_panic_master.diff

On further thought, as we confirmed upthread (and existing
comments are saying) that (minRecoveryPoint == 0)
!InArchiveRecovery are always equivalent, and
updateMinRecoveryPoint becomes equivalent to them by the v3
patch. That is, we can just remove the variable and the attached
third patch is that. It also passes all recovery tests including
the new 015.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

promote-panic_v3.patchtext/x-patch; charset=us-asciiDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dcfef36591..d86137ae8b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -821,8 +821,14 @@ static XLogSource XLogReceiptSource = 0;	/* XLOG_FROM_* code */
 static XLogRecPtr ReadRecPtr;	/* start of last record read */
 static XLogRecPtr EndRecPtr;	/* end+1 of last record read */
 
-static XLogRecPtr minRecoveryPoint; /* local copy of
-									 * ControlFile->minRecoveryPoint */
+/*
+ * Local copies of equivalent fields in the control file.  When running
+ * crash recovery, minRecoveryPoint is set to InvalidXLogRecPtr as we
+ * expect to replay all the WAL available, and updateMinRecoveryPoint is
+ * switched to false to prevent any updates while replaying records.
+ * Those values are kept consistent as long as crash recovery runs.
+ */
+static XLogRecPtr minRecoveryPoint;
 static TimeLineID minRecoveryPointTLI;
 static bool updateMinRecoveryPoint = true;
 
@@ -2707,7 +2713,7 @@ XLogGetReplicationSlotMinimumLSN(void)
 static void
 UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 {
-	/* Quick check using our local copy of the variable */
+	/* Check using our local copy of minRecoveryPoint */
 	if (!updateMinRecoveryPoint || (!force && lsn <= minRecoveryPoint))
 		return;
 
@@ -2717,14 +2723,7 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 	minRecoveryPoint = ControlFile->minRecoveryPoint;
 	minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
-	/*
-	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
-	 * i.e., we're doing crash recovery.  We never modify the control file's
-	 * value in that case, so we can short-circuit future checks here too.
-	 */
-	if (minRecoveryPoint == 0)
-		updateMinRecoveryPoint = false;
-	else if (force || minRecoveryPoint < lsn)
+	if (force || minRecoveryPoint < lsn)
 	{
 		XLogRecPtr	newMinRecoveryPoint;
 		TimeLineID	newMinRecoveryPointTLI;
@@ -3110,7 +3109,7 @@ XLogNeedsFlush(XLogRecPtr record)
 	 */
 	if (RecoveryInProgress())
 	{
-		/* Quick exit if already known updated */
+		/* Quick exit if already known to be updated or cannot be updated */
 		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
 			return false;
 
@@ -3124,20 +3123,8 @@ XLogNeedsFlush(XLogRecPtr record)
 		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 		LWLockRelease(ControlFileLock);
 
-		/*
-		 * An invalid minRecoveryPoint means that we need to recover all the
-		 * WAL, i.e., we're doing crash recovery.  We never modify the control
-		 * file's value in that case, so we can short-circuit future checks
-		 * here too.
-		 */
-		if (minRecoveryPoint == 0)
-			updateMinRecoveryPoint = false;
-
 		/* check again */
-		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
-			return false;
-		else
-			return true;
+		return record > minRecoveryPoint;
 	}
 
 	/* Quick exit if already known flushed */
@@ -4269,6 +4256,12 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 				minRecoveryPoint = ControlFile->minRecoveryPoint;
 				minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
+				/*
+				 * The startup process can update its local copy of
+				 * minRecoveryPoint from this point.
+				 */
+				updateMinRecoveryPoint = true;
+
 				UpdateControlFile();
 				LWLockRelease(ControlFileLock);
 
@@ -6892,9 +6885,31 @@ StartupXLOG(void)
 		/* No need to hold ControlFileLock yet, we aren't up far enough */
 		UpdateControlFile();
 
-		/* initialize our local copy of minRecoveryPoint */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		/*
+		 * Initialize our local copy of minRecoveryPoint.  When doing crash
+		 * recovery we want to replay up to the end of WAL.  Particularly, in
+		 * the case of a promoted standby minRecoveryPoint value in the
+		 * control file is only updated after the first checkpoint.  However,
+		 * if the instance crashes before the first post-recovery checkpoint
+		 * is completed then recovery will use a stale location causing the
+		 * startup process to think that there are still invalid page
+		 * references when checking for data consistency.
+		 */
+		if (InArchiveRecovery)
+		{
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
+		else
+		{
+			/*
+			 * We are to run crash recovery. We shouldn't update
+			 * minRecoveryPoint until crash recvoery ends.
+			 */
+			updateMinRecoveryPoint = false;
+			minRecoveryPoint = InvalidXLogRecPtr;
+			minRecoveryPointTLI = 0;
+		}
 
 		/*
 		 * Reset pgstat data, because it may be invalid after recovery.
@@ -7861,6 +7876,8 @@ CheckRecoveryConsistency(void)
 	if (XLogRecPtrIsInvalid(minRecoveryPoint))
 		return;
 
+	Assert(InArchiveRecovery);
+
 	/*
 	 * assume that we are called in the startup process, and hence don't need
 	 * a lock to read lastReplayedEndRecPtr
@@ -9949,11 +9966,16 @@ xlog_redo(XLogReaderState *record)
 		 * Update minRecoveryPoint to ensure that if recovery is aborted, we
 		 * recover back up to this point before allowing hot standby again.
 		 * This is important if the max_* settings are decreased, to ensure
-		 * you don't run queries against the WAL preceding the change.
+		 * you don't run queries against the WAL preceding the change. The
+		 * local copies cannot be updated as long as crash recovery is
+		 * happening and we expect all the WAL to be replayed.
 		 */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
-		if (minRecoveryPoint != 0 && minRecoveryPoint < lsn)
+		if (InArchiveRecovery)
+		{
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
+		if (minRecoveryPoint != InvalidXLogRecPtr && minRecoveryPoint < lsn)
 		{
 			ControlFile->minRecoveryPoint = lsn;
 			ControlFile->minRecoveryPointTLI = ThisTimeLineID;
diff_from_promote_panic_master.difftext/x-patch; charset=us-asciiDownload
2716c2716
< 	/* Quick check using our local copy of the variable */
---
> 	/* Check using our local copy of minRecoveryPoint */
2720,2732d2719
< 	/*
< 	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
< 	 * i.e., we're doing crash recovery.  We never modify the control file's
< 	 * value in that case, so we can short-circuit future checks here too. The
< 	 * local values of minRecoveryPoint and minRecoveryPointTLI should not be
< 	 * updated until crash recovery finishes.
< 	 */
< 	if (XLogRecPtrIsInvalid(minRecoveryPoint))
< 	{
< 		updateMinRecoveryPoint = false;
< 		return;
< 	}
< 
3125,3133d3111
< 		/*
< 		 * An invalid minRecoveryPoint means that we need to recover all the
< 		 * WAL, i.e., we're doing crash recovery.  We never modify the control
< 		 * file's value in that case, so we can short-circuit future checks
< 		 * here too.
< 		 */
< 		if (XLogRecPtrIsInvalid(minRecoveryPoint))
< 			updateMinRecoveryPoint = false;
< 
6926a6905,6909
> 			/*
> 			 * We are to run crash recovery. We shouldn't update
> 			 * minRecoveryPoint until crash recvoery ends.
> 			 */
> 			updateMinRecoveryPoint = false;
promote-panic_horiguti_v4.patchtext/x-patch; charset=us-asciiDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dcfef36591..8a37784653 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -821,10 +821,13 @@ static XLogSource XLogReceiptSource = 0;	/* XLOG_FROM_* code */
 static XLogRecPtr ReadRecPtr;	/* start of last record read */
 static XLogRecPtr EndRecPtr;	/* end+1 of last record read */
 
-static XLogRecPtr minRecoveryPoint; /* local copy of
-									 * ControlFile->minRecoveryPoint */
+/*
+ * Local copies of equivalent fields in the control file.  When running
+ * crash recovery, minRecoveryPoint is set to InvalidXLogRecPtr as we
+ * expect to replay all the WAL available.
+ */
+static XLogRecPtr minRecoveryPoint;
 static TimeLineID minRecoveryPointTLI;
-static bool updateMinRecoveryPoint = true;
 
 /*
  * Have we reached a consistent database state? In crash recovery, we have
@@ -2707,8 +2710,13 @@ XLogGetReplicationSlotMinimumLSN(void)
 static void
 UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 {
-	/* Quick check using our local copy of the variable */
-	if (!updateMinRecoveryPoint || (!force && lsn <= minRecoveryPoint))
+	/*
+	 * Quich check using our local copy of minRecoveryPoint. Invalid
+	 * minRecoveryPoint value means we are running crash recovery and don't
+	 * need to update it.
+	 */
+	if (XLogRecPtrIsInvalid(minRecoveryPoint) ||
+		(!force && lsn <= minRecoveryPoint))
 		return;
 
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
@@ -2717,14 +2725,7 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 	minRecoveryPoint = ControlFile->minRecoveryPoint;
 	minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
-	/*
-	 * An invalid minRecoveryPoint means that we need to recover all the WAL,
-	 * i.e., we're doing crash recovery.  We never modify the control file's
-	 * value in that case, so we can short-circuit future checks here too.
-	 */
-	if (minRecoveryPoint == 0)
-		updateMinRecoveryPoint = false;
-	else if (force || minRecoveryPoint < lsn)
+	if (force || minRecoveryPoint < lsn)
 	{
 		XLogRecPtr	newMinRecoveryPoint;
 		TimeLineID	newMinRecoveryPointTLI;
@@ -3110,8 +3111,8 @@ XLogNeedsFlush(XLogRecPtr record)
 	 */
 	if (RecoveryInProgress())
 	{
-		/* Quick exit if already known updated */
-		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
+		/* Quick exit if already known to be updated or cannot be updated */
+		if (record <= minRecoveryPoint || XLogRecPtrIsInvalid(minRecoveryPoint))
 			return false;
 
 		/*
@@ -3124,20 +3125,8 @@ XLogNeedsFlush(XLogRecPtr record)
 		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 		LWLockRelease(ControlFileLock);
 
-		/*
-		 * An invalid minRecoveryPoint means that we need to recover all the
-		 * WAL, i.e., we're doing crash recovery.  We never modify the control
-		 * file's value in that case, so we can short-circuit future checks
-		 * here too.
-		 */
-		if (minRecoveryPoint == 0)
-			updateMinRecoveryPoint = false;
-
 		/* check again */
-		if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
-			return false;
-		else
-			return true;
+		return record > minRecoveryPoint;
 	}
 
 	/* Quick exit if already known flushed */
@@ -6892,9 +6881,31 @@ StartupXLOG(void)
 		/* No need to hold ControlFileLock yet, we aren't up far enough */
 		UpdateControlFile();
 
-		/* initialize our local copy of minRecoveryPoint */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		/*
+		 * Initialize our local copy of minRecoveryPoint.  When doing crash
+		 * recovery we want to replay up to the end of WAL.  Particularly, in
+		 * the case of a promoted standby minRecoveryPoint value in the
+		 * control file is only updated after the first checkpoint.  However,
+		 * if the instance crashes before the first post-recovery checkpoint
+		 * is completed then recovery will use a stale location causing the
+		 * startup process to think that there are still invalid page
+		 * references when checking for data consistency.
+		 */
+		if (InArchiveRecovery)
+		{
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
+		else
+		{
+			/*
+			 * We are to run crash recovery. Set invalid value to
+			 * minRecoveryPoint ignoring the shared variable, which inhibis
+			 * updating the variable during crash recovery.
+			 */
+			minRecoveryPoint = InvalidXLogRecPtr;
+			minRecoveryPointTLI = 0;
+		}
 
 		/*
 		 * Reset pgstat data, because it may be invalid after recovery.
@@ -7861,6 +7872,8 @@ CheckRecoveryConsistency(void)
 	if (XLogRecPtrIsInvalid(minRecoveryPoint))
 		return;
 
+	Assert(InArchiveRecovery);
+
 	/*
 	 * assume that we are called in the startup process, and hence don't need
 	 * a lock to read lastReplayedEndRecPtr
@@ -9265,6 +9278,7 @@ CreateRestartPoint(int flags)
 		 * no value in having the minimum recovery point any earlier than this
 		 * anyway, because redo will begin just after the checkpoint record.
 		 */
+		Assert(ControlFile->minRecoveryPoint != InvalidXLogRecPtr);
 		if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
 		{
 			ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
@@ -9949,14 +9963,21 @@ xlog_redo(XLogReaderState *record)
 		 * Update minRecoveryPoint to ensure that if recovery is aborted, we
 		 * recover back up to this point before allowing hot standby again.
 		 * This is important if the max_* settings are decreased, to ensure
-		 * you don't run queries against the WAL preceding the change.
+		 * you don't run queries against the WAL preceding the change. Both
+		 * local and shared variables cannot be updated and we expect all the
+		 * WAL to be replayed while crash recovery is happning.
 		 */
-		minRecoveryPoint = ControlFile->minRecoveryPoint;
-		minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
-		if (minRecoveryPoint != 0 && minRecoveryPoint < lsn)
+		if (InArchiveRecovery)
 		{
-			ControlFile->minRecoveryPoint = lsn;
-			ControlFile->minRecoveryPointTLI = ThisTimeLineID;
+			Assert(minRecoveryPoint != InvalidXLogRecPtr);
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+
+			if (minRecoveryPoint < lsn)
+			{
+				ControlFile->minRecoveryPoint = lsn;
+				ControlFile->minRecoveryPointTLI = ThisTimeLineID;
+			}
 		}
 
 		CommitTsParameterChange(xlrec.track_commit_timestamp,
#18Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro HORIGUCHI (#17)
Re: PANIC during crash recovery of a recently promoted standby

On Mon, Jul 02, 2018 at 04:25:13PM +0900, Kyotaro HORIGUCHI wrote:

When minRecoveryPoint is invalid, there're only two possible
cases. It may be at very beginning of archive reovery or may be
running a crash recovery. In the latter case, we have detected
crash recovery before redo starts. So we can turn off
updateMinRecoveryPoint immediately and no further check is
needed and it is (I think) easier to understand.

Er, you are missing the point that updateMinRecoveryPoint is also used
by processes, like the checkpointer, other than the startup process,
which actually needs to update minRecoveryPoint and rely on the default
value of updateMinRecoveryPoint which is true...

I am planning to finish wrapping this patch luckily on Wednesday JST
time, or in the worst case on Thursday. I got this problem on my mind
for a couple of days now and I could not find a case where the approach
taken could cause a problem. Opinions are welcome.
--
Michael

#19Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#18)
Re: PANIC during crash recovery of a recently promoted standby

On Mon, Jul 02, 2018 at 10:41:05PM +0900, Michael Paquier wrote:

I am planning to finish wrapping this patch luckily on Wednesday JST
time, or in the worst case on Thursday. I got this problem on my mind
for a couple of days now and I could not find a case where the approach
taken could cause a problem. Opinions are welcome.

Okay, pushed and back-patched. Thanks to all who participated in the
thread!
--
Michael

#20Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Michael Paquier (#19)
Re: PANIC during crash recovery of a recently promoted standby

On Thu, Jul 5, 2018 at 7:20 AM, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Jul 02, 2018 at 10:41:05PM +0900, Michael Paquier wrote:

I am planning to finish wrapping this patch luckily on Wednesday JST
time, or in the worst case on Thursday. I got this problem on my mind
for a couple of days now and I could not find a case where the approach
taken could cause a problem. Opinions are welcome.

Okay, pushed and back-patched. Thanks to all who participated in the
thread!

Many thanks Michael for doing the gruelling of coming up with a more
complete fix, verifying all the cases, in various back branches.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#21Michael Paquier
michael@paquier.xyz
In reply to: Pavan Deolasee (#20)
Re: PANIC during crash recovery of a recently promoted standby

On Thu, Jul 05, 2018 at 01:03:14PM +0530, Pavan Deolasee wrote:

Many thanks Michael for doing the gruelling of coming up with a more
complete fix, verifying all the cases, in various back branches.

No problem. I hope I got the credits right. If there is anything wrong
please feel free to let me know.
--
Michael