Standby got invalid primary checkpoint after crashed right after promoted.

Started by hao harry, almost 4 years ago, 4 messages
#1 hao harry
harry-hao@outlook.com
2 attachment(s)

Hi, pgsql-hackers,

I think I found a case where the database is not recoverable, could you please take a look?

Here is how it happens:

- set up primary/standby
- do lots of INSERTs on the primary
- create a checkpoint on the primary
- wait until the standby starts a restartpoint; it takes about 3 minutes of syncing buffers to complete
- before the restartpoint updates ControlFile, promote the standby; that changes
  ControlFile->state to DB_IN_PRODUCTION, so the restartpoint skips the ControlFile
  update, leaving ControlFile->checkPoint pointing to a removed WAL file
- before the promoted standby requests the post-recovery checkpoint (fast promotion),
  one backend crashes, which kills the other server processes, so the post-recovery
  checkpoint is skipped
- the database restarts the startup process, which reports: "could not locate a valid checkpoint record"

I attached a test to reproduce it; it does not fail every time, roughly one run in ten for me.
To increase the chance that CreateRestartPoint skips the ControlFile update and to simulate a crash,
patch 0001 is needed.
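
For reference, the attached script roughly follows the shape sketched below. This is only a simplified sketch, not the attached test itself: the node names, row count, sleep and log check are illustrative, and the narrow timing window between the restartpoint, the promotion and the crash is really arranged by the attached script together with the sleep added in patch 0001.

use strict;
use warnings;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;

# Primary with streaming replication enabled, plus a standby from its backup.
my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
$primary->backup('bkp');

my $standby = PostgreSQL::Test::Cluster->new('standby');
$standby->init_from_backup($primary, 'bkp', has_streaming => 1);
$standby->start;

# Lots of INSERTs, then a checkpoint on the primary; wait for the standby
# to replay it, so a (slow) restartpoint gets started there.
$primary->safe_psql('postgres',
	'CREATE TABLE t AS SELECT generate_series(1, 5000000) AS i');
$primary->safe_psql('postgres', 'CHECKPOINT');
$primary->wait_for_catchup($standby);

# Promote the standby while its restartpoint is still syncing buffers
# (patch 0001 widens this window with a sleep before the ControlFile check).
$standby->promote;

# Crash one backend before the post-recovery checkpoint happens.
# chaos_sigsegv() is the helper added to regress.c by patch 0001.
$standby->safe_psql('postgres',
	"CREATE FUNCTION chaos_sigsegv() RETURNS void AS '\$libdir/regress', 'chaos_sigsegv' LANGUAGE C");
$standby->psql('postgres', 'SELECT chaos_sigsegv()');    # connection dies here

# In the failing case, crash recovery then cannot find a valid checkpoint.
# (A real test would poll the log rather than sleep.)
sleep 10;
my $log = slurp_file($standby->logfile);
ok($log =~ /could not locate a valid checkpoint record/,
	'standby hit the invalid-checkpoint failure');

done_testing();

The key point is the ordering: the promotion must interrupt the restartpoint before it updates pg_control, and the crash must come before the post-recovery checkpoint.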

Best Regards.

Harry Hao

Attachments:

0001-Patched-CreateRestartPoint-to-reproduce-invalid-chec.patch
From 668e3e4a11129675c19b20ed825978c1e0ed32a4 Mon Sep 17 00:00:00 2001
From: Harry Hao <harry-hao@outlook.com>
Date: Wed, 16 Mar 2022 13:20:37 +0800
Subject: [PATCH] Patched CreateRestartPoint to reproduce invalid checkpoint at
 startup.

Sleep before checking ControlFile->state in CreateRestartPoint, to
increase the chance that it has already been changed to DB_IN_PRODUCTION
by a promotion.  If that happens, ControlFile->checkPoint is stale and
invalid.  If it was a fast promotion, the post-recovery checkpoint can
be skipped when a crash happens, and then the database will not recover.

Also add chaos_sigsegv to simulate a crash, and some logging to show
the timing of the restartpoint, the promotion and the crash.
---
 src/backend/access/transam/xlog.c | 10 +++++++++-
 src/test/regress/regress.c        |  9 +++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0d2bd7a357..379425b23d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5723,12 +5723,14 @@ PerformRecoveryXLogAction(void)
 		 * checkpoint later anyway, just for safety.
 		 */
 		CreateEndOfRecoveryRecord();
+        ereport(WARNING, errmsg("TEST: fast promoted"));
 	}
 	else
 	{
 		RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
 						  CHECKPOINT_IMMEDIATE |
 						  CHECKPOINT_WAIT);
+        ereport(WARNING, errmsg("TEST: normal promoted"));
 	}
 
 	return promoted;
@@ -6991,6 +6993,12 @@ CreateRestartPoint(int flags)
 	 */
 	PriorRedoPtr = ControlFile->checkPointCopy.redo;
 
+    /*
+     * Sleep to increase chance state become DB_IN_PRODUCTION when promoted.
+     */
+    ereport(WARNING, errmsg("TEST: restartpoint sleep before check state"));
+    pg_usleep(5000000L);
+    ereport(WARNING, errmsg("TEST: restartpoint state=%d", ControlFile->state));
 	/*
 	 * Update pg_control, using current time.  Check that it still shows
 	 * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing;
@@ -7076,7 +7084,7 @@ CreateRestartPoint(int flags)
 	 */
 	if (!RecoveryInProgress())
 		replayTLI = XLogCtl->InsertTimeLineID;
-
+ 
 	RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr, replayTLI);
 
 	/*
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index 0802fb9136..ad9cbea762 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -1216,3 +1216,12 @@ binary_coercible(PG_FUNCTION_ARGS)
 
 	PG_RETURN_BOOL(IsBinaryCoercible(srctype, targettype));
 }
+
+PG_FUNCTION_INFO_V1(chaos_sigsegv);
+Datum
+chaos_sigsegv(PG_FUNCTION_ARGS)
+{
+    char        *p = NULL;
+    *p = 0;
+    PG_RETURN_VOID();
+}
-- 
2.30.1

reprod_crash_right_after_promoted.pl
#2 hao harry
harry-hao@outlook.com
In reply to: hao harry (#1)
Re: Standby got invalid primary checkpoint after crashed right after promoted.

Found this issue is a duplicate of [1]; after applying that patch, I cannot reproduce it anymore.

[1]: https://www.postgresql.org/message-id/flat/20220316.102444.2193181487576617583.horikyota.ntt@gmail.com

#3 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: hao harry (#1)
Re: Standby got invalid primary checkpoint after crashed right after promoted.

At Wed, 16 Mar 2022 07:16:16 +0000, hao harry <harry-hao@outlook.com> wrote in

> Hi, pgsql-hackers,
>
> I think I found a case where the database is not recoverable, could you please take a look?
>
> Here is how it happens:
>
> - set up primary/standby
> - do lots of INSERTs on the primary
> - create a checkpoint on the primary
> - wait until the standby starts a restartpoint; it takes about 3 minutes of syncing buffers to complete
> - before the restartpoint updates ControlFile, promote the standby; that changes
>   ControlFile->state to DB_IN_PRODUCTION, so the restartpoint skips the ControlFile
>   update, leaving ControlFile->checkPoint pointing to a removed WAL file

Yeah, it seems like exactly the same issue pointed out in [1]. A fix is
proposed in [1]. Maybe I can remove "possible" from the mail subject :p

[1]: https://www.postgresql.org/message-id/7bfad665-db9c-0c2a-2604-9f54763c5f9e@oss.nttdata.com
[2]: https://www.postgresql.org/message-id/20220316.102444.2193181487576617583.horikyota.ntt@gmail.com

> - before the promoted standby requests the post-recovery checkpoint (fast promotion),
>   one backend crashes, which kills the other server processes, so the post-recovery
>   checkpoint is skipped
> - the database restarts the startup process, which reports: "could not locate a valid checkpoint record"
>
> I attached a test to reproduce it; it does not fail every time, roughly one run in ten for me.
> To increase the chance that CreateRestartPoint skips the ControlFile update and to simulate a crash,
> patch 0001 is needed.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#4 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: hao harry (#2)
Re: Standby got invalid primary checkpoint after crashed right after promoted.

(My previous mail has crossed with this one)

At Wed, 16 Mar 2022 08:21:46 +0000, hao harry <harry-hao@outlook.com> wrote in

> Found this issue is a duplicate of [1]; after applying that patch, I cannot reproduce it anymore.
>
> [1]: https://www.postgresql.org/message-id/flat/20220316.102444.2193181487576617583.horikyota.ntt@gmail.com

Glad to hear that. Thanks for checking it!

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center