Why don't update minimum recovery point in xact_redo_abort

Started by 蔡梦娟(玊于)over 4 years ago9 messages
#1蔡梦娟(玊于)
mengjuan.cmj@alibaba-inc.com

Hi, all

Recently, I got a PANIC while restarts standby, which can be reproduced by the following steps, based on pg 11:
1. begin a transaction in primary node;
2. create a table in the transaction;
3. insert lots of data into the table;
4. do a checkpoint, and restart standby after checkpoint is done in primary node;
5. insert/update lots of data into the table again;
6. abort the transaction.

after step 6, fast shutdown standby node, and then restart standby, you will get a PANIC log, and the backtrace is:
#0 0x00007fc663e5a277 in raise () from /lib64/libc.so.6
#1 0x00007fc663e5b968 in abort () from /lib64/libc.so.6
#2 0x0000000000c89f01 in errfinish (dummy=0) at elog.c:707
#3 0x0000000000c8cba3 in elog_finish (elevel=22, fmt=0xdccc18 "WAL contains references to invalid pages") at elog.c:1658
#4 0x00000000005e476a in XLogCheckInvalidPages () at xlogutils.c:253
#5 0x00000000005cbc1a in CheckRecoveryConsistency () at xlog.c:9477
#6 0x00000000005ca5c5 in StartupXLOG () at xlog.c:8609
#7 0x0000000000a025a5 in StartupProcessMain () at startup.c:274
#8 0x0000000000643a5c in AuxiliaryProcessMain (argc=2, argv=0x7ffe4e4849a0) at bootstrap.c:485
#9 0x0000000000a00620 in StartChildProcess (type=StartupProcess) at postmaster.c:6215
#10 0x00000000009f92c6 in PostmasterMain (argc=3, argv=0x4126500) at postmaster.c:1506
#11 0x00000000008eab64 in main (argc=3, argv=0x4126500) at main.c:232

I think the reason for the above error is as follows:
1. the transaction in primary node was aborted finally, the standby node also deleted the table files after replayed the xlog record, however, without updating minimum recovery point;
2. primary node did a checkpoint before abort, and then standby node is restarted, so standby node will recovery from a point where the table has already been created and data has been inserted into the table;
3. when standby node restarts after step 6, it will find the page needed during recovery doesn't exist, which has already been deleted by xact_redo_abort before, so standby node will treat this page as an invalid page;
4. xact_redo_abort drop relation files without updating minumum recovery point, before standby node replay the abort xlog record and forget invalid pages again, it will reach consistency because the abort xlogrecord lsn is greater than minrecoverypoint;
5. during checkRecoveryConsistency, it will check invalid pages, and find that there is invalid page, and the PANIC log will be generated.

So why don't update minimum recovery point in xact_redo_abort, just like XLogFlush in xact_redo_commit, in which way standby could reach consistency and check invalid pages after replayed the abort xlogrecord.
Hope to get your reply

Thanks & Best Regard

#2Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: 蔡梦娟(玊于) (#1)
Re: Why don't update minimum recovery point in xact_redo_abort

On 2021/07/27 2:38, 蔡梦娟(玊于) wrote:

Hi, all

Recently, I got a PANIC while restarts standby, which can be reproduced by the following steps, based on pg 11:
1. begin a transaction in primary node;
2. create a table in the transaction;
3. insert lots of data into the table;
4. do a checkpoint, and restart standby after checkpoint is done in primary node;
5. insert/update lots of data into the table again;
6. abort the transaction.

I could reproduce the issue by using the similar steps and
disabling full_page_writes, in the master branch.

after step 6, fast shutdown standby node, and then restart standby, you will get a PANIC log, and the backtrace is:
#0  0x00007fc663e5a277 in raise () from /lib64/libc.so.6
#1  0x00007fc663e5b968 in abort () from /lib64/libc.so.6
#2  0x0000000000c89f01 in errfinish (dummy=0) at elog.c:707
#3  0x0000000000c8cba3 in elog_finish (elevel=22, fmt=0xdccc18 "WAL contains references to invalid pages") at elog.c:1658
#4  0x00000000005e476a in XLogCheckInvalidPages () at xlogutils.c:253
#5  0x00000000005cbc1a in CheckRecoveryConsistency () at xlog.c:9477
#6  0x00000000005ca5c5 in StartupXLOG () at xlog.c:8609
#7  0x0000000000a025a5 in StartupProcessMain () at startup.c:274
#8  0x0000000000643a5c in AuxiliaryProcessMain (argc=2, argv=0x7ffe4e4849a0) at bootstrap.c:485
#9  0x0000000000a00620 in StartChildProcess (type=StartupProcess) at postmaster.c:6215
#10 0x00000000009f92c6 in PostmasterMain (argc=3, argv=0x4126500) at postmaster.c:1506
#11 0x00000000008eab64 in main (argc=3, argv=0x4126500) at main.c:232

I think the reason for the above error is as follows:
1. the transaction in primary node was aborted finally, the standby node also deleted the table files after replayed the xlog record, however, without updating minimum recovery point;
2. primary node did a checkpoint before abort, and then standby node is restarted, so standby node will recovery from a point where the table has already been created and data has been inserted into the table;
3. when standby node restarts after step 6, it will find the page needed during recovery doesn't exist, which has already been deleted by xact_redo_abort before, so standby node will treat this page as an invalid page;
4. xact_redo_abort drop relation files without updating minumum recovery point, before standby node replay the abort xlog record and forget invalid pages again, it will reach consistency because the abort xlogrecord lsn is greater than minrecoverypoint;
5. during checkRecoveryConsistency, it will check invalid pages, and find that there is invalid page, and the PANIC log will be generated.

So why don't update minimum recovery point in xact_redo_abort, just like XLogFlush in xact_redo_commit, in which way standby could reach consistency and check invalid pages after replayed the abort xlogrecord.

ISTM that you're right. xact_redo_abort() should call XLogFlush() to
update the minimum recovery point on truncation. This seems
the oversight in commit 7bffc9b7bf.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#3Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#2)
Re: Why don't update minimum recovery point in xact_redo_abort

On Tue, Jul 27, 2021 at 05:26:05PM +0900, Fujii Masao wrote:

ISTM that you're right. xact_redo_abort() should call XLogFlush() to
update the minimum recovery point on truncation. This seems
the oversight in commit 7bffc9b7bf.

Indeed. It would be nice to see some refactoring of this code as
well? Both share a lot of steps, so adding something to one path can
easily lead to the other path being forgotten.
--
Michael

#4Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Michael Paquier (#3)
1 attachment(s)
Re: Why don't update minimum recovery point in xact_redo_abort

On 2021/07/27 19:51, Michael Paquier wrote:

On Tue, Jul 27, 2021 at 05:26:05PM +0900, Fujii Masao wrote:

ISTM that you're right. xact_redo_abort() should call XLogFlush() to
update the minimum recovery point on truncation. This seems
the oversight in commit 7bffc9b7bf.

Indeed. It would be nice to see some refactoring of this code as
well? Both share a lot of steps, so adding something to one path can
easily lead to the other path being forgotten.

That's idea, but as far as I read both functions, they seem not
so similar. So I'm not sure how much such refactoring would help.

Anyway I attached the patch that changes only xact_redo_abort()
so that it calls XLogFlush() to update min recovery point.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

xact_redo_abort.patchtext/plain; charset=UTF-8; name=xact_redo_abort.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 441445927e..387f80419a 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5983,7 +5983,16 @@ xact_redo_abort(xl_xact_parsed_abort *parsed, TransactionId xid,
 	}
 
 	/* Make sure files supposed to be dropped are dropped */
-	DropRelationFiles(parsed->xnodes, parsed->nrels, true);
+	if (parsed->nrels > 0)
+	{
+		/*
+		 * See comments about update of minimum recovery point on truncation,
+		 * in xact_redo_commit().
+		 */
+		XLogFlush(lsn);
+
+		DropRelationFiles(parsed->xnodes, parsed->nrels, true);
+	}
 }
 
 void
#5蔡梦娟(玊于)
mengjuan.cmj@alibaba-inc.com
In reply to: Fujii Masao (#2)
1 attachment(s)
回复:Why don't update minimum recovery point in xact_redo_abort

Hi, Fujii

Thanks for your reply.
And I want to share a patch about the bug with you, I add XLogFlush() in xact_redo_abort() to update the minimum recovery point.

Best Regards,
Suyu

------------------------------------------------------------------
发件人:Fujii Masao <masao.fujii@oss.nttdata.com>
发送时间:2021年7月27日(星期二) 16:26
收件人:蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com>; pgsql-hackers <pgsql-hackers@lists.postgresql.org>
主 题:Re: Why don't update minimum recovery point in xact_redo_abort

On 2021/07/27 2:38, 蔡梦娟(玊于) wrote:

Hi, all

Recently, I got a PANIC while restarts standby, which can be reproduced by the following steps, based on pg 11:
1. begin a transaction in primary node;
2. create a table in the transaction;
3. insert lots of data into the table;
4. do a checkpoint, and restart standby after checkpoint is done in primary node;
5. insert/update lots of data into the table again;
6. abort the transaction.

I could reproduce the issue by using the similar steps and
disabling full_page_writes, in the master branch.

after step 6, fast shutdown standby node, and then restart standby, you will get a PANIC log, and the backtrace is:
#0 0x00007fc663e5a277 in raise () from /lib64/libc.so.6
#1 0x00007fc663e5b968 in abort () from /lib64/libc.so.6
#2 0x0000000000c89f01 in errfinish (dummy=0) at elog.c:707
#3 0x0000000000c8cba3 in elog_finish (elevel=22, fmt=0xdccc18 "WAL contains references to invalid pages") at elog.c:1658
#4 0x00000000005e476a in XLogCheckInvalidPages () at xlogutils.c:253
#5 0x00000000005cbc1a in CheckRecoveryConsistency () at xlog.c:9477
#6 0x00000000005ca5c5 in StartupXLOG () at xlog.c:8609
#7 0x0000000000a025a5 in StartupProcessMain () at startup.c:274
#8 0x0000000000643a5c in AuxiliaryProcessMain (argc=2, argv=0x7ffe4e4849a0) at bootstrap.c:485
#9 0x0000000000a00620 in StartChildProcess (type=StartupProcess) at postmaster.c:6215
#10 0x00000000009f92c6 in PostmasterMain (argc=3, argv=0x4126500) at postmaster.c:1506
#11 0x00000000008eab64 in main (argc=3, argv=0x4126500) at main.c:232

I think the reason for the above error is as follows:
1. the transaction in primary node was aborted finally, the standby node also deleted the table files after replayed the xlog record, however, without updating minimum recovery point;
2. primary node did a checkpoint before abort, and then standby node is restarted, so standby node will recovery from a point where the table has already been created and data has been inserted into the table;
3. when standby node restarts after step 6, it will find the page needed during recovery doesn't exist, which has already been deleted by xact_redo_abort before, so standby node will treat this page as an invalid page;
4. xact_redo_abort drop relation files without updating minumum recovery point, before standby node replay the abort xlog record and forget invalid pages again, it will reach consistency because the abort xlogrecord lsn is greater than minrecoverypoint;
5. during checkRecoveryConsistency, it will check invalid pages, and find that there is invalid page, and the PANIC log will be generated.

So why don't update minimum recovery point in xact_redo_abort, just like XLogFlush in xact_redo_commit, in which way standby could reach consistency and check invalid pages after replayed the abort xlogrecord.

ISTM that you're right. xact_redo_abort() should call XLogFlush() to
update the minimum recovery point on truncation. This seems
the oversight in commit 7bffc9b7bf.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

0001-Update-minimum-recovery-point-on-file-deletion-durin.patchapplication/octet-streamDownload
From a7e1fcec841223121c04883b9ce1e32b70574c62 Mon Sep 17 00:00:00 2001
From: "suyu.cmj" <mengjuan.cmj@alibaba-inc.com>
Date: Tue, 27 Jul 2021 16:23:05 +0000
Subject: [PATCH] Update minimum recovery point on file deletion during redo
 abort xlog record.

    If a file is deleted during redo abort xlog record, we must update minRecoveryPoint.
    Once a file is deleted, there's no going back; it would not be safe to stop recovery
    at a point earlier than that anymore.
---
 src/backend/access/transam/xact.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 4414459..78394d6 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5982,8 +5982,29 @@ xact_redo_abort(xl_xact_parsed_abort *parsed, TransactionId xid,
 						   false /* backward */ , false /* WAL */ );
 	}
 
-	/* Make sure files supposed to be dropped are dropped */
-	DropRelationFiles(parsed->xnodes, parsed->nrels, true);
+	if (parsed->nrels > 0)
+	{
+		/*
+		 * First update minimum recovery point to cover this WAL record. Once
+		 * a relation is deleted, there's no going back. The buffer manager
+		 * enforces the WAL-first rule for normal updates to relation files,
+		 * so that the minimum recovery point is always updated before the
+		 * corresponding change in the data file is flushed to disk. We have
+		 * to do the same here. 
+		 *
+		 * Doing this before deleting the files means that if a deletion fails
+		 * for some reason, you cannot start up the system even after restart,
+		 * until you fix the underlying situation so that the deletion will
+		 * succeed. Alternatively, we could update the minimum recovery point
+		 * after deletion, but that would leave a small window where the
+		 * WAL-first rule would be violated. 
+		 */
+		XLogFlush(lsn);
+
+		/* Make sure files supposed to be dropped are dropped */
+		DropRelationFiles(parsed->xnodes, parsed->nrels, true);
+	}
+
 }
 
 void
-- 
1.8.3.1

#6Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Fujii Masao (#4)
1 attachment(s)
Re: Why don't update minimum recovery point in xact_redo_abort

On 27/07/2021 19:49, Fujii Masao wrote:

Anyway I attached the patch that changes only xact_redo_abort()
so that it calls XLogFlush() to update min recovery point.

Looks good to me, thanks! FWIW, I used the attached script to reproduce
this.

- Heikki

Attachments:

xact_redo_abort.pytext/x-python; charset=UTF-8; name=xact_redo_abort.pyDownload
#7Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: 蔡梦娟(玊于) (#5)
Re: 回复:Why don't update minimum recovery point in xact_redo_abort

On 2021/07/28 1:55, 蔡梦娟(玊于) wrote:

Hi, Fujii

Thanks for your reply.
And I want to share a patch about the bug with you, I add XLogFlush() in xact_redo_abort() to update the minimum recovery point.

Thanks for the patch! It looks almost the same as the patch I posted upthread.
One diff between them is that you copy-and-pasted the comments for update of
minRecoveryPoint, but instead I just added the comment "See comments ... in
xact_redo_commit()". IMO it's better to avoid putting the same (a bit long)
comments in multiple places so that we can more easily maintain the comments.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#8Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Heikki Linnakangas (#6)
Re: Why don't update minimum recovery point in xact_redo_abort

On 2021/07/28 6:25, Heikki Linnakangas wrote:

On 27/07/2021 19:49, Fujii Masao wrote:

Anyway I attached the patch that changes only xact_redo_abort()
so that it calls XLogFlush() to update min recovery point.

Looks good to me, thanks! FWIW, I used the attached script to reproduce this.

Thanks for the review!

Barring any objection, I will commit the patch and
back-patch it to all supported versions.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#9Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Fujii Masao (#8)
Re: Why don't update minimum recovery point in xact_redo_abort

On 2021/07/28 12:44, Fujii Masao wrote:

On 2021/07/28 6:25, Heikki Linnakangas wrote:

On 27/07/2021 19:49, Fujii Masao wrote:

Anyway I attached the patch that changes only xact_redo_abort()
so that it calls XLogFlush() to update min recovery point.

Looks good to me, thanks! FWIW, I used the attached script to reproduce this.

Thanks for the review!

Barring any objection, I will commit the patch and
back-patch it to all supported versions.

Pushed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION