[BUG] FailedAssertion in SnapBuildPurgeOlderTxn
Hi!
PREAMBLE
For a last couple of months, I stumbled into a problem while running tests
on ARM (Debain, aarch64) and some more wired platforms
for my 64–bit XIDs patch set. Test contrib/test_decoding
(catalog_change_snapshot) rarely failed with the next message:
TRAP: FailedAssertion("TransactionIdIsNormal(InitialRunningXacts[0]/messages/by-id/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2@amazon.com) &&
TransactionIdIsNormal(builder->xmin)", File: "snapbuild.c"
I have plenty of failing on ARM, couple on x86 and none (if memory serves)
on x86–64.
At first, my thought was to blame my 64–bit XID patch for what, but this is
not the case. This error persist from PG15 to PG10
without any patch applied. Though hard to reproduce.
PROBLEM
After some investigation, I think, the problem is in the snapbuild.c
(commit 272248a0c1b1, see [0]/messages/by-id/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2@amazon.com). We do allocate InitialRunningXacts
array in the context of builder->context, but for the time when we call
SnapBuildPurgeOlderTxn this context may be already free'd. Thus,
InitialRunningXacts array become array of 2139062143 (i.e. 0x7F7F7F7F)
values. This is not caused buildfarm to fail due to that code:
if (!NormalTransactionIdPrecedes(InitialRunningXacts[0]/messages/by-id/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2@amazon.com,
builder->xmin))
return;
Since the cluster is initialised with XID way less than 0x7F7F7F7F, we get
to return here, but the problem is still existing.
I've attached the patch based on branch REL_15_STABLE to reproduce the
problem on x86-64.
On my patch set of 64–bit XID's this problem manifested since I do init
cluster with XID far beyond 32–bit bound.
Alternatively, I did try to use my patch [1] to init cluster with first
transaction 2139062143 (i.e. 0x7F7F7F7F).
Then put pg_sleep call just like in the attached patch:
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -968,6 +968,8 @@ SnapBuildPurgeOlderTxn(SnapBuild *builder)
if (NInitialRunningXacts == 0)
return;
+ pg_usleep(1000000L * 2L);
+
/* bound check if there is at least one transaction to remove */
if (!NormalTransactionIdPrecedes(InitialRunningXacts[0],
builder->xmin))
Run installcheck-force for many times for a test_decoding/
catalog_change_snapshot's and got a segfault.
CONCLUSION
In snapbuild.c, context allocated array InitialRunningXacts may be free'd,
this caused assertion failed (at best) or
may lead to the more serious problems.
P.S.
Simple fix like:
@@ -1377,7 +1379,7 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr
lsn, xl_running_xacts *runn
* changes. See SnapBuildXidSetCatalogChanges.
*/
NInitialRunningXacts = nxacts;
- InitialRunningXacts = MemoryContextAlloc(builder->context,
sz);
+ InitialRunningXacts = MemoryContextAlloc(TopMemoryContext,
sz);
memcpy(InitialRunningXacts, running->xids, sz);
qsort(InitialRunningXacts, nxacts, sizeof(TransactionId),
xidComparator);
seems to solve the described problem, but I'm not in the context of [0]/messages/by-id/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2@amazon.com and
why array is allocated in builder->context.
[0]: /messages/by-id/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2@amazon.com
[1]: /messages/by-id/CACG=ezaa4vqYjJ16yoxgrpa-=gXnf0Vv3Ey9bjGrRRFN2YyWFQ@mail.gmail.com
/messages/by-id/CACG=ezaa4vqYjJ16yoxgrpa-=gXnf0Vv3Ey9bjGrRRFN2YyWFQ@mail.gmail.com
--
Best regards,
Maxim Orlov.
Attachments:
0001-catalog_change_snapshot-fail.patch.txttext/plain; charset=US-ASCII; name=0001-catalog_change_snapshot-fail.patch.txtDownload
From d09a031f1f807cdfe1e02000b2bf4fd3eaaedd8f Mon Sep 17 00:00:00 2001
From: Maxim Orlov <orlovmg@gmail.com>
Date: Mon, 21 Nov 2022 14:50:02 +0300
Subject: [PATCH] catalog_change_snapshot-fail
---
contrib/test_decoding/Makefile | 1007 ++++++++++++++++++-
src/backend/replication/logical/snapbuild.c | 12 +
2 files changed, 1012 insertions(+), 7 deletions(-)
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index c7ce603706..aaf7a63411 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -3,12 +3,1006 @@
MODULES = test_decoding
PGFILEDESC = "test_decoding - example of a logical decoding output plugin"
-REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
- decoding_into_rel binary prepared replorigin time messages \
- spill slot truncate stream stats twophase twophase_stream
-ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
- oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
- twophase_snapshot slot_creation_error catalog_change_snapshot
+ISOLATION = catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot \
+ catalog_change_snapshot
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
@@ -17,7 +1011,6 @@ ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
# typical installcheck users do not have (e.g. buildfarm clients).
NO_INSTALLCHECK = 1
-TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index cdf4aa01e9..54cd6f0e67 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -277,6 +277,7 @@ static bool ExportInProgress = false;
* transaction has catalog change. But that won't be a problem since we
* use snapshot built during decoding only for reading system catalogs.
*/
+static TransactionId InitialXact = InvalidTransactionId;
static TransactionId *InitialRunningXacts = NULL;
static int NInitialRunningXacts = 0;
@@ -968,6 +969,12 @@ SnapBuildPurgeOlderTxn(SnapBuild *builder)
if (NInitialRunningXacts == 0)
return;
+ pg_usleep(1000000L * 2L);
+
+ if (InitialRunningXacts[0] != InitialXact)
+ elog(PANIC, "XIDS doesn't match (%u vs %u)", InitialRunningXacts[0],
+ InitialXact);
+
/* bound check if there is at least one transaction to remove */
if (!NormalTransactionIdPrecedes(InitialRunningXacts[0],
builder->xmin))
@@ -991,12 +998,16 @@ SnapBuildPurgeOlderTxn(SnapBuild *builder)
}
if (surviving_xids > 0)
+ {
memcpy(InitialRunningXacts, workspace,
sizeof(TransactionId) * surviving_xids);
+ InitialXact = InitialRunningXacts[0];
+ }
else
{
pfree(InitialRunningXacts);
InitialRunningXacts = NULL;
+ InitialXact = InvalidTransactionId;
}
elog(DEBUG3, "purged initial running transactions from %u to %u, oldest running xid %u",
@@ -1380,6 +1391,7 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
InitialRunningXacts = MemoryContextAlloc(builder->context, sz);
memcpy(InitialRunningXacts, running->xids, sz);
qsort(InitialRunningXacts, nxacts, sizeof(TransactionId), xidComparator);
+ InitialXact = InitialRunningXacts[0];
/* there won't be any state to cleanup */
return false;
--
2.38.1
Hi,
On 2022-11-21 15:47:12 +0300, Maxim Orlov wrote:
After some investigation, I think, the problem is in the snapbuild.c
(commit 272248a0c1b1, see [0]). We do allocate InitialRunningXacts
array in the context of builder->context, but for the time when we call
SnapBuildPurgeOlderTxn this context may be already free'd. Thus,
InitialRunningXacts array become array of 2139062143 (i.e. 0x7F7F7F7F)
values. This is not caused buildfarm to fail due to that code:
Amit, that does indeed seem to be a problem...
Greetings,
Andres Freund
On Tue, Nov 22, 2022 at 2:22 AM Andres Freund <andres@anarazel.de> wrote:
On 2022-11-21 15:47:12 +0300, Maxim Orlov wrote:
After some investigation, I think, the problem is in the snapbuild.c
(commit 272248a0c1b1, see [0]). We do allocate InitialRunningXacts
array in the context of builder->context, but for the time when we call
SnapBuildPurgeOlderTxn this context may be already free'd. Thus,
InitialRunningXacts array become array of 2139062143 (i.e. 0x7F7F7F7F)
values. This is not caused buildfarm to fail due to that code:Amit, that does indeed seem to be a problem...
I'll look into it.
--
With Regards,
Amit Kapila.
On Mon, Nov 21, 2022 at 6:17 PM Maxim Orlov <orlovmg@gmail.com> wrote:
PROBLEM
After some investigation, I think, the problem is in the snapbuild.c (commit 272248a0c1b1, see [0]). We do allocate InitialRunningXacts
array in the context of builder->context, but for the time when we call SnapBuildPurgeOlderTxn this context may be already free'd.
I think you are seeing it freed in SnapBuildPurgeOlderTxn when we
finish and restart decoding in the same session. After finishing the
first decoding, it frees the decoding context but we forgot to reset
NInitialRunningXacts and InitialRunningXacts array. So, next time when
we start decoding in the same session where we don't restore any
serialized snapshot, it can lead to the problem you are seeing because
NInitialRunningXacts (and InitialRunningXacts array) won't have sane
values.
This can happen in the catalog_change_snapshot test as we have
multiple permutations and those use the same session across a restart
of decoding.
Simple fix like: @@ -1377,7 +1379,7 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn * changes. See SnapBuildXidSetCatalogChanges. */ NInitialRunningXacts = nxacts; - InitialRunningXacts = MemoryContextAlloc(builder->context, sz); + InitialRunningXacts = MemoryContextAlloc(TopMemoryContext, sz); memcpy(InitialRunningXacts, running->xids, sz); qsort(InitialRunningXacts, nxacts, sizeof(TransactionId), xidComparator);seems to solve the described problem, but I'm not in the context of [0] and why array is allocated in builder->context.
It will leak the memory for InitialRunningXacts. We need to reset
NInitialRunningXacts (and InitialRunningXacts) as mentioned above.
Thank you for the report and initial analysis. I have added Sawada-San
to know his views as he was the primary author of this work.
--
With Regards,
Amit Kapila.
Thank you for the report and initial analysis. I have added Sawada-San
to know his views as he was the primary author of this work.--
With Regards,
Amit Kapila.
OK, thanks a lot. I hope, we'll fix this soon.
--
Best regards,
Maxim Orlov.
Hi,
On Tue, Nov 22, 2022 at 6:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Nov 21, 2022 at 6:17 PM Maxim Orlov <orlovmg@gmail.com> wrote:
PROBLEM
After some investigation, I think, the problem is in the snapbuild.c (commit 272248a0c1b1, see [0]). We do allocate InitialRunningXacts
array in the context of builder->context, but for the time when we call SnapBuildPurgeOlderTxn this context may be already free'd.I think you are seeing it freed in SnapBuildPurgeOlderTxn when we
finish and restart decoding in the same session. After finishing the
first decoding, it frees the decoding context but we forgot to reset
NInitialRunningXacts and InitialRunningXacts array. So, next time when
we start decoding in the same session where we don't restore any
serialized snapshot, it can lead to the problem you are seeing because
NInitialRunningXacts (and InitialRunningXacts array) won't have sane
values.This can happen in the catalog_change_snapshot test as we have
multiple permutations and those use the same session across a restart
of decoding.
I have the same analysis. In order to restart the decoding from the
LSN where we don't restore any serialized snapshot, we somehow need to
advance the slot's restart_lsn. In this case, I think it happened
since the same session drops at the end of the first scenario and
creates the replication slot with the same name at the beginning of
the second scenario in catalog_change_snapshot.spec.
Simple fix like: @@ -1377,7 +1379,7 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn * changes. See SnapBuildXidSetCatalogChanges. */ NInitialRunningXacts = nxacts; - InitialRunningXacts = MemoryContextAlloc(builder->context, sz); + InitialRunningXacts = MemoryContextAlloc(TopMemoryContext, sz); memcpy(InitialRunningXacts, running->xids, sz); qsort(InitialRunningXacts, nxacts, sizeof(TransactionId), xidComparator);seems to solve the described problem, but I'm not in the context of [0] and why array is allocated in builder->context.
It will leak the memory for InitialRunningXacts. We need to reset
NInitialRunningXacts (and InitialRunningXacts) as mentioned above.Thank you for the report and initial analysis. I have added Sawada-San
to know his views as he was the primary author of this work.
Thanks!
I've attached a draft patch. To fix it, I think we can reset
InitialRunningXacts and NInitialRunningXacts at FreeSnapshotBuilder()
and add an assertion in AllocateSnapshotBuilder() to make sure both
are reset. Regarding the tests, the patch includes a new scenario to
reproduce this issue. However, since the issue can be reproduced also
by the existing scenario (with low probability, though), I'm not sure
it's worth adding the new scenario.
I've not checked if the patch works for version 14 or older yet.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
reset_initial_running_xacts.patchapplication/octet-stream; name=reset_initial_running_xacts.patchDownload
diff --git a/contrib/test_decoding/expected/catalog_change_snapshot.out b/contrib/test_decoding/expected/catalog_change_snapshot.out
index b33e49c0b1..3310f6bd2c 100644
--- a/contrib/test_decoding/expected/catalog_change_snapshot.out
+++ b/contrib/test_decoding/expected/catalog_change_snapshot.out
@@ -88,6 +88,70 @@ stop
(1 row)
+starting permutation: s0_init s0_begin s0_savepoint s0_truncate s1_checkpoint s1_get_changes s0_commit s0_begin s0_xid s1_checkpoint s1_get_changes s0_commit s1_get_changes s1_checkpoint s0_drop s0_init s1_checkpoint s1_get_changes
+step s0_init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+?column?
+--------
+init
+(1 row)
+
+step s0_begin: BEGIN;
+step s0_savepoint: SAVEPOINT sp1;
+step s0_truncate: TRUNCATE tbl1;
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+----
+(0 rows)
+
+step s0_commit: COMMIT;
+step s0_begin: BEGIN;
+step s0_xid: SELECT 'assigned' FROM pg_current_xact_id();
+?column?
+--------
+assigned
+(1 row)
+
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+---------------------------------------
+BEGIN
+table public.tbl1: TRUNCATE: (no-flags)
+COMMIT
+(3 rows)
+
+step s0_commit: COMMIT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+----
+(0 rows)
+
+step s1_checkpoint: CHECKPOINT;
+step s0_drop: SELECT 'drop' FROM pg_drop_replication_slot('isolation_slot');
+?column?
+--------
+drop
+(1 row)
+
+step s0_init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+?column?
+--------
+init
+(1 row)
+
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+----
+(0 rows)
+
+?column?
+--------
+stop
+(1 row)
+
+
starting permutation: s0_init s0_begin s0_savepoint s0_insert s1_checkpoint s1_get_changes s0_truncate s0_commit s0_begin s0_insert s1_checkpoint s1_get_changes s0_commit s1_get_changes
step s0_init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
?column?
diff --git a/contrib/test_decoding/specs/catalog_change_snapshot.spec b/contrib/test_decoding/specs/catalog_change_snapshot.spec
index 770dbd642d..cbf99f1320 100644
--- a/contrib/test_decoding/specs/catalog_change_snapshot.spec
+++ b/contrib/test_decoding/specs/catalog_change_snapshot.spec
@@ -17,11 +17,13 @@ teardown
session "s0"
setup { SET synchronous_commit=on; }
step "s0_init" { SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); }
+step "s0_drop" { SELECT 'drop' FROM pg_drop_replication_slot('isolation_slot'); }
step "s0_begin" { BEGIN; }
step "s0_savepoint" { SAVEPOINT sp1; }
step "s0_truncate" { TRUNCATE tbl1; }
step "s0_insert" { INSERT INTO tbl1 VALUES (1); }
step "s0_insert2" { INSERT INTO user_cat VALUES (1); }
+step "s0_xid" { SELECT 'assigned' FROM pg_current_xact_id(); }
step "s0_commit" { COMMIT; }
session "s1"
@@ -54,9 +56,25 @@ permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate" "s1_checkpoint" "s
# containing catalog changes.
permutation "s0_init" "s0_begin" "s0_savepoint" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_insert2" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
+# XXX: This scenario simulates the execution of both the first permutation and the
+# part of the second permutation in the one scenario.
+#
+# The second from the last decoding restarts from the RUNNING_XACTS record that has
+# one running transaction and is emitted during the second checkpoint. Then it setups
+# the space and the count for the initial running transactions.
+#
+# The problem was that when it finishes the allocated, it forgot to reset them. The
+# allocated space is freed along with its memory context but the counter is not
+# reset. After that, since we recreate the replication with the same name by another
+# session, the slot's restart_lsn is re-initialized to a new LSN and restarting the
+# decoding from there (the last decoding) doesn't need to restore the snapshot. The
+# session ends up using unreset data.
+permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate" "s1_checkpoint" "s1_get_changes" "s0_commit" "s0_begin" "s0_xid" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes" "s1_checkpoint" "s0_drop" "s0_init" "s1_checkpoint" "s1_get_changes"
+
# The last decoding restarts from the first checkpoint and adds invalidation
# messages generated by "s0_truncate" to the subtransaction. While
# processing the commit record for the top-level transaction, we decide
# to skip this xact but ensure that corresponding invalidation messages
# get processed.
permutation "s0_init" "s0_begin" "s0_savepoint" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_truncate" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
+
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index cdf4aa01e9..d2fe2d222e 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -343,6 +343,9 @@ AllocateSnapshotBuilder(ReorderBuffer *reorder,
MemoryContextSwitchTo(oldcontext);
+ /* Make sure the initial running transactions array is empty */
+ Assert(InitialRunningXacts == NULL && NInitialRunningXacts == 0);
+
return builder;
}
@@ -363,6 +366,10 @@ FreeSnapshotBuilder(SnapBuild *builder)
/* other resources are deallocated via memory context reset */
MemoryContextDelete(context);
+
+ /* InitialRunningXacts is freed along with the context */
+ InitialRunningXacts = NULL;
+ NInitialRunningXacts = 0;
}
/*
I've attached a draft patch. To fix it, I think we can reset
InitialRunningXacts and NInitialRunningXacts at FreeSnapshotBuilder()
and add an assertion in AllocateSnapshotBuilder() to make sure both
are reset.
Thanks for the patch. It works fine. I've tested this patch for 15 and 11
versions on x86_64 and ARM
and see no fails. But the function pg_current_xact_id added by 4c04be9b05ad
doesn't exist in PG11.
Regarding the tests, the patch includes a new scenario to
reproduce this issue. However, since the issue can be reproduced also
by the existing scenario (with low probability, though), I'm not sure
it's worth adding the new scenario.
AFAICS, the test added doesn't 100% reproduce this issue, so, maybe, it
does not worth it. But, I do not have a strong opinion here.
Let's add tests in a separate commit and let the actual committer to decide
what to do, should we?
--
Best regards,
Maxim Orlov.
On Tue, Nov 22, 2022 at 10:33 PM Maxim Orlov <orlovmg@gmail.com> wrote:
Regarding the tests, the patch includes a new scenario to
reproduce this issue. However, since the issue can be reproduced also
by the existing scenario (with low probability, though), I'm not sure
it's worth adding the new scenario.AFAICS, the test added doesn't 100% reproduce this issue, so, maybe, it does not worth it. But, I do not have a strong opinion here.
Let's add tests in a separate commit and let the actual committer to decide what to do, should we?
+1 to not have a test for this as the scenario can already be tested
by the existing set of tests.
--
With Regards,
Amit Kapila.
On Wed, Nov 23, 2022 at 12:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Nov 22, 2022 at 10:33 PM Maxim Orlov <orlovmg@gmail.com> wrote:
Regarding the tests, the patch includes a new scenario to
reproduce this issue. However, since the issue can be reproduced also
by the existing scenario (with low probability, though), I'm not sure
it's worth adding the new scenario.AFAICS, the test added doesn't 100% reproduce this issue, so, maybe, it does not worth it. But, I do not have a strong opinion here.
Let's add tests in a separate commit and let the actual committer to decide what to do, should we?+1 to not have a test for this as the scenario can already be tested
by the existing set of tests.
Agreed not to have a test case for this.
I've attached an updated patch. I've confirmed this patch works for
all supported branches.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
0001-Reset-InitialRunningXacts-array-when-freeing-SnapBui.patchapplication/octet-stream; name=0001-Reset-InitialRunningXacts-array-when-freeing-SnapBui.patchDownload
From 5bec3fb3533cd25e5d89f3355b8f1598e6967e9b Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 24 Nov 2022 16:58:44 +0900
Subject: [PATCH] Reset InitialRunningXacts array when freeing SnapBuild.
After finishing the decoding, it frees the decoding context but we
forgot to reset NInitialRunningXacts and InitialRunningXacts
array. So, the next time when starting decoding in the same session
where we don't restore any serialized snapshot, we ended up using the
unreset array.
An oversight in 272248a0c1b18a82c4266e0dc3b526d4d2637de3, so backpatch
to 11.
Reported-by: Maxim Orlov
Author: Masahiko Sawada
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CACG=ezZoz_KG+Ryh9MrU_g5e0HiVoHocEvqFF=NRrhrwKmEQJQ@mail.gmail.com
Backpatch-through: v11
---
src/backend/replication/logical/snapbuild.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index cdf4aa01e9..d2fe2d222e 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -343,6 +343,9 @@ AllocateSnapshotBuilder(ReorderBuffer *reorder,
MemoryContextSwitchTo(oldcontext);
+ /* Make sure the initial running transactions array is empty */
+ Assert(InitialRunningXacts == NULL && NInitialRunningXacts == 0);
+
return builder;
}
@@ -363,6 +366,10 @@ FreeSnapshotBuilder(SnapBuild *builder)
/* other resources are deallocated via memory context reset */
MemoryContextDelete(context);
+
+ /* InitialRunningXacts is freed along with the context */
+ InitialRunningXacts = NULL;
+ NInitialRunningXacts = 0;
}
/*
--
2.31.1
Agreed not to have a test case for this.
I've attached an updated patch. I've confirmed this patch works for
all supported branches.Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
It works for me as well. Thanks!
I've created a commitfest entry for this patch, see
https://commitfest.postgresql.org/41/4024/
Hope, it will be committed soon.
--
Best regards,
Maxim Orlov.
On Thu, Nov 24, 2022 at 1:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Nov 23, 2022 at 12:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Agreed not to have a test case for this.
I've attached an updated patch. I've confirmed this patch works for
all supported branches.
I have slightly changed the checks used in the patch, otherwise looks
good to me. I am planning to push (v11-v15) the attached tomorrow
unless there are more comments.
--
With Regards,
Amit Kapila.
Attachments:
v2-0001-Fix-uninitialized-access-to-InitialRunningXacts-d.patchapplication/octet-stream; name=v2-0001-Fix-uninitialized-access-to-InitialRunningXacts-d.patchDownload
From 29bfecd52f2356f0101f0db47f753e00e197cb48 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Thu, 24 Nov 2022 16:28:23 +0530
Subject: [PATCH v2] Fix uninitialized access to InitialRunningXacts during
decoding.
In commit 272248a0c, we introduced an InitialRunningXacts array to
remember transactions and subtransactions that were running when the
xl_running_xacts record that we decoded was written. This array was
allocated in the snapshot builder memory context after we restore
serialized snapshot but we forgot to reset the array while freeing the
builder memory context. So, the next when starting decoding in the same
session where we don't restore any serialized snapshot, we ended up using
the uninitialized array and that can lead to unpredictable behavior.
Reported-by: Maxim Orlov
Author: Masahiko Sawada
Reviewed-by: Amit Kapila, Maxim Orlov
Backpatch-through: 11
Discussion: https://postgr.es/m/CACG=ezZoz_KG+Ryh9MrU_g5e0HiVoHocEvqFF=NRrhrwKmEQJQ@mail.gmail.com
---
src/backend/replication/logical/snapbuild.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index aa2e9f303e..02d73e5c93 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -377,6 +377,9 @@ AllocateSnapshotBuilder(ReorderBuffer *reorder,
MemoryContextSwitchTo(oldcontext);
+ /* The initial running transactions array must be empty. */
+ Assert(NInitialRunningXacts == 0 && InitialRunningXacts == NULL);
+
return builder;
}
@@ -397,6 +400,10 @@ FreeSnapshotBuilder(SnapBuild *builder)
/* other resources are deallocated via memory context reset */
MemoryContextDelete(context);
+
+ /* InitialRunningXacts is freed along with the context */
+ NInitialRunningXacts = 0;
+ InitialRunningXacts = NULL;
}
/*
--
2.28.0.windows.1
On Thu, Nov 24, 2022 at 4:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Nov 24, 2022 at 1:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Nov 23, 2022 at 12:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Agreed not to have a test case for this.
I've attached an updated patch. I've confirmed this patch works for
all supported branches.I have slightly changed the checks used in the patch, otherwise looks
good to me. I am planning to push (v11-v15) the attached tomorrow
unless there are more comments.
Pushed.
--
With Regards,
Amit Kapila.
On Fri, 25 Nov 2022 at 09:40, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Nov 24, 2022 at 4:43 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:On Thu, Nov 24, 2022 at 1:48 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
On Wed, Nov 23, 2022 at 12:00 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
Agreed not to have a test case for this.
I've attached an updated patch. I've confirmed this patch works for
all supported branches.I have slightly changed the checks used in the patch, otherwise looks
good to me. I am planning to push (v11-v15) the attached tomorrow
unless there are more comments.Pushed.
A big thanks to you! Could you also, close the commitfest entry
https://commitfest.postgresql.org/41/4024/, please?
--
Best regards,
Maxim Orlov.
On Fri, Nov 25, 2022 at 5:58 PM Maxim Orlov <orlovmg@gmail.com> wrote:
On Fri, 25 Nov 2022 at 09:40, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Nov 24, 2022 at 4:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Nov 24, 2022 at 1:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Nov 23, 2022 at 12:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Agreed not to have a test case for this.
I've attached an updated patch. I've confirmed this patch works for
all supported branches.I have slightly changed the checks used in the patch, otherwise looks
good to me. I am planning to push (v11-v15) the attached tomorrow
unless there are more comments.Pushed.
A big thanks to you! Could you also, close the commitfest entry https://commitfest.postgresql.org/41/4024/, please?
Closed.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com