test_decoding assertion failure for the loss of top-sub transaction relationship
Hi, hackers
I've met an assertion failure of logical decoding with below scenario on HEAD.
---
<preparation>
create table tab1 (val integer);
select 'init' from pg_create_logical_replication_slot('regression_slot', 'test_decoding');
<session1>
begin;
savepoint sp1;
insert into tab1 values (1);
<session2>
checkpoint; -- for RUNNING_XACT
select data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
<session1>
truncate tab1; -- for NEW_CID
commit;
begin;
insert into tab1 values (3);
<session2>
checkpoint;
select data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
<session1>
commit;
<session2>
select data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
---
Here, it's not a must but is advisable to make LOG_SNAPSHOT_INTERVAL_MS bigger so that
we can issue RUNNING_XACT according to our checkpoint commands explicitly.
In the above scenario, the first checkpoint generates RUNNING_XACT after the wal record
(for ReorderBufferAssignChild) that associates sub transaction with its top transaction.
This means that once we restart from RUNNING_XACT, we lose the association between top
transaction and sub transaction and then we can't mark the top transaction as catalog
modifying transaction by decoding NEW_CID (written after RUNNING_XACT), if the
sub transaction changes the catalog.
Therefore, this leads to the failure for the assert that can check
the consistency that when one sub transaction modifies the catalog,
its top transaction should be marked so as well.
I feel we need to remember the relationship between top transaction and sub transaction
in the serialized snapshot even before changing catalog at decoding RUNNING_XACT,
so that we can keep track of the association after the restart. What do you think ?
The stack call of this failure and related information is below.
(gdb) bt
#0 0x00007f2632588387 in raise () from /lib64/libc.so.6
#1 0x00007f2632589a78 in abort () from /lib64/libc.so.6
#2 0x0000000000b3eba1 in ExceptionalCondition (conditionName=0xd137e0 "!needs_snapshot || needs_timetravel",
errorType=0xd130c5 "FailedAssertion", fileName=0xd130b9 "snapbuild.c", lineNumber=1116) at assert.c:69
#3 0x0000000000911257 in SnapBuildCommitTxn (builder=0x23f0638, lsn=22386632, xid=728, nsubxacts=1,
subxacts=0x2bfcc88, xinfo=79) at snapbuild.c:1116
#4 0x00000000008fa420 in DecodeCommit (ctx=0x23e0108, buf=0x7fff4a1f9220, parsed=0x7fff4a1f9020, xid=728,
two_phase=false) at decode.c:630
#5 0x00000000008f9953 in xact_decode (ctx=0x23e0108, buf=0x7fff4a1f9220) at decode.c:216
#6 0x00000000008f967d in LogicalDecodingProcessRecord (ctx=0x23e0108, record=0x23e04a0) at decode.c:119
#7 0x0000000000900b63 in pg_logical_slot_get_changes_guts (fcinfo=0x23d80a8, confirm=true, binary=false)
at logicalfuncs.c:271
#8 0x0000000000900ca0 in pg_logical_slot_get_changes (fcinfo=0x23d80a8) at logicalfuncs.c:338
...
(gdb) frame 3
#3 0x0000000000911257 in SnapBuildCommitTxn (builder=0x23f0638, lsn=22386632, xid=728, nsubxacts=1,
subxacts=0x2bfcc88, xinfo=79) at snapbuild.c:1116
1116 Assert(!needs_snapshot || needs_timetravel);
(gdb) list
1111 {
1112 /* record that we cannot export a general snapshot anymore */
1113 builder->committed.includes_all_transactions = false;
1114 }
1115
1116 Assert(!needs_snapshot || needs_timetravel);
1117
1118 /*
1119 * Adjust xmax of the snapshot builder, we only do that for committed,
1120 * catalog modifying, transactions, everything else isn't interesting for
Best Regards,
Takamichi Osumi
Hi Hackers,
Therefore, this leads to the failure for the assert that can check
the consistency that when one sub transaction modifies the catalog,
its top transaction should be marked so as well.I feel we need to remember the relationship between top transaction and sub
transaction
in the serialized snapshot even before changing catalog at decoding
RUNNING_XACT,
so that we can keep track of the association after the restart. What do you think ?
PSA patch that fixes the failure.
This adds pairs of sub-top transactions to the SnapBuild, and it will be serialized and restored.
The pair will be checked when we mark the ReorderBufferTXN as RBTXN_HAS_CATALOG_CHANGES.
Thanks to off-list discussion with Osumi-san.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
0001-mark-RBTXN_HAS_CATALOG_CHANGES-to-the-top-transactio.patchapplication/octet-stream; name=0001-mark-RBTXN_HAS_CATALOG_CHANGES-to-the-top-transactio.patchDownload
From fafdad65f8116f2646cfe51a9cf05408b38a2653 Mon Sep 17 00:00:00 2001
From: "kuroda.hayato%40jp.fujitsu.com" <kuroda.hayato@jp.fujitsu.com>
Date: Thu, 1 Sep 2022 08:20:31 +0000
Subject: [PATCH] mark RBTXN_HAS_CATALOG_CHANGES to the top transaction
correctly
When a subtransaction is recored to WAL, only the first record has the xid
of the top-level transaction. So when a previously serialized snapshot was
restored, the first WAL record of a sub transaction might be not read in which
case the top-sub relationship of transactions will be missed.
Normally it is not harmful, but if the subtransaction modifies the catalog,
the mark RBTXN_HAS_CATALOG_CHANGES will be not propagated to the top transaction.
This contradiction may cause the core dump.
To fix this problem, this change adds the pair lists of (sub)transaction IDs to
SnapBuild. Lists are filled when the snapshot is serialized, and it is used
to mark top-transaction as containing catalog changes.
---
.../expected/catalog_change_snapshot.out | 45 +++++
.../specs/catalog_change_snapshot.spec | 11 ++
src/backend/replication/logical/decode.c | 6 +-
.../replication/logical/reorderbuffer.c | 46 +++++
src/backend/replication/logical/snapbuild.c | 167 +++++++++++++++++-
src/include/replication/reorderbuffer.h | 9 +
src/include/replication/snapbuild.h | 2 +
7 files changed, 280 insertions(+), 6 deletions(-)
diff --git a/contrib/test_decoding/expected/catalog_change_snapshot.out b/contrib/test_decoding/expected/catalog_change_snapshot.out
index d2a4bdfcc1..e36f417e4a 100644
--- a/contrib/test_decoding/expected/catalog_change_snapshot.out
+++ b/contrib/test_decoding/expected/catalog_change_snapshot.out
@@ -96,3 +96,48 @@ COMMIT
stop
(1 row)
+
+starting permutation: s0_init s0_begin s0_savepoint s0_insert s1_checkpoint s1_get_changes s0_truncate s0_commit s0_begin s0_insert s1_checkpoint s1_get_changes s0_commit s1_get_changes
+step s0_init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+?column?
+--------
+init
+(1 row)
+
+step s0_begin: BEGIN;
+step s0_savepoint: SAVEPOINT sp1;
+step s0_insert: INSERT INTO tbl1 VALUES (1);
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+----
+(0 rows)
+
+step s0_truncate: TRUNCATE tbl1;
+step s0_commit: COMMIT;
+step s0_begin: BEGIN;
+step s0_insert: INSERT INTO tbl1 VALUES (1);
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+-------------------------------------------------------------
+BEGIN
+table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
+table public.tbl1: TRUNCATE: (no-flags)
+COMMIT
+(4 rows)
+
+step s0_commit: COMMIT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+-------------------------------------------------------------
+BEGIN
+table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
+COMMIT
+(3 rows)
+
+?column?
+--------
+stop
+(1 row)
+
diff --git a/contrib/test_decoding/specs/catalog_change_snapshot.spec b/contrib/test_decoding/specs/catalog_change_snapshot.spec
index ff8f68489b..e3f72cc3fe 100644
--- a/contrib/test_decoding/specs/catalog_change_snapshot.spec
+++ b/contrib/test_decoding/specs/catalog_change_snapshot.spec
@@ -57,3 +57,14 @@ permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate" "s1_checkpoint" "s
# checkpoint record it prunes one of the xacts in that list and when decoding the
# next checkpoint, it will completely prune that list.
permutation "s0_init" "s0_begin" "s0_truncate" "s2_begin" "s2_truncate" "s1_checkpoint" "s1_get_changes" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s2_commit" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
+
+# Test that we can restore the association between xid of the catalog modifying top
+# transaction and that of the subtransaction from the serialized snapshot. This is
+# necessary when we restart decoding from RUNNING_XACT without the wal to associate
+# subtransaction to its top transaction. The last decoding starts from the first
+# checkpoint and NEW_CID of "s0_truncate" doesn't mark the top transaction as catalog
+# modifying transaction. In this scenario, serialized snapshot restores the
+# relationship between transactions and thus the decoding works without a contradition
+# that one subtransaction changed catalog transaction while its top transaction
+# didn't.
+permutation "s0_init" "s0_begin" "s0_savepoint" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_truncate" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 1667d720b1..4e5602eaae 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -272,8 +272,8 @@ xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
buf->origptr,
invals->nmsgs,
invals->msgs);
- ReorderBufferXidSetCatalogChanges(ctx->reorder, xid,
- buf->origptr);
+ SnapBuildXidSetCatalogChanges(ctx->snapshot_builder, xid,
+ buf->origptr);
}
else if ((!ctx->fast_forward))
ReorderBufferImmediateInvalidation(ctx->reorder,
@@ -499,7 +499,7 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
break;
(void) SnapBuildProcessChange(builder, xid, buf->origptr);
- ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
+ SnapBuildXidSetCatalogChanges(ctx->snapshot_builder, xid, buf->origptr);
break;
case XLOG_HEAP_CONFIRM:
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 89cf9f9389..48b00beefa 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -293,6 +293,8 @@ static void ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
ReorderBufferChange *change,
bool addition, Size sz);
+static int cmp_subxids_cell(const ListCell *a, const ListCell *b);
+
/*
* Allocate a new ReorderBuffer and clean out any old serialized state from
* prior ReorderBuffer instances for the same slot.
@@ -5224,3 +5226,47 @@ restart:
*cmax = ent->cmax;
return true;
}
+
+static int
+cmp_subxids_cell(const ListCell *a, const ListCell *b)
+{
+ subxids_cell *suba = (subxids_cell *) lfirst(a);
+ subxids_cell *subb = (subxids_cell *) lfirst(b);
+
+ return xidComparator(&suba->subtxn, &subb->subtxn);
+}
+
+/*
+ * Lookup all ReorderBufferTXNs and list them that have toplevel_xid.
+ * subxids_cell is used as the cell of the list.
+ */
+List *
+ReorderBufferGetSubtxns(ReorderBuffer *rb)
+{
+ HASH_SEQ_STATUS hash_seq;
+ ReorderBufferTXNByIdEnt *entry;
+ List *result = NIL;
+
+ hash_seq_init(&hash_seq, rb->by_txn);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ subxids_cell *cell;
+
+ /*
+ * Skip if picked transaction seems top.
+ */
+ if (!TransactionIdIsValid(entry->txn->toplevel_xid))
+ continue;
+
+ cell = (subxids_cell *) palloc0(sizeof(subxids_cell));
+
+ cell->subtxn = entry->txn->xid;
+ cell->txn = entry->txn->toplevel_xid;
+
+ result = lappend(result, cell);
+ }
+
+ list_sort(result, cmp_subxids_cell);
+
+ return result;
+}
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index bf72ad45ec..7573f9c9d9 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -268,6 +268,30 @@ struct SnapBuild
/* This array must be sorted in xidComparator order */
TransactionId *xip;
} catchange;
+
+ /*
+ * Two arrays that have xids of (sub)transactions that were running
+ * when the snapshot was serialized.
+ *
+ * When a subtransaction is recored to WAL, only the first record has the xid
+ * of the top-level transaction. So if a previously serialized snapshot is
+ * restored, the first WAL record of a sub transaction may be not read in which
+ * case the top-sub relationship of transactions will be missed.
+ *
+ * Now to avoid the above problem, we serialize the pair of sub-top transactions.
+ *
+ * XXX: ReorderBufferGetSubtxns() returns data as List, but we record here as arrays.
+ * Is there a better way to record them?
+ */
+ struct
+ {
+ /* number of transactions */
+ size_t xcnt;
+
+ /* subxip must be sorted in xidComparator order */
+ TransactionId *subxip;
+ TransactionId *xip;
+ } subxids;
};
/*
@@ -812,7 +836,7 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
* we only log new_cid's if a catalog tuple was modified, so mark the
* transaction as containing catalog modifications
*/
- ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);
+ SnapBuildXidSetCatalogChanges(builder, xid, lsn);
ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
xlrec->target_locator, xlrec->target_tid,
@@ -1003,6 +1027,47 @@ SnapBuildPurgeOlderTxn(SnapBuild *builder)
builder->xmin, builder->xmax);
builder->catchange.xcnt = surviving_xids;
}
+
+ /*
+ * Purge xids in ->subxids as well. The purged array must also be sorted
+ * in xidComparator order.
+ */
+ if (builder->subxids.xcnt > 0)
+ {
+ /*
+ * Since subxids.xip is sorted, we find the lower bound of xids that
+ * are still interesting.
+ */
+ for (off = 0; off < builder->subxids.xcnt; off++)
+ {
+ if (TransactionIdFollowsOrEquals(builder->subxids.subxip[off],
+ builder->xmin))
+ break;
+ }
+
+ surviving_xids = builder->subxids.xcnt - off;
+
+ if (surviving_xids > 0)
+ {
+ memmove(builder->subxids.subxip, &(builder->subxids.subxip[off]),
+ surviving_xids * sizeof(TransactionId));
+ memmove(builder->subxids.xip, &(builder->subxids.xip[off]),
+ surviving_xids * sizeof(TransactionId));
+ }
+ else
+ {
+ pfree(builder->subxids.subxip);
+ builder->subxids.subxip = NULL;
+ pfree(builder->subxids.xip);
+ builder->subxids.xip = NULL;
+ }
+
+ elog(DEBUG3, "purged subtransactions from %u to %u, xmin: %u, xmax: %u",
+ (uint32) builder->catchange.xcnt, (uint32) surviving_xids,
+ builder->xmin, builder->xmax);
+
+ builder->subxids.xcnt = surviving_xids;
+ }
}
/*
@@ -1595,6 +1660,8 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
TransactionId *catchange_xip = NULL;
MemoryContext old_ctx;
size_t catchange_xcnt;
+ List *list = NIL;
+ size_t sub_xcnt;
char *ondisk_c;
int fd;
char tmppath[MAXPGPATH];
@@ -1686,8 +1753,12 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
catchange_xip = ReorderBufferGetCatalogChangesXacts(builder->reorder);
catchange_xcnt = builder->reorder->catchange_ntxns;
+ /* Get the list of subtransactions */
+ list = ReorderBufferGetSubtxns(builder->reorder);
+ sub_xcnt = list_length(list);
+
needed_length = sizeof(SnapBuildOnDisk) +
- sizeof(TransactionId) * (builder->committed.xcnt + catchange_xcnt);
+ sizeof(TransactionId) * (builder->committed.xcnt + catchange_xcnt + sub_xcnt * 2);
ondisk_c = palloc0(needed_length);
ondisk = (SnapBuildOnDisk *) ondisk_c;
@@ -1707,8 +1778,9 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
ondisk->builder.reorder = NULL;
ondisk->builder.committed.xip = NULL;
ondisk->builder.catchange.xip = NULL;
- /* update catchange only on disk data */
+ /* update catchange and subxids on disk data */
ondisk->builder.catchange.xcnt = catchange_xcnt;
+ ondisk->builder.subxids.xcnt = sub_xcnt;
COMP_CRC32C(ondisk->checksum,
&ondisk->builder,
@@ -1732,6 +1804,37 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
ondisk_c += sz;
}
+ /* copy sub-top transaction arrays */
+ if (sub_xcnt > 0)
+ {
+ TransactionId *subxips = (TransactionId *) palloc(sizeof(TransactionId) * sub_xcnt);
+ TransactionId *topxips = (TransactionId *) palloc(sizeof(TransactionId) * sub_xcnt);
+ ListCell *lc;
+ int i = 0;
+
+ sz = sizeof(TransactionId) * sub_xcnt;
+
+ foreach(lc, list)
+ {
+ subxids_cell *cell = (subxids_cell *) lfirst(lc);
+
+ subxips[i] = cell->subtxn;
+ topxips[i] = cell->txn;
+ i++;
+ }
+
+ memcpy(ondisk_c, subxips, sz);
+ COMP_CRC32C(ondisk->checksum, ondisk_c, sz);
+ ondisk_c += sz;
+
+ memcpy(ondisk_c, topxips, sz);
+ COMP_CRC32C(ondisk->checksum, ondisk_c, sz);
+ ondisk_c += sz;
+
+ pfree(subxips);
+ pfree(topxips);
+ }
+
FIN_CRC32C(ondisk->checksum);
/* we have valid data now, open tempfile and write it there */
@@ -1821,6 +1924,8 @@ out:
pfree(ondisk);
if (catchange_xip)
pfree(catchange_xip);
+ if (list)
+ list_free_deep(list);
}
/*
@@ -1906,6 +2011,19 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
COMP_CRC32C(checksum, ondisk.builder.catchange.xip, sz);
}
+ if (ondisk.builder.subxids.xcnt > 0)
+ {
+ sz = sizeof(TransactionId) * ondisk.builder.subxids.xcnt;
+
+ ondisk.builder.subxids.subxip = MemoryContextAllocZero(builder->context, sz);
+ SnapBuildRestoreContents(fd, (char *) ondisk.builder.subxids.subxip, sz, path);
+ COMP_CRC32C(checksum, ondisk.builder.subxids.subxip, sz);
+
+ ondisk.builder.subxids.xip = MemoryContextAllocZero(builder->context, sz);
+ SnapBuildRestoreContents(fd, (char *) ondisk.builder.subxids.xip, sz, path);
+ COMP_CRC32C(checksum, ondisk.builder.subxids.xip, sz);
+ }
+
if (CloseTransientFile(fd) != 0)
ereport(ERROR,
(errcode_for_file_access(),
@@ -1966,6 +2084,18 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
builder->catchange.xip = ondisk.builder.catchange.xip;
ondisk.builder.catchange.xip = NULL;
+ if (builder->subxids.subxip)
+ pfree(builder->subxids.subxip);
+ if(builder->subxids.xip)
+ pfree(builder->subxids.xip);
+
+ builder->subxids.xcnt = ondisk.builder.subxids.xcnt;
+ builder->subxids.subxip = ondisk.builder.subxids.subxip;
+ builder->subxids.xip = ondisk.builder.subxids.xip;
+
+ ondisk.builder.subxids.subxip = NULL;
+ ondisk.builder.subxids.xip = NULL;
+
/* our snapshot is not interesting anymore, build a new one */
if (builder->snapshot != NULL)
{
@@ -1989,6 +2119,10 @@ snapshot_not_interesting:
pfree(ondisk.builder.committed.xip);
if (ondisk.builder.catchange.xip != NULL)
pfree(ondisk.builder.catchange.xip);
+ if (ondisk.builder.subxids.subxip != NULL)
+ pfree(ondisk.builder.subxids.subxip);
+ if (ondisk.builder.subxids.xip != NULL)
+ pfree(ondisk.builder.subxids.xip);
return false;
}
@@ -2115,3 +2249,30 @@ CheckPointSnapBuild(void)
}
FreeDir(snap_dir);
}
+
+/*
+ * Check the reorder buffer and the snapshot to mark if the given transaction
+ * has a top transaction.
+ */
+void
+SnapBuildXidSetCatalogChanges(SnapBuild *builder, TransactionId xid, XLogRecPtr lsn)
+{
+ ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);
+
+ /* Check the subxids array */
+ if (builder->subxids.xcnt > 0)
+ {
+ TransactionId *found = bsearch(&xid, builder->subxids.subxip,
+ builder->subxids.xcnt,
+ sizeof(TransactionId), xidComparator);
+
+ /* Mark to the top-level transaction, if found */
+ if (found)
+ {
+ int index = (found - builder->subxids.subxip) / sizeof(TransactionId);
+
+ ReorderBufferXidSetCatalogChanges(builder->reorder,
+ builder->subxids.xip[index], lsn);
+ }
+ }
+}
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 8695901ba7..c3f4042fd8 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -628,6 +628,13 @@ struct ReorderBuffer
int64 totalBytes; /* total amount of data decoded */
};
+/*
+ * The cell structure used by ReorderBufferGetSubtxns()
+ */
+typedef struct subxids_cell {
+ TransactionId subtxn;
+ TransactionId txn;
+} subxids_cell;
extern ReorderBuffer *ReorderBufferAllocate(void);
extern void ReorderBufferFree(ReorderBuffer *);
@@ -694,4 +701,6 @@ extern void ReorderBufferSetRestartPoint(ReorderBuffer *, XLogRecPtr ptr);
extern void StartupReorderBuffer(void);
+extern List *ReorderBufferGetSubtxns(ReorderBuffer *rb);
+
#endif
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index e6adea24f2..3c1b89f9b4 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -91,4 +91,6 @@ extern void SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn,
struct xl_running_xacts *running);
extern void SnapBuildSerializationPoint(SnapBuild *builder, XLogRecPtr lsn);
+extern void SnapBuildXidSetCatalogChanges(SnapBuild *builder, TransactionId xid, XLogRecPtr lsn);
+
#endif /* SNAPBUILD_H */
--
2.27.0
Good catch, and thanks for the patch!
At Fri, 2 Sep 2022 01:08:04 +0000, "kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com> wrote in
PSA patch that fixes the failure.
This adds pairs of sub-top transactions to the SnapBuild, and it will be serialized and restored.
The pair will be checked when we mark the ReorderBufferTXN as RBTXN_HAS_CATALOG_CHANGES.
A commit record has all subtransaction ids and SnapBuildCommitTxn()
already checks if every one has catalog changes before checking the
top transaction's catalog changes. So, no need to record top-sub
transaction relationship to serialized snapshots. If any of the
subtransactions has catalog changes, the commit contains catalog
changes.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Fri, Sep 2, 2022 at 6:38 AM kuroda.hayato@fujitsu.com
<kuroda.hayato@fujitsu.com> wrote:
Hi Hackers,
Therefore, this leads to the failure for the assert that can check
the consistency that when one sub transaction modifies the catalog,
its top transaction should be marked so as well.I feel we need to remember the relationship between top transaction and sub
transaction
in the serialized snapshot even before changing catalog at decoding
RUNNING_XACT,
so that we can keep track of the association after the restart. What do you think ?PSA patch that fixes the failure.
This adds pairs of sub-top transactions to the SnapBuild, and it will be serialized and restored.
The pair will be checked when we mark the ReorderBufferTXN as RBTXN_HAS_CATALOG_CHANGES.
It seems that SnapBuildCommitTxn() is already taking care of adding
the top transaction to the committed transaction if any subtransaction
has the catalog changes, it has just missed setting the flag so I
think just setting the flag like this should be sufficient no?
diff --git a/src/backend/replication/logical/snapbuild.c
b/src/backend/replication/logical/snapbuild.c
index 1ff2c12..ee3f695 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1086,6 +1086,7 @@ SnapBuildCommitTxn(SnapBuild *builder,
XLogRecPtr lsn, TransactionId xid,
else if (sub_needs_timetravel)
{
/* track toplevel txn as well, subxact alone isn't meaningful */
+ needs_timetravel = true;
SnapBuildAddCommittedTxn(builder, xid);
}
else if (needs_timetravel)
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
At Fri, 2 Sep 2022 10:59:56 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
On Fri, Sep 2, 2022 at 6:38 AM kuroda.hayato@fujitsu.com
<kuroda.hayato@fujitsu.com> wrote:Hi Hackers,
Therefore, this leads to the failure for the assert that can check
the consistency that when one sub transaction modifies the catalog,
its top transaction should be marked so as well.I feel we need to remember the relationship between top transaction and sub
transaction
in the serialized snapshot even before changing catalog at decoding
RUNNING_XACT,
so that we can keep track of the association after the restart. What do you think ?PSA patch that fixes the failure.
This adds pairs of sub-top transactions to the SnapBuild, and it will be serialized and restored.
The pair will be checked when we mark the ReorderBufferTXN as RBTXN_HAS_CATALOG_CHANGES.It seems that SnapBuildCommitTxn() is already taking care of adding
the top transaction to the committed transaction if any subtransaction
has the catalog changes, it has just missed setting the flag so I
think just setting the flag like this should be sufficient no?
Oops! That's right.
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index 1ff2c12..ee3f695 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -1086,6 +1086,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid, else if (sub_needs_timetravel) { /* track toplevel txn as well, subxact alone isn't meaningful */ + needs_timetravel = true; SnapBuildAddCommittedTxn(builder, xid); } else if (needs_timetravel)
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Dear Horiguchi-san, Dilip,
Thank you for replying!
It seems that SnapBuildCommitTxn() is already taking care of adding
the top transaction to the committed transaction if any subtransaction
has the catalog changes, it has just missed setting the flag so I
think just setting the flag like this should be sufficient no?Oops! That's right.
Basically I agreed, but I was not sure the message "found top level transaction..."
should be output or not. It may be useful even if one of sub transactions contains the change.
How about following?
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index bf72ad45ec..a630522907 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1086,8 +1086,17 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
}
}
- /* if top-level modified catalog, it'll need a snapshot */
- if (SnapBuildXidHasCatalogChanges(builder, xid, xinfo))
+ /*
+ * if top-level or one of sub modified catalog, it'll need a snapshot.
+ *
+ * Normally the second check is not needed because the relation between
+ * top-sub transactions is tracked on the ReorderBuffer layer, and the top
+ * transaction is marked as containing catalog changes if its children are.
+ * But in some cases the relation may be missed, in which case only the sub
+ * transaction may be marked as containing catalog changes.
+ */
+ if (SnapBuildXidHasCatalogChanges(builder, xid, xinfo)
+ || sub_needs_timetravel)
{
elog(DEBUG2, "found top level transaction %u, with catalog changes",
xid);
@@ -1095,11 +1104,6 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
needs_timetravel = true;
SnapBuildAddCommittedTxn(builder, xid);
}
- else if (sub_needs_timetravel)
- {
- /* track toplevel txn as well, subxact alone isn't meaningful */
- SnapBuildAddCommittedTxn(builder, xid);
- }
else if (needs_timetravel)
{
elog(DEBUG2, "forced transaction %u to do timetravel", xid);
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
On Fri, Sep 2, 2022 at 11:25 AM kuroda.hayato@fujitsu.com
<kuroda.hayato@fujitsu.com> wrote:
Dear Horiguchi-san, Dilip,
Thank you for replying!
It seems that SnapBuildCommitTxn() is already taking care of adding
the top transaction to the committed transaction if any subtransaction
has the catalog changes, it has just missed setting the flag so I
think just setting the flag like this should be sufficient no?Oops! That's right.
Basically I agreed, but I was not sure the message "found top level transaction..."
should be output or not. It may be useful even if one of sub transactions contains the change.How about following?
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index bf72ad45ec..a630522907 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -1086,8 +1086,17 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid, } }- /* if top-level modified catalog, it'll need a snapshot */ - if (SnapBuildXidHasCatalogChanges(builder, xid, xinfo)) + /* + * if top-level or one of sub modified catalog, it'll need a snapshot. + * + * Normally the second check is not needed because the relation between + * top-sub transactions is tracked on the ReorderBuffer layer, and the top + * transaction is marked as containing catalog changes if its children are. + * But in some cases the relation may be missed, in which case only the sub + * transaction may be marked as containing catalog changes. + */ + if (SnapBuildXidHasCatalogChanges(builder, xid, xinfo) + || sub_needs_timetravel) { elog(DEBUG2, "found top level transaction %u, with catalog changes", xid); @@ -1095,11 +1104,6 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid, needs_timetravel = true; SnapBuildAddCommittedTxn(builder, xid); } - else if (sub_needs_timetravel) - { - /* track toplevel txn as well, subxact alone isn't meaningful */ - SnapBuildAddCommittedTxn(builder, xid); - } else if (needs_timetravel) { elog(DEBUG2, "forced transaction %u to do timetravel", xid);
Yeah, I am fine with this as well.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
At Fri, 2 Sep 2022 11:27:23 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
On Fri, Sep 2, 2022 at 11:25 AM kuroda.hayato@fujitsu.com
<kuroda.hayato@fujitsu.com> wrote:How about following?
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index bf72ad45ec..a630522907 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -1086,8 +1086,17 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid, } }- /* if top-level modified catalog, it'll need a snapshot */ - if (SnapBuildXidHasCatalogChanges(builder, xid, xinfo)) + /* + * if top-level or one of sub modified catalog, it'll need a snapshot. + * + * Normally the second check is not needed because the relation between + * top-sub transactions is tracked on the ReorderBuffer layer, and the top + * transaction is marked as containing catalog changes if its children are. + * But in some cases the relation may be missed, in which case only the sub + * transaction may be marked as containing catalog changes. + */ + if (SnapBuildXidHasCatalogChanges(builder, xid, xinfo) + || sub_needs_timetravel) { elog(DEBUG2, "found top level transaction %u, with catalog changes", xid); @@ -1095,11 +1104,6 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid, needs_timetravel = true; SnapBuildAddCommittedTxn(builder, xid); } - else if (sub_needs_timetravel) - { - /* track toplevel txn as well, subxact alone isn't meaningful */ - SnapBuildAddCommittedTxn(builder, xid); - } else if (needs_timetravel) { elog(DEBUG2, "forced transaction %u to do timetravel", xid);Yeah, I am fine with this as well.
I'm basically fine, too. But this is a bug that needs back-patching
back to 10. This change changes the condition for the DEBUG2 message.
So we need to add an awkward if() condition for the DEBUG2 message.
Looking that the messages have different debug-level, I doubt there
have been a chance they are useful. If we remove the two DEBUGx
messages, I'm fine with the change.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Fri, Sep 2, 2022 at 12:25 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Fri, 2 Sep 2022 11:27:23 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
On Fri, Sep 2, 2022 at 11:25 AM kuroda.hayato@fujitsu.com
<kuroda.hayato@fujitsu.com> wrote:How about following?
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index bf72ad45ec..a630522907 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -1086,8 +1086,17 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid, } }- /* if top-level modified catalog, it'll need a snapshot */ - if (SnapBuildXidHasCatalogChanges(builder, xid, xinfo)) + /* + * if top-level or one of sub modified catalog, it'll need a snapshot. + * + * Normally the second check is not needed because the relation between + * top-sub transactions is tracked on the ReorderBuffer layer, and the top + * transaction is marked as containing catalog changes if its children are. + * But in some cases the relation may be missed, in which case only the sub + * transaction may be marked as containing catalog changes. + */ + if (SnapBuildXidHasCatalogChanges(builder, xid, xinfo) + || sub_needs_timetravel) { elog(DEBUG2, "found top level transaction %u, with catalog changes", xid); @@ -1095,11 +1104,6 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid, needs_timetravel = true; SnapBuildAddCommittedTxn(builder, xid); } - else if (sub_needs_timetravel) - { - /* track toplevel txn as well, subxact alone isn't meaningful */ - SnapBuildAddCommittedTxn(builder, xid); - } else if (needs_timetravel) { elog(DEBUG2, "forced transaction %u to do timetravel", xid);Yeah, I am fine with this as well.
I'm basically fine, too. But this is a bug that needs back-patching
back to 10.
I have not verified but I think we need to backpatch this till 14
because prior to that in DecodeCommit, we use to set the top-level txn
as having catalog changes based on the if there are invalidation
messages in the commit record. So, in the current scenario shared by
Osumi-San, before SnapBuildCommitTxn(), the top-level txn will be
marked as having catalog changes.
This change changes the condition for the DEBUG2 message.
So we need to add an awkward if() condition for the DEBUG2 message.
Looking that the messages have different debug-level, I doubt there
have been a chance they are useful. If we remove the two DEBUGx
messages, I'm fine with the change.
I think these DEBUG2 messages could be useful, so instead of removing
these, I suggest we should follow Dilip's proposed fix and maybe add a
new DEBUG2 message on the lines of (("forced transaction %u to do
timetravel due to one of its subtransaction", xid) in the else if
(sub_needs_timetravel) condition if we think that will be useful too
but I am fine leaving the addition of new DEBUG2 message.
--
With Regards,
Amit Kapila.
I'm basically fine, too. But this is a bug that needs back-patching
back to 10.I have not verified but I think we need to backpatch this till 14
because prior to that in DecodeCommit, we use to set the top-level txn
as having catalog changes based on the if there are invalidation
messages in the commit record. So, in the current scenario shared by
Osumi-San, before SnapBuildCommitTxn(), the top-level txn will be
marked as having catalog changes.
I and Osumi-san are now investigating that, so please wait further reports and patches.
This change changes the condition for the DEBUG2 message.
So we need to add an awkward if() condition for the DEBUG2 message.
Looking that the messages have different debug-level, I doubt there
have been a chance they are useful. If we remove the two DEBUGx
messages, I'm fine with the change.I think these DEBUG2 messages could be useful, so instead of removing
these, I suggest we should follow Dilip's proposed fix and maybe add a
new DEBUG2 message on the lines of (("forced transaction %u to do
timetravel due to one of its subtransaction", xid) in the else if
(sub_needs_timetravel) condition if we think that will be useful too
but I am fine leaving the addition of new DEBUG2 message.
I agreed both that DEBUG2 messages are still useful but we should not
change the condition for output. So I prefer the idea suggested by Amit.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
On Fri, Sep 2, 2022 at 6:26 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
I've met an assertion failure of logical decoding with below scenario on HEAD.
---
<preparation>
create table tab1 (val integer);
select 'init' from pg_create_logical_replication_slot('regression_slot', 'test_decoding');<session1>
begin;
savepoint sp1;
insert into tab1 values (1);<session2>
checkpoint; -- for RUNNING_XACT
select data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');<session1>
truncate tab1; -- for NEW_CID
commit;
begin;
insert into tab1 values (3);
BTW, if I just change the truncate statement to "Analyze tab1" in your
entire test then I am getting a different assertion failure:
postgres.exe!ExceptionalCondition(const char * conditionName, const
char * errorType, const char * fileName, int lineNumber) Line 70 C
postgres.exe!AssertTXNLsnOrder(ReorderBuffer * rb) Line 902 C
postgres.exe!ReorderBufferTXNByXid(ReorderBuffer * rb, unsigned int
xid, bool create, bool * is_new, unsigned __int64 lsn, bool
create_as_top) Line 681 C
postgres.exe!ReorderBufferAddNewTupleCids(ReorderBuffer * rb, unsigned
int xid, unsigned __int64 lsn, RelFileLocator locator, ItemPointerData
tid, unsigned int cmin, unsigned int cmax, unsigned int combocid) Line
3188 C
postgres.exe!SnapBuildProcessNewCid(SnapBuild * builder, unsigned int
xid, unsigned __int64 lsn, xl_heap_new_cid * xlrec) Line 823 C
postgres.exe!heap2_decode(LogicalDecodingContext * ctx,
XLogRecordBuffer * buf) Line 408 C
postgres.exe!LogicalDecodingProcessRecord(LogicalDecodingContext *
ctx, XLogReaderState * record) Line 119 C
postgres.exe!pg_logical_slot_get_changes_guts(FunctionCallInfoBaseData
* fcinfo, bool confirm, bool binary) Line 274 C
postgres.exe!pg_logical_slot_get_changes(FunctionCallInfoBaseData *
fcinfo) Line 339 C
This is matching with call stack we see intermittently in the BF
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-08-20%2002%3A45%3A34
statement generates an additional WAL XLOG_STANDBY_LOCK prior to
XLOG_HEAP2_NEW_CID.
I think we can fix this in the below ways:
a. Assert(prev_first_lsn <= cur_txn->first_lsn); -- Explain in
comments that it is possible when subtransaction and transaction are
not previously logged as it happened in this scenario
b. track txn of prev_first_lsn (say as prev_txn) and check if
prev_txn's toptxn is the same as cur_txn or cur_txn's toptxn is the
same as the prev_txn then perform assert mentioned in (a) else, keep
the current Assert.
It seems (b) will be more robust.
Thoughts?
Note: I added Sawada-San as sometime back we had an offlist discussion
on this intermittent BF failure but we were not able to reach the
exact test which can show this failure.
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-08-20%2002%3A45%3A34
[2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09
--
With Regards,
Amit Kapila.
Dear hackers,
I agreed both that DEBUG2 messages are still useful but we should not
change the condition for output. So I prefer the idea suggested by Amit.
PSA newer patch, which contains the fix and test.
I have not verified but I think we need to backpatch this till 14
because prior to that in DecodeCommit, we use to set the top-level txn
as having catalog changes based on the if there are invalidation
messages in the commit record. So, in the current scenario shared by
Osumi-San, before SnapBuildCommitTxn(), the top-level txn will be
marked as having catalog changes.I and Osumi-san are now investigating that, so please wait further reports and
patches.
We investigated it about older versions, and in some versions *another stack-trace* has been found.
About PG10-13, indeed, the failure was not occurred.
In these versions transactions are regarded as
that have catalog changes when the commit record has XACT_XINFO_HAS_INVALS flag.
This flag will be set if the transaction has invalidation messages.
When sub transaction changes system catalogs and user commits,
all invalidation messages allocated in sub transaction will be transferred to top transaction.
Therefore both transactions will be marked as containing catalog changes.
About PG14 and 15, however, another stack-trace has been found.
While executing the same workload, we got followings at the same SQL statement;
```
(gdb) backtrace
#0 0x00007fa78c6dc387 in raise () from /lib64/libc.so.6
#1 0x00007fa78c6dda78 in abort () from /lib64/libc.so.6
#2 0x0000000000b16680 in ExceptionalCondition (conditionName=0xcd3ab0 "txn->ninvalidations == 0", errorType=0xcd3284 "FailedAssertion",
fileName=0xcd32d0 "reorderbuffer.c", lineNumber=2936) at assert.c:69
#3 0x00000000008e9e70 in ReorderBufferForget (rb=0x12b5b10, xid=735, lsn=24125384) at reorderbuffer.c:2936
#4 0x00000000008d9493 in DecodeCommit (ctx=0x12a2d20, buf=0x7ffe08b236b0, parsed=0x7ffe08b23510, xid=734, two_phase=false) at decode.c:733
#5 0x00000000008d8962 in DecodeXactOp (ctx=0x12a2d20, buf=0x7ffe08b236b0) at decode.c:279
#6 0x00000000008d85e2 in LogicalDecodingProcessRecord (ctx=0x12a2d20, record=0x12a30e0) at decode.c:142
#7 0x00000000008dfef2 in pg_logical_slot_get_changes_guts (fcinfo=0x129acb0, confirm=true, binary=false) at logicalfuncs.c:296
#8 0x00000000008e002f in pg_logical_slot_get_changes (fcinfo=0x129acb0) at logicalfuncs.c:365
...
(gdb) frame 4
#4 0x00000000008d9493 in DecodeCommit (ctx=0x14cfd20, buf=0x7ffc638b0ca0, parsed=0x7ffc638b0b00, xid=734, two_phase=false) at decode.c:733
733 ReorderBufferForget(ctx->reorder, parsed->subxacts[i], buf->origptr);
(gdb) list
728 */
729 if (DecodeTXNNeedSkip(ctx, buf, parsed->dbId, origin_id))
730 {
731 for (i = 0; i < parsed->nsubxacts; i++)
732 {
733 ReorderBufferForget(ctx->reorder, parsed->subxacts[i], buf->origptr);
734 }
735 ReorderBufferForget(ctx->reorder, xid, buf->origptr);
736
737 return;
(gdb) frame 3
#3 0x00000000008e9e70 in ReorderBufferForget (rb=0x14e2b10, xid=735, lsn=24125152) at reorderbuffer.c:2936
2936 Assert(txn->ninvalidations == 0);
(gdb) list
2931 */
2932 if (txn->base_snapshot != NULL && txn->ninvalidations > 0)
2933 ReorderBufferImmediateInvalidation(rb, txn->ninvalidations,
2934 txn->invalidations);
2935 else
2936 Assert(txn->ninvalidations == 0);
2937
2938 /* remove potential on-disk data, and deallocate */
2939 ReorderBufferCleanupTXN(rb, txn);
2940 }
(gdb) print *txn
$1 = {txn_flags = 3, xid = 735, toplevel_xid = 734, gid = 0x0, first_lsn = 24113488, final_lsn = 24125152, end_lsn = 0, toptxn = 0x14ecb98,
restart_decoding_lsn = 24113304, origin_id = 0, origin_lsn = 0, commit_time = 0, base_snapshot = 0x0, base_snapshot_lsn = 0,
base_snapshot_node = {prev = 0x14ecc00, next = 0x14e2b28}, snapshot_now = 0x0, command_id = 4294967295, nentries = 5, nentries_mem = 5,
changes = {head = {prev = 0x14eecf8, next = 0x14eeb18}}, tuplecids = {head = {prev = 0x14ecb10, next = 0x14ecb10}}, ntuplecids = 0,
tuplecid_hash = 0x0, toast_hash = 0x0, subtxns = {head = {prev = 0x14ecb38, next = 0x14ecb38}}, nsubtxns = 0, ninvalidations = 3,
invalidations = 0x14e2d28, node = {prev = 0x14ecc68, next = 0x14ecc68}, size = 452, total_size = 452, concurrent_abort = false,
output_plugin_private = 0x0}
```
In these versions DecodeCommit() said OK. However, we have met another failure
because the ReorderBufferTXN of the sub transaction had invalidation messages but it did not have base_snapshot.
I thought that this failure was occurred the only the base_snapshot of the sub transaction is transferred via ReorderBufferTransferSnapToParent()
when a transaction is assigned as child, but its invalidation messages are not.
I was not sure what's the proper way to fix it.
The solution I've thought at first was transporting all invalidations from sub to top like ReorderBufferTransferSnapToParent(),
but I do not know its side effect. Moreover, how do we deal with ReorderBufferChange?
Should we transfer them too? If so, how about the ordering of changes?
Alternative solustion was just remove the assertion, but was it OK?
How do you think? I want to hear your comments and suggestions...
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
HEAD-0001-Fix-assertion-failure-during-logical-decoding.patchapplication/octet-stream; name=HEAD-0001-Fix-assertion-failure-during-logical-decoding.patchDownload
From 8792a530e48196247c52acd4e9c0dc74471ceae5 Mon Sep 17 00:00:00 2001
From: "kuroda.hayato%40jp.fujitsu.com" <kuroda.hayato@jp.fujitsu.com>
Date: Tue, 6 Sep 2022 05:08:33 +0000
Subject: [PATCH] Fix assertion failure during logical decoding.
In SnapBuildCommitTxn(), we assumed that the top transaction needs timetravel
too if one of sub transactions need that.
Normally this assumption was correct, but in some cases the necessity of
timetravel was inconsistent between top and sub transactions. This was caused
because the relation between transactions might not be recorded to the
reorder buffer when CHECKPOINT was occurred.
To fix this problem, this forces the timetravel for the top transaction if one
of sub transactions needs that.
Thanks to off-list discussion with Takamichi Osumi.
---
.../expected/catalog_change_snapshot.out | 45 +++++++++++++++++++
.../specs/catalog_change_snapshot.spec | 10 +++++
src/backend/replication/logical/snapbuild.c | 4 +-
3 files changed, 58 insertions(+), 1 deletion(-)
diff --git a/contrib/test_decoding/expected/catalog_change_snapshot.out b/contrib/test_decoding/expected/catalog_change_snapshot.out
index d2a4bdfcc1..e36f417e4a 100644
--- a/contrib/test_decoding/expected/catalog_change_snapshot.out
+++ b/contrib/test_decoding/expected/catalog_change_snapshot.out
@@ -96,3 +96,48 @@ COMMIT
stop
(1 row)
+
+starting permutation: s0_init s0_begin s0_savepoint s0_insert s1_checkpoint s1_get_changes s0_truncate s0_commit s0_begin s0_insert s1_checkpoint s1_get_changes s0_commit s1_get_changes
+step s0_init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+?column?
+--------
+init
+(1 row)
+
+step s0_begin: BEGIN;
+step s0_savepoint: SAVEPOINT sp1;
+step s0_insert: INSERT INTO tbl1 VALUES (1);
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+----
+(0 rows)
+
+step s0_truncate: TRUNCATE tbl1;
+step s0_commit: COMMIT;
+step s0_begin: BEGIN;
+step s0_insert: INSERT INTO tbl1 VALUES (1);
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+-------------------------------------------------------------
+BEGIN
+table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
+table public.tbl1: TRUNCATE: (no-flags)
+COMMIT
+(4 rows)
+
+step s0_commit: COMMIT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+-------------------------------------------------------------
+BEGIN
+table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
+COMMIT
+(3 rows)
+
+?column?
+--------
+stop
+(1 row)
+
diff --git a/contrib/test_decoding/specs/catalog_change_snapshot.spec b/contrib/test_decoding/specs/catalog_change_snapshot.spec
index ff8f68489b..c0eb50698f 100644
--- a/contrib/test_decoding/specs/catalog_change_snapshot.spec
+++ b/contrib/test_decoding/specs/catalog_change_snapshot.spec
@@ -57,3 +57,13 @@ permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate" "s1_checkpoint" "s
# checkpoint record it prunes one of the xacts in that list and when decoding the
# next checkpoint, it will completely prune that list.
permutation "s0_init" "s0_begin" "s0_truncate" "s2_begin" "s2_truncate" "s1_checkpoint" "s1_get_changes" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s2_commit" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
+
+# Test that we can force the top transaction to do timetravel when one of sub
+# transactions needs that. This is necessary when we restart decoding from RUNNING_XACT
+# without the wal to associate subtransaction to its top transaction. The last decoding
+# starts from the first checkpoint and NEW_CID of "s0_truncate" doesn't mark the top
+# transaction as catalog modifying transaction. In this scenario, the enforcement sets
+# needs_timetravel to true even if the top transaction is regarded as that it does not
+# have catalog changes and thus the decoding works without a contradition that one
+# subtransaction needed timetravel while its top transaction didn't.
+permutation "s0_init" "s0_begin" "s0_savepoint" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_truncate" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 1d8ebb4c0d..9c18d4c4b4 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1098,7 +1098,9 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
}
else if (sub_needs_timetravel)
{
- /* track toplevel txn as well, subxact alone isn't meaningful */
+ elog(DEBUG2, "forced transaction %u to do timetravel due to one of its subtransaction",
+ xid);
+ needs_timetravel = true;
SnapBuildAddCommittedTxn(builder, xid);
}
else if (needs_timetravel)
--
2.27.0
I was not sure what's the proper way to fix it.
The solution I've thought at first was transporting all invalidations from sub to top
like ReorderBufferTransferSnapToParent(),
but I do not know its side effect. Moreover, how do we deal with
ReorderBufferChange?
Should we transfer them too? If so, how about the ordering of changes?
Alternative solustion was just remove the assertion, but was it OK?
PSA the PoC patch for discussion. In this patch only invalidation messages are transported,
changes hold by subtxn are ignored. This can be passed the reported workload.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
REL14-0001-PoC-Fix-assertion-failure-during-logical-decoding.patchapplication/octet-stream; name=REL14-0001-PoC-Fix-assertion-failure-during-logical-decoding.patchDownload
From 24be9c26f50a02338ea08afb9978364919a6336d Mon Sep 17 00:00:00 2001
From: "kuroda.hayato%40jp.fujitsu.com" <kuroda.hayato@jp.fujitsu.com>
Date: Tue, 6 Sep 2022 09:11:25 +0000
Subject: [PATCH] (PoC) Fix assertion failure during logical decoding.
---
.../expected/catalog_change_snapshot.out | 45 +++++++++++++++++++
.../specs/catalog_change_snapshot.spec | 2 +
.../replication/logical/reorderbuffer.c | 41 +++++++++++++++++
3 files changed, 88 insertions(+)
diff --git a/contrib/test_decoding/expected/catalog_change_snapshot.out b/contrib/test_decoding/expected/catalog_change_snapshot.out
index dc4f9b7018..246cda8099 100644
--- a/contrib/test_decoding/expected/catalog_change_snapshot.out
+++ b/contrib/test_decoding/expected/catalog_change_snapshot.out
@@ -42,3 +42,48 @@ COMMIT
stop
(1 row)
+
+starting permutation: s0_init s0_begin s0_savepoint s0_insert s1_checkpoint s1_get_changes s0_truncate s0_commit s0_begin s0_insert s1_checkpoint s1_get_changes s0_commit s1_get_changes
+step s0_init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+?column?
+--------
+init
+(1 row)
+
+step s0_begin: BEGIN;
+step s0_savepoint: SAVEPOINT sp1;
+step s0_insert: INSERT INTO tbl1 VALUES (1);
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+----
+(0 rows)
+
+step s0_truncate: TRUNCATE tbl1;
+step s0_commit: COMMIT;
+step s0_begin: BEGIN;
+step s0_insert: INSERT INTO tbl1 VALUES (1);
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+-------------------------------------------------------------
+BEGIN
+table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
+table public.tbl1: TRUNCATE: (no-flags)
+COMMIT
+(4 rows)
+
+step s0_commit: COMMIT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+-------------------------------------------------------------
+BEGIN
+table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
+COMMIT
+(3 rows)
+
+?column?
+--------
+stop
+(1 row)
+
diff --git a/contrib/test_decoding/specs/catalog_change_snapshot.spec b/contrib/test_decoding/specs/catalog_change_snapshot.spec
index 2971ddc69c..fed823c836 100644
--- a/contrib/test_decoding/specs/catalog_change_snapshot.spec
+++ b/contrib/test_decoding/specs/catalog_change_snapshot.spec
@@ -37,3 +37,5 @@ step "s1_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_
# record written by bgwriter. One might think we can either stop the bgwriter or
# increase LOG_SNAPSHOT_INTERVAL_MS but it's not practical via tests.
permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate" "s1_checkpoint" "s1_get_changes" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
+
+permutation "s0_init" "s0_begin" "s0_savepoint" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_truncate" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 3194e41851..cbe9cc74bb 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -220,6 +220,8 @@ static ReorderBufferTXN *ReorderBufferTXNByXid(ReorderBuffer *rb,
XLogRecPtr lsn, bool create_as_top);
static void ReorderBufferTransferSnapToParent(ReorderBufferTXN *txn,
ReorderBufferTXN *subtxn);
+static void ReorderBufferTransferInvalToParent(ReorderBufferTXN *txn,
+ ReorderBufferTXN *subtxn);
static void AssertTXNLsnOrder(ReorderBuffer *rb);
@@ -1114,6 +1116,9 @@ ReorderBufferTransferSnapToParent(ReorderBufferTXN *txn,
subtxn->base_snapshot = NULL;
subtxn->base_snapshot_lsn = InvalidXLogRecPtr;
dlist_delete(&subtxn->base_snapshot_node);
+
+ /* Invalidaitons must be also transported */
+ ReorderBufferTransferInvalToParent(txn, subtxn);
}
else
{
@@ -1126,6 +1131,42 @@ ReorderBufferTransferSnapToParent(ReorderBufferTXN *txn,
}
}
+/*
+ * ReorderBufferTransferInvalToParent
+ * Transfer invalidation messages from subtxn to top-level txn
+ *
+ * XXX: Do we have to transport ReorderBufferChange at the same time?
+ */
+static void
+ReorderBufferTransferInvalToParent(ReorderBufferTXN *txn,
+ ReorderBufferTXN *subtxn)
+{
+ Assert(subtxn->toplevel_xid == txn->xid);
+
+ if (subtxn->ninvalidations > 0)
+ {
+ if (txn->ninvalidations == 0)
+ {
+ txn->invalidations = (SharedInvalidationMessage *)
+ palloc(sizeof(SharedInvalidationMessage) * subtxn->ninvalidations);
+ memcpy(txn->invalidations, subtxn->invalidations,
+ sizeof(SharedInvalidationMessage) * subtxn->ninvalidations);
+ txn->ninvalidations = subtxn->ninvalidations;
+ }
+ else
+ {
+ txn->invalidations = (SharedInvalidationMessage *)
+ repalloc(txn->invalidations, sizeof(SharedInvalidationMessage) *
+ (txn->ninvalidations + subtxn->ninvalidations));
+ memcpy(txn->invalidations + txn->ninvalidations, subtxn->invalidations,
+ subtxn->ninvalidations * sizeof(SharedInvalidationMessage));
+ txn->ninvalidations += subtxn->ninvalidations;
+ }
+ subtxn->invalidations = NULL;
+ subtxn->ninvalidations = 0;
+ }
+}
+
/*
* Associate a subtransaction with its toplevel transaction at commit
* time. There may be no further changes added after this.
--
2.27.0
On Tue, Sep 6, 2022 at 1:17 PM kuroda.hayato@fujitsu.com
<kuroda.hayato@fujitsu.com> wrote:
Dear hackers,
I agreed both that DEBUG2 messages are still useful but we should not
change the condition for output. So I prefer the idea suggested by Amit.PSA newer patch, which contains the fix and test.
I have not verified but I think we need to backpatch this till 14
because prior to that in DecodeCommit, we use to set the top-level txn
as having catalog changes based on the if there are invalidation
messages in the commit record. So, in the current scenario shared by
Osumi-San, before SnapBuildCommitTxn(), the top-level txn will be
marked as having catalog changes.I and Osumi-san are now investigating that, so please wait further reports and
patches.We investigated it about older versions, and in some versions *another stack-trace* has been found.
About PG10-13, indeed, the failure was not occurred.
In these versions transactions are regarded as
that have catalog changes when the commit record has XACT_XINFO_HAS_INVALS flag.
This flag will be set if the transaction has invalidation messages.When sub transaction changes system catalogs and user commits,
all invalidation messages allocated in sub transaction will be transferred to top transaction.
Therefore both transactions will be marked as containing catalog changes.About PG14 and 15, however, another stack-trace has been found.
While executing the same workload, we got followings at the same SQL statement;
Did you get this new assertion failure after you applied the patch for
the first failure? Because otherwise, how can you reach it with the
same test case?
About patch:
else if (sub_needs_timetravel)
{
- /* track toplevel txn as well, subxact alone isn't meaningful */
+ elog(DEBUG2, "forced transaction %u to do timetravel due to one of
its subtransaction",
+ xid);
+ needs_timetravel = true;
SnapBuildAddCommittedTxn(builder, xid);
Why did you remove the above comment? I think it still makes sense to retain it.
--
With Regards,
Amit Kapila.
Dear Amit,
Thanks for giving comments!
Did you get this new assertion failure after you applied the patch for
the first failure? Because otherwise, how can you reach it with the
same test case?
The first failure is occurred only in the HEAD, so I did not applied the first patch
to REL14 and REL15.
This difference is caused because the commit [Fix catalog lookup...] in REL15(272248a) and older is different
from the HEAD one.
In order versions SnapBuildXidSetCatalogChanges() has been added. In the function
a transaction will be marked as containing catalog changes if the transaction is in InitialRunningXacts,
and after that the relation between sub-top transactions is assigned based on the parsed->subxact.
The marking avoids the first failure, but the assignment triggers new failure.
About patch: else if (sub_needs_timetravel) { - /* track toplevel txn as well, subxact alone isn't meaningful */ + elog(DEBUG2, "forced transaction %u to do timetravel due to one of its subtransaction", + xid); + needs_timetravel = true; SnapBuildAddCommittedTxn(builder, xid);Why did you remove the above comment? I think it still makes sense to retain it.
Fixed.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
HEAD-v2-0001-Fix-assertion-failure-during-logical-decoding.patchapplication/octet-stream; name=HEAD-v2-0001-Fix-assertion-failure-during-logical-decoding.patchDownload
From cd1edd5d65c35d65f038a814c99d74514373a578 Mon Sep 17 00:00:00 2001
From: "kuroda.hayato%40jp.fujitsu.com" <kuroda.hayato@jp.fujitsu.com>
Date: Tue, 6 Sep 2022 05:08:33 +0000
Subject: [PATCH v2] Fix assertion failure during logical decoding.
In SnapBuildCommitTxn(), we assumed that the top transaction needs timetravel
too if one of sub transactions need that.
Normally this assumption was correct, but in some cases the necessity of
timetravel was inconsistent between top and sub transactions. This was caused
because the relation between transactions might not be recorded to the
reorder buffer when CHECKPOINT was occurred.
To fix this problem, this forces the timetravel for the top transaction if one
of sub transactions needs that.
Thanks to off-list discussion with Takamichi Osumi.
---
.../expected/catalog_change_snapshot.out | 45 +++++++++++++++++++
.../specs/catalog_change_snapshot.spec | 10 +++++
src/backend/replication/logical/snapbuild.c | 3 ++
3 files changed, 58 insertions(+)
diff --git a/contrib/test_decoding/expected/catalog_change_snapshot.out b/contrib/test_decoding/expected/catalog_change_snapshot.out
index d2a4bdfcc1..e36f417e4a 100644
--- a/contrib/test_decoding/expected/catalog_change_snapshot.out
+++ b/contrib/test_decoding/expected/catalog_change_snapshot.out
@@ -96,3 +96,48 @@ COMMIT
stop
(1 row)
+
+starting permutation: s0_init s0_begin s0_savepoint s0_insert s1_checkpoint s1_get_changes s0_truncate s0_commit s0_begin s0_insert s1_checkpoint s1_get_changes s0_commit s1_get_changes
+step s0_init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+?column?
+--------
+init
+(1 row)
+
+step s0_begin: BEGIN;
+step s0_savepoint: SAVEPOINT sp1;
+step s0_insert: INSERT INTO tbl1 VALUES (1);
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+----
+(0 rows)
+
+step s0_truncate: TRUNCATE tbl1;
+step s0_commit: COMMIT;
+step s0_begin: BEGIN;
+step s0_insert: INSERT INTO tbl1 VALUES (1);
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+-------------------------------------------------------------
+BEGIN
+table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
+table public.tbl1: TRUNCATE: (no-flags)
+COMMIT
+(4 rows)
+
+step s0_commit: COMMIT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+-------------------------------------------------------------
+BEGIN
+table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
+COMMIT
+(3 rows)
+
+?column?
+--------
+stop
+(1 row)
+
diff --git a/contrib/test_decoding/specs/catalog_change_snapshot.spec b/contrib/test_decoding/specs/catalog_change_snapshot.spec
index ff8f68489b..c0eb50698f 100644
--- a/contrib/test_decoding/specs/catalog_change_snapshot.spec
+++ b/contrib/test_decoding/specs/catalog_change_snapshot.spec
@@ -57,3 +57,13 @@ permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate" "s1_checkpoint" "s
# checkpoint record it prunes one of the xacts in that list and when decoding the
# next checkpoint, it will completely prune that list.
permutation "s0_init" "s0_begin" "s0_truncate" "s2_begin" "s2_truncate" "s1_checkpoint" "s1_get_changes" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s2_commit" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
+
+# Test that we can force the top transaction to do timetravel when one of sub
+# transactions needs that. This is necessary when we restart decoding from RUNNING_XACT
+# without the wal to associate subtransaction to its top transaction. The last decoding
+# starts from the first checkpoint and NEW_CID of "s0_truncate" doesn't mark the top
+# transaction as catalog modifying transaction. In this scenario, the enforcement sets
+# needs_timetravel to true even if the top transaction is regarded as that it does not
+# have catalog changes and thus the decoding works without a contradition that one
+# subtransaction needed timetravel while its top transaction didn't.
+permutation "s0_init" "s0_begin" "s0_savepoint" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_truncate" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 1d8ebb4c0d..8247baee8b 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1099,6 +1099,9 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
else if (sub_needs_timetravel)
{
/* track toplevel txn as well, subxact alone isn't meaningful */
+ elog(DEBUG2, "forced transaction %u to do timetravel due to one of its subtransaction",
+ xid);
+ needs_timetravel = true;
SnapBuildAddCommittedTxn(builder, xid);
}
else if (needs_timetravel)
--
2.27.0
On Wed, Sep 7, 2022 at 11:06 AM kuroda.hayato@fujitsu.com
<kuroda.hayato@fujitsu.com> wrote:
Dear Amit,
Thanks for giving comments!
Did you get this new assertion failure after you applied the patch for
the first failure? Because otherwise, how can you reach it with the
same test case?The first failure is occurred only in the HEAD, so I did not applied the first patch
to REL14 and REL15.
This difference is caused because the commit [Fix catalog lookup...] in REL15(272248a) and older is different
from the HEAD one.
In order versions SnapBuildXidSetCatalogChanges() has been added. In the function
a transaction will be marked as containing catalog changes if the transaction is in InitialRunningXacts,
and after that the relation between sub-top transactions is assigned based on the parsed->subxact.
The marking avoids the first failure, but the assignment triggers new failure.About patch: else if (sub_needs_timetravel) { - /* track toplevel txn as well, subxact alone isn't meaningful */ + elog(DEBUG2, "forced transaction %u to do timetravel due to one of its subtransaction", + xid); + needs_timetravel = true; SnapBuildAddCommittedTxn(builder, xid);Why did you remove the above comment? I think it still makes sense to retain it.
Fixed.
Here are some review comments for v2 patch:
+# Test that we can force the top transaction to do timetravel when one of sub
+# transactions needs that. This is necessary when we restart decoding
from RUNNING_XACT
+# without the wal to associate subtransaction to its top transaction.
I don't think the second sentence is necessary.
---
The last decoding
+# starts from the first checkpoint and NEW_CID of "s0_truncate"
doesn't mark the top
+# transaction as catalog modifying transaction. In this scenario, the
enforcement sets
+# needs_timetravel to true even if the top transaction is regarded as
that it does not
+# have catalog changes and thus the decoding works without a
contradition that one
+# subtransaction needed timetravel while its top transaction didn't.
I don't understand the last sentence, probably it's a long sentence.
How about the following description?
# Test that we can handle the case where only subtransaction is marked
as containing
# catalog changes. The last decoding starts from NEW_CID generated by
"s0_truncate" and
# marks only the subtransaction as containing catalog changes but we
don't create the
# association between top-level transaction and subtransaction yet.
When decoding the
# commit record of the top-level transaction, we must force the
top-level transaction
# to do timetravel since one of its subtransactions is marked as
containing catalog changes.
---
+ elog(DEBUG2, "forced transaction %u to do timetravel due to one of
its subtransaction",
+ xid);
+ needs_timetravel = true;
I think "one of its subtransaction" should be "one of its subtransactions".
Regards,
--
Masahiko Sawada
Dear Sawada-san,
Thank you for reviewing HEAD patch! PSA v3 patch.
+# Test that we can force the top transaction to do timetravel when one of sub +# transactions needs that. This is necessary when we restart decoding from RUNNING_XACT +# without the wal to associate subtransaction to its top transaction.I don't think the second sentence is necessary.
--- The last decoding +# starts from the first checkpoint and NEW_CID of "s0_truncate" doesn't mark the top +# transaction as catalog modifying transaction. In this scenario, the enforcement sets +# needs_timetravel to true even if the top transaction is regarded as that it does not +# have catalog changes and thus the decoding works without a contradition that one +# subtransaction needed timetravel while its top transaction didn't.I don't understand the last sentence, probably it's a long sentence.
How about the following description?
# Test that we can handle the case where only subtransaction is marked
as containing
# catalog changes. The last decoding starts from NEW_CID generated by
"s0_truncate" and
# marks only the subtransaction as containing catalog changes but we
don't create the
# association between top-level transaction and subtransaction yet.
When decoding the
# commit record of the top-level transaction, we must force the
top-level transaction
# to do timetravel since one of its subtransactions is marked as
containing catalog changes.
Seems good, I replaced all of comments to yours.
+ elog(DEBUG2, "forced transaction %u to do timetravel due to one of its subtransaction", + xid); + needs_timetravel = true;I think "one of its subtransaction" should be "one of its subtransactions".
Fixed.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
HEAD-v3-0001-Fix-assertion-failure-during-logical-decoding.patchapplication/octet-stream; name=HEAD-v3-0001-Fix-assertion-failure-during-logical-decoding.patchDownload
From 6d3c0bc289a6f0fc6dd8fb9a980ef4cfd42e346b Mon Sep 17 00:00:00 2001
From: "kuroda.hayato%40jp.fujitsu.com" <kuroda.hayato@jp.fujitsu.com>
Date: Tue, 6 Sep 2022 05:08:33 +0000
Subject: [PATCH v3] Fix assertion failure during logical decoding.
In SnapBuildCommitTxn(), we assumed that the top transaction needs timetravel
too if one of sub transactions need that.
Normally this assumption was correct, but in some cases the necessity of
timetravel was inconsistent between top and sub transactions. This was caused
because the relation between transactions might not be recorded to the
reorder buffer when CHECKPOINT was occurred.
To fix this problem, this forces the timetravel for the top transaction if one
of sub transactions needs that.
Thanks to off-list discussion with Takamichi Osumi.
---
.../expected/catalog_change_snapshot.out | 45 +++++++++++++++++++
.../specs/catalog_change_snapshot.spec | 9 ++++
src/backend/replication/logical/snapbuild.c | 3 ++
3 files changed, 57 insertions(+)
diff --git a/contrib/test_decoding/expected/catalog_change_snapshot.out b/contrib/test_decoding/expected/catalog_change_snapshot.out
index d2a4bdfcc1..e36f417e4a 100644
--- a/contrib/test_decoding/expected/catalog_change_snapshot.out
+++ b/contrib/test_decoding/expected/catalog_change_snapshot.out
@@ -96,3 +96,48 @@ COMMIT
stop
(1 row)
+
+starting permutation: s0_init s0_begin s0_savepoint s0_insert s1_checkpoint s1_get_changes s0_truncate s0_commit s0_begin s0_insert s1_checkpoint s1_get_changes s0_commit s1_get_changes
+step s0_init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+?column?
+--------
+init
+(1 row)
+
+step s0_begin: BEGIN;
+step s0_savepoint: SAVEPOINT sp1;
+step s0_insert: INSERT INTO tbl1 VALUES (1);
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+----
+(0 rows)
+
+step s0_truncate: TRUNCATE tbl1;
+step s0_commit: COMMIT;
+step s0_begin: BEGIN;
+step s0_insert: INSERT INTO tbl1 VALUES (1);
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+-------------------------------------------------------------
+BEGIN
+table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
+table public.tbl1: TRUNCATE: (no-flags)
+COMMIT
+(4 rows)
+
+step s0_commit: COMMIT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+-------------------------------------------------------------
+BEGIN
+table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
+COMMIT
+(3 rows)
+
+?column?
+--------
+stop
+(1 row)
+
diff --git a/contrib/test_decoding/specs/catalog_change_snapshot.spec b/contrib/test_decoding/specs/catalog_change_snapshot.spec
index ff8f68489b..7ff28ed7e8 100644
--- a/contrib/test_decoding/specs/catalog_change_snapshot.spec
+++ b/contrib/test_decoding/specs/catalog_change_snapshot.spec
@@ -57,3 +57,12 @@ permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate" "s1_checkpoint" "s
# checkpoint record it prunes one of the xacts in that list and when decoding the
# next checkpoint, it will completely prune that list.
permutation "s0_init" "s0_begin" "s0_truncate" "s2_begin" "s2_truncate" "s1_checkpoint" "s1_get_changes" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s2_commit" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
+
+# Test that we can handle the case where only subtransaction is marked as
+# containing catalog changes. The last decoding starts from NEW_CID generated
+# by "s0_truncate" and marks only the subtransaction as containing catalog
+# changes but we don't create the association between top-level transaction and
+# subtransaction yet. When decoding the commit record of the top-level
+# transaction, we must force the top-level transaction to do timetravel since
+# one of its subtransactions is marked as containing catalog changes.
+permutation "s0_init" "s0_begin" "s0_savepoint" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_truncate" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 54499c06fe..f892d19bfb 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1099,6 +1099,9 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
else if (sub_needs_timetravel)
{
/* track toplevel txn as well, subxact alone isn't meaningful */
+ elog(DEBUG2, "forced transaction %u to do timetravel due to one of its subtransactions",
+ xid);
+ needs_timetravel = true;
SnapBuildAddCommittedTxn(builder, xid);
}
else if (needs_timetravel)
--
2.27.0
On Wed, Sep 7, 2022 at 11:06 AM kuroda.hayato@fujitsu.com
<kuroda.hayato@fujitsu.com> wrote:
Dear Amit,
Thanks for giving comments!
Did you get this new assertion failure after you applied the patch for
the first failure? Because otherwise, how can you reach it with the
same test case?The first failure is occurred only in the HEAD, so I did not applied the first patch
to REL14 and REL15.
This difference is caused because the commit [Fix catalog lookup...] in REL15(272248a) and older is different
from the HEAD one.
In order versions SnapBuildXidSetCatalogChanges() has been added. In the function
a transaction will be marked as containing catalog changes if the transaction is in InitialRunningXacts,
and after that the relation between sub-top transactions is assigned based on the parsed->subxact.
The marking avoids the first failure, but the assignment triggers new failure.
FYI, as I just replied to the related thread[1]/messages/by-id/CAD21AoA1gV9pfu8hoXpTQBWH8uEMRg_F_MKM+U3Sr0HnyH4AUQ@mail.gmail.com, the assertion failure
in REL14 and REL15 can be fixed by the patch proposed there. So I'd
like to see how the discussion goes. Regardless of this proposed fix,
the patch proposed by Kuroda-san is required for HEAD, REL14, and
REL15, in order to fix the assertion failure in SnapBuildCommitTxn().
Regards,
[1]: /messages/by-id/CAD21AoA1gV9pfu8hoXpTQBWH8uEMRg_F_MKM+U3Sr0HnyH4AUQ@mail.gmail.com
--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Dear Sawada-san,
FYI, as I just replied to the related thread[1], the assertion failure
in REL14 and REL15 can be fixed by the patch proposed there. So I'd
like to see how the discussion goes. Regardless of this proposed fix,
the patch proposed by Kuroda-san is required for HEAD, REL14, and
REL15, in order to fix the assertion failure in SnapBuildCommitTxn().
I understood that my patches for REL14 and REL15 might be not needed.
I will check the thread later. Thanks!
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
On Wed, Oct 12, 2022 at 3:35 PM kuroda.hayato@fujitsu.com
<kuroda.hayato@fujitsu.com> wrote:
Dear Sawada-san,
FYI, as I just replied to the related thread[1], the assertion failure
in REL14 and REL15 can be fixed by the patch proposed there. So I'd
like to see how the discussion goes. Regardless of this proposed fix,
the patch proposed by Kuroda-san is required for HEAD, REL14, and
REL15, in order to fix the assertion failure in SnapBuildCommitTxn().I understood that my patches for REL14 and REL15 might be not needed.
No, sorry for confusing you. I meant that even if we agreed with the
patch I proposed there, your patch is still required to fix the issue.
Regards,
--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com