[BUG] Incorrect historic snapshot may be serialized to disk during fast-forwarding

Started by cca5507about 2 months ago7 messages
#1cca5507
cca5507@qq.com
1 attachment(s)

Hi,

When working on another historic snapshot's bug in [1]/messages/by-id/tencent_21E152AD504A814C071EDF41A4DD7BA84D06@qq.com, I find the $subject.

Here is a test case, but we need to add some log in SnapBuildSerialize() first:

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 6e18baa33cb..6d13b2d811b 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1523,6 +1523,19 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
        /* consistent snapshots have no next phase */
        Assert(builder->next_phase_at == InvalidTransactionId);
 
+       StringInfoData logbuf;
+       initStringInfo(&logbuf);
+       appendStringInfo(&logbuf, "SnapBuildSerialize: lsn: %X/%08X xmin: %u, xmax: %u, committed: ",
+                                        LSN_FORMAT_ARGS(lsn), builder->xmin, builder->xmax);
+       for (size_t i = 0; i < builder->committed.xcnt; i++)
+       {
+               if (i > 0)
+                       appendStringInfoString(&logbuf, ", ");
+               appendStringInfo(&logbuf, "%u", builder->committed.xip[i]);
+       }
+       elog(LOG, "%s", logbuf.data);
+       pfree(logbuf.data);
+
        /*
         * We identify snapshots by the LSN they are valid for. We don't need to
         * include timelines in the name as each LSN maps to exactly one timeline

1) create table t (id int) with (user_catalog_table = true);
2) select pg_create_logical_replication_slot('s1', 'test_decoding');
3) select pg_create_logical_replication_slot('s2', 'test_decoding');
4) insert into t values (1);
5) select pg_replication_slot_advance('s1', pg_current_wal_lsn());
6) select pg_logical_slot_get_changes('s2', pg_current_wal_lsn(), null);

Then we will find some log like this:

LOG: SnapBuildSerialize: lsn: 0/017D1318 xmin: 768, xmax: 768, committed:
STATEMENT: select pg_replication_slot_advance('s1', pg_current_wal_lsn());

LOG: SnapBuildSerialize: lsn: 0/017D1318 xmin: 768, xmax: 769, committed: 768
STATEMENT: select pg_logical_slot_get_changes('s2', pg_current_wal_lsn(), null);

At the same lsn, we get two different historic snapshots, and the first one (which is incorrect) is serialized to disk.

The main reason is that we don't handle XLOG_HEAP2_NEW_CID during fast-forwarding, so we don't consider the insert as having a catalog change.

Attach a patch to fix it.

Looking forward to your reply.

[1]: /messages/by-id/tencent_21E152AD504A814C071EDF41A4DD7BA84D06@qq.com
/messages/by-id/tencent_21E152AD504A814C071EDF41A4DD7BA84D06@qq.com

--
Regards,
ChangAo Chen

Attachments:

v1-0001-Handle-XLOG_HEAP2_NEW_CID-in-heap2_decode-even-if.patchapplication/octet-stream; charset=utf-8; name=v1-0001-Handle-XLOG_HEAP2_NEW_CID-in-heap2_decode-even-if.patchDownload
From 377692b2eba99408a2adf6aeb7b7fa42affe73ad Mon Sep 17 00:00:00 2001
From: ChangAo Chen <cca5507@qq.com>
Date: Sat, 22 Nov 2025 16:51:13 +0800
Subject: [PATCH v1] Handle XLOG_HEAP2_NEW_CID in heap2_decode() even if
 fast-forwarding.

---
 src/backend/replication/logical/decode.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..ffd4372917b 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -435,9 +435,11 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 				xlrec = (xl_heap_new_cid *) XLogRecGetData(buf->record);
 				SnapBuildProcessNewCid(builder, xid, buf->origptr, xlrec);
-
-				break;
 			}
+			else
+				ReorderBufferXidSetCatalogChanges(ctx->reorder, xid,
+												  buf->origptr);
+			break;
 		case XLOG_HEAP2_REWRITE:
 
 			/*
-- 
2.51.2

#2ocean_li_996
ocean_li_996@163.com
In reply to: cca5507 (#1)
1 attachment(s)
Re:[BUG] Incorrect historic snapshot may be serialized to disk during fast-forwarding

Hi ChangAo,

Thanks for your analyze and report.

At 2025-11-23 00:07:59, "cca5507" <cca5507@qq.com> wrote:

The main reason is that we don't handle XLOG_HEAP2_NEW_CID during fast-forwarding, so we don't consider the insert as having a catalog change.

Yeah. Refers to code below:
```
case XLOG_HEAP2_NEW_CID:
if (!ctx->fast_forward)
{
xl_heap_new_cid *xlrec;

xlrec = (xl_heap_new_cid *) XLogRecGetData(buf->record);
SnapBuildProcessNewCid(builder, xid, buf->origptr, xlrec);

break;
}
```
It is clear that transaction is skipped set has catalog change (in SnapBuildProcessNewCid) duraing fast forward.

However, in practice this does not cause any actual issues, because for the "real" system catalogs (those not

specified via user_catalog_table and thus critical for decoding), any HEAP2_NEW_CID changes are always accompanied

by some invalid messages. As a result, they are ultimately marked as catalog changes.

Still, from a code perspective, I would prefer fixing this behavior for logical consistency across the snapshot

builder, even if it doesn’t manifest as a runtime problem today. But we can not provide a test case.

Attach a patch to fix it.

The patch in attachment is better for me. What do you think?

Bset regards,
Haiyang Li

Attachments:

v2-0001-Handle-XLOG_HEAP2_NEW_CID-in-fast-forward.patchapplication/octet-stream; name=v2-0001-Handle-XLOG_HEAP2_NEW_CID-in-fast-forward.patch; x-cm-securityLevel=0Download
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..5d0c5088201 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -429,6 +429,15 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 				DecodeMultiInsert(ctx, buf);
 			break;
 		case XLOG_HEAP2_NEW_CID:
+			/*
+			 * we only log new_cid's if a catalog tuple was modified, so mark the
+			 * transaction as containing catalog modifications.
+			 * 
+			 * Note: we do this even in fast-forward mode, as we need to maintain
+			 * the snapshot correctly.
+			 */
+			ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
+
 			if (!ctx->fast_forward)
 			{
 				xl_heap_new_cid *xlrec;
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 6e18baa33cb..9f78a881444 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -691,12 +691,6 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
 {
 	CommandId	cid;
 
-	/*
-	 * we only log new_cid's if a catalog tuple was modified, so mark the
-	 * transaction as containing catalog modifications
-	 */
-	ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);
-
 	ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
 								 xlrec->target_locator, xlrec->target_tid,
 								 xlrec->cmin, xlrec->cmax,
#3cca5507
cca5507@qq.com
In reply to: ocean_li_996 (#2)
Re: [BUG] Incorrect historic snapshot may be serialized to disk during fast-forwarding

Hi Haiyang,

The patch in attachment is better for me. What do you think?

The v2-0001 LGTM.

A small suggestion:

We should move the 'break' out of the 'if', because we don't want it fall through to XLOG_HEAP2_REWRITE if we are fast-forwarding.

--
Regards,
ChangAo Chen

#4ocean_li_996
ocean_li_996@163.com
In reply to: cca5507 (#3)
1 attachment(s)
Re:Re: [BUG] Incorrect historic snapshot may be serialized to disk during fast-forwarding

Hi ChangAo,
At 2025-11-23 13:31:49, "cca5507" <cca5507@qq.com> wrote:

The patch in attachment is better for me. What do you think?

The v2-0001 LGTM.

A small suggestion:

We should move the 'break' out of the 'if', because we don't want it fall through to XLOG_HEAP2_REWRITE if we are fast-forwarding.

Fair. The patch updated is provided in attachment.

Best regards,
Haiyang Li

Attachments:

v3-0001-Handle-XLOG_HEAP2_NEW_CID-in-fast-forward.patchapplication/octet-stream; name=v3-0001-Handle-XLOG_HEAP2_NEW_CID-in-fast-forward.patch; x-cm-securityLevel=0Download
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..4f5704354a6 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -429,15 +429,23 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 				DecodeMultiInsert(ctx, buf);
 			break;
 		case XLOG_HEAP2_NEW_CID:
+			/*
+			 * we only log new_cid's if a catalog tuple was modified, so mark the
+			 * transaction as containing catalog modifications.
+			 * 
+			 * Note: we do this even in fast-forward mode, as we need to maintain
+			 * the snapshot correctly.
+			 */
+			ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
+
 			if (!ctx->fast_forward)
 			{
 				xl_heap_new_cid *xlrec;
 
 				xlrec = (xl_heap_new_cid *) XLogRecGetData(buf->record);
 				SnapBuildProcessNewCid(builder, xid, buf->origptr, xlrec);
-
-				break;
 			}
+			break;
 		case XLOG_HEAP2_REWRITE:
 
 			/*
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 6e18baa33cb..9f78a881444 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -691,12 +691,6 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
 {
 	CommandId	cid;
 
-	/*
-	 * we only log new_cid's if a catalog tuple was modified, so mark the
-	 * transaction as containing catalog modifications
-	 */
-	ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);
-
 	ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
 								 xlrec->target_locator, xlrec->target_tid,
 								 xlrec->cmin, xlrec->cmax,
#5cca5507
cca5507@qq.com
In reply to: ocean_li_996 (#4)
1 attachment(s)
Re: [BUG] Incorrect historic snapshot may be serialized to disk during fast-forwarding

Hi,

I add some commit message to the patch and create a CF entry:

https://commitfest.postgresql.org/patch/6304/

--
Regards,
ChangAo Chen

Attachments:

v4-0001-Handle-XLOG_HEAP2_NEW_CID-in-heap2_decode-even-if.patchapplication/octet-stream; charset=utf-8; name=v4-0001-Handle-XLOG_HEAP2_NEW_CID-in-heap2_decode-even-if.patchDownload
From 6f0a67ca2474498e22bfeefba68e1fd73edcc4bd Mon Sep 17 00:00:00 2001
From: ChangAo Chen <cca5507@qq.com>
Date: Wed, 10 Dec 2025 11:34:02 +0800
Subject: [PATCH v4] Handle XLOG_HEAP2_NEW_CID in heap2_decode() even if
 fast-forwarding.

We need to mark the transaction as containing catalog modifications
during decoding XLOG_HEAP2_NEW_CID even if fast-forwarding to maintain
the snapshot correctly. Otherwise, an incorrect historic snapshot may
be serialized to disk during fast-forwarding because the snapshot only
track catalog modified transaction.
---
 src/backend/replication/logical/decode.c    | 12 ++++++++++--
 src/backend/replication/logical/snapbuild.c |  9 ++-------
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..31d22c51290 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -429,15 +429,23 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 				DecodeMultiInsert(ctx, buf);
 			break;
 		case XLOG_HEAP2_NEW_CID:
+			/*
+			 * We only log new_cid's if a catalog tuple was modified, so mark the
+			 * transaction as containing catalog modifications.
+			 *
+			 * Note: we do this even in fast-forward mode, as we need to maintain
+			 * the snapshot correctly.
+			 */
+			ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
+
 			if (!ctx->fast_forward)
 			{
 				xl_heap_new_cid *xlrec;
 
 				xlrec = (xl_heap_new_cid *) XLogRecGetData(buf->record);
 				SnapBuildProcessNewCid(builder, xid, buf->origptr, xlrec);
-
-				break;
 			}
+			break;
 		case XLOG_HEAP2_REWRITE:
 
 			/*
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index d6ab1e017eb..ab5ca82ff9f 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -682,7 +682,8 @@ SnapBuildProcessChange(SnapBuild *builder, TransactionId xid, XLogRecPtr lsn)
 /*
  * Do CommandId/combo CID handling after reading an xl_heap_new_cid record.
  * This implies that a transaction has done some form of write to system
- * catalogs.
+ * catalogs. Caller must mark the transaction as containing catalog
+ * modifications in the reorder buffer.
  */
 void
 SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
@@ -690,12 +691,6 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
 {
 	CommandId	cid;
 
-	/*
-	 * we only log new_cid's if a catalog tuple was modified, so mark the
-	 * transaction as containing catalog modifications
-	 */
-	ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);
-
 	ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
 								 xlrec->target_locator, xlrec->target_tid,
 								 xlrec->cmin, xlrec->cmax,
-- 
2.34.1

#6Masahiko Sawada
sawada.mshk@gmail.com
In reply to: ocean_li_996 (#4)
Re: Re: [BUG] Incorrect historic snapshot may be serialized to disk during fast-forwarding

On Sat, Nov 22, 2025 at 10:33 PM ocean_li_996 <ocean_li_996@163.com> wrote:

Hi ChangAo,

At 2025-11-23 13:31:49, "cca5507" <cca5507@qq.com> wrote:

The patch in attachment is better for me. What do you think?

The v2-0001 LGTM.

A small suggestion:

We should move the 'break' out of the 'if', because we don't want it fall through to XLOG_HEAP2_REWRITE if we are fast-forwarding.

Fair. The patch updated is provided in attachment.

While I agree with your analysis, I'm not sure what actual problems it
could lead to in practice. Have you had a chance to reproduce this
behavior by using DDLs instead of a user-catalog table? IIUC the
problem can occur if a transaction makes catalog changes and writes
only NEW_CID WAL records without INVALIDATION WAL records. However,
I'm not sure there are such transactions in practice. IIUC it would
not be a problem in terms of logical decoding even if we don't include
their XIDs to the snapshot if they change only user-catalog tables. It
might be more future proof to mark transactions as catalog-changed
even when fast-forwarding a NEW_CID record, as you proposed, but I'd
like to confirm the actual problems first.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#7cca5507
cca5507@qq.com
In reply to: Masahiko Sawada (#6)
Re: [BUG] Incorrect historic snapshot may be serialized to diskduring fast-forwarding

Hi,

While I agree with your analysis, I'm not sure what actual problems it
could lead to in practice. Have you had a chance to reproduce this
behavior by using DDLs instead of a user-catalog table?

I'm not sure if this can be reproduced by using DDLs, but I will try.

IIUC the problem can occur if a transaction makes catalog changes and writes
only NEW_CID WAL records without INVALIDATION WAL records.

+1

However, I'm not sure there are such transactions in practice.

Maybe currently not, but what about future, I'm not sure yet.

IIUC it would not be a problem in terms of logical decoding even if we
don't include their XIDs to the snapshot if they change only user-catalog
tables. It might be more future proof to mark transactions as catalog-changed
even when fast-forwarding a NEW_CID record, as you proposed, but I'd
like to confirm the actual problems first.

Our current design of historic snapshot is to track all catalogs even if only a part
of them are useful for logical decoding. So I think this bug breaks our design even
if it's maybe ok in practice.

--
Regards,
ChangAo Chen