Fix premature xmin advancement during fast forward decoding
Hi,
When analyzing some issues in another thread[1], I found a bug that fast forward
decoding could lead to premature advancement of catalog_xmin, resulting in
required catalog data being removed by vacuum.
The issue arises because we do not build a base snapshot when decoding changes
during fast forward mode, which prevents us from referring to the minimum
transaction ID that remains visible in the snapshot when determining the
candidate for catalog_xmin. As a result, catalog_xmin is advanced directly to
the oldest running transaction ID found in the latest running_xacts record.
At the code level, SnapBuildProcessChange -> ReorderBufferSetBaseSnapshot cannot
be reached during fast forward decoding, leaving rb->txns_by_base_snapshot_lsn
NULL. When advancing catalog_xmin, the system attempts to refer to
rb->txns_by_base_snapshot_lsn via ReorderBufferGetOldestXmin(). However, since
rb->txns_by_base_snapshot_lsn is NULL, it falls back to using
running->oldestRunningXid directly as the candidate for catalog_xmin. For
reference, see the details in SnapBuildProcessRunningXacts.
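To make that concrete, here is a minimal, standalone C sketch of the candidate
selection. It only mimics the fallback described above; the function name, mock
types, and xid values are illustrative assumptions, not the server's actual
code:

/*
 * Simplified standalone model of how the candidate catalog_xmin is chosen:
 * prefer the oldest xmin among transactions that have a base snapshot, and
 * only fall back to oldestRunningXid when no such xmin exists.  In
 * fast-forward mode no base snapshot is ever built, so the fallback is always
 * taken and catalog_xmin can jump past data the decoder still needs.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

static TransactionId
candidate_catalog_xmin(TransactionId oldest_base_snapshot_xmin,
                       TransactionId oldestRunningXid)
{
    TransactionId xmin = oldest_base_snapshot_xmin;

    if (xmin == InvalidTransactionId)
        xmin = oldestRunningXid;    /* the path always taken in fast-forward mode */

    return xmin;
}

int
main(void)
{
    /*
     * Mock scenario: a still-undecoded transaction's base snapshot recorded
     * xmin 742 (a catalog-modifying transaction was running back then), while
     * the latest running_xacts record now reports 761 as the oldest running
     * xid.
     */
    printf("with base snapshot:    %u\n",
           (unsigned) candidate_catalog_xmin(742, 761));                  /* 742 */
    printf("without base snapshot: %u\n",
           (unsigned) candidate_catalog_xmin(InvalidTransactionId, 761)); /* 761 */
    return 0;
}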
See the attached test (0002), which shows that catalog data that is still
required can be removed after premature catalog_xmin advancement during fast
forward decoding.
To fix this, I think we can allow the base snapshot to be built during fast
forward decoding, as implemented in patch 0001 (we already build the base
snapshot in fast-forward mode for logical messages in logicalmsg_decode()).
I also explored the possibility of further optimizing the fix to reduce the
overhead of building a snapshot in fast-forward mode, e.g., maintaining a
single base_snapshot_xmin rather than generating a full snapshot for each
transaction, and using this base_snapshot_xmin when determining the candidate
catalog_xmin. However, I am not confident about maintaining fast-forward
dedicated fields in common structures and perhaps handling catalog_xmin
differently.
Moreover, I conducted a basic test[2] to test the patch's impact, noting that
advancing the slot incurs roughly a 4% increase in processing time after
applying the patch, which appears to be acceptable. Additionally, the cost
associated with building the snapshot via SnapBuildBuildSnapshot() did not show
up in the profile. Therefore, I think it's reasonable to refrain from
further optimization at this stage.
[1]: /messages/by-id/OS3PR01MB5718ED3F0123737C7380BBAD94BB2@OS3PR01MB5718.jpnprd01.prod.outlook.com
[2]:
Create a table test(a int);
Create a slot A.
pgbench postgres -T 60 -j 100 -c 100 -f simple-insert
simple-insert:
BEGIN;
INSERT INTO test VALUES(1);
COMMIT;
use pg_replication_slot_advance to advance the slot A to the latest position
and count the time.
HEAD: Time: 4189.257 ms
PATCH: Time: 4371.358 ms
Best Regards,
Hou zj
Attachments:
v1-0002-Add-a-test.patch
From f55acc6c9838dbb04f50a7677517214ee43d5be8 Mon Sep 17 00:00:00 2001
From: Zhijie Hou <houzj.fnst@cn.fujitsu.com>
Date: Tue, 22 Apr 2025 11:03:40 +0800
Subject: [PATCH v1 2/2] Add a test
---
.../test_decoding/expected/oldest_xmin.out | 41 +++++++++++++++++++
contrib/test_decoding/specs/oldest_xmin.spec | 5 +++
2 files changed, 46 insertions(+)
diff --git a/contrib/test_decoding/expected/oldest_xmin.out b/contrib/test_decoding/expected/oldest_xmin.out
index dd6053f9c1f..57268b38d33 100644
--- a/contrib/test_decoding/expected/oldest_xmin.out
+++ b/contrib/test_decoding/expected/oldest_xmin.out
@@ -38,3 +38,44 @@ COMMIT
stop
(1 row)
+
+starting permutation: s0_begin s0_getxid s1_begin s1_insert s0_alter s0_commit s0_checkpoint s0_advance_slot s0_advance_slot s1_commit s0_vacuum s0_get_changes
+step s0_begin: BEGIN;
+step s0_getxid: SELECT pg_current_xact_id() IS NULL;
+?column?
+--------
+f
+(1 row)
+
+step s1_begin: BEGIN;
+step s1_insert: INSERT INTO harvest VALUES ((1, 2, 3));
+step s0_alter: ALTER TYPE basket DROP ATTRIBUTE mangos;
+step s0_commit: COMMIT;
+step s0_checkpoint: CHECKPOINT;
+step s0_advance_slot: SELECT slot_name FROM pg_replication_slot_advance('isolation_slot', pg_current_wal_lsn());
+slot_name
+--------------
+isolation_slot
+(1 row)
+
+step s0_advance_slot: SELECT slot_name FROM pg_replication_slot_advance('isolation_slot', pg_current_wal_lsn());
+slot_name
+--------------
+isolation_slot
+(1 row)
+
+step s1_commit: COMMIT;
+step s0_vacuum: VACUUM pg_attribute;
+step s0_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data
+------------------------------------------------------
+BEGIN
+table public.harvest: INSERT: fruits[basket]:'(1,2,3)'
+COMMIT
+(3 rows)
+
+?column?
+--------
+stop
+(1 row)
+
diff --git a/contrib/test_decoding/specs/oldest_xmin.spec b/contrib/test_decoding/specs/oldest_xmin.spec
index 88bd30f5ff7..7f2fe3d7ed7 100644
--- a/contrib/test_decoding/specs/oldest_xmin.spec
+++ b/contrib/test_decoding/specs/oldest_xmin.spec
@@ -25,6 +25,7 @@ step "s0_commit" { COMMIT; }
step "s0_checkpoint" { CHECKPOINT; }
step "s0_vacuum" { VACUUM pg_attribute; }
step "s0_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+step "s0_advance_slot" { SELECT slot_name FROM pg_replication_slot_advance('isolation_slot', pg_current_wal_lsn()); }
session "s1"
setup { SET synchronous_commit=on; }
@@ -40,3 +41,7 @@ step "s1_commit" { COMMIT; }
# will be removed (xmax set) before T1 commits. That is, interlocking doesn't
# forbid modifying catalog after someone read it (and didn't commit yet).
permutation "s0_begin" "s0_getxid" "s1_begin" "s1_insert" "s0_alter" "s0_commit" "s0_checkpoint" "s0_get_changes" "s0_get_changes" "s1_commit" "s0_vacuum" "s0_get_changes"
+
+# Perform the same testing process as described above, but use advance_slot to
+# force xmin advancement during fast forward decoding.
+permutation "s0_begin" "s0_getxid" "s1_begin" "s1_insert" "s0_alter" "s0_commit" "s0_checkpoint" "s0_advance_slot" "s0_advance_slot" "s1_commit" "s0_vacuum" "s0_get_changes"
--
2.30.0.windows.2
v1-0001-Fix-premature-xmin-advancement-during-fast-forwar.patch
From 5327186830024769c2d893fba5859ebbdc04af29 Mon Sep 17 00:00:00 2001
From: Zhijie Hou <houzj.fnst@cn.fujitsu.com>
Date: Tue, 22 Apr 2025 10:58:35 +0800
Subject: [PATCH v1 1/2] Fix premature xmin advancement during fast forward
decoding
Fast forward decoding could lead to premature advancement of catalog_xmin,
resulting in required catalog data being removed by vacuum. The issue arises
because we do not build a base snapshot when decoding changes during fast
forward mode, preventing reference to the minimum transaction ID that remains
visible in the snapshot when determining the candidate for catalog_xmin. As a
result, catalog_xmin was directly advanced to the oldest running transaction ID
found in the latest running_xacts record.
To resolve this, the commit allows the base snapshot to be built during fast
forward decoding.
---
src/backend/replication/logical/decode.c | 37 ++++++++++++++++--------
1 file changed, 25 insertions(+), 12 deletions(-)
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 78f9a0a11c4..8dc467a8bb3 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -412,19 +412,24 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
/*
* If we don't have snapshot or we are just fast-forwarding, there is no
- * point in decoding changes.
+ * point in decoding data changes. However, it's crucial to build the base
+ * snapshot during fast-forward mode (as done in SnapBuildProcessChange())
+ * because the snapshot's minimum visible transaction ID (xmin) must be
+ * referenced when determining the candidate catalog_xmin for the
+ * replication slot.
*/
- if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
- ctx->fast_forward)
+ if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
return;
switch (info)
{
case XLOG_HEAP2_MULTI_INSERT:
- if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeMultiInsert(ctx, buf);
break;
case XLOG_HEAP2_NEW_CID:
+ if (!ctx->fast_forward)
{
xl_heap_new_cid *xlrec;
@@ -471,16 +476,20 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
/*
* If we don't have snapshot or we are just fast-forwarding, there is no
- * point in decoding data changes.
+ * point in decoding data changes. However, it's crucial to build the base
+ * snapshot during fast-forward mode (as done in SnapBuildProcessChange())
+ * because the snapshot's minimum visible transaction ID (xmin) must be
+ * referenced when determining the candidate catalog_xmin for the
+ * replication slot.
*/
- if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
- ctx->fast_forward)
+ if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
return;
switch (info)
{
case XLOG_HEAP_INSERT:
- if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeInsert(ctx, buf);
break;
@@ -491,17 +500,20 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
*/
case XLOG_HEAP_HOT_UPDATE:
case XLOG_HEAP_UPDATE:
- if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeUpdate(ctx, buf);
break;
case XLOG_HEAP_DELETE:
- if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeDelete(ctx, buf);
break;
case XLOG_HEAP_TRUNCATE:
- if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeTruncate(ctx, buf);
break;
@@ -525,7 +537,8 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
break;
case XLOG_HEAP_CONFIRM:
- if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeSpecConfirm(ctx, buf);
break;
--
2.30.0.windows.2
On Tue, Apr 22, 2025 at 12:36 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
Hi,
To fix this, I think we can allow the base snapshot to be built during fast
forward decoding, as implemented in the patch 0001 (We already built base
snapshot in fast-forward mode for logical message in logicalmsg_decode()).
The idea and code looks okay to me and the performance impact is also
not that huge.
IIUC, fast_forward decoding mode is used only in two cases. 1)
pg_replication_slot_advance and 2) in upgrade flow to check if there
are any pending WAL changes which are not yet replicated. See
'binary_upgrade_logical_slot_has_caught_up'-->'LogicalReplicationSlotHasPendingWal'.
It seems like this change will not have any negative impact in the
upgrade flow as well (in terms of performance and anything else).
Thoughts?
Moreover, I conducted a basic test[2] to test the patch's impact, noting that
advancing the slot incurs roughly a 4% increase in processing time after
applying the patch, which appears to be acceptable. Additionally, the cost
associated with building the snapshot via SnapBuildBuildSnapshot() did not show
up in the profile. Therefore, I think it's reasonable to refrain from
further optimization at this stage.
I agree on this.
thanks
Shveta
On Wed, Apr 23, 2025 at 2:31 PM shveta malik wrote:
On Tue, Apr 22, 2025 at 12:36 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
Hi,
To fix this, I think we can allow the base snapshot to be built during fast
forward decoding, as implemented in the patch 0001 (We already built base
snapshot in fast-forward mode for logical message in logicalmsg_decode()).
The idea and code looks okay to me and the performance impact is also
not that huge.
Thanks for reviewing!
IIUC, fast_forward decoding mode is used only in two cases. 1)
pg_replication_slot_advance and 2) in upgrade flow to check if there
are any pending WAL changes which are not yet replicated. See
'binary_upgrade_logical_slot_has_caught_up'-->'LogicalReplicationSlotHasPendingWal'.
It seems like this change will not have any negative impact in the
upgrade flow as well (in terms of performance and anything else).
Thoughts?
I also think so. The upgrade uses fast forward decoding to check if there are
any pending WALs not sent yet and report an ERROR if so. This indicates that
we do not expect to decode real changes for the upgrade slots, so the
performance impact would be minimal in normal cases.
Best Regards,
Hou zj
On Tue, Apr 22, 2025 at 12:36 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
When analyzing some issues in another thread[1], I found a bug that fast forward
decoding could lead to premature advancement of catalog_xmin, resulting in
required catalog data being removed by vacuum.
The issue arises because we do not build a base snapshot when decoding changes
during fast forward mode, preventing reference to the minimum transaction ID
that remains visible in the snapshot when determining the candidate for
catalog_xmin. As a result, catalog_xmin was directly advanced to the oldest
running transaction ID found in the latest running_xacts record.
In code-level, SnapBuildProcessChange -> ReorderBufferSetBaseSnapshot could not
be reached during fast forward decoding, resulting in
rb->txns_by_base_snapshot_lsn being NULL. When advancing catalog_xmin, the
system attempts to refer to rb->txns_by_base_snapshot_lsn via
ReorderBufferGetOldestXmin(). However, since rb->txns_by_base_snapshot_lsn is
NULL, it defaults to directly using running->oldestRunningXid as the candidate
for catalog_xmin. For reference, see the details in
SnapBuildProcessRunningXacts.
See the attachment for a test(0002) to prove that the catalog data that are
still required would be removed after premature catalog_xmin advancement during
fast forward decoding.
To fix this, I think we can allow the base snapshot to be built during fast
forward decoding, as implemented in the patch 0001 (We already built base
snapshot in fast-forward mode for logical message in logicalmsg_decode()).
The same bug was fixed for non-fast_forward mode in commit f49a80c4.
See the following code in that commit:
- LogicalIncreaseXminForSlot(lsn, running->oldestRunningXid);
+ xmin = ReorderBufferGetOldestXmin(builder->reorder);
+ if (xmin == InvalidTransactionId)
+ xmin = running->oldestRunningXid;
....
+ LogicalIncreaseXminForSlot(lsn, xmin);
Is my understanding correct? If so, then I think it is an oversight in commit
f49a80c4 that fast_forward mode was not considered. Please refer to that
commit in the commit message.
The code changes look correct to me. I have some suggestions related
to comments in the code. See attached. Please prepare patches for back
branches as well.
--
With Regards,
Amit Kapila.
Attachments:
v1_comments_amit.1.patch.txt
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 8dc467a8bb3..cc03f0706e9 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -413,10 +413,10 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
/*
* If we don't have snapshot or we are just fast-forwarding, there is no
* point in decoding data changes. However, it's crucial to build the base
- * snapshot during fast-forward mode (as done in SnapBuildProcessChange())
- * because the snapshot's minimum visible transaction ID (xmin) must be
- * referenced when determining the candidate catalog_xmin for the
- * replication slot.
+ * snapshot during fast-forward mode (as is done in
+ * SnapBuildProcessChange()) because we require the snapshot's xmin when
+ * determining the candidate catalog_xmin for the replication slot. See
+ * SnapBuildProcessRunningXacts().
*/
if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
return;
@@ -477,10 +477,10 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
/*
* If we don't have snapshot or we are just fast-forwarding, there is no
* point in decoding data changes. However, it's crucial to build the base
- * snapshot during fast-forward mode (as done in SnapBuildProcessChange())
- * because the snapshot's minimum visible transaction ID (xmin) must be
- * referenced when determining the candidate catalog_xmin for the
- * replication slot.
+ * snapshot during fast-forward mode (as is done in
+ * SnapBuildProcessChange()) because we require the snapshot's xmin when
+ * determining the candidate catalog_xmin for the replication slot. See
+ * SnapBuildProcessRunningXacts().
*/
if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
return;
Hi,
On Tue, Apr 22, 2025 at 12:06 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
Hi,
When analyzing some issues in another thread[1], I found a bug that fast forward
decoding could lead to premature advancement of catalog_xmin, resulting in
required catalog data being removed by vacuum.
The issue arises because we do not build a base snapshot when decoding changes
during fast forward mode, preventing reference to the minimum transaction ID
that remains visible in the snapshot when determining the candidate for
catalog_xmin. As a result, catalog_xmin was directly advanced to the oldest
running transaction ID found in the latest running_xacts record.
In code-level, SnapBuildProcessChange -> ReorderBufferSetBaseSnapshot could not
be reached during fast forward decoding, resulting in
rb->txns_by_base_snapshot_lsn being NULL. When advancing catalog_xmin, the
system attempts to refer to rb->txns_by_base_snapshot_lsn via
ReorderBufferGetOldestXmin(). However, since rb->txns_by_base_snapshot_lsn is
NULL, it defaults to directly using running->oldestRunningXid as the candidate
for catalog_xmin. For reference, see the details in
SnapBuildProcessRunningXacts.
I agree with your analysis.
To fix this, I think we can allow the base snapshot to be built during fast
forward decoding, as implemented in the patch 0001 (We already built base
snapshot in fast-forward mode for logical message in logicalmsg_decode()).
I also explored the possibility of further optimizing the fix to reduce the
overhead associated with building a snapshot in fast-forward mode. E.g., to
maintain a single base_snapshot_xmin rather than generating a full snapshot for
each transaction, and use this base_snapshot_xmin when determining the
candidate catalog_xmin. However, I am not feeling confident about maintaining
some fast-forward dedicated fields in common structures and
perhaps employing different handling for catalog_xmin.
Moreover, I conducted a basic test[2] to test the patch's impact, noting that
advancing the slot incurs roughly a 4% increase in processing time after
applying the patch, which appears to be acceptable. Additionally, the cost
associated with building the snapshot via SnapBuildBuildSnapshot() did not show
up in the profile.
Where did the regression come from? I think that even with your patch
SnapBuildBuildSnapshot() would be called only a few times in the
workload. With the patch, I think that we need to create
ReorderBufferTXN entries for each transaction even during fast_forward
mode, which could lead to overheads. I think that 4% degradation is
something that we want to avoid.
Regards
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Fri, Apr 25, 2025 at 5:44 AM Masahiko Sawada wrote:
(Resending this email after compressing the attachment)
On Tue, Apr 22, 2025 at 12:06 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
Hi,
When analyzing some issues in another thread[1], I found a bug that
fast forward decoding could lead to premature advancement of
catalog_xmin, resulting in required catalog data being removed by vacuum.
The issue arises because we do not build a base snapshot when decoding
changes during fast forward mode, preventing reference to the minimum
transaction ID that remains visible in the snapshot when determining
the candidate for catalog_xmin. As a result, catalog_xmin was directly
advanced to the oldest running transaction ID found in the latest running_xacts record.
In code-level, SnapBuildProcessChange -> ReorderBufferSetBaseSnapshot
could not be reached during fast forward decoding, resulting in
rb->txns_by_base_snapshot_lsn being NULL. When advancing catalog_xmin, the
system attempts to refer to rb->txns_by_base_snapshot_lsn via
ReorderBufferGetOldestXmin(). However, since
rb->txns_by_base_snapshot_lsn is NULL, it defaults to directly using
running->oldestRunningXid as the candidate for catalog_xmin. For
reference, see the details in SnapBuildProcessRunningXacts.
I agree with your analysis.
To fix this, I think we can allow the base snapshot to be built during
fast forward decoding, as implemented in the patch 0001 (We already
built base snapshot in fast-forward mode for logical message in logicalmsg_decode()).
I also explored the possibility of further optimizing the fix to
reduce the overhead associated with building a snapshot in
fast-forward mode. E.g., to maintain a single base_snapshot_xmin
rather than generating a full snapshot for each transaction, and use
this base_snapshot_xmin when determining the candidate catalog_xmin.
However, I am not feeling confident about maintaining some
fast-forward dedicated fields in common structures and perhaps employing different handling for catalog_xmin.
Moreover, I conducted a basic test[2] to test the patch's impact,
noting that advancing the slot incurs roughly a 4% increase in
processing time after applying the patch, which appears to be
acceptable. Additionally, the cost associated with building the
snapshot via SnapBuildBuildSnapshot() did not show up in the profile.
Where did the regression come from? I think that even with your patch
SnapBuildBuildSnapshot() would be called only a few times in the workload.
AFAICS, the primary cost arises from the additional function
invocations/executions. Functions such as SnapBuildProcessChange,
ReorderBufferSetBaseSnapshot, and ReorderBufferXidHasBaseSnapshot are invoked
more frequently after the patch. Although these functions aren't inherently
expensive, the scenario I tested involved numerous transactions starting with
just a single insert each. This resulted in a high frequency of calls to these
functions.
Attaching the profile for both HEAD and PATCHED for reference.
With the patch, I think that we need to create ReorderBufferTXN entries for
each transaction even during fast_forward mode, which could lead to
overheads.
I think ReorderBufferTXN entries were created in fast_forward mode even without
the patch. See heap_decode -> ReorderBufferProcessXid.
I think that 4% degradation is something that we want to avoid.
As mentioned above, these costs arise from essential function executions
required to maintain txns_by_base_snapshot_lsn. So I think the cost is
reasonable to ensure correctness. Thoughts?
Best Regards,
Hou zj
Attachments:
On Fri, Apr 25, 2025 at 8:14 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
On Fri, Apr 25, 2025 at 5:44 AM Masahiko Sawada wrote:
On Tue, Apr 22, 2025 at 12:06 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
Hi,
When analyzing some issues in another thread[1], I found a bug that
fast forward decoding could lead to premature advancement of
catalog_xmin, resulting in required catalog data being removed by vacuum.
The issue arises because we do not build a base snapshot when decoding
changes during fast forward mode, preventing reference to the minimum
transaction ID that remains visible in the snapshot when determining
the candidate for catalog_xmin. As a result, catalog_xmin was directly
advanced to the oldest running transaction ID found in the latest running_xacts record.
In code-level, SnapBuildProcessChange -> ReorderBufferSetBaseSnapshot
could not be reached during fast forward decoding, resulting in
rb->txns_by_base_snapshot_lsn being NULL. When advancing catalog_xmin, the
system attempts to refer to rb->txns_by_base_snapshot_lsn via
ReorderBufferGetOldestXmin(). However, since
rb->txns_by_base_snapshot_lsn is NULL, it defaults to directly using
running->oldestRunningXid as the candidate for catalog_xmin. For
reference, see the details in SnapBuildProcessRunningXacts.
I agree with your analysis.
To fix this, I think we can allow the base snapshot to be built during
fast forward decoding, as implemented in the patch 0001 (We already
built base snapshot in fast-forward mode for logical message in logicalmsg_decode()).
I also explored the possibility of further optimizing the fix to
reduce the overhead associated with building a snapshot in
fast-forward mode. E.g., to maintain a single base_snapshot_xmin
rather than generating a full snapshot for each transaction, and use
this base_snapshot_xmin when determining the candidate catalog_xmin.
However, I am not feeling confident about maintaining some
fast-forward dedicated fields in common structures and perhaps employing different handling for catalog_xmin.
Moreover, I conducted a basic test[2] to test the patch's impact,
noting that advancing the slot incurs roughly a 4% increase in
processing time after applying the patch, which appears to be
acceptable. Additionally, the cost associated with building the
snapshot via SnapBuildBuildSnapshot() did not show up in the profile.
Where did the regression come from? I think that even with your patch
SnapBuildBuildSnapshot() would be called only a few times in the workload.
AFAICS, the primary cost arises from the additional function
invocations/executions. Functions such as SnapBuildProcessChange,
ReorderBufferSetBaseSnapshot, and ReorderBufferXidHasBaseSnapshot are invoked
more frequently after the patch. Although these functions aren't inherently
expensive, the scenario I tested involved numerous transactions starting with
just a single insert each. This resulted in a high frequency of calls to these
functions.
Yeah, I see below from the patched profile.
|--4.65%--SnapBuildProcessChange
|          |--2.08%--ReorderBufferSetBaseSnapshot
|          |           --1.32%--__mcount_internal
|          |--1.29%--__mcount_internal
|           --0.72%--ReorderBufferXidHasBaseSnapshot
As the number of operations per transaction increases, this cost
should reduce further as we will set base_snapshot only once per
transaction. Now, we can try to micro-optimize this by passing
ReorderBufferTXN, as that is computed before we invoke
SnapBuildProcessChange, but not sure how much it will change. However,
the bigger question is, is it really worth optimizing this code path?
If we really want to optimize this code path for the test Hou-San did,
I see that SnapBuildCommitTxn() has a bigger overhead, so we should
investigate whether we really need it for the fast-forward mode, where
we won't process changes. I am of the opinion that we need to pay this
additional cost of setting base_snapshot for the correctness of the
fast-forward mode.
--
With Regards,
Amit Kapila.
On Thu, Apr 24, 2025 at 9:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Apr 25, 2025 at 8:14 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:On Fri, Apr 25, 2025 at 5:44 AM Masahiko Sawada wrote:
On Tue, Apr 22, 2025 at 12:06 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
Hi,
When analyzing some issues in another thread[1], I found a bug that
fast forward decoding could lead to premature advancement of
catalog_xmin, resulting in required catalog data being removed by vacuum.
The issue arises because we do not build a base snapshot when decoding
changes during fast forward mode, preventing reference to the minimum
transaction ID that remains visible in the snapshot when determining
the candidate for catalog_xmin. As a result, catalog_xmin was directly
advanced to the oldest running transaction ID found in the latest running_xacts record.
In code-level, SnapBuildProcessChange -> ReorderBufferSetBaseSnapshot
could not be reached during fast forward decoding, resulting in
rb->txns_by_base_snapshot_lsn being NULL. When advancing catalog_xmin, the
system attempts to refer to rb->txns_by_base_snapshot_lsn via
ReorderBufferGetOldestXmin(). However, since
rb->txns_by_base_snapshot_lsn is NULL, it defaults to directly using
running->oldestRunningXid as the candidate for catalog_xmin. For
reference, see the details in SnapBuildProcessRunningXacts.
I agree with your analysis.
To fix this, I think we can allow the base snapshot to be built during
fast forward decoding, as implemented in the patch 0001 (We already
built base snapshot in fast-forward mode for logical message in logicalmsg_decode()).
I also explored the possibility of further optimizing the fix to
reduce the overhead associated with building a snapshot in
fast-forward mode. E.g., to maintain a single base_snapshot_xmin
rather than generating a full snapshot for each transaction, and use
this base_snapshot_xmin when determining the candidate catalog_xmin.
However, I am not feeling confident about maintaining some
fast-forward dedicated fields in common structures and perhaps employing different handling for catalog_xmin.
Moreover, I conducted a basic test[2] to test the patch's impact,
noting that advancing the slot incurs roughly a 4% increase in
processing time after applying the patch, which appears to be
acceptable. Additionally, the cost associated with building the
snapshot via SnapBuildBuildSnapshot() did not show up in the profile.
Where did the regression come from? I think that even with your patch
SnapBuildBuildSnapshot() would be called only a few times in the workload.
AFAICS, the primary cost arises from the additional function
invocations/executions. Functions such as SnapBuildProcessChange,
ReorderBufferSetBaseSnapshot, and ReorderBufferXidHasBaseSnapshot are invoked
more frequently after the patch. Although these functions aren't inherently
expensive, the scenario I tested involved numerous transactions starting with
just a single insert each. This resulted in a high frequency of calls to these
functions.
Ah, you're right. I looked at the wrong profile.
Yeah, I see below from the patched profile.
|--4.65%--SnapBuildProcessChange
|          |--2.08%--ReorderBufferSetBaseSnapshot
|          |           --1.32%--__mcount_internal
|          |--1.29%--__mcount_internal
|           --0.72%--ReorderBufferXidHasBaseSnapshot
It looks like you've run the benchmark test with assertions enabled,
is that right? I can see AssertTXNLsnOrder() in the profiles.
As the number of operations per transaction increases, this cost
should reduce further as we will set base_snapshot only once per
transaction. Now, we can try to micro-optimize this by passing
ReorderBufferTXN, as that is computed before we invoke
SnapBuildProcessChange, but not sure how much it will change. However,
the bigger question is, is it really worth optimizing this code path?
If we really want to optimize this code path for the test Hou-San did,
I see that SnapBuildCommitTxn() has a bigger overhead, so we should
investigate whether we really need it for the fast-forward mode, where
we won't process changes. I am of the opinion that we need to pay this
additional cost of setting base_snapshot for the correctness of the
fast-forward mode.
I've done the same test in my environment and saw about 7.2% degradation:
HEAD: 773.538 ms
PATCHED: 833.906 ms
It depends on the workload and environment of course but I'm not sure
this is a reasonable cost we have to pay.
What I'm concerned about is the back branches. With this approach all
back branches will have such degradations and it doesn't make sense to
me to optimize SnapBuildCommitTxn() code in back branches.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Fri, Apr 25, 2025 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
What I'm concerned about is the back branches. With this approach all
back branches will have such degradations and it doesn't make sense to
me to optimize SnapBuildCommitTxn() codes in back branches.
One possibility could be that instead of maintaining an entire
snapshot in fast_forward mode, we can maintain snapshot's xmin in each
ReorderBufferTXN. But then also, how would we get the minimum
txns_by_base_snapshot_lsn as we are getting now in
ReorderBufferGetOldestXmin? I think we need to traverse the entire
list of txns to get it in fast_forward mode but that may not show up
because it will not be done for each transaction. We can try such a
thing, but it won't be clean to have fast_forward specific code and
also it would be better to add such things only for HEAD. Can you
think of any better ideas?
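To make the idea a bit more concrete, a rough standalone sketch (purely
hypothetical; the struct below is a simplified stand-in for ReorderBufferTXN
and the xid values are made up) could look like this, where the full-list scan
is the part that has no cheap equivalent today:

/*
 * Hypothetical sketch: each transaction records only the xmin it would have
 * used, and the candidate catalog_xmin is computed by scanning every
 * transaction, because there is no list ordered by base-snapshot LSN to
 * consult for the oldest entry.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

typedef struct MockTxn
{
    TransactionId   snapshot_xmin;  /* InvalidTransactionId if not yet recorded */
    struct MockTxn *next;
} MockTxn;

/* Traverse all transactions and return the minimum recorded snapshot xmin. */
static TransactionId
oldest_snapshot_xmin(const MockTxn *txns)
{
    TransactionId result = InvalidTransactionId;

    for (const MockTxn *txn = txns; txn != NULL; txn = txn->next)
    {
        if (txn->snapshot_xmin == InvalidTransactionId)
            continue;
        if (result == InvalidTransactionId || txn->snapshot_xmin < result)
            result = txn->snapshot_xmin;
    }

    return result;
}

int
main(void)
{
    MockTxn t3 = {760, NULL};
    MockTxn t2 = {InvalidTransactionId, &t3};   /* no change decoded yet */
    MockTxn t1 = {742, &t2};
    TransactionId oldestRunningXid = 761;       /* from the latest running_xacts */
    TransactionId xmin = oldest_snapshot_xmin(&t1);

    if (xmin == InvalidTransactionId)
        xmin = oldestRunningXid;

    printf("candidate catalog_xmin = %u\n", (unsigned) xmin);  /* prints 742 */
    return 0;
}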
--
With Regards,
Amit Kapila.
On Fri, Apr 25, 2025 at 4:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Apr 25, 2025 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
What I'm concerned about is the back branches. With this approach all
back branches will have such degradations and it doesn't make sense to
me to optimize SnapBuildCommitTxn() codes in back branches.
One possibility could be that instead of maintaining an entire
snapshot in fast_forward mode, we can maintain snapshot's xmin in each
ReorderBufferTXN. But then also, how would we get the minimum
txns_by_base_snapshot_lsn as we are getting now in
ReorderBufferGetOldestXmin? I think we need to traverse the entire
list of txns to get it in fast_forward mode but that may not show up
because it will not be done for each transaction. We can try such a
thing, but it won't be clean to have fast_forward specific code and
also it would be better to add such things only for HEAD.
Agreed.
Can you think of any better ideas?
No idea. Hmm, there seems no reasonable way to fix this issue for back
branches. I consented to the view that these costs were something that
we should have paid from the beginning.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Sat, Apr 26, 2025 at 12:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Apr 25, 2025 at 4:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Apr 25, 2025 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
What I'm concerned about is the back branches. With this approach all
back branches will have such degradations and it doesn't make sense to
me to optimize SnapBuildCommitTxn() codes in back branches.
One possibility could be that instead of maintaining an entire
snapshot in fast_forward mode, we can maintain snapshot's xmin in each
ReorderBufferTXN. But then also, how would we get the minimum
txns_by_base_snapshot_lsn as we are getting now in
ReorderBufferGetOldestXmin? I think we need to traverse the entire
list of txns to get it in fast_forward mode but that may not show up
because it will not be done for each transaction. We can try such a
thing, but it won't be clean to have fast_forward specific code and
also it would be better to add such things only for HEAD.
Agreed.
We may need to consider handling xmin in SnapBuildCommitTxn(), where
we also set the base snapshot.
Can you think of any better ideas?
No idea. Hmm, there seems no reasonable way to fix this issue for back
branches. I consented to the view that these costs were something that
we should have paid from the beginning.
Right, I feel we should go with the simple change proposed by Hou-San
for now to fix the bug. If, in the future, we encounter any cases
where such optimizations can help for fast-forward mode, then we can
consider it. Does that sound reasonable to you?
--
With Regards,
Amit Kapila.
On Sat, Apr 26, 2025 at 5:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Apr 26, 2025 at 12:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Apr 25, 2025 at 4:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
Can you think of any better ideas?
No idea. Hmm, there seems no reasonable way to fix this issue for back
branches. I consented to the view that these costs were something that
we should have paid from the beginning.
Right, I feel we should go with the simple change proposed by Hou-San
for now to fix the bug. If, in the future, we encounter any cases
where such optimizations can help for fast-forward mode, then we can
consider it. Does that sound reasonable to you?
Yes, agreed with this approach.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Sun, Apr 27, 2025 at 2:07 PM Masahiko Sawada wrote:
On Sat, Apr 26, 2025 at 5:01 AM Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Sat, Apr 26, 2025 at 12:01 AM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:
On Fri, Apr 25, 2025 at 4:42 AM Amit Kapila <amit.kapila16@gmail.com>
wrote:
Can you think of any better ideas?
No idea. Hmm, there seems no reasonable way to fix this issue for
back branches. I consented to the view that these costs were
something that we should have paid from the beginning.
Right, I feel we should go with the simple change proposed by Hou-San
for now to fix the bug. If, in the future, we encounter any cases
where such optimizations can help for fast-forward mode, then we can
consider it. Does that sound reasonable to you?
Yes, agreed with this approach.
+1. Thanks for the discussion!
Here is the V2 patch (including both HEAD and back-branch versions), which
incorporates Amit's suggestions for the comments. It passes the regression
tests and a pgindent check.
I also adjusted the commit message to mention the commit f49a80c4
as suggested by Amit.
Best Regards,
Hou zj
Attachments:
v2-0001-HEAD-PG17-Fix-premature-xmin-advancement-during-fast-forwar.patch
From 60dde809b654b4f4ae87200cd427372df2fd3521 Mon Sep 17 00:00:00 2001
From: Zhijie Hou <houzj.fnst@cn.fujitsu.com>
Date: Sun, 27 Apr 2025 14:48:06 +0800
Subject: [PATCH v2] Fix premature xmin advancement during fast forward
decoding
Currently, fast-forward decoding could prematurely advance catalog_xmin,
resulting in required catalog data being removed by vacuum.
Commit f49a80c4 fixed a similar issue where the logical slot's catalog_xmin was
advanced prematurely during non-fast-forward mode, which involved using the
minimum transaction ID remaining visible in the base snapshot when determining
candidates for catalog_xmin. However, this fix did not apply to fast-forward
mode, as a base snapshot is not built when decoding changes in this mode.
Consequently, catalog_xmin was directly advanced to the oldest running
transaction ID found in the latest running_xacts record.
To resolve this, the commit allows the base snapshot to be built during fast
forward decoding.
---
.../test_decoding/expected/oldest_xmin.out | 41 +++++++++++++++++++
contrib/test_decoding/specs/oldest_xmin.spec | 5 +++
src/backend/replication/logical/decode.c | 37 +++++++++++------
3 files changed, 71 insertions(+), 12 deletions(-)
diff --git a/contrib/test_decoding/expected/oldest_xmin.out b/contrib/test_decoding/expected/oldest_xmin.out
index dd6053f9c1f..57268b38d33 100644
--- a/contrib/test_decoding/expected/oldest_xmin.out
+++ b/contrib/test_decoding/expected/oldest_xmin.out
@@ -38,3 +38,44 @@ COMMIT
stop
(1 row)
+
+starting permutation: s0_begin s0_getxid s1_begin s1_insert s0_alter s0_commit s0_checkpoint s0_advance_slot s0_advance_slot s1_commit s0_vacuum s0_get_changes
+step s0_begin: BEGIN;
+step s0_getxid: SELECT pg_current_xact_id() IS NULL;
+?column?
+--------
+f
+(1 row)
+
+step s1_begin: BEGIN;
+step s1_insert: INSERT INTO harvest VALUES ((1, 2, 3));
+step s0_alter: ALTER TYPE basket DROP ATTRIBUTE mangos;
+step s0_commit: COMMIT;
+step s0_checkpoint: CHECKPOINT;
+step s0_advance_slot: SELECT slot_name FROM pg_replication_slot_advance('isolation_slot', pg_current_wal_lsn());
+slot_name
+--------------
+isolation_slot
+(1 row)
+
+step s0_advance_slot: SELECT slot_name FROM pg_replication_slot_advance('isolation_slot', pg_current_wal_lsn());
+slot_name
+--------------
+isolation_slot
+(1 row)
+
+step s1_commit: COMMIT;
+step s0_vacuum: VACUUM pg_attribute;
+step s0_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data
+------------------------------------------------------
+BEGIN
+table public.harvest: INSERT: fruits[basket]:'(1,2,3)'
+COMMIT
+(3 rows)
+
+?column?
+--------
+stop
+(1 row)
+
diff --git a/contrib/test_decoding/specs/oldest_xmin.spec b/contrib/test_decoding/specs/oldest_xmin.spec
index 88bd30f5ff7..7f2fe3d7ed7 100644
--- a/contrib/test_decoding/specs/oldest_xmin.spec
+++ b/contrib/test_decoding/specs/oldest_xmin.spec
@@ -25,6 +25,7 @@ step "s0_commit" { COMMIT; }
step "s0_checkpoint" { CHECKPOINT; }
step "s0_vacuum" { VACUUM pg_attribute; }
step "s0_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+step "s0_advance_slot" { SELECT slot_name FROM pg_replication_slot_advance('isolation_slot', pg_current_wal_lsn()); }
session "s1"
setup { SET synchronous_commit=on; }
@@ -40,3 +41,7 @@ step "s1_commit" { COMMIT; }
# will be removed (xmax set) before T1 commits. That is, interlocking doesn't
# forbid modifying catalog after someone read it (and didn't commit yet).
permutation "s0_begin" "s0_getxid" "s1_begin" "s1_insert" "s0_alter" "s0_commit" "s0_checkpoint" "s0_get_changes" "s0_get_changes" "s1_commit" "s0_vacuum" "s0_get_changes"
+
+# Perform the same testing process as described above, but use advance_slot to
+# force xmin advancement during fast forward decoding.
+permutation "s0_begin" "s0_getxid" "s1_begin" "s1_insert" "s0_alter" "s0_commit" "s0_checkpoint" "s0_advance_slot" "s0_advance_slot" "s1_commit" "s0_vacuum" "s0_get_changes"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 78f9a0a11c4..cc03f0706e9 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -412,19 +412,24 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
/*
* If we don't have snapshot or we are just fast-forwarding, there is no
- * point in decoding changes.
+ * point in decoding data changes. However, it's crucial to build the base
+ * snapshot during fast-forward mode (as is done in
+ * SnapBuildProcessChange()) because we require the snapshot's xmin when
+ * determining the candidate catalog_xmin for the replication slot. See
+ * SnapBuildProcessRunningXacts().
*/
- if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
- ctx->fast_forward)
+ if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
return;
switch (info)
{
case XLOG_HEAP2_MULTI_INSERT:
- if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeMultiInsert(ctx, buf);
break;
case XLOG_HEAP2_NEW_CID:
+ if (!ctx->fast_forward)
{
xl_heap_new_cid *xlrec;
@@ -471,16 +476,20 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
/*
* If we don't have snapshot or we are just fast-forwarding, there is no
- * point in decoding data changes.
+ * point in decoding data changes. However, it's crucial to build the base
+ * snapshot during fast-forward mode (as is done in
+ * SnapBuildProcessChange()) because we require the snapshot's xmin when
+ * determining the candidate catalog_xmin for the replication slot. See
+ * SnapBuildProcessRunningXacts().
*/
- if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
- ctx->fast_forward)
+ if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
return;
switch (info)
{
case XLOG_HEAP_INSERT:
- if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeInsert(ctx, buf);
break;
@@ -491,17 +500,20 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
*/
case XLOG_HEAP_HOT_UPDATE:
case XLOG_HEAP_UPDATE:
- if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeUpdate(ctx, buf);
break;
case XLOG_HEAP_DELETE:
- if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeDelete(ctx, buf);
break;
case XLOG_HEAP_TRUNCATE:
- if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeTruncate(ctx, buf);
break;
@@ -525,7 +537,8 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
break;
case XLOG_HEAP_CONFIRM:
- if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeSpecConfirm(ctx, buf);
break;
--
2.30.0.windows.2
v2-0001-PG13-16-Fix-premature-xmin-advancement-during-fast-forwar.patch.txt
From 16ae28e3b3b4c10bd82ad679194325eb56aa3965 Mon Sep 17 00:00:00 2001
From: Zhijie Hou <houzj.fnst@cn.fujitsu.com>
Date: Sun, 27 Apr 2025 14:53:35 +0800
Subject: [PATCH v2] Fix premature xmin advancement during fast forward
decoding
Currently, fast-forward decoding could prematurely advance catalog_xmin,
resulting in required catalog data being removed by vacuum.
Commit f49a80c4 fixed a similar issue where the logical slot's catalog_xmin was
advanced prematurely during non-fast-forward mode, which involved using the
minimum transaction ID remaining visible in the base snapshot when determining
candidates for catalog_xmin. However, this fix did not apply to fast-forward
mode, as a base snapshot is not built when decoding changes in this mode.
Consequently, catalog_xmin was directly advanced to the oldest running
transaction ID found in the latest running_xacts record.
To resolve this, the commit allows the base snapshot to be built during fast
forward decoding.
---
.../test_decoding/expected/oldest_xmin.out | 41 +++++++++++++++++++
contrib/test_decoding/specs/oldest_xmin.spec | 5 +++
src/backend/replication/logical/decode.c | 38 +++++++++++------
3 files changed, 71 insertions(+), 13 deletions(-)
diff --git a/contrib/test_decoding/expected/oldest_xmin.out b/contrib/test_decoding/expected/oldest_xmin.out
index dd6053f9c1f..57268b38d33 100644
--- a/contrib/test_decoding/expected/oldest_xmin.out
+++ b/contrib/test_decoding/expected/oldest_xmin.out
@@ -38,3 +38,44 @@ COMMIT
stop
(1 row)
+
+starting permutation: s0_begin s0_getxid s1_begin s1_insert s0_alter s0_commit s0_checkpoint s0_advance_slot s0_advance_slot s1_commit s0_vacuum s0_get_changes
+step s0_begin: BEGIN;
+step s0_getxid: SELECT pg_current_xact_id() IS NULL;
+?column?
+--------
+f
+(1 row)
+
+step s1_begin: BEGIN;
+step s1_insert: INSERT INTO harvest VALUES ((1, 2, 3));
+step s0_alter: ALTER TYPE basket DROP ATTRIBUTE mangos;
+step s0_commit: COMMIT;
+step s0_checkpoint: CHECKPOINT;
+step s0_advance_slot: SELECT slot_name FROM pg_replication_slot_advance('isolation_slot', pg_current_wal_lsn());
+slot_name
+--------------
+isolation_slot
+(1 row)
+
+step s0_advance_slot: SELECT slot_name FROM pg_replication_slot_advance('isolation_slot', pg_current_wal_lsn());
+slot_name
+--------------
+isolation_slot
+(1 row)
+
+step s1_commit: COMMIT;
+step s0_vacuum: VACUUM pg_attribute;
+step s0_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data
+------------------------------------------------------
+BEGIN
+table public.harvest: INSERT: fruits[basket]:'(1,2,3)'
+COMMIT
+(3 rows)
+
+?column?
+--------
+stop
+(1 row)
+
diff --git a/contrib/test_decoding/specs/oldest_xmin.spec b/contrib/test_decoding/specs/oldest_xmin.spec
index 88bd30f5ff7..7f2fe3d7ed7 100644
--- a/contrib/test_decoding/specs/oldest_xmin.spec
+++ b/contrib/test_decoding/specs/oldest_xmin.spec
@@ -25,6 +25,7 @@ step "s0_commit" { COMMIT; }
step "s0_checkpoint" { CHECKPOINT; }
step "s0_vacuum" { VACUUM pg_attribute; }
step "s0_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+step "s0_advance_slot" { SELECT slot_name FROM pg_replication_slot_advance('isolation_slot', pg_current_wal_lsn()); }
session "s1"
setup { SET synchronous_commit=on; }
@@ -40,3 +41,7 @@ step "s1_commit" { COMMIT; }
# will be removed (xmax set) before T1 commits. That is, interlocking doesn't
# forbid modifying catalog after someone read it (and didn't commit yet).
permutation "s0_begin" "s0_getxid" "s1_begin" "s1_insert" "s0_alter" "s0_commit" "s0_checkpoint" "s0_get_changes" "s0_get_changes" "s1_commit" "s0_vacuum" "s0_get_changes"
+
+# Perform the same testing process as described above, but use advance_slot to
+# force xmin advancement during fast forward decoding.
+permutation "s0_begin" "s0_getxid" "s1_begin" "s1_insert" "s0_alter" "s0_commit" "s0_checkpoint" "s0_advance_slot" "s0_advance_slot" "s1_commit" "s0_vacuum" "s0_get_changes"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 297eb11b5a8..9c2f1f17c7f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -362,20 +362,24 @@ DecodeHeap2Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
/*
* If we don't have snapshot or we are just fast-forwarding, there is no
- * point in decoding changes.
+ * point in decoding data changes. However, it's crucial to build the base
+ * snapshot during fast-forward mode (as is done in
+ * SnapBuildProcessChange()) because we require the snapshot's xmin when
+ * determining the candidate catalog_xmin for the replication slot. See
+ * SnapBuildProcessRunningXacts().
*/
- if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
- ctx->fast_forward)
+ if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
return;
switch (info)
{
case XLOG_HEAP2_MULTI_INSERT:
- if (!ctx->fast_forward &&
- SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeMultiInsert(ctx, buf);
break;
case XLOG_HEAP2_NEW_CID:
+ if (!ctx->fast_forward)
{
xl_heap_new_cid *xlrec;
@@ -422,16 +426,20 @@ DecodeHeapOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
/*
* If we don't have snapshot or we are just fast-forwarding, there is no
- * point in decoding data changes.
+ * point in decoding data changes. However, it's crucial to build the base
+ * snapshot during fast-forward mode (as is done in
+ * SnapBuildProcessChange()) because we require the snapshot's xmin when
+ * determining the candidate catalog_xmin for the replication slot. See
+ * SnapBuildProcessRunningXacts().
*/
- if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
- ctx->fast_forward)
+ if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
return;
switch (info)
{
case XLOG_HEAP_INSERT:
- if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeInsert(ctx, buf);
break;
@@ -442,17 +450,20 @@ DecodeHeapOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
*/
case XLOG_HEAP_HOT_UPDATE:
case XLOG_HEAP_UPDATE:
- if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeUpdate(ctx, buf);
break;
case XLOG_HEAP_DELETE:
- if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeDelete(ctx, buf);
break;
case XLOG_HEAP_TRUNCATE:
- if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeTruncate(ctx, buf);
break;
@@ -480,7 +491,8 @@ DecodeHeapOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
break;
case XLOG_HEAP_CONFIRM:
- if (SnapBuildProcessChange(builder, xid, buf->origptr))
+ if (SnapBuildProcessChange(builder, xid, buf->origptr) &&
+ !ctx->fast_forward)
DecodeSpecConfirm(ctx, buf);
break;
--
2.30.0.windows.2
On Sun, Apr 27, 2025 at 12:33 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
I also adjusted the commit message to mention the commit f49a80c4
as suggested by Amit.
Pushed.
--
With Regards,
Amit Kapila.
On Mon, Apr 28, 2025 at 2:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sun, Apr 27, 2025 at 12:33 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
I also adjusted the commit message to mention the commit f49a80c4
as suggested by Amit.
Pushed.
Thank you!
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com