Fix slot synchronization with two_phase decoding enabled

Started by Zhijie Hou (Fujitsu)10 months ago100 messages
#1Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
3 attachment(s)

Hi,

When testing the slot synchronization with logical replication slots that
enabled two_phase decoding, I found that transactions prepared before two-phase
decoding is enabled may fail to replicate to the subscriber after being
committed on a promoted standby following a failover.

To reproduce this issue, please follow these steps (also detailed in the
attached TAP test, v1-0001):

1. sub: create a subscription with (two_phase = false)
2. primary (pub): prepare a txn A.
3. sub: alter subscription set (two_phase = true) and wait for the logical slot to
be synced to standby.
4. primary (pub): stop primary, promote the standby and let the subscriber use
the promoted standby as publisher.
5. promoted standby (pub): COMMIT PREPARED A;
6. sub: the apply worker will report the following ERROR because it didn't
receive the PREPARE.
ERROR: prepared transaction with identifier "pg_gid_16387_752" does not exist

I think the root cause of this issue is that the two_phase_at field of the
slot, which indicates the LSN from which two-phase decoding is enabled (used to
prevent duplicate data transmission for prepared transactions), is not
synchronized to the standby server.

In step 3, transaction A is not immediately replicated because it occurred
before enabling two-phase decoding. Thus, the prepared transaction should only
be replicated after decoding the final COMMIT PREPARED, as referenced in
ReorderBufferFinishPrepared(). However, due to the invalid two_phase_at on the
standby, the prepared transaction fails to send at that time.

This problem arises after the support for altering the two-phase option
(1462aad). Previously, two-phase was only enabled during slot creation, which
wait for all prepared transactions to finish (via ... -> SnapBuildWaitSnapshot)
before reaching a consistent state, so the bug didn't exist.

To address the issue, I propose synchronizing the two_phase_at field to the
standby server, as implemented in the attached patches. As mentioned
earlier,this bug exists only for PG18 so we do not need to back patch.

v1-0001: Tap test to reproduce the issue

I place this patch as the first one so that reviewers can run it
independently to reproduce the issue. Once the problem is thoroughly
understood and the fix proves stable, the patches can be integrated.

v1-0002: Display two_phase_at in the pg_replication_slots view
v1-0003: Sync the two_phase_at field of a replication slot to the standby

An alternative approach might be modifying ALTER_REPLICATION_SLOT to wait
for all prepared transactions to commit when enabling two-phase. However, this
appears inelegant and less user friendly.

Best Regards,
Hou zj

Attachments:

v1-0001-Tap-test-to-reproduce-the-issue.patchapplication/octet-stream; name=v1-0001-Tap-test-to-reproduce-the-issue.patchDownload
From 81f47cfcf146dd2a99c3d6b2e1f74de3dcdc5b85 Mon Sep 17 00:00:00 2001
From: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Date: Thu, 27 Feb 2025 10:49:32 +0800
Subject: [PATCH v1 1/3] Tap test to reproduce the issue

This patch set fixes a bug for slot synchronization with logical replication
slots that enabled two_phase decoding. As it stands, transactions prepared
before two-phase decoding is enabled may fail to replicate to the subscriber
after being committed on a promoted standby following a failover.

The issue arises because the two_phase_at field of a slot, which tracks the LSN
from which two-phase decoding starts, is not synchronized to standby servers.
Without this field, the logical decoding might incorrectly identify prepared
transaction as already replicated to the subscriber, causing them to be
skipped.

To address the issue, this patch set makes the two_phase_at field of the slot
visible in the pg_replication_slots view and enables the slot synchronization
to copy this value to the corresponding synced slot on the standby server.
---
 .../t/040_standby_failover_slots_sync.pl      | 68 ++++++++++++++++++-
 1 file changed, 66 insertions(+), 2 deletions(-)

diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 50388a494d6..692b746ccd9 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -22,7 +22,11 @@ $publisher->init(
 # Disable autovacuum to avoid generating xid during stats update as otherwise
 # the new XID could then be replicated to standby at some random point making
 # slots at primary lag behind standby during slot sync.
-$publisher->append_conf('postgresql.conf', 'autovacuum = off');
+$publisher->append_conf(
+	'postgresql.conf', qq{
+autovacuum = off
+max_prepared_transactions = 1
+});
 $publisher->start;
 
 $publisher->safe_psql('postgres',
@@ -33,6 +37,7 @@ my $publisher_connstr = $publisher->connstr . ' dbname=postgres';
 # Create a subscriber node, wait for sync to complete
 my $subscriber1 = PostgreSQL::Test::Cluster->new('subscriber1');
 $subscriber1->init;
+$subscriber1->append_conf('postgresql.conf', 'max_prepared_transactions = 1');
 $subscriber1->start;
 
 # Capture the time before the logical failover slot is created on the
@@ -830,13 +835,63 @@ $primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots',
 	"'sb1_slot'");
 $primary->reload;
 
+##################################################
+# Test the synchronization of the two_phase setting for a subscription with the
+# standby. Additionally, prepare a transaction before enabling the two_phase
+# option; subsequent tests will verify if it can be correctly replicated to the
+# subscriber after committing it on the promoted standby.
+##################################################
+
+$standby1->start;
+
+# Prepare a transaction
+$primary->safe_psql(
+	'postgres', qq[
+	BEGIN;
+	INSERT INTO tab_int values(0);
+	PREPARE TRANSACTION 'test_twophase_slotsync';
+]);
+
+$primary->wait_for_replay_catchup($standby1);
+$primary->wait_for_catchup('regress_mysub1');
+
+# Disable the subscription to allow changing the two_phase option.
+$subscriber1->safe_psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub1 DISABLE");
+
+# Wait for the replication slot to become inactive on the publisher
+$primary->poll_query_until(
+	'postgres',
+	"SELECT COUNT(*) FROM pg_catalog.pg_replication_slots WHERE slot_name = 'lsub1_slot' AND active='f'",
+	1);
+
+# Set two_phase to true and enable the subscription
+$subscriber1->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_mysub1 SET (two_phase = true);
+	ALTER SUBSCRIPTION regress_mysub1 ENABLE;
+]);
+
+# Confirm that two_phase setting of lsub1_slot slot is synced to the standby
+ok( $standby1->poll_query_until(
+		'postgres',
+		"SELECT two_phase from pg_replication_slots WHERE slot_name = 'lsub1_slot' AND synced AND NOT temporary;"
+	),
+	'two_phase setting of slot lsub1_slot synced to standby');
+
+# Confirm that the prepared transaction is not yet replicated to the
+# subscriber.
+$result = $subscriber1->safe_psql('postgres',
+	"SELECT count(*) = 0 FROM pg_prepared_xacts;");
+is($result, 't',
+	"the prepared transaction is not replicated to the subscriber");
+
 ##################################################
 # Promote the standby1 to primary. Confirm that:
 # a) the slot 'lsub1_slot' and 'snap_test_slot' are retained on the new primary
 # b) logical replication for regress_mysub1 is resumed successfully after failover
 # c) changes can be consumed from the synced slot 'snap_test_slot'
 ##################################################
-$standby1->start;
 $primary->wait_for_replay_catchup($standby1);
 
 # Capture the time before the standby is promoted
@@ -885,6 +940,15 @@ $standby1->wait_for_catchup('regress_mysub1');
 is($subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}),
 	"20", 'data replicated from the new primary');
 
+# Commit the prepared transaction
+$standby1->safe_psql('postgres',
+	"COMMIT PREPARED 'test_twophase_slotsync';");
+$standby1->wait_for_catchup('regress_mysub1');
+
+# Confirm that the prepared transaction is replicated to the subscriber
+is($subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}),
+	"21", 'prepared data replicated from the new primary');
+
 # Consume the data from the snap_test_slot. The synced slot should reach a
 # consistent point by restoring the snapshot at the restart_lsn serialized
 # during slot synchronization.
-- 
2.30.0.windows.2

v1-0003-Sync-the-two_phase_at-field-of-a-replication-slot.patchapplication/octet-stream; name=v1-0003-Sync-the-two_phase_at-field-of-a-replication-slot.patchDownload
From 4eb7cfcca3e93ba8caba37eb1c3415ef9a764797 Mon Sep 17 00:00:00 2001
From: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Date: Thu, 27 Feb 2025 13:32:35 +0800
Subject: [PATCH v1 3/3] Sync the two_phase_at field of a replication slot to
 the standby

This patch enables slot synchronization to copy this value of two_phase_at field
to the corresponding synced slot on the standby server.
---
 src/backend/replication/logical/slotsync.c        | 15 +++++++++++----
 .../recovery/t/040_standby_failover_slots_sync.pl |  8 +++++++-
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 2c0a7439be4..a2bf89d1ba6 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -139,6 +139,7 @@ typedef struct RemoteSlot
 	bool		failover;
 	XLogRecPtr	restart_lsn;
 	XLogRecPtr	confirmed_lsn;
+	XLogRecPtr	two_phase_at;
 	TransactionId catalog_xmin;
 
 	/* RS_INVAL_NONE if valid, or the reason of invalidation */
@@ -249,6 +250,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 			SpinLockAcquire(&slot->mutex);
 			slot->data.restart_lsn = remote_slot->restart_lsn;
 			slot->data.confirmed_flush = remote_slot->confirmed_lsn;
+			slot->data.two_phase_at = remote_slot->two_phase_at;
 			slot->data.catalog_xmin = remote_slot->catalog_xmin;
 			SpinLockRelease(&slot->mutex);
 
@@ -276,7 +278,8 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 	if (remote_dbid != slot->data.database ||
 		remote_slot->two_phase != slot->data.two_phase ||
 		remote_slot->failover != slot->data.failover ||
-		strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) != 0)
+		strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) != 0 ||
+		remote_slot->two_phase_at != slot->data.two_phase_at)
 	{
 		NameData	plugin_name;
 
@@ -287,6 +290,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 		slot->data.plugin = plugin_name;
 		slot->data.database = remote_dbid;
 		slot->data.two_phase = remote_slot->two_phase;
+		slot->data.two_phase_at = remote_slot->two_phase_at;
 		slot->data.failover = remote_slot->failover;
 		SpinLockRelease(&slot->mutex);
 
@@ -788,9 +792,9 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 static bool
 synchronize_slots(WalReceiverConn *wrconn)
 {
-#define SLOTSYNC_COLUMN_COUNT 9
+#define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
-	LSNOID, XIDOID, BOOLOID, BOOLOID, TEXTOID, TEXTOID};
+	LSNOID, XIDOID, BOOLOID, LSNOID, BOOLOID, TEXTOID, TEXTOID};
 
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
@@ -798,7 +802,7 @@ synchronize_slots(WalReceiverConn *wrconn)
 	bool		some_slot_updated = false;
 	bool		started_tx = false;
 	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, failover,"
+		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
 		" database, invalidation_reason"
 		" FROM pg_catalog.pg_replication_slots"
 		" WHERE failover and NOT temporary";
@@ -853,6 +857,9 @@ synchronize_slots(WalReceiverConn *wrconn)
 														   &isnull));
 		Assert(!isnull);
 
+		d = slot_getattr(tupslot, ++col, &isnull);
+		remote_slot->two_phase_at = isnull ? InvalidXLogRecPtr : DatumGetLSN(d);
+
 		remote_slot->failover = DatumGetBool(slot_getattr(tupslot, ++col,
 														  &isnull));
 		Assert(!isnull);
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 692b746ccd9..160e349b967 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -872,10 +872,16 @@ $subscriber1->safe_psql(
 	ALTER SUBSCRIPTION regress_mysub1 ENABLE;
 ]);
 
+$primary->wait_for_catchup('regress_mysub1');
+
+my $two_phase_at = $primary->safe_psql('postgres',
+	"SELECT two_phase_at from pg_replication_slots WHERE slot_name = 'lsub1_slot';"
+);
+
 # Confirm that two_phase setting of lsub1_slot slot is synced to the standby
 ok( $standby1->poll_query_until(
 		'postgres',
-		"SELECT two_phase from pg_replication_slots WHERE slot_name = 'lsub1_slot' AND synced AND NOT temporary;"
+		"SELECT two_phase AND '$two_phase_at' = two_phase_at from pg_replication_slots WHERE slot_name = 'lsub1_slot' AND synced AND NOT temporary;"
 	),
 	'two_phase setting of slot lsub1_slot synced to standby');
 
-- 
2.30.0.windows.2

v1-0002-Display-two_phase_at-in-the-pg_replication_slots-.patchapplication/octet-stream; name=v1-0002-Display-two_phase_at-in-the-pg_replication_slots-.patchDownload
From 0028f7ef8a731f423476b2e92e2723912a43f8ba Mon Sep 17 00:00:00 2001
From: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Date: Fri, 28 Feb 2025 13:09:36 +0800
Subject: [PATCH v1 2/3] Display two_phase_at in the pg_replication_slots view

This patch makes the two_phase_at field of the slot visible in the
pg_replication_slots view.
---
 doc/src/sgml/system-views.sgml       | 11 +++++++++++
 src/backend/catalog/system_views.sql |  1 +
 src/backend/replication/slotfuncs.c  |  8 +++++++-
 src/include/catalog/pg_proc.dat      |  6 +++---
 src/test/regress/expected/rules.out  |  3 ++-
 5 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 3f5a306247e..141c140331d 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -2560,6 +2560,17 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>two_phase_at</structfield> <type>pg_lsn</type>
+      </para>
+      <para>
+       The address (<literal>LSN</literal>) from which the decoding of prepared
+       transactions is enabled. Always <literal>NULL</literal> for physical
+       slots.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>inactive_since</structfield> <type>timestamptz</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a4d2cfdcaf5..9cd6bac6e1d 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1025,6 +1025,7 @@ CREATE VIEW pg_replication_slots AS
             L.wal_status,
             L.safe_wal_size,
             L.two_phase,
+            L.two_phase_at,
             L.inactive_since,
             L.conflicting,
             L.invalidation_reason,
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 146eef5871e..8a314b5ff3b 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -235,7 +235,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
 Datum
 pg_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_GET_REPLICATION_SLOTS_COLS 19
+#define PG_GET_REPLICATION_SLOTS_COLS 20
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	XLogRecPtr	currlsn;
 	int			slotno;
@@ -406,6 +406,12 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 
 		values[i++] = BoolGetDatum(slot_contents.data.two_phase);
 
+		if (slot_contents.data.two_phase &&
+			!XLogRecPtrIsInvalid(slot_contents.data.two_phase_at))
+			values[i++] = LSNGetDatum(slot_contents.data.two_phase_at);
+		else
+			nulls[i++] = true;
+
 		if (slot_contents.inactive_since > 0)
 			values[i++] = TimestampTzGetDatum(slot_contents.inactive_since);
 		else
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cd9422d0bac..3c483a067aa 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11366,9 +11366,9 @@
   proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', prorettype => 'record',
   proargtypes => '',
-  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,timestamptz,bool,text,bool,bool}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,inactive_since,conflicting,invalidation_reason,failover,synced}',
+  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
   prosrc => 'pg_get_replication_slots' },
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 62f69ac20b2..7803318bee5 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1474,12 +1474,13 @@ pg_replication_slots| SELECT l.slot_name,
     l.wal_status,
     l.safe_wal_size,
     l.two_phase,
+    l.two_phase_at,
     l.inactive_since,
     l.conflicting,
     l.invalidation_reason,
     l.failover,
     l.synced
-   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, inactive_since, conflicting, invalidation_reason, failover, synced)
+   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
      LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
 pg_roles| SELECT pg_authid.rolname,
     pg_authid.rolsuper,
-- 
2.30.0.windows.2

#2Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#1)
Re: Fix slot synchronization with two_phase decoding enabled

On Tue, Mar 25, 2025 at 11:05 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Hi,

When testing the slot synchronization with logical replication slots that
enabled two_phase decoding, I found that transactions prepared before two-phase
decoding is enabled may fail to replicate to the subscriber after being
committed on a promoted standby following a failover.

To reproduce this issue, please follow these steps (also detailed in the
attached TAP test, v1-0001):

1. sub: create a subscription with (two_phase = false)
2. primary (pub): prepare a txn A.
3. sub: alter subscription set (two_phase = true) and wait for the logical slot to
be synced to standby.
4. primary (pub): stop primary, promote the standby and let the subscriber use
the promoted standby as publisher.
5. promoted standby (pub): COMMIT PREPARED A;
6. sub: the apply worker will report the following ERROR because it didn't
receive the PREPARE.
ERROR: prepared transaction with identifier "pg_gid_16387_752" does not exist

I think the root cause of this issue is that the two_phase_at field of the
slot, which indicates the LSN from which two-phase decoding is enabled (used to
prevent duplicate data transmission for prepared transactions), is not
synchronized to the standby server.

In step 3, transaction A is not immediately replicated because it occurred
before enabling two-phase decoding. Thus, the prepared transaction should only
be replicated after decoding the final COMMIT PREPARED, as referenced in
ReorderBufferFinishPrepared(). However, due to the invalid two_phase_at on the
standby, the prepared transaction fails to send at that time.

This problem arises after the support for altering the two-phase option
(1462aad).

Thanks for the report and patch. I'll look into it.

--
With Regards,
Amit Kapila.

#3Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#2)
Re: Fix slot synchronization with two_phase decoding enabled

On Tue, Mar 25, 2025 at 12:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Mar 25, 2025 at 11:05 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Hi,

When testing the slot synchronization with logical replication slots that
enabled two_phase decoding, I found that transactions prepared before two-phase
decoding is enabled may fail to replicate to the subscriber after being
committed on a promoted standby following a failover.

To reproduce this issue, please follow these steps (also detailed in the
attached TAP test, v1-0001):

1. sub: create a subscription with (two_phase = false)
2. primary (pub): prepare a txn A.
3. sub: alter subscription set (two_phase = true) and wait for the logical slot to
be synced to standby.
4. primary (pub): stop primary, promote the standby and let the subscriber use
the promoted standby as publisher.
5. promoted standby (pub): COMMIT PREPARED A;
6. sub: the apply worker will report the following ERROR because it didn't
receive the PREPARE.
ERROR: prepared transaction with identifier "pg_gid_16387_752" does not exist

I think the root cause of this issue is that the two_phase_at field of the
slot, which indicates the LSN from which two-phase decoding is enabled (used to
prevent duplicate data transmission for prepared transactions), is not
synchronized to the standby server.

In step 3, transaction A is not immediately replicated because it occurred
before enabling two-phase decoding. Thus, the prepared transaction should only
be replicated after decoding the final COMMIT PREPARED, as referenced in
ReorderBufferFinishPrepared(). However, due to the invalid two_phase_at on the
standby, the prepared transaction fails to send at that time.

This problem arises after the support for altering the two-phase option
(1462aad).

I suspect that this can happen in PG17 as well, but I need to think
more about it to make a reproducible test case.

In the meantime, I have a few minor comments on the proposed patches:
1.
##################################################
# Promote the standby1 to primary. Confirm that:
# a) the slot 'lsub1_slot' and 'snap_test_slot' are retained on the new primary
# b) logical replication for regress_mysub1 is resumed successfully
after failover
# c) changes can be consumed from the synced slot 'snap_test_slot'
##################################################
-$standby1->start;
$primary->wait_for_replay_catchup($standby1);

# Capture the time before the standby is promoted
@@ -885,6 +940,15 @@ $standby1->wait_for_catchup('regress_mysub1');
is($subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}),
"20", 'data replicated from the new primary');

+# Commit the prepared transaction
+$standby1->safe_psql('postgres',
+ "COMMIT PREPARED 'test_twophase_slotsync';");
+$standby1->wait_for_catchup('regress_mysub1');
+
+# Confirm that the prepared transaction is replicated to the subscriber
+is($subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}),
+ "21", 'prepared data replicated from the new primary');

The commentary above this test should include information about
verifying the replication of previously prepared transactions after
promotion. Also, it would be better if confirm the commit prepared
before confirming the new Insert is replicated after promotion.

2.
@@ -249,6 +250,7 @@ update_local_synced_slot(RemoteSlot *remote_slot,
Oid remote_dbid,
  SpinLockAcquire(&slot->mutex);
  slot->data.restart_lsn = remote_slot->restart_lsn;
  slot->data.confirmed_flush = remote_slot->confirmed_lsn;
+ slot->data.two_phase_at = remote_slot->two_phase_at;

Why do we need to update the two_phase_at here when the patch does it
later in this function when local and remote values don't match?

--
With Regards,
Amit Kapila.

#4Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#3)
2 attachment(s)
RE: Fix slot synchronization with two_phase decoding enabled

On Thu, Mar 27, 2025 at 2:29 PM Amit Kapila wrote:

On Tue, Mar 25, 2025 at 12:1 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Tue, Mar 25, 2025 at 11:05 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Hi,

When testing the slot synchronization with logical replication slots that
enabled two_phase decoding, I found that transactions prepared before

two-phase

decoding is enabled may fail to replicate to the subscriber after being
committed on a promoted standby following a failover.

To reproduce this issue, please follow these steps (also detailed in the
attached TAP test, v1-0001):

1. sub: create a subscription with (two_phase = false)
2. primary (pub): prepare a txn A.
3. sub: alter subscription set (two_phase = true) and wait for the logical

slot to

be synced to standby.
4. primary (pub): stop primary, promote the standby and let the subscriber

use

the promoted standby as publisher.
5. promoted standby (pub): COMMIT PREPARED A;
6. sub: the apply worker will report the following ERROR because it didn't
receive the PREPARE.
ERROR: prepared transaction with identifier "pg_gid_16387_752"

does not exist

I think the root cause of this issue is that the two_phase_at field of the
slot, which indicates the LSN from which two-phase decoding is enabled

(used to

prevent duplicate data transmission for prepared transactions), is not
synchronized to the standby server.

In step 3, transaction A is not immediately replicated because it occurred
before enabling two-phase decoding. Thus, the prepared transaction

should only

be replicated after decoding the final COMMIT PREPARED, as referenced

in

ReorderBufferFinishPrepared(). However, due to the invalid two_phase_at

on the

standby, the prepared transaction fails to send at that time.

This problem arises after the support for altering the two-phase option
(1462aad).

I suspect that this can happen in PG17 as well, but I need to think
more about it to make a reproducible test case.

After further analysis, I was able to reproduce the same issue [1]- pub: created a slot 'sub' with two_phase=false, then prepared a transaction - pub: after some activity, advanced the confirmed_flush_lsn of 'sub', so it is greater than prepared txn lsn. - sub: create subscription with (slot_name='sub', create_slot=false, failover = true, two_phase=true, copy_data=false); two_phase_at will be set to the same as confirmed_flush_lsn which is greater than the prepared transaction. - stop the primary and promote the standby. - commit the prepared transaction on standby, the following error will be reported on subscriber: in
PG 17.

However, since the proposed fix requires catalog changes and the issue is not a
security risk significant enough to justify changing the catalog in back
branches, we cannot back-patch the same solution. Following off-list
discussions with Amit and Kuroda-san, we are considering disallowing enabling
failover and two-phase decoding together for a replication slot, as suggested
in attachment 0002.

Another idea considered is to prevent the slot that enables two-phase decoding
from being synced to standby. IOW, this means displaying the failover field as
false in the view, if there is any possibility that transactions prepared
before the two_phase_at position exist (e.g., if restart_lsn is less than
two_phase_at). However, implementing this change would require additional
explanations to users for this new behavior, which seems tricky.

In the meantime, I have a few minor comments on the proposed patches:
1.
##################################################
# Promote the standby1 to primary. Confirm that:
# a) the slot 'lsub1_slot' and 'snap_test_slot' are retained on the new primary
# b) logical replication for regress_mysub1 is resumed successfully
after failover
# c) changes can be consumed from the synced slot 'snap_test_slot'
##################################################
-$standby1->start;
$primary->wait_for_replay_catchup($standby1);

# Capture the time before the standby is promoted
@@ -885,6 +940,15 @@ $standby1->wait_for_catchup('regress_mysub1');
is($subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}),
"20", 'data replicated from the new primary');

+# Commit the prepared transaction
+$standby1->safe_psql('postgres',
+ "COMMIT PREPARED 'test_twophase_slotsync';");
+$standby1->wait_for_catchup('regress_mysub1');
+
+# Confirm that the prepared transaction is replicated to the subscriber
+is($subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}),
+ "21", 'prepared data replicated from the new primary');

The commentary above this test should include information about
verifying the replication of previously prepared transactions after
promotion. Also, it would be better if confirm the commit prepared
before confirming the new Insert is replicated after promotion.

2.
@@ -249,6 +250,7 @@ update_local_synced_slot(RemoteSlot *remote_slot,
Oid remote_dbid,
SpinLockAcquire(&slot->mutex);
slot->data.restart_lsn = remote_slot->restart_lsn;
slot->data.confirmed_flush = remote_slot->confirmed_lsn;
+ slot->data.two_phase_at = remote_slot->two_phase_at;

Why do we need to update the two_phase_at here when the patch does it
later in this function when local and remote values don't match?

Thanks for the comments, they have been addressed in V2.

[1]: - pub: created a slot 'sub' with two_phase=false, then prepared a transaction - pub: after some activity, advanced the confirmed_flush_lsn of 'sub', so it is greater than prepared txn lsn. - sub: create subscription with (slot_name='sub', create_slot=false, failover = true, two_phase=true, copy_data=false); two_phase_at will be set to the same as confirmed_flush_lsn which is greater than the prepared transaction. - stop the primary and promote the standby. - commit the prepared transaction on standby, the following error will be reported on subscriber:
- pub: created a slot 'sub' with two_phase=false, then prepared a transaction
- pub: after some activity, advanced the confirmed_flush_lsn of 'sub', so it is
greater than prepared txn lsn.
- sub: create subscription with (slot_name='sub', create_slot=false, failover =
true, two_phase=true, copy_data=false); two_phase_at will be set to the same
as confirmed_flush_lsn which is greater than the prepared transaction.
- stop the primary and promote the standby.
- commit the prepared transaction on standby, the following error will be
reported on subscriber:

LOG: logical replication apply worker for subscription "sub2" has started
ERROR: prepared transaction with identifier "pg_gid_16398_764" does not exist.

Best Regards,
Hou zj

Attachments:

v2-0001-Fix-slot-synchronization-with-two_phase-decoding-.patchapplication/octet-stream; name=v2-0001-Fix-slot-synchronization-with-two_phase-decoding-.patchDownload
From 22d48c240a3ecbce74f4b92d23b6d0b281bd6c32 Mon Sep 17 00:00:00 2001
From: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Date: Thu, 27 Feb 2025 10:49:32 +0800
Subject: [PATCH v2] Fix slot synchronization with two_phase decoding enabled

This commit fixes a bug for slot synchronization with logical replication
slots that enabled two_phase decoding. As it stands, transactions prepared
before two-phase decoding is enabled may fail to replicate to the subscriber
after being committed on a promoted standby following a failover.

The issue arises because the two_phase_at field of a slot, which tracks the LSN
from which two-phase decoding starts, is not synchronized to standby servers.
Without this field, the logical decoding might incorrectly identify prepared
transaction as already replicated to the subscriber, causing them to be
skipped.

To address the issue on HEAD, this commit makes the two_phase_at field of the slot
visible in the pg_replication_slots view and enables the slot synchronization
to copy this value to the corresponding synced slot on the standby server.

The bug has been present since the introduction of slot synchronization in
PostgreSQL 17. However, due to the need for catalog changes, backpatching this
fix is not feasible. Instead, to prevent the risk of losing prepared
transactions in prior versions, we now disallow enabling failover and two-phase
decoding together for a replication slot.

---
 doc/src/sgml/system-views.sgml                | 11 +++
 src/backend/catalog/system_views.sql          |  1 +
 src/backend/replication/logical/slotsync.c    | 14 +++-
 src/backend/replication/slotfuncs.c           |  8 +-
 src/include/catalog/pg_proc.dat               |  6 +-
 .../t/040_standby_failover_slots_sync.pl      | 81 ++++++++++++++++++-
 src/test/regress/expected/rules.out           |  3 +-
 7 files changed, 111 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 3f5a306247e..141c140331d 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -2560,6 +2560,17 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>two_phase_at</structfield> <type>pg_lsn</type>
+      </para>
+      <para>
+       The address (<literal>LSN</literal>) from which the decoding of prepared
+       transactions is enabled. Always <literal>NULL</literal> for physical
+       slots.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>inactive_since</structfield> <type>timestamptz</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 31d269b7ee0..a8fddd0183c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1025,6 +1025,7 @@ CREATE VIEW pg_replication_slots AS
             L.wal_status,
             L.safe_wal_size,
             L.two_phase,
+            L.two_phase_at,
             L.inactive_since,
             L.conflicting,
             L.invalidation_reason,
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 2c0a7439be4..e22d41891e6 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -139,6 +139,7 @@ typedef struct RemoteSlot
 	bool		failover;
 	XLogRecPtr	restart_lsn;
 	XLogRecPtr	confirmed_lsn;
+	XLogRecPtr	two_phase_at;
 	TransactionId catalog_xmin;
 
 	/* RS_INVAL_NONE if valid, or the reason of invalidation */
@@ -276,7 +277,8 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 	if (remote_dbid != slot->data.database ||
 		remote_slot->two_phase != slot->data.two_phase ||
 		remote_slot->failover != slot->data.failover ||
-		strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) != 0)
+		strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) != 0 ||
+		remote_slot->two_phase_at != slot->data.two_phase_at)
 	{
 		NameData	plugin_name;
 
@@ -287,6 +289,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 		slot->data.plugin = plugin_name;
 		slot->data.database = remote_dbid;
 		slot->data.two_phase = remote_slot->two_phase;
+		slot->data.two_phase_at = remote_slot->two_phase_at;
 		slot->data.failover = remote_slot->failover;
 		SpinLockRelease(&slot->mutex);
 
@@ -788,9 +791,9 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 static bool
 synchronize_slots(WalReceiverConn *wrconn)
 {
-#define SLOTSYNC_COLUMN_COUNT 9
+#define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
-	LSNOID, XIDOID, BOOLOID, BOOLOID, TEXTOID, TEXTOID};
+	LSNOID, XIDOID, BOOLOID, LSNOID, BOOLOID, TEXTOID, TEXTOID};
 
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
@@ -798,7 +801,7 @@ synchronize_slots(WalReceiverConn *wrconn)
 	bool		some_slot_updated = false;
 	bool		started_tx = false;
 	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, failover,"
+		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
 		" database, invalidation_reason"
 		" FROM pg_catalog.pg_replication_slots"
 		" WHERE failover and NOT temporary";
@@ -853,6 +856,9 @@ synchronize_slots(WalReceiverConn *wrconn)
 														   &isnull));
 		Assert(!isnull);
 
+		d = slot_getattr(tupslot, ++col, &isnull);
+		remote_slot->two_phase_at = isnull ? InvalidXLogRecPtr : DatumGetLSN(d);
+
 		remote_slot->failover = DatumGetBool(slot_getattr(tupslot, ++col,
 														  &isnull));
 		Assert(!isnull);
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 146eef5871e..8a314b5ff3b 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -235,7 +235,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
 Datum
 pg_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_GET_REPLICATION_SLOTS_COLS 19
+#define PG_GET_REPLICATION_SLOTS_COLS 20
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	XLogRecPtr	currlsn;
 	int			slotno;
@@ -406,6 +406,12 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 
 		values[i++] = BoolGetDatum(slot_contents.data.two_phase);
 
+		if (slot_contents.data.two_phase &&
+			!XLogRecPtrIsInvalid(slot_contents.data.two_phase_at))
+			values[i++] = LSNGetDatum(slot_contents.data.two_phase_at);
+		else
+			nulls[i++] = true;
+
 		if (slot_contents.inactive_since > 0)
 			values[i++] = TimestampTzGetDatum(slot_contents.inactive_since);
 		else
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 8b68b16d79d..a89488419a6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11409,9 +11409,9 @@
   proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', prorettype => 'record',
   proargtypes => '',
-  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,timestamptz,bool,text,bool,bool}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,inactive_since,conflicting,invalidation_reason,failover,synced}',
+  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
   prosrc => 'pg_get_replication_slots' },
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 8f65142909a..67cc6374565 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -22,7 +22,11 @@ $publisher->init(
 # Disable autovacuum to avoid generating xid during stats update as otherwise
 # the new XID could then be replicated to standby at some random point making
 # slots at primary lag behind standby during slot sync.
-$publisher->append_conf('postgresql.conf', 'autovacuum = off');
+$publisher->append_conf(
+	'postgresql.conf', qq{
+autovacuum = off
+max_prepared_transactions = 1
+});
 $publisher->start;
 
 $publisher->safe_psql('postgres',
@@ -33,6 +37,7 @@ my $publisher_connstr = $publisher->connstr . ' dbname=postgres';
 # Create a subscriber node, wait for sync to complete
 my $subscriber1 = PostgreSQL::Test::Cluster->new('subscriber1');
 $subscriber1->init;
+$subscriber1->append_conf('postgresql.conf', 'max_prepared_transactions = 1');
 $subscriber1->start;
 
 # Capture the time before the logical failover slot is created on the
@@ -830,13 +835,72 @@ $primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots',
 	"'sb1_slot'");
 $primary->reload;
 
+##################################################
+# Test the synchronization of the two_phase setting for a subscription with the
+# standby. Additionally, prepare a transaction before enabling the two_phase
+# option; subsequent tests will verify if it can be correctly replicated to the
+# subscriber after committing it on the promoted standby.
+##################################################
+
+$standby1->start;
+
+# Prepare a transaction
+$primary->safe_psql(
+	'postgres', qq[
+	BEGIN;
+	INSERT INTO tab_int values(0);
+	PREPARE TRANSACTION 'test_twophase_slotsync';
+]);
+
+$primary->wait_for_replay_catchup($standby1);
+$primary->wait_for_catchup('regress_mysub1');
+
+# Disable the subscription to allow changing the two_phase option.
+$subscriber1->safe_psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub1 DISABLE");
+
+# Wait for the replication slot to become inactive on the publisher
+$primary->poll_query_until(
+	'postgres',
+	"SELECT COUNT(*) FROM pg_catalog.pg_replication_slots WHERE slot_name = 'lsub1_slot' AND active='f'",
+	1);
+
+# Set two_phase to true and enable the subscription
+$subscriber1->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_mysub1 SET (two_phase = true);
+	ALTER SUBSCRIPTION regress_mysub1 ENABLE;
+]);
+
+$primary->wait_for_catchup('regress_mysub1');
+
+my $two_phase_at = $primary->safe_psql('postgres',
+	"SELECT two_phase_at from pg_replication_slots WHERE slot_name = 'lsub1_slot';"
+);
+
+# Confirm that two_phase setting of lsub1_slot slot is synced to the standby
+ok( $standby1->poll_query_until(
+		'postgres',
+		"SELECT two_phase AND '$two_phase_at' = two_phase_at from pg_replication_slots WHERE slot_name = 'lsub1_slot' AND synced AND NOT temporary;"
+	),
+	'two_phase setting of slot lsub1_slot synced to standby');
+
+# Confirm that the prepared transaction is not yet replicated to the
+# subscriber.
+$result = $subscriber1->safe_psql('postgres',
+	"SELECT count(*) = 0 FROM pg_prepared_xacts;");
+is($result, 't',
+	"the prepared transaction is not replicated to the subscriber");
+
 ##################################################
 # Promote the standby1 to primary. Confirm that:
 # a) the slot 'lsub1_slot' and 'snap_test_slot' are retained on the new primary
 # b) logical replication for regress_mysub1 is resumed successfully after failover
-# c) changes can be consumed from the synced slot 'snap_test_slot'
+# c) changes from the transaction 'test_twophase_slotsync', which was prepared
+#    on the old primary, can be consumed from the synced slot 'snap_test_slot'
+#    once committed on the new primary.
+# d) changes can be consumed from the synced slot 'snap_test_slot'
 ##################################################
-$standby1->start;
 $primary->wait_for_replay_catchup($standby1);
 
 # Capture the time before the standby is promoted
@@ -876,6 +940,15 @@ is( $standby1->safe_psql(
 	't',
 	'synced slot retained on the new primary');
 
+# Commit the prepared transaction
+$standby1->safe_psql('postgres',
+	"COMMIT PREPARED 'test_twophase_slotsync';");
+$standby1->wait_for_catchup('regress_mysub1');
+
+# Confirm that the prepared transaction is replicated to the subscriber
+is($subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}),
+	"1", 'prepared data replicated from the new primary');
+
 # Insert data on the new primary
 $standby1->safe_psql('postgres',
 	"INSERT INTO tab_int SELECT generate_series(11, 20);");
@@ -883,7 +956,7 @@ $standby1->wait_for_catchup('regress_mysub1');
 
 # Confirm that data in tab_int replicated on the subscriber
 is($subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}),
-	"20", 'data replicated from the new primary');
+	"21", 'data replicated from the new primary');
 
 # Consume the data from the snap_test_slot. The synced slot should reach a
 # consistent point by restoring the snapshot at the restart_lsn serialized
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 47478969135..035769b4624 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1474,12 +1474,13 @@ pg_replication_slots| SELECT l.slot_name,
     l.wal_status,
     l.safe_wal_size,
     l.two_phase,
+    l.two_phase_at,
     l.inactive_since,
     l.conflicting,
     l.invalidation_reason,
     l.failover,
     l.synced
-   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, inactive_since, conflicting, invalidation_reason, failover, synced)
+   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
      LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
 pg_roles| SELECT pg_authid.rolname,
     pg_authid.rolsuper,
-- 
2.30.0.windows.2

v2-0001-PG17-Disallow-enabling-failover-for-a-replication-slot.patch.txttext/plain; name=v2-0001-PG17-Disallow-enabling-failover-for-a-replication-slot.patch.txtDownload
From 71c23ab9a87817eab41b3d31516ceddfff04ed66 Mon Sep 17 00:00:00 2001
From: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Date: Mon, 31 Mar 2025 16:28:57 +0800
Subject: [PATCH v2] Disallow enabling failover for a replication slot that
 enables two-phase decoding

This commit fixes a bug for slot synchronization with logical replication
slots that enabled two_phase decoding. As it stands, transactions prepared
before two-phase decoding is enabled may fail to replicate to the subscriber
after being committed on a promoted standby following a failover.

The issue arises because the two_phase_at field of a slot, which tracks the LSN
from which two-phase decoding starts, is not synchronized to standby servers.
Without this field, the logical decoding might incorrectly identify prepared
transaction as already replicated to the subscriber, causing them to be
skipped.

To address the issue on HEAD, this commit makes the two_phase_at field of the slot
visible in the pg_replication_slots view and enables the slot synchronization
to copy this value to the corresponding synced slot on the standby server.

The bug has been present since the introduction of slot synchronization in
PostgreSQL 17. However, due to the need for catalog changes, backpatching this
fix is not feasible. Instead, to prevent the risk of losing prepared
transactions in prior versions, we now disallow enabling failover and two-phase
decoding together for a replication slot.
---
 contrib/test_decoding/expected/slot.out    |  2 ++
 contrib/test_decoding/sql/slot.sql         |  1 +
 src/backend/commands/subscriptioncmds.c    | 11 +++++++++
 src/backend/replication/slot.c             | 27 ++++++++++++++++++++++
 src/test/regress/expected/subscription.out |  3 +++
 src/test/regress/sql/subscription.sql      |  4 ++++
 6 files changed, 48 insertions(+)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..347b9c11467 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  cannot enable failover for a replication slot that enables two-phase decoding
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..8308ccaad5a 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,17 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable failover and two_phase option together.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for
+	 * detailed reasons.
+	 */
+	if (opts.twophase && opts.failover)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable failover option when two_phase option is enabled"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 780d22afbca..5b349b3c3af 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,21 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable failover for slots that enable
+		 * two-phase decoding.
+		 *
+		 * This is because the two_phase_at field of a slot, which tracks the
+		 * LSN from which two-phase decoding starts, is not synchronized to
+		 * standby servers. Without this field, the logical decoding might
+		 * incorrectly identify prepared transaction as already replicated to
+		 * the subscriber, causing them to be skipped.
+		 */
+		if (two_phase)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot enable failover for a replication slot that enables two-phase decoding"));
 	}
 
 	/*
@@ -848,6 +863,18 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for slots that enable two-phase
+	 * decoding.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for
+	 * detailed reasons.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable failover for a replication slot that enables two-phase decoding"));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..ee3603b67ef 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,9 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  cannot enable failover option when two_phase option is enabled
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.31.1

#5Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#4)
Re: Fix slot synchronization with two_phase decoding enabled

On Mon, Mar 31, 2025 at 5:04 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Thu, Mar 27, 2025 at 2:29 PM Amit Kapila wrote:

I suspect that this can happen in PG17 as well, but I need to think
more about it to make a reproducible test case.

After further analysis, I was able to reproduce the same issue [1] in
PG 17.

However, since the proposed fix requires catalog changes and the issue is not a
security risk significant enough to justify changing the catalog in back
branches, we cannot back-patch the same solution.

Agreed. In the past, as in commit b6e39ca92e, we have backported a
catalog-modifying commit, but that is for a CVE. Users need to follow
manual steps as explained in 9.6.4 release notes [1]https://www.postgresql.org/docs/9.6/release-9-6-4.html, which would be
cumbersome for them. This is not a security issue, so we shouldn't
backpatch a catalog modifying commit following past.

Following off-list
discussions with Amit and Kuroda-san, we are considering disallowing enabling
failover and two-phase decoding together for a replication slot, as suggested
in attachment 0002.

Another idea considered is to prevent the slot that enables two-phase decoding
from being synced to standby. IOW, this means displaying the failover field as
false in the view, if there is any possibility that transactions prepared
before the two_phase_at position exist (e.g., if restart_lsn is less than
two_phase_at). However, implementing this change would require additional
explanations to users for this new behavior, which seems tricky.

I find it tricky to explain to users. We need to say that sometimes
the slots won't be synced even if the failover is set to true. Users
can verify that by checking slot properties on the publisher. Also, on
the subscriber, the failover flag in the subscription may still be
true unless we do more engineering to make it false. So, I prefer to
simply disallow setting failover and two_phase together. We need to
recommend to users in release notes for 17 that they need to disable
failover for subscriptions where two_phase is enabled or re-create the
subscriptions with two_phase=false and failover=true. Users may not
like it, but I think it is better than getting a complaint that after
promotion of standby the data to subscriber is not getting replicated.

[1]: https://www.postgresql.org/docs/9.6/release-9-6-4.html

--
With Regards,
Amit Kapila.

#6Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#4)
2 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

On Mon, Mar 31, 2025 at 5:04 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Thanks for the comments, they have been addressed in V2.

In the PG17 patch, we only have a check preventing failover and
two_phase in CreateSubscription. Don't we need a similar check for
AlterSubscription?

Apart from the above, I have a few suggestions for changes in docs,
error messages, and comments for both versions. See attached.

--
With Regards,
Amit Kapila.

Attachments:

v2-0001-Head-amit.1.patch.txttext/plain; charset=US-ASCII; name=v2-0001-Head-amit.1.patch.txtDownload
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 141c140331d..bff0d143ac5 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -2566,7 +2566,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
       </para>
       <para>
        The address (<literal>LSN</literal>) from which the decoding of prepared
-       transactions is enabled. Always <literal>NULL</literal> for physical
+       transactions is enabled. <literal>NULL</literal> for physical
        slots.
       </para></entry>
      </row>
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 67cc6374565..19273a49914 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -896,9 +896,9 @@ is($result, 't',
 # Promote the standby1 to primary. Confirm that:
 # a) the slot 'lsub1_slot' and 'snap_test_slot' are retained on the new primary
 # b) logical replication for regress_mysub1 is resumed successfully after failover
-# c) changes from the transaction 'test_twophase_slotsync', which was prepared
-#    on the old primary, can be consumed from the synced slot 'snap_test_slot'
-#    once committed on the new primary.
+# c) changes from the transaction prepared 'test_twophase_slotsync' can be
+#    consumed from the synced slot 'snap_test_slot' once committed on the new
+#    primary.
 # d) changes can be consumed from the synced slot 'snap_test_slot'
 ##################################################
 $primary->wait_for_replay_catchup($standby1);
v2-0001-PG17-amit.1.patch.txttext/plain; charset=US-ASCII; name=v2-0001-PG17-amit.1.patch.txtDownload
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 8308ccaad5a..df7ab827f7d 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -657,7 +657,7 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 	if (opts.twophase && opts.failover)
 		ereport(ERROR,
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				errmsg("cannot enable failover option when two_phase option is enabled"));
+				errmsg("failover and two_phase are mutually exclusive options"));
 
 	/*
 	 * If built with appropriate switch, whine when regression-testing
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index c1bbd16254e..da510f60f9c 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -357,7 +357,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 		if (two_phase)
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					errmsg("cannot enable failover for a replication slot that enables two-phase decoding"));
+					errmsg("failover and two_phase are mutually exclusive options"));
 	}
 
 	/*
@@ -873,7 +873,7 @@ ReplicationSlotAlter(const char *name, bool failover)
 	if (failover && MyReplicationSlot->data.two_phase)
 		ereport(ERROR,
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				errmsg("cannot enable failover for a replication slot that enables two-phase decoding"));
+				errmsg("cannot enable failover for a two-phase enabled replication slot"));
 
 	if (MyReplicationSlot->data.failover != failover)
 	{
#7Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#6)
2 attachment(s)
RE: Fix slot synchronization with two_phase decoding enabled

On Tue, Apr 1, 2025 at 2:09 PM Amit Kapila wrote:

On Mon, Mar 31, 2025 at 5:0 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Thanks for the comments, they have been addressed in V2.

In the PG17 patch, we only have a check preventing failover and
two_phase in CreateSubscription. Don't we need a similar check for AlterSubscription?

Apart from the above, I have a few suggestions for changes in docs,
error messages, and comments for both versions. See attached.

Thanks for the comments.

Here is the V3 patch set which addressed all the comments.

I also added a testcase for ALTER SUB in 0002 as suggested by
Kuroda-san off-list.

Additionally, I found that a test failed in PG17 following this
patch because it involved creating a subscription with both failover and
two-phase enabled. Since that test was designed to test the upgrade of
replication slots and is not directly related to failover functionality, I
decided to disable the failover option for test case.

Best Regards,
Hou zj

Attachments:

v3-0001-Fix-slot-synchronization-with-two_phase-decoding-.patchapplication/octet-stream; name=v3-0001-Fix-slot-synchronization-with-two_phase-decoding-.patchDownload
From cace883386387f9c4702fe0457e288aa3d5b2559 Mon Sep 17 00:00:00 2001
From: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Date: Thu, 27 Feb 2025 10:49:32 +0800
Subject: [PATCH v3] Fix slot synchronization with two_phase decoding enabled

This commit fixes a bug for slot synchronization with logical replication
slots that enabled two_phase decoding. As it stands, transactions prepared
before two-phase decoding is enabled may fail to replicate to the subscriber
after being committed on a promoted standby following a failover.

The issue arises because the two_phase_at field of a slot, which tracks the LSN
from which two-phase decoding starts, is not synchronized to standby servers.
Without this field, the logical decoding might incorrectly identify prepared
transaction as already replicated to the subscriber, causing them to be
skipped.

To address the issue on HEAD, this commit makes the two_phase_at field of the slot
visible in the pg_replication_slots view and enables the slot synchronization
to copy this value to the corresponding synced slot on the standby server.

The bug has been present since the introduction of slot synchronization in
PostgreSQL 17. However, due to the need for catalog changes, backpatching this
fix is not feasible. Instead, to prevent the risk of losing prepared
transactions in prior versions, we now disallow enabling failover and two-phase
decoding together for a replication slot.
---
 doc/src/sgml/system-views.sgml                | 11 +++
 src/backend/catalog/system_views.sql          |  1 +
 src/backend/replication/logical/slotsync.c    | 14 +++-
 src/backend/replication/slotfuncs.c           |  8 +-
 src/include/catalog/pg_proc.dat               |  6 +-
 .../t/040_standby_failover_slots_sync.pl      | 81 ++++++++++++++++++-
 src/test/regress/expected/rules.out           |  3 +-
 7 files changed, 111 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 3f5a306247e..5c9425e6d1f 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -2560,6 +2560,17 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>two_phase_at</structfield> <type>pg_lsn</type>
+      </para>
+      <para>
+       The address (<literal>LSN</literal>) from which the decoding of prepared
+       transactions is enabled. <literal>NULL</literal> for logical slots where
+       <structfield>two_phase</structfield> is false and physical slots.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>inactive_since</structfield> <type>timestamptz</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 31d269b7ee0..a8fddd0183c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1025,6 +1025,7 @@ CREATE VIEW pg_replication_slots AS
             L.wal_status,
             L.safe_wal_size,
             L.two_phase,
+            L.two_phase_at,
             L.inactive_since,
             L.conflicting,
             L.invalidation_reason,
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 2c0a7439be4..e22d41891e6 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -139,6 +139,7 @@ typedef struct RemoteSlot
 	bool		failover;
 	XLogRecPtr	restart_lsn;
 	XLogRecPtr	confirmed_lsn;
+	XLogRecPtr	two_phase_at;
 	TransactionId catalog_xmin;
 
 	/* RS_INVAL_NONE if valid, or the reason of invalidation */
@@ -276,7 +277,8 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 	if (remote_dbid != slot->data.database ||
 		remote_slot->two_phase != slot->data.two_phase ||
 		remote_slot->failover != slot->data.failover ||
-		strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) != 0)
+		strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) != 0 ||
+		remote_slot->two_phase_at != slot->data.two_phase_at)
 	{
 		NameData	plugin_name;
 
@@ -287,6 +289,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 		slot->data.plugin = plugin_name;
 		slot->data.database = remote_dbid;
 		slot->data.two_phase = remote_slot->two_phase;
+		slot->data.two_phase_at = remote_slot->two_phase_at;
 		slot->data.failover = remote_slot->failover;
 		SpinLockRelease(&slot->mutex);
 
@@ -788,9 +791,9 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 static bool
 synchronize_slots(WalReceiverConn *wrconn)
 {
-#define SLOTSYNC_COLUMN_COUNT 9
+#define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
-	LSNOID, XIDOID, BOOLOID, BOOLOID, TEXTOID, TEXTOID};
+	LSNOID, XIDOID, BOOLOID, LSNOID, BOOLOID, TEXTOID, TEXTOID};
 
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
@@ -798,7 +801,7 @@ synchronize_slots(WalReceiverConn *wrconn)
 	bool		some_slot_updated = false;
 	bool		started_tx = false;
 	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, failover,"
+		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
 		" database, invalidation_reason"
 		" FROM pg_catalog.pg_replication_slots"
 		" WHERE failover and NOT temporary";
@@ -853,6 +856,9 @@ synchronize_slots(WalReceiverConn *wrconn)
 														   &isnull));
 		Assert(!isnull);
 
+		d = slot_getattr(tupslot, ++col, &isnull);
+		remote_slot->two_phase_at = isnull ? InvalidXLogRecPtr : DatumGetLSN(d);
+
 		remote_slot->failover = DatumGetBool(slot_getattr(tupslot, ++col,
 														  &isnull));
 		Assert(!isnull);
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 146eef5871e..8a314b5ff3b 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -235,7 +235,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
 Datum
 pg_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_GET_REPLICATION_SLOTS_COLS 19
+#define PG_GET_REPLICATION_SLOTS_COLS 20
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	XLogRecPtr	currlsn;
 	int			slotno;
@@ -406,6 +406,12 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 
 		values[i++] = BoolGetDatum(slot_contents.data.two_phase);
 
+		if (slot_contents.data.two_phase &&
+			!XLogRecPtrIsInvalid(slot_contents.data.two_phase_at))
+			values[i++] = LSNGetDatum(slot_contents.data.two_phase_at);
+		else
+			nulls[i++] = true;
+
 		if (slot_contents.inactive_since > 0)
 			values[i++] = TimestampTzGetDatum(slot_contents.inactive_since);
 		else
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 8b68b16d79d..a89488419a6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11409,9 +11409,9 @@
   proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', prorettype => 'record',
   proargtypes => '',
-  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,timestamptz,bool,text,bool,bool}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,inactive_since,conflicting,invalidation_reason,failover,synced}',
+  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
   prosrc => 'pg_get_replication_slots' },
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 8f65142909a..9c8b49e942d 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -22,7 +22,11 @@ $publisher->init(
 # Disable autovacuum to avoid generating xid during stats update as otherwise
 # the new XID could then be replicated to standby at some random point making
 # slots at primary lag behind standby during slot sync.
-$publisher->append_conf('postgresql.conf', 'autovacuum = off');
+$publisher->append_conf(
+	'postgresql.conf', qq{
+autovacuum = off
+max_prepared_transactions = 1
+});
 $publisher->start;
 
 $publisher->safe_psql('postgres',
@@ -33,6 +37,7 @@ my $publisher_connstr = $publisher->connstr . ' dbname=postgres';
 # Create a subscriber node, wait for sync to complete
 my $subscriber1 = PostgreSQL::Test::Cluster->new('subscriber1');
 $subscriber1->init;
+$subscriber1->append_conf('postgresql.conf', 'max_prepared_transactions = 1');
 $subscriber1->start;
 
 # Capture the time before the logical failover slot is created on the
@@ -830,13 +835,72 @@ $primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots',
 	"'sb1_slot'");
 $primary->reload;
 
+##################################################
+# Test the synchronization of the two_phase setting for a subscription with the
+# standby. Additionally, prepare a transaction before enabling the two_phase
+# option; subsequent tests will verify if it can be correctly replicated to the
+# subscriber after committing it on the promoted standby.
+##################################################
+
+$standby1->start;
+
+# Prepare a transaction
+$primary->safe_psql(
+	'postgres', qq[
+	BEGIN;
+	INSERT INTO tab_int values(0);
+	PREPARE TRANSACTION 'test_twophase_slotsync';
+]);
+
+$primary->wait_for_replay_catchup($standby1);
+$primary->wait_for_catchup('regress_mysub1');
+
+# Disable the subscription to allow changing the two_phase option.
+$subscriber1->safe_psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub1 DISABLE");
+
+# Wait for the replication slot to become inactive on the publisher
+$primary->poll_query_until(
+	'postgres',
+	"SELECT COUNT(*) FROM pg_catalog.pg_replication_slots WHERE slot_name = 'lsub1_slot' AND active='f'",
+	1);
+
+# Set two_phase to true and enable the subscription
+$subscriber1->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_mysub1 SET (two_phase = true);
+	ALTER SUBSCRIPTION regress_mysub1 ENABLE;
+]);
+
+$primary->wait_for_catchup('regress_mysub1');
+
+my $two_phase_at = $primary->safe_psql('postgres',
+	"SELECT two_phase_at from pg_replication_slots WHERE slot_name = 'lsub1_slot';"
+);
+
+# Confirm that two_phase setting of lsub1_slot slot is synced to the standby
+ok( $standby1->poll_query_until(
+		'postgres',
+		"SELECT two_phase AND '$two_phase_at' = two_phase_at from pg_replication_slots WHERE slot_name = 'lsub1_slot' AND synced AND NOT temporary;"
+	),
+	'two_phase setting of slot lsub1_slot synced to standby');
+
+# Confirm that the prepared transaction is not yet replicated to the
+# subscriber.
+$result = $subscriber1->safe_psql('postgres',
+	"SELECT count(*) = 0 FROM pg_prepared_xacts;");
+is($result, 't',
+	"the prepared transaction is not replicated to the subscriber");
+
 ##################################################
 # Promote the standby1 to primary. Confirm that:
 # a) the slot 'lsub1_slot' and 'snap_test_slot' are retained on the new primary
 # b) logical replication for regress_mysub1 is resumed successfully after failover
-# c) changes can be consumed from the synced slot 'snap_test_slot'
+# c) changes from the transaction prepared 'test_twophase_slotsync' can be
+#    consumed from the synced slot 'snap_test_slot' once committed on the new
+#    primary.
+# d) changes can be consumed from the synced slot 'snap_test_slot'
 ##################################################
-$standby1->start;
 $primary->wait_for_replay_catchup($standby1);
 
 # Capture the time before the standby is promoted
@@ -876,6 +940,15 @@ is( $standby1->safe_psql(
 	't',
 	'synced slot retained on the new primary');
 
+# Commit the prepared transaction
+$standby1->safe_psql('postgres',
+	"COMMIT PREPARED 'test_twophase_slotsync';");
+$standby1->wait_for_catchup('regress_mysub1');
+
+# Confirm that the prepared transaction is replicated to the subscriber
+is($subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}),
+	"11", 'prepared data replicated from the new primary');
+
 # Insert data on the new primary
 $standby1->safe_psql('postgres',
 	"INSERT INTO tab_int SELECT generate_series(11, 20);");
@@ -883,7 +956,7 @@ $standby1->wait_for_catchup('regress_mysub1');
 
 # Confirm that data in tab_int replicated on the subscriber
 is($subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}),
-	"20", 'data replicated from the new primary');
+	"21", 'data replicated from the new primary');
 
 # Consume the data from the snap_test_slot. The synced slot should reach a
 # consistent point by restoring the snapshot at the restart_lsn serialized
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 47478969135..035769b4624 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1474,12 +1474,13 @@ pg_replication_slots| SELECT l.slot_name,
     l.wal_status,
     l.safe_wal_size,
     l.two_phase,
+    l.two_phase_at,
     l.inactive_since,
     l.conflicting,
     l.invalidation_reason,
     l.failover,
     l.synced
-   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, inactive_since, conflicting, invalidation_reason, failover, synced)
+   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
      LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
 pg_roles| SELECT pg_authid.rolname,
     pg_authid.rolsuper,
-- 
2.30.0.windows.2

v3-0002-PG-Disallow-enabling-failover-for-a-replication-slot.patch.txttext/plain; name=v3-0002-PG-Disallow-enabling-failover-for-a-replication-slot.patch.txtDownload
From 0c9a8086913ca2e13106df386544821f39c1d41c Mon Sep 17 00:00:00 2001
From: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Date: Mon, 31 Mar 2025 16:28:57 +0800
Subject: [PATCH v3] Disallow enabling failover for a replication slot that
 enables two-phase decoding

This commit fixes a bug for slot synchronization with logical replication
slots that enabled two_phase decoding. As it stands, transactions prepared
before two-phase decoding is enabled may fail to replicate to the subscriber
after being committed on a promoted standby following a failover.

The issue arises because the two_phase_at field of a slot, which tracks the LSN
from which two-phase decoding starts, is not synchronized to standby servers.
Without this field, the logical decoding might incorrectly identify prepared
transaction as already replicated to the subscriber, causing them to be
skipped.

To address the issue on HEAD, this commit makes the two_phase_at field of the slot
visible in the pg_replication_slots view and enables the slot synchronization
to copy this value to the corresponding synced slot on the standby server.

The bug has been present since the introduction of slot synchronization in
PostgreSQL 17. However, due to the need for catalog changes, backpatching this
fix is not feasible. Instead, to prevent the risk of losing prepared
transactions in prior versions, we now disallow enabling failover and two-phase
decoding together for a replication slot.
---
 contrib/test_decoding/expected/slot.out       |  2 ++
 contrib/test_decoding/sql/slot.sql            |  1 +
 src/backend/commands/subscriptioncmds.c       | 25 +++++++++++++++++
 src/backend/replication/slot.c                | 28 +++++++++++++++++++
 src/bin/pg_upgrade/t/003_logical_slots.pl     |  6 ++--
 .../t/040_standby_failover_slots_sync.pl      | 23 +++++++++++++++
 src/test/regress/expected/subscription.out    |  3 ++
 src/test/regress/sql/subscription.sql         |  4 +++
 8 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..8fd762dea85 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  "failover" and "two_phase" are mutually exclusive options
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..6db3c9347fa 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,18 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable failover and two_phase option together.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for
+	 * detailed reasons.
+	 */
+	if (opts.twophase && opts.failover)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("\"%s\" and \"%s\" are mutually exclusive options",
+					   "failover", "two_phase"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1257,19 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable failover and two_phase
+					 * option together.
+					 *
+					 * See comments atop the similar check in
+					 * ReplicationSlotCreate() for detailed reasons.
+					 */
+					if (sub->twophasestate != LOGICALREP_TWOPHASE_STATE_DISABLED &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+								errmsg("cannot enable failover for a two_phase enabled subscription"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..f40028bd941 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,22 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable failover for slots that enable
+		 * two-phase decoding.
+		 *
+		 * This is because the two_phase_at field of a slot, which tracks the
+		 * LSN from which two-phase decoding starts, is not synchronized to
+		 * standby servers. Without this field, the logical decoding might
+		 * incorrectly identify prepared transaction as already replicated to
+		 * the subscriber, causing them to be skipped.
+		 */
+		if (two_phase)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("\"%s\" and \"%s\" are mutually exclusive options",
+						   "failover", "two_phase"));
 	}
 
 	/*
@@ -848,6 +864,18 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for slots that enable two-phase
+	 * decoding.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for
+	 * detailed reasons.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable failover for a two-phase enabled replication slot"));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..e329cc609ce 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -173,7 +173,7 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
@@ -193,8 +193,8 @@ command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
 # Check that the slot 'regress_sub' has migrated to the new cluster
 $newpub->start;
 my $result = $newpub->safe_psql('postgres',
-	"SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
 
 # Update the connection
 my $new_connstr = $newpub->connstr . ' dbname=postgres';
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..6e68ed5d4bb 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -98,6 +98,29 @@ my ($result, $stdout, $stderr) = $subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
 	"altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option cannot be enabled for a two_phase enabled
+# subscription.
+##################################################
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, create_slot = false, enabled = false, two_phase = true);"
+);
+
+# Enable failover for the subscription
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+ok( $stderr =~ /ERROR:  cannot enable failover for a two_phase enabled subscription/,
+	"Enabling failover is not allowed for a two_phase enabled subscription");
+
+# Drop the subscription
+$subscriber1->safe_psql(
+	'postgres', q{
+ALTER SUBSCRIPTION regress_mysub2 SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_mysub2;
+});
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby server.
 ##################################################
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..bbd83244ba9 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,9 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  "failover" and "two_phase" are mutually exclusive options
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.30.0.windows.2

#8Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#7)
1 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

On Tue, Apr 1, 2025 at 4:28 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Here is the V3 patch set which addressed all the comments.

Comment 0n 0001
<literal>NULL</literal> for logical slots where
+       <structfield>two_phase</structfield> is false and physical slots.
+      </para></entry>

change above to:
<literal>NULL</literal> for logical slots where
<structfield>two_phase</structfield> is false and for physical slots.

Comment on 0002
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+ "CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr'
PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, create_slot =
false, enabled = false, two_phase = true);"
+);
+
+# Enable failover for the subscription
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+ "ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+ok( $stderr =~ /ERROR:  cannot enable failover for a two_phase
enabled subscription/,
+ "Enabling failover is not allowed for a two_phase enabled subscription");

Is there a need for this test to be in .pl file? Can't we add it in .sql file?

Apart from the above, I have made minor modifications to the PG17
patch in the attached.

--
With Regards,
Amit Kapila.

Attachments:

v3-0002-amit.1.patch.txttext/plain; charset=US-ASCII; name=v3-0002-amit.1.patch.txtDownload
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 6db3c9347fa..09c5f0ef685 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -649,10 +649,11 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
 	/*
-	 * Do not allow users to enable failover and two_phase option together.
+	 * Do not allow users to enable the failover and two_phase options
+	 * together.
 	 *
-	 * See comments atop the similar check in ReplicationSlotCreate() for
-	 * detailed reasons.
+	 * See comments atop the similar check in ReplicationSlotCreate() for a
+	 * detailed reason.
 	 */
 	if (opts.twophase && opts.failover)
 		ereport(ERROR,
@@ -1258,11 +1259,11 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 										"failover")));
 
 					/*
-					 * Do not allow users to enable failover and two_phase
-					 * option together.
+					 * Do not allow users to enable the failover and two_phase
+					 * options together.
 					 *
 					 * See comments atop the similar check in
-					 * ReplicationSlotCreate() for detailed reasons.
+					 * ReplicationSlotCreate() for a detailed reason.
 					 */
 					if (sub->twophasestate != LOGICALREP_TWOPHASE_STATE_DISABLED &&
 						opts.failover)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f40028bd941..d251ce95fe6 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -345,14 +345,14 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 					errmsg("cannot enable failover for a temporary replication slot"));
 
 		/*
-		 * Do not allow users to enable failover for slots that enable
-		 * two-phase decoding.
+		 * Do not allow users to enable both failover and two_phase for slots.
 		 *
 		 * This is because the two_phase_at field of a slot, which tracks the
-		 * LSN from which two-phase decoding starts, is not synchronized to
-		 * standby servers. Without this field, the logical decoding might
+		 * LSN, from which two-phase decoding starts, is not synchronized to
+		 * standby servers. Without two_phase_at, the logical decoding might
 		 * incorrectly identify prepared transaction as already replicated to
-		 * the subscriber, causing them to be skipped.
+		 * the subscriber after promotion of standby server, causing them to be
+		 * skipped.
 		 */
 		if (two_phase)
 			ereport(ERROR,
@@ -865,11 +865,10 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errmsg("cannot enable failover for a temporary replication slot"));
 
 	/*
-	 * Do not allow users to enable failover for slots that enable two-phase
-	 * decoding.
+	 * Do not allow users to enable failover for a two_phase enabled slot.
 	 *
-	 * See comments atop the similar check in ReplicationSlotCreate() for
-	 * detailed reasons.
+	 * See comments atop the similar check in ReplicationSlotCreate() for a
+	 * detailed reason.
 	 */
 	if (failover && MyReplicationSlot->data.two_phase)
 		ereport(ERROR,
#9Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#8)
2 attachment(s)
RE: Fix slot synchronization with two_phase decoding enabled

On Wed, Apr 2, 2025 at 12:41 PM Amit Kapila wrote:

On Tue, Apr 1, 2025 at 4:28 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com>
wrote:

Here is the V3 patch set which addressed all the comments.

Comment 0n 0001
<literal>NULL</literal> for logical slots where
+       <structfield>two_phase</structfield> is false and physical slots.
+      </para></entry>

change above to:
<literal>NULL</literal> for logical slots where
<structfield>two_phase</structfield> is false and for physical slots.

Changed.

Comment on 0002
+# Create a subscription with two_phase enabled 
+$subscriber1->safe_psql('postgres',
+ "CREATE SUBSCRIPTION regress_mysub2 CONNECTION
'$publisher_connstr'
PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, create_slot = 
false, enabled = false, two_phase = true);"
+);
+
+# Enable failover for the subscription ($result, $stdout, $stderr) = 
+$subscriber1->psql('postgres',  "ALTER SUBSCRIPTION regress_mysub2 
+SET (failover = true)"); ok( $stderr =~
+/ERROR:  cannot enable failover for a two_phase
enabled subscription/,
+ "Enabling failover is not allowed for a two_phase enabled 
+ subscription");

Is there a need for this test to be in .pl file? Can't we add it in .sql file?

Right, I moved the test into subscription.sql

Apart from the above, I have made minor modifications to the PG17
patch in the attached.

Here is V4 patch set which addressed above comments and passed
pgindent test.

Best Regards,
Hou zj

Attachments:

v4-0001-HEAD-Fix-slot-synchronization-for-two_phase-enables-sl.patchapplication/octet-stream; name=v4-0001-HEAD-Fix-slot-synchronization-for-two_phase-enables-sl.patchDownload
From 470ef78dc9b0c2a9eabdc29b976651f34b76a277 Mon Sep 17 00:00:00 2001
From: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Date: Thu, 27 Feb 2025 10:49:32 +0800
Subject: [PATCH v4] Fix slot synchronization for two_phase enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To address the issue on HEAD, the two_phase_at field of the slot is
exposed by the pg_replication_slots view and allows the slot
synchronization to copy this value to the corresponding synced slot on the
standby server.

However, we can't fix the issue in the same way in PG17 where slot
synchronization has been introduced, as it requires a change in view and
builtin function definition, which in turn requires a catversion bump.
Instead, to prevent the risk of losing prepared transactions, we disallow
enabling failover and two-phase decoding together for a replication slot.
Users are advised to disable slot synchronization for two_phase-enabled
slots or re-create the subscriptions with option 'two_phase' as false and
'failover' as true.
---
 doc/src/sgml/system-views.sgml                | 11 +++
 src/backend/catalog/system_views.sql          |  1 +
 src/backend/replication/logical/slotsync.c    | 14 +++-
 src/backend/replication/slotfuncs.c           |  8 +-
 src/include/catalog/pg_proc.dat               |  6 +-
 .../t/040_standby_failover_slots_sync.pl      | 81 ++++++++++++++++++-
 src/test/regress/expected/rules.out           |  3 +-
 7 files changed, 111 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index e9a59af8c34..4f336ee0adf 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -2854,6 +2854,17 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>two_phase_at</structfield> <type>pg_lsn</type>
+      </para>
+      <para>
+       The address (<literal>LSN</literal>) from which the decoding of prepared
+       transactions is enabled. <literal>NULL</literal> for logical slots
+       where <structfield>two_phase</structfield> is false and for physical slots.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>inactive_since</structfield> <type>timestamptz</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 64a7240aa77..273008db37f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1025,6 +1025,7 @@ CREATE VIEW pg_replication_slots AS
             L.wal_status,
             L.safe_wal_size,
             L.two_phase,
+            L.two_phase_at,
             L.inactive_since,
             L.conflicting,
             L.invalidation_reason,
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 2c0a7439be4..e22d41891e6 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -139,6 +139,7 @@ typedef struct RemoteSlot
 	bool		failover;
 	XLogRecPtr	restart_lsn;
 	XLogRecPtr	confirmed_lsn;
+	XLogRecPtr	two_phase_at;
 	TransactionId catalog_xmin;
 
 	/* RS_INVAL_NONE if valid, or the reason of invalidation */
@@ -276,7 +277,8 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 	if (remote_dbid != slot->data.database ||
 		remote_slot->two_phase != slot->data.two_phase ||
 		remote_slot->failover != slot->data.failover ||
-		strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) != 0)
+		strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) != 0 ||
+		remote_slot->two_phase_at != slot->data.two_phase_at)
 	{
 		NameData	plugin_name;
 
@@ -287,6 +289,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 		slot->data.plugin = plugin_name;
 		slot->data.database = remote_dbid;
 		slot->data.two_phase = remote_slot->two_phase;
+		slot->data.two_phase_at = remote_slot->two_phase_at;
 		slot->data.failover = remote_slot->failover;
 		SpinLockRelease(&slot->mutex);
 
@@ -788,9 +791,9 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 static bool
 synchronize_slots(WalReceiverConn *wrconn)
 {
-#define SLOTSYNC_COLUMN_COUNT 9
+#define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
-	LSNOID, XIDOID, BOOLOID, BOOLOID, TEXTOID, TEXTOID};
+	LSNOID, XIDOID, BOOLOID, LSNOID, BOOLOID, TEXTOID, TEXTOID};
 
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
@@ -798,7 +801,7 @@ synchronize_slots(WalReceiverConn *wrconn)
 	bool		some_slot_updated = false;
 	bool		started_tx = false;
 	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, failover,"
+		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
 		" database, invalidation_reason"
 		" FROM pg_catalog.pg_replication_slots"
 		" WHERE failover and NOT temporary";
@@ -853,6 +856,9 @@ synchronize_slots(WalReceiverConn *wrconn)
 														   &isnull));
 		Assert(!isnull);
 
+		d = slot_getattr(tupslot, ++col, &isnull);
+		remote_slot->two_phase_at = isnull ? InvalidXLogRecPtr : DatumGetLSN(d);
+
 		remote_slot->failover = DatumGetBool(slot_getattr(tupslot, ++col,
 														  &isnull));
 		Assert(!isnull);
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 146eef5871e..8a314b5ff3b 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -235,7 +235,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
 Datum
 pg_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_GET_REPLICATION_SLOTS_COLS 19
+#define PG_GET_REPLICATION_SLOTS_COLS 20
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	XLogRecPtr	currlsn;
 	int			slotno;
@@ -406,6 +406,12 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 
 		values[i++] = BoolGetDatum(slot_contents.data.two_phase);
 
+		if (slot_contents.data.two_phase &&
+			!XLogRecPtrIsInvalid(slot_contents.data.two_phase_at))
+			values[i++] = LSNGetDatum(slot_contents.data.two_phase_at);
+		else
+			nulls[i++] = true;
+
 		if (slot_contents.inactive_since > 0)
 			values[i++] = TimestampTzGetDatum(slot_contents.inactive_since);
 		else
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6b57b7e18d9..30e77458b5d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11421,9 +11421,9 @@
   proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', prorettype => 'record',
   proargtypes => '',
-  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,timestamptz,bool,text,bool,bool}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,inactive_since,conflicting,invalidation_reason,failover,synced}',
+  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
   prosrc => 'pg_get_replication_slots' },
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 8f65142909a..9c8b49e942d 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -22,7 +22,11 @@ $publisher->init(
 # Disable autovacuum to avoid generating xid during stats update as otherwise
 # the new XID could then be replicated to standby at some random point making
 # slots at primary lag behind standby during slot sync.
-$publisher->append_conf('postgresql.conf', 'autovacuum = off');
+$publisher->append_conf(
+	'postgresql.conf', qq{
+autovacuum = off
+max_prepared_transactions = 1
+});
 $publisher->start;
 
 $publisher->safe_psql('postgres',
@@ -33,6 +37,7 @@ my $publisher_connstr = $publisher->connstr . ' dbname=postgres';
 # Create a subscriber node, wait for sync to complete
 my $subscriber1 = PostgreSQL::Test::Cluster->new('subscriber1');
 $subscriber1->init;
+$subscriber1->append_conf('postgresql.conf', 'max_prepared_transactions = 1');
 $subscriber1->start;
 
 # Capture the time before the logical failover slot is created on the
@@ -830,13 +835,72 @@ $primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots',
 	"'sb1_slot'");
 $primary->reload;
 
+##################################################
+# Test the synchronization of the two_phase setting for a subscription with the
+# standby. Additionally, prepare a transaction before enabling the two_phase
+# option; subsequent tests will verify if it can be correctly replicated to the
+# subscriber after committing it on the promoted standby.
+##################################################
+
+$standby1->start;
+
+# Prepare a transaction
+$primary->safe_psql(
+	'postgres', qq[
+	BEGIN;
+	INSERT INTO tab_int values(0);
+	PREPARE TRANSACTION 'test_twophase_slotsync';
+]);
+
+$primary->wait_for_replay_catchup($standby1);
+$primary->wait_for_catchup('regress_mysub1');
+
+# Disable the subscription to allow changing the two_phase option.
+$subscriber1->safe_psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub1 DISABLE");
+
+# Wait for the replication slot to become inactive on the publisher
+$primary->poll_query_until(
+	'postgres',
+	"SELECT COUNT(*) FROM pg_catalog.pg_replication_slots WHERE slot_name = 'lsub1_slot' AND active='f'",
+	1);
+
+# Set two_phase to true and enable the subscription
+$subscriber1->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_mysub1 SET (two_phase = true);
+	ALTER SUBSCRIPTION regress_mysub1 ENABLE;
+]);
+
+$primary->wait_for_catchup('regress_mysub1');
+
+my $two_phase_at = $primary->safe_psql('postgres',
+	"SELECT two_phase_at from pg_replication_slots WHERE slot_name = 'lsub1_slot';"
+);
+
+# Confirm that two_phase setting of lsub1_slot slot is synced to the standby
+ok( $standby1->poll_query_until(
+		'postgres',
+		"SELECT two_phase AND '$two_phase_at' = two_phase_at from pg_replication_slots WHERE slot_name = 'lsub1_slot' AND synced AND NOT temporary;"
+	),
+	'two_phase setting of slot lsub1_slot synced to standby');
+
+# Confirm that the prepared transaction is not yet replicated to the
+# subscriber.
+$result = $subscriber1->safe_psql('postgres',
+	"SELECT count(*) = 0 FROM pg_prepared_xacts;");
+is($result, 't',
+	"the prepared transaction is not replicated to the subscriber");
+
 ##################################################
 # Promote the standby1 to primary. Confirm that:
 # a) the slot 'lsub1_slot' and 'snap_test_slot' are retained on the new primary
 # b) logical replication for regress_mysub1 is resumed successfully after failover
-# c) changes can be consumed from the synced slot 'snap_test_slot'
+# c) changes from the transaction prepared 'test_twophase_slotsync' can be
+#    consumed from the synced slot 'snap_test_slot' once committed on the new
+#    primary.
+# d) changes can be consumed from the synced slot 'snap_test_slot'
 ##################################################
-$standby1->start;
 $primary->wait_for_replay_catchup($standby1);
 
 # Capture the time before the standby is promoted
@@ -876,6 +940,15 @@ is( $standby1->safe_psql(
 	't',
 	'synced slot retained on the new primary');
 
+# Commit the prepared transaction
+$standby1->safe_psql('postgres',
+	"COMMIT PREPARED 'test_twophase_slotsync';");
+$standby1->wait_for_catchup('regress_mysub1');
+
+# Confirm that the prepared transaction is replicated to the subscriber
+is($subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}),
+	"11", 'prepared data replicated from the new primary');
+
 # Insert data on the new primary
 $standby1->safe_psql('postgres',
 	"INSERT INTO tab_int SELECT generate_series(11, 20);");
@@ -883,7 +956,7 @@ $standby1->wait_for_catchup('regress_mysub1');
 
 # Confirm that data in tab_int replicated on the subscriber
 is($subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}),
-	"20", 'data replicated from the new primary');
+	"21", 'data replicated from the new primary');
 
 # Consume the data from the snap_test_slot. The synced slot should reach a
 # consistent point by restoring the snapshot at the restart_lsn serialized
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index d9533deb04e..673c63b8d1b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1490,12 +1490,13 @@ pg_replication_slots| SELECT l.slot_name,
     l.wal_status,
     l.safe_wal_size,
     l.two_phase,
+    l.two_phase_at,
     l.inactive_since,
     l.conflicting,
     l.invalidation_reason,
     l.failover,
     l.synced
-   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, inactive_since, conflicting, invalidation_reason, failover, synced)
+   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced)
      LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
 pg_roles| SELECT pg_authid.rolname,
     pg_authid.rolsuper,
-- 
2.30.0.windows.2

v4-0002-PG17-Fix-slot-synchronization-for-two_phase-enables-sl.patch.txttext/plain; name=v4-0002-PG17-Fix-slot-synchronization-for-two_phase-enables-sl.patch.txtDownload
From e1ad472436f02ed9b410bc96c1b5d0e7952f7883 Mon Sep 17 00:00:00 2001
From: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Date: Mon, 31 Mar 2025 16:28:57 +0800
Subject: [PATCH v4 2/2] Fix slot synchronization for two_phase enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To address the issue on HEAD, the two_phase_at field of the slot is
exposed by the pg_replication_slots view and allows the slot
synchronization to copy this value to the corresponding synced slot on the
standby server.

However, we can't fix the issue in the same way in PG17 where slot
synchronization has been introduced, as it requires a change in view and
builtin function definition, which in turn requires a catversion bump.
Instead, to prevent the risk of losing prepared transactions, we disallow
enabling failover and two-phase decoding together for a replication slot.
Users are advised to disable slot synchronization for two_phase-enabled
slots or re-create the subscriptions with option 'two_phase' as false and
'failover' as true.
---
 contrib/test_decoding/expected/slot.out    |  2 ++
 contrib/test_decoding/sql/slot.sql         |  1 +
 src/backend/commands/subscriptioncmds.c    | 26 +++++++++++++++++++++
 src/backend/replication/slot.c             | 27 ++++++++++++++++++++++
 src/bin/pg_upgrade/t/003_logical_slots.pl  |  6 ++---
 src/test/regress/expected/subscription.out |  6 +++++
 src/test/regress/sql/subscription.sql      |  7 ++++++
 7 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..8fd762dea85 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  "failover" and "two_phase" are mutually exclusive options
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..09c5f0ef685 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,19 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options
+	 * together.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for a
+	 * detailed reason.
+	 */
+	if (opts.twophase && opts.failover)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("\"%s\" and \"%s\" are mutually exclusive options",
+					   "failover", "two_phase"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1258,19 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable the failover and two_phase
+					 * options together.
+					 *
+					 * See comments atop the similar check in
+					 * ReplicationSlotCreate() for a detailed reason.
+					 */
+					if (sub->twophasestate != LOGICALREP_TWOPHASE_STATE_DISABLED &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+								errmsg("cannot enable failover for a two_phase enabled subscription"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 780d22afbca..3293651f725 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,22 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable both failover and two_phase for slots.
+		 *
+		 * This is because the two_phase_at field of a slot, which tracks the
+		 * LSN, from which two-phase decoding starts, is not synchronized to
+		 * standby servers. Without two_phase_at, the logical decoding might
+		 * incorrectly identify prepared transaction as already replicated to
+		 * the subscriber after promotion of standby server, causing them to
+		 * be skipped.
+		 */
+		if (two_phase)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("\"%s\" and \"%s\" are mutually exclusive options",
+						   "failover", "two_phase"));
 	}
 
 	/*
@@ -848,6 +864,17 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for a
+	 * detailed reason.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable failover for a two-phase enabled replication slot"));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..e329cc609ce 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -173,7 +173,7 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
@@ -193,8 +193,8 @@ command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
 # Check that the slot 'regress_sub' has migrated to the new cluster
 $newpub->start;
 my $result = $newpub->safe_psql('postgres',
-	"SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
 
 # Update the connection
 my $new_connstr = $newpub->connstr . ' dbname=postgres';
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..fd94389d916 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -389,6 +389,9 @@ ALTER SUBSCRIPTION regress_testsub SET (streaming = true);
  regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
 (1 row)
 
+--fail - cannot enable failover for a two_phase enabled subscription
+ALTER SUBSCRIPTION regress_testsub SET (failover = true);
+ERROR:  cannot enable failover for a two_phase enabled subscription
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 -- two_phase and streaming are compatible.
@@ -479,6 +482,9 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  "failover" and "two_phase" are mutually exclusive options
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..bc6e0f04e64 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -264,6 +264,9 @@ ALTER SUBSCRIPTION regress_testsub SET (streaming = true);
 
 \dRs+
 
+--fail - cannot enable failover for a two_phase enabled subscription
+ALTER SUBSCRIPTION regress_testsub SET (failover = true);
+
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
@@ -342,6 +345,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.31.1

#10Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#4)
Re: Fix slot synchronization with two_phase decoding enabled

On Mon, Mar 31, 2025 at 4:34 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Thu, Mar 27, 2025 at 2:29 PM Amit Kapila wrote:

On Tue, Mar 25, 2025 at 12:1 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Tue, Mar 25, 2025 at 11:05 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Hi,

When testing the slot synchronization with logical replication slots that
enabled two_phase decoding, I found that transactions prepared before

two-phase

decoding is enabled may fail to replicate to the subscriber after being
committed on a promoted standby following a failover.

To reproduce this issue, please follow these steps (also detailed in the
attached TAP test, v1-0001):

1. sub: create a subscription with (two_phase = false)
2. primary (pub): prepare a txn A.
3. sub: alter subscription set (two_phase = true) and wait for the logical

slot to

be synced to standby.
4. primary (pub): stop primary, promote the standby and let the subscriber

use

the promoted standby as publisher.
5. promoted standby (pub): COMMIT PREPARED A;
6. sub: the apply worker will report the following ERROR because it didn't
receive the PREPARE.
ERROR: prepared transaction with identifier "pg_gid_16387_752"

does not exist

I think the root cause of this issue is that the two_phase_at field of the
slot, which indicates the LSN from which two-phase decoding is enabled

(used to

prevent duplicate data transmission for prepared transactions), is not
synchronized to the standby server.

In step 3, transaction A is not immediately replicated because it occurred
before enabling two-phase decoding. Thus, the prepared transaction

should only

be replicated after decoding the final COMMIT PREPARED, as referenced

in

ReorderBufferFinishPrepared(). However, due to the invalid two_phase_at

on the

standby, the prepared transaction fails to send at that time.

This problem arises after the support for altering the two-phase option
(1462aad).

I suspect that this can happen in PG17 as well, but I need to think
more about it to make a reproducible test case.

After further analysis, I was able to reproduce the same issue [1] in
PG 17.

[1]
- pub: created a slot 'sub' with two_phase=false, then prepared a transaction
- pub: after some activity, advanced the confirmed_flush_lsn of 'sub', so it is
greater than prepared txn lsn.
- sub: create subscription with (slot_name='sub', create_slot=false, failover =
true, two_phase=true, copy_data=false); two_phase_at will be set to the same
as confirmed_flush_lsn which is greater than the prepared transaction.
- stop the primary and promote the standby.
- commit the prepared transaction on standby, the following error will be
reported on subscriber:

It seems to require elaborate steps to reproduce this issue in v17. I
wonder if we could somehow narrow down the cases that we want to
prohibit. The patch for v17 disallows CREATE SUBSCRIPTION to enable
both two_phase and failover, but I guess that it's still safe if it
also creates the replication slot (e.g., create_slot is true). If my
understanding is right, we can allow users to specify both fields if
CRETE SUBSCRIPTION creates the slot, and we don't need to disallow
that in ReplicationSlotCreate().

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#11Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#10)
RE: Fix slot synchronization with two_phase decoding enabled

On Wed, Apr 2, 2025 at 3:45 PM Masahiko Sawada wrote:

Hi,

On Mon, Mar 31, 2025 at 4:3 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Thu, Mar 27, 2025 at 2:29 PM Amit Kapila wrote:

On Tue, Mar 25, 2025 at 12:1 PM Amit Kapila
<amit.kapila16@gmail.com>
wrote:

On Tue, Mar 25, 2025 at 11:05 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Hi,

When testing the slot synchronization with logical replication
slots that enabled two_phase decoding, I found that transactions
prepared before

two-phase

decoding is enabled may fail to replicate to the subscriber
after being committed on a promoted standby following a failover.

To reproduce this issue, please follow these steps (also
detailed in the attached TAP test, v1-0001):

1. sub: create a subscription with (two_phase = false) 2.
primary (pub): prepare a txn A.
3. sub: alter subscription set (two_phase = true) and wait for
the logical

slot to

be synced to standby.
4. primary (pub): stop primary, promote the standby and let the
subscriber

use

the promoted standby as publisher.
5. promoted standby (pub): COMMIT PREPARED A; 6. sub: the apply
worker will report the following ERROR because it didn't
receive the PREPARE.
ERROR: prepared transaction with identifier "pg_gid_16387_752"

does not exist

I think the root cause of this issue is that the two_phase_at
field of the slot, which indicates the LSN from which two-phase
decoding is enabled

(used to

prevent duplicate data transmission for prepared transactions),
is not synchronized to the standby server.

In step 3, transaction A is not immediately replicated because
it occurred before enabling two-phase decoding. Thus, the
prepared transaction

should only

be replicated after decoding the final COMMIT PREPARED, as
referenced

in

ReorderBufferFinishPrepared(). However, due to the invalid
two_phase_at

on the

standby, the prepared transaction fails to send at that time.

This problem arises after the support for altering the two-phase
option (1462aad).

I suspect that this can happen in PG17 as well, but I need to think
more about it to make a reproducible test case.

After further analysis, I was able to reproduce the same issue [1] in
PG 17.

[1]
- pub: created a slot 'sub' with two_phase=false, then prepared a
transaction
- pub: after some activity, advanced the confirmed_flush_lsn of 'sub', so it is
greater than prepared txn lsn.
- sub: create subscription with (slot_name='sub', create_slot=false,
failover = true, two_phase=true, copy_data=false); two_phase_at will
be set to the same as confirmed_flush_lsn which is greater than the

prepared transaction.

- stop the primary and promote the standby.
- commit the prepared transaction on standby, the following error will be
reported on subscriber:

It seems to require elaborate steps to reproduce this issue in v17. I wonder if we
could somehow narrow down the cases that we want to prohibit. The patch for
v17 disallows CREATE SUBSCRIPTION to enable both two_phase and failover,
but I guess that it's still safe if it also creates the replication slot (e.g.,
create_slot is true). If my understanding is right, we can allow users to specify
both fields if CRETE SUBSCRIPTION creates the slot, and we don't need to
disallow that in ReplicationSlotCreate().

Thanks for reviewing the steps.

The current reproducer aims for simplicity; however, I think it's possible to reproduce
the issue even with create_slot = true, although it requires the help of a
debugger and additional steps. But as long as there are transactions prepared
before the two_phase_at position, and they are skipped due to checks in
ReorderBufferFinishPrepared() (refer to comments[1]/* * It is possible that this transaction is not decoded at prepare time * either because by that time we didn't have a consistent snapshot, or * two_phase was not enabled, or it was decoded earlier but we have * restarted. We only need to send the prepare if it was not decoded * earlier. We don't need to decode the xact for aborts if it is not done * already. */ for why we skip sending
prepared transaction), the issue can be reproduced.

For instance, when creating a subscription with (copy_data=true, failover=true,
two_phase=true), the slot's two_phase setting starts as false and shifts to
true after table sync (refer to [2]/* * Even if two_phase is set, don't create the slot with * two-phase enabled. Will enable it once all the tables are * synced and ready. This avoids race-conditions like prepared * transactions being skipped due to changes not being applied * due to checks in should_apply_changes_for_rel() when * tablesync for the corresponding tables are in progress. See * comments atop worker.c. * * Note that if tables were specified but copy_data is false * then it is safe to enable two_phase up-front because those * tables are already initially in READY state. When the * subscription has no tables, we leave the twophase state as * PENDING, to allow ALTER SUBSCRIPTION ... REFRESH * PUBLICATION to work. */ Best Regards, Hou zj for related comments). During this period,
if a user prepares a transaction where the prepare LSN is less than the
two_phase_at, the same problem could happen.

Similarly, when setting up a subscription with (copy_data=false, failover=true,
two_phase=true), although two_phase is initially set to true and we wait for
running transactions to finish when building consistent snapshot, a race
condition may still exist: If the snapshot state reaches FULL_SNAPSHOT, it
won't check running transactions further (see SnapBuildFindSnapshot() for
specifics), if a user prepares a transaction at this point, it's possible for
the prepare LSN to be less than the LSN marking the snapshot's consistent
state, causing the same issue.

[1]: /* * It is possible that this transaction is not decoded at prepare time * either because by that time we didn't have a consistent snapshot, or * two_phase was not enabled, or it was decoded earlier but we have * restarted. We only need to send the prepare if it was not decoded * earlier. We don't need to decode the xact for aborts if it is not done * already. */

/*
* It is possible that this transaction is not decoded at prepare time
* either because by that time we didn't have a consistent snapshot, or
* two_phase was not enabled, or it was decoded earlier but we have
* restarted. We only need to send the prepare if it was not decoded
* earlier. We don't need to decode the xact for aborts if it is not done
* already.
*/

[2]: /* * Even if two_phase is set, don't create the slot with * two-phase enabled. Will enable it once all the tables are * synced and ready. This avoids race-conditions like prepared * transactions being skipped due to changes not being applied * due to checks in should_apply_changes_for_rel() when * tablesync for the corresponding tables are in progress. See * comments atop worker.c. * * Note that if tables were specified but copy_data is false * then it is safe to enable two_phase up-front because those * tables are already initially in READY state. When the * subscription has no tables, we leave the twophase state as * PENDING, to allow ALTER SUBSCRIPTION ... REFRESH * PUBLICATION to work. */ Best Regards, Hou zj

/*
* Even if two_phase is set, don't create the slot with
* two-phase enabled. Will enable it once all the tables are
* synced and ready. This avoids race-conditions like prepared
* transactions being skipped due to changes not being applied
* due to checks in should_apply_changes_for_rel() when
* tablesync for the corresponding tables are in progress. See
* comments atop worker.c.
*
* Note that if tables were specified but copy_data is false
* then it is safe to enable two_phase up-front because those
* tables are already initially in READY state. When the
* subscription has no tables, we leave the twophase state as
* PENDING, to allow ALTER SUBSCRIPTION ... REFRESH
* PUBLICATION to work.
*/
Best Regards,
Hou zj

#12Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#11)
Re: Fix slot synchronization with two_phase decoding enabled

On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Wed, Apr 2, 2025 at 3:45 PM Masahiko Sawada wrote:

Hi,

On Mon, Mar 31, 2025 at 4:3 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Thu, Mar 27, 2025 at 2:29 PM Amit Kapila wrote:

On Tue, Mar 25, 2025 at 12:1 PM Amit Kapila
<amit.kapila16@gmail.com>
wrote:

On Tue, Mar 25, 2025 at 11:05 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Hi,

When testing the slot synchronization with logical replication
slots that enabled two_phase decoding, I found that transactions
prepared before

two-phase

decoding is enabled may fail to replicate to the subscriber
after being committed on a promoted standby following a failover.

To reproduce this issue, please follow these steps (also
detailed in the attached TAP test, v1-0001):

1. sub: create a subscription with (two_phase = false) 2.
primary (pub): prepare a txn A.
3. sub: alter subscription set (two_phase = true) and wait for
the logical

slot to

be synced to standby.
4. primary (pub): stop primary, promote the standby and let the
subscriber

use

the promoted standby as publisher.
5. promoted standby (pub): COMMIT PREPARED A; 6. sub: the apply
worker will report the following ERROR because it didn't
receive the PREPARE.
ERROR: prepared transaction with identifier "pg_gid_16387_752"

does not exist

I think the root cause of this issue is that the two_phase_at
field of the slot, which indicates the LSN from which two-phase
decoding is enabled

(used to

prevent duplicate data transmission for prepared transactions),
is not synchronized to the standby server.

In step 3, transaction A is not immediately replicated because
it occurred before enabling two-phase decoding. Thus, the
prepared transaction

should only

be replicated after decoding the final COMMIT PREPARED, as
referenced

in

ReorderBufferFinishPrepared(). However, due to the invalid
two_phase_at

on the

standby, the prepared transaction fails to send at that time.

This problem arises after the support for altering the two-phase
option (1462aad).

I suspect that this can happen in PG17 as well, but I need to think
more about it to make a reproducible test case.

After further analysis, I was able to reproduce the same issue [1] in
PG 17.

[1]
- pub: created a slot 'sub' with two_phase=false, then prepared a
transaction
- pub: after some activity, advanced the confirmed_flush_lsn of 'sub', so it is
greater than prepared txn lsn.
- sub: create subscription with (slot_name='sub', create_slot=false,
failover = true, two_phase=true, copy_data=false); two_phase_at will
be set to the same as confirmed_flush_lsn which is greater than the

prepared transaction.

- stop the primary and promote the standby.
- commit the prepared transaction on standby, the following error will be
reported on subscriber:

It seems to require elaborate steps to reproduce this issue in v17. I wonder if we
could somehow narrow down the cases that we want to prohibit. The patch for
v17 disallows CREATE SUBSCRIPTION to enable both two_phase and failover,
but I guess that it's still safe if it also creates the replication slot (e.g.,
create_slot is true). If my understanding is right, we can allow users to specify
both fields if CRETE SUBSCRIPTION creates the slot, and we don't need to
disallow that in ReplicationSlotCreate().

Thanks for reviewing the steps.

The current reproducer aims for simplicity; however, I think it's possible to reproduce
the issue even with create_slot = true, although it requires the help of a
debugger and additional steps. But as long as there are transactions prepared
before the two_phase_at position, and they are skipped due to checks in
ReorderBufferFinishPrepared() (refer to comments[1] for why we skip sending
prepared transaction), the issue can be reproduced.

For instance, when creating a subscription with (copy_data=true, failover=true,
two_phase=true), the slot's two_phase setting starts as false and shifts to
true after table sync (refer to [2] for related comments). During this period,
if a user prepares a transaction where the prepare LSN is less than the
two_phase_at, the same problem could happen.

Similarly, when setting up a subscription with (copy_data=false, failover=true,
two_phase=true), although two_phase is initially set to true and we wait for
running transactions to finish when building consistent snapshot, a race
condition may still exist: If the snapshot state reaches FULL_SNAPSHOT, it
won't check running transactions further (see SnapBuildFindSnapshot() for
specifics), if a user prepares a transaction at this point, it's possible for
the prepare LSN to be less than the LSN marking the snapshot's consistent
state, causing the same issue.

Thank you for the explanation! I agree that the issue happens in these cases.

As another idea, I wonder if we could somehow defer to make the synced
slot as 'sync-ready' until we can ensure that the slot doesn't have
any transactions that are prepared before the point of enabling
two_phase. For example, when the slotsync worker fetches the remote
slot, it remembers the confirmed_flush_lsn (say LSN-1) if the local
slot's two_phase becomes true or the local slot is newly created with
enabling two_phase, and then it makes the slot 'sync-ready' once it
confirmed that the slot's restart_lsn passed LSN-1. Does it work?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#13Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#12)
RE: Fix slot synchronization with two_phase decoding enabled

On Thu, Apr 3, 2025 at 3:30 AM Masahiko Sawada wrote:

On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Wed, Apr 2, 2025 at 3:45 PM Masahiko Sawada wrote:

Hi,

On Mon, Mar 31, 2025 at 4:3 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

After further analysis, I was able to reproduce the same issue [1] in
PG 17.

[1]
- pub: created a slot 'sub' with two_phase=false, then prepared a
transaction
- pub: after some activity, advanced the confirmed_flush_lsn of 'sub', so

it is

greater than prepared txn lsn.
- sub: create subscription with (slot_name='sub', create_slot=false,
failover = true, two_phase=true, copy_data=false); two_phase_at will
be set to the same as confirmed_flush_lsn which is greater than the

prepared transaction.

- stop the primary and promote the standby.
- commit the prepared transaction on standby, the following error will be
reported on subscriber:

It seems to require elaborate steps to reproduce this issue in v17. I wonder

if we

could somehow narrow down the cases that we want to prohibit. The patch

for

v17 disallows CREATE SUBSCRIPTION to enable both two_phase and

failover,

but I guess that it's still safe if it also creates the replication slot (e.g.,
create_slot is true). If my understanding is right, we can allow users to

specify

both fields if CRETE SUBSCRIPTION creates the slot, and we don't need to
disallow that in ReplicationSlotCreate().

Thanks for reviewing the steps.

The current reproducer aims for simplicity; however, I think it's possible to

reproduce

the issue even with create_slot = true, although it requires the help of a
debugger and additional steps. But as long as there are transactions

prepared

before the two_phase_at position, and they are skipped due to checks in
ReorderBufferFinishPrepared() (refer to comments[1] for why we skip

sending

prepared transaction), the issue can be reproduced.

For instance, when creating a subscription with (copy_data=true,

failover=true,

two_phase=true), the slot's two_phase setting starts as false and shifts to
true after table sync (refer to [2] for related comments). During this period,
if a user prepares a transaction where the prepare LSN is less than the
two_phase_at, the same problem could happen.

Similarly, when setting up a subscription with (copy_data=false,

failover=true,

two_phase=true), although two_phase is initially set to true and we wait for
running transactions to finish when building consistent snapshot, a race
condition may still exist: If the snapshot state reaches FULL_SNAPSHOT, it
won't check running transactions further (see SnapBuildFindSnapshot() for
specifics), if a user prepares a transaction at this point, it's possible for
the prepare LSN to be less than the LSN marking the snapshot's consistent
state, causing the same issue.

Thank you for the explanation! I agree that the issue happens in these cases.

As another idea, I wonder if we could somehow defer to make the synced
slot as 'sync-ready' until we can ensure that the slot doesn't have
any transactions that are prepared before the point of enabling
two_phase. For example, when the slotsync worker fetches the remote
slot, it remembers the confirmed_flush_lsn (say LSN-1) if the local
slot's two_phase becomes true or the local slot is newly created with
enabling two_phase, and then it makes the slot 'sync-ready' once it
confirmed that the slot's restart_lsn passed LSN-1. Does it work?

Thanks for the idea!

We considered a similar approach in [1]/messages/by-id/OS0PR01MB57161D9BB5409F229564957994AD2@OS0PR01MB5716.jpnprd01.prod.outlook.com to confirm there is no prepared
transactions before two_phase_at, but the issue is that when the two_phase flag
is switched from 'false' to 'true' (as in the case with (copy_data=true,
failover=true, two_phase=true)). In this case, the slot may have already been
marked as sync-ready before the two_phase flag is enabled, as slotsync is
unaware of potential future changes to the two_phase flag. Then the slot have
to revert to sync-not-ready after the two_phase flag change, which could be
difficult for users to understand.

What do you think ?

[1]: /messages/by-id/OS0PR01MB57161D9BB5409F229564957994AD2@OS0PR01MB5716.jpnprd01.prod.outlook.com

Best Regards,
Hou zj

#14Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#13)
Re: Fix slot synchronization with two_phase decoding enabled

On Thu, Apr 3, 2025 at 7:50 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Thu, Apr 3, 2025 at 3:30 AM Masahiko Sawada wrote:

On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Thank you for the explanation! I agree that the issue happens in these cases.

As another idea, I wonder if we could somehow defer to make the synced
slot as 'sync-ready' until we can ensure that the slot doesn't have
any transactions that are prepared before the point of enabling
two_phase. For example, when the slotsync worker fetches the remote
slot, it remembers the confirmed_flush_lsn (say LSN-1) if the local
slot's two_phase becomes true or the local slot is newly created with
enabling two_phase, and then it makes the slot 'sync-ready' once it
confirmed that the slot's restart_lsn passed LSN-1. Does it work?

Thanks for the idea!

We considered a similar approach in [1] to confirm there is no prepared
transactions before two_phase_at, but the issue is that when the two_phase flag
is switched from 'false' to 'true' (as in the case with (copy_data=true,
failover=true, two_phase=true)). In this case, the slot may have already been
marked as sync-ready before the two_phase flag is enabled, as slotsync is
unaware of potential future changes to the two_phase flag.

This can happen because when copy_data is true, tablesync can take a
long time to complete the sync and in the meantime, slot without a
two_phase flag would have been synced to standby. Such a slot would be
marked as sync-ready even if we follow the calculation proposed by
Sawada-san. Note that we enable two_phase once all the tables are in
ready state (See run_apply_worker() and comments atop worker.c
(TWO_PHASE TRANSACTIONS)).

--
With Regards,
Amit Kapila.

#15Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#14)
Re: Fix slot synchronization with two_phase decoding enabled

On Wed, Apr 2, 2025 at 7:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 3, 2025 at 7:50 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Thu, Apr 3, 2025 at 3:30 AM Masahiko Sawada wrote:

On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Thank you for the explanation! I agree that the issue happens in these cases.

As another idea, I wonder if we could somehow defer to make the synced
slot as 'sync-ready' until we can ensure that the slot doesn't have
any transactions that are prepared before the point of enabling
two_phase. For example, when the slotsync worker fetches the remote
slot, it remembers the confirmed_flush_lsn (say LSN-1) if the local
slot's two_phase becomes true or the local slot is newly created with
enabling two_phase, and then it makes the slot 'sync-ready' once it
confirmed that the slot's restart_lsn passed LSN-1. Does it work?

Thanks for the idea!

We considered a similar approach in [1] to confirm there is no prepared
transactions before two_phase_at, but the issue is that when the two_phase flag
is switched from 'false' to 'true' (as in the case with (copy_data=true,
failover=true, two_phase=true)). In this case, the slot may have already been
marked as sync-ready before the two_phase flag is enabled, as slotsync is
unaware of potential future changes to the two_phase flag.

This can happen because when copy_data is true, tablesync can take a
long time to complete the sync and in the meantime, slot without a
two_phase flag would have been synced to standby. Such a slot would be
marked as sync-ready even if we follow the calculation proposed by
Sawada-san. Note that we enable two_phase once all the tables are in
ready state (See run_apply_worker() and comments atop worker.c
(TWO_PHASE TRANSACTIONS)).

Right. It doesn't make sense to make the slot not-sync-ready and then
back to sync-ready.

While I agree with the approach for HEAD and it seems difficult to
find a solution, I'm concerned that disallowing to use both failover
and two_phase in a minor release would affect users much. Users who
are already using that combination might end up needing to re-think
their system architecture. So I'm trying to narrow down use cases
where we're going to prohibit or to find workarounds.

If we agree with the fix for HEAD, we can push the fix for HEAD first,
which would be better to be done sooner as it needs to bump the
catversion. We can discuss the ideas and workarounds for v17 later.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#16Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#15)
Re: Fix slot synchronization with two_phase decoding enabled

On Thu, Apr 3, 2025 at 11:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 2, 2025 at 7:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 3, 2025 at 7:50 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Thu, Apr 3, 2025 at 3:30 AM Masahiko Sawada wrote:

On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Thank you for the explanation! I agree that the issue happens in these cases.

As another idea, I wonder if we could somehow defer to make the synced
slot as 'sync-ready' until we can ensure that the slot doesn't have
any transactions that are prepared before the point of enabling
two_phase. For example, when the slotsync worker fetches the remote
slot, it remembers the confirmed_flush_lsn (say LSN-1) if the local
slot's two_phase becomes true or the local slot is newly created with
enabling two_phase, and then it makes the slot 'sync-ready' once it
confirmed that the slot's restart_lsn passed LSN-1. Does it work?

Thanks for the idea!

We considered a similar approach in [1] to confirm there is no prepared
transactions before two_phase_at, but the issue is that when the two_phase flag
is switched from 'false' to 'true' (as in the case with (copy_data=true,
failover=true, two_phase=true)). In this case, the slot may have already been
marked as sync-ready before the two_phase flag is enabled, as slotsync is
unaware of potential future changes to the two_phase flag.

This can happen because when copy_data is true, tablesync can take a
long time to complete the sync and in the meantime, slot without a
two_phase flag would have been synced to standby. Such a slot would be
marked as sync-ready even if we follow the calculation proposed by
Sawada-san. Note that we enable two_phase once all the tables are in
ready state (See run_apply_worker() and comments atop worker.c
(TWO_PHASE TRANSACTIONS)).

Right. It doesn't make sense to make the slot not-sync-ready and then
back to sync-ready.

While I agree with the approach for HEAD and it seems difficult to
find a solution, I'm concerned that disallowing to use both failover
and two_phase in a minor release would affect users much. Users who
are already using that combination might end up needing to re-think
their system architecture. So I'm trying to narrow down use cases
where we're going to prohibit or to find workarounds.

If we agree with the fix for HEAD, we can push the fix for HEAD first,
which would be better to be done sooner as it needs to bump the
catversion. We can discuss the ideas and workarounds for v17 later.

Thanks, I'll push the patch for HEAD and then keep thinking if we have
a better way to deal with the problem in 17. BTW, the problem for 17
can happen in a much narrower set of cases as explained in the emails
above.

--
With Regards,
Amit Kapila.

#17Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#15)
RE: Fix slot synchronization with two_phase decoding enabled

On Thu, Apr 3, 2025 at 1:38 PM Masahiko Sawada wrote:

On Wed, Apr 2, 2025 at 7:58 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Thu, Apr 3, 2025 at 7:50 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Thu, Apr 3, 2025 at 3:30 AM Masahiko Sawada wrote:

On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Thank you for the explanation! I agree that the issue happens in these

cases.

Just to share the steps to reproduce the issue when creating the subscription
With (create_slot=true, failover = true, two_phase=true, copy_data=false) for reference.

- pub: begin a txn A. This is to stop the slot from building a consistent
snapshot immediately.
- sub: create subscription with (slot_name='sub', create_slot=true, failover =
true, two_phase=true, copy_data=false);
- pub: Attach to the walsender that will create the slot and add a checkpoint
in SnapBuildFindSnapshot() -> (builder->state = SNAPBUILD_FULL_SNAPSHOT;).
For now, the state should be BUILDING_SNAPSHOT.
- pub: begin another txn B and commit txn A. The snapshot should reach
FULL_SNAPSHOT now.
- pub: prepared a txn C and then commit txn B.
- pub: release the checkpoint in SnapBuildFindSnapshot(), then it should reach
the SNAPBUILD_CONSISTENT. Now we have a prepared txn whose prepare_lsn is
less than the two_phase_at.
- stop the primary and promote the standby.
- sub: alter the subscription to use the new primary as the publisher.
- commit the prepared transaction on new primary, the following error will be
reported on subscriber:

LOG: logical replication apply worker for subscription "sub" has started
ERROR: prepared transaction with identifier "pg_gid_16398_764" does not exist.

As another idea, I wonder if we could somehow defer to make the
synced slot as 'sync-ready' until we can ensure that the slot
doesn't have any transactions that are prepared before the point
of enabling two_phase. For example, when the slotsync worker
fetches the remote slot, it remembers the confirmed_flush_lsn (say
LSN-1) if the local slot's two_phase becomes true or the local
slot is newly created with enabling two_phase, and then it makes
the slot 'sync-ready' once it confirmed that the slot's restart_lsn passed

LSN-1. Does it work?

Thanks for the idea!

We considered a similar approach in [1] to confirm there is no
prepared transactions before two_phase_at, but the issue is that
when the two_phase flag is switched from 'false' to 'true' (as in
the case with (copy_data=true, failover=true, two_phase=true)). In
this case, the slot may have already been marked as sync-ready
before the two_phase flag is enabled, as slotsync is unaware of potential

future changes to the two_phase flag.

This can happen because when copy_data is true, tablesync can take a
long time to complete the sync and in the meantime, slot without a
two_phase flag would have been synced to standby. Such a slot would be
marked as sync-ready even if we follow the calculation proposed by
Sawada-san. Note that we enable two_phase once all the tables are in
ready state (See run_apply_worker() and comments atop worker.c
(TWO_PHASE TRANSACTIONS)).

Right. It doesn't make sense to make the slot not-sync-ready and then back to
sync-ready.

While I agree with the approach for HEAD and it seems difficult to find a
solution, I'm concerned that disallowing to use both failover and two_phase in
a minor release would affect users much. Users who are already using that
combination might end up needing to re-think their system architecture. So I'm
trying to narrow down use cases where we're going to prohibit or to find
workarounds.

If we agree with the fix for HEAD, we can push the fix for HEAD first, which
would be better to be done sooner as it needs to bump the catversion. We can
discuss the ideas and workarounds for v17 later.

Agreed. I will think more on it. One workaround could be skipping the prepared
transaction when such an issue arises, followed by manually replicating the
skipped changes.

Best Regards,
Hou zj

#18Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#17)
RE: Fix slot synchronization with two_phase decoding enabled

On Thu, Apr 3, 2025 at 3:16 PM Zhijie Hou (Fujitsu) wrote:

On Thu, Apr 3, 2025 at 1:38 PM Masahiko Sawada wrote:

On Wed, Apr 2, 2025 at 7:58 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Thu, Apr 3, 2025 at 7:50 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Thu, Apr 3, 2025 at 3:30 AM Masahiko Sawada wrote:

On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Thank you for the explanation! I agree that the issue happens in
these

cases.

As another idea, I wonder if we could somehow defer to make the
synced slot as 'sync-ready' until we can ensure that the slot
doesn't have any transactions that are prepared before the point
of enabling two_phase. For example, when the slotsync worker
fetches the remote slot, it remembers the confirmed_flush_lsn
(say
LSN-1) if the local slot's two_phase becomes true or the local
slot is newly created with enabling two_phase, and then it makes
the slot 'sync-ready' once it confirmed that the slot's
restart_lsn passed

LSN-1. Does it work?

Thanks for the idea!

We considered a similar approach in [1] to confirm there is no
prepared transactions before two_phase_at, but the issue is that
when the two_phase flag is switched from 'false' to 'true' (as in
the case with (copy_data=true, failover=true, two_phase=true)). In
this case, the slot may have already been marked as sync-ready
before the two_phase flag is enabled, as slotsync is unaware of
potential

future changes to the two_phase flag.

This can happen because when copy_data is true, tablesync can take a
long time to complete the sync and in the meantime, slot without a
two_phase flag would have been synced to standby. Such a slot would
be marked as sync-ready even if we follow the calculation proposed
by Sawada-san. Note that we enable two_phase once all the tables are
in ready state (See run_apply_worker() and comments atop worker.c
(TWO_PHASE TRANSACTIONS)).

Right. It doesn't make sense to make the slot not-sync-ready and then
back to sync-ready.

While I agree with the approach for HEAD and it seems difficult to
find a solution, I'm concerned that disallowing to use both failover
and two_phase in a minor release would affect users much. Users who
are already using that combination might end up needing to re-think
their system architecture. So I'm trying to narrow down use cases
where we're going to prohibit or to find workarounds.

We analyzed more for the backbranch fix, and here is a summary of different
approaches that could be used for PG17.

----------
Approach 1
----------

This is the original approach implemented in V4 patch.

In this approach, we entirely disallow enabling both failover and the two-phase
feature together for a replication slot or subscription.

pros:

This restriction is simple to implement and easy for users to comprehend.

cons:

Users would be unable to use the two-phase feature in conjunction with
failover.

Following the upgrade to a new release with this fix, existing subscriptions
that have both failover and two-phase enabled would require manual re-creation,
which is time-consuming.

----------
Approach 2
----------

Instead of disallowing the use of two-phase and failover together, a more
flexible strategy could be only restrict failover for slots with two-phase
enabled when there's a possibility of existing prepared transactions before the
two_phase_at that are not yet replicated. During slot creation with two-phase
and failover, we could check for any decoded prepared transactions when
determining the decoding start point (DecodingContextFindStartpoint). For
subsequent attempts to alter failover to true, we ensure that two_phase_at is
less than restart_lsn, indicating that all prepared transactions have been
committed and replicated, thus the bug would not happen.

pros:

This method minimizes restrictions for users. Especially during slot creation
with (two_phase=on, failover=on), as it’s uncommon for transactions to prepare
during consistent snapshot creation, the restriction becomes almost
unnoticeable.

Users are not always forced to re-create subscriptions post-upgrade.

cons:

The logic involved for (restart_lsn > two_phase_at) might be less intuitive for
users.

After upgrading, it's recommended that users verify whether two_phase_at for a
source slot is less than restart_lsn of the synced slot. If it isn't, users
should be careful when using the synced slot on a standby. This might be
complex to understand.

----------
Approach 3
----------

This approach is similar to Approach 2 but simplifies some aspects by avoiding
checks for prepared transactions during slot creation. It disallows enabling
both failover and twophase during slot creation, but permits altering failover
to true once ensured that no prepared transaction exists (restart_lsn >
two_phase_at).

pros:

Like Approach 2, this reduces user restrictions compared to completely
disallowing failover and two-phase usage together.

Users are not always forced to re-create subscriptions post-upgrade.

cons:

User still could not enable failover and twophase when creating a slot.

Similar to approach 1, the check (restart_lsn > two_phase_at) might be less
intuitive for users.

After upgrading, it's recommended that users verify whether two_phase_at for a
source slot is less than restart_lsn of the synced slot. If it isn't, users
should be careful when using the synced slot on a standby. This might be
complex to understand.

Best Regards,
Hou zj

#19Nisha Moond
nisha.moond412@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#18)
3 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

HI,

Please find the patches attached for all three approaches.

On Wed, Apr 9, 2025 at 10:45 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Thu, Apr 3, 2025 at 3:16 PM Zhijie Hou (Fujitsu) wrote:

On Thu, Apr 3, 2025 at 1:38 PM Masahiko Sawada wrote:

On Wed, Apr 2, 2025 at 7:58 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Thu, Apr 3, 2025 at 7:50 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Thu, Apr 3, 2025 at 3:30 AM Masahiko Sawada wrote:

On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Thank you for the explanation! I agree that the issue happens in
these

cases.

As another idea, I wonder if we could somehow defer to make the
synced slot as 'sync-ready' until we can ensure that the slot
doesn't have any transactions that are prepared before the point
of enabling two_phase. For example, when the slotsync worker
fetches the remote slot, it remembers the confirmed_flush_lsn
(say
LSN-1) if the local slot's two_phase becomes true or the local
slot is newly created with enabling two_phase, and then it makes
the slot 'sync-ready' once it confirmed that the slot's
restart_lsn passed

LSN-1. Does it work?

Thanks for the idea!

We considered a similar approach in [1] to confirm there is no
prepared transactions before two_phase_at, but the issue is that
when the two_phase flag is switched from 'false' to 'true' (as in
the case with (copy_data=true, failover=true, two_phase=true)). In
this case, the slot may have already been marked as sync-ready
before the two_phase flag is enabled, as slotsync is unaware of
potential

future changes to the two_phase flag.

This can happen because when copy_data is true, tablesync can take a
long time to complete the sync and in the meantime, slot without a
two_phase flag would have been synced to standby. Such a slot would
be marked as sync-ready even if we follow the calculation proposed
by Sawada-san. Note that we enable two_phase once all the tables are
in ready state (See run_apply_worker() and comments atop worker.c
(TWO_PHASE TRANSACTIONS)).

Right. It doesn't make sense to make the slot not-sync-ready and then
back to sync-ready.

While I agree with the approach for HEAD and it seems difficult to
find a solution, I'm concerned that disallowing to use both failover
and two_phase in a minor release would affect users much. Users who
are already using that combination might end up needing to re-think
their system architecture. So I'm trying to narrow down use cases
where we're going to prohibit or to find workarounds.

We analyzed more for the backbranch fix, and here is a summary of different
approaches that could be used for PG17.

----------
Approach 1
----------

This is the original approach implemented in V4 patch.

In this approach, we entirely disallow enabling both failover and the two-phase
feature together for a replication slot or subscription.

pros:

This restriction is simple to implement and easy for users to comprehend.

cons:

Users would be unable to use the two-phase feature in conjunction with
failover.

Following the upgrade to a new release with this fix, existing subscriptions
that have both failover and two-phase enabled would require manual re-creation,
which is time-consuming.

Patch "v5_aprch_1-0001" implements the above Approach 1.

----------
Approach 2
----------

Instead of disallowing the use of two-phase and failover together, a more
flexible strategy could be only restrict failover for slots with two-phase
enabled when there's a possibility of existing prepared transactions before the
two_phase_at that are not yet replicated. During slot creation with two-phase
and failover, we could check for any decoded prepared transactions when
determining the decoding start point (DecodingContextFindStartpoint). For
subsequent attempts to alter failover to true, we ensure that two_phase_at is
less than restart_lsn, indicating that all prepared transactions have been
committed and replicated, thus the bug would not happen.

pros:

This method minimizes restrictions for users. Especially during slot creation
with (two_phase=on, failover=on), as it’s uncommon for transactions to prepare
during consistent snapshot creation, the restriction becomes almost
unnoticeable.

Users are not always forced to re-create subscriptions post-upgrade.

cons:

The logic involved for (restart_lsn > two_phase_at) might be less intuitive for
users.

After upgrading, it's recommended that users verify whether two_phase_at for a
source slot is less than restart_lsn of the synced slot. If it isn't, users
should be careful when using the synced slot on a standby. This might be
complex to understand.

Patch "v5_aprch_2-0001" implements the above Approach 2.

----------
Approach 3
----------

This approach is similar to Approach 2 but simplifies some aspects by avoiding
checks for prepared transactions during slot creation. It disallows enabling
both failover and twophase during slot creation, but permits altering failover
to true once ensured that no prepared transaction exists (restart_lsn >
two_phase_at).

pros:

Like Approach 2, this reduces user restrictions compared to completely
disallowing failover and two-phase usage together.

Users are not always forced to re-create subscriptions post-upgrade.

cons:

User still could not enable failover and twophase when creating a slot.

Similar to approach 1, the check (restart_lsn > two_phase_at) might be less
intuitive for users.

After upgrading, it's recommended that users verify whether two_phase_at for a
source slot is less than restart_lsn of the synced slot. If it isn't, users
should be careful when using the synced slot on a standby. This might be
complex to understand.

Patch "v5_aprch_3-0001" implements the above Approach 3.

Thanks Hou-san for implementing approach-2 and providing the patch.

--
Thanks,
Nisha

Attachments:

v5_aprch_3-0001-PG17-Approach-3-Fix-slot-synchronization-.patchapplication/octet-stream; name=v5_aprch_3-0001-PG17-Approach-3-Fix-slot-synchronization-.patchDownload
From 78fb83e3b79dd78920454e90444a3ffef0ca3083 Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Mon, 7 Apr 2025 09:36:29 +0530
Subject: [PATCH v5_aprch_3] PG17 Approach 3 Fix slot synchronization for
 two_phase enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To prevent the risk of losing prepared transactions, we disallow enabling
both failover and twophase during slot creation, but permits altering
failover to true once ensured that no prepared transaction exists
(restart_lsn > two_phase_at).
---
 contrib/test_decoding/expected/slot.out       |  2 +
 contrib/test_decoding/sql/slot.sql            |  1 +
 src/backend/commands/subscriptioncmds.c       | 28 +++++++++++
 src/backend/replication/logical/logical.c     | 12 +++++
 src/backend/replication/slot.c                | 35 ++++++++++++++
 src/bin/pg_upgrade/t/003_logical_slots.pl     |  6 +--
 .../t/040_standby_failover_slots_sync.pl      | 47 +++++++++++++++++++
 src/test/regress/expected/subscription.out    |  3 ++
 src/test/regress/sql/subscription.sql         |  4 ++
 9 files changed, 135 insertions(+), 3 deletions(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..8fd762dea85 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  "failover" and "two_phase" are mutually exclusive options
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..122f999229e 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,19 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options
+	 * together.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for a
+	 * detailed reason.
+	 */
+	if (opts.twophase && opts.failover)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("\"%s\" and \"%s\" are mutually exclusive options",
+					   "failover", "two_phase"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1258,21 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable failover if the two_phase
+					 * state is still pending, indicating that the replication
+					 * slot's two_phase option has yet to be finalized. This
+					 * restriction is necessary because the check in
+					 * ReplicationSlotAlter() forbids enabling two_phase and
+					 * failover together and requires access to the actual
+					 * two_phase value.
+					 */
+					if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+								errmsg("cannot enable failover for a subscription with a pending two_phase state"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index e941bb491d8..2421e03d088 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -613,6 +613,18 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 	/* Mark slot to allow two_phase decoding if not already marked */
 	if (ctx->twophase && !slot->data.two_phase)
 	{
+		/*
+		 * Do not allow two-phase decoding for failover enabled slots.
+		 *
+		 * See comments atop the similar check in ReplicationSlotCreate() for
+		 * a detailed reason.
+		 */
+		if (slot->data.failover)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+							NameStr(slot->data.name))));
+
 		SpinLockAcquire(&slot->mutex);
 		slot->data.two_phase = true;
 		slot->data.two_phase_at = start_lsn;
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..1d952000bdb 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,26 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable both failover and two_phase for slots.
+		 *
+		 * This is because the two_phase_at field of a slot, which tracks the
+		 * LSN, from which two-phase decoding starts, is not synchronized to
+		 * standby servers. Without two_phase_at, the logical decoding might
+		 * incorrectly identify prepared transaction as already replicated to
+		 * the subscriber after promotion of standby server, causing them to
+		 * be skipped.
+		 *
+		 * However, both failover and two_phase enabled slots can be created
+		 * during slot synchronization because we need to retain the same
+		 * values as the remote slot.
+		 */
+		if (two_phase && !IsSyncingReplicationSlots())
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("\"%s\" and \"%s\" are mutually exclusive options",
+						   "failover", "two_phase"));
 	}
 
 	/*
@@ -848,6 +868,21 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot
+	 * where slot's restart_lsn is less than two_phase_at. In such cases,
+	 * there is a risk that transactions prepared before two_phase_at exist
+	 * and would be skipped during decoding.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for a
+	 * detailed reason.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase &&
+		MyReplicationSlot->data.restart_lsn < MyReplicationSlot->data.two_phase_at)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable failover for a two-phase enabled replication slot"));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..e329cc609ce 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -173,7 +173,7 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
@@ -193,8 +193,8 @@ command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
 # Check that the slot 'regress_sub' has migrated to the new cluster
 $newpub->start;
 my $result = $newpub->safe_psql('postgres',
-	"SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
 
 # Update the connection
 my $new_connstr = $newpub->connstr . ' dbname=postgres';
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..dc122348f84 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -98,6 +98,53 @@ my ($result, $stdout, $stderr) = $subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
 	"altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option can be enabled for a two_phase enabled
+# subscription.
+##################################################
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub2_slot, enabled = false, two_phase = true);"
+);
+
+# Enable failover for the subscription
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $publisher->safe_psql(
+		'postgres',
+		q{SELECT failover from pg_replication_slots WHERE slot_name = 'lsub1_slot';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
+# Drop the subscription
+$subscriber1->safe_psql(
+	'postgres', q{
+DROP SUBSCRIPTION regress_mysub2;
+});
+
+# Create a slot with two_phase enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub2', 'pgoutput', false, true, false);");
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, copy_data = false, enabled = false, two_phase = true);"
+);
+
+# Enable failover for the subscription with two_phase in pending state
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+ok( $stderr =~ /ERROR:  cannot enable failover for a subscription with a pending two_phase state/,
+	"Enabling failover is not allowed for a two_phase pending subscription");
+
+# Drop the subscription
+$subscriber1->safe_psql(
+	'postgres', "DROP SUBSCRIPTION regress_mysub2;");
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby server.
 ##################################################
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..bbd83244ba9 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,9 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  "failover" and "two_phase" are mutually exclusive options
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.34.1

v5_aprch_2-0001-PG17-Approach-2-Fix-slot-synchronization-.patchapplication/octet-stream; name=v5_aprch_2-0001-PG17-Approach-2-Fix-slot-synchronization-.patchDownload
From 7a87391f4a0e62525a92ca6a8f3aa0de62f169e6 Mon Sep 17 00:00:00 2001
From: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Date: Mon, 7 Apr 2025 15:53:37 +0800
Subject: [PATCH v5_aprch_2] PG17 Approach 2 Fix slot synchronization for
 two_phase enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

Instead of disallowing the use of two-phase and failover together, a more
flexible strategy could be only restrict failover for slots with two-phase
enabled when there's a possibility of existing prepared transactions before the
two_phase_at that are not yet replicated. During slot creation with two-phase
and failover, we could check for any decoded prepared transactions when
determining the decoding start point (DecodingContextFindStartpoint). For
subsequent attempts to alter failover to true, we ensure that two_phase_at is
less than restart_lsn, indicating that all prepared transactions have been
committed and replicated, thus the bug would not happen.
---
 src/backend/commands/subscriptioncmds.c       | 28 ++++++++++++++++
 src/backend/replication/logical/decode.c      |  1 +
 src/backend/replication/logical/logical.c     | 21 ++++++++++++
 .../replication/logical/reorderbuffer.c       | 21 ++++++++++++
 src/backend/replication/slot.c                | 17 ++++++++++
 src/include/replication/reorderbuffer.h       |  1 +
 .../t/040_standby_failover_slots_sync.pl      | 32 +++++++++++++++++++
 src/test/regress/expected/subscription.out    |  3 ++
 src/test/regress/sql/subscription.sql         |  4 +++
 9 files changed, 128 insertions(+)

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..59e5827b06a 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,19 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options together
+	 * when create_slot is specified.
+	 *
+	 * See comments atop the similar check in DecodingContextFindStartpoint()
+	 * for a detailed reason.
+	 */
+	if (!opts.create_slot && opts.twophase && opts.failover)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("\"%s\" and \"%s\" are mutually exclusive options when \"%s\" is specified",
+					   "failover", "two_phase", "create_slot"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1258,21 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable failover if the two_phase
+					 * state is still pending, indicating that the replication
+					 * slot's two_phase option has yet to be finalized. This
+					 * restriction is necessary because the check in
+					 * ReplicationSlotAlter() forbids enabling two_phase and
+					 * failover together and requires access to the actual
+					 * two_phase value.
+					 */
+					if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+								errmsg("cannot enable failover for a subscription with a pending two_phase state"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 8ec5adfd909..60dbec37309 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -327,6 +327,7 @@ xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 				{
 					ReorderBufferProcessXid(reorder, parsed.twophase_xid,
 											buf->origptr);
+					ReorderBufferSkipPrepare(reorder, parsed.twophase_xid);
 					break;
 				}
 
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index e941bb491d8..5a71d441d52 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -679,6 +679,27 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
 		CHECK_FOR_INTERRUPTS();
 	}
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot if
+	 * there are transactions prepared before reaching the consistent point
+	 * that were skipped during decoding. These prepared transactions can fail
+	 * to replicate to the subscriber after being committed on a promoted
+	 * standby following a failover.
+	 *
+	 * This is because the two_phase_at field of a slot, which tracks the LSN,
+	 * from which two-phase decoding starts, is not synchronized to standby
+	 * servers. Without two_phase_at, the logical decoding might incorrectly
+	 * identify prepared transaction as already replicated to the subscriber
+	 * after promotion of standby server, causing them to be skipped.
+	 */
+	if (slot->data.two_phase && slot->data.failover &&
+		ReorderBufferSkippedPrepare(ctx->reorder))
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable failover for a two-phase enabled replication slot"),
+				errdetail("Prepared transactions exist before the logical decoding starting point %X/%X.",
+						  LSN_FORMAT_ARGS(ctx->reader->EndRecPtr)));
+
 	SpinLockAcquire(&slot->mutex);
 	slot->data.confirmed_flush = ctx->reader->EndRecPtr;
 	if (slot->data.two_phase)
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 03eb005c39d..80c2b99a71b 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2800,6 +2800,27 @@ ReorderBufferSkipPrepare(ReorderBuffer *rb, TransactionId xid)
 	txn->txn_flags |= RBTXN_SKIPPED_PREPARE;
 }
 
+/*
+ * Check whether we have skipped processing prepared transactions.
+ */
+bool
+ReorderBufferSkippedPrepare(ReorderBuffer *rb)
+{
+	dlist_mutable_iter it;
+
+	dlist_foreach_modify(it, &rb->toplevel_by_lsn)
+	{
+		ReorderBufferTXN *txn;
+
+		txn = dlist_container(ReorderBufferTXN, node, it.cur);
+
+		if (rbtxn_skip_prepared(txn))
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * Prepare a two-phase transaction.
  *
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..df0602cf640 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -848,6 +848,23 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot
+	 * where slot's restart_lsn is less than two_phase_at. In such cases,
+	 * there is a risk that transactions prepared before two_phase_at exist
+	 * and would be skipped during decoding.
+	 *
+	 * See comments atop the similar check in DecodingContextFindStartpoint()
+	 * for a detailed reason.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase &&
+		MyReplicationSlot->data.restart_lsn < MyReplicationSlot->data.two_phase_at)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable failover for a two-phase enabled replication slot"),
+				errhint("WALs prior to %X/%X, where two-phase decoding begins, are still reserved by this slot.",
+						LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 4c56f219fd8..43d6e8608ab 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -722,6 +722,7 @@ extern bool ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xi
 											 TimestampTz prepare_time,
 											 RepOriginId origin_id, XLogRecPtr origin_lsn);
 extern void ReorderBufferSkipPrepare(ReorderBuffer *rb, TransactionId xid);
+extern bool ReorderBufferSkippedPrepare(ReorderBuffer *rb);
 extern void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid, char *gid);
 extern ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *rb);
 extern TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..9f0ce33af6a 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -98,6 +98,38 @@ my ($result, $stdout, $stderr) = $subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
 	"altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option cannot be enabled for a two_phase replication
+# slot.
+##################################################
+
+# # Create a slot with two_phase enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub2', 'pgoutput', false, true, false);");
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, copy_data = false, enabled = false, two_phase = true);"
+);
+
+# Enable failover for the subscription
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+ok( $stderr =~ /ERROR:  cannot enable failover for a subscription with a pending two_phase state/,
+	"Enabling failover is not allowed for a two_phase pending subscription");
+
+# Attempt to enable failover for a two-phase enabled slot
+($result, $stdout, $stderr) = $publisher->psql(
+	'postgres',
+	qq[ALTER_REPLICATION_SLOT regress_mysub2 (failover);],
+	replication => 'database');
+ok($stderr =~ /ERROR:  cannot enable failover for a two-phase enabled replication slot/,
+	"Enabling failover is not allowed for a two_phase enabled slot");
+
+# Drop the subscription
+$subscriber1->safe_psql(
+	'postgres', "DROP SUBSCRIPTION regress_mysub2;");
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby server.
 ##################################################
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..0815094967f 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,9 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  "failover" and "two_phase" are mutually exclusive options when "create_slot" is specified
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.34.1

v5_aprch_1-0001-PG-17-Approach-1-Fix-slot-synchronization.patchapplication/octet-stream; name=v5_aprch_1-0001-PG-17-Approach-1-Fix-slot-synchronization.patchDownload
From 53da792c21c574075f3cab69a4dc173cff1d1715 Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Fri, 11 Apr 2025 11:39:19 +0530
Subject: [PATCH v5_aprch_1] PG 17 Approach 1 Fix slot synchronization for
 two_phase enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To prevent the risk of losing prepared transactions, we disallow
enabling failover and two-phase decoding together for a replication slot.
Users are advised to disable slot synchronization for two_phase-enabled
slots or re-create the subscriptions with option 'two_phase' as false and
'failover' as true.
---
 contrib/test_decoding/expected/slot.out    |  2 ++
 contrib/test_decoding/sql/slot.sql         |  1 +
 src/backend/commands/subscriptioncmds.c    | 26 +++++++++++++++++++++
 src/backend/replication/slot.c             | 27 ++++++++++++++++++++++
 src/bin/pg_upgrade/t/003_logical_slots.pl  |  6 ++---
 src/test/regress/expected/subscription.out |  6 +++++
 src/test/regress/sql/subscription.sql      |  7 ++++++
 7 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..8fd762dea85 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  "failover" and "two_phase" are mutually exclusive options
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..09c5f0ef685 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,19 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options
+	 * together.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for a
+	 * detailed reason.
+	 */
+	if (opts.twophase && opts.failover)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("\"%s\" and \"%s\" are mutually exclusive options",
+					   "failover", "two_phase"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1258,19 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable the failover and two_phase
+					 * options together.
+					 *
+					 * See comments atop the similar check in
+					 * ReplicationSlotCreate() for a detailed reason.
+					 */
+					if (sub->twophasestate != LOGICALREP_TWOPHASE_STATE_DISABLED &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+								errmsg("cannot enable failover for a two_phase enabled subscription"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..c7ef7f95011 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,22 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable both failover and two_phase for slots.
+		 *
+		 * This is because the two_phase_at field of a slot, which tracks the
+		 * LSN, from which two-phase decoding starts, is not synchronized to
+		 * standby servers. Without two_phase_at, the logical decoding might
+		 * incorrectly identify prepared transaction as already replicated to
+		 * the subscriber after promotion of standby server, causing them to
+		 * be skipped.
+		 */
+		if (two_phase)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("\"%s\" and \"%s\" are mutually exclusive options",
+						   "failover", "two_phase"));
 	}
 
 	/*
@@ -848,6 +864,17 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for a
+	 * detailed reason.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable failover for a two-phase enabled replication slot"));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..e329cc609ce 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -173,7 +173,7 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
@@ -193,8 +193,8 @@ command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
 # Check that the slot 'regress_sub' has migrated to the new cluster
 $newpub->start;
 my $result = $newpub->safe_psql('postgres',
-	"SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
 
 # Update the connection
 my $new_connstr = $newpub->connstr . ' dbname=postgres';
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..fd94389d916 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -389,6 +389,9 @@ ALTER SUBSCRIPTION regress_testsub SET (streaming = true);
  regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
 (1 row)
 
+--fail - cannot enable failover for a two_phase enabled subscription
+ALTER SUBSCRIPTION regress_testsub SET (failover = true);
+ERROR:  cannot enable failover for a two_phase enabled subscription
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 -- two_phase and streaming are compatible.
@@ -479,6 +482,9 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  "failover" and "two_phase" are mutually exclusive options
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..bc6e0f04e64 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -264,6 +264,9 @@ ALTER SUBSCRIPTION regress_testsub SET (streaming = true);
 
 \dRs+
 
+--fail - cannot enable failover for a two_phase enabled subscription
+ALTER SUBSCRIPTION regress_testsub SET (failover = true);
+
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
@@ -342,6 +345,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.34.1

#20Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#16)
2 attachment(s)
RE: Fix slot synchronization with two_phase decoding enabled

Hi,

The recently added tests for slotsync on two-phase enabled slots failed[1] due
to a broken assertion triggered while decoding the COMMIT PREPARED record on
the promoted standby.

-----
Analysis
-----

if ((txn->final_lsn < two_phase_at) && is_commit)
{
/*
* txn must have been marked as a prepared transaction and skipped but
* not sent a prepare. Also, the prepare info must have been updated
* in txn even if we skip prepare.
*/
Assert((txn->txn_flags & RBTXN_PREPARE_STATUS_MASK) ==
(RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE));

I think the issue stem from the PREPARED transaction, which was skipped on the
primary, not being skipped on the promoted standby. The root cause appears to
be the latest 'confirmed_flush' (0/6005118) of the source slot not being synced
to the standby. This results in the confirmed_flush (0/6004F90) of the synced
slot being less than the synced two_phase_at field (0/6005118). Consequently,
the PREPARE record's LSN exceeds the synced confirmed_flush during
decoding, causing it to be incorrectly decoded and sent to the subscriber. Thus, when
decoding COMMIT PREPARED, the PREPARE record is found before two_phase_at but
wasn't skipped.

-- The LOGs that proves the confirmed_flush is not synced:

decoding on original slot (primary):
LOG: 0/6004F90 has been already streamed, forwarding to 0/6005118
...
DETAIL: Streaming transactions committing after 0/6005118, reading WAL from 0/60049C8.

decoding on synced slot (promted standby) - failed run:
DETAIL: Streaming transactions committing after 0/6004F90, reading WAL from 0/60049C8.

decoding on synced slot (promted standby) - success run:
LOG: 0/6004F90 has been already streamed, forwarding to 0/6005118
...
DETAIL: Streaming transactions committing after 0/6005118, reading WAL from 0/60049C8.
--

The logs above clearly show that during the test failure, the starting position
(6004F90) was less than that of a successful run. Additionally, it was lower
than the starting position of the remote slot on the primary, indicating a
failure to synchronize confirmed_lsn.

The reason confirmed_flush isn't synced should stem from the following check,
which skip the syncing of the confirmed_flush value. I also reproduced the
assertion failure by creating a scenario that leads to sync omission due to
this check:

/*
* Don't overwrite if we already have a newer catalog_xmin and
* restart_lsn.
*/
if (remote_slot->restart_lsn < slot->data.restart_lsn ||
TransactionIdPrecedes(remote_slot->catalog_xmin,
slot->data.catalog_xmin))
{

This check triggered because the synced catalog_xmin surpassed that of the
source slot (It can be seen from the log that the restart_lsn was synced to the
same value) ). This situation arises due to several factors:

On the publisher(primary), the slot may retain an older catalog_xmin and
restart_lsn because it is being consumed by a walsender. In this scenario, if
the apply worker does not immediately confirm that changes have been flushed,
the walsender advances the catalog_xmin/restart_lsn slowly. It sets an old
value for candidate_catalog_xmin/candidate_restart_lsn and would not update
them until the apply worker confirms via LogicalConfirmReceivedLocation.

However, the slotsync worker has the opportunity to begin WAL reading from a
more recent point, potentially advancing catalog_xmin/restart_lsn to newer
values.

And it's also possible to advance only catalog_xmin while keep restart_lsn
unchanged, by starting a transaction before the running_xact record. In this
case, it would pass builder->last_serialized_snapshot as restart_lsn to
LogicalIncreaseRestartDecodingForSlot(), but last_serialized_snapshot would
point to the position of the last running_xact record instead of the current
one, or the value would be NULL if no snapshot has been serialized before, so
it would not advance restart_lsn. The advancement of catalog_xmin is not
blocked by running transaction, as it would either use the XID of the running
transaction or the oldestRunningXid as candidate.

-----
Reproduction
-----

Accompanying this analysis is a script to reproduce the issue. In these
scripts, two running_xacts were logged post-slot creation on the publisher.
Below is a detailed progression of restart_lsn/catalog_xmin advancement
observed on both the publisher and standby:

The process is : slot creation -> log running_xact -> prepare a txn -> log
running_xact

The process of advancing the catalog_xmin/restart_lsn on publisher:

slot creation
...
1. 0/301B880 - running_xacts (no running txn, oldestrunningxid = 755)

got new catalog xmin 755 at 0/301B880
LOG: got new restart lsn 0/301B880 at 0/301B880
2. 0/301B8B8 - PREPAPRE a TRANSACTION.
3. 0/301BA20 - running_xacts (no running txn, oldestrunningxid = 756)

failed to increase restart lsn: proposed 0/301B880, after 0/301BA20,
current candidate 0/301B880, current after 0/301B880, flushed up to
0/301B810

final slot data:

catalog_xmin | 755
restart_lsn | 0/301B880
confirmed_flush_lsn | 0/301BAC8

The process of advancing the catalog_xmin/restart_lsn on standby (during slot
sync):

slot creation
...
1. 0/301B880 - running_xacts (no running txn, oldestrunningxid = 755)

building consistent snapshot, so will skip advancing the catalog_xmin/restart_lsn
...
2. 0/301BA20 - running_xacts (no running txn, oldestrunningxid = 756)

got new catalog xmin 756 at 0/301BA20

cannot get new restart_lsn as running txn exists and didn't serialize
snapshot yet.

first synced slot data:

catalog_xmin | 756
restart_lsn | 0/301B880
confirmed_flush_lsn | 0/301BAC8

The second slot sync cycle would skip updating confirmed_lsn because
catalog_xmin is newer.

-----
Fix
-----

I think we should keep the confirmed_flush even if the previous synced
restart_lsn/catalog_xmin is newer. Attachments include a patch for the same.

Best Regards,
Hou zj

Attachments:

test_two_phase.sh.txttext/plain; name=test_two_phase.sh.txtDownload
0001-Fix-assertion-failure-when-decoding-synced-two-phase.patchapplication/octet-stream; name=0001-Fix-assertion-failure-when-decoding-synced-two-phase.patchDownload
From ea5df7ebf90900e31e002317a3320cb66a8ef31a Mon Sep 17 00:00:00 2001
From: Zhijie Hou <houzj.fnst@cn.fujitsu.com>
Date: Thu, 17 Apr 2025 17:14:24 +0800
Subject: [PATCH] Fix assertion failure when decoding synced two-phase enabled
 slots.

Current, during slot synchronization, it skips updating the confirmed_lsn if it
detects that the catalog_xmin or restart_lsn of the synced slot has already
surpassed those of the remote slot on the primary. This behavior poses a
problem when two-phase commit is enabled on the remote slot. The lack of
synchronization between the latest confirmed_lsn and two_phase_at may result in
transactions prepared between the old confirmed_lsn and two_phase_at being
unexpectedly decoded and sent to subscribers following a promotion.

To fix this, we keep confirmed_flush updated even if the local catalog_xmin or
restart_lsn is more recent.
---
 src/backend/replication/logical/slotsync.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index e22d41891e6..2a12167cdaf 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -219,6 +219,21 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn),
 						  slot->data.catalog_xmin));
 
+		/*
+		 * Keep confirmed_flush updated even if catalog_xmin or restart_lsn
+		 * advances beyond the values of the remote slot. This is necessary
+		 * when twophase is enabled on the remote slot, as failing to sync the
+		 * latest confirmed_lsn along with two_phase_at can lead to the
+		 * transactions between the old confirmed_lsn and two_phase_at being
+		 * unexpectedly decoded and sent to the subscriber.
+		 */
+		if (remote_slot->confirmed_lsn > remote_slot->restart_lsn)
+		{
+			SpinLockAcquire(&slot->mutex);
+			slot->data.confirmed_flush = remote_slot->confirmed_lsn;
+			SpinLockRelease(&slot->mutex);
+		}
+
 		if (remote_slot_precedes)
 			*remote_slot_precedes = true;
 	}
-- 
2.30.0.windows.2

#21Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#20)
RE: Fix slot synchronization with two_phase decoding enabled

On Thu, Apr 17, 2025 at 8:44 PM Zhijie Hou (Fujitsu) wrote:

Hi,

The recently added tests for slotsync on two-phase enabled slots
failed[1] due to a broken assertion triggered while decoding the
COMMIT PREPARED record on the promoted standby.

Here is the missing link:

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&amp;dt=2025-04-09%2010%3A36%3A46

Best Regards,
Hou zj

#22Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#20)
Re: Fix slot synchronization with two_phase decoding enabled

On Thu, Apr 17, 2025 at 6:14 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Hi,

The recently added tests for slotsync on two-phase enabled slots failed[1] due
to a broken assertion triggered while decoding the COMMIT PREPARED record on
the promoted standby.

-----
Analysis
-----

if ((txn->final_lsn < two_phase_at) && is_commit)
{
/*
* txn must have been marked as a prepared transaction and skipped but
* not sent a prepare. Also, the prepare info must have been updated
* in txn even if we skip prepare.
*/
Assert((txn->txn_flags & RBTXN_PREPARE_STATUS_MASK) ==
(RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE));

I think the issue stem from the PREPARED transaction, which was skipped on the
primary, not being skipped on the promoted standby. The root cause appears to
be the latest 'confirmed_flush' (0/6005118) of the source slot not being synced
to the standby. This results in the confirmed_flush (0/6004F90) of the synced
slot being less than the synced two_phase_at field (0/6005118). Consequently,
the PREPARE record's LSN exceeds the synced confirmed_flush during
decoding, causing it to be incorrectly decoded and sent to the subscriber. Thus, when
decoding COMMIT PREPARED, the PREPARE record is found before two_phase_at but
wasn't skipped.

-- The LOGs that proves the confirmed_flush is not synced:

decoding on original slot (primary):
LOG: 0/6004F90 has been already streamed, forwarding to 0/6005118
...
DETAIL: Streaming transactions committing after 0/6005118, reading WAL from 0/60049C8.

decoding on synced slot (promted standby) - failed run:
DETAIL: Streaming transactions committing after 0/6004F90, reading WAL from 0/60049C8.

decoding on synced slot (promted standby) - success run:
LOG: 0/6004F90 has been already streamed, forwarding to 0/6005118
...
DETAIL: Streaming transactions committing after 0/6005118, reading WAL from 0/60049C8.
--

The logs above clearly show that during the test failure, the starting position
(6004F90) was less than that of a successful run. Additionally, it was lower
than the starting position of the remote slot on the primary, indicating a
failure to synchronize confirmed_lsn.

The reason confirmed_flush isn't synced should stem from the following check,
which skip the syncing of the confirmed_flush value. I also reproduced the
assertion failure by creating a scenario that leads to sync omission due to
this check:

/*
* Don't overwrite if we already have a newer catalog_xmin and
* restart_lsn.
*/
if (remote_slot->restart_lsn < slot->data.restart_lsn ||
TransactionIdPrecedes(remote_slot->catalog_xmin,
slot->data.catalog_xmin))
{

This check triggered because the synced catalog_xmin surpassed that of the
source slot (It can be seen from the log that the restart_lsn was synced to the
same value) ). This situation arises due to several factors:

On the publisher(primary), the slot may retain an older catalog_xmin and
restart_lsn because it is being consumed by a walsender. In this scenario, if
the apply worker does not immediately confirm that changes have been flushed,
the walsender advances the catalog_xmin/restart_lsn slowly. It sets an old
value for candidate_catalog_xmin/candidate_restart_lsn and would not update
them until the apply worker confirms via LogicalConfirmReceivedLocation.

However, the slotsync worker has the opportunity to begin WAL reading from a
more recent point, potentially advancing catalog_xmin/restart_lsn to newer
values.

And it's also possible to advance only catalog_xmin while keep restart_lsn
unchanged, by starting a transaction before the running_xact record. In this
case, it would pass builder->last_serialized_snapshot as restart_lsn to
LogicalIncreaseRestartDecodingForSlot(), but last_serialized_snapshot would
point to the position of the last running_xact record instead of the current
one, or the value would be NULL if no snapshot has been serialized before, so
it would not advance restart_lsn. The advancement of catalog_xmin is not
blocked by running transaction, as it would either use the XID of the running
transaction or the oldestRunningXid as candidate.

Your analysis appears correct to me.

-----
Fix
-----

I think we should keep the confirmed_flush even if the previous synced
restart_lsn/catalog_xmin is newer. Attachments include a patch for the same.

This will fix the case we are facing but adds a new rule for slot
synchronization. Can we think of a simpler way to fix this by avoiding
updating other slot fields (like two_phase, two_phase_at) if
restart_lsn or catalog_xmin of the local slot is ahead of the remote
slot?

--
With Regards,
Amit Kapila.

#23Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#18)
Re: Fix slot synchronization with two_phase decoding enabled

On Tue, Apr 8, 2025 at 10:14 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

----------
Approach 2
----------

Instead of disallowing the use of two-phase and failover together, a more
flexible strategy could be only restrict failover for slots with two-phase
enabled when there's a possibility of existing prepared transactions before the
two_phase_at that are not yet replicated. During slot creation with two-phase
and failover, we could check for any decoded prepared transactions when
determining the decoding start point (DecodingContextFindStartpoint). For
subsequent attempts to alter failover to true, we ensure that two_phase_at is
less than restart_lsn, indicating that all prepared transactions have been
committed and replicated, thus the bug would not happen.

pros:

This method minimizes restrictions for users. Especially during slot creation
with (two_phase=on, failover=on), as it’s uncommon for transactions to prepare
during consistent snapshot creation, the restriction becomes almost
unnoticeable.

I think this approach can work for the transactions that are prepared
while the slot is created. But if I understand the problem correctly,
while the initial table sync is performing, the slot's two_phase is
still false, so we need to deal with the transactions that are
prepared during the initial table sync too. What do you think?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#24Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#23)
RE: Fix slot synchronization with two_phase decoding enabled

On Sat, Apr 19, 2025 at 2:19 AM Masahiko Sawada wrote:

On Tue, Apr 8, 2025 at 10:14 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

----------
Approach 2
----------

Instead of disallowing the use of two-phase and failover together, a more
flexible strategy could be only restrict failover for slots with two-phase
enabled when there's a possibility of existing prepared transactions before

the

two_phase_at that are not yet replicated. During slot creation with

two-phase

and failover, we could check for any decoded prepared transactions when
determining the decoding start point (DecodingContextFindStartpoint). For
subsequent attempts to alter failover to true, we ensure that two_phase_at is
less than restart_lsn, indicating that all prepared transactions have been
committed and replicated, thus the bug would not happen.

pros:

This method minimizes restrictions for users. Especially during slot creation
with (two_phase=on, failover=on), as it’s uncommon for transactions to

prepare

during consistent snapshot creation, the restriction becomes almost
unnoticeable.

I think this approach can work for the transactions that are prepared
while the slot is created. But if I understand the problem correctly,
while the initial table sync is performing, the slot's two_phase is
still false, so we need to deal with the transactions that are
prepared during the initial table sync too. What do you think?

Yes, I agree that we need to restrict this case too. Given that we haven't
started decoding when setting two_phase=true during CreateDecodingContext()
after tablesync, we could check prepared transactions afterwards during
decoding. This could involve reporting an ERROR when skipping a prepared
transaction during decoding if its prepare LSN is less than two_phase_at.
Alternatively, a simpler method would be to prevent this situation entirely
during the CREATE SUBSCRIPTION command. For example, we could restrict slots
created with failover set to true and twophase is later modified to true after
tablesync. Although the simpler check is more user-visible, it may offer less
flexibility.

What do you think ?

Best Regards,
Hou zj

#25Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#22)
Re: Fix slot synchronization with two_phase decoding enabled

On Fri, Apr 18, 2025 at 9:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 17, 2025 at 6:14 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

-----
Fix
-----

I think we should keep the confirmed_flush even if the previous synced
restart_lsn/catalog_xmin is newer. Attachments include a patch for the same.

This will fix the case we are facing but adds a new rule for slot
synchronization. Can we think of a simpler way to fix this by avoiding
updating other slot fields (like two_phase, two_phase_at) if
restart_lsn or catalog_xmin of the local slot is ahead of the remote
slot?

Thinking more about this problem, it seems to me that if the
catalog_xmin of synced slot is allowed to be ahead than the
remote_slot when there is still an open (prepared transaction), it
could cause data loss. I mean that after the promotion, some of the
required catalog rows could be removed, and decoding corresponding
changes (changes from tables affected by DDL) could give unexpected
results. Those would be protected on primary/publisher because the
catalog_xmin on it was still accurate and behind. If this theory turns
out to be true, then this is a drawback/bug of the existing
fast_forward mode code.

--
With Regards,
Amit Kapila.

#26Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#25)
1 attachment(s)
RE: Fix slot synchronization with two_phase decoding enabled

On Tue, Apr 22, 2025 at 11:23 AM Amit Kapila wrote:

On Fri, Apr 18, 2025 at 9:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 17, 2025 at 6:14 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

-----
Fix
-----

I think we should keep the confirmed_flush even if the previous
synced restart_lsn/catalog_xmin is newer. Attachments include a patch

for the same.

This will fix the case we are facing but adds a new rule for slot
synchronization. Can we think of a simpler way to fix this by avoiding
updating other slot fields (like two_phase, two_phase_at) if
restart_lsn or catalog_xmin of the local slot is ahead of the remote
slot?

Thinking more about this problem, it seems to me that if the catalog_xmin of
synced slot is allowed to be ahead than the remote_slot when there is still an
open (prepared transaction), it could cause data loss. I mean that after the
promotion, some of the required catalog rows could be removed, and decoding
corresponding changes (changes from tables affected by DDL) could give
unexpected results. Those would be protected on primary/publisher because
the catalog_xmin on it was still accurate and behind. If this theory turns out to
be true, then this is a drawback/bug of the existing fast_forward mode code.

I agree that could be a problem.

Upon further analysis, I find that the core problem is we do not build a base
snapshot when decoding changes during fast forward mode, preventing reference
to the minimum transaction ID that remains visible in the snapshot when
determining the candidate for catalog_xmin. As a result, catalog_xmin was
directly advanced to the oldest running transaction ID found in the latest
running_xacts record.

In code-level, SnapBuildProcessChange -> ReorderBufferSetBaseSnapshot could not
be reached during fast forward decoding, resulting in
rb->txns_by_base_snapshot_lsn being NULL. When advancing catalog_xmin, the
system attempts to refer to rb->txns_by_base_snapshot_lsn via
ReorderBufferGetOldestXmin(). However, since rb->txns_by_base_snapshot_lsn is
NULL, it defaults to directly using running->oldestRunningXid as the candidate
for catalog_xmin. For reference, see the implementation details in
SnapBuildProcessRunningXacts.

I think this is a general issue in fast forward decoding, which not only affect
slotsync. I can reproduce the issue using slot_advance SQL API as well. See the
attachment for a patch that includes a test to prove that the catalog data that
are still required would be removed after premature catalog_xmin advancement
during fast forward decoding. If you test that patch on HEAD, you would find
that the output missed a column due to vacuum removal.

I will start a new thread to report and fix this general.

Best Regards,
Hou zj

Attachments:

0001-Add-a-test.patchapplication/octet-stream; name=0001-Add-a-test.patchDownload
From ae24b1696a8f7f4446997f9e5dad77cdc6ad991c Mon Sep 17 00:00:00 2001
From: Zhijie Hou <houzj.fnst@cn.fujitsu.com>
Date: Tue, 22 Apr 2025 11:03:40 +0800
Subject: [PATCH v11] Add a test

---
 .../test_decoding/expected/oldest_xmin.out    | 41 +++++++++++++++++++
 contrib/test_decoding/specs/oldest_xmin.spec  |  5 +++
 2 files changed, 46 insertions(+)

diff --git a/contrib/test_decoding/expected/oldest_xmin.out b/contrib/test_decoding/expected/oldest_xmin.out
index dd6053f9c1f..57268b38d33 100644
--- a/contrib/test_decoding/expected/oldest_xmin.out
+++ b/contrib/test_decoding/expected/oldest_xmin.out
@@ -38,3 +38,44 @@ COMMIT
 stop    
 (1 row)
 
+
+starting permutation: s0_begin s0_getxid s1_begin s1_insert s0_alter s0_commit s0_checkpoint s0_advance_slot s0_advance_slot s1_commit s0_vacuum s0_get_changes
+step s0_begin: BEGIN;
+step s0_getxid: SELECT pg_current_xact_id() IS NULL;
+?column?
+--------
+f       
+(1 row)
+
+step s1_begin: BEGIN;
+step s1_insert: INSERT INTO harvest VALUES ((1, 2, 3));
+step s0_alter: ALTER TYPE basket DROP ATTRIBUTE mangos;
+step s0_commit: COMMIT;
+step s0_checkpoint: CHECKPOINT;
+step s0_advance_slot: SELECT slot_name FROM pg_replication_slot_advance('isolation_slot', pg_current_wal_lsn());
+slot_name     
+--------------
+isolation_slot
+(1 row)
+
+step s0_advance_slot: SELECT slot_name FROM pg_replication_slot_advance('isolation_slot', pg_current_wal_lsn());
+slot_name     
+--------------
+isolation_slot
+(1 row)
+
+step s1_commit: COMMIT;
+step s0_vacuum: VACUUM pg_attribute;
+step s0_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data                                                  
+------------------------------------------------------
+BEGIN                                                 
+table public.harvest: INSERT: fruits[basket]:'(1,2,3)'
+COMMIT                                                
+(3 rows)
+
+?column?
+--------
+stop    
+(1 row)
+
diff --git a/contrib/test_decoding/specs/oldest_xmin.spec b/contrib/test_decoding/specs/oldest_xmin.spec
index 88bd30f5ff7..7f2fe3d7ed7 100644
--- a/contrib/test_decoding/specs/oldest_xmin.spec
+++ b/contrib/test_decoding/specs/oldest_xmin.spec
@@ -25,6 +25,7 @@ step "s0_commit" { COMMIT; }
 step "s0_checkpoint" { CHECKPOINT; }
 step "s0_vacuum" { VACUUM pg_attribute; }
 step "s0_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
+step "s0_advance_slot" { SELECT slot_name FROM pg_replication_slot_advance('isolation_slot', pg_current_wal_lsn()); }
 
 session "s1"
 setup { SET synchronous_commit=on; }
@@ -40,3 +41,7 @@ step "s1_commit" { COMMIT; }
 # will be removed (xmax set) before T1 commits. That is, interlocking doesn't
 # forbid modifying catalog after someone read it (and didn't commit yet).
 permutation "s0_begin" "s0_getxid" "s1_begin" "s1_insert" "s0_alter" "s0_commit" "s0_checkpoint" "s0_get_changes" "s0_get_changes" "s1_commit" "s0_vacuum" "s0_get_changes"
+
+# Perform the same testing process as described above, but use advance_slot to
+# forces xmin advancement during fast forward decoding.
+permutation "s0_begin" "s0_getxid" "s1_begin" "s1_insert" "s0_alter" "s0_commit" "s0_checkpoint" "s0_advance_slot" "s0_advance_slot" "s1_commit" "s0_vacuum" "s0_get_changes"
-- 
2.31.1

#27shveta malik
shveta.malik@gmail.com
In reply to: Nisha Moond (#19)
Re: Fix slot synchronization with two_phase decoding enabled

On Fri, Apr 11, 2025 at 1:03 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Patch "v5_aprch_3-0001" implements the above Approach 3.

Thanks Hou-san for implementing approach-2 and providing the patch.

I find Approach 2 better than Approach1. Yet to review Approach3.
Please find my initial comments:

Approach#1:

1)
When slot is created with failover enabled and later we try to create
a subscription using that pre-created slot with two-phase enabled, it
does not error out. But it silently retains two_phase of the existing
slot as false. Please check if an error is possible in this case too.

Approach #2:

1)
+ ReorderBufferSkipPrepare(reorder, parsed.twophase_xid);

In xact_decode, why do we have a new call to ReorderBufferSkipPrepare,
can you please add some comments here?

2)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("\"%s\" and \"%s\" are mutually exclusive options when \"%s\"
is specified",

So like approach 1, here as well we disallow creating subscriptions
with both failover and two_phase enabled when create_slot and
copy_data is true? But users can alter the sub later to allow failover
for two-phase enabled slot provided there are no pending PREPARE txns
which are not yet consumed by the subscriber. Is my understanding
correct? Can you please tell all the ways using which the user can
enable both failover and two_phase together? The patch msg is not that
clear. It will be good to add such details in patch message.

3)

+ /*
+ * Do not allow users to enable the failover and two_phase options together
+ * when create_slot is specified.
+ *
+ * See comments atop the similar check in DecodingContextFindStartpoint()
+ * for a detailed reason.
+ */
+ if (!opts.create_slot && opts.twophase && opts.failover)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("\"%s\" and \"%s\" are mutually exclusive options when \"%s\"
is specified",
+

Is the check '!opts.create_slot' correct? The error msg and comment
says otherwise.

thanks
Shveta

#28Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#24)
Re: Fix slot synchronization with two_phase decoding enabled

On Mon, Apr 21, 2025 at 8:44 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Sat, Apr 19, 2025 at 2:19 AM Masahiko Sawada wrote:

On Tue, Apr 8, 2025 at 10:14 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

----------
Approach 2
----------

Instead of disallowing the use of two-phase and failover together, a more
flexible strategy could be only restrict failover for slots with two-phase
enabled when there's a possibility of existing prepared transactions before

the

two_phase_at that are not yet replicated. During slot creation with

two-phase

and failover, we could check for any decoded prepared transactions when
determining the decoding start point (DecodingContextFindStartpoint). For
subsequent attempts to alter failover to true, we ensure that two_phase_at is
less than restart_lsn, indicating that all prepared transactions have been
committed and replicated, thus the bug would not happen.

pros:

This method minimizes restrictions for users. Especially during slot creation
with (two_phase=on, failover=on), as it’s uncommon for transactions to

prepare

during consistent snapshot creation, the restriction becomes almost
unnoticeable.

I think this approach can work for the transactions that are prepared
while the slot is created. But if I understand the problem correctly,
while the initial table sync is performing, the slot's two_phase is
still false, so we need to deal with the transactions that are
prepared during the initial table sync too. What do you think?

Yes, I agree that we need to restrict this case too. Given that we haven't
started decoding when setting two_phase=true during CreateDecodingContext()
after tablesync, we could check prepared transactions afterwards during
decoding. This could involve reporting an ERROR when skipping a prepared
transaction during decoding if its prepare LSN is less than two_phase_at.

It will make it difficult for users to detect it as this happens at a
later point of time.

Alternatively, a simpler method would be to prevent this situation entirely
during the CREATE SUBSCRIPTION command. For example, we could restrict slots
created with failover set to true and twophase is later modified to true after
tablesync. Although the simpler check is more user-visible, it may offer less
flexibility.

I agree with your point, but OTOH, I am also afraid of adding too many
smart checks in the back-branch. If we follow what you say here, then
users have the following ways in PG17 to enable both failover and
two_phase. (a) During Create Subscription, users can set both
'failover' and 'two_phase', if 'copy_data' is false, or (b), if
'copy_data' is true, during Create Subscription, then users can enable
'two_phase' and wait for it to be enabled. Then use Alter Subscription
to set 'failover'.

--
With Regards,
Amit Kapila.

#29Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#27)
1 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

On Tue, Apr 22, 2025 at 3:23 PM shveta malik <shveta.malik@gmail.com> wrote:

On Fri, Apr 11, 2025 at 1:03 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Patch "v5_aprch_3-0001" implements the above Approach 3.

Thanks Hou-san for implementing approach-2 and providing the patch.

I find Approach 2 better than Approach1. Yet to review Approach3.
Please find my initial comments:

Thanks for the review! I’ve addressed the comments for Approach 2, as
it seems the most suitable. We can revisit the others if needed.

Approach #2:

1)
+ ReorderBufferSkipPrepare(reorder, parsed.twophase_xid);

In xact_decode, why do we have a new call to ReorderBufferSkipPrepare,
can you please add some comments here?

Done.

2)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("\"%s\" and \"%s\" are mutually exclusive options when \"%s\"
is specified",

So like approach 1, here as well we disallow creating subscriptions
with both failover and two_phase enabled when create_slot and
copy_data is true? But users can alter the sub later to allow failover
for two-phase enabled slot provided there are no pending PREPARE txns
which are not yet consumed by the subscriber. Is my understanding
correct? Can you please tell all the ways using which the user can
enable both failover and two_phase together? The patch msg is not that
clear. It will be good to add such details in patch message.

We allow creating subscriptions in this scenario as long as no
prepared transactions exist before the two_phase_at. Similarly,
altering a subscription to set failover = true is permitted, provided
the slot's restart_lsn is greater than two_phase_at.
Amit has suggested the ways at [1]/messages/by-id/CAA4eK1LvMwXxvAzHpK+Egjc7vu1NmGxxKcaK_06pE7GKk7JtJQ@mail.gmail.com in which users can enable both
failover and two_phase together.
I've updated the commit message to include more details about the
conditions enforced by the fix.

3)

+ /*
+ * Do not allow users to enable the failover and two_phase options together
+ * when create_slot is specified.
+ *
+ * See comments atop the similar check in DecodingContextFindStartpoint()
+ * for a detailed reason.
+ */
+ if (!opts.create_slot && opts.twophase && opts.failover)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("\"%s\" and \"%s\" are mutually exclusive options when \"%s\"
is specified",
+

Is the check '!opts.create_slot' correct? The error msg and comment
says otherwise.

Corrected the check as it should be checking if copy_data is
specified. Thanks for the input!

Please find the attached v6 patch for approach-2 fix on PG17.

[1]: /messages/by-id/CAA4eK1LvMwXxvAzHpK+Egjc7vu1NmGxxKcaK_06pE7GKk7JtJQ@mail.gmail.com

--
Thanks,
Nisha

Attachments:

v6-0001-PG17-Approach-2-Fix-slot-synchronization-for-two_.patchapplication/octet-stream; name=v6-0001-PG17-Approach-2-Fix-slot-synchronization-for-two_.patchDownload
From 1cc6e8f99524125dc150d2296993acd7eeb9c158 Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Wed, 23 Apr 2025 16:30:24 +0530
Subject: [PATCH v6] PG17 Approach 2 Fix slot synchronization for two_phase
 enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

Instead of disallowing the use of two-phase and failover together, a more
flexible strategy could be only restrict failover for slots with two-phase
enabled when there's a possibility of existing prepared transactions before the
two_phase_at that are not yet replicated. During slot creation with two-phase
and failover, we could check for any decoded prepared transactions when
determining the decoding start point (DecodingContextFindStartpoint). For
subsequent attempts to alter failover to true, we ensure that two_phase_at is
less than restart_lsn, indicating that all prepared transactions have been
committed and replicated, thus the bug would not happen.

To address the issue, the fix enforces the following conditions:
1) Prevent the creation of a slot with two_phase=true and failover=true if
 any prepared transactions are detected during decoding before achieving a
 consistent snapshot.
2) Disallow creating subscriptions with (two_phase=true, failover=true,
 copy_data=true). In this scenario, since the two_phase flag is switched
 to true after table sync, detecting prepared transactions is challenging
 in this case, so we simply restrict it.
3) Disallow altering the slot's failover to true when two_phase=true and
 restart_lsn is less than two_phase_at.
4) Disallow altering slot's failover to true when two_phase state is 'pending'.
 User can try altering failover again when two_phase state is moved beyond
 'pending'.
---
 src/backend/commands/subscriptioncmds.c       | 43 +++++++++++++++++++
 src/backend/replication/logical/decode.c      | 10 +++++
 src/backend/replication/logical/logical.c     | 21 +++++++++
 .../replication/logical/reorderbuffer.c       | 21 +++++++++
 src/backend/replication/slot.c                | 17 ++++++++
 src/bin/pg_upgrade/t/003_logical_slots.pl     |  6 +--
 src/include/replication/reorderbuffer.h       |  1 +
 .../t/040_standby_failover_slots_sync.pl      | 32 ++++++++++++++
 src/test/regress/expected/subscription.out    |  3 ++
 src/test/regress/sql/subscription.sql         |  4 ++
 10 files changed, 155 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..c06762aa1fd 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,23 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options
+	 * together when copy_data is specified. In this scenario, the two_phase
+	 * flag is switched to true after table sync, making it challenging to
+	 * detect prepared transactions, so we simply restrict it.
+	 *
+	 * See comments atop the similar check in DecodingContextFindStartpoint()
+	 * for a detailed reason on why detecting prepared transactions is
+	 * important here.
+	 *
+	 */
+	if (opts.copy_data && opts.twophase && opts.failover)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("\"%s\" and \"%s\" are mutually exclusive options when \"%s\" is specified",
+					   "failover", "two_phase", "copy_data"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1262,32 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable the failover when the
+					 * two_phase state is still pending i.e., the replication
+					 * slot’s two_phase option has not yet been finalized.
+					 *
+					 * In most cases, the restriction in
+					 * ReplicationSlotAlter() is sufficient to prevent
+					 * enabling failover for a slot with two_phase enabled.
+					 * However, this additional check is necessary to handle a
+					 * race condition, when a user runs CREATE SUBSCRIPTION
+					 * with two_phase=true, but the slot's two_phase flag
+					 * hasn't been set yet, a concurrent attempt to do ALTER
+					 * SUBSCRIPTION(failover=true) may bypass the check in
+					 * ReplicationSlotAlter().
+					 *
+					 * For a detailed explanation of why enforcing this
+					 * restriction is important when combining two_phase and
+					 * failover, refer to the comments atop similar check in
+					 * DecodingContextFindStartpoint().
+					 */
+					if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+								errmsg("cannot enable failover for a subscription with a pending two_phase state"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 8ec5adfd909..fc569444ebd 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -327,6 +327,16 @@ xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 				{
 					ReorderBufferProcessXid(reorder, parsed.twophase_xid,
 											buf->origptr);
+
+					/*
+					 * Mark the transaction as skipped to handle the case
+					 * where two_phase is temporarily disabled in decoding
+					 * context during slot creation. Such transactions need to
+					 * be accounted for in DecodingContextFindStartpoint(), as
+					 * there may be prepared transactions for which decoding
+					 * was skipped before two_phase is enabled for the slot.
+					 */
+					ReorderBufferSkipPrepare(reorder, parsed.twophase_xid);
 					break;
 				}
 
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index e941bb491d8..76464f1fcc8 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -679,6 +679,27 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
 		CHECK_FOR_INTERRUPTS();
 	}
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot if
+	 * there are transactions prepared before reaching the consistent point
+	 * that were skipped during decoding. These prepared transactions can fail
+	 * to replicate to the subscriber after being committed on a promoted
+	 * standby following a failover.
+	 *
+	 * This is because the two_phase_at field of a slot, which tracks the LSN,
+	 * from which two-phase decoding starts, is not synchronized to standby
+	 * servers. Without two_phase_at, the logical decoding might incorrectly
+	 * identify prepared transaction as already replicated to the subscriber
+	 * after promotion of standby server, causing them to be skipped.
+	 */
+	if (slot->data.two_phase && slot->data.failover &&
+		ReorderBufferPrapareSkipped(ctx->reorder))
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable failover for a two-phase enabled replication slot"),
+				errdetail("Prepared transactions exist before the logical decoding starting point %X/%X.",
+						  LSN_FORMAT_ARGS(ctx->reader->EndRecPtr)));
+
 	SpinLockAcquire(&slot->mutex);
 	slot->data.confirmed_flush = ctx->reader->EndRecPtr;
 	if (slot->data.two_phase)
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 03eb005c39d..f19f5d22762 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2800,6 +2800,27 @@ ReorderBufferSkipPrepare(ReorderBuffer *rb, TransactionId xid)
 	txn->txn_flags |= RBTXN_SKIPPED_PREPARE;
 }
 
+/*
+ * Check whether we have skipped processing prepared transactions.
+ */
+bool
+ReorderBufferPrapareSkipped(ReorderBuffer *rb)
+{
+	dlist_mutable_iter it;
+
+	dlist_foreach_modify(it, &rb->toplevel_by_lsn)
+	{
+		ReorderBufferTXN *txn;
+
+		txn = dlist_container(ReorderBufferTXN, node, it.cur);
+
+		if (rbtxn_skip_prepared(txn))
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * Prepare a two-phase transaction.
  *
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..df0602cf640 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -848,6 +848,23 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot
+	 * where slot's restart_lsn is less than two_phase_at. In such cases,
+	 * there is a risk that transactions prepared before two_phase_at exist
+	 * and would be skipped during decoding.
+	 *
+	 * See comments atop the similar check in DecodingContextFindStartpoint()
+	 * for a detailed reason.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase &&
+		MyReplicationSlot->data.restart_lsn < MyReplicationSlot->data.two_phase_at)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable failover for a two-phase enabled replication slot"),
+				errhint("WALs prior to %X/%X, where two-phase decoding begins, are still reserved by this slot.",
+						LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..e329cc609ce 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -173,7 +173,7 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
@@ -193,8 +193,8 @@ command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
 # Check that the slot 'regress_sub' has migrated to the new cluster
 $newpub->start;
 my $result = $newpub->safe_psql('postgres',
-	"SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
 
 # Update the connection
 my $new_connstr = $newpub->connstr . ' dbname=postgres';
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 4c56f219fd8..6a69196d5b2 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -722,6 +722,7 @@ extern bool ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xi
 											 TimestampTz prepare_time,
 											 RepOriginId origin_id, XLogRecPtr origin_lsn);
 extern void ReorderBufferSkipPrepare(ReorderBuffer *rb, TransactionId xid);
+extern bool ReorderBufferPrapareSkipped(ReorderBuffer *rb);
 extern void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid, char *gid);
 extern ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *rb);
 extern TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..9f0ce33af6a 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -98,6 +98,38 @@ my ($result, $stdout, $stderr) = $subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
 	"altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option cannot be enabled for a two_phase replication
+# slot.
+##################################################
+
+# # Create a slot with two_phase enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub2', 'pgoutput', false, true, false);");
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, copy_data = false, enabled = false, two_phase = true);"
+);
+
+# Enable failover for the subscription
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+ok( $stderr =~ /ERROR:  cannot enable failover for a subscription with a pending two_phase state/,
+	"Enabling failover is not allowed for a two_phase pending subscription");
+
+# Attempt to enable failover for a two-phase enabled slot
+($result, $stdout, $stderr) = $publisher->psql(
+	'postgres',
+	qq[ALTER_REPLICATION_SLOT regress_mysub2 (failover);],
+	replication => 'database');
+ok($stderr =~ /ERROR:  cannot enable failover for a two-phase enabled replication slot/,
+	"Enabling failover is not allowed for a two_phase enabled slot");
+
+# Drop the subscription
+$subscriber1->safe_psql(
+	'postgres', "DROP SUBSCRIPTION regress_mysub2;");
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby server.
 ##################################################
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..971d2f619d2 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,9 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (two_phase = true, failover = true);
+ERROR:  "failover" and "two_phase" are mutually exclusive options when "copy_data" is specified
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..39dc35fce6a 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.34.1

#30Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#28)
Re: Fix slot synchronization with two_phase decoding enabled

On Tue, Apr 22, 2025 at 3:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 21, 2025 at 8:44 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Sat, Apr 19, 2025 at 2:19 AM Masahiko Sawada wrote:

On Tue, Apr 8, 2025 at 10:14 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

----------
Approach 2
----------

Instead of disallowing the use of two-phase and failover together, a more
flexible strategy could be only restrict failover for slots with two-phase
enabled when there's a possibility of existing prepared transactions before

the

two_phase_at that are not yet replicated. During slot creation with

two-phase

and failover, we could check for any decoded prepared transactions when
determining the decoding start point (DecodingContextFindStartpoint). For
subsequent attempts to alter failover to true, we ensure that two_phase_at is
less than restart_lsn, indicating that all prepared transactions have been
committed and replicated, thus the bug would not happen.

pros:

This method minimizes restrictions for users. Especially during slot creation
with (two_phase=on, failover=on), as it’s uncommon for transactions to

prepare

during consistent snapshot creation, the restriction becomes almost
unnoticeable.

I think this approach can work for the transactions that are prepared
while the slot is created. But if I understand the problem correctly,
while the initial table sync is performing, the slot's two_phase is
still false, so we need to deal with the transactions that are
prepared during the initial table sync too. What do you think?

Yes, I agree that we need to restrict this case too. Given that we haven't
started decoding when setting two_phase=true during CreateDecodingContext()
after tablesync, we could check prepared transactions afterwards during
decoding. This could involve reporting an ERROR when skipping a prepared
transaction during decoding if its prepare LSN is less than two_phase_at.

It will make it difficult for users to detect it as this happens at a
later point of time.

Alternatively, a simpler method would be to prevent this situation entirely
during the CREATE SUBSCRIPTION command. For example, we could restrict slots
created with failover set to true and twophase is later modified to true after
tablesync. Although the simpler check is more user-visible, it may offer less
flexibility.

I agree with your point, but OTOH, I am also afraid of adding too many
smart checks in the back-branch. If we follow what you say here, then
users have the following ways in PG17 to enable both failover and
two_phase. (a) During Create Subscription, users can set both
'failover' and 'two_phase', if 'copy_data' is false, or (b), if
'copy_data' is true, during Create Subscription, then users can enable
'two_phase' and wait for it to be enabled. Then use Alter Subscription
to set 'failover'.

Yet another idea would be to disallow enabling both two_phase and
failover at CREATE SUBSCRIPTION regardless of copy_data value and to
add check when enabling failover for the two_phase-enabled-slots. For
example, users who want to enable both need to do two steps:

1. CREATE SUBSCRIPTION with two_phase = true and failover = false.
2. ALTER SUBSCRIPTION with failover = true.

At ALTER SUBSCRIPTION with failover = true, the subscriber checks if
the two_phase states is ready (and possibly check if the slot's
two_phase has been enabled too), otherwise raises an ERROR. Then, when
the publisher enables the failover for the two_phase-enabled-slot up
on walrcv_alter_slot() request, it checks the slot's restart_lsn has
passed slot's two_phase_at, otherwise raise an error with the message
like "the slot need to consume change upto %X/%X to enable failover".

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#31Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#30)
Re: Fix slot synchronization with two_phase decoding enabled

On Wed, Apr 23, 2025 at 11:04 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 22, 2025 at 3:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 21, 2025 at 8:44 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Sat, Apr 19, 2025 at 2:19 AM Masahiko Sawada wrote:

On Tue, Apr 8, 2025 at 10:14 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

----------
Approach 2
----------

Instead of disallowing the use of two-phase and failover together, a more
flexible strategy could be only restrict failover for slots with two-phase
enabled when there's a possibility of existing prepared transactions before

the

two_phase_at that are not yet replicated. During slot creation with

two-phase

and failover, we could check for any decoded prepared transactions when
determining the decoding start point (DecodingContextFindStartpoint). For
subsequent attempts to alter failover to true, we ensure that two_phase_at is
less than restart_lsn, indicating that all prepared transactions have been
committed and replicated, thus the bug would not happen.

pros:

This method minimizes restrictions for users. Especially during slot creation
with (two_phase=on, failover=on), as it’s uncommon for transactions to

prepare

during consistent snapshot creation, the restriction becomes almost
unnoticeable.

I think this approach can work for the transactions that are prepared
while the slot is created. But if I understand the problem correctly,
while the initial table sync is performing, the slot's two_phase is
still false, so we need to deal with the transactions that are
prepared during the initial table sync too. What do you think?

Yes, I agree that we need to restrict this case too. Given that we haven't
started decoding when setting two_phase=true during CreateDecodingContext()
after tablesync, we could check prepared transactions afterwards during
decoding. This could involve reporting an ERROR when skipping a prepared
transaction during decoding if its prepare LSN is less than two_phase_at.

It will make it difficult for users to detect it as this happens at a
later point of time.

Alternatively, a simpler method would be to prevent this situation entirely
during the CREATE SUBSCRIPTION command. For example, we could restrict slots
created with failover set to true and twophase is later modified to true after
tablesync. Although the simpler check is more user-visible, it may offer less
flexibility.

I agree with your point, but OTOH, I am also afraid of adding too many
smart checks in the back-branch. If we follow what you say here, then
users have the following ways in PG17 to enable both failover and
two_phase. (a) During Create Subscription, users can set both
'failover' and 'two_phase', if 'copy_data' is false, or (b), if
'copy_data' is true, during Create Subscription, then users can enable
'two_phase' and wait for it to be enabled. Then use Alter Subscription
to set 'failover'.

Yet another idea would be to disallow enabling both two_phase and
failover at CREATE SUBSCRIPTION regardless of copy_data value and to
add check when enabling failover for the two_phase-enabled-slots. For
example, users who want to enable both need to do two steps:

1. CREATE SUBSCRIPTION with two_phase = true and failover = false.
2. ALTER SUBSCRIPTION with failover = true.

At ALTER SUBSCRIPTION with failover = true, the subscriber checks if
the two_phase states is ready (and possibly check if the slot's
two_phase has been enabled too), otherwise raises an ERROR. Then, when
the publisher enables the failover for the two_phase-enabled-slot up
on walrcv_alter_slot() request, it checks the slot's restart_lsn has
passed slot's two_phase_at, otherwise raise an error with the message
like "the slot need to consume change upto %X/%X to enable failover".

This should further simplify the checks with an additional restriction
during the CREATE SUBSCRIPTION time. I am in favor of it because I
want the code in this area to be as much same in HEAD and backbranches
as possible.

Please note that the PG17 also suffers from data loss after promotion
due to a bug in fast-forward mode as described in email [1]/messages/by-id/OS0PR01MB57163087F86621D44D9A72BF94BB2@OS0PR01MB5716.jpnprd01.prod.outlook.com. So, we
should try to get that fixed as well.

Thanks for your feedback.

[1]: /messages/by-id/OS0PR01MB57163087F86621D44D9A72BF94BB2@OS0PR01MB5716.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.

#32Nisha Moond
nisha.moond412@gmail.com
In reply to: Amit Kapila (#31)
1 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

On Thu, Apr 24, 2025 at 12:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 23, 2025 at 11:04 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 22, 2025 at 3:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 21, 2025 at 8:44 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Sat, Apr 19, 2025 at 2:19 AM Masahiko Sawada wrote:

On Tue, Apr 8, 2025 at 10:14 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

----------
Approach 2
----------

Instead of disallowing the use of two-phase and failover together, a more
flexible strategy could be only restrict failover for slots with two-phase
enabled when there's a possibility of existing prepared transactions before

the

two_phase_at that are not yet replicated. During slot creation with

two-phase

and failover, we could check for any decoded prepared transactions when
determining the decoding start point (DecodingContextFindStartpoint). For
subsequent attempts to alter failover to true, we ensure that two_phase_at is
less than restart_lsn, indicating that all prepared transactions have been
committed and replicated, thus the bug would not happen.

pros:

This method minimizes restrictions for users. Especially during slot creation
with (two_phase=on, failover=on), as it’s uncommon for transactions to

prepare

during consistent snapshot creation, the restriction becomes almost
unnoticeable.

I think this approach can work for the transactions that are prepared
while the slot is created. But if I understand the problem correctly,
while the initial table sync is performing, the slot's two_phase is
still false, so we need to deal with the transactions that are
prepared during the initial table sync too. What do you think?

Yes, I agree that we need to restrict this case too. Given that we haven't
started decoding when setting two_phase=true during CreateDecodingContext()
after tablesync, we could check prepared transactions afterwards during
decoding. This could involve reporting an ERROR when skipping a prepared
transaction during decoding if its prepare LSN is less than two_phase_at.

It will make it difficult for users to detect it as this happens at a
later point of time.

Alternatively, a simpler method would be to prevent this situation entirely
during the CREATE SUBSCRIPTION command. For example, we could restrict slots
created with failover set to true and twophase is later modified to true after
tablesync. Although the simpler check is more user-visible, it may offer less
flexibility.

I agree with your point, but OTOH, I am also afraid of adding too many
smart checks in the back-branch. If we follow what you say here, then
users have the following ways in PG17 to enable both failover and
two_phase. (a) During Create Subscription, users can set both
'failover' and 'two_phase', if 'copy_data' is false, or (b), if
'copy_data' is true, during Create Subscription, then users can enable
'two_phase' and wait for it to be enabled. Then use Alter Subscription
to set 'failover'.

Yet another idea would be to disallow enabling both two_phase and
failover at CREATE SUBSCRIPTION regardless of copy_data value and to
add check when enabling failover for the two_phase-enabled-slots. For
example, users who want to enable both need to do two steps:

1. CREATE SUBSCRIPTION with two_phase = true and failover = false.
2. ALTER SUBSCRIPTION with failover = true.

At ALTER SUBSCRIPTION with failover = true, the subscriber checks if
the two_phase states is ready (and possibly check if the slot's
two_phase has been enabled too), otherwise raises an ERROR. Then, when
the publisher enables the failover for the two_phase-enabled-slot up
on walrcv_alter_slot() request, it checks the slot's restart_lsn has
passed slot's two_phase_at, otherwise raise an error with the message
like "the slot need to consume change upto %X/%X to enable failover".

This should further simplify the checks with an additional restriction
during the CREATE SUBSCRIPTION time. I am in favor of it because I
want the code in this area to be as much same in HEAD and backbranches
as possible.

Please find the updated patch for Approach 3, which implements the
idea suggested by Swada-san above.

--
Thanks,
Nisha

Attachments:

v7-0001-PG17-Approach-3-Fix-slot-synchronization-for-two_.patchapplication/octet-stream; name=v7-0001-PG17-Approach-3-Fix-slot-synchronization-for-two_.patchDownload
From f5db2b348df37490df288b7c4c2dc84d4492a529 Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Mon, 7 Apr 2025 09:36:29 +0530
Subject: [PATCH v7] PG17 Approach 3 Fix slot synchronization for two_phase
 enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To prevent the risk of losing prepared transactions, we disallow enabling
both failover and twophase during slot creation, but permits altering
failover to true once ensured that slot's restart_lsn > two_phase_at.

The fix enforces the following conditions:
1) Always disallow creating slots with two_phase=true and failover=true.
2) Always disallow creating subscriptions with (two_phase=true, failover=true).
3) Prevent altering the slot's failover to true if two_phase=true and restart_lsn
   is less than two_phase_at. Otherwise, allow changing failover to true.
4) Disallow altering slot's failover to true when two_phase state is 'pending'.
   User can try altering failover again when two_phase state is moved beyond 'pending'.
---
 contrib/test_decoding/expected/slot.out       |  2 +
 contrib/test_decoding/sql/slot.sql            |  1 +
 src/backend/commands/subscriptioncmds.c       | 39 +++++++++++++++
 src/backend/replication/logical/logical.c     | 12 +++++
 src/backend/replication/slot.c                | 37 +++++++++++++++
 src/bin/pg_upgrade/t/003_logical_slots.pl     |  6 +--
 .../t/040_standby_failover_slots_sync.pl      | 47 +++++++++++++++++++
 src/test/regress/expected/subscription.out    |  3 ++
 src/test/regress/sql/subscription.sql         |  4 ++
 9 files changed, 148 insertions(+), 3 deletions(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..8fd762dea85 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  "failover" and "two_phase" are mutually exclusive options
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..e93d02ed90c 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,19 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options
+	 * together.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for a
+	 * detailed reason.
+	 */
+	if (opts.twophase && opts.failover)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("\"%s\" and \"%s\" are mutually exclusive options",
+					   "failover", "two_phase"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1258,32 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable the failover when the
+					 * two_phase state is still pending i.e., the replication
+					 * slot’s two_phase option has not yet been finalized.
+					 *
+					 * In most cases, the restriction in
+					 * ReplicationSlotAlter() is sufficient to prevent
+					 * enabling failover for a slot with two_phase enabled.
+					 * However, this additional check is necessary to handle a
+					 * race condition, when a user runs CREATE SUBSCRIPTION
+					 * with two_phase=true, but the slot's two_phase flag
+					 * hasn't been set yet, a concurrent attempt to do ALTER
+					 * SUBSCRIPTION(failover=true) may bypass the check in
+					 * ReplicationSlotAlter().
+					 *
+					 * For a detailed explanation of why enforcing this
+					 * restriction is important when combining two_phase and
+					 * failover, refer to the comments atop similar check in
+					 * ReplicationSlotCreate().
+					 */
+					if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+								errmsg("cannot enable failover for a subscription with a pending two_phase state"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index e941bb491d8..2421e03d088 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -613,6 +613,18 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 	/* Mark slot to allow two_phase decoding if not already marked */
 	if (ctx->twophase && !slot->data.two_phase)
 	{
+		/*
+		 * Do not allow two-phase decoding for failover enabled slots.
+		 *
+		 * See comments atop the similar check in ReplicationSlotCreate() for
+		 * a detailed reason.
+		 */
+		if (slot->data.failover)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+							NameStr(slot->data.name))));
+
 		SpinLockAcquire(&slot->mutex);
 		slot->data.two_phase = true;
 		slot->data.two_phase_at = start_lsn;
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..5e0058bc328 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,26 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable both failover and two_phase for slots.
+		 *
+		 * This is because the two_phase_at field of a slot, which tracks the
+		 * LSN, from which two-phase decoding starts, is not synchronized to
+		 * standby servers. Without two_phase_at, the logical decoding might
+		 * incorrectly identify prepared transaction as already replicated to
+		 * the subscriber after promotion of standby server, causing them to
+		 * be skipped.
+		 *
+		 * However, both failover and two_phase enabled slots can be created
+		 * during slot synchronization because we need to retain the same
+		 * values as the remote slot.
+		 */
+		if (two_phase && !IsSyncingReplicationSlots())
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("\"%s\" and \"%s\" are mutually exclusive options",
+						   "failover", "two_phase"));
 	}
 
 	/*
@@ -848,6 +868,23 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot
+	 * where slot's restart_lsn is less than two_phase_at. In such cases,
+	 * there is a risk that transactions prepared before two_phase_at exist
+	 * and would be skipped during decoding.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for a
+	 * detailed reason.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase &&
+		MyReplicationSlot->data.restart_lsn < MyReplicationSlot->data.two_phase_at)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable failover for a two-phase enabled replication slot"),
+				errdetail("The slot need to consume change upto %X/%X to enable failover.",
+						  LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..e329cc609ce 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -173,7 +173,7 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
@@ -193,8 +193,8 @@ command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
 # Check that the slot 'regress_sub' has migrated to the new cluster
 $newpub->start;
 my $result = $newpub->safe_psql('postgres',
-	"SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
 
 # Update the connection
 my $new_connstr = $newpub->connstr . ' dbname=postgres';
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..dc122348f84 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -98,6 +98,53 @@ my ($result, $stdout, $stderr) = $subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
 	"altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option can be enabled for a two_phase enabled
+# subscription.
+##################################################
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub2_slot, enabled = false, two_phase = true);"
+);
+
+# Enable failover for the subscription
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $publisher->safe_psql(
+		'postgres',
+		q{SELECT failover from pg_replication_slots WHERE slot_name = 'lsub1_slot';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
+# Drop the subscription
+$subscriber1->safe_psql(
+	'postgres', q{
+DROP SUBSCRIPTION regress_mysub2;
+});
+
+# Create a slot with two_phase enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub2', 'pgoutput', false, true, false);");
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, copy_data = false, enabled = false, two_phase = true);"
+);
+
+# Enable failover for the subscription with two_phase in pending state
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+ok( $stderr =~ /ERROR:  cannot enable failover for a subscription with a pending two_phase state/,
+	"Enabling failover is not allowed for a two_phase pending subscription");
+
+# Drop the subscription
+$subscriber1->safe_psql(
+	'postgres', "DROP SUBSCRIPTION regress_mysub2;");
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby server.
 ##################################################
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..bbd83244ba9 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,9 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  "failover" and "two_phase" are mutually exclusive options
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.34.1

#33shveta malik
shveta.malik@gmail.com
In reply to: Nisha Moond (#32)
Re: Fix slot synchronization with two_phase decoding enabled

On Thu, Apr 24, 2025 at 2:54 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Please find the updated patch for Approach 3, which implements the
idea suggested by Swada-san above.

Thank You for the patch.

1)

CreateDecodingContext:

  if (ctx->twophase && !slot->data.two_phase)
  {
+ /*
+ * Do not allow two-phase decoding for failover enabled slots.
+ *
+ * See comments atop the similar check in ReplicationSlotCreate() for
+ * a detailed reason.
+ */
+ if (slot->data.failover)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+ NameStr(slot->data.name))));
+

Can you please add tetscase to cover this scenario?

2)
We shall update create-sub documents as well for these mutually
exclusive options. Review other pages (alter-sub, create-slot) as well
for any required change.

3)
+##################################################
+# Test that the failover option can be enabled for a two_phase enabled
+# subscription.
+##################################################

Suggestion: 'Test that the failover option can be enabled for a two_phase
enabled subscription only through Alter Subscription (failover=true)'

thanks
Shveta

#34Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Nisha Moond (#32)
Re: Fix slot synchronization with two_phase decoding enabled

On Thu, Apr 24, 2025 at 2:24 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

On Thu, Apr 24, 2025 at 12:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 23, 2025 at 11:04 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 22, 2025 at 3:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 21, 2025 at 8:44 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Sat, Apr 19, 2025 at 2:19 AM Masahiko Sawada wrote:

On Tue, Apr 8, 2025 at 10:14 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

----------
Approach 2
----------

Instead of disallowing the use of two-phase and failover together, a more
flexible strategy could be only restrict failover for slots with two-phase
enabled when there's a possibility of existing prepared transactions before

the

two_phase_at that are not yet replicated. During slot creation with

two-phase

and failover, we could check for any decoded prepared transactions when
determining the decoding start point (DecodingContextFindStartpoint). For
subsequent attempts to alter failover to true, we ensure that two_phase_at is
less than restart_lsn, indicating that all prepared transactions have been
committed and replicated, thus the bug would not happen.

pros:

This method minimizes restrictions for users. Especially during slot creation
with (two_phase=on, failover=on), as it’s uncommon for transactions to

prepare

during consistent snapshot creation, the restriction becomes almost
unnoticeable.

I think this approach can work for the transactions that are prepared
while the slot is created. But if I understand the problem correctly,
while the initial table sync is performing, the slot's two_phase is
still false, so we need to deal with the transactions that are
prepared during the initial table sync too. What do you think?

Yes, I agree that we need to restrict this case too. Given that we haven't
started decoding when setting two_phase=true during CreateDecodingContext()
after tablesync, we could check prepared transactions afterwards during
decoding. This could involve reporting an ERROR when skipping a prepared
transaction during decoding if its prepare LSN is less than two_phase_at.

It will make it difficult for users to detect it as this happens at a
later point of time.

Alternatively, a simpler method would be to prevent this situation entirely
during the CREATE SUBSCRIPTION command. For example, we could restrict slots
created with failover set to true and twophase is later modified to true after
tablesync. Although the simpler check is more user-visible, it may offer less
flexibility.

I agree with your point, but OTOH, I am also afraid of adding too many
smart checks in the back-branch. If we follow what you say here, then
users have the following ways in PG17 to enable both failover and
two_phase. (a) During Create Subscription, users can set both
'failover' and 'two_phase', if 'copy_data' is false, or (b), if
'copy_data' is true, during Create Subscription, then users can enable
'two_phase' and wait for it to be enabled. Then use Alter Subscription
to set 'failover'.

Yet another idea would be to disallow enabling both two_phase and
failover at CREATE SUBSCRIPTION regardless of copy_data value and to
add check when enabling failover for the two_phase-enabled-slots. For
example, users who want to enable both need to do two steps:

1. CREATE SUBSCRIPTION with two_phase = true and failover = false.
2. ALTER SUBSCRIPTION with failover = true.

At ALTER SUBSCRIPTION with failover = true, the subscriber checks if
the two_phase states is ready (and possibly check if the slot's
two_phase has been enabled too), otherwise raises an ERROR. Then, when
the publisher enables the failover for the two_phase-enabled-slot up
on walrcv_alter_slot() request, it checks the slot's restart_lsn has
passed slot's two_phase_at, otherwise raise an error with the message
like "the slot need to consume change upto %X/%X to enable failover".

This should further simplify the checks with an additional restriction
during the CREATE SUBSCRIPTION time. I am in favor of it because I
want the code in this area to be as much same in HEAD and backbranches
as possible.

Please find the updated patch for Approach 3, which implements the
idea suggested by Swada-san above.

Thank you for the patch! I think we need to update the documentation
as well. I'll also review the patch shortly so could you please
prepare the documentation changes?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#35Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#34)
Re: Fix slot synchronization with two_phase decoding enabled

On Thu, Apr 24, 2025 at 10:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 24, 2025 at 2:24 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

On Thu, Apr 24, 2025 at 12:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 23, 2025 at 11:04 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 22, 2025 at 3:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 21, 2025 at 8:44 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Sat, Apr 19, 2025 at 2:19 AM Masahiko Sawada wrote:

On Tue, Apr 8, 2025 at 10:14 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

----------
Approach 2
----------

Instead of disallowing the use of two-phase and failover together, a more
flexible strategy could be only restrict failover for slots with two-phase
enabled when there's a possibility of existing prepared transactions before

the

two_phase_at that are not yet replicated. During slot creation with

two-phase

and failover, we could check for any decoded prepared transactions when
determining the decoding start point (DecodingContextFindStartpoint). For
subsequent attempts to alter failover to true, we ensure that two_phase_at is
less than restart_lsn, indicating that all prepared transactions have been
committed and replicated, thus the bug would not happen.

pros:

This method minimizes restrictions for users. Especially during slot creation
with (two_phase=on, failover=on), as it’s uncommon for transactions to

prepare

during consistent snapshot creation, the restriction becomes almost
unnoticeable.

I think this approach can work for the transactions that are prepared
while the slot is created. But if I understand the problem correctly,
while the initial table sync is performing, the slot's two_phase is
still false, so we need to deal with the transactions that are
prepared during the initial table sync too. What do you think?

Yes, I agree that we need to restrict this case too. Given that we haven't
started decoding when setting two_phase=true during CreateDecodingContext()
after tablesync, we could check prepared transactions afterwards during
decoding. This could involve reporting an ERROR when skipping a prepared
transaction during decoding if its prepare LSN is less than two_phase_at.

It will make it difficult for users to detect it as this happens at a
later point of time.

Alternatively, a simpler method would be to prevent this situation entirely
during the CREATE SUBSCRIPTION command. For example, we could restrict slots
created with failover set to true and twophase is later modified to true after
tablesync. Although the simpler check is more user-visible, it may offer less
flexibility.

I agree with your point, but OTOH, I am also afraid of adding too many
smart checks in the back-branch. If we follow what you say here, then
users have the following ways in PG17 to enable both failover and
two_phase. (a) During Create Subscription, users can set both
'failover' and 'two_phase', if 'copy_data' is false, or (b), if
'copy_data' is true, during Create Subscription, then users can enable
'two_phase' and wait for it to be enabled. Then use Alter Subscription
to set 'failover'.

Yet another idea would be to disallow enabling both two_phase and
failover at CREATE SUBSCRIPTION regardless of copy_data value and to
add check when enabling failover for the two_phase-enabled-slots. For
example, users who want to enable both need to do two steps:

1. CREATE SUBSCRIPTION with two_phase = true and failover = false.
2. ALTER SUBSCRIPTION with failover = true.

At ALTER SUBSCRIPTION with failover = true, the subscriber checks if
the two_phase states is ready (and possibly check if the slot's
two_phase has been enabled too), otherwise raises an ERROR. Then, when
the publisher enables the failover for the two_phase-enabled-slot up
on walrcv_alter_slot() request, it checks the slot's restart_lsn has
passed slot's two_phase_at, otherwise raise an error with the message
like "the slot need to consume change upto %X/%X to enable failover".

This should further simplify the checks with an additional restriction
during the CREATE SUBSCRIPTION time. I am in favor of it because I
want the code in this area to be as much same in HEAD and backbranches
as possible.

Please find the updated patch for Approach 3, which implements the
idea suggested by Swada-san above.

Thank you for the patch! I think we need to update the documentation
as well. I'll also review the patch shortly so could you please
prepare the documentation changes?

Here are reviews comments on v7 patch:

+   /*
+    * Do not allow users to enable the failover and two_phase options
+    * together.
+    *
+    * See comments atop the similar check in ReplicationSlotCreate() for a
+    * detailed reason.
+    */
+   if (opts.twophase && opts.failover)
+       ereport(ERROR,
+               errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+               errmsg("\"%s\" and \"%s\" are mutually exclusive options",
+                      "failover", "two_phase"));

I think we can reword the error message to something like "cannot
enable both failover and two_phase options at CREATE SUBSCRIPTION",
and we can give a errhint that failover can be enabled after two_phase
state is ready.

---
+       if (two_phase && !IsSyncingReplicationSlots())
+           ereport(ERROR,
+                   errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                   errmsg("\"%s\" and \"%s\" are mutually exclusive options",
+                          "failover", "two_phase"));

Similar to the above comment, how about rewriting it to something like
"cannot enable both failover and two_phase during replication slot
creation".

I realized that users who create a logical slot using
pg_create_logical_replication_slot() would not be able to enable both
options at slot creation, and there is no easy way to enable the
failover after two_phase-enabled-slot creation. Users would need to
use ALTER_REPLICATION_SLOT replication command, which seems
unrealistics for users to use. On the other hand, if we allow creating
a logical slot with enabling failover and two_phase using SQL API,
there is still a chance for this bug to occur. Would it be worth
considering that if a logical slot is created with enabling failover
and two_phase using SQL API, we create the slot with only
two_phase=true, then advance the slot until the slot satisfies
restart_lsn >= two_phase_at, and then enable the failover?

---
+                   if (sub->twophasestate ==
LOGICALREP_TWOPHASE_STATE_PENDING &&
+                       opts.failover)
+                       ereport(ERROR,
+                               errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                               errmsg("cannot enable failover for a
subscription with a pending two_phase state"));
+

I think ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is a more appropriate
error code.

---
+   if (failover && MyReplicationSlot->data.two_phase &&
+       MyReplicationSlot->data.restart_lsn <
MyReplicationSlot->data.two_phase_at)
+       ereport(ERROR,
+               errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+               errmsg("cannot enable failover for a two-phase enabled
replication slot"),
+               errdetail("The slot need to consume change upto %X/%X
to enable failover.",
+
LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));

Same as above comment with regards to the error code.

---
We need regression tests for replication slot creation SQL API too.

---
+# Confirm that the failover flag on the slot has been turned on
+is( $publisher->safe_psql(
+       'postgres',
+       q{SELECT failover from pg_replication_slots WHERE slot_name =
'lsub1_slot';}
+   ),
+   "t",
+   'logical slot has failover true on the publisher');

It should check 'lsub2_slot' instead.

---
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+   "CREATE SUBSCRIPTION regress_mysub2 CONNECTION
'$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot =
false, copy_data = false, enabled = false, two_phase = true);"
+);
+
+# Enable failover for the subscription with two_phase in pending state
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+   "ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+ok( $stderr =~ /ERROR:  cannot enable failover for a subscription
with a pending two_phase state/,
+   "Enabling failover is not allowed for a two_phase pending subscription");
+

I think it would be good to have more tests where we make the
two_phase state ready and execute 'ALTER SUBSCRIPTION SET (failover =
true)' successfully.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#36Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#35)
Re: Fix slot synchronization with two_phase decoding enabled

On Fri, Apr 25, 2025 at 6:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I realized that users who create a logical slot using
pg_create_logical_replication_slot() would not be able to enable both
options at slot creation, and there is no easy way to enable the
failover after two_phase-enabled-slot creation. Users would need to
use ALTER_REPLICATION_SLOT replication command, which seems
unrealistics for users to use. On the other hand, if we allow creating
a logical slot with enabling failover and two_phase using SQL API,
there is still a chance for this bug to occur. Would it be worth
considering that if a logical slot is created with enabling failover
and two_phase using SQL API, we create the slot with only
two_phase=true, then advance the slot until the slot satisfies
restart_lsn >= two_phase_at, and then enable the failover?

This means we either need to maintain somewhere that user has provided
failover flag till restart_lsn >= two_phase_at or and then set
failover flag in the slot or initially mark it but enable the
functionality of failover when we reach the condition restart_lsn >=
two_phase_at. Both seem to have different kinds of problems. The first
idea seems to have an issue with persistence, which means we can lose
track of the flag after the restart. The second can mislead the user
for a long period in cases where prepare and commit have a large time
gap. I feel this will introduce complexity either in the form of code
or in giving the information to the user.

--
With Regards,
Amit Kapila.

#37Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#33)
1 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

On Thu, Apr 24, 2025 at 3:57 PM shveta malik <shveta.malik@gmail.com> wrote:

On Thu, Apr 24, 2025 at 2:54 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Please find the updated patch for Approach 3, which implements the
idea suggested by Swada-san above.

Thank You for the patch.

1)

CreateDecodingContext:

if (ctx->twophase && !slot->data.two_phase)
{
+ /*
+ * Do not allow two-phase decoding for failover enabled slots.
+ *
+ * See comments atop the similar check in ReplicationSlotCreate() for
+ * a detailed reason.
+ */
+ if (slot->data.failover)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+ NameStr(slot->data.name))));
+

Can you please add tetscase to cover this scenario?

Added a test for the above scenario.

2)
We shall update create-sub documents as well for these mutually
exclusive options. Review other pages (alter-sub, create-slot) as well
for any required change.

Updated docs where I felt this new change should be mentioned. Please
let me know if I missed any place.

3)
+##################################################
+# Test that the failover option can be enabled for a two_phase enabled
+# subscription.
+##################################################

Suggestion: 'Test that the failover option can be enabled for a two_phase
enabled subscription only through Alter Subscription (failover=true)'

Done.
~~~

Please find the attached v8 patch with above comments addressed.
This version includes the documentation updates suggested by
Sawada-san at [1]/messages/by-id/CAD21AoD08Nb588fAJ+Jd5xRxtAT8yWnOfZ3zq-K5sto9b3ntsA@mail.gmail.com, and incorporates his feedback from [2]/messages/by-id/CAD21AoD08Nb588fAJ+Jd5xRxtAT8yWnOfZ3zq-K5sto9b3ntsA@mail.gmail.com as well.

[1]: /messages/by-id/CAD21AoD08Nb588fAJ+Jd5xRxtAT8yWnOfZ3zq-K5sto9b3ntsA@mail.gmail.com
[2]: /messages/by-id/CAD21AoD08Nb588fAJ+Jd5xRxtAT8yWnOfZ3zq-K5sto9b3ntsA@mail.gmail.com

--
Thanks,
Nisha

Attachments:

v8-0001-PG17-Approach-3-Fix-slot-synchronization-for-two_.patchapplication/octet-stream; name=v8-0001-PG17-Approach-3-Fix-slot-synchronization-for-two_.patchDownload
From 578c9cbd987f2de96953094d3bf33b08fa6d77f2 Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Mon, 7 Apr 2025 09:36:29 +0530
Subject: [PATCH v8] PG17 Approach 3 Fix slot synchronization for two_phase
 enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To prevent the risk of losing prepared transactions, we disallow enabling
both failover and twophase during slot creation, but permits altering
failover to true once ensured that slot's restart_lsn > two_phase_at.

The fix enforces the following conditions:
1) Always disallow creating slots with two_phase=true and failover=true.
2) Always disallow creating subscriptions with (two_phase=true, failover=true).
3) Prevent altering the slot's failover to true if two_phase=true and restart_lsn
   is less than two_phase_at. Otherwise, allow changing failover to true.
4) Disallow altering slot's failover to true when two_phase state is 'pending'.
   User can try altering failover again when two_phase state is moved beyond 'pending'.
---
 contrib/test_decoding/expected/slot.out       |   3 +
 contrib/test_decoding/sql/slot.sql            |   1 +
 doc/src/sgml/func.sgml                        |   7 +-
 doc/src/sgml/protocol.sgml                    |   4 +-
 doc/src/sgml/ref/alter_subscription.sgml      |   8 ++
 doc/src/sgml/ref/create_subscription.sgml     |   5 +
 src/backend/commands/subscriptioncmds.c       |  41 ++++++
 src/backend/replication/logical/logical.c     |  12 ++
 src/backend/replication/slot.c                |  39 ++++++
 src/bin/pg_upgrade/t/003_logical_slots.pl     |   6 +-
 .../t/040_standby_failover_slots_sync.pl      | 124 +++++++++++++++++-
 src/test/regress/expected/subscription.out    |   4 +
 src/test/regress/sql/subscription.sql         |   4 +
 13 files changed, 249 insertions(+), 9 deletions(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..b787115ced2 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,9 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  cannot enable both "failover" and "two_phase" options during replication slot creation
+HINT:  The "failover" option can be enabled after "two_phase" state is ready.
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f441ec43314..748f38f4737 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29073,9 +29073,10 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         <parameter>failover</parameter>, when set to true,
         specifies that this slot is enabled to be synced to the
         standbys so that logical replication can be resumed after
-        failover. A call to this function has the same effect as
-        the replication protocol command
-        <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
+        failover. The parameters <parameter>twophase</parameter>
+        and <parameter>failover</parameter> cannot be enabled together when
+        creating a replication slot. A call to this function has the same
+        effect as the replication protocol command <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
        </para></entry>
       </row>
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0396d3a9e98..b5d5b94b7b4 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2060,7 +2060,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           and <literal>ROLLBACK PREPARED</literal> are decoded and transmitted.
           The transaction will be decoded and transmitted at
           <literal>PREPARE TRANSACTION</literal> time.
-          The default is false.
+          The default is false. This option is mutually exclusive with <literal>failover</literal>.
          </para>
         </listitem>
        </varlistentry>
@@ -2101,7 +2101,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
          <para>
           If true, the slot is enabled to be synced to the standbys
           so that logical replication can be resumed after failover.
-          The default is false.
+          The default is false. This option is mutually exclusive with <literal>two_phase</literal>.
          </para>
         </listitem>
        </varlistentry>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 2ccbf5e4897..072c0832a40 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -257,6 +257,14 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
       <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
       option is enabled.
      </para>
+
+     <para>
+      When <literal>two_phase</literal> is in the "pending" state, setting
+      <literal>failover = true</literal> is not permitted. Once
+      <literal>two_phase</literal> is "enabled", <literal>failover = true</literal>
+      can be set only if the slot's <literal>restart_lsn</literal> has advanced
+      beyond its <literal>two_phase_at</literal> value.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index c9c8dd440dc..e55f02a39c6 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -426,6 +426,11 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           replication can be resumed from the new primary after failover.
           The default is <literal>false</literal>.
          </para>
+
+         <para>
+          The options <parameter>failover</parameter> and <parameter>twophase</parameter>
+          cannot be enabled together when creating a subscription.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist></para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..4b7f05b8a87 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,21 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options
+	 * together.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for a
+	 * detailed reason.
+	 */
+	if (opts.twophase && opts.failover)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable both \"%s\" and \"%s\" options at CREATE SUBSCRIPTION",
+					   "failover", "two_phase"),
+				errhint("The \"%s\" option can be enabled after  \"%s\" state is ready.",
+						"failover", "two_phase"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1260,32 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable the failover when the
+					 * two_phase state is still pending i.e., the replication
+					 * slot’s two_phase option has not yet been finalized.
+					 *
+					 * In most cases, the restriction in
+					 * ReplicationSlotAlter() is sufficient to prevent
+					 * enabling failover for a slot with two_phase enabled.
+					 * However, this additional check is necessary to handle a
+					 * race condition, when a user runs CREATE SUBSCRIPTION
+					 * with two_phase=true, but the slot's two_phase flag
+					 * hasn't been set yet, a concurrent attempt to do ALTER
+					 * SUBSCRIPTION(failover=true) may bypass the check in
+					 * ReplicationSlotAlter().
+					 *
+					 * For a detailed explanation of why enforcing this
+					 * restriction is important when combining two_phase and
+					 * failover, refer to the comments atop similar check in
+					 * ReplicationSlotCreate().
+					 */
+					if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+								errmsg("cannot enable failover for a subscription with a pending two_phase state"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index e941bb491d8..2421e03d088 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -613,6 +613,18 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 	/* Mark slot to allow two_phase decoding if not already marked */
 	if (ctx->twophase && !slot->data.two_phase)
 	{
+		/*
+		 * Do not allow two-phase decoding for failover enabled slots.
+		 *
+		 * See comments atop the similar check in ReplicationSlotCreate() for
+		 * a detailed reason.
+		 */
+		if (slot->data.failover)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+							NameStr(slot->data.name))));
+
 		SpinLockAcquire(&slot->mutex);
 		slot->data.two_phase = true;
 		slot->data.two_phase_at = start_lsn;
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..1598aa26b27 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,28 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable both failover and two_phase for slots.
+		 *
+		 * This is because the two_phase_at field of a slot, which tracks the
+		 * LSN, from which two-phase decoding starts, is not synchronized to
+		 * standby servers. Without two_phase_at, the logical decoding might
+		 * incorrectly identify prepared transaction as already replicated to
+		 * the subscriber after promotion of standby server, causing them to
+		 * be skipped.
+		 *
+		 * However, both failover and two_phase enabled slots can be created
+		 * during slot synchronization because we need to retain the same
+		 * values as the remote slot.
+		 */
+		if (two_phase && !IsSyncingReplicationSlots())
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot enable both \"%s\" and \"%s\" options during replication slot creation",
+						   "failover", "two_phase"),
+					errhint("The \"%s\" option can be enabled after \"%s\" state is ready.",
+							"failover", "two_phase"));
 	}
 
 	/*
@@ -848,6 +870,23 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot
+	 * where slot's restart_lsn is less than two_phase_at. In such cases,
+	 * there is a risk that transactions prepared before two_phase_at exist
+	 * and would be skipped during decoding.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for a
+	 * detailed reason.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase &&
+		MyReplicationSlot->data.restart_lsn < MyReplicationSlot->data.two_phase_at)
+		ereport(ERROR,
+				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				errmsg("cannot enable failover for a two-phase enabled replication slot"),
+				errdetail("The slot need to consume change upto %X/%X to enable failover.",
+						  LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..e329cc609ce 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -173,7 +173,7 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
@@ -193,8 +193,8 @@ command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
 # Check that the slot 'regress_sub' has migrated to the new cluster
 $newpub->start;
 my $result = $newpub->safe_psql('postgres',
-	"SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
 
 # Update the connection
 my $new_connstr = $newpub->connstr . ' dbname=postgres';
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..f0d7657711c 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -25,6 +25,8 @@ $publisher->init(
 $publisher->append_conf('postgresql.conf', 'autovacuum = off');
 $publisher->start;
 
+$publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
 $publisher->safe_psql('postgres',
 	"CREATE PUBLICATION regress_mypub FOR ALL TABLES;");
 
@@ -42,6 +44,8 @@ my $slot_creation_time_on_primary = $publisher->safe_psql(
     SELECT current_timestamp;
 ]);
 
+$subscriber1->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
 # Create a subscription that enables failover.
 $subscriber1->safe_psql('postgres',
 	"CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, copy_data = false, failover = true, enabled = false);"
@@ -98,6 +102,123 @@ my ($result, $stdout, $stderr) = $subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
 	"altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option can be enabled for a two_phase enabled
+# subscription only through Alter Subscription (failover=true)
+##################################################
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub2_slot, enabled = true, two_phase = true);"
+);
+
+# Wait for initial table sync to finish
+$subscriber1->wait_for_subscription_sync($publisher, 'regress_mysub2');
+
+# Also wait for two-phase to be enabled
+$subscriber1->poll_query_until(
+	'postgres', qq[
+	SELECT count(1) = 0 FROM pg_subscription WHERE subname = 'regress_mysub2' and subtwophasestate NOT IN ('e');]
+) or die "Timed out while waiting for subscriber to enable twophase";
+
+# Try to enable the failover for the subscription, should give error
+($result, $stdout, $stderr) = $subscriber1->psql(
+	'postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+	 ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");
+ok( $stderr =~
+	  qr/ERROR:  could not alter replication slot "lsub2_slot": ERROR:  cannot enable failover for a two-phase enabled replication slot/,
+	"failover can't be enabled if restart_lsn < two_phase_at on a two_phase subscription."
+);
+
+$subscriber1->safe_psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 ENABLE;");
+
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$publisher->safe_psql(
+	'postgres', qq(
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+));
+
+# Alter subscription to enable failover
+($result, $stdout, $stderr) = $subscriber1->psql(
+	'postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+	 ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $publisher->safe_psql(
+		'postgres',
+		q{SELECT failover from pg_replication_slots WHERE slot_name = 'lsub2_slot';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub2;");
+
+# Test that SQL API disallows slot creation with both two_phase and failover enabled
+($result, $stdout, $stderr) = $publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub3', 'pgoutput', false, true, true);"
+);
+ok( $stderr =~
+	  /ERROR:  cannot enable both "failover" and "two_phase" options during replication slot creation
+HINT:  The "failover" option can be enabled after "two_phase" state is ready./,
+	"cannot create slot with both two_phase and failover enabled");
+
+# Test that the failover option cannot be set for a subscription with
+# two_phase in pending state.
+
+# Create a subscription with two_phase enabled, but in pending state
+$subscriber1->psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub3 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (enabled = false, two_phase = true);"
+);
+
+# Enable failover for the subscription with two_phase in pending state
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub3 SET (failover = true)");
+ok( $stderr =~
+	  /ERROR:  cannot enable failover for a subscription with a pending two_phase state/,
+	"enabling failover is not allowed for a two_phase pending subscription");
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub3;");
+
+# Test that subscription creation with two_phase=true results in error when
+# using a slot with failover set to true.
+
+# Create a slot with failover enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub4', 'pgoutput', false, false, true);"
+);
+
+my $logoffset = -s $subscriber1->logfile;
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub4 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, two_phase = true);"
+);
+
+ok( $subscriber1->wait_for_log(
+		qr/cannot enable two-phase decoding for failover enabled slot "regress_mysub4"/,
+		$logoffset),
+	"creating subscription with two_phase=true is not allowed when the slot has failover set to true"
+);
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub4;");
+
+# Clean up the tables created
+$publisher->safe_psql('postgres', "DROP TABLE tab_full;");
+$subscriber1->safe_psql('postgres', "DROP TABLE tab_full;");
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby server.
 ##################################################
@@ -280,7 +401,8 @@ $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
 
 # Confirm that the invalidated slot has been dropped.
 $standby1->wait_for_log(
-	qr/dropped replication slot "lsub1_slot" of database with OID [0-9]+/, $log_offset);
+	qr/dropped replication slot "lsub1_slot" of database with OID [0-9]+/,
+	$log_offset);
 
 # Confirm that the logical slot has been re-created on the standby and is
 # flagged as 'synced'
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..e7b9a80ec08 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,10 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  cannot enable both "failover" and "two_phase" options at CREATE SUBSCRIPTION
+HINT:  The "failover" option can be enabled after  "two_phase" state is ready.
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.34.1

#38Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#36)
Re: Fix slot synchronization with two_phase decoding enabled

On Fri, Apr 25, 2025 at 3:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Apr 25, 2025 at 6:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I realized that users who create a logical slot using
pg_create_logical_replication_slot() would not be able to enable both
options at slot creation, and there is no easy way to enable the
failover after two_phase-enabled-slot creation. Users would need to
use ALTER_REPLICATION_SLOT replication command, which seems
unrealistics for users to use. On the other hand, if we allow creating
a logical slot with enabling failover and two_phase using SQL API,
there is still a chance for this bug to occur. Would it be worth
considering that if a logical slot is created with enabling failover
and two_phase using SQL API, we create the slot with only
two_phase=true, then advance the slot until the slot satisfies
restart_lsn >= two_phase_at, and then enable the failover?

This means we either need to maintain somewhere that user has provided
failover flag till restart_lsn >= two_phase_at or and then set
failover flag in the slot

I was thinking of this idea.

or initially mark it but enable the
functionality of failover when we reach the condition restart_lsn >=
two_phase_at.

IIUC the slot could be synchronized to the standby as soon as we
complete DecodingContextFindStartpoint() for a failover-enabled slot.
So we would need some mechanisms to make sure that the slot is not
synchronized while we're waiting to reach the condition restart_lsn >=
two_phase_at even if the failover is enabled.

Both seem to have different kinds of problems. The first
idea seems to have an issue with persistence, which means we can lose
track of the flag after the restart.

I think we can do this series of operations while the slot is not
persistent, that is the slot is still RS_EPHEMERAL.

The second can mislead the user
for a long period in cases where prepare and commit have a large time
gap. I feel this will introduce complexity either in the form of code
or in giving the information to the user.

Agreed. Both ways introduce complexity so we need to consider the
user-unfriendliness (by not having a proper way to enable failover for
the two_phase-enabled-slot using SQL API) vs. risk (of introducing
complexity).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#39Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#38)
Re: Fix slot synchronization with two_phase decoding enabled

On Fri, Apr 25, 2025 at 9:57 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 25, 2025 at 3:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Apr 25, 2025 at 6:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I realized that users who create a logical slot using
pg_create_logical_replication_slot() would not be able to enable both
options at slot creation, and there is no easy way to enable the
failover after two_phase-enabled-slot creation. Users would need to
use ALTER_REPLICATION_SLOT replication command, which seems
unrealistics for users to use. On the other hand, if we allow creating
a logical slot with enabling failover and two_phase using SQL API,
there is still a chance for this bug to occur. Would it be worth
considering that if a logical slot is created with enabling failover
and two_phase using SQL API, we create the slot with only
two_phase=true, then advance the slot until the slot satisfies
restart_lsn >= two_phase_at, and then enable the failover?

This means we either need to maintain somewhere that user has provided
failover flag till restart_lsn >= two_phase_at or and then set
failover flag in the slot

I was thinking of this idea.

or initially mark it but enable the
functionality of failover when we reach the condition restart_lsn >=
two_phase_at.

IIUC the slot could be synchronized to the standby as soon as we
complete DecodingContextFindStartpoint() for a failover-enabled slot.
So we would need some mechanisms to make sure that the slot is not
synchronized while we're waiting to reach the condition restart_lsn >=
two_phase_at even if the failover is enabled.

So, then we need any state or persistent flag for this.

Both seem to have different kinds of problems. The first
idea seems to have an issue with persistence, which means we can lose
track of the flag after the restart.

I think we can do this series of operations while the slot is not
persistent, that is the slot is still RS_EPHEMERAL.

But we still need a persistent flag to indicate such slots shouldn't
be synced to standby till we reach the condition restart_lsn >=
two_phase_at.

The second can mislead the user
for a long period in cases where prepare and commit have a large time
gap. I feel this will introduce complexity either in the form of code
or in giving the information to the user.

Agreed. Both ways introduce complexity so we need to consider the
user-unfriendliness (by not having a proper way to enable failover for
the two_phase-enabled-slot using SQL API) vs. risk (of introducing
complexity).

Right, to me it sounds risky to provide such functionality for SQL API
in the back branch.

--
With Regards,
Amit Kapila.

#40shveta malik
shveta.malik@gmail.com
In reply to: Nisha Moond (#37)
Re: Fix slot synchronization with two_phase decoding enabled

On Fri, Apr 25, 2025 at 5:53 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Please find the attached v8 patch with above comments addressed.
This version includes the documentation updates suggested by
Sawada-san at [1], and incorporates his feedback from [2] as well.

Thanks for the patches.

1)
Regarding documents, these are the details added:
a) pg_create_logical_replication_slot says "The parameters twophase
and failover cannot be enabled together when creating a replication
slot."
b) CREATE_REPLICATION_SLOT says TWO_PHASE: This option is mutually
exclusive with failover.
c) CREATE SUBSCRIPTION doc says: The options failover and twophase
cannot be enabled together when creating a subscription.
d) ALTER SUB says: When two_phase is in the "pending" state, setting
failover = true is not permitted. Once two_phase is "enabled",
failover = true can be set only if the slot's restart_lsn has advanced
beyond its two_phase_at value.

The individual points mentioned are correct. But nowhere we get
complete picture as in how user can enable failover and twophase
together when setting up a slot and subscription. I think we can add
brief notes in a,b and c to state how user can actually enable these 2
together.
Or can add this detail in single page and let other pages refer/point
to that page. Thoughts?

2)

ReplicationSlotCreate():

+ if (two_phase && !IsSyncingReplicationSlots())
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot enable both \"%s\" and \"%s\" options during
replication slot creation",
+    "failover", "two_phase"),
+ errhint("The \"%s\" option can be enabled after \"%s\" state is ready.",
+ "failover", "two_phase"));

I think errhint here is not correct. We can not alter failover of
replication slot without /o doing create-sub first and then performing
alter-sub. So either we should mention complete details or remove this
hint, otherwise it is confusing.

3)
CreateSubscription:

+ if (opts.twophase && opts.failover)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot enable both \"%s\" and \"%s\" options at CREATE SUBSCRIPTION",
+    "failover", "two_phase"),
+ errhint("The \"%s\" option can be enabled after  \"%s\" state is ready.",
+ "failover", "two_phase"));

Shall we mention "using Alter Sub" in the errhint?

4)

AlterSubscription:

+ if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+ opts.failover)
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot enable failover for a subscription with a pending
two_phase state"));
+state"));

Shall we mention errhint here too?

thanks
Shveta

#41Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#22)
1 attachment(s)
RE: Fix slot synchronization with two_phase decoding enabled

On Fri, Apr 18, 2025 at 12:29 PM Amit Kapila wrote:

On Thu, Apr 17, 2025 at 6:14 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>

-----
Fix
-----

I think we should keep the confirmed_flush even if the previous synced
restart_lsn/catalog_xmin is newer. Attachments include a patch for the

same.

This will fix the case we are facing but adds a new rule for slot synchronization.
Can we think of a simpler way to fix this by avoiding updating other slot fields
(like two_phase, two_phase_at) if restart_lsn or catalog_xmin of the local slot
is ahead of the remote slot?

Since the fix for xmin advancement during fast-forward decoding has been pushed
(commit aaf9e95), I'm attaching the V2 patch to address the assert failure by
avoiding updating other slot fields (like two_phase, two_phase_at) if
restart_lsn or catalog_xmin of the local slot is ahead.

After commit aaf9e95, it's less likely for the synced slot's restart/catalog to
advance ahead of the source slot (The tests in
040_standby_failover_slots_sync.pl would not trigger it anymore). However, this
can still occur in certain scenarios.

Here is an example scenario with the steps involved:

1. Create a failover replication slot named 'logicalslot' on primary
and acquire it in the walsender.

2. Log two standby snapshots on primary:
- `0/3000428` - `running_xacts` (no running transactions)
- `0/3000460` - `running_xacts` (no running transactions)

3. The walsender sets `0/3000428` as the `candidate_restart_lsn`, skipping the
second `running_xacts` because `candidate_restart_lsn` is already valid.

4. After receiving a reply from the apply worker, the walsender assigns
`0/3000428` to `restart_lsn`. At this point, the replication slot
'logicalslot' has `restart_lsn` set to `0/3000428`.

5. On the standby, execute `pg_sync_replication_slots()` to synchronize
'logicalslot'.

6. During synchronization, with the initial `restart_lsn` at `0/3000428`, the
sync slot reaches a consistent point at this position. As a result, it does
not use this consistent point as `candidate_restart_lsn` (refer to
`SnapBuildProcessRunningXacts()`).

7. The sync process identifies the second standby snapshot at `0/3000460` and
uses its LSN as `candidate_restart_lsn`, eventually updating it to
`restart_lsn`.

8. Now, the synced slot holds `restart_lsn` at `0/3000460`, which is ahead of
the remote slot on the primary server.

So, if a user prepares a txn and changes the two_phase setting to true after
the steps mentioned above, the value for two_phase_at might get synced to the
standby while skipping the sync of the latest confirmed_flush_lsn. So I think
it's still necessary to fix the assert failure.

BTW, this fix is only for HEAD as back-branches do not sync
two_phase_at field.

Best Regards,
Hou zj

Attachments:

v2-0001-Fix-assertion-failure-when-decoding-synced-two-ph.patchapplication/octet-stream; name=v2-0001-Fix-assertion-failure-when-decoding-synced-two-ph.patchDownload
From f068d4e59354d401da09126431ab03f21321782a Mon Sep 17 00:00:00 2001
From: Zhijie Hou <houzj.fnst@cn.fujitsu.com>
Date: Mon, 28 Apr 2025 14:52:00 +0800
Subject: [PATCH v2] Fix assertion failure when decoding synced two-phase
 enabled slots.

Current, during slot synchronization, it skips updating the confirmed_lsn if it
detects that the catalog_xmin or restart_lsn of the synced slot has already
surpassed those of the remote slot on the primary. This behavior poses a
problem when two-phase commit is enabled on the remote slot. The lack of
synchronization between the latest confirmed_lsn and two_phase_at may result in
transactions prepared between the old confirmed_lsn and two_phase_at being
unexpectedly decoded and sent to subscribers following a promotion.

To fix this, when catalog_xmin or restart_lsn of the synced slot are ahead,
we postpone syncing the slot configurations including two_phase_at
until the remote slot's restart_lsn and catalog_xmin are ahead.
---
 src/backend/replication/logical/slotsync.c | 36 ++++++++++++++--------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index e22d41891e6..03478cfb94c 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -196,14 +196,14 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 		 * restart_lsn or the initial xmin_horizon computed for the local slot
 		 * is ahead of the remote slot.
 		 *
-		 * If the slot is persistent, restart_lsn of the synced slot could
-		 * still be ahead of the remote slot. Since we use slot advance
-		 * functionality to keep snapbuild/slot updated, it is possible that
-		 * the restart_lsn is advanced to a later position than it has on the
-		 * primary. This can happen when slot advancing machinery finds
-		 * running xacts record after reaching the consistent state at a later
-		 * point than the primary where it serializes the snapshot and updates
-		 * the restart_lsn.
+		 * If the slot is persistent, both restart_lsn and catalog_xmin of the
+		 * synced slot could still be ahead of the remote slot. Since we use
+		 * slot advance functionality to keep snapbuild/slot updated, it is
+		 * possible that the restart_lsn and catalog_xmin are advanced to a
+		 * later position than it has on the primary. This can happen when slot
+		 * advancing machinery finds running xacts record after reaching the
+		 * consistent state at a later point than the primary where it
+		 * serializes the snapshot and updates the restart_lsn.
 		 *
 		 * We LOG the message if the slot is temporary as it can help the user
 		 * to understand why the slot is not sync-ready. In the case of a
@@ -221,16 +221,28 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 
 		if (remote_slot_precedes)
 			*remote_slot_precedes = true;
+
+		/*
+		 * Return immediately without updating the configuration. This is
+		 * necessary when two-phase commit is enabled on the remote slot.
+		 * Syncing only two_phase_at, without also syncing the latest
+		 * confirmed_lsn, might lead to transactions between the old
+		 * confirmed_lsn and two_phase_at being unexpectedly decoded and sent
+		 * to the subscriber. Therefore, we postpone syncing the latest
+		 * confirmed_lsn and two_phase_at until the remote slot's restart_lsn
+		 * and catalog_xmin are ahead.
+		 */
+		return;
 	}
 
 	/*
 	 * Attempt to sync LSNs and xmins only if remote slot is ahead of local
 	 * slot.
 	 */
-	else if (remote_slot->confirmed_lsn > slot->data.confirmed_flush ||
-			 remote_slot->restart_lsn > slot->data.restart_lsn ||
-			 TransactionIdFollows(remote_slot->catalog_xmin,
-								  slot->data.catalog_xmin))
+	if (remote_slot->confirmed_lsn > slot->data.confirmed_flush ||
+		remote_slot->restart_lsn > slot->data.restart_lsn ||
+		TransactionIdFollows(remote_slot->catalog_xmin,
+							 slot->data.catalog_xmin))
 	{
 		/*
 		 * We can't directly copy the remote slot's LSN or xmin unless there
-- 
2.30.0.windows.2

#42shveta malik
shveta.malik@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#41)
Re: Fix slot synchronization with two_phase decoding enabled

On Mon, Apr 28, 2025 at 2:27 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Fri, Apr 18, 2025 at 12:29 PM Amit Kapila wrote:

On Thu, Apr 17, 2025 at 6:14 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>

-----
Fix
-----

I think we should keep the confirmed_flush even if the previous synced
restart_lsn/catalog_xmin is newer. Attachments include a patch for the

same.

This will fix the case we are facing but adds a new rule for slot synchronization.
Can we think of a simpler way to fix this by avoiding updating other slot fields
(like two_phase, two_phase_at) if restart_lsn or catalog_xmin of the local slot
is ahead of the remote slot?

Since the fix for xmin advancement during fast-forward decoding has been pushed
(commit aaf9e95), I'm attaching the V2 patch to address the assert failure by
avoiding updating other slot fields (like two_phase, two_phase_at) if
restart_lsn or catalog_xmin of the local slot is ahead.

The patch looks good to me. We can have minor comment tweak:

+ * Syncing only two_phase_at, without also syncing the latest
+ * confirmed_lsn, might lead to transactions between the old
+ * confirmed_lsn and two_phase_at being unexpectedly decoded and sent
+ * to the subscriber.

We can append "following a promotion".

thanks
Shveta

#43Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#40)
1 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

On Mon, Apr 28, 2025 at 12:04 PM shveta malik <shveta.malik@gmail.com> wrote:

On Fri, Apr 25, 2025 at 5:53 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Please find the attached v8 patch with above comments addressed.
This version includes the documentation updates suggested by
Sawada-san at [1], and incorporates his feedback from [2] as well.

Thanks for the patches.

1)
Regarding documents, these are the details added:
a) pg_create_logical_replication_slot says "The parameters twophase
and failover cannot be enabled together when creating a replication
slot."
b) CREATE_REPLICATION_SLOT says TWO_PHASE: This option is mutually
exclusive with failover.
c) CREATE SUBSCRIPTION doc says: The options failover and twophase
cannot be enabled together when creating a subscription.
d) ALTER SUB says: When two_phase is in the "pending" state, setting
failover = true is not permitted. Once two_phase is "enabled",
failover = true can be set only if the slot's restart_lsn has advanced
beyond its two_phase_at value.

The individual points mentioned are correct. But nowhere we get
complete picture as in how user can enable failover and twophase
together when setting up a slot and subscription. I think we can add
brief notes in a,b and c to state how user can actually enable these 2
together.
Or can add this detail in single page and let other pages refer/point
to that page. Thoughts?

I have updated the docs with a brief explanation of how both options
can be enabled together for points (a), (b), and (c).
Let’s wait for others’ opinions on whether this detail should stay
consolidated on one page, with other pages referring to it.

2)

ReplicationSlotCreate():

+ if (two_phase && !IsSyncingReplicationSlots())
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot enable both \"%s\" and \"%s\" options during
replication slot creation",
+    "failover", "two_phase"),
+ errhint("The \"%s\" option can be enabled after \"%s\" state is ready.",
+ "failover", "two_phase"));

I think errhint here is not correct. We can not alter failover of
replication slot without /o doing create-sub first and then performing
alter-sub. So either we should mention complete details or remove this
hint, otherwise it is confusing.

Removed the hint. I feel this explanation will be too long for the
HINT and also docs have the details.
~~~

Please find the v9 patch, addressing the above and all other comments as well.

--
Thanks,
Nisha

Attachments:

v9-0001-PG17-Approach-3-Fix-slot-synchronization-for-two_.patchapplication/octet-stream; name=v9-0001-PG17-Approach-3-Fix-slot-synchronization-for-two_.patchDownload
From 049352858524b5a7d761463352d1a44daedc8ddd Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Mon, 7 Apr 2025 09:36:29 +0530
Subject: [PATCH v9] PG17 Approach 3 Fix slot synchronization for two_phase
 enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To prevent the risk of losing prepared transactions, we disallow enabling
both failover and twophase during slot creation, but permits altering
failover to true once ensured that slot's restart_lsn > two_phase_at.

The fix enforces the following conditions:
1) Always disallow creating slots with two_phase=true and failover=true.
2) Always disallow creating subscriptions with (two_phase=true, failover=true).
3) Prevent altering the slot's failover to true if two_phase=true and restart_lsn
   is less than two_phase_at. Otherwise, allow changing failover to true.
4) Disallow altering slot's failover to true when two_phase state is 'pending'.
   User can try altering failover again when two_phase state is moved beyond 'pending'.
---
 contrib/test_decoding/expected/slot.out       |   2 +
 contrib/test_decoding/sql/slot.sql            |   1 +
 doc/src/sgml/func.sgml                        |  10 +-
 doc/src/sgml/protocol.sgml                    |  14 +-
 doc/src/sgml/ref/alter_subscription.sgml      |   8 ++
 doc/src/sgml/ref/create_subscription.sgml     |   9 ++
 src/backend/commands/subscriptioncmds.c       |  43 ++++++
 src/backend/replication/logical/logical.c     |  12 ++
 src/backend/replication/slot.c                |  37 ++++++
 src/bin/pg_upgrade/t/003_logical_slots.pl     |   6 +-
 .../t/040_standby_failover_slots_sync.pl      | 124 +++++++++++++++++-
 src/test/regress/expected/subscription.out    |   4 +
 src/test/regress/sql/subscription.sql         |   4 +
 13 files changed, 265 insertions(+), 9 deletions(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..87b28ad8d55 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  cannot enable both "failover" and "two_phase" options during replication slot creation
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f441ec43314..02c16478cee 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29073,9 +29073,13 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         <parameter>failover</parameter>, when set to true,
         specifies that this slot is enabled to be synced to the
         standbys so that logical replication can be resumed after
-        failover. A call to this function has the same effect as
-        the replication protocol command
-        <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
+        failover. The parameters <parameter>two_phase</parameter> and
+        <parameter>failover</parameter> cannot be enabled together when
+        creating a replication slot. However, a slot created with <parameter>two_phase</parameter>
+        enabled can later have <parameter>failover</parameter> set to true via
+        <command>ALTER SUBSCRIPTION</command>, if a subscription is using this
+        slot. A call to this function has the same effect as the replication
+        protocol command <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
        </para></entry>
       </row>
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0396d3a9e98..538030ae0e1 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2060,7 +2060,12 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           and <literal>ROLLBACK PREPARED</literal> are decoded and transmitted.
           The transaction will be decoded and transmitted at
           <literal>PREPARE TRANSACTION</literal> time.
-          The default is false.
+          The default is false. This option cannot be set together with
+          <literal>failover</literal> when creating a logical replication slot.
+          However, once the slot is created with <literal>two_phase=true</literal>,
+          the <literal>failover</literal> can be set to true after the
+          <literal>restart_lsn</literal> has advanced beyond its
+          <literal>two_phase_at</literal> value.
          </para>
         </listitem>
        </varlistentry>
@@ -2101,7 +2106,12 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
          <para>
           If true, the slot is enabled to be synced to the standbys
           so that logical replication can be resumed after failover.
-          The default is false.
+          The default is false. This option cannot be set together with
+          <literal>two_phase</literal> when creating the slot. However, once
+          the slot is created with <literal>two_phase=true</literal>, the
+          <literal>failover</literal> can be set to true after the
+          <literal>restart_lsn</literal> has advanced beyond its
+          <literal>two_phase_at</literal> value.
          </para>
         </listitem>
        </varlistentry>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 2ccbf5e4897..5bc9b943500 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -257,6 +257,14 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
       <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
       option is enabled.
      </para>
+
+     <para>
+      When <literal>two_phase</literal> is in the "pending" state, altering
+      <literal>failover = true</literal> is not permitted. Once
+      <literal>two_phase</literal> is enabled, <literal>failover = true</literal>
+      can be set only if the slot's <literal>restart_lsn</literal> has advanced
+      beyond its <literal>two_phase_at</literal> value.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index c9c8dd440dc..9bc05d5103f 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -426,6 +426,15 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           replication can be resumed from the new primary after failover.
           The default is <literal>false</literal>.
          </para>
+
+         <para>
+          The options <parameter>failover</parameter> and <parameter>twophase</parameter>
+          cannot be enabled together when creating a subscription. However, a
+          <parameter>two_phase</parameter> enabled subscription can later have
+          <parameter>failover</parameter> set to true via
+          <command>ALTER SUBSCRIPTION</command>, once the <literal>restart_lsn</literal>
+          of the slot has advanced beyond its <literal>two_phase_at</literal> value.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist></para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..6e2830603ac 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,21 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options
+	 * together.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for a
+	 * detailed reason.
+	 */
+	if (opts.twophase && opts.failover)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable both \"%s\" and \"%s\" options at CREATE SUBSCRIPTION",
+					   "failover", "two_phase"),
+				errhint("The \"%s\" option can be enabled after \"%s\" state is ready using ALTER SUBSCRIPTION.",
+						"failover", "two_phase"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1260,34 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable the failover when the
+					 * two_phase state is still pending i.e., the replication
+					 * slot’s two_phase option has not yet been finalized.
+					 *
+					 * In most cases, the restriction in
+					 * ReplicationSlotAlter() is sufficient to prevent
+					 * enabling failover for a slot with two_phase enabled.
+					 * However, this additional check is necessary to handle a
+					 * race condition, when a user runs CREATE SUBSCRIPTION
+					 * with two_phase=true, but the slot's two_phase flag
+					 * hasn't been set yet, a concurrent attempt to do ALTER
+					 * SUBSCRIPTION(failover=true) may bypass the check in
+					 * ReplicationSlotAlter().
+					 *
+					 * For a detailed explanation of why enforcing this
+					 * restriction is important when combining two_phase and
+					 * failover, refer to the comments atop similar check in
+					 * ReplicationSlotCreate().
+					 */
+					if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+								errmsg("cannot enable failover for a subscription with a pending two_phase state"),
+								errhint("The \"%s\" option can be enabled after \"%s\" state is ready.",
+										"failover", "two_phase"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index e941bb491d8..2421e03d088 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -613,6 +613,18 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 	/* Mark slot to allow two_phase decoding if not already marked */
 	if (ctx->twophase && !slot->data.two_phase)
 	{
+		/*
+		 * Do not allow two-phase decoding for failover enabled slots.
+		 *
+		 * See comments atop the similar check in ReplicationSlotCreate() for
+		 * a detailed reason.
+		 */
+		if (slot->data.failover)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+							NameStr(slot->data.name))));
+
 		SpinLockAcquire(&slot->mutex);
 		slot->data.two_phase = true;
 		slot->data.two_phase_at = start_lsn;
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..6b763ae5dc6 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,26 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable both failover and two_phase for slots.
+		 *
+		 * This is because the two_phase_at field of a slot, which tracks the
+		 * LSN, from which two-phase decoding starts, is not synchronized to
+		 * standby servers. Without two_phase_at, the logical decoding might
+		 * incorrectly identify prepared transaction as already replicated to
+		 * the subscriber after promotion of standby server, causing them to
+		 * be skipped.
+		 *
+		 * However, both failover and two_phase enabled slots can be created
+		 * during slot synchronization because we need to retain the same
+		 * values as the remote slot.
+		 */
+		if (two_phase && !IsSyncingReplicationSlots())
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot enable both \"%s\" and \"%s\" options during replication slot creation",
+						   "failover", "two_phase"));
 	}
 
 	/*
@@ -848,6 +868,23 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot
+	 * where slot's restart_lsn is less than two_phase_at. In such cases,
+	 * there is a risk that transactions prepared before two_phase_at exist
+	 * and would be skipped during decoding.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for a
+	 * detailed reason.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase &&
+		MyReplicationSlot->data.restart_lsn < MyReplicationSlot->data.two_phase_at)
+		ereport(ERROR,
+				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				errmsg("cannot enable failover for a two-phase enabled replication slot"),
+				errdetail("The slot need to consume change upto %X/%X to enable failover.",
+						  LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..e329cc609ce 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -173,7 +173,7 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
@@ -193,8 +193,8 @@ command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
 # Check that the slot 'regress_sub' has migrated to the new cluster
 $newpub->start;
 my $result = $newpub->safe_psql('postgres',
-	"SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
 
 # Update the connection
 my $new_connstr = $newpub->connstr . ' dbname=postgres';
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..c7ab4aa87bd 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -25,6 +25,8 @@ $publisher->init(
 $publisher->append_conf('postgresql.conf', 'autovacuum = off');
 $publisher->start;
 
+$publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
 $publisher->safe_psql('postgres',
 	"CREATE PUBLICATION regress_mypub FOR ALL TABLES;");
 
@@ -42,6 +44,8 @@ my $slot_creation_time_on_primary = $publisher->safe_psql(
     SELECT current_timestamp;
 ]);
 
+$subscriber1->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
 # Create a subscription that enables failover.
 $subscriber1->safe_psql('postgres',
 	"CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, copy_data = false, failover = true, enabled = false);"
@@ -98,6 +102,123 @@ my ($result, $stdout, $stderr) = $subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
 	"altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option can be enabled for a two_phase enabled
+# subscription only through Alter Subscription (failover=true)
+##################################################
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub2_slot, two_phase = true);"
+);
+
+# Wait for initial table sync to finish
+$subscriber1->wait_for_subscription_sync($publisher, 'regress_mysub2');
+
+# Also wait for two-phase to be enabled
+$subscriber1->poll_query_until(
+	'postgres', qq[
+	SELECT count(1) = 0 FROM pg_subscription WHERE subname = 'regress_mysub2' and subtwophasestate NOT IN ('e');]
+) or die "Timed out while waiting for subscriber to enable twophase";
+
+# Try to enable the failover for the subscription, should give error
+($result, $stdout, $stderr) = $subscriber1->psql(
+	'postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+	 ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");
+ok( $stderr =~
+	  qr/ERROR:  could not alter replication slot "lsub2_slot": ERROR:  cannot enable failover for a two-phase enabled replication slot/,
+	"failover can't be enabled if restart_lsn < two_phase_at on a two_phase subscription."
+);
+
+$subscriber1->safe_psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 ENABLE;");
+
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$publisher->safe_psql(
+	'postgres', qq(
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+));
+
+# Alter subscription to enable failover
+($result, $stdout, $stderr) = $subscriber1->psql(
+	'postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+	 ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $publisher->safe_psql(
+		'postgres',
+		q{SELECT failover from pg_replication_slots WHERE slot_name = 'lsub2_slot';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub2;");
+
+# Test that SQL API disallows slot creation with both two_phase and failover enabled
+($result, $stdout, $stderr) = $publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub3', 'pgoutput', false, true, true);"
+);
+ok( $stderr =~
+	  /ERROR:  cannot enable both "failover" and "two_phase" options during replication slot creation/,
+	"cannot create slot with both two_phase and failover enabled");
+
+# Test that the failover option cannot be set for a subscription with
+# two_phase in pending state.
+
+# Create a subscription with two_phase enabled, but in pending state
+$subscriber1->psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub3 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (enabled = false, two_phase = true);"
+);
+
+# Enable failover for the subscription with two_phase in pending state
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub3 SET (failover = true)");
+ok( $stderr =~
+	  /ERROR:  cannot enable failover for a subscription with a pending two_phase state
+.*HINT:  The "failover" option can be enabled after "two_phase" state is ready./,
+	"enabling failover is not allowed for a two_phase pending subscription");
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub3;");
+
+# Test that subscription creation with two_phase=true results in error when
+# using a slot with failover set to true.
+
+# Create a slot with failover enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub4', 'pgoutput', false, false, true);"
+);
+
+my $logoffset = -s $subscriber1->logfile;
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub4 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, two_phase = true);"
+);
+
+ok( $subscriber1->wait_for_log(
+		qr/cannot enable two-phase decoding for failover enabled slot "regress_mysub4"/,
+		$logoffset),
+	"creating subscription with two_phase=true is not allowed when the slot has failover set to true"
+);
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub4;");
+
+# Clean up the tables created
+$publisher->safe_psql('postgres', "DROP TABLE tab_full;");
+$subscriber1->safe_psql('postgres', "DROP TABLE tab_full;");
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby server.
 ##################################################
@@ -280,7 +401,8 @@ $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
 
 # Confirm that the invalidated slot has been dropped.
 $standby1->wait_for_log(
-	qr/dropped replication slot "lsub1_slot" of database with OID [0-9]+/, $log_offset);
+	qr/dropped replication slot "lsub1_slot" of database with OID [0-9]+/,
+	$log_offset);
 
 # Confirm that the logical slot has been re-created on the standby and is
 # flagged as 'synced'
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..66b1f5aa08f 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,10 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  cannot enable both "failover" and "two_phase" options at CREATE SUBSCRIPTION
+HINT:  The "failover" option can be enabled after "two_phase" state is ready using ALTER SUBSCRIPTION.
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.34.1

#44Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: shveta malik (#42)
2 attachment(s)
RE: Fix slot synchronization with two_phase decoding enabled

On Mon, Apr 28, 2025 at 5:33 PM shveta malik wrote:

On Mon, Apr 28, 2025 at 2:27 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Fri, Apr 18, 2025 at 12:29 PM Amit Kapila wrote:

On Thu, Apr 17, 2025 at 6:14 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com>

-----
Fix
-----

I think we should keep the confirmed_flush even if the previous
synced restart_lsn/catalog_xmin is newer. Attachments include a
patch for the

same.

This will fix the case we are facing but adds a new rule for slot

synchronization.

Can we think of a simpler way to fix this by avoiding updating other
slot fields (like two_phase, two_phase_at) if restart_lsn or
catalog_xmin of the local slot is ahead of the remote slot?

Since the fix for xmin advancement during fast-forward decoding has
been pushed (commit aaf9e95), I'm attaching the V2 patch to address
the assert failure by avoiding updating other slot fields (like
two_phase, two_phase_at) if restart_lsn or catalog_xmin of the local slot is

ahead.

The patch looks good to me. We can have minor comment tweak:

+ * Syncing only two_phase_at, without also syncing the latest
+ * confirmed_lsn, might lead to transactions between the old
+ * confirmed_lsn and two_phase_at being unexpectedly decoded and sent
+ * to the subscriber.

We can append "following a promotion".

Thanks for reviewing. Here is V3 patch that addressed it.

BTW, I also did some tests to confirm the catalog_xmin could
still be ahead in some case, and here is an example:

1. Create a failover replication slot named 'logicalslot' on primary
and acquire it in the walsender.

2. Log two standby snapshots on primary. Before logging, call txid_current()
To assign a xid, so that each standby snapshot will hold a new xid in its oldestrunningXid field:
- txid_current();
- `0/3000420` - `running_xacts` (no running transactions, oldestrunningXid = 755)
- txid_current();
- `0/3000488` - `running_xacts` (no running transactions, oldestrunningXid = 756)

3. The walsender sets `0/3000420` as the `candidate_restart_lsn`, 755 as
`candidate_catalog_xmin`, skipping the second `running_xacts` because
`candidate_restart_lsn`/`candidate_catalog_xmin` is already valid.

4. After receiving a reply from the apply worker, the walsender assigns
`0/3000420` to `restart_lsn`, `755` to `catalog_xmin`. At this point, the
replication slot 'logicalslot' has `restart_lsn` set to `0/3000420`,
`catalog_xmin` set to `755`.

5. On the standby, execute `pg_sync_replication_slots()` to synchronize
'logicalslot'.

6. During synchronization, with the initial `restart_lsn` at `0/3000420`, the
sync slot reaches a consistent point at this position. As a result, it does
not update `candidate_restart_lsn` and `candidate_catalog_xmin` at
consistent point (refer to `SnapBuildProcessRunningXacts()`).

7. The sync process identifies the second standby snapshot at `0/3000488` and
uses its LSN as `candidate_restart_lsn`, and use the oldestrunningXid `756`
as `candidate_catalog_xmin`, eventually updating it to `restart_lsn` and
`catalog_xmin`.

8. Now, the synced slot holds `restart_lsn` at `0/3000488`, `catalog_xmin` at
`756`, which are all ahead of the remote slot on the primary server.

Attaching a script to reproduce the same.

Note that, to reproduce this stably, we'd better modify the value of
LOG_SNAPSHOT_INTERVAL_MS in bgwriter.c to a bigger number to avoid
unexpected xl_running_xacts logging.

Best Regards,
Hou zj

Attachments:

advance_catalog_xmin.shapplication/octet-stream; name=advance_catalog_xmin.shDownload
v3-0001-Fix-assertion-failure-when-decoding-synced-two-ph.patchapplication/octet-stream; name=v3-0001-Fix-assertion-failure-when-decoding-synced-two-ph.patchDownload
From 777078845255f6c663ace970d7642bbee7e357c8 Mon Sep 17 00:00:00 2001
From: Zhijie Hou <houzj.fnst@cn.fujitsu.com>
Date: Mon, 28 Apr 2025 14:52:00 +0800
Subject: [PATCH v3] Fix assertion failure when decoding synced two-phase
 enabled slots.

Current, during slot synchronization, it skips updating the confirmed_lsn if it
detects that the catalog_xmin or restart_lsn of the synced slot has already
surpassed those of the remote slot on the primary. This behavior poses a
problem when two-phase commit is enabled on the remote slot. The lack of
synchronization between the latest confirmed_lsn and two_phase_at may result in
transactions prepared between the old confirmed_lsn and two_phase_at being
unexpectedly decoded and sent to subscribers following a promotion.

To fix this, when catalog_xmin or restart_lsn of the synced slot are ahead,
we postpone syncing the slot configurations including two_phase_at
until the remote slot's restart_lsn and catalog_xmin are ahead.
---
 src/backend/replication/logical/slotsync.c | 36 ++++++++++++++--------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index e22d41891e6..dd1c725a670 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -196,14 +196,14 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 		 * restart_lsn or the initial xmin_horizon computed for the local slot
 		 * is ahead of the remote slot.
 		 *
-		 * If the slot is persistent, restart_lsn of the synced slot could
-		 * still be ahead of the remote slot. Since we use slot advance
-		 * functionality to keep snapbuild/slot updated, it is possible that
-		 * the restart_lsn is advanced to a later position than it has on the
-		 * primary. This can happen when slot advancing machinery finds
-		 * running xacts record after reaching the consistent state at a later
-		 * point than the primary where it serializes the snapshot and updates
-		 * the restart_lsn.
+		 * If the slot is persistent, both restart_lsn and catalog_xmin of the
+		 * synced slot could still be ahead of the remote slot. Since we use
+		 * slot advance functionality to keep snapbuild/slot updated, it is
+		 * possible that the restart_lsn and catalog_xmin are advanced to a
+		 * later position than it has on the primary. This can happen when
+		 * slot advancing machinery finds running xacts record after reaching
+		 * the consistent state at a later point than the primary where it
+		 * serializes the snapshot and updates the restart_lsn.
 		 *
 		 * We LOG the message if the slot is temporary as it can help the user
 		 * to understand why the slot is not sync-ready. In the case of a
@@ -221,16 +221,28 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 
 		if (remote_slot_precedes)
 			*remote_slot_precedes = true;
+
+		/*
+		 * Return immediately without updating the configuration. This is
+		 * crucial when two-phase commit is enabled on the remote slot. In
+		 * this scenario, syncing only two_phase_at, without the latest
+		 * confirmed_lsn, could cause transactions between the old
+		 * confirmed_lsn and two_phase_at to be unexpectedly decoded and sent
+		 * to the subscriber following a promotion. Therefore, we delay
+		 * syncing both the latest confirmed_lsn and two_phase_at until the
+		 * remote slot's restart_lsn and catalog_xmin are ahead.
+		 */
+		return false;
 	}
 
 	/*
 	 * Attempt to sync LSNs and xmins only if remote slot is ahead of local
 	 * slot.
 	 */
-	else if (remote_slot->confirmed_lsn > slot->data.confirmed_flush ||
-			 remote_slot->restart_lsn > slot->data.restart_lsn ||
-			 TransactionIdFollows(remote_slot->catalog_xmin,
-								  slot->data.catalog_xmin))
+	if (remote_slot->confirmed_lsn > slot->data.confirmed_flush ||
+		remote_slot->restart_lsn > slot->data.restart_lsn ||
+		TransactionIdFollows(remote_slot->catalog_xmin,
+							 slot->data.catalog_xmin))
 	{
 		/*
 		 * We can't directly copy the remote slot's LSN or xmin unless there
-- 
2.31.1

#45Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#39)
Re: Fix slot synchronization with two_phase decoding enabled

On Sat, Apr 26, 2025 at 5:07 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Apr 25, 2025 at 9:57 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 25, 2025 at 3:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

The second can mislead the user
for a long period in cases where prepare and commit have a large time
gap. I feel this will introduce complexity either in the form of code
or in giving the information to the user.

Agreed. Both ways introduce complexity so we need to consider the
user-unfriendliness (by not having a proper way to enable failover for
the two_phase-enabled-slot using SQL API) vs. risk (of introducing
complexity).

Right, to me it sounds risky to provide such functionality for SQL API
in the back branch.

So do you think it's okay to leave it as a restriction (i.e. there is
no easy way to enable failover for a two_phase-enabled logical slot
created by SQL API)? or do you have any better idea for that?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#46Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#44)
Re: Fix slot synchronization with two_phase decoding enabled

On Mon, Apr 28, 2025 at 4:33 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Mon, Apr 28, 2025 at 5:33 PM shveta malik wrote:

On Mon, Apr 28, 2025 at 2:27 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Fri, Apr 18, 2025 at 12:29 PM Amit Kapila wrote:

On Thu, Apr 17, 2025 at 6:14 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com>

-----
Fix
-----

I think we should keep the confirmed_flush even if the previous
synced restart_lsn/catalog_xmin is newer. Attachments include a
patch for the

same.

This will fix the case we are facing but adds a new rule for slot

synchronization.

Can we think of a simpler way to fix this by avoiding updating other
slot fields (like two_phase, two_phase_at) if restart_lsn or
catalog_xmin of the local slot is ahead of the remote slot?

Since the fix for xmin advancement during fast-forward decoding has
been pushed (commit aaf9e95), I'm attaching the V2 patch to address
the assert failure by avoiding updating other slot fields (like
two_phase, two_phase_at) if restart_lsn or catalog_xmin of the local slot is

ahead.

The patch looks good to me. We can have minor comment tweak:

+ * Syncing only two_phase_at, without also syncing the latest
+ * confirmed_lsn, might lead to transactions between the old
+ * confirmed_lsn and two_phase_at being unexpectedly decoded and sent
+ * to the subscriber.

We can append "following a promotion".

Thanks for reviewing. Here is V3 patch that addressed it.

BTW, I also did some tests to confirm the catalog_xmin could
still be ahead in some case, and here is an example:

1. Create a failover replication slot named 'logicalslot' on primary
and acquire it in the walsender.

2. Log two standby snapshots on primary. Before logging, call txid_current()
To assign a xid, so that each standby snapshot will hold a new xid in its oldestrunningXid field:
- txid_current();
- `0/3000420` - `running_xacts` (no running transactions, oldestrunningXid = 755)
- txid_current();
- `0/3000488` - `running_xacts` (no running transactions, oldestrunningXid = 756)

3. The walsender sets `0/3000420` as the `candidate_restart_lsn`, 755 as
`candidate_catalog_xmin`, skipping the second `running_xacts` because
`candidate_restart_lsn`/`candidate_catalog_xmin` is already valid.

4. After receiving a reply from the apply worker, the walsender assigns
`0/3000420` to `restart_lsn`, `755` to `catalog_xmin`. At this point, the
replication slot 'logicalslot' has `restart_lsn` set to `0/3000420`,
`catalog_xmin` set to `755`.

5. On the standby, execute `pg_sync_replication_slots()` to synchronize
'logicalslot'.

6. During synchronization, with the initial `restart_lsn` at `0/3000420`, the
sync slot reaches a consistent point at this position. As a result, it does
not update `candidate_restart_lsn` and `candidate_catalog_xmin` at
consistent point (refer to `SnapBuildProcessRunningXacts()`).

7. The sync process identifies the second standby snapshot at `0/3000488` and
uses its LSN as `candidate_restart_lsn`, and use the oldestrunningXid `756`
as `candidate_catalog_xmin`, eventually updating it to `restart_lsn` and
`catalog_xmin`.

8. Now, the synced slot holds `restart_lsn` at `0/3000488`, `catalog_xmin` at
`756`, which are all ahead of the remote slot on the primary server.

Attaching a script to reproduce the same.

Note that, to reproduce this stably, we'd better modify the value of
LOG_SNAPSHOT_INTERVAL_MS in bgwriter.c to a bigger number to avoid
unexpected xl_running_xacts logging.

Thank you for sharing the patch and reproducible steps. I agree that
this issue still happens on HEAD and should be fixed.

If I understand the problem correctly, we need to avoid setting the
values such that two_phase_at > confirmed_lsn. Because otherwise if a
transaction is prepared between them, we end up decoding the prepared
transaction twice or with an assertion failure if it's enabled. It
never happens on the primary since we always set the same value to
two_phase_at as confirmed_lsn.

Your fix looks good to me. Is it worth considering putting an
assertion to verify if new two_phase_at is equal to or greater than
confirmed_lsn (or at least it doesn't go backward), when syncing
two_phase_at?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#47Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#45)
Re: Fix slot synchronization with two_phase decoding enabled

On Mon, Apr 28, 2025 at 11:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Apr 26, 2025 at 5:07 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Apr 25, 2025 at 9:57 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 25, 2025 at 3:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

The second can mislead the user
for a long period in cases where prepare and commit have a large time
gap. I feel this will introduce complexity either in the form of code
or in giving the information to the user.

Agreed. Both ways introduce complexity so we need to consider the
user-unfriendliness (by not having a proper way to enable failover for
the two_phase-enabled-slot using SQL API) vs. risk (of introducing
complexity).

Right, to me it sounds risky to provide such functionality for SQL API
in the back branch.

So do you think it's okay to leave it as a restriction (i.e. there is
no easy way to enable failover for a two_phase-enabled logical slot
created by SQL API)? or do you have any better idea for that?

I don't have any better ideas. I think it is better to leave it as a
restriction.

--
With Regards,
Amit Kapila.

#48Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#44)
2 attachment(s)
RE: Fix slot synchronization with two_phase decoding enabled

On Mon, Apr 28, 2025 at 7:33 PM Zhijie Hou (Fujitsu) wrote:

Thanks for reviewing. Here is V3 patch that addressed it.

BTW, I also did some tests to confirm the catalog_xmin could still be
ahead in some case, and here is an example:

1. Create a failover replication slot named 'logicalslot' on primary
and acquire it in the walsender.

2. Log two standby snapshots on primary. Before logging, call
txid_current() To assign a xid, so that each standby snapshot will
hold a new xid in its oldestrunningXid field:
- txid_current();
- `0/3000420` - `running_xacts` (no running transactions,
oldestrunningXid = 755)
- txid_current();
- `0/3000488` - `running_xacts` (no running transactions,
oldestrunningXid = 756)

3. The walsender sets `0/3000420` as the `candidate_restart_lsn`, 755 as
`candidate_catalog_xmin`, skipping the second `running_xacts` because
`candidate_restart_lsn`/`candidate_catalog_xmin` is already valid.

4. After receiving a reply from the apply worker, the walsender assigns
`0/3000420` to `restart_lsn`, `755` to `catalog_xmin`. At this point, the
replication slot 'logicalslot' has `restart_lsn` set to `0/3000420`,
`catalog_xmin` set to `755`.

5. On the standby, execute `pg_sync_replication_slots()` to synchronize
'logicalslot'.

6. During synchronization, with the initial `restart_lsn` at `0/3000420`, the
sync slot reaches a consistent point at this position. As a result, it does
not update `candidate_restart_lsn` and `candidate_catalog_xmin` at
consistent point (refer to `SnapBuildProcessRunningXacts()`).

7. The sync process identifies the second standby snapshot at
`0/3000488` and
uses its LSN as `candidate_restart_lsn`, and use the
oldestrunningXid `756`
as `candidate_catalog_xmin`, eventually updating it to `restart_lsn` and
`catalog_xmin`.

8. Now, the synced slot holds `restart_lsn` at `0/3000488`, `catalog_xmin` at
`756`, which are all ahead of the remote slot on the primary server.

Attaching a script to reproduce the same.

Note that, to reproduce this stably, we'd better modify the value of
LOG_SNAPSHOT_INTERVAL_MS in bgwriter.c to a bigger number to avoid
unexpected xl_running_xacts logging.

In addition to above steps, for those interested in reproducing the specific
scenario where two_phase_at advances past the synced confirmed_flush, I'm
attaching a new script. This script can reproduce the issue after applying the
injection points I provided.

Best Regards,
Hou zj

Attachments:

0001-injection-point.patch.txttext/plain; name=0001-injection-point.patch.txtDownload
From a4a314b7041779d6e5725a9d66b0135715d393f4 Mon Sep 17 00:00:00 2001
From: Zhijie Hou <houzj.fnst@cn.fujitsu.com>
Date: Tue, 29 Apr 2025 13:56:25 +0800
Subject: [PATCH] injection point

---
 src/backend/replication/logical/logical.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index a8d2e024d34..afe194fb398 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -41,6 +41,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/memutils.h"
 
@@ -603,6 +604,8 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 		SnapBuildSetTwoPhaseAt(ctx->snapshot_builder, start_lsn);
 	}
 
+	INJECTION_POINT("initialized-two-phase-at");
+
 	ctx->reorder->output_rewrites = ctx->options.receive_rewrites;
 
 	ereport(LOG,
-- 
2.30.0.windows.2

twophaseat_precedes_confirmed_flush.sh.txttext/plain; name=twophaseat_precedes_confirmed_flush.sh.txtDownload
#49shveta malik
shveta.malik@gmail.com
In reply to: Nisha Moond (#43)
Re: Fix slot synchronization with two_phase decoding enabled

On Mon, Apr 28, 2025 at 4:33 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Please find the v9 patch, addressing the above and all other comments as well.

Thanks for the patch.

1)

+          The default is false. This option cannot be set together with
+          <literal>two_phase</literal> when creating the slot. However, once
+          the slot is created with <literal>two_phase=true</literal>, the
+          <literal>failover</literal> can be set to true after the
+          <literal>restart_lsn</literal> has advanced beyond its
+          <literal>two_phase_at</literal> value

As the user is not aware of two_phase_at, we shall change all the docs
to have a more user friendly explanation.

2)
 # Confirm that the invalidated slot has been dropped.
 $standby1->wait_for_log(
- qr/dropped replication slot "lsub1_slot" of database with OID
[0-9]+/, $log_offset);
+ qr/dropped replication slot "lsub1_slot" of database with OID [0-9]+/,
+ $log_offset);

I think this change is not needed.

thanks
Shveta

#50Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#46)
Re: Fix slot synchronization with two_phase decoding enabled

On Tue, Apr 29, 2025 at 3:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Your fix looks good to me. Is it worth considering putting an
assertion to verify if new two_phase_at is equal to or greater than
confirmed_lsn (or at least it doesn't go backward), when syncing
two_phase_at?

Yeah, it makes sense. But the condition should be reverse
(two_phase_at should be less than or equal to confirmed_flush). I have
done that, changed a few comments, and committed the patch.

--
With Regards,
Amit Kapila.

#51Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#49)
1 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

On Tue, Apr 29, 2025 at 2:51 PM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, Apr 28, 2025 at 4:33 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Please find the v9 patch, addressing the above and all other comments as well.

Thanks for the patch.

1)

+          The default is false. This option cannot be set together with
+          <literal>two_phase</literal> when creating the slot. However, once
+          the slot is created with <literal>two_phase=true</literal>, the
+          <literal>failover</literal> can be set to true after the
+          <literal>restart_lsn</literal> has advanced beyond its
+          <literal>two_phase_at</literal> value

As the user is not aware of two_phase_at, we shall change all the docs
to have a more user friendly explanation.

Done.

2)
# Confirm that the invalidated slot has been dropped.
$standby1->wait_for_log(
- qr/dropped replication slot "lsub1_slot" of database with OID
[0-9]+/, $log_offset);
+ qr/dropped replication slot "lsub1_slot" of database with OID [0-9]+/,
+ $log_offset);

I think this change is not needed.

The perltidy is auto-correcting it now, though it didn't for earlier
patches. I've kept the original for now.
~~~

Thanks for review, please find the updated patch v10.

--
Thanks,
Nisha

Attachments:

v10-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchapplication/x-patch; name=v10-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchDownload
From 7aa643c49411f9b855ee01aacb28c6f646d6dc1f Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Mon, 7 Apr 2025 09:36:29 +0530
Subject: [PATCH v10] PG17 Approach 3 Fix slot synchronization for two_phase
 enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To prevent the risk of losing prepared transactions, we disallow enabling
both failover and twophase during slot creation, but permits altering
failover to true once ensured that slot's restart_lsn > two_phase_at.

The fix enforces the following conditions:
1) Always disallow creating slots with two_phase=true and failover=true.
2) Always disallow creating subscriptions with (two_phase=true, failover=true).
3) Prevent altering the slot's failover to true if two_phase=true and restart_lsn
   is less than two_phase_at. Otherwise, allow changing failover to true.
4) Disallow altering slot's failover to true when two_phase state is 'pending'.
   User can try altering failover again when two_phase state is moved beyond 'pending'.
---
 contrib/test_decoding/expected/slot.out       |   2 +
 contrib/test_decoding/sql/slot.sql            |   1 +
 doc/src/sgml/func.sgml                        |  10 +-
 doc/src/sgml/protocol.sgml                    |  14 +-
 doc/src/sgml/ref/alter_subscription.sgml      |   8 ++
 doc/src/sgml/ref/create_subscription.sgml     |   9 ++
 src/backend/commands/subscriptioncmds.c       |  43 +++++++
 src/backend/replication/logical/logical.c     |  12 ++
 src/backend/replication/slot.c                |  37 ++++++
 src/bin/pg_upgrade/t/003_logical_slots.pl     |   6 +-
 .../t/040_standby_failover_slots_sync.pl      | 121 ++++++++++++++++++
 src/test/regress/expected/subscription.out    |   4 +
 src/test/regress/sql/subscription.sql         |   4 +
 13 files changed, 263 insertions(+), 8 deletions(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..87b28ad8d55 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  cannot enable both "failover" and "two_phase" options during replication slot creation
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f441ec43314..e0d70117202 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29073,9 +29073,13 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         <parameter>failover</parameter>, when set to true,
         specifies that this slot is enabled to be synced to the
         standbys so that logical replication can be resumed after
-        failover. A call to this function has the same effect as
-        the replication protocol command
-        <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
+        failover. The parameters <parameter>twophase</parameter> and
+        <parameter>failover</parameter> cannot be enabled together when
+        creating a replication slot. However, a slot created with <parameter>twophase</parameter>
+        enabled can later have <parameter>failover</parameter> set to true via
+        <command>ALTER SUBSCRIPTION</command>, if a subscription is using this
+        slot. A call to this function has the same effect as the replication
+        protocol command <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
        </para></entry>
       </row>
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0396d3a9e98..c87c60b74a6 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2060,7 +2060,12 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           and <literal>ROLLBACK PREPARED</literal> are decoded and transmitted.
           The transaction will be decoded and transmitted at
           <literal>PREPARE TRANSACTION</literal> time.
-          The default is false.
+          The default is false. This option cannot be set together with
+          <literal>failover</literal> when creating a logical replication slot.
+          However, once the slot is created with <literal>two_phase = true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all transactions up to the point where two-phase decoding
+          was enabled.
          </para>
         </listitem>
        </varlistentry>
@@ -2101,7 +2106,12 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
          <para>
           If true, the slot is enabled to be synced to the standbys
           so that logical replication can be resumed after failover.
-          The default is false.
+          The default is false. This option cannot be set together with
+          <literal>two_phase</literal> when creating the slot. However, once
+          the slot is created with <literal>two_phase = true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all transactions up to the point where two-phase decoding
+          was enabled.
          </para>
         </listitem>
        </varlistentry>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 2ccbf5e4897..00e1c87e9c9 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -257,6 +257,14 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
       <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
       option is enabled.
      </para>
+
+     <para>
+      When <literal>two_phase</literal> is in the pending state, altering
+      <literal>failover = true</literal> is not permitted. Once
+      <literal>two_phase</literal> is enabled, <literal>failover = true</literal>
+      can be set only if the slot has consumed all transactions up to the point
+      where two-phase decoding was enabled.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index c9c8dd440dc..567b241f1e7 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -426,6 +426,15 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           replication can be resumed from the new primary after failover.
           The default is <literal>false</literal>.
          </para>
+
+         <para>
+          The options <literal>failover</literal> and <literal>two_phase</literal>
+          cannot be enabled together when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true via <command>ALTER SUBSCRIPTION</command>,
+          once the slot has consumed all transactions up to the point where
+          two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist></para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..6e2830603ac 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,21 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options
+	 * together.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for a
+	 * detailed reason.
+	 */
+	if (opts.twophase && opts.failover)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable both \"%s\" and \"%s\" options at CREATE SUBSCRIPTION",
+					   "failover", "two_phase"),
+				errhint("The \"%s\" option can be enabled after \"%s\" state is ready using ALTER SUBSCRIPTION.",
+						"failover", "two_phase"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1260,34 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable the failover when the
+					 * two_phase state is still pending i.e., the replication
+					 * slot’s two_phase option has not yet been finalized.
+					 *
+					 * In most cases, the restriction in
+					 * ReplicationSlotAlter() is sufficient to prevent
+					 * enabling failover for a slot with two_phase enabled.
+					 * However, this additional check is necessary to handle a
+					 * race condition, when a user runs CREATE SUBSCRIPTION
+					 * with two_phase=true, but the slot's two_phase flag
+					 * hasn't been set yet, a concurrent attempt to do ALTER
+					 * SUBSCRIPTION(failover=true) may bypass the check in
+					 * ReplicationSlotAlter().
+					 *
+					 * For a detailed explanation of why enforcing this
+					 * restriction is important when combining two_phase and
+					 * failover, refer to the comments atop similar check in
+					 * ReplicationSlotCreate().
+					 */
+					if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+								errmsg("cannot enable failover for a subscription with a pending two_phase state"),
+								errhint("The \"%s\" option can be enabled after \"%s\" state is ready.",
+										"failover", "two_phase"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index e941bb491d8..2421e03d088 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -613,6 +613,18 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 	/* Mark slot to allow two_phase decoding if not already marked */
 	if (ctx->twophase && !slot->data.two_phase)
 	{
+		/*
+		 * Do not allow two-phase decoding for failover enabled slots.
+		 *
+		 * See comments atop the similar check in ReplicationSlotCreate() for
+		 * a detailed reason.
+		 */
+		if (slot->data.failover)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+							NameStr(slot->data.name))));
+
 		SpinLockAcquire(&slot->mutex);
 		slot->data.two_phase = true;
 		slot->data.two_phase_at = start_lsn;
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..6b763ae5dc6 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,26 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable both failover and two_phase for slots.
+		 *
+		 * This is because the two_phase_at field of a slot, which tracks the
+		 * LSN, from which two-phase decoding starts, is not synchronized to
+		 * standby servers. Without two_phase_at, the logical decoding might
+		 * incorrectly identify prepared transaction as already replicated to
+		 * the subscriber after promotion of standby server, causing them to
+		 * be skipped.
+		 *
+		 * However, both failover and two_phase enabled slots can be created
+		 * during slot synchronization because we need to retain the same
+		 * values as the remote slot.
+		 */
+		if (two_phase && !IsSyncingReplicationSlots())
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot enable both \"%s\" and \"%s\" options during replication slot creation",
+						   "failover", "two_phase"));
 	}
 
 	/*
@@ -848,6 +868,23 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot
+	 * where slot's restart_lsn is less than two_phase_at. In such cases,
+	 * there is a risk that transactions prepared before two_phase_at exist
+	 * and would be skipped during decoding.
+	 *
+	 * See comments atop the similar check in ReplicationSlotCreate() for a
+	 * detailed reason.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase &&
+		MyReplicationSlot->data.restart_lsn < MyReplicationSlot->data.two_phase_at)
+		ereport(ERROR,
+				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				errmsg("cannot enable failover for a two-phase enabled replication slot"),
+				errdetail("The slot need to consume change upto %X/%X to enable failover.",
+						  LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..e329cc609ce 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -173,7 +173,7 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
@@ -193,8 +193,8 @@ command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
 # Check that the slot 'regress_sub' has migrated to the new cluster
 $newpub->start;
 my $result = $newpub->safe_psql('postgres',
-	"SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
 
 # Update the connection
 my $new_connstr = $newpub->connstr . ' dbname=postgres';
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..22f3779be3c 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -25,6 +25,8 @@ $publisher->init(
 $publisher->append_conf('postgresql.conf', 'autovacuum = off');
 $publisher->start;
 
+$publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
 $publisher->safe_psql('postgres',
 	"CREATE PUBLICATION regress_mypub FOR ALL TABLES;");
 
@@ -42,6 +44,8 @@ my $slot_creation_time_on_primary = $publisher->safe_psql(
     SELECT current_timestamp;
 ]);
 
+$subscriber1->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
 # Create a subscription that enables failover.
 $subscriber1->safe_psql('postgres',
 	"CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, copy_data = false, failover = true, enabled = false);"
@@ -98,6 +102,123 @@ my ($result, $stdout, $stderr) = $subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
 	"altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option can be enabled for a two_phase enabled
+# subscription only through Alter Subscription (failover=true)
+##################################################
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub2_slot, two_phase = true);"
+);
+
+# Wait for initial table sync to finish
+$subscriber1->wait_for_subscription_sync($publisher, 'regress_mysub2');
+
+# Also wait for two-phase to be enabled
+$subscriber1->poll_query_until(
+	'postgres', qq[
+	SELECT count(1) = 0 FROM pg_subscription WHERE subname = 'regress_mysub2' and subtwophasestate NOT IN ('e');]
+) or die "Timed out while waiting for subscriber to enable twophase";
+
+# Try to enable the failover for the subscription, should give error
+($result, $stdout, $stderr) = $subscriber1->psql(
+	'postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+	 ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");
+ok( $stderr =~
+	  qr/ERROR:  could not alter replication slot "lsub2_slot": ERROR:  cannot enable failover for a two-phase enabled replication slot/,
+	"failover can't be enabled if restart_lsn < two_phase_at on a two_phase subscription."
+);
+
+$subscriber1->safe_psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 ENABLE;");
+
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$publisher->safe_psql(
+	'postgres', qq(
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+));
+
+# Alter subscription to enable failover
+($result, $stdout, $stderr) = $subscriber1->psql(
+	'postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+	 ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $publisher->safe_psql(
+		'postgres',
+		q{SELECT failover from pg_replication_slots WHERE slot_name = 'lsub2_slot';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub2;");
+
+# Test that SQL API disallows slot creation with both two_phase and failover enabled
+($result, $stdout, $stderr) = $publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub3', 'pgoutput', false, true, true);"
+);
+ok( $stderr =~
+	  /ERROR:  cannot enable both "failover" and "two_phase" options during replication slot creation/,
+	"cannot create slot with both two_phase and failover enabled");
+
+# Test that the failover option cannot be set for a subscription with
+# two_phase in pending state.
+
+# Create a subscription with two_phase enabled, but in pending state
+$subscriber1->psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub3 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (enabled = false, two_phase = true);"
+);
+
+# Enable failover for the subscription with two_phase in pending state
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub3 SET (failover = true)");
+ok( $stderr =~
+	  /ERROR:  cannot enable failover for a subscription with a pending two_phase state
+.*HINT:  The "failover" option can be enabled after "two_phase" state is ready./,
+	"enabling failover is not allowed for a two_phase pending subscription");
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub3;");
+
+# Test that subscription creation with two_phase=true results in error when
+# using a slot with failover set to true.
+
+# Create a slot with failover enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub4', 'pgoutput', false, false, true);"
+);
+
+my $logoffset = -s $subscriber1->logfile;
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub4 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, two_phase = true);"
+);
+
+ok( $subscriber1->wait_for_log(
+		qr/cannot enable two-phase decoding for failover enabled slot "regress_mysub4"/,
+		$logoffset),
+	"creating subscription with two_phase=true is not allowed when the slot has failover set to true"
+);
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub4;");
+
+# Clean up the tables created
+$publisher->safe_psql('postgres', "DROP TABLE tab_full;");
+$subscriber1->safe_psql('postgres', "DROP TABLE tab_full;");
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby server.
 ##################################################
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..66b1f5aa08f 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,10 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  cannot enable both "failover" and "two_phase" options at CREATE SUBSCRIPTION
+HINT:  The "failover" option can be enabled after "two_phase" state is ready using ALTER SUBSCRIPTION.
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.34.1

#52Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Nisha Moond (#51)
1 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

On Tue, Apr 29, 2025 at 5:00 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

On Tue, Apr 29, 2025 at 2:51 PM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, Apr 28, 2025 at 4:33 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Please find the v9 patch, addressing the above and all other comments as well.

Thanks for the patch.

1)

+          The default is false. This option cannot be set together with
+          <literal>two_phase</literal> when creating the slot. However, once
+          the slot is created with <literal>two_phase=true</literal>, the
+          <literal>failover</literal> can be set to true after the
+          <literal>restart_lsn</literal> has advanced beyond its
+          <literal>two_phase_at</literal> value

As the user is not aware of two_phase_at, we shall change all the docs
to have a more user friendly explanation.

Done.

2)
# Confirm that the invalidated slot has been dropped.
$standby1->wait_for_log(
- qr/dropped replication slot "lsub1_slot" of database with OID
[0-9]+/, $log_offset);
+ qr/dropped replication slot "lsub1_slot" of database with OID [0-9]+/,
+ $log_offset);

I think this change is not needed.

The perltidy is auto-correcting it now, though it didn't for earlier
patches. I've kept the original for now.
~~~

Thanks for review, please find the updated patch v10.

Thank you for updating the patch! Here are some comments on v10.

We refer to the following comment in ReplicationSlotCreate() from many places:

/*
* Do not allow users to enable both failover and two_phase for slots.
*
* This is because the two_phase_at field of a slot, which tracks the
* LSN, from which two-phase decoding starts, is not synchronized to
* standby servers. Without two_phase_at, the logical decoding might
* incorrectly identify prepared transaction as already replicated to
* the subscriber after promotion of standby server, causing them to
* be skipped.
*
* However, both failover and two_phase enabled slots can be created
* during slot synchronization because we need to retain the same
* values as the remote slot.
*/

I think this is an important implementation restriction, so how about
describing the overall idea of combination of two_phase and failover
in the comment atop slotsync.c or worker.c? It would be easier to
find.

---
+   if (failover && MyReplicationSlot->data.two_phase &&
+       MyReplicationSlot->data.restart_lsn <
MyReplicationSlot->data.two_phase_at)
+       ereport(ERROR,
+               errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+               errmsg("cannot enable failover for a two-phase enabled
replication slot"),
+               errdetail("The slot need to consume change upto %X/%X
to enable failover.",
+
LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));

I think it's better to mention the reason in the error message why we
cannot enable two-phase in this situation. How about:

-               errmsg("cannot enable failover for a two-phase enabled
replication slot"),
+               errmsg("cannot enable failover for a two-phase enabled
replication slot due to unconsumed changes"),
---
+       ereport(ERROR,
+               errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+               errmsg("cannot enable both \"%s\" and \"%s\" options
at CREATE SUBSCRIPTION",
+                      "failover", "two_phase"),
+               errhint("The \"%s\" option can be enabled after \"%s\"
state is ready using ALTER SUBSCRIPTION.",
+                       "failover", "two_phase"));

How about rewriting the errhint message to:

-               errhint("The \"%s\" option can be enabled after \"%s\"
state is ready using ALTER SUBSCRIPTION.",
-                       "failover", "two_phase"));
+               errhint("Use ALTER SUBSCRIPTION ... SET (failover =
true) to enable \"%s\" after two_phase state is ready",
+                       "failover"));
---
@@ -29073,9 +29073,13 @@ postgres=# SELECT '0/0'::pg_lsn +
pd.segment_number * ps.setting::int + :offset
         <parameter>failover</parameter>, when set to true,
         specifies that this slot is enabled to be synced to the
         standbys so that logical replication can be resumed after
-        failover. A call to this function has the same effect as
-        the replication protocol command
-        <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
+        failover. The parameters <parameter>twophase</parameter> and
+        <parameter>failover</parameter> cannot be enabled together when
+        creating a replication slot. However, a slot created with
<parameter>twophase</parameter>
+        enabled can later have <parameter>failover</parameter> set to true via
+        <command>ALTER SUBSCRIPTION</command>, if a subscription is using this
+        slot. A call to this function has the same effect as the replication
+        protocol command <literal>CREATE_REPLICATION_SLOT ...
LOGICAL</literal>.
        </para></entry>
       </row>

Can we also mention ALTER_REPLICATION to enable the failover for slots
that are not used by subscriptions?

---
+# Also wait for two-phase to be enabled
+$subscriber1->poll_query_until(
+   'postgres', qq[
+   SELECT count(1) = 0 FROM pg_subscription WHERE subname =
'regress_mysub2' and subtwophasestate NOT IN ('e');]
+) or die "Timed out while waiting for subscriber to enable twophase";

Since we check the twophasestate of the specific slot, how about doing
something like:

+ok( $subscriber1->poll_query_until(
+   'postgres',
+       qq[SELECT subtwophasestate FROM pg_subscription WHERE subname
= 'regress_mysub2';],
+   'e',
+    ),
+    "Timed out while waiting for subscriber to enable twophase");
---
+
+# Also wait for two-phase to be enabled
+$subscriber1->poll_query_until(
+   'postgres', qq[
+   SELECT count(1) = 0 FROM pg_subscription WHERE subname =
'regress_mysub2' and subtwophasestate NOT IN ('e');]
+) or die "Timed out while waiting for subscriber to enable twophase";
+
+# Try to enable the failover for the subscription, should give error
+($result, $stdout, $stderr) = $subscriber1->psql(
+   'postgres',
+   "ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+    ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");
+ok( $stderr =~
+     qr/ERROR:  could not alter replication slot "lsub2_slot": ERROR:
 cannot enable failover for a two-phase enabled replication slot/,
+   "failover can't be enabled if restart_lsn < two_phase_at on a
two_phase subscription."
+);

I think it's possible between two tests that the walsender consumes
some changes (e.g. generated by autovacuums) then the second check
fails (i.e., ALTER SUBSCRIPTION SET (failover = ture) succeeds).

---
+# Test that SQL API disallows slot creation with both two_phase and
failover enabled
+($result, $stdout, $stderr) = $publisher->psql('postgres',
+   "SELECT pg_create_logical_replication_slot('regress_mysub3',
'pgoutput', false, true, true);"
+);
+ok( $stderr =~
+     /ERROR:  cannot enable both "failover" and "two_phase" options
during replication slot creation/,
+   "cannot create slot with both two_phase and failover enabled");

Isn't this test already covered by test_decoding/sql/slot.sql?

I've attached a patch that contains comment changes I mentioned above.
Please incorporate it if you agree with them.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

comment_improvement_v10.patchapplication/octet-stream; name=comment_improvement_v10.patchDownload
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 6e2830603ac..d78a13f27ac 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -650,18 +650,15 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 
 	/*
 	 * Do not allow users to enable the failover and two_phase options
-	 * together.
-	 *
-	 * See comments atop the similar check in ReplicationSlotCreate() for a
-	 * detailed reason.
+	 * together. See comments atop slotsync.c for details.
 	 */
 	if (opts.twophase && opts.failover)
 		ereport(ERROR,
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable both \"%s\" and \"%s\" options at CREATE SUBSCRIPTION",
 					   "failover", "two_phase"),
-				errhint("The \"%s\" option can be enabled after \"%s\" state is ready using ALTER SUBSCRIPTION.",
-						"failover", "two_phase"));
+				errhint("Use ALTER SUBSCRIPTION ... SET (failover = true) to enable \"%s\" after two_phase state is ready",
+						"failover"));
 
 	/*
 	 * If built with appropriate switch, whine when regression-testing
@@ -1261,24 +1258,8 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 										"failover")));
 
 					/*
-					 * Do not allow users to enable the failover when the
-					 * two_phase state is still pending i.e., the replication
-					 * slot’s two_phase option has not yet been finalized.
-					 *
-					 * In most cases, the restriction in
-					 * ReplicationSlotAlter() is sufficient to prevent
-					 * enabling failover for a slot with two_phase enabled.
-					 * However, this additional check is necessary to handle a
-					 * race condition, when a user runs CREATE SUBSCRIPTION
-					 * with two_phase=true, but the slot's two_phase flag
-					 * hasn't been set yet, a concurrent attempt to do ALTER
-					 * SUBSCRIPTION(failover=true) may bypass the check in
-					 * ReplicationSlotAlter().
-					 *
-					 * For a detailed explanation of why enforcing this
-					 * restriction is important when combining two_phase and
-					 * failover, refer to the comments atop similar check in
-					 * ReplicationSlotCreate().
+					 * Do not allow users to enable the failover until two_phase
+					 * state reaches READY state.
 					 */
 					if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
 						opts.failover)
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 73fe3f56af2..011c244ac35 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -42,6 +42,16 @@
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
+ *
+ * Regarding two-phase enabled slots, we need to note that the two_phase_at
+ * field of a slot is not synchronized. In the absence of two_phase_at,
+ * logical decoding could incorrectly identify prepared transactions as
+ * having been replicated to the subscriber after promotion, resulting in
+ * their omission. Consequently, both two_phase and failover features
+ * cannot be simultaneously enabled during slot creation. Failover can only
+ * be enabled once we can ensure that no transactions were prepared prior
+ * to two_phase_at on the primary. These restrictions are enforced through
+ * checks in ReplicationSlotCreate() and ReplicationSlotAlter().
  *---------------------------------------------------------------------------
  */
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 6b763ae5dc6..1834c86aaac 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -346,13 +346,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 
 		/*
 		 * Do not allow users to enable both failover and two_phase for slots.
-		 *
-		 * This is because the two_phase_at field of a slot, which tracks the
-		 * LSN, from which two-phase decoding starts, is not synchronized to
-		 * standby servers. Without two_phase_at, the logical decoding might
-		 * incorrectly identify prepared transaction as already replicated to
-		 * the subscriber after promotion of standby server, causing them to
-		 * be skipped.
+		 * Please see the comments atop slotsync.c for details.
 		 *
 		 * However, both failover and two_phase enabled slots can be created
 		 * during slot synchronization because we need to retain the same
@@ -870,18 +864,14 @@ ReplicationSlotAlter(const char *name, bool failover)
 
 	/*
 	 * Do not allow users to enable failover for a two_phase enabled slot
-	 * where slot's restart_lsn is less than two_phase_at. In such cases,
-	 * there is a risk that transactions prepared before two_phase_at exist
-	 * and would be skipped during decoding.
-	 *
-	 * See comments atop the similar check in ReplicationSlotCreate() for a
-	 * detailed reason.
+	 * if there are potentially un-decoded transactions that are prepared
+	 * before two_phase_at. See comments atop slotsync.c for details.
 	 */
 	if (failover && MyReplicationSlot->data.two_phase &&
 		MyReplicationSlot->data.restart_lsn < MyReplicationSlot->data.two_phase_at)
 		ereport(ERROR,
 				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				errmsg("cannot enable failover for a two-phase enabled replication slot"),
+				errmsg("cannot enable failover for a two-phase enabled replication slot due to unconsumed changes"),
 				errdetail("The slot need to consume change upto %X/%X to enable failover.",
 						  LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));
 
#53Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#50)
RE: Fix slot synchronization with two_phase decoding enabled

On Tue, Apr 29, 2025 at 6:57 PM Amit Kapila wrote:

On Tue, Apr 29, 2025 at 3:01 AM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:

Your fix looks good to me. Is it worth considering putting an
assertion to verify if new two_phase_at is equal to or greater than
confirmed_lsn (or at least it doesn't go backward), when syncing
two_phase_at?

Yeah, it makes sense. But the condition should be reverse (two_phase_at
should be less than or equal to confirmed_flush). I have done that, changed a
few comments, and committed the patch.

I noticed a BF failure[1]https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=calliphoridae&amp;dt=2025-04-29%2009%3A11%3A31&amp;stg=recovery-check related to the fix, where the recently added assertion
Assert(slot->data.two_phase_at <= slot->data.confirmed_flush) was broken. After
examining the logs and code, I couldn't identify any functionality issues. Here is
my analysis:

AFAICS, the last slotsync cycle should have retrieved the latest confirmed_flush and
two_phase_at from the source slot, both expected to be 0/6005150. Here are
some details:

The standby's log[1]https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=calliphoridae&amp;dt=2025-04-29%2009%3A11%3A31&amp;stg=recovery-check shows the source slot's two_phase_at as 0/6005150.
Meanwhile, the publisher's log reveals that the source slot's confirmed_flush
was already 0/6005150 before the last slotsync. Therefore, the last slotsync
cycle should have fetched confirmed_flush: 0/6005150, two_phase_at: 0/6005150.

If we assume remote_slot->confirmed_lsn during the last sync cycle is
0/6005150, then either the local slot's confirmed_lsn had already been updated
to this value in a prior sync, leading it to skip the update in the last cycle
(in which case, the assertion shouldn't be broken), or it should enter the slot
update branch to set the synced slot to 0/6005150 (see
update_local_synced_slot() for details). Thus, theoretically, the assertion
wouldn't fail.

Beyond functionality problems, another possibility might be the CPU reordered
memory access, causing the assertion to execute before updating
confirmed_flush. However, we lack the information needed to verify this, so one
idea is to add some DEBUG message in update_local_synced_slot() to facilitate
tracking if the issue recurs.

I'll continue to investigate, but any suggestions or insights would be greatly
appreciated.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=calliphoridae&amp;dt=2025-04-29%2009%3A11%3A31&amp;stg=recovery-check

[2]: 2025-04-29 09:18:05.641 UTC [1023004][client backend][0/2:0] LOG: statement: SELECT two_phase AND '0/6005150' = two_phase_at from pg_replication_slots WHERE slot_name = 'lsub1_slot' AND synced AND NOT temporary;
2025-04-29 09:18:05.641 UTC [1023004][client backend][0/2:0] LOG: statement: SELECT two_phase AND '0/6005150' = two_phase_at from pg_replication_slots WHERE slot_name = 'lsub1_slot' AND synced AND NOT temporary;

[3]: 2025-04-29 09:18:05.487 UTC [1022913][walsender][29/0:0] LOG: 0/6004FC8 has been already streamed, forwarding to 0/6005150 2025-04-29 09:18:05.487 UTC [1022913][walsender][29/0:0] STATEMENT: START_REPLICATION SLOT "lsub1_slot" LOGICAL 0/6004FC8 (proto_version '4', streaming 'parallel', two_phase 'on', origin 'any', publication_names '"regress_mypub"') 2025-04-29 09:18:05.489 UTC [1022911][client backend][:0] LOG: disconnection: session time: 0:00:00.005 user=bf database=postgres host=[local] 2025-04-29 09:18:05.516 UTC [1022913][walsender][29/0:0] LOG: starting logical decoding for slot "lsub1_slot" 2025-04-29 09:18:05.516 UTC [1022913][walsender][29/0:0] DETAIL: Streaming transactions committing after 0/6005150, reading WAL from 0/6004A00. 2025-04-29 09:18:05.516 UTC [1022913][walsender][29/0:0] STATEMENT: START_REPLICATION SLOT "lsub1_slot" LOGICAL 0/6004FC8 (proto_version '4', streaming 'parallel', two_phase 'on', origin 'any', publication_names '"regress_mypub"') 2025-04-29 09:18:05.518 UTC [1022687][client backend][9/19:0] LOG: statement: SELECT slot_name, plugin, confirmed_flush_lsn, restart_lsn, catalog_xmin, two_phase, two_phase_at, failover, database, invalidation_reason FROM pg_catalog.pg_replication_slots WHERE failover and NOT temporary 2025-04-29 09:18:05.524 UTC [1022928][not initialized][:0] LOG: connection received: host=[local]
2025-04-29 09:18:05.487 UTC [1022913][walsender][29/0:0] LOG: 0/6004FC8 has been already streamed, forwarding to 0/6005150
2025-04-29 09:18:05.487 UTC [1022913][walsender][29/0:0] STATEMENT: START_REPLICATION SLOT "lsub1_slot" LOGICAL 0/6004FC8 (proto_version '4', streaming 'parallel', two_phase 'on', origin 'any', publication_names '"regress_mypub"')
2025-04-29 09:18:05.489 UTC [1022911][client backend][:0] LOG: disconnection: session time: 0:00:00.005 user=bf database=postgres host=[local]
2025-04-29 09:18:05.516 UTC [1022913][walsender][29/0:0] LOG: starting logical decoding for slot "lsub1_slot"
2025-04-29 09:18:05.516 UTC [1022913][walsender][29/0:0] DETAIL: Streaming transactions committing after 0/6005150, reading WAL from 0/6004A00.
2025-04-29 09:18:05.516 UTC [1022913][walsender][29/0:0] STATEMENT: START_REPLICATION SLOT "lsub1_slot" LOGICAL 0/6004FC8 (proto_version '4', streaming 'parallel', two_phase 'on', origin 'any', publication_names '"regress_mypub"')
2025-04-29 09:18:05.518 UTC [1022687][client backend][9/19:0] LOG: statement: SELECT slot_name, plugin, confirmed_flush_lsn, restart_lsn, catalog_xmin, two_phase, two_phase_at, failover, database, invalidation_reason FROM pg_catalog.pg_replication_slots WHERE failover and NOT temporary
2025-04-29 09:18:05.524 UTC [1022928][not initialized][:0] LOG: connection received: host=[local]

Best Regards,
Hou zj

#54Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#53)
Re: Fix slot synchronization with two_phase decoding enabled

On Wed, Apr 30, 2025 at 2:38 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Tue, Apr 29, 2025 at 6:57 PM Amit Kapila wrote:

On Tue, Apr 29, 2025 at 3:01 AM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:

Your fix looks good to me. Is it worth considering putting an
assertion to verify if new two_phase_at is equal to or greater than
confirmed_lsn (or at least it doesn't go backward), when syncing
two_phase_at?

Yeah, it makes sense. But the condition should be reverse (two_phase_at
should be less than or equal to confirmed_flush). I have done that, changed a
few comments, and committed the patch.

I noticed a BF failure[1] related to the fix, where the recently added assertion
Assert(slot->data.two_phase_at <= slot->data.confirmed_flush) was broken. After
examining the logs and code, I couldn't identify any functionality issues.

Yeah, that's weird.

AFAICS, the last slotsync cycle should have retrieved the latest confirmed_flush and
two_phase_at from the source slot, both expected to be 0/6005150. Here are
some details:

The standby's log[1] shows the source slot's two_phase_at as 0/6005150.
Meanwhile, the publisher's log reveals that the source slot's confirmed_flush
was already 0/6005150 before the last slotsync. Therefore, the last slotsync
cycle should have fetched confirmed_flush: 0/6005150, two_phase_at: 0/6005150.

If we assume remote_slot->confirmed_lsn during the last sync cycle is
0/6005150, then either the local slot's confirmed_lsn had already been updated
to this value in a prior sync, leading it to skip the update in the last cycle
(in which case, the assertion shouldn't be broken), or it should enter the slot
update branch to set the synced slot to 0/6005150 (see
update_local_synced_slot() for details). Thus, theoretically, the assertion
wouldn't fail.

Agreed with your analysis. After enabling the subscription with
two_phase=true, the primary server has the following logs that
indicate that logical decoding started from 0/6005150, not 0/6004FC8.
Given that the slot's two_phase was enabled at this time, the slot's
confirmed_flush and two_phase_at were 0/6005150.

2025-04-29 09:18:05.487 UTC [1022913][walsender][29/0:0] LOG:
received replication command: START_REPLICATION SLOT "lsub1_slot"
LOGICAL 0/6004FC8 (proto_version '4', streaming 'parallel', two_phase
'on', origin 'any', publication_names '"regress_mypub"')
2025-04-29 09:18:05.487 UTC [1022913][walsender][29/0:0] STATEMENT:
START_REPLICATION SLOT "lsub1_slot" LOGICAL 0/6004FC8 (proto_version
'4', streaming 'parallel', two_phase 'on', origin 'any',
publication_names '"regress_mypub"')
2025-04-29 09:18:05.487 UTC [1022913][walsender][29/0:0] LOG:
acquired logical replication slot "lsub1_slot"
2025-04-29 09:18:05.487 UTC [1022913][walsender][29/0:0] STATEMENT:
START_REPLICATION SLOT "lsub1_slot" LOGICAL 0/6004FC8 (proto_version
'4', streaming 'parallel', two_phase 'on', origin 'any',
publication_names '"regress_mypub"')
2025-04-29 09:18:05.487 UTC [1022913][walsender][29/0:0] LOG:
0/6004FC8 has been already streamed, forwarding to 0/6005150
2025-04-29 09:18:05.487 UTC [1022913][walsender][29/0:0] STATEMENT:
START_REPLICATION SLOT "lsub1_slot" LOGICAL 0/6004FC8 (proto_version
'4', streaming 'parallel', two_phase 'on', origin 'any',
publication_names '"regress_mypub"')
2025-04-29 09:18:05.516 UTC [1022913][walsender][29/0:0] LOG:
starting logical decoding for slot "lsub1_slot"
2025-04-29 09:18:05.516 UTC [1022913][walsender][29/0:0] DETAIL:
Streaming transactions committing after 0/6005150, reading WAL from
0/6004A00.
2025-04-29 09:18:05.516 UTC [1022913][walsender][29/0:0] STATEMENT:
START_REPLICATION SLOT "lsub1_slot" LOGICAL 0/6004FC8 (proto_version
'4', streaming 'parallel', two_phase 'on', origin 'any',
publication_names '"regress_mypub"')

Beyond functionality problems, another possibility might be the CPU reordered
memory access, causing the assertion to execute before updating
confirmed_flush.

Not sure CPU reordered memory access could matter in this case but I
don't have other ideas of possible causes.

However, we lack the information needed to verify this, so one
idea is to add some DEBUG message in update_local_synced_slot() to facilitate
tracking if the issue recurs.

That would be a good idea.

Also, it's unrelated to this issue but probably we are better off
doing this assertion check only when enabling two_phase. The assertion
is currently checked even when disabling the two_phase, which seems
not to make sense, and we don't clear two_phase_at value when
disabling two_phase by ReplicationSlotAlter().

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#55Nisha Moond
nisha.moond412@gmail.com
In reply to: Masahiko Sawada (#52)
1 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

On Wed, Apr 30, 2025 at 5:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 29, 2025 at 5:00 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

Thank you for updating the patch! Here are some comments on v10.

Thanks for reviewing the patch!

---
+
+# Also wait for two-phase to be enabled
+$subscriber1->poll_query_until(
+   'postgres', qq[
+   SELECT count(1) = 0 FROM pg_subscription WHERE subname =
'regress_mysub2' and subtwophasestate NOT IN ('e');]
+) or die "Timed out while waiting for subscriber to enable twophase";
+
+# Try to enable the failover for the subscription, should give error
+($result, $stdout, $stderr) = $subscriber1->psql(
+   'postgres',
+   "ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+    ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");
+ok( $stderr =~
+     qr/ERROR:  could not alter replication slot "lsub2_slot": ERROR:
cannot enable failover for a two-phase enabled replication slot/,
+   "failover can't be enabled if restart_lsn < two_phase_at on a
two_phase subscription."
+);

I think it's possible between two tests that the walsender consumes
some changes (e.g. generated by autovacuums) then the second check
fails (i.e., ALTER SUBSCRIPTION SET (failover = ture) succeeds).

Yes, this is a possibility. To account for this negative test case
where restart_lsn < two_phase_at, I updated the test to attempt
altering a two_phase-enabled slot without any associated subscription.

---
+# Test that SQL API disallows slot creation with both two_phase and
failover enabled
+($result, $stdout, $stderr) = $publisher->psql('postgres',
+   "SELECT pg_create_logical_replication_slot('regress_mysub3',
'pgoutput', false, true, true);"
+);
+ok( $stderr =~
+     /ERROR:  cannot enable both "failover" and "two_phase" options
during replication slot creation/,
+   "cannot create slot with both two_phase and failover enabled");

Isn't this test already covered by test_decoding/sql/slot.sql?

Yes, it is covered in slot.sql, hence removed from here.

I've attached a patch that contains comment changes I mentioned above.
Please incorporate it if you agree with them.

I have incorporated the suggested changes and updated a couple more
places accordingly.
~~~

Please find the v11 patch addressing the above points and all other
comments. I have also optimized the test by reducing the number of
subscriptions and slots.

--
Thanks,
Nisha

Attachments:

v11-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchapplication/octet-stream; name=v11-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchDownload
From 9e543eca8cae7724e51f65175e9f68461aca3065 Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Mon, 7 Apr 2025 09:36:29 +0530
Subject: [PATCH v11] PG17 Approach 3 Fix slot synchronization for two_phase
 enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To prevent the risk of losing prepared transactions, we disallow enabling
both failover and twophase during slot creation, but permits altering
failover to true once ensured that slot's restart_lsn > two_phase_at.

The fix enforces the following conditions:
1) Always disallow creating slots with two_phase=true and failover=true.
2) Always disallow creating subscriptions with (two_phase=true, failover=true).
3) Prevent altering the slot's failover to true if two_phase=true and restart_lsn
   is less than two_phase_at. Otherwise, allow changing failover to true.
4) Disallow altering slot's failover to true when two_phase state is 'pending'.
   User can try altering failover again when two_phase state is moved beyond 'pending'.
---
 contrib/test_decoding/expected/slot.out       |   2 +
 contrib/test_decoding/sql/slot.sql            |   1 +
 doc/src/sgml/func.sgml                        |  11 +-
 doc/src/sgml/protocol.sgml                    |  14 ++-
 doc/src/sgml/ref/alter_subscription.sgml      |   8 ++
 doc/src/sgml/ref/create_subscription.sgml     |   9 ++
 src/backend/commands/subscriptioncmds.c       |  25 ++++
 src/backend/replication/logical/logical.c     |  11 ++
 src/backend/replication/logical/slotsync.c    |  10 ++
 src/backend/replication/slot.c                |  27 +++++
 src/bin/pg_upgrade/t/003_logical_slots.pl     |   6 +-
 .../t/040_standby_failover_slots_sync.pl      | 112 ++++++++++++++++++
 src/test/regress/expected/subscription.out    |   4 +
 src/test/regress/sql/subscription.sql         |   4 +
 14 files changed, 236 insertions(+), 8 deletions(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..87b28ad8d55 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  cannot enable both "failover" and "two_phase" options during replication slot creation
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f441ec43314..8802cb19264 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29073,9 +29073,14 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         <parameter>failover</parameter>, when set to true,
         specifies that this slot is enabled to be synced to the
         standbys so that logical replication can be resumed after
-        failover. A call to this function has the same effect as
-        the replication protocol command
-        <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
+        failover. The parameters <parameter>twophase</parameter> and
+        <parameter>failover</parameter> cannot be enabled together when
+        creating a replication slot. However, a slot created with <parameter>twophase</parameter>
+        enabled can later have <parameter>failover</parameter> set to true
+        either using <command>ALTER_REPLICATION_SLOT</command> or via
+        <command>ALTER SUBSCRIPTION</command> if a subscription is using this
+        slot. A call to this function has the same effect as the replication
+        protocol command <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
        </para></entry>
       </row>
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0396d3a9e98..c87c60b74a6 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2060,7 +2060,12 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           and <literal>ROLLBACK PREPARED</literal> are decoded and transmitted.
           The transaction will be decoded and transmitted at
           <literal>PREPARE TRANSACTION</literal> time.
-          The default is false.
+          The default is false. This option cannot be set together with
+          <literal>failover</literal> when creating a logical replication slot.
+          However, once the slot is created with <literal>two_phase = true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all transactions up to the point where two-phase decoding
+          was enabled.
          </para>
         </listitem>
        </varlistentry>
@@ -2101,7 +2106,12 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
          <para>
           If true, the slot is enabled to be synced to the standbys
           so that logical replication can be resumed after failover.
-          The default is false.
+          The default is false. This option cannot be set together with
+          <literal>two_phase</literal> when creating the slot. However, once
+          the slot is created with <literal>two_phase = true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all transactions up to the point where two-phase decoding
+          was enabled.
          </para>
         </listitem>
        </varlistentry>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 2ccbf5e4897..00e1c87e9c9 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -257,6 +257,14 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
       <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
       option is enabled.
      </para>
+
+     <para>
+      When <literal>two_phase</literal> is in the pending state, altering
+      <literal>failover = true</literal> is not permitted. Once
+      <literal>two_phase</literal> is enabled, <literal>failover = true</literal>
+      can be set only if the slot has consumed all transactions up to the point
+      where two-phase decoding was enabled.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index c9c8dd440dc..567b241f1e7 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -426,6 +426,15 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           replication can be resumed from the new primary after failover.
           The default is <literal>false</literal>.
          </para>
+
+         <para>
+          The options <literal>failover</literal> and <literal>two_phase</literal>
+          cannot be enabled together when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true via <command>ALTER SUBSCRIPTION</command>,
+          once the slot has consumed all transactions up to the point where
+          two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist></para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..54afb3f2e42 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,18 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options
+	 * together. See comments atop slotsync.c for details.
+	 */
+	if (opts.twophase && opts.failover)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable both \"%s\" and \"%s\" options at CREATE SUBSCRIPTION",
+					   "failover", "two_phase"),
+				errhint("Use ALTER SUBSCRIPTION ... SET (failover = true) to enable \"%s\" after two_phase state is ready",
+						"failover"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1257,19 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable the failover until
+					 * two_phase state reaches READY state. See comments atop
+					 * slotsync.c for details.
+					 */
+					if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+								errmsg("cannot enable failover for a subscription with a pending two_phase state"),
+								errhint("The \"%s\" option can be enabled after \"%s\" state is ready.",
+										"failover", "two_phase"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index e941bb491d8..c33024dad19 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -613,6 +613,17 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 	/* Mark slot to allow two_phase decoding if not already marked */
 	if (ctx->twophase && !slot->data.two_phase)
 	{
+		/*
+		 * Do not allow two-phase decoding for failover enabled slots.
+		 *
+		 * See comments atop slotsync.c for details.
+		 */
+		if (slot->data.failover)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+							NameStr(slot->data.name))));
+
 		SpinLockAcquire(&slot->mutex);
 		slot->data.two_phase = true;
 		slot->data.two_phase_at = start_lsn;
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 73fe3f56af2..011c244ac35 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -42,6 +42,16 @@
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
+ *
+ * Regarding two-phase enabled slots, we need to note that the two_phase_at
+ * field of a slot is not synchronized. In the absence of two_phase_at,
+ * logical decoding could incorrectly identify prepared transactions as
+ * having been replicated to the subscriber after promotion, resulting in
+ * their omission. Consequently, both two_phase and failover features
+ * cannot be simultaneously enabled during slot creation. Failover can only
+ * be enabled once we can ensure that no transactions were prepared prior
+ * to two_phase_at on the primary. These restrictions are enforced through
+ * checks in ReplicationSlotCreate() and ReplicationSlotAlter().
  *---------------------------------------------------------------------------
  */
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..f5f2bcc708b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,20 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable both failover and two_phase for slots.
+		 * Please see the comments atop slotsync.c for details.
+		 *
+		 * However, both failover and two_phase enabled slots can be created
+		 * during slot synchronization because we need to retain the same
+		 * values as the remote slot.
+		 */
+		if (two_phase && !IsSyncingReplicationSlots())
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot enable both \"%s\" and \"%s\" options during replication slot creation",
+						   "failover", "two_phase"));
 	}
 
 	/*
@@ -848,6 +862,19 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot if
+	 * there are potentially un-decoded transactions that are prepared before
+	 * two_phase_at. See comments atop slotsync.c for details.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase &&
+		MyReplicationSlot->data.restart_lsn < MyReplicationSlot->data.two_phase_at)
+		ereport(ERROR,
+				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				errmsg("cannot enable failover for a two-phase enabled replication slot due to unconsumed changes"),
+				errdetail("The slot need to consume change upto %X/%X to enable failover.",
+						  LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..e329cc609ce 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -173,7 +173,7 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
@@ -193,8 +193,8 @@ command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
 # Check that the slot 'regress_sub' has migrated to the new cluster
 $newpub->start;
 my $result = $newpub->safe_psql('postgres',
-	"SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+	"SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
 
 # Update the connection
 my $new_connstr = $newpub->connstr . ' dbname=postgres';
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..8dd0e9e6b51 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -25,6 +25,8 @@ $publisher->init(
 $publisher->append_conf('postgresql.conf', 'autovacuum = off');
 $publisher->start;
 
+$publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
 $publisher->safe_psql('postgres',
 	"CREATE PUBLICATION regress_mypub FOR ALL TABLES;");
 
@@ -42,6 +44,8 @@ my $slot_creation_time_on_primary = $publisher->safe_psql(
     SELECT current_timestamp;
 ]);
 
+$subscriber1->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
 # Create a subscription that enables failover.
 $subscriber1->safe_psql('postgres',
 	"CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, copy_data = false, failover = true, enabled = false);"
@@ -98,6 +102,114 @@ my ($result, $stdout, $stderr) = $subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
 	"altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option can be enabled for a two_phase enabled
+# subscription only through Alter Subscription (failover=true).
+##################################################
+
+# Test that the failover option cannot be set for a slot with
+# restart_lsn < two_phase_at
+
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub2', 'pgoutput', false, true, false);"
+);
+
+# Try to enable failover of the slot
+($result, $stdout, $stderr) = $publisher->psql(
+	'postgres',
+	qq[ALTER_REPLICATION_SLOT regress_mysub2 (failover);],
+	replication => 'database');
+ok( $stderr =~
+	  qr/ERROR:  cannot enable failover for a two-phase enabled replication slot due to unconsumed changes/,
+	"failover can't be enabled if restart_lsn < two_phase_at on a two_phase subscription."
+);
+
+# Create a subscription with two_phase enabled, using above slot
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, enabled = false, two_phase = true);"
+);
+
+# Try to enable failover for the subscription with two_phase in pending state
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+ok( $stderr =~
+	  /ERROR:  cannot enable failover for a subscription with a pending two_phase state
+.*HINT:  The "failover" option can be enabled after "two_phase" state is ready./,
+	"enabling failover is not allowed for a two_phase pending subscription");
+
+# Enable the subscription
+$subscriber1->safe_psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 ENABLE;");
+
+# Wait for the subscription to be in sync
+$subscriber1->wait_for_subscription_sync($publisher, 'regress_mysub2');
+
+# Also wait for two-phase to be enabled
+ok( $subscriber1->poll_query_until(
+		'postgres',
+		qq[SELECT subtwophasestate FROM pg_subscription WHERE subname = 'regress_mysub2';],
+		'e',),
+	"Timed out while waiting for subscriber to enable twophase");
+
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$publisher->safe_psql(
+	'postgres', qq(
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+));
+
+# Alter subscription to enable failover
+($result, $stdout, $stderr) = $subscriber1->psql(
+	'postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+	 ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $publisher->safe_psql(
+		'postgres',
+		q{SELECT failover from pg_replication_slots WHERE slot_name = 'regress_mysub2';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub2;");
+
+# Test that subscription creation with two_phase=true results in error when
+# using a slot with failover set to true.
+
+# Create a slot with failover enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub3', 'pgoutput', false, false, true);"
+);
+
+my $logoffset = -s $subscriber1->logfile;
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub3 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, two_phase = true);"
+);
+
+ok( $subscriber1->wait_for_log(
+		qr/cannot enable two-phase decoding for failover enabled slot "regress_mysub3"/,
+		$logoffset),
+	"creating subscription with two_phase=true is not allowed when the slot has failover set to true"
+);
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub3;");
+
+# Clean up the tables created
+$publisher->safe_psql('postgres', "DROP TABLE tab_full;");
+$subscriber1->safe_psql('postgres', "DROP TABLE tab_full;");
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby server.
 ##################################################
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..8d4d55c802d 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,10 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  cannot enable both "failover" and "two_phase" options at CREATE SUBSCRIPTION
+HINT:  Use ALTER SUBSCRIPTION ... SET (failover = true) to enable "failover" after two_phase state is ready
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.34.1

#56shveta malik
shveta.malik@gmail.com
In reply to: Nisha Moond (#55)
Re: Fix slot synchronization with two_phase decoding enabled

On Fri, May 2, 2025 at 12:57 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Please find the v11 patch addressing the above points and all other
comments. I have also optimized the test by reducing the number of
subscriptions and slots.

Thanks for the patch. Few comments:

1)

pg_upgrade/t/003_logical_slots.pl:

- "SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+ "SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');

With this change, the failover property test during upgrade is
removed. Shall we add it again either using ALTER SUB to enable
failover and two_phase together or a new subscription with failover
alone?

2)
+          The default is false. This option cannot be set together with
+          <literal>failover</literal> when creating a logical replication slot.
+          However, once the slot is created with <literal>two_phase =
true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all transactions up to the point where two-phase decoding
+          was enabled.

Suggestion: all transactions --> all the changes.

thanks
Shveta

#57Tom Lane
tgl@sss.pgh.pa.us
In reply to: shveta malik (#56)
Re: Fix slot synchronization with two_phase decoding enabled

BF member mule just showed what seems an identical failure [1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mule&amp;dt=2025-05-03%2015%3A30%3A02:

2025-05-03 17:46:20.088 CEST [24308:3] LOG: database system is ready to accept read-only connections
2025-05-03 17:46:20.091 CEST [24321:1] LOG: slot sync worker started
2025-05-03 17:46:20.100 CEST [24322:1] LOG: started streaming WAL from primary at 0/6000000 on timeline 1
2025-05-03 17:46:20.117 CEST [24321:2] LOG: starting logical decoding for slot "lsub1_slot"
2025-05-03 17:46:20.117 CEST [24321:3] DETAIL: Streaming transactions committing after 0/60049C8, reading WAL from 0/5000048.
2025-05-03 17:46:20.117 CEST [24321:4] LOG: logical decoding found consistent point at 0/5000048
2025-05-03 17:46:20.117 CEST [24321:5] DETAIL: There are no running transactions.
TRAP: failed Assert("slot->data.two_phase_at <= slot->data.confirmed_flush"), File: "slotsync.c", Line: 311, PID: 24321
postgres: standby1: slotsync worker (ExceptionalCondition+0x59)[0x556c3b8cb3e9]
postgres: standby1: slotsync worker (+0x4cb100)[0x556c3b706100]
postgres: standby1: slotsync worker (+0x4cba4c)[0x556c3b706a4c]
postgres: standby1: slotsync worker (ReplSlotSyncWorkerMain+0x258)[0x556c3b707598]
postgres: standby1: slotsync worker (postmaster_child_launch+0x102)[0x556c3b6d6962]
postgres: standby1: slotsync worker (+0x49daea)[0x556c3b6d8aea]
postgres: standby1: slotsync worker (+0x49fb3d)[0x556c3b6dab3d]
postgres: standby1: slotsync worker (PostmasterMain+0xd3f)[0x556c3b6dc04f]
postgres: standby1: slotsync worker (main+0x1ca)[0x556c3b3bee1a]
/lib/x86_64-linux-gnu/libc.so.6(+0x2724a)[0x7fe08da4824a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7fe08da48305]
postgres: standby1: slotsync worker (_start+0x21)[0x556c3b3bf421]
2025-05-03 17:46:20.337 CEST [24308:4] LOG: slot sync worker process (PID 24321) was terminated by signal 6: Aborted

regards, tom lane

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mule&amp;dt=2025-05-03%2015%3A30%3A02

#58Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Tom Lane (#57)
Re: Fix slot synchronization with two_phase decoding enabled

On Sat, May 3, 2025 at 10:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

BF member mule just showed what seems an identical failure [1]:

2025-05-03 17:46:20.088 CEST [24308:3] LOG: database system is ready to accept read-only connections
2025-05-03 17:46:20.091 CEST [24321:1] LOG: slot sync worker started
2025-05-03 17:46:20.100 CEST [24322:1] LOG: started streaming WAL from primary at 0/6000000 on timeline 1
2025-05-03 17:46:20.117 CEST [24321:2] LOG: starting logical decoding for slot "lsub1_slot"
2025-05-03 17:46:20.117 CEST [24321:3] DETAIL: Streaming transactions committing after 0/60049C8, reading WAL from 0/5000048.
2025-05-03 17:46:20.117 CEST [24321:4] LOG: logical decoding found consistent point at 0/5000048
2025-05-03 17:46:20.117 CEST [24321:5] DETAIL: There are no running transactions.
TRAP: failed Assert("slot->data.two_phase_at <= slot->data.confirmed_flush"), File: "slotsync.c", Line: 311, PID: 24321
postgres: standby1: slotsync worker (ExceptionalCondition+0x59)[0x556c3b8cb3e9]
postgres: standby1: slotsync worker (+0x4cb100)[0x556c3b706100]
postgres: standby1: slotsync worker (+0x4cba4c)[0x556c3b706a4c]
postgres: standby1: slotsync worker (ReplSlotSyncWorkerMain+0x258)[0x556c3b707598]
postgres: standby1: slotsync worker (postmaster_child_launch+0x102)[0x556c3b6d6962]
postgres: standby1: slotsync worker (+0x49daea)[0x556c3b6d8aea]
postgres: standby1: slotsync worker (+0x49fb3d)[0x556c3b6dab3d]
postgres: standby1: slotsync worker (PostmasterMain+0xd3f)[0x556c3b6dc04f]
postgres: standby1: slotsync worker (main+0x1ca)[0x556c3b3bee1a]
/lib/x86_64-linux-gnu/libc.so.6(+0x2724a)[0x7fe08da4824a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7fe08da48305]
postgres: standby1: slotsync worker (_start+0x21)[0x556c3b3bf421]
2025-05-03 17:46:20.337 CEST [24308:4] LOG: slot sync worker process (PID 24321) was terminated by signal 6: Aborted

While I cannot be entirely certain of my analysis, I believe the root
cause might be related to the backward movement of the confirmed_flush
LSN. The following scenario seems possible:

1. The walsender enables the two_phase and sets two_phase_at (which
should be the same as confirmed_flush).
2. The slot's confirmed_flush regresses for some reason.
3. The slotsync worker retrieves the remote slot information and
enables two_phase for the local slot.

If I recall correctly, we previously discussed[1]/messages/by-id/85fff40e-148b-4e86-b921-b4b846289132@vondra.me how a slot's
confirmed_flush LSN could move backward depending on the feedback
messages received from the downstream system. Based on the following
primary server's logs, it appears possible that the subscriber sent
feedback LSNs older than the slot's confirmed_flush, consequently
causing the confirmed_flush to move backward. However, I should note
that this is currently a hypothesis, as I haven't yet been able to
reproduce and verify this scenario.

Regards,

[1]: /messages/by-id/85fff40e-148b-4e86-b921-b4b846289132@vondra.me

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#59Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#56)
1 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

On Fri, May 2, 2025 at 3:05 PM shveta malik <shveta.malik@gmail.com> wrote:

On Fri, May 2, 2025 at 12:57 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Please find the v11 patch addressing the above points and all other
comments. I have also optimized the test by reducing the number of
subscriptions and slots.

Thanks for the patch. Few comments:

1)

pg_upgrade/t/003_logical_slots.pl:

- "SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+ "SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');

With this change, the failover property test during upgrade is
removed. Shall we add it again either using ALTER SUB to enable
failover and two_phase together or a new subscription with failover
alone?

If ALTER SUB is used to set failover for the existing subscription,
then pg_upgrade will fail while trying to create a slot with both
failover and two_phase on the upgraded node $newpub.
So, I've implemented the other suggestion by adding a separate
pub-subs to verify the failover property.

2)
+          The default is false. This option cannot be set together with
+          <literal>failover</literal> when creating a logical replication slot.
+          However, once the slot is created with <literal>two_phase =
true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all transactions up to the point where two-phase decoding
+          was enabled.

Suggestion: all transactions --> all the changes.

Done.
~~~

Please find the v12 patch with above suggested changes.

--
Thanks,
Nisha

Attachments:

v12-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchapplication/x-patch; name=v12-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchDownload
From 3de761aa43578952543974af48178234ac37e88d Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Mon, 7 Apr 2025 09:36:29 +0530
Subject: [PATCH v12] PG17 Approach 3 Fix slot synchronization for two_phase
 enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To prevent the risk of losing prepared transactions, we disallow enabling
both failover and twophase during slot creation, but permits altering
failover to true once ensured that slot's restart_lsn > two_phase_at.

The fix enforces the following conditions:
1) Always disallow creating slots with two_phase=true and failover=true.
2) Always disallow creating subscriptions with (two_phase=true, failover=true).
3) Prevent altering the slot's failover to true if two_phase=true and restart_lsn
   is less than two_phase_at. Otherwise, allow changing failover to true.
4) Disallow altering slot's failover to true when two_phase state is 'pending'.
   User can try altering failover again when two_phase state is moved beyond 'pending'.
---
 contrib/test_decoding/expected/slot.out       |   2 +
 contrib/test_decoding/sql/slot.sql            |   1 +
 doc/src/sgml/func.sgml                        |  11 +-
 doc/src/sgml/protocol.sgml                    |  14 ++-
 doc/src/sgml/ref/alter_subscription.sgml      |   8 ++
 doc/src/sgml/ref/create_subscription.sgml     |   9 ++
 src/backend/commands/subscriptioncmds.c       |  25 ++++
 src/backend/replication/logical/logical.c     |  11 ++
 src/backend/replication/logical/slotsync.c    |  10 ++
 src/backend/replication/slot.c                |  27 +++++
 src/bin/pg_upgrade/t/003_logical_slots.pl     |  50 ++++++--
 .../t/040_standby_failover_slots_sync.pl      | 112 ++++++++++++++++++
 src/test/regress/expected/subscription.out    |   4 +
 src/test/regress/sql/subscription.sql         |   4 +
 14 files changed, 272 insertions(+), 16 deletions(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..87b28ad8d55 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  cannot enable both "failover" and "two_phase" options during replication slot creation
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f441ec43314..8802cb19264 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29073,9 +29073,14 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         <parameter>failover</parameter>, when set to true,
         specifies that this slot is enabled to be synced to the
         standbys so that logical replication can be resumed after
-        failover. A call to this function has the same effect as
-        the replication protocol command
-        <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
+        failover. The parameters <parameter>twophase</parameter> and
+        <parameter>failover</parameter> cannot be enabled together when
+        creating a replication slot. However, a slot created with <parameter>twophase</parameter>
+        enabled can later have <parameter>failover</parameter> set to true
+        either using <command>ALTER_REPLICATION_SLOT</command> or via
+        <command>ALTER SUBSCRIPTION</command> if a subscription is using this
+        slot. A call to this function has the same effect as the replication
+        protocol command <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
        </para></entry>
       </row>
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0396d3a9e98..c36977e2d60 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2060,7 +2060,12 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           and <literal>ROLLBACK PREPARED</literal> are decoded and transmitted.
           The transaction will be decoded and transmitted at
           <literal>PREPARE TRANSACTION</literal> time.
-          The default is false.
+          The default is false. This option cannot be set together with
+          <literal>failover</literal> when creating a logical replication slot.
+          However, once the slot is created with <literal>two_phase = true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all the changes up to the point where two-phase decoding
+          was enabled.
          </para>
         </listitem>
        </varlistentry>
@@ -2101,7 +2106,12 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
          <para>
           If true, the slot is enabled to be synced to the standbys
           so that logical replication can be resumed after failover.
-          The default is false.
+          The default is false. This option cannot be set together with
+          <literal>two_phase</literal> when creating the slot. However, once
+          the slot is created with <literal>two_phase = true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all the changes up to the point where two-phase decoding
+          was enabled.
          </para>
         </listitem>
        </varlistentry>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 2ccbf5e4897..bf9ba6d3c89 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -257,6 +257,14 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
       <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
       option is enabled.
      </para>
+
+     <para>
+      When <literal>two_phase</literal> is in the pending state, altering
+      <literal>failover = true</literal> is not permitted. Once
+      <literal>two_phase</literal> is enabled, <literal>failover = true</literal>
+      can be set only if the slot has consumed all the changes up to the point
+      where two-phase decoding was enabled.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index c9c8dd440dc..c2e6c73ce21 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -426,6 +426,15 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           replication can be resumed from the new primary after failover.
           The default is <literal>false</literal>.
          </para>
+
+         <para>
+          The options <literal>failover</literal> and <literal>two_phase</literal>
+          cannot be enabled together when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true via <command>ALTER SUBSCRIPTION</command>,
+          once the slot has consumed all the changes up to the point where
+          two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist></para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..54afb3f2e42 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,18 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options
+	 * together. See comments atop slotsync.c for details.
+	 */
+	if (opts.twophase && opts.failover)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable both \"%s\" and \"%s\" options at CREATE SUBSCRIPTION",
+					   "failover", "two_phase"),
+				errhint("Use ALTER SUBSCRIPTION ... SET (failover = true) to enable \"%s\" after two_phase state is ready",
+						"failover"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1257,19 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable the failover until
+					 * two_phase state reaches READY state. See comments atop
+					 * slotsync.c for details.
+					 */
+					if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+								errmsg("cannot enable failover for a subscription with a pending two_phase state"),
+								errhint("The \"%s\" option can be enabled after \"%s\" state is ready.",
+										"failover", "two_phase"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index e941bb491d8..c33024dad19 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -613,6 +613,17 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 	/* Mark slot to allow two_phase decoding if not already marked */
 	if (ctx->twophase && !slot->data.two_phase)
 	{
+		/*
+		 * Do not allow two-phase decoding for failover enabled slots.
+		 *
+		 * See comments atop slotsync.c for details.
+		 */
+		if (slot->data.failover)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+							NameStr(slot->data.name))));
+
 		SpinLockAcquire(&slot->mutex);
 		slot->data.two_phase = true;
 		slot->data.two_phase_at = start_lsn;
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 73fe3f56af2..011c244ac35 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -42,6 +42,16 @@
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
+ *
+ * Regarding two-phase enabled slots, we need to note that the two_phase_at
+ * field of a slot is not synchronized. In the absence of two_phase_at,
+ * logical decoding could incorrectly identify prepared transactions as
+ * having been replicated to the subscriber after promotion, resulting in
+ * their omission. Consequently, both two_phase and failover features
+ * cannot be simultaneously enabled during slot creation. Failover can only
+ * be enabled once we can ensure that no transactions were prepared prior
+ * to two_phase_at on the primary. These restrictions are enforced through
+ * checks in ReplicationSlotCreate() and ReplicationSlotAlter().
  *---------------------------------------------------------------------------
  */
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..f5f2bcc708b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,20 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable both failover and two_phase for slots.
+		 * Please see the comments atop slotsync.c for details.
+		 *
+		 * However, both failover and two_phase enabled slots can be created
+		 * during slot synchronization because we need to retain the same
+		 * values as the remote slot.
+		 */
+		if (two_phase && !IsSyncingReplicationSlots())
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot enable both \"%s\" and \"%s\" options during replication slot creation",
+						   "failover", "two_phase"));
 	}
 
 	/*
@@ -848,6 +862,19 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot if
+	 * there are potentially un-decoded transactions that are prepared before
+	 * two_phase_at. See comments atop slotsync.c for details.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase &&
+		MyReplicationSlot->data.restart_lsn < MyReplicationSlot->data.two_phase_at)
+		ereport(ERROR,
+				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				errmsg("cannot enable failover for a two-phase enabled replication slot due to unconsumed changes"),
+				errdetail("The slot need to consume change upto %X/%X to enable failover.",
+						  LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..36c3b73677e 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -106,6 +106,7 @@ $oldpub->start;
 $oldpub->safe_psql(
 	'postgres', qq[
 		CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+		CREATE TABLE tbl2 AS SELECT generate_series(1, 10) AS a;
 		SELECT pg_replication_slot_advance('test_slot2', pg_current_wal_lsn());
 		SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
 ]);
@@ -162,7 +163,8 @@ $oldpub->safe_psql(
 	'postgres', qq[
 	SELECT * FROM pg_drop_replication_slot('test_slot1');
 	SELECT * FROM pg_drop_replication_slot('test_slot2');
-	CREATE PUBLICATION regress_pub FOR ALL TABLES;
+	CREATE PUBLICATION regress_pub FOR TABLE tbl;
+	CREATE PUBLICATION regress_pub2 FOR TABLE tbl2;
 ]);
 
 # Initialize subscriber cluster
@@ -173,18 +175,23 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE TABLE tbl2 (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true');
+	CREATE SUBSCRIPTION regress_sub2 CONNECTION '$old_connstr' PUBLICATION regress_pub2 WITH (failover = 'true');
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
+$sub->wait_for_subscription_sync($oldpub, 'regress_sub2');
 
 # Also wait for two-phase to be enabled
-my $twophase_query =
-  "SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate NOT IN ('e');";
-$sub->poll_query_until('postgres', $twophase_query)
-  or die "Timed out while waiting for subscriber to enable twophase";
+ok( $sub->poll_query_until(
+		'postgres',
+		qq[SELECT subtwophasestate FROM pg_subscription WHERE subname = 'regress_sub';],
+		'e',),
+	"Timed out while waiting for subscriber to enable twophase");
 
 # 2. Temporarily disable the subscription
 $sub->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$sub->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub2 DISABLE");
 $oldpub->stop;
 
 # pg_upgrade should be successful
@@ -193,10 +200,17 @@ command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
 # Check that the slot 'regress_sub' has migrated to the new cluster
 $newpub->start;
 my $result = $newpub->safe_psql('postgres',
-	"SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+	"SELECT slot_name, two_phase FROM pg_replication_slots where slot_name = 'regress_sub'"
+);
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Check that the slot 'regress_sub2' has migrated to the new cluster
+$result = $newpub->safe_psql('postgres',
+	"SELECT slot_name, failover FROM pg_replication_slots where slot_name = 'regress_sub2'"
+);
+is($result, qq(regress_sub2|t), 'check the slot exists on new cluster');
 
-# Update the connection
+# Update the connection for regress_sub
 my $new_connstr = $newpub->connstr . ' dbname=postgres';
 $sub->safe_psql(
 	'postgres', qq[
@@ -204,13 +218,27 @@ $sub->safe_psql(
 	ALTER SUBSCRIPTION regress_sub ENABLE;
 ]);
 
+# Update the connection for regress_sub2
+$sub->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub2 CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub2 ENABLE;
+]);
+
 # Check whether changes on the new publisher get replicated to the subscriber
-$newpub->safe_psql('postgres',
-	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$newpub->safe_psql(
+	'postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20));
+	INSERT INTO tbl2 VALUES (generate_series(11, 20));");
+
 $newpub->wait_for_catchup('regress_sub');
 $result = $sub->safe_psql('postgres', "SELECT count(*) FROM tbl");
 is($result, qq(20), 'check changes are replicated to the sub');
 
+$newpub->wait_for_catchup('regress_sub2');
+$result = $sub->safe_psql('postgres', "SELECT count(*) FROM tbl2");
+is($result, qq(20), 'check changes are replicated to the sub2');
+
 # Clean up
 $sub->stop();
 $newpub->stop();
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..a78c4d2bd95 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -25,6 +25,8 @@ $publisher->init(
 $publisher->append_conf('postgresql.conf', 'autovacuum = off');
 $publisher->start;
 
+$publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
 $publisher->safe_psql('postgres',
 	"CREATE PUBLICATION regress_mypub FOR ALL TABLES;");
 
@@ -42,6 +44,8 @@ my $slot_creation_time_on_primary = $publisher->safe_psql(
     SELECT current_timestamp;
 ]);
 
+$subscriber1->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
 # Create a subscription that enables failover.
 $subscriber1->safe_psql('postgres',
 	"CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, copy_data = false, failover = true, enabled = false);"
@@ -98,6 +102,114 @@ my ($result, $stdout, $stderr) = $subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
 	"altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option can be enabled for a two_phase enabled
+# subscription only through Alter Subscription (failover=true).
+##################################################
+
+# Test that the failover option cannot be set for a slot with
+# restart_lsn < two_phase_at
+
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub2', 'pgoutput', false, true, false);"
+);
+
+# Try to enable failover of the slot
+($result, $stdout, $stderr) = $publisher->psql(
+	'postgres',
+	qq[ALTER_REPLICATION_SLOT regress_mysub2 (failover);],
+	replication => 'database');
+ok( $stderr =~
+	  qr/ERROR:  cannot enable failover for a two-phase enabled replication slot due to unconsumed changes/,
+	"failover can't be enabled if restart_lsn < two_phase_at on a two_phase subscription."
+);
+
+# Create a subscription with two_phase enabled, using above slot
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, enabled = false, two_phase = true);"
+);
+
+# Try to enable failover for the subscription with two_phase in pending state
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+ok( $stderr =~
+	  /ERROR:  cannot enable failover for a subscription with a pending two_phase state
+.*HINT:  The "failover" option can be enabled after "two_phase" state is ready./,
+	"enabling failover is not allowed for a two_phase pending subscription");
+
+# Enable the subscription
+$subscriber1->safe_psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 ENABLE;");
+
+# Wait for the subscription to be in sync
+$subscriber1->wait_for_subscription_sync($publisher, 'regress_mysub2');
+
+# Also wait for two-phase to be enabled
+ok( $subscriber1->poll_query_until(
+		'postgres',
+		qq[SELECT subtwophasestate FROM pg_subscription WHERE subname = 'regress_mysub2';],
+		'e',),
+	"Timed out while waiting for subscriber to enable twophase");
+
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$publisher->safe_psql(
+	'postgres', qq(
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+));
+
+# Alter subscription to enable failover
+$subscriber1->psql(
+	'postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+	 ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $publisher->safe_psql(
+		'postgres',
+		q{SELECT failover from pg_replication_slots WHERE slot_name = 'regress_mysub2';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub2;");
+
+# Test that subscription creation with two_phase=true results in error when
+# using a slot with failover set to true.
+
+# Create a slot with failover enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub3', 'pgoutput', false, false, true);"
+);
+
+my $logoffset = -s $subscriber1->logfile;
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub3 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, two_phase = true);"
+);
+
+ok( $subscriber1->wait_for_log(
+		qr/cannot enable two-phase decoding for failover enabled slot "regress_mysub3"/,
+		$logoffset),
+	"creating subscription with two_phase=true is not allowed when the slot has failover set to true"
+);
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub3;");
+
+# Clean up the tables created
+$publisher->safe_psql('postgres', "DROP TABLE tab_full;");
+$subscriber1->safe_psql('postgres', "DROP TABLE tab_full;");
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby server.
 ##################################################
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..8d4d55c802d 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,10 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  cannot enable both "failover" and "two_phase" options at CREATE SUBSCRIPTION
+HINT:  Use ALTER SUBSCRIPTION ... SET (failover = true) to enable "failover" after two_phase state is ready
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.34.1

#60Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#58)
Re: Fix slot synchronization with two_phase decoding enabled

On Sun, May 4, 2025 at 2:33 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

While I cannot be entirely certain of my analysis, I believe the root
cause might be related to the backward movement of the confirmed_flush
LSN. The following scenario seems possible:

1. The walsender enables the two_phase and sets two_phase_at (which
should be the same as confirmed_flush).
2. The slot's confirmed_flush regresses for some reason.
3. The slotsync worker retrieves the remote slot information and
enables two_phase for the local slot.

Yes, this is possible. Here is my theory as to how it can happen in
the current case. In the failed test, after the primary has prepared a
transaction, the transaction won't be replicated to the subscriber as
two_phase was not enabled for the slot. However, subsequent keepalive
messages can send the latest WAL location to the subscriber and get
the confirmation of the same from the subscriber without its origin
being moved. Now, after we restart the apply worker (due to
disable/enable for a subscription), it will use the previous
origin_lsn to temporarily move back the confirmed flush LSN as
explained in one of the previous emails in another thread [1]/messages/by-id/CAA4eK1+zWQwOe5G8zCYGvErnaXh5+Dbyg_A1Z3uywSf_4=T0UA@mail.gmail.com. During
this temporary movement of confirm flush LSN, the slotsync worker
fetches the two_phase_at and confirm_flush_lsn values, leading to the
assertion failure. We see this issue intermittently because it depends
on the timing of slotsync worker's request to fetch the slot's value.

If this theory is correct, then we need something on the lines of what
Vignesh proposed in email [2]/messages/by-id/CALDaNm3hgow2+oEov5jBk4iYP5eQrUCF1yZtW7+dV3J__p4KLQ@mail.gmail.com (Confirm_flush_dont_allow_backward) to
fix it.

[1]: /messages/by-id/CAA4eK1+zWQwOe5G8zCYGvErnaXh5+Dbyg_A1Z3uywSf_4=T0UA@mail.gmail.com
[2]: /messages/by-id/CALDaNm3hgow2+oEov5jBk4iYP5eQrUCF1yZtW7+dV3J__p4KLQ@mail.gmail.com

--
With Regards,
Amit Kapila.

#61Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#60)
Re: Fix slot synchronization with two_phase decoding enabled

On Mon, May 5, 2025 at 3:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, May 4, 2025 at 2:33 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

While I cannot be entirely certain of my analysis, I believe the root
cause might be related to the backward movement of the confirmed_flush
LSN. The following scenario seems possible:

1. The walsender enables the two_phase and sets two_phase_at (which
should be the same as confirmed_flush).
2. The slot's confirmed_flush regresses for some reason.
3. The slotsync worker retrieves the remote slot information and
enables two_phase for the local slot.

Yes, this is possible. Here is my theory as to how it can happen in
the current case. In the failed test, after the primary has prepared a
transaction, the transaction won't be replicated to the subscriber as
two_phase was not enabled for the slot. However, subsequent keepalive
messages can send the latest WAL location to the subscriber and get
the confirmation of the same from the subscriber without its origin
being moved. Now, after we restart the apply worker (due to
disable/enable for a subscription), it will use the previous
origin_lsn to temporarily move back the confirmed flush LSN as
explained in one of the previous emails in another thread [1]. During
this temporary movement of confirm flush LSN, the slotsync worker
fetches the two_phase_at and confirm_flush_lsn values, leading to the
assertion failure. We see this issue intermittently because it depends
on the timing of slotsync worker's request to fetch the slot's value.

If this theory is correct, then we need something on the lines of what
Vignesh proposed in email [2] (Confirm_flush_dont_allow_backward) to
fix it.

This theory seems plausible. According to the thread you shared, we
didn't address the issue of backward movement in the slot's
confirmed_flush LSN. I think that we should carefully consider whether
this behavior is acceptable in the first place. If we decide to
maintain this behavior, we would need to modify the slotsync to deal
with such scenarios (note we have already ensured that the synced
slot's confirmed_flush cannot regress). However, I'm concerned about a
case like where the server crashes immediately after persisting the
retreated confirmed_flush LSN (along with restart_lsn or xmin updates)
to disk. In such a case, wouldn't the walsender potentially send
duplicate data? If so, we would need a patch like Vignesh proposed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#62Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#61)
Re: Fix slot synchronization with two_phase decoding enabled

On Tue, May 6, 2025 at 12:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, May 5, 2025 at 3:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, May 4, 2025 at 2:33 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

While I cannot be entirely certain of my analysis, I believe the root
cause might be related to the backward movement of the confirmed_flush
LSN. The following scenario seems possible:

1. The walsender enables the two_phase and sets two_phase_at (which
should be the same as confirmed_flush).
2. The slot's confirmed_flush regresses for some reason.
3. The slotsync worker retrieves the remote slot information and
enables two_phase for the local slot.

Yes, this is possible. Here is my theory as to how it can happen in
the current case. In the failed test, after the primary has prepared a
transaction, the transaction won't be replicated to the subscriber as
two_phase was not enabled for the slot. However, subsequent keepalive
messages can send the latest WAL location to the subscriber and get
the confirmation of the same from the subscriber without its origin
being moved. Now, after we restart the apply worker (due to
disable/enable for a subscription), it will use the previous
origin_lsn to temporarily move back the confirmed flush LSN as
explained in one of the previous emails in another thread [1]. During
this temporary movement of confirm flush LSN, the slotsync worker
fetches the two_phase_at and confirm_flush_lsn values, leading to the
assertion failure. We see this issue intermittently because it depends
on the timing of slotsync worker's request to fetch the slot's value.

If this theory is correct, then we need something on the lines of what
Vignesh proposed in email [2] (Confirm_flush_dont_allow_backward) to
fix it.

This theory seems plausible. According to the thread you shared, we
didn't address the issue of backward movement in the slot's
confirmed_flush LSN. I think that we should carefully consider whether
this behavior is acceptable in the first place. If we decide to
maintain this behavior, we would need to modify the slotsync to deal
with such scenarios (note we have already ensured that the synced
slot's confirmed_flush cannot regress). However, I'm concerned about a
case like where the server crashes immediately after persisting the
retreated confirmed_flush LSN (along with restart_lsn or xmin updates)
to disk. In such a case, wouldn't the walsender potentially send
duplicate data?

Yes that could happen but last time when we tried we couldn't
reproduce such a case. However, we will try some more to see if we can
reproduce a problem even without slotsync. Having said that, we will
also try to reproduce the failure with slotsync in the test where BF
is showing failure to confirm the above theory.

If so, we would need a patch like Vignesh proposed.

Right, I feel that is the right thing to do, even if we could prove
that it can lead to failure in slotsync.

--
With Regards,
Amit Kapila.

#63Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Nisha Moond (#59)
Re: Fix slot synchronization with two_phase decoding enabled

On Mon, May 5, 2025 at 2:59 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

On Fri, May 2, 2025 at 3:05 PM shveta malik <shveta.malik@gmail.com> wrote:

On Fri, May 2, 2025 at 12:57 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Please find the v11 patch addressing the above points and all other
comments. I have also optimized the test by reducing the number of
subscriptions and slots.

Thanks for the patch. Few comments:

1)

pg_upgrade/t/003_logical_slots.pl:

- "SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+ "SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');

With this change, the failover property test during upgrade is
removed. Shall we add it again either using ALTER SUB to enable
failover and two_phase together or a new subscription with failover
alone?

If ALTER SUB is used to set failover for the existing subscription,
then pg_upgrade will fail while trying to create a slot with both
failover and two_phase on the upgraded node $newpub.
So, I've implemented the other suggestion by adding a separate
pub-subs to verify the failover property.

2)
+          The default is false. This option cannot be set together with
+          <literal>failover</literal> when creating a logical replication slot.
+          However, once the slot is created with <literal>two_phase =
true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all transactions up to the point where two-phase decoding
+          was enabled.

Suggestion: all transactions --> all the changes.

Done.
~~~

Please find the v12 patch with above suggested changes.

Thank you for updating the patch! I'll review it and share comments.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#64Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#60)
2 attachment(s)
RE: Fix slot synchronization with two_phase decoding enabled

On Mon, May 5, 2025 at 6:59 PM Amit Kapila wrote:

On Sun, May 4, 2025 at 2:33 PM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:

While I cannot be entirely certain of my analysis, I believe the root
cause might be related to the backward movement of the confirmed_flush
LSN. The following scenario seems possible:

1. The walsender enables the two_phase and sets two_phase_at (which
should be the same as confirmed_flush).
2. The slot's confirmed_flush regresses for some reason.
3. The slotsync worker retrieves the remote slot information and
enables two_phase for the local slot.

Yes, this is possible. Here is my theory as to how it can happen in the current
case. In the failed test, after the primary has prepared a transaction, the
transaction won't be replicated to the subscriber as two_phase was not
enabled for the slot. However, subsequent keepalive messages can send the
latest WAL location to the subscriber and get the confirmation of the same from
the subscriber without its origin being moved. Now, after we restart the apply
worker (due to disable/enable for a subscription), it will use the previous
origin_lsn to temporarily move back the confirmed flush LSN as explained in
one of the previous emails in another thread [1]. During this temporary
movement of confirm flush LSN, the slotsync worker fetches the two_phase_at
and confirm_flush_lsn values, leading to the assertion failure. We see this
issue intermittently because it depends on the timing of slotsync worker's
request to fetch the slot's value.

Based on this theory, I can reproduce the BF failure in the 040 tap-test on
HEAD after applying the 0001 patch. This is achieved by using the injection
point to stop the walsender from sending a keepalive before receiving the old
origin position from the apply worker, ensuring the confirmed_flush
consistently moves backward before slotsync.

Additionally, I've reproduced the duplicate data issue on HEAD without slotsync
using the attached script (after applying the injection point patch). This
issue arises if we immediately disable the subscription after the
confirm_flush_lsn moves backward, preventing the walsender from advancing the
confirm_flush_lsn.

In this case, if a prepared transaction exists before two_phase_at, then after
re-enabling the subscription, it will replicate that prepared transaction when
decoding the PREPARE record and replicate that again when decoding the COMMIT
PREPARED record. In such cases, the apply worker keeps reporting the error:

ERROR: transaction identifier "pg_gid_16387_755" is already in use.

Apart from above, we're investigating whether the same issue can occur in
back-branches and will share the results once ready.

Best Regards,
Hou zj

Attachments:

reproduce_without_slotsync_HEAD.sh.txttext/plain; name=reproduce_without_slotsync_HEAD.sh.txtDownload
0001-reproduce-BF-failure-in-040_standby_failover_slots_s.patchapplication/octet-stream; name=0001-reproduce-BF-failure-in-040_standby_failover_slots_s.patchDownload
From fd1c96dcb3d2aeba279c605a8db960c7d0e7a0ad Mon Sep 17 00:00:00 2001
From: Zhijie Hou <houzj.fnst@cn.fujitsu.com>
Date: Tue, 6 May 2025 12:02:17 +0800
Subject: [PATCH] reproduce BF failure in 040_standby_failover_slots_sync

---
 src/backend/replication/logical/logical.c     |  3 +++
 src/backend/replication/walsender.c           |  4 +++
 .../t/040_standby_failover_slots_sync.pl      | 27 ++++++++++++++++++-
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index a8d2e024d34..f7cc4cabd49 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1819,6 +1819,9 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
 {
 	Assert(lsn != InvalidXLogRecPtr);
 
+	if (lsn < MyReplicationSlot->data.confirmed_flush)
+		elog(LOG, "confirmed_flush move backwards");
+
 	/* Do an unlocked check for candidate_lsn first. */
 	if (MyReplicationSlot->candidate_xmin_lsn != InvalidXLogRecPtr ||
 		MyReplicationSlot->candidate_restart_valid != InvalidXLogRecPtr)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 9fa8beb6103..c74dea40815 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -95,6 +95,7 @@
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
 #include "utils/timestamp.h"
+#include "utils/injection_point.h"
 
 /* Minimum interval used by walsender for stats flushes, in ms */
 #define WALSENDER_STATS_FLUSH_INTERVAL         1000
@@ -2810,6 +2811,9 @@ WalSndLoop(WalSndSendDataCallback send_data)
 			SyncRepInitConfig();
 		}
 
+		if (send_data == XLogSendLogical)
+			INJECTION_POINT("process-replies");
+
 		/* Check for input from the client */
 		ProcessRepliesIfAny();
 
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 9c8b49e942d..441fd77a157 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -841,9 +841,11 @@ $primary->reload;
 # option; subsequent tests will verify if it can be correctly replicated to the
 # subscriber after committing it on the promoted standby.
 ##################################################
-
+$standby1->append_conf('postgresql.conf', qq(sync_replication_slots = off));
 $standby1->start;
 
+$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
 # Prepare a transaction
 $primary->safe_psql(
 	'postgres', qq[
@@ -865,6 +867,12 @@ $primary->poll_query_until(
 	"SELECT COUNT(*) FROM pg_catalog.pg_replication_slots WHERE slot_name = 'lsub1_slot' AND active='f'",
 	1);
 
+$primary->safe_psql(
+	'postgres', qq[
+	CREATE EXTENSION injection_points;
+	SELECT injection_points_attach('process-replies', 'wait');
+]);
+
 # Set two_phase to true and enable the subscription
 $subscriber1->safe_psql(
 	'postgres', qq[
@@ -872,6 +880,23 @@ $subscriber1->safe_psql(
 	ALTER SUBSCRIPTION regress_mysub1 ENABLE;
 ]);
 
+sleep 1;
+
+$primary->safe_psql('postgres',
+	"SELECT injection_points_wakeup('process-replies');");
+
+$standby1->append_conf('postgresql.conf', qq(sync_replication_slots = on));
+$standby1->reload;
+
+
+sleep 2;
+
+$primary->safe_psql(
+	'postgres', qq[
+	SELECT injection_points_detach('process-replies');
+	SELECT injection_points_wakeup('process-replies');
+]);
+
 $primary->wait_for_catchup('regress_mysub1');
 
 my $two_phase_at = $primary->safe_psql('postgres',
-- 
2.31.1

#65shveta malik
shveta.malik@gmail.com
In reply to: Nisha Moond (#59)
Re: Fix slot synchronization with two_phase decoding enabled

On Mon, May 5, 2025 at 3:29 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Please find the v12 patch with above suggested changes.

Thanks for the patch. Few comments for doc changes:

1)
func.sgml - pg_create_logical_replication_slot:

+        failover. The parameters <parameter>twophase</parameter> and
+        <parameter>failover</parameter> cannot be enabled together when
+        creating a replication slot. However, a slot created with
<parameter>twophase</parameter>
+        enabled can later have <parameter>failover</parameter> set to true
+        either using <command>ALTER_REPLICATION_SLOT</command> or via
+        <command>ALTER SUBSCRIPTION</command> if a subscription is using this
+        slot.

Will it be better if we make a NOTE for this? Also can we create links for
<command>ALTER SUBSCRIPTION</command> and
<command>ALTER_REPLICATION_SLOT</command>?

2)
protocol.sgml:

+          The default is false. This option cannot be set together with
+          <literal>failover</literal> when creating a logical replication slot.
+          However, once the slot is created with <literal>two_phase =
true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all the changes up to the point where two-phase decoding
+          was enabled.

Can we make a new paragraph for the new restriction added. Same for
other new restriction in this page.

3)
create_subscription.sgml:

+          cannot be enabled together when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true via <command>ALTER
SUBSCRIPTION</command>,

via-->using

thanks
Shveta

#66Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#65)
1 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

On Wed, May 7, 2025 at 10:32 AM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, May 5, 2025 at 3:29 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Please find the v12 patch with above suggested changes.

Thanks for the patch. Few comments for doc changes:

1)
func.sgml - pg_create_logical_replication_slot:

+        failover. The parameters <parameter>twophase</parameter> and
+        <parameter>failover</parameter> cannot be enabled together when
+        creating a replication slot. However, a slot created with
<parameter>twophase</parameter>
+        enabled can later have <parameter>failover</parameter> set to true
+        either using <command>ALTER_REPLICATION_SLOT</command> or via
+        <command>ALTER SUBSCRIPTION</command> if a subscription is using this
+        slot.

Will it be better if we make a NOTE for this? Also can we create links for
<command>ALTER SUBSCRIPTION</command> and
<command>ALTER_REPLICATION_SLOT</command>?

2)
protocol.sgml:

+          The default is false. This option cannot be set together with
+          <literal>failover</literal> when creating a logical replication slot.
+          However, once the slot is created with <literal>two_phase =
true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all the changes up to the point where two-phase decoding
+          was enabled.

Can we make a new paragraph for the new restriction added. Same for
other new restriction in this page.

3)
create_subscription.sgml:

+          cannot be enabled together when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true via <command>ALTER
SUBSCRIPTION</command>,

via-->using

Attached is the v13 patch with the above comments addressed.

--
Thanks,
Nisha

Attachments:

v13-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchapplication/x-patch; name=v13-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchDownload
From fce2cdece5e16f11ec940ffd7bac34dc53e5f9da Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Mon, 7 Apr 2025 09:36:29 +0530
Subject: [PATCH v13] PG17 Approach 3 Fix slot synchronization for two_phase
 enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To prevent the risk of losing prepared transactions, we disallow enabling
both failover and twophase during slot creation, but permits altering
failover to true once ensured that slot's restart_lsn > two_phase_at.

The fix enforces the following conditions:
1) Always disallow creating slots with two_phase=true and failover=true.
2) Always disallow creating subscriptions with (two_phase=true, failover=true).
3) Prevent altering the slot's failover to true if two_phase=true and restart_lsn
   is less than two_phase_at. Otherwise, allow changing failover to true.
4) Disallow altering slot's failover to true when two_phase state is 'pending'.
   User can try altering failover again when two_phase state is moved beyond 'pending'.
---
 contrib/test_decoding/expected/slot.out       |   2 +
 contrib/test_decoding/sql/slot.sql            |   1 +
 doc/src/sgml/func.sgml                        |  13 ++
 doc/src/sgml/protocol.sgml                    |  17 +++
 doc/src/sgml/ref/alter_subscription.sgml      |   8 ++
 doc/src/sgml/ref/create_subscription.sgml     |   9 ++
 src/backend/commands/subscriptioncmds.c       |  25 ++++
 src/backend/replication/logical/logical.c     |  11 ++
 src/backend/replication/logical/slotsync.c    |  10 ++
 src/backend/replication/slot.c                |  27 +++++
 src/bin/pg_upgrade/t/003_logical_slots.pl     |  50 ++++++--
 .../t/040_standby_failover_slots_sync.pl      | 112 ++++++++++++++++++
 src/test/regress/expected/subscription.out    |   4 +
 src/test/regress/sql/subscription.sql         |   4 +
 14 files changed, 282 insertions(+), 11 deletions(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..87b28ad8d55 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  cannot enable both "failover" and "two_phase" options during replication slot creation
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f441ec43314..a294fafa2ac 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29076,6 +29076,19 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         failover. A call to this function has the same effect as
         the replication protocol command
         <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
+       
+        <note>
+         <para>
+          The parameters <parameter>twophase</parameter> and
+          <parameter>failover</parameter> cannot be enabled together when
+          creating a replication slot. However, a slot created with
+          <parameter>twophase</parameter> enabled can later have
+          <parameter>failover</parameter> set to true either using
+          <xref linkend="protocol-replication-alter-replication-slot"/> or via
+          <xref linkend="sql-altersubscription"/> if a subscription is using
+          this slot.
+         </para>
+        </note>
        </para></entry>
       </row>
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0396d3a9e98..2c2d66fad35 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2062,6 +2062,15 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           <literal>PREPARE TRANSACTION</literal> time.
           The default is false.
          </para>
+
+         <para>
+          This option cannot be set together with <literal>failover</literal>
+          when creating a logical replication slot. However, once the slot is
+          created with <literal>two_phase = true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all the changes up to the point where two-phase decoding
+          was enabled.
+         </para>
         </listitem>
        </varlistentry>
 
@@ -2103,6 +2112,14 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           so that logical replication can be resumed after failover.
           The default is false.
          </para>
+
+         <para>
+          This option cannot be set together with <literal>two_phase</literal>
+          when creating the slot. However, once the slot is created with
+          <literal>two_phase = true</literal>, <literal>failover</literal> can
+          be set to true after the slot has consumed all the changes up to the
+          point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 2ccbf5e4897..bf9ba6d3c89 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -257,6 +257,14 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
       <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
       option is enabled.
      </para>
+
+     <para>
+      When <literal>two_phase</literal> is in the pending state, altering
+      <literal>failover = true</literal> is not permitted. Once
+      <literal>two_phase</literal> is enabled, <literal>failover = true</literal>
+      can be set only if the slot has consumed all the changes up to the point
+      where two-phase decoding was enabled.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index c9c8dd440dc..ab784b99fc3 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -426,6 +426,15 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           replication can be resumed from the new primary after failover.
           The default is <literal>false</literal>.
          </para>
+
+         <para>
+          The options <literal>failover</literal> and <literal>two_phase</literal>
+          cannot be enabled together when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true using
+          <xref linkend="sql-altersubscription"/>, once the slot has consumed
+          all the changes up to the point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist></para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..54afb3f2e42 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,18 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options
+	 * together. See comments atop slotsync.c for details.
+	 */
+	if (opts.twophase && opts.failover)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable both \"%s\" and \"%s\" options at CREATE SUBSCRIPTION",
+					   "failover", "two_phase"),
+				errhint("Use ALTER SUBSCRIPTION ... SET (failover = true) to enable \"%s\" after two_phase state is ready",
+						"failover"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1257,19 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable the failover until
+					 * two_phase state reaches READY state. See comments atop
+					 * slotsync.c for details.
+					 */
+					if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+								errmsg("cannot enable failover for a subscription with a pending two_phase state"),
+								errhint("The \"%s\" option can be enabled after \"%s\" state is ready.",
+										"failover", "two_phase"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index e941bb491d8..c33024dad19 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -613,6 +613,17 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 	/* Mark slot to allow two_phase decoding if not already marked */
 	if (ctx->twophase && !slot->data.two_phase)
 	{
+		/*
+		 * Do not allow two-phase decoding for failover enabled slots.
+		 *
+		 * See comments atop slotsync.c for details.
+		 */
+		if (slot->data.failover)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+							NameStr(slot->data.name))));
+
 		SpinLockAcquire(&slot->mutex);
 		slot->data.two_phase = true;
 		slot->data.two_phase_at = start_lsn;
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 73fe3f56af2..011c244ac35 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -42,6 +42,16 @@
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
+ *
+ * Regarding two-phase enabled slots, we need to note that the two_phase_at
+ * field of a slot is not synchronized. In the absence of two_phase_at,
+ * logical decoding could incorrectly identify prepared transactions as
+ * having been replicated to the subscriber after promotion, resulting in
+ * their omission. Consequently, both two_phase and failover features
+ * cannot be simultaneously enabled during slot creation. Failover can only
+ * be enabled once we can ensure that no transactions were prepared prior
+ * to two_phase_at on the primary. These restrictions are enforced through
+ * checks in ReplicationSlotCreate() and ReplicationSlotAlter().
  *---------------------------------------------------------------------------
  */
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..f5f2bcc708b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,20 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable both failover and two_phase for slots.
+		 * Please see the comments atop slotsync.c for details.
+		 *
+		 * However, both failover and two_phase enabled slots can be created
+		 * during slot synchronization because we need to retain the same
+		 * values as the remote slot.
+		 */
+		if (two_phase && !IsSyncingReplicationSlots())
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot enable both \"%s\" and \"%s\" options during replication slot creation",
+						   "failover", "two_phase"));
 	}
 
 	/*
@@ -848,6 +862,19 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot if
+	 * there are potentially un-decoded transactions that are prepared before
+	 * two_phase_at. See comments atop slotsync.c for details.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase &&
+		MyReplicationSlot->data.restart_lsn < MyReplicationSlot->data.two_phase_at)
+		ereport(ERROR,
+				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				errmsg("cannot enable failover for a two-phase enabled replication slot due to unconsumed changes"),
+				errdetail("The slot need to consume change upto %X/%X to enable failover.",
+						  LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..36c3b73677e 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -106,6 +106,7 @@ $oldpub->start;
 $oldpub->safe_psql(
 	'postgres', qq[
 		CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+		CREATE TABLE tbl2 AS SELECT generate_series(1, 10) AS a;
 		SELECT pg_replication_slot_advance('test_slot2', pg_current_wal_lsn());
 		SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
 ]);
@@ -162,7 +163,8 @@ $oldpub->safe_psql(
 	'postgres', qq[
 	SELECT * FROM pg_drop_replication_slot('test_slot1');
 	SELECT * FROM pg_drop_replication_slot('test_slot2');
-	CREATE PUBLICATION regress_pub FOR ALL TABLES;
+	CREATE PUBLICATION regress_pub FOR TABLE tbl;
+	CREATE PUBLICATION regress_pub2 FOR TABLE tbl2;
 ]);
 
 # Initialize subscriber cluster
@@ -173,18 +175,23 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE TABLE tbl2 (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true');
+	CREATE SUBSCRIPTION regress_sub2 CONNECTION '$old_connstr' PUBLICATION regress_pub2 WITH (failover = 'true');
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
+$sub->wait_for_subscription_sync($oldpub, 'regress_sub2');
 
 # Also wait for two-phase to be enabled
-my $twophase_query =
-  "SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate NOT IN ('e');";
-$sub->poll_query_until('postgres', $twophase_query)
-  or die "Timed out while waiting for subscriber to enable twophase";
+ok( $sub->poll_query_until(
+		'postgres',
+		qq[SELECT subtwophasestate FROM pg_subscription WHERE subname = 'regress_sub';],
+		'e',),
+	"Timed out while waiting for subscriber to enable twophase");
 
 # 2. Temporarily disable the subscription
 $sub->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$sub->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub2 DISABLE");
 $oldpub->stop;
 
 # pg_upgrade should be successful
@@ -193,10 +200,17 @@ command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
 # Check that the slot 'regress_sub' has migrated to the new cluster
 $newpub->start;
 my $result = $newpub->safe_psql('postgres',
-	"SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+	"SELECT slot_name, two_phase FROM pg_replication_slots where slot_name = 'regress_sub'"
+);
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Check that the slot 'regress_sub2' has migrated to the new cluster
+$result = $newpub->safe_psql('postgres',
+	"SELECT slot_name, failover FROM pg_replication_slots where slot_name = 'regress_sub2'"
+);
+is($result, qq(regress_sub2|t), 'check the slot exists on new cluster');
 
-# Update the connection
+# Update the connection for regress_sub
 my $new_connstr = $newpub->connstr . ' dbname=postgres';
 $sub->safe_psql(
 	'postgres', qq[
@@ -204,13 +218,27 @@ $sub->safe_psql(
 	ALTER SUBSCRIPTION regress_sub ENABLE;
 ]);
 
+# Update the connection for regress_sub2
+$sub->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub2 CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub2 ENABLE;
+]);
+
 # Check whether changes on the new publisher get replicated to the subscriber
-$newpub->safe_psql('postgres',
-	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$newpub->safe_psql(
+	'postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20));
+	INSERT INTO tbl2 VALUES (generate_series(11, 20));");
+
 $newpub->wait_for_catchup('regress_sub');
 $result = $sub->safe_psql('postgres', "SELECT count(*) FROM tbl");
 is($result, qq(20), 'check changes are replicated to the sub');
 
+$newpub->wait_for_catchup('regress_sub2');
+$result = $sub->safe_psql('postgres', "SELECT count(*) FROM tbl2");
+is($result, qq(20), 'check changes are replicated to the sub2');
+
 # Clean up
 $sub->stop();
 $newpub->stop();
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..a78c4d2bd95 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -25,6 +25,8 @@ $publisher->init(
 $publisher->append_conf('postgresql.conf', 'autovacuum = off');
 $publisher->start;
 
+$publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
 $publisher->safe_psql('postgres',
 	"CREATE PUBLICATION regress_mypub FOR ALL TABLES;");
 
@@ -42,6 +44,8 @@ my $slot_creation_time_on_primary = $publisher->safe_psql(
     SELECT current_timestamp;
 ]);
 
+$subscriber1->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
 # Create a subscription that enables failover.
 $subscriber1->safe_psql('postgres',
 	"CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, copy_data = false, failover = true, enabled = false);"
@@ -98,6 +102,114 @@ my ($result, $stdout, $stderr) = $subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
 	"altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option can be enabled for a two_phase enabled
+# subscription only through Alter Subscription (failover=true).
+##################################################
+
+# Test that the failover option cannot be set for a slot with
+# restart_lsn < two_phase_at
+
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub2', 'pgoutput', false, true, false);"
+);
+
+# Try to enable failover of the slot
+($result, $stdout, $stderr) = $publisher->psql(
+	'postgres',
+	qq[ALTER_REPLICATION_SLOT regress_mysub2 (failover);],
+	replication => 'database');
+ok( $stderr =~
+	  qr/ERROR:  cannot enable failover for a two-phase enabled replication slot due to unconsumed changes/,
+	"failover can't be enabled if restart_lsn < two_phase_at on a two_phase subscription."
+);
+
+# Create a subscription with two_phase enabled, using above slot
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, enabled = false, two_phase = true);"
+);
+
+# Try to enable failover for the subscription with two_phase in pending state
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+ok( $stderr =~
+	  /ERROR:  cannot enable failover for a subscription with a pending two_phase state
+.*HINT:  The "failover" option can be enabled after "two_phase" state is ready./,
+	"enabling failover is not allowed for a two_phase pending subscription");
+
+# Enable the subscription
+$subscriber1->safe_psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 ENABLE;");
+
+# Wait for the subscription to be in sync
+$subscriber1->wait_for_subscription_sync($publisher, 'regress_mysub2');
+
+# Also wait for two-phase to be enabled
+ok( $subscriber1->poll_query_until(
+		'postgres',
+		qq[SELECT subtwophasestate FROM pg_subscription WHERE subname = 'regress_mysub2';],
+		'e',),
+	"Timed out while waiting for subscriber to enable twophase");
+
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$publisher->safe_psql(
+	'postgres', qq(
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+));
+
+# Alter subscription to enable failover
+$subscriber1->psql(
+	'postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+	 ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $publisher->safe_psql(
+		'postgres',
+		q{SELECT failover from pg_replication_slots WHERE slot_name = 'regress_mysub2';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub2;");
+
+# Test that subscription creation with two_phase=true results in error when
+# using a slot with failover set to true.
+
+# Create a slot with failover enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub3', 'pgoutput', false, false, true);"
+);
+
+my $logoffset = -s $subscriber1->logfile;
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub3 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, two_phase = true);"
+);
+
+ok( $subscriber1->wait_for_log(
+		qr/cannot enable two-phase decoding for failover enabled slot "regress_mysub3"/,
+		$logoffset),
+	"creating subscription with two_phase=true is not allowed when the slot has failover set to true"
+);
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub3;");
+
+# Clean up the tables created
+$publisher->safe_psql('postgres', "DROP TABLE tab_full;");
+$subscriber1->safe_psql('postgres', "DROP TABLE tab_full;");
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby server.
 ##################################################
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..8d4d55c802d 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,10 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  cannot enable both "failover" and "two_phase" options at CREATE SUBSCRIPTION
+HINT:  Use ALTER SUBSCRIPTION ... SET (failover = true) to enable "failover" after two_phase state is ready
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.34.1

#67shveta malik
shveta.malik@gmail.com
In reply to: Nisha Moond (#66)
Re: Fix slot synchronization with two_phase decoding enabled

On Wed, May 7, 2025 at 4:36 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Attached is the v13 patch with the above comments addressed.

--

Thanks for the patch.

I think we can have the restriction mentioned under the 'two_phase'
section as well along with the 'failover' section in the CREATE
SUBSCRIPTION doc, similar to what we have in CREATE_REPLICATION_SLOT.

Apart from this, I have no more comments.

thanks
Shveta

#68Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#64)
2 attachment(s)
RE: Fix slot synchronization with two_phase decoding enabled

On Tue, May 6, 2025 at 7:22 PM Zhijie Hou (Fujitsu) wrote:

On Mon, May 5, 2025 at 6:59 PM Amit Kapila wrote:

On Sun, May 4, 2025 at 2:33 PM Masahiko Sawada

<sawada.mshk@gmail.com>

wrote:

While I cannot be entirely certain of my analysis, I believe the
root cause might be related to the backward movement of the
confirmed_flush LSN. The following scenario seems possible:

1. The walsender enables the two_phase and sets two_phase_at
(which should be the same as confirmed_flush).
2. The slot's confirmed_flush regresses for some reason.
3. The slotsync worker retrieves the remote slot information and
enables two_phase for the local slot.

Yes, this is possible. Here is my theory as to how it can happen in
the current case. In the failed test, after the primary has prepared
a transaction, the transaction won't be replicated to the subscriber
as two_phase was not enabled for the slot. However, subsequent
keepalive messages can send the latest WAL location to the
subscriber and get the confirmation of the same from the subscriber
without its origin being moved. Now, after we restart the apply
worker (due to disable/enable for a subscription), it will use the
previous origin_lsn to temporarily move back the confirmed flush LSN
as explained in one of the previous emails in another thread [1].
During this temporary movement of confirm flush LSN, the slotsync
worker fetches the two_phase_at and confirm_flush_lsn values,
leading to the assertion failure. We see this issue intermittently
because it depends on the

timing of slotsync worker's request to fetch the slot's value.

Based on this theory, I can reproduce the BF failure in the 040
tap-test on HEAD after applying the 0001 patch. This is achieved by
using the injection point to stop the walsender from sending a
keepalive before receiving the old origin position from the apply
worker, ensuring the confirmed_flush consistently moves backward before slotsync.

Additionally, I've reproduced the duplicate data issue on HEAD without
slotsync using the attached script (after applying the injection point patch).
This issue arises if we immediately disable the subscription after the
confirm_flush_lsn moves backward, preventing the walsender from
advancing the confirm_flush_lsn.

In this case, if a prepared transaction exists before two_phase_at,
then after re-enabling the subscription, it will replicate that
prepared transaction when decoding the PREPARE record and replicate
that again when decoding the COMMIT PREPARED record. In such cases,
the apply worker keeps reporting the error:

ERROR: transaction identifier "pg_gid_16387_755" is already in use.

Apart from above, we're investigating whether the same issue can occur
in back-branches and will share the results once ready.

I reproduced the duplicate data issue on PG17 as well using the attached shell
script. Since PG17 doesn’t allow altering the twophase option, I created a
subscription with two_phase=on and copy_data=on. I prepared a transaction
before the table synchronization was ready, at a time when the slot's two_phase
hadn't been set to true. This setup can cause in the prepared transaction being
replicated twice after restarting the apply worker and the confirmed_flush_lsn
move backwards.

To ensure the origin position is initialized during table sync, I inserted some
data before the prepared transaction. I added injection points(0001) to manage
the table sync worker's process, allowing the apply worker to replicate some
changes and update the origin position while table sync was ongoing.

Best Regards,
Hou zj

Attachments:

reproduce_without_slotsync_HEAD_twophase.sh.txttext/plain; name=reproduce_without_slotsync_HEAD_twophase.sh.txtDownload
0001-table-sync-injection-points.patchapplication/octet-stream; name=0001-table-sync-injection-points.patchDownload
From 86f162871f612f2f533c0d0e7574f02ac08e6321 Mon Sep 17 00:00:00 2001
From: Zhijie Hou <houzj.fnst@cn.fujitsu.com>
Date: Thu, 8 May 2025 10:53:53 +0800
Subject: [PATCH v222] 2 injection points

---
 src/backend/replication/logical/tablesync.c | 14 ++++++++++++++
 src/backend/replication/logical/worker.c    |  6 ++++++
 src/backend/replication/walsender.c         |  4 ++++
 3 files changed, 24 insertions(+)

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 8e1e8762f62..4e8626f5a61 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -122,6 +122,7 @@
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
 #include "utils/usercontext.h"
+#include "utils/injection_point.h"
 
 typedef enum
 {
@@ -293,6 +294,15 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
 static void
 process_syncing_tables_for_sync(XLogRecPtr current_lsn)
 {
+	if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+		current_lsn >= MyLogicalRepWorker->relstate_lsn)
+	{
+		if (MyLogicalRepWorker->relid % 2 == 0)
+			INJECTION_POINT("table-sync-done-2");
+		else
+			INJECTION_POINT("table-sync-done-1");
+	}
+
 	SpinLockAcquire(&MyLogicalRepWorker->relmutex);
 
 	if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -1551,6 +1561,10 @@ copy_table_done:
 		 "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
 		 originname, LSN_FORMAT_ARGS(*origin_startpos));
 
+	if (MyLogicalRepWorker->relid % 2 == 0)
+		INJECTION_POINT("table-sync-wait-2");
+	else
+		INJECTION_POINT("table-sync-wait-1");
 	/*
 	 * We are done with the initial data synchronization, update the state.
 	 */
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 4151a4b2a96..d269b69df77 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -192,6 +192,7 @@
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
 #include "utils/usercontext.h"
+#include "utils/injection_point.h"
 
 #define NAPTIME_PER_CYCLE 1000	/* max sleep time between cycles (1s) */
 
@@ -2451,6 +2452,9 @@ apply_handle_insert(StringInfo s)
 	slot_fill_defaults(rel, estate, remoteslot);
 	MemoryContextSwitchTo(oldctx);
 
+	if (am_leader_apply_worker())
+		elog(LOG, "handle remote insert");
+
 	/* For a partitioned table, insert the tuple into a partition. */
 	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		apply_handle_tuple_routing(edata,
@@ -4589,6 +4593,8 @@ run_apply_worker()
 	must_use_password = MySubscription->passwordrequired &&
 		!MySubscription->ownersuperuser;
 
+	INJECTION_POINT("connecting-walsender");
+
 	LogRepWorkerWalRcvConn = walrcv_connect(MySubscription->conninfo, true,
 											true, must_use_password,
 											MySubscription->name, &err);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 9fa8beb6103..c74dea40815 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -95,6 +95,7 @@
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
 #include "utils/timestamp.h"
+#include "utils/injection_point.h"
 
 /* Minimum interval used by walsender for stats flushes, in ms */
 #define WALSENDER_STATS_FLUSH_INTERVAL         1000
@@ -2810,6 +2811,9 @@ WalSndLoop(WalSndSendDataCallback send_data)
 			SyncRepInitConfig();
 		}
 
+		if (send_data == XLogSendLogical)
+			INJECTION_POINT("process-replies");
+
 		/* Check for input from the client */
 		ProcessRepliesIfAny();
 
-- 
2.30.0.windows.2

#69Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#68)
RE: Fix slot synchronization with two_phase decoding enabled

On Thu, May 8, 2025 at 6:04 PM Zhijie Hou (Fujitsu) wrote:

On Tue, May 6, 2025 at 7:22 PM Zhijie Hou (Fujitsu) wrote:

On Mon, May 5, 2025 at 6:59 PM Amit Kapila wrote:

On Sun, May 4, 2025 at 2:33 PM Masahiko Sawada

<sawada.mshk@gmail.com>

wrote:

While I cannot be entirely certain of my analysis, I believe the
root cause might be related to the backward movement of the
confirmed_flush LSN. The following scenario seems possible:

1. The walsender enables the two_phase and sets two_phase_at
(which should be the same as confirmed_flush).
2. The slot's confirmed_flush regresses for some reason.
3. The slotsync worker retrieves the remote slot information and
enables two_phase for the local slot.

Yes, this is possible. Here is my theory as to how it can happen
in the current case. In the failed test, after the primary has
prepared a transaction, the transaction won't be replicated to the
subscriber as two_phase was not enabled for the slot. However,
subsequent keepalive messages can send the latest WAL location to
the subscriber and get the confirmation of the same from the
subscriber without its origin being moved. Now, after we restart
the apply worker (due to disable/enable for a subscription), it
will use the previous origin_lsn to temporarily move back the
confirmed flush LSN as explained in one of the previous emails in another thread [1].
During this temporary movement of confirm flush LSN, the slotsync
worker fetches the two_phase_at and confirm_flush_lsn values,
leading to the assertion failure. We see this issue intermittently
because it depends on the

timing of slotsync worker's request to fetch the slot's value.

Based on this theory, I can reproduce the BF failure in the 040
tap-test on HEAD after applying the 0001 patch. This is achieved by
using the injection point to stop the walsender from sending a
keepalive before receiving the old origin position from the apply
worker, ensuring the confirmed_flush consistently moves backward
before

slotsync.

Additionally, I've reproduced the duplicate data issue on HEAD
without slotsync using the attached script (after applying the injection point patch).
This issue arises if we immediately disable the subscription after
the confirm_flush_lsn moves backward, preventing the walsender from
advancing the confirm_flush_lsn.

In this case, if a prepared transaction exists before two_phase_at,
then after re-enabling the subscription, it will replicate that
prepared transaction when decoding the PREPARE record and replicate
that again when decoding the COMMIT PREPARED record. In such cases,
the apply worker keeps reporting the error:

ERROR: transaction identifier "pg_gid_16387_755" is already in use.

Apart from above, we're investigating whether the same issue can
occur in back-branches and will share the results once ready.

I reproduced the duplicate data issue on PG17 as well using the
attached shell script. Since PG17 doesn’t allow altering the twophase
option, I created a subscription with two_phase=on and copy_data=on. I
prepared a transaction before the table synchronization was ready, at
a time when the slot's two_phase hadn't been set to true. This setup
can cause in the prepared transaction being replicated twice after
restarting the apply worker and the confirmed_flush_lsn move backwards.

To ensure the origin position is initialized during table sync, I
inserted some data before the prepared transaction. I added injection
points(0001) to manage the table sync worker's process, allowing the
apply worker to replicate some changes and update the origin position
while table sync was ongoing.

The above reproduction of the issue indicates that it has been present since at
least PG15, when the twophase subscription option was introduced. I am
currently investigating whether the issue occurs without the twophase option.
If it does, the fix will need to be applied to all supported branches. I will
share the results once they are available.

Best Regards,
Hou zj

#70Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#67)
1 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

On Thu, May 8, 2025 at 11:35 AM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, May 7, 2025 at 4:36 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Attached is the v13 patch with the above comments addressed.

--

Thanks for the patch.

I think we can have the restriction mentioned under the 'two_phase'
section as well along with the 'failover' section in the CREATE
SUBSCRIPTION doc, similar to what we have in CREATE_REPLICATION_SLOT.

Apart from this, I have no more comments.

Thanks for the review.
Please find the v14 patch, updated with the above suggestion and
rephrased changes in alter_subscription.sgml with the missing links.

--
Thanks,
Nisha

Attachments:

v14-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchapplication/x-patch; name=v14-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchDownload
From c7b4f5dd22a2d629d88c7e34469bac7547c0e2fc Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Fri, 9 May 2025 11:09:23 +0530
Subject: [PATCH v14] PG17 Approach 3 Fix slot synchronization for two_phase
 enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To prevent the risk of losing prepared transactions, we disallow enabling
both failover and twophase during slot creation, but permits altering
failover to true once ensured that slot's restart_lsn > two_phase_at.

The fix enforces the following conditions:
1) Always disallow creating slots with two_phase=true and failover=true.
2) Always disallow creating subscriptions with (two_phase=true, failover=true).
3) Prevent altering the slot's failover to true if two_phase=true and restart_lsn
   is less than two_phase_at. Otherwise, allow changing failover to true.
4) Disallow altering slot's failover to true when two_phase state is 'pending'.
   User can try altering failover again when two_phase state is moved beyond 'pending'.
---
 contrib/test_decoding/expected/slot.out       |   2 +
 contrib/test_decoding/sql/slot.sql            |   1 +
 doc/src/sgml/func.sgml                        |  13 ++
 doc/src/sgml/protocol.sgml                    |  17 +++
 doc/src/sgml/ref/alter_subscription.sgml      |  12 ++
 doc/src/sgml/ref/create_subscription.sgml     |  17 +++
 src/backend/commands/subscriptioncmds.c       |  25 ++++
 src/backend/replication/logical/logical.c     |  11 ++
 src/backend/replication/logical/slotsync.c    |  10 ++
 src/backend/replication/slot.c                |  27 +++++
 src/bin/pg_upgrade/t/003_logical_slots.pl     |  50 ++++++--
 .../t/040_standby_failover_slots_sync.pl      | 112 ++++++++++++++++++
 src/test/regress/expected/subscription.out    |   4 +
 src/test/regress/sql/subscription.sql         |   4 +
 14 files changed, 294 insertions(+), 11 deletions(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..87b28ad8d55 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  cannot enable both "failover" and "two_phase" options during replication slot creation
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 697c1a02891..45504cf14d5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29076,6 +29076,19 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         failover. A call to this function has the same effect as
         the replication protocol command
         <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
+
+        <note>
+         <para>
+          The parameters <parameter>twophase</parameter> and
+          <parameter>failover</parameter> cannot be enabled together when
+          creating a replication slot. However, a slot created with
+          <parameter>twophase</parameter> enabled can later have
+          <parameter>failover</parameter> set to true either using
+          <xref linkend="protocol-replication-alter-replication-slot"/> or via
+          <xref linkend="sql-altersubscription"/> if a subscription is using
+          this slot.
+         </para>
+        </note>
        </para></entry>
       </row>
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0396d3a9e98..2c2d66fad35 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2062,6 +2062,15 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           <literal>PREPARE TRANSACTION</literal> time.
           The default is false.
          </para>
+
+         <para>
+          This option cannot be set together with <literal>failover</literal>
+          when creating a logical replication slot. However, once the slot is
+          created with <literal>two_phase = true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all the changes up to the point where two-phase decoding
+          was enabled.
+         </para>
         </listitem>
        </varlistentry>
 
@@ -2103,6 +2112,14 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           so that logical replication can be resumed after failover.
           The default is false.
          </para>
+
+         <para>
+          This option cannot be set together with <literal>two_phase</literal>
+          when creating the slot. However, once the slot is created with
+          <literal>two_phase = true</literal>, <literal>failover</literal> can
+          be set to true after the slot has consumed all the changes up to the
+          point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 2ccbf5e4897..afa29d4049b 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -257,6 +257,18 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
       <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
       option is enabled.
      </para>
+
+     <para>
+      When a subscription's
+      <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+      is in the pending state, setting <literal>failover</literal> to true is not
+      permitted. Once
+      <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+      is enabled,
+      <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
+      can be set to true only after the slot has consumed all the changes up to the
+      point where two-phase decoding was enabled.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index c9c8dd440dc..55b20893e80 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -354,6 +354,14 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           to know the actual two-phase state.
          </para>
 
+         <para>
+          This option cannot be enabled together with
+          <literal>failover</literal> when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true using
+          <xref linkend="sql-altersubscription"/>, once the slot has consumed
+          all the changes up to the point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
 
@@ -426,6 +434,15 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           replication can be resumed from the new primary after failover.
           The default is <literal>false</literal>.
          </para>
+
+         <para>
+          This option cannot be enabled together with
+          <literal>two_phase</literal> when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true using
+          <xref linkend="sql-altersubscription"/>, once the slot has consumed
+          all the changes up to the point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist></para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..54afb3f2e42 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,18 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options
+	 * together. See comments atop slotsync.c for details.
+	 */
+	if (opts.twophase && opts.failover)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable both \"%s\" and \"%s\" options at CREATE SUBSCRIPTION",
+					   "failover", "two_phase"),
+				errhint("Use ALTER SUBSCRIPTION ... SET (failover = true) to enable \"%s\" after two_phase state is ready",
+						"failover"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1257,19 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable the failover until
+					 * two_phase state reaches READY state. See comments atop
+					 * slotsync.c for details.
+					 */
+					if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+								errmsg("cannot enable failover for a subscription with a pending two_phase state"),
+								errhint("The \"%s\" option can be enabled after \"%s\" state is ready.",
+										"failover", "two_phase"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index e941bb491d8..c33024dad19 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -613,6 +613,17 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 	/* Mark slot to allow two_phase decoding if not already marked */
 	if (ctx->twophase && !slot->data.two_phase)
 	{
+		/*
+		 * Do not allow two-phase decoding for failover enabled slots.
+		 *
+		 * See comments atop slotsync.c for details.
+		 */
+		if (slot->data.failover)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+							NameStr(slot->data.name))));
+
 		SpinLockAcquire(&slot->mutex);
 		slot->data.two_phase = true;
 		slot->data.two_phase_at = start_lsn;
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 73fe3f56af2..011c244ac35 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -42,6 +42,16 @@
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
+ *
+ * Regarding two-phase enabled slots, we need to note that the two_phase_at
+ * field of a slot is not synchronized. In the absence of two_phase_at,
+ * logical decoding could incorrectly identify prepared transactions as
+ * having been replicated to the subscriber after promotion, resulting in
+ * their omission. Consequently, both two_phase and failover features
+ * cannot be simultaneously enabled during slot creation. Failover can only
+ * be enabled once we can ensure that no transactions were prepared prior
+ * to two_phase_at on the primary. These restrictions are enforced through
+ * checks in ReplicationSlotCreate() and ReplicationSlotAlter().
  *---------------------------------------------------------------------------
  */
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..f5f2bcc708b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,20 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable both failover and two_phase for slots.
+		 * Please see the comments atop slotsync.c for details.
+		 *
+		 * However, both failover and two_phase enabled slots can be created
+		 * during slot synchronization because we need to retain the same
+		 * values as the remote slot.
+		 */
+		if (two_phase && !IsSyncingReplicationSlots())
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot enable both \"%s\" and \"%s\" options during replication slot creation",
+						   "failover", "two_phase"));
 	}
 
 	/*
@@ -848,6 +862,19 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot if
+	 * there are potentially un-decoded transactions that are prepared before
+	 * two_phase_at. See comments atop slotsync.c for details.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase &&
+		MyReplicationSlot->data.restart_lsn < MyReplicationSlot->data.two_phase_at)
+		ereport(ERROR,
+				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				errmsg("cannot enable failover for a two-phase enabled replication slot due to unconsumed changes"),
+				errdetail("The slot need to consume change upto %X/%X to enable failover.",
+						  LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..36c3b73677e 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -106,6 +106,7 @@ $oldpub->start;
 $oldpub->safe_psql(
 	'postgres', qq[
 		CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+		CREATE TABLE tbl2 AS SELECT generate_series(1, 10) AS a;
 		SELECT pg_replication_slot_advance('test_slot2', pg_current_wal_lsn());
 		SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
 ]);
@@ -162,7 +163,8 @@ $oldpub->safe_psql(
 	'postgres', qq[
 	SELECT * FROM pg_drop_replication_slot('test_slot1');
 	SELECT * FROM pg_drop_replication_slot('test_slot2');
-	CREATE PUBLICATION regress_pub FOR ALL TABLES;
+	CREATE PUBLICATION regress_pub FOR TABLE tbl;
+	CREATE PUBLICATION regress_pub2 FOR TABLE tbl2;
 ]);
 
 # Initialize subscriber cluster
@@ -173,18 +175,23 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE TABLE tbl2 (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true');
+	CREATE SUBSCRIPTION regress_sub2 CONNECTION '$old_connstr' PUBLICATION regress_pub2 WITH (failover = 'true');
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
+$sub->wait_for_subscription_sync($oldpub, 'regress_sub2');
 
 # Also wait for two-phase to be enabled
-my $twophase_query =
-  "SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate NOT IN ('e');";
-$sub->poll_query_until('postgres', $twophase_query)
-  or die "Timed out while waiting for subscriber to enable twophase";
+ok( $sub->poll_query_until(
+		'postgres',
+		qq[SELECT subtwophasestate FROM pg_subscription WHERE subname = 'regress_sub';],
+		'e',),
+	"Timed out while waiting for subscriber to enable twophase");
 
 # 2. Temporarily disable the subscription
 $sub->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$sub->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub2 DISABLE");
 $oldpub->stop;
 
 # pg_upgrade should be successful
@@ -193,10 +200,17 @@ command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
 # Check that the slot 'regress_sub' has migrated to the new cluster
 $newpub->start;
 my $result = $newpub->safe_psql('postgres',
-	"SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+	"SELECT slot_name, two_phase FROM pg_replication_slots where slot_name = 'regress_sub'"
+);
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Check that the slot 'regress_sub2' has migrated to the new cluster
+$result = $newpub->safe_psql('postgres',
+	"SELECT slot_name, failover FROM pg_replication_slots where slot_name = 'regress_sub2'"
+);
+is($result, qq(regress_sub2|t), 'check the slot exists on new cluster');
 
-# Update the connection
+# Update the connection for regress_sub
 my $new_connstr = $newpub->connstr . ' dbname=postgres';
 $sub->safe_psql(
 	'postgres', qq[
@@ -204,13 +218,27 @@ $sub->safe_psql(
 	ALTER SUBSCRIPTION regress_sub ENABLE;
 ]);
 
+# Update the connection for regress_sub2
+$sub->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub2 CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub2 ENABLE;
+]);
+
 # Check whether changes on the new publisher get replicated to the subscriber
-$newpub->safe_psql('postgres',
-	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$newpub->safe_psql(
+	'postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20));
+	INSERT INTO tbl2 VALUES (generate_series(11, 20));");
+
 $newpub->wait_for_catchup('regress_sub');
 $result = $sub->safe_psql('postgres', "SELECT count(*) FROM tbl");
 is($result, qq(20), 'check changes are replicated to the sub');
 
+$newpub->wait_for_catchup('regress_sub2');
+$result = $sub->safe_psql('postgres', "SELECT count(*) FROM tbl2");
+is($result, qq(20), 'check changes are replicated to the sub2');
+
 # Clean up
 $sub->stop();
 $newpub->stop();
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..a78c4d2bd95 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -25,6 +25,8 @@ $publisher->init(
 $publisher->append_conf('postgresql.conf', 'autovacuum = off');
 $publisher->start;
 
+$publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
 $publisher->safe_psql('postgres',
 	"CREATE PUBLICATION regress_mypub FOR ALL TABLES;");
 
@@ -42,6 +44,8 @@ my $slot_creation_time_on_primary = $publisher->safe_psql(
     SELECT current_timestamp;
 ]);
 
+$subscriber1->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
 # Create a subscription that enables failover.
 $subscriber1->safe_psql('postgres',
 	"CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, copy_data = false, failover = true, enabled = false);"
@@ -98,6 +102,114 @@ my ($result, $stdout, $stderr) = $subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
 	"altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option can be enabled for a two_phase enabled
+# subscription only through Alter Subscription (failover=true).
+##################################################
+
+# Test that the failover option cannot be set for a slot with
+# restart_lsn < two_phase_at
+
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub2', 'pgoutput', false, true, false);"
+);
+
+# Try to enable failover of the slot
+($result, $stdout, $stderr) = $publisher->psql(
+	'postgres',
+	qq[ALTER_REPLICATION_SLOT regress_mysub2 (failover);],
+	replication => 'database');
+ok( $stderr =~
+	  qr/ERROR:  cannot enable failover for a two-phase enabled replication slot due to unconsumed changes/,
+	"failover can't be enabled if restart_lsn < two_phase_at on a two_phase subscription."
+);
+
+# Create a subscription with two_phase enabled, using above slot
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, enabled = false, two_phase = true);"
+);
+
+# Try to enable failover for the subscription with two_phase in pending state
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+ok( $stderr =~
+	  /ERROR:  cannot enable failover for a subscription with a pending two_phase state
+.*HINT:  The "failover" option can be enabled after "two_phase" state is ready./,
+	"enabling failover is not allowed for a two_phase pending subscription");
+
+# Enable the subscription
+$subscriber1->safe_psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 ENABLE;");
+
+# Wait for the subscription to be in sync
+$subscriber1->wait_for_subscription_sync($publisher, 'regress_mysub2');
+
+# Also wait for two-phase to be enabled
+ok( $subscriber1->poll_query_until(
+		'postgres',
+		qq[SELECT subtwophasestate FROM pg_subscription WHERE subname = 'regress_mysub2';],
+		'e',),
+	"Timed out while waiting for subscriber to enable twophase");
+
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$publisher->safe_psql(
+	'postgres', qq(
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+));
+
+# Alter subscription to enable failover
+$subscriber1->psql(
+	'postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+	 ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $publisher->safe_psql(
+		'postgres',
+		q{SELECT failover from pg_replication_slots WHERE slot_name = 'regress_mysub2';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub2;");
+
+# Test that subscription creation with two_phase=true results in error when
+# using a slot with failover set to true.
+
+# Create a slot with failover enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub3', 'pgoutput', false, false, true);"
+);
+
+my $logoffset = -s $subscriber1->logfile;
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub3 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, two_phase = true);"
+);
+
+ok( $subscriber1->wait_for_log(
+		qr/cannot enable two-phase decoding for failover enabled slot "regress_mysub3"/,
+		$logoffset),
+	"creating subscription with two_phase=true is not allowed when the slot has failover set to true"
+);
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub3;");
+
+# Clean up the tables created
+$publisher->safe_psql('postgres', "DROP TABLE tab_full;");
+$subscriber1->safe_psql('postgres', "DROP TABLE tab_full;");
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby server.
 ##################################################
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..8d4d55c802d 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,10 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  cannot enable both "failover" and "two_phase" options at CREATE SUBSCRIPTION
+HINT:  Use ALTER SUBSCRIPTION ... SET (failover = true) to enable "failover" after two_phase state is ready
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.34.1

#71Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#64)
Re: Fix slot synchronization with two_phase decoding enabled

On Tue, May 6, 2025 at 4:52 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Mon, May 5, 2025 at 6:59 PM Amit Kapila wrote:

Yes, this is possible. Here is my theory as to how it can happen in the current
case. In the failed test, after the primary has prepared a transaction, the
transaction won't be replicated to the subscriber as two_phase was not
enabled for the slot. However, subsequent keepalive messages can send the
latest WAL location to the subscriber and get the confirmation of the same from
the subscriber without its origin being moved. Now, after we restart the apply
worker (due to disable/enable for a subscription), it will use the previous
origin_lsn to temporarily move back the confirmed flush LSN as explained in
one of the previous emails in another thread [1]. During this temporary
movement of confirm flush LSN, the slotsync worker fetches the two_phase_at
and confirm_flush_lsn values, leading to the assertion failure. We see this
issue intermittently because it depends on the timing of slotsync worker's
request to fetch the slot's value.

Based on this theory, I can reproduce the BF failure in the 040 tap-test on
HEAD after applying the 0001 patch. This is achieved by using the injection
point to stop the walsender from sending a keepalive before receiving the old
origin position from the apply worker, ensuring the confirmed_flush
consistently moves backward before slotsync.

Additionally, I've reproduced the duplicate data issue on HEAD without slotsync
using the attached script (after applying the injection point patch). This
issue arises if we immediately disable the subscription after the
confirm_flush_lsn moves backward, preventing the walsender from advancing the
confirm_flush_lsn.

Script contents:
psql -d postgres -p $port_primary -c "create extension
injection_points;SELECT injection_points_attach('process-replies',
'wait');"

psql -d postgres -p $port_subscriber -c "alter subscription sub set
(two_phase =on); alter subscription sub enable ;"

sleep 1

psql -d postgres -p $port_subscriber -c "alter subscription sub disable;"

I think what you said in the above paragraph is happening here. How
can walsender move back the confirm_flush_lsn backwards when it is
waiting due to the injection point? I think I am missing something
here. It would be good if you could add a few comments to your
scripts.

--
With Regards,
Amit Kapila.

#72shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#71)
1 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

On Sat, May 10, 2025 at 4:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, May 6, 2025 at 4:52 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Mon, May 5, 2025 at 6:59 PM Amit Kapila wrote:

Yes, this is possible. Here is my theory as to how it can happen in the current
case. In the failed test, after the primary has prepared a transaction, the
transaction won't be replicated to the subscriber as two_phase was not
enabled for the slot. However, subsequent keepalive messages can send the
latest WAL location to the subscriber and get the confirmation of the same from
the subscriber without its origin being moved. Now, after we restart the apply
worker (due to disable/enable for a subscription), it will use the previous
origin_lsn to temporarily move back the confirmed flush LSN as explained in
one of the previous emails in another thread [1]. During this temporary
movement of confirm flush LSN, the slotsync worker fetches the two_phase_at
and confirm_flush_lsn values, leading to the assertion failure. We see this
issue intermittently because it depends on the timing of slotsync worker's
request to fetch the slot's value.

Based on this theory, I can reproduce the BF failure in the 040 tap-test on
HEAD after applying the 0001 patch. This is achieved by using the injection
point to stop the walsender from sending a keepalive before receiving the old
origin position from the apply worker, ensuring the confirmed_flush
consistently moves backward before slotsync.

Additionally, I've reproduced the duplicate data issue on HEAD without slotsync
using the attached script (after applying the injection point patch). This
issue arises if we immediately disable the subscription after the
confirm_flush_lsn moves backward, preventing the walsender from advancing the
confirm_flush_lsn.

Script contents:
psql -d postgres -p $port_primary -c "create extension
injection_points;SELECT injection_points_attach('process-replies',
'wait');"

psql -d postgres -p $port_subscriber -c "alter subscription sub set
(two_phase =on); alter subscription sub enable ;"

sleep 1

psql -d postgres -p $port_subscriber -c "alter subscription sub disable;"

I think what you said in the above paragraph is happening here. How
can walsender move back the confirm_flush_lsn backwards when it is
waiting due to the injection point? I think I am missing something
here. It would be good if you could add a few comments to your
scripts.

Please see my comments in the attached (updated) script. The testcase
to reproduce the issue on HEAD is the same, only the comments have
been added to elaborate the flow which moves confirmed_flush backward.

thanks
Shveta

Attachments:

reproduce_without_slotsync_HEAD.shtext/x-sh; charset=US-ASCII; name=reproduce_without_slotsync_HEAD.shDownload
#73Nisha Moond
nisha.moond412@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#64)
Re: Fix slot synchronization with two_phase decoding enabled

On Tue, May 6, 2025 at 4:52 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Mon, May 5, 2025 at 6:59 PM Amit Kapila wrote:

On Sun, May 4, 2025 at 2:33 PM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:

While I cannot be entirely certain of my analysis, I believe the root
cause might be related to the backward movement of the confirmed_flush
LSN. The following scenario seems possible:

1. The walsender enables the two_phase and sets two_phase_at (which
should be the same as confirmed_flush).
2. The slot's confirmed_flush regresses for some reason.
3. The slotsync worker retrieves the remote slot information and
enables two_phase for the local slot.

Yes, this is possible. Here is my theory as to how it can happen in the current
case. In the failed test, after the primary has prepared a transaction, the
transaction won't be replicated to the subscriber as two_phase was not
enabled for the slot. However, subsequent keepalive messages can send the
latest WAL location to the subscriber and get the confirmation of the same from
the subscriber without its origin being moved. Now, after we restart the apply
worker (due to disable/enable for a subscription), it will use the previous
origin_lsn to temporarily move back the confirmed flush LSN as explained in
one of the previous emails in another thread [1]. During this temporary
movement of confirm flush LSN, the slotsync worker fetches the two_phase_at
and confirm_flush_lsn values, leading to the assertion failure. We see this
issue intermittently because it depends on the timing of slotsync worker's
request to fetch the slot's value.

Based on this theory, I can reproduce the BF failure in the 040 tap-test on
HEAD after applying the 0001 patch. This is achieved by using the injection
point to stop the walsender from sending a keepalive before receiving the old
origin position from the apply worker, ensuring the confirmed_flush
consistently moves backward before slotsync.

Additionally, I've reproduced the duplicate data issue on HEAD without slotsync
using the attached script (after applying the injection point patch). This
issue arises if we immediately disable the subscription after the
confirm_flush_lsn moves backward, preventing the walsender from advancing the
confirm_flush_lsn.

In this case, if a prepared transaction exists before two_phase_at, then after
re-enabling the subscription, it will replicate that prepared transaction when
decoding the PREPARE record and replicate that again when decoding the COMMIT
PREPARED record. In such cases, the apply worker keeps reporting the error:

ERROR: transaction identifier "pg_gid_16387_755" is already in use.

Apart from above, we're investigating whether the same issue can occur in
back-branches and will share the results once ready.

The issue was confirmed to occur on back branches as well, due to
confirmed_flush_lsn moving backward. It has now been fixed on HEAD and
all supported back-branches down to PG13.

For details, refer to the separate thread [1]/messages/by-id/CAJpy0uDZ29P=BYB1JDWMCh-6wXaNqMwG1u1mB4=10Ly0x7HhwQ@mail.gmail.com; the fix was committed
(commit: ad5eaf3)[2]https://github.com/postgres/postgres/commit/ad5eaf390c58294e2e4c1509aa87bf13261a5d15.

The BF failure has not occurred since the fix, but we’ll continue to
keep an eye.

[1]: /messages/by-id/CAJpy0uDZ29P=BYB1JDWMCh-6wXaNqMwG1u1mB4=10Ly0x7HhwQ@mail.gmail.com
[2]: https://github.com/postgres/postgres/commit/ad5eaf390c58294e2e4c1509aa87bf13261a5d15

--
Thanks,
Nisha

#74Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Nisha Moond (#70)
Re: Fix slot synchronization with two_phase decoding enabled

On Fri, May 9, 2025 at 5:09 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

On Thu, May 8, 2025 at 11:35 AM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, May 7, 2025 at 4:36 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Attached is the v13 patch with the above comments addressed.

--

Thanks for the patch.

I think we can have the restriction mentioned under the 'two_phase'
section as well along with the 'failover' section in the CREATE
SUBSCRIPTION doc, similar to what we have in CREATE_REPLICATION_SLOT.

Apart from this, I have no more comments.

Thanks for the review.
Please find the v14 patch, updated with the above suggestion and
rephrased changes in alter_subscription.sgml with the missing links.

Thank you for updating the patch. I'm reviewing the v14 patch and will
share some comments tomorrow.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#75Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#74)
Re: Fix slot synchronization with two_phase decoding enabled

On Wed, May 21, 2025 at 4:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, May 9, 2025 at 5:09 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

On Thu, May 8, 2025 at 11:35 AM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, May 7, 2025 at 4:36 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Attached is the v13 patch with the above comments addressed.

--

Thanks for the patch.

I think we can have the restriction mentioned under the 'two_phase'
section as well along with the 'failover' section in the CREATE
SUBSCRIPTION doc, similar to what we have in CREATE_REPLICATION_SLOT.

Apart from this, I have no more comments.

Thanks for the review.
Please find the v14 patch, updated with the above suggestion and
rephrased changes in alter_subscription.sgml with the missing links.

Thank you for updating the patch. I'm reviewing the v14 patch and will
share some comments tomorrow.

Here are review comments for v14 patch:

I think we need to include a basic test case where we simply create a
subscription with two_phase=true and then enable the failover via
ALTER SUBSCRIPTION after making sure that slot's restart_lsn passed
two_phase_at. Probably if we use this case in 003_logical_slots.pl, we
can avoid creating regress_sub2?

---
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$publisher->safe_psql(
+        'postgres', qq(
+                BEGIN;
+                SELECT txid_current();
+                SELECT pg_log_standby_snapshot();
+                COMMIT;
+                BEGIN;
+                SELECT txid_current();
+                SELECT pg_log_standby_snapshot();
+                COMMIT;
+));
+
+# Alter subscription to enable failover
+$subscriber1->psql(
+        'postgres',
+        "ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+         ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");

I think we need to wait for the subscription to consume these changes
before disabling the subscription. If the publisher doesn't consume
these WAL records for some reason, the subsequent "ALTER SUBSCRIPTION
... SET (failover = true)" will fail.

---
@@ -25,6 +25,8 @@ $publisher->init(
 $publisher->append_conf('postgresql.conf', 'autovacuum = off');
 $publisher->start;
+$publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
 $publisher->safe_psql('postgres',
        "CREATE PUBLICATION regress_mypub FOR ALL TABLES;");

@@ -42,6 +44,8 @@ my $slot_creation_time_on_primary = $publisher->safe_psql(
SELECT current_timestamp;
]);

+$subscriber1->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
 # Create a subscription that enables failover.
 $subscriber1->safe_psql('postgres',
        "CREATE SUBSCRIPTION regress_mysub1 CONNECTION
'$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name =
lsub1_slot, copy_data = false, failover = true, enabled = false);"
@@ -98,6 +102,114 @@ my ($result, $stdout, $stderr) =
$subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
        "altering failover is not allowed for enabled subscription");

I think it's better to create the tables before the new test starts
with a comment explaining why we need to create the table here.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#76Nisha Moond
nisha.moond412@gmail.com
In reply to: Masahiko Sawada (#75)
1 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

to
On Fri, May 23, 2025 at 10:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Here are review comments for v14 patch:

Thank you for the review.

I think we need to include a basic test case where we simply create a
subscription with two_phase=true and then enable the failover via
ALTER SUBSCRIPTION after making sure that slot's restart_lsn passed
two_phase_at. Probably if we use this case in 003_logical_slots.pl, we
can avoid creating regress_sub2?

A test has been added in 040_standby_failover_slots_sync.pl to enable
failover via ALTER SUBSCRIPTION.
Regarding regress_sub2 in 003_logical_slots.pl :
If we use ALTER SUBSCRIPTION to set failover=true on regress_sub, it
leads to pg_upgrade failure, as it attempts to create a slot on the
new_node(upgraded version) with both two_phase and failover enabled,
which is an unsupported combination.

---
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$publisher->safe_psql(
+        'postgres', qq(
+                BEGIN;
+                SELECT txid_current();
+                SELECT pg_log_standby_snapshot();
+                COMMIT;
+                BEGIN;
+                SELECT txid_current();
+                SELECT pg_log_standby_snapshot();
+                COMMIT;
+));
+
+# Alter subscription to enable failover
+$subscriber1->psql(
+        'postgres',
+        "ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+         ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");

I think we need to wait for the subscription to consume these changes
before disabling the subscription. If the publisher doesn't consume
these WAL records for some reason, the subsequent "ALTER SUBSCRIPTION
... SET (failover = true)" will fail.

Done.

---
@@ -25,6 +25,8 @@ $publisher->init(
$publisher->append_conf('postgresql.conf', 'autovacuum = off');
$publisher->start;
+$publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
$publisher->safe_psql('postgres',
"CREATE PUBLICATION regress_mypub FOR ALL TABLES;");

@@ -42,6 +44,8 @@ my $slot_creation_time_on_primary = $publisher->safe_psql(
SELECT current_timestamp;
]);

+$subscriber1->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
# Create a subscription that enables failover.
$subscriber1->safe_psql('postgres',
"CREATE SUBSCRIPTION regress_mysub1 CONNECTION
'$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name =
lsub1_slot, copy_data = false, failover = true, enabled = false);"
@@ -98,6 +102,114 @@ my ($result, $stdout, $stderr) =
$subscriber1->psql('postgres',
ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
"altering failover is not allowed for enabled subscription");

I think it's better to create the tables before the new test starts
with a comment explaining why we need to create the table here.

Done.
~~~

Please find the updated patch v15, addressing above comments.

--
Thanks,
Nisha

Attachments:

v15-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchapplication/octet-stream; name=v15-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchDownload
From 976f23ba99d8d0cb5d7137b4df65c8227436a3a6 Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Fri, 9 May 2025 11:09:23 +0530
Subject: [PATCH v15] PG17 Approach 3 Fix slot synchronization for two_phase
 enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To prevent the risk of losing prepared transactions, we disallow enabling
both failover and twophase during slot creation, but permits altering
failover to true once ensured that slot's restart_lsn > two_phase_at.

The fix enforces the following conditions:
1) Always disallow creating slots with two_phase=true and failover=true.
2) Always disallow creating subscriptions with (two_phase=true, failover=true).
3) Prevent altering the slot's failover to true if two_phase=true and restart_lsn
   is less than two_phase_at. Otherwise, allow changing failover to true.
4) Disallow altering slot's failover to true when two_phase state is 'pending'.
   User can try altering failover again when two_phase state is moved beyond 'pending'.
---
 contrib/test_decoding/expected/slot.out       |   2 +
 contrib/test_decoding/sql/slot.sql            |   1 +
 doc/src/sgml/func.sgml                        |  13 ++
 doc/src/sgml/protocol.sgml                    |  17 +++
 doc/src/sgml/ref/alter_subscription.sgml      |  12 ++
 doc/src/sgml/ref/create_subscription.sgml     |  17 +++
 src/backend/commands/subscriptioncmds.c       |  25 ++++
 src/backend/replication/logical/logical.c     |  11 ++
 src/backend/replication/logical/slotsync.c    |  10 ++
 src/backend/replication/slot.c                |  27 ++++
 src/bin/pg_upgrade/t/003_logical_slots.pl     |  50 ++++++--
 .../t/040_standby_failover_slots_sync.pl      | 116 ++++++++++++++++++
 src/test/regress/expected/subscription.out    |   4 +
 src/test/regress/sql/subscription.sql         |   4 +
 14 files changed, 298 insertions(+), 11 deletions(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..87b28ad8d55 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  cannot enable both "failover" and "two_phase" options during replication slot creation
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 697c1a02891..45504cf14d5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29076,6 +29076,19 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         failover. A call to this function has the same effect as
         the replication protocol command
         <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
+
+        <note>
+         <para>
+          The parameters <parameter>twophase</parameter> and
+          <parameter>failover</parameter> cannot be enabled together when
+          creating a replication slot. However, a slot created with
+          <parameter>twophase</parameter> enabled can later have
+          <parameter>failover</parameter> set to true either using
+          <xref linkend="protocol-replication-alter-replication-slot"/> or via
+          <xref linkend="sql-altersubscription"/> if a subscription is using
+          this slot.
+         </para>
+        </note>
        </para></entry>
       </row>
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0396d3a9e98..2c2d66fad35 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2062,6 +2062,15 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           <literal>PREPARE TRANSACTION</literal> time.
           The default is false.
          </para>
+
+         <para>
+          This option cannot be set together with <literal>failover</literal>
+          when creating a logical replication slot. However, once the slot is
+          created with <literal>two_phase = true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all the changes up to the point where two-phase decoding
+          was enabled.
+         </para>
         </listitem>
        </varlistentry>
 
@@ -2103,6 +2112,14 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           so that logical replication can be resumed after failover.
           The default is false.
          </para>
+
+         <para>
+          This option cannot be set together with <literal>two_phase</literal>
+          when creating the slot. However, once the slot is created with
+          <literal>two_phase = true</literal>, <literal>failover</literal> can
+          be set to true after the slot has consumed all the changes up to the
+          point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 2ccbf5e4897..afa29d4049b 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -257,6 +257,18 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
       <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
       option is enabled.
      </para>
+
+     <para>
+      When a subscription's
+      <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+      is in the pending state, setting <literal>failover</literal> to true is not
+      permitted. Once
+      <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+      is enabled,
+      <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
+      can be set to true only after the slot has consumed all the changes up to the
+      point where two-phase decoding was enabled.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index c9c8dd440dc..55b20893e80 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -354,6 +354,14 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           to know the actual two-phase state.
          </para>
 
+         <para>
+          This option cannot be enabled together with
+          <literal>failover</literal> when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true using
+          <xref linkend="sql-altersubscription"/>, once the slot has consumed
+          all the changes up to the point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
 
@@ -426,6 +434,15 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           replication can be resumed from the new primary after failover.
           The default is <literal>false</literal>.
          </para>
+
+         <para>
+          This option cannot be enabled together with
+          <literal>two_phase</literal> when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true using
+          <xref linkend="sql-altersubscription"/>, once the slot has consumed
+          all the changes up to the point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist></para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..54afb3f2e42 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,18 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options
+	 * together. See comments atop slotsync.c for details.
+	 */
+	if (opts.twophase && opts.failover)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable both \"%s\" and \"%s\" options at CREATE SUBSCRIPTION",
+					   "failover", "two_phase"),
+				errhint("Use ALTER SUBSCRIPTION ... SET (failover = true) to enable \"%s\" after two_phase state is ready",
+						"failover"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1257,19 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable the failover until
+					 * two_phase state reaches READY state. See comments atop
+					 * slotsync.c for details.
+					 */
+					if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+								errmsg("cannot enable failover for a subscription with a pending two_phase state"),
+								errhint("The \"%s\" option can be enabled after \"%s\" state is ready.",
+										"failover", "two_phase"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 97b6aa899ee..1beb7c2eb56 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -613,6 +613,17 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 	/* Mark slot to allow two_phase decoding if not already marked */
 	if (ctx->twophase && !slot->data.two_phase)
 	{
+		/*
+		 * Do not allow two-phase decoding for failover enabled slots.
+		 *
+		 * See comments atop slotsync.c for details.
+		 */
+		if (slot->data.failover)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+							NameStr(slot->data.name))));
+
 		SpinLockAcquire(&slot->mutex);
 		slot->data.two_phase = true;
 		slot->data.two_phase_at = start_lsn;
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 73fe3f56af2..011c244ac35 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -42,6 +42,16 @@
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
+ *
+ * Regarding two-phase enabled slots, we need to note that the two_phase_at
+ * field of a slot is not synchronized. In the absence of two_phase_at,
+ * logical decoding could incorrectly identify prepared transactions as
+ * having been replicated to the subscriber after promotion, resulting in
+ * their omission. Consequently, both two_phase and failover features
+ * cannot be simultaneously enabled during slot creation. Failover can only
+ * be enabled once we can ensure that no transactions were prepared prior
+ * to two_phase_at on the primary. These restrictions are enforced through
+ * checks in ReplicationSlotCreate() and ReplicationSlotAlter().
  *---------------------------------------------------------------------------
  */
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..f5f2bcc708b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,20 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable both failover and two_phase for slots.
+		 * Please see the comments atop slotsync.c for details.
+		 *
+		 * However, both failover and two_phase enabled slots can be created
+		 * during slot synchronization because we need to retain the same
+		 * values as the remote slot.
+		 */
+		if (two_phase && !IsSyncingReplicationSlots())
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot enable both \"%s\" and \"%s\" options during replication slot creation",
+						   "failover", "two_phase"));
 	}
 
 	/*
@@ -848,6 +862,19 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot if
+	 * there are potentially un-decoded transactions that are prepared before
+	 * two_phase_at. See comments atop slotsync.c for details.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase &&
+		MyReplicationSlot->data.restart_lsn < MyReplicationSlot->data.two_phase_at)
+		ereport(ERROR,
+				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				errmsg("cannot enable failover for a two-phase enabled replication slot due to unconsumed changes"),
+				errdetail("The slot need to consume change upto %X/%X to enable failover.",
+						  LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..36c3b73677e 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -106,6 +106,7 @@ $oldpub->start;
 $oldpub->safe_psql(
 	'postgres', qq[
 		CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+		CREATE TABLE tbl2 AS SELECT generate_series(1, 10) AS a;
 		SELECT pg_replication_slot_advance('test_slot2', pg_current_wal_lsn());
 		SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
 ]);
@@ -162,7 +163,8 @@ $oldpub->safe_psql(
 	'postgres', qq[
 	SELECT * FROM pg_drop_replication_slot('test_slot1');
 	SELECT * FROM pg_drop_replication_slot('test_slot2');
-	CREATE PUBLICATION regress_pub FOR ALL TABLES;
+	CREATE PUBLICATION regress_pub FOR TABLE tbl;
+	CREATE PUBLICATION regress_pub2 FOR TABLE tbl2;
 ]);
 
 # Initialize subscriber cluster
@@ -173,18 +175,23 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE TABLE tbl2 (a int);
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true');
+	CREATE SUBSCRIPTION regress_sub2 CONNECTION '$old_connstr' PUBLICATION regress_pub2 WITH (failover = 'true');
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
+$sub->wait_for_subscription_sync($oldpub, 'regress_sub2');
 
 # Also wait for two-phase to be enabled
-my $twophase_query =
-  "SELECT count(1) = 0 FROM pg_subscription WHERE subtwophasestate NOT IN ('e');";
-$sub->poll_query_until('postgres', $twophase_query)
-  or die "Timed out while waiting for subscriber to enable twophase";
+ok( $sub->poll_query_until(
+		'postgres',
+		qq[SELECT subtwophasestate FROM pg_subscription WHERE subname = 'regress_sub';],
+		'e',),
+	"Timed out while waiting for subscriber to enable twophase");
 
 # 2. Temporarily disable the subscription
 $sub->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+$sub->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub2 DISABLE");
 $oldpub->stop;
 
 # pg_upgrade should be successful
@@ -193,10 +200,17 @@ command_ok([@pg_upgrade_cmd], 'run of pg_upgrade of old cluster');
 # Check that the slot 'regress_sub' has migrated to the new cluster
 $newpub->start;
 my $result = $newpub->safe_psql('postgres',
-	"SELECT slot_name, two_phase, failover FROM pg_replication_slots");
-is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster');
+	"SELECT slot_name, two_phase FROM pg_replication_slots where slot_name = 'regress_sub'"
+);
+is($result, qq(regress_sub|t), 'check the slot exists on new cluster');
+
+# Check that the slot 'regress_sub2' has migrated to the new cluster
+$result = $newpub->safe_psql('postgres',
+	"SELECT slot_name, failover FROM pg_replication_slots where slot_name = 'regress_sub2'"
+);
+is($result, qq(regress_sub2|t), 'check the slot exists on new cluster');
 
-# Update the connection
+# Update the connection for regress_sub
 my $new_connstr = $newpub->connstr . ' dbname=postgres';
 $sub->safe_psql(
 	'postgres', qq[
@@ -204,13 +218,27 @@ $sub->safe_psql(
 	ALTER SUBSCRIPTION regress_sub ENABLE;
 ]);
 
+# Update the connection for regress_sub2
+$sub->safe_psql(
+	'postgres', qq[
+	ALTER SUBSCRIPTION regress_sub2 CONNECTION '$new_connstr';
+	ALTER SUBSCRIPTION regress_sub2 ENABLE;
+]);
+
 # Check whether changes on the new publisher get replicated to the subscriber
-$newpub->safe_psql('postgres',
-	"INSERT INTO tbl VALUES (generate_series(11, 20))");
+$newpub->safe_psql(
+	'postgres',
+	"INSERT INTO tbl VALUES (generate_series(11, 20));
+	INSERT INTO tbl2 VALUES (generate_series(11, 20));");
+
 $newpub->wait_for_catchup('regress_sub');
 $result = $sub->safe_psql('postgres', "SELECT count(*) FROM tbl");
 is($result, qq(20), 'check changes are replicated to the sub');
 
+$newpub->wait_for_catchup('regress_sub2');
+$result = $sub->safe_psql('postgres', "SELECT count(*) FROM tbl2");
+is($result, qq(20), 'check changes are replicated to the sub2');
+
 # Clean up
 $sub->stop();
 $newpub->stop();
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..2a915c8d610 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -98,6 +98,122 @@ my ($result, $stdout, $stderr) = $subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
 	"altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option can be enabled for a two_phase enabled
+# subscription only through Alter Subscription (failover=true).
+##################################################
+
+# Create a table on both publisher and subscriber to complete subscription
+# initialization, allowing two_phase to transition from 'pending' to 'enabled'.
+$publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+$subscriber1->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
+# Test that the failover option cannot be set for a slot with
+# restart_lsn < two_phase_at
+
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub2', 'pgoutput', false, true, false);"
+);
+
+# Try to enable failover of the slot
+($result, $stdout, $stderr) = $publisher->psql(
+	'postgres',
+	qq[ALTER_REPLICATION_SLOT regress_mysub2 (failover);],
+	replication => 'database');
+ok( $stderr =~
+	  qr/ERROR:  cannot enable failover for a two-phase enabled replication slot due to unconsumed changes/,
+	"failover can't be enabled if restart_lsn < two_phase_at on a two_phase subscription."
+);
+
+# Create a subscription with two_phase enabled, using above slot
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, enabled = false, two_phase = true);"
+);
+
+# Try to enable failover for the subscription with two_phase in pending state
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+ok( $stderr =~
+	  /ERROR:  cannot enable failover for a subscription with a pending two_phase state
+.*HINT:  The "failover" option can be enabled after "two_phase" state is ready./,
+	"enabling failover is not allowed for a two_phase pending subscription");
+
+# Enable the subscription
+$subscriber1->safe_psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 ENABLE;");
+
+# Wait for the subscription to be in sync
+$subscriber1->wait_for_subscription_sync($publisher, 'regress_mysub2');
+
+# Also wait for two-phase to be enabled
+ok( $subscriber1->poll_query_until(
+		'postgres',
+		qq[SELECT subtwophasestate FROM pg_subscription WHERE subname = 'regress_mysub2';],
+		'e',),
+	"Timed out while waiting for subscriber to enable twophase");
+
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$publisher->safe_psql(
+	'postgres', qq(
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+));
+
+# Wait for the subscription to be in sync
+$subscriber1->wait_for_subscription_sync($publisher, 'regress_mysub2');
+
+# Alter subscription to enable failover
+$subscriber1->psql(
+	'postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+	 ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $publisher->safe_psql(
+		'postgres',
+		q{SELECT failover from pg_replication_slots WHERE slot_name = 'regress_mysub2';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub2;");
+
+# Test that subscription creation with two_phase=true results in error when
+# using a slot with failover set to true.
+
+# Create a slot with failover enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub3', 'pgoutput', false, false, true);"
+);
+
+my $logoffset = -s $subscriber1->logfile;
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub3 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, two_phase = true);"
+);
+
+ok( $subscriber1->wait_for_log(
+		qr/cannot enable two-phase decoding for failover enabled slot "regress_mysub3"/,
+		$logoffset),
+	"creating subscription with two_phase=true is not allowed when the slot has failover set to true"
+);
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub3;");
+
+# Clean up the tables created
+$publisher->safe_psql('postgres', "DROP TABLE tab_full;");
+$subscriber1->safe_psql('postgres', "DROP TABLE tab_full;");
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby server.
 ##################################################
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..8d4d55c802d 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,10 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  cannot enable both "failover" and "two_phase" options at CREATE SUBSCRIPTION
+HINT:  Use ALTER SUBSCRIPTION ... SET (failover = true) to enable "failover" after two_phase state is ready
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.34.1

#77Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Nisha Moond (#76)
Re: Fix slot synchronization with two_phase decoding enabled

On Sun, May 25, 2025 at 11:34 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

to
On Fri, May 23, 2025 at 10:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Here are review comments for v14 patch:

Thank you for the review.

I think we need to include a basic test case where we simply create a
subscription with two_phase=true and then enable the failover via
ALTER SUBSCRIPTION after making sure that slot's restart_lsn passed
two_phase_at. Probably if we use this case in 003_logical_slots.pl, we
can avoid creating regress_sub2?

A test has been added in 040_standby_failover_slots_sync.pl to enable
failover via ALTER SUBSCRIPTION.

Yes but the slot is originally created via SQL API, which seems
uncommon usage in practice. I thought it would be good to have the
basic steps in the tests to enable both fields.

Regarding regress_sub2 in 003_logical_slots.pl :
If we use ALTER SUBSCRIPTION to set failover=true on regress_sub, it
leads to pg_upgrade failure, as it attempts to create a slot on the
new_node(upgraded version) with both two_phase and failover enabled,
which is an unsupported combination.

I think that the pg_upgrade test should cover the case where it
restores logical slots with both fields enabled in the new cluster.
When I actually tried this case, I realized that pg_upgrade doesn't
handle this case; it tried to create the logical slot via SQL API but
it failed as we don't allow it to create a logical slot with enabling
both two_phase and failover. How can we handle this case?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#78Nisha Moond
nisha.moond412@gmail.com
In reply to: Masahiko Sawada (#77)
1 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

On Wed, May 28, 2025 at 6:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, May 25, 2025 at 11:34 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

to
On Fri, May 23, 2025 at 10:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Here are review comments for v14 patch:

Thank you for the review.

I think we need to include a basic test case where we simply create a
subscription with two_phase=true and then enable the failover via
ALTER SUBSCRIPTION after making sure that slot's restart_lsn passed
two_phase_at. Probably if we use this case in 003_logical_slots.pl, we
can avoid creating regress_sub2?

A test has been added in 040_standby_failover_slots_sync.pl to enable
failover via ALTER SUBSCRIPTION.

Yes but the slot is originally created via SQL API, which seems
uncommon usage in practice. I thought it would be good to have the
basic steps in the tests to enable both fields.

Apologies for the earlier misunderstanding. I've updated
040_standby_failover_slots_sync.pl to modify the slot creation via the
CREATE SUBSCRIPTION command as suggested.

Regarding regress_sub2 in 003_logical_slots.pl :
If we use ALTER SUBSCRIPTION to set failover=true on regress_sub, it
leads to pg_upgrade failure, as it attempts to create a slot on the
new_node(upgraded version) with both two_phase and failover enabled,
which is an unsupported combination.

I think that the pg_upgrade test should cover the case where it
restores logical slots with both fields enabled in the new cluster.
When I actually tried this case, I realized that pg_upgrade doesn't
handle this case; it tried to create the logical slot via SQL API but
it failed as we don't allow it to create a logical slot with enabling
both two_phase and failover. How can we handle this case?

On further analysis, it seems feasible and safe to allow creation of
slot and subscription with both two_phase and failover enabled during
upgrade (pg_upgrade).
As before starting the upgrade, pg_upgrade ensures that -
a) All slot changes have been fully consumed.
b) No prepared transactions exist.
Additionally, it is also documented[1]https://www.postgresql.org/docs/current/pgupgrade.html - "Obviously, no one should be
accessing the clusters during the upgrade."

Though, if allowed, the slot may be created with a restart_lsn <
two_phase_at. IIUC, the guarantees above make it impossible for a
prepared transaction to exist before two_phase_at, thus preventing the
bug case.
~~~~

Attached is the v16 patch, which includes the following changes:
- Enabled creation of slots and subscriptions with both two_phase and
failover options when in binary upgrade mode.
- Updated 003_logical_slots.pl to configure regress_sub's failover
option using ALTER SUBSCRIPTION, removing the need for the additional
regress_sub2 subscription.

[1]: https://www.postgresql.org/docs/current/pgupgrade.html

Thanks & Regards,
Nisha Moond

Attachments:

v16-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchapplication/x-patch; name=v16-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchDownload
From 6284430ae19c53180472239ef095798c28ca9475 Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Fri, 9 May 2025 11:09:23 +0530
Subject: [PATCH v16] PG17 Approach 3 Fix slot synchronization for two_phase
 enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To prevent the risk of losing prepared transactions, we disallow enabling
both failover and twophase during slot creation, but permits altering
failover to true once ensured that slot's restart_lsn > two_phase_at.

The fix enforces the following conditions:
1) Always disallow creating slots with two_phase=true and failover=true.
2) Always disallow creating subscriptions with (two_phase=true, failover=true).
3) Prevent altering the slot's failover to true if two_phase=true and restart_lsn
   is less than two_phase_at. Otherwise, allow changing failover to true.
4) Disallow altering slot's failover to true when two_phase state is 'pending'.
   User can try altering failover again when two_phase state is moved beyond 'pending'.
---
 contrib/test_decoding/expected/slot.out       |   2 +
 contrib/test_decoding/sql/slot.sql            |   1 +
 doc/src/sgml/func.sgml                        |  13 ++
 doc/src/sgml/protocol.sgml                    |  17 +++
 doc/src/sgml/ref/alter_subscription.sgml      |  12 ++
 doc/src/sgml/ref/create_subscription.sgml     |  17 +++
 src/backend/commands/subscriptioncmds.c       |  29 ++++
 src/backend/replication/logical/logical.c     |  11 ++
 src/backend/replication/logical/slotsync.c    |  10 ++
 src/backend/replication/slot.c                |  29 ++++
 src/bin/pg_upgrade/t/003_logical_slots.pl     |  32 ++++-
 .../t/040_standby_failover_slots_sync.pl      | 124 ++++++++++++++++++
 src/test/regress/expected/subscription.out    |   4 +
 src/test/regress/sql/subscription.sql         |   4 +
 14 files changed, 304 insertions(+), 1 deletion(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..87b28ad8d55 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  cannot enable both "failover" and "two_phase" options during replication slot creation
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 697c1a02891..45504cf14d5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29076,6 +29076,19 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         failover. A call to this function has the same effect as
         the replication protocol command
         <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
+
+        <note>
+         <para>
+          The parameters <parameter>twophase</parameter> and
+          <parameter>failover</parameter> cannot be enabled together when
+          creating a replication slot. However, a slot created with
+          <parameter>twophase</parameter> enabled can later have
+          <parameter>failover</parameter> set to true either using
+          <xref linkend="protocol-replication-alter-replication-slot"/> or via
+          <xref linkend="sql-altersubscription"/> if a subscription is using
+          this slot.
+         </para>
+        </note>
        </para></entry>
       </row>
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0396d3a9e98..2c2d66fad35 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2062,6 +2062,15 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           <literal>PREPARE TRANSACTION</literal> time.
           The default is false.
          </para>
+
+         <para>
+          This option cannot be set together with <literal>failover</literal>
+          when creating a logical replication slot. However, once the slot is
+          created with <literal>two_phase = true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all the changes up to the point where two-phase decoding
+          was enabled.
+         </para>
         </listitem>
        </varlistentry>
 
@@ -2103,6 +2112,14 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           so that logical replication can be resumed after failover.
           The default is false.
          </para>
+
+         <para>
+          This option cannot be set together with <literal>two_phase</literal>
+          when creating the slot. However, once the slot is created with
+          <literal>two_phase = true</literal>, <literal>failover</literal> can
+          be set to true after the slot has consumed all the changes up to the
+          point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 2ccbf5e4897..afa29d4049b 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -257,6 +257,18 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
       <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
       option is enabled.
      </para>
+
+     <para>
+      When a subscription's
+      <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+      is in the pending state, setting <literal>failover</literal> to true is not
+      permitted. Once
+      <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+      is enabled,
+      <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
+      can be set to true only after the slot has consumed all the changes up to the
+      point where two-phase decoding was enabled.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index c9c8dd440dc..55b20893e80 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -354,6 +354,14 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           to know the actual two-phase state.
          </para>
 
+         <para>
+          This option cannot be enabled together with
+          <literal>failover</literal> when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true using
+          <xref linkend="sql-altersubscription"/>, once the slot has consumed
+          all the changes up to the point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
 
@@ -426,6 +434,15 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           replication can be resumed from the new primary after failover.
           The default is <literal>false</literal>.
          </para>
+
+         <para>
+          This option cannot be enabled together with
+          <literal>two_phase</literal> when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true using
+          <xref linkend="sql-altersubscription"/>, once the slot has consumed
+          all the changes up to the point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist></para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..c66ae6219e9 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,22 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options
+	 * together. See comments atop slotsync.c for details.
+	 *
+	 * However, we allow this combination in binary upgrade mode, where
+	 * pg_upgrade guarantees that all slot changes are consumed and no
+	 * prepared transactions exist.
+	 */
+	if (opts.twophase && opts.failover && !IsBinaryUpgrade)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable both \"%s\" and \"%s\" options at CREATE SUBSCRIPTION",
+					   "failover", "two_phase"),
+				errhint("Use ALTER SUBSCRIPTION ... SET (failover = true) to enable \"%s\" after two_phase state is ready",
+						"failover"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1261,19 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable the failover until
+					 * two_phase state reaches READY state. See comments atop
+					 * slotsync.c for details.
+					 */
+					if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+								errmsg("cannot enable failover for a subscription with a pending two_phase state"),
+								errhint("The \"%s\" option can be enabled after \"%s\" state is ready.",
+										"failover", "two_phase"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 97b6aa899ee..1beb7c2eb56 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -613,6 +613,17 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 	/* Mark slot to allow two_phase decoding if not already marked */
 	if (ctx->twophase && !slot->data.two_phase)
 	{
+		/*
+		 * Do not allow two-phase decoding for failover enabled slots.
+		 *
+		 * See comments atop slotsync.c for details.
+		 */
+		if (slot->data.failover)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+							NameStr(slot->data.name))));
+
 		SpinLockAcquire(&slot->mutex);
 		slot->data.two_phase = true;
 		slot->data.two_phase_at = start_lsn;
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 73fe3f56af2..011c244ac35 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -42,6 +42,16 @@
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
+ *
+ * Regarding two-phase enabled slots, we need to note that the two_phase_at
+ * field of a slot is not synchronized. In the absence of two_phase_at,
+ * logical decoding could incorrectly identify prepared transactions as
+ * having been replicated to the subscriber after promotion, resulting in
+ * their omission. Consequently, both two_phase and failover features
+ * cannot be simultaneously enabled during slot creation. Failover can only
+ * be enabled once we can ensure that no transactions were prepared prior
+ * to two_phase_at on the primary. These restrictions are enforced through
+ * checks in ReplicationSlotCreate() and ReplicationSlotAlter().
  *---------------------------------------------------------------------------
  */
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..3fea22cbc74 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,22 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable both failover and two_phase for slots.
+		 * Please see the comments atop slotsync.c for details.
+		 *
+		 * However, both failover and two_phase enabled slots can be created
+		 * during slot synchronization because we need to retain the same
+		 * values as the remote slot. This is also allowed in binary upgrade
+		 * mode, where pg_upgrade guarantees that all slot changes are
+		 * consumed and no prepared transactions exist.
+		 */
+		if (two_phase && !IsSyncingReplicationSlots() && !IsBinaryUpgrade)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot enable both \"%s\" and \"%s\" options during replication slot creation",
+						   "failover", "two_phase"));
 	}
 
 	/*
@@ -848,6 +864,19 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot if
+	 * there are potentially un-decoded transactions that are prepared before
+	 * two_phase_at. See comments atop slotsync.c for details.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase &&
+		MyReplicationSlot->data.restart_lsn < MyReplicationSlot->data.two_phase_at)
+		ereport(ERROR,
+				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				errmsg("cannot enable failover for a two-phase enabled replication slot due to unconsumed changes"),
+				errdetail("The slot need to consume change upto %X/%X to enable failover.",
+						  LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..76b521b51bd 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -173,7 +173,7 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
@@ -183,8 +183,38 @@ my $twophase_query =
 $sub->poll_query_until('postgres', $twophase_query)
   or die "Timed out while waiting for subscriber to enable twophase";
 
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$oldpub->safe_psql(
+	'postgres', qq(
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+));
+
+# Wait for the subscription to be in sync
+$sub->wait_for_subscription_sync($oldpub, 'regress_sub');
+
 # 2. Temporarily disable the subscription
 $sub->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+
+# Alter subscription to enable failover
+$sub->psql('postgres',
+	"ALTER SUBSCRIPTION regress_sub SET (failover = true);");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $oldpub->safe_psql(
+		'postgres', q{
+		SELECT failover from pg_replication_slots WHERE slot_name = 'regress_sub';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
 $oldpub->stop;
 
 # pg_upgrade should be successful
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..0c80a40a9a8 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -98,6 +98,130 @@ my ($result, $stdout, $stderr) = $subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
 	"altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option can be enabled for a two_phase enabled
+# subscription only through Alter Subscription (failover=true).
+##################################################
+
+# Create a table on both publisher and subscriber to complete subscription
+# initialization, allowing two_phase to transition from 'pending' to 'enabled'.
+$publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+$subscriber1->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
+# Test that the failover option cannot be set for a slot with
+# restart_lsn < two_phase_at
+
+# Create an logical replication slot with two_phase enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('logical_slot', 'pgoutput', false, true, false);"
+);
+
+# Try to enable failover of the slot
+($result, $stdout, $stderr) = $publisher->psql(
+	'postgres',
+	qq[ALTER_REPLICATION_SLOT logical_slot (failover);],
+	replication => 'database');
+ok( $stderr =~
+	  qr/ERROR:  cannot enable failover for a two-phase enabled replication slot due to unconsumed changes/,
+	"failover can't be enabled if restart_lsn < two_phase_at on a two_phase subscription."
+);
+
+# Drop the logical_slot
+$publisher->psql('postgres',
+	"SELECT pg_drop_replication_slot('logical_slot');");
+
+# Test that the failover option can be set for a two_phase enabled
+# subscription using ALTER SUBSCRIPTION.
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (enabled = false, two_phase = true);"
+);
+
+# Try to enable failover for the subscription with two_phase in pending state
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+ok( $stderr =~
+	  /ERROR:  cannot enable failover for a subscription with a pending two_phase state
+.*HINT:  The "failover" option can be enabled after "two_phase" state is ready./,
+	"enabling failover is not allowed for a two_phase pending subscription");
+
+# Enable the subscription
+$subscriber1->safe_psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 ENABLE;");
+
+# Wait for the subscription to be in sync
+$subscriber1->wait_for_subscription_sync($publisher, 'regress_mysub2');
+
+# Also wait for two-phase to be enabled
+ok( $subscriber1->poll_query_until(
+		'postgres',
+		qq[SELECT subtwophasestate FROM pg_subscription WHERE subname = 'regress_mysub2';],
+		'e',),
+	"Timed out while waiting for subscriber to enable twophase");
+
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$publisher->safe_psql(
+	'postgres', qq(
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+));
+
+# Wait for the subscription to be in sync
+$subscriber1->wait_for_subscription_sync($publisher, 'regress_mysub2');
+
+# Alter subscription to enable failover
+$subscriber1->psql(
+	'postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+	 ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $publisher->safe_psql(
+		'postgres',
+		q{SELECT failover from pg_replication_slots WHERE slot_name = 'regress_mysub2';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub2;");
+
+# Test that subscription creation with two_phase=true results in error when
+# using a slot with failover set to true.
+
+# Create a slot with failover enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub3', 'pgoutput', false, false, true);"
+);
+
+my $logoffset = -s $subscriber1->logfile;
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub3 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, two_phase = true);"
+);
+
+ok( $subscriber1->wait_for_log(
+		qr/cannot enable two-phase decoding for failover enabled slot "regress_mysub3"/,
+		$logoffset),
+	"creating subscription with two_phase=true is not allowed when the slot has failover set to true"
+);
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub3;");
+
+# Clean up the tables created
+$publisher->safe_psql('postgres', "DROP TABLE tab_full;");
+$subscriber1->safe_psql('postgres', "DROP TABLE tab_full;");
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby server.
 ##################################################
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..8d4d55c802d 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,10 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  cannot enable both "failover" and "two_phase" options at CREATE SUBSCRIPTION
+HINT:  Use ALTER SUBSCRIPTION ... SET (failover = true) to enable "failover" after two_phase state is ready
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.34.1

#79Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Nisha Moond (#78)
Re: Fix slot synchronization with two_phase decoding enabled

On Thu, May 29, 2025 at 2:00 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

On Wed, May 28, 2025 at 6:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, May 25, 2025 at 11:34 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

to
On Fri, May 23, 2025 at 10:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Here are review comments for v14 patch:

Thank you for the review.

I think we need to include a basic test case where we simply create a
subscription with two_phase=true and then enable the failover via
ALTER SUBSCRIPTION after making sure that slot's restart_lsn passed
two_phase_at. Probably if we use this case in 003_logical_slots.pl, we
can avoid creating regress_sub2?

A test has been added in 040_standby_failover_slots_sync.pl to enable
failover via ALTER SUBSCRIPTION.

Yes but the slot is originally created via SQL API, which seems
uncommon usage in practice. I thought it would be good to have the
basic steps in the tests to enable both fields.

Apologies for the earlier misunderstanding. I've updated
040_standby_failover_slots_sync.pl to modify the slot creation via the
CREATE SUBSCRIPTION command as suggested.

Regarding regress_sub2 in 003_logical_slots.pl :
If we use ALTER SUBSCRIPTION to set failover=true on regress_sub, it
leads to pg_upgrade failure, as it attempts to create a slot on the
new_node(upgraded version) with both two_phase and failover enabled,
which is an unsupported combination.

I think that the pg_upgrade test should cover the case where it
restores logical slots with both fields enabled in the new cluster.
When I actually tried this case, I realized that pg_upgrade doesn't
handle this case; it tried to create the logical slot via SQL API but
it failed as we don't allow it to create a logical slot with enabling
both two_phase and failover. How can we handle this case?

On further analysis, it seems feasible and safe to allow creation of
slot and subscription with both two_phase and failover enabled during
upgrade (pg_upgrade).
As before starting the upgrade, pg_upgrade ensures that -
a) All slot changes have been fully consumed.
b) No prepared transactions exist.
Additionally, it is also documented[1] - "Obviously, no one should be
accessing the clusters during the upgrade."

Though, if allowed, the slot may be created with a restart_lsn <
two_phase_at. IIUC, the guarantees above make it impossible for a
prepared transaction to exist before two_phase_at, thus preventing the
bug case.

True, but I think we need to cover the simple pg_dump and pg_restore
case too, without pg_upgrade. Otherwise the dump file won't work if
the database has at least one subscription with two_phase and failover
enabled.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#80Nisha Moond
nisha.moond412@gmail.com
In reply to: Masahiko Sawada (#79)
Re: Fix slot synchronization with two_phase decoding enabled

On Fri, May 30, 2025 at 12:55 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, May 29, 2025 at 2:00 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

On Wed, May 28, 2025 at 6:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, May 25, 2025 at 11:34 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

to
On Fri, May 23, 2025 at 10:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Here are review comments for v14 patch:

Thank you for the review.

I think we need to include a basic test case where we simply create a
subscription with two_phase=true and then enable the failover via
ALTER SUBSCRIPTION after making sure that slot's restart_lsn passed
two_phase_at. Probably if we use this case in 003_logical_slots.pl, we
can avoid creating regress_sub2?

A test has been added in 040_standby_failover_slots_sync.pl to enable
failover via ALTER SUBSCRIPTION.

Yes but the slot is originally created via SQL API, which seems
uncommon usage in practice. I thought it would be good to have the
basic steps in the tests to enable both fields.

Apologies for the earlier misunderstanding. I've updated
040_standby_failover_slots_sync.pl to modify the slot creation via the
CREATE SUBSCRIPTION command as suggested.

Regarding regress_sub2 in 003_logical_slots.pl :
If we use ALTER SUBSCRIPTION to set failover=true on regress_sub, it
leads to pg_upgrade failure, as it attempts to create a slot on the
new_node(upgraded version) with both two_phase and failover enabled,
which is an unsupported combination.

I think that the pg_upgrade test should cover the case where it
restores logical slots with both fields enabled in the new cluster.
When I actually tried this case, I realized that pg_upgrade doesn't
handle this case; it tried to create the logical slot via SQL API but
it failed as we don't allow it to create a logical slot with enabling
both two_phase and failover. How can we handle this case?

On further analysis, it seems feasible and safe to allow creation of
slot and subscription with both two_phase and failover enabled during
upgrade (pg_upgrade).
As before starting the upgrade, pg_upgrade ensures that -
a) All slot changes have been fully consumed.
b) No prepared transactions exist.
Additionally, it is also documented[1] - "Obviously, no one should be
accessing the clusters during the upgrade."

Though, if allowed, the slot may be created with a restart_lsn <
two_phase_at. IIUC, the guarantees above make it impossible for a
prepared transaction to exist before two_phase_at, thus preventing the
bug case.

True, but I think we need to cover the simple pg_dump and pg_restore
case too, without pg_upgrade. Otherwise the dump file won't work if
the database has at least one subscription with two_phase and failover
enabled.

Agree that we need to cover the simple pg_dump and pg_restore with the patch.

When pg_dump and pg_restore are used outside of pg_upgrade, there's no
guarantee that the target system does not have any prepared
transactions. In such cases, restoring a subscription with both
two_phase and failover enabled could lead to the bug, so we should
avoid allowing both options via pg_restore.

Here are a few possible solutions:

1) Split the CREATE SUBSCRIPTION command of such subscriptions in pg_restore :
first create the sub with two_phase:
- CREATE SUBSCRIPTION ... with (two_phase = on)
then enable failover
- ALTER SUBSCRIPTION ... with (failover=true)

However, this approach adds complexity in pg_restore and may still
fail if the slot's restart_lsn is earlier than two_phase_at.

2) Prevent dump of subscription in non-upgrade mode:
Raise an error/warning during pg_dump when a subscription with both
two_phase and failover is encountered (unless in binary upgrade mode).
That is, either fail the pg_dump, or skip dumping subscriptions in
such a case with a warning.
We can include a helpful hint:
"Disable failover for subscription <sub-name> before dumping and
re-enable it after restore."

3) Dump such subscriptions with (two_phase=on, failover=on,
create_slot=false) together. As pg_dump always sets "connect=false"
for subscription, it is up to users to reactivate the subscription
suitably. In this case, we can add some document to warn that it's the
user's responsibility to ensure the remote slot is in a safe state.

What do you think?

Please let me know if you have any other suggestions for handling this
in pg_dump/pg_restore.

--

Thanks & Regards,
Nisha Moond

#81Amit Kapila
amit.kapila16@gmail.com
In reply to: Nisha Moond (#80)
Re: Fix slot synchronization with two_phase decoding enabled

On Fri, May 30, 2025 at 3:00 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Agree that we need to cover the simple pg_dump and pg_restore with the patch.

When pg_dump and pg_restore are used outside of pg_upgrade, there's no
guarantee that the target system does not have any prepared
transactions. In such cases, restoring a subscription with both
two_phase and failover enabled could lead to the bug, so we should
avoid allowing both options via pg_restore.

Here are a few possible solutions:

1) Split the CREATE SUBSCRIPTION command of such subscriptions in pg_restore :
first create the sub with two_phase:
- CREATE SUBSCRIPTION ... with (two_phase = on)
then enable failover
- ALTER SUBSCRIPTION ... with (failover=true)

This won't work because we always dump a subscription with the
'connect' option as 'false'. As changing the subscription's failover
option, needs the connection above split won't work.

However, this approach adds complexity in pg_restore and may still
fail if the slot's restart_lsn is earlier than two_phase_at.

2) Prevent dump of subscription in non-upgrade mode:
Raise an error/warning during pg_dump when a subscription with both
two_phase and failover is encountered (unless in binary upgrade mode).
That is, either fail the pg_dump, or skip dumping subscriptions in
such a case with a warning.
We can include a helpful hint:
"Disable failover for subscription <sub-name> before dumping and
re-enable it after restore."

3) Dump such subscriptions with (two_phase=on, failover=on,
create_slot=false) together. As pg_dump always sets "connect=false"
for subscription, it is up to users to reactivate the subscription
suitably. In this case, we can add some document to warn that it's the
user's responsibility to ensure the remote slot is in a safe state.

What do you think?

Please let me know if you have any other suggestions for handling this
in pg_dump/pg_restore.

Yet another idea is to dump the subscription with two_phase = on and
failover = false. We should do this when both options are 'true'
during the dump. As we are documenting that we always dump
createsubscription with connect as false and let users take care (see
[1]: https://www.postgresql.org/docs/devel/app-pgdump.html
reasoning could be given for the failover flag.

The one more combination to consider is when someone takes a dump of
an older version and loads it into a newer version. For example, where
users dump from 17.5 and then restore in a newer version, say 17.6
(which has our fix), the restore will fail due to newer restrictions
added by this patch. Do we need to do anything about it?

[1]: https://www.postgresql.org/docs/devel/app-pgdump.html

--
With Regards,
Amit Kapila.

#82Nisha Moond
nisha.moond412@gmail.com
In reply to: Amit Kapila (#81)
2 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

On Mon, Jun 2, 2025 at 10:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, May 30, 2025 at 3:00 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Agree that we need to cover the simple pg_dump and pg_restore with the patch.

When pg_dump and pg_restore are used outside of pg_upgrade, there's no
guarantee that the target system does not have any prepared
transactions. In such cases, restoring a subscription with both
two_phase and failover enabled could lead to the bug, so we should
avoid allowing both options via pg_restore.

Here are a few possible solutions:

1) Split the CREATE SUBSCRIPTION command of such subscriptions in pg_restore :
first create the sub with two_phase:
- CREATE SUBSCRIPTION ... with (two_phase = on)
then enable failover
- ALTER SUBSCRIPTION ... with (failover=true)

This won't work because we always dump a subscription with the
'connect' option as 'false'. As changing the subscription's failover
option, needs the connection above split won't work.

However, this approach adds complexity in pg_restore and may still
fail if the slot's restart_lsn is earlier than two_phase_at.

2) Prevent dump of subscription in non-upgrade mode:
Raise an error/warning during pg_dump when a subscription with both
two_phase and failover is encountered (unless in binary upgrade mode).
That is, either fail the pg_dump, or skip dumping subscriptions in
such a case with a warning.
We can include a helpful hint:
"Disable failover for subscription <sub-name> before dumping and
re-enable it after restore."

3) Dump such subscriptions with (two_phase=on, failover=on,
create_slot=false) together. As pg_dump always sets "connect=false"
for subscription, it is up to users to reactivate the subscription
suitably. In this case, we can add some document to warn that it's the
user's responsibility to ensure the remote slot is in a safe state.

What do you think?

Please let me know if you have any other suggestions for handling this
in pg_dump/pg_restore.

Yet another idea is to dump the subscription with two_phase = on and
failover = false. We should do this when both options are 'true'
during the dump. As we are documenting that we always dump
createsubscription with connect as false and let users take care (see
[1] (When dumping logical replication subscriptions ...)), a similar
reasoning could be given for the failover flag.

+1

The one more combination to consider is when someone takes a dump of
an older version and loads it into a newer version. For example, where
users dump from 17.5 and then restore in a newer version, say 17.6
(which has our fix), the restore will fail due to newer restrictions
added by this patch. Do we need to do anything about it?

[1] : https://www.postgresql.org/docs/devel/app-pgdump.html

~~~

Attached v17 patches. Added a top-up patch 0002 implementing the idea
suggested by Amit above.

Thanks,
Nisha

Attachments:

v17-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchapplication/octet-stream; name=v17-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchDownload
From 7b4dcc9eea86bd8c9c837da1e10b8941324d16fd Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Tue, 3 Jun 2025 10:42:53 +0530
Subject: [PATCH v17 1/2] PG17 Approach 3 Fix slot synchronization for
 two_phase enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To prevent the risk of losing prepared transactions, we disallow enabling
both failover and twophase during slot creation, but permits altering
failover to true once ensured that slot's restart_lsn > two_phase_at.

The fix enforces the following conditions:
1) Always disallow creating slots with two_phase=true and failover=true.
2) Always disallow creating subscriptions with (two_phase=true, failover=true).
3) Prevent altering the slot's failover to true if two_phase=true and restart_lsn
   is less than two_phase_at. Otherwise, allow changing failover to true.
4) Disallow altering slot's failover to true when two_phase state is 'pending'.
   User can try altering failover again when two_phase state is moved beyond 'pending'.
---
 contrib/test_decoding/expected/slot.out       |   2 +
 contrib/test_decoding/sql/slot.sql            |   1 +
 doc/src/sgml/func.sgml                        |  13 ++
 doc/src/sgml/protocol.sgml                    |  17 +++
 doc/src/sgml/ref/alter_subscription.sgml      |  12 ++
 doc/src/sgml/ref/create_subscription.sgml     |  17 +++
 src/backend/commands/subscriptioncmds.c       |  29 ++++
 src/backend/replication/logical/logical.c     |  11 ++
 src/backend/replication/logical/slotsync.c    |  10 ++
 src/backend/replication/slot.c                |  29 ++++
 src/bin/pg_upgrade/t/003_logical_slots.pl     |  32 ++++-
 .../t/040_standby_failover_slots_sync.pl      | 124 ++++++++++++++++++
 src/test/regress/expected/subscription.out    |   4 +
 src/test/regress/sql/subscription.sql         |   4 +
 14 files changed, 304 insertions(+), 1 deletion(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..87b28ad8d55 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  cannot enable both "failover" and "two_phase" options during replication slot creation
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 697c1a02891..45504cf14d5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29076,6 +29076,19 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         failover. A call to this function has the same effect as
         the replication protocol command
         <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
+
+        <note>
+         <para>
+          The parameters <parameter>twophase</parameter> and
+          <parameter>failover</parameter> cannot be enabled together when
+          creating a replication slot. However, a slot created with
+          <parameter>twophase</parameter> enabled can later have
+          <parameter>failover</parameter> set to true either using
+          <xref linkend="protocol-replication-alter-replication-slot"/> or via
+          <xref linkend="sql-altersubscription"/> if a subscription is using
+          this slot.
+         </para>
+        </note>
        </para></entry>
       </row>
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0396d3a9e98..2c2d66fad35 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2062,6 +2062,15 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           <literal>PREPARE TRANSACTION</literal> time.
           The default is false.
          </para>
+
+         <para>
+          This option cannot be set together with <literal>failover</literal>
+          when creating a logical replication slot. However, once the slot is
+          created with <literal>two_phase = true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all the changes up to the point where two-phase decoding
+          was enabled.
+         </para>
         </listitem>
        </varlistentry>
 
@@ -2103,6 +2112,14 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           so that logical replication can be resumed after failover.
           The default is false.
          </para>
+
+         <para>
+          This option cannot be set together with <literal>two_phase</literal>
+          when creating the slot. However, once the slot is created with
+          <literal>two_phase = true</literal>, <literal>failover</literal> can
+          be set to true after the slot has consumed all the changes up to the
+          point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 2ccbf5e4897..afa29d4049b 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -257,6 +257,18 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
       <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
       option is enabled.
      </para>
+
+     <para>
+      When a subscription's
+      <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+      is in the pending state, setting <literal>failover</literal> to true is not
+      permitted. Once
+      <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+      is enabled,
+      <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
+      can be set to true only after the slot has consumed all the changes up to the
+      point where two-phase decoding was enabled.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index c9c8dd440dc..55b20893e80 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -354,6 +354,14 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           to know the actual two-phase state.
          </para>
 
+         <para>
+          This option cannot be enabled together with
+          <literal>failover</literal> when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true using
+          <xref linkend="sql-altersubscription"/>, once the slot has consumed
+          all the changes up to the point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
 
@@ -426,6 +434,15 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           replication can be resumed from the new primary after failover.
           The default is <literal>false</literal>.
          </para>
+
+         <para>
+          This option cannot be enabled together with
+          <literal>two_phase</literal> when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true using
+          <xref linkend="sql-altersubscription"/>, once the slot has consumed
+          all the changes up to the point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist></para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..c66ae6219e9 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,22 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options
+	 * together. See comments atop slotsync.c for details.
+	 *
+	 * However, we allow this combination in binary upgrade mode, where
+	 * pg_upgrade guarantees that all slot changes are consumed and no
+	 * prepared transactions exist.
+	 */
+	if (opts.twophase && opts.failover && !IsBinaryUpgrade)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable both \"%s\" and \"%s\" options at CREATE SUBSCRIPTION",
+					   "failover", "two_phase"),
+				errhint("Use ALTER SUBSCRIPTION ... SET (failover = true) to enable \"%s\" after two_phase state is ready",
+						"failover"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1261,19 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable the failover until
+					 * two_phase state reaches READY state. See comments atop
+					 * slotsync.c for details.
+					 */
+					if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+								errmsg("cannot enable failover for a subscription with a pending two_phase state"),
+								errhint("The \"%s\" option can be enabled after \"%s\" state is ready.",
+										"failover", "two_phase"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 97b6aa899ee..1beb7c2eb56 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -613,6 +613,17 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 	/* Mark slot to allow two_phase decoding if not already marked */
 	if (ctx->twophase && !slot->data.two_phase)
 	{
+		/*
+		 * Do not allow two-phase decoding for failover enabled slots.
+		 *
+		 * See comments atop slotsync.c for details.
+		 */
+		if (slot->data.failover)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+							NameStr(slot->data.name))));
+
 		SpinLockAcquire(&slot->mutex);
 		slot->data.two_phase = true;
 		slot->data.two_phase_at = start_lsn;
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 73fe3f56af2..011c244ac35 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -42,6 +42,16 @@
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
+ *
+ * Regarding two-phase enabled slots, we need to note that the two_phase_at
+ * field of a slot is not synchronized. In the absence of two_phase_at,
+ * logical decoding could incorrectly identify prepared transactions as
+ * having been replicated to the subscriber after promotion, resulting in
+ * their omission. Consequently, both two_phase and failover features
+ * cannot be simultaneously enabled during slot creation. Failover can only
+ * be enabled once we can ensure that no transactions were prepared prior
+ * to two_phase_at on the primary. These restrictions are enforced through
+ * checks in ReplicationSlotCreate() and ReplicationSlotAlter().
  *---------------------------------------------------------------------------
  */
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..3fea22cbc74 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,22 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable both failover and two_phase for slots.
+		 * Please see the comments atop slotsync.c for details.
+		 *
+		 * However, both failover and two_phase enabled slots can be created
+		 * during slot synchronization because we need to retain the same
+		 * values as the remote slot. This is also allowed in binary upgrade
+		 * mode, where pg_upgrade guarantees that all slot changes are
+		 * consumed and no prepared transactions exist.
+		 */
+		if (two_phase && !IsSyncingReplicationSlots() && !IsBinaryUpgrade)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot enable both \"%s\" and \"%s\" options during replication slot creation",
+						   "failover", "two_phase"));
 	}
 
 	/*
@@ -848,6 +864,19 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot if
+	 * there are potentially un-decoded transactions that are prepared before
+	 * two_phase_at. See comments atop slotsync.c for details.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase &&
+		MyReplicationSlot->data.restart_lsn < MyReplicationSlot->data.two_phase_at)
+		ereport(ERROR,
+				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				errmsg("cannot enable failover for a two-phase enabled replication slot due to unconsumed changes"),
+				errdetail("The slot need to consume change upto %X/%X to enable failover.",
+						  LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..76b521b51bd 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -173,7 +173,7 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
@@ -183,8 +183,38 @@ my $twophase_query =
 $sub->poll_query_until('postgres', $twophase_query)
   or die "Timed out while waiting for subscriber to enable twophase";
 
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$oldpub->safe_psql(
+	'postgres', qq(
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+));
+
+# Wait for the subscription to be in sync
+$sub->wait_for_subscription_sync($oldpub, 'regress_sub');
+
 # 2. Temporarily disable the subscription
 $sub->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+
+# Alter subscription to enable failover
+$sub->psql('postgres',
+	"ALTER SUBSCRIPTION regress_sub SET (failover = true);");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $oldpub->safe_psql(
+		'postgres', q{
+		SELECT failover from pg_replication_slots WHERE slot_name = 'regress_sub';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
 $oldpub->stop;
 
 # pg_upgrade should be successful
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..0c80a40a9a8 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -98,6 +98,130 @@ my ($result, $stdout, $stderr) = $subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
 	"altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option can be enabled for a two_phase enabled
+# subscription only through Alter Subscription (failover=true).
+##################################################
+
+# Create a table on both publisher and subscriber to complete subscription
+# initialization, allowing two_phase to transition from 'pending' to 'enabled'.
+$publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+$subscriber1->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
+# Test that the failover option cannot be set for a slot with
+# restart_lsn < two_phase_at
+
+# Create an logical replication slot with two_phase enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('logical_slot', 'pgoutput', false, true, false);"
+);
+
+# Try to enable failover of the slot
+($result, $stdout, $stderr) = $publisher->psql(
+	'postgres',
+	qq[ALTER_REPLICATION_SLOT logical_slot (failover);],
+	replication => 'database');
+ok( $stderr =~
+	  qr/ERROR:  cannot enable failover for a two-phase enabled replication slot due to unconsumed changes/,
+	"failover can't be enabled if restart_lsn < two_phase_at on a two_phase subscription."
+);
+
+# Drop the logical_slot
+$publisher->psql('postgres',
+	"SELECT pg_drop_replication_slot('logical_slot');");
+
+# Test that the failover option can be set for a two_phase enabled
+# subscription using ALTER SUBSCRIPTION.
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (enabled = false, two_phase = true);"
+);
+
+# Try to enable failover for the subscription with two_phase in pending state
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+ok( $stderr =~
+	  /ERROR:  cannot enable failover for a subscription with a pending two_phase state
+.*HINT:  The "failover" option can be enabled after "two_phase" state is ready./,
+	"enabling failover is not allowed for a two_phase pending subscription");
+
+# Enable the subscription
+$subscriber1->safe_psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 ENABLE;");
+
+# Wait for the subscription to be in sync
+$subscriber1->wait_for_subscription_sync($publisher, 'regress_mysub2');
+
+# Also wait for two-phase to be enabled
+ok( $subscriber1->poll_query_until(
+		'postgres',
+		qq[SELECT subtwophasestate FROM pg_subscription WHERE subname = 'regress_mysub2';],
+		'e',),
+	"Timed out while waiting for subscriber to enable twophase");
+
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$publisher->safe_psql(
+	'postgres', qq(
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+));
+
+# Wait for the subscription to be in sync
+$subscriber1->wait_for_subscription_sync($publisher, 'regress_mysub2');
+
+# Alter subscription to enable failover
+$subscriber1->psql(
+	'postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+	 ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $publisher->safe_psql(
+		'postgres',
+		q{SELECT failover from pg_replication_slots WHERE slot_name = 'regress_mysub2';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub2;");
+
+# Test that subscription creation with two_phase=true results in error when
+# using a slot with failover set to true.
+
+# Create a slot with failover enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub3', 'pgoutput', false, false, true);"
+);
+
+my $logoffset = -s $subscriber1->logfile;
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub3 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, two_phase = true);"
+);
+
+ok( $subscriber1->wait_for_log(
+		qr/cannot enable two-phase decoding for failover enabled slot "regress_mysub3"/,
+		$logoffset),
+	"creating subscription with two_phase=true is not allowed when the slot has failover set to true"
+);
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub3;");
+
+# Clean up the tables created
+$publisher->safe_psql('postgres', "DROP TABLE tab_full;");
+$subscriber1->safe_psql('postgres', "DROP TABLE tab_full;");
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby server.
 ##################################################
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..8d4d55c802d 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,10 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  cannot enable both "failover" and "two_phase" options at CREATE SUBSCRIPTION
+HINT:  Use ALTER SUBSCRIPTION ... SET (failover = true) to enable "failover" after two_phase state is ready
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.34.1

v17-0002-Fix-for-pg_dump.patchapplication/octet-stream; name=v17-0002-Fix-for-pg_dump.patchDownload
From 1e8c88c9fb98e15af3a9ffd1bb02559f8216a5c4 Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Tue, 3 Jun 2025 10:48:00 +0530
Subject: [PATCH v17 2/2] Fix for pg_dump

When there is subscrption where both two_phase and failover are enabled,
generate the CREATE SUBSCRIPTION command only with two_phase = on and
ignore failover i.e. failover will be false(default).
---
 doc/src/sgml/ref/pg_dump.sgml | 13 +++++++++++++
 src/bin/pg_dump/pg_dump.c     | 12 +++++++++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index cfc74ca6d69..ed077ac4159 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1633,6 +1633,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
    had been originally created with <literal>two_phase = true</literal> option.
   </para>
 
+  <para>
+   When dumping logical replication subscriptions where both
+   <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+   and <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
+   are enabled, <application>pg_dump</application> will generate <command>CREATE
+   SUBSCRIPTION</command> commands with only <literal>two_phase = on</literal>
+   option, ignoring the <literal>failover</literal>. This ensures that restoring
+   the subscription does not fail due to the incompatibility of enabling both
+   options at creation time (see <xref linkend="sql-createsubscription"/>).
+   After the restore, users can enable <literal>failover</literal> using
+   <xref linkend="sql-altersubscription"/>.
+  </para>
+
   <para>
    It is generally recommended to use the <option>-X</option>
    (<option>--no-psqlrc</option>) option when restoring a database from a
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 9ed1a856fa3..5c31f30e914 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -5129,6 +5129,7 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo)
 	int			npubnames = 0;
 	int			i;
 	char		two_phase_disabled[] = {LOGICALREP_TWOPHASE_STATE_DISABLED, '\0'};
+	bool		set_failover = true;
 
 	/* Do nothing in data-only dump */
 	if (dopt->dataOnly)
@@ -5174,8 +5175,17 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo)
 		appendPQExpBufferStr(query, ", streaming = parallel");
 
 	if (strcmp(subinfo->subtwophasestate, two_phase_disabled) != 0)
+	{
 		appendPQExpBufferStr(query, ", two_phase = on");
 
+		/*
+		 * The failover option cannot be set togather with two_phase during
+		 * CREATE SUBSCRIPTION. It can be enabled later using ALTER
+		 * SUBSCRIPTION after the subscription is restored.
+		 */
+		set_failover = false;
+	}
+
 	if (strcmp(subinfo->subdisableonerr, "t") == 0)
 		appendPQExpBufferStr(query, ", disable_on_error = true");
 
@@ -5185,7 +5195,7 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo)
 	if (strcmp(subinfo->subrunasowner, "t") == 0)
 		appendPQExpBufferStr(query, ", run_as_owner = true");
 
-	if (strcmp(subinfo->subfailover, "t") == 0)
+	if (set_failover && strcmp(subinfo->subfailover, "t") == 0)
 		appendPQExpBufferStr(query, ", failover = true");
 
 	if (strcmp(subinfo->subsynccommit, "off") != 0)
-- 
2.34.1

#83Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#81)
Re: Fix slot synchronization with two_phase decoding enabled

On Sun, Jun 1, 2025 at 10:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Yet another idea is to dump the subscription with two_phase = on and
failover = false. We should do this when both options are 'true'
during the dump. As we are documenting that we always dump
createsubscription with connect as false and let users take care (see
[1] (When dumping logical replication subscriptions ...)), a similar
reasoning could be given for the failover flag.

It probably would work for subscriptions but what about logical slots?
If we restore two_phase=on and failover=false, the user would have to
enable the failover via ALTER_REPLICATION_SLOT. I think we should
avoid asking users to execute the replication commands directly.

The one more combination to consider is when someone takes a dump of
an older version and loads it into a newer version. For example, where
users dump from 17.5 and then restore in a newer version, say 17.6
(which has our fix), the restore will fail due to newer restrictions
added by this patch. Do we need to do anything about it?

A valid concern. Implementing this change could potentially render
dumps created prior to version 17.5 incompatible with version 17.6 or
later, which seems a significant backwards incompatibility to me. Do
we have any precedence of such incompatibility?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#84Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#83)
Re: Fix slot synchronization with two_phase decoding enabled

On Wed, Jun 4, 2025 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Jun 1, 2025 at 10:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Yet another idea is to dump the subscription with two_phase = on and
failover = false. We should do this when both options are 'true'
during the dump. As we are documenting that we always dump
createsubscription with connect as false and let users take care (see
[1] (When dumping logical replication subscriptions ...)), a similar
reasoning could be given for the failover flag.

It probably would work for subscriptions but what about logical slots?
If we restore two_phase=on and failover=false, the user would have to
enable the failover via ALTER_REPLICATION_SLOT. I think we should
avoid asking users to execute the replication commands directly.

Sorry, I mixed manually creating a logical slot and restoring a dump,
and I think we've already discussed that. Please ignore this comment.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#85Dilip Kumar
dilipbalaut@gmail.com
In reply to: Nisha Moond (#82)
Re: Fix slot synchronization with two_phase decoding enabled

On Tue, Jun 3, 2025 at 11:05 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

Attached v17 patches. Added a top-up patch 0002 implementing the idea
suggested by Amit above.

I have started reviewing this, although I haven't done a complete
review yet, but I have a question on the fix we are trying to do, IIUC
we are disallowing to use 'two phase' and 'failover' options together
at the create slot time and now users has to create slot with one of
the option and later enable other option right (if user want to use
both options)? But don't you think it will affect usability? because
if a user wants to use both the options together then after creating
the slot they need to track when is the right time to enable the other
option? Not sure if anyone else has this concern or it's just me?

--
Regards,
Dilip Kumar
Google

#86Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#85)
Re: Fix slot synchronization with two_phase decoding enabled

On Thu, Jun 5, 2025 at 2:53 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 3, 2025 at 11:05 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

Attached v17 patches. Added a top-up patch 0002 implementing the idea
suggested by Amit above.

I have started reviewing this, although I haven't done a complete
review yet, but I have a question on the fix we are trying to do, IIUC
we are disallowing to use 'two phase' and 'failover' options together
at the create slot time and now users has to create slot with one of
the option and later enable other option right (if user want to use
both options)? But don't you think it will affect usability? because
if a user wants to use both the options together then after creating
the slot they need to track when is the right time to enable the other
option? Not sure if anyone else has this concern or it's just me?

Some additional comments while quickly glancing at the patch

1.
+ if (two_phase && !IsSyncingReplicationSlots() && !IsBinaryUpgrade)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot enable both \"%s\" and \"%s\" options during
replication slot creation",
+    "failover", "two_phase"));

I think we should also give hints to retry later when a certain
constraint is met?

Also this is hardcoded options "failover" and "two_phase" so why do we
need to use %s for contruncting this error message?

2.
+    "failover", "two_phase"),
+ errhint("Use ALTER SUBSCRIPTION ... SET (failover = true) to enable
\"%s\" after two_phase state is ready",
+ "failover"));

Here we are using a mix of hardcoded string and formatted string, like
for (failover = true) we hardcoded the "failover" whereas to enable
\"%s\", we have
used %s. Better to just directly use failover as we are not depending
on any variable. Please look at other places as well, I see a few
more places whereas
we have used like this.

3. + if (slot->data.failover)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+ NameStr(slot->data.name))));

So for a failover slot we can never enable two_phase, whereas for
two_phase enabled slot we can enable failover? This seems confusing,
no?

--
Regards,
Dilip Kumar
Google

#87Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#85)
Re: Fix slot synchronization with two_phase decoding enabled

On Thu, Jun 5, 2025 at 2:53 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 3, 2025 at 11:05 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

Attached v17 patches. Added a top-up patch 0002 implementing the idea
suggested by Amit above.

I have started reviewing this, although I haven't done a complete
review yet, but I have a question on the fix we are trying to do, IIUC
we are disallowing to use 'two phase' and 'failover' options together
at the create slot time and now users has to create slot with one of
the option and later enable other option right (if user want to use
both options)? But don't you think it will affect usability? because
if a user wants to use both the options together then after creating
the slot they need to track when is the right time to enable the other
option?

Note that this is the restriction for 17 only, as we can't think of a
better way to fix it. For HEAD, we have exposed two_phase_at via the
existing view to fix this bug, see 4868c96bc8. We can't do the same in
backbranches as discussed in email [1]/messages/by-id/CAA4eK1Jx7Ed_58+ywXozmHvRRhAVO2Yfcyoi0e5PRexZn5A6Gw@mail.gmail.com -- With Regards, Amit Kapila.. If you have any better ideas
to fix this in backbranch, then kindly let us know.

[1]: /messages/by-id/CAA4eK1Jx7Ed_58+ywXozmHvRRhAVO2Yfcyoi0e5PRexZn5A6Gw@mail.gmail.com -- With Regards, Amit Kapila.
--
With Regards,
Amit Kapila.

#88Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#87)
Re: Fix slot synchronization with two_phase decoding enabled

On Thu, Jun 5, 2025 at 3:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jun 5, 2025 at 2:53 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 3, 2025 at 11:05 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

Attached v17 patches. Added a top-up patch 0002 implementing the idea
suggested by Amit above.

I have started reviewing this, although I haven't done a complete
review yet, but I have a question on the fix we are trying to do, IIUC
we are disallowing to use 'two phase' and 'failover' options together
at the create slot time and now users has to create slot with one of
the option and later enable other option right (if user want to use
both options)? But don't you think it will affect usability? because
if a user wants to use both the options together then after creating
the slot they need to track when is the right time to enable the other
option?

Note that this is the restriction for 17 only, as we can't think of a
better way to fix it. For HEAD, we have exposed two_phase_at via the
existing view to fix this bug, see 4868c96bc8. We can't do the same in
backbranches as discussed in email [1]. If you have any better ideas
to fix this in backbranch, then kindly let us know.

Sure, I will look at the commit and get back, Thanks.

--
Regards,
Dilip Kumar
Google

#89Nisha Moond
nisha.moond412@gmail.com
In reply to: Masahiko Sawada (#83)
Re: Fix slot synchronization with two_phase decoding enabled

On Wed, Jun 4, 2025 at 11:16 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Jun 1, 2025 at 10:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

The one more combination to consider is when someone takes a dump of
an older version and loads it into a newer version. For example, where
users dump from 17.5 and then restore in a newer version, say 17.6
(which has our fix), the restore will fail due to newer restrictions
added by this patch. Do we need to do anything about it?

A valid concern. Implementing this change could potentially render
dumps created prior to version 17.5 incompatible with version 17.6 or
later, which seems a significant backwards incompatibility to me. Do
we have any precedence of such incompatibility?

As per the upgrade documentation[1]https://www.postgresql.org/docs/current/upgrading.html:
- For major version upgrades, it is recommended to use the pg_dump
(and pg_dumpall) from the newer version of Postgres.
- For minor version upgrades, it is clearly mentioned to use binary
replacement. i.e. - simply shut down the server, replace the binaries,
and restart - as minor versions are binary-compatible and do not
change the storage format.

As per the pg_dump documentation[2]https://www.postgresql.org/docs/14/app-pgdump.html:
"Because pg_dump is used to transfer data to newer versions of
PostgreSQL, the output of pg_dump can be expected to load into
PostgreSQL server versions newer than pg_dump's version."

By looking at both the docs[1]https://www.postgresql.org/docs/current/upgrading.html[2]https://www.postgresql.org/docs/14/app-pgdump.html, we can draw the conclusion that the
above statement from the pg_dump docs[2]https://www.postgresql.org/docs/14/app-pgdump.html is true for the major version
and thus our fix should be safe in that context.

[1]: https://www.postgresql.org/docs/current/upgrading.html
[2]: https://www.postgresql.org/docs/14/app-pgdump.html

--
Thanks,
Nisha

#90Nisha Moond
nisha.moond412@gmail.com
In reply to: Dilip Kumar (#86)
2 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

On Thu, Jun 5, 2025 at 3:26 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Jun 5, 2025 at 2:53 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 3, 2025 at 11:05 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

Attached v17 patches. Added a top-up patch 0002 implementing the idea
suggested by Amit above.

I have started reviewing this, although I haven't done a complete
review yet, but I have a question on the fix we are trying to do, IIUC
we are disallowing to use 'two phase' and 'failover' options together
at the create slot time and now users has to create slot with one of
the option and later enable other option right (if user want to use
both options)? But don't you think it will affect usability? because
if a user wants to use both the options together then after creating
the slot they need to track when is the right time to enable the other
option? Not sure if anyone else has this concern or it's just me?

Some additional comments while quickly glancing at the patch

Thank you for the review.

1.
+ if (two_phase && !IsSyncingReplicationSlots() && !IsBinaryUpgrade)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot enable both \"%s\" and \"%s\" options during
replication slot creation",
+    "failover", "two_phase"));

I think we should also give hints to retry later when a certain
constraint is met?

Since it’s not possible to create a slot with both options enabled,
this command will always fail.

As there are no specific constraints to fulfill that would allow
creating a slot with both options enabled, the possible hints are:
- Suggesting to use ALTER_REPLICATION_SLOT command to alter failover
later. But it may not be appropriate here from the user's perspective.
- Recommending to use ALTER SUBSCRIPTION to enable failover later.
But not all slots are created for subscriptions, so this may not apply
in every case.

Please let me know if you have any suggestions for a suitable hint we
could provide here.

Also this is hardcoded options "failover" and "two_phase" so why do we
need to use %s for contruncting this error message?

I’ve followed the error message guidelines to keep the messages
translator-friendly. The use of %s helps isolate keywords like GUCs or
sub-options so they remain untouched during translation.

That said, it’s not a strict rule, usage often depends on developer or
committer judgment. After revisiting the docs and re-evaluating this
patch related files, I’ve adjusted the messages with the following in
mind:
- do not use "%s" when terms "failover" and "two-phase" can be used
as concepts and not subscription options.
- use "%s" when referring explicitly to options (or nearby code is so).
- wrap SQL commands in double quotes for clarity.
- since slot.c doesn’t use "%s" in similar contexts, I’ve updated the
message style there for consistency.

Please let me know if you feel any of the messages could still be
improved further.

2.
+    "failover", "two_phase"),
+ errhint("Use ALTER SUBSCRIPTION ... SET (failover = true) to enable
\"%s\" after two_phase state is ready",
+ "failover"));

Here we are using a mix of hardcoded string and formatted string, like
for (failover = true) we hardcoded the "failover" whereas to enable
\"%s\", we have
used %s. Better to just directly use failover as we are not depending
on any variable. Please look at other places as well, I see a few
more places whereas
we have used like this.

Modified this message following the reasoning outlined above.

3. + if (slot->data.failover)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+ NameStr(slot->data.name))));

So for a failover slot we can never enable two_phase, whereas for
two_phase enabled slot we can enable failover? This seems confusing,
no?

This restriction is not introduced by this patch.
As Amit pointed out [1]/messages/by-id/CAA4eK1+B067G8mUJzKUEjc5KSkYq6z0utTaHey-qeRt+nZTNJg@mail.gmail.com, this patch is specific to PG17, where
altering the two_phase option for a subscription is not permitted.
This capability has been added in PG18 onwards.
~~~

Attached v18 patch.
- patch-001: modified error messages as suggested above.
- patch-002: improved pg_dump docs as per Shveta's off-list suggestions.

[1]: /messages/by-id/CAA4eK1+B067G8mUJzKUEjc5KSkYq6z0utTaHey-qeRt+nZTNJg@mail.gmail.com

--
Thanks,
Nisha

Attachments:

v18-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchapplication/octet-stream; name=v18-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchDownload
From 408c79a3a259002461870fd8fd7176041e73082a Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Tue, 3 Jun 2025 10:42:53 +0530
Subject: [PATCH v18 1/2] PG17 Approach 3 Fix slot synchronization for
 two_phase enables slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To prevent the risk of losing prepared transactions, we disallow enabling
both failover and twophase during slot creation, but permits altering
failover to true once ensured that slot's restart_lsn > two_phase_at.

The fix enforces the following conditions:
1) Always disallow creating slots with two_phase=true and failover=true.
2) Always disallow creating subscriptions with (two_phase=true, failover=true).
3) Prevent altering the slot's failover to true if two_phase=true and restart_lsn
   is less than two_phase_at. Otherwise, allow changing failover to true.
4) Disallow altering slot's failover to true when two_phase state is 'pending'.
   User can try altering failover again when two_phase state is moved beyond 'pending'.
---
 contrib/test_decoding/expected/slot.out       |   2 +
 contrib/test_decoding/sql/slot.sql            |   1 +
 doc/src/sgml/func.sgml                        |  13 ++
 doc/src/sgml/protocol.sgml                    |  17 +++
 doc/src/sgml/ref/alter_subscription.sgml      |  12 ++
 doc/src/sgml/ref/create_subscription.sgml     |  17 +++
 src/backend/commands/subscriptioncmds.c       |  28 ++++
 src/backend/replication/logical/logical.c     |  11 ++
 src/backend/replication/logical/slotsync.c    |  10 ++
 src/backend/replication/slot.c                |  28 ++++
 src/bin/pg_upgrade/t/003_logical_slots.pl     |  32 ++++-
 .../t/040_standby_failover_slots_sync.pl      | 124 ++++++++++++++++++
 src/test/regress/expected/subscription.out    |   4 +
 src/test/regress/sql/subscription.sql         |   4 +
 14 files changed, 302 insertions(+), 1 deletion(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..eb124248981 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  cannot enable both failover and two_phase options during replication slot creation
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 697c1a02891..45504cf14d5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29076,6 +29076,19 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         failover. A call to this function has the same effect as
         the replication protocol command
         <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
+
+        <note>
+         <para>
+          The parameters <parameter>twophase</parameter> and
+          <parameter>failover</parameter> cannot be enabled together when
+          creating a replication slot. However, a slot created with
+          <parameter>twophase</parameter> enabled can later have
+          <parameter>failover</parameter> set to true either using
+          <xref linkend="protocol-replication-alter-replication-slot"/> or via
+          <xref linkend="sql-altersubscription"/> if a subscription is using
+          this slot.
+         </para>
+        </note>
        </para></entry>
       </row>
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0396d3a9e98..2c2d66fad35 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2062,6 +2062,15 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           <literal>PREPARE TRANSACTION</literal> time.
           The default is false.
          </para>
+
+         <para>
+          This option cannot be set together with <literal>failover</literal>
+          when creating a logical replication slot. However, once the slot is
+          created with <literal>two_phase = true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all the changes up to the point where two-phase decoding
+          was enabled.
+         </para>
         </listitem>
        </varlistentry>
 
@@ -2103,6 +2112,14 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           so that logical replication can be resumed after failover.
           The default is false.
          </para>
+
+         <para>
+          This option cannot be set together with <literal>two_phase</literal>
+          when creating the slot. However, once the slot is created with
+          <literal>two_phase = true</literal>, <literal>failover</literal> can
+          be set to true after the slot has consumed all the changes up to the
+          point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 2ccbf5e4897..afa29d4049b 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -257,6 +257,18 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
       <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
       option is enabled.
      </para>
+
+     <para>
+      When a subscription's
+      <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+      is in the pending state, setting <literal>failover</literal> to true is not
+      permitted. Once
+      <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+      is enabled,
+      <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
+      can be set to true only after the slot has consumed all the changes up to the
+      point where two-phase decoding was enabled.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index c9c8dd440dc..55b20893e80 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -354,6 +354,14 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           to know the actual two-phase state.
          </para>
 
+         <para>
+          This option cannot be enabled together with
+          <literal>failover</literal> when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true using
+          <xref linkend="sql-altersubscription"/>, once the slot has consumed
+          all the changes up to the point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
 
@@ -426,6 +434,15 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           replication can be resumed from the new primary after failover.
           The default is <literal>false</literal>.
          </para>
+
+         <para>
+          This option cannot be enabled together with
+          <literal>two_phase</literal> when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true using
+          <xref linkend="sql-altersubscription"/>, once the slot has consumed
+          all the changes up to the point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist></para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..3c05b042498 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,21 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options
+	 * together. See comments atop slotsync.c for details.
+	 *
+	 * However, we allow this combination in binary upgrade mode, where
+	 * pg_upgrade guarantees that all slot changes are consumed and no
+	 * prepared transactions exist.
+	 */
+	if (opts.twophase && opts.failover && !IsBinaryUpgrade)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable both \"%s\" and \"%s\" options at \"CREATE SUBSCRIPTION\"",
+					   "failover", "two_phase"),
+				errhint("Use \"ALTER SUBSCRIPTION ... SET (failover = true)\" to enable failover after two-phase state is ready"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1260,19 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable the failover until
+					 * two_phase state reaches READY state. See comments atop
+					 * slotsync.c for details.
+					 */
+					if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+								errmsg("cannot enable failover for a subscription with a pending two-phase state"),
+								errhint("The \"%s\" option can be enabled after two-phase state is ready.",
+										"failover"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 97b6aa899ee..1beb7c2eb56 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -613,6 +613,17 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 	/* Mark slot to allow two_phase decoding if not already marked */
 	if (ctx->twophase && !slot->data.two_phase)
 	{
+		/*
+		 * Do not allow two-phase decoding for failover enabled slots.
+		 *
+		 * See comments atop slotsync.c for details.
+		 */
+		if (slot->data.failover)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+							NameStr(slot->data.name))));
+
 		SpinLockAcquire(&slot->mutex);
 		slot->data.two_phase = true;
 		slot->data.two_phase_at = start_lsn;
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 73fe3f56af2..011c244ac35 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -42,6 +42,16 @@
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
+ *
+ * Regarding two-phase enabled slots, we need to note that the two_phase_at
+ * field of a slot is not synchronized. In the absence of two_phase_at,
+ * logical decoding could incorrectly identify prepared transactions as
+ * having been replicated to the subscriber after promotion, resulting in
+ * their omission. Consequently, both two_phase and failover features
+ * cannot be simultaneously enabled during slot creation. Failover can only
+ * be enabled once we can ensure that no transactions were prepared prior
+ * to two_phase_at on the primary. These restrictions are enforced through
+ * checks in ReplicationSlotCreate() and ReplicationSlotAlter().
  *---------------------------------------------------------------------------
  */
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..089006e4717 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,21 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable both failover and two_phase for slots.
+		 * Please see the comments atop slotsync.c for details.
+		 *
+		 * However, both failover and two_phase enabled slots can be created
+		 * during slot synchronization because we need to retain the same
+		 * values as the remote slot. This is also allowed in binary upgrade
+		 * mode, where pg_upgrade guarantees that all slot changes are
+		 * consumed and no prepared transactions exist.
+		 */
+		if (two_phase && !IsSyncingReplicationSlots() && !IsBinaryUpgrade)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot enable both failover and two_phase options during replication slot creation"));
 	}
 
 	/*
@@ -848,6 +863,19 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot if
+	 * there are potentially un-decoded transactions that are prepared before
+	 * two_phase_at. See comments atop slotsync.c for details.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase &&
+		MyReplicationSlot->data.restart_lsn < MyReplicationSlot->data.two_phase_at)
+		ereport(ERROR,
+				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				errmsg("could not enable failover for a two-phase enabled replication slot due to unconsumed changes"),
+				errdetail("The slot needs to consume change up to %X/%X to enable failover.",
+						  LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..76b521b51bd 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -173,7 +173,7 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
@@ -183,8 +183,38 @@ my $twophase_query =
 $sub->poll_query_until('postgres', $twophase_query)
   or die "Timed out while waiting for subscriber to enable twophase";
 
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$oldpub->safe_psql(
+	'postgres', qq(
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+));
+
+# Wait for the subscription to be in sync
+$sub->wait_for_subscription_sync($oldpub, 'regress_sub');
+
 # 2. Temporarily disable the subscription
 $sub->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+
+# Alter subscription to enable failover
+$sub->psql('postgres',
+	"ALTER SUBSCRIPTION regress_sub SET (failover = true);");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $oldpub->safe_psql(
+		'postgres', q{
+		SELECT failover from pg_replication_slots WHERE slot_name = 'regress_sub';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
 $oldpub->stop;
 
 # pg_upgrade should be successful
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..0db7e6866d1 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -98,6 +98,130 @@ my ($result, $stdout, $stderr) = $subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
 	"altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option can be enabled for a two_phase enabled
+# subscription only through Alter Subscription (failover=true).
+##################################################
+
+# Create a table on both publisher and subscriber to complete subscription
+# initialization, allowing two_phase to transition from 'pending' to 'enabled'.
+$publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+$subscriber1->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
+# Test that the failover option cannot be set for a slot with
+# restart_lsn < two_phase_at
+
+# Create an logical replication slot with two_phase enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('logical_slot', 'pgoutput', false, true, false);"
+);
+
+# Try to enable failover of the slot
+($result, $stdout, $stderr) = $publisher->psql(
+	'postgres',
+	qq[ALTER_REPLICATION_SLOT logical_slot (failover);],
+	replication => 'database');
+ok( $stderr =~
+	  qr/ERROR:  could not enable failover for a two-phase enabled replication slot due to unconsumed changes/,
+	"failover can't be enabled if restart_lsn < two_phase_at on a two_phase subscription."
+);
+
+# Drop the logical_slot
+$publisher->psql('postgres',
+	"SELECT pg_drop_replication_slot('logical_slot');");
+
+# Test that the failover option can be set for a two_phase enabled
+# subscription using ALTER SUBSCRIPTION.
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (enabled = false, two_phase = true);"
+);
+
+# Try to enable failover for the subscription with two_phase in pending state
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+ok( $stderr =~
+	  /ERROR:  cannot enable failover for a subscription with a pending two-phase state
+.*HINT:  The "failover" option can be enabled after two-phase state is ready./,
+	"enabling failover is not allowed for a two_phase pending subscription");
+
+# Enable the subscription
+$subscriber1->safe_psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 ENABLE;");
+
+# Wait for the subscription to be in sync
+$subscriber1->wait_for_subscription_sync($publisher, 'regress_mysub2');
+
+# Also wait for two-phase to be enabled
+ok( $subscriber1->poll_query_until(
+		'postgres',
+		qq[SELECT subtwophasestate FROM pg_subscription WHERE subname = 'regress_mysub2';],
+		'e',),
+	"Timed out while waiting for subscriber to enable twophase");
+
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$publisher->safe_psql(
+	'postgres', qq(
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+));
+
+# Wait for the subscription to be in sync
+$subscriber1->wait_for_subscription_sync($publisher, 'regress_mysub2');
+
+# Alter subscription to enable failover
+$subscriber1->psql(
+	'postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+	 ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $publisher->safe_psql(
+		'postgres',
+		q{SELECT failover from pg_replication_slots WHERE slot_name = 'regress_mysub2';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub2;");
+
+# Test that subscription creation with two_phase=true results in error when
+# using a slot with failover set to true.
+
+# Create a slot with failover enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub3', 'pgoutput', false, false, true);"
+);
+
+my $logoffset = -s $subscriber1->logfile;
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub3 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, two_phase = true);"
+);
+
+ok( $subscriber1->wait_for_log(
+		qr/cannot enable two-phase decoding for failover enabled slot "regress_mysub3"/,
+		$logoffset),
+	"creating subscription with two_phase=true is not allowed when the slot has failover set to true"
+);
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub3;");
+
+# Clean up the tables created
+$publisher->safe_psql('postgres', "DROP TABLE tab_full;");
+$subscriber1->safe_psql('postgres', "DROP TABLE tab_full;");
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby server.
 ##################################################
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..9f4daf47901 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,10 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  cannot enable both "failover" and "two_phase" options at "CREATE SUBSCRIPTION"
+HINT:  Use "ALTER SUBSCRIPTION ... SET (failover = true)" to enable failover after two-phase state is ready
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.34.1

v18-0002-Fix-for-pg_dump.patchapplication/octet-stream; name=v18-0002-Fix-for-pg_dump.patchDownload
From f1e51a497946b2bc5227430bad26af2141584f00 Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Tue, 3 Jun 2025 10:48:00 +0530
Subject: [PATCH v18 2/2] Fix for pg_dump

When there is subscrption where both two_phase and failover are enabled,
generate the CREATE SUBSCRIPTION command only with two_phase = on and
ignore failover i.e. failover will be false(default).
---
 doc/src/sgml/ref/pg_dump.sgml | 13 +++++++++++++
 src/bin/pg_dump/pg_dump.c     | 12 +++++++++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index cfc74ca6d69..6c133c5e12a 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1633,6 +1633,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
    had been originally created with <literal>two_phase = true</literal> option.
   </para>
 
+  <para>
+   When dumping logical replication subscriptions where both
+   <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+   and <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
+   are enabled, <application>pg_dump</application> will generate <command>CREATE
+   SUBSCRIPTION</command> commands with only <literal>two_phase = true</literal>
+   option, omitting the <literal>failover=true</literal> option. That way,
+   the dump can be restored without encountering the error caused by enabling
+   both failover and two_phase simultaneously during <xref linkend="sql-createsubscription"/>.
+   After the restore, users can enable <literal>failover</literal> using
+   <xref linkend="sql-altersubscription"/>.
+  </para>
+
   <para>
    It is generally recommended to use the <option>-X</option>
    (<option>--no-psqlrc</option>) option when restoring a database from a
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 9ed1a856fa3..5c31f30e914 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -5129,6 +5129,7 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo)
 	int			npubnames = 0;
 	int			i;
 	char		two_phase_disabled[] = {LOGICALREP_TWOPHASE_STATE_DISABLED, '\0'};
+	bool		set_failover = true;
 
 	/* Do nothing in data-only dump */
 	if (dopt->dataOnly)
@@ -5174,8 +5175,17 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo)
 		appendPQExpBufferStr(query, ", streaming = parallel");
 
 	if (strcmp(subinfo->subtwophasestate, two_phase_disabled) != 0)
+	{
 		appendPQExpBufferStr(query, ", two_phase = on");
 
+		/*
+		 * The failover option cannot be set togather with two_phase during
+		 * CREATE SUBSCRIPTION. It can be enabled later using ALTER
+		 * SUBSCRIPTION after the subscription is restored.
+		 */
+		set_failover = false;
+	}
+
 	if (strcmp(subinfo->subdisableonerr, "t") == 0)
 		appendPQExpBufferStr(query, ", disable_on_error = true");
 
@@ -5185,7 +5195,7 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo)
 	if (strcmp(subinfo->subrunasowner, "t") == 0)
 		appendPQExpBufferStr(query, ", run_as_owner = true");
 
-	if (strcmp(subinfo->subfailover, "t") == 0)
+	if (set_failover && strcmp(subinfo->subfailover, "t") == 0)
 		appendPQExpBufferStr(query, ", failover = true");
 
 	if (strcmp(subinfo->subsynccommit, "off") != 0)
-- 
2.34.1

#91Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#87)
Re: Fix slot synchronization with two_phase decoding enabled

On Thu, Jun 5, 2025 at 3:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jun 5, 2025 at 2:53 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 3, 2025 at 11:05 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

Attached v17 patches. Added a top-up patch 0002 implementing the idea
suggested by Amit above.

I have started reviewing this, although I haven't done a complete
review yet, but I have a question on the fix we are trying to do, IIUC
we are disallowing to use 'two phase' and 'failover' options together
at the create slot time and now users has to create slot with one of
the option and later enable other option right (if user want to use
both options)? But don't you think it will affect usability? because
if a user wants to use both the options together then after creating
the slot they need to track when is the right time to enable the other
option?

Note that this is the restriction for 17 only, as we can't think of a
better way to fix it. For HEAD, we have exposed two_phase_at via the
existing view to fix this bug, see 4868c96bc8. We can't do the same in
backbranches as discussed in email [1]. If you have any better ideas
to fix this in backbranch, then kindly let us know.

Understood. On HEAD , we've exposed two_phase_at to fix this problem.
It's unfortunate we can't back-port this solution. This means we have
to impose new restrictions on what was previously allowed in older
versions of PostgreSQL to avoid the issue. While it's not ideal, we
don't have a better approach for those older branches right now.
Thanks for clarifying.

--
Regards,
Dilip Kumar
Google

#92shveta malik
shveta.malik@gmail.com
In reply to: Nisha Moond (#90)
Re: Fix slot synchronization with two_phase decoding enabled

On Fri, Jun 6, 2025 at 12:37 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Attached v18 patch.
- patch-001: modified error messages as suggested above.
- patch-002: improved pg_dump docs as per Shveta's off-list suggestions.

Thanks for the patches. Please find few comments:

1)
+ * However, we allow this combination in binary upgrade mode, where
+ * pg_upgrade guarantees that all slot changes are consumed and no
+ * prepared transactions exist.
+ */

Can we please mention on which system pg_upgrade ensures that no
prepared txn exists. Is it at the source system or target system or
both? The similar comment is there at other place too, please update
that as well.

2)

+ * The failover option cannot be set togather with two_phase during
+ * CREATE SUBSCRIPTION. It can be enabled later using ALTER
+ * SUBSCRIPTION after the subscription is restored.
+ */

togather->together

3)
We have these 2 messages in slot.c and subscriptioncmds.c, we can have
both in similar way to maintain consistency across the patch. Since we
are using 'options' in the msg, thus one with %s looks better

+ errmsg("cannot enable both failover and two_phase options during
replication slot creation"));

+ errmsg("cannot enable both \"%s\" and \"%s\" options at \"CREATE
SUBSCRIPTION\"",
+    "failover", "two_phase"),

4)
Can we please update patch-message for patch002 similar to doc to make
it more clear.

thanks
Shveta

#93Ajin Cherian
itsajin@gmail.com
In reply to: Nisha Moond (#90)
Re: Fix slot synchronization with two_phase decoding enabled

On Fri, Jun 6, 2025 at 5:07 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Attached v18 patch.
- patch-001: modified error messages as suggested above.
- patch-002: improved pg_dump docs as per Shveta's off-list suggestions.

[1] /messages/by-id/CAA4eK1+B067G8mUJzKUEjc5KSkYq6z0utTaHey-qeRt+nZTNJg@mail.gmail.com

--
Thanks,
Nisha

Hi Nisha,

Some comments.

Patch 0001
1. Commit message

from "two_phase enables slots." to "two_phase enabled slots"

2.
+
+     <para>
+      When a subscription's
+      <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+      is in the pending state, setting <literal>failover</literal> to
true is not
+      permitted. Once

The term "subscription's two_phase" sounds a bit vague here.
Referring https://www.postgresql.org/docs/17/sql-createsubscription.html
and https://www.postgresql.org/docs/current/catalog-pg-subscription.html
two_phase is referred to both as the "internal state" as well as the
"mode" of the subscription.

Probably rephrase it to - "When a subscription's internal two_phase
mode is in the pending state, setting failover to true is not
permitted. See column subtwophasestate of pg_subscription to know the
actual internal state of the two-phase mode" (with appropriate tags
and reference links)

Patch 0002

1. Commit message

Typo: change "subscrption" to "subscription"

regards,
Ajin Cherian
Fujitsu Australia

#94Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Nisha Moond (#90)
Re: Fix slot synchronization with two_phase decoding enabled

On Fri, Jun 6, 2025 at 12:07 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

On Thu, Jun 5, 2025 at 3:26 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Jun 5, 2025 at 2:53 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 3, 2025 at 11:05 AM Nisha Moond <nisha.moond412@gmail.com> wrote:

Attached v17 patches. Added a top-up patch 0002 implementing the idea
suggested by Amit above.

I have started reviewing this, although I haven't done a complete
review yet, but I have a question on the fix we are trying to do, IIUC
we are disallowing to use 'two phase' and 'failover' options together
at the create slot time and now users has to create slot with one of
the option and later enable other option right (if user want to use
both options)? But don't you think it will affect usability? because
if a user wants to use both the options together then after creating
the slot they need to track when is the right time to enable the other
option? Not sure if anyone else has this concern or it's just me?

Some additional comments while quickly glancing at the patch

Thank you for the review.

1.
+ if (two_phase && !IsSyncingReplicationSlots() && !IsBinaryUpgrade)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot enable both \"%s\" and \"%s\" options during
replication slot creation",
+    "failover", "two_phase"));

I think we should also give hints to retry later when a certain
constraint is met?

Since it’s not possible to create a slot with both options enabled,
this command will always fail.

As there are no specific constraints to fulfill that would allow
creating a slot with both options enabled, the possible hints are:
- Suggesting to use ALTER_REPLICATION_SLOT command to alter failover
later. But it may not be appropriate here from the user's perspective.
- Recommending to use ALTER SUBSCRIPTION to enable failover later.
But not all slots are created for subscriptions, so this may not apply
in every case.

Please let me know if you have any suggestions for a suitable hint we
could provide here.

Also this is hardcoded options "failover" and "two_phase" so why do we
need to use %s for contruncting this error message?

I’ve followed the error message guidelines to keep the messages
translator-friendly. The use of %s helps isolate keywords like GUCs or
sub-options so they remain untouched during translation.

That said, it’s not a strict rule, usage often depends on developer or
committer judgment. After revisiting the docs and re-evaluating this
patch related files, I’ve adjusted the messages with the following in
mind:
- do not use "%s" when terms "failover" and "two-phase" can be used
as concepts and not subscription options.
- use "%s" when referring explicitly to options (or nearby code is so).
- wrap SQL commands in double quotes for clarity.
- since slot.c doesn’t use "%s" in similar contexts, I’ve updated the
message style there for consistency.

Please let me know if you feel any of the messages could still be
improved further.

2.
+    "failover", "two_phase"),
+ errhint("Use ALTER SUBSCRIPTION ... SET (failover = true) to enable
\"%s\" after two_phase state is ready",
+ "failover"));

Here we are using a mix of hardcoded string and formatted string, like
for (failover = true) we hardcoded the "failover" whereas to enable
\"%s\", we have
used %s. Better to just directly use failover as we are not depending
on any variable. Please look at other places as well, I see a few
more places whereas
we have used like this.

Modified this message following the reasoning outlined above.

3. + if (slot->data.failover)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+ NameStr(slot->data.name))));

So for a failover slot we can never enable two_phase, whereas for
two_phase enabled slot we can enable failover? This seems confusing,
no?

This restriction is not introduced by this patch.
As Amit pointed out [1], this patch is specific to PG17, where
altering the two_phase option for a subscription is not permitted.
This capability has been added in PG18 onwards.
~~~

Attached v18 patch.
- patch-001: modified error messages as suggested above.
- patch-002: improved pg_dump docs as per Shveta's off-list suggestions.

Thank you for updating the patch.

BTW have we addressed the point Amit mentioned before[1]/messages/by-id/CAA4eK1KiGU5Zr_S2g8w-8dUm-PRiZc72BgMnLdS83u7sgnctcg@mail.gmail.com?

The one more combination to consider is when someone takes a dump of
an older version and loads it into a newer version. For example, where
users dump from 17.5 and then restore in a newer version, say 17.6
(which has our fix), the restore will fail due to newer restrictions
added by this patch. Do we need to do anything about it?

I think it could be a significant side-effect and we need to do
something about that.

Regards,

[1]: /messages/by-id/CAA4eK1KiGU5Zr_S2g8w-8dUm-PRiZc72BgMnLdS83u7sgnctcg@mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#95Peter Smith
smithpb2250@gmail.com
In reply to: Ajin Cherian (#93)
Re: Fix slot synchronization with two_phase decoding enabled

On Wed, Jun 11, 2025 at 8:16 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Jun 6, 2025 at 5:07 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Attached v18 patch.
- patch-001: modified error messages as suggested above.
- patch-002: improved pg_dump docs as per Shveta's off-list suggestions.

[1] /messages/by-id/CAA4eK1+B067G8mUJzKUEjc5KSkYq6z0utTaHey-qeRt+nZTNJg@mail.gmail.com

--
Thanks,
Nisha

Hi Nisha,

Some comments.

Patch 0001
1. Commit message

from "two_phase enables slots." to "two_phase enabled slots"

2.
+
+     <para>
+      When a subscription's
+      <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+      is in the pending state, setting <literal>failover</literal> to
true is not
+      permitted. Once

The term "subscription's two_phase" sounds a bit vague here.
Referring https://www.postgresql.org/docs/17/sql-createsubscription.html
and https://www.postgresql.org/docs/current/catalog-pg-subscription.html
two_phase is referred to both as the "internal state" as well as the
"mode" of the subscription.

Probably rephrase it to - "When a subscription's internal two_phase
mode is in the pending state, setting failover to true is not
permitted. See column subtwophasestate of pg_subscription to know the
actual internal state of the two-phase mode" (with appropriate tags
and reference links)

There seems a potential muddling of terms two_phase VS two-phase.

AFAIK "two_phase" (with the underscore) is the name of the
subscription option; it might also be the name of an internal
structure member or catalog column name. Furthermore, when you are
referring to the option name I think it ought to be double-quoted for
clarity.

But, unless you are referring specifically to the option/field/column
then IMO it should say two-phase (with the dash).
e.g. The "two_phase" option enables two-phase mode.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#96shveta malik
shveta.malik@gmail.com
In reply to: Masahiko Sawada (#94)
Re: Fix slot synchronization with two_phase decoding enabled

On Wed, Jun 11, 2025 at 11:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

BTW have we addressed the point Amit mentioned before[1]?

The one more combination to consider is when someone takes a dump of
an older version and loads it into a newer version. For example, where
users dump from 17.5 and then restore in a newer version, say 17.6
(which has our fix), the restore will fail due to newer restrictions
added by this patch. Do we need to do anything about it?

I think it could be a significant side-effect and we need to do
something about that.

After giving it more thought, we have an opinion that this
side-effect/issue is unlikely to occur if users follow our
documentation properly.

The recommended approach for upgrading between minor versions is to
shut down the server and replace the binaries. See 'To update between
compatible versions' in [1]https://www.postgresql.org/docs/current/upgrading.html.

Also it is recommended in docs that we use pg_dump from the newer
version of PostgreSQL. See 'It is recommended that you use the
pg_dump' in [2]https://www.postgresql.org/docs/current/upgrading.html#UPGRADING-VIA-PGDUMPALL. This particular recommendation is in the Upgrade doc.
If needed, we can make a similar recommendation in any of our failover
specific docs as well, mentioning this particular case.

In brief, our overall understanding is that a) pg_dump is mainly used
for major versions upgrade b) pg_dump of higher version is used.
Please let us know if your understanding is different here.

Beyond these steps, we could not find any better solution for the
pointed case. But we are open to exploring and implementing any
alternative solutions you may have. Feedback is most welcome here.

[1]: https://www.postgresql.org/docs/current/upgrading.html
[2]: https://www.postgresql.org/docs/current/upgrading.html#UPGRADING-VIA-PGDUMPALL

thanks
Shveta

#97Nisha Moond
nisha.moond412@gmail.com
In reply to: shveta malik (#92)
2 attachment(s)
Re: Fix slot synchronization with two_phase decoding enabled

On Tue, Jun 10, 2025 at 2:29 PM shveta malik <shveta.malik@gmail.com> wrote:

On Fri, Jun 6, 2025 at 12:37 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Attached v18 patch.
- patch-001: modified error messages as suggested above.
- patch-002: improved pg_dump docs as per Shveta's off-list suggestions.

Thanks for the patches. Please find few comments:

Thanks for the review.

1)
+ * However, we allow this combination in binary upgrade mode, where
+ * pg_upgrade guarantees that all slot changes are consumed and no
+ * prepared transactions exist.
+ */

Can we please mention on which system pg_upgrade ensures that no
prepared txn exists. Is it at the source system or target system or
both? The similar comment is there at other place too, please update
that as well.

pg_upgrade checks for prepared transactions on both source and target
systems. I've updated the respective comments.
~~~

Attached v19 patches addressing all the comments in [1]/messages/by-id/CAJpy0uCGE0fi4oMb9A_L27C-fJq+VpSO9ZbcVxW5s5F3+sX0Lw@mail.gmail.com and [2]/messages/by-id/CAFPTHDYXjq_=r2ss6YMQBr=wbSfkH=uGJFrJzVpaW0eqPhy0RQ@mail.gmail.com -- Thanks, Nisha.

[1]: /messages/by-id/CAJpy0uCGE0fi4oMb9A_L27C-fJq+VpSO9ZbcVxW5s5F3+sX0Lw@mail.gmail.com
[2]: /messages/by-id/CAFPTHDYXjq_=r2ss6YMQBr=wbSfkH=uGJFrJzVpaW0eqPhy0RQ@mail.gmail.com -- Thanks, Nisha
--
Thanks,
Nisha

Attachments:

v19-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchapplication/x-patch; name=v19-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patchDownload
From b6c1032a504f699256327160e5b9dfd99ae4f962 Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Tue, 3 Jun 2025 10:42:53 +0530
Subject: [PATCH v19 1/2] PG17 Approach 3 Fix slot synchronization for
 two-phase enabled slots.

The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To prevent the risk of losing prepared transactions, we disallow enabling
both failover and twophase during slot creation, but permits altering
failover to true once ensured that slot's restart_lsn > two_phase_at.

The fix enforces the following conditions:
1) Always disallow creating slots with two_phase=true and failover=true.
2) Always disallow creating subscriptions with (two_phase=true, failover=true).
3) Prevent altering the slot's failover to true if two_phase=true and restart_lsn
   is less than two_phase_at. Otherwise, allow changing failover to true.
4) Disallow altering subscription's failover to true when two-phase state is 'pending'.
   User can try altering failover again when two-phase state is moved to 'enabled'.
---
 contrib/test_decoding/expected/slot.out       |   2 +
 contrib/test_decoding/sql/slot.sql            |   1 +
 doc/src/sgml/func.sgml                        |  13 ++
 doc/src/sgml/protocol.sgml                    |  17 +++
 doc/src/sgml/ref/alter_subscription.sgml      |  10 ++
 doc/src/sgml/ref/create_subscription.sgml     |  17 +++
 src/backend/commands/subscriptioncmds.c       |  28 ++++
 src/backend/replication/logical/logical.c     |  11 ++
 src/backend/replication/logical/slotsync.c    |  10 ++
 src/backend/replication/slot.c                |  30 +++++
 src/bin/pg_upgrade/t/003_logical_slots.pl     |  32 ++++-
 .../t/040_standby_failover_slots_sync.pl      | 124 ++++++++++++++++++
 src/test/regress/expected/subscription.out    |   4 +
 src/test/regress/sql/subscription.sql         |   4 +
 14 files changed, 302 insertions(+), 1 deletion(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..87b28ad8d55 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
+ERROR:  cannot enable both "failover" and "two_phase" options during replication slot creation
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 ----------
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_twophase_true_slot', 'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 697c1a02891..45504cf14d5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29076,6 +29076,19 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         failover. A call to this function has the same effect as
         the replication protocol command
         <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
+
+        <note>
+         <para>
+          The parameters <parameter>twophase</parameter> and
+          <parameter>failover</parameter> cannot be enabled together when
+          creating a replication slot. However, a slot created with
+          <parameter>twophase</parameter> enabled can later have
+          <parameter>failover</parameter> set to true either using
+          <xref linkend="protocol-replication-alter-replication-slot"/> or via
+          <xref linkend="sql-altersubscription"/> if a subscription is using
+          this slot.
+         </para>
+        </note>
        </para></entry>
       </row>
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0396d3a9e98..2c2d66fad35 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2062,6 +2062,15 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           <literal>PREPARE TRANSACTION</literal> time.
           The default is false.
          </para>
+
+         <para>
+          This option cannot be set together with <literal>failover</literal>
+          when creating a logical replication slot. However, once the slot is
+          created with <literal>two_phase = true</literal>,
+          <literal>failover</literal> can be set to true after the slot has
+          consumed all the changes up to the point where two-phase decoding
+          was enabled.
+         </para>
         </listitem>
        </varlistentry>
 
@@ -2103,6 +2112,14 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
           so that logical replication can be resumed after failover.
           The default is false.
          </para>
+
+         <para>
+          This option cannot be set together with <literal>two_phase</literal>
+          when creating the slot. However, once the slot is created with
+          <literal>two_phase = true</literal>, <literal>failover</literal> can
+          be set to true after the slot has consumed all the changes up to the
+          point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 2ccbf5e4897..ec3fc44e4e5 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -257,6 +257,16 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
       <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
       option is enabled.
      </para>
+
+     <para>
+      When subscription's two-phase is in the pending state, setting
+      <literal>failover</literal> to true is not permitted. Once the two-phase
+      state is enabled, <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
+      can be set to true only after the slot has consumed all the changes up to the
+      point where two-phase decoding was enabled. See column <structfield>subtwophasestate</structfield>
+      of <link linkend="catalog-pg-subscription"><structname>pg_subscription</structname></link>
+      to know the actual two-phase state.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index c9c8dd440dc..55b20893e80 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -354,6 +354,14 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           to know the actual two-phase state.
          </para>
 
+         <para>
+          This option cannot be enabled together with
+          <literal>failover</literal> when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true using
+          <xref linkend="sql-altersubscription"/>, once the slot has consumed
+          all the changes up to the point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
 
@@ -426,6 +434,15 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
           replication can be resumed from the new primary after failover.
           The default is <literal>false</literal>.
          </para>
+
+         <para>
+          This option cannot be enabled together with
+          <literal>two_phase</literal> when creating a subscription. However, a
+          <literal>two_phase</literal> enabled subscription can later have
+          <literal>failover</literal> set to true using
+          <xref linkend="sql-altersubscription"/>, once the slot has consumed
+          all the changes up to the point where two-phase decoding was enabled.
+         </para>
         </listitem>
        </varlistentry>
       </variablelist></para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..a5c33f879d3 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -648,6 +648,21 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 				 errmsg("password_required=false is superuser-only"),
 				 errhint("Subscriptions with the password_required option set to false may only be created or modified by the superuser.")));
 
+	/*
+	 * Do not allow users to enable the failover and two_phase options
+	 * together. See comments atop slotsync.c for details.
+	 *
+	 * However, we allow this combination in binary upgrade mode, where
+	 * pg_upgrade guarantees that all slot changes are consumed and no
+	 * prepared transactions exist on either the old or new cluster.
+	 */
+	if (opts.twophase && opts.failover && !IsBinaryUpgrade)
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot enable both \"%s\" and \"%s\" options at \"CREATE SUBSCRIPTION\"",
+					   "failover", "two_phase"),
+				errhint("Use \"ALTER SUBSCRIPTION ... SET (failover = true)\" to enable failover after two-phase state is ready"));
+
 	/*
 	 * If built with appropriate switch, whine when regression-testing
 	 * conventions for subscription names are violated.
@@ -1245,6 +1260,19 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								 errmsg("cannot set %s for enabled subscription",
 										"failover")));
 
+					/*
+					 * Do not allow users to enable the failover until
+					 * two_phase state reaches READY state. See comments atop
+					 * slotsync.c for details.
+					 */
+					if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+						opts.failover)
+						ereport(ERROR,
+								errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+								errmsg("cannot enable failover for a subscription with a pending two-phase state"),
+								errhint("The \"%s\" option can be enabled after two-phase state is ready.",
+										"failover"));
+
 					/*
 					 * The changed failover option of the slot can't be rolled
 					 * back.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 97b6aa899ee..1beb7c2eb56 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -613,6 +613,17 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 	/* Mark slot to allow two_phase decoding if not already marked */
 	if (ctx->twophase && !slot->data.two_phase)
 	{
+		/*
+		 * Do not allow two-phase decoding for failover enabled slots.
+		 *
+		 * See comments atop slotsync.c for details.
+		 */
+		if (slot->data.failover)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+							NameStr(slot->data.name))));
+
 		SpinLockAcquire(&slot->mutex);
 		slot->data.two_phase = true;
 		slot->data.two_phase_at = start_lsn;
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 73fe3f56af2..011c244ac35 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -42,6 +42,16 @@
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
+ *
+ * Regarding two-phase enabled slots, we need to note that the two_phase_at
+ * field of a slot is not synchronized. In the absence of two_phase_at,
+ * logical decoding could incorrectly identify prepared transactions as
+ * having been replicated to the subscriber after promotion, resulting in
+ * their omission. Consequently, both two_phase and failover features
+ * cannot be simultaneously enabled during slot creation. Failover can only
+ * be enabled once we can ensure that no transactions were prepared prior
+ * to two_phase_at on the primary. These restrictions are enforced through
+ * checks in ReplicationSlotCreate() and ReplicationSlotAlter().
  *---------------------------------------------------------------------------
  */
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a1d4768623f..8e0748d2528 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -343,6 +343,23 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 			ereport(ERROR,
 					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					errmsg("cannot enable failover for a temporary replication slot"));
+
+		/*
+		 * Do not allow users to enable both failover and two_phase for slots.
+		 * Please see the comments atop slotsync.c for details.
+		 *
+		 * However, both failover and two_phase enabled slots can be created
+		 * during slot synchronization because we need to retain the same
+		 * values as the remote slot. This is also allowed in binary upgrade
+		 * mode, where pg_upgrade guarantees that all slot changes are
+		 * consumed and no prepared transactions exist on either the old or
+		 * new cluster.
+		 */
+		if (two_phase && !IsSyncingReplicationSlots() && !IsBinaryUpgrade)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot enable both \"%s\" and \"%s\" options during replication slot creation",
+						   "failover", "two_phase"));
 	}
 
 	/*
@@ -848,6 +865,19 @@ ReplicationSlotAlter(const char *name, bool failover)
 				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				errmsg("cannot enable failover for a temporary replication slot"));
 
+	/*
+	 * Do not allow users to enable failover for a two_phase enabled slot if
+	 * there are potentially un-decoded transactions that are prepared before
+	 * two_phase_at. See comments atop slotsync.c for details.
+	 */
+	if (failover && MyReplicationSlot->data.two_phase &&
+		MyReplicationSlot->data.restart_lsn < MyReplicationSlot->data.two_phase_at)
+		ereport(ERROR,
+				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				errmsg("could not enable failover for a two-phase enabled replication slot due to unconsumed changes"),
+				errdetail("The slot needs to consume change up to %X/%X to enable failover.",
+						  LSN_FORMAT_ARGS(MyReplicationSlot->data.two_phase_at)));
+
 	if (MyReplicationSlot->data.failover != failover)
 	{
 		SpinLockAcquire(&MyReplicationSlot->mutex);
diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0a2483d3dfc..76b521b51bd 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -173,7 +173,7 @@ $sub->start;
 $sub->safe_psql(
 	'postgres', qq[
 	CREATE TABLE tbl (a int);
-	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true', failover = 'true')
+	CREATE SUBSCRIPTION regress_sub CONNECTION '$old_connstr' PUBLICATION regress_pub WITH (two_phase = 'true')
 ]);
 $sub->wait_for_subscription_sync($oldpub, 'regress_sub');
 
@@ -183,8 +183,38 @@ my $twophase_query =
 $sub->poll_query_until('postgres', $twophase_query)
   or die "Timed out while waiting for subscriber to enable twophase";
 
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$oldpub->safe_psql(
+	'postgres', qq(
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+));
+
+# Wait for the subscription to be in sync
+$sub->wait_for_subscription_sync($oldpub, 'regress_sub');
+
 # 2. Temporarily disable the subscription
 $sub->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub DISABLE");
+
+# Alter subscription to enable failover
+$sub->psql('postgres',
+	"ALTER SUBSCRIPTION regress_sub SET (failover = true);");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $oldpub->safe_psql(
+		'postgres', q{
+		SELECT failover from pg_replication_slots WHERE slot_name = 'regress_sub';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
 $oldpub->stop;
 
 # pg_upgrade should be successful
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 823857bb329..0db7e6866d1 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -98,6 +98,130 @@ my ($result, $stdout, $stderr) = $subscriber1->psql('postgres',
 ok( $stderr =~ /ERROR:  cannot set failover for enabled subscription/,
 	"altering failover is not allowed for enabled subscription");
 
+##################################################
+# Test that the failover option can be enabled for a two_phase enabled
+# subscription only through Alter Subscription (failover=true).
+##################################################
+
+# Create a table on both publisher and subscriber to complete subscription
+# initialization, allowing two_phase to transition from 'pending' to 'enabled'.
+$publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+$subscriber1->safe_psql('postgres', "CREATE TABLE tab_full (a int)");
+
+# Test that the failover option cannot be set for a slot with
+# restart_lsn < two_phase_at
+
+# Create an logical replication slot with two_phase enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('logical_slot', 'pgoutput', false, true, false);"
+);
+
+# Try to enable failover of the slot
+($result, $stdout, $stderr) = $publisher->psql(
+	'postgres',
+	qq[ALTER_REPLICATION_SLOT logical_slot (failover);],
+	replication => 'database');
+ok( $stderr =~
+	  qr/ERROR:  could not enable failover for a two-phase enabled replication slot due to unconsumed changes/,
+	"failover can't be enabled if restart_lsn < two_phase_at on a two_phase subscription."
+);
+
+# Drop the logical_slot
+$publisher->psql('postgres',
+	"SELECT pg_drop_replication_slot('logical_slot');");
+
+# Test that the failover option can be set for a two_phase enabled
+# subscription using ALTER SUBSCRIPTION.
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (enabled = false, two_phase = true);"
+);
+
+# Try to enable failover for the subscription with two_phase in pending state
+($result, $stdout, $stderr) = $subscriber1->psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 SET (failover = true)");
+ok( $stderr =~
+	  /ERROR:  cannot enable failover for a subscription with a pending two-phase state
+.*HINT:  The "failover" option can be enabled after two-phase state is ready./,
+	"enabling failover is not allowed for a two_phase pending subscription");
+
+# Enable the subscription
+$subscriber1->safe_psql('postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 ENABLE;");
+
+# Wait for the subscription to be in sync
+$subscriber1->wait_for_subscription_sync($publisher, 'regress_mysub2');
+
+# Also wait for two-phase to be enabled
+ok( $subscriber1->poll_query_until(
+		'postgres',
+		qq[SELECT subtwophasestate FROM pg_subscription WHERE subname = 'regress_mysub2';],
+		'e',),
+	"Timed out while waiting for subscriber to enable twophase");
+
+# Advance the slot's restart_lsn to allow enabling the failover option
+# on a two_phase-enabled subscription using ALTER SUBSCRIPTION.
+$publisher->safe_psql(
+	'postgres', qq(
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+		BEGIN;
+		SELECT txid_current();
+		SELECT pg_log_standby_snapshot();
+		COMMIT;
+));
+
+# Wait for the subscription to be in sync
+$subscriber1->wait_for_subscription_sync($publisher, 'regress_mysub2');
+
+# Alter subscription to enable failover
+$subscriber1->psql(
+	'postgres',
+	"ALTER SUBSCRIPTION regress_mysub2 DISABLE;
+	 ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);");
+
+# Confirm that the failover flag on the slot has been turned on
+is( $publisher->safe_psql(
+		'postgres',
+		q{SELECT failover from pg_replication_slots WHERE slot_name = 'regress_mysub2';}
+	),
+	"t",
+	'logical slot has failover true on the publisher');
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub2;");
+
+# Test that subscription creation with two_phase=true results in error when
+# using a slot with failover set to true.
+
+# Create a slot with failover enabled
+$publisher->psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regress_mysub3', 'pgoutput', false, false, true);"
+);
+
+my $logoffset = -s $subscriber1->logfile;
+
+# Create a subscription with two_phase enabled
+$subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION regress_mysub3 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (create_slot = false, two_phase = true);"
+);
+
+ok( $subscriber1->wait_for_log(
+		qr/cannot enable two-phase decoding for failover enabled slot "regress_mysub3"/,
+		$logoffset),
+	"creating subscription with two_phase=true is not allowed when the slot has failover set to true"
+);
+
+# Drop the subscription
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub3;");
+
+# Clean up the tables created
+$publisher->safe_psql('postgres', "DROP TABLE tab_full;");
+$subscriber1->safe_psql('postgres', "DROP TABLE tab_full;");
+
 ##################################################
 # Test that pg_sync_replication_slots() cannot be executed on a non-standby server.
 ##################################################
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 0f2a25cdc19..9f4daf47901 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -479,6 +479,10 @@ COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 RESET SESSION AUTHORIZATION;
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+ERROR:  cannot enable both "failover" and "two_phase" options at "CREATE SUBSCRIPTION"
+HINT:  Use "ALTER SUBSCRIPTION ... SET (failover = true)" to enable failover after two-phase state is ready
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c6..47fc1e5329b 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -342,6 +342,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 DROP SUBSCRIPTION regress_testsub;
 
 RESET SESSION AUTHORIZATION;
+
+-- fail - cannot enable two_phase and failover together
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, two_phase = true, failover = true);
+
 DROP ROLE regress_subscription_user;
 DROP ROLE regress_subscription_user2;
 DROP ROLE regress_subscription_user3;
-- 
2.34.1

v19-0002-Don-t-dump-failover-true-with-two_phase-in-CREAT.patchapplication/x-patch; name=v19-0002-Don-t-dump-failover-true-with-two_phase-in-CREAT.patchDownload
From b0e26e90d99ebfb2602a7417c2c221df84767d57 Mon Sep 17 00:00:00 2001
From: Nisha Moond <nisha.moond412@gmail.com>
Date: Tue, 3 Jun 2025 10:48:00 +0530
Subject: [PATCH v19 2/2] Don't dump failover=true with two_phase in CREATE
 SUBSCRIPTION

When dumping logical replication subscriptions where both two_phase and
failover are enabled, generate the CREATE SUBSCRIPTION commands with
only two_phase = true option, omitting the failover=true option.

That way, the dump can be restored without encountering the error caused
by enabling both failover and two_phase simultaneously during CREATE
SUBSCRIPTION. After the restore, users can enable failover using ALTER
SUBSCRIPTION.
---
 doc/src/sgml/ref/pg_dump.sgml | 13 +++++++++++++
 src/bin/pg_dump/pg_dump.c     | 12 +++++++++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index cfc74ca6d69..6c133c5e12a 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1633,6 +1633,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
    had been originally created with <literal>two_phase = true</literal> option.
   </para>
 
+  <para>
+   When dumping logical replication subscriptions where both
+   <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+   and <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
+   are enabled, <application>pg_dump</application> will generate <command>CREATE
+   SUBSCRIPTION</command> commands with only <literal>two_phase = true</literal>
+   option, omitting the <literal>failover=true</literal> option. That way,
+   the dump can be restored without encountering the error caused by enabling
+   both failover and two_phase simultaneously during <xref linkend="sql-createsubscription"/>.
+   After the restore, users can enable <literal>failover</literal> using
+   <xref linkend="sql-altersubscription"/>.
+  </para>
+
   <para>
    It is generally recommended to use the <option>-X</option>
    (<option>--no-psqlrc</option>) option when restoring a database from a
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 9ed1a856fa3..cb5f4b0812b 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -5129,6 +5129,7 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo)
 	int			npubnames = 0;
 	int			i;
 	char		two_phase_disabled[] = {LOGICALREP_TWOPHASE_STATE_DISABLED, '\0'};
+	bool		set_failover = true;
 
 	/* Do nothing in data-only dump */
 	if (dopt->dataOnly)
@@ -5174,8 +5175,17 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo)
 		appendPQExpBufferStr(query, ", streaming = parallel");
 
 	if (strcmp(subinfo->subtwophasestate, two_phase_disabled) != 0)
+	{
 		appendPQExpBufferStr(query, ", two_phase = on");
 
+		/*
+		 * The failover option cannot be set together with two_phase during
+		 * CREATE SUBSCRIPTION. It can be enabled later using ALTER
+		 * SUBSCRIPTION after the subscription is restored.
+		 */
+		set_failover = false;
+	}
+
 	if (strcmp(subinfo->subdisableonerr, "t") == 0)
 		appendPQExpBufferStr(query, ", disable_on_error = true");
 
@@ -5185,7 +5195,7 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo)
 	if (strcmp(subinfo->subrunasowner, "t") == 0)
 		appendPQExpBufferStr(query, ", run_as_owner = true");
 
-	if (strcmp(subinfo->subfailover, "t") == 0)
+	if (set_failover && strcmp(subinfo->subfailover, "t") == 0)
 		appendPQExpBufferStr(query, ", failover = true");
 
 	if (strcmp(subinfo->subsynccommit, "off") != 0)
-- 
2.34.1

#98Nisha Moond
nisha.moond412@gmail.com
In reply to: Peter Smith (#95)
Re: Fix slot synchronization with two_phase decoding enabled

On Thu, Jun 12, 2025 at 4:00 AM Peter Smith <smithpb2250@gmail.com> wrote:

On Wed, Jun 11, 2025 at 8:16 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Jun 6, 2025 at 5:07 PM Nisha Moond <nisha.moond412@gmail.com> wrote:

Attached v18 patch.
- patch-001: modified error messages as suggested above.
- patch-002: improved pg_dump docs as per Shveta's off-list suggestions.

[1] /messages/by-id/CAA4eK1+B067G8mUJzKUEjc5KSkYq6z0utTaHey-qeRt+nZTNJg@mail.gmail.com

--
Thanks,
Nisha

Hi Nisha,

Some comments.

Patch 0001
1. Commit message

from "two_phase enables slots." to "two_phase enabled slots"

2.
+
+     <para>
+      When a subscription's
+      <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
+      is in the pending state, setting <literal>failover</literal> to
true is not
+      permitted. Once

The term "subscription's two_phase" sounds a bit vague here.
Referring https://www.postgresql.org/docs/17/sql-createsubscription.html
and https://www.postgresql.org/docs/current/catalog-pg-subscription.html
two_phase is referred to both as the "internal state" as well as the
"mode" of the subscription.

Probably rephrase it to - "When a subscription's internal two_phase
mode is in the pending state, setting failover to true is not
permitted. See column subtwophasestate of pg_subscription to know the
actual internal state of the two-phase mode" (with appropriate tags
and reference links)

There seems a potential muddling of terms two_phase VS two-phase.

AFAIK "two_phase" (with the underscore) is the name of the
subscription option; it might also be the name of an internal
structure member or catalog column name. Furthermore, when you are
referring to the option name I think it ought to be double-quoted for
clarity.

But, unless you are referring specifically to the option/field/column
then IMO it should say two-phase (with the dash).
e.g. The "two_phase" option enables two-phase mode.

I agree with your suggestion to use two-phase when referring to the
internal state of the subscription.
I've updated the commit message and the respective doc (with reference
link) accordingly in v19 patches.

--
Thanks,
Nisha

#99Masahiko Sawada
sawada.mshk@gmail.com
In reply to: shveta malik (#96)
Re: Fix slot synchronization with two_phase decoding enabled

On Wed, Jun 11, 2025 at 9:50 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Jun 11, 2025 at 11:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

BTW have we addressed the point Amit mentioned before[1]?

The one more combination to consider is when someone takes a dump of
an older version and loads it into a newer version. For example, where
users dump from 17.5 and then restore in a newer version, say 17.6
(which has our fix), the restore will fail due to newer restrictions
added by this patch. Do we need to do anything about it?

I think it could be a significant side-effect and we need to do
something about that.

After giving it more thought, we have an opinion that this
side-effect/issue is unlikely to occur if users follow our
documentation properly.

The recommended approach for upgrading between minor versions is to
shut down the server and replace the binaries. See 'To update between
compatible versions' in [1].

Also it is recommended in docs that we use pg_dump from the newer
version of PostgreSQL. See 'It is recommended that you use the
pg_dump' in [2]. This particular recommendation is in the Upgrade doc.
If needed, we can make a similar recommendation in any of our failover
specific docs as well, mentioning this particular case.

In brief, our overall understanding is that a) pg_dump is mainly used
for major versions upgrade b) pg_dump of higher version is used.
Please let us know if your understanding is different here.

I agree that the main use case of pg_dump is major version upgrading
but it's not limited to that use case. There might be some users who
have taken backups using pg_dump for truly backup purposes. I'm not
sure we've had such compatibility breakage in a minor version release.

Beyond these steps, we could not find any better solution for the
pointed case. But we are open to exploring and implementing any
alternative solutions you may have. Feedback is most welcome here.

I'll share alternative ideas if I come up.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#100shveta malik
shveta.malik@gmail.com
In reply to: Masahiko Sawada (#99)
Re: Fix slot synchronization with two_phase decoding enabled

On Tue, Jun 17, 2025 at 5:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jun 11, 2025 at 9:50 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Jun 11, 2025 at 11:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

BTW have we addressed the point Amit mentioned before[1]?

The one more combination to consider is when someone takes a dump of
an older version and loads it into a newer version. For example, where
users dump from 17.5 and then restore in a newer version, say 17.6
(which has our fix), the restore will fail due to newer restrictions
added by this patch. Do we need to do anything about it?

I think it could be a significant side-effect and we need to do
something about that.

After giving it more thought, we have an opinion that this
side-effect/issue is unlikely to occur if users follow our
documentation properly.

The recommended approach for upgrading between minor versions is to
shut down the server and replace the binaries. See 'To update between
compatible versions' in [1].

Also it is recommended in docs that we use pg_dump from the newer
version of PostgreSQL. See 'It is recommended that you use the
pg_dump' in [2]. This particular recommendation is in the Upgrade doc.
If needed, we can make a similar recommendation in any of our failover
specific docs as well, mentioning this particular case.

In brief, our overall understanding is that a) pg_dump is mainly used
for major versions upgrade b) pg_dump of higher version is used.
Please let us know if your understanding is different here.

I agree that the main use case of pg_dump is major version upgrading
but it's not limited to that use case. There might be some users who
have taken backups using pg_dump for truly backup purposes. I'm not
sure we've had such compatibility breakage in a minor version release.

Beyond these steps, we could not find any better solution for the
pointed case. But we are open to exploring and implementing any
alternative solutions you may have. Feedback is most welcome here.

I'll share alternative ideas if I come up.

There was another idea proposed in [1]/messages/by-id/OS0PR01MB57161D9BB5409F229564957994AD2@OS0PR01MB5716.jpnprd01.prod.outlook.com in the beginning, quoting it here:

-------------
Another idea considered is to prevent the slot that enables two-phase
decoding from being synced to standby. IOW, this means displaying the
failover field as false in the view, if there is any possibility that
transactions prepared before the two_phase_at position exist (e.g., if
restart_lsn is less than two_phase_at).
-------------

I believe this approach has lesser limitations and maintains
compatibility with the existing pg_dump workflow.

The proposal is to show failover as false in pg_replication_slots when
restart_lsn is earlier than two_phase_at. This scenario can only occur
during the activation of two_phase. In PG17, since enabling two_phase
via ALTER SUBSCRIPTION is not permitted, this situation effectively
happens only at the creation of the subscription. At that point, if
slot synchronization is attempted, it will not synchronize the slot.
Synchronization (and creation of the new slot) will begin only once
restart_lsn surpasses two_phase_at.

The only limitation I can identify here is if a failover occurs during
that brief window, the standby might not yet have the slot created.
However, I believe the likelihood of this happening is quite low.
Overall this looks a safer approach.

Please let me know your thoughts on this.

[1]: /messages/by-id/OS0PR01MB57161D9BB5409F229564957994AD2@OS0PR01MB5716.jpnprd01.prod.outlook.com

thanks
Shveta