pg_copy_logical_replication_slot doesn't copy the failover property

Started by PG Doc comments form11 months ago9 messages
#1PG Doc comments form
noreply@postgresql.org

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/17/functions-admin.html
Description:

The documentation for pg_copy_logical_replication_slot doesn't mention that
the failover property for the logical slot is not copied.

I assumed there was a good reason for this, and I found a comment in the
source code that explains it (although I don't really understand).
It says
* To avoid potential issues with the slot synchronization where the
* restart_lsn of a replication slot can go backward, we set the
* failover option to false here. This situation occurs when a slot
* on the primary server is dropped and immediately replaced with a
* new slot of the same name, created by copying from another existing
* slot. However, the slot synchronization will only observe the
* restart_lsn of the same slot going backward.

I assumed that by default, all properties from the original slot would be
copied, so this function left me wondering why my logical replication slots
were not being synced to the replica.

#2Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: PG Doc comments form (#1)
1 attachment(s)
Re: pg_copy_logical_replication_slot doesn't copy the failover property

On Tue, 18 Feb 2025 at 19:49, PG Doc comments form
<noreply@postgresql.org> wrote:

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/17/functions-admin.html
Description:

The documentation for pg_copy_logical_replication_slot doesn't mention that
the failover property for the logical slot is not copied.

I assumed there was a good reason for this, and I found a comment in the
source code that explains it (although I don't really understand).
It says
* To avoid potential issues with the slot synchronization where the
* restart_lsn of a replication slot can go backward, we set the
* failover option to false here. This situation occurs when a slot
* on the primary server is dropped and immediately replaced with a
* new slot of the same name, created by copying from another existing
* slot. However, the slot synchronization will only observe the
* restart_lsn of the same slot going backward.

I assumed that by default, all properties from the original slot would be
copied, so this function left me wondering why my logical replication slots
were not being synced to the replica.

Hi,

The failover option is set to false by default while copying of the
slots as it may cause some issue during slot synchronization as per
discussion [1]/messages/by-id/CAD21AoCoX+jhy_i3v+T2s78NG_0HH1oXOUiTOWhDdxVPBtDHKA@mail.gmail.com.

I have created a patch to update the documentation for the same.

[1]: /messages/by-id/CAD21AoCoX+jhy_i3v+T2s78NG_0HH1oXOUiTOWhDdxVPBtDHKA@mail.gmail.com

Thanks and Regards,
Shlok Kyal

Attachments:

v1-0001-Improve-documentation-for-pg_copy_logical_replica.patchapplication/octet-stream; name=v1-0001-Improve-documentation-for-pg_copy_logical_replica.patchDownload
From 81aa5022c09b654c4fc24e24ad59c4012c1ddc19 Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Wed, 19 Feb 2025 12:29:04 +0530
Subject: [PATCH v1] Improve documentation for pg_copy_logical_replication_slot

By default the failover option is not copied when
pg_copy_logical_replication_slot function is used and it is set to false
by default. This patch document the same.
---
 doc/src/sgml/func.sgml | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2548a57df31..b6fa211af8e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29369,6 +29369,9 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         from the same <acronym>LSN</acronym> as the source logical slot.  Both
         <parameter>temporary</parameter> and <parameter>plugin</parameter> are
         optional; if they are omitted, the values of the source slot are used.
+        The <parameter>failover</parameter> option of source logical slot is
+        not copied and set as <literal>false</literal> by default due to
+        potential issues with the slot synchronization.
        </para></entry>
       </row>
 
-- 
2.34.1

#3Nisha Moond
nisha.moond412@gmail.com
In reply to: Shlok Kyal (#2)
Re: pg_copy_logical_replication_slot doesn't copy the failover property

On Wed, Feb 19, 2025 at 12:40 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

Hi,

The failover option is set to false by default while copying of the
slots as it may cause some issue during slot synchronization as per
discussion [1].

I have created a patch to update the documentation for the same.

Thanks for the patch.
A couple of comments regarding the sentence:

+        not copied and set as <literal>false</literal> by default due to
+        potential issues with the slot synchronization.

1) / and set as/ and is set to
"set to" is more natural and clearer when assigning a specific value
compared to "set as."

2) "due to potential issues..." wording implies that it is addressing
existing issues. I feel it would be better to say:
"... <literal>false</literal> by default to avoid potential issues
with the slot synchronization."

--
Thanks,
Nisha

#4Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Nisha Moond (#3)
1 attachment(s)
Re: pg_copy_logical_replication_slot doesn't copy the failover property

On Thu, 20 Feb 2025 at 09:39, Nisha Moond <nisha.moond412@gmail.com> wrote:

On Wed, Feb 19, 2025 at 12:40 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

Hi,

The failover option is set to false by default while copying of the
slots as it may cause some issue during slot synchronization as per
discussion [1].

I have created a patch to update the documentation for the same.

Thanks for the patch.
A couple of comments regarding the sentence:

+        not copied and set as <literal>false</literal> by default due to
+        potential issues with the slot synchronization.

1) / and set as/ and is set to
"set to" is more natural and clearer when assigning a specific value
compared to "set as."

2) "due to potential issues..." wording implies that it is addressing
existing issues. I feel it would be better to say:
"... <literal>false</literal> by default to avoid potential issues
with the slot synchronization."

Hi Nisha,

I have addressed the comments and attached the updated v2 patch.

Thanks and Regards,
Shlok Kyal

Attachments:

v2-0001-Improve-documentation-for-pg_copy_logical_replica.patchapplication/octet-stream; name=v2-0001-Improve-documentation-for-pg_copy_logical_replica.patchDownload
From c9aeeea90d550cacfd04bfdfb7220de90933798c Mon Sep 17 00:00:00 2001
From: Shlok Kyal <shlok.kyal.oss@gmail.com>
Date: Wed, 19 Feb 2025 12:29:04 +0530
Subject: [PATCH v2] Improve documentation for pg_copy_logical_replication_slot

By default the failover option is not copied when
pg_copy_logical_replication_slot function is used and it is set to false
by default. This patch document the same.
---
 doc/src/sgml/func.sgml | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index df32ee0bf5b..f812e7c69be 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29369,6 +29369,9 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         from the same <acronym>LSN</acronym> as the source logical slot.  Both
         <parameter>temporary</parameter> and <parameter>plugin</parameter> are
         optional; if they are omitted, the values of the source slot are used.
+        The <parameter>failover</parameter> option of source logical slot is
+        not copied and is set to <literal>false</literal> by default to avoid
+        potential issues with the slot synchronization.
        </para></entry>
       </row>
 
-- 
2.34.1

#5Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Shlok Kyal (#4)
Re: pg_copy_logical_replication_slot doesn't copy the failover property

On Wed, Feb 19, 2025 at 8:25 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

On Thu, 20 Feb 2025 at 09:39, Nisha Moond <nisha.moond412@gmail.com> wrote:

On Wed, Feb 19, 2025 at 12:40 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

Hi,

The failover option is set to false by default while copying of the
slots as it may cause some issue during slot synchronization as per
discussion [1].

I have created a patch to update the documentation for the same.

Thanks for the patch.
A couple of comments regarding the sentence:

+        not copied and set as <literal>false</literal> by default due to
+        potential issues with the slot synchronization.

1) / and set as/ and is set to
"set to" is more natural and clearer when assigning a specific value
compared to "set as."

2) "due to potential issues..." wording implies that it is addressing
existing issues. I feel it would be better to say:
"... <literal>false</literal> by default to avoid potential issues
with the slot synchronization."

Hi Nisha,

I have addressed the comments and attached the updated v2 patch.

IIUC the two_phase field is also not copied from the source slot. I
think we can clarify in the doc that these two fields are not copied.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#6Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Masahiko Sawada (#5)
Re: pg_copy_logical_replication_slot doesn't copy the failover property

Adding Amit to the thread.

Show quoted text

On Thu, 20 Feb 2025 at 23:19, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Feb 19, 2025 at 8:25 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

On Thu, 20 Feb 2025 at 09:39, Nisha Moond <nisha.moond412@gmail.com> wrote:

On Wed, Feb 19, 2025 at 12:40 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

Hi,

The failover option is set to false by default while copying of the
slots as it may cause some issue during slot synchronization as per
discussion [1].

I have created a patch to update the documentation for the same.

Thanks for the patch.
A couple of comments regarding the sentence:

+        not copied and set as <literal>false</literal> by default due to
+        potential issues with the slot synchronization.

1) / and set as/ and is set to
"set to" is more natural and clearer when assigning a specific value
compared to "set as."

2) "due to potential issues..." wording implies that it is addressing
existing issues. I feel it would be better to say:
"... <literal>false</literal> by default to avoid potential issues
with the slot synchronization."

Hi Nisha,

I have addressed the comments and attached the updated v2 patch.

IIUC the two_phase field is also not copied from the source slot. I
think we can clarify in the doc that these two fields are not copied.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Shlok Kyal (#6)
Re: pg_copy_logical_replication_slot doesn't copy the failover property

On Sat, Feb 22, 2025 at 11:26 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

Adding Amit to the thread.

Thanks.

Looking at the original complaint:

It says
* To avoid potential issues with the slot synchronization where the
* restart_lsn of a replication slot can go backward, we set the
* failover option to false here. This situation occurs when a slot
* on the primary server is dropped and immediately replaced with a
* new slot of the same name, created by copying from another existing
* slot. However, the slot synchronization will only observe the
* restart_lsn of the same slot going backward.

It would be better to update the comments as well to make the
potential issues with slot synchronization clear or mention the
reference of other place where we have comments related to this race
condition. Also, I think it is better to write about two_phase in the
comments as well as in docs unless already mentioned. If you agree
with updating the comments as well, shall we redirect this to hackers?

IIUC the two_phase field is also not copied from the source slot. I
think we can clarify in the doc that these two fields are not copied.

+1.

--
With Regards,
Amit Kapila.

#8Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#7)
2 attachment(s)
RE: pg_copy_logical_replication_slot doesn't copy the failover property

On Monday, February 24, 2025 10:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

(The previous email was blocked, here is another attempt)

On Sat, Feb 22, 2025 at 11:26 AM Shlok Kyal <shlok.kyal.oss@gmail.com>
wrote:

Adding Amit to the thread.

Thanks.

Looking at the original complaint:

It says
* To avoid potential issues with the slot synchronization

where the

* restart_lsn of a replication slot can go backward, we set

the

* failover option to false here. This situation occurs when

a slot

* on the primary server is dropped and immediately

replaced with a

* new slot of the same name, created by copying from

another existing

* slot. However, the slot synchronization will only observe

the

* restart_lsn of the same slot going backward.

It would be better to update the comments as well to make the potential issues
with slot synchronization clear or mention the reference of other place where
we have comments related to this race condition. Also, I think it is better to
write about two_phase in the comments as well as in docs unless already
mentioned. If you agree with updating the comments as well, shall we redirect
this to hackers?

(Redirect to -hackers)

IIUC the two_phase field is also not copied from the source slot. I
think we can clarify in the doc that these two fields are not copied.

+1.

Here is the new V3 patch set which updated the comments to make the
issue clearer.

After thinking more on the two_phase option, I didn't find an issue that prevent
us from copying its option value. So, it’s more intuitive to me to just
copy its value instead of adding doc for it. The 0002 includes the same and I will
keep testing to ensure there are no other issues missed.

Best Regards,
Hou zj

Attachments:

v3-0001-Improve-the-documentation-for-pg_copy_logical_rep.patchapplication/octet-stream; name=v3-0001-Improve-the-documentation-for-pg_copy_logical_rep.patchDownload
From 94d10dabc00b13624909b0b658ef9201118a3b52 Mon Sep 17 00:00:00 2001
From: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Date: Mon, 24 Feb 2025 15:36:11 +0800
Subject: [PATCH v3 1/2] Improve the documentation for
 pg_copy_logical_replication_slot

This commit documents that the failover option is not copied when using the
pg_copy_logical_replication_slot function, and it defaults to false.
Additionally, the comments within the function is improved to provide a clear
explanation of the reason behind this behavior.
---
 doc/src/sgml/func.sgml              |  3 +++
 src/backend/replication/slotfuncs.c | 19 ++++++++++++-------
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9f60a476eb..a29c9500ba 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29374,6 +29374,9 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         from the same <acronym>LSN</acronym> as the source logical slot.  Both
         <parameter>temporary</parameter> and <parameter>plugin</parameter> are
         optional; if they are omitted, the values of the source slot are used.
+        The <parameter>failover</parameter> option of source logical slot is
+        not copied and is set to <literal>false</literal> by default to avoid
+        potential issues with the slot synchronization.
        </para></entry>
       </row>
 
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index f652ec8a73..8222e5a109 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -695,13 +695,18 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		 * hence pass find_startpoint false.  confirmed_flush will be set
 		 * below, by copying from the source slot.
 		 *
-		 * To avoid potential issues with the slot synchronization where the
-		 * restart_lsn of a replication slot can go backward, we set the
-		 * failover option to false here.  This situation occurs when a slot
-		 * on the primary server is dropped and immediately replaced with a
-		 * new slot of the same name, created by copying from another existing
-		 * slot.  However, the slot synchronization will only observe the
-		 * restart_lsn of the same slot going backward.
+		 * Copying the failover option is not allowed to prevent potential
+		 * issues with slot synchronization. For instance, if a slot was
+		 * synchronized to the standby, then dropped on the primary, and
+		 * immediately recreated by copying from another existing slot with
+		 * much earlier restart_lsn and confirmed_flush_lsn, the slot
+		 * synchronization would only observe the LSN of the same slot moving
+		 * backward. As slot synchronization does not copy the restart_lsn and
+		 * confirmed_flush_lsn backward (see update_local_synced_slot() for
+		 * details), if a failover happens before the primary's slot catches
+		 * up, logical replication cannot continue using the synchronized slot
+		 * on the promoted standby because the slot retains the restart_lsn and
+		 * confirmed_flush_lsn that are much later than expected.
 		 */
 		create_logical_replication_slot(NameStr(*dst_name),
 										plugin,
-- 
2.30.0.windows.2

v3-0002-Copy-the-two_phase-option-in-pg_copy_logical_repl.patchapplication/octet-stream; name=v3-0002-Copy-the-two_phase-option-in-pg_copy_logical_repl.patchDownload
From 4dac078a5e6069b29e70302c8d2a31ba899f14cc Mon Sep 17 00:00:00 2001
From: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Date: Mon, 24 Feb 2025 17:24:41 +0800
Subject: [PATCH v3 2/2] Copy the two_phase option in
 pg_copy_logical_replication_slot

Previously, the two_phase option was not copied when calling
pg_copy_logical_replication_slot(). However, since it is feasible and more
intuitive to copy this option rather than documenting its exclusion, this patch
implements the copying of the two_phase option.
---
 contrib/test_decoding/expected/slot.out | 26 ++++++++++++-------------
 contrib/test_decoding/sql/slot.sql      |  6 +++---
 src/backend/replication/slotfuncs.c     | 11 +++++++++++
 3 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6..c590901181 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -176,7 +176,7 @@ SELECT pg_drop_replication_slot('regression_slot3');
 -- Test copy functions for logical replication slots
 --
 -- Create and copy logical slots
-SELECT 'init' FROM pg_create_logical_replication_slot('orig_slot1', 'test_decoding', false);
+SELECT 'init' FROM pg_create_logical_replication_slot('orig_slot1', 'test_decoding', false, true);
  ?column? 
 ----------
  init
@@ -202,18 +202,18 @@ SELECT 'copy' FROM pg_copy_logical_replication_slot('orig_slot1', 'copied_slot1_
 
 -- Check all copied slots status
 SELECT
-    o.slot_name, o.plugin, o.temporary, c.slot_name, c.plugin, c.temporary
+    o.slot_name, o.plugin, o.temporary, c.slot_name, c.plugin, c.temporary, c.two_phase
 FROM
     (SELECT * FROM pg_replication_slots WHERE slot_name LIKE 'orig%') as o
     LEFT JOIN pg_replication_slots as c ON o.restart_lsn = c.restart_lsn  AND o.confirmed_flush_lsn = c.confirmed_flush_lsn
 WHERE
     o.slot_name != c.slot_name
 ORDER BY o.slot_name, c.slot_name;
- slot_name  |    plugin     | temporary |            slot_name            |    plugin     | temporary 
-------------+---------------+-----------+---------------------------------+---------------+-----------
- orig_slot1 | test_decoding | f         | copied_slot1_change_plugin      | pgoutput      | f
- orig_slot1 | test_decoding | f         | copied_slot1_change_plugin_temp | pgoutput      | t
- orig_slot1 | test_decoding | f         | copied_slot1_no_change          | test_decoding | f
+ slot_name  |    plugin     | temporary |            slot_name            |    plugin     | temporary | two_phase 
+------------+---------------+-----------+---------------------------------+---------------+-----------+-----------
+ orig_slot1 | test_decoding | f         | copied_slot1_change_plugin      | pgoutput      | f         | t
+ orig_slot1 | test_decoding | f         | copied_slot1_change_plugin_temp | pgoutput      | t         | t
+ orig_slot1 | test_decoding | f         | copied_slot1_no_change          | test_decoding | f         | t
 (3 rows)
 
 -- Now we have maximum 4 replication slots. Check slots are properly
@@ -267,18 +267,18 @@ SELECT 'copy' FROM pg_copy_logical_replication_slot('orig_slot2', 'copied_slot2_
 
 -- Check all copied slots status
 SELECT
-    o.slot_name, o.plugin, o.temporary, c.slot_name, c.plugin, c.temporary
+    o.slot_name, o.plugin, o.temporary, c.slot_name, c.plugin, c.temporary, c.two_phase
 FROM
     (SELECT * FROM pg_replication_slots WHERE slot_name LIKE 'orig%') as o
     LEFT JOIN pg_replication_slots as c ON o.restart_lsn = c.restart_lsn  AND o.confirmed_flush_lsn = c.confirmed_flush_lsn
 WHERE
     o.slot_name != c.slot_name
 ORDER BY o.slot_name, c.slot_name;
- slot_name  |    plugin     | temporary |            slot_name            |    plugin     | temporary 
-------------+---------------+-----------+---------------------------------+---------------+-----------
- orig_slot2 | test_decoding | t         | copied_slot2_change_plugin      | pgoutput      | t
- orig_slot2 | test_decoding | t         | copied_slot2_change_plugin_temp | pgoutput      | f
- orig_slot2 | test_decoding | t         | copied_slot2_no_change          | test_decoding | t
+ slot_name  |    plugin     | temporary |            slot_name            |    plugin     | temporary | two_phase 
+------------+---------------+-----------+---------------------------------+---------------+-----------+-----------
+ orig_slot2 | test_decoding | t         | copied_slot2_change_plugin      | pgoutput      | t         | f
+ orig_slot2 | test_decoding | t         | copied_slot2_change_plugin_temp | pgoutput      | f         | f
+ orig_slot2 | test_decoding | t         | copied_slot2_no_change          | test_decoding | t         | f
 (3 rows)
 
 -- Cannot copy a logical slot to a physical slot
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3be..70356e7a56 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -88,14 +88,14 @@ SELECT pg_drop_replication_slot('regression_slot3');
 --
 
 -- Create and copy logical slots
-SELECT 'init' FROM pg_create_logical_replication_slot('orig_slot1', 'test_decoding', false);
+SELECT 'init' FROM pg_create_logical_replication_slot('orig_slot1', 'test_decoding', false, true);
 SELECT 'copy' FROM pg_copy_logical_replication_slot('orig_slot1', 'copied_slot1_no_change');
 SELECT 'copy' FROM pg_copy_logical_replication_slot('orig_slot1', 'copied_slot1_change_plugin', false, 'pgoutput');
 SELECT 'copy' FROM pg_copy_logical_replication_slot('orig_slot1', 'copied_slot1_change_plugin_temp', true, 'pgoutput');
 
 -- Check all copied slots status
 SELECT
-    o.slot_name, o.plugin, o.temporary, c.slot_name, c.plugin, c.temporary
+    o.slot_name, o.plugin, o.temporary, c.slot_name, c.plugin, c.temporary, c.two_phase
 FROM
     (SELECT * FROM pg_replication_slots WHERE slot_name LIKE 'orig%') as o
     LEFT JOIN pg_replication_slots as c ON o.restart_lsn = c.restart_lsn  AND o.confirmed_flush_lsn = c.confirmed_flush_lsn
@@ -120,7 +120,7 @@ SELECT 'copy' FROM pg_copy_logical_replication_slot('orig_slot2', 'copied_slot2_
 
 -- Check all copied slots status
 SELECT
-    o.slot_name, o.plugin, o.temporary, c.slot_name, c.plugin, c.temporary
+    o.slot_name, o.plugin, o.temporary, c.slot_name, c.plugin, c.temporary, c.two_phase
 FROM
     (SELECT * FROM pg_replication_slots WHERE slot_name LIKE 'orig%') as o
     LEFT JOIN pg_replication_slots as c ON o.restart_lsn = c.restart_lsn  AND o.confirmed_flush_lsn = c.confirmed_flush_lsn
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 8222e5a109..acca37d333 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -707,6 +707,11 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		 * up, logical replication cannot continue using the synchronized slot
 		 * on the promoted standby because the slot retains the restart_lsn and
 		 * confirmed_flush_lsn that are much later than expected.
+		 *
+		 * The two_phase option is not copied initially because the source
+		 * option value could change concurrently anyway. Instead, it will be
+		 * copied after the slot creation along with other properties like
+		 * confirmed_flush.
 		 */
 		create_logical_replication_slot(NameStr(*dst_name),
 										plugin,
@@ -733,6 +738,8 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		TransactionId copy_catalog_xmin;
 		XLogRecPtr	copy_restart_lsn;
 		XLogRecPtr	copy_confirmed_flush;
+		XLogRecPtr	copy_two_phase_at;
+		bool		copy_two_phase;
 		bool		copy_islogical;
 		char	   *copy_name;
 
@@ -748,6 +755,8 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		copy_catalog_xmin = second_slot_contents.data.catalog_xmin;
 		copy_restart_lsn = second_slot_contents.data.restart_lsn;
 		copy_confirmed_flush = second_slot_contents.data.confirmed_flush;
+		copy_two_phase_at = second_slot_contents.data.two_phase_at;
+		copy_two_phase = second_slot_contents.data.two_phase;
 
 		/* for existence check */
 		copy_name = NameStr(second_slot_contents.data.name);
@@ -788,6 +797,8 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		MyReplicationSlot->data.catalog_xmin = copy_catalog_xmin;
 		MyReplicationSlot->data.restart_lsn = copy_restart_lsn;
 		MyReplicationSlot->data.confirmed_flush = copy_confirmed_flush;
+		MyReplicationSlot->data.two_phase_at = copy_two_phase_at;
+		MyReplicationSlot->data.two_phase = copy_two_phase;
 		SpinLockRelease(&MyReplicationSlot->mutex);
 
 		ReplicationSlotMarkDirty();
-- 
2.30.0.windows.2

#9Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#8)
Re: pg_copy_logical_replication_slot doesn't copy the failover property

On Mon, Feb 24, 2025 at 4:29 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Here is the new V3 patch set which updated the comments to make the
issue clearer.

I pushed your first patch after minor modifications.

After thinking more on the two_phase option, I didn't find an issue that prevent
us from copying its option value.

Please see the comments atop ReplicationSlotCreate() and study the
commit 19890a064ebf53dedcefed0d8339ed3d449b06e6 to understand the
reasons as to why we initially enabled only at create time. It is
possible that the risk mentioned in the commit doesn't hold true for
copy_slot functionality but can you please analyze the same and let us
know your thoughts on the same?

So, it’s more intuitive to me to just
copy its value instead of adding doc for it. The 0002 includes the same and I will
keep testing to ensure there are no other issues missed.

It is better to start a separate thread to discuss copying two_phase
property. Even, if it ends with just some comments and doc change, it
deserves an independent discussion.

--
With Regards,
Amit Kapila.