pg_copy_logical_replication_slot doesn't copy the failover property

Started by PG Bug reporting formabout 1 year ago9 messageshackersdocs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org
hackersdocs

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/17/functions-admin.html
Description:

The documentation for pg_copy_logical_replication_slot doesn't mention that
the failover property for the logical slot is not copied.

I assumed there was a good reason for this, and I found a comment in the
source code that explains it (although I don't really understand).
It says
* To avoid potential issues with the slot synchronization where the
* restart_lsn of a replication slot can go backward, we set the
* failover option to false here. This situation occurs when a slot
* on the primary server is dropped and immediately replaced with a
* new slot of the same name, created by copying from another existing
* slot. However, the slot synchronization will only observe the
* restart_lsn of the same slot going backward.

I assumed that by default, all properties from the original slot would be
copied, so this function left me wondering why my logical replication slots
were not being synced to the replica.

#2Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: PG Bug reporting form (#1)
hackersdocs
Re: pg_copy_logical_replication_slot doesn't copy the failover property

On Tue, 18 Feb 2025 at 19:49, PG Doc comments form
<noreply@postgresql.org> wrote:

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/17/functions-admin.html
Description:

The documentation for pg_copy_logical_replication_slot doesn't mention that
the failover property for the logical slot is not copied.

I assumed there was a good reason for this, and I found a comment in the
source code that explains it (although I don't really understand).
It says
* To avoid potential issues with the slot synchronization where the
* restart_lsn of a replication slot can go backward, we set the
* failover option to false here. This situation occurs when a slot
* on the primary server is dropped and immediately replaced with a
* new slot of the same name, created by copying from another existing
* slot. However, the slot synchronization will only observe the
* restart_lsn of the same slot going backward.

I assumed that by default, all properties from the original slot would be
copied, so this function left me wondering why my logical replication slots
were not being synced to the replica.

Hi,

The failover option is set to false by default while copying of the
slots as it may cause some issue during slot synchronization as per
discussion [1]/messages/by-id/CAD21AoCoX+jhy_i3v+T2s78NG_0HH1oXOUiTOWhDdxVPBtDHKA@mail.gmail.com.

I have created a patch to update the documentation for the same.

[1]: /messages/by-id/CAD21AoCoX+jhy_i3v+T2s78NG_0HH1oXOUiTOWhDdxVPBtDHKA@mail.gmail.com

Thanks and Regards,
Shlok Kyal

Attachments:

v1-0001-Improve-documentation-for-pg_copy_logical_replica.patchapplication/octet-stream; name=v1-0001-Improve-documentation-for-pg_copy_logical_replica.patchDownload+3-1
#3Nisha Moond
nisha.moond412@gmail.com
In reply to: Shlok Kyal (#2)
hackersdocs
Re: pg_copy_logical_replication_slot doesn't copy the failover property

On Wed, Feb 19, 2025 at 12:40 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

Hi,

The failover option is set to false by default while copying of the
slots as it may cause some issue during slot synchronization as per
discussion [1].

I have created a patch to update the documentation for the same.

Thanks for the patch.
A couple of comments regarding the sentence:

+        not copied and set as <literal>false</literal> by default due to
+        potential issues with the slot synchronization.

1) / and set as/ and is set to
"set to" is more natural and clearer when assigning a specific value
compared to "set as."

2) "due to potential issues..." wording implies that it is addressing
existing issues. I feel it would be better to say:
"... <literal>false</literal> by default to avoid potential issues
with the slot synchronization."

--
Thanks,
Nisha

#4Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Nisha Moond (#3)
hackersdocs
Re: pg_copy_logical_replication_slot doesn't copy the failover property

On Thu, 20 Feb 2025 at 09:39, Nisha Moond <nisha.moond412@gmail.com> wrote:

On Wed, Feb 19, 2025 at 12:40 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

Hi,

The failover option is set to false by default while copying of the
slots as it may cause some issue during slot synchronization as per
discussion [1].

I have created a patch to update the documentation for the same.

Thanks for the patch.
A couple of comments regarding the sentence:

+        not copied and set as <literal>false</literal> by default due to
+        potential issues with the slot synchronization.

1) / and set as/ and is set to
"set to" is more natural and clearer when assigning a specific value
compared to "set as."

2) "due to potential issues..." wording implies that it is addressing
existing issues. I feel it would be better to say:
"... <literal>false</literal> by default to avoid potential issues
with the slot synchronization."

Hi Nisha,

I have addressed the comments and attached the updated v2 patch.

Thanks and Regards,
Shlok Kyal

Attachments:

v2-0001-Improve-documentation-for-pg_copy_logical_replica.patchapplication/octet-stream; name=v2-0001-Improve-documentation-for-pg_copy_logical_replica.patchDownload+3-1
#5Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Shlok Kyal (#4)
hackersdocs
Re: pg_copy_logical_replication_slot doesn't copy the failover property

On Wed, Feb 19, 2025 at 8:25 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

On Thu, 20 Feb 2025 at 09:39, Nisha Moond <nisha.moond412@gmail.com> wrote:

On Wed, Feb 19, 2025 at 12:40 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

Hi,

The failover option is set to false by default while copying of the
slots as it may cause some issue during slot synchronization as per
discussion [1].

I have created a patch to update the documentation for the same.

Thanks for the patch.
A couple of comments regarding the sentence:

+        not copied and set as <literal>false</literal> by default due to
+        potential issues with the slot synchronization.

1) / and set as/ and is set to
"set to" is more natural and clearer when assigning a specific value
compared to "set as."

2) "due to potential issues..." wording implies that it is addressing
existing issues. I feel it would be better to say:
"... <literal>false</literal> by default to avoid potential issues
with the slot synchronization."

Hi Nisha,

I have addressed the comments and attached the updated v2 patch.

IIUC the two_phase field is also not copied from the source slot. I
think we can clarify in the doc that these two fields are not copied.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#6Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Masahiko Sawada (#5)
hackersdocs
Re: pg_copy_logical_replication_slot doesn't copy the failover property

Adding Amit to the thread.

Show quoted text

On Thu, 20 Feb 2025 at 23:19, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Feb 19, 2025 at 8:25 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

On Thu, 20 Feb 2025 at 09:39, Nisha Moond <nisha.moond412@gmail.com> wrote:

On Wed, Feb 19, 2025 at 12:40 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

Hi,

The failover option is set to false by default while copying of the
slots as it may cause some issue during slot synchronization as per
discussion [1].

I have created a patch to update the documentation for the same.

Thanks for the patch.
A couple of comments regarding the sentence:

+        not copied and set as <literal>false</literal> by default due to
+        potential issues with the slot synchronization.

1) / and set as/ and is set to
"set to" is more natural and clearer when assigning a specific value
compared to "set as."

2) "due to potential issues..." wording implies that it is addressing
existing issues. I feel it would be better to say:
"... <literal>false</literal> by default to avoid potential issues
with the slot synchronization."

Hi Nisha,

I have addressed the comments and attached the updated v2 patch.

IIUC the two_phase field is also not copied from the source slot. I
think we can clarify in the doc that these two fields are not copied.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Shlok Kyal (#6)
hackersdocs
Re: pg_copy_logical_replication_slot doesn't copy the failover property

On Sat, Feb 22, 2025 at 11:26 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

Adding Amit to the thread.

Thanks.

Looking at the original complaint:

It says
* To avoid potential issues with the slot synchronization where the
* restart_lsn of a replication slot can go backward, we set the
* failover option to false here. This situation occurs when a slot
* on the primary server is dropped and immediately replaced with a
* new slot of the same name, created by copying from another existing
* slot. However, the slot synchronization will only observe the
* restart_lsn of the same slot going backward.

It would be better to update the comments as well to make the
potential issues with slot synchronization clear or mention the
reference of other place where we have comments related to this race
condition. Also, I think it is better to write about two_phase in the
comments as well as in docs unless already mentioned. If you agree
with updating the comments as well, shall we redirect this to hackers?

IIUC the two_phase field is also not copied from the source slot. I
think we can clarify in the doc that these two fields are not copied.

+1.

--
With Regards,
Amit Kapila.

#8Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#7)
hackersdocs
RE: pg_copy_logical_replication_slot doesn't copy the failover property

On Monday, February 24, 2025 10:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

(The previous email was blocked, here is another attempt)

On Sat, Feb 22, 2025 at 11:26 AM Shlok Kyal <shlok.kyal.oss@gmail.com>
wrote:

Adding Amit to the thread.

Thanks.

Looking at the original complaint:

It says
* To avoid potential issues with the slot synchronization

where the

* restart_lsn of a replication slot can go backward, we set

the

* failover option to false here. This situation occurs when

a slot

* on the primary server is dropped and immediately

replaced with a

* new slot of the same name, created by copying from

another existing

* slot. However, the slot synchronization will only observe

the

* restart_lsn of the same slot going backward.

It would be better to update the comments as well to make the potential issues
with slot synchronization clear or mention the reference of other place where
we have comments related to this race condition. Also, I think it is better to
write about two_phase in the comments as well as in docs unless already
mentioned. If you agree with updating the comments as well, shall we redirect
this to hackers?

(Redirect to -hackers)

IIUC the two_phase field is also not copied from the source slot. I
think we can clarify in the doc that these two fields are not copied.

+1.

Here is the new V3 patch set which updated the comments to make the
issue clearer.

After thinking more on the two_phase option, I didn't find an issue that prevent
us from copying its option value. So, it’s more intuitive to me to just
copy its value instead of adding doc for it. The 0002 includes the same and I will
keep testing to ensure there are no other issues missed.

Best Regards,
Hou zj

Attachments:

v3-0001-Improve-the-documentation-for-pg_copy_logical_rep.patchapplication/octet-stream; name=v3-0001-Improve-the-documentation-for-pg_copy_logical_rep.patchDownload+15-8
v3-0002-Copy-the-two_phase-option-in-pg_copy_logical_repl.patchapplication/octet-stream; name=v3-0002-Copy-the-two_phase-option-in-pg_copy_logical_repl.patchDownload+27-15
#9Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#8)
hackersdocs
Re: pg_copy_logical_replication_slot doesn't copy the failover property

On Mon, Feb 24, 2025 at 4:29 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Here is the new V3 patch set which updated the comments to make the
issue clearer.

I pushed your first patch after minor modifications.

After thinking more on the two_phase option, I didn't find an issue that prevent
us from copying its option value.

Please see the comments atop ReplicationSlotCreate() and study the
commit 19890a064ebf53dedcefed0d8339ed3d449b06e6 to understand the
reasons as to why we initially enabled only at create time. It is
possible that the risk mentioned in the commit doesn't hold true for
copy_slot functionality but can you please analyze the same and let us
know your thoughts on the same?

So, it’s more intuitive to me to just
copy its value instead of adding doc for it. The 0002 includes the same and I will
keep testing to ensure there are no other issues missed.

It is better to start a separate thread to discuss copying two_phase
property. Even, if it ends with just some comments and doc change, it
deserves an independent discussion.

--
With Regards,
Amit Kapila.