Issue with logical replication slot during switchover
Hi,
An issue occurred during the initial switchover using PostgreSQL version
17.5. The setup consists of a cluster with two nodes, managed by Patroni
version 4.0.5.
Logical replication is configured on the same instance, and the new feature
enabling logical replication slots to be failover-safe in a highly
available environment is used. Logical slot management is currently
disabled in Patroni.
Following are some screen captures taken during the switchover:
1. Run the switchover with Patroni
patronictl switchover
Current cluster topology
+ Cluster: ClusterX (7529893278186104053) ----+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+----------+--------------+---------+-----------+----+-----------+
| node_1 | xxxxxxxxxxxx | Leader | running | 4 | |
| node_2 | xxxxxxxxxxxx | Replica | streaming | 4 | 0 |
+----------+--------------+---------+-----------+----+-----------+
2. Check the slot on the new Primary
select * from pg_replication_slots where slot_type = 'logical';
+-[ RECORD 1 ]--------+----------------+
| slot_name | logical_slot |
| plugin | pgoutput |
| slot_type | logical |
| datoid | 25605 |
| database | db_test |
| temporary | f |
| active | t |
| active_pid | 3841546 |
| xmin | |
| catalog_xmin | 10399 |
| restart_lsn | 0/37002410 |
| confirmed_flush_lsn | 0/37002448 |
| wal_status | reserved |
| safe_wal_size | |
| two_phase | f |
| inactive_since | |
| conflicting | f |
| invalidation_reason | |
| failover | t |
| synced | t |
+---------------------+----------------+
Logical replication is active again after the promotion.
3. Check the slot on the new standby
select * from pg_replication_slots where slot_type = 'logical';
+-[ RECORD 1 ]--------+-------------------------------+
| slot_name | logical_slot |
| plugin | pgoutput |
| slot_type | logical |
| datoid | 25605 |
| database | db_test |
| temporary | f |
| active | f |
| active_pid | |
| xmin | |
| catalog_xmin | 10397 |
| restart_lsn | 0/3638F5F0 |
| confirmed_flush_lsn | 0/3638F6A0 |
| wal_status | reserved |
| safe_wal_size | |
| two_phase | f |
| inactive_since | 2025-08-05 10:21:03.342587+02 |
| conflicting | f |
| invalidation_reason | |
| failover | t |
| synced | f |
+---------------------+-------------------------------+
The synced flag keeps the value false.
The following error appears in the log:
2025-06-10 16:40:58.996 CEST [739829]: [1-1] user=,db=,client=,application=
LOG: slot sync worker started
2025-06-10 16:40:59.011 CEST [739829]: [2-1] user=,db=,client=,application=
ERROR: exiting from slot synchronization because same name slot
"logical_slot" already exists on the standby
I would like to make a proposal to address the issue:
Since the logical slot is in a failover state on both the primary and the
standby, an attempt could be made to resynchronize them.
I modified the slotsync.c module:
+++ b/src/backend/replication/logical/slotsync.c
@@ -649,24 +649,46 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		return false;
 	}
-
-	/* Search for the named slot */
+	// Both local and remote slot have the same name
 	if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
 	{
 		bool	synced;
+		bool	failover_status = remote_slot->failover;

 		SpinLockAcquire(&slot->mutex);
 		synced = slot->data.synced;
 		SpinLockRelease(&slot->mutex);
+
+		if (!synced){
+
+			Assert(!MyReplicationSlot);
+
+			if (failover_status){
+
+				ReplicationSlotAcquire(remote_slot->name, true, true);
+
+				// Put the synced flag to true to attempt resynchronizing failover slot on the standby
+				MyReplicationSlot->data.synced = true;
+
+				ReplicationSlotMarkDirty();
-		/* User-created slot with the same name exists, raise ERROR. */
-		if (!synced)
-			ereport(ERROR,
+				ReplicationSlotRelease();
+
+				/* Get rid of a replication slot that is no longer wanted */
+				ereport(WARNING,
+						errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+						errmsg("slot \"%s\" local slot has the same name as remote slot and they are in failover mode, try to synchronize them",
+							   remote_slot->name));
+				return false;	/* Going back to the main loop after droping the failover slot */
+			}
+			else
+				/* User-created slot with the same name exists, raise ERROR. */
+				ereport(ERROR,
						errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
						errmsg("exiting from slot synchronization because same"
-							   " name slot \"%s\" already exists on the standby",
-							   remote_slot->name));
-
+							   " name slot \"%s\" already exists on the standby",
+							   remote_slot->name));
+		}

 		/*
 		 * The slot has been synchronized before.
 		 *
This message follows the discussions started in this thread:
https://www.postgresql.org/message-id/CAA5-nLDvnqGtBsKu4T_s-cS%2BdGbpSLEzRwgep1XfYzGhQ4o65A%40mail.gmail.com
Help would be appreciated to move this point forward.
Best regards,
Fabrice
Thank You for starting a new thread and working on it.
I think you have given reference to the wrong thread; the correct one
is [1].
IIUC, the proposed fix checks whether remote_slot is failover-enabled
and, if a slot with the same name exists locally but has 'synced'=false,
enables the 'synced' flag and proceeds with synchronization from the next
cycle onward; otherwise it errors out. But the remote slot's failover will
always be true if we have reached this stage. Did you actually mean to check
if the local slot is failover-enabled but has 'synced' set to false
(indicating it’s a new standby after a switchover)? Even with that
check, it might not be the correct thing to overwrite such a slot
internally. I think in [1] we discussed a couple of ideas related to
a GUC, an alter API, and a drop_all_slots API, but I don't see any of those
proposed here. Do we want to try any of that?
[1]: https://www.postgresql.org/message-id/CAA5-nLD0vKn6T1-OHROBNfN2Pxa17zVo4UoVBdfHn2y=7nKixA@mail.gmail.com
thanks
Shveta
Thanks Shveta for coming on this point again and fixing the link.
The idea is to check if the slot has the same name and try to resynchronize it
with the primary.
OK, the check on the failover status for the remote slot is perhaps
redundant.
I'm not sure what impact setting the synced flag to true might have. But if
you run an additional switchover, it works fine because the synced flag on
the new primary is set to true now.
If we come back to the idea of the GUC or the API, adding an
allow_overwrite parameter to the pg_create_logical_replication_slot
function and removing the logical slot when set to true could be a suitable
approach.
What is your opinion?
Regards,
Fabrice
If implemented as a GUC, it would address only a specific corner case,
making it less suitable to be added as a GUC.
OTOH, adding it as a slot's property makes more sense. You can start
with introducing a new slot property, allow_overwrite. By default,
this property will be set to false.
a) The function pg_create_logical_replication_slot() can be extended
to accept this parameter.
b) A new API pg_alter_logical_replication_slot() can be introduced, to
modify this property after slot creation if needed.
c) The commands CREATE SUBSCRIPTION and ALTER SUBSCRIPTION are not
needed to include an allow_overwrite parameter. When CREATE
SUBSCRIPTION creates a slot, it will always set allow_overwrite to
false by default. If users need to change this later, they can use the
new API pg_alter_logical_replication_slot() to update the property.
d) Additionally, pg_alter_logical_replication_slot() can serve as a
generic API to modify other slot properties as well.
This appears to be a reasonable idea with potential use cases beyond
just allowing synchronization post switchover. Thoughts?
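To make this concrete, a rough sketch of what the SQL level could look like
(purely illustrative; neither the allow_overwrite argument nor
pg_alter_logical_replication_slot() exists yet):

-- create a failover slot that the slot sync worker may overwrite on a standby
select pg_create_logical_replication_slot('logical_slot', 'pgoutput',
       temporary => false, twophase => false,
       failover => true, allow_overwrite => true);

-- change the property later on an existing, unused slot (proposed API)
select pg_alter_logical_replication_slot('logical_slot', allow_overwrite => true);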
~~~
Another problem as you pointed out is inconsistent behaviour across
switchovers. On the first switchover, we get the error on new standby:
"Exiting from slot synchronization because a slot with the same name
already exists on the standby."
But in the case of a double switchover, this error does not occur.
This is due to the 'synced' flag not set on new standby on first
switchover while set in double switchover. I think the behaviour
should be the same. In both cases, it should emit the same error. We
are thinking of a potential solution here and will start a new thread
if needed.
thanks
Shveta
Thanks for sharing this. In fact, I agree that introducing an allow_overwrite
slot property seems cleaner than a GUC for this specific use case.
a) At first, an extension of pg_create_logical_replication_slot() could be
proposed.
b) pg_alter_logical_replication_slot(): that could be a neat approach; a
generic API for modifying other slot properties seems like a
forward-looking design choice. However, I don't know if slot properties
can be modified easily, because when a slot is active, a process is holding
them in memory.
c) CREATE SUBSCRIPTION and ALTER SUBSCRIPTION stay untouched
Regarding the switchover inconsistency, I agree that behavior should be
uniform. The current discrepancy between the first and double switchover is
confusing, and having the same error in both cases would improve
predictability. It's up to you if you want to open a thread for this.
Regards,
Fabrice
Right, it cannot be. This API can be used when the slot is not
acquired by any other process. We may have use cases for that as well,
similar to the one we have for this switchover case.
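For example, whether a slot is currently acquired can be checked from the
view before calling such an API (sketch):

select slot_name, active, active_pid
from pg_replication_slots
where slot_name = 'logical_slot';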
thanks
Shveta
I'll work on the pg_create_logical_replication_slot() API. The
pg_replication_slots view must also be adjusted to display the new
allow_overwrite flag.
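For example, something like this should then show the new property (sketch;
the allow_overwrite column does not exist yet):

select slot_name, failover, synced, allow_overwrite
from pg_replication_slots
where slot_type = 'logical';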
Regards
Fabrice
For the first step (a), the pg_create_logical_replication_slot interface is
extended.
The slot on the new attached standby will be dropped and recreated if the
flag allow_overwrite is set to true.
I tested the modified source; could you please give me feedback on the code
changes?
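For reference, a usage sketch with the extended signature (positional
arguments per this patch: slot_name, plugin, temporary, twophase, failover,
allow_overwrite):

-- on the primary: create the failover slot with the new flag enabled
select pg_create_logical_replication_slot('logical_slot', 'pgoutput',
                                          false, false, true, true);
-- after a switchover, the leftover slot on the new standby is dropped by
-- the slot sync worker and then recreated as a synced slot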
Regards,
Fabrice
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 566f308..6cd3175 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -480,6 +480,7 @@ CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot(
     IN temporary boolean DEFAULT false,
     IN twophase boolean DEFAULT false,
     IN failover boolean DEFAULT false,
+    IN allow_overwrite boolean DEFAULT false,
     OUT slot_name name, OUT lsn pg_lsn)
 RETURNS RECORD
 LANGUAGE INTERNAL
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 656e66e..d6332cd 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -627,6 +627,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
 	bool		slot_updated = false;
+	bool		allow_overwrite = false;

 	/*
 	 * Make sure that concerned WAL is received and flushed before syncing
@@ -649,24 +650,46 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		return false;
 	}
-
-	/* Search for the named slot */
+	// Both local and remote slot have the same name
 	if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
 	{
-		bool	synced;
+		bool	synced;

 		SpinLockAcquire(&slot->mutex);
 		synced = slot->data.synced;
+		allow_overwrite = slot->data.allow_overwrite;
 		SpinLockRelease(&slot->mutex);
-
-		/* User-created slot with the same name exists, raise ERROR. */
-		if (!synced)
-			ereport(ERROR,
+
+		if (!synced){
+			/*
+			 * Check if we need to overwrite an existing
+			 * logical slot
+			 */
+			if (allow_overwrite){
+				/*
+				 * Get rid of a replication slot that is no
+				 *longer wanted
+				 */
+				ReplicationSlotDrop(remote_slot->name,true);
+
+				/* Get rid of a replication slot that is no longer wanted */
+				ereport(WARNING,
+						errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+						errmsg("slot \"%s\" already exists"
+							   " on the standby but it will be dropped because flag allow_overwrite is set to true",
+							   remote_slot->name));
+
+				/* Going back to the main loop after droping the failover slot */
+				return false;
+			}
+			else
+				/* User-created slot with the same name exists, raise ERROR. */
+				ereport(ERROR,
						errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
						errmsg("exiting from slot synchronization because same"
-							   " name slot \"%s\" already exists on the standby",
-							   remote_slot->name));
-
+							   " name slot \"%s\" already exists on the standby",
+							   remote_slot->name));
+		}
 		/*
 		 * The slot has been synchronized before.
 		 *
@@ -761,6 +784,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY,
 							  remote_slot->two_phase,
 							  remote_slot->failover,
+							  allow_overwrite,
 							  true);

 		/* For shorter lines. */
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 600b87f..d6bc5c6 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -323,7 +323,7 @@ ReplicationSlotValidateName(const char *name, int elevel)
 void
 ReplicationSlotCreate(const char *name, bool db_specific,
 					  ReplicationSlotPersistency persistency,
-					  bool two_phase, bool failover, bool synced)
+					  bool two_phase, bool failover, bool allow_overwrite, bool synced)
 {
 	ReplicationSlot *slot = NULL;
 	int			i;
@@ -413,6 +413,11 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	slot->data.two_phase_at = InvalidXLogRecPtr;
 	slot->data.failover = failover;
 	slot->data.synced = synced;
+	slot->data.allow_overwrite = allow_overwrite;
+
+	elog(LOG, "Logical replication slot %s created with option allow_overwrite to %s",
+		 NameStr(slot->data.name),
+		 slot->data.allow_overwrite ? "true" : "false");

 	/* and then data only present in shared memory */
 	slot->just_dirtied = false;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 36cc2ed..6bd430f 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -40,7 +40,7 @@ create_physical_replication_slot(char *name, bool immediately_reserve,
 	/* acquire replication slot, this will check for conflicting names */
 	ReplicationSlotCreate(name, false,
-						  temporary ? RS_TEMPORARY : RS_PERSISTENT, false,
+						  temporary ? RS_TEMPORARY : RS_PERSISTENT, false, false,
 						  false, false);

 	if (immediately_reserve)
@@ -116,7 +116,7 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
 static void
 create_logical_replication_slot(char *name, char *plugin,
 								bool temporary, bool two_phase,
-								bool failover,
+								bool failover, bool allow_overwrite,
 								XLogRecPtr restart_lsn,
 								bool find_startpoint)
 {
@@ -134,7 +134,7 @@ create_logical_replication_slot(char *name, char *plugin,
 	 */
 	ReplicationSlotCreate(name, true,
 						  temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase,
-						  failover, false);
+						  failover, allow_overwrite, false);

 	/*
 	 * Create logical decoding context to find start point or, if we don't
@@ -173,6 +173,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	bool		temporary = PG_GETARG_BOOL(2);
 	bool		two_phase = PG_GETARG_BOOL(3);
 	bool		failover = PG_GETARG_BOOL(4);
+	bool		allow_overwrite = PG_GETARG_BOOL(5);
 	Datum		result;
 	TupleDesc	tupdesc;
 	HeapTuple	tuple;
@@ -191,6 +192,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 									temporary,
 									two_phase,
 									failover,
+									allow_overwrite,
 									InvalidXLogRecPtr,
 									true);
@@ -210,6 +212,47 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(result);
 }

+/*
+ * This function is intended to modify a logical replication slot with
+ * given arguments.
+ */
+static void
+alter_logical_replication_slot(char *name, bool two_phase,
+							   bool failover,
+							   bool allow_overwrite)
+{
+	Assert(!MyReplicationSlot);
+
+	ReplicationSlotAcquire(name, true, true);
+	MyReplicationSlot->data.allow_overwrite = allow_overwrite;
+	ReplicationSlotMarkDirty();
+
+	ReplicationSlotRelease();
+}
+
+/*
+ * SQL function for altering logical replication slot properties.
+ */
+Datum
+pg_alter_logical_replication_slot(PG_FUNCTION_ARGS)
+{
+	Name		name = PG_GETARG_NAME(0);
+	bool		two_phase = PG_GETARG_BOOL(1);
+	bool		failover = PG_GETARG_BOOL(2);
+	bool		allow_overwrite = PG_GETARG_BOOL(3);
+
+	CheckSlotPermissions();
+
+	CheckLogicalDecodingRequirements();
+
+	alter_logical_replication_slot(NameStr(*name),
+								   two_phase,
+								   failover,
+								   allow_overwrite);
+
+	PG_RETURN_NAME(name);
+}
+
 /*
  * SQL function for dropping a replication slot.
@@ -726,6 +769,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 									temporary,
 									false,
 									false,
+									false,
 									src_restart_lsn,
 									false);
 	}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 9fa8beb..ef22695 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1198,7 +1198,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 	{
 		ReplicationSlotCreate(cmd->slotname, false,
 							  cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
-							  false, false, false);
+							  false, false, false, false);

 		if (reserve_wal)
 		{
@@ -1229,7 +1229,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 	 */
 	ReplicationSlotCreate(cmd->slotname, true,
 						  cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
-						  two_phase, failover, false);
+						  two_phase, failover, false, false);

 	/*
 	 * Do options check early so that we can bail before calling the
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 62beb71..074805d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11480,10 +11480,10 @@
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
   proparallel => 'u', prorettype => 'record',
-  proargtypes => 'name name bool bool bool',
-  proallargtypes => '{name,name,bool,bool,bool,name,pg_lsn}',
-  proargmodes => '{i,i,i,i,i,o,o}',
-  proargnames => '{slot_name,plugin,temporary,twophase,failover,slot_name,lsn}',
+  proargtypes => 'name name bool bool bool bool',
+  proallargtypes => '{name,name,bool,bool,bool,bool,name,pg_lsn}',
+  proargmodes => '{i,i,i,i,i,i,o,o}',
+  proargnames => '{slot_name,plugin,temporary,twophase,failover,allow_overwrite,slot_name,lsn}',
   prosrc => 'pg_create_logical_replication_slot' },
 { oid => '4222',
   descr => 'copy a logical replication slot, changing temporality and plugin',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index eb0b93b..1fd6445 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -134,6 +134,11 @@ typedef struct ReplicationSlotPersistentData
 	 * for logical slots on the primary server.
 	 */
 	bool		failover;
+	/*
+	 * Allow Postgres to drop logical replication slot on standby server to ensure
+	 * creation of new failover slot when sync_replication_slots is true.
+	 */
+	bool		allow_overwrite;
 } ReplicationSlotPersistentData;

 /*
@@ -267,7 +272,7 @@ extern void ReplicationSlotsShmemInit(void);
 /* management of individual slots */
 extern void ReplicationSlotCreate(const char *name, bool db_specific,
 								  ReplicationSlotPersistency persistency,
-								  bool two_phase, bool failover,
+								  bool two_phase, bool failover, bool allow_overwrite,
 								  bool synced);
 extern void ReplicationSlotPersist(void);
 extern void ReplicationSlotDrop(const char *name, bool nowait);
Thanks for working on this. Can you please provide a patch instead of a
diff? It will be easier for reviewers to apply and test it then.
thanks
Shveta
What is the procedure to create this patch? Thank you for your help.
We use 'git format-patch' to generate formatted patches. I have given
a few links ([1], [2], [3]) on know-how.
I usually use these steps:
git add <file1.c> <adds file to staging area>
git add <file2.c>
<once all the required files are added, create a commit>
git commit <add the required message to describe the patch>
git format-patch -1 -v1 <create the patch>
-1-> to create patch for last 1 commit
-v1 --> patch-prefix. I have used v1 which indicates version-1, when
we upload the patch a second time, we can use -v2 and so on.
[1]: https://stackoverflow.com/questions/6658313/how-can-i-generate-a-git-patch-for-a-specific-commit
[2]: https://www.geeksforgeeks.org/git/how-to-generate-a-git-patch-for-a-specific-commit/
[3]: https://gist.github.com/ganapathichidambaram/3504e31fc8ba809762b3a0d9d9ef8ec2
thanks
Shveta
Thanks Shveta for the procedure to follow; here is version 1 of the
patch.
Regards,
Fabrice
Attachments:
v1-0001-fix-failover-slot-issue-when-doing-a-switchover.patch
From 58403718a39b81aa107e29bd1791e158f19e5646 Mon Sep 17 00:00:00 2001
From: Fabrice Chapuis <fabrice.chapuis@zas.admin.ch>
Date: Mon, 28 Jul 2025 15:27:03 +0200
Subject: [PATCH v1] fix failover slot issue when doing a switchover
---
src/backend/catalog/system_functions.sql | 1 +
src/backend/replication/logical/slotsync.c | 44 +++++++++++++++++-----
src/backend/replication/slot.c | 7 +++-
src/backend/replication/slotfuncs.c | 10 +++--
src/backend/replication/walsender.c | 4 +-
src/include/catalog/pg_proc.dat | 8 ++--
src/include/replication/slot.h | 7 +++-
7 files changed, 59 insertions(+), 22 deletions(-)
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 566f308..6cd3175 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -480,6 +480,7 @@ CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot(
IN temporary boolean DEFAULT false,
IN twophase boolean DEFAULT false,
IN failover boolean DEFAULT false,
+ IN allow_overwrite boolean DEFAULT false,
OUT slot_name name, OUT lsn pg_lsn)
RETURNS RECORD
LANGUAGE INTERNAL
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 656e66e..d6332cd 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -627,6 +627,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
ReplicationSlot *slot;
XLogRecPtr latestFlushPtr;
bool slot_updated = false;
+ bool allow_overwrite = false;
/*
* Make sure that concerned WAL is received and flushed before syncing
@@ -649,24 +650,46 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
return false;
}
-
- /* Search for the named slot */
+ // Both local and remote slot have the same name
if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
{
- bool synced;
+ bool synced;
SpinLockAcquire(&slot->mutex);
synced = slot->data.synced;
+ allow_overwrite = slot->data.allow_overwrite;
SpinLockRelease(&slot->mutex);
-
- /* User-created slot with the same name exists, raise ERROR. */
- if (!synced)
- ereport(ERROR,
+
+ if (!synced){
+ /*
+ * Check if we need to overwrite an existing
+ * logical slot
+ */
+ if (allow_overwrite){
+ /*
+ * Get rid of a replication slot that is no
+ *longer wanted
+ */
+ ReplicationSlotDrop(remote_slot->name,true);
+
+ /* Get rid of a replication slot that is no longer wanted */
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("slot \"%s\" already exists"
+ " on the standby but it will be dropped because flag allow_overwrite is set to true",
+ remote_slot->name));
+
+ /* Going back to the main loop after droping the failover slot */
+ return false;
+ }
+ else
+ /* User-created slot with the same name exists, raise ERROR. */
+ ereport(ERROR,
errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("exiting from slot synchronization because same"
- " name slot \"%s\" already exists on the standby",
- remote_slot->name));
-
+ " name slot \"%s\" already exists on the standby",
+ remote_slot->name));
+ }
/*
* The slot has been synchronized before.
*
@@ -761,6 +784,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY,
remote_slot->two_phase,
remote_slot->failover,
+ allow_overwrite,
true);
/* For shorter lines. */
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 600b87f..d6bc5c6 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -323,7 +323,7 @@ ReplicationSlotValidateName(const char *name, int elevel)
void
ReplicationSlotCreate(const char *name, bool db_specific,
ReplicationSlotPersistency persistency,
- bool two_phase, bool failover, bool synced)
+ bool two_phase, bool failover, bool allow_overwrite, bool synced)
{
ReplicationSlot *slot = NULL;
int i;
@@ -413,6 +413,11 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->data.two_phase_at = InvalidXLogRecPtr;
slot->data.failover = failover;
slot->data.synced = synced;
+ slot->data.allow_overwrite = allow_overwrite;
+
+ elog(LOG, "Logical replication slot %s created with option allow_overwrite to %s",
+ NameStr(slot->data.name),
+ slot->data.allow_overwrite ? "true" : "false");
/* and then data only present in shared memory */
slot->just_dirtied = false;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 36cc2ed..5dd5f50 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -40,7 +40,7 @@ create_physical_replication_slot(char *name, bool immediately_reserve,
/* acquire replication slot, this will check for conflicting names */
ReplicationSlotCreate(name, false,
- temporary ? RS_TEMPORARY : RS_PERSISTENT, false,
+ temporary ? RS_TEMPORARY : RS_PERSISTENT, false, false,
false, false);
if (immediately_reserve)
@@ -116,7 +116,7 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
static void
create_logical_replication_slot(char *name, char *plugin,
bool temporary, bool two_phase,
- bool failover,
+ bool failover, bool allow_overwrite,
XLogRecPtr restart_lsn,
bool find_startpoint)
{
@@ -134,7 +134,7 @@ create_logical_replication_slot(char *name, char *plugin,
*/
ReplicationSlotCreate(name, true,
temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase,
- failover, false);
+ failover, allow_overwrite, false);
/*
* Create logical decoding context to find start point or, if we don't
@@ -173,6 +173,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
bool temporary = PG_GETARG_BOOL(2);
bool two_phase = PG_GETARG_BOOL(3);
bool failover = PG_GETARG_BOOL(4);
+ bool allow_overwrite = PG_GETARG_BOOL(5);
Datum result;
TupleDesc tupdesc;
HeapTuple tuple;
@@ -191,6 +192,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
temporary,
two_phase,
failover,
+ allow_overwrite,
InvalidXLogRecPtr,
true);
@@ -210,7 +212,6 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
PG_RETURN_DATUM(result);
}
-
/*
* SQL function for dropping a replication slot.
*/
@@ -726,6 +727,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
temporary,
false,
false,
+ false,
src_restart_lsn,
false);
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 9fa8beb..ef22695 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1198,7 +1198,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
{
ReplicationSlotCreate(cmd->slotname, false,
cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
- false, false, false);
+ false, false, false, false);
if (reserve_wal)
{
@@ -1229,7 +1229,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
*/
ReplicationSlotCreate(cmd->slotname, true,
cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
- two_phase, failover, false);
+ two_phase, failover, false, false);
/*
* Do options check early so that we can bail before calling the
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 62beb71..074805d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11480,10 +11480,10 @@
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
proparallel => 'u', prorettype => 'record',
- proargtypes => 'name name bool bool bool',
- proallargtypes => '{name,name,bool,bool,bool,name,pg_lsn}',
- proargmodes => '{i,i,i,i,i,o,o}',
- proargnames => '{slot_name,plugin,temporary,twophase,failover,slot_name,lsn}',
+ proargtypes => 'name name bool bool bool bool',
+ proallargtypes => '{name,name,bool,bool,bool,bool,name,pg_lsn}',
+ proargmodes => '{i,i,i,i,i,i,o,o}',
+ proargnames => '{slot_name,plugin,temporary,twophase,failover,allow_overwrite,slot_name,lsn}',
prosrc => 'pg_create_logical_replication_slot' },
{ oid => '4222',
descr => 'copy a logical replication slot, changing temporality and plugin',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index eb0b93b..1fd6445 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -134,6 +134,11 @@ typedef struct ReplicationSlotPersistentData
* for logical slots on the primary server.
*/
bool failover;
+ /*
+ * Allow Postgres to drop logical replication slot on standby server to ensure
+ * creation of new failover slot when sync_replication_slots is true.
+ */
+ bool allow_overwrite;
} ReplicationSlotPersistentData;
/*
@@ -267,7 +272,7 @@ extern void ReplicationSlotsShmemInit(void);
/* management of individual slots */
extern void ReplicationSlotCreate(const char *name, bool db_specific,
ReplicationSlotPersistency persistency,
- bool two_phase, bool failover,
+ bool two_phase, bool failover, bool allow_overwrite,
bool synced);
extern void ReplicationSlotPersist(void);
extern void ReplicationSlotDrop(const char *name, bool nowait);
--
2.47.3
Hi Fabrice,
Thanks for providing the patch. I reviewed your patch and have the
following comments:
1. I think we should add a commit message in the patch. It would help
to give an understanding of the patch.
2. I tried applying patch on HEAD, it generates following warnings:
Applying: fix failover slot issue when doing a switchover
.git/rebase-apply/patch:31: trailing whitespace.
bool allow_overwrite = false;
.git/rebase-apply/patch:45: trailing whitespace.
bool synced;
.git/rebase-apply/patch:55: trailing whitespace.
.git/rebase-apply/patch:56: trailing whitespace.
if (!synced){
.git/rebase-apply/patch:57: trailing whitespace.
/*
warning: squelched 16 whitespace errors
warning: 21 lines add whitespace errors.
3. I also tried to build the patch on HEAD and it is giving the following error:
ubuntu@ubuntu-Virtual-Machine:~/Project/pg/postgres$
./../../install-clean.sh pg_30_8 > warn.log
launcher.c: In function ‘CreateConflictDetectionSlot’:
launcher.c:1457:9: error: too few arguments to function ‘ReplicationSlotCreate’
1457 | ReplicationSlotCreate(CONFLICT_DETECTION_SLOT, false,
RS_PERSISTENT, false,
| ^~~~~~~~~~~~~~~~~~~~~
In file included from launcher.c:35:
../../../../src/include/replication/slot.h:307:13: note: declared here
307 | extern void ReplicationSlotCreate(const char *name, bool db_specific,
| ^~~~~~~~~~~~~~~~~~~~~
make[4]: *** [<builtin>: launcher.o] Error 1
make[3]: *** [../../../src/backend/common.mk:37: logical-recursive] Error 2
make[2]: *** [common.mk:37: replication-recursive] Error 2
make[1]: *** [Makefile:42: install-backend-recurse] Error 2
make: *** [GNUmakefile:11: install-src-recurse] Error 2
4. I have some general comments regarding formatting of the patch.
+ // Both local and remote slot have the same name
We use following format for single line comments:
/* comment text */
and for multi line comments we use following format:
/*
* comment text begins here
* and continues here
*/
5. We should use proper indentation here:
+ elog(LOG, "Logical replication slot %s created with option
allow_overwrite to %s",
+ NameStr(slot->data.name),
+ slot->data.allow_overwrite ? "true" : "false");
6.
+ if (!synced){
+ /*
+ * Check if we need to overwrite an existing
+ * logical slot
+ */
We should start the parentheses from the next line.
Also, indentation for comment is not correct here.
7.
+ if (allow_overwrite){
+ /*
+ * Get rid of a replication slot that is no
+ *longer wanted
+ */
Similar comment as above.
Please refer to [1] [2] for proper formatting of the patch.
[1]: https://www.postgresql.org/docs/devel/source-format.html
[2]: https://wiki.postgresql.org/wiki/Creating_Clean_Patches
Thanks,
Shlok Kyal
On Sat, Aug 30, 2025 at 11:43 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> 2. I tried applying patch on HEAD, it generates following warnings:
> [...]

'git diff --check' can be used before 'git commit' to get info of
the above issues before creating a patch.

> 5. We should use proper indentation here:
> [...]

src/tools/pgindent <file name>
can be used to do indentation before creating a patch.
Hi,
I will make the modifications based on the remarks you mentioned.
Regards,
Fabrice
Hi,
Here is version 1 of the patch with the modifications.
I ran git diff --check until there were no more errors.
In a next version of the patch, I think I will need to change the
ReplicationSlotCreate() calls so that the value of the allow_overwrite
flag is consistent, and modify the ReplicationSlotCreate interface to
expose the allow_overwrite attribute.
Best regards,
Fabrice
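For readers following along, a minimal usage sketch of the interface the
attached patch proposes. The sixth argument, allow_overwrite, is added by the
patch with DEFAULT false and does not exist in released PostgreSQL; the slot
and plugin names are illustrative:

-- Existing calls keep working because allow_overwrite defaults to false.
SELECT pg_create_logical_replication_slot('logical_slot', 'pgoutput');

-- Opting in to the proposed overwrite behaviour for a slot pre-created on
-- the standby (arguments: temporary, twophase, failover, allow_overwrite).
SELECT pg_create_logical_replication_slot('logical_slot', 'pgoutput',
                                          false, false, false, true);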
Attachment: v1-0001-fix-failover-slot-issue-when-doing-a-switchover.patch (application/octet-stream)
From e12031e0bdf098811d2fc99962a04c01a0ee77f9 Mon Sep 17 00:00:00 2001
From: Fabrice Chapuis <fabrice.chapuis@zas.admin.ch>
Date: Sun, 7 Sep 2025 09:37:24 +0200
Subject: [PATCH v1] Fix-failover-slot-issue-when-doing-a-switchover.
Add allow_overwrite parameter to pg_create_logical_replication_slot.
If allow_overwrite is set to true then existing failover slot on standby
will be dropped, then ensure main loop continues after drop and create a
new slot.
---
src/backend/catalog/system_functions.sql | 1 +
src/backend/replication/logical/launcher.c | 2 +-
src/backend/replication/logical/slotsync.c | 25 +++++++++++++++++++++-
src/backend/replication/slot.c | 6 +++++-
src/backend/replication/slotfuncs.c | 9 +++++---
src/backend/replication/walsender.c | 4 ++--
src/include/catalog/pg_proc.dat | 8 +++----
src/include/replication/slot.h | 8 ++++++-
8 files changed, 50 insertions(+), 13 deletions(-)
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 566f308..6cd3175 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -480,6 +480,7 @@ CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot(
IN temporary boolean DEFAULT false,
IN twophase boolean DEFAULT false,
IN failover boolean DEFAULT false,
+ IN allow_overwrite boolean DEFAULT false,
OUT slot_name name, OUT lsn pg_lsn)
RETURNS RECORD
LANGUAGE INTERNAL
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 37377f7..840741e 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -1455,7 +1455,7 @@ CreateConflictDetectionSlot(void)
errmsg("creating replication conflict detection slot"));
ReplicationSlotCreate(CONFLICT_DETECTION_SLOT, false, RS_PERSISTENT, false,
- false, false);
+ false, false,false);
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index caa003c..2a8692e 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -626,6 +626,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
ReplicationSlot *slot;
XLogRecPtr latestFlushPtr;
bool slot_updated = false;
+ bool allow_overwrite = false;
/*
* Make sure that concerned WAL is received and flushed before syncing
@@ -656,15 +657,36 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
SpinLockAcquire(&slot->mutex);
synced = slot->data.synced;
+ allow_overwrite = slot->data.allow_overwrite;
SpinLockRelease(&slot->mutex);
/* User-created slot with the same name exists, raise ERROR. */
if (!synced)
- ereport(ERROR,
+ {
+ /* Check if we need to overwrite an existing logical slot */
+ if (allow_overwrite)
+ {
+ /* Get rid of a replication slot that is no longer wanted */
+ ReplicationSlotDrop(remote_slot->name,true);
+
+ /* Get rid of a replication slot that is no longer wanted */
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("slot \"%s\" already exists"
+ " on the standby but it will be dropped because "
+ "flag allow_overwrite is set to true",
+ remote_slot->name));
+
+ /* Going back to the main loop after droping the failover slot */
+ return false;
+ }
+ else
+ ereport(ERROR,
errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("exiting from slot synchronization because same"
" name slot \"%s\" already exists on the standby",
remote_slot->name));
+ }
/*
* The slot has been synchronized before.
@@ -760,6 +782,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY,
remote_slot->two_phase,
remote_slot->failover,
+ allow_overwrite,
true);
/* For shorter lines. */
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fd0fdb9..9a85fb0 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -351,7 +351,7 @@ IsSlotForConflictCheck(const char *name)
void
ReplicationSlotCreate(const char *name, bool db_specific,
ReplicationSlotPersistency persistency,
- bool two_phase, bool failover, bool synced)
+ bool two_phase, bool failover, bool allow_overwrite, bool synced)
{
ReplicationSlot *slot = NULL;
int i;
@@ -445,6 +445,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->data.two_phase = two_phase;
slot->data.two_phase_at = InvalidXLogRecPtr;
slot->data.failover = failover;
+ slot->data.allow_overwrite = allow_overwrite;
slot->data.synced = synced;
/* and then data only present in shared memory */
@@ -500,6 +501,9 @@ ReplicationSlotCreate(const char *name, bool db_specific,
*/
LWLockRelease(ReplicationSlotAllocationLock);
+ elog(DEBUG1, "logical replication slot %s created with option allow_overwrite to %s",
+ NameStr(slot->data.name), slot->data.allow_overwrite ? "true" : "false");
+
/* Let everybody know we've modified this slot */
ConditionVariableBroadcast(&slot->active_cv);
}
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 69f4c61..3f53502 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -41,7 +41,7 @@ create_physical_replication_slot(char *name, bool immediately_reserve,
/* acquire replication slot, this will check for conflicting names */
ReplicationSlotCreate(name, false,
temporary ? RS_TEMPORARY : RS_PERSISTENT, false,
- false, false);
+ false, false,false);
if (immediately_reserve)
{
@@ -116,7 +116,7 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
static void
create_logical_replication_slot(char *name, char *plugin,
bool temporary, bool two_phase,
- bool failover,
+ bool failover, bool allow_overwrite,
XLogRecPtr restart_lsn,
bool find_startpoint)
{
@@ -134,7 +134,7 @@ create_logical_replication_slot(char *name, char *plugin,
*/
ReplicationSlotCreate(name, true,
temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase,
- failover, false);
+ failover, allow_overwrite, false);
/*
* Create logical decoding context to find start point or, if we don't
@@ -173,6 +173,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
bool temporary = PG_GETARG_BOOL(2);
bool two_phase = PG_GETARG_BOOL(3);
bool failover = PG_GETARG_BOOL(4);
+ bool allow_overwrite = PG_GETARG_BOOL(5);
Datum result;
TupleDesc tupdesc;
HeapTuple tuple;
@@ -191,6 +192,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
temporary,
two_phase,
failover,
+ allow_overwrite,
InvalidXLogRecPtr,
true);
@@ -726,6 +728,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
temporary,
false,
false,
+ false,
src_restart_lsn,
false);
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e3dce9d..a2db1e3 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1201,7 +1201,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
{
ReplicationSlotCreate(cmd->slotname, false,
cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
- false, false, false);
+ false, false, false, false);
if (reserve_wal)
{
@@ -1232,7 +1232,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
*/
ReplicationSlotCreate(cmd->slotname, true,
cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
- two_phase, failover, false);
+ two_phase, failover, false, false);
/*
* Do options check early so that we can bail before calling the
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 118d6da..ee55572 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11498,10 +11498,10 @@
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
proparallel => 'u', prorettype => 'record',
- proargtypes => 'name name bool bool bool',
- proallargtypes => '{name,name,bool,bool,bool,name,pg_lsn}',
- proargmodes => '{i,i,i,i,i,o,o}',
- proargnames => '{slot_name,plugin,temporary,twophase,failover,slot_name,lsn}',
+ proargtypes => 'name name bool bool bool bool',
+ proallargtypes => '{name,name,bool,bool,bool,bool,name,pg_lsn}',
+ proargmodes => '{i,i,i,i,i,i,o,o}',
+ proargnames => '{slot_name,plugin,temporary,twophase,failover,allow_overwrite,slot_name,lsn}',
prosrc => 'pg_create_logical_replication_slot' },
{ oid => '4222',
descr => 'copy a logical replication slot, changing temporality and plugin',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index e8fc342..eef5c86 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -141,6 +141,12 @@ typedef struct ReplicationSlotPersistentData
* for logical slots on the primary server.
*/
bool failover;
+
+ /*
+ * Allow Postgres to drop logical replication slot on standby server to ensure
+ * creation of new failover slot when remote sync_replication_slots is set to true.
+ */
+ bool allow_overwrite;
} ReplicationSlotPersistentData;
/*
@@ -301,7 +307,7 @@ extern void ReplicationSlotsShmemInit(void);
/* management of individual slots */
extern void ReplicationSlotCreate(const char *name, bool db_specific,
ReplicationSlotPersistency persistency,
- bool two_phase, bool failover,
+ bool two_phase, bool failover, bool allow_overwrite,
bool synced);
extern void ReplicationSlotPersist(void);
extern void ReplicationSlotDrop(const char *name, bool nowait);
--
2.47.3
On Wed, Aug 13, 2025 at 11:34 AM shveta malik <shveta.malik@gmail.com> wrote:
On Fri, Aug 8, 2025 at 7:01 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
Another problem, as you pointed out, is inconsistent behaviour across
switchovers. On the first switchover, we get the error on the new standby:
"Exiting from slot synchronization because a slot with the same name
already exists on the standby."
But in the case of a double switchover, this error does not occur.
This is because the 'synced' flag is not set on the new standby after the
first switchover, while it is set after a double switchover. I think the
behaviour should be the same: in both cases, it should emit the same error. We
are thinking of a potential solution here and will start a new thread
if needed.
Started a thread [1] for this issue.
[1]: /messages/by-id/CAJpy0uBJqTQdq+UDW55oK2bhxMig=FaEYFV=+ZaKmtmgNWwsHw@mail.gmail.com
thanks
Shveta
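A quick way to observe the difference described above is to check the synced
flag on the new standby after each switchover; both columns queried here
already exist in pg_replication_slots:

-- On the new standby, after the switchover
SELECT slot_name, failover, synced
FROM pg_replication_slots
WHERE slot_type = 'logical';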
On Sun, Sep 7, 2025 at 1:32 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
Hi,
Here is version 1 of the patch with the modifications.
I ran git diff --check until there were no more errors.
In a next version of the patch, I think I will need to change the ReplicationSlotCreate() calls so that the value of the allow_overwrite flag is consistent
In which flow? I see all ReplicationSlotCreate() references already
accepting the value.
and modify the ReplicationSlotCreate interface to expose the allow_overwrite attribute.
Do you mean to modify pg_replication_slots?
Thank You for the patch. Please find a few comments:
1)
In both create_physical_replication_slot() and
CreateConflictDetectionSlot():
+ false, false,false);
,false --> , false (space after comma is recommended)
2)
+ elog(DEBUG1, "logical replication slot %s created with option
allow_overwrite to %s",
+ NameStr(slot->data.name), slot->data.allow_overwrite ? "true" : "false");
IMO, we don't need this as we do not log other properties of the slot as well.
3)
We can have pg_replication_slots (pg_get_replication_slots) modified
to display the new property. Also we can modify docs to have the new
property defined.
4)
+ {
+ /* Check if we need to overwrite an existing logical slot */
+ if (allow_overwrite)
+ {
+ /* Get rid of a replication slot that is no longer wanted */
+ ReplicationSlotDrop(remote_slot->name,true);
+
+ /* Get rid of a replication slot that is no longer wanted */
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("slot \"%s\" already exists"
+ " on the standby but it will be dropped because "
+ "flag allow_overwrite is set to true",
+ remote_slot->name));
+
+ /* Going back to the main loop after droping the failover slot */
+ return false;
Currently we are returning after dropping the slot. But I think we
shall drop the slot and proceed with new slot creation of the same
name; otherwise we may be left with the old slot dropped and no new slot
created, especially when the API is run.
Scenario:
--On primary, create 2 slots: slot1 and slot2.
--On standby, create one slot slot1 with allow_overwrite=true.
--Run 'SELECT pg_sync_replication_slots();' on standby.
At the end of the API call, the expectation is that both slots will be present
with 'synced'=true, but only slot2 is present. If we run this API next
time, slot1 is created. It should have been dropped and recreated (or
say overwritten) in the first run itself.
5)
IMO, LOG is sufficient here, as the action aligns with the
user-provided 'allow_overwrite' setting.
thanks
Shveta
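To make the scenario in comment 4 concrete, here is a minimal SQL sketch. The
sixth argument to pg_create_logical_replication_slot is the allow_overwrite
flag proposed by the patch and is not part of released PostgreSQL; the slot
names follow the scenario:

-- On the primary: two failover-enabled slots.
SELECT pg_create_logical_replication_slot('slot1', 'pgoutput', false, false, true);
SELECT pg_create_logical_replication_slot('slot2', 'pgoutput', false, false, true);

-- On the standby: a pre-existing slot1 that opts in to being overwritten.
SELECT pg_create_logical_replication_slot('slot1', 'pgoutput', false, false, false, true);

-- On the standby: one sync cycle; per the comment above, both slot1 and
-- slot2 should come out of this single call with synced = true.
SELECT pg_sync_replication_slots();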
Thanks Shveta for all your comments. I'll prepare a version 2 of the patch
including your proposals.
Regards
Fabrice
Hi Shveta,
Here is the version 1 of the patch with corrections
1)
In both create_physical_replication_slot() and
CreateConflictDetectionSlot():
+ false, false,false);
,false --> , false (space after comma is recommended)
2)
+ elog(DEBUG1, "logical replication slot %s created with option
allow_overwrite to %s",
+ NameStr(slot->data.name), slot->data.allow_overwrite ? "true" : "false");
IMO, we don't need this as we do not log other properties of the slot as
well.
Point 1-2 done
3)
We can have pg_replication_slots (pg_get_replication_slots) modified
to display the new property. Also we can modify docs to have the new
property defined.
Point 3 Not yet implemented
4)
+ {
+ /* Check if we need to overwrite an existing logical slot */
+ if (allow_overwrite)
+ {
+ /* Get rid of a replication slot that is no longer wanted */
+ ReplicationSlotDrop(remote_slot->name,true);
+
+ /* Get rid of a replication slot that is no longer wanted */
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("slot \"%s\" already exists"
+ " on the standby but it will be dropped because "
+ "flag allow_overwrite is set to true",
+ remote_slot->name));
+
+ /* Going back to the main loop after droping the failover slot */
+ return false;
Currently we are returning after dropping the slot. But I think we
shall drop the slot and proceed with new slot creation of the same
name; otherwise we may be left with the old slot dropped and no new slot
created, especially when the API is run.
Scenario:
--On primary, create 2 slots: slot1 and slot2.
--On standby, create one slot slot1 with allow_overwrite=true.
--Run 'SELECT pg_sync_replication_slots();' on standby.
At the end of the API call, the expectation is that both slots will be present
with 'synced'=true, but only slot2 is present. If we run this API next
time, slot1 is created. It should have been dropped and recreated (or
say overwritten) in the first run itself.
Point 4:
I put the slot creation under the allow_overwrite condition.
After testing with syn_standby_slots disabled on the standby, it works. I
think the code for the synchronisation of the new slot could be factorised.
5)
IMO, LOG is sufficient here, as the action aligns with the
user-provided 'allow_overwrite' setting.
Point 5
Where is it in the code?
Best regards,
Fabrice
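Regarding point 3, if pg_get_replication_slots() is eventually extended as
suggested, the new property would presumably become queryable along these
lines; note that the allow_overwrite column is hypothetical and exists
neither in released PostgreSQL nor in the attached patch:

-- Hypothetical once point 3 is implemented.
SELECT slot_name, failover, synced, allow_overwrite
FROM pg_replication_slots
WHERE slot_type = 'logical';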
Attachment: v1-0001-Add-allow_overwrite-option-to-logical-replication.patch (application/octet-stream)
From 5b13c0061fd354c9e77991ab401a1788464e7b64 Mon Sep 17 00:00:00 2001
From: Fabrice Chapuis <fabrice636861@gmail.com>
Date: Fri, 26 Sep 2025 17:52:39 +0200
Subject: [PATCH v1] Add allow_overwrite option to logical replication slot
creation
This patch adds a new parameter allow_overwrite to
pg_create_logical_replication_slot. When true, an existing
failover slot on standby will be dropped and replaced
automatically.
Signed-off-by: Fabrice Chapuis <fabrice636861@gmail.com>
---
src/backend/catalog/system_functions.sql | 1 +
src/backend/replication/logical/launcher.c | 2 +-
src/backend/replication/logical/slotsync.c | 205 ++++++++++++++-------
src/backend/replication/slot.c | 3 +-
src/backend/replication/slotfuncs.c | 9 +-
src/backend/replication/walsender.c | 4 +-
src/include/catalog/pg_proc.dat | 8 +-
src/include/replication/slot.h | 8 +-
8 files changed, 160 insertions(+), 80 deletions(-)
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 566f308..6cd3175 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -480,6 +480,7 @@ CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot(
IN temporary boolean DEFAULT false,
IN twophase boolean DEFAULT false,
IN failover boolean DEFAULT false,
+ IN allow_overwrite boolean DEFAULT false,
OUT slot_name name, OUT lsn pg_lsn)
RETURNS RECORD
LANGUAGE INTERNAL
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 37377f7..b7bef13 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -1455,7 +1455,7 @@ CreateConflictDetectionSlot(void)
errmsg("creating replication conflict detection slot"));
ReplicationSlotCreate(CONFLICT_DETECTION_SLOT, false, RS_PERSISTENT, false,
- false, false);
+ false, false, false);
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index caa003c..6933e8d 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -653,92 +653,160 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
{
bool synced;
+ bool allow_overwrite;
SpinLockAcquire(&slot->mutex);
synced = slot->data.synced;
+ allow_overwrite = slot->data.allow_overwrite;
SpinLockRelease(&slot->mutex);
/* User-created slot with the same name exists, raise ERROR. */
if (!synced)
- ereport(ERROR,
+ {
+ /* Check if we need to overwrite an existing logical slot */
+ if (allow_overwrite)
+ {
+ /* Recreate the failover slot and synchronize it */
+ NameData plugin_name;
+ TransactionId xmin_horizon = InvalidTransactionId;
+
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("slot \"%s\" already exists"
+ " on the standby but try to recreate it because "
+ "flag allow_overwrite is set to true",
+ remote_slot->name));
+
+ /* Get rid of a replication slot that is no longer wanted */
+ ReplicationSlotDrop(remote_slot->name,true);
+
+ /* Skip creating the local slot if remote_slot is invalidated already */
+ if (remote_slot->invalidated != RS_INVAL_NONE)
+ return false;
+
+ /*
+ * We create temporary slots instead of ephemeral slots here because
+ * we want the slots to survive after releasing them. This is done to
+ * avoid dropping and re-creating the slots in each synchronization
+ * cycle if the restart_lsn or catalog_xmin of the remote slot has not
+ * caught up.
+ */
+ ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY,
+ remote_slot->two_phase,
+ remote_slot->failover,
+ true,
+ true);
+
+ /* For shorter lines. */
+ slot = MyReplicationSlot;
+
+ /* Avoid expensive operations while holding a spinlock. */
+ namestrcpy(&plugin_name, remote_slot->plugin);
+
+ SpinLockAcquire(&slot->mutex);
+ slot->data.database = remote_dbid;
+ slot->data.plugin = plugin_name;
+ SpinLockRelease(&slot->mutex);
+
+ reserve_wal_for_local_slot(remote_slot->restart_lsn);
+
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+ xmin_horizon = GetOldestSafeDecodingTransactionId(true);
+ SpinLockAcquire(&slot->mutex);
+ slot->effective_catalog_xmin = xmin_horizon;
+ slot->data.catalog_xmin = xmin_horizon;
+ SpinLockRelease(&slot->mutex);
+ ReplicationSlotsComputeRequiredXmin(true);
+ LWLockRelease(ProcArrayLock);
+
+ update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+
+ slot_updated = true;
+ }
+ else
+ ereport(ERROR,
errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("exiting from slot synchronization because same"
" name slot \"%s\" already exists on the standby",
remote_slot->name));
-
- /*
- * The slot has been synchronized before.
- *
- * It is important to acquire the slot here before checking
- * invalidation. If we don't acquire the slot first, there could be a
- * race condition that the local slot could be invalidated just after
- * checking the 'invalidated' flag here and we could end up
- * overwriting 'invalidated' flag to remote_slot's value. See
- * InvalidatePossiblyObsoleteSlot() where it invalidates slot directly
- * if the slot is not acquired by other processes.
- *
- * XXX: If it ever turns out that slot acquire/release is costly for
- * cases when none of the slot properties is changed then we can do a
- * pre-check to ensure that at least one of the slot properties is
- * changed before acquiring the slot.
- */
- ReplicationSlotAcquire(remote_slot->name, true, false);
-
- Assert(slot == MyReplicationSlot);
-
- /*
- * Copy the invalidation cause from remote only if local slot is not
- * invalidated locally, we don't want to overwrite existing one.
- */
- if (slot->data.invalidated == RS_INVAL_NONE &&
- remote_slot->invalidated != RS_INVAL_NONE)
+ }
+ else
{
- SpinLockAcquire(&slot->mutex);
- slot->data.invalidated = remote_slot->invalidated;
- SpinLockRelease(&slot->mutex);
+
+ /*
+ * The slot has been synchronized before.
+ *
+ * It is important to acquire the slot here before checking
+ * invalidation. If we don't acquire the slot first, there could be a
+ * race condition that the local slot could be invalidated just after
+ * checking the 'invalidated' flag here and we could end up
+ * overwriting 'invalidated' flag to remote_slot's value. See
+ * InvalidatePossiblyObsoleteSlot() where it invalidates slot directly
+ * if the slot is not acquired by other processes.
+ *
+ * XXX: If it ever turns out that slot acquire/release is costly for
+ * cases when none of the slot properties is changed then we can do a
+ * pre-check to ensure that at least one of the slot properties is
+ * changed before acquiring the slot.
+ */
+ ReplicationSlotAcquire(remote_slot->name, true, false);
+
+ Assert(slot == MyReplicationSlot);
- /* Make sure the invalidated state persists across server restart */
- ReplicationSlotMarkDirty();
- ReplicationSlotSave();
+ /*
+ * Copy the invalidation cause from remote only if local slot is not
+ * invalidated locally, we don't want to overwrite existing one.
+ */
+ if (slot->data.invalidated == RS_INVAL_NONE &&
+ remote_slot->invalidated != RS_INVAL_NONE)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->data.invalidated = remote_slot->invalidated;
+ SpinLockRelease(&slot->mutex);
- slot_updated = true;
- }
+ /* Make sure the invalidated state persists across server restart */
+ ReplicationSlotMarkDirty();
+ ReplicationSlotSave();
- /* Skip the sync of an invalidated slot */
- if (slot->data.invalidated != RS_INVAL_NONE)
- {
- ReplicationSlotRelease();
- return slot_updated;
- }
+ slot_updated = true;
+ }
- /* Slot not ready yet, let's attempt to make it sync-ready now. */
- if (slot->data.persistency == RS_TEMPORARY)
- {
- slot_updated = update_and_persist_local_synced_slot(remote_slot,
- remote_dbid);
- }
+ /* Skip the sync of an invalidated slot */
+ if (slot->data.invalidated != RS_INVAL_NONE)
+ {
+ ReplicationSlotRelease();
+ return slot_updated;
+ }
- /* Slot ready for sync, so sync it. */
- else
- {
- /*
- * Sanity check: As long as the invalidations are handled
- * appropriately as above, this should never happen.
- *
- * We don't need to check restart_lsn here. See the comments in
- * update_local_synced_slot() for details.
- */
- if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
- ereport(ERROR,
- errmsg_internal("cannot synchronize local slot \"%s\"",
- remote_slot->name),
- errdetail_internal("Local slot's start streaming location LSN(%X/%08X) is ahead of remote slot's LSN(%X/%08X).",
- LSN_FORMAT_ARGS(slot->data.confirmed_flush),
- LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)));
+ /* Slot not ready yet, let's attempt to make it sync-ready now. */
+ if (slot->data.persistency == RS_TEMPORARY)
+ {
+ slot_updated = update_and_persist_local_synced_slot(remote_slot,
+ remote_dbid);
+ }
- slot_updated = update_local_synced_slot(remote_slot, remote_dbid,
- NULL, NULL);
- }
+ /* Slot ready for sync, so sync it. */
+ else
+ {
+ /*
+ * Sanity check: As long as the invalidations are handled
+ * appropriately as above, this should never happen.
+ *
+ * We don't need to check restart_lsn here. See the comments in
+ * update_local_synced_slot() for details.
+ */
+ if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
+ ereport(ERROR,
+ errmsg_internal("cannot synchronize local slot \"%s\"",
+ remote_slot->name),
+ errdetail_internal("Local slot's start streaming location LSN(%X/%08X) is ahead of remote slot's LSN(%X/%08X).",
+ LSN_FORMAT_ARGS(slot->data.confirmed_flush),
+ LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)));
+
+ slot_updated = update_local_synced_slot(remote_slot, remote_dbid,
+ NULL, NULL);
+ }
+ }
}
/* Otherwise create the slot first. */
else
@@ -760,6 +828,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY,
remote_slot->two_phase,
remote_slot->failover,
+ true,
true);
/* For shorter lines. */
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fd0fdb9..fcba4d4 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -351,7 +351,7 @@ IsSlotForConflictCheck(const char *name)
void
ReplicationSlotCreate(const char *name, bool db_specific,
ReplicationSlotPersistency persistency,
- bool two_phase, bool failover, bool synced)
+ bool two_phase, bool failover, bool allow_overwrite, bool synced)
{
ReplicationSlot *slot = NULL;
int i;
@@ -445,6 +445,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->data.two_phase = two_phase;
slot->data.two_phase_at = InvalidXLogRecPtr;
slot->data.failover = failover;
+ slot->data.allow_overwrite = allow_overwrite;
slot->data.synced = synced;
/* and then data only present in shared memory */
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 69f4c61..38f7460 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -41,7 +41,7 @@ create_physical_replication_slot(char *name, bool immediately_reserve,
/* acquire replication slot, this will check for conflicting names */
ReplicationSlotCreate(name, false,
temporary ? RS_TEMPORARY : RS_PERSISTENT, false,
- false, false);
+ false, false, false);
if (immediately_reserve)
{
@@ -116,7 +116,7 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
static void
create_logical_replication_slot(char *name, char *plugin,
bool temporary, bool two_phase,
- bool failover,
+ bool failover, bool allow_overwrite,
XLogRecPtr restart_lsn,
bool find_startpoint)
{
@@ -134,7 +134,7 @@ create_logical_replication_slot(char *name, char *plugin,
*/
ReplicationSlotCreate(name, true,
temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase,
- failover, false);
+ failover, allow_overwrite, false);
/*
* Create logical decoding context to find start point or, if we don't
@@ -173,6 +173,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
bool temporary = PG_GETARG_BOOL(2);
bool two_phase = PG_GETARG_BOOL(3);
bool failover = PG_GETARG_BOOL(4);
+ bool allow_overwrite = PG_GETARG_BOOL(5);
Datum result;
TupleDesc tupdesc;
HeapTuple tuple;
@@ -191,6 +192,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
temporary,
two_phase,
failover,
+ allow_overwrite,
InvalidXLogRecPtr,
true);
@@ -726,6 +728,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
temporary,
false,
false,
+ false,
src_restart_lsn,
false);
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e3dce9d..a2db1e3 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1201,7 +1201,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
{
ReplicationSlotCreate(cmd->slotname, false,
cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
- false, false, false);
+ false, false, false, false);
if (reserve_wal)
{
@@ -1232,7 +1232,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
*/
ReplicationSlotCreate(cmd->slotname, true,
cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
- two_phase, failover, false);
+ two_phase, failover, false, false);
/*
* Do options check early so that we can bail before calling the
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 118d6da..ee55572 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11498,10 +11498,10 @@
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
proparallel => 'u', prorettype => 'record',
- proargtypes => 'name name bool bool bool',
- proallargtypes => '{name,name,bool,bool,bool,name,pg_lsn}',
- proargmodes => '{i,i,i,i,i,o,o}',
- proargnames => '{slot_name,plugin,temporary,twophase,failover,slot_name,lsn}',
+ proargtypes => 'name name bool bool bool bool',
+ proallargtypes => '{name,name,bool,bool,bool,bool,name,pg_lsn}',
+ proargmodes => '{i,i,i,i,i,i,o,o}',
+ proargnames => '{slot_name,plugin,temporary,twophase,failover,allow_overwrite,slot_name,lsn}',
prosrc => 'pg_create_logical_replication_slot' },
{ oid => '4222',
descr => 'copy a logical replication slot, changing temporality and plugin',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index e8fc342..c2e2871 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -141,6 +141,12 @@ typedef struct ReplicationSlotPersistentData
* for logical slots on the primary server.
*/
bool failover;
+
+ /*
+ * Allow Postgres to drop logical replication slot on standby server to ensure
+ * creation of new failover slot when remote sync_replication_slots is set to true.
+ */
+ bool allow_overwrite;
} ReplicationSlotPersistentData;
/*
@@ -301,7 +307,7 @@ extern void ReplicationSlotsShmemInit(void);
/* management of individual slots */
extern void ReplicationSlotCreate(const char *name, bool db_specific,
ReplicationSlotPersistency persistency,
- bool two_phase, bool failover,
+ bool two_phase, bool failover, bool allow_overwrite,
bool synced);
extern void ReplicationSlotPersist(void);
extern void ReplicationSlotDrop(const char *name, bool nowait);
--
2.47.3
On Fri, Sep 26, 2025 at 9:52 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
Hi Shveta,
Here is the version 1 of the patch with corrections
Thanks. We can bump the version to v2 now — that is, each time we post
a new version, we can increment the version number.
The patch fails to apply to the latest HEAD. Also it had some
'trailing whitespace' issues. Can you please rebase the patch and post
it again?
5)
IMO, LOG is sufficient here, as the action aligns with the
user-provided 'allow_overwrite' setting.
Point 5
Where is it in the code?
This one:
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("slot \"%s\" already exists"
+ " on the standby but try to recreate it because "
+ "flag allow_overwrite is set to true",
+ remote_slot->name));
thanks
Shveta
Hi,
Here is the generated v2 of the patch.
Thanks
Fabrice
Attachment: v2-0001-Add-allow_overwrite-option-to-logical-replication.patch (application/x-patch)
From 2ce1d7fa50df7ed6f7370b5e725a15ad11ec3d08 Mon Sep 17 00:00:00 2001
From: Fabrice Chapuis <fabrice636861@gmail.com>
Date: Mon, 29 Sep 2025 13:10:38 +0200
Subject: [PATCH v2] Add allow_overwrite option to logical replication slot
creation
This patch adds a new parameter allow_overwrite to
pg_create_logical_replication_slot. When true, an existing
failover slot on standby will be dropped and replaced
automatically.
Signed-off-by: Fabrice Chapuis <fabrice636861@gmail.com>
---
src/backend/catalog/system_functions.sql | 1 +
src/backend/replication/logical/launcher.c | 2 +-
src/backend/replication/logical/slotsync.c | 203 ++++++++++++++-------
src/backend/replication/slot.c | 3 +-
src/backend/replication/slotfuncs.c | 9 +-
src/backend/replication/walsender.c | 4 +-
src/include/catalog/pg_proc.dat | 8 +-
src/include/replication/slot.h | 8 +-
8 files changed, 159 insertions(+), 79 deletions(-)
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 2d946d6..75ea84f 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -480,6 +480,7 @@ CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot(
IN temporary boolean DEFAULT false,
IN twophase boolean DEFAULT false,
IN failover boolean DEFAULT false,
+ IN allow_overwrite boolean DEFAULT false,
OUT slot_name name, OUT lsn pg_lsn)
RETURNS RECORD
LANGUAGE INTERNAL
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 218cefe..a26af52 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -1519,7 +1519,7 @@ CreateConflictDetectionSlot(void)
errmsg("creating replication conflict detection slot"));
ReplicationSlotCreate(CONFLICT_DETECTION_SLOT, false, RS_PERSISTENT, false,
- false, false);
+ false, false, false);
init_conflict_slot_xmin();
}
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d5..dbf9380 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -653,91 +653,159 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
{
bool synced;
+ bool allow_overwrite;
SpinLockAcquire(&slot->mutex);
synced = slot->data.synced;
+ allow_overwrite = slot->data.allow_overwrite;
SpinLockRelease(&slot->mutex);
/* User-created slot with the same name exists, raise ERROR. */
if (!synced)
- ereport(ERROR,
+ {
+ /* Check if we need to overwrite an existing logical slot */
+ if (allow_overwrite)
+ {
+ /* Recreate the failover slot and synchronize it */
+ NameData plugin_name;
+ TransactionId xmin_horizon = InvalidTransactionId;
+
+ ereport(LOG,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("slot \"%s\" already exists"
+ " on the standby but try to recreate it because "
+ "flag allow_overwrite is set to true",
+ remote_slot->name));
+
+ /* Get rid of a replication slot that is no longer wanted */
+ ReplicationSlotDrop(remote_slot->name,true);
+
+ /* Skip creating the local slot if remote_slot is invalidated already */
+ if (remote_slot->invalidated != RS_INVAL_NONE)
+ return false;
+
+ /*
+ * We create temporary slots instead of ephemeral slots here because
+ * we want the slots to survive after releasing them. This is done to
+ * avoid dropping and re-creating the slots in each synchronization
+ * cycle if the restart_lsn or catalog_xmin of the remote slot has not
+ * caught up.
+ */
+ ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY,
+ remote_slot->two_phase,
+ remote_slot->failover,
+ true,
+ true);
+
+ /* For shorter lines. */
+ slot = MyReplicationSlot;
+
+ /* Avoid expensive operations while holding a spinlock. */
+ namestrcpy(&plugin_name, remote_slot->plugin);
+
+ SpinLockAcquire(&slot->mutex);
+ slot->data.database = remote_dbid;
+ slot->data.plugin = plugin_name;
+ SpinLockRelease(&slot->mutex);
+
+ reserve_wal_for_local_slot(remote_slot->restart_lsn);
+
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+ xmin_horizon = GetOldestSafeDecodingTransactionId(true);
+ SpinLockAcquire(&slot->mutex);
+ slot->effective_catalog_xmin = xmin_horizon;
+ slot->data.catalog_xmin = xmin_horizon;
+ SpinLockRelease(&slot->mutex);
+ ReplicationSlotsComputeRequiredXmin(true);
+ LWLockRelease(ProcArrayLock);
+
+ update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+
+ slot_updated = true;
+ }
+ else
+ ereport(ERROR,
errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("exiting from slot synchronization because same"
" name slot \"%s\" already exists on the standby",
remote_slot->name));
-
- /*
- * The slot has been synchronized before.
- *
- * It is important to acquire the slot here before checking
- * invalidation. If we don't acquire the slot first, there could be a
- * race condition that the local slot could be invalidated just after
- * checking the 'invalidated' flag here and we could end up
- * overwriting 'invalidated' flag to remote_slot's value. See
- * InvalidatePossiblyObsoleteSlot() where it invalidates slot directly
- * if the slot is not acquired by other processes.
- *
- * XXX: If it ever turns out that slot acquire/release is costly for
- * cases when none of the slot properties is changed then we can do a
- * pre-check to ensure that at least one of the slot properties is
- * changed before acquiring the slot.
- */
- ReplicationSlotAcquire(remote_slot->name, true, false);
-
- Assert(slot == MyReplicationSlot);
-
- /*
- * Copy the invalidation cause from remote only if local slot is not
- * invalidated locally, we don't want to overwrite existing one.
- */
- if (slot->data.invalidated == RS_INVAL_NONE &&
- remote_slot->invalidated != RS_INVAL_NONE)
+ }
+ else
{
- SpinLockAcquire(&slot->mutex);
- slot->data.invalidated = remote_slot->invalidated;
- SpinLockRelease(&slot->mutex);
- /* Make sure the invalidated state persists across server restart */
- ReplicationSlotMarkDirty();
- ReplicationSlotSave();
+ /*
+ * The slot has been synchronized before.
+ *
+ * It is important to acquire the slot here before checking
+ * invalidation. If we don't acquire the slot first, there could be a
+ * race condition that the local slot could be invalidated just after
+ * checking the 'invalidated' flag here and we could end up
+ * overwriting 'invalidated' flag to remote_slot's value. See
+ * InvalidatePossiblyObsoleteSlot() where it invalidates slot directly
+ * if the slot is not acquired by other processes.
+ *
+ * XXX: If it ever turns out that slot acquire/release is costly for
+ * cases when none of the slot properties is changed then we can do a
+ * pre-check to ensure that at least one of the slot properties is
+ * changed before acquiring the slot.
+ */
+ ReplicationSlotAcquire(remote_slot->name, true, false);
+
+ Assert(slot == MyReplicationSlot);
- slot_updated = true;
- }
+ /*
+ * Copy the invalidation cause from remote only if local slot is not
+ * invalidated locally, we don't want to overwrite existing one.
+ */
+ if (slot->data.invalidated == RS_INVAL_NONE &&
+ remote_slot->invalidated != RS_INVAL_NONE)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->data.invalidated = remote_slot->invalidated;
+ SpinLockRelease(&slot->mutex);
- /* Skip the sync of an invalidated slot */
- if (slot->data.invalidated != RS_INVAL_NONE)
- {
- ReplicationSlotRelease();
- return slot_updated;
- }
+ /* Make sure the invalidated state persists across server restart */
+ ReplicationSlotMarkDirty();
+ ReplicationSlotSave();
- /* Slot not ready yet, let's attempt to make it sync-ready now. */
- if (slot->data.persistency == RS_TEMPORARY)
- {
- slot_updated = update_and_persist_local_synced_slot(remote_slot,
- remote_dbid);
- }
+ slot_updated = true;
+ }
- /* Slot ready for sync, so sync it. */
- else
- {
- /*
- * Sanity check: As long as the invalidations are handled
- * appropriately as above, this should never happen.
- *
- * We don't need to check restart_lsn here. See the comments in
- * update_local_synced_slot() for details.
- */
- if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
- ereport(ERROR,
- errmsg_internal("cannot synchronize local slot \"%s\"",
- remote_slot->name),
- errdetail_internal("Local slot's start streaming location LSN(%X/%08X) is ahead of remote slot's LSN(%X/%08X).",
- LSN_FORMAT_ARGS(slot->data.confirmed_flush),
- LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)));
+ /* Skip the sync of an invalidated slot */
+ if (slot->data.invalidated != RS_INVAL_NONE)
+ {
+ ReplicationSlotRelease();
+ return slot_updated;
+ }
- slot_updated = update_local_synced_slot(remote_slot, remote_dbid,
- NULL, NULL);
+ /* Slot not ready yet, let's attempt to make it sync-ready now. */
+ if (slot->data.persistency == RS_TEMPORARY)
+ {
+ slot_updated = update_and_persist_local_synced_slot(remote_slot,
+ remote_dbid);
+ }
+
+ /* Slot ready for sync, so sync it. */
+ else
+ {
+ /*
+ * Sanity check: As long as the invalidations are handled
+ * appropriately as above, this should never happen.
+ *
+ * We don't need to check restart_lsn here. See the comments in
+ * update_local_synced_slot() for details.
+ */
+ if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
+ ereport(ERROR,
+ errmsg_internal("cannot synchronize local slot \"%s\"",
+ remote_slot->name),
+ errdetail_internal("Local slot's start streaming location LSN(%X/%08X) is ahead of remote slot's LSN(%X/%08X).",
+ LSN_FORMAT_ARGS(slot->data.confirmed_flush),
+ LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)));
+
+ slot_updated = update_local_synced_slot(remote_slot, remote_dbid,
+ NULL, NULL);
+ }
}
}
/* Otherwise create the slot first. */
@@ -760,6 +828,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY,
remote_slot->two_phase,
remote_slot->failover,
+ true,
true);
/* For shorter lines. */
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fd0fdb9..fcba4d4 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -351,7 +351,7 @@ IsSlotForConflictCheck(const char *name)
void
ReplicationSlotCreate(const char *name, bool db_specific,
ReplicationSlotPersistency persistency,
- bool two_phase, bool failover, bool synced)
+ bool two_phase, bool failover, bool allow_overwrite, bool synced)
{
ReplicationSlot *slot = NULL;
int i;
@@ -445,6 +445,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->data.two_phase = two_phase;
slot->data.two_phase_at = InvalidXLogRecPtr;
slot->data.failover = failover;
+ slot->data.allow_overwrite = allow_overwrite;
slot->data.synced = synced;
/* and then data only present in shared memory */
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index b8f2115..1a50244 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -41,7 +41,7 @@ create_physical_replication_slot(char *name, bool immediately_reserve,
/* acquire replication slot, this will check for conflicting names */
ReplicationSlotCreate(name, false,
temporary ? RS_TEMPORARY : RS_PERSISTENT, false,
- false, false);
+ false, false, false);
if (immediately_reserve)
{
@@ -116,7 +116,7 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
static void
create_logical_replication_slot(char *name, char *plugin,
bool temporary, bool two_phase,
- bool failover,
+ bool failover, bool allow_overwrite,
XLogRecPtr restart_lsn,
bool find_startpoint)
{
@@ -134,7 +134,7 @@ create_logical_replication_slot(char *name, char *plugin,
*/
ReplicationSlotCreate(name, true,
temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase,
- failover, false);
+ failover, allow_overwrite, false);
/*
* Create logical decoding context to find start point or, if we don't
@@ -173,6 +173,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
bool temporary = PG_GETARG_BOOL(2);
bool two_phase = PG_GETARG_BOOL(3);
bool failover = PG_GETARG_BOOL(4);
+ bool allow_overwrite = PG_GETARG_BOOL(5);
Datum result;
TupleDesc tupdesc;
HeapTuple tuple;
@@ -191,6 +192,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
temporary,
two_phase,
failover,
+ allow_overwrite,
InvalidXLogRecPtr,
true);
@@ -726,6 +728,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
temporary,
false,
false,
+ false,
src_restart_lsn,
false);
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 59822f2..40586b2 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1202,7 +1202,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
{
ReplicationSlotCreate(cmd->slotname, false,
cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
- false, false, false);
+ false, false, false, false);
if (reserve_wal)
{
@@ -1233,7 +1233,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
*/
ReplicationSlotCreate(cmd->slotname, true,
cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
- two_phase, failover, false);
+ two_phase, failover, false, false);
/*
* Do options check early so that we can bail before calling the
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 01eba3b..fbaacfb 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11510,10 +11510,10 @@
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
proparallel => 'u', prorettype => 'record',
- proargtypes => 'name name bool bool bool',
- proallargtypes => '{name,name,bool,bool,bool,name,pg_lsn}',
- proargmodes => '{i,i,i,i,i,o,o}',
- proargnames => '{slot_name,plugin,temporary,twophase,failover,slot_name,lsn}',
+ proargtypes => 'name name bool bool bool bool',
+ proallargtypes => '{name,name,bool,bool,bool,bool,name,pg_lsn}',
+ proargmodes => '{i,i,i,i,i,i,o,o}',
+ proargnames => '{slot_name,plugin,temporary,twophase,failover,allow_overwrite,slot_name,lsn}',
prosrc => 'pg_create_logical_replication_slot' },
{ oid => '4222',
descr => 'copy a logical replication slot, changing temporality and plugin',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index fe62162..ca50b79 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -141,6 +141,12 @@ typedef struct ReplicationSlotPersistentData
* for logical slots on the primary server.
*/
bool failover;
+
+ /*
+ * Allow Postgres to drop logical replication slot on standby server to ensure
+ * creation of new failover slot when remote sync_replication_slots is set to true.
+ */
+ bool allow_overwrite;
} ReplicationSlotPersistentData;
/*
@@ -301,7 +307,7 @@ extern void ReplicationSlotsShmemInit(void);
/* management of individual slots */
extern void ReplicationSlotCreate(const char *name, bool db_specific,
ReplicationSlotPersistency persistency,
- bool two_phase, bool failover,
+ bool two_phase, bool failover, bool allow_overwrite,
bool synced);
extern void ReplicationSlotPersist(void);
extern void ReplicationSlotDrop(const char *name, bool nowait);
--
2.47.3
On Tue, Sep 30, 2025 at 12:17 PM Fabrice Chapuis
<fabrice636861@gmail.com> wrote:
Hi,
Here the generated v2 of the Patch.
Thanks. I have refactored the code for synchronize_one_slot() as there
was some code-repetition. Please take it if you find it okay.
Also I felt that when we create a slot through slot-synchronization,
we should create it with allow_overwrite as false. And thus in the
attached patch, I have changed that part as well.
It is a top up patch. Attached it as txt, please rename before
applying atop your changes. Attached the steps for your reference.
thanks
Shveta
Attachments:
0001-refactored-synchronize_one_slot.patch.txt
From e99ef98faad4a2ff32daa4db36daceddf8225cf7 Mon Sep 17 00:00:00 2001
From: Shveta Malik <shveta.malik@gmail.com>
Date: Tue, 30 Sep 2025 14:54:41 +0530
Subject: [PATCH] refactored synchronize_one_slot
---
src/backend/replication/logical/slotsync.c | 209 ++++++++-------------
1 file changed, 82 insertions(+), 127 deletions(-)
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index dbf9380cb39..a8d44fe6b45 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -649,6 +649,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
return false;
}
+retry:
/* Search for the named slot */
if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
{
@@ -666,146 +667,100 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
/* Check if we need to overwrite an existing logical slot */
if (allow_overwrite)
{
- /* Recreate the failover slot and synchronize it */
- NameData plugin_name;
- TransactionId xmin_horizon = InvalidTransactionId;
-
ereport(LOG,
- errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("slot \"%s\" already exists"
- " on the standby but try to recreate it because "
- "flag allow_overwrite is set to true",
- remote_slot->name));
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("slot \"%s\" already exists"
+ " on the standby but will be overwritten as"
+ " allow_overwrite is set to true",
+ remote_slot->name));
/* Get rid of a replication slot that is no longer wanted */
- ReplicationSlotDrop(remote_slot->name,true);
-
- /* Skip creating the local slot if remote_slot is invalidated already */
- if (remote_slot->invalidated != RS_INVAL_NONE)
- return false;
-
- /*
- * We create temporary slots instead of ephemeral slots here because
- * we want the slots to survive after releasing them. This is done to
- * avoid dropping and re-creating the slots in each synchronization
- * cycle if the restart_lsn or catalog_xmin of the remote slot has not
- * caught up.
- */
- ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY,
- remote_slot->two_phase,
- remote_slot->failover,
- true,
- true);
-
- /* For shorter lines. */
- slot = MyReplicationSlot;
-
- /* Avoid expensive operations while holding a spinlock. */
- namestrcpy(&plugin_name, remote_slot->plugin);
-
- SpinLockAcquire(&slot->mutex);
- slot->data.database = remote_dbid;
- slot->data.plugin = plugin_name;
- SpinLockRelease(&slot->mutex);
-
- reserve_wal_for_local_slot(remote_slot->restart_lsn);
-
- LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
- xmin_horizon = GetOldestSafeDecodingTransactionId(true);
- SpinLockAcquire(&slot->mutex);
- slot->effective_catalog_xmin = xmin_horizon;
- slot->data.catalog_xmin = xmin_horizon;
- SpinLockRelease(&slot->mutex);
- ReplicationSlotsComputeRequiredXmin(true);
- LWLockRelease(ProcArrayLock);
-
- update_and_persist_local_synced_slot(remote_slot, remote_dbid);
-
- slot_updated = true;
+ ReplicationSlotAcquire(remote_slot->name, true, false);
+ ReplicationSlotDropAcquired();
+ goto retry;
}
else
+ {
ereport(ERROR,
- errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("exiting from slot synchronization because same"
- " name slot \"%s\" already exists on the standby",
- remote_slot->name));
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("exiting from slot synchronization because same"
+ " name slot \"%s\" already exists on the standby",
+ remote_slot->name));
+ }
}
- else
- {
- /*
- * The slot has been synchronized before.
- *
- * It is important to acquire the slot here before checking
- * invalidation. If we don't acquire the slot first, there could be a
- * race condition that the local slot could be invalidated just after
- * checking the 'invalidated' flag here and we could end up
- * overwriting 'invalidated' flag to remote_slot's value. See
- * InvalidatePossiblyObsoleteSlot() where it invalidates slot directly
- * if the slot is not acquired by other processes.
- *
- * XXX: If it ever turns out that slot acquire/release is costly for
- * cases when none of the slot properties is changed then we can do a
- * pre-check to ensure that at least one of the slot properties is
- * changed before acquiring the slot.
- */
- ReplicationSlotAcquire(remote_slot->name, true, false);
-
- Assert(slot == MyReplicationSlot);
+ /*
+ * The slot has been synchronized before.
+ *
+ * It is important to acquire the slot here before checking
+ * invalidation. If we don't acquire the slot first, there could be a
+ * race condition that the local slot could be invalidated just after
+ * checking the 'invalidated' flag here and we could end up
+ * overwriting 'invalidated' flag to remote_slot's value. See
+ * InvalidatePossiblyObsoleteSlot() where it invalidates slot directly
+ * if the slot is not acquired by other processes.
+ *
+ * XXX: If it ever turns out that slot acquire/release is costly for
+ * cases when none of the slot properties is changed then we can do a
+ * pre-check to ensure that at least one of the slot properties is
+ * changed before acquiring the slot.
+ */
+ ReplicationSlotAcquire(remote_slot->name, true, false);
- /*
- * Copy the invalidation cause from remote only if local slot is not
- * invalidated locally, we don't want to overwrite existing one.
- */
- if (slot->data.invalidated == RS_INVAL_NONE &&
- remote_slot->invalidated != RS_INVAL_NONE)
- {
- SpinLockAcquire(&slot->mutex);
- slot->data.invalidated = remote_slot->invalidated;
- SpinLockRelease(&slot->mutex);
+ Assert(slot == MyReplicationSlot);
- /* Make sure the invalidated state persists across server restart */
- ReplicationSlotMarkDirty();
- ReplicationSlotSave();
+ /*
+ * Copy the invalidation cause from remote only if local slot is not
+ * invalidated locally, we don't want to overwrite existing one.
+ */
+ if (slot->data.invalidated == RS_INVAL_NONE &&
+ remote_slot->invalidated != RS_INVAL_NONE)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->data.invalidated = remote_slot->invalidated;
+ SpinLockRelease(&slot->mutex);
- slot_updated = true;
- }
+ /* Make sure the invalidated state persists across server restart */
+ ReplicationSlotMarkDirty();
+ ReplicationSlotSave();
- /* Skip the sync of an invalidated slot */
- if (slot->data.invalidated != RS_INVAL_NONE)
- {
- ReplicationSlotRelease();
- return slot_updated;
- }
+ slot_updated = true;
+ }
- /* Slot not ready yet, let's attempt to make it sync-ready now. */
- if (slot->data.persistency == RS_TEMPORARY)
- {
- slot_updated = update_and_persist_local_synced_slot(remote_slot,
- remote_dbid);
- }
+ /* Skip the sync of an invalidated slot */
+ if (slot->data.invalidated != RS_INVAL_NONE)
+ {
+ ReplicationSlotRelease();
+ return slot_updated;
+ }
- /* Slot ready for sync, so sync it. */
- else
- {
- /*
- * Sanity check: As long as the invalidations are handled
- * appropriately as above, this should never happen.
- *
- * We don't need to check restart_lsn here. See the comments in
- * update_local_synced_slot() for details.
- */
- if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
- ereport(ERROR,
- errmsg_internal("cannot synchronize local slot \"%s\"",
- remote_slot->name),
- errdetail_internal("Local slot's start streaming location LSN(%X/%08X) is ahead of remote slot's LSN(%X/%08X).",
- LSN_FORMAT_ARGS(slot->data.confirmed_flush),
- LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)));
-
- slot_updated = update_local_synced_slot(remote_slot, remote_dbid,
- NULL, NULL);
- }
+ /* Slot not ready yet, let's attempt to make it sync-ready now. */
+ if (slot->data.persistency == RS_TEMPORARY)
+ {
+ slot_updated = update_and_persist_local_synced_slot(remote_slot,
+ remote_dbid);
+ }
+
+ /* Slot ready for sync, so sync it. */
+ else
+ {
+ /*
+ * Sanity check: As long as the invalidations are handled
+ * appropriately as above, this should never happen.
+ *
+ * We don't need to check restart_lsn here. See the comments in
+ * update_local_synced_slot() for details.
+ */
+ if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
+ ereport(ERROR,
+ errmsg_internal("cannot synchronize local slot \"%s\"",
+ remote_slot->name),
+ errdetail_internal("Local slot's start streaming location LSN(%X/%08X) is ahead of remote slot's LSN(%X/%08X).",
+ LSN_FORMAT_ARGS(slot->data.confirmed_flush),
+ LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)));
+
+ slot_updated = update_local_synced_slot(remote_slot, remote_dbid,
+ NULL, NULL);
}
}
/* Otherwise create the slot first. */
@@ -828,7 +783,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY,
remote_slot->two_phase,
remote_slot->failover,
- true,
+ false,
true);
/* For shorter lines. */
--
2.34.1
On Tue, Sep 30, 2025 at 3:11 PM shveta malik <shveta.malik@gmail.com> wrote:
Thanks. I have refactored the code for synchronize_one_slot() as there
was some code-repetition. Please take it if you find it okay.
Ok for the goto retry.
Also I felt that when we create a slot through slot-synchronization,
we should create it with allow_overwrite as false. And thus in the
attached patch, I have changed that part as well.
The problem is that if allow_overwrite is set to false on the new standby,
it does not reflect the "allow_overwrite" parameter value set on the
primary instance.
For the moment we can leave slot creation with allow_overwrite set to
false.
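Just to illustrate what I mean (a rough sketch, assuming the patched
pg_create_logical_replication_slot with the new allow_overwrite argument;
the slot name is only an example):

-- on the primary, with the patch applied
SELECT pg_create_logical_replication_slot('logical_slot', 'pgoutput',
                                          failover => true,
                                          allow_overwrite => true);

After a switchover, the slot recreated by the sync worker on the new
standby starts again with allow_overwrite = false, so the value chosen
on the primary is not carried over.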
It is a top up patch. Attached it as txt, please rename before
applying atop your changes. Attached the steps for your reference.
The patch works fine. I generated v3.
Regards
Fabrice
Attachments:
v3-0001-Add-allow_overwrite-option-to-logical-replication.patch
From 3723bae39873517e185577acf0c3ca82c08edd3c Mon Sep 17 00:00:00 2001
From: Fabrice Chapuis <fabrice636861@gmail.com>
Date: Mon, 29 Sep 2025 13:10:38 +0200
Subject: [PATCH v3] Add allow_overwrite option to logical replication slot
creation
This patch adds a new parameter allow_overwrite to
pg_create_logical_replication_slot. When true, an existing
failover slot on standby will be dropped and replaced
automatically.
Signed-off-by: Fabrice Chapuis <fabrice636861@gmail.com>
---
src/backend/catalog/system_functions.sql | 1 +
src/backend/replication/logical/launcher.c | 2 +-
src/backend/replication/logical/slotsync.c | 34 ++++++++++++++++++----
src/backend/replication/slot.c | 3 +-
src/backend/replication/slotfuncs.c | 9 ++++--
src/backend/replication/walsender.c | 4 +--
src/include/catalog/pg_proc.dat | 8 ++---
src/include/replication/slot.h | 8 ++++-
8 files changed, 52 insertions(+), 17 deletions(-)
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 2d946d6..75ea84f 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -480,6 +480,7 @@ CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot(
IN temporary boolean DEFAULT false,
IN twophase boolean DEFAULT false,
IN failover boolean DEFAULT false,
+ IN allow_overwrite boolean DEFAULT false,
OUT slot_name name, OUT lsn pg_lsn)
RETURNS RECORD
LANGUAGE INTERNAL
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 218cefe..a26af52 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -1519,7 +1519,7 @@ CreateConflictDetectionSlot(void)
errmsg("creating replication conflict detection slot"));
ReplicationSlotCreate(CONFLICT_DETECTION_SLOT, false, RS_PERSISTENT, false,
- false, false);
+ false, false, false);
init_conflict_slot_xmin();
}
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d5..a8d44fe 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -649,22 +649,45 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
return false;
}
+retry:
/* Search for the named slot */
if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
{
bool synced;
+ bool allow_overwrite;
SpinLockAcquire(&slot->mutex);
synced = slot->data.synced;
+ allow_overwrite = slot->data.allow_overwrite;
SpinLockRelease(&slot->mutex);
/* User-created slot with the same name exists, raise ERROR. */
if (!synced)
- ereport(ERROR,
- errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("exiting from slot synchronization because same"
- " name slot \"%s\" already exists on the standby",
- remote_slot->name));
+ {
+ /* Check if we need to overwrite an existing logical slot */
+ if (allow_overwrite)
+ {
+ ereport(LOG,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("slot \"%s\" already exists"
+ " on the standby but will be overwritten as"
+ " allow_overwrite is set to true",
+ remote_slot->name));
+
+ /* Get rid of a replication slot that is no longer wanted */
+ ReplicationSlotAcquire(remote_slot->name, true, false);
+ ReplicationSlotDropAcquired();
+ goto retry;
+ }
+ else
+ {
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("exiting from slot synchronization because same"
+ " name slot \"%s\" already exists on the standby",
+ remote_slot->name));
+ }
+ }
/*
* The slot has been synchronized before.
@@ -760,6 +783,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY,
remote_slot->two_phase,
remote_slot->failover,
+ false,
true);
/* For shorter lines. */
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fd0fdb9..fcba4d4 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -351,7 +351,7 @@ IsSlotForConflictCheck(const char *name)
void
ReplicationSlotCreate(const char *name, bool db_specific,
ReplicationSlotPersistency persistency,
- bool two_phase, bool failover, bool synced)
+ bool two_phase, bool failover, bool allow_overwrite, bool synced)
{
ReplicationSlot *slot = NULL;
int i;
@@ -445,6 +445,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->data.two_phase = two_phase;
slot->data.two_phase_at = InvalidXLogRecPtr;
slot->data.failover = failover;
+ slot->data.allow_overwrite = allow_overwrite;
slot->data.synced = synced;
/* and then data only present in shared memory */
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index b8f2115..1a50244 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -41,7 +41,7 @@ create_physical_replication_slot(char *name, bool immediately_reserve,
/* acquire replication slot, this will check for conflicting names */
ReplicationSlotCreate(name, false,
temporary ? RS_TEMPORARY : RS_PERSISTENT, false,
- false, false);
+ false, false, false);
if (immediately_reserve)
{
@@ -116,7 +116,7 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
static void
create_logical_replication_slot(char *name, char *plugin,
bool temporary, bool two_phase,
- bool failover,
+ bool failover, bool allow_overwrite,
XLogRecPtr restart_lsn,
bool find_startpoint)
{
@@ -134,7 +134,7 @@ create_logical_replication_slot(char *name, char *plugin,
*/
ReplicationSlotCreate(name, true,
temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase,
- failover, false);
+ failover, allow_overwrite, false);
/*
* Create logical decoding context to find start point or, if we don't
@@ -173,6 +173,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
bool temporary = PG_GETARG_BOOL(2);
bool two_phase = PG_GETARG_BOOL(3);
bool failover = PG_GETARG_BOOL(4);
+ bool allow_overwrite = PG_GETARG_BOOL(5);
Datum result;
TupleDesc tupdesc;
HeapTuple tuple;
@@ -191,6 +192,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
temporary,
two_phase,
failover,
+ allow_overwrite,
InvalidXLogRecPtr,
true);
@@ -726,6 +728,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
temporary,
false,
false,
+ false,
src_restart_lsn,
false);
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 59822f2..40586b2 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1202,7 +1202,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
{
ReplicationSlotCreate(cmd->slotname, false,
cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
- false, false, false);
+ false, false, false, false);
if (reserve_wal)
{
@@ -1233,7 +1233,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
*/
ReplicationSlotCreate(cmd->slotname, true,
cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
- two_phase, failover, false);
+ two_phase, failover, false, false);
/*
* Do options check early so that we can bail before calling the
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 01eba3b..fbaacfb 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11510,10 +11510,10 @@
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
proparallel => 'u', prorettype => 'record',
- proargtypes => 'name name bool bool bool',
- proallargtypes => '{name,name,bool,bool,bool,name,pg_lsn}',
- proargmodes => '{i,i,i,i,i,o,o}',
- proargnames => '{slot_name,plugin,temporary,twophase,failover,slot_name,lsn}',
+ proargtypes => 'name name bool bool bool bool',
+ proallargtypes => '{name,name,bool,bool,bool,bool,name,pg_lsn}',
+ proargmodes => '{i,i,i,i,i,i,o,o}',
+ proargnames => '{slot_name,plugin,temporary,twophase,failover,allow_overwrite,slot_name,lsn}',
prosrc => 'pg_create_logical_replication_slot' },
{ oid => '4222',
descr => 'copy a logical replication slot, changing temporality and plugin',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index fe62162..ca50b79 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -141,6 +141,12 @@ typedef struct ReplicationSlotPersistentData
* for logical slots on the primary server.
*/
bool failover;
+
+ /*
+ * Allow Postgres to drop logical replication slot on standby server to ensure
+ * creation of new failover slot when remote sync_replication_slots is set to true.
+ */
+ bool allow_overwrite;
} ReplicationSlotPersistentData;
/*
@@ -301,7 +307,7 @@ extern void ReplicationSlotsShmemInit(void);
/* management of individual slots */
extern void ReplicationSlotCreate(const char *name, bool db_specific,
ReplicationSlotPersistency persistency,
- bool two_phase, bool failover,
+ bool two_phase, bool failover, bool allow_overwrite,
bool synced);
extern void ReplicationSlotPersist(void);
extern void ReplicationSlotDrop(const char *name, bool nowait);
--
2.47.3
Next, pg_replication_slots (pg_get_replication_slots) can be modified
to display the new property. And the corresponding doc for
pg_replication_slots (doc/src/sgml/system-views.sgml) can be updated.
Yes, next I'll work on this part.
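Roughly something like this once the column is exposed (just a sketch,
with the column name taken from the patch):

SELECT slot_name, failover, synced, allow_overwrite
  FROM pg_replication_slots
 WHERE slot_type = 'logical';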
Regards,
Fabrice
Hi Shveta,
Here is v4 of the patch, with the pg_replication_slots view modified to
display the allow_overwrite field. The doc was also updated.
Regards,
Fabrice
Attachments:
v4-0001-Add-allow_overwrite-option-to-logical-replication.patch
From 25e513bf49f3afe4b6ed7333ae46705c357eeb47 Mon Sep 17 00:00:00 2001
From: Fabrice Chapuis <fabrice636861@gmail.com>
Date: Mon, 29 Sep 2025 13:10:38 +0200
Subject: [PATCH v4] Add allow_overwrite option to logical replication slot
creation
This patch adds a new parameter allow_overwrite to
pg_create_logical_replication_slot. When true, an existing
failover slot on standby will be dropped and replaced
automatically.
Signed-off-by: Fabrice Chapuis <fabrice636861@gmail.com>
---
doc/src/sgml/system-views.sgml | 12 ++++++++
src/backend/catalog/system_functions.sql | 1 +
src/backend/catalog/system_views.sql | 3 +-
src/backend/replication/logical/launcher.c | 2 +-
src/backend/replication/logical/slotsync.c | 34 ++++++++++++++++++----
src/backend/replication/slot.c | 3 +-
src/backend/replication/slotfuncs.c | 13 ++++++---
src/backend/replication/walsender.c | 4 +--
src/include/catalog/pg_proc.dat | 14 ++++-----
src/include/replication/slot.h | 8 ++++-
10 files changed, 72 insertions(+), 22 deletions(-)
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 4187191ea74..0d94622cd0c 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3036,6 +3036,18 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>allow_overwrite</structfield> <type>bool</type>
+ </para>
+ <para>
+ The parameter controls whether existing logical replication slot metadata may
+ be overwritten on the standby instance. The default is <literal>false</literal>.
+ When true, an existing failover slot on standby instance will be dropped and replaced
+ automatically.
+ </para></entry>
+ </row>
+
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 2d946d6d9e9..75ea84fd837 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -480,6 +480,7 @@ CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot(
IN temporary boolean DEFAULT false,
IN twophase boolean DEFAULT false,
IN failover boolean DEFAULT false,
+ IN allow_overwrite boolean DEFAULT false,
OUT slot_name name, OUT lsn pg_lsn)
RETURNS RECORD
LANGUAGE INTERNAL
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c77fa0234bb..7fc42ddbe66 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1046,7 +1046,8 @@ CREATE VIEW pg_replication_slots AS
L.conflicting,
L.invalidation_reason,
L.failover,
- L.synced
+ L.synced,
+ L.allow_overwrite
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 218cefe86e2..a26af527775 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -1519,7 +1519,7 @@ CreateConflictDetectionSlot(void)
errmsg("creating replication conflict detection slot"));
ReplicationSlotCreate(CONFLICT_DETECTION_SLOT, false, RS_PERSISTENT, false,
- false, false);
+ false, false, false);
init_conflict_slot_xmin();
}
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d55bdb..a8d44fe6b45 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -649,22 +649,45 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
return false;
}
+retry:
/* Search for the named slot */
if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
{
bool synced;
+ bool allow_overwrite;
SpinLockAcquire(&slot->mutex);
synced = slot->data.synced;
+ allow_overwrite = slot->data.allow_overwrite;
SpinLockRelease(&slot->mutex);
/* User-created slot with the same name exists, raise ERROR. */
if (!synced)
- ereport(ERROR,
- errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("exiting from slot synchronization because same"
- " name slot \"%s\" already exists on the standby",
- remote_slot->name));
+ {
+ /* Check if we need to overwrite an existing logical slot */
+ if (allow_overwrite)
+ {
+ ereport(LOG,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("slot \"%s\" already exists"
+ " on the standby but will be overwritten as"
+ " allow_overwrite is set to true",
+ remote_slot->name));
+
+ /* Get rid of a replication slot that is no longer wanted */
+ ReplicationSlotAcquire(remote_slot->name, true, false);
+ ReplicationSlotDropAcquired();
+ goto retry;
+ }
+ else
+ {
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("exiting from slot synchronization because same"
+ " name slot \"%s\" already exists on the standby",
+ remote_slot->name));
+ }
+ }
/*
* The slot has been synchronized before.
@@ -760,6 +783,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY,
remote_slot->two_phase,
remote_slot->failover,
+ false,
true);
/* For shorter lines. */
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fd0fdb96d42..fcba4d4cff9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -351,7 +351,7 @@ IsSlotForConflictCheck(const char *name)
void
ReplicationSlotCreate(const char *name, bool db_specific,
ReplicationSlotPersistency persistency,
- bool two_phase, bool failover, bool synced)
+ bool two_phase, bool failover, bool allow_overwrite, bool synced)
{
ReplicationSlot *slot = NULL;
int i;
@@ -445,6 +445,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->data.two_phase = two_phase;
slot->data.two_phase_at = InvalidXLogRecPtr;
slot->data.failover = failover;
+ slot->data.allow_overwrite = allow_overwrite;
slot->data.synced = synced;
/* and then data only present in shared memory */
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index b8f21153e7b..6a6a03e9a4d 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -41,7 +41,7 @@ create_physical_replication_slot(char *name, bool immediately_reserve,
/* acquire replication slot, this will check for conflicting names */
ReplicationSlotCreate(name, false,
temporary ? RS_TEMPORARY : RS_PERSISTENT, false,
- false, false);
+ false, false, false);
if (immediately_reserve)
{
@@ -116,7 +116,7 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
static void
create_logical_replication_slot(char *name, char *plugin,
bool temporary, bool two_phase,
- bool failover,
+ bool failover, bool allow_overwrite,
XLogRecPtr restart_lsn,
bool find_startpoint)
{
@@ -134,7 +134,7 @@ create_logical_replication_slot(char *name, char *plugin,
*/
ReplicationSlotCreate(name, true,
temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase,
- failover, false);
+ failover, allow_overwrite, false);
/*
* Create logical decoding context to find start point or, if we don't
@@ -173,6 +173,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
bool temporary = PG_GETARG_BOOL(2);
bool two_phase = PG_GETARG_BOOL(3);
bool failover = PG_GETARG_BOOL(4);
+ bool allow_overwrite = PG_GETARG_BOOL(5);
Datum result;
TupleDesc tupdesc;
HeapTuple tuple;
@@ -191,6 +192,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
temporary,
two_phase,
failover,
+ allow_overwrite,
InvalidXLogRecPtr,
true);
@@ -235,7 +237,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 20
+#define PG_GET_REPLICATION_SLOTS_COLS 21
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
XLogRecPtr currlsn;
int slotno;
@@ -443,6 +445,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(slot_contents.data.synced);
+ values[i++] = BoolGetDatum(slot_contents.data.allow_overwrite);
+
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
@@ -726,6 +730,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
temporary,
false,
false,
+ false,
src_restart_lsn,
false);
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 59822f22b8d..40586b273ba 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1202,7 +1202,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
{
ReplicationSlotCreate(cmd->slotname, false,
cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
- false, false, false);
+ false, false, false, false);
if (reserve_wal)
{
@@ -1233,7 +1233,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
*/
ReplicationSlotCreate(cmd->slotname, true,
cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
- two_phase, failover, false);
+ two_phase, failover, false, false);
/*
* Do options check early so that we can bail before calling the
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 01eba3b5a19..2f980c0dd4f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11503,17 +11503,17 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,bool}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,allow_overwrite}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
proparallel => 'u', prorettype => 'record',
- proargtypes => 'name name bool bool bool',
- proallargtypes => '{name,name,bool,bool,bool,name,pg_lsn}',
- proargmodes => '{i,i,i,i,i,o,o}',
- proargnames => '{slot_name,plugin,temporary,twophase,failover,slot_name,lsn}',
+ proargtypes => 'name name bool bool bool bool',
+ proallargtypes => '{name,name,bool,bool,bool,bool,name,pg_lsn}',
+ proargmodes => '{i,i,i,i,i,i,o,o}',
+ proargnames => '{slot_name,plugin,temporary,twophase,failover,allow_overwrite,slot_name,lsn}',
prosrc => 'pg_create_logical_replication_slot' },
{ oid => '4222',
descr => 'copy a logical replication slot, changing temporality and plugin',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index fe62162cde3..ca50b79daaa 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -141,6 +141,12 @@ typedef struct ReplicationSlotPersistentData
* for logical slots on the primary server.
*/
bool failover;
+
+ /*
+ * Allow Postgres to drop logical replication slot on standby server to ensure
+ * creation of new failover slot when remote sync_replication_slots is set to true.
+ */
+ bool allow_overwrite;
} ReplicationSlotPersistentData;
/*
@@ -301,7 +307,7 @@ extern void ReplicationSlotsShmemInit(void);
/* management of individual slots */
extern void ReplicationSlotCreate(const char *name, bool db_specific,
ReplicationSlotPersistency persistency,
- bool two_phase, bool failover,
+ bool two_phase, bool failover, bool allow_overwrite,
bool synced);
extern void ReplicationSlotPersist(void);
extern void ReplicationSlotDrop(const char *name, bool nowait);
--
2.39.5
On Fri, Oct 3, 2025 at 1:28 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
Hi Shveta,
Here is the v4 of the patch with pg_replication_slots view modified to display the field allow_overwrite. Doc was also updated.
The patch looks okay. The parameter name is still open for discussion,
and the comments could be improved. But we can focus on these finer
details once more reviewers start reviewing and there’s general
agreement on the concept.
One trivial comment: we can slightly modify the doc to have something like this:
This parameter controls whether an existing logical replication slot
on the standby (with synced=false) can be overwritten during logical
replication slot synchronization (see Section 47.2.3). The default is
false. When true, an existing user slot with the same name on the
standby will be synchronized using the primary’s failover slot.
<please see high-availability.sgml to find how 'Section 47.2.3' can be
referenced in the doc>
~~
The next step will be to provide a way to modify this parameter via an
alter API, say pg_alter_logical_replication_slot(). This API can later
be extended to handle other parameters. This API can be implemented in
patch002 for easier review.
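Something along these lines, only as a sketch since the exact signature is
still open (the function does not exist yet):

-- hypothetical API, not part of any patch so far
SELECT pg_alter_logical_replication_slot('logical_slot',
                                         allow_overwrite => true);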
thanks
Shveta
Hi,
Here is patch v5; I changed the doc text per your proposition and added
the link.
At this stage, can the patch be submitted to the current commitfest for
review?
With Regards,
Fabrice
Attachments:
v5-0001-Add-allow_overwrite-option-to-logical-replication.patch
From 68bda4e973c95ff9d24a6fc866187903d04d3d21 Mon Sep 17 00:00:00 2001
From: Fabrice Chapuis <fabrice636861@gmail.com>
Date: Mon, 29 Sep 2025 13:10:38 +0200
Subject: [PATCH v5] Add allow_overwrite option to logical replication slot
creation
This patch adds a new parameter allow_overwrite to
pg_create_logical_replication_slot. When true, an existing
failover slot on standby will be dropped and replaced
automatically.
Signed-off-by: Fabrice Chapuis <fabrice636861@gmail.com>
---
doc/src/sgml/system-views.sgml | 14 +++++++++
src/backend/catalog/system_functions.sql | 1 +
src/backend/catalog/system_views.sql | 3 +-
src/backend/replication/logical/launcher.c | 2 +-
src/backend/replication/logical/slotsync.c | 34 ++++++++++++++++++----
src/backend/replication/slot.c | 3 +-
src/backend/replication/slotfuncs.c | 13 ++++++---
src/backend/replication/walsender.c | 4 +--
src/include/catalog/pg_proc.dat | 14 ++++-----
src/include/replication/slot.h | 8 ++++-
10 files changed, 74 insertions(+), 22 deletions(-)
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 4187191ea74..1a7bb7d2a4b 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3036,6 +3036,20 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>allow_overwrite</structfield> <type>bool</type>
+ </para>
+ <para>
+ This parameter controls whether an existing logical replication slot
+ on the standby (with synced=false) can be overwritten during logical
+ replication slot synchronization (see
+ <xref linkend="logicaldecoding-replication-slots-synchronization"/>). The default is
+ <literal>false</literal>. When <literal>true</literal>, an existing user slot with the same name on the
+ standby will be synchronized using the primary’s failover slot.
+ </para></entry>
+ </row>
+
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 2d946d6d9e9..75ea84fd837 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -480,6 +480,7 @@ CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot(
IN temporary boolean DEFAULT false,
IN twophase boolean DEFAULT false,
IN failover boolean DEFAULT false,
+ IN allow_overwrite boolean DEFAULT false,
OUT slot_name name, OUT lsn pg_lsn)
RETURNS RECORD
LANGUAGE INTERNAL
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c77fa0234bb..7fc42ddbe66 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1046,7 +1046,8 @@ CREATE VIEW pg_replication_slots AS
L.conflicting,
L.invalidation_reason,
L.failover,
- L.synced
+ L.synced,
+ L.allow_overwrite
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 218cefe86e2..a26af527775 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -1519,7 +1519,7 @@ CreateConflictDetectionSlot(void)
errmsg("creating replication conflict detection slot"));
ReplicationSlotCreate(CONFLICT_DETECTION_SLOT, false, RS_PERSISTENT, false,
- false, false);
+ false, false, false);
init_conflict_slot_xmin();
}
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d55bdb..a8d44fe6b45 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -649,22 +649,45 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
return false;
}
+retry:
/* Search for the named slot */
if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
{
bool synced;
+ bool allow_overwrite;
SpinLockAcquire(&slot->mutex);
synced = slot->data.synced;
+ allow_overwrite = slot->data.allow_overwrite;
SpinLockRelease(&slot->mutex);
/* User-created slot with the same name exists, raise ERROR. */
if (!synced)
- ereport(ERROR,
- errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("exiting from slot synchronization because same"
- " name slot \"%s\" already exists on the standby",
- remote_slot->name));
+ {
+ /* Check if we need to overwrite an existing logical slot */
+ if (allow_overwrite)
+ {
+ ereport(LOG,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("slot \"%s\" already exists"
+ " on the standby but will be overwritten as"
+ " allow_overwrite is set to true",
+ remote_slot->name));
+
+ /* Get rid of a replication slot that is no longer wanted */
+ ReplicationSlotAcquire(remote_slot->name, true, false);
+ ReplicationSlotDropAcquired();
+ goto retry;
+ }
+ else
+ {
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("exiting from slot synchronization because same"
+ " name slot \"%s\" already exists on the standby",
+ remote_slot->name));
+ }
+ }
/*
* The slot has been synchronized before.
@@ -760,6 +783,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY,
remote_slot->two_phase,
remote_slot->failover,
+ false,
true);
/* For shorter lines. */
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fd0fdb96d42..fcba4d4cff9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -351,7 +351,7 @@ IsSlotForConflictCheck(const char *name)
void
ReplicationSlotCreate(const char *name, bool db_specific,
ReplicationSlotPersistency persistency,
- bool two_phase, bool failover, bool synced)
+ bool two_phase, bool failover, bool allow_overwrite, bool synced)
{
ReplicationSlot *slot = NULL;
int i;
@@ -445,6 +445,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->data.two_phase = two_phase;
slot->data.two_phase_at = InvalidXLogRecPtr;
slot->data.failover = failover;
+ slot->data.allow_overwrite = allow_overwrite;
slot->data.synced = synced;
/* and then data only present in shared memory */
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index b8f21153e7b..6a6a03e9a4d 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -41,7 +41,7 @@ create_physical_replication_slot(char *name, bool immediately_reserve,
/* acquire replication slot, this will check for conflicting names */
ReplicationSlotCreate(name, false,
temporary ? RS_TEMPORARY : RS_PERSISTENT, false,
- false, false);
+ false, false, false);
if (immediately_reserve)
{
@@ -116,7 +116,7 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
static void
create_logical_replication_slot(char *name, char *plugin,
bool temporary, bool two_phase,
- bool failover,
+ bool failover, bool allow_overwrite,
XLogRecPtr restart_lsn,
bool find_startpoint)
{
@@ -134,7 +134,7 @@ create_logical_replication_slot(char *name, char *plugin,
*/
ReplicationSlotCreate(name, true,
temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase,
- failover, false);
+ failover, allow_overwrite, false);
/*
* Create logical decoding context to find start point or, if we don't
@@ -173,6 +173,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
bool temporary = PG_GETARG_BOOL(2);
bool two_phase = PG_GETARG_BOOL(3);
bool failover = PG_GETARG_BOOL(4);
+ bool allow_overwrite = PG_GETARG_BOOL(5);
Datum result;
TupleDesc tupdesc;
HeapTuple tuple;
@@ -191,6 +192,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
temporary,
two_phase,
failover,
+ allow_overwrite,
InvalidXLogRecPtr,
true);
@@ -235,7 +237,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 20
+#define PG_GET_REPLICATION_SLOTS_COLS 21
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
XLogRecPtr currlsn;
int slotno;
@@ -443,6 +445,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(slot_contents.data.synced);
+ values[i++] = BoolGetDatum(slot_contents.data.allow_overwrite);
+
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
@@ -726,6 +730,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
temporary,
false,
false,
+ false,
src_restart_lsn,
false);
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 59822f22b8d..40586b273ba 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1202,7 +1202,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
{
ReplicationSlotCreate(cmd->slotname, false,
cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
- false, false, false);
+ false, false, false, false);
if (reserve_wal)
{
@@ -1233,7 +1233,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
*/
ReplicationSlotCreate(cmd->slotname, true,
cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
- two_phase, failover, false);
+ two_phase, failover, false, false);
/*
* Do options check early so that we can bail before calling the
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 01eba3b5a19..2f980c0dd4f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11503,17 +11503,17 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,bool}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,allow_overwrite}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
proparallel => 'u', prorettype => 'record',
- proargtypes => 'name name bool bool bool',
- proallargtypes => '{name,name,bool,bool,bool,name,pg_lsn}',
- proargmodes => '{i,i,i,i,i,o,o}',
- proargnames => '{slot_name,plugin,temporary,twophase,failover,slot_name,lsn}',
+ proargtypes => 'name name bool bool bool bool',
+ proallargtypes => '{name,name,bool,bool,bool,bool,name,pg_lsn}',
+ proargmodes => '{i,i,i,i,i,i,o,o}',
+ proargnames => '{slot_name,plugin,temporary,twophase,failover,allow_overwrite,slot_name,lsn}',
prosrc => 'pg_create_logical_replication_slot' },
{ oid => '4222',
descr => 'copy a logical replication slot, changing temporality and plugin',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index fe62162cde3..ca50b79daaa 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -141,6 +141,12 @@ typedef struct ReplicationSlotPersistentData
* for logical slots on the primary server.
*/
bool failover;
+
+ /*
+ * Allow Postgres to drop logical replication slot on standby server to ensure
+ * creation of new failover slot when remote sync_replication_slots is set to true.
+ */
+ bool allow_overwrite;
} ReplicationSlotPersistentData;
/*
@@ -301,7 +307,7 @@ extern void ReplicationSlotsShmemInit(void);
/* management of individual slots */
extern void ReplicationSlotCreate(const char *name, bool db_specific,
ReplicationSlotPersistency persistency,
- bool two_phase, bool failover,
+ bool two_phase, bool failover, bool allow_overwrite,
bool synced);
extern void ReplicationSlotPersist(void);
extern void ReplicationSlotDrop(const char *name, bool nowait);
--
2.39.5
On Thu, Oct 9, 2025 at 6:26 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
Hi,
Here is the patch v5; I changed it to use your proposed doc text and the link.
There is a trailing whitespace warning while applying the patch.
Regarding:
+ on the standby (with synced=false) can be overwritten during logical
Can we replace 'with synced=false' with:
with <structfield>synced</structfield> field as <literal>false</literal>
At this stage, can the patch be submitted to the current commitfest for review?
Sure. Please register the patch to CF and continue developing it further.
thanks
Shveta
Hi,
Here is the v6 of the patch that I'll put in the current CF.
Regards,
Fabrice
Attachments:
v6-0001-Add-allow_overwrite-option-to-logical-replication.patch
From 05fb2b5a254c82d49459c018ab2702730bc0d791 Mon Sep 17 00:00:00 2001
From: Fabrice Chapuis <fabrice636861@gmail.com>
Date: Wed, 15 Oct 2025 11:50:44 +0200
Subject: [PATCH v6] Add allow_overwrite option to logical replication slot
creation
This patch adds a new parameter allow_overwrite to
pg_create_logical_replication_slot. When true, an existing
failover slot on the standby will be dropped and replaced
automatically.
---
doc/src/sgml/system-views.sgml | 15 ++++++++++
src/backend/catalog/system_functions.sql | 1 +
src/backend/catalog/system_views.sql | 3 +-
src/backend/replication/logical/launcher.c | 2 +-
src/backend/replication/logical/slotsync.c | 34 ++++++++++++++++++----
src/backend/replication/slot.c | 3 +-
src/backend/replication/slotfuncs.c | 13 ++++++---
src/backend/replication/walsender.c | 4 +--
src/include/catalog/pg_proc.dat | 14 ++++-----
src/include/replication/slot.h | 8 ++++-
10 files changed, 75 insertions(+), 22 deletions(-)
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 7971498..41cbf3f 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3102,6 +3102,21 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>allow_overwrite</structfield> <type>bool</type>
+ </para>
+ <para>
+ This parameter controls whether an existing logical replication slot
+ on the standby (with <structfield>synced</structfield> field as <literal>false</literal>)
+ can be overwritten during logical
+ replication slot synchronization (see
+ <xref linkend="logicaldecoding-replication-slots-synchronization"/>). The default is
+ <literal>false</literal>. When <literal>true</literal>, an existing user slot with the same name on the
+ standby will be synchronized using the primary’s failover slot.
+ </para></entry>
+ </row>
+
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 2d946d6..75ea84f 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -480,6 +480,7 @@ CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot(
IN temporary boolean DEFAULT false,
IN twophase boolean DEFAULT false,
IN failover boolean DEFAULT false,
+ IN allow_overwrite boolean DEFAULT false,
OUT slot_name name, OUT lsn pg_lsn)
RETURNS RECORD
LANGUAGE INTERNAL
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 823776c..9203d08 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1060,7 +1060,8 @@ CREATE VIEW pg_replication_slots AS
L.conflicting,
L.invalidation_reason,
L.failover,
- L.synced
+ L.synced,
+ L.allow_overwrite
FROM pg_get_replication_slots() AS L
LEFT JOIN pg_database D ON (L.datoid = D.oid);
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 218cefe..a26af52 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -1519,7 +1519,7 @@ CreateConflictDetectionSlot(void)
errmsg("creating replication conflict detection slot"));
ReplicationSlotCreate(CONFLICT_DETECTION_SLOT, false, RS_PERSISTENT, false,
- false, false);
+ false, false, false);
init_conflict_slot_xmin();
}
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d5..a8d44fe 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -649,22 +649,45 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
return false;
}
+retry:
/* Search for the named slot */
if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
{
bool synced;
+ bool allow_overwrite;
SpinLockAcquire(&slot->mutex);
synced = slot->data.synced;
+ allow_overwrite = slot->data.allow_overwrite;
SpinLockRelease(&slot->mutex);
/* User-created slot with the same name exists, raise ERROR. */
if (!synced)
- ereport(ERROR,
- errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("exiting from slot synchronization because same"
- " name slot \"%s\" already exists on the standby",
- remote_slot->name));
+ {
+ /* Check if we need to overwrite an existing logical slot */
+ if (allow_overwrite)
+ {
+ ereport(LOG,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("slot \"%s\" already exists"
+ " on the standby but will be overwritten as"
+ " allow_overwrite is set to true",
+ remote_slot->name));
+
+ /* Get rid of a replication slot that is no longer wanted */
+ ReplicationSlotAcquire(remote_slot->name, true, false);
+ ReplicationSlotDropAcquired();
+ goto retry;
+ }
+ else
+ {
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("exiting from slot synchronization because same"
+ " name slot \"%s\" already exists on the standby",
+ remote_slot->name));
+ }
+ }
/*
* The slot has been synchronized before.
@@ -760,6 +783,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY,
remote_slot->two_phase,
remote_slot->failover,
+ false,
true);
/* For shorter lines. */
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index ac188bb..84a1838 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -351,7 +351,7 @@ IsSlotForConflictCheck(const char *name)
void
ReplicationSlotCreate(const char *name, bool db_specific,
ReplicationSlotPersistency persistency,
- bool two_phase, bool failover, bool synced)
+ bool two_phase, bool failover, bool allow_overwrite, bool synced)
{
ReplicationSlot *slot = NULL;
int i;
@@ -445,6 +445,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
slot->data.two_phase = two_phase;
slot->data.two_phase_at = InvalidXLogRecPtr;
slot->data.failover = failover;
+ slot->data.allow_overwrite = allow_overwrite;
slot->data.synced = synced;
/* and then data only present in shared memory */
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index b8f2115..6a6a03e 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -41,7 +41,7 @@ create_physical_replication_slot(char *name, bool immediately_reserve,
/* acquire replication slot, this will check for conflicting names */
ReplicationSlotCreate(name, false,
temporary ? RS_TEMPORARY : RS_PERSISTENT, false,
- false, false);
+ false, false, false);
if (immediately_reserve)
{
@@ -116,7 +116,7 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
static void
create_logical_replication_slot(char *name, char *plugin,
bool temporary, bool two_phase,
- bool failover,
+ bool failover, bool allow_overwrite,
XLogRecPtr restart_lsn,
bool find_startpoint)
{
@@ -134,7 +134,7 @@ create_logical_replication_slot(char *name, char *plugin,
*/
ReplicationSlotCreate(name, true,
temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase,
- failover, false);
+ failover, allow_overwrite, false);
/*
* Create logical decoding context to find start point or, if we don't
@@ -173,6 +173,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
bool temporary = PG_GETARG_BOOL(2);
bool two_phase = PG_GETARG_BOOL(3);
bool failover = PG_GETARG_BOOL(4);
+ bool allow_overwrite = PG_GETARG_BOOL(5);
Datum result;
TupleDesc tupdesc;
HeapTuple tuple;
@@ -191,6 +192,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
temporary,
two_phase,
failover,
+ allow_overwrite,
InvalidXLogRecPtr,
true);
@@ -235,7 +237,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
Datum
pg_get_replication_slots(PG_FUNCTION_ARGS)
{
-#define PG_GET_REPLICATION_SLOTS_COLS 20
+#define PG_GET_REPLICATION_SLOTS_COLS 21
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
XLogRecPtr currlsn;
int slotno;
@@ -443,6 +445,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(slot_contents.data.synced);
+ values[i++] = BoolGetDatum(slot_contents.data.allow_overwrite);
+
Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
@@ -726,6 +730,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
temporary,
false,
false,
+ false,
src_restart_lsn,
false);
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 59822f2..40586b2 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1202,7 +1202,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
{
ReplicationSlotCreate(cmd->slotname, false,
cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
- false, false, false);
+ false, false, false, false);
if (reserve_wal)
{
@@ -1233,7 +1233,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
*/
ReplicationSlotCreate(cmd->slotname, true,
cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
- two_phase, failover, false);
+ two_phase, failover, false, false);
/*
* Do options check early so that we can bail before calling the
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b51d2b1..8ef7619 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11511,17 +11511,17 @@
proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
proretset => 't', provolatile => 's', prorettype => 'record',
proargtypes => '',
- proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced}',
+ proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,bool}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,allow_overwrite}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
proparallel => 'u', prorettype => 'record',
- proargtypes => 'name name bool bool bool',
- proallargtypes => '{name,name,bool,bool,bool,name,pg_lsn}',
- proargmodes => '{i,i,i,i,i,o,o}',
- proargnames => '{slot_name,plugin,temporary,twophase,failover,slot_name,lsn}',
+ proargtypes => 'name name bool bool bool bool',
+ proallargtypes => '{name,name,bool,bool,bool,bool,name,pg_lsn}',
+ proargmodes => '{i,i,i,i,i,i,o,o}',
+ proargnames => '{slot_name,plugin,temporary,twophase,failover,allow_overwrite,slot_name,lsn}',
prosrc => 'pg_create_logical_replication_slot' },
{ oid => '4222',
descr => 'copy a logical replication slot, changing temporality and plugin',
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index fe62162..ca50b79 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -141,6 +141,12 @@ typedef struct ReplicationSlotPersistentData
* for logical slots on the primary server.
*/
bool failover;
+
+ /*
+ * Allow Postgres to drop logical replication slot on standby server to ensure
+ * creation of new failover slot when remote sync_replication_slots is set to true.
+ */
+ bool allow_overwrite;
} ReplicationSlotPersistentData;
/*
@@ -301,7 +307,7 @@ extern void ReplicationSlotsShmemInit(void);
/* management of individual slots */
extern void ReplicationSlotCreate(const char *name, bool db_specific,
ReplicationSlotPersistency persistency,
- bool two_phase, bool failover,
+ bool two_phase, bool failover, bool allow_overwrite,
bool synced);
extern void ReplicationSlotPersist(void);
extern void ReplicationSlotDrop(const char *name, bool nowait);
--
2.47.3
On Wed, Oct 15, 2025 at 3:29 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
Hi,
Here is the v6 of the patch that I'll put in the current CF.
Okay, sounds good.
thanks
Shveta
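(For readers following along: a minimal usage sketch, assuming the v6 patch is applied; the slot name, plugin, and the queries themselves are illustrative, not part of the patch.)
-- Create a failover-enabled logical slot and mark it as overwritable: if this
-- node later becomes a standby, the slot sync worker may drop and recreate it
-- from the new primary instead of erroring out (v6 semantics).
SELECT pg_create_logical_replication_slot(
         'logical_slot', 'pgoutput',
         temporary := false,
         twophase := false,
         failover := true,
         allow_overwrite := true);
-- The new flag is exposed in pg_replication_slots:
SELECT slot_name, failover, synced, allow_overwrite
  FROM pg_replication_slots
 WHERE slot_type = 'logical';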
Hi all,
I don't think we really need to have allow_overwrite.
It is not possible to create logical slots on standby with failover=true,
therefore we can safely rely on failover being true to understand that at
some point this node was a primary and that this slot is supposed to be
synced.
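(As a quick illustration of that restriction, a sketch with made-up names; the exact error wording may vary by version:)
-- On a server in recovery (a standby), this is rejected, because user-created
-- failover slots are not allowed there; only the slot sync worker creates them.
SELECT pg_create_logical_replication_slot('foo', 'pgoutput', failover := true);
-- ERROR (approximate wording): cannot enable failover for a replication slot
-- created on the standby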
Please see the patch attached.
Regards,
--
Alexander Kukushkin
Attachments:
0001-Continue-slots-synchronization-after-switchover.patch
From 469324c263c5ee09ae0f6d75761e2607e08a9e67 Mon Sep 17 00:00:00 2001
From: Alexander Kukushkin <cyberdemn@gmail.com>
Date: Tue, 28 Oct 2025 13:32:43 +0100
Subject: [PATCH] Continue slots synchronization after switchover
When the former primary is started up after clean shutdown as a standby
it was refusing to synchronize logical failover slots because synced is
false with the error: "exiting from slot synchronization because same
name slot "<name>" already exists on the standby".
Since we don't allow creation of logical replication slots with
failover=true on standby it is safe to check that replication slot
has failover=true and in this case set synced=true and continue.
Besides that, we change filter in drop_local_obsolete_slots() function,
because 'failover' better defines purpose of the slot.
---
src/backend/replication/logical/slotsync.c | 31 ++++++++++++++++------
1 file changed, 23 insertions(+), 8 deletions(-)
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index b122d99b009..55d6f2fb9e9 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -455,10 +455,11 @@ drop_local_obsolete_slots(List *remote_slot_list)
* slot by the user. This new user-created slot may end up using
* the same shared memory as that of 'local_slot'. Thus check if
* local_slot is still the synced one before performing actual
- * drop.
+ * drop. Yes, we actually check 'failover', not 'synced', because
+ * it could have been created on primary which is now a standby.
*/
SpinLockAcquire(&local_slot->mutex);
- synced_slot = local_slot->in_use && local_slot->data.synced;
+ synced_slot = local_slot->in_use && local_slot->data.failover;
SpinLockRelease(&local_slot->mutex);
if (synced_slot)
@@ -653,18 +654,32 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
{
bool synced;
+ bool failover;
SpinLockAcquire(&slot->mutex);
synced = slot->data.synced;
+ failover = slot->data.failover;
SpinLockRelease(&slot->mutex);
- /* User-created slot with the same name exists, raise ERROR. */
if (!synced)
- ereport(ERROR,
- errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("exiting from slot synchronization because same"
- " name slot \"%s\" already exists on the standby",
- remote_slot->name));
+ {
+ /* User-created slot with the same name exists, raise ERROR. */
+ if (!failover)
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("exiting from slot synchronization because same"
+ " name slot \"%s\" already exists on the standby",
+ remote_slot->name));
+
+ /*
+ * At some point we were a primary, and it was expected to have
+ * synced = false and failover = true. In this case we want to set
+ * synced = true and continue synchronization.
+ */
+ SpinLockAcquire(&slot->mutex);
+ slot->data.synced = true;
+ SpinLockRelease(&slot->mutex);
+ }
/*
* The slot has been synchronized before.
--
2.34.1
On Tue, Oct 28, 2025 at 6:33 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
Hi all,
I don't think we really need to have allow_overwrite.
It is not possible to create logical slots on standby with failover=true, therefore we can safely rely on failover being true to understand that at some point this node was a primary and that this slot is supposed to be synced.
Please see the patch attached.
We had discussed this point in another thread, please see [1]. After
discussion it was decided to not go this way.
[1]: /messages/by-id/OS0PR01MB57161FF469DE049765DD53A89475A@OS0PR01MB5716.jpnprd01.prod.outlook.com
thanks
Shveta
Hi,
I indeed proposed a solution at the top of this thread to modify only the
value of the synced attribute, but the discussion was redirected to adding
an extra parameter to the function pg_create_logical_replication_slot() to
overwrite a failover slot.
Regards,
Fabrice
Hi,
On Fri, 31 Oct 2025 at 09:16, Fabrice Chapuis <fabrice636861@gmail.com>
wrote:
Hi,
I indeed proposed a solution at the top of this thread to modify only the
value of the synced attribute, but the discussion was redirected to adding
an extra parameter to the function pg_create_logical_replication_slot() to
overwrite a failover slot.
We had discussed this point in another thread, please see [1]. After
discussion it was decided to not go this way.
[1]: /messages/by-id/OS0PR01MB57161FF469DE049765DD53A89475A@OS0PR01MB5716.jpnprd01.prod.outlook.com
I’ve read through the referenced discussion, and my impression is that we
might be trying to design a solution around assumptions that are unlikely
to hold in practice.
There was an argument that at some point we might allow creating logical
failover slots on cascading standbys. However, if we consider all practical
scenarios, it seems very unlikely that such a feature could work reliably
with the current design.
Let me try to explain why.
Consider the following setup:
node1 - primary
node2 - standby, replicating from node1
node3 - standby, replicating from node1, has logical slot foo
(failover=true, synced=false)
node4 - standby, replicating from node3, has logical slot foo
(failover=true, synced=true)
1) If node1 fails, we could promote either node2 or node3:
1.a) If we promote node2, we must first create a physical slot for node3,
update primary_conninfo on node3 to point to node2, wait until node3
connects, and until catalog_xmin on the physical slot becomes non-NULL.
Only then would it be safe to promote node2. This introduces unnecessary
steps, complexity, and waiting — increasing downtime, which defeats the
goal of high availability.
1.b) If we promote node3, promotion itself is fast, but subscribers will
still be using the slot on the original primary. This again defeats the
purpose of doing logical replication from a standby, and it won’t be
possible to switch subscribers to node4 (see below).
2) If node3 fails, we might want to replace it with node4. But node4 has a
slot with failover=true and synced=true, and synced=true prevents it from
being used for streaming because it’s a standby.
In other words, with the current design, allowing creation of logical
failover slots on standbys doesn’t bring any real benefit — such “synced”
slots can’t actually be used later.
One could argue that we could add a function to switch synced=true->false
on a standby, but that would just add another workaround on top of an
already fragile design, increasing operational complexity without solving
the underlying issue.
The same applies to proposals like allow_overwrite. If such a flag is
introduced, in practice it will almost always be used unconditionally, e.g.:
SELECT pg_create_logical_replication_slot('<name>', '<plugin>', failover :=
true, allow_overwrite := true);
Right now, logical failover slots can’t be reused after a switchover, which
is a perfectly normal operation.
The only current workaround is to detect standbys with failover=true,
synced=false and drop those slots, hoping they’ll be resynchronized. But
resynchronization is asynchronous, unpredictable, and may take an unbounded
amount of time. If the primary fails during that window, there might be no
standby with ready logical slots.
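(For concreteness, that workaround amounts to something like the following on the standby; this is a sketch only, and deciding when it is safe to run is exactly the operational problem described above:)
-- List leftover failover slots that the slot sync worker refuses to adopt
-- after a switchover.
SELECT slot_name, failover, synced, inactive_since
  FROM pg_replication_slots
 WHERE slot_type = 'logical' AND failover AND NOT synced;
-- Drop them and hope they get re-created by slot synchronization.
SELECT pg_drop_replication_slot(slot_name)
  FROM pg_replication_slots
 WHERE slot_type = 'logical' AND failover AND NOT synced;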
Instead of dropping such slots, what we actually need is a way to safely
set synced=false->true and continue operating.
Operating logical replication setups is already extremely complex and
error-prone — this is not theoretical, it’s something many of us face daily.
So rather than adding more speculative features or workarounds, I think we
should focus on addressing real operational pain points and the
inconsistencies in the current design.
A slot created on the primary (which later becomes a standby) with
failover=true has a very clear purpose. The failover flag already indicates
that purpose; synced shouldn’t override it.
Regards,
--
Alexander Kukushkin
Thanks for your extensive analysis and explanation, Alexander. If we go in the
direction you propose and leave aside the option of a supplementary
allow_overwrite flag, may I propose some modifications to the v0 patch you
have attached:
Why modify this function?
drop_local_obsolete_slots(List *remote_slot_list)
List *local_slots = get_local_synced_slots(); => since the failover slot
we check has synced = false, it won't enter the loop and the slot won't be
dropped.
If we want to resynchronize the slot properly, why not drop it and
recreate it cleanly instead of setting the synced flag to true even though the
slot is not actually synchronized? Here is the code I wrote in patch
version 6, with the check on the failover flag.
retry:
/* Search for the named slot */
if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
{
bool synced;
bool failover;
SpinLockAcquire(&slot->mutex);
synced = slot->data.synced;
failover = slot->data.failover;
SpinLockRelease(&slot->mutex);
if (!synced)
{
/* User-created slot with the same name exists, raise ERROR. */
if (!failover)
ereport(ERROR,
errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("exiting from slot synchronization because same"
" name slot \"%s\" already exists on the standby",
remote_slot->name));
/*
* At some point we were a primary, and it was expected to have
* synced = false and failover = true. To resynchronize the slot we could
* drop it and replay the code for the slot to be recreated cleanly.
*/
ereport(LOG,
errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("slot \"%s\" already exists"
" on the standby but will be dropped and recreated to be resynchronized",
remote_slot->name));
/* Get rid of a replication slot that is no longer wanted */
ReplicationSlotAcquire(remote_slot->name, true, false);
ReplicationSlotDropAcquired();
goto retry;
}
Thanks for your appreciation and feedback.
Regards,
Fabrice
Hi Fabrice,
On Fri, 31 Oct 2025 at 17:45, Fabrice Chapuis <fabrice636861@gmail.com>
wrote:
Thanks for your extensive analysis and explanation, Alexander. If we go in the
direction you propose and leave aside the option of a supplementary
allow_overwrite flag, may I propose some modifications to the v0 patch you
have attached:
Why modify this function?
drop_local_obsolete_slots(List *remote_slot_list)
Think about the following case:
1. We have node1 (primary) and node2 (standby), with slot foo(failover=true)
2. We stop node1 for maintenance and promote node2
3. DROP replication slot on node2
4. Start node1
In this scenario the logical slot "foo" will be left unattended and it
needs to be dropped, because it doesn't exist on the primary.
As I said earlier, IMO it is only the "failover" flag that effectively
indicates the purpose of the slot. The sync flag is purely informative.
Regards,
--
Alexander Kukushkin
Hi Alexander,
I understand your use case, but I think it won't work like this because of
how get_local_synced_slots() is implemented:
this function returns only slots whose synced flag is true.
At point 4 (start node1), the synced flag is false and the failover flag is
true, so the slot won't enter the loop and will not be dropped.
static void
drop_local_obsolete_slots(List *remote_slot_list)
{
List *local_slots = get_local_synced_slots();
foreach_ptr(ReplicationSlot, local_slot, local_slots)
{
/* Drop the local slot if it is not required to be retained. */
if (!local_sync_slot_required(local_slot, remote_slot_list)) => checks whether
the remote slot exists and whether the local slot is not invalidated
{
bool synced_slot;
...
We could change the function get_local_synced_slots to make it work
get_local_synced_slots(void)
{
List *local_slots = NIL;
LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
for (int i = 0; i < max_replication_slots; i++)
{
ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
/* Check if it is a synchronized slot */
if (s->in_use && (s->data.failover || s->data.synced))
{
Assert(SlotIsLogical(s));
local_slots = lappend(local_slots, s);
Thanks for your feedback
Regards,
Fabrice
....
On Tue, 4 Nov 2025 at 12:15, Fabrice Chapuis <fabrice636861@gmail.com>
wrote:
Hi Alexander,
I understand your use case but I think it won't work like this because of
how is implemented get_local_synced_slots()
this function will return slots which synced flag to true only.
Yeah. Sorry, my bad, you are absolutely right
about get_local_synced_slots() function.
I didn't validate this part of the patch, because first I wanted to get
feedback from hackers.
However, it looks like the people who build it never run Postgres in
production and therefore don't understand the problem and the pain.
I will probably just add yet another workaround to Patroni and start
dropping logical replication slots on standby with failover=true and
synced=false.
--
Regards,
--
Alexander Kukushkin
I propose to modify
get_local_synced_slots() => to return the local logical slots (the function
should then be renamed)
and
local_sync_slot_required() => to also take into account the slots that have
the flag synced = false.
Then, in both our use cases, the local slot will be dropped and recreated if
it must be resynchronized.
Changes to the synchronize_one_slot() function are no longer necessary.
static List *
get_local_synced_slots(void)
{
List *local_slots = NIL;
LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
for (int i = 0; i < max_replication_slots; i++)
{
ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
/* Check if it is a synchronized slot */
if (s->in_use && (s->data.synced || s->data.failover))
{
Assert(SlotIsLogical(s));
local_slots = lappend(local_slots, s);
}
}
LWLockRelease(ReplicationSlotControlLock);
return local_slots;
}
static bool
local_sync_slot_required(ReplicationSlot *local_slot, List *remote_slots)
{
bool remote_exists = false;
bool locally_invalidated = false;
bool synced_slot = false;
foreach_ptr(RemoteSlot, remote_slot, remote_slots)
{
if (strcmp(remote_slot->name, NameStr(local_slot->data.name)) == 0)
{
remote_exists = true;
/*
* If remote slot is not invalidated but local slot is marked as
* invalidated, then set locally_invalidated flag.
*/
SpinLockAcquire(&local_slot->mutex);
locally_invalidated =
(remote_slot->invalidated == RS_INVAL_NONE) &&
(local_slot->data.invalidated != RS_INVAL_NONE);
synced_slot = local_slot->data.synced;
SpinLockRelease(&local_slot->mutex);
break;
}
}
return (remote_exists && !locally_invalidated && synced_slot);
}
I could propose a new patch along these lines.
Regards,
Fabrice
Hi,
Here is the new version of the patch
Regards,
Fabrice
Attachments:
v1-0001-Resync-logical-replication-slot-after-switchover.patch
From 07beefc31a07cec4fcc81687be49c563fa359f84 Mon Sep 17 00:00:00 2001
From: Fabrice Chapuis <fabrice636861@gmail.com>
Date: Fri, 7 Nov 2025 18:04:15 +0100
Subject: [PATCH v1] Resync logical replication slot after switchover
When the former primary is started up after clean shutdown,
it was refusing to synchronize the logical failover slots with the new primary
because synced flag is false.
The following error message is produced: "exiting from slot synchronization because same
slot name already exists on the standby".
If the local slot has failover=true, it can be overwritten to be synced with
the primary.
This patch mainly changes the drop_local_obsolete_slots() function and
checks whether a local failover slot can be dropped.
---
src/backend/replication/logical/slotsync.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d5..16e20d4 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -341,7 +341,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
}
/*
- * Get the list of local logical slots that are synchronized from the
+ * Get the list of local logical slots that are synchronized or not with the
* primary server.
*/
static List *
@@ -356,7 +356,7 @@ get_local_synced_slots(void)
ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
/* Check if it is a synchronized slot */
- if (s->in_use && s->data.synced)
+ if (s->in_use && (s->data.synced || s->data.failover))
{
Assert(SlotIsLogical(s));
local_slots = lappend(local_slots, s);
@@ -380,6 +380,7 @@ local_sync_slot_required(ReplicationSlot *local_slot, List *remote_slots)
{
bool remote_exists = false;
bool locally_invalidated = false;
+ bool synced_slot = false;
foreach_ptr(RemoteSlot, remote_slot, remote_slots)
{
@@ -395,13 +396,14 @@ local_sync_slot_required(ReplicationSlot *local_slot, List *remote_slots)
locally_invalidated =
(remote_slot->invalidated == RS_INVAL_NONE) &&
(local_slot->data.invalidated != RS_INVAL_NONE);
+ synced_slot = local_slot->data.synced;
SpinLockRelease(&local_slot->mutex);
break;
}
}
- return (remote_exists && !locally_invalidated);
+ return (remote_exists && !locally_invalidated && synced_slot);
}
/*
@@ -455,10 +457,12 @@ drop_local_obsolete_slots(List *remote_slot_list)
* slot by the user. This new user-created slot may end up using
* the same shared memory as that of 'local_slot'. Thus check if
* local_slot is still the synced one before performing actual
- * drop.
+ * drop. Yes, we actually check 'failover', not 'synced', because
+ * it could have been created on primary which is now a standby.
*/
+
SpinLockAcquire(&local_slot->mutex);
- synced_slot = local_slot->in_use && local_slot->data.synced;
+ synced_slot = local_slot->in_use && (local_slot->data.failover || local_slot->data.synced);
SpinLockRelease(&local_slot->mutex);
if (synced_slot)
--
2.47.3
On Fri, Oct 31, 2025 at 2:58 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
Instead of dropping such slots, what we actually need is a way to safely set synced=false->true and continue operating.
Operating logical replication setups is already extremely complex and error-prone — this is not theoretical, it’s something many of us face daily.
So rather than adding more speculative features or workarounds, I think we should focus on addressing real operational pain points and the inconsistencies in the current design.
A slot created on the primary (which later becomes a standby) with failover=true has a very clear purpose. The failover flag already indicates that purpose; synced shouldn't override it.
I think this is not as clear as you are saying, compared to WAL. In
failover cases, we bump the WAL timeline on the new primary and also have
facilities like pg_rewind to ensure that the old primary can follow the
new primary after divergence. For slots, there is no such facility.
Now, there is an argument that for slots it is sufficient to match
the name and failover to say that it is okay to overwrite the slot on
the old primary. However, it is not clear whether it is always safe to do
so; for example, if the old primary ran after divergence for some time
and one has re-created the slot with the same name and failover property,
it will no longer be the same slot. Unlike WAL, we don't maintain the
slot's history, so it is not equally clear that we can overwrite the old
primary's slots as they are.
--
With Regards,
Amit Kapila.
Hi Amit,
If I summarize your scenario:
1. A standby S has a failover slot slot1 synchronized with slot1 on primary P
2. We promote S
3. On P we drop slot1 and create slot1 again with failover mode (for example,
a subscriber exists on another instance)
4. A rewind is performed on P, the former primary, to rejoin S, the former
standby
5. On P, slot1 is automatically dropped and recreated to be synchronized
In which context could this kind of scenario happen?
Isn't the goal to find a solution for a switchover carried out for
maintenance on a Postgres cluster? The aim is to find a compromise that covers
the most likely scenarios.
Do you think we should come back to the allow_overwrite flag approach, or is
there another solution?
Best Regards,
Fabrice
On Tue, Nov 11, 2025 at 9:27 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
If I summarize your scenario:
1. A standby S has a failover slot slot1 synchronized with slot1 on primary P
2. We promote S
3. On P we drop slot1 and create slot1 again with failover mode (for example, a subscriber exists on another instance)
4. A rewind is performed on P, the former primary, to rejoin S, the former standby
5. On P, slot1 is automatically dropped and recreated to be synchronized
In which context could this kind of scenario happen?
It is difficult to tell when this can happen but you detailed there is
a theoretical possibility of the same. If we had an in-core cluster
tool that manages nodes on its own which doesn't allow such scenarios
to happen then we could possibly say that using such a tool it is safe
to overwrite old primary's slots.
Isn't the goal to find a solution for a switchover which is carried out for maintenance on a Postgres cluster, the aim is to find a compromise to cover the most likely scenarios.
Yes, that is why I thought of providing some form of UI that can allow
outside cluster solutions to manage such slots.
Do you think we must come back to the allow_overwrite flag approach or another solution?
We can wait for a few days to see what others think on this matter.
--
With Regards,
Amit Kapila.
Hi Amit,
On Wed, 12 Nov 2025 at 05:22, Amit Kapila <amit.kapila16@gmail.com> wrote:
It is difficult to tell when this can happen but you detailed there is
a theoretical possibility of the same. If we had an in-core cluster
tool that manages nodes on its own which doesn't allow such scenarios
to happen then we could possibly say that using such a tool it is safe
to overwrite old primary's slots.
That's a lot of ifs, and none of them could be fulfilled in the foreseeable
future.
The situation you describe is impossible.
When there is a split-brain and someone drops and re-creates logical slots
with the same names on the old primary, such a node can't be joined as a
standby without pg_rewind.
In its current state pg_rewind wipes the pg_replslot directory, and
therefore there will be no replication slots.
That is, if there is a logical replication slot with failover=true and
synced=false on a healthy standby, it could have happened only because the
old primary was shut down gracefully.
Regards,
--
Alexander Kukushkin
On Wed, Nov 12, 2025 at 12:58 PM Alexander Kukushkin
<cyberdemn@gmail.com> wrote:
On Wed, 12 Nov 2025 at 05:22, Amit Kapila <amit.kapila16@gmail.com> wrote:
It is difficult to tell when this can happen but you detailed there is
a theoretical possibility of the same. If we had an in-core cluster
tool that manages nodes on its own which doesn't allow such scenarios
to happen then we could possibly say that using such a tool it is safe
to overwrite old primary's slots.That's a lot of ifs, and none of them could be fulfilled in the foreseeable future.
Situation you describe is impossible.
When there is a split-brain and someone drops and re-creating logical slots with the same names on the old primary - such node can't be joined as a standby without pg_rewind.
In its current state pg_rewind wipes the pg_replslot directory, and therefore there will be no replication slots.
Say the only operations that happened are slot-drop-recreate and/or
some operations on unlogged tables. Why then is pg_rewind required?
--
With Regards,
Amit Kapila.
Hi Amit,
On Thu, 13 Nov 2025 at 09:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
Say, the only operations that happened are slot-drop-recreate and or
some operations on unlogged tables. Why then pg_rewind is required?
A clean shutdown always writes a checkpoint record to WAL and updates
pg_control.
Regards,
--
Alexander Kukushkin
On Thu, Nov 13, 2025 at 2:37 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
Hi Amit,
On Thu, 13 Nov 2025 at 09:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
Say, the only operations that happened are slot-drop-recreate and or
some operations on unlogged tables. Why then pg_rewind is required?
Clean shut down always writes checkpoint record to WAL and updates pg_control.
But the system can die/crash before shutdown.
--
With Regards,
Amit Kapila.
But the system can die/crash before shutdown.
You mean it will not write WAL?
When a logical replication slot is created we build a snapshot and also
write to WAL:
postgres=# select pg_current_wal_insert_lsn(); select
pg_create_logical_replication_slot('foo', 'pgoutput'); select
pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn
---------------------------
0/37F96F8
(1 row)
pg_create_logical_replication_slot
------------------------------------
(foo,0/37F9730)
(1 row)
pg_current_wal_insert_lsn
---------------------------
0/37F9730
(1 row)
Only after that slot is marked as persistent.
Regards,
--
Alexander Kukushkin
On Thu, Nov 13, 2025 at 6:39 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
But the system can die/crash before shutdown.
You mean it will not write WAL?
When a logical replication slot is created we build a snapshot and also write to WAL:
postgres=# select pg_current_wal_insert_lsn(); select pg_create_logical_replication_slot('foo', 'pgoutput'); select pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn
---------------------------
0/37F96F8
(1 row)
pg_create_logical_replication_slot
------------------------------------
(foo,0/37F9730)
(1 row)
pg_current_wal_insert_lsn
---------------------------
0/37F9730
(1 row)
Only after that slot is marked as persistent.
There can be a scenario where a replication slot is dropped and
recreated, and its WAL is also replicated to the standby. However,
before the new slot state can be synchronized via slotsync, the
primary crashes and the standby is promoted. Later, the user manually
reconfigures the old primary to follow the newly promoted standby (no
pg_rewind in play). I was wondering whether, in such a case, it would
be a good idea to overwrite the newly created slot on the old primary with
the promoted standby's synced slot (the old one) by default? Thoughts?
thanks
Shveta
On Thu, Nov 13, 2025 at 7:16 PM shveta malik <shveta.malik@gmail.com> wrote:
There can be a scenario where a replication slot is dropped and
recreated, and its WAL is also replicated to the standby. However,
before the new slot state can be synchronized via slotsync, the
primary crashes and the standby is promoted. Later, the user manually
reconfigures the old primary to follow the newly promoted standby (no
pg-rewind in play). I was wondering whether in such a case, would it
be a good idea to overwrite the newly created slot on old primary with
promoted-standby's synced slot (old one) by default? Thoughts?
I think it's an extremely rare or a mostly wrong operation that after a
failover (i.e., the old primary didn't shut down gracefully) users have
the old primary rejoin the replication as the new standby without
pg_rewind. I guess that pg_rewind should practically be used unless
the primary server shuts down gracefully (i.e., in the switchover case). In
failover cases, pg_rewind launches the server in single-user mode to
run the crash recovery, advancing its LSN and cleaning all existing
replication slots after rewinding the server. So I think that the
reported issue doesn't happen in failover cases and we can focus on
failover cases.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi Masahiko,
On Fri, 14 Nov 2025 at 07:10, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
So I think that the
reported issue doesn't happen in failover cases and we can focus on
failover cases.
I think you wanted to say "focus on *switchover* cases"?
Regards,
--
Alexander Kukushkin
On Fri, Nov 14, 2025 at 12:42 AM Alexander Kukushkin
<cyberdemn@gmail.com> wrote:
Hi Masahiko,
On Fri, 14 Nov 2025 at 07:10, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
So I think that the
reported issue doesn't happen in failover cases and we can focus on
failover cases.
I think you wanted to say "focus on *switchover* cases"?
Oops, yes, "we can focus on switchover cases" is what I wanted to say.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Fri, Nov 14, 2025 at 11:40 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I think it's an extremely rare or a mostly wrong operation that after
failover (i.e., the old primary didn't shutdown gracefully) users have
the old primar rejoin to the replication as the new standby without
pg_rewind. I guess that pg_rewind should practically be used unless
the primary server gracefully shutdowns (i.e., in switchover case). In
failover cases, pg_rewind launches the server in single-user mode to
run the crash recovery, advancing its LSN and cleaning all existing
replication slots after rewinding the server. So I think that the
reported issue doesn't happen in failover cases and we can focus on
failover cases.
The point is quite fundamental, do you think we can sync to a
pre-existing slot with the same name and failover marked as true after
the first time the node joins a new primary? We don't provide any
switchover tools/utilities, so it doesn't appear straightforward that
we can perform re-sync. If we had a switchover tool, I think
one may have removed all existing slots before the old primary joins
the new primary because otherwise, there is always a chance that there
remain redundant slots which will prevent resource removal. Consider a
case where after switchover, the old primary decides to join a
different standby (new primary) than where slot-sync was earlier
happening. Now, it is possible that the old primary may have some
slots which should be removed.
--
With Regards,
Amit Kapila.
On Fri, Nov 14, 2025 at 2:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
The point is quite fundamental, do you think we can sync to a
pre-existing slot with the same name and failover marked as true after
the first time the node joins a new primary?
Given the current behavior that we cannot create a logical slot with
failover=true on the standby, it makes sense to me that we overwrite
the pre-existing slot (with synced=false and failover=true) on the old
primary by the slot (with synced=true and failover=true) on the new
primary if their names, plugins, and other properties match and the
pre-existing slot has older LSNs and XIDs than the one on the new
primary. But at the same time, we need to consider the possible future
changes that allow users to create a slot with failover=true also on
the standby.
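To be concrete, the kind of check I have in mind would look roughly like
the sketch below (untested; slot_overwrite_is_safe() is a hypothetical
helper that does not exist in slotsync.c, and the field names are assumed
from the existing RemoteSlot and ReplicationSlot structures there):

    /*
     * Hypothetical helper: decide whether a pre-existing local slot
     * (failover=true, synced=false) may be overwritten by the remote
     * failover slot during synchronization.
     */
    static bool
    slot_overwrite_is_safe(RemoteSlot *remote_slot, Oid remote_dbid,
                           ReplicationSlot *slot)
    {
        /* The properties must match. */
        if (remote_dbid != slot->data.database ||
            remote_slot->two_phase != slot->data.two_phase ||
            !remote_slot->failover || !slot->data.failover ||
            strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) != 0)
            return false;

        /* The local slot must not be ahead of the slot on the new primary. */
        if (slot->data.restart_lsn > remote_slot->restart_lsn ||
            slot->data.confirmed_flush > remote_slot->confirmed_lsn)
            return false;

        if (TransactionIdIsValid(slot->data.catalog_xmin) &&
            TransactionIdFollows(slot->data.catalog_xmin,
                                 remote_slot->catalog_xmin))
            return false;

        return true;
    }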
Alexander pointed out[1] that allowing to create a slot with
failover=true on the standby won't work with the current
implementation. I agree with his analysis, and I guess we would need
more changes than simply allowing it, regardless of accepting the
proposed change. We might need to introduce a replication slot origin
or a generation.
We don't provide any
switchover tools/utilities, so it doesn't appear straightforward that
we can perform re-sync. If we had a switchover tool, I think
one may have removed all existing slots before the old primary joins
the new primary because otherwise, there is always a chance that there
remain redundant slots which will prevent resource removal. Consider a
case where after switchover, the old primary decides to join a
different standby (new primary) than where slot-sync was earlier
happening. Now, it is possible that the old primary may have some
slots which should be removed.
Right, there are cases where users need to remove existing slots
before allowing the old primary to join the new primary. While
removing all slots would be the simplest solution that works in all
cases, IIUC the proposal here is to reuse pre-existing slots in
scenarios involving a switchover. The idea itself seems reasonable and
would help reduce the time spent in degraded operation mode (other
benefits too?).
Regards,
[1]: /messages/by-id/CAFh8B=kH4Nwd_69fWP4VxK9tjxxBUzdxEZLd+LCby9tbSMTcRA@mail.gmail.com
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi Masahiko,
On Fri, 14 Nov 2025 at 23:32, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Given the current behavior that we cannot create a logical slot with
failover=true on the standby, it makes sense to me that we overwrite
the pre-existing slot (with synced=false and failover=true) on the old
primary by the slot (with synced=true and failover=true) on the new
primary if their names, plugins, and other properties match and the
pre-existing slot has older LSNs and XIDs than the one on the new
primary.
On one side, the idea of having additional checks looks reasonable, but if I
look at the existing update_local_synced_slot() function, I find the following:
    if (remote_dbid != slot->data.database ||
        remote_slot->two_phase != slot->data.two_phase ||
        remote_slot->failover != slot->data.failover ||
        strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) != 0 ||
        remote_slot->two_phase_at != slot->data.two_phase_at)
    {
        NameData    plugin_name;

        /* Avoid expensive operations while holding a spinlock. */
        namestrcpy(&plugin_name, remote_slot->plugin);

        SpinLockAcquire(&slot->mutex);
        slot->data.plugin = plugin_name;
        slot->data.database = remote_dbid;
        slot->data.two_phase = remote_slot->two_phase;
        slot->data.two_phase_at = remote_slot->two_phase_at;
        slot->data.failover = remote_slot->failover;
        SpinLockRelease(&slot->mutex);
That is, if some synced slot properties on the standby don't match the
primary's, we simply overwrite them.
I guess this is necessary because synchronization happens only
periodically, and between two runs a slot on the primary might have been
recreated with different properties.
Do we really need to have additional checks to flip a synced flag?
Regards,
--
Alexander Kukushkin
On Sat, Nov 15, 2025 at 4:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Alexander pointed out[1] that allowing to create a slot with
failover=true on the standby won't work with the current
implementation. I agree with his analysis, and I guess we would need
more changes than simply allowing it, regardless of accepting the
proposed change. We might need to introduce a replication slot origin
or a generation.
AFAICS, the email you pointed out wrote about use cases, not the
actual code implementation. We can discuss use cases if we want to
pursue that implementation, but the reason why we decided not to allow
it was for the cases where users try to configure cascaded standbys to
also try to sync slots from the first standby that are already being
synced from the primary. There are quite a few technical challenges in
supporting that, like how to make sure primary waits even for cascaded
standbys before sending the changes to logical subscribers.
OTOH, for the cases where there is a totally different logical slot on
standby (not present on primary) with failover=true, we can allow it
to be synced from standby-1 to a cascaded standby, though we need some
way to distinguish those cases. For example, during sync on cascaded
standby, we can ensure that the slot being synced is not a sync-slot
(failover=true and sync=true).
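As a rough sketch, something along these lines (assumption: the slot-sync
query would also have to report whether the upstream slot is itself a synced
slot; RemoteSlot has no such field today, so remote_synced below is
illustrative only):

    /*
     * On a cascaded standby, only slots created directly on the upstream
     * standby (failover=true, synced=false there) would be candidates for
     * synchronization; slots the upstream itself synced from the primary
     * would be skipped.
     */
    static bool
    slot_is_syncable_from_standby(RemoteSlot *remote_slot, bool remote_synced)
    {
        return remote_slot->failover && !remote_synced;
    }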
--
With Regards,
Amit Kapila.
On Tue, Nov 18, 2025 at 1:30 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
AFAICS, the email you pointed out wrote about use cases, not the
actual code implementation. We can discuss use cases if we want to
pursue that implementation, but the reason why we decided not to allow
it was for the cases where users try to configure cascaded standbys to
also try to sync slots from the first standby that are already being
synced from the primary. There are quite a few technical challenges in
supporting that, like how to make sure primary waits even for cascaded
standbys before sending the changes to logical subscribers.
Right. My point is that these are two independent issues. The fact
that creating a slot with failover=true directly on a standby is
difficult (due to the cascaded-standby cases you mentioned) does not,
by itself, justify allowing us to overwrite an existing slot with
failover=true and synced=false during slot synchronization.
OTOH, for the cases where there is a totally different logical slot on
standby (not present on primary) with failover=true, we can allow it
to be synced from standby-1 to a cascaded standby, though we need some
way to distinguish those cases. For example, during sync on cascaded
standby, we can ensure that the slot being synced is not a sync-slot
(failover=true and sync=true).
Yes. We need some way to distinguish those slots, otherwise if users
create a slot with the same name on the primary, the slot on standby-1
(a cascading standby) could be overwritten. I think we would need some
additional metadata per slot to support that safely.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Wed, Nov 19, 2025 at 6:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Yes. We need some way to distinguish those slots, otherwise if users
create a slot with the same name on the primary, the slot on standby-1
(a cascading standby) could be overwritten. I think we would need some
additional metadata per slot to support that safely.
It shouldn't be overwritten as of current HEAD because we give ERROR
if such a slot exists on standby but if we allow overwriting it as
being discussed here then we would require additional metadata per
slot to distinguish this case.
--
With Regards,
Amit Kapila.
The use of the *synced* flag could perhaps be a first step towards
distinguishing between a synchronization slot on a standby and a failover
slot created on a standby.
On the primary, *synced* should always be false.
On a standby, *synced* actually remains true even if the
sync_replication_slots parameter is set to 'off'.
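For example, the flags can be checked on each node with:

    select slot_name, failover, synced, inactive_since
    from pg_replication_slots
    where slot_type = 'logical';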
Regards,
Fabrice
On Tue, Nov 18, 2025 at 10:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
It shouldn't be overwritten as of current HEAD because we give ERROR
if such a slot exists on standby but if we allow overwriting it as
being discussed here then we would require additional metadata per
slot to distinguish this case.
I think we need to clarify the following: suppose the standby has a slot with
failover=true and synced=false, and the primary has a slot with the
same name, failover=true, and synced=true; what problem could occur if
we allow the slot on the standby to be overwritten by the one on the
primary? I see there are some cases where the old primary needs to
delete pre-existing slots before joining the new primary, but should
we make such an operation mandatory in switchover cases? It might be
worth considering whether such additional metadata per slot is
required before we allow the slot to be overwritten.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
I think we need to clarify the following: suppose the standby has a slot with
failover=true and synced=false, and the primary has a slot with the
same name, failover=true, and synced=true...
I'm not sure I understand the semantics of the `synced` flag, but why can
the `synced` flag be true on a primary instance? AFAICS, if `synced=true` it
means that the slot is inactive and synchronized with a slot on a remote
instance. On a primary, what is the meaning of having the synced flag set to
true?
There's already an open thread dealing with this issue [1].
The problem I see is being able to distinguish between 2 situations:
1) A failover slot has been created on a standby (failover=true and
synced=false) in a context of cascading standby. In this case the slot must
not be deleted.
2) A former primary has a slot (failover=true and synced=false) that must
be resynchronized and that can be overwritten.
Why not use a slot's metadata (allow_overwrite) to treat these two
situations separately?
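In synchronize_one_slot() terms, the distinction could roughly look like
this (only a sketch of the branching; allow_overwrite is the hypothetical
per-slot attribute proposed above and does not exist in
ReplicationSlotPersistentData today):

    if (!synced)
    {
        if (slot->data.failover && slot->data.allow_overwrite)
        {
            /*
             * Situation 2: leftover failover slot on a former primary.
             * Take the slot over and resynchronize it from the new
             * primary, flipping its synced flag.
             */
        }
        else
        {
            /*
             * Situation 1: failover slot created locally by a user (e.g.
             * for a cascading standby).  Keep refusing to synchronize
             * over it, as today.
             */
        }
    }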
Regards,
Fabrice
[1]: Clear logical slot's 'synced' flag on promotion of standby
</messages/by-id/CAJpy0uBJqTQdq+UDW55oK2bhxMig=FaEYFV=+ZaKmtmgNWwsHw@mail.gmail.com>
On Thu, Nov 20, 2025 at 6:26 AM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
I think we need to clarify the following: suppose the standby has a slot with
failover=true and synced=false, and the primary has a slot with the
same name, failover=true, and synced=true...
I'm not sure I understand the semantics of the `synced` flag, but why can the `synced` flag be true on a primary instance? AFAICS, if `synced=true` it means that the slot is inactive and synchronized with a slot on a remote instance. On a primary, what is the meaning of having the synced flag set to true?
I think that the synced flag can be true on the primary if the slot was
previously synced and the instance is now working as the primary. But
the synced flag being true doesn't mean anything on the primary. It
works only on the standby.
There's already an open thread dealing with this issue [1].
The problem I see is being able to distinguish between 2 situations:
1) A failover slot has been created on a standby (failover=true and synced=false) in a context of cascading standby. In this case the slot must not be deleted.
2) A former primary has a slot (failover=true and synced=false) that must be resynchronized and that can be overwritten.
Right.
Why not use a slot's metadata (allow_overwrite) to treat these two situations separately?
I'm not sure that the allow_overwrite idea is the best approach. For
example, suppose that in a cascading replication setup (node1 ->
node2 -> node3) we create a failover slot on node2 (failover=true,
synced=false, and allow_overwrite=false), and the slot is synchronized to
node3 (failover=true, synced=true, allow_overwrite=false). If we
do a switchover between node2 and node3, node3 joins the primary,
node1, and node2 now joins node3 as a cascaded standby (i.e., the
replication setup is now node1 -> node3 -> node2). I guess that in
this case the slot on node2 should be overwritten by the one on
node3, but it's not allowed because the slot on node2 has
allow_overwrite=false.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Fri, Nov 21, 2025 at 2:05 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I'm not sure that the allow_overwrite idea is the best approach. For
example, suppose that in a cascading replication setup (node1 ->
node2 -> node3) we create a failover slot on node2 (failover=true,
synced=false, and allow_overwrite=false), and the slot is synchronized to
node3 (failover=true, synced=true, allow_overwrite=false). If we
do a switchover between node2 and node3, node3 joins the primary,
node1, and node2 now joins node3 as a cascaded standby (i.e., the
replication setup is now node1 -> node3 -> node2). I guess that in
this case the slot on node2 should be overwritten by the one on
node3, but it's not allowed because the slot on node2 has
allow_overwrite=false.
IIRC, we discussed the pg_alter_replication_slot() API which will
allow us to alter this new property (allow_overwrite or whatever we
name it) of slots and some existing properties like two_phase, etc.
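For example (purely hypothetical, since no such SQL-callable function
exists in core today), an operator could then run something like

    select pg_alter_replication_slot('logical_slot', allow_overwrite => true);

on the old primary before it rejoins the new primary.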
--
With Regards,
Amit Kapila.
Hi Masahiko,
I agree with your analysis. But why not take into account the parameter
synchronized_standby_slots (it guarantees that the listed standby has
received the WAL before decoded changes are sent to subscribers)?
I take your setup:
node1 => node2 => node3
node1 is the primary; node2 and node3 are standbys.
slot1 is a failover slot created on node2.
node2: synchronized_standby_slots=node3, primary_conninfo=node1, slot1: failover=true
node3: synchronized_standby_slots=node2, primary_conninfo=node2, slot1: failover=true
Case 1) Switchover between node2 and node3
node1 => node3 => node2
node2: synchronized_standby_slots=node3, primary_conninfo=node3, slot1: failover=true
=> the slot could be overwritten: node2 now connects to node3, which is the
same node that is listed in synchronized_standby_slots.
Case 2) Restart of node2
node1 => node2 => node3
node2: synchronized_standby_slots=node3, primary_conninfo=node1, slot1: failover=true
=> the slot could not be overwritten because primary_conninfo does not match
synchronized_standby_slots.
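In postgresql.conf terms, node2 would roughly look like this (host names are
illustrative; synchronized_standby_slots actually takes physical slot names,
so 'node3' stands for the slot used by node3, and the overwrite decision
itself is the rule proposed above, not existing behavior):

    # Case 1: node2 now follows node3, the same node listed in
    # synchronized_standby_slots => slot1 could be overwritten
    primary_conninfo = 'host=node3 ...'
    synchronized_standby_slots = 'node3'

    # Case 2: node2 still follows node1 while synchronized_standby_slots
    # names node3 => slot1 could not be overwritten
    primary_conninfo = 'host=node1 ...'
    synchronized_standby_slots = 'node3'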
Regards,
Fabrice