Improve pg_sync_replication_slots() to wait for primary to advance

Started by Ajin Cherian, 7 months ago, 145 messages
#1 Ajin Cherian
itsajin@gmail.com
1 attachment(s)

Hello,

Creating this thread for a POC based on discussions in thread [1].
Hou-san had created this patch, and I just cleaned up some documents,
did some testing and now sharing the patch here.

In this patch, the pg_sync_replication_slots() API now waits
indefinitely for the remote slot to catch up. We could later add a
timeout parameter to control maximum wait time if this approach seems
acceptable. If there are more ideas on improving this patch, let me
know.
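
To sketch the timeout idea (purely illustrative, not part of the attached
patch; "timeout_ms" and "wait_with_timeout" are hypothetical names), the
existing GetCurrentTimestamp()/TimestampDifferenceExceeds() helpers would
be enough to cap the latch loop:

static bool
wait_with_timeout(WalReceiverConn *wrconn, RemoteSlot *remote_slot,
				  int timeout_ms)
{
	TimestampTz start = GetCurrentTimestamp();

	for (;;)
	{
		/* ... query the primary; return true once it has caught up ... */

		/* Hypothetical: give up once timeout_ms has elapsed. */
		if (timeout_ms > 0 &&
			TimestampDifferenceExceeds(start, GetCurrentTimestamp(),
									   timeout_ms))
			return false;		/* caller drops the temporary slot */

		(void) WaitLatch(MyLatch,
						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
						 2000L,
						 WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
		ResetLatch(MyLatch);
	}
}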

regards,
Ajin Cherian
[1]: /messages/by-id/CAF1DzPWTcg+m+x+oVVB=y4q9=PYYsL_mujVp7uJr-_oUtWNGbA@mail.gmail.com

Attachments:

0001-Fix-stale-snapshot-issue.patch (application/octet-stream)
From 4be607984333427111af1bb289d7762ba4082537 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Fri, 20 Jun 2025 03:37:40 -0400
Subject: [PATCH] Fix stale snapshot issue

Make sure logical replication does not use snapshots that are built prior to becoming
consistent to avoid stale snapshots causing catalog lookup errors.
---
 src/backend/replication/logical/snapbuild.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index adf18c3..642c986 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -677,6 +677,19 @@ SnapBuildProcessChange(SnapBuild *builder, TransactionId xid, XLogRecPtr lsn)
 									 builder->snapshot);
 	}
 
+	/*
+	 * Forget snapshots built before the SNAPBUILD_CONSISTENT state,
+	 * so that we rebuild them when we become consistent.
+	 */
+	if (builder->state == SNAPBUILD_FULL_SNAPSHOT)
+	{
+		if (builder->snapshot)
+		{
+			SnapBuildSnapDecRefcount(builder->snapshot);
+			builder->snapshot = NULL;
+		}
+	}
+
 	return true;
 }
 
-- 
1.8.3.1

#2 Ajin Cherian
itsajin@gmail.com
In reply to: Ajin Cherian (#1)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

Sorry, I attached the wrong file. Attaching the correct file now.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

0001-Improve-initial-slot-synchronization-in-pg_sync_repl.patch (application/octet-stream)
From fc1f503e8cd6c68b17e71a09bd9c19aea9d85ba0 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Tue, 24 Jun 2025 06:23:22 -0400
Subject: [PATCH] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retain it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately when synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. We deliberately avoid retaining the temporary
slot as the slotsync worker does, because we could not predict when (or if) the
SQL function might be executed again, and the creating session might persist
after promotion. Without automatic cleanup, this could lead to temporary slots
being retained for a longer time.
---
 doc/src/sgml/logicaldecoding.sgml               |  19 ---
 src/backend/replication/logical/slotsync.c      | 189 ++++++++++++++++++++++--
 src/backend/utils/activity/wait_event_names.txt |   1 +
 3 files changed, 180 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 5c5957e..a8c18f9 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -398,25 +398,6 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
-    <note>
-     <para>
-      While enabling <link linkend="guc-sync-replication-slots">
-      <varname>sync_replication_slots</varname></link> allows for automatic
-      periodic synchronization of failover slots, they can also be manually
-      synchronized using the <link linkend="pg-sync-replication-slots">
-      <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot may have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
-     </para>
-    </note>
-
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index f1dcbeb..c2a8e81 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -146,6 +146,7 @@ typedef struct RemoteSlot
 	ReplicationSlotInvalidationCause invalidated;
 } RemoteSlot;
 
+static void ProcessSlotSyncInterrupts(WalReceiverConn *wrconn);
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
 
@@ -550,6 +551,160 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
 }
 
 /*
+ * Wait for remote slot to pass locally reserved position.
+ *
+ * Return true if remote_slot could catch up with the locally reserved
+ * position. Return false in all other cases.
+ */
+static bool
+wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
+{
+#define SLOT_QUERY_COLUMN_COUNT 4
+
+	StringInfoData cmd;
+
+	Assert(!AmLogicalSlotSyncWorkerProcess());
+
+	ereport(LOG,
+			errmsg("waiting for remote slot \"%s\" LSN (%X/%X) and catalog xmin"
+				   " (%u) to pass local slot LSN (%X/%X) and catalog xmin (%u)",
+				   remote_slot->name,
+				   LSN_FORMAT_ARGS(remote_slot->restart_lsn),
+				   remote_slot->catalog_xmin,
+				   LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn),
+				   MyReplicationSlot->data.catalog_xmin));
+
+	initStringInfo(&cmd);
+	appendStringInfo(&cmd,
+					 "SELECT invalidation_reason IS NOT NULL, restart_lsn,"
+					 " confirmed_flush_lsn, catalog_xmin"
+					 " FROM pg_catalog.pg_replication_slots"
+					 " WHERE slot_name = %s",
+					 quote_literal_cstr(remote_slot->name));
+
+	for (;;)
+	{
+		bool		new_invalidated;
+		XLogRecPtr	new_restart_lsn;
+		XLogRecPtr	new_confirmed_lsn;
+		TransactionId new_catalog_xmin;
+		WalRcvExecResult *res;
+		TupleTableSlot *tupslot;
+		Datum		d;
+		int			rc;
+		int			col = 0;
+		bool		isnull;
+		Oid			slotRow[SLOT_QUERY_COLUMN_COUNT] = {BOOLOID, LSNOID, LSNOID, XIDOID};
+
+		/* Handle any termination request if any */
+		ProcessSlotSyncInterrupts(wrconn);
+
+		res = walrcv_exec(wrconn, cmd.data, SLOT_QUERY_COLUMN_COUNT, slotRow);
+
+		if (res->status != WALRCV_OK_TUPLES)
+			ereport(ERROR,
+					errmsg("could not fetch slot \"%s\" info from the"
+						   " primary server: %s",
+						   remote_slot->name, res->err));
+
+		tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
+		if (!tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
+		{
+			ereport(WARNING,
+					errmsg("aborting initial sync for slot \"%s\"",
+						   remote_slot->name),
+					errdetail("This slot was not found on the primary server."));
+
+			pfree(cmd.data);
+			walrcv_clear_result(res);
+
+			return false;
+		}
+
+		/*
+		 * It is possible to get null value for restart_lsn if the slot is
+		 * invalidated on the primary server, so handle accordingly.
+		 */
+		new_invalidated = DatumGetBool(slot_getattr(tupslot, ++col, &isnull));
+		Assert(!isnull);
+
+		d = slot_getattr(tupslot, ++col, &isnull);
+		new_restart_lsn = isnull ? InvalidXLogRecPtr : DatumGetLSN(d);
+
+		if (new_invalidated || XLogRecPtrIsInvalid(new_restart_lsn))
+		{
+			/*
+			 * The slot won't be persisted by the caller; it will be cleaned up
+			 * at the end of synchronization.
+			 */
+			ereport(WARNING,
+					errmsg("aborting initial sync for slot \"%s\"",
+						   remote_slot->name),
+					errdetail("This slot was invalidated on the primary server."));
+
+			pfree(cmd.data);
+			ExecClearTuple(tupslot);
+			walrcv_clear_result(res);
+
+			return false;
+		}
+
+		/*
+		 * It is possible to get null values for confirmed_lsn and
+		 * catalog_xmin if on the primary server the slot is just created with
+		 * a valid restart_lsn and slot-sync worker has fetched the slot
+		 * before the primary server could set valid confirmed_lsn and
+		 * catalog_xmin.
+		 */
+		d = slot_getattr(tupslot, ++col, &isnull);
+		new_confirmed_lsn = isnull ? InvalidXLogRecPtr : DatumGetLSN(d);
+
+		d = slot_getattr(tupslot, ++col, &isnull);
+		new_catalog_xmin = isnull ? InvalidTransactionId : DatumGetTransactionId(d);
+
+		ExecClearTuple(tupslot);
+		walrcv_clear_result(res);
+
+		if (new_restart_lsn >= MyReplicationSlot->data.restart_lsn &&
+			!XLogRecPtrIsInvalid(new_confirmed_lsn) &&
+			TransactionIdFollowsOrEquals(new_catalog_xmin,
+										 MyReplicationSlot->data.catalog_xmin))
+		{
+			/* Update new values in remote_slot */
+			remote_slot->restart_lsn = new_restart_lsn;
+			remote_slot->confirmed_lsn = new_confirmed_lsn;
+			remote_slot->catalog_xmin = new_catalog_xmin;
+
+			ereport(LOG,
+					errmsg("wait over for remote slot \"%s\" as its LSN (%X/%X)"
+						   " and catalog xmin (%u) has now passed local slot LSN"
+						   " (%X/%X) and catalog xmin (%u)",
+						   remote_slot->name,
+						   LSN_FORMAT_ARGS(new_restart_lsn),
+						   new_catalog_xmin,
+						   LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn),
+						   MyReplicationSlot->data.catalog_xmin));
+
+			pfree(cmd.data);
+
+			return true;
+		}
+
+		/*
+		 * XXX: Is waiting for 2 seconds before retrying enough or more or
+		 * less?
+		 */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   2000L,
+					   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
+
+		if (rc & WL_LATCH_SET)
+			ResetLatch(MyLatch);
+	}
+}
+
+/*
  * If the remote restart_lsn and catalog_xmin have caught up with the
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
@@ -558,7 +713,8 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(WalReceiverConn *wrconn,
+									 RemoteSlot *remote_slot, Oid remote_dbid)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -577,12 +733,22 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * For the slotsync worker, we do not drop the slot because the
+		 * restart_lsn can be ahead of the current location when recreating the
+		 * slot in the next cycle. It may take more time to create such a slot.
+		 * Therefore, we keep this slot and attempt the synchronization in the
+		 * next cycle.
+		 *
+		 * For SQL API synchronization, we wait for the remote slot to catch up
+		 * rather than leaving temporary slots. This is because we could not
+		 * predict when (or if) the SQL function might be executed again, and
+		 * the creating session might persist after promotion. Without
+		 * automatic cleanup, this could lead to temporary slots being retained
+		 * for a longer time.
 		 */
-		return false;
+		if (AmLogicalSlotSyncWorkerProcess() ||
+			!wait_for_primary_slot_catchup(wrconn, remote_slot))
+			return false;
 	}
 
 	/*
@@ -622,7 +788,8 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot,
+					 Oid remote_dbid)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +882,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/* Slot not ready yet, let's attempt to make it sync-ready now. */
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
-			slot_updated = update_and_persist_local_synced_slot(remote_slot,
+			slot_updated = update_and_persist_local_synced_slot(wrconn,
+																remote_slot,
 																remote_dbid);
 		}
 
@@ -785,7 +953,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(wrconn, remote_slot, remote_dbid);
 
 		slot_updated = true;
 	}
@@ -927,7 +1095,8 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(wrconn, remote_slot,
+												  remote_dbid);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 4da6831..ba82cc1 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -64,6 +64,7 @@ LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication paralle
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
 REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
1.8.3.1

#3 shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#1)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Jun 24, 2025 at 4:11 PM Ajin Cherian <itsajin@gmail.com> wrote:

Hello,

Creating this thread for a POC based on discussions in thread [1].
Hou-san had created this patch, and I just cleaned up some documents,
did some testing and now sharing the patch here.

In this patch, the pg_sync_replication_slots() API now waits
indefinitely for the remote slot to catch up. We could later add a
timeout parameter to control maximum wait time if this approach seems
acceptable. If there are more ideas on improving this patch, let me
know.

+1 on the idea.
I believe the timeout option may not be necessary here, since the API
can be manually canceled if needed. Otherwise, the recommended
approach is to let it complete. But I would like to know what others
think here.
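
For context on why cancel is enough: each iteration of the proposed wait
loop first calls ProcessSlotSyncInterrupts(), whose CHECK_FOR_INTERRUPTS()
turns a pending cancel into an ERROR; for the SQL API that error should
unwind through slotsync_failure_callback() and drop the temporary slot.
The loop shape, roughly:

	for (;;)
	{
		ProcessSlotSyncInterrupts(wrconn);	/* cancel is honoured here */

		/* ... fetch remote slot state; return true once caught up ... */

		(void) WaitLatch(MyLatch,
						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
						 2000L,
						 WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
		ResetLatch(MyLatch);
	}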

Few comments:

1)
When the API is waiting for the primary to advance, standby fails to
handle promotion requests. Promotion fails:
./pg_ctl -D ../../standbydb/ promote -w
waiting for server to promote.................stopped waiting
pg_ctl: server did not promote in time

See the logs at [1].

2)
Also when the API is waiting for a long time, it just dumps the
'waiting for remote_slot..' LOG only once. Do you think it makes sense
to log it at a regular interval until the wait is over? See logs at
[1]. It dumped the log once in 3 minutes.

3)
+ /*
+ * It is possible to get null value for restart_lsn if the slot is
+ * invalidated on the primary server, so handle accordingly.
+ */
+ if (new_invalidated || XLogRecPtrIsInvalid(new_restart_lsn))
+ {
+ /*
+ * The slot won't be persisted by the caller; it will be cleaned up
+ * at the end of synchronization.
+ */
+ ereport(WARNING,
+ errmsg("aborting initial sync for slot \"%s\"",
+    remote_slot->name),
+ errdetail("This slot was invalidated on the primary server."));

Which case are we referring to here where null restart_lsn would mean
invalidation? Can you please point me to such code where it happens or
a test-case which does that. I tried a few invalidation cases, but did
not hit it.

[1]:
Log file:
2025-07-02 14:38:09.851 IST [153187] LOG: waiting for remote slot
"failover_slot" LSN (0/3003F60) and catalog xmin (754) to pass local
slot LSN (0/3003F60) and catalog xmin (767)
2025-07-02 14:38:09.851 IST [153187] STATEMENT: SELECT
pg_sync_replication_slots();
2025-07-02 14:41:36.200 IST [153164] LOG: received promote request

thanks
Shveta

#4 shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#3)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

Please find a few more comments:

1)
In pg_sync_replication_slots() doc, we have this:

"Note that this function is primarily intended for testing and
debugging purposes and should be used with caution. Additionally, this
function cannot be executed if ...."

We can get rid of this info as well and change to:

"Note that this function cannot be executed if...."

2)
We got rid of NOTE in logicaldecoding.sgml, but now the page does not
mention pg_sync_replication_slots() at all. We need to bring back the
change removed by [1] (or something on similar line) which is this:

-     <command>CREATE SUBSCRIPTION</command> during slot creation, and
then calling
-     <link linkend="pg-sync-replication-slots">
-     <function>pg_sync_replication_slots</function></link>
-     on the standby. By setting <link linkend="guc-sync-replication-slots">
+     <command>CREATE SUBSCRIPTION</command> during slot creation.
+     Additionally, enabling <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby
+     is required. By enabling <link linkend="guc-sync-replication-slots">
3)
wait_for_primary_slot_catchup():
+ /*
+ * It is possible to get null values for confirmed_lsn and
+ * catalog_xmin if on the primary server the slot is just created with
+ * a valid restart_lsn and slot-sync worker has fetched the slot
+ * before the primary server could set valid confirmed_lsn and
+ * catalog_xmin.
+ */

Do we need this special handling? We already have one such handling in
synchronize_slots(). please see:
/*
* If restart_lsn, confirmed_lsn or catalog_xmin is
invalid but the
* slot is valid, that means we have fetched the
remote_slot in its
* RS_EPHEMERAL state. In such a case, don't sync it;
we can always
* sync it in the next sync cycle when the remote_slot
is persisted
* and has valid lsn(s) and xmin values.
*/
if ((XLogRecPtrIsInvalid(remote_slot->restart_lsn) ||
XLogRecPtrIsInvalid(remote_slot->confirmed_lsn) ||
!TransactionIdIsValid(remote_slot->catalog_xmin)) &&
remote_slot->invalidated == RS_INVAL_NONE)
pfree(remote_slot);

Due to the above check in synchronize_slots(), we will not reach
wait_for_primary_slot_catchup() when any of confirmed_lsn or
catalog_xmin is not initialized.

[1]: /messages/by-id/CAJpy0uAD_La2vi+B+iSBbCYTMayMstvbF9ndrAJysL9t5fHtbQ@mail.gmail.com

thanks
Shveta

#5 Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#4)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Jul 2, 2025 at 7:56 PM shveta malik <shveta.malik@gmail.com> wrote:

Few comments:

1)
When the API is waiting for the primary to advance, standby fails to
handle promotion requests. Promotion fails:
./pg_ctl -D ../../standbydb/ promote -w
waiting for server to promote.................stopped waiting
pg_ctl: server did not promote in time

See the logs at [1]

I've modified this to handle promotion requests and stop slot
synchronization if the standby is promoted.

2)
Also when the API is waiting for a long time, it just dumps the
'waiting for remote_slot..' LOG only once. Do you think it makes sense
to log it at a regular interval until the wait is over? See logs at
[1]. It dumped the log once in 3 minutes.

I've modified it to log once every 10 seconds.

3)
+ /*
+ * It is possible to get null value for restart_lsn if the slot is
+ * invalidated on the primary server, so handle accordingly.
+ */
+ if (new_invalidated || XLogRecPtrIsInvalid(new_restart_lsn))
+ {
+ /*
+ * The slot won't be persisted by the caller; it will be cleaned up
+ * at the end of synchronization.
+ */
+ ereport(WARNING,
+ errmsg("aborting initial sync for slot \"%s\"",
+    remote_slot->name),
+ errdetail("This slot was invalidated on the primary server."));

Which case are we referring to here where null restart_lsn would mean
invalidation? Can you please point me to such code where it happens or
a test-case which does that. I tried a few invalidation cases, but did
not hit it.

I've removed all this code, as the checks for null restart_lsn and
other parameters are handled in earlier functions and we won't get
here if they were null. I've added asserts to check that they are not
null.

On Wed, Jul 9, 2025 at 2:53 PM shveta malik <shveta.malik@gmail.com> wrote:

Please find a few more comments:

1)
In pg_sync_replication_slots() doc, we have this:

"Note that this function is primarily intended for testing and
debugging purposes and should be used with caution. Additionally, this
function cannot be executed if ...."

We can get rid of this info as well and change to:

"Note that this function cannot be executed if...."

Modified as requested.

2)
We got rid of NOTE in logicaldecoding.sgml, but now the page does not
mention pg_sync_replication_slots() at all. We need to bring back the
change removed by [1] (or something on similar line) which is this:

-     <command>CREATE SUBSCRIPTION</command> during slot creation, and
then calling
-     <link linkend="pg-sync-replication-slots">
-     <function>pg_sync_replication_slots</function></link>
-     on the standby. By setting <link linkend="guc-sync-replication-slots">
+     <command>CREATE SUBSCRIPTION</command> during slot creation.
+     Additionally, enabling <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby
+     is required. By enabling <link linkend="guc-sync-replication-slots">

I've added that back.

3)
wait_for_primary_slot_catchup():
+ /*
+ * It is possible to get null values for confirmed_lsn and
+ * catalog_xmin if on the primary server the slot is just created with
+ * a valid restart_lsn and slot-sync worker has fetched the slot
+ * before the primary server could set valid confirmed_lsn and
+ * catalog_xmin.
+ */

Do we need this special handling? We already have one such handling in
synchronize_slots(). please see:
/*
* If restart_lsn, confirmed_lsn or catalog_xmin is
invalid but the
* slot is valid, that means we have fetched the
remote_slot in its
* RS_EPHEMERAL state. In such a case, don't sync it;
we can always
* sync it in the next sync cycle when the remote_slot
is persisted
* and has valid lsn(s) and xmin values.
*/
if ((XLogRecPtrIsInvalid(remote_slot->restart_lsn) ||
XLogRecPtrIsInvalid(remote_slot->confirmed_lsn) ||
!TransactionIdIsValid(remote_slot->catalog_xmin)) &&
remote_slot->invalidated == RS_INVAL_NONE)
pfree(remote_slot);

Due to the above check in synchronize_slots(), we will not reach
wait_for_primary_slot_catchup() when any of confirmed_lsn or
catalog_xmin is not initialized.

Yes, you are correct. I've removed all that logic.

The modified patch (v2) is attached.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v2-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patch (application/octet-stream)
From c8fb374d71ef9e19edfb3515051428904ba8fd86 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Wed, 16 Jul 2025 05:19:14 -0400
Subject: [PATCH v2] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func.sgml                     |   4 +-
 doc/src/sgml/logicaldecoding.sgml          |   5 +-
 src/backend/replication/logical/slotsync.c | 113 +++++++++++++++--------------
 3 files changed, 65 insertions(+), 57 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a6d7976..5e5e6a9 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29980,9 +29980,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionaly,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index a8c18f9..2e4d2fa 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -370,7 +370,10 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation.
+     <command>CREATE SUBSCRIPTION</command> during slot creation, and then
+     calling <link linkend="pg-sync-replication-slots">
+     <function>pg_sync_replication_slots</function></link>
+     on the standby.
      Additionally, enabling <link linkend="guc-sync-replication-slots">
      <varname>sync_replication_slots</varname></link> on the standby
      is required. By enabling <link linkend="guc-sync-replication-slots">
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index c2a8e81..ca71727 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -559,9 +559,10 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
 static bool
 wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
 {
-#define SLOT_QUERY_COLUMN_COUNT 4
+#define SLOT_QUERY_COLUMN_COUNT 3
 
 	StringInfoData cmd;
+	int			   wait_iterations = 0;
 
 	Assert(!AmLogicalSlotSyncWorkerProcess());
 
@@ -576,7 +577,7 @@ wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
 
 	initStringInfo(&cmd);
 	appendStringInfo(&cmd,
-					 "SELECT invalidation_reason IS NOT NULL, restart_lsn,"
+					 "SELECT restart_lsn,"
 					 " confirmed_flush_lsn, catalog_xmin"
 					 " FROM pg_catalog.pg_replication_slots"
 					 " WHERE slot_name = %s",
@@ -584,7 +585,6 @@ wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
 
 	for (;;)
 	{
-		bool		new_invalidated;
 		XLogRecPtr	new_restart_lsn;
 		XLogRecPtr	new_confirmed_lsn;
 		TransactionId new_catalog_xmin;
@@ -594,7 +594,7 @@ wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
 		int			rc;
 		int			col = 0;
 		bool		isnull;
-		Oid			slotRow[SLOT_QUERY_COLUMN_COUNT] = {BOOLOID, LSNOID, LSNOID, XIDOID};
+		Oid			slotRow[SLOT_QUERY_COLUMN_COUNT] = {LSNOID, LSNOID, XIDOID};
 
 		/* Handle any termination request if any */
 		ProcessSlotSyncInterrupts(wrconn);
@@ -621,52 +621,23 @@ wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
 			return false;
 		}
 
-		/*
-		 * It is possible to get null value for restart_lsn if the slot is
-		 * invalidated on the primary server, so handle accordingly.
-		 */
-		new_invalidated = DatumGetBool(slot_getattr(tupslot, ++col, &isnull));
-		Assert(!isnull);
-
+		/* Any slot with NULL in these fields should not have made it this far */
 		d = slot_getattr(tupslot, ++col, &isnull);
-		new_restart_lsn = isnull ? InvalidXLogRecPtr : DatumGetLSN(d);
-
-		if (new_invalidated || XLogRecPtrIsInvalid(new_restart_lsn))
-		{
-			/*
-			 * The slot won't be persisted by the caller; it will be cleaned up
-			 * at the end of synchronization.
-			 */
-			ereport(WARNING,
-					errmsg("aborting initial sync for slot \"%s\"",
-						   remote_slot->name),
-					errdetail("This slot was invalidated on the primary server."));
-
-			pfree(cmd.data);
-			ExecClearTuple(tupslot);
-			walrcv_clear_result(res);
-
-			return false;
-		}
+		Assert(!isnull);
+		new_restart_lsn = DatumGetLSN(d);
 
-		/*
-		 * It is possible to get null values for confirmed_lsn and
-		 * catalog_xmin if on the primary server the slot is just created with
-		 * a valid restart_lsn and slot-sync worker has fetched the slot
-		 * before the primary server could set valid confirmed_lsn and
-		 * catalog_xmin.
-		 */
 		d = slot_getattr(tupslot, ++col, &isnull);
-		new_confirmed_lsn = isnull ? InvalidXLogRecPtr : DatumGetLSN(d);
+		Assert(!isnull);
+		new_confirmed_lsn = DatumGetLSN(d);
 
 		d = slot_getattr(tupslot, ++col, &isnull);
-		new_catalog_xmin = isnull ? InvalidTransactionId : DatumGetTransactionId(d);
+		Assert(!isnull);
+		new_catalog_xmin = DatumGetTransactionId(d);
 
 		ExecClearTuple(tupslot);
 		walrcv_clear_result(res);
 
 		if (new_restart_lsn >= MyReplicationSlot->data.restart_lsn &&
-			!XLogRecPtrIsInvalid(new_confirmed_lsn) &&
 			TransactionIdFollowsOrEquals(new_catalog_xmin,
 										 MyReplicationSlot->data.catalog_xmin))
 		{
@@ -691,6 +662,22 @@ wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
 		}
 
 		/*
+		 * If in SQL API synchronization, and we've been  promoted, then no point
+		 * continuing.
+		 */
+		if (!AmLogicalSlotSyncWorkerProcess() && PromoteIsTriggered())
+		{
+			ereport(WARNING,
+					errmsg("aborting sync for slot \"%s\"",
+							remote_slot->name),
+					errdetail("Promotion occurred before this slot was fully"
+							  " synchronized."));
+			pfree(cmd.data);
+
+			return false;
+		}
+
+		/*
 		 * XXX: Is waiting for 2 seconds before retrying enough or more or
 		 * less?
 		 */
@@ -701,6 +688,20 @@ wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
 
 		if (rc & WL_LATCH_SET)
 			ResetLatch(MyLatch);
+
+		/* log a message every ten seconds */
+		wait_iterations++;
+		if (wait_iterations % 5 == 0)
+		{
+			ereport(LOG,
+					errmsg("continuing to wait for remote slot \"%s\" LSN (%X/%X) and catalog xmin"
+						   " (%u) to pass local slot LSN (%X/%X) and catalog xmin (%u)",
+						   remote_slot->name,
+						   LSN_FORMAT_ARGS(remote_slot->restart_lsn),
+						   remote_slot->catalog_xmin,
+						   LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn),
+						   MyReplicationSlot->data.catalog_xmin));
+		}
 	}
 }
 
@@ -733,22 +734,28 @@ update_and_persist_local_synced_slot(WalReceiverConn *wrconn,
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * For the slotsync worker, we do not drop the slot because the
-		 * restart_lsn can be ahead of the current location when recreating the
-		 * slot in the next cycle. It may take more time to create such a slot.
-		 * Therefore, we keep this slot and attempt the synchronization in the
-		 * next cycle.
-		 *
+		 * If we're in the slotsync worker, we retain the slot and retry in the
+		 * next cycle. The restart_lsn might advance by then, allowing the slot
+		 * to be created successfully later.
+		 */
+		if (AmLogicalSlotSyncWorkerProcess())
+			return false;
+
+		/*
 		 * For SQL API synchronization, we wait for the remote slot to catch up
-		 * rather than leaving temporary slots. This is because we could not
-		 * predict when (or if) the SQL function might be executed again, and
-		 * the creating session might persist after promotion. Without
-		 * automatic cleanup, this could lead to temporary slots being retained
-		 * for a longer time.
+		 * here, since we can't assume the SQL API will be called again soon.
+		 * We will retry the sync once the slot catches up.
+		 *
+		 * Note: This will return false if a promotion is triggered on the
+		 * standby while waiting, in which case we stop syncing and drop the
+		 * temporary slot.
 		 */
-		if (AmLogicalSlotSyncWorkerProcess() ||
-			!wait_for_primary_slot_catchup(wrconn, remote_slot))
+		if (!wait_for_primary_slot_catchup(wrconn, remote_slot))
 			return false;
+		else
+			update_local_synced_slot(remote_slot, remote_dbid,
+									 &found_consistent_snapshot,
+									 &remote_slot_precedes);
 	}
 
 	/*
-- 
1.8.3.1

#6 shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#5)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Jul 16, 2025 at 3:00 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Jul 2, 2025 at 7:56 PM shveta malik <shveta.malik@gmail.com> wrote:

Few comments:

1)
When the API is waiting for the primary to advance, standby fails to
handle promotion requests. Promotion fails:
./pg_ctl -D ../../standbydb/ promote -w
waiting for server to promote.................stopped waiting
pg_ctl: server did not promote in time

See the logs at [1]

I've modified this to handle promotion requests and stop slot
synchronization if the standby is promoted.

2)
Also when the API is waiting for a long time, it just dumps the
'waiting for remote_slot..' LOG only once. Do you think it makes sense
to log it at a regular interval until the wait is over? See logs at
[1]. It dumped the log once in 3 minutes.

I've modified it to log once every 10 seconds.

3)
+ /*
+ * It is possible to get null value for restart_lsn if the slot is
+ * invalidated on the primary server, so handle accordingly.
+ */
+ if (new_invalidated || XLogRecPtrIsInvalid(new_restart_lsn))
+ {
+ /*
+ * The slot won't be persisted by the caller; it will be cleaned up
+ * at the end of synchronization.
+ */
+ ereport(WARNING,
+ errmsg("aborting initial sync for slot \"%s\"",
+    remote_slot->name),
+ errdetail("This slot was invalidated on the primary server."));

Which case are we referring to here where null restart_lsn would mean
invalidation? Can you please point me to such code where it happens or
a test-case which does that. I tried a few invalidation cases, but did
not hit it.

I've removed all this code, as the checks for null restart_lsn and
other parameters are handled in earlier functions and we won't get
here if they were null. I've added asserts to check that they are not
null.

On Wed, Jul 9, 2025 at 2:53 PM shveta malik <shveta.malik@gmail.com> wrote:

Please find a few more comments:

1)
In pg_sync_replication_slots() doc, we have this:

"Note that this function is primarily intended for testing and
debugging purposes and should be used with caution. Additionally, this
function cannot be executed if ...."

We can get rid of this info as well and change to:

"Note that this function cannot be executed if...."

Modified as requested.

2)
We got rid of NOTE in logicaldecoding.sgml, but now the page does not
mention pg_sync_replication_slots() at all. We need to bring back the
change removed by [1] (or something on similar line) which is this:

-     <command>CREATE SUBSCRIPTION</command> during slot creation, and
then calling
-     <link linkend="pg-sync-replication-slots">
-     <function>pg_sync_replication_slots</function></link>
-     on the standby. By setting <link linkend="guc-sync-replication-slots">
+     <command>CREATE SUBSCRIPTION</command> during slot creation.
+     Additionally, enabling <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby
+     is required. By enabling <link linkend="guc-sync-replication-slots">

I've added that back.

3)
wait_for_primary_slot_catchup():
+ /*
+ * It is possible to get null values for confirmed_lsn and
+ * catalog_xmin if on the primary server the slot is just created with
+ * a valid restart_lsn and slot-sync worker has fetched the slot
+ * before the primary server could set valid confirmed_lsn and
+ * catalog_xmin.
+ */

Do we need this special handling? We already have one such handling in
synchronize_slots(). please see:
/*
* If restart_lsn, confirmed_lsn or catalog_xmin is
invalid but the
* slot is valid, that means we have fetched the
remote_slot in its
* RS_EPHEMERAL state. In such a case, don't sync it;
we can always
* sync it in the next sync cycle when the remote_slot
is persisted
* and has valid lsn(s) and xmin values.
*/
if ((XLogRecPtrIsInvalid(remote_slot->restart_lsn) ||
XLogRecPtrIsInvalid(remote_slot->confirmed_lsn) ||
!TransactionIdIsValid(remote_slot->catalog_xmin)) &&
remote_slot->invalidated == RS_INVAL_NONE)
pfree(remote_slot);

Due to the above check in synchronize_slots(), we will not reach
wait_for_primary_slot_catchup() when any of confirmed_lsn or
catalog_xmin is not initialized.

Yes, you are correct. I've removed all that logic.

The modified patch (v2) is attached.

I am not able to apply the patch to the latest head or even to a week
back version. Can you please check and rebase?

thanks
Shveta

#7 Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#6)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

I am not able to apply the patch to the latest head or even to a week
back version. Can you please check and rebase?

thanks
Shveta

Rebased.

Regards,
Ajin Cherian
Fujitsu Australia.

Attachments:

v2-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patch (application/octet-stream)
From 676ffaba23a6c35e4ec63db1dda54373687c5d50 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Wed, 16 Jul 2025 06:11:51 -0400
Subject: [PATCH v2] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func.sgml                          |   4 +-
 doc/src/sgml/logicaldecoding.sgml               |  24 +--
 src/backend/replication/logical/slotsync.c      | 196 ++++++++++++++++++++++--
 src/backend/utils/activity/wait_event_names.txt |   1 +
 4 files changed, 192 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f5a0e09..53f97a0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -30023,9 +30023,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 593f784..d299ca3 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -370,7 +370,10 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation.
+     <command>CREATE SUBSCRIPTION</command> during slot creation, and then
+     calling <link linkend="pg-sync-replication-slots">
+     <function>pg_sync_replication_slots</function></link>
+     on the standby.
      Additionally, enabling <link linkend="guc-sync-replication-slots">
      <varname>sync_replication_slots</varname></link> on the standby
      is required. By enabling <link linkend="guc-sync-replication-slots">
@@ -398,25 +401,6 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
-    <note>
-     <para>
-      While enabling <link linkend="guc-sync-replication-slots">
-      <varname>sync_replication_slots</varname></link> allows for automatic
-      periodic synchronization of failover slots, they can also be manually
-      synchronized using the <link linkend="pg-sync-replication-slots">
-      <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot may have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
-     </para>
-    </note>
-
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 2f0c08b..4604fda 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -146,6 +146,7 @@ typedef struct RemoteSlot
 	ReplicationSlotInvalidationCause invalidated;
 } RemoteSlot;
 
+static void ProcessSlotSyncInterrupts(WalReceiverConn *wrconn);
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
 
@@ -550,6 +551,161 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
 }
 
 /*
+ * Wait for remote slot to pass locally reserved position.
+ *
+ * Return true if remote_slot could catch up with the locally reserved
+ * position. Return false in all other cases.
+ */
+static bool
+wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
+{
+#define SLOT_QUERY_COLUMN_COUNT 3
+
+	StringInfoData cmd;
+	int			   wait_iterations = 0;
+
+	Assert(!AmLogicalSlotSyncWorkerProcess());
+
+	ereport(LOG,
+			errmsg("waiting for remote slot \"%s\" LSN (%X/%X) and catalog xmin"
+				   " (%u) to pass local slot LSN (%X/%X) and catalog xmin (%u)",
+				   remote_slot->name,
+				   LSN_FORMAT_ARGS(remote_slot->restart_lsn),
+				   remote_slot->catalog_xmin,
+				   LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn),
+				   MyReplicationSlot->data.catalog_xmin));
+
+	initStringInfo(&cmd);
+	appendStringInfo(&cmd,
+					 "SELECT restart_lsn,"
+					 " confirmed_flush_lsn, catalog_xmin"
+					 " FROM pg_catalog.pg_replication_slots"
+					 " WHERE slot_name = %s",
+					 quote_literal_cstr(remote_slot->name));
+
+	for (;;)
+	{
+		XLogRecPtr	new_restart_lsn;
+		XLogRecPtr	new_confirmed_lsn;
+		TransactionId new_catalog_xmin;
+		WalRcvExecResult *res;
+		TupleTableSlot *tupslot;
+		Datum		d;
+		int			rc;
+		int			col = 0;
+		bool		isnull;
+		Oid			slotRow[SLOT_QUERY_COLUMN_COUNT] = {LSNOID, LSNOID, XIDOID};
+
+		/* Handle any termination request if any */
+		ProcessSlotSyncInterrupts(wrconn);
+
+		res = walrcv_exec(wrconn, cmd.data, SLOT_QUERY_COLUMN_COUNT, slotRow);
+
+		if (res->status != WALRCV_OK_TUPLES)
+			ereport(ERROR,
+					errmsg("could not fetch slot \"%s\" info from the"
+						   " primary server: %s",
+						   remote_slot->name, res->err));
+
+		tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
+		if (!tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
+		{
+			ereport(WARNING,
+					errmsg("aborting initial sync for slot \"%s\"",
+						   remote_slot->name),
+					errdetail("This slot was not found on the primary server."));
+
+			pfree(cmd.data);
+			walrcv_clear_result(res);
+
+			return false;
+		}
+
+		/* Any slot with NULL in these fields should not have made it this far */
+		d = slot_getattr(tupslot, ++col, &isnull);
+		Assert(!isnull);
+		new_restart_lsn = DatumGetLSN(d);
+
+		d = slot_getattr(tupslot, ++col, &isnull);
+		Assert(!isnull);
+		new_confirmed_lsn = DatumGetLSN(d);
+
+		d = slot_getattr(tupslot, ++col, &isnull);
+		Assert(!isnull);
+		new_catalog_xmin = DatumGetTransactionId(d);
+
+		ExecClearTuple(tupslot);
+		walrcv_clear_result(res);
+
+		if (new_restart_lsn >= MyReplicationSlot->data.restart_lsn &&
+			TransactionIdFollowsOrEquals(new_catalog_xmin,
+										 MyReplicationSlot->data.catalog_xmin))
+		{
+			/* Update new values in remote_slot */
+			remote_slot->restart_lsn = new_restart_lsn;
+			remote_slot->confirmed_lsn = new_confirmed_lsn;
+			remote_slot->catalog_xmin = new_catalog_xmin;
+
+			ereport(LOG,
+					errmsg("wait over for remote slot \"%s\" as its LSN (%X/%X)"
+						   " and catalog xmin (%u) has now passed local slot LSN"
+						   " (%X/%X) and catalog xmin (%u)",
+						   remote_slot->name,
+						   LSN_FORMAT_ARGS(new_restart_lsn),
+						   new_catalog_xmin,
+						   LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn),
+						   MyReplicationSlot->data.catalog_xmin));
+
+			pfree(cmd.data);
+
+			return true;
+		}
+
+		/*
+		 * If in SQL API synchronization, and we've been  promoted, then no point
+		 * continuing.
+		 */
+		if (!AmLogicalSlotSyncWorkerProcess() && PromoteIsTriggered())
+		{
+			ereport(WARNING,
+					errmsg("aborting sync for slot \"%s\"",
+							remote_slot->name),
+					errdetail("Promotion occurred before this slot was fully"
+							  " synchronized."));
+			pfree(cmd.data);
+
+			return false;
+		}
+
+		/*
+		 * XXX: Is waiting for 2 seconds before retrying enough or more or
+		 * less?
+		 */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   2000L,
+					   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
+
+		if (rc & WL_LATCH_SET)
+			ResetLatch(MyLatch);
+
+		/* log a message every ten seconds */
+		wait_iterations++;
+		if (wait_iterations % 5 == 0)
+		{
+			ereport(LOG,
+					errmsg("continuing to wait for remote slot \"%s\" LSN (%X/%X) and catalog xmin"
+						   " (%u) to pass local slot LSN (%X/%X) and catalog xmin (%u)",
+						   remote_slot->name,
+						   LSN_FORMAT_ARGS(remote_slot->restart_lsn),
+						   remote_slot->catalog_xmin,
+						   LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn),
+						   MyReplicationSlot->data.catalog_xmin));
+		}
+	}
+}
+
+/*
  * If the remote restart_lsn and catalog_xmin have caught up with the
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
@@ -558,7 +714,8 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(WalReceiverConn *wrconn,
+									 RemoteSlot *remote_slot, Oid remote_dbid)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -577,12 +734,28 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * If we're in the slotsync worker, we retain the slot and retry in the
+		 * next cycle. The restart_lsn might advance by then, allowing the slot
+		 * to be created successfully later.
 		 */
-		return false;
+		if (AmLogicalSlotSyncWorkerProcess())
+			return false;
+
+		/*
+		 * For SQL API synchronization, we wait for the remote slot to catch up
+		 * here, since we can't assume the SQL API will be called again soon.
+		 * We will retry the sync once the slot catches up.
+		 *
+		 * Note: This will return false if a promotion is triggered on the
+		 * standby while waiting, in which case we stop syncing and drop the
+		 * temporary slot.
+		 */
+		if (!wait_for_primary_slot_catchup(wrconn, remote_slot))
+			return false;
+		else
+			update_local_synced_slot(remote_slot, remote_dbid,
+									 &found_consistent_snapshot,
+									 &remote_slot_precedes);
 	}
 
 	/*
@@ -622,7 +795,8 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot,
+					 Oid remote_dbid)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +889,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/* Slot not ready yet, let's attempt to make it sync-ready now. */
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
-			slot_updated = update_and_persist_local_synced_slot(remote_slot,
+			slot_updated = update_and_persist_local_synced_slot(wrconn,
+																remote_slot,
 																remote_dbid);
 		}
 
@@ -785,7 +960,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(wrconn, remote_slot, remote_dbid);
 
 		slot_updated = true;
 	}
@@ -927,7 +1102,8 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(wrconn, remote_slot,
+												  remote_dbid);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 4da6831..ba82cc1 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -64,6 +64,7 @@ LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication paralle
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
 REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
1.8.3.1

#8 shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#7)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Jul 16, 2025 at 3:47 PM Ajin Cherian <itsajin@gmail.com> wrote:

I am not able to apply the patch to the latest head or even to a week
back version. Can you please check and rebase?

thanks
Shveta

Rebased.

Thanks. Please find a few comments:

1)
/* Any slot with NULL in these fields should not have made it this far */

It is good to get rid of the case where we had checks for NULL
confirmed_lsn and catalog_xmin (i.e. when the slot was in RS_EPHEMERAL
state), as that has already been checked by synchronize_slots() and
such a slot will not even reach wait_for_primary_slot_catchup(). But a
slot can still be invalidated on the primary at any time, and thus
during this wait we should check for the primary's invalidation as we
were doing in v1.
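
Something of this shape, polled once per wait-loop iteration against
the primary, should be enough (the slot name is only an example):

SELECT invalidation_reason IS NOT NULL AS invalidated,
       restart_lsn, confirmed_flush_lsn, catalog_xmin
FROM pg_catalog.pg_replication_slots
WHERE slot_name = 'failover_slot';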

2)
+ * If in SQL API synchronization, and we've been promoted, then no point

extra space before promoted.

3)

+ if (!AmLogicalSlotSyncWorkerProcess() && PromoteIsTriggered())

We don't need 'AmLogicalSlotSyncWorkerProcess' as that is already
checked at the beginning of this function.

4)
+ ereport(WARNING,
+ errmsg("aborting sync for slot \"%s\"",
+ remote_slot->name),
+ errdetail("Promotion occurred before this slot was fully"
+   " synchronized."));
+ pfree(cmd.data);
+
+ return false;

a) Please add an error-code.

b) Shall we change msg to

errmsg("aborting sync for slot \"%s\"",
remote_slot->name),
errhint("%s cannot be executed once promotion is
triggered.",

"pg_sync_replication_slots()")));

5)
Instead of using PromoteIsTriggered, shall we rely on
'SlotSyncCtx->stopSignaled' as we do when we start this API.

6)
In logicaldecoding.sgml, we can get rid of "Additionally, enabling
sync_replication_slots on the standby is required" to make it the same
as what we had prior to the patch I pointed to earlier.

Or better we can refine it to below. Thoughts?

The logical replication slots on the primary can be enabled for
synchronization to the hot standby by using the failover parameter of
pg_create_logical_replication_slot, or by using the failover option of
CREATE SUBSCRIPTION during slot creation. After that, synchronization
can be performed either manually by calling pg_sync_replication_slots
on the standby, or automatically by enabling sync_replication_slots on
the standby. When sync_replication_slots is enabled, the failover
slots are periodically synchronized by the slot sync worker. For the
synchronization to work, .....

thanks
Shveta

#9Dilip Kumar
dilipbalaut@gmail.com
In reply to: shveta malik (#8)
Re: Improve pg_sync_replication_slots() to wait for primary to advance


I am wondering if we should provide an optional parameter to
pg_sync_replication_slots(), to control whether to wait for the slot
to be synced or just return with ERROR as it is doing now, default can
be wait.
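
For example, something along these lines (the parameter name and its
default are only a sketch, not a settled design):

-- default: wait for the primary slot to advance, as the patch does
SELECT pg_sync_replication_slots();

-- hypothetical opt-out: return/error immediately if the primary is behind
SELECT pg_sync_replication_slots(wait => false);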

--
Regards,
Dilip Kumar
Google

#10shveta malik
shveta.malik@gmail.com
In reply to: Dilip Kumar (#9)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Jul 18, 2025 at 10:14 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:


I am wondering if we should provide an optional parameter to
pg_sync_replication_slots(), to control whether to wait for the slot
to be synced or just return with ERROR as it is doing now, default can
be wait.

Do you mean specifically in case of promotion or in general, as in do
not wait for primary to catch-up (anytime) and exit and drop the
temporary slot while exiting?

thanks
Shveta

#11Dilip Kumar
dilipbalaut@gmail.com
In reply to: shveta malik (#10)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Jul 18, 2025 at 10:45 AM shveta malik <shveta.malik@gmail.com> wrote:


Do you mean specifically in case of promotion or in general, as in do
not wait for primary to catch-up (anytime) and exit and drop the
temporary slot while exiting?

I am specifically pointing to the exposed function
pg_sync_replication_slots() which was earlier non-blocking and was
giving an error if the primary slot for not catch-up, so we have
improved the functionality in this thread by making it wait for
primary to catch-up instead of just throwing an error. But my
question was since this is a user facing function so shall we keep the
old behavior intact with some optional parameters so that if the user
chooses not to wait they have options to do that?

--
Regards,
Dilip Kumar
Google

#12shveta malik
shveta.malik@gmail.com
In reply to: Dilip Kumar (#11)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Jul 18, 2025 at 10:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:


I am specifically pointing to the exposed function
pg_sync_replication_slots() which was earlier non-blocking and was
giving an error if the primary slot for not catch-up, so we have
improved the functionality in this thread by making it wait for
primary to catch-up instead of just throwing an error. But my
question was since this is a user facing function so shall we keep the
old behavior intact with some optional parameters so that if the user
chooses not to wait they have options to do that?

Okay. I see your point. Yes, it was non-blocking earlier, but it was
not giving an ERROR; it was just dumping in the logfile that the
primary is behind and thus slot-sync could not be done.

If we continue using the non-blocking mode, there’s a risk that the
API may never successfully sync the slots. This is because it
eventually drops the temporary slot on exit, and when it tries to
create a new one later on a subsequent call, it’s likely that the new
slot will again be ahead of the primary. This may happen if we have
continuous ongoing writes on the primary and the logical slot is not
being consumed at the same pace.

My preference would be to avoid including such an option as it is
confusing. With such an option in place, users may think that
slot-sync is completed while that may not be the case. But if it's
necessary for backward compatibility, it should be okay to provide it
as a non-default option as you suggested. Would like to know what
others think of this.

thanks
Shveta

#13Dilip Kumar
dilipbalaut@gmail.com
In reply to: shveta malik (#12)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Jul 18, 2025 at 11:25 AM shveta malik <shveta.malik@gmail.com> wrote:

Okay. I see your point. Yes, it was non-blocking earlier, but it was
not giving an ERROR; it was just dumping in the logfile that the
primary is behind and thus slot-sync could not be done.

If we continue using the non-blocking mode, there’s a risk that the
API may never successfully sync the slots. This is because it
eventually drops the temporary slot on exit, and when it tries to
create a new one later on a subsequent call, it’s likely that the new
slot will again be ahead of the primary. This may happen if we have
continuous ongoing writes on the primary and the logical slot is not
being consumed at the same pace.

My preference would be to avoid including such an option as it is
confusing. With such an option in place, users may think that
slot-sync is completed while that may not be the case.

Fair enough

But if it's
necessary for backward compatibility, it should be okay to provide it
as a non-default option as you suggested. Would like to know what
others think of this.

I think we don't need to maintain backward compatibility here as it
was not behaving sanely before, so I am fine with the behaviour the
patch provides.

--
Regards,
Dilip Kumar
Google

#14Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#13)
Re: Improve pg_sync_replication_slots() to wait for primary to advance


I think if we want, we may return bool and return false when sync is
not complete, say due to promotion or another reason like a timeout.
However, at this stage it is not very clear whether it will be useful
to provide an additional timeout parameter. But we can consider
returning true/false depending on whether we are successful in syncing
the slots or not.
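
For example (only a sketch of the idea, nothing implemented yet):

-- hypothetical: true when all slots were synced, false when the sync
-- was cut short, e.g. by promotion or a possible future timeout
SELECT pg_sync_replication_slots() AS synced;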

--
With Regards,
Amit Kapila.

#15shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#14)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Sat, Jul 19, 2025 at 5:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:


I think if we want, we may return bool and return false when sync is
not complete, say due to promotion or another reason like a timeout.
However, at this stage it is not very clear whether it will be useful
to provide an additional timeout parameter. But we can consider
returning true/false depending on whether we are successful in syncing
the slots or not.

I am not very sure that, in the current scenario, such a return value
adds much. Since this function waits indefinitely until all the slots
are synced, it is supposed to return true in normal scenarios. If it
is interrupted by promotion, or the user cancels it manually, then it
is supposed to return false. But in those cases, a more helpful
approach would be to log a clear WARNING or ERROR message like "sync
interrupted by promotion" (or a similar reason), rather than relying
on a return value. In the future, if we plan to add a timeout
parameter, then a return value makes more sense, as the function could
easily return false in normal scenarios too, e.g. if the timeout is
short, the number of slots is huge, or slots are stuck waiting on the
primary.

Additionally, if we do return a value, there may be an expectation
that the API should also provide details on the list of slots that
couldn't be synced. That could introduce unnecessary complexity at
this stage. We can avoid it for now and consider adding such
enhancements later if we receive relevant customer feedback. Please
note that our recommended approach for syncing slots still remains the
'slot sync worker' method.

thanks
Shveta

#16Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#15)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Mon, Jul 21, 2025 at 10:08 AM shveta malik <shveta.malik@gmail.com> wrote:


Additionally, if we do return a value, there may be an expectation
that the API should also provide details on the list of slots that
couldn't be synced. That could introduce unnecessary complexity at
this stage. We can avoid it for now and consider adding such
enhancements later if we receive relevant customer feedback.

makes sense.

Please
note that our recommended approach for syncing slots still remains the
'slot sync worker' method.

Right.

--
With Regards,
Amit Kapila.

#17Dilip Kumar
dilipbalaut@gmail.com
In reply to: shveta malik (#15)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Mon, Jul 21, 2025 at 10:08 AM shveta malik <shveta.malik@gmail.com> wrote:


Additionally, if we do return a value, there may be an expectation
that the API should also provide details on the list of slots that
couldn't be synced. That could introduce unnecessary complexity at
this stage. We can avoid it for now and consider adding such
enhancements later if we receive relevant customer feedback. Please
note that our recommended approach for syncing slots still remains the
'slot sync worker' method.

+1

--
Regards,
Dilip Kumar
Google

#18Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#8)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, Jul 17, 2025 at 2:04 PM shveta malik <shveta.malik@gmail.com> wrote:


Thanks. Please find a few comments:

1)
/* Any slot with NULL in these fields should not have made it this far */

It is good to get rid of the case where we had checks for NULL
confirmed_lsn and catalog_xmin (i.e. when the slot was in RS_EPHEMERAL
state), as that has already been checked by synchronize_slots() and
such a slot will not even reach wait_for_primary_slot_catchup(). But a
slot can still be invalidated on the primary at any time, and thus
during this wait we should check for the primary's invalidation as we
were doing in v1.

I've added back the check for invalidated slots.

2)
+ * If in SQL API synchronization, and we've been promoted, then no point

extra space before promoted.

Fixed.

3)

+ if (!AmLogicalSlotSyncWorkerProcess() && PromoteIsTriggered())

We don't need 'AmLogicalSlotSyncWorkerProcess' as that is already
checked at the beginning of this function.

Fixed.

4)
+ ereport(WARNING,
+ errmsg("aborting sync for slot \"%s\"",
+ remote_slot->name),
+ errdetail("Promotion occurred before this slot was fully"
+   " synchronized."));
+ pfree(cmd.data);
+
+ return false;

a) Please add an error-code.

b) Shall we change msg to

errmsg("aborting sync for slot \"%s\"",
remote_slot->name),
errhint("%s cannot be executed once promotion is
triggered.",

"pg_sync_replication_slots()")));

Since there is already an error return at the start if promotion is
triggered, I've kept the same error code and message here as well for
consistency.

5)
Instead of using PromoteIsTriggered, shall we rely on
'SlotSyncCtx->stopSignaled' as we do when we start this API.

Fixed.

6)
In logicaldecoding.sgml, we can get rid of "Additionally, enabling
sync_replication_slots on the standby is required" to make it the same
as what we had prior to the patch I pointed to earlier.

Or better we can refine it to below. Thoughts?

The logical replication slots on the primary can be enabled for
synchronization to the hot standby by using the failover parameter of
pg_create_logical_replication_slot, or by using the failover option of
CREATE SUBSCRIPTION during slot creation. After that, synchronization
can be performed either manually by calling pg_sync_replication_slots
on the standby, or automatically by enabling sync_replication_slots on
the standby. When sync_replication_slots is enabled, the failover
slots are periodically synchronized by the slot sync worker. For the
synchronization to work, .....

Updated as above.

Patch v3 attached.

Regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v3-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patchapplication/octet-stream; name=v3-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patchDownload
From 661fcb4ced929bed2dd4d90e126c0a1cb39114c0 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Thu, 31 Jul 2025 05:33:42 -0400
Subject: [PATCH v3] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func.sgml                          |   4 +-
 doc/src/sgml/logicaldecoding.sgml               |  40 ++---
 src/backend/replication/logical/slotsync.c      | 220 ++++++++++++++++++++++--
 src/backend/utils/activity/wait_event_names.txt |   1 +
 4 files changed, 225 insertions(+), 40 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 74a16af..4092677 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -30034,9 +30034,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 593f784..edad0e9 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -364,18 +364,23 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
    <sect2 id="logicaldecoding-replication-slots-synchronization">
     <title>Replication Slot Synchronization</title>
     <para>
-     The logical replication slots on the primary can be synchronized to
-     the hot standby by using the <literal>failover</literal> parameter of
+     The logical replication slots on the primary can be enabled for
+     synchronization to the hot standby by using the
+     <literal>failover</literal> parameter of
      <link linkend="pg-create-logical-replication-slot">
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation.
-     Additionally, enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link> on the standby
-     is required. By enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link>
-     on the standby, the failover slots can be synchronized periodically in
+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">
+     <function>pg_sync_replication_slots</function></link>
+     on the standby, or automatically by enabling
+     <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby.
+     When <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> is enabled
+     on the standby, the failover slots are periodically synchronized by
      the slotsync worker. For the synchronization to work, it is mandatory to
      have a physical replication slot between the primary and the standby (i.e.,
      <link linkend="guc-primary-slot-name"><varname>primary_slot_name</varname></link>
@@ -398,25 +403,6 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
-    <note>
-     <para>
-      While enabling <link linkend="guc-sync-replication-slots">
-      <varname>sync_replication_slots</varname></link> allows for automatic
-      periodic synchronization of failover slots, they can also be manually
-      synchronized using the <link linkend="pg-sync-replication-slots">
-      <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot may have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
-     </para>
-    </note>
-
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 2f0c08b..3308772 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -146,6 +146,7 @@ typedef struct RemoteSlot
 	ReplicationSlotInvalidationCause invalidated;
 } RemoteSlot;
 
+static void ProcessSlotSyncInterrupts(WalReceiverConn *wrconn);
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
 
@@ -550,6 +551,185 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
 }
 
 /*
+ * Wait for remote slot to pass locally reserved position.
+ *
+ * Return true if remote_slot could catch up with the locally reserved
+ * position. Return false in all other cases.
+ */
+static bool
+wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
+{
+#define SLOT_QUERY_COLUMN_COUNT 4
+
+	StringInfoData cmd;
+	int			   wait_iterations = 0;
+
+	Assert(!AmLogicalSlotSyncWorkerProcess());
+
+	ereport(LOG,
+			errmsg("waiting for remote slot \"%s\" LSN (%X/%X) and catalog xmin"
+				   " (%u) to pass local slot LSN (%X/%X) and catalog xmin (%u)",
+				   remote_slot->name,
+				   LSN_FORMAT_ARGS(remote_slot->restart_lsn),
+				   remote_slot->catalog_xmin,
+				   LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn),
+				   MyReplicationSlot->data.catalog_xmin));
+
+	initStringInfo(&cmd);
+	appendStringInfo(&cmd,
+					 "SELECT invalidation_reason IS NOT NULL, restart_lsn,"
+					 " confirmed_flush_lsn, catalog_xmin"
+					 " FROM pg_catalog.pg_replication_slots"
+					 " WHERE slot_name = %s",
+					 quote_literal_cstr(remote_slot->name));
+
+	for (;;)
+	{
+		bool		new_invalidated;
+		XLogRecPtr	new_restart_lsn;
+		XLogRecPtr	new_confirmed_lsn;
+		TransactionId new_catalog_xmin;
+		WalRcvExecResult *res;
+		TupleTableSlot *tupslot;
+		Datum		d;
+		int			rc;
+		int			col = 0;
+		bool		isnull;
+		Oid			slotRow[SLOT_QUERY_COLUMN_COUNT] = {BOOLOID, LSNOID, LSNOID, XIDOID};
+
+		/* Handle any termination request if any */
+		ProcessSlotSyncInterrupts(wrconn);
+
+		res = walrcv_exec(wrconn, cmd.data, SLOT_QUERY_COLUMN_COUNT, slotRow);
+
+		if (res->status != WALRCV_OK_TUPLES)
+			ereport(ERROR,
+					errmsg("could not fetch slot \"%s\" info from the"
+						   " primary server: %s",
+						   remote_slot->name, res->err));
+
+		tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
+		if (!tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
+		{
+			ereport(WARNING,
+					errmsg("aborting initial sync for slot \"%s\"",
+						   remote_slot->name),
+					errdetail("This slot was not found on the primary server."));
+
+			pfree(cmd.data);
+			walrcv_clear_result(res);
+
+			return false;
+		}
+
+		/*
+		 * It is possible that the slot was invalidated on the primary, if so
+		 * handle accordingly.
+		 */
+		new_invalidated = DatumGetBool(slot_getattr(tupslot, ++col, &isnull));
+		Assert(!isnull);
+
+		if (new_invalidated)
+		{
+			/*
+			 * The slot won't be persisted by the caller; it will be cleaned
+			 * up at the end of synchronization.
+			 */
+			ereport(WARNING,
+					errmsg("aborting initial sync for slot \"%s\"",
+						   remote_slot->name),
+					errdetail("This slot was invalidated on the primary server."));
+
+			pfree(cmd.data);
+			ExecClearTuple(tupslot);
+			walrcv_clear_result(res);
+
+			return false;
+		}
+
+		/* Any slot with NULL in these fields should not have made it this far */
+		d = slot_getattr(tupslot, ++col, &isnull);
+		Assert(!isnull);
+		new_restart_lsn = DatumGetLSN(d);
+
+		d = slot_getattr(tupslot, ++col, &isnull);
+		Assert(!isnull);
+		new_confirmed_lsn = DatumGetLSN(d);
+
+		d = slot_getattr(tupslot, ++col, &isnull);
+		Assert(!isnull);
+		new_catalog_xmin = DatumGetTransactionId(d);
+
+		ExecClearTuple(tupslot);
+		walrcv_clear_result(res);
+
+		if (new_restart_lsn >= MyReplicationSlot->data.restart_lsn &&
+			TransactionIdFollowsOrEquals(new_catalog_xmin,
+										 MyReplicationSlot->data.catalog_xmin))
+		{
+			/* Update new values in remote_slot */
+			remote_slot->restart_lsn = new_restart_lsn;
+			remote_slot->confirmed_lsn = new_confirmed_lsn;
+			remote_slot->catalog_xmin = new_catalog_xmin;
+
+			ereport(LOG,
+					errmsg("wait over for remote slot \"%s\" as its LSN (%X/%X)"
+						   " and catalog xmin (%u) has now passed local slot LSN"
+						   " (%X/%X) and catalog xmin (%u)",
+						   remote_slot->name,
+						   LSN_FORMAT_ARGS(new_restart_lsn),
+						   new_catalog_xmin,
+						   LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn),
+						   MyReplicationSlot->data.catalog_xmin));
+
+			pfree(cmd.data);
+
+			return true;
+		}
+
+		/*
+		 * If we've been promoted, then no point continuing.
+		 */
+		if (SlotSyncCtx->stopSignaled)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("cannot synchronize replication slots when"
+							" standby promotion is ongoing")));
+			pfree(cmd.data);
+
+			return false;
+		}
+
+		/*
+		 * XXX: Is waiting for 2 seconds before retrying enough or more or
+		 * less?
+		 */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   2000L,
+					   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
+
+		if (rc & WL_LATCH_SET)
+			ResetLatch(MyLatch);
+
+		/* log a message every ten seconds */
+		wait_iterations++;
+		if (wait_iterations % 5 == 0)
+		{
+			ereport(LOG,
+					errmsg("continuing to wait for remote slot \"%s\" LSN (%X/%X) and catalog xmin"
+						   " (%u) to pass local slot LSN (%X/%X) and catalog xmin (%u)",
+						   remote_slot->name,
+						   LSN_FORMAT_ARGS(remote_slot->restart_lsn),
+						   remote_slot->catalog_xmin,
+						   LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn),
+						   MyReplicationSlot->data.catalog_xmin));
+		}
+	}
+}
+
+/*
  * If the remote restart_lsn and catalog_xmin have caught up with the
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
@@ -558,7 +738,8 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(WalReceiverConn *wrconn,
+									 RemoteSlot *remote_slot, Oid remote_dbid)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -577,12 +758,28 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * If we're in the slotsync worker, we retain the slot and retry in the
+		 * next cycle. The restart_lsn might advance by then, allowing the slot
+		 * to be created successfully later.
 		 */
-		return false;
+		if (AmLogicalSlotSyncWorkerProcess())
+			return false;
+
+		/*
+		 * For SQL API synchronization, we wait for the remote slot to catch up
+		 * here, since we can't assume the SQL API will be called again soon.
+		 * We will retry the sync once the slot catches up.
+		 *
+		 * Note: This will return false if a promotion is triggered on the
+		 * standby while waiting, in which case we stop syncing and drop the
+		 * temporary slot.
+		 */
+		if (!wait_for_primary_slot_catchup(wrconn, remote_slot))
+			return false;
+		else
+			update_local_synced_slot(remote_slot, remote_dbid,
+									 &found_consistent_snapshot,
+									 &remote_slot_precedes);
 	}
 
 	/*
@@ -622,7 +819,8 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot,
+					 Oid remote_dbid)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +913,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/* Slot not ready yet, let's attempt to make it sync-ready now. */
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
-			slot_updated = update_and_persist_local_synced_slot(remote_slot,
+			slot_updated = update_and_persist_local_synced_slot(wrconn,
+																remote_slot,
 																remote_dbid);
 		}
 
@@ -785,7 +984,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(wrconn, remote_slot, remote_dbid);
 
 		slot_updated = true;
 	}
@@ -927,7 +1126,8 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(wrconn, remote_slot,
+												  remote_dbid);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0be307d..9fa36ab 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -64,6 +64,7 @@ LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication paralle
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
 REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch-up."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
1.8.3.1

#19shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#18)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, Jul 31, 2025 at 3:11 PM Ajin Cherian <itsajin@gmail.com> wrote:

Patch v3 attached.

Thanks for the patch. I tested it, please find a few comments:

1)
It hits an assert
(slotsync_reread_config()-->Assert(sync_replication_slots)) when the
API is trying to sync and is in the wait loop, while in another
session I enable sync_replication_slots using:

ALTER SYSTEM SET sync_replication_slots = 'on';
SELECT pg_reload_conf();

Assert:
2025-08-01 10:55:43.637 IST [118576] STATEMENT: SELECT
pg_sync_replication_slots();
2025-08-01 10:55:51.730 IST [118563] LOG: received SIGHUP, reloading
configuration files
2025-08-01 10:55:51.731 IST [118563] LOG: parameter
"sync_replication_slots" changed to "on"
TRAP: failed Assert("sync_replication_slots"), File: "slotsync.c",
Line: 1334, PID: 118576
postgres: shveta postgres [local]
SELECT(ExceptionalCondition+0xbb)[0x61df0160e090]
postgres: shveta postgres [local] SELECT(+0x6520dc)[0x61df0133a0dc]
2025-08-01 10:55:51.739 IST [118666] ERROR: cannot synchronize
replication slots concurrently
postgres: shveta postgres [local] SELECT(+0x6522b2)[0x61df0133a2b2]
postgres: shveta postgres [local] SELECT(+0x650664)[0x61df01338664]
postgres: shveta postgres [local] SELECT(+0x650cf8)[0x61df01338cf8]
postgres: shveta postgres [local] SELECT(+0x6513ea)[0x61df013393ea]
postgres: shveta postgres [local] SELECT(+0x6519df)[0x61df013399df]
postgres: shveta postgres [local]
SELECT(SyncReplicationSlots+0xbb)[0x61df0133af60]
postgres: shveta postgres [local]
SELECT(pg_sync_replication_slots+0x1b1)[0x61df01357e52]

2)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot synchronize replication slots when"
+ " standby promotion is ongoing")));

I think a better error message would be:
"exiting from slot synchronization as promotion is triggered"

This will also be better suited to the log file, after the wait statements below:
LOG: continuing to wait for remote slot "failover_slot" LSN
(0/3000060) and catalog xmin (755) to pass local slot LSN (0/3000060)
and catalog xmin (757)
STATEMENT: SELECT pg_sync_replication_slots();

3)
The API dumps this when it is waiting for the primary:

----
LOG: could not synchronize replication slot "failover_slot2"
DETAIL: Synchronization could lead to data loss, because the remote
slot needs WAL at LSN 0/03066E70 and catalog xmin 755, but the standby
has LSN 0/03066E70 and catalog xmin 770.
STATEMENT: SELECT pg_sync_replication_slots();
LOG: waiting for remote slot "failover_slot2" LSN (0/3066E70) and
catalog xmin (755) to pass local slot LSN (0/3066E70) and catalog xmin
(770)
STATEMENT: SELECT pg_sync_replication_slots();
LOG: continuing to wait for remote slot "failover_slot2" LSN
(0/3066E70) and catalog xmin (755) to pass local slot LSN (0/3066E70)
and catalog xmin (770)
STATEMENT: SELECT pg_sync_replication_slots();
----

Unsure if we should still dump 'could not synchronize..' when it is
going to retry until it succeeds? The log in question gives the
feeling that we are done trying and could not synchronize it. What do
you think?

thanks
Shveta

#20shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#19)
Re: Improve pg_sync_replication_slots() to wait for primary to advance


A few more comments:

4)

+ if (!tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
+ {
+ ereport(WARNING,
+ errmsg("aborting initial sync for slot \"%s\"",
+    remote_slot->name),
+ errdetail("This slot was not found on the primary server."));
+
+ pfree(cmd.data);
+ walrcv_clear_result(res);
+
+ return false;
+ }

We can say 'aborting sync for slot' and remove 'initial'.

5)
I tried a test where there were 4 slots on the publisher, where one
was getting used while the others were not. Initiated
pg_sync_replication_slots on standby. Forced unused slots to be
invalidated by setting idle_replication_slot_timeout=60 on primary,
due to which API finished but gave a warning:

postgres=# SELECT pg_sync_replication_slots();
WARNING: aborting initial sync for slot "failover_slot"
DETAIL: This slot was invalidated on the primary server.
WARNING: aborting initial sync for slot "failover_slot2"
DETAIL: This slot was invalidated on the primary server.
WARNING: aborting initial sync for slot "failover_slot3"
DETAIL: This slot was invalidated on the primary server.
pg_sync_replication_slots
---------------------------

(1 row)

Do we need these warnings here? I think we can have it as a LOG rather
than having it on console. Thoughts?

If we are inclined towards WARNING here, will it be better to have it
as a single line:

WARNING: aborting sync for slot "failover_slot" as the slot was
invalidated on primary
WARNING: aborting sync for slot "failover_slot1" as the slot was
invalidated on primary
WARNING: aborting sync for slot "failover_slot2" as the slot was
invalidated on primary

6)
- * We do not drop the slot because the restart_lsn can be ahead of the
- * current location when recreating the slot in the next cycle. It may
- * take more time to create such a slot. Therefore, we keep this slot
- * and attempt the synchronization in the next cycle.
+ * If we're in the slotsync worker, we retain the slot and retry in the
+ * next cycle. The restart_lsn might advance by then, allowing the slot
+ * to be created successfully later.
  */

I like the previous comment better as it conveyed the side-effect of
dropping the slot here. Can we try to incorporate the previous comment
here and make it specific to slotsync workers?

thanks
Shveta

#21Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#20)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Aug 1, 2025 at 2:50 PM shveta malik <shveta.malik@gmail.com> wrote:

5)
I tried a test where there were 4 slots on the publisher, where one
was getting used while the others were not. Initiated
pg_sync_replication_slots on standby. Forced unused slots to be
invalidated by setting idle_replication_slot_timeout=60 on primary,
due to which API finished but gave a warning:

postgres=# SELECT pg_sync_replication_slots();
WARNING: aborting initial sync for slot "failover_slot"
DETAIL: This slot was invalidated on the primary server.
WARNING: aborting initial sync for slot "failover_slot2"
DETAIL: This slot was invalidated on the primary server.
WARNING: aborting initial sync for slot "failover_slot3"
DETAIL: This slot was invalidated on the primary server.
pg_sync_replication_slots
---------------------------

(1 row)

Do we need these warnings here? I think we can emit this as a LOG
rather than showing it on the console. Thoughts?

What is the behaviour of a slotsync worker in the same case? I don't
see any such WARNING messages in the code of slotsync worker, so why
do we want a different behaviour here?

Few other comments:
======================
1.
update_and_persist_local_synced_slot()
{
...
+ /*
+ * For SQL API synchronization, we wait for the remote slot to catch up
+ * here, since we can't assume the SQL API will be called again soon.
+ * We will retry the sync once the slot catches up.
+ *
+ * Note: This will return false if a promotion is triggered on the
+ * standby while waiting, in which case we stop syncing and drop the
+ * temporary slot.
+ */
+ if (!wait_for_primary_slot_catchup(wrconn, remote_slot))
+ return false;

Why is the wait added at this level? Shouldn't it be at API level aka
in SyncReplicationSlots() or pg_sync_replication_slots() similar to
what we do in ReplSlotSyncWorkerMain() for slotsync workers?

2.
REPLICATION_SLOTSYNC_MAIN "Waiting in main loop of slot sync worker."
...
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP "Waiting for the primary to catch-up."

Can't we reuse the existing wait event REPLICATION_SLOTSYNC_MAIN? We may
want to change the description. Is there a specific reason to add a
new wait_event for this API?

--
With Regards,
Amit Kapila.

#22shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#21)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Mon, Aug 4, 2025 at 11:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Aug 1, 2025 at 2:50 PM shveta malik <shveta.malik@gmail.com> wrote:

5)
I tried a test where there were 4 slots on the publisher, where one
was getting used while the others were not. Initiated
pg_sync_replication_slots on standby. Forced unused slots to be
invalidated by setting idle_replication_slot_timeout=60 on primary,
due to which API finished but gave a warning:

postgres=# SELECT pg_sync_replication_slots();
WARNING: aborting initial sync for slot "failover_slot"
DETAIL: This slot was invalidated on the primary server.
WARNING: aborting initial sync for slot "failover_slot2"
DETAIL: This slot was invalidated on the primary server.
WARNING: aborting initial sync for slot "failover_slot3"
DETAIL: This slot was invalidated on the primary server.
pg_sync_replication_slots
---------------------------

(1 row)

Do we need these warnings here? I think we can emit this as a LOG
rather than showing it on the console. Thoughts?

What is the behaviour of a slotsync worker in the same case? I don't
see any such WARNING messages in the code of slotsync worker, so why
do we want a different behaviour here?

We don’t have continuous waiting in the slot-sync worker if the remote
slot is behind the local slot. But if during the first sync cycle the
remote slot is behind, we keep the local slot as a temporary slot. In
the next sync cycle, if we find the remote slot is invalidated, we
mark the local slot as invalidated too, keeping it in this temporary
state. There are no LOG or WARNING messages in this case. When the
slot-sync worker stops or shuts down (like during promotion), it
cleans up this temporary slot.

Now, for the API behavior: if the remote slot is behind the local
slot, the API enters a wait loop and logs:

LOG: waiting for remote slot "failover_slot" LSN (0/3000060) and
catalog xmin (755) to pass local slot LSN (0/3000060) and catalog xmin
(770)

If it keeps waiting, every 10 seconds it logs:
LOG: continuing to wait for remote slot "failover_slot" LSN
(0/3000060) and catalog xmin (755) to pass local slot LSN (0/3000060)
and catalog xmin (770)

If the remote slot becomes invalidated during this wait, currently it
logs a WARNING and moves to syncing the next slot:
WARNING: aborting initial sync for slot "failover_slot" as the slot
was invalidated on primary

I think this WARNING is too strong. We could change it to a LOG
message instead, mark the local slot as invalidated, exit the wait
loop, and move on to syncing the next slot.

Even though this LOG is not present in the slotsync worker case, I
think it makes more sense in the API case: the continuous LOGs suggest
that the API is waiting in a loop to sync this slot, so we should
conclude the wait either by saying it is over (as we do in the
successful sync case) or with 'LOG: aborting wait as the remote slot
was invalidated' instead of the above WARNING message. What do you
suggest?
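
To make it concrete, a minimal sketch of the invalidation path I have
in mind inside wait_for_primary_slot_catchup() (the LOG wording and the
invalidation cause below are illustrative, not from any posted patch):

/*
 * Sketch only: on finding the remote slot invalidated while waiting,
 * demote the message to LOG, remember the fact so that the caller can
 * mark the local (still temporary) slot invalidated, and stop waiting
 * so that the caller moves on to syncing the next slot.
 */
if (new_invalidated)
{
	ereport(LOG,
			errmsg("aborting wait for remote slot \"%s\" as the slot was invalidated on the primary",
				   remote_slot->name));

	/* illustrative: the actual cause would be fetched from the primary */
	remote_slot->invalidated = RS_INVAL_WAL_REMOVED;

	pfree(cmd.data);
	ExecClearTuple(tupslot);
	walrcv_clear_result(res);

	return false;		/* caller proceeds to the next slot */
}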

thanks
Shveta

#23Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#22)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Mon, Aug 4, 2025 at 12:19 PM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, Aug 4, 2025 at 11:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Aug 1, 2025 at 2:50 PM shveta malik <shveta.malik@gmail.com> wrote:

5)
I tried a test where there were 4 slots on the publisher, where one
was getting used while the others were not. Initiated
pg_sync_replication_slots on standby. Forced unused slots to be
invalidated by setting idle_replication_slot_timeout=60 on primary,
due to which API finished but gave a warning:

postgres=# SELECT pg_sync_replication_slots();
WARNING: aborting initial sync for slot "failover_slot"
DETAIL: This slot was invalidated on the primary server.
WARNING: aborting initial sync for slot "failover_slot2"
DETAIL: This slot was invalidated on the primary server.
WARNING: aborting initial sync for slot "failover_slot3"
DETAIL: This slot was invalidated on the primary server.
pg_sync_replication_slots
---------------------------

(1 row)

Do we need these warnings here? I think we can emit this as a LOG
rather than showing it on the console. Thoughts?

What is the behaviour of a slotsync worker in the same case? I don't
see any such WARNING messages in the code of slotsync worker, so why
do we want a different behaviour here?

We don’t have continuous waiting in the slot-sync worker if the remote
slot is behind the local slot. But if during the first sync cycle the
remote slot is behind, we keep the local slot as a temporary slot. In
the next sync cycle, if we find the remote slot is invalidated, we
mark the local slot as invalidated too, keeping it in this temporary
state. There are no LOG or WARNING messages in this case. When the
slot-sync worker stops or shuts down (like during promotion), it
cleans up this temporary slot.

Now, for the API behavior: if the remote slot is behind the local
slot, the API enters a wait loop and logs:

LOG: waiting for remote slot "failover_slot" LSN (0/3000060) and
catalog xmin (755) to pass local slot LSN (0/3000060) and catalog xmin
(770)

If it keeps waiting, every 10 seconds it logs:
LOG: continuing to wait for remote slot "failover_slot" LSN
(0/3000060) and catalog xmin (755) to pass local slot LSN (0/3000060)
and catalog xmin (770)

If the remote slot becomes invalidated during this wait, currently it
logs a WARNING and moves to syncing the next slot:
WARNING: aborting initial sync for slot "failover_slot" as the slot
was invalidated on primary

I think this WARNING is too strong. We could change it to a LOG
message instead, mark the local slot as invalidated, exit the wait
loop, and move on to syncing the next slot.

Even though this LOG is not present in the slotsync worker case, I
think it makes more sense in the API case: the continuous LOGs suggest
that the API is waiting in a loop to sync this slot, so we should
conclude the wait either by saying it is over (as we do in the
successful sync case) or with 'LOG: aborting wait as the remote slot
was invalidated' instead of the above WARNING message. What do you
suggest?

I also think LOG is a better choice for this because there is nothing
we can expect users to do even after seeing this. I feel this is more
of an informational message for users.

--
With Regards,
Amit Kapila.

#24shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#23)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Mon, Aug 4, 2025 at 3:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 4, 2025 at 12:19 PM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, Aug 4, 2025 at 11:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Aug 1, 2025 at 2:50 PM shveta malik <shveta.malik@gmail.com> wrote:

5)
I tried a test where there were 4 slots on the publisher, where one
was getting used while the others were not. Initiated
pg_sync_replication_slots on standby. Forced unused slots to be
invalidated by setting idle_replication_slot_timeout=60 on primary,
due to which API finished but gave a warning:

postgres=# SELECT pg_sync_replication_slots();
WARNING: aborting initial sync for slot "failover_slot"
DETAIL: This slot was invalidated on the primary server.
WARNING: aborting initial sync for slot "failover_slot2"
DETAIL: This slot was invalidated on the primary server.
WARNING: aborting initial sync for slot "failover_slot3"
DETAIL: This slot was invalidated on the primary server.
pg_sync_replication_slots
---------------------------

(1 row)

Do we need these warnings here? I think we can emit this as a LOG
rather than showing it on the console. Thoughts?

What is the behaviour of a slotsync worker in the same case? I don't
see any such WARNING messages in the code of slotsync worker, so why
do we want a different behaviour here?

We don’t have continuous waiting in the slot-sync worker if the remote
slot is behind the local slot. But if during the first sync cycle the
remote slot is behind, we keep the local slot as a temporary slot. In
the next sync cycle, if we find the remote slot is invalidated, we
mark the local slot as invalidated too, keeping it in this temporary
state. There are no LOG or WARNING messages in this case. When the
slot-sync worker stops or shuts down (like during promotion), it
cleans up this temporary slot.

Now, for the API behavior: if the remote slot is behind the local
slot, the API enters a wait loop and logs:

LOG: waiting for remote slot "failover_slot" LSN (0/3000060) and
catalog xmin (755) to pass local slot LSN (0/3000060) and catalog xmin
(770)

If it keeps waiting, every 10 seconds it logs:
LOG: continuing to wait for remote slot "failover_slot" LSN
(0/3000060) and catalog xmin (755) to pass local slot LSN (0/3000060)
and catalog xmin (770)

If the remote slot becomes invalidated during this wait, currently it
logs a WARNING and moves to syncing the next slot:
WARNING: aborting initial sync for slot "failover_slot" as the slot
was invalidated on primary

I think this WARNING is too strong. We could change it to a LOG
message instead, mark the local slot as invalidated, exit the wait
loop, and move on to syncing the next slot.

Even though this LOG is not present in the slotsync worker case, I
think it makes more sense in the API case: the continuous LOGs suggest
that the API is waiting in a loop to sync this slot, so we should
conclude the wait either by saying it is over (as we do in the
successful sync case) or with 'LOG: aborting wait as the remote slot
was invalidated' instead of the above WARNING message. What do you
suggest?

I also think LOG is a better choice for this because there is nothing
we can expect users to do even after seeing this. I feel this is more
of an informational message for users.

Yes, it is more of an informational message for users.

1.
update_and_persist_local_synced_slot()
{
...
+ /*
+ * For SQL API synchronization, we wait for the remote slot to catch up
+ * here, since we can't assume the SQL API will be called again soon.
+ * We will retry the sync once the slot catches up.
+ *
+ * Note: This will return false if a promotion is triggered on the
+ * standby while waiting, in which case we stop syncing and drop the
+ * temporary slot.
+ */
+ if (!wait_for_primary_slot_catchup(wrconn, remote_slot))
+ return false;

Why is the wait added at this level? Shouldn't it be at API level aka
in SyncReplicationSlots() or pg_sync_replication_slots() similar to
what we do in ReplSlotSyncWorkerMain() for slotsync workers?

The initial goal was to perform a single sync cycle for all slots. The
logic was simple: if any slot couldn't be synced because its remote
slot was lagging, we would wait for the remote slot to catch up, and
only then move on to the next slot.

But if we consider moving the wait logic to SyncReplicationSlots(), we
will necessarily have to attempt to sync all slots in the first sync
cycle, skipping those whose remote slots are lagging, and then continue
with multiple sync cycles until all slots are successfully synced. But
if new remote slots are added in the meantime, they will be picked up
in the next cycle, and the API then has to wait on those as well, so
this cycle may go on even longer.

If we want to avoid continuously syncing newly added slots in later
cycles and instead focus only on the ones that failed to sync during
the first attempt, one approach is to maintain a list of failed slots
from the initial cycle and only retry those in subsequent attempts.
But this will add complexity to the implementation.

IMO, attempting multiple sync cycles essentially makes the API behave
more like the slotsync worker, which might not be desirable. I feel
that performing just one sync cycle in the API is more in line with the
expected behavior. And for that, the current implementation of the wait
logic seems simpler. But let me know if you think otherwise or if I
have not understood your point clearly. I am open to more
approaches/ideas here.
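
For reference, a rough sketch of that list-based approach (the helper
names are illustrative, and it assumes synchronize_one_slot() is
adjusted to report whether the slot became sync-ready):

static void
sync_slots_with_retry(WalReceiverConn *wrconn)
{
	/* One initial pass, then retry only the slots that did not sync. */
	List	   *pending = fetch_remote_slot_list(wrconn);

	while (pending != NIL)
	{
		List	   *still_pending = NIL;
		ListCell   *lc;

		foreach(lc, pending)
		{
			RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
			Oid			remote_dbid = get_database_oid(remote_slot->database,
													   false);

			if (!synchronize_one_slot(remote_slot, remote_dbid))
				still_pending = lappend(still_pending, remote_slot);
		}

		pending = still_pending;

		/* Nap before retrying the leftover slots. */
		if (pending != NIL)
		{
			int			rc = WaitLatch(MyLatch,
									   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
									   2000L,
									   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);

			if (rc & WL_LATCH_SET)
				ResetLatch(MyLatch);
		}
	}
}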

thanks
Shveta

#25Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#24)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Aug 5, 2025 at 9:28 AM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, Aug 4, 2025 at 3:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 4, 2025 at 12:19 PM shveta malik <shveta.malik@gmail.com> wrote:

If we want to avoid continuously syncing newly added slots in later
cycles and instead focus only on the ones that failed to sync during
the first attempt, one approach is to maintain a list of failed slots
from the initial cycle and only retry those in subsequent attempts.
But this will add complexity to the implementation.

There will be some additional code for this but overall it improves
the code in the lower level functions. We may want to use the existing
remote_slot list for this purpose.

The current proposed change in low-level functions appears to be
difficult to maintain, especially the change proposed in
update_and_persist_local_synced_slot(). If we can find a better way to
achieve the same then we can consider the current approach as well.

--
With Regards,
Amit Kapila.

#26Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#20)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Aug 1, 2025 at 4:32 PM shveta malik <shveta.malik@gmail.com> wrote:

Thanks for the patch. I tested it, please find a few comments:

1)
it hits an assert
(slotsync_reread_config()-->Assert(sync_replication_slots)) when API
is trying to sync and is in wait loop while in another session, I
enable sync_replication_slots using:

ALTER SYSTEM SET sync_replication_slots = 'on';
SELECT pg_reload_conf();

Assert:
2025-08-01 10:55:43.637 IST [118576] STATEMENT: SELECT
pg_sync_replication_slots();
2025-08-01 10:55:51.730 IST [118563] LOG: received SIGHUP, reloading
configuration files
2025-08-01 10:55:51.731 IST [118563] LOG: parameter
"sync_replication_slots" changed to "on"
TRAP: failed Assert("sync_replication_slots"), File: "slotsync.c",
Line: 1334, PID: 118576
postgres: shveta postgres [local]
SELECT(ExceptionalCondition+0xbb)[0x61df0160e090]
postgres: shveta postgres [local] SELECT(+0x6520dc)[0x61df0133a0dc]
2025-08-01 10:55:51.739 IST [118666] ERROR: cannot synchronize
replication slots concurrently
postgres: shveta postgres [local] SELECT(+0x6522b2)[0x61df0133a2b2]
postgres: shveta postgres [local] SELECT(+0x650664)[0x61df01338664]
postgres: shveta postgres [local] SELECT(+0x650cf8)[0x61df01338cf8]
postgres: shveta postgres [local] SELECT(+0x6513ea)[0x61df013393ea]
postgres: shveta postgres [local] SELECT(+0x6519df)[0x61df013399df]
postgres: shveta postgres [local]
SELECT(SyncReplicationSlots+0xbb)[0x61df0133af60]
postgres: shveta postgres [local]
SELECT(pg_sync_replication_slots+0x1b1)[0x61df01357e52]

Fixed.

2)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot synchronize replication slots when"
+ " standby promotion is ongoing")));

I think better error message will be:
"exiting from slot synchronization as promotion is triggered"

This will be better suited in log file as well after below wait statements:
LOG: continuing to wait for remote slot "failover_slot" LSN
(0/3000060) and catalog xmin (755) to pass local slot LSN (0/3000060)
and catalog xmin (757)
STATEMENT: SELECT pg_sync_replication_slots();

Fixed.

3)
API dumps this when it is waiting for primary:

----
LOG: could not synchronize replication slot "failover_slot2"
DETAIL: Synchronization could lead to data loss, because the remote
slot needs WAL at LSN 0/03066E70 and catalog xmin 755, but the standby
has LSN 0/03066E70 and catalog xmin 770.
STATEMENT: SELECT pg_sync_replication_slots();
LOG: waiting for remote slot "failover_slot2" LSN (0/3066E70) and
catalog xmin (755) to pass local slot LSN (0/3066E70) and catalog xmin
(770)
STATEMENT: SELECT pg_sync_replication_slots();
LOG: continuing to wait for remote slot "failover_slot2" LSN
(0/3066E70) and catalog xmin (755) to pass local slot LSN (0/3066E70)
and catalog xmin (770)
STATEMENT: SELECT pg_sync_replication_slots();
----

I am unsure whether we should still emit 'could not synchronize..' when
it is going to retry until it succeeds. That log gives the impression
that we are done trying and could not synchronize the slot. What do you
think?

I've modified the log to now say, "initial sync of replication slot
\"%s\" failed; will keep retrying"

On Fri, Aug 1, 2025 at 7:20 PM shveta malik <shveta.malik@gmail.com> wrote:

A few more comments:

4)

+ if (!tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
+ {
+ ereport(WARNING,
+ errmsg("aborting initial sync for slot \"%s\"",
+    remote_slot->name),
+ errdetail("This slot was not found on the primary server."));
+
+ pfree(cmd.data);
+ walrcv_clear_result(res);
+
+ return false;
+ }

We could say 'aborting sync for slot' instead, i.e., remove 'initial'.

Fixed.

5)
I tried a test where there were 4 slots on the publisher, where one
was getting used while the others were not. Initiated
pg_sync_replication_slots on standby. Forced unused slots to be
invalidated by setting idle_replication_slot_timeout=60 on primary,
due to which API finished but gave a warning:

postgres=# SELECT pg_sync_replication_slots();
WARNING: aborting initial sync for slot "failover_slot"
DETAIL: This slot was invalidated on the primary server.
WARNING: aborting initial sync for slot "failover_slot2"
DETAIL: This slot was invalidated on the primary server.
WARNING: aborting initial sync for slot "failover_slot3"
DETAIL: This slot was invalidated on the primary server.
pg_sync_replication_slots
---------------------------

(1 row)

Do we need these warnings here? I think we can emit this as a LOG
rather than showing it on the console. Thoughts?

If we are inclined towards WARNING here, will it be better to have it
as a single line:

WARNING: aborting sync for slot "failover_slot" as the slot was
invalidated on primary
WARNING: aborting sync for slot "failover_slot1" as the slot was
invalidated on primary
WARNING: aborting sync for slot "failover_slot2" as the slot was
invalidated on primary

I've changed it to LOG now.

6)
- * We do not drop the slot because the restart_lsn can be ahead of the
- * current location when recreating the slot in the next cycle. It may
- * take more time to create such a slot. Therefore, we keep this slot
- * and attempt the synchronization in the next cycle.
+ * If we're in the slotsync worker, we retain the slot and retry in the
+ * next cycle. The restart_lsn might advance by then, allowing the slot
+ * to be created successfully later.
*/

I like the previous comment better as it was conveying the side-effect
of dropping the slot here. Can we try to incorporate the previous
comment here and make it specific to slotsync workers?

Reverted to the previous version.

Attaching patch v4 which addresses these comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v4-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patchapplication/octet-stream; name=v4-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patchDownload
From 8df6afc21824e590c056ef9419d72002ba4da114 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Tue, 5 Aug 2025 21:49:03 -0400
Subject: [PATCH v4] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func.sgml                          |   4 +-
 doc/src/sgml/logicaldecoding.sgml               |  40 ++---
 src/backend/replication/logical/slotsync.c      | 226 ++++++++++++++++++++++--
 src/backend/utils/activity/wait_event_names.txt |   1 +
 4 files changed, 229 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 74a16af..4092677 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -30034,9 +30034,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 593f784..edad0e9 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -364,18 +364,23 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
    <sect2 id="logicaldecoding-replication-slots-synchronization">
     <title>Replication Slot Synchronization</title>
     <para>
-     The logical replication slots on the primary can be synchronized to
-     the hot standby by using the <literal>failover</literal> parameter of
+     The logical replication slots on the primary can be enabled for
+     synchronization to the hot standby by using the
+     <literal>failover</literal> parameter of
      <link linkend="pg-create-logical-replication-slot">
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation.
-     Additionally, enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link> on the standby
-     is required. By enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link>
-     on the standby, the failover slots can be synchronized periodically in
+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">
+     <function>pg_sync_replication_slots</function></link>
+     on the standby, or automatically by enabling
+     <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby.
+     When <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> is enabled
+     on the standby, the failover slots are periodically synchronized by
      the slotsync worker. For the synchronization to work, it is mandatory to
      have a physical replication slot between the primary and the standby (i.e.,
      <link linkend="guc-primary-slot-name"><varname>primary_slot_name</varname></link>
@@ -398,25 +403,6 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
-    <note>
-     <para>
-      While enabling <link linkend="guc-sync-replication-slots">
-      <varname>sync_replication_slots</varname></link> allows for automatic
-      periodic synchronization of failover slots, they can also be manually
-      synchronized using the <link linkend="pg-sync-replication-slots">
-      <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot may have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
-     </para>
-    </note>
-
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 2f0c08b..01df8ea 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -146,6 +146,7 @@ typedef struct RemoteSlot
 	ReplicationSlotInvalidationCause invalidated;
 } RemoteSlot;
 
+static void ProcessSlotSyncInterrupts(WalReceiverConn *wrconn);
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
 
@@ -211,7 +212,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 		 * impact the users, so we used DEBUG1 level to log the message.
 		 */
 		ereport(slot->data.persistency == RS_TEMPORARY ? LOG : DEBUG1,
-				errmsg("could not synchronize replication slot \"%s\"",
+				errmsg("initial sync of replication slot \"%s\" failed; will keep retrying",
 					   remote_slot->name),
 				errdetail("Synchronization could lead to data loss, because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the standby has LSN %X/%08X and catalog xmin %u.",
 						  LSN_FORMAT_ARGS(remote_slot->restart_lsn),
@@ -550,6 +551,185 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
 }
 
 /*
+ * Wait for remote slot to pass locally reserved position.
+ *
+ * Return true if remote_slot could catch up with the locally reserved
+ * position. Return false in all other cases.
+ */
+static bool
+wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
+{
+#define SLOT_QUERY_COLUMN_COUNT 4
+
+	StringInfoData cmd;
+	int			   wait_iterations = 0;
+
+	Assert(!AmLogicalSlotSyncWorkerProcess());
+
+	ereport(LOG,
+			errmsg("waiting for remote slot \"%s\" LSN (%X/%X) and catalog xmin"
+				   " (%u) to pass local slot LSN (%X/%X) and catalog xmin (%u)",
+				   remote_slot->name,
+				   LSN_FORMAT_ARGS(remote_slot->restart_lsn),
+				   remote_slot->catalog_xmin,
+				   LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn),
+				   MyReplicationSlot->data.catalog_xmin));
+
+	initStringInfo(&cmd);
+	appendStringInfo(&cmd,
+					 "SELECT invalidation_reason IS NOT NULL, restart_lsn,"
+					 " confirmed_flush_lsn, catalog_xmin"
+					 " FROM pg_catalog.pg_replication_slots"
+					 " WHERE slot_name = %s",
+					 quote_literal_cstr(remote_slot->name));
+
+	for (;;)
+	{
+		bool		new_invalidated;
+		XLogRecPtr	new_restart_lsn;
+		XLogRecPtr	new_confirmed_lsn;
+		TransactionId new_catalog_xmin;
+		WalRcvExecResult *res;
+		TupleTableSlot *tupslot;
+		Datum		d;
+		int			rc;
+		int			col = 0;
+		bool		isnull;
+		Oid			slotRow[SLOT_QUERY_COLUMN_COUNT] = {BOOLOID, LSNOID, LSNOID, XIDOID};
+
+		/* Handle a termination request, if any */
+		ProcessSlotSyncInterrupts(wrconn);
+
+		res = walrcv_exec(wrconn, cmd.data, SLOT_QUERY_COLUMN_COUNT, slotRow);
+
+		if (res->status != WALRCV_OK_TUPLES)
+			ereport(ERROR,
+					errmsg("could not fetch slot \"%s\" info from the"
+						   " primary server: %s",
+						   remote_slot->name, res->err));
+
+		tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
+		if (!tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
+		{
+			ereport(LOG,
+					errmsg("aborting sync for slot \"%s\"",
+						   remote_slot->name),
+					errdetail("This slot was not found on the primary server."));
+
+			pfree(cmd.data);
+			walrcv_clear_result(res);
+
+			return false;
+		}
+
+		/*
+		 * It is possible that the slot was invalidated on the primary, if so
+		 * handle accordingly.
+		 */
+		new_invalidated = DatumGetBool(slot_getattr(tupslot, ++col, &isnull));
+		Assert(!isnull);
+
+		if (new_invalidated)
+		{
+			/*
+			 * The slot won't be persisted by the caller; it will be cleaned
+			 * up at the end of synchronization.
+			 */
+			ereport(WARNING,
+					errmsg("aborting initial sync for slot \"%s\"",
+						   remote_slot->name),
+					errdetail("This slot was invalidated on the primary server."));
+
+			pfree(cmd.data);
+			ExecClearTuple(tupslot);
+			walrcv_clear_result(res);
+
+			return false;
+		}
+
+		/* Any slot with NULL in these fields should not have made it this far */
+		d = slot_getattr(tupslot, ++col, &isnull);
+		Assert(!isnull);
+		new_restart_lsn = DatumGetLSN(d);
+
+		d = slot_getattr(tupslot, ++col, &isnull);
+		Assert(!isnull);
+		new_confirmed_lsn = DatumGetLSN(d);
+
+		d = slot_getattr(tupslot, ++col, &isnull);
+		Assert(!isnull);
+		new_catalog_xmin = DatumGetTransactionId(d);
+
+		ExecClearTuple(tupslot);
+		walrcv_clear_result(res);
+
+		if (new_restart_lsn >= MyReplicationSlot->data.restart_lsn &&
+			TransactionIdFollowsOrEquals(new_catalog_xmin,
+										 MyReplicationSlot->data.catalog_xmin))
+		{
+			/* Update new values in remote_slot */
+			remote_slot->restart_lsn = new_restart_lsn;
+			remote_slot->confirmed_lsn = new_confirmed_lsn;
+			remote_slot->catalog_xmin = new_catalog_xmin;
+
+			ereport(LOG,
+					errmsg("wait over for remote slot \"%s\" as its LSN (%X/%X)"
+						   " and catalog xmin (%u) has now passed local slot LSN"
+						   " (%X/%X) and catalog xmin (%u)",
+						   remote_slot->name,
+						   LSN_FORMAT_ARGS(new_restart_lsn),
+						   new_catalog_xmin,
+						   LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn),
+						   MyReplicationSlot->data.catalog_xmin));
+
+			pfree(cmd.data);
+
+			return true;
+		}
+
+		/*
+		 * If we've been promoted, then no point continuing.
+		 */
+		if (SlotSyncCtx->stopSignaled)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("exiting from slot synchronization as"
+							" promotion is triggered")));
+			pfree(cmd.data);
+
+			return false;
+		}
+
+		/*
+		 * XXX: Is waiting for 2 seconds before retrying enough or more or
+		 * less?
+		 */
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					   2000L,
+					   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
+
+		if (rc & WL_LATCH_SET)
+			ResetLatch(MyLatch);
+
+		/* log a message every ten seconds */
+		wait_iterations++;
+		if (wait_iterations % 5 == 0)
+		{
+			ereport(LOG,
+					errmsg("continuing to wait for remote slot \"%s\" LSN (%X/%X) and catalog xmin"
+						   " (%u) to pass local slot LSN (%X/%X) and catalog xmin (%u)",
+						   remote_slot->name,
+						   LSN_FORMAT_ARGS(remote_slot->restart_lsn),
+						   remote_slot->catalog_xmin,
+						   LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn),
+						   MyReplicationSlot->data.catalog_xmin));
+		}
+	}
+}
+
+/*
  * If the remote restart_lsn and catalog_xmin have caught up with the
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
@@ -558,7 +738,8 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(WalReceiverConn *wrconn,
+									 RemoteSlot *remote_slot, Oid remote_dbid)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -577,12 +758,30 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * If we're in the slotsync worker, we do not drop the slot because the
+		 * restart_lsn can be ahead of the current location when recreating
+		 * the slot in the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the synchronization
+		 * in the next cycle.
 		 */
-		return false;
+		if (AmLogicalSlotSyncWorkerProcess())
+			return false;
+
+		/*
+		 * For SQL API synchronization, we wait for the remote slot to catch up
+		 * here, since we can't assume the SQL API will be called again soon.
+		 * We will retry the sync once the slot catches up.
+		 *
+		 * Note: This will return false if a promotion is triggered on the
+		 * standby while waiting, in which case we stop syncing and drop the
+		 * temporary slot.
+		 */
+		if (!wait_for_primary_slot_catchup(wrconn, remote_slot))
+			return false;
+		else
+			update_local_synced_slot(remote_slot, remote_dbid,
+									 &found_consistent_snapshot,
+									 &remote_slot_precedes);
 	}
 
 	/*
@@ -622,7 +821,8 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot,
+					 Oid remote_dbid)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +915,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/* Slot not ready yet, let's attempt to make it sync-ready now. */
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
-			slot_updated = update_and_persist_local_synced_slot(remote_slot,
+			slot_updated = update_and_persist_local_synced_slot(wrconn,
+																remote_slot,
 																remote_dbid);
 		}
 
@@ -785,7 +986,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(wrconn, remote_slot, remote_dbid);
 
 		slot_updated = true;
 	}
@@ -927,7 +1128,8 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(wrconn, remote_slot,
+												  remote_dbid);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
@@ -1131,7 +1333,7 @@ slotsync_reread_config(void)
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
 
-	Assert(sync_replication_slots);
+	Assert(!AmLogicalSlotSyncWorkerProcess() || sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0be307d..9fa36ab 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -64,6 +64,7 @@ LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication paralle
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
 REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
1.8.3.1

#27Ajin Cherian
itsajin@gmail.com
In reply to: Amit Kapila (#25)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Aug 5, 2025 at 4:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 5, 2025 at 9:28 AM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, Aug 4, 2025 at 3:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 4, 2025 at 12:19 PM shveta malik <shveta.malik@gmail.com> wrote:

If we want to avoid continuously syncing newly added slots in later
cycles and instead focus only on the ones that failed to sync during
the first attempt, one approach is to maintain a list of failed slots
from the initial cycle and only retry those in subsequent attempts.
But this will add complexity to the implementation.

There will be some additional code for this but overall it improves
the code in the lower level functions. We may want to use the existing
remote_slot list for this purpose.

The current proposed change in low-level functions appears to be
difficult to maintain, especially the change proposed in
update_and_persist_local_synced_slot(). If we can find a better way to
achieve the same then we can consider the current approach as well.

Next patch, I'll work on addressing this comment. I'll need to
restructure the code to make this happen.

regards,
Ajin Cherian
Fujitsu Australia

#28shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#27)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Aug 6, 2025 at 7:35 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Tue, Aug 5, 2025 at 4:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 5, 2025 at 9:28 AM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, Aug 4, 2025 at 3:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 4, 2025 at 12:19 PM shveta malik <shveta.malik@gmail.com> wrote:

If we want to avoid continuously syncing newly added slots in later
cycles and instead focus only on the ones that failed to sync during
the first attempt, one approach is to maintain a list of failed slots
from the initial cycle and only retry those in subsequent attempts.
But this will add complexity to the implementation.

There will be some additional code for this but overall it improves
the code in the lower level functions. We may want to use the existing
remote_slot list for this purpose.

The current proposed change in low-level functions appears to be
difficult to maintain, especially the change proposed in
update_and_persist_local_synced_slot(). If we can find a better way to
achieve the same then we can consider the current approach as well.

Next patch, I'll work on addressing this comment. I'll need to
restructure the code to make this happen.

Okay, thanks Ajin. I will resume review after this comment is
addressed as I am assuming that the new logic will get rid of most of
the current wait logic and thus it makes sense to review it after it
is addressed.

thanks
Shveta

#29Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: shveta malik (#28)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Aug 6, 2025 at 8:48 AM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Aug 6, 2025 at 7:35 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Tue, Aug 5, 2025 at 4:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 5, 2025 at 9:28 AM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, Aug 4, 2025 at 3:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 4, 2025 at 12:19 PM shveta malik <shveta.malik@gmail.com> wrote:

If we want to avoid continuously syncing newly added slots in later
cycles and instead focus only on the ones that failed to sync during
the first attempt, one approach is to maintain a list of failed slots
from the initial cycle and only retry those in subsequent attempts.
But this will add complexity to the implementation.

There will be some additional code for this but overall it improves
the code in the lower level functions. We may want to use the existing
remote_slot list for this purpose.

The current proposed change in low-level functions appears to be
difficult to maintain, especially the change proposed in
update_and_persist_local_synced_slot(). If we can find a better way to
achieve the same then we can consider the current approach as well.

Next patch, I'll work on addressing this comment. I'll need to
restructure the code to make this happen.

Okay, thanks Ajin. I will resume review after this comment is
addressed as I am assuming that the new logic will get rid of most of
the current wait logic and thus it makes sense to review it after it
is addressed.

There's also a minor merge conflict because func.sgml has since been
split into multiple files.

--
Best Wishes,
Ashutosh Bapat

#30Ajin Cherian
itsajin@gmail.com
In reply to: Amit Kapila (#25)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Aug 5, 2025 at 4:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 5, 2025 at 9:28 AM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, Aug 4, 2025 at 3:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 4, 2025 at 12:19 PM shveta malik <shveta.malik@gmail.com> wrote:

If we want to avoid continuously syncing newly added slots in later
cycles and instead focus only on the ones that failed to sync during
the first attempt, one approach is to maintain a list of failed slots
from the initial cycle and only retry those in subsequent attempts.
But this will add complexity to the implementation.

There will be some additional code for this but overall it improves
the code in the lower level functions. We may want to use the existing
remote_slot list for this purpose.

The current proposed change in low-level functions appears to be
difficult to maintain, especially the change proposed in
update_and_persist_local_synced_slot(). If we can find a better way to
achieve the same then we can consider the current approach as well.

Right. I've reworked the design to have the wait at a much lower
level. I've also used a single WAIT EVENT -
REPLICATION_SLOTSYNC_PRIMARY_CATCHUP for both the slotsync worker and
the sync API.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v5-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patchapplication/octet-stream; name=v5-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patchDownload
From b11c33159de217d21c188cfa18af0399e1277e0d Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Mon, 11 Aug 2025 03:44:55 -0400
Subject: [PATCH v5] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml               |   4 +-
 doc/src/sgml/logicaldecoding.sgml               |  40 +--
 src/backend/replication/logical/slotsync.c      | 437 ++++++++++++++++++++----
 src/backend/utils/activity/wait_event_names.txt |   2 +-
 4 files changed, 383 insertions(+), 100 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 446fdfe..3608610 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1478,9 +1478,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 77c720c..6e4251a 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -364,18 +364,23 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
    <sect2 id="logicaldecoding-replication-slots-synchronization">
     <title>Replication Slot Synchronization</title>
     <para>
-     The logical replication slots on the primary can be synchronized to
-     the hot standby by using the <literal>failover</literal> parameter of
+     The logical replication slots on the primary can be enabled for
+     synchronization to the hot standby by using the
+     <literal>failover</literal> parameter of
      <link linkend="pg-create-logical-replication-slot">
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation.
-     Additionally, enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link> on the standby
-     is required. By enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link>
-     on the standby, the failover slots can be synchronized periodically in
+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">
+     <function>pg_sync_replication_slots</function></link>
+     on the standby, or automatically by enabling
+     <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby.
+     When <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> is enabled
+     on the standby, the failover slots are periodically synchronized by
      the slotsync worker. For the synchronization to work, it is mandatory to
      have a physical replication slot between the primary and the standby (i.e.,
      <link linkend="guc-primary-slot-name"><varname>primary_slot_name</varname></link>
@@ -398,25 +403,6 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
-    <note>
-     <para>
-      While enabling <link linkend="guc-sync-replication-slots">
-      <varname>sync_replication_slots</varname></link> allows for automatic
-      periodic synchronization of failover slots, they can also be manually
-      synchronized using the <link linkend="pg-sync-replication-slots">
-      <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
-     </para>
-    </note>
-
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 3773844..f9eec0b 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -113,6 +113,7 @@ bool		sync_replication_slots = false;
  */
 #define MIN_SLOTSYNC_WORKER_NAPTIME_MS  200
 #define MAX_SLOTSYNC_WORKER_NAPTIME_MS  30000	/* 30s */
+#define SLOTSYNC_API_NAPTIME_MS         2000	/* 2s */
 
 static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
 
@@ -146,6 +147,7 @@ typedef struct RemoteSlot
 	ReplicationSlotInvalidationCause invalidated;
 } RemoteSlot;
 
+static void ProcessSlotSyncInterrupts(WalReceiverConn *wrconn);
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
 
@@ -166,7 +168,8 @@ static void update_synced_slots_inactive_since(void);
 static bool
 update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 						 bool *found_consistent_snapshot,
-						 bool *remote_slot_precedes)
+						 bool *remote_slot_precedes,
+						 int   sync_iterations)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		updated_xmin_or_lsn = false;
@@ -209,15 +212,21 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 		 * to understand why the slot is not sync-ready. In the case of a
 		 * persistent slot, it would be a more common case and won't directly
 		 * impact the users, so we used DEBUG1 level to log the message.
+		 *
+		 * If called from pg_sync_replication_slots(), log message only for
+		 * the first iteration.
 		 */
-		ereport(slot->data.persistency == RS_TEMPORARY ? LOG : DEBUG1,
-				errmsg("could not synchronize replication slot \"%s\"",
+		if (AmLogicalSlotSyncWorkerProcess() || sync_iterations == 1)
+		{
+			ereport(slot->data.persistency == RS_TEMPORARY ? LOG : DEBUG1,
+				errmsg("replication slot \"%s\" is not sync ready; will keep retrying",
 					   remote_slot->name),
-				errdetail("Synchronization could lead to data loss, because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the standby has LSN %X/%08X and catalog xmin %u.",
-						  LSN_FORMAT_ARGS(remote_slot->restart_lsn),
-						  remote_slot->catalog_xmin,
-						  LSN_FORMAT_ARGS(slot->data.restart_lsn),
-						  slot->data.catalog_xmin));
+				errdetail("Attempting synchronization could lead to data loss, because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the standby has LSN %X/%08X and catalog xmin %u.",
+				LSN_FORMAT_ARGS(remote_slot->restart_lsn),
+				remote_slot->catalog_xmin,
+				LSN_FORMAT_ARGS(slot->data.restart_lsn),
+				slot->data.catalog_xmin));
+		}
 
 		if (remote_slot_precedes)
 			*remote_slot_precedes = true;
@@ -558,7 +567,9 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(WalReceiverConn * wrconn,
+	RemoteSlot * remote_slot, Oid remote_dbid, bool *sync_start_pending,
+	int sync_iterations)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -566,7 +577,8 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 
 	(void) update_local_synced_slot(remote_slot, remote_dbid,
 									&found_consistent_snapshot,
-									&remote_slot_precedes);
+									&remote_slot_precedes,
+									sync_iterations);
 
 	/*
 	 * Check if the primary server has caught up. Refer to the comment atop
@@ -575,13 +587,40 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 	if (remote_slot_precedes)
 	{
 		/*
-		 * The remote slot didn't catch up to locally reserved position.
+		 * The remote slot didn't catch up to locally reserved
+		 * position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle.
+		 *
+		 * If called from pg_sync_replication_slots(), set flag
+		 * indicating that the slot is not yet sync ready, so that it
+		 * can be retried. Log a message once every 5 iterations,
+		 * which should be around 10 seconds.
 		 */
+		if (!AmLogicalSlotSyncWorkerProcess())
+		{
+			if (sync_start_pending)
+				*sync_start_pending = true;
+
+			if (sync_iterations % 5 == 0)
+			{
+				/* Log a message every ten seconds */
+				ereport(LOG,
+						errmsg("waiting for remote slot \"%s\" LSN (%X/%X)"
+						" and catalog xmin (%u) to pass local slot LSN"
+						" (%X/%X) and catalog xmin (%u)",
+						remote_slot->name,
+						LSN_FORMAT_ARGS(remote_slot->restart_lsn),
+						remote_slot->catalog_xmin,
+						LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn),
+						MyReplicationSlot->data.catalog_xmin));
+			}
+		}
+
 		return false;
 	}
 
@@ -622,7 +661,8 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(WalReceiverConn * wrconn, RemoteSlot * remote_slot,
+			Oid remote_dbid, bool *sync_start_pending, int sync_iterations)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,8 +755,11 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/* Slot not ready yet, let's attempt to make it sync-ready now. */
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
-			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+			slot_updated = update_and_persist_local_synced_slot(wrconn,
+																remote_slot,
+																remote_dbid,
+																sync_start_pending,
+																sync_iterations);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -738,7 +781,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 										   LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)));
 
 			slot_updated = update_local_synced_slot(remote_slot, remote_dbid,
-													NULL, NULL);
+													NULL, NULL, sync_iterations);
 		}
 	}
 	/* Otherwise create the slot first. */
@@ -785,7 +828,9 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(wrconn, remote_slot, remote_dbid,
+											 sync_start_pending,
+											 sync_iterations);
 
 		slot_updated = true;
 	}
@@ -796,15 +841,17 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * Gets the failover logical slots info from the primary server and creates
+ * a list of remote slots that need to be synchronized locally.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ * NOTE: Caller must ensure a transaction is active before calling this function.
+ *
+ * Returns a list of RemoteSlot structures, or NIL if no slots need syncing.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -813,21 +860,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
 	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
 		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
 		" database, invalidation_reason"
 		" FROM pg_catalog.pg_replication_slots"
 		" WHERE failover and NOT temporary";
 
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
-	{
-		StartTransactionCommand();
-		started_tx = true;
-	}
-
 	/* Execute the query */
 	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
 	if (res->status != WALRCV_OK_TUPLES)
@@ -835,7 +873,7 @@ synchronize_slots(WalReceiverConn *wrconn)
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
+	/* Construct the remote_slot tuple and build list of slots to sync */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -912,6 +950,180 @@ synchronize_slots(WalReceiverConn *wrconn)
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+
+	return remote_slot_list;
+}
+
+/*
+ * Update remote slots list with current values.
+ *
+ * Takes a list of RemoteSlot structures and queries the primary server to
+ * get updated values for those specific slots. This is useful for refreshing
+ * slot information without fetching all failover slots again.
+ *
+ * NOTE: Caller must ensure a transaction is active before calling this
+ * function.
+ *
+ * Parameters: wrconn - connection to the primary server;
+ *             remote_slot_list - list of RemoteSlot structures to update
+ *
+ * Returns the updated list, or the original list if the query fails. Slots
+ * that no longer exist on the primary will be removed from the list.
+ */
+static List *
+refresh_remote_slots(WalReceiverConn * wrconn, List * remote_slot_list)
+{
+#define UPDATE_SLOTSYNC_COLUMN_COUNT 10
+	Oid		slotRow[UPDATE_SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
+	LSNOID, XIDOID, BOOLOID, LSNOID, BOOLOID, TEXTOID, TEXTOID};
+	WalRcvExecResult   *res;
+	TupleTableSlot	   *tupslot;
+	List			   *updated_slot_list = NIL;
+	StringInfoData		query;
+	ListCell		   *lc;
+	bool				first_slot = true;
+
+	/* If the input list is empty, return it as-is */
+	if (remote_slot_list == NIL)
+		return remote_slot_list;
+
+	/* Build query with slot names from the input list */
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary AND slot_name IN (");
+
+	/* Add slot names to the IN clause */
+	foreach(lc, remote_slot_list)
+	{
+		RemoteSlot     *remote_slot = (RemoteSlot *) lfirst(lc);
+
+		if (!first_slot)
+			appendStringInfoString(&query, ", ");
+
+		appendStringInfo(&query, "'%s'", remote_slot->name);
+		first_slot = false;
+	}
+	appendStringInfoString(&query, ")");
+
+	/* Execute the query */
+	res = walrcv_exec(wrconn, query.data, UPDATE_SLOTSYNC_COLUMN_COUNT, slotRow);
+	if (res->status != WALRCV_OK_TUPLES)
+	{
+		ereport(WARNING,
+		errmsg("could not fetch updated failover logical slots info"
+			   " from the primary server: %s",
+			   res->err));
+		pfree(query.data);
+		return remote_slot_list;
+	}
+
+	/* Process the updated slot information */
+	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
+	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
+	{
+		bool		isnull;
+		RemoteSlot     *remote_slot = palloc0(sizeof(RemoteSlot));
+		Datum		d;
+		int		col = 0;
+
+		remote_slot->name = TextDatumGetCString(slot_getattr(tupslot, ++col,
+								  &isnull));
+		Assert(!isnull);
+
+		remote_slot->plugin = TextDatumGetCString(slot_getattr(tupslot, ++col,
+								  &isnull));
+		Assert(!isnull);
+
+		/*
+		 * Handle possible null values for LSN and Xmin if slot is
+		 * invalidated on the primary server.
+		 */
+		d = slot_getattr(tupslot, ++col, &isnull);
+		remote_slot->confirmed_lsn = isnull ? InvalidXLogRecPtr :
+			DatumGetLSN(d);
+
+		d = slot_getattr(tupslot, ++col, &isnull);
+		remote_slot->restart_lsn = isnull ? InvalidXLogRecPtr : DatumGetLSN(d);
+
+		d = slot_getattr(tupslot, ++col, &isnull);
+		remote_slot->catalog_xmin = isnull ? InvalidTransactionId :
+			DatumGetTransactionId(d);
+
+		remote_slot->two_phase = DatumGetBool(slot_getattr(tupslot, ++col,
+								   &isnull));
+		Assert(!isnull);
+
+		d = slot_getattr(tupslot, ++col, &isnull);
+		remote_slot->two_phase_at = isnull ? InvalidXLogRecPtr : DatumGetLSN(d);
+
+		remote_slot->failover = DatumGetBool(slot_getattr(tupslot, ++col,
+								  &isnull));
+		Assert(!isnull);
+
+		remote_slot->database = TextDatumGetCString(slot_getattr(tupslot,
+							   ++col, &isnull));
+		Assert(!isnull);
+
+		d = slot_getattr(tupslot, ++col, &isnull);
+		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
+			GetSlotInvalidationCause(TextDatumGetCString(d));
+
+		/* Sanity check */
+		Assert(col == UPDATE_SLOTSYNC_COLUMN_COUNT);
+
+		/*
+		 * Apply the same ephemeral slot filtering as in
+		 * fetch_remote_slots. Skip slots that are in RS_EPHEMERAL
+		 * state (invalid LSNs/xmin but not explicitly invalidated).
+		 */
+		if ((XLogRecPtrIsInvalid(remote_slot->restart_lsn) ||
+			 XLogRecPtrIsInvalid(remote_slot->confirmed_lsn) ||
+			 !TransactionIdIsValid(remote_slot->catalog_xmin)) &&
+			 remote_slot->invalidated == RS_INVAL_NONE)
+			pfree(remote_slot);
+		else
+			/* Add to updated list */
+			updated_slot_list = lappend(updated_slot_list, remote_slot);
+
+		ExecClearTuple(tupslot);
+	}
+
+	walrcv_clear_result(res);
+	pfree(query.data);
+
+	/*
+	 * Free the original list structures (but not the slot names, as
+	 * they're reused)
+	 */
+	foreach(lc, remote_slot_list)
+	{
+		RemoteSlot     *old_slot = (RemoteSlot *) lfirst(lc);
+		pfree(old_slot);
+	}
+	list_free(remote_slot_list);
+
+	return updated_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  List **pending_sync_start_slots, int sync_iterations)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -919,6 +1131,7 @@ synchronize_slots(WalReceiverConn *wrconn)
 	foreach_ptr(RemoteSlot, remote_slot, remote_slot_list)
 	{
 		Oid			remote_dbid = get_database_oid(remote_slot->database, false);
+		bool		sync_start_pending = false;
 
 		/*
 		 * Use shared lock to prevent a conflict with
@@ -927,19 +1140,16 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(wrconn, remote_slot,
+					  remote_dbid, &sync_start_pending, sync_iterations);
+
+		/* Only append to list if caller wants it and sync is pending */
+		if (pending_sync_start_slots != NULL && sync_start_pending)
+			*pending_sync_start_slots = lappend(*pending_sync_start_slots, remote_slot);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1131,7 +1341,7 @@ slotsync_reread_config(void)
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
 
-	Assert(sync_replication_slots);
+	Assert(!AmLogicalSlotSyncWorkerProcess() || sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
@@ -1252,31 +1462,38 @@ slotsync_worker_onexit(int code, Datum arg)
  * sync-cycles is reset to the minimum (200ms).
  */
 static void
-wait_for_slot_activity(bool some_slot_updated)
+wait_for_slot_activity(bool some_slot_updated, bool called_from_api)
 {
-	int			rc;
+	int		rc;
+	int		wait_time;
 
-	if (!some_slot_updated)
-	{
-		/*
-		 * No slots were updated, so double the sleep time, but not beyond the
-		 * maximum allowable value.
-		 */
-		sleep_ms = Min(sleep_ms * 2, MAX_SLOTSYNC_WORKER_NAPTIME_MS);
-	}
-	else
-	{
+	if (called_from_api) {
 		/*
-		 * Some slots were updated since the last sleep, so reset the sleep
-		 * time.
+		 * When called from pg_sync_replication_slots, use a fixed 2
+		 * second wait time.
 		 */
-		sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
+		wait_time = SLOTSYNC_API_NAPTIME_MS;
+	} else {
+		if (!some_slot_updated) {
+			/*
+			 * No slots were updated, so double the sleep time,
+			 * but not beyond the maximum allowable value.
+			 */
+			sleep_ms = Min(sleep_ms * 2, MAX_SLOTSYNC_WORKER_NAPTIME_MS);
+		} else {
+			/*
+			 * Some slots were updated since the last sleep, so
+			 * reset the sleep time.
+			 */
+			sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
+		}
+		wait_time = sleep_ms;
 	}
 
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   wait_time,
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1505,12 +1722,28 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		List	       *remote_slots;
+		bool		started_tx = false;
 
 		ProcessSlotSyncInterrupts(wrconn);
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState()) {
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL, 0);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
-		wait_for_slot_activity(some_slot_updated);
+		wait_for_slot_activity(some_slot_updated, false);
 	}
 
 	/*
@@ -1736,19 +1969,85 @@ slotsync_failure_callback(int code, Datum arg)
 }
 
 /*
- * Synchronize the failover enabled replication slots using the specified
- * primary server connection.
+ * Synchronize failover enabled replication slots using the specified primary
+ * server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready". Retries happen after a
+ * 2 second wait, and the loop exits early if promotion is triggered.
  */
 void
-SyncReplicationSlots(WalReceiverConn *wrconn)
+SyncReplicationSlots(WalReceiverConn * wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
+		List	       *remote_slots;
+		bool		started_tx = false;
+		int			sync_iterations = 0;
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState()) {
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn);
+
+		/* Retry until all slots are at least sync-ready */
+		for (;;)
+		{
+			bool		some_slot_updated = false;
+			List	       *pending_sync_start_slots = NIL;
+
+			sync_iterations++;
+
+			/* Refresh remote slot data */
+			remote_slots = refresh_remote_slots(wrconn, remote_slots);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+						 &pending_sync_start_slots, sync_iterations);
+
+			/* Done if all slots are at least sync-ready */
+			if (pending_sync_start_slots == NIL)
+				break;
+			else
+			{
+				list_free(pending_sync_start_slots);
+				pending_sync_start_slots = NIL;
+
+				/* wait for 2 seconds before retrying */
+				wait_for_slot_activity(some_slot_updated, true);
+
+				/*
+				 * If we've been promoted, then no point
+				 * continuing.
+				 */
+				if (SlotSyncCtx->stopSignaled)
+				{
+					ereport(ERROR,
+						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+						 errmsg("exiting from slot synchronization as"
+								" promotion is triggered")));
+					break;
+				}
+
+				/* Handle any pending termination request */
+				ProcessSlotSyncInterrupts(wrconn);
+			}
+		}
+
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0be307d..3497f0f 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
1.8.3.1

#31Ajin Cherian
itsajin@gmail.com
In reply to: Ashutosh Bapat (#29)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Aug 8, 2025 at 11:22 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

There's also a minor merge conflict because func.sgml is not split
into multiple files.

Yes, I fixed this.

regards,
Ajin Cherian
Fujitsu Australia

#32shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#31)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Mon, Aug 11, 2025 at 1:37 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Aug 8, 2025 at 11:22 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

There's also a minor merge conflict because func.sgml is not split
into multiple files.

Yes, I fixed this.

Thanks for the patch. Please find a few comments:

1)
We can merge refresh_remote_slots and fetch_remote_slots by passing an
argument of remote_list. If no remote_list is passed, fetch all failover
slots; else extend the query and fetch only the listed ones.
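
Roughly, the merged function could build its query like this sketch
('target_slot_list' being the proposed new argument; the rest of the
existing function stays as it is):

	StringInfoData query;

	initStringInfo(&query);
	appendStringInfoString(&query,
						   "SELECT slot_name, plugin, confirmed_flush_lsn,"
						   " restart_lsn, catalog_xmin, two_phase, two_phase_at,"
						   " failover, database, invalidation_reason"
						   " FROM pg_catalog.pg_replication_slots"
						   " WHERE failover and NOT temporary");

	/* Refresh mode: restrict the query to the slots we already track. */
	if (target_slot_list != NIL)
	{
		bool		first_slot = true;

		appendStringInfoString(&query, " AND slot_name IN (");
		foreach_ptr(RemoteSlot, remote_slot, target_slot_list)
		{
			if (!first_slot)
				appendStringInfoString(&query, ", ");
			appendStringInfo(&query, "'%s'", remote_slot->name);
			first_slot = false;
		}
		appendStringInfoString(&query, ")");
	}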

2)
We can get rid of 'sync_iterations' and the logic within, as I think
there is no need to distinguish between slotsync and API in terms of
logs.

3)
sync_start_pending does not need to be passed to
update_and_persist_local_synced_slot(), as the output of this function
is enough to tell whether the slot is persisted or not.

4)
Also how about having sync-pending in SlotSyncCtxStruct. It can be set
unconditionally by both slotsync and API, but will be used by API. I
think it can simplify the code.

5)
We can get rid of 'pending_sync_start_slots', as it is not being used anywhere.

6)
Also, we can mention in the comments why we are using the old
remote_slots list in refresh_remote_slots() during subsequent cycles
of the API rather than using only the pending-slot list.

thanks
Shveta

#33Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#32)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Aug 13, 2025 at 2:47 PM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, Aug 11, 2025 at 1:37 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Aug 8, 2025 at 11:22 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

There's also a minor merge conflict because func.sgml is not split
into multiple files.

Yes, I fixed this.

Thanks for the patch. Please find a few comments:

1)
We can merge refresh_remote_slots and fetch_remote_slots by passing an
argument of remote_list. If no remote_list passed, fetch all failover
slots, else extend the query and fetch only the listed ones.

Done.

2)
We can get rid of 'sync_iterations' and the logic within, as I think
there is no need to distinguish between slotsync and API in terms of
logs.

Done.

3)
sync_start_pending does not need to be passed to
update_and_persist_local_synced_slot(), as the output of this function
is enough to tell whether the slot is persisted or not.

4)
Also how about having sync-pending in SlotSyncCtxStruct. It can be set
unconditionally by both slotsync and API, but will be used by API. I
think it can simplify the code.

Done.

5)
We can get rid of 'pending_sync_start_slots', as it is not being used anywhere.

Fixed.

6)
Also, we can mention in the comments why we are using the old
remote_slots list in refresh_remote_slots() during subsequent cycles
of the API rather than using only the pending-slot list.

Done.

Patch v6 attached.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v6-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patchapplication/octet-stream; name=v6-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patchDownload
From 2e9b2d343f69c9d80d3a37249283b8ba9632b1d5 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Tue, 12 Aug 2025 23:02:37 -0400
Subject: [PATCH v6] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml               |   4 +-
 doc/src/sgml/logicaldecoding.sgml               |  40 +--
 src/backend/replication/logical/slotsync.c      | 334 ++++++++++++++++++------
 src/backend/utils/activity/wait_event_names.txt |   2 +-
 4 files changed, 262 insertions(+), 118 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 446fdfe..3608610 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1478,9 +1478,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 77c720c..6e4251a 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -364,18 +364,23 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
    <sect2 id="logicaldecoding-replication-slots-synchronization">
     <title>Replication Slot Synchronization</title>
     <para>
-     The logical replication slots on the primary can be synchronized to
-     the hot standby by using the <literal>failover</literal> parameter of
+     The logical replication slots on the primary can be enabled for
+     synchronization to the hot standby by using the
+     <literal>failover</literal> parameter of
      <link linkend="pg-create-logical-replication-slot">
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation.
-     Additionally, enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link> on the standby
-     is required. By enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link>
-     on the standby, the failover slots can be synchronized periodically in
+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">
+     <function>pg_sync_replication_slots</function></link>
+     on the standby, or automatically by enabling
+     <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby.
+     When <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> is enabled
+     on the standby, the failover slots are periodically synchronized by
      the slotsync worker. For the synchronization to work, it is mandatory to
      have a physical replication slot between the primary and the standby (i.e.,
      <link linkend="guc-primary-slot-name"><varname>primary_slot_name</varname></link>
@@ -398,25 +403,6 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
-    <note>
-     <para>
-      While enabling <link linkend="guc-sync-replication-slots">
-      <varname>sync_replication_slots</varname></link> allows for automatic
-      periodic synchronization of failover slots, they can also be manually
-      synchronized using the <link linkend="pg-sync-replication-slots">
-      <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
-     </para>
-    </note>
-
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 3773844..09833aa 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -99,6 +99,8 @@ typedef struct SlotSyncCtxStruct
 	bool		syncing;
 	time_t		last_start_time;
 	slock_t		mutex;
+	/* used by pg_sync_replication_slots() API only */
+	bool		slot_not_persisted;
 } SlotSyncCtxStruct;
 
 static SlotSyncCtxStruct *SlotSyncCtx = NULL;
@@ -113,6 +115,7 @@ bool		sync_replication_slots = false;
  */
 #define MIN_SLOTSYNC_WORKER_NAPTIME_MS  200
 #define MAX_SLOTSYNC_WORKER_NAPTIME_MS  30000	/* 30s */
+#define SLOTSYNC_API_NAPTIME_MS         2000	/* 2s */
 
 static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
 
@@ -146,6 +149,7 @@ typedef struct RemoteSlot
 	ReplicationSlotInvalidationCause invalidated;
 } RemoteSlot;
 
+static void ProcessSlotSyncInterrupts(WalReceiverConn *wrconn);
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
 
@@ -211,13 +215,13 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 		 * impact the users, so we used DEBUG1 level to log the message.
 		 */
 		ereport(slot->data.persistency == RS_TEMPORARY ? LOG : DEBUG1,
-				errmsg("could not synchronize replication slot \"%s\"",
-					   remote_slot->name),
-				errdetail("Synchronization could lead to data loss, because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the standby has LSN %X/%08X and catalog xmin %u.",
-						  LSN_FORMAT_ARGS(remote_slot->restart_lsn),
-						  remote_slot->catalog_xmin,
-						  LSN_FORMAT_ARGS(slot->data.restart_lsn),
-						  slot->data.catalog_xmin));
+			errmsg("Replication slot \"%s\" is not sync ready; will keep retrying",
+				   remote_slot->name),
+			errdetail("Attempting Synchronization could lead to data loss, because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the standby has LSN %X/%08X and catalog xmin %u.",
+			LSN_FORMAT_ARGS(remote_slot->restart_lsn),
+			remote_slot->catalog_xmin,
+			LSN_FORMAT_ARGS(slot->data.restart_lsn),
+			slot->data.catalog_xmin));
 
 		if (remote_slot_precedes)
 			*remote_slot_precedes = true;
@@ -558,7 +562,8 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(WalReceiverConn * wrconn,
+	RemoteSlot * remote_slot, Oid remote_dbid)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -575,13 +580,18 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 	if (remote_slot_precedes)
 	{
 		/*
-		 * The remote slot didn't catch up to locally reserved position.
+		 * The remote slot didn't catch up to locally reserved
+		 * position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle. Update flag, so that
+		 * synchronization in the next cycle. Update the flag, so that
+		 * the API logic can retry.
+		SlotSyncCtx->slot_not_persisted = true;
+
 		return false;
 	}
 
@@ -596,11 +606,17 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* update flag, so that we retry */
+		SlotSyncCtx->slot_not_persisted = true;
+
 		return false;
 	}
 
 	ReplicationSlotPersist();
 
+	/* slot has been persisted, no need to retry */
+	SlotSyncCtx->slot_not_persisted = false;
+
 	ereport(LOG,
 			errmsg("newly created replication slot \"%s\" is sync-ready now",
 				   remote_slot->name));
@@ -622,7 +638,8 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(WalReceiverConn * wrconn, RemoteSlot * remote_slot,
+					Oid remote_dbid)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +732,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/* Slot not ready yet, let's attempt to make it sync-ready now. */
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
-			slot_updated = update_and_persist_local_synced_slot(remote_slot,
+			slot_updated = update_and_persist_local_synced_slot(wrconn,
+																remote_slot,
 																remote_dbid);
 		}
 
@@ -785,7 +803,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(wrconn, remote_slot, remote_dbid);
 
 		slot_updated = true;
 	}
@@ -796,46 +814,87 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch or refresh remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If remote_slot_list is NIL, fetches all failover logical slots from the
+ * primary server. If remote_slot_list is provided, refreshes only those
+ * specific slots with current values from the primary server.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ * NOTE: Caller must ensure a transaction is active before calling this
+ * function.
+ *
+ * Parameters:
+ *   wrconn - Connection to the primary server
+ *   remote_slot_list - List of RemoteSlot structures to refresh, or NIL to
+ *                      fetch all failover slots
+ *
+ * Returns a list of RemoteSlot structures. If refreshing and the query fails,
+ * returns the original list. Slots that no longer exist on the primary will
+ * be removed from the list.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_or_refresh_remote_slots(WalReceiverConn *wrconn, List *remote_slot_list)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
 	LSNOID, XIDOID, BOOLOID, LSNOID, BOOLOID, TEXTOID, TEXTOID};
-
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
-	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	List	   *updated_slot_list = NIL;
+	StringInfoData query;
+	ListCell   *lc;
+	bool		is_refresh = (remote_slot_list != NIL);
+	bool		first_slot = true;
+
+	/* Build the query based on whether we're fetching all or refreshing specific slots */
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+						   "SELECT slot_name, plugin, confirmed_flush_lsn,"
+						   " restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
+						   " database, invalidation_reason"
+						   " FROM pg_catalog.pg_replication_slots"
+						   " WHERE failover and NOT temporary");
+
+	if (is_refresh)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		/* Add IN clause for specific slot names */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach(lc, remote_slot_list)
+		{
+			RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", remote_slot->name);
+			first_slot = false;
+		}
+		appendStringInfoString(&query, ")");
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
 	if (res->status != WALRCV_OK_TUPLES)
-		ereport(ERROR,
-				errmsg("could not fetch failover logical slots info from the primary server: %s",
-					   res->err));
+	{
+		if (is_refresh)
+		{
+			ereport(WARNING,
+					errmsg("could not fetch updated failover logical slots info"
+						   " from the primary server: %s",
+						   res->err));
+			pfree(query.data);
+			return remote_slot_list; /* Return original list on refresh failure */
+		}
+		else
+		{
+			ereport(ERROR,
+					errmsg("could not fetch failover logical slots info from the primary server: %s",
+						   res->err));
+		}
+	}
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
+	/* Process the slot information */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -853,8 +912,8 @@ synchronize_slots(WalReceiverConn *wrconn)
 		Assert(!isnull);
 
 		/*
-		 * It is possible to get null values for LSN and Xmin if slot is
-		 * invalidated on the primary server, so handle accordingly.
+		 * Handle possible null values for LSN and Xmin if slot is
+		 * invalidated on the primary server.
 		 */
 		d = slot_getattr(tupslot, ++col, &isnull);
 		remote_slot->confirmed_lsn = isnull ? InvalidXLogRecPtr :
@@ -890,15 +949,8 @@ synchronize_slots(WalReceiverConn *wrconn)
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
-		 * If restart_lsn, confirmed_lsn or catalog_xmin is invalid but the
-		 * slot is valid, that means we have fetched the remote_slot in its
-		 * RS_EPHEMERAL state. In such a case, don't sync it; we can always
-		 * sync it in the next sync cycle when the remote_slot is persisted
-		 * and has valid lsn(s) and xmin values.
-		 *
-		 * XXX: In future, if we plan to expose 'slot->data.persistency' in
-		 * pg_replication_slots view, then we can avoid fetching RS_EPHEMERAL
-		 * slots in the first place.
+		 * Apply ephemeral slot filtering. Skip slots that are in RS_EPHEMERAL
+		 * state (invalid LSNs/xmin but not explicitly invalidated).
 		 */
 		if ((XLogRecPtrIsInvalid(remote_slot->restart_lsn) ||
 			 XLogRecPtrIsInvalid(remote_slot->confirmed_lsn) ||
@@ -906,12 +958,42 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
-			remote_slot_list = lappend(remote_slot_list, remote_slot);
+			/* Add to updated list */
+			updated_slot_list = lappend(updated_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+	pfree(query.data);
+
+	/* If refreshing, free the original list structures */
+	if (is_refresh)
+	{
+		foreach(lc, remote_slot_list)
+		{
+			RemoteSlot *old_slot = (RemoteSlot *) lfirst(lc);
+			pfree(old_slot);
+		}
+		list_free(remote_slot_list);
+	}
+
+	return updated_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -927,19 +1009,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(wrconn, remote_slot,
+								remote_dbid);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1131,7 +1206,7 @@ slotsync_reread_config(void)
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
 
-	Assert(sync_replication_slots);
+	Assert(!AmLogicalSlotSyncWorkerProcess() || sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
@@ -1252,31 +1327,38 @@ slotsync_worker_onexit(int code, Datum arg)
  * sync-cycles is reset to the minimum (200ms).
  */
 static void
-wait_for_slot_activity(bool some_slot_updated)
+wait_for_slot_activity(bool some_slot_updated, bool called_from_api)
 {
-	int			rc;
+	int		rc;
+	int		wait_time;
 
-	if (!some_slot_updated)
-	{
+	if (called_from_api) {
 		/*
-		 * No slots were updated, so double the sleep time, but not beyond the
-		 * maximum allowable value.
+		 * When called from pg_sync_replication_slots, use a fixed 2
+		 * second wait time.
 		 */
-		sleep_ms = Min(sleep_ms * 2, MAX_SLOTSYNC_WORKER_NAPTIME_MS);
-	}
-	else
-	{
-		/*
-		 * Some slots were updated since the last sleep, so reset the sleep
-		 * time.
-		 */
-		sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
+		wait_time = SLOTSYNC_API_NAPTIME_MS;
+	} else {
+		if (!some_slot_updated) {
+			/*
+			 * No slots were updated, so double the sleep time,
+			 * but not beyond the maximum allowable value.
+			 */
+			sleep_ms = Min(sleep_ms * 2, MAX_SLOTSYNC_WORKER_NAPTIME_MS);
+		} else {
+			/*
+			 * Some slots were updated since the last sleep, so
+			 * reset the sleep time.
+			 */
+			sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
+		}
+		wait_time = sleep_ms;
 	}
 
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   wait_time,
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1505,12 +1587,28 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		List	       *remote_slots;
+		bool		started_tx = false;
 
 		ProcessSlotSyncInterrupts(wrconn);
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_or_refresh_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState()) {
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_or_refresh_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots);
+		list_free_deep(remote_slots);
 
-		wait_for_slot_activity(some_slot_updated);
+		if (started_tx)
+			CommitTransactionCommand();
+
+		wait_for_slot_activity(some_slot_updated, false);
 	}
 
 	/*
@@ -1736,19 +1834,81 @@ slotsync_failure_callback(int code, Datum arg)
 }
 
 /*
- * Synchronize the failover enabled replication slots using the specified
- * primary server connection.
+ * Synchronize failover enabled replication slots using the specified primary
+ * server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready". Retries happen after a
+ * 2 second wait, and the loop exits early if promotion is triggered.
  */
 void
-SyncReplicationSlots(WalReceiverConn *wrconn)
+SyncReplicationSlots(WalReceiverConn * wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
+		List	       *remote_slots;
+		bool		started_tx = false;
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_or_refresh_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState()) {
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_or_refresh_remote_slots(wrconn, NIL);
+
+		/* Retry until all slots are at least sync-ready */
+		for (;;)
+		{
+			bool		some_slot_updated = false;
+
+			/*
+			 * Refresh the remote slot data. We keep using the original slot
+			 * list, even if some slots are already sync ready, so that all
+			 * slots are updated with the latest status from the primary.
+			 */
+			remote_slots = fetch_or_refresh_remote_slots(wrconn, remote_slots);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots);
+
+			/* Done if all slots are at least sync-ready */
+			if (!SlotSyncCtx->slot_not_persisted)
+				break;
+			else
+			{
+				/* wait for 2 seconds before retrying */
+				wait_for_slot_activity(some_slot_updated, true);
+
+				/*
+				 * If we've been promoted, then no point
+				 * continuing.
+				 */
+				if (SlotSyncCtx->stopSignaled)
+				{
+					ereport(ERROR,
+						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+						 errmsg("exiting from slot synchronization as"
+								" promotion is triggered")));
+					break;
+				}
+
+				/* Handle any pending termination request */
+				ProcessSlotSyncInterrupts(wrconn);
+			}
+		}
+
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0be307d..3497f0f 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
1.8.3.1

#34shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#33)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, Aug 14, 2025 at 7:28 AM Ajin Cherian <itsajin@gmail.com> wrote:

Patch v6 attached.

Thanks Ajin. Please find my comments:

1)
SyncReplicationSlots:
+ remote_slots = fetch_or_refresh_remote_slots(wrconn, NIL);
+
+ /* Retry until all slots are at least sync-ready */
+ for (;;)
+ {
+ bool some_slot_updated = false;
+
+ /*
+ * Refresh the remote slot data. We keep using the original slot
+ * list, even if some slots are already sync ready, so that all
+ * slots are updated with the latest status from the primary.
+ */
+ remote_slots = fetch_or_refresh_remote_slots(wrconn, remote_slots);

When the API begins, it seems we are fetching the remote list twice
before we even sync it once. We can get rid of the
'fetch_or_refresh_remote_slots' call outside the loop and retain the
inside one. On the first call, remote_slots will be NIL and thus it will
fetch all slots; in subsequent calls, it will be the populated one.

2)
SyncReplicationSlots:
+ /*
+ * The syscache access in fetch_or_refresh_remote_slots() needs a
+ * transaction env.
+ */
+ if (!IsTransactionState()) {
+ StartTransactionCommand();
+ started_tx = true;
+ }

+ if (started_tx)
+ CommitTransactionCommand();

Shall we move these two inside fetch_or_refresh_remote_slots() (both
worker and API flow), similar to how validate_remote_info() also has it
inside?

3)
SyncReplicationSlots:
+ /* Done if all slots are at least sync-ready */
+ if (!SlotSyncCtx->slot_not_persisted)
+ break;
+ else
+ {
+ /* wait for 2 seconds before retrying */
+ wait_for_slot_activity(some_slot_updated, true);

No need to have an 'else' block here. The code can be put without the
'else', because the 'if', when true, breaks out of the loop.

4)
'fetch_or_refresh_remote_slots' can simply be renamed to
'fetch_remote_slots', and a comment can describe the extra argument,
because ultimately we are re-fetching some or all slots in both cases.

5)
In the case of the API, wait_for_slot_activity() does not change its wait
time based on 'some_slot_updated'. I think we can pull the 'WaitLatch,
ResetLatch' into the API function itself, and let's not change the
worker's wait_for_slot_activity().
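
A minimal sketch of the inlined wait, reusing the constants and wait event
already in the patch:

	int			rc;

	/* Fixed 2s nap between API retries; the worker's backoff is untouched. */
	rc = WaitLatch(MyLatch,
				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
				   SLOTSYNC_API_NAPTIME_MS,
				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
	if (rc & WL_LATCH_SET)
		ResetLatch(MyLatch);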

6)
fetch_or_refresh_remote_slots:
+ {
+ if (is_refresh)
+ {
+ ereport(WARNING,
+ errmsg("could not fetch updated failover logical slots info"
+    " from the primary server: %s",
+    res->err));
+ pfree(query.data);
+ return remote_slot_list; /* Return original list on refresh failure */
+ }
+ else
+ {
+ ereport(ERROR,
+ errmsg("could not fetch failover logical slots info from the primary
server: %s",
+    res->err));
+ }
+ }

I think there is no need for different behaviour here for the worker and
the API. Since the worker errors out here, we can make the API error out
too.

7)
+fetch_or_refresh_remote_slots(WalReceiverConn *wrconn, List *remote_slot_list)

We can name the argument as 'target_slot_list' and replace the name
'updated_slot_list' with 'remote_slot_list'.

8)
+ /* If refreshing, free the original list structures */
+ if (is_refresh)
+ {
+ foreach(lc, remote_slot_list)
+ {
+ RemoteSlot *old_slot = (RemoteSlot *) lfirst(lc);
+ pfree(old_slot);
+ }
+ list_free(remote_slot_list);
+ }

We can get rid of 'is_refresh' and simply free the list whenever
'target_slot_list != NIL'. We can use list_free_deep instead of freeing
each element. Having said that, it looks slightly odd to free the list in
this function; I will think more here. Meanwhile, we can do this.

9)
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(WalReceiverConn * wrconn,
+ RemoteSlot * remote_slot, Oid remote_dbid)

We can get rid of wrconn as we are not using it. Same with wrconn
argument for synchronize_one_slot()

10)
+ /* used by pg_sync_replication_slots() API only */
+ bool slot_not_persisted;

We can move the comment outside the structure. We can first define it and
then say the above line.

11)
+ SlotSyncCtx->slot_not_persisted = false;

This may overwrite the 'slot_not_persisted' set for the previous slot
and ultimately make it 'false' at the end of the cycle even though we had
a few not-persisted slots at the beginning of the cycle. Should it be:

SlotSyncCtx->slot_not_persisted |= false;

12)
Shall we rename this to slot_persistence_pending (based on many
other modules using similar names: detach_pending, send_pending,
callback_pending)?

13)
- errmsg("could not synchronize replication slot \"%s\"",
-    remote_slot->name),
- errdetail("Synchronization could lead to data loss, because the
remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the
standby has LSN %X/%08X and catalog xmin %u.",
-   LSN_FORMAT_ARGS(remote_slot->restart_lsn),
-   remote_slot->catalog_xmin,
-   LSN_FORMAT_ARGS(slot->data.restart_lsn),
-   slot->data.catalog_xmin));
+ errmsg("Replication slot \"%s\" is not sync ready; will keep retrying",
+    remote_slot->name),
+ errdetail("Attempting Synchronization could lead to data loss,
because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u,
but the standby has LSN %X/%08X and catalog xmin %u.",
+ LSN_FORMAT_ARGS(remote_slot->restart_lsn),
+ remote_slot->catalog_xmin,
+ LSN_FORMAT_ARGS(slot->data.restart_lsn),
+ slot->data.catalog_xmin));

We can retain the same message, as it was settled after a lot of
discussion; we can attempt to change it if others comment. The idea is
that since the worker dumps it in each subsequent cycle (if such a
situation arises), the API can now do the same, because it is also
performing multiple cycles. Earlier I had suggested changing it for the
API based on the 'continuing to wait..' messages, which are no longer
there now.

thanks
Shveta

#35Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: shveta malik (#34)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, Aug 14, 2025 at 12:14 PM shveta malik <shveta.malik@gmail.com> wrote:

8)
+ /* If refreshing, free the original list structures */
+ if (is_refresh)
+ {
+ foreach(lc, remote_slot_list)
+ {
+ RemoteSlot *old_slot = (RemoteSlot *) lfirst(lc);
+ pfree(old_slot);
+ }
+ list_free(remote_slot_list);
+ }

We can get rid of 'is_refresh' and simply free the list whenever
'target_slot_list != NIL'. We can use list_free_deep instead of freeing
each element. Having said that, it looks slightly odd to free the list in
this function; I will think more here. Meanwhile, we can do this.

+1. The function prologue doesn't mention that the original list is
deep freed. So a caller may try to access it after this call, which
will lead to a crash. As a safe programming practice we should let the
caller free the original list if it is not needed anymore OR modify
the input list in-place and return it for the convenience of the
caller like all list_* interfaces. At least we should document this
behavior in the function prologue. You could also use foreach_ptr
instead of foreach, for example:
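
	/* equivalent to the foreach + lfirst cast, but tidier */
	foreach_ptr(RemoteSlot, old_slot, remote_slot_list)
		pfree(old_slot);
	list_free(remote_slot_list);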

13)
- errmsg("could not synchronize replication slot \"%s\"",
-    remote_slot->name),
- errdetail("Synchronization could lead to data loss, because the
remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the
standby has LSN %X/%08X and catalog xmin %u.",
-   LSN_FORMAT_ARGS(remote_slot->restart_lsn),
-   remote_slot->catalog_xmin,
-   LSN_FORMAT_ARGS(slot->data.restart_lsn),
-   slot->data.catalog_xmin));
+ errmsg("Replication slot \"%s\" is not sync ready; will keep retrying",
+    remote_slot->name),
+ errdetail("Attempting Synchronization could lead to data loss,
because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u,
but the standby has LSN %X/%08X and catalog xmin %u.",
+ LSN_FORMAT_ARGS(remote_slot->restart_lsn),
+ remote_slot->catalog_xmin,
+ LSN_FORMAT_ARGS(slot->data.restart_lsn),
+ slot->data.catalog_xmin));

We can retain the same message, as it was settled after a lot of
discussion; we can attempt to change it if others comment. The idea is
that since the worker dumps it in each subsequent cycle (if such a
situation arises), the API can now do the same, because it is also
performing multiple cycles. Earlier I had suggested changing it for the
API based on the 'continuing to wait..' messages, which are no longer
there now.

Also we usually don't use capital letters at the start of the error
message. Any reason this is different?

Some more

+ * When called from pg_sync_replication_slots, use a fixed 2
+ * second wait time.

The function prologue doesn't mention this. Probably the prologue should
contain only the first sentence there; the rest of the prologue just
repeats the comments in the function. The function is small enough that a
reader can read the details from the function itself instead of the
prologue.

+ wait_time = SLOTSYNC_API_NAPTIME_MS;
+ } else {

} else and { should be on separate lines, i.e.:
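
		wait_time = SLOTSYNC_API_NAPTIME_MS;
	}
	else
	{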

--
Best Wishes,
Ashutosh Bapat

#36Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#34)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, Aug 14, 2025 at 4:44 PM shveta malik <shveta.malik@gmail.com> wrote:

On Thu, Aug 14, 2025 at 7:28 AM Ajin Cherian <itsajin@gmail.com> wrote:

Patch v6 attached.

Thanks Ajin. Please find my comments:

1)
SyncReplicationSlots:
+ remote_slots = fetch_or_refresh_remote_slots(wrconn, NIL);
+
+ /* Retry until all slots are at least sync-ready */
+ for (;;)
+ {
+ bool some_slot_updated = false;
+
+ /*
+ * Refresh the remote slot data. We keep using the original slot
+ * list, even if some slots are already sync ready, so that all
+ * slots are updated with the latest status from the primary.
+ */
+ remote_slots = fetch_or_refresh_remote_slots(wrconn, remote_slots);

When the API begins, it seems we are fetching the remote list twice
before we even sync it once. We can get rid of the
'fetch_or_refresh_remote_slots' call outside the loop and retain the
inside one. On the first call, remote_slots will be NIL and thus it will
fetch all slots; in subsequent calls, it will be the populated one.

Fixed.

2)
SyncReplicationSlots:
+ /*
+ * The syscache access in fetch_or_refresh_remote_slots() needs a
+ * transaction env.
+ */
+ if (!IsTransactionState()) {
+ StartTransactionCommand();
+ started_tx = true;
+ }

+ if (started_tx)
+ CommitTransactionCommand();

Shall we move these two inside fetch_or_refresh_remote_slots() (both
worker and API flow), similar to how validate_remote_info() also has it
inside?

I tried this, but it doesn't work: when the transaction is committed, the
memory allocated for the remote slots is also freed. So, this needs to
stay on the outside.
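
Just to illustrate the problem: the remote slot list is built in the
transaction's memory context, so keeping it past the commit would require
copying it into something longer-lived first, roughly like this
(copy_remote_slot_list and callers_context are hypothetical):

	MemoryContext oldcontext;

	oldcontext = MemoryContextSwitchTo(callers_context);	/* hypothetical context */
	remote_slots = copy_remote_slot_list(remote_slots);		/* hypothetical helper */
	MemoryContextSwitchTo(oldcontext);
	CommitTransactionCommand();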

3)
SyncReplicationSlots:
+ /* Done if all slots are at least sync-ready */
+ if (!SlotSyncCtx->slot_not_persisted)
+ break;
+ else
+ {
+ /* wait for 2 seconds before retrying */
+ wait_for_slot_activity(some_slot_updated, true);

No need to have an 'else' block here. The code can be put without the
'else', because the 'if', when true, breaks out of the loop.

Fixed.

4)
'fetch_or_refresh_remote_slots' can simply be renamed to
'fetch_remote_slots', and a comment can describe the extra argument,
because ultimately we are re-fetching some or all slots in both cases.

Done.

5)
In the case of the API, wait_for_slot_activity() does not change its wait
time based on 'some_slot_updated'. I think we can pull the 'WaitLatch,
ResetLatch' into the API function itself, and let's not change the
worker's wait_for_slot_activity().

Done.

6)
fetch_or_refresh_remote_slots:
+ {
+ if (is_refresh)
+ {
+ ereport(WARNING,
+ errmsg("could not fetch updated failover logical slots info"
+    " from the primary server: %s",
+    res->err));
+ pfree(query.data);
+ return remote_slot_list; /* Return original list on refresh failure */
+ }
+ else
+ {
+ ereport(ERROR,
+ errmsg("could not fetch failover logical slots info from the primary
server: %s",
+    res->err));
+ }
+ }

I think there is no need for different behaviour here for the worker and
the API. Since the worker errors out here, we can make the API error out
too.

Fixed.

7)
+fetch_or_refresh_remote_slots(WalReceiverConn *wrconn, List *remote_slot_list)

We can name the argument as 'target_slot_list' and replace the name
'updated_slot_list' with 'remote_slot_list'.

Fixed.

8)
+ /* If refreshing, free the original list structures */
+ if (is_refresh)
+ {
+ foreach(lc, remote_slot_list)
+ {
+ RemoteSlot *old_slot = (RemoteSlot *) lfirst(lc);
+ pfree(old_slot);
+ }
+ list_free(remote_slot_list);
+ }

We can get rid of 'is_refresh' and can simply check if
'target_slot_list != NIL', free it. We can use list_free_deep instead
of freeing each element. Having said that, it looks slightly odd to
free the list in this function, I will think more here. Meanwhile, we
can do this.

Fixed.

9)
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(WalReceiverConn * wrconn,
+ RemoteSlot * remote_slot, Oid remote_dbid)

We can get rid of wrconn as we are not using it. Same with wrconn
argument for synchronize_one_slot()

Done.

10)
+ /* used by pg_sync_replication_slots() API only */
+ bool slot_not_persisted;

We can move comment outside structure. We can first define it and then
say the above line.

Done.

11)
+ SlotSyncCtx->slot_not_persisted = false;

This may overwrite the 'slot_not_persisted' value set for the previous
slot and ultimately make it 'false' at the end of the cycle, even though
we had a few not-persisted slots earlier in the cycle. Should it be:

SlotSyncCtx->slot_not_persisted |= false;

Fixed.
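
For clarity, the concern is that a plain assignment in one slot's code
path can clobber a 'true' set by an earlier slot in the same cycle. An
OR-style aggregation keeps the flag set once any slot fails to persist;
a minimal sketch (sync_one() is a hypothetical helper, not patch code):

    ListCell   *lc;
    bool        pending = false;

    foreach(lc, remote_slots)
        pending |= !sync_one((RemoteSlot *) lfirst(lc));
    /* pending remains true if any slot could not be persisted */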

12)
Shall we rename this to : slot_persistence_pending (based on many
other modules using similar names: detach_pending, send_pending,
callback_pending)?

Done.

13)
- errmsg("could not synchronize replication slot \"%s\"",
-    remote_slot->name),
- errdetail("Synchronization could lead to data loss, because the
remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the
standby has LSN %X/%08X and catalog xmin %u.",
-   LSN_FORMAT_ARGS(remote_slot->restart_lsn),
-   remote_slot->catalog_xmin,
-   LSN_FORMAT_ARGS(slot->data.restart_lsn),
-   slot->data.catalog_xmin));
+ errmsg("Replication slot \"%s\" is not sync ready; will keep retrying",
+    remote_slot->name),
+ errdetail("Attempting Synchronization could lead to data loss,
because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u,
but the standby has LSN %X/%08X and catalog xmin %u.",
+ LSN_FORMAT_ARGS(remote_slot->restart_lsn),
+ remote_slot->catalog_xmin,
+ LSN_FORMAT_ARGS(slot->data.restart_lsn),
+ slot->data.catalog_xmin));

We can retain the same message as it was put after a lot of
discussion. We can attempt to change if others comment. The idea is
since a worker dumps it in each subsequent cycle (if such a situation
arises), on the same basis now the API can also do so because it is
also performing multiple cycles now. Earlier I had suggested changing
it for API based on messages 'continuing to wait..' which are no
longer there now.

Done.

On Thu, Aug 14, 2025 at 10:44 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Thu, Aug 14, 2025 at 12:14 PM shveta malik <shveta.malik@gmail.com> wrote:

8)
+ /* If refreshing, free the original list structures */
+ if (is_refresh)
+ {
+ foreach(lc, remote_slot_list)
+ {
+ RemoteSlot *old_slot = (RemoteSlot *) lfirst(lc);
+ pfree(old_slot);
+ }
+ list_free(remote_slot_list);
+ }

We can get rid of 'is_refresh' and can simply check if
'target_slot_list != NIL', free it. We can use list_free_deep instead
of freeing each element. Having said that, it looks slightly odd to
free the list in this function, I will think more here. Meanwhile, we
can do this.

+1. The function prologue doesn't mention that the original list is
deep freed. So a caller may try to access it after this call, which
will lead to a crash. As a safe programming practice we should let the
caller free the original list if it is not needed anymore OR modify
the input list in-place and return it for the convenience of the
caller like all list_* interfaces. At least we should document this
behavior in the function prologue. You could also use foreach_ptr
instead of foreach.

I've changed the logic so that it is the responsibility of the caller
to free the list.
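
A minimal sketch of the resulting caller-owns-memory pattern (names as
in the attached patch):

    remote_slots = fetch_remote_slots(wrconn, prev_slot_list);
    synchronize_slots(wrconn, remote_slots);

    /* the caller frees the list it passed in, cells and elements both */
    if (prev_slot_list)
        list_free_deep(prev_slot_list);
    prev_slot_list = remote_slots;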

13)
- errmsg("could not synchronize replication slot \"%s\"",
-    remote_slot->name),
- errdetail("Synchronization could lead to data loss, because the
remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the
standby has LSN %X/%08X and catalog xmin %u.",
-   LSN_FORMAT_ARGS(remote_slot->restart_lsn),
-   remote_slot->catalog_xmin,
-   LSN_FORMAT_ARGS(slot->data.restart_lsn),
-   slot->data.catalog_xmin));
+ errmsg("Replication slot \"%s\" is not sync ready; will keep retrying",
+    remote_slot->name),
+ errdetail("Attempting Synchronization could lead to data loss,
because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u,
but the standby has LSN %X/%08X and catalog xmin %u.",
+ LSN_FORMAT_ARGS(remote_slot->restart_lsn),
+ remote_slot->catalog_xmin,
+ LSN_FORMAT_ARGS(slot->data.restart_lsn),
+ slot->data.catalog_xmin));

We can retain the same message as it was put after a lot of
discussion. We can attempt to change if others comment. The idea is
since a worker dumps it in each subsequent cycle (if such a situation
arises), on the same basis now the API can also do so because it is
also performing multiple cycles now. Earlier I had suggested changing
it for API based on messages 'continuing to wait..' which are no
longer there now.

Also we usually don't use capital letters at the start of the error
message. Any reason this is different?

Retained the old message.
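
For reference, the project's message style guide has the primary
message start lowercase with no trailing period, while errdetail is
written as one or more complete sentences, e.g.:

    ereport(LOG,
            /* primary message: lowercase, no trailing period */
            errmsg("could not synchronize replication slot \"%s\"",
                   remote_slot->name),
            /* detail: a complete sentence with ending punctuation */
            errdetail("Synchronization could lead to data loss."));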

Some more

+ * When called from pg_sync_replication_slots, use a fixed 2
+ * second wait time.

The function prologue doesn't mention this. Probably the prologue should
contain only the first sentence there. The rest of the prologue just
repeats the comments in the function. The function is small enough that
a reader could read the details from the function itself instead of the
prologue.

+ wait_time = SLOTSYNC_API_NAPTIME_MS;
+ } else {

} else and { should be on separate lines.

I've removed the changes in this function and it is now the same as before.

Attaching patch v7 addressing all the above comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v7-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patchapplication/octet-stream; name=v7-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patchDownload
From 05459fd3b78032e08241bb0f30df500e07b8b9be Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Tue, 19 Aug 2025 15:12:59 +1000
Subject: [PATCH v7] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for the initial sync
and retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  40 +--
 src/backend/replication/logical/slotsync.c    | 314 +++++++++++++-----
 .../utils/activity/wait_event_names.txt       |   2 +-
 4 files changed, 252 insertions(+), 108 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 446fdfe56f4..360861004b2 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1478,9 +1478,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 77c720c422c..6e4251a810d 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -364,18 +364,23 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
    <sect2 id="logicaldecoding-replication-slots-synchronization">
     <title>Replication Slot Synchronization</title>
     <para>
-     The logical replication slots on the primary can be synchronized to
-     the hot standby by using the <literal>failover</literal> parameter of
+     The logical replication slots on the primary can be enabled for
+     synchronization to the hot standby by using the
+     <literal>failover</literal> parameter of
      <link linkend="pg-create-logical-replication-slot">
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation.
-     Additionally, enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link> on the standby
-     is required. By enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link>
-     on the standby, the failover slots can be synchronized periodically in
+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">
+     <function>pg_sync_replication_slots</function></link>
+     on the standby, or automatically by enabling
+     <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby.
+     When <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> is enabled
+     on the standby, the failover slots are periodically synchronized by
      the slotsync worker. For the synchronization to work, it is mandatory to
      have a physical replication slot between the primary and the standby (i.e.,
      <link linkend="guc-primary-slot-name"><varname>primary_slot_name</varname></link>
@@ -398,25 +403,6 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
-    <note>
-     <para>
-      While enabling <link linkend="guc-sync-replication-slots">
-      <varname>sync_replication_slots</varname></link> allows for automatic
-      periodic synchronization of failover slots, they can also be manually
-      synchronized using the <link linkend="pg-sync-replication-slots">
-      <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
-     </para>
-    </note>
-
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 37738440113..60d4776f760 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -91,6 +91,9 @@
  * is expected (e.g., slot sync GUCs change), slot sync worker will reset
  * last_start_time before exiting, so that postmaster can start the worker
  * without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
+ *
+ * The 'slot_persistence_pending' flag is used by pg_sync_replication_slots()
+ * to retry when a slot could not be persisted while syncing.
  */
 typedef struct SlotSyncCtxStruct
 {
@@ -99,6 +102,7 @@ typedef struct SlotSyncCtxStruct
 	bool		syncing;
 	time_t		last_start_time;
 	slock_t		mutex;
+	bool		slot_persistence_pending;
 } SlotSyncCtxStruct;
 
 static SlotSyncCtxStruct *SlotSyncCtx = NULL;
@@ -113,6 +117,7 @@ bool		sync_replication_slots = false;
  */
 #define MIN_SLOTSYNC_WORKER_NAPTIME_MS  200
 #define MAX_SLOTSYNC_WORKER_NAPTIME_MS  30000	/* 30s */
+#define SLOTSYNC_API_NAPTIME_MS         2000	/* 2s */
 
 static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
 
@@ -146,6 +151,7 @@ typedef struct RemoteSlot
 	ReplicationSlotInvalidationCause invalidated;
 } RemoteSlot;
 
+static void ProcessSlotSyncInterrupts(WalReceiverConn *wrconn);
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
 
@@ -211,13 +217,13 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 		 * impact the users, so we used DEBUG1 level to log the message.
 		 */
 		ereport(slot->data.persistency == RS_TEMPORARY ? LOG : DEBUG1,
-				errmsg("could not synchronize replication slot \"%s\"",
-					   remote_slot->name),
-				errdetail("Synchronization could lead to data loss, because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the standby has LSN %X/%08X and catalog xmin %u.",
-						  LSN_FORMAT_ARGS(remote_slot->restart_lsn),
-						  remote_slot->catalog_xmin,
-						  LSN_FORMAT_ARGS(slot->data.restart_lsn),
-						  slot->data.catalog_xmin));
+			errmsg("could not synchronize replication slot \"%s\"",
+				   remote_slot->name),
+			errdetail("Synchronization could lead to data loss, because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the standby has LSN %X/%08X and catalog xmin %u.",
+			LSN_FORMAT_ARGS(remote_slot->restart_lsn),
+			remote_slot->catalog_xmin,
+			LSN_FORMAT_ARGS(slot->data.restart_lsn),
+			slot->data.catalog_xmin));
 
 		if (remote_slot_precedes)
 			*remote_slot_precedes = true;
@@ -565,8 +571,8 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 	bool		remote_slot_precedes = false;
 
 	(void) update_local_synced_slot(remote_slot, remote_dbid,
-									&found_consistent_snapshot,
-									&remote_slot_precedes);
+							&found_consistent_snapshot,
+							&remote_slot_precedes);
 
 	/*
 	 * Check if the primary server has caught up. Refer to the comment atop
@@ -575,13 +581,18 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 	if (remote_slot_precedes)
 	{
 		/*
-		 * The remote slot didn't catch up to locally reserved position.
+		 * The remote slot didn't catch up to locally reserved
+		 * position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle. Update the
+		 * slot_persistence_pending flag, so the API can retry.
 		 */
+		SlotSyncCtx->slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -596,11 +607,17 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* update flag, so that we retry */
+		SlotSyncCtx->slot_persistence_pending = true;
+
 		return false;
 	}
 
 	ReplicationSlotPersist();
 
+	/* slot has been persisted, no need to retry */
+	SlotSyncCtx->slot_persistence_pending |= false;
+
 	ereport(LOG,
 			errmsg("newly created replication slot \"%s\" is sync-ready now",
 				   remote_slot->name));
@@ -622,7 +639,8 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(WalReceiverConn * wrconn, RemoteSlot * remote_slot,
+					Oid remote_dbid)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,8 +733,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/* Slot not ready yet, let's attempt to make it sync-ready now. */
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
-			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+			slot_updated = update_and_persist_local_synced_slot(remote_slot, remote_dbid);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -796,30 +813,66 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch or refresh remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If remote_slot_list is NIL, fetches all failover logical slots from the
+ * primary server. If remote_slot_list is provided, refreshes only those
+ * specific slots with current values from the primary server.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ * NOTE: Caller must ensure a transaction is active before calling this
+ * function.
+ *
+ * Parameters:
+ *   wrconn - Connection to the primary server
+ *   remote_slot_list - List of RemoteSlot structures to refresh, or NIL to
+ *                      fetch all failover slots
+ *
+ * Returns a list of RemoteSlot structures. If refreshing and the query fails,
+ * returns the original list. Slots that no longer exist on the primary will
+ * be removed from the list.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *target_slot_list)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
 	LSNOID, XIDOID, BOOLOID, LSNOID, BOOLOID, TEXTOID, TEXTOID};
-
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
+	StringInfoData query;
+	ListCell   *lc;
+	bool		is_refresh = (target_slot_list!= NIL);
+	bool		first_slot = true;
 	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
+
+	/* Build the query based on whether we're fetching all or refreshing specific slots */
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (is_refresh)
+	{
+		/* Add IN clause for specific slot names */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach(lc, target_slot_list)
+		{
+			RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", remote_slot->name);
+			first_slot = false;
+		}
+		appendStringInfoString(&query, ")");
+	}
 
 	/* The syscache access in walrcv_exec() needs a transaction env. */
 	if (!IsTransactionState())
@@ -829,13 +882,16 @@ synchronize_slots(WalReceiverConn *wrconn)
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
 	if (res->status != WALRCV_OK_TUPLES)
+	{
 		ereport(ERROR,
-				errmsg("could not fetch failover logical slots info from the primary server: %s",
-					   res->err));
+			errmsg("could not fetch failover logical slots info"
+				" from the primary server: %s",
+				res->err));
+	}
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
+	/* Process the slot information */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -844,17 +900,19 @@ synchronize_slots(WalReceiverConn *wrconn)
 		Datum		d;
 		int			col = 0;
 
-		remote_slot->name = TextDatumGetCString(slot_getattr(tupslot, ++col,
-															 &isnull));
+		remote_slot->name = TextDatumGetCString(slot_getattr(tupslot,
+									++col,
+									&isnull));
 		Assert(!isnull);
 
-		remote_slot->plugin = TextDatumGetCString(slot_getattr(tupslot, ++col,
-															   &isnull));
+		remote_slot->plugin = TextDatumGetCString(slot_getattr(tupslot,
+									++col,
+									&isnull));
 		Assert(!isnull);
 
 		/*
-		 * It is possible to get null values for LSN and Xmin if slot is
-		 * invalidated on the primary server, so handle accordingly.
+		 * Handle possible null values for LSN and Xmin if slot is
+		 * invalidated on the primary server.
 		 */
 		d = slot_getattr(tupslot, ++col, &isnull);
 		remote_slot->confirmed_lsn = isnull ? InvalidXLogRecPtr :
@@ -868,18 +926,20 @@ synchronize_slots(WalReceiverConn *wrconn)
 			DatumGetTransactionId(d);
 
 		remote_slot->two_phase = DatumGetBool(slot_getattr(tupslot, ++col,
-														   &isnull));
+								   &isnull));
 		Assert(!isnull);
 
 		d = slot_getattr(tupslot, ++col, &isnull);
 		remote_slot->two_phase_at = isnull ? InvalidXLogRecPtr : DatumGetLSN(d);
 
-		remote_slot->failover = DatumGetBool(slot_getattr(tupslot, ++col,
-														  &isnull));
+		remote_slot->failover = DatumGetBool(slot_getattr(tupslot,
+									++col,
+									&isnull));
 		Assert(!isnull);
 
 		remote_slot->database = TextDatumGetCString(slot_getattr(tupslot,
-																 ++col, &isnull));
+									 ++col,
+									 &isnull));
 		Assert(!isnull);
 
 		d = slot_getattr(tupslot, ++col, &isnull);
@@ -890,15 +950,8 @@ synchronize_slots(WalReceiverConn *wrconn)
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
-		 * If restart_lsn, confirmed_lsn or catalog_xmin is invalid but the
-		 * slot is valid, that means we have fetched the remote_slot in its
-		 * RS_EPHEMERAL state. In such a case, don't sync it; we can always
-		 * sync it in the next sync cycle when the remote_slot is persisted
-		 * and has valid lsn(s) and xmin values.
-		 *
-		 * XXX: In future, if we plan to expose 'slot->data.persistency' in
-		 * pg_replication_slots view, then we can avoid fetching RS_EPHEMERAL
-		 * slots in the first place.
+		 * Apply ephemeral slot filtering. Skip slots that are in RS_EPHEMERAL
+		 * state (invalid LSNs/xmin but not explicitly invalidated).
 		 */
 		if ((XLogRecPtrIsInvalid(remote_slot->restart_lsn) ||
 			 XLogRecPtrIsInvalid(remote_slot->confirmed_lsn) ||
@@ -906,12 +959,34 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
+			/* Add to updated list */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+	pfree(query.data);
+
+	if (started_tx)
+		CommitTransactionCommand();
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -927,19 +1002,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(wrconn, remote_slot,
+								remote_dbid);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1131,7 +1199,7 @@ slotsync_reread_config(void)
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
 
-	Assert(sync_replication_slots);
+	Assert(!AmLogicalSlotSyncWorkerProcess() || sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
@@ -1254,29 +1322,29 @@ slotsync_worker_onexit(int code, Datum arg)
 static void
 wait_for_slot_activity(bool some_slot_updated)
 {
-	int			rc;
+	int		rc;
 
 	if (!some_slot_updated)
 	{
 		/*
-		 * No slots were updated, so double the sleep time, but not beyond the
-		 * maximum allowable value.
+		 * No slots were updated, so double the sleep time,
+		 * but not beyond the maximum allowable value.
 		 */
 		sleep_ms = Min(sleep_ms * 2, MAX_SLOTSYNC_WORKER_NAPTIME_MS);
 	}
 	else
 	{
 		/*
-		 * Some slots were updated since the last sleep, so reset the sleep
-		 * time.
+		 * Some slots were updated since the last sleep, so
+		 * reset the sleep time.
 		 */
 		sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
 	}
 
 	rc = WaitLatch(MyLatch,
-				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+			WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+			sleep_ms,
+			WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1505,10 +1573,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	       *remote_slots;
 
 		ProcessSlotSyncInterrupts(wrconn);
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1736,19 +1821,94 @@ slotsync_failure_callback(int code, Datum arg)
 }
 
 /*
- * Synchronize the failover enabled replication slots using the specified
- * primary server connection.
+ * Synchronize failover enabled replication slots using the specified primary
+ * server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready". A retry is done after a
+ * 2 second wait. Exits early if promotion is triggered.
  */
 void
-SyncReplicationSlots(WalReceiverConn *wrconn)
+SyncReplicationSlots(WalReceiverConn * wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
+		List		*remote_slots;
+		List		*prev_slot_list = NIL;
+		bool		started_tx = false;
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState()) {
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+
+		/* Retry until all slots are at least sync ready */
+		for (;;)
+		{
+			int		rc;
+
+			/*
+			 * Refresh the remote slot data. We keep using the previous slot
+			 * list, even if some slots are already sync ready, so that all
+			 * slots are updated with the latest status from the primary.
+			 * Some of the slots in the previous list could have gone away,
+			 * which is why we create a new list here and free the old list
+			 * at the end of the loop.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, prev_slot_list);
+
+			/* Attempt to synchronize slots */
+			synchronize_slots(wrconn, remote_slots);
+
+			/* Done if all slots are at least sync ready */
+			if (!SlotSyncCtx->slot_persistence_pending)
+				break;
+
+			/* wait for 2 seconds before retrying */
+			rc = WaitLatch(MyLatch,
+					WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					SLOTSYNC_API_NAPTIME_MS,
+					WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
+
+			if (rc & WL_LATCH_SET)
+				ResetLatch(MyLatch);
+
+			/*
+			 * If we've been promoted, then no point
+			 * continuing.
+			 */
+			if (SlotSyncCtx->stopSignaled)
+			{
+				ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("exiting from slot synchronization as"
+							" promotion is triggered")));
+				break;
+			}
+
+			/* Handle a termination request, if any */
+			ProcessSlotSyncInterrupts(wrconn);
+
+			/* Free the previous slot-list if it exists */
+			if (prev_slot_list)
+				list_free_deep(prev_slot_list);
+
+			prev_slot_list = remote_slots;
+		}
+
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0be307d2ca0..3497f0fa45e 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
2.47.3

#37shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#36)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Aug 19, 2025 at 10:55 AM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching patch v7 addressing all the above comments.

Thank You for the patches. Please find a few comments:

1)
We are not resetting 'slot_persistence_pending' to false anywhere. So
once it hits the flow which sets it to true, it will never become
false even if remote-slot catches up in subsequent cycles, resulting
in a hang of the API. We shall reset it before starting a new
iteration in SyncReplicationSlots().
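
A minimal sketch of the suggested reset, assuming the loop structure of
the current patch:

    for (;;)
    {
        /* reset before each cycle so a stale 'true' cannot linger */
        SlotSyncCtx->slot_persistence_pending = false;

        remote_slots = fetch_remote_slots(wrconn, prev_slot_list);
        synchronize_slots(wrconn, remote_slots);

        /* done once every slot is at least sync ready */
        if (!SlotSyncCtx->slot_persistence_pending)
            break;

        /* ...wait and retry... */
    }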

2)
We need to clean 'slot_persistence_pending' in reset_syncing_flag() as
well which is called at the end of API or in failure of API. Even
though the slot sync worker is not using it, we should clean it up in
slotsync_worker_onexit() as well.

3)
+ /* slot has been persisted, no need to retry */
+ SlotSyncCtx->slot_persistence_pending |= false;
+

This will not be needed once we reset this flag before each iteration
in SyncReplicationSlots()

4)
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(WalReceiverConn * wrconn, RemoteSlot * remote_slot,
+ Oid remote_dbid)

wrconn not used anywhere.

5)
+ bool is_refresh = (target_slot_list!= NIL);

is_refresh is not needed. We can simply check if
target_slot_list!=NIL, then append it to cmd.

6)
* If remote_slot_list is NIL, fetches all failover logical slots from the
* primary server. If remote_slot_list is provided, refreshes only those
* specific slots with current values from the primary server.

The usage of the word 'refreshing' is confusing. Since we are
allocating a new remote-list every time (instead of reusing or
refreshing the previous one), we can simply say:

------
Fetches the failover logical slots info from the primary server

If target_slot_list is NIL, fetches all failover logical slots from
the primary server, otherwise fetches only the ones mentioned in
target_slot_list
------

The 'Parameters:' can also be adjusted accordingly.

7)
* Returns a list of RemoteSlot structures. If refreshing and the query fails,
* returns the original list. Slots that no longer exist on the primary will
* be removed from the list.

This can be removed.

8)
- * If restart_lsn, confirmed_lsn or catalog_xmin is invalid but the
- * slot is valid, that means we have fetched the remote_slot in its
- * RS_EPHEMERAL state. In such a case, don't sync it; we can always
- * sync it in the next sync cycle when the remote_slot is persisted
- * and has valid lsn(s) and xmin values.
- *
- * XXX: In future, if we plan to expose 'slot->data.persistency' in
- * pg_replication_slots view, then we can avoid fetching RS_EPHEMERAL
- * slots in the first place.
+ * Apply ephemeral slot filtering. Skip slots that are in RS_EPHEMERAL
+ * state (invalid LSNs/xmin but not explicitly invalidated).

We can retain the original comment.

9)
Apart from above, there are many changes (alignement, comments etc)
which are not related to this particular improvement. We can get rid
of those changes. The patch should have the changes pertaining to
current improvement alone.

thanks
Shveta

#38Kirill Reshke
reshkekirill@gmail.com
In reply to: Ajin Cherian (#36)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, 19 Aug 2025 at 10:25, Ajin Cherian <itsajin@gmail.com> wrote:

Attaching patch v7 addressing all the above comments.

Looks like this thread is not attached to any commitfest entry?
If so, can you please create one[0]? This will be beneficial for the
thread, both simplifying patch review and (possibly) increasing the
number of reviewers.

[0]: https://commitfest.postgresql.org/55/new/

--
Best regards,
Kirill Reshke

#39Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#37)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Aug 19, 2025 at 7:42 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Aug 19, 2025 at 10:55 AM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching patch v7 addressing all the above comments.

Thank You for the patches. Please find a few comments:

1)
We are not resetting 'slot_persistence_pending' to false anywhere. So
once it hits the flow which sets it to true, it will never become
false even if remote-slot catches up in subsequent cycles, resulting
in a hang of the API. We shall reset it before starting a new
iteration in SyncReplicationSlots().

2)
We need to clean 'slot_persistence_pending' in reset_syncing_flag() as
well which is called at the end of API or in failure of API. Even
though the slot sync worker is not using it, we should clean it up in
slotsync_worker_onexit() as well.

Done.

3)
+ /* slot has been persisted, no need to retry */
+ SlotSyncCtx->slot_persistence_pending |= false;
+

This will not be needed once we reset this flag before each iteration
in SyncReplicationSlots()

Removed.

4)
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(WalReceiverConn * wrconn, RemoteSlot * remote_slot,
+ Oid remote_dbid)

wrconn not used anywhere.

Removed.

5)
+ bool is_refresh = (target_slot_list!= NIL);

is_refresh is not needed. We can simply check if
target_slot_list!=NIL, then append it to cmd.

Changed.

6)
* If remote_slot_list is NIL, fetches all failover logical slots from the
* primary server. If remote_slot_list is provided, refreshes only those
* specific slots with current values from the primary server.

The usage of the word 'refreshing' is confusing. Since we are
allocating a new remote-list everytime (instead of reusing or
refreshing previous one), we can simply say:

------
Fetches the failover logical slots info from the primary server

If target_slot_list is NIL, fetches all failover logical slots from
the primary server, otherwise fetches only the ones mentioned in
target_slot_list
------

The 'Parameters:' can also be adjusted accordingly.

Done.

7)
* Returns a list of RemoteSlot structures. If refreshing and the query fails,
* returns the original list. Slots that no longer exist on the primary will
* be removed from the list.

This can be removed.

Done.

8)
- * If restart_lsn, confirmed_lsn or catalog_xmin is invalid but the
- * slot is valid, that means we have fetched the remote_slot in its
- * RS_EPHEMERAL state. In such a case, don't sync it; we can always
- * sync it in the next sync cycle when the remote_slot is persisted
- * and has valid lsn(s) and xmin values.
- *
- * XXX: In future, if we plan to expose 'slot->data.persistency' in
- * pg_replication_slots view, then we can avoid fetching RS_EPHEMERAL
- * slots in the first place.
+ * Apply ephemeral slot filtering. Skip slots that are in RS_EPHEMERAL
+ * state (invalid LSNs/xmin but not explicitly invalidated).

We can retain the original comment.

Done.

9)
Apart from above, there are many changes (alignement, comments etc)
which are not related to this particular improvement. We can get rid
of those changes. The patch should have the changes pertaining to
current improvement alone.

I've removed them.
Attaching patch v8 addressing the above comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v8-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patchapplication/octet-stream; name=v8-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patchDownload
From 2af57203f3c5bb6f038413f44a9f116a9622c579 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Wed, 20 Aug 2025 14:55:55 +1000
Subject: [PATCH v8] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for the initial sync
and retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  40 ++--
 src/backend/replication/logical/slotsync.c    | 219 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 4 files changed, 203 insertions(+), 62 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 6347fe60b0c..cefdcb3887d 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1478,9 +1478,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index a1f2efb2420..fd1d8771ec2 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -364,18 +364,23 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
    <sect2 id="logicaldecoding-replication-slots-synchronization">
     <title>Replication Slot Synchronization</title>
     <para>
-     The logical replication slots on the primary can be synchronized to
-     the hot standby by using the <literal>failover</literal> parameter of
+     The logical replication slots on the primary can be enabled for
+     synchronization to the hot standby by using the
+     <literal>failover</literal> parameter of
      <link linkend="pg-create-logical-replication-slot">
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation.
-     Additionally, enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link> on the standby
-     is required. By enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link>
-     on the standby, the failover slots can be synchronized periodically in
+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">
+     <function>pg_sync_replication_slots</function></link>
+     on the standby, or automatically by enabling
+     <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby.
+     When <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> is enabled
+     on the standby, the failover slots are periodically synchronized by
      the slotsync worker. For the synchronization to work, it is mandatory to
      have a physical replication slot between the primary and the standby (i.e.,
      <link linkend="guc-primary-slot-name"><varname>primary_slot_name</varname></link>
@@ -398,25 +403,6 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
-    <note>
-     <para>
-      While enabling <link linkend="guc-sync-replication-slots">
-      <varname>sync_replication_slots</varname></link> allows for automatic
-      periodic synchronization of failover slots, they can also be manually
-      synchronized using the <link linkend="pg-sync-replication-slots">
-      <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
-     </para>
-    </note>
-
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 37738440113..770b5325f2b 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -91,6 +91,9 @@
  * is expected (e.g., slot sync GUCs change), slot sync worker will reset
  * last_start_time before exiting, so that postmaster can start the worker
  * without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
+ *
+ * The 'slot_persistence_pending' flag is used by pg_sync_replication_slots()
+ * to retry when a slot could not be persisted while syncing.
  */
 typedef struct SlotSyncCtxStruct
 {
@@ -99,6 +102,7 @@ typedef struct SlotSyncCtxStruct
 	bool		syncing;
 	time_t		last_start_time;
 	slock_t		mutex;
+	bool		slot_persistence_pending;
 } SlotSyncCtxStruct;
 
 static SlotSyncCtxStruct *SlotSyncCtx = NULL;
@@ -113,6 +117,7 @@ bool		sync_replication_slots = false;
  */
 #define MIN_SLOTSYNC_WORKER_NAPTIME_MS  200
 #define MAX_SLOTSYNC_WORKER_NAPTIME_MS  30000	/* 30s */
+#define SLOTSYNC_API_NAPTIME_MS         2000	/* 2s */
 
 static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
 
@@ -146,6 +151,7 @@ typedef struct RemoteSlot
 	ReplicationSlotInvalidationCause invalidated;
 } RemoteSlot;
 
+static void ProcessSlotSyncInterrupts(WalReceiverConn *wrconn);
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
 
@@ -577,11 +583,15 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle. Update the
+		 * slot_persistence_pending flag, so the API can retry.
 		 */
+		SlotSyncCtx->slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -596,6 +606,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* update flag, so that we retry */
+		SlotSyncCtx->slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -796,15 +809,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If target_slot_list is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones mentioned in
+ * target_slot_list.
+ *
+ * NOTE: Caller must ensure a transaction is active before calling this
+ * function.
+ *
+ * Parameters:
+ *   wrconn - Connection to the primary server
+ *   target_slot_list - List of specific RemoteSlot structures to fetch, or
+ *                      NIL to fetch all failover slots
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *target_slot_list)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -813,13 +834,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
+	StringInfoData query;
+	ListCell   *lc;
+	bool		first_slot = true;
 	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
+
+	/* Build the query based on whether we're fetching all or refreshing specific slots */
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (target_slot_list != NIL)
+	{
+		/* Add IN clause for specific slot names */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach(lc, target_slot_list)
+		{
+			RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", remote_slot->name);
+			first_slot = false;
+		}
+		appendStringInfoString(&query, ")");
+	}
 
 	/* The syscache access in walrcv_exec() needs a transaction env. */
 	if (!IsTransactionState())
@@ -829,13 +875,13 @@ synchronize_slots(WalReceiverConn *wrconn)
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
+	/* Process the slot information */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -906,12 +952,34 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
+			/* Add to updated list */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+	pfree(query.data);
+
+	if (started_tx)
+		CommitTransactionCommand();
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -932,14 +1000,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1131,7 +1191,7 @@ slotsync_reread_config(void)
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
 
-	Assert(sync_replication_slots);
+	Assert(!AmLogicalSlotSyncWorkerProcess() || sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
@@ -1228,6 +1288,7 @@ slotsync_worker_onexit(int code, Datum arg)
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
 	SlotSyncCtx->pid = InvalidPid;
+	SlotSyncCtx->slot_persistence_pending = false;
 
 	/*
 	 * If syncing_slots is true, it indicates that the process errored out
@@ -1276,7 +1337,7 @@ wait_for_slot_activity(bool some_slot_updated)
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1335,6 +1396,7 @@ reset_syncing_flag()
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 	SlotSyncCtx->syncing = false;
+	SlotSyncCtx->slot_persistence_pending = false;
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	syncing_slots = false;
@@ -1505,10 +1567,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts(wrconn);
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1738,17 +1817,95 @@ slotsync_failure_callback(int code, Datum arg)
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready". A retry is done after a
+ * 2 second wait. Exits early if promotion is triggered.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
+		List		*remote_slots;
+		List		*prev_slot_list = NIL;
+		bool		started_tx = false;
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState()) {
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+
+		/* Retry until all slots are at least sync ready */
+		for (;;)
+		{
+			int		rc;
+
+			/* reset flag before every iteration */
+			SlotSyncCtx->slot_persistence_pending = false;
+
+			/*
+			 * Refresh the remote slot data. We keep using the previous slot
+			 * list, even if some slots are already sync ready, so that all
+			 * slots are updated with the latest status from the primary.
+			 * Some of the slots in the previous list could have gone away,
+			 * which is why we create a new list here and free the old list
+			 * at the end of the loop.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, prev_slot_list);
+
+			/* Attempt to synchronize slots */
+			synchronize_slots(wrconn, remote_slots);
+
+			/* Done if all slots are at least sync ready */
+			if (!SlotSyncCtx->slot_persistence_pending)
+				break;
+
+			/* wait for 2 seconds before retrying */
+			rc = WaitLatch(MyLatch,
+					WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					SLOTSYNC_API_NAPTIME_MS,
+					WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
+
+			if (rc & WL_LATCH_SET)
+				ResetLatch(MyLatch);
+
+			/*
+			 * If we've been promoted, then no point
+			 * continuing.
+			 */
+			if (SlotSyncCtx->stopSignaled)
+			{
+				ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("exiting from slot synchronization as"
+							" promotion is triggered")));
+				break;
+			}
+
+			/* Handle any termination request */
+			ProcessSlotSyncInterrupts(wrconn);
+
+			/* Free the previous slot-list if it exists */
+			if (prev_slot_list)
+				list_free_deep(prev_slot_list);
+
+			prev_slot_list = remote_slots;
+		}
+
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0be307d2ca0..3497f0fa45e 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
2.47.3

#40Ajin Cherian
itsajin@gmail.com
In reply to: Kirill Reshke (#38)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Aug 19, 2025 at 8:57 PM Kirill Reshke <reshkekirill@gmail.com> wrote:

On Tue, 19 Aug 2025 at 10:25, Ajin Cherian <itsajin@gmail.com> wrote:

Attaching patch v7 addressing all the above comments.

Looks like this thread is not attached to any commitfest entry?
If so, can you please create one[0]? This will be beneficial for
the thread, both simplifying patch review and (possibly) increasing the
number of reviewers.

[0] https://commitfest.postgresql.org/55/new/

Done.
https://commitfest.postgresql.org/patch/5976/

regards,
Ajin Cherian
Fujitsu Australia

#41shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#39)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Aug 20, 2025 at 10:53 AM Ajin Cherian <itsajin@gmail.com> wrote:

I've removed them.
Attaching patch v8 addressing the above comments.

Thanks for the patch. Please find a few comments:

1)
When the API is in progress, and meanwhile in another session we turn
off hot_standby_feedback, the API session terminates abnormally.

postgres=# SELECT pg_sync_replication_slots();
server closed the connection unexpectedly

It seems slotsync_reread_config() is not adjusted for API. It does
proc_exit assuming it is a worker process.
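
For reference, the tail of slotsync_reread_config() is roughly this
(paraphrased from memory, details may differ):

    if (conninfo_changed ||
        primary_slotname_changed ||
        (old_hot_standby_feedback != hot_standby_feedback))
    {
        ereport(LOG,
                errmsg("replication slot synchronization worker will restart because of a parameter change"));
        proc_exit(0);   /* fine for the worker, but this terminates the
                         * backend when reached from the API */
    }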

2)
slotsync_worker_onexit():

SlotSyncCtx->slot_persistence_pending = false;

/*
* If syncing_slots is true, it indicates that the process errored out
* without resetting the flag. So, we need to clean up shared memory and
* reset the flag here.
*/
if (syncing_slots)
{
SlotSyncCtx->syncing = false;
syncing_slots = false;
}

Shall we reset slot_persistence_pending inside 'if (syncing_slots)'?
slot_persistence_pending can not be true without syncing_slots being
true.

3)
reset_syncing_flag():

SpinLockAcquire(&SlotSyncCtx->mutex);
SlotSyncCtx->syncing = false;
+ SlotSyncCtx->slot_persistence_pending = false;
SpinLockRelease(&SlotSyncCtx->mutex);

Here we are changing slot_persistence_pending under mutex, while at
other places, it is not protected by mutex. Is it intentional here?

4)
On rethinking, we maintain anything in shared memory if it has to be
shared between a few processes. 'slot_persistence_pending' OTOH is
required to be set and accessed by only one process at a time. Shall
we move it out of SlotSyncCtxStruct and keep it static similar to
'syncing_slots'? Rest of the setting, resetting flow remains the same.
What do you think?

5)
/* Build the query based on whether we're fetching all or refreshing
specific slots */

Perhaps we can shift this comment to where we actually append
target_slot_list. Better to have it something like:
'If target_slot_list is provided, construct the query only to fetch given slots'

thanks
Shveta

#42Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#41)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Aug 22, 2025 at 3:44 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Aug 20, 2025 at 10:53 AM Ajin Cherian <itsajin@gmail.com> wrote:

I've removed them.
Attaching patch v8 addressing the above comments.

Thanks for the patch. Please find a few comments:

1)
When the API is in progress, and meanwhile in another session we turn
off hot_standby_feedback, the API session terminates abnormally.

postgres=# SELECT pg_sync_replication_slots();
server closed the connection unexpectedly

It seems slotsync_reread_config() is not adjusted for API. It does
proc_exit assuming it is a worker process.

I've removed the call to ProcessSlotSyncInterrupts() from the API, as I
don't think the API needs to specifically handle shutdown requests; it
works without it. As for the other config checks, I don't think the API
needs to handle them; we should leave that to the user.

2)
slotsync_worker_onexit():

SlotSyncCtx->slot_persistence_pending = false;

/*
* If syncing_slots is true, it indicates that the process errored out
* without resetting the flag. So, we need to clean up shared memory and
* reset the flag here.
*/
if (syncing_slots)
{
SlotSyncCtx->syncing = false;
syncing_slots = false;
}

Shall we reset slot_persistence_pending inside 'if (syncing_slots)'?
slot_persistence_pending can not be true without syncing_slots being
true.

3)
reset_syncing_flag():

SpinLockAcquire(&SlotSyncCtx->mutex);
SlotSyncCtx->syncing = false;
+ SlotSyncCtx->slot_persistence_pending = false;
SpinLockRelease(&SlotSyncCtx->mutex);

Here we are changing slot_persistence_pending under mutex, while at
other places, it is not protected by mutex. Is it intentional here?

4)
On rethinking, we maintain anything in shared memory if it has to be
shared between a few processes. 'slot_persistence_pending' OTOH is
required to be set and accessed by only one process at a time. Shall
we move it out of SlotSyncCtxStruct and keep it static similar to
'syncing_slots'? Rest of the setting, resetting flow remains the same.
What do you think?

Yes, I agree. I have modified it accordingly.

5)
/* Build the query based on whether we're fetching all or refreshing
specific slots */

Perhaps we can shift this comment to where we actually append
target_slot_list. Better to have it something like:
'If target_slot_list is provided, construct the query only to fetch given slots'

Changed.

Attaching patch v9 addressing the above comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v9-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patchapplication/octet-stream; name=v9-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patchDownload
From 98770fde1e4a31e570c7e86dbede7e800d22889f Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Tue, 26 Aug 2025 14:24:28 +1000
Subject: [PATCH v9] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  40 +---
 src/backend/replication/logical/slotsync.c    | 222 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 4 files changed, 200 insertions(+), 68 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 57ff333159f..d85440abab4 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1478,9 +1478,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index b803a819cf1..794ae9d1044 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -364,18 +364,23 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
    <sect2 id="logicaldecoding-replication-slots-synchronization">
     <title>Replication Slot Synchronization</title>
     <para>
-     The logical replication slots on the primary can be synchronized to
-     the hot standby by using the <literal>failover</literal> parameter of
+     The logical replication slots on the primary can be enabled for
+     synchronization to the hot standby by using the
+     <literal>failover</literal> parameter of
      <link linkend="pg-create-logical-replication-slot">
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation.
-     Additionally, enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link> on the standby
-     is required. By enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link>
-     on the standby, the failover slots can be synchronized periodically in
+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">
+     <function>pg_sync_replication_slots</function></link>
+     on the standby, or automatically by enabling
+     <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby.
+     When <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> is enabled
+     on the standby, the failover slots are periodically synchronized by
      the slotsync worker. For the synchronization to work, it is mandatory to
      have a physical replication slot between the primary and the standby (i.e.,
      <link linkend="guc-primary-slot-name"><varname>primary_slot_name</varname></link>
@@ -398,25 +403,6 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
-    <note>
-     <para>
-      While enabling <link linkend="guc-sync-replication-slots">
-      <varname>sync_replication_slots</varname></link> allows for automatic
-      periodic synchronization of failover slots, they can also be manually
-      synchronized using the <link linkend="pg-sync-replication-slots">
-      <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
-     </para>
-    </note>
-
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 37738440113..454ccaabbe3 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -91,6 +91,7 @@
  * is expected (e.g., slot sync GUCs change), slot sync worker will reset
  * last_start_time before exiting, so that postmaster can start the worker
  * without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
+ *
  */
 typedef struct SlotSyncCtxStruct
 {
@@ -113,6 +114,7 @@ bool		sync_replication_slots = false;
  */
 #define MIN_SLOTSYNC_WORKER_NAPTIME_MS  200
 #define MAX_SLOTSYNC_WORKER_NAPTIME_MS  30000	/* 30s */
+#define SLOTSYNC_API_NAPTIME_MS         2000	/* 2s */
 
 static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
 
@@ -126,6 +128,12 @@ static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
  */
 static bool syncing_slots = false;
 
+/*
+ * Flag used by pg_sync_replication_slots()
+ * to do retries if the slot did not persist while syncing.
+ */
+static bool slot_persistence_pending = false;
+
 /*
  * Structure to hold information fetched from the primary server about a logical
  * replication slot.
@@ -146,6 +154,7 @@ typedef struct RemoteSlot
 	ReplicationSlotInvalidationCause invalidated;
 } RemoteSlot;
 
+static void ProcessSlotSyncInterrupts(WalReceiverConn *wrconn);
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
 
@@ -577,11 +586,15 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle. Update the
+		 * slot_persistence_pending flag, so the API can retry.
 		 */
+		slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -596,6 +609,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* update flag, so that we retry */
+		slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -796,15 +812,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If target_slot_list is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones mentioned in
+ * target_slot_list.
+ *
+ * NOTE: Caller must ensure a transaction is active before calling this
+ * function.
+ *
+ * Parameters:
+ *   wrconn - Connection to the primary server
+ *   target_slot_list - List of RemoteSlot structures to refresh, or NIL to
+ *                      fetch all failover slots
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *target_slot_list)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -813,29 +837,48 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+	ListCell   *lc;
+	bool		first_slot = true;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (target_slot_list != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		/*
+		 * If target_slot_list is provided, construct the query only to
+		 * fetch given slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach(lc, target_slot_list)
+		{
+			RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", remote_slot->name);
+			first_slot = false;
+		}
+		appendStringInfoString(&query, ")");
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
+	/* Process the slot information */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -906,12 +949,31 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
+			/* Add to updated list */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+	pfree(query.data);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -932,14 +994,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1131,7 +1185,7 @@ slotsync_reread_config(void)
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
 
-	Assert(sync_replication_slots);
+	Assert(!AmLogicalSlotSyncWorkerProcess() || sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
@@ -1238,6 +1292,7 @@ slotsync_worker_onexit(int code, Datum arg)
 	{
 		SlotSyncCtx->syncing = false;
 		syncing_slots = false;
+		slot_persistence_pending = false;
 	}
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
@@ -1276,7 +1331,7 @@ wait_for_slot_activity(bool some_slot_updated)
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1338,6 +1393,7 @@ reset_syncing_flag()
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	syncing_slots = false;
+	slot_persistence_pending = false;
 };
 
 /*
@@ -1505,10 +1561,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts(wrconn);
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1738,17 +1811,92 @@ slotsync_failure_callback(int code, Datum arg)
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready". Retries are done after a 2
+ * sec wait. Exits early if promotion is triggered.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
+		List		*remote_slots;
+		List		*prev_slot_list = NIL;
+		bool		started_tx = false;
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState()) {
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+
+		/* Retry until all slots are at least sync ready */
+		for (;;)
+		{
+			int		rc;
+
+			/* reset flag before every iteration */
+			slot_persistence_pending = false;
+
+			/*
+			 * Refresh the remote slot data. We keep using the previous slot
+			 * list, even if some slots are already sync ready, so that all
+			 * slots are updated with the latest status from the primary.
+			 * Some of the slots in the previous list could have gone away,
+			 * which is why we create a new list here and free the old list
+			 * at the end of the loop.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, prev_slot_list);
+
+			/* Attempt to synchronize slots */
+			synchronize_slots(wrconn, remote_slots);
+
+			/* Done if all slots are at least sync ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* wait for 2 seconds before retrying */
+			rc = WaitLatch(MyLatch,
+					WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					SLOTSYNC_API_NAPTIME_MS,
+					WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
+
+			if (rc & WL_LATCH_SET)
+				ResetLatch(MyLatch);
+
+			/*
+			 * If we've been promoted, then no point
+			 * continuing.
+			 */
+			if (SlotSyncCtx->stopSignaled)
+			{
+				ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("exiting from slot synchronization as"
+							" promotion is triggered")));
+				break;
+			}
+
+			/* Free the previous slot-list if it exists */
+			if (prev_slot_list)
+				list_free_deep(prev_slot_list);
+
+			prev_slot_list = remote_slots;
+		}
+
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 5427da5bc1b..accdf156187 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
2.47.3

#43shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#42)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Aug 26, 2025 at 9:58 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Aug 22, 2025 at 3:44 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Aug 20, 2025 at 10:53 AM Ajin Cherian <itsajin@gmail.com> wrote:

I've removed them.
Attaching patch v8 addressing the above comments.

Thanks for the patch. Please find a few comments:

1)
When the API is in progress, and meanwhile in another session we turn
off hot_standby_feedback, the API session terminates abnormally.

postgres=# SELECT pg_sync_replication_slots();
server closed the connection unexpectedly

It seems slotsync_reread_config() is not adjusted for API. It does
proc_exit assuming it is a worker process.

I've removed the API calling ProcessSlotSyncInterrupts() as I don't
think the API needs to specifically handle shutdown requests, it works
without calling this. And the other config checks, I don't think the
API needs to handle, I think we should leave it to the user.

2)
slotsync_worker_onexit():

SlotSyncCtx->slot_persistence_pending = false;

/*
* If syncing_slots is true, it indicates that the process errored out
* without resetting the flag. So, we need to clean up shared memory and
* reset the flag here.
*/
if (syncing_slots)
{
SlotSyncCtx->syncing = false;
syncing_slots = false;
}

Shall we reset slot_persistence_pending inside 'if (syncing_slots)'?
slot_persistence_pending can not be true without syncing_slots being
true.

3)
reset_syncing_flag():

SpinLockAcquire(&SlotSyncCtx->mutex);
SlotSyncCtx->syncing = false;
+ SlotSyncCtx->slot_persistence_pending = false;
SpinLockRelease(&SlotSyncCtx->mutex);

Here we are changing slot_persistence_pending under mutex, while at
other places, it is not protected by mutex. Is it intentional here?

4)
On rethinking, we maintain anything in shared memory if it has to be
shared between a few processes. 'slot_persistence_pending' OTOH is
required to be set and accessed by only one process at a time. Shall
we move it out of SlotSyncCtxStruct and keep it static similar to
'syncing_slots'? Rest of the setting, resetting flow remains the same.
What do you think?

Yes, I agree. I have modified it accordingly.

5)
/* Build the query based on whether we're fetching all or refreshing
specific slots */

Perhaps we can shift this comment to where we actually append
target_slot_list. Better to have it something like:
'If target_slot_list is provided, construct the query only to fetch given slots'

Changed.

Attaching patch v9 addressing the above comments.

Thank you for the patches. Please find a few comments.

1)
Change not needed:

* without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
+ *
*/

2)
Regarding the naptime in the API, I was thinking: why not use
wait_for_slot_activity() directly? If there is no slot activity, it
will keep doubling the naptime, starting from 2 sec up to a max of
30 sec. Thoughts?

We can rename MIN_SLOTSYNC_WORKER_NAPTIME_MS and
MAX_SLOTSYNC_WORKER_NAPTIME_MS to MIN_SLOTSYNC_NAPTIME_MS and
MAX_SLOTSYNC_NAPTIME_MS in such a case.
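
For reference, the backoff in wait_for_slot_activity() is roughly this
(paraphrased, not the exact code):

    if (!some_slot_updated)
    {
        /* no slot activity: double the nap, capped at the maximum */
        sleep_ms = Min(sleep_ms * 2, MAX_SLOTSYNC_WORKER_NAPTIME_MS);
    }
    else
    {
        /* some slot was updated: reset to the minimum nap */
        sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
    }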

3)
+ *   target_slot_list - List of RemoteSlot structures to refresh, or NIL to
+ *                      fetch all failover slots

Can we please change it to:

List of failover logical slots to fetch from primary, or NIL to fetch
all failover logical slots

4)
In the worker, before each call to synchronize_slots(), we are
starting a new transaction. It aligns with the previous implementation
where StartTransaction was inside synchronize_slots(). But in API, we
are doing StartTransaction once outside of the loop instead of doing
before each synchronize_slots(), is it intentional? It may keep the
transaction open for a long duration for the case where slots are not
getting persisted soon.

5)
With ProcessSlotSyncInterrupts() being removed from the API, can you
please check the behaviour of the API on smart shutdown and the rest of the
shutdown modes? It should behave like other APIs. And what happens if
I change primary_conninfo to some non-existing server when the API is
running. Does it error out or keep retrying?

thanks
Shveta

#44Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#43)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Aug 29, 2025 at 3:01 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Aug 26, 2025 at 9:58 AM Ajin Cherian <itsajin@gmail.com> wrote:
4)
In the worker, before each call to synchronize_slots(), we are
starting a new transaction. It aligns with the previous implementation
where StartTransaction was inside synchronize_slots(). But in API, we
are doing StartTransaction once outside of the loop instead of doing
before each synchronize_slots(), is it intentional? It may keep the
transaction open for a long duration for the case where slots are not
getting persisted soon.

I’ll address your other comments separately, but I wanted to respond
to this one first. I did try the approach you suggested, but the issue
is that we use the remote_slots list across loop iterations. If we end
the transaction at the end of each iteration, the list gets freed and
is no longer available for the next pass. Each iteration relies on the
remote_slots list from the previous one to build the new list, which
is why we can’t free it inside the loop.
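
To illustrate, a simplified sketch of the problematic flow (not the
actual patch code):

    StartTransactionCommand();
    remote_slots = fetch_remote_slots(wrconn, prev_slot_list);
    synchronize_slots(wrconn, remote_slots);

    /*
     * Commit destroys the transaction memory context, which frees
     * everything palloc'd above, including remote_slots ...
     */
    CommitTransactionCommand();

    /* ... so using it to build the next query would be a use-after-free */
    prev_slot_list = remote_slots;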

regards,
Ajin Cherian
Fujitsu Australia

#45Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Ajin Cherian (#44)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Aug 29, 2025 at 11:42 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Aug 29, 2025 at 3:01 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Aug 26, 2025 at 9:58 AM Ajin Cherian <itsajin@gmail.com> wrote:
4)
In the worker, before each call to synchronize_slots(), we are
starting a new transaction. It aligns with the previous implementation
where StartTransaction was inside synchronize_slots(). But in API, we
are doing StartTransaction once outside of the loop instead of doing
before each synchronize_slots(), is it intentional? It may keep the
transaction open for a long duration for the case where slots are not
getting persisted soon.

I’ll address your other comments separately, but I wanted to respond
to this one first. I did try the approach you suggested, but the issue
is that we use the remote_slots list across loop iterations. If we end
the transaction at the end of each iteration, the list gets freed and
is no longer available for the next pass. Each iteration relies on the
remote_slots list from the previous one to build the new list, which
is why we can’t free it inside the loop.

Isn't that just a matter of allocating the list in an appropriate
long-lived memory context?

Here are some more comments
<para>
- The logical replication slots on the primary can be synchronized to
- the hot standby by using the <literal>failover</literal> parameter of
+ The logical replication slots on the primary can be enabled for
+ synchronization to the hot standby by using the
+ <literal>failover</literal> parameter of

This change corresponds to an existing feature. Should be a separate
patch, which we may want to backport.

- on the standby, the failover slots can be synchronized periodically in
+ <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+ synchronization can be performed either manually by calling
+ <link linkend="pg-sync-replication-slots">
... snip ...
- <note>
- <para>
- While enabling <link linkend="guc-sync-replication-slots">
- <varname>sync_replication_slots</varname></link> allows for automatic
- periodic synchronization of failover slots, they can also be manually
... snip ...

I like the current documentation which separates the discussion of two
methods. I think we should just improve the second paragraph instead
of deleting it and changing the first one.

* without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
+ *

unnecessary blank line

+/*
+ * Flag used by pg_sync_replication_slots()
+ * to do retries if the slot did not persist while syncing.
+ */
+static bool slot_persistence_pending = false;

I don't think we need to keep a global variable for this. The variable
is used only inside SyncReplicationSlots() and the call depth is not
more than a few calls. From synchronize_slots(), before which the
variable is reset and after which the variable is checked, to
update_and_persist_local_synced_slot() which sets the variable, all
the functions return bool. All of them can be made to return an
integer status instead indicating the result of the operation. If we
do so we could check the return value of synchronize_slots() to decide
whether to retry or not, instead of maintaining a global variable
which has a much wider scope than required. It's difficult to keep it
updated over time.
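
Something along these lines (just a sketch, the names are invented):

    #define SLOTSYNC_SOME_UPDATED        0x01  /* worker: shorten the nap */
    #define SLOTSYNC_PERSISTENCE_PENDING 0x02  /* API: retry the sync */

    /* in SyncReplicationSlots(), roughly */
    for (;;)
    {
        int     status = synchronize_slots(wrconn, remote_slots);

        if (!(status & SLOTSYNC_PERSISTENCE_PENDING))
            break;
        /* nap, refetch, retry */
    }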

+ * Parameters:
+ * wrconn - Connection to the primary server
+ * target_slot_list - List of RemoteSlot structures to refresh, or NIL to
+ * fetch all failover slots
*
- * Returns TRUE if any of the slots gets updated in this sync-cycle.

Need to describe the return value.

*/
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *target_slot_list)

I like the way this function is broken down into two. That breaking down is
useful even without this feature.

- /* Construct the remote_slot tuple and synchronize each slot locally */
+ /* Process the slot information */

Probably these comments don't make much sense or they repeat what's
already there in the function prologue.

else
- /* Create list of remote slots */
+ /* Add to updated list */

Probably these comments don't make much sense or they repeat what's
already there in the function prologue.

@@ -1276,7 +1331,7 @@ wait_for_slot_activity(bool some_slot_updated)

The function is too cute to be useful. The code should be part of
ReplSlotSyncWorkerMain() just like other worker's main functions.

void
SyncReplicationSlots(WalReceiverConn *wrconn)
{
PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
{

Shouldn't this function call CHECK_FOR_INTERRUPTS() somewhere in the
loop, since it could potentially be an infinite loop?
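
i.e., something along these lines (sketch):

    for (;;)
    {
        /* let cancel/terminate requests break a potentially endless wait */
        CHECK_FOR_INTERRUPTS();

        remote_slots = fetch_remote_slots(wrconn, prev_slot_list);
        ...
    }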

--
Best Wishes,
Ashutosh Bapat

#46shveta malik
shveta.malik@gmail.com
In reply to: Ashutosh Bapat (#45)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Aug 29, 2025 at 2:20 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Fri, Aug 29, 2025 at 11:42 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Aug 29, 2025 at 3:01 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Aug 26, 2025 at 9:58 AM Ajin Cherian <itsajin@gmail.com> wrote:
4)
In the worker, before each call to synchronize_slots(), we are
starting a new transaction. It aligns with the previous implementation
where StartTransaction was inside synchronize_slots(). But in API, we
are doing StartTransaction once outside of the loop instead of doing
before each synchronize_slots(), is it intentional? It may keep the
transaction open for a long duration for the case where slots are not
getting persisted soon.

I’ll address your other comments separately, but I wanted to respond
to this one first. I did try the approach you suggested, but the issue
is that we use the remote_slots list across loop iterations. If we end
the transaction at the end of each iteration, the list gets freed and
is no longer available for the next pass. Each iteration relies on the
remote_slots list from the previous one to build the new list, which
is why we can’t free it inside the loop.

Isn't that just a matter of allocating the list in appropriate long
lived memory context?

+1. Since we're reallocating the list each time we fetch it from the
remote server, it's not suitable for long-lived memory storage.
Instead, should we extract the slot names during the initial fetch of
failover slots and store them in the appropriate memory context? This
extraction would only need to happen when
slot_persistence_pending is true during the first sync cycle.
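
Something like this untested sketch, where slot_names_ctx stands for
whatever long-lived context we pick:

    ListCell   *lc;
    List       *slot_names = NIL;
    MemoryContext oldctx;

    /* copy just the slot names into memory that survives the transaction */
    oldctx = MemoryContextSwitchTo(slot_names_ctx);
    foreach(lc, remote_slots)
    {
        RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);

        slot_names = lappend(slot_names, pstrdup(remote_slot->name));
    }
    MemoryContextSwitchTo(oldctx);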

Here are some more comments
<para>
- The logical replication slots on the primary can be synchronized to
- the hot standby by using the <literal>failover</literal> parameter of
+ The logical replication slots on the primary can be enabled for
+ synchronization to the hot standby by using the
+ <literal>failover</literal> parameter of

This change corresponds to an existing feature. Should be a separate
patch, which we may want to backport.

- on the standby, the failover slots can be synchronized periodically in
+ <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+ synchronization can be performed either manually by calling
+ <link linkend="pg-sync-replication-slots">
... snip ...
- <note>
- <para>
- While enabling <link linkend="guc-sync-replication-slots">
- <varname>sync_replication_slots</varname></link> allows for automatic
- periodic synchronization of failover slots, they can also be manually
... snip ...

I like the current documentation which separates the discussion of two
methods. I think we should just improve the second paragraph instead
of deleting it and changing the first one.

* without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
+ *

unnecessary blank line

+/*
+ * Flag used by pg_sync_replication_slots()
+ * to do retries if the slot did not persist while syncing.
+ */
+static bool slot_persistence_pending = false;

I don't think we need to keep a global variable for this. The variable
is used only inside SyncReplicationSlots() and the call depth is not
more than a few calls. From synchronize_slots(), before which the
variable is reset and after which the variable is checked, to
update_and_persist_local_synced_slot() which sets the variable, all
the functions return bool. All of them can be made to return an
integer status instead indicating the result of the operation. If we
do so we could check the return value of synchronize_slots() to decide
whether to retry or not, instead of maintaining a global variable
which has a much wider scope than required. It's difficult to keep it
updated over time.

+ * Parameters:
+ * wrconn - Connection to the primary server
+ * target_slot_list - List of RemoteSlot structures to refresh, or NIL to
+ * fetch all failover slots
*
- * Returns TRUE if any of the slots gets updated in this sync-cycle.

Need to describe the return value.

*/
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *target_slot_list)

I like the way this function is broken down into two. That breaking down is
useful even without this feature.

- /* Construct the remote_slot tuple and synchronize each slot locally */
+ /* Process the slot information */

Probably these comments don't make much sense or they repeat what's
already there in the function prologue.

else
- /* Create list of remote slots */
+ /* Add to updated list */

Probably these comments don't make much sense or they repeat what's
already there in the function prologue.

@@ -1276,7 +1331,7 @@ wait_for_slot_activity(bool some_slot_updated)

The function is too cute to be useful. The code should be part of
ReplSlotSyncWorkerMain() just like other worker's main functions.

I was thinking we can retain wait_for_slot_activity() as this can even
be invoked from API flow. See my comment #2 in [1]

[1]: /messages/by-id/CAJpy0uASzojKbzinpNu29xuYGsSRnSo=22CLhXaSt_43TVoBhQ@mail.gmail.com

thanks
Shveta

#47Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: shveta malik (#46)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Aug 29, 2025 at 2:37 PM shveta malik <shveta.malik@gmail.com> wrote:

On Fri, Aug 29, 2025 at 2:20 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

@@ -1276,7 +1331,7 @@ wait_for_slot_activity(bool some_slot_updated)

The function is too cute to be useful. The code should be part of
ReplSlotSyncWorkerMain() just like other worker's main functions.

I was thinking we can retain wait_for_slot_activity() as this can even
be invoked from API flow. See my comment# 2 in [1]

We want the SQL callable function to finish as fast as possible, and
make all the slots sync ready as fast as possible. So a shorter nap
time makes sense. We don't want to increase it per iteration. But sync
worker is a long running worker and can afford to wait longer. In fact
it should wait longer so as not to load the primary and the standby.
Given that the naptimes in both cases cannot be controlled by the
same logic, I think it's better not to use the same function. Each of
them should have separate code for napping. That way the logic which
decides the nap time is closer to the code that naps, making it more
readable.

--
Best Wishes,
Ashutosh Bapat

#48shveta malik
shveta.malik@gmail.com
In reply to: Ashutosh Bapat (#47)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Aug 29, 2025 at 4:14 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Fri, Aug 29, 2025 at 2:37 PM shveta malik <shveta.malik@gmail.com> wrote:

On Fri, Aug 29, 2025 at 2:20 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

@@ -1276,7 +1331,7 @@ wait_for_slot_activity(bool some_slot_updated)

The function is too cute to be useful. The code should be part of
ReplSlotSyncWorkerMain() just like other worker's main functions.

I was thinking we can retain wait_for_slot_activity() as this can even
be invoked from API flow. See my comment# 2 in [1]

We want the SQL callable function to finish as fast as possible, and
make all the slots sync ready as fast as possible. So a shorter nap
time makes sense. We don't want to increase it per iteration. But sync
worker is a long running worker and can afford to wait longer. In fact
it should wait longer so as not to load the primary and the standby.
Given that the naptimes in both cases cannot be controlled by the
same logic, I think it's better not to use the same function.

Okay, makes sense.

thanks
Shveta

#49Ajin Cherian
itsajin@gmail.com
In reply to: Ashutosh Bapat (#45)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Aug 29, 2025 at 3:01 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Aug 26, 2025 at 9:58 AM Ajin Cherian <itsajin@gmail.com> wrote:

Thank You for the patches. Please find a few comments.

1)
Change not needed:

* without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
+ *
*/

Removed.

2)
Regarding the naptime in API, I was thinking why not to use
wait_for_slot_activity() directly? If there are no slots activity, it
will keep on doubling the naptime starting from 2sec till max of
30sec. Thoughts?

We can rename MIN_SLOTSYNC_WORKER_NAPTIME_MS and
MAX_SLOTSYNC_WORKER_NAPTIME_MS to MIN_SLOTSYNC_NAPTIME_MS and
MAX_SLOTSYNC_NAPTIME_MS in such a case.

Not changing this, since the later discussion clarified it.

3)
+ *   target_slot_list - List of RemoteSlot structures to refresh, or NIL to
+ *                      fetch all failover slots

Can we please change it to:

List of failover logical slots to fetch from primary, or NIL to fetch
all failover logical slots

Changed the variable itself.

4)
In the worker, before each call to synchronize_slots(), we are
starting a new transaction. It aligns with the previous implementation
where StartTransaction was inside synchronize_slots(). But in API, we
are doing StartTransaction once outside of the loop instead of doing
before each synchronize_slots(), is it intentional? It may keep the
transaction open for a long duration for the case where slots are not
getting persisted soon.

I've added a new memory context to handle the slot names.
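
Roughly along these lines (a simplified sketch of the idea, not the
exact patch code):

    /* created once, before the retry loop, so it outlives each transaction */
    slot_names_ctx = AllocSetContextCreate(TopMemoryContext,
                                           "slot sync slot names",
                                           ALLOCSET_DEFAULT_SIZES);
    ...
    /* slot names are pstrdup'd while this context is current, in the loop */
    ...
    /* dropped once all slots are at least sync ready */
    MemoryContextDelete(slot_names_ctx);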

5)
With ProcessSlotSyncInterrupts() being removed from API, can you
please check the behaviour of API on smart-shutdown and rest of the
shutdown modes? It should behave like other APIs. And what happens if
I change primary_conninfo to some non-existing server when the API is
running. Does it error out or keep retrying?

I've tested with different types of shutdown and they seem to be
handled correctly. However, yes, the API does not handle configuration
changes. I've written a new function
slotsync_api_reread_config() to specifically handle configuration
changes in the API context, as it differs from the slotsync worker logic.
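
The rough shape is something like this (a simplified sketch, not the
exact patch code; the changed-settings check is a placeholder):

    static void
    slotsync_api_reread_config(void)
    {
        ConfigReloadPending = false;
        ProcessConfigFile(PGC_SIGHUP);

        /*
         * If any slot-sync-relevant setting changed (primary_conninfo,
         * primary_slot_name, hot_standby_feedback), raise an ERROR so that
         * only the current API call is aborted, instead of the proc_exit()
         * the worker version does.
         */
        if (config_changed)     /* placeholder for the actual checks */
            ereport(ERROR,
                    errmsg("slot synchronization parameter changed, aborting"));
    }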

On Fri, Aug 29, 2025 at 6:50 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Fri, Aug 29, 2025 at 11:42 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Aug 29, 2025 at 3:01 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Aug 26, 2025 at 9:58 AM Ajin Cherian <itsajin@gmail.com> wrote:
4)
In the worker, before each call to synchronize_slots(), we are
starting a new transaction. It aligns with the previous implementation
where StartTransaction was inside synchronize_slots(). But in API, we
are doing StartTransaction once outside of the loop instead of doing
before each synchronize_slots(), is it intentional? It may keep the
transaction open for a long duration for the case where slots are not
getting persisted soon.

I’ll address your other comments separately, but I wanted to respond
to this one first. I did try the approach you suggested, but the issue
is that we use the remote_slots list across loop iterations. If we end
the transaction at the end of each iteration, the list gets freed and
is no longer available for the next pass. Each iteration relies on the
remote_slots list from the previous one to build the new list, which
is why we can’t free it inside the loop.

Isn't that just a matter of allocating the list in appropriate long
lived memory context?

Yes, changed this.

Here are some more comments
<para>
- The logical replication slots on the primary can be synchronized to
- the hot standby by using the <literal>failover</literal> parameter of
+ The logical replication slots on the primary can be enabled for
+ synchronization to the hot standby by using the
+ <literal>failover</literal> parameter of

This change corresponds to an existing feature. Should be a separate
patch, which we may want to backport.

Removed.

- on the standby, the failover slots can be synchronized periodically in
+ <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+ synchronization can be performed either manually by calling
+ <link linkend="pg-sync-replication-slots">
... snip ...
- <note>
- <para>
- While enabling <link linkend="guc-sync-replication-slots">
- <varname>sync_replication_slots</varname></link> allows for automatic
- periodic synchronization of failover slots, they can also be manually
... snip ...

I like the current documentation which separates the discussion of two
methods. I think we should just improve the second paragraph instead
of deleting it and changing the first one.

* without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
+ *

unnecessary blank line

Changed.

+/*
+ * Flag used by pg_sync_replication_slots()
+ * to do retries if the slot did not persist while syncing.
+ */
+static bool slot_persistence_pending = false;

I don't think we need to keep a global variable for this. The variable
is used only inside SyncReplicationSlots() and the call depth is not
more than a few calls. From synchronize_slots(), before which the
variable is reset and after which the variable is checked, to
update_and_persist_local_synced_slot() which sets the variable, all
the functions return bool. All of them can be made to return an
integer status instead indicating the result of the operation. If we
do so we could check the return value of synchronize_slots() to decide
whether to retry or not, instead of maintaining a global variable
which has a much wider scope than required. It's difficult to keep it
updated over time.

The problem is that those calls, synchronize_slots() and
update_and_persist_local_synced_slot(), are shared between the slotsync
worker logic and the API. Hence, changing this will affect the slotsync
worker logic as well. While the API needs to specifically retry only if the
initial sync fails, the slotsync worker will always be retrying. I
feel using a global variable is a more convenient way of doing this.

+ * Parameters:
+ * wrconn - Connection to the primary server
+ * target_slot_list - List of RemoteSlot structures to refresh, or NIL to
+ * fetch all failover slots
*
- * Returns TRUE if any of the slots gets updated in this sync-cycle.

Need to describe the return value.

Added.

*/
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *target_slot_list)

I like the way this function is broken down into two. That breaking down is
useful even without this feature.

- /* Construct the remote_slot tuple and synchronize each slot locally */
+ /* Process the slot information */

Probably these comments don't make much sense or they repeat what's
already there in the function prologue.

else
- /* Create list of remote slots */
+ /* Add to updated list */

Probably these comments don't make much sense or they repeat what's
already there in the function prologue.

Removed these comments.

@@ -1276,7 +1331,7 @@ wait_for_slot_activity(bool some_slot_updated)

The function is too cute to be useful. The code should be part of
ReplSlotSyncWorkerMain() just like other worker's main functions.

But this wouldn't be part of this feature.

void
SyncReplicationSlots(WalReceiverConn *wrconn)
{
PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
{

Shouldn't this function call CheckForInterrupts() somewhere in the
loop since it could be potentially an infinite loop?

I've tested this by sending SIGQUIT and SIGINT to the backend process,
and I see that interrupts are being handled.

Attaching v10 with the above changes.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v10-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v10-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 2a251bbce3b4d7a00fcea5f79c1c4bd308c32c61 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Tue, 26 Aug 2025 14:24:28 +1000
Subject: [PATCH v10] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  35 +-
 src/backend/replication/logical/slotsync.c    | 301 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 4 files changed, 275 insertions(+), 67 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 57ff333159f..d85440abab4 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1478,9 +1478,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index b803a819cf1..d5a7601430f 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -370,12 +370,16 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation.
-     Additionally, enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link> on the standby
-     is required. By enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link>
-     on the standby, the failover slots can be synchronized periodically in
+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">
+     <function>pg_sync_replication_slots</function></link>
+     on the standby, or automatically by enabling
+     <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby.
+     When <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> is enabled
+     on the standby, the failover slots are periodically synchronized by
      the slotsync worker. For the synchronization to work, it is mandatory to
      have a physical replication slot between the primary and the standby (i.e.,
      <link linkend="guc-primary-slot-name"><varname>primary_slot_name</varname></link>
@@ -398,25 +402,6 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
-    <note>
-     <para>
-      While enabling <link linkend="guc-sync-replication-slots">
-      <varname>sync_replication_slots</varname></link> allows for automatic
-      periodic synchronization of failover slots, they can also be manually
-      synchronized using the <link linkend="pg-sync-replication-slots">
-      <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
-     </para>
-    </note>
-
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 37738440113..dfdd87c8bee 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -65,6 +65,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -113,6 +114,7 @@ bool		sync_replication_slots = false;
  */
 #define MIN_SLOTSYNC_WORKER_NAPTIME_MS  200
 #define MAX_SLOTSYNC_WORKER_NAPTIME_MS  30000	/* 30s */
+#define SLOTSYNC_API_NAPTIME_MS         2000	/* 2s */
 
 static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
 
@@ -126,6 +128,12 @@ static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
  */
 static bool syncing_slots = false;
 
+/*
+ * Flag used by pg_sync_replication_slots()
+ * to do retries if the slot did not persist while syncing.
+ */
+static bool slot_persistence_pending = false;
+
 /*
  * Structure to hold information fetched from the primary server about a logical
  * replication slot.
@@ -146,6 +154,7 @@ typedef struct RemoteSlot
 	ReplicationSlotInvalidationCause invalidated;
 } RemoteSlot;
 
+static void ProcessSlotSyncInterrupts(WalReceiverConn *wrconn);
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
 
@@ -577,11 +586,15 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle. Update the
+		 * slot_persistence_pending flag, so the API can retry.
 		 */
+		slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -596,6 +609,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* update flag, so that we retry */
+		slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -796,15 +812,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
+ *
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -813,29 +837,46 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+	ListCell   *lc;
+	bool		first_slot = true;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach(lc, slot_names)
+		{
+			char *slot_name = (char *) lfirst(lc);
+
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoString(&query, ")");
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -886,7 +927,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -906,12 +946,30 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+	pfree(query.data);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -932,14 +990,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1131,7 +1181,7 @@ slotsync_reread_config(void)
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
 
-	Assert(sync_replication_slots);
+	Assert(!AmLogicalSlotSyncWorkerProcess() || sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
@@ -1238,6 +1288,7 @@ slotsync_worker_onexit(int code, Datum arg)
 	{
 		SlotSyncCtx->syncing = false;
 		syncing_slots = false;
+		slot_persistence_pending = false;
 	}
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
@@ -1276,7 +1327,7 @@ wait_for_slot_activity(bool some_slot_updated)
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1338,6 +1389,7 @@ reset_syncing_flag()
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	syncing_slots = false;
+	slot_persistence_pending = false;
 };
 
 /*
@@ -1505,10 +1557,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts(wrconn);
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1735,26 +1804,182 @@ slotsync_failure_callback(int code, Datum arg)
 	walrcv_disconnect(wrconn);
 }
 
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+	ListCell	*lc;
+
+	foreach(lc, remote_slots)
+	{
+		RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+		char       *slot_name;
+
+		/* Allocate slot name in current memory context */
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	return slot_names;
+}
+
+/*
+ * Re-read the config file for API context.
+ *
+ * Throws an error if conninfo, primary_slot_name or hot_standby_feedback changed.
+ */
+static void
+slotsync_api_reread_config(void)
+{
+	char       *old_primary_conninfo = pstrdup(PrimaryConnInfo);
+	char       *old_primary_slotname = pstrdup(PrimarySlotName);
+	bool        old_hot_standby_feedback = hot_standby_feedback;
+	bool        conninfo_changed;
+	bool        primary_slotname_changed;
+
+	ConfigReloadPending = false;
+	ProcessConfigFile(PGC_SIGHUP);
+
+	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
+	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
+	pfree(old_primary_conninfo);
+	pfree(old_primary_slotname);
+
+	/* throw error for certain parameter changes */
+	if (conninfo_changed ||
+		primary_slotname_changed ||
+		(old_hot_standby_feedback != hot_standby_feedback))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIG_FILE_ERROR),
+				 errmsg("cannot continue slot synchronization due to parameter changes"),
+				 errdetail("Critical replication parameters (primary_conninfo, primary_slot_name, or hot_standby_feedback) have changed since pg_sync_replication_slots() started."),
+				 errhint("Retry pg_sync_replication_slots() to use the updated configuration.")));
+	}
+}
+
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready". Retry is done after 2
+ * sec wait. Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+		MemoryContext oldcontext;
+		MemoryContext sync_context;
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Create a memory context that survives transaction boundaries */
+		sync_context = AllocSetContextCreate(CurrentMemoryContext,
+											 "SlotSync",
+											 ALLOCSET_DEFAULT_SIZES);
+
+		/* Retry until all slots are sync ready atleast */
+		for (;;)
+		{
+			int		rc;
+			bool	started_tx = false;
+
+			/* reset flag before every iteration */
+			slot_persistence_pending = false;
+
+			/*
+			 * The syscache access in fetch_remote_slots() needs a
+			 * transaction env.
+			 */
+			if (!IsTransactionState()) {
+				StartTransactionCommand();
+				started_tx = true;
+			}
+
+			/*
+			 * Fetch remote_slots info for these slot_names, if slot_names
+			 * is NIL, fetch all the failover enabled slots.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+
+			/* Attempt to synchronize slots */
+			synchronize_slots(wrconn, remote_slots);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				/* Switch to long-lived context to store slot names */
+				oldcontext = MemoryContextSwitchTo(sync_context);
+
+				/* Extract slot names from the remote slots */
+				slot_names = extract_slot_names(remote_slots);
+
+				MemoryContextSwitchTo(oldcontext);
+			}
+
+			/* Free the current remote_slots list */
+			if (remote_slots)
+				list_free_deep(remote_slots);
+
+			/* Commit transaction if we started it */
+			if (started_tx)
+				CommitTransactionCommand();
+
+			/* Done if all slots are atleast sync ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* wait for 2 seconds before retrying */
+			rc = WaitLatch(MyLatch,
+					WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					SLOTSYNC_API_NAPTIME_MS,
+					WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
+
+			if (rc & WL_LATCH_SET)
+				ResetLatch(MyLatch);
+
+			/*
+			 * If we've been promoted, then no point
+			 * continuing.
+			 */
+			if (SlotSyncCtx->stopSignaled)
+			{
+				ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("exiting from slot synchronization as"
+							" promotion is triggered")));
+				break;
+			}
+
+			/* error out if configuration parameters changed */
+			if (ConfigReloadPending)
+				slotsync_api_reread_config();
+		}
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
 
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
+
+		/* Clean up the sync memory context */
+		MemoryContextDelete(sync_context);
 	}
 	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 5427da5bc1b..accdf156187 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch-up."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
2.47.3

#50shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#49)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Sep 3, 2025 at 11:58 AM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching v10 with the above changes.

The patch does not apply on HEAD. Can you please rebase?

thanks
Shveta

#51Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#50)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Sep 3, 2025 at 6:47 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Sep 3, 2025 at 11:58 AM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching v10 with the above changes.

The patch does not apply on HEAD. Can you please rebase?

Rebased and made a small change as well to use TopMemoryContext rather
than create a new context for slot_list.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v10-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v10-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 55350db3d2d4d478ee71a2a5ee713988f4757685 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Wed, 3 Sep 2025 19:43:06 +1000
Subject: [PATCH v10] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retain it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  35 +-
 src/backend/replication/logical/slotsync.c    | 301 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 4 files changed, 275 insertions(+), 67 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 57ff333159f..d85440abab4 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1478,9 +1478,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index b803a819cf1..d5a7601430f 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -370,12 +370,16 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation.
-     Additionally, enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link> on the standby
-     is required. By enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link>
-     on the standby, the failover slots can be synchronized periodically in
+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">
+     <function>pg_sync_replication_slots</function></link>
+     on the standby, or automatically by enabling
+     <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby.
+     When <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> is enabled
+     on the standby, the failover slots are periodically synchronized by
      the slotsync worker. For the synchronization to work, it is mandatory to
      have a physical replication slot between the primary and the standby (i.e.,
      <link linkend="guc-primary-slot-name"><varname>primary_slot_name</varname></link>
@@ -398,25 +402,6 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
-    <note>
-     <para>
-      While enabling <link linkend="guc-sync-replication-slots">
-      <varname>sync_replication_slots</varname></link> allows for automatic
-      periodic synchronization of failover slots, they can also be manually
-      synchronized using the <link linkend="pg-sync-replication-slots">
-      <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
-     </para>
-    </note>
-
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 9d0072a49ed..4bd3fa5570d 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -64,6 +64,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -112,6 +113,7 @@ bool		sync_replication_slots = false;
  */
 #define MIN_SLOTSYNC_WORKER_NAPTIME_MS  200
 #define MAX_SLOTSYNC_WORKER_NAPTIME_MS  30000	/* 30s */
+#define SLOTSYNC_API_NAPTIME_MS         2000	/* 2s */
 
 static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
 
@@ -125,6 +127,12 @@ static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
  */
 static bool syncing_slots = false;
 
+/*
+ * Flag used by pg_sync_replication_slots()
+ * to do retries if the slot did not persist while syncing.
+ */
+static bool slot_persistence_pending = false;
+
 /*
  * Structure to hold information fetched from the primary server about a logical
  * replication slot.
@@ -145,6 +153,7 @@ typedef struct RemoteSlot
 	ReplicationSlotInvalidationCause invalidated;
 } RemoteSlot;
 
+static void ProcessSlotSyncInterrupts(void);
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
 
@@ -576,11 +585,15 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle. Update the
+		 * slot_persistence_pending flag, so the API can retry.
 		 */
+		slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -595,6 +608,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* update flag, so that we retry */
+		slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -795,15 +811,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
+ *
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -812,29 +836,46 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+	ListCell   *lc;
+	bool		first_slot = true;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach(lc, slot_names)
+		{
+			char *slot_name = (char *) lfirst(lc);
+
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoString(&query, ")");
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -885,7 +926,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -905,12 +945,30 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+	pfree(query.data);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -931,14 +989,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1130,7 +1180,7 @@ slotsync_reread_config(void)
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
 
-	Assert(sync_replication_slots);
+	Assert(!AmLogicalSlotSyncWorkerProcess() || sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
@@ -1237,6 +1287,7 @@ slotsync_worker_onexit(int code, Datum arg)
 	{
 		SlotSyncCtx->syncing = false;
 		syncing_slots = false;
+		slot_persistence_pending = false;
 	}
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
@@ -1275,7 +1326,7 @@ wait_for_slot_activity(bool some_slot_updated)
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1337,6 +1388,7 @@ reset_syncing_flag()
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	syncing_slots = false;
+	slot_persistence_pending = false;
 };
 
 /*
@@ -1505,10 +1557,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1735,26 +1804,182 @@ slotsync_failure_callback(int code, Datum arg)
 	walrcv_disconnect(wrconn);
 }
 
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+	ListCell	*lc;
+
+	foreach(lc, remote_slots)
+	{
+		RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+		char       *slot_name;
+
+		/* Allocate slot name in current memory context */
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	return slot_names;
+}
+
+/*
+ * Re-read the config file for API context.
+ *
+ * Throws an error if conninfo, primary_slot_name or hot_standby_feedback changed.
+ */
+static void
+slotsync_api_reread_config(void)
+{
+	char       *old_primary_conninfo = pstrdup(PrimaryConnInfo);
+	char       *old_primary_slotname = pstrdup(PrimarySlotName);
+	bool        old_hot_standby_feedback = hot_standby_feedback;
+	bool        conninfo_changed;
+	bool        primary_slotname_changed;
+
+	ConfigReloadPending = false;
+	ProcessConfigFile(PGC_SIGHUP);
+
+	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
+	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
+	pfree(old_primary_conninfo);
+	pfree(old_primary_slotname);
+
+	/* throw error for certain parameter changes */
+	if (conninfo_changed ||
+		primary_slotname_changed ||
+		(old_hot_standby_feedback != hot_standby_feedback))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIG_FILE_ERROR),
+				 errmsg("cannot continue slot synchronization due to parameter changes"),
+				 errdetail("Critical replication parameters (primary_conninfo, primary_slot_name, or hot_standby_feedback) have changed since pg_sync_replication_slots() started."),
+				 errhint("Retry pg_sync_replication_slots() to use the updated configuration.")));
+	}
+}
+
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready". Retry is done after 2
+ * sec wait. Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+		MemoryContext oldcontext;
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all slots are sync ready atleast */
+		for (;;)
+		{
+			int		rc;
+			bool	started_tx = false;
+
+			/* reset flag before every iteration */
+			slot_persistence_pending = false;
+
+			/*
+			 * The syscache access in fetch_remote_slots() needs a
+			 * transaction env.
+			 */
+			if (!IsTransactionState()) {
+				StartTransactionCommand();
+				started_tx = true;
+			}
+
+			/*
+			 * Fetch remote_slots info for these slot_names, if slot_names
+			 * is NIL, fetch all the failover enabled slots.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+
+			/* Attempt to synchronize slots */
+			synchronize_slots(wrconn, remote_slots);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				/* Switch to long-lived TopMemoryContext to store slot names */
+				oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+				/* Extract slot names from the remote slots */
+				slot_names = extract_slot_names(remote_slots);
+
+				MemoryContextSwitchTo(oldcontext);
+			}
+
+			/* Free the current remote_slots list */
+			if (remote_slots)
+				list_free_deep(remote_slots);
+
+			/* Commit transaction if we started it */
+			if (started_tx)
+				CommitTransactionCommand();
+
+			/* Done if all slots are atleast sync ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* wait for 2 seconds before retrying */
+			rc = WaitLatch(MyLatch,
+					WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					SLOTSYNC_API_NAPTIME_MS,
+					WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
+
+			if (rc & WL_LATCH_SET)
+				ResetLatch(MyLatch);
+
+			/*
+			 * If we've been promoted, then no point
+			 * continuing.
+			 */
+			if (SlotSyncCtx->stopSignaled)
+			{
+				ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("exiting from slot synchronization as"
+							" promotion is triggered")));
+				break;
+			}
+
+			/* error out if configuration parameters changed */
+			if (ConfigReloadPending)
+				slotsync_api_reread_config();
+		}
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
 
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
+
+		/* Clean up slot_names if allocated in TopMemoryContext */
+		if (slot_names)
+		{
+			oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+			list_free_deep(slot_names);
+			MemoryContextSwitchTo(oldcontext);
+		}
+
 	}
 	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 5427da5bc1b..accdf156187 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch-up."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
2.47.3

#52shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#51)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Sep 3, 2025 at 3:19 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Sep 3, 2025 at 6:47 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Sep 3, 2025 at 11:58 AM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching v10 with the above changes.

The patch does not apply on HEAD. Can you please rebase?

Rebased and made a small change as well to use TopMemoryContext rather
than create a new context for slot_list.

Thanks for the patch. Please find a few comments:

1)
/* Clean up slot_names if allocated in TopMemoryContext */
if (slot_names)
{
oldcontext = MemoryContextSwitchTo(TopMemoryContext);
list_free_deep(slot_names);
MemoryContextSwitchTo(oldcontext);
}

I think we can free slot_names without switching the context. Can you
please check this?
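
(For reference, pfree() finds the owning memory context from the chunk
being freed, and list_free_deep() simply pfree's each element and then
the list itself, so the cleanup should reduce to:

	if (slot_names)
		list_free_deep(slot_names);

with no MemoryContextSwitchTo() around it, assuming I am reading
list.c correctly.)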

2)
We should add a comment for:
a) why we are using the slot-names from the first cycle instead of
fetching all failover slots in each cycle.
b) why we are freeing and re-fetching the remote_slot list every time.

3)
@@ -1130,7 +1180,7 @@ slotsync_reread_config(void)

- Assert(sync_replication_slots);
+ Assert(!AmLogicalSlotSyncWorkerProcess() || sync_replication_slots);

Do we still need this change after slotsync_api_reread_config?

4)
+static void ProcessSlotSyncInterrupts(void);

This is not needed.

5)

+ /* update flag, so that we retry */
+ slot_persistence_pending = true;

Can we tweak it to: 'Update the flag so that the API can retry'

6)
SyncReplicationSlots():
+ /* Free the current remote_slots list */
+ if (remote_slots)
+ list_free_deep(remote_slots);

Do we need the 'remote_slots' check? Won't list_free_deep() handle it
internally? We don't have such a check in ReplSlotSyncWorkerMain().
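
(As far as I can see, list_free_deep() returns immediately for NIL, so

	list_free_deep(remote_slots);	/* safe even when remote_slots == NIL */

should be enough on its own.)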

7)
slotsync_api_reread_config

+ ereport(ERROR,
+ (errcode(ERRCODE_CONFIG_FILE_ERROR),
+ errmsg("cannot continue slot synchronization due to parameter changes"),
+ errdetail("Critical replication parameters (primary_conninfo,
primary_slot_name, or hot_standby_feedback) have changed since
pg_sync_replication_slots() started."),
+ errhint("Retry pg_sync_replication_slots() to use the updated
configuration.")));

I am unsure if we need to mention '(primary_conninfo,
primary_slot_name, or hot_standby_feedback)', but would like to know
what others think.

thanks
Shveta

#53Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Ajin Cherian (#49)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Sep 3, 2025 at 11:58 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Aug 29, 2025 at 6:50 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Fri, Aug 29, 2025 at 11:42 AM Ajin Cherian <itsajin@gmail.com> wrote:

+/*
+ * Flag used by pg_sync_replication_slots()
+ * to do retries if the slot did not persist while syncing.
+ */
+static bool slot_persistence_pending = false;

I don't think we need to keep a global variable for this. The variable
is used only inside SyncReplicationSlots() and the call depth is not
more than a few calls. From synchronize_slots(), before which the
variable is reset and after which the variable is checked, to
update_and_persist_local_synced_slot() which sets the variable, all
the functions return bool. All of them can be made to return an
integer status instead indicating the result of the operation. If we
do so we could check the return value of synchronize_slots() to decide
whether to retry or not, instead of maintaining a global variable
which has a much wider scope than required. It's difficult to keep it
updated over the time.

The problem is that all those calls synchronize_slots() and
update_and_persist_local_synced_slot() are shared with the slotsync
worker logic and API. Hence, changing this will affect slotsync_worker
logic as well. While the API needs to specifically retry only if the
initial sync fails, the slotsync worker will always be retrying. I
feel using a global variable is a more convenient way of doing this.

AFAICS, it's a matter of expanding the scope of what's returned by
those functions. The worker may not want to use the whole expanded
scope but the API will use it. That shouldn't change the functionality
of the worker, but it will help avoid the global variable - which has
a much wider scope and whose maintenance can be prone to bugs.
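
To make it concrete, a rough (untested) sketch of what I mean; the
enum and all the names here are invented:

	/* status returned up the sync call chain instead of a bare bool */
	typedef enum SlotSyncResult
	{
		SLOTSYNC_NO_UPDATE,		/* nothing changed this cycle */
		SLOTSYNC_UPDATED,		/* at least one slot was updated */
		SLOTSYNC_PERSISTENCE_PENDING	/* a slot is not yet persistable */
	} SlotSyncResult;

update_and_persist_local_synced_slot() would return
SLOTSYNC_PERSISTENCE_PENDING where it sets the flag today,
synchronize_slots() would propagate the "worst" status it sees, and
the loop in the API would reduce to:

	result = synchronize_slots(wrconn, remote_slots);
	...
	if (result != SLOTSYNC_PERSISTENCE_PENDING)
		break;

The worker can keep treating anything other than SLOTSYNC_NO_UPDATE as
"some slot updated".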

@@ -1276,7 +1331,7 @@ wait_for_slot_activity(bool some_slot_updated)

The function is too cute to be useful. The code should be part of
ReplSlotSyncWorkerMain() just like other worker's main functions.

But this wouldn't be part of this feature.

void
SyncReplicationSlots(WalReceiverConn *wrconn)
{
PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
{

Shouldn't this function call CheckForInterrupts() somewhere in the
loop since it could be potentially an infinite loop?

I've tested this and I see that interrupts are being handled by
sending SIGQUIT and SIGINT to the backend process.

Can you please point me to the code (the call to
CHECK_FOR_INTERRUPTS()) which processes these interrupts while
pg_sync_replication_slots() is executing, especially when the function
is waiting while syncing a slot.

--
Best Wishes,
Ashutosh Bapat

#54Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Ajin Cherian (#51)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

Hi,

On Wed, Sep 3, 2025 at 3:20 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Sep 3, 2025 at 6:47 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Sep 3, 2025 at 11:58 AM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching v10 with the above changes.

The patch does not apply on HEAD. Can you please rebase?

Rebased and made a small change as well to use TopMemoryContext rather
than create a new context for slot_list.

If the remote slot has been inactive for a long time and is lagging
behind the standby, this function (pg_sync_replication_slots) could end up
waiting indefinitely. While it can certainly be cancelled manually,
that behavior might not be ideal for everyone. That’s my
understanding; please let me know if you see it differently.

--
With Regards,
Ashutosh Sharma.

#55Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Ashutosh Bapat (#53)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Sep 5, 2025 at 6:52 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Wed, Sep 3, 2025 at 11:58 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Aug 29, 2025 at 6:50 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Fri, Aug 29, 2025 at 11:42 AM Ajin Cherian <itsajin@gmail.com> wrote:

+/*
+ * Flag used by pg_sync_replication_slots()
+ * to do retries if the slot did not persist while syncing.
+ */
+static bool slot_persistence_pending = false;

I don't think we need to keep a global variable for this. The variable
is used only inside SyncReplicationSlots() and the call depth is not
more than a few calls. From synchronize_slots(), before which the
variable is reset and after which the variable is checked, to
update_and_persist_local_synced_slot() which sets the variable, all
the functions return bool. All of them can be made to return an
integer status instead indicating the result of the operation. If we
do so we could check the return value of synchronize_slots() to decide
whether to retry or not, instead of maintaining a global variable
which has a much wider scope than required. It's difficult to keep it
updated over the time.

The problem is that all those calls synchronize_slots() and
update_and_persist_local_synced_slot() are shared with the slotsync
worker logic and API. Hence, changing this will affect slotsync_worker
logic as well. While the API needs to specifically retry only if the
initial sync fails, the slotsync worker will always be retrying. I
feel using a global variable is a more convenient way of doing this.

AFAICS, it's a matter of expanding the scope of what's returned by
those functions. The worker may not want to use the whole expanded
scope but the API will use it. That shouldn't change the functionality
of the worker, but it will help avoid the global variable - which has
a much wider scope and whose maintenance can be prone to bugs.

@@ -1276,7 +1331,7 @@ wait_for_slot_activity(bool some_slot_updated)

The function is too cute to be useful. The code should be part of
ReplSlotSyncWorkerMain() just like other worker's main functions.

But this wouldn't be part of this feature.

void
SyncReplicationSlots(WalReceiverConn *wrconn)
{
PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
{

Shouldn't this function call CheckForInterrupts() somewhere in the
loop since it could be potentially an infinite loop?

I've tested this and I see that interrupts are being handled by
sending SIGQUIT and SIGINT to the backend process.

Can you please point me to the code (the call to
CHECK_FOR_INTERRUPTS()) which processes these interrupts while
pg_sync_replication_slots() is executing, especially when the function
is waiting while syncing a slot.

I noticed that the function libpqrcv_processTuples, which is invoked
by fetch_remote_slots, includes a CHECK_FOR_INTERRUPTS call. This is
currently helping in processing interrupts while we are in an infinite
loop within SyncReplicationSlots(). I’m just pointing this out based
on my observation while reviewing the changes in this patch. Ajin,
please correct me if I’m mistaken. If not, can we always rely on this
particular check for interrupts?

--
With Regards,
Ashutosh Sharma.

#56Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Sharma (#54)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Sep 5, 2025 at 11:39 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Wed, Sep 3, 2025 at 3:20 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Sep 3, 2025 at 6:47 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Sep 3, 2025 at 11:58 AM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching v10 with the above changes.

The patch does not apply on HEAD. Can you please rebase?

Rebased and made a small change as well to use TopMemoryContext rather
than create a new context for slot_list.

If the remote slot has been inactive for a long time and is lagging
behind the standby, this function (pg_sync_replication_slots) could end up
waiting indefinitely. While it can certainly be cancelled manually,
that behavior might not be ideal for everyone. That’s my
understanding; please let me know if you see it differently.

Such a case can be addressed by having additional timeout parameters.
We can do that as an additional patch if the use case is important
enough to address.
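
Just to sketch the shape of such a knob (hypothetical, not something
being proposed here): the retry loop could take a timeout_ms argument
and bail out once it is exceeded, along the lines of

	TimestampTz start = GetCurrentTimestamp();
	...
	/* inside the retry loop, before sleeping again */
	if (timeout_ms > 0 &&
		TimestampDifferenceExceeds(start, GetCurrentTimestamp(),
								   timeout_ms))
		ereport(ERROR,
				(errcode(ERRCODE_QUERY_CANCELED),
				 errmsg("timed out waiting for the remote slot to advance")));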

--
With Regards,
Amit Kapila.

#57Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Ashutosh Sharma (#55)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Sat, Sep 6, 2025 at 12:05 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Fri, Sep 5, 2025 at 6:52 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Wed, Sep 3, 2025 at 11:58 AM Ajin Cherian <itsajin@gmail.com> wrote:

I've tested this and I see that interrupts are being handled by
sending SIGQUIT and SIGINT to the backend process.

Can you please point me to the code (the call to
CHECK_FOR_INTERRUPTS()) which processes these interrupts while
pg_sync_replication_slots() is executing, especially when the function
is waiting while syncing a slot.

I noticed that the function libpqrcv_processTuples, which is invoked
by fetch_remote_slots, includes a CHECK_FOR_INTERRUPTS call. This is
currently helping in processing interrupts while we are in an infinite
loop within SyncReplicationSlots(). I’m just pointing this out based
on my observation while reviewing the changes in this patch. Ajin,
please correct me if I’m mistaken. If not, can we always rely on this
particular check for interrupts?

It doesn't seem good to rely on a CHECK_FOR_INTERRUPTS() from so far
away. It's better to have one called from SyncReplicationSlots()
itself, which has the wait loop. That's what other functions with
potentially long wait loops do.
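
i.e. the loop in SyncReplicationSlots() itself would check, roughly:

	for (;;)
	{
		CHECK_FOR_INTERRUPTS();	/* react to cancel/terminate promptly */

		/* ... fetch and synchronize the slots as the patch does ... */

		rc = WaitLatch(MyLatch,
					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
					   SLOTSYNC_API_NAPTIME_MS,
					   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
		if (rc & WL_LATCH_SET)
			ResetLatch(MyLatch);
	}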

--
Best Wishes,
Ashutosh Bapat

#58Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Amit Kapila (#56)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Sat, Sep 6, 2025 at 9:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Sep 5, 2025 at 11:39 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

If the remote slot has been inactive for a long time and is lagging
behind the standby, this function (pg_sync_replication_slots) could end up
waiting indefinitely. While it can certainly be cancelled manually,
that behavior might not be ideal for everyone. That’s my
understanding; please let me know if you see it differently.

Such a case can be addressed by having additional timeout parameters.
We can do that as an additional patch if the use case is important
enough to address.

Or we could rely on statement_timeout or the user cancelling the query
explicitly.

--
Best Wishes,
Ashutosh Bapat

#59Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Ashutosh Bapat (#58)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Mon, Sep 8, 2025 at 9:51 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Sat, Sep 6, 2025 at 9:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Sep 5, 2025 at 11:39 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

If the remote slot has been inactive for a long time and is lagging
behind the standby, this function (pg_sync_replication_slots) could end up
waiting indefinitely. While it can certainly be cancelled manually,
that behavior might not be ideal for everyone. That’s my
understanding; please let me know if you see it differently.

Such a case can be addressed by having additional timeout parameters.
We can do that as an additional patch if the use case is important
enough to address.

Or we could rely on statement_timeout or the user cancelling the query
explicitly.

Sure. thanks Amit and Ashutosh.

--
With Regards,
Ashutosh Sharma.

#60shveta malik
shveta.malik@gmail.com
In reply to: Ashutosh Bapat (#53)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Sep 5, 2025 at 6:51 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Wed, Sep 3, 2025 at 11:58 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Aug 29, 2025 at 6:50 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Fri, Aug 29, 2025 at 11:42 AM Ajin Cherian <itsajin@gmail.com> wrote:

+/*
+ * Flag used by pg_sync_replication_slots()
+ * to do retries if the slot did not persist while syncing.
+ */
+static bool slot_persistence_pending = false;

I don't think we need to keep a global variable for this. The variable
is used only inside SyncReplicationSlots() and the call depth is not
more than a few calls. From synchronize_slots(), before which the
variable is reset and after which the variable is checked, to
update_and_persist_local_synced_slot() which sets the variable, all
the functions return bool. All of them can be made to return an
integer status instead indicating the result of the operation. If we
do so we could check the return value of synchronize_slots() to decide
whether to retry or not, instead of maintaining a global variable
which has a much wider scope than required. It's difficult to keep it
updated over the time.

The problem is that all those calls synchronize_slots() and
update_and_persist_local_synced_slot() are shared with the slotsync
worker logic and API. Hence, changing this will affect slotsync_worker
logic as well. While the API needs to specifically retry only if the
initial sync fails, the slotsync worker will always be retrying. I
feel using a global variable is a more convenient way of doing this.

AFAICS, it's a matter of expanding the scope of what's returned by
those functions. The worker may not want to use the whole expanded
scope but the API will use it. That shouldn't change the functionality
of the worker, but it will help avoid the global variable - which has
a much wider scope and whose maintenance can be prone to bugs.

I think this can be done.

@@ -1276,7 +1331,7 @@ wait_for_slot_activity(bool some_slot_updated)

The function is too cute to be useful. The code should be part of
ReplSlotSyncWorkerMain() just like other worker's main functions.

But this wouldn't be part of this feature.

void
SyncReplicationSlots(WalReceiverConn *wrconn)
{
PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
{

Shouldn't this function call CheckForInterrupts() somewhere in the
loop since it could be potentially an infinite loop?

I've tested this and I see that interrupts are being handled by
sending SIGQUIT and SIGINT to the backend process.

Can you please point me to the code (the call to
CHECK_FOR_INTERRUPTS()) which processes these interrupts while
pg_sync_replication_slots() is executing, especially when the function
is waiting while syncing a slot.

I noticed that the function libpqrcv_processTuples, which is invoked
by fetch_remote_slots, includes a CHECK_FOR_INTERRUPTS call. This is
currently helping in processing interrupts while we are in an infinite
loop within SyncReplicationSlots(). I’m just pointing this out based
on my observation while reviewing the changes in this patch. Ajin,
please correct me if I’m mistaken. If not, can we always rely on this
particular check for interrupts?

It doesn't seem good to rely on a CHECK_FOR_INTERRUPTS() from so far
away. It's better to have one called from SyncReplicationSlots()
itself, which has the wait loop. That's what other functions with
potentially long wait loops do.

+1

thanks
Shveta

#61Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Ajin Cherian (#51)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

Hi,

Sharing some of my review comments; please see if they make sense to you.

On Wed, Sep 3, 2025 at 3:20 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Sep 3, 2025 at 6:47 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Sep 3, 2025 at 11:58 AM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching v10 with the above changes.

The patch does not apply on HEAD. Can you please rebase?

Rebased and made a small change as well to use TopMemoryContext rather
than create a new context for slot_list.

+ /*
+ * If we've been promoted, then no point
+ * continuing.
+ */
+ if (SlotSyncCtx->stopSignaled)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("exiting from slot synchronization as"
+ " promotion is triggered")));
+ break;
+ }
"break" statement here looks redundant to me.

--

+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready". Retry is done after 2
+ * sec wait. Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */

wait for 2 seconds before retrying - the constant is
SLOTSYNC_API_NAPTIME_MS, so technically it may *not* always be 2s if
the macro changes. Maybe rewording it to “wait for SLOTSYNC_API_NAPTIME_MS
before retrying” would look better?

--

/* Retry until all slots are sync ready atleast */

and

/* Done if all slots are atleast sync ready */

atleast -> "at least". I am just making this comment because at few
places in the same file I see "at least" and not "atleast".

--

+static void ProcessSlotSyncInterrupts(void);

Is this change related to this patch?

--

+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">

double "be".

--
With Regards,
Ashutosh Sharma.

#62Ajin Cherian
itsajin@gmail.com
In reply to: Ashutosh Sharma (#61)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, Sep 4, 2025 at 4:35 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Sep 3, 2025 at 3:19 PM Ajin Cherian <itsajin@gmail.com> wrote:

Thanks for the patch. Please find a few comments:

1)
/* Clean up slot_names if allocated in TopMemoryContext */
if (slot_names)
{
oldcontext = MemoryContextSwitchTo(TopMemoryContext);
list_free_deep(slot_names);
MemoryContextSwitchTo(oldcontext);
}

I think we can free slot_names without switching the context. Can you
please check this?

Removed this.
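
For context, a short sketch of why the context switch is unnecessary:
pfree(), and therefore list_free_deep(), releases a chunk based on the
memory context recorded in the chunk itself, not on CurrentMemoryContext:

MemoryContext oldcontext = MemoryContextSwitchTo(TopMemoryContext);
List	   *names = list_make1(pstrdup("slot_a"));	/* lives in TopMemoryContext */

MemoryContextSwitchTo(oldcontext);

/* Safe from any context: each chunk remembers its owning context. */
list_free_deep(names);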

2)
We should add a comment for:
a) why we are using the slot-names from the first cycle instead of
fetching all failover slots in each cycle.
b) why we are re-fetching the remote_slot list every time.

I have added a comment, let me know if any more is required.

3)
@@ -1130,7 +1180,7 @@ slotsync_reread_config(void)

- Assert(sync_replication_slots);
+ Assert(!AmLogicalSlotSyncWorkerProcess() || sync_replication_slots);

Do we still need this change after slotsync_api_reread_config?

Removed.

4)
+static void ProcessSlotSyncInterrupts(void);

This is not needed.

Removed.

5)

+ /* update flag, so that we retry */
+ slot_persistence_pending = true;

Can we tweak it to: 'Update the flag so that the API can retry'

Updated.

6)
SyncReplicationSlots():
+ /* Free the current remote_slots list */
+ if (remote_slots)
+ list_free_deep(remote_slots);

Do we need a 'remote_slots' check? Won't it be handled internally? We
don't have it in ReplSlotSyncWorkerMain().

Changed.

7)
slotsync_api_reread_config

+ ereport(ERROR,
+ (errcode(ERRCODE_CONFIG_FILE_ERROR),
+ errmsg("cannot continue slot synchronization due to parameter changes"),
+ errdetail("Critical replication parameters (primary_conninfo,
primary_slot_name, or hot_standby_feedback) have changed since
pg_sync_replication_slots() started."),
+ errhint("Retry pg_sync_replication_slots() to use the updated
configuration.")));

I am unsure if we need to mention '(primary_conninfo,
primary_slot_name, or hot_standby_feedback)', but would like to know
what others think.

Leaving this as is for now.

On Mon, Sep 8, 2025 at 2:20 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Sat, Sep 6, 2025 at 12:05 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Fri, Sep 5, 2025 at 6:52 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Wed, Sep 3, 2025 at 11:58 AM Ajin Cherian <itsajin@gmail.com> wrote:

I've tested this and I see that interrupts are being handled by
sending SIGQUIT and SIGINT to the backend process.

Can you please point me to the code (the call to
CHECK_FOR_INTERRUPTS()) which processes these interrupts while
pg_sync_replication_slots() is executing, especially when the function
is waiting while syncing a slot?

I noticed that the function libpqrcv_processTuples, which is invoked
by fetch_remote_slots, includes a CHECK_FOR_INTERRUPTS call. This is
currently what processes interrupts while we are in the infinite
loop within SyncReplicationSlots(). I’m just pointing this out based
on my observation while reviewing the changes in this patch. Ajin,
please correct me if I’m mistaken. If not, can we always rely on this
particular check for interrupts?

It doesn't seem good to rely on a CHECK_FOR_INTERRUPTS() from so far
away. It's better to have one called from SyncReplicationSlots()
itself, which has the wait loop. That's how other functions with
potentially long wait loops do it.

Ok, I agree. Added the check.

On Mon, Sep 8, 2025 at 2:33 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Hi,

Sharing some of my review comments; please see if they make sense to you.

+ /*
+ * If we've been promoted, then no point
+ * continuing.
+ */
+ if (SlotSyncCtx->stopSignaled)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("exiting from slot synchronization as"
+ " promotion is triggered")));
+ break;
+ }
"break" statement here looks redundant to me.

Removed.

--

+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready". Retry is done after 2
+ * sec wait. Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
*/

wait for 2 seconds before retrying - the constant is
SLOTSYNC_API_NAPTIME_MS, so technically it may *not* always be 2s if
the macro changes. Maybe rewording it to “wait for SLOTSYNC_API_NAPTIME_MS
before retrying” would look better?

I've removed the reference to 2 seconds.

--

/* Retry until all slots are sync ready atleast */

and

/* Done if all slots are atleast sync ready */

atleast -> "at least". I am just making this comment because at few
places in the same file I see "at least" and not "atleast".

Changed.

--

+static void ProcessSlotSyncInterrupts(void);

Is this change related to this patch?

It was used earlier, and I forgot to remove it. Removed now.

--

+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">

double "be".

Fixed

On Fri, Sep 5, 2025 at 11:21 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Wed, Sep 3, 2025 at 11:58 AM Ajin Cherian <itsajin@gmail.com> wrote:

AFAICS, it's a matter of expanding the scope of what's returned by
those functions. The worker may not want to use the whole expanded
scope but the API will use it. That shouldn't change the functionality
of the worker, but it will help avoid the global variable - which have
much wider scope and their maintenance can be prone to bugs.

Ok, I added a new output parameter to these functions that reports
whether any slots are pending persistence, and removed the global
variable.

Attached v11 patch addressing the above comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v11-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v11-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 2dfc64d89ee23b96c9c3148a79796600e2f0873e Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Tue, 9 Sep 2025 21:48:31 +1000
Subject: [PATCH v11] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for the initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  35 +-
 src/backend/replication/logical/slotsync.c    | 317 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 4 files changed, 286 insertions(+), 72 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 57ff333159f..d85440abab4 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1478,9 +1478,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index b803a819cf1..504c79f2fd2 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -370,12 +370,16 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation.
-     Additionally, enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link> on the standby
-     is required. By enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link>
-     on the standby, the failover slots can be synchronized periodically in
+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">
+     <function>pg_sync_replication_slots</function></link>
+     on the standby, or automatically by enabling
+     <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby.
+     When <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> is enabled
+     on the standby, the failover slots are periodically synchronized by
      the slotsync worker. For the synchronization to work, it is mandatory to
      have a physical replication slot between the primary and the standby (i.e.,
      <link linkend="guc-primary-slot-name"><varname>primary_slot_name</varname></link>
@@ -398,25 +402,6 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
-    <note>
-     <para>
-      While enabling <link linkend="guc-sync-replication-slots">
-      <varname>sync_replication_slots</varname></link> allows for automatic
-      periodic synchronization of failover slots, they can also be manually
-      synchronized using the <link linkend="pg-sync-replication-slots">
-      <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
-     </para>
-    </note>
-
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 9d0072a49ed..8236a3cdb28 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -64,6 +64,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -112,6 +113,7 @@ bool		sync_replication_slots = false;
  */
 #define MIN_SLOTSYNC_WORKER_NAPTIME_MS  200
 #define MAX_SLOTSYNC_WORKER_NAPTIME_MS  30000	/* 30s */
+#define SLOTSYNC_API_NAPTIME_MS         2000	/* 2s */
 
 static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
 
@@ -552,12 +554,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * If the remote restart_lsn and catalog_xmin have caught up with the
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
+ * The slot_persistence_pending flag is used by the pg_sync_replication_slots
+ * API to track if any slots could not be persisted and need to be retried.
  *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -576,11 +581,16 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle. Update the
+		 * slot_persistence_pending flag, so the API can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -595,6 +605,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* update the flag, so that the API can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -616,12 +630,15 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * The slot is created as a temporary slot and stays in the same state until the
  * remote_slot catches up with locally reserved position and local slot is
  * updated. The slot is then persisted and is considered as sync-ready for
- * periodic syncs.
+ * periodic syncs. The slot_persistence_pending flag is used by the
+ * pg_sync_replication_slots API to track if any slots could not be persisted
+ * and need to be retried.
  *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +732,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -784,7 +802,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -795,15 +814,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
+ *
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -812,29 +839,46 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+	ListCell   *lc;
+	bool		first_slot = true;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach(lc, slot_names)
+		{
+			char *slot_name = (char *) lfirst(lc);
+
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoString(&query, ")");
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -885,7 +929,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -905,12 +948,33 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+	pfree(query.data);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ * The slot_persistence_pending flag is used by the pg_sync_replication_slots
+ * API to track if any slots could not be persisted and need to be retried.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -926,19 +990,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1275,7 +1332,7 @@ wait_for_slot_activity(bool some_slot_updated)
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1505,10 +1562,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1735,26 +1809,183 @@ slotsync_failure_callback(int code, Datum arg)
 	walrcv_disconnect(wrconn);
 }
 
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+	ListCell	*lc;
+
+	foreach(lc, remote_slots)
+	{
+		RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+		char       *slot_name;
+
+		/* Allocate slot name in current memory context */
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	return slot_names;
+}
+
+/*
+ * Re-read the config file for API context.
+ *
+ * Throws an error if conninfo, primary_slot_name or hot_standby_feedback changed.
+ */
+static void
+slotsync_api_reread_config(void)
+{
+	char       *old_primary_conninfo = pstrdup(PrimaryConnInfo);
+	char       *old_primary_slotname = pstrdup(PrimarySlotName);
+	bool        old_hot_standby_feedback = hot_standby_feedback;
+	bool        conninfo_changed;
+	bool        primary_slotname_changed;
+
+	ConfigReloadPending = false;
+	ProcessConfigFile(PGC_SIGHUP);
+
+	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
+	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
+	pfree(old_primary_conninfo);
+	pfree(old_primary_slotname);
+
+	/* throw error for certain parameter changes */
+	if (conninfo_changed ||
+		primary_slotname_changed ||
+		(old_hot_standby_feedback != hot_standby_feedback))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIG_FILE_ERROR),
+				 errmsg("cannot continue slot synchronization due to parameter changes"),
+				 errdetail("Critical replication parameters (primary_conninfo, primary_slot_name, or hot_standby_feedback) have changed since pg_sync_replication_slots() started."),
+				 errhint("Retry pg_sync_replication_slots() to use the updated configuration.")));
+	}
+}
+
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready". Retry is done after 2
+ * sec wait. Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+		MemoryContext oldcontext;
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all slots are sync ready at least */
+		for (;;)
+		{
+			int		rc;
+			bool	started_tx = false;
+			bool	slot_persistence_pending = false;
+
+			/* reset flag before every iteration */
+			slot_persistence_pending = false;
+
+			/*
+			 * The syscache access in fetch_remote_slots() needs a
+			 * transaction env.
+			 */
+			if (!IsTransactionState()) {
+				StartTransactionCommand();
+				started_tx = true;
+			}
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the previous iteration; re-fetching all failover slots each time could
+			 * cause an endless loop.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+
+			/* Attempt to synchronize slots */
+			synchronize_slots(wrconn, remote_slots, &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				/* Switch to long-lived TopMemoryContext to store slot names */
+				oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+				/* Extract slot names from the remote slots */
+				slot_names = extract_slot_names(remote_slots);
+
+				MemoryContextSwitchTo(oldcontext);
+			}
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Commit transaction if we started it */
+			if (started_tx)
+				CommitTransactionCommand();
+
+			/* Done if all slots are at least sync ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* wait before retrying */
+			rc = WaitLatch(MyLatch,
+					WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					SLOTSYNC_API_NAPTIME_MS,
+					WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
+
+			if (rc & WL_LATCH_SET)
+			{
+				ResetLatch(MyLatch);
+				CHECK_FOR_INTERRUPTS();
+			}
+
+			/*
+			 * If we've been promoted, then no point
+			 * continuing.
+			 */
+			if (SlotSyncCtx->stopSignaled)
+			{
+				ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("exiting from slot synchronization as"
+							" promotion is triggered")));
+			}
+
+			/* error out if configuration parameters changed */
+			if (ConfigReloadPending)
+				slotsync_api_reread_config();
+
+		}
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
 
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
+
+		/* Clean up slot_names if allocated in TopMemoryContext */
+		if (slot_names)
+			list_free_deep(slot_names);
+
 	}
 	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 7553f6eacef..16b3b04d3c4 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
2.47.3

#63shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#62)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Sep 9, 2025 at 5:37 PM Ajin Cherian <itsajin@gmail.com> wrote:

Attached v11 patch addressing the above comments.

Please find a few comments:

1)

+ Retry is done after 2
+ * sec wait. Exits early if promotion is triggered or certain critical

We can say: Retry is done after SLOTSYNC_API_NAPTIME_MS wait.

2)
+ /*
+ * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+ * fetch all failover-enabled slots. Note that we reuse slot_names from
+ * the previous iteration; re-fetching all failover slots each time could
+ * cause an endless loop.
+ */

a)
the previous iteration --> the first iteration.

b) Also we can mention the reason why we take names from first
iteration instead of going for pending ones alone, something like:

Instead of reprocessing only the pending slots in each iteration, it's
better to process all the slots received in the first iteration.
This ensures that by the time we're done, all slots reflect the latest values.

3)
+ remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+
+ /* Attempt to synchronize slots */
+ synchronize_slots(wrconn, remote_slots, &slot_persistence_pending);

One extra blank line can be removed

4)

+ /* Clean up slot_names if allocated in TopMemoryContext */
+ if (slot_names)
+ list_free_deep(slot_names);

Can we please move it before 'ReplicationSlotCleanup'.

5)
In case of an error in a subsequent iteration, won't slot_names allocated
from TopMemoryContext be left unfreed?

6)
+ ListCell   *lc;
+ bool first_slot = true;

Shall we move these two into the concerned if-block:
if (slot_names != NIL)

7)
* The slot_persistence_pending flag is used by the pg_sync_replication_slots
* API to track if any slots could not be persisted and need to be retried.

a) Instead of mentioning only the slot_persistence_pending argument
in the concerned function's header, we should describe all the arguments.

b) We can remove the 'flag' term from the comments as it is a
function-argument now.

8)
I think we should briefly describe the new behaviour of the API in the
header of the file.

thanks
Shveta

#64Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#63)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Sep 10, 2025 at 2:45 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Sep 9, 2025 at 5:37 PM Ajin Cherian <itsajin@gmail.com> wrote:

Attached v11 patch addressing the above comments.

Please find a few comments:

1)

+ Retry is done after 2
+ * sec wait. Exits early if promotion is triggered or certain critical

We can say: Retry is done after SLOTSYNC_API_NAPTIME_MS wait.

Changed.

2)
+ /*
+ * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+ * fetch all failover-enabled slots. Note that we reuse slot_names from
+ * the previous iteration; re-fetching all failover slots each time could
+ * cause an endless loop.
+ */

a)
the previous iteration --> the first iteration.

b) Also we can mention the reason why we take names from first
iteration instead of going for pending ones alone, something like:

Instead of reprocessing only the pending slots in each iteration, it's
better to process all the slots received in the first iteration.
This ensures that by the time we're done, all slots reflect the latest values.

3)
+ remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+
+ /* Attempt to synchronize slots */
+ synchronize_slots(wrconn, remote_slots, &slot_persistence_pending);

One extra blank line can be removed

Fixed.

4)

+ /* Clean up slot_names if allocated in TopMemoryContext */
+ if (slot_names)
+ list_free_deep(slot_names);

Can we please move it before 'ReplicationSlotCleanup'.

Fixed.

5)
In case of an error in a subsequent iteration, won't slot_names allocated
from TopMemoryContext be left unfreed?

I've changed the logic so that even on error, slot_names are freed.

6)
+ ListCell   *lc;
+ bool first_slot = true;

Shall we move these two into the concerned if-block:
if (slot_names != NIL)

Changed.

7)
* The slot_persistence_pending flag is used by the pg_sync_replication_slots
* API to track if any slots could not be persisted and need to be retried.

a) Instead of mentioning only the slot_persistence_pending argument
in the concerned function's header, we should describe all the arguments.

b) We can remove the 'flag' term from the comments as it is a
function-argument now.

Changed.

8)
I think we should briefly describe the new behaviour of the API in the
header of the file.

Added.

Attaching patch v12 addressing these comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v12-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v12-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 2f09a3c59f378cdd0badd648c12582bb3c5fe8ea Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Mon, 15 Sep 2025 22:08:13 +1000
Subject: [PATCH v12] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for the initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  35 +-
 src/backend/replication/logical/slotsync.c    | 345 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 4 files changed, 314 insertions(+), 72 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 57ff333159f..d85440abab4 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1478,9 +1478,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index b803a819cf1..504c79f2fd2 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -370,12 +370,16 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation.
-     Additionally, enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link> on the standby
-     is required. By enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link>
-     on the standby, the failover slots can be synchronized periodically in
+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">
+     <function>pg_sync_replication_slots</function></link>
+     on the standby, or automatically by enabling
+     <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby.
+     When <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> is enabled
+     on the standby, the failover slots are periodically synchronized by
      the slotsync worker. For the synchronization to work, it is mandatory to
      have a physical replication slot between the primary and the standby (i.e.,
      <link linkend="guc-primary-slot-name"><varname>primary_slot_name</varname></link>
@@ -398,25 +402,6 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
-    <note>
-     <para>
-      While enabling <link linkend="guc-sync-replication-slots">
-      <varname>sync_replication_slots</varname></link> allows for automatic
-      periodic synchronization of failover slots, they can also be manually
-      synchronized using the <link linkend="pg-sync-replication-slots">
-      <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
-     </para>
-    </note>
-
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 9d0072a49ed..cb2d7ff211c 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the pg_sync_replication_slots API is used to sync the slots, and if the slots
+ * are not ready to be synced and are marked as RS_TEMPORARY because of any of
+ * the reasons mentioned above, then the API also waits and retries until the
+ * slots are ready to be synced. Refer to the comments in SyncReplicationSlots()
+ * for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -112,6 +119,7 @@ bool		sync_replication_slots = false;
  */
 #define MIN_SLOTSYNC_WORKER_NAPTIME_MS  200
 #define MAX_SLOTSYNC_WORKER_NAPTIME_MS  30000	/* 30s */
+#define SLOTSYNC_API_NAPTIME_MS         2000	/* 2s */
 
 static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
 
@@ -552,12 +560,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * If the remote restart_lsn and catalog_xmin have caught up with the
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
+ * The slot_persistence_pending flag is used by the pg_sync_replication_slots
+ * API to track if any slots could not be persisted and need to be retried.
  *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -576,11 +587,16 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle. Update the
+		 * slot_persistence_pending flag, so the API can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -595,6 +611,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* update the flag, so that the API can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -616,12 +636,15 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * The slot is created as a temporary slot and stays in the same state until the
  * remote_slot catches up with locally reserved position and local slot is
  * updated. The slot is then persisted and is considered as sync-ready for
- * periodic syncs.
+ * periodic syncs. The slot_persistence_pending flag is used by the
+ * pg_sync_replication_slots API to track if any slots could not be persisted
+ * and need to be retried.
  *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +738,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -784,7 +808,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -795,15 +820,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
+ *
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -812,29 +845,47 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		ListCell   *lc;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach(lc, slot_names)
+		{
+			char *slot_name = (char *) lfirst(lc);
+
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoString(&query, ")");
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -885,7 +936,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -905,12 +955,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+	pfree(query.data);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by pg_sync_replication_slots
+ * 							  API to track if any slots could not be
+ * 							  persisted and need to be retried.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -926,19 +1002,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1275,7 +1344,7 @@ wait_for_slot_activity(bool some_slot_updated)
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1505,10 +1574,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1735,26 +1821,199 @@ slotsync_failure_callback(int code, Datum arg)
 	walrcv_disconnect(wrconn);
 }
 
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+	ListCell	*lc;
+
+	foreach(lc, remote_slots)
+	{
+		RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+		char       *slot_name;
+
+		/* Allocate slot name in current memory context */
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	return slot_names;
+}
+
+/*
+ * Re-read the config file and check for critical parameter changes.
+ *
+ * Returns true if conninfo, primary_slot_name or hot_standby_feedback changed.
+ */
+static bool
+slotsync_api_config_changed(void)
+{
+	char       *old_primary_conninfo = pstrdup(PrimaryConnInfo);
+	char       *old_primary_slotname = pstrdup(PrimarySlotName);
+	bool        old_hot_standby_feedback = hot_standby_feedback;
+	bool        conninfo_changed;
+	bool        primary_slotname_changed;
+
+	ConfigReloadPending = false;
+	ProcessConfigFile(PGC_SIGHUP);
+
+	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
+	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
+	pfree(old_primary_conninfo);
+	pfree(old_primary_slotname);
+
+	/* throw error for certain parameter changes */
+	if (conninfo_changed ||
+		primary_slotname_changed ||
+		(old_hot_standby_feedback != hot_standby_feedback))
+		return true;
+	else
+		return false;
+}
+
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready". Retry is done after
+ * SLOTSYNC_API_NAPTIME_MS wait. Exits early if promotion is triggered or
+ * certain critical configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+		MemoryContext oldcontext;
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all slots are sync ready at least */
+		for (;;)
+		{
+			int		rc;
+			bool	started_tx = false;
+			bool	slot_persistence_pending = false;
+
+			/* reset flag before every iteration */
+			slot_persistence_pending = false;
+
+			/*
+			 * The syscache access in fetch_remote_slots() needs a
+			 * transaction env.
+			 */
+			if (!IsTransactionState()) {
+				StartTransactionCommand();
+				started_tx = true;
+			}
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			synchronize_slots(wrconn, remote_slots, &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				/* Switch to long-lived TopMemoryContext to store slot names */
+				oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+				/* Extract slot names from the remote slots */
+				slot_names = extract_slot_names(remote_slots);
+
+				MemoryContextSwitchTo(oldcontext);
+			}
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Commit transaction if we started it */
+			if (started_tx)
+				CommitTransactionCommand();
+
+			/* Done if all slots are at least sync ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* wait before retrying */
+			rc = WaitLatch(MyLatch,
+					WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					SLOTSYNC_API_NAPTIME_MS,
+					WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
+
+			if (rc & WL_LATCH_SET)
+			{
+				ResetLatch(MyLatch);
+				CHECK_FOR_INTERRUPTS();
+			}
+
+			/*
+			 * If we've been promoted, then no point
+			 * continuing.
+			 */
+			if (SlotSyncCtx->stopSignaled)
+			{
+				if (slot_names)
+					list_free_deep(slot_names);
+
+				ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("exiting from slot synchronization as"
+							" promotion is triggered")));
+			}
+
+			/* error out if configuration parameters changed */
+			if (ConfigReloadPending && slotsync_api_config_changed())
+			{
+				if (slot_names)
+					list_free_deep(slot_names);
+
+				ereport(ERROR,
+						(errcode(ERRCODE_CONFIG_FILE_ERROR),
+						errmsg("cannot continue slot synchronization due"
+							   " to parameter changes"),
+						errdetail("Critical replication parameters"
+								  " (primary_conninfo, primary_slot_name,"
+								  " or hot_standby_feedback) have changed"
+								  "  since pg_sync_replication_slots() started."),
+						errhint("Retry pg_sync_replication_slots() to use the"
+								" updated configuration.")));
+			}
+
+
+		}
+
+		/* Clean up slot_names if allocated in TopMemoryContext */
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
 
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
+
 	}
 	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 7553f6eacef..16b3b04d3c4 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
2.47.3

#65shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#64)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Mon, Sep 15, 2025 at 6:17 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Sep 10, 2025 at 2:45 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Sep 9, 2025 at 5:37 PM Ajin Cherian <itsajin@gmail.com> wrote:

Attached v11 patch addressing the above comments.

Please find a few comments:

1)

+ Retry is done after 2
+ * sec wait. Exits early if promotion is triggered or certain critical

We can say: Retry is done after SLOTSYNC_API_NAPTIME_MS wait.

Changed.

2)
+ /*
+ * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+ * fetch all failover-enabled slots. Note that we reuse slot_names from
+ * the previous iteration; re-fetching all failover slots each time could
+ * cause an endless loop.
+ */

a)
the previous iteration --> the first iteration.

b) Also we can mention the reason why we take names from first
iteration instead of going for pending ones alone, something like:

Instead of reprocessing only the pending slots in each iteration, it's
better to process all the slots received in the first iteration.
This ensures that by the time we're done, all slots reflect the latest values.

3)
+ remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+
+ /* Attempt to synchronize slots */
+ synchronize_slots(wrconn, remote_slots, &slot_persistence_pending);

One extra blank line can be removed

Fixed.

4)

+ /* Clean up slot_names if allocated in TopMemoryContext */
+ if (slot_names)
+ list_free_deep(slot_names);

Can we please move it before 'ReplicationSlotCleanup'.

Fixed.

5)
In case of error in subsequent iteration, slot_names allocated from
TopMemoryContext will be left unfreed?

I've changed the logic so that even on error, slot_names are freed.

I see that you have freed 'slot_names' in config-changed and promotion
case but the ERROR can come from other flows as well. The idea was to
somehow free it (if possible) in slotsync_failure_callback() by
passing it as an argument, like we do for 'wrconn'.

6)
+ ListCell   *lc;
+ bool first_slot = true;

Shall we move these two to concerned if-block:
if (slot_names != NIL)

Changed.

7)
* The slot_persistence_pending flag is used by the pg_sync_replication_slots
* API to track if any slots could not be persisted and need to be retried.

a) Instead of mentioning only about slot_persistence_pending argument
in concerned function's header, we shall define all the arguments.

b) We can remove the 'flag' term from the comments as it is a
function-argument now.

Changed.

8)
I think we should add briefly in the header of the file about the new
behaviour of API.

Added.

Attaching patch v12 addressing these comments.

Thank You for the patch. Please find a few comments:

1)
+ bool slot_persistence_pending = false;

We can move this declaration outside of the loop. And I think we don't
need to initialize as we are resetting it to false before each
iteration.

2)

+ /* Switch to long-lived TopMemoryContext to store slot names */
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+ /* Extract slot names from the remote slots */
+ slot_names = extract_slot_names(remote_slots);
+
+ MemoryContextSwitchTo(oldcontext);

I think it will be better if we move 'MemoryContextSwitchTo' calls
inside extract_slot_names() itself. The entire logic related to
'slot_names' will then be consolidated in one place

3)
+ * The slot_persistence_pending flag is used by the pg_sync_replication_slots
+ * API to track if any slots could not be persisted and need to be retried.

Can we change it to below. We can have it started in a new line after
a blank line (see how remote_slot_precedes, found_consistent_snapshot
are defined)

*slot_persistence_pending is set to true if any of the slots fail to
persist. It is utilized by the pg_sync_replication_slots() API.

Please change it in both synchronize_one_slot() and
update_and_persist_local_synced_slot()

4)
a)
+ Update the
+ * slot_persistence_pending flag, so the API can retry.
  */

b)
/* update the flag, so that the API can retry */

It will be good if we can remove 'flag' usage from both occurrences in
update_and_persist_local_synced_slot().

5)
Similar to ProcessSlotSyncInterrupts() for worker, shall we have one
such function for API which can have all 3 things:

{
/*
* If we've been promoted, then no point
* continuing.
*/
if (SlotSyncCtx->stopSignaled)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("exiting from slot synchronization as"
" promotion is triggered")));
}

CHECK_FOR_INTERRUPTS();

if (ConfigReloadPending)
slotsync_api_reread_config();
}

And similar to the worker case, we can have it checked in the
beginning of the loop. Thoughts?

thanks
Shveta

#66Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: shveta malik (#65)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Sep 16, 2025 at 11:53 AM shveta malik <shveta.malik@gmail.com>
wrote:

On Mon, Sep 15, 2025 at 6:17 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Sep 10, 2025 at 2:45 PM shveta malik <shveta.malik@gmail.com>

wrote:

On Tue, Sep 9, 2025 at 5:37 PM Ajin Cherian <itsajin@gmail.com> wrote:

Attached v11 patch addressing the above comments.

Please find a few comments:

1)

+ Retry is done after 2
+ * sec wait. Exits early if promotion is triggered or certain

critical

We can say: Retry is done after SLOTSYNC_API_NAPTIME_MS wait.

Changed.

2)
+ /*
+ * Fetch remote slot info for the given slot_names. If slot_names is

NIL,

+ * fetch all failover-enabled slots. Note that we reuse slot_names

from

+ * the previous iteration; re-fetching all failover slots each time

could

+ * cause an endless loop.
+ */

a)
the previous iteration --> the first iteration.

b) Also we can mention the reason why we take names from first
iteration instead of going for pending ones alone, something like:

Instead of reprocessing only the pending slots in each iteration, it's
better to process all the slots received in the first iteration.
This ensures that by the time we're done, all slots reflect the

latest values.

3)
+ remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+
+ /* Attempt to synchronize slots */
+ synchronize_slots(wrconn, remote_slots, &slot_persistence_pending);

One extra blank line can be removed

Fixed.

4)

+ /* Clean up slot_names if allocated in TopMemoryContext */
+ if (slot_names)
+ list_free_deep(slot_names);

Can we please move it before 'ReplicationSlotCleanup'.

Fixed.

5)
In case of error in subsequent iteration, slot_names allocated from
TopMemoryContext will be left unfreed?

I've changed the logic so that even on error, slot_names are freed.

I see that you have freed 'slot_names' in config-changed and promotion
case but the ERROR can come from other flows as well. The idea was to
somehow free it (if possible) in slotsync_failure_callback() by
passing it as an argument, like we do for 'wrconn'.

Are you suggesting introducing a structure (for example, SlotSyncContext as
shown below) that encapsulates both wrconn and slot_names, and then passing
a pointer to this structure as the Datum argument to the
slotsync_failure_callback cleanup function, so that the callback can handle
freeing wrconn and slot_names and maybe some other members within the
structure that allocate memory?

/*
* Extended structure that can hold both connection and slot_names info
*/
typedef struct SlotSyncContext
{

WalReceiverConn *wrconn; /* Must be first for compatibility */
List *slot_names; /* Pointer to slot_names list */
bool extended; /* Flag to indicate extended
context */

} SlotSyncContext;

SyncReplicationSlots(WalReceiverConn *wrconn)
{

SlotSyncContext sync_ctx;
...
...
/* Initialize extended context */
sync_ctx.wrconn = wrconn;
sync_ctx.slot_names = slot_names;
sync_ctx.extended = true;

PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback,
PointerGetDatum(&sync_ctx));

...
}

--
With Regards,
Ashutosh Sharma.

#67shveta malik
shveta.malik@gmail.com
In reply to: Ashutosh Sharma (#66)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Sep 16, 2025 at 5:12 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Tue, Sep 16, 2025 at 11:53 AM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, Sep 15, 2025 at 6:17 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Sep 10, 2025 at 2:45 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Sep 9, 2025 at 5:37 PM Ajin Cherian <itsajin@gmail.com> wrote:

Attached v11 patch addressing the above comments.

Please find a few comments:

1)

+ Retry is done after 2
+ * sec wait. Exits early if promotion is triggered or certain critical

We can say: Retry is done after SLOTSYNC_API_NAPTIME_MS wait.

Changed.

2)
+ /*
+ * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+ * fetch all failover-enabled slots. Note that we reuse slot_names from
+ * the previous iteration; re-fetching all failover slots each time could
+ * cause an endless loop.
+ */

a)
the previous iteration --> the first iteration.

b) Also we can mention the reason why we take names from first
iteration instead of going for pending ones alone, something like:

Instead of reprocessing only the pending slots in each iteration, it's
better to process all the slots received in the first iteration.
This ensures that by the time we're done, all slots reflect the latest values.

3)
+ remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+
+ /* Attempt to synchronize slots */
+ synchronize_slots(wrconn, remote_slots, &slot_persistence_pending);

One extra blank line can be removed

Fixed.

4)

+ /* Clean up slot_names if allocated in TopMemoryContext */
+ if (slot_names)
+ list_free_deep(slot_names);

Can we please move it before 'ReplicationSlotCleanup'.

Fixed.

5)
In case of error in subsequent iteration, slot_names allocated from
TopMemoryContext will be left unfreed?

I've changed the logic so that even on error, slot_names are freed.

I see that you have freed 'slot_names' in config-changed and promotion
case but the ERROR can come from other flows as well. The idea was to
somehow free it (if possible) in slotsync_failure_callback() by
passing it as an argument, like we do for 'wrconn'.

Are you suggesting introducing a structure (for example, SlotSyncContext as shown below) that encapsulates both wrconn and slot_names, and then passing a pointer to this structure as the Datum argument to the slotsync_failure_callback cleanup function, so that the callback can handle freeing wrconn and slot_names and maybe some other members within the structure that allocate memory?

Yes, as I do not see any simpler way to take care of freeing this
memory in all ERROR scenarios.

/*
* Extended structure that can hold both connection and slot_names info
*/
typedef struct SlotSyncContext
{

WalReceiverConn *wrconn; /* Must be first for compatibility */
List *slot_names; /* Pointer to slot_names list */
bool extended; /* Flag to indicate extended context */

} SlotSyncContext;

SyncReplicationSlots(WalReceiverConn *wrconn)
{

SlotSyncContext sync_ctx;
...
...
/* Initialize extended context */
sync_ctx.wrconn = wrconn;
sync_ctx.slot_names = slot_names;
sync_ctx.extended = true;

PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&sync_ctx));

Yes, like this.

thanks
Shveta

#68Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#65)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Sep 16, 2025 at 4:23 PM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, Sep 15, 2025 at 6:17 PM Ajin Cherian <itsajin@gmail.com> wrote:

Thank You for the patch. Please find a few comments:

1)
+ bool slot_persistence_pending = false;

We can move this declaration outside of the loop. And I think we don't
need to initialize as we are resetting it to false before each
iteration.

Fixed.

2)

+ /* Switch to long-lived TopMemoryContext to store slot names */
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+ /* Extract slot names from the remote slots */
+ slot_names = extract_slot_names(remote_slots);
+
+ MemoryContextSwitchTo(oldcontext);

I think it will be better if we move 'MemoryContextSwitchTo' calls
inside extract_slot_names() itself. The entire logic related to
'slot_names' will then be consolidated in one place

Changed,

3)
+ * The slot_persistence_pending flag is used by the pg_sync_replication_slots
+ * API to track if any slots could not be persisted and need to be retried.

Can we change it to below. We can have it started in a new line after
a blank line (see how remote_slot_precedes, found_consistent_snapshot
are defined)

*slot_persistence_pending is set to true if any of the slots fail to
persist. It is utilized by the pg_sync_replication_slots() API.

Please change it in both synchronize_one_slot() and
update_and_persist_local_synced_slot()

Changed.

4)
a)
+ Update the
+ * slot_persistence_pending flag, so the API can retry.
*/

b)
/* update the flag, so that the API can retry */

It will be good if we can remove 'flag' usage from both occurrences in
update_and_persist_local_synced_slot().

Changed.

5)
Similar to ProcessSlotSyncInterrupts() for worker, shall we have one
such function for API which can have all 3 things:

{
/*
* If we've been promoted, then no point
* continuing.
*/
if (SlotSyncCtx->stopSignaled)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("exiting from slot synchronization as"
" promotion is triggered")));
}

CHECK_FOR_INTERRUPTS();

if (ConfigReloadPending)
slotsync_api_reread_config();
}

And similar to the worker case, we can have it checked in the
beginning of the loop. Thoughts?

Changed it and added a function - ProcessSlotSyncAPIChanges()

Created a patch v13 with these changes.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v13-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v13-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 389904ab80f0f4ba9f40cf88659b9292e10e0353 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Mon, 22 Sep 2025 18:39:30 +1000
Subject: [PATCH v13] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for the initial sync
and retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  35 +-
 src/backend/replication/logical/slotsync.c    | 369 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 4 files changed, 335 insertions(+), 75 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 57ff333159f..d85440abab4 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1478,9 +1478,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index b803a819cf1..504c79f2fd2 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -370,12 +370,16 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation.
-     Additionally, enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link> on the standby
-     is required. By enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link>
-     on the standby, the failover slots can be synchronized periodically in
+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">
+     <function>pg_sync_replication_slots</function></link>
+     on the standby, or automatically by enabling
+     <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby.
+     When <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> is enabled
+     on the standby, the failover slots are periodically synchronized by
      the slotsync worker. For the synchronization to work, it is mandatory to
      have a physical replication slot between the primary and the standby (i.e.,
      <link linkend="guc-primary-slot-name"><varname>primary_slot_name</varname></link>
@@ -398,25 +402,6 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
-    <note>
-     <para>
-      While enabling <link linkend="guc-sync-replication-slots">
-      <varname>sync_replication_slots</varname></link> allows for automatic
-      periodic synchronization of failover slots, they can also be manually
-      synchronized using the <link linkend="pg-sync-replication-slots">
-      <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
-     </para>
-    </note>
-
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d55bdb..528288f3eff 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the pg_sync_replication_slots() API is used to sync the slots, and if the slots
+ * are not ready to be synced and are marked as RS_TEMPORARY because of any of
+ * the reasons mentioned above, then the API also waits and retries until the
+ * slots are ready to be synced. Refer to the comments in SyncReplicationSlots()
+ * for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -100,6 +107,16 @@ typedef struct SlotSyncCtxStruct
 	slock_t		mutex;
 } SlotSyncCtxStruct;
 
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+	WalReceiverConn *wrconn;
+	List			*slot_names;
+} SlotSyncApiFailureParams;
+
 static SlotSyncCtxStruct *SlotSyncCtx = NULL;
 
 /* GUC variable */
@@ -112,6 +129,7 @@ bool		sync_replication_slots = false;
  */
 #define MIN_SLOTSYNC_WORKER_NAPTIME_MS  200
 #define MAX_SLOTSYNC_WORKER_NAPTIME_MS  30000	/* 30s */
+#define SLOTSYNC_API_NAPTIME_MS         2000	/* 2s */
 
 static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
 
@@ -147,6 +165,7 @@ typedef struct RemoteSlot
 
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
+static bool slotsync_api_config_changed(void);
 
 /*
  * If necessary, update the local synced slot's metadata based on the data
@@ -553,11 +572,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -576,11 +599,16 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle. Update the
+		 * slot_persistence_pending flag, so the API can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -595,6 +623,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* set this, so that API can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -618,10 +650,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +751,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -784,7 +821,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -795,15 +833,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
+ *
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -812,29 +858,47 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		ListCell   *lc;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach(lc, slot_names)
+		{
+			char *slot_name = (char *) lfirst(lc);
+
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoString(&query, ")");
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -885,7 +949,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -905,12 +968,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+	pfree(query.data);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by pg_sync_replication_slots
+ * 							  API to track if any slots could not be
+ * 							  persisted and need to be retried.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -926,19 +1015,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1186,6 +1268,36 @@ ProcessSlotSyncInterrupts(void)
 		slotsync_reread_config();
 }
 
+/*
+ * Interrupt handler for pg_sync_replication_slots() API.
+ */
+static void
+ProcessSlotSyncAPIChanges(void)
+{
+	CHECK_FOR_INTERRUPTS();
+
+	/* If we've been promoted, then no point continuing. */
+	if (SlotSyncCtx->stopSignaled)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("exiting from slot synchronization as"
+						" promotion is triggered")));
+
+	/* error out if configuration parameters changed */
+	if (ConfigReloadPending && slotsync_api_config_changed())
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIG_FILE_ERROR),
+				 errmsg("cannot continue slot synchronization due"
+						" to parameter changes"),
+				 errdetail("Critical replication parameters"
+						   " (primary_conninfo, primary_slot_name,"
+						   " or hot_standby_feedback) have changed"
+						   "  since pg_sync_replication_slots() started."),
+				 errhint("Retry pg_sync_replication_slots() to use the"
+						 " updated configuration.")));
+
+}
+
 /*
  * Connection cleanup function for slotsync worker.
  *
@@ -1275,7 +1387,7 @@ wait_for_slot_activity(bool some_slot_updated)
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1505,10 +1617,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1705,7 +1834,8 @@ SlotSyncShmemInit(void)
 static void
 slotsync_failure_callback(int code, Datum arg)
 {
-	WalReceiverConn *wrconn = (WalReceiverConn *) DatumGetPointer(arg);
+	SlotSyncApiFailureParams *fparams =
+		(SlotSyncApiFailureParams *) DatumGetPointer(arg);
 
 	/*
 	 * We need to do slots cleanup here just like WalSndErrorCleanup() does.
@@ -1732,29 +1862,176 @@ slotsync_failure_callback(int code, Datum arg)
 	if (syncing_slots)
 		reset_syncing_flag();
 
-	walrcv_disconnect(wrconn);
+	if (fparams->slot_names)
+		list_free_deep(fparams->slot_names);
+
+	walrcv_disconnect(fparams->wrconn);
+}
+
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+	ListCell	*lc;
+	MemoryContext oldcontext;
+
+	foreach(lc, remote_slots)
+	{
+		RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+		char       *slot_name;
+
+		/* Switch to long-lived TopMemoryContext to store slot names */
+		oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	return slot_names;
+}
+
+/*
+ * Re-read the config file and check for critical parameter changes.
+ *
+ * Returns true if primary_conninfo, primary_slot_name, or hot_standby_feedback changed.
+ */
+static bool
+slotsync_api_config_changed(void)
+{
+	char       *old_primary_conninfo = pstrdup(PrimaryConnInfo);
+	char       *old_primary_slotname = pstrdup(PrimarySlotName);
+	bool        old_hot_standby_feedback = hot_standby_feedback;
+	bool        conninfo_changed;
+	bool        primary_slotname_changed;
+
+	ConfigReloadPending = false;
+	ProcessConfigFile(PGC_SIGHUP);
+
+	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
+	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
+	pfree(old_primary_conninfo);
+	pfree(old_primary_slotname);
+
+	/* report whether any of the critical parameters changed */
+	if (conninfo_changed ||
+		primary_slotname_changed ||
+		(old_hot_standby_feedback != hot_standby_feedback))
+		return true;
+	else
+		return false;
 }
 
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready". Retry is done after
+ * SLOTSYNC_API_NAPTIME_MS wait. Exits early if promotion is triggered or
+ * certain critical configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
-	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	SlotSyncApiFailureParams fparams;
+
+	fparams.wrconn = wrconn;
+	fparams.slot_names = NIL;
+
+	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all slots are sync ready at least */
+		for (;;)
+		{
+			int		rc;
+			bool	started_tx = false;
+			bool	slot_persistence_pending = false;
+
+			/* reset flag before every iteration */
+			slot_persistence_pending = false;
+
+			/* check for interrupts and config changes */
+			ProcessSlotSyncAPIChanges();
+
+			/*
+			 * The syscache access in fetch_remote_slots() needs a
+			 * transaction env.
+			 */
+			if (!IsTransactionState()) {
+				StartTransactionCommand();
+				started_tx = true;
+			}
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			synchronize_slots(wrconn, remote_slots, &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+				/* Extract slot names from the remote slots */
+				slot_names = extract_slot_names(remote_slots);
+
+			/* update the failure structure so that it can be freed on error */
+			fparams.slot_names = slot_names;
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Commit transaction if we started it */
+			if (started_tx)
+				CommitTransactionCommand();
+
+			/* Done if all slots are at least sync ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* wait before retrying */
+			rc = WaitLatch(MyLatch,
+					WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					SLOTSYNC_API_NAPTIME_MS,
+					WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
+
+			if (rc & WL_LATCH_SET)
+				ResetLatch(MyLatch);
+
+		}
+
+		/* Clean up slot_names if allocated in TopMemoryContext */
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
 
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
+
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 7553f6eacef..16b3b04d3c4 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
2.47.3

#69shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#68)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Mon, Sep 22, 2025 at 4:21 PM Ajin Cherian <itsajin@gmail.com> wrote:

Created a patch v13 with these changes.

Please find a few comments:

1)
+ /* update the failure structure so that it can be freed on error */
+ fparams.slot_names = slot_names;
+

Since slot_names is assigned only once, we can make the above
assignment as well only once, inside the if-block where we initialize
slot_names.
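
Something like this perhaps (just a sketch of the suggested shape, using
the existing names from the patch):

	if (slot_names == NIL && slot_persistence_pending)
	{
		slot_names = extract_slot_names(remote_slots);

		/* Update the failure structure so that it can be freed on error */
		fparams.slot_names = slot_names;
	}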

2)
extract_slot_names():

+ foreach(lc, remote_slots)
+ {
+ RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+ char       *slot_name;
+
+ /* Switch to long-lived TopMemoryContext to store slot names */
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+ slot_name = pstrdup(remote_slot->name);
+ slot_names = lappend(slot_names, slot_name);
+
+ MemoryContextSwitchTo(oldcontext);
+ }

It will be better to move 'MemoryContextSwitchTo' calls outside of the
loop. There is no need to switch the context for each slot.
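
i.e., roughly like this (a sketch only; same function, with the context
switched once for the whole list):

static List *
extract_slot_names(List *remote_slots)
{
	List	   *slot_names = NIL;
	ListCell   *lc;
	MemoryContext oldcontext;

	/* Switch once to the long-lived TopMemoryContext for all slot names */
	oldcontext = MemoryContextSwitchTo(TopMemoryContext);

	foreach(lc, remote_slots)
	{
		RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);

		slot_names = lappend(slot_names, pstrdup(remote_slot->name));
	}

	MemoryContextSwitchTo(oldcontext);

	return slot_names;
}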

3)
ProcessSlotSyncAPIChanges() gives the impression that it is processing
API changes, when it is actually processing interrupts and config
changes. Can we please rename it to ProcessSlotSyncAPIInterrupts()?

4)
I prefer version 11's slotsync_api_reread_config() over current
slotsync_api_config_changed(). There, even error was taken care of
inside the function, which to me looked better and similar to how
slotsync worker deals with it.
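
For reference, a rough sketch of that shape (folding the config re-read
and the ERROR into one function; message text as in the current patch):

static void
slotsync_api_reread_config(void)
{
	char	   *old_primary_conninfo = pstrdup(PrimaryConnInfo);
	char	   *old_primary_slotname = pstrdup(PrimarySlotName);
	bool		old_hot_standby_feedback = hot_standby_feedback;
	bool		config_changed;

	ConfigReloadPending = false;
	ProcessConfigFile(PGC_SIGHUP);

	config_changed =
		strcmp(old_primary_conninfo, PrimaryConnInfo) != 0 ||
		strcmp(old_primary_slotname, PrimarySlotName) != 0 ||
		old_hot_standby_feedback != hot_standby_feedback;

	pfree(old_primary_conninfo);
	pfree(old_primary_slotname);

	/* Unlike slotsync_api_config_changed(), the ERROR is raised here */
	if (config_changed)
		ereport(ERROR,
				(errcode(ERRCODE_CONFIG_FILE_ERROR),
				 errmsg("cannot continue slot synchronization due"
						" to parameter changes")));
}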

I have made some comment changes, attached the patch. Please include
it if you find it okay.

thanks
Shveta

Attachments:

0001-comments-changes.patch.txttext/plain; charset=US-ASCII; name=0001-comments-changes.patch.txtDownload
From 4f5634c33092e536b8662244dd7436f045195690 Mon Sep 17 00:00:00 2001
From: Shveta Malik <shveta.malik@gmail.com>
Date: Tue, 23 Sep 2025 10:24:19 +0530
Subject: [PATCH] comments changes.

---
 src/backend/replication/logical/slotsync.c | 28 +++++++++++-----------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 19792a95635..f980c0f73c4 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -42,8 +42,8 @@
 * If the pg_sync_replication_slots() API is used to sync the slots, and if the slots
  * are not ready to be synced and are marked as RS_TEMPORARY because of any of
  * the reasons mentioned above, then the API also waits and retries until the
- * slots are ready to be synced. Refer to the comments in SyncReplicationSlots()
- * for more details.
+ * slots are marked as RS_PERSISTENT (which means sync-ready). Refer to the
+ * comments in SyncReplicationSlots() for more details.
  *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
@@ -572,7 +572,7 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
- * slot_persistence_pending is set to true if any of the slots fail to
+ * *slot_persistence_pending is set to true if any of the slots fail to
  * persist. It is utilized by the pg_sync_replication_slots() API.
  *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
@@ -604,7 +604,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 		 * the next cycle. It may take more time to create such a
 		 * slot. Therefore, we keep this slot and attempt the
-		 * synchronization in the next cycle. Update the
-		 * slot_persistence_pending flag, so the API can retry.
+		 * synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the API can retry.
 		 */
 		if (slot_persistence_pending)
 			*slot_persistence_pending = true;
@@ -623,7 +625,7 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
-		/* set this, so that API can retry */
+		/* Set this, so that API can retry */
 		if (slot_persistence_pending)
 			*slot_persistence_pending = true;
 
@@ -650,7 +652,7 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
- * slot_persistence_pending is set to true if any of the slots fail to
+ * *slot_persistence_pending is set to true if any of the slots fail to
  * persist. It is utilized by the pg_sync_replication_slots() API.
  *
  * Returns TRUE if the local slot is updated.
@@ -1953,17 +1955,17 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 
 		validate_remote_info(wrconn);
 
-		/* Retry until all slots are sync ready at least */
+		/* Retry until all the slots are sync-ready */
 		for (;;)
 		{
 			int		rc;
 			bool	started_tx = false;
 			bool	slot_persistence_pending = false;
 
-			/* reset flag before every iteration */
+			/* Reset flag before every iteration */
 			slot_persistence_pending = false;
 
-			/* check for interrupts and config changes */
+			/* Check for interrupts and config changes */
 			ProcessSlotSyncAPIChanges();
 
 			/*
@@ -1994,10 +1996,9 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 			 * for future iterations (only needed if we haven't done it yet)
 			 */
 			if (slot_names == NIL && slot_persistence_pending)
-				/* Extract slot names from the remote slots */
 				slot_names = extract_slot_names(remote_slots);
 
-			/* update the failure structure so that it can be freed on error */
+			/* Update the failure structure so that it can be freed on error */
 			fparams.slot_names = slot_names;
 
 			/* Free the current remote_slots list */
@@ -2007,11 +2008,11 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 			if (started_tx)
 				CommitTransactionCommand();
 
-			/* Done if all slots are at least sync ready */
+			/* Done if all slots are persisted i.e. are sync-ready */
 			if (!slot_persistence_pending)
 				break;
 
-			/* wait before retrying */
+			/* Wait before retrying */
 			rc = WaitLatch(MyLatch,
 					WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 					SLOTSYNC_API_NAPTIME_MS,
@@ -2022,7 +2023,6 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 
 		}
 
-		/* Clean up slot_names if allocated in TopMemoryContext */
 		if (slot_names)
 			list_free_deep(slot_names);
 
-- 
2.34.1

#70shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#69)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Sep 23, 2025 at 10:29 AM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, Sep 22, 2025 at 4:21 PM Ajin Cherian <itsajin@gmail.com> wrote:

Created a patch v13 with these changes.

Please find a few comments:

1)
+ /* update the failure structure so that it can be freed on error */
+ fparams.slot_names = slot_names;
+

Since slot_names is assigned only once, we can make the above
assignment as well only once, inside the if-block where we initialize
slot_names.

2)
extract_slot_names():

+ foreach(lc, remote_slots)
+ {
+ RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+ char       *slot_name;
+
+ /* Switch to long-lived TopMemoryContext to store slot names */
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+ slot_name = pstrdup(remote_slot->name);
+ slot_names = lappend(slot_names, slot_name);
+
+ MemoryContextSwitchTo(oldcontext);
+ }

It will be better to move 'MemoryContextSwitchTo' calls outside of the
loop. No need to switch the context for each slot.

3)
ProcessSlotSyncAPIChanges() gives the impression that it is processing
API changes, when it is actually processing interrupts and config
changes. Can we please rename it to ProcessSlotSyncAPIInterrupts()?

4)
I prefer version 11's slotsync_api_reread_config() over current
slotsync_api_config_changed(). There, even error was taken care of
inside the function, which to me looked better and similar to how
slotsync worker deals with it.

I have made some comment changes, attached the patch. Please include
it if you find it okay.

Tested the patch; a few more suggestions:

5)
Currently the error message is:

postgres=# SELECT pg_sync_replication_slots();
ERROR: cannot continue slot synchronization due to parameter changes
DETAIL: Critical replication parameters (primary_conninfo,
primary_slot_name, or hot_standby_feedback) have changed since
pg_sync_replication_slots() started.
HINT: Retry pg_sync_replication_slots() to use the updated configuration.

a)
To be consistent with other error-messages, can we change ERROR msg
to: 'cannot continue replication slots synchronization due to
parameter changes'

b)
There is double space in DETAIL msg: "have changed since"

Will it be better to shorten the DETAIL as: 'One or more of
primary_conninfo, primary_slot_name, or hot_standby_feedback were
modified.'

6)
postgres=# SELECT pg_sync_replication_slots();
ERROR: exiting from slot synchronization as promotion is triggered

Shall we rephrase it similar to the previous message: 'cannot continue
replication slots synchronization as standby promotion is triggered'
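
Put together, the two reworded error paths could then look like this
(only the message strings change; the surrounding ereport calls are as
in the current patch):

	ereport(ERROR,
			(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
			 errmsg("cannot continue replication slots synchronization as"
					" standby promotion is triggered")));

	ereport(ERROR,
			(errcode(ERRCODE_CONFIG_FILE_ERROR),
			 errmsg("cannot continue replication slots synchronization due"
					" to parameter changes"),
			 errdetail("One or more of primary_conninfo, primary_slot_name,"
					   " or hot_standby_feedback were modified."),
			 errhint("Retry pg_sync_replication_slots() to use the updated"
					 " configuration.")));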

thanks
Shveta

#71Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#70)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Sep 23, 2025 at 2:59 PM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, Sep 22, 2025 at 4:21 PM Ajin Cherian <itsajin@gmail.com> wrote:

Created a patch v13 with these changes.

Please find a few comments:

1)
+ /* update the failure structure so that it can be freed on error */
+ fparams.slot_names = slot_names;
+

Since slot_names is assigned only once, we can make the above
assignment as well only once, inside the if-block where we initialize
slot_names.

Changed.

2)
extract_slot_names():

+ foreach(lc, remote_slots)
+ {
+ RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+ char       *slot_name;
+
+ /* Switch to long-lived TopMemoryContext to store slot names */
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+ slot_name = pstrdup(remote_slot->name);
+ slot_names = lappend(slot_names, slot_name);
+
+ MemoryContextSwitchTo(oldcontext);
+ }

It will be better to move 'MemoryContextSwitchTo' calls outside of the
loop. No need to switch the context for each slot.

Changed.

3)
ProcessSlotSyncAPIChanges() gives the impression that it is processing
API changes, when it is actually processing interrupts and config
changes. Can we please rename it to ProcessSlotSyncAPIInterrupts()?

Changed.

4)
I prefer version 11's slotsync_api_reread_config() over current
slotsync_api_config_changed(). There, even error was taken care of
inside the function, which to me looked better and similar to how
slotsync worker deals with it.

Changed.

I have made some comment changes, attached the patch. Please include
it if you find it okay.

Incorporated.

On Tue, Sep 23, 2025 at 4:42 PM shveta malik <shveta.malik@gmail.com> wrote:

Tested the patch; a few more suggestions:

5)
Currently the error message is:

postgres=# SELECT pg_sync_replication_slots();
ERROR: cannot continue slot synchronization due to parameter changes
DETAIL: Critical replication parameters (primary_conninfo,
primary_slot_name, or hot_standby_feedback) have changed since
pg_sync_replication_slots() started.
HINT: Retry pg_sync_replication_slots() to use the updated configuration.

a)
To be consistent with other error-messages, can we change ERROR msg
to: 'cannot continue replication slots synchronization due to
parameter changes'

Changed.

b)
There is double space in DETAIL msg: "have changed since"

Will it be better to shorten the DETAIL as: 'One or more of
primary_conninfo, primary_slot_name, or hot_standby_feedback were
modified.'

Changed.

6)
postgres=# SELECT pg_sync_replication_slots();
ERROR: exiting from slot synchronization as promotion is triggered

Shall we rephrase it similar to the previous message: 'cannot continue
replication slots synchronization as standby promotion is triggered'

Changed.

Attaching patch v14 incorporating the above changes.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v14-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v14-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 8ffb2aa2691135b6fa8e9ca276b6e30793316c46 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Wed, 24 Sep 2025 21:51:31 +1000
Subject: [PATCH v14] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for the initial sync
and retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  35 +-
 src/backend/replication/logical/slotsync.c    | 370 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 4 files changed, 336 insertions(+), 75 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index b803a819cf1..504c79f2fd2 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -370,12 +370,16 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation.
-     Additionally, enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link> on the standby
-     is required. By enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link>
-     on the standby, the failover slots can be synchronized periodically in
+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">
+     <function>pg_sync_replication_slots</function></link>
+     on the standby, or automatically by enabling
+     <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby.
+     When <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> is enabled
+     on the standby, the failover slots are periodically synchronized by
      the slotsync worker. For the synchronization to work, it is mandatory to
      have a physical replication slot between the primary and the standby (i.e.,
      <link linkend="guc-primary-slot-name"><varname>primary_slot_name</varname></link>
@@ -398,25 +402,6 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
-    <note>
-     <para>
-      While enabling <link linkend="guc-sync-replication-slots">
-      <varname>sync_replication_slots</varname></link> allows for automatic
-      periodic synchronization of failover slots, they can also be manually
-      synchronized using the <link linkend="pg-sync-replication-slots">
-      <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
-     </para>
-    </note>
-
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d55bdb..9657f81a721 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the pg_sync_replication_slots() API is used to sync the slots, and if
+ * the slots are not ready to be synced and are marked as RS_TEMPORARY because
+ * of any of the reasons mentioned above, then the API also waits and retries
+ * until the slots are marked as RS_PERSISTENT (sync-ready). Refer to the
+ * comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -100,6 +107,16 @@ typedef struct SlotSyncCtxStruct
 	slock_t		mutex;
 } SlotSyncCtxStruct;
 
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+	WalReceiverConn *wrconn;
+	List			*slot_names;
+} SlotSyncApiFailureParams;
+
 static SlotSyncCtxStruct *SlotSyncCtx = NULL;
 
 /* GUC variable */
@@ -112,6 +129,7 @@ bool		sync_replication_slots = false;
  */
 #define MIN_SLOTSYNC_WORKER_NAPTIME_MS  200
 #define MAX_SLOTSYNC_WORKER_NAPTIME_MS  30000	/* 30s */
+#define SLOTSYNC_API_NAPTIME_MS         2000	/* 2s */
 
 static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
 
@@ -147,6 +165,7 @@ typedef struct RemoteSlot
 
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
+static void slotsync_api_reread_config(void);
 
 /*
  * If necessary, update the local synced slot's metadata based on the data
@@ -553,11 +572,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -576,11 +599,18 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the API can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -595,6 +625,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that API can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -618,10 +652,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +753,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -784,7 +823,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -795,15 +835,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
+ *
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -812,29 +860,47 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		ListCell   *lc;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach(lc, slot_names)
+		{
+			char *slot_name = (char *) lfirst(lc);
+
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoString(&query, ")");
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -885,7 +951,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -905,12 +970,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+	pfree(query.data);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by pg_sync_replication_slots
+ * 							  API to track if any slots could not be
+ * 							  persisted and need to be retried.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -926,19 +1017,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1186,6 +1270,26 @@ ProcessSlotSyncInterrupts(void)
 		slotsync_reread_config();
 }
 
+/*
+ * Interrupt handler for pg_sync_replication_slots() API.
+ */
+static void
+ProcessSlotSyncAPIInterrupts()
+{
+	CHECK_FOR_INTERRUPTS();
+
+	/* If we've been promoted, then no point continuing. */
+	if (SlotSyncCtx->stopSignaled)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot continue replication slots synchronization"
+						" as standby promotion is triggered")));
+
+	/* error out if configuration parameters changed */
+	if (ConfigReloadPending)
+		slotsync_api_reread_config();
+}
+
 /*
  * Connection cleanup function for slotsync worker.
  *
@@ -1275,7 +1379,7 @@ wait_for_slot_activity(bool some_slot_updated)
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1505,10 +1609,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1705,7 +1826,8 @@ SlotSyncShmemInit(void)
 static void
 slotsync_failure_callback(int code, Datum arg)
 {
-	WalReceiverConn *wrconn = (WalReceiverConn *) DatumGetPointer(arg);
+	SlotSyncApiFailureParams *fparams =
+		(SlotSyncApiFailureParams *) DatumGetPointer(arg);
 
 	/*
 	 * We need to do slots cleanup here just like WalSndErrorCleanup() does.
@@ -1732,29 +1854,185 @@ slotsync_failure_callback(int code, Datum arg)
 	if (syncing_slots)
 		reset_syncing_flag();
 
-	walrcv_disconnect(wrconn);
+	if (fparams->slot_names)
+		list_free_deep(fparams->slot_names);
+
+	walrcv_disconnect(fparams->wrconn);
+}
+
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+	ListCell	*lc;
+	MemoryContext oldcontext;
+
+	/* Switch to long-lived TopMemoryContext to store slot names */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+	foreach(lc, remote_slots)
+	{
+		RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return slot_names;
+}
+
+/*
+ * Re-read the config file and check for critical parameter changes.
+ *
+ * Returns true if conninfo, primary_slot_name or hot_standby_feedback changed.
+ */
+static void
+slotsync_api_reread_config(void)
+{
+	char       *old_primary_conninfo = pstrdup(PrimaryConnInfo);
+	char       *old_primary_slotname = pstrdup(PrimarySlotName);
+	bool        old_hot_standby_feedback = hot_standby_feedback;
+	bool        conninfo_changed;
+	bool        primary_slotname_changed;
+
+	ConfigReloadPending = false;
+	ProcessConfigFile(PGC_SIGHUP);
+
+	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
+	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
+	pfree(old_primary_conninfo);
+	pfree(old_primary_slotname);
+
+	/* throw error for certain parameter changes */
+	if (conninfo_changed ||
+		primary_slotname_changed ||
+		(old_hot_standby_feedback != hot_standby_feedback))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIG_FILE_ERROR),
+				 errmsg("cannot continue slot synchronization due"
+						" to parameter changes"),
+				 errdetail("Critical replication parameters"
+						   " (primary_conninfo, primary_slot_name,"
+						   " or hot_standby_feedback) have changed"
+						   "  since pg_sync_replication_slots() started."),
+				 errhint("Retry pg_sync_replication_slots() to use the"
+						 " updated configuration.")));
+	}
 }
 
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready". Retry is done after
+ * SLOTSYNC_API_NAPTIME_MS wait. Exits early if promotion is triggered or
+ * certain critical configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
-	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	SlotSyncApiFailureParams fparams;
+
+	fparams.wrconn = wrconn;
+	fparams.slot_names = NULL;
+
+	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			int		rc;
+			bool	started_tx = false;
+			bool	slot_persistence_pending = false;
+
+			/* Reset flag before every iteration */
+			slot_persistence_pending = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncAPIInterrupts();
+
+			/*
+			 * The syscache access in fetch_remote_slots() needs a
+			 * transaction env.
+			 */
+			if (!IsTransactionState()) {
+				StartTransactionCommand();
+				started_tx = true;
+			}
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			synchronize_slots(wrconn, remote_slots, &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				slot_names = extract_slot_names(remote_slots);
+
+				/* Update the failure structure so that it can be freed on error */
+				fparams.slot_names = slot_names;
+			}
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Commit transaction if we started it */
+			if (started_tx)
+				CommitTransactionCommand();
+
+			/* Done if all slots are persisted i.e are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* Wait before retrying */
+			rc = WaitLatch(MyLatch,
+					WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					SLOTSYNC_API_NAPTIME_MS,
+					WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
+
+			if (rc & WL_LATCH_SET)
+				ResetLatch(MyLatch);
+
+		}
+
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
 
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
+
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 7553f6eacef..16b3b04d3c4 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
2.47.3

#72shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#71)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Sep 24, 2025 at 5:35 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Tue, Sep 23, 2025 at 2:59 PM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, Sep 22, 2025 at 4:21 PM Ajin Cherian <itsajin@gmail.com> wrote:

Created a patch v13 with these changes.

Please find a few comments:

1)
+ /* update the failure structure so that it can be freed on error */
+ fparams.slot_names = slot_names;
+

Since slot_names is assigned only once, we can make the above
assignment as well only once, inside the if-block where we initialize
slot_names.

Changed.
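
For reference, in the attached v14 the assignment now sits inside the
same if-block that initializes slot_names:

    if (slot_names == NIL && slot_persistence_pending)
    {
        slot_names = extract_slot_names(remote_slots);

        /* Update the failure structure so that it can be freed on error */
        fparams.slot_names = slot_names;
    }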

2)
extract_slot_names():

+ foreach(lc, remote_slots)
+ {
+ RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+ char       *slot_name;
+
+ /* Switch to long-lived TopMemoryContext to store slot names */
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+ slot_name = pstrdup(remote_slot->name);
+ slot_names = lappend(slot_names, slot_name);
+
+ MemoryContextSwitchTo(oldcontext);
+ }

It would be better to move the 'MemoryContextSwitchTo' calls outside of
the loop; there is no need to switch the context for each slot.

Changed.

3)
ProcessSlotSyncAPIChanges() gives the impression that it is actually
processing API changes, when in fact it is processing interrupts or
config changes. Can we please rename it to ProcessSlotSyncAPIInterrupts()?

Changed.

4)
I prefer version 11's slotsync_api_reread_config() over the current
slotsync_api_config_changed(). There, even the error handling was done
inside the function, which looked better to me and is similar to how the
slotsync worker deals with it.

Changed.

I have made some comment changes, attached the patch. Please include
it if you find it okay.

Incorporated.

On Tue, Sep 23, 2025 at 4:42 PM shveta malik <shveta.malik@gmail.com> wrote:

Tested the patch; a few more suggestions:

5)
Currently the error message is:

postgres=# SELECT pg_sync_replication_slots();
ERROR: cannot continue slot synchronization due to parameter changes
DETAIL: Critical replication parameters (primary_conninfo,
primary_slot_name, or hot_standby_feedback) have changed since
pg_sync_replication_slots() started.
HINT: Retry pg_sync_replication_slots() to use the updated configuration.

a)
To be consistent with other error messages, can we change the ERROR msg
to: 'cannot continue replication slots synchronization due to
parameter changes'

Changed.

It seems this change was missed, perhaps due to bringing back the
previous implementation of slotsync_api_reread_config().

b)
There is a double space in the DETAIL msg: "have changed  since"

Would it be better to shorten the DETAIL to: 'One or more of
primary_conninfo, primary_slot_name, or hot_standby_feedback were
modified.'

Changed.

This too was missed.

6)
postgres=# SELECT pg_sync_replication_slots();
ERROR: exiting from slot synchronization as promotion is triggered

Shall we rephrase it in line with the previous message: 'cannot continue
replication slots synchronization as standby promotion is triggered'

Changed.

Attaching patch v14 incorporating the above changes.

A few trivial comments:

1)
slotsync_api_reread_config():
+ * Returns true if conninfo, primary_slot_name or hot_standby_feedback changed.

This comment is no longer valid; we can remove it.

2)
/* We are done with sync, so reset sync flag */
reset_syncing_flag();
+

This extra blank line is not needed.

thanks
Shveta

#73Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#72)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Sep 26, 2025 at 8:14 PM shveta malik <shveta.malik@gmail.com> wrote:

It seems this change was missed, perhaps due to bringing back the
previous implementation of slotsync_api_reread_config().

b)
There is a double space in the DETAIL msg: "have changed  since"

Would it be better to shorten the DETAIL to: 'One or more of
primary_conninfo, primary_slot_name, or hot_standby_feedback were
modified.'

Changed.

Fixed.

This too was missed.

6)
postgres=# SELECT pg_sync_replication_slots();
ERROR: exiting from slot synchronization as promotion is triggered

Shall we rephrase it in line with the previous message: 'cannot continue
replication slots synchronization as standby promotion is triggered'

Changed.

Attaching patch v14 incorporating the above changes.

This was rephrased.

A few trivial comments:

1)
slotsync_api_reread_config():
+ * Returns true if conninfo, primary_slot_name or hot_standby_feedback changed.

This comment is no longer valid; we can remove it.

Removed.

2)
/* We are done with sync, so reset sync flag */
reset_syncing_flag();
+

This extra blank line is not needed.

Fixed.

Attaching patch v15 addressing these comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v15-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v15-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 24ff5a032267eeb918582ce84bf9e511636c676f Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Thu, 2 Oct 2025 21:08:17 +1000
Subject: [PATCH v15] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  35 +-
 src/backend/replication/logical/slotsync.c    | 367 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 4 files changed, 333 insertions(+), 75 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index b803a819cf1..504c79f2fd2 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -370,12 +370,16 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation.
-     Additionally, enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link> on the standby
-     is required. By enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link>
-     on the standby, the failover slots can be synchronized periodically in
+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">
+     <function>pg_sync_replication_slots</function></link>
+     on the standby, or automatically by enabling
+     <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby.
+     When <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> is enabled
+     on the standby, the failover slots are periodically synchronized by
      the slotsync worker. For the synchronization to work, it is mandatory to
      have a physical replication slot between the primary and the standby (i.e.,
      <link linkend="guc-primary-slot-name"><varname>primary_slot_name</varname></link>
@@ -398,25 +402,6 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
-    <note>
-     <para>
-      While enabling <link linkend="guc-sync-replication-slots">
-      <varname>sync_replication_slots</varname></link> allows for automatic
-      periodic synchronization of failover slots, they can also be manually
-      synchronized using the <link linkend="pg-sync-replication-slots">
-      <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
-     </para>
-    </note>
-
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d55bdb..3ba2e500c92 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the pg_sync_replication_slots() API is used to sync the slots, and if
+ * the slots are not ready to be synced and are marked as RS_TEMPORARY because
+ * of any of the reasons mentioned above, then the API also waits and retries
+ * until the slots are marked as RS_PERSISTENT (sync-ready). Refer to the
+ * comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -100,6 +107,16 @@ typedef struct SlotSyncCtxStruct
 	slock_t		mutex;
 } SlotSyncCtxStruct;
 
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+	WalReceiverConn *wrconn;
+	List			*slot_names;
+} SlotSyncApiFailureParams;
+
 static SlotSyncCtxStruct *SlotSyncCtx = NULL;
 
 /* GUC variable */
@@ -112,6 +129,7 @@ bool		sync_replication_slots = false;
  */
 #define MIN_SLOTSYNC_WORKER_NAPTIME_MS  200
 #define MAX_SLOTSYNC_WORKER_NAPTIME_MS  30000	/* 30s */
+#define SLOTSYNC_API_NAPTIME_MS         2000	/* 2s */
 
 static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
 
@@ -147,6 +165,7 @@ typedef struct RemoteSlot
 
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
+static void slotsync_api_reread_config(void);
 
 /*
  * If necessary, update the local synced slot's metadata based on the data
@@ -553,11 +572,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -576,11 +599,18 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the API can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -595,6 +625,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that API can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -618,10 +652,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +753,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -784,7 +823,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -795,15 +835,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
+ *
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -812,29 +860,47 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		ListCell   *lc;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach(lc, slot_names)
+		{
+			char *slot_name = (char *) lfirst(lc);
+
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoString(&query, ")");
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -885,7 +951,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -905,12 +970,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+	pfree(query.data);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by pg_sync_replication_slots
+ * 							  API to track if any slots could not be
+ * 							  persisted and need to be retried.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -926,19 +1017,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1186,6 +1270,26 @@ ProcessSlotSyncInterrupts(void)
 		slotsync_reread_config();
 }
 
+/*
+ * Interrupt handler for pg_sync_replication_slots() API.
+ */
+static void
+ProcessSlotSyncAPIInterrupts()
+{
+	CHECK_FOR_INTERRUPTS();
+
+	/* If we've been promoted, then no point continuing. */
+	if (SlotSyncCtx->stopSignaled)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot continue replication slots synchronization"
+						" as standby promotion is triggered")));
+
+	/* error out if configuration parameters changed */
+	if (ConfigReloadPending)
+		slotsync_api_reread_config();
+}
+
 /*
  * Connection cleanup function for slotsync worker.
  *
@@ -1275,7 +1379,7 @@ wait_for_slot_activity(bool some_slot_updated)
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1505,10 +1609,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1705,7 +1826,8 @@ SlotSyncShmemInit(void)
 static void
 slotsync_failure_callback(int code, Datum arg)
 {
-	WalReceiverConn *wrconn = (WalReceiverConn *) DatumGetPointer(arg);
+	SlotSyncApiFailureParams *fparams =
+		(SlotSyncApiFailureParams *) DatumGetPointer(arg);
 
 	/*
 	 * We need to do slots cleanup here just like WalSndErrorCleanup() does.
@@ -1732,23 +1854,176 @@ slotsync_failure_callback(int code, Datum arg)
 	if (syncing_slots)
 		reset_syncing_flag();
 
-	walrcv_disconnect(wrconn);
+	if (fparams->slot_names)
+		list_free_deep(fparams->slot_names);
+
+	walrcv_disconnect(fparams->wrconn);
+}
+
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+	ListCell	*lc;
+	MemoryContext oldcontext;
+
+	/* Switch to long-lived TopMemoryContext to store slot names */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+	foreach(lc, remote_slots)
+	{
+		RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return slot_names;
+}
+
+/*
+ * Re-read the config file and check for critical parameter changes.
+ *
+ */
+static void
+slotsync_api_reread_config(void)
+{
+	char       *old_primary_conninfo = pstrdup(PrimaryConnInfo);
+	char       *old_primary_slotname = pstrdup(PrimarySlotName);
+	bool        old_hot_standby_feedback = hot_standby_feedback;
+	bool        conninfo_changed;
+	bool        primary_slotname_changed;
+
+	ConfigReloadPending = false;
+	ProcessConfigFile(PGC_SIGHUP);
+
+	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
+	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
+	pfree(old_primary_conninfo);
+	pfree(old_primary_slotname);
+
+	/* throw error for certain parameter changes */
+	if (conninfo_changed ||
+		primary_slotname_changed ||
+		(old_hot_standby_feedback != hot_standby_feedback))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIG_FILE_ERROR),
+				 errmsg("cannot continue slot synchronization due"
+						" to parameter changes"),
+				 errdetail("One or more of primary_conninfo,"
+						   " primary_slot_name or hot_standby_feedback"
+						   " were modified"
+				 errhint("Retry pg_sync_replication_slots() to use the"
+						 " updated configuration.")));
+	}
 }
 
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready". Retry is done after
+ * SLOTSYNC_API_NAPTIME_MS wait. Exits early if promotion is triggered or
+ * certain critical configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
-	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	SlotSyncApiFailureParams fparams;
+
+	fparams.wrconn = wrconn;
+	fparams.slot_names = NULL;
+
+	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			int		rc;
+			bool	started_tx = false;
+			bool	slot_persistence_pending = false;
+
+			/* Reset flag before every iteration */
+			slot_persistence_pending = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncAPIInterrupts();
+
+			/*
+			 * The syscache access in fetch_remote_slots() needs a
+			 * transaction env.
+			 */
+			if (!IsTransactionState()) {
+				StartTransactionCommand();
+				started_tx = true;
+			}
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			synchronize_slots(wrconn, remote_slots, &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				slot_names = extract_slot_names(remote_slots);
+
+				/* Update the failure structure so that it can be freed on error */
+				fparams.slot_names = slot_names;
+			}
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Commit transaction if we started it */
+			if (started_tx)
+				CommitTransactionCommand();
+
+			/* Done if all slots are persisted i.e are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* Wait before retrying */
+			rc = WaitLatch(MyLatch,
+					WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					SLOTSYNC_API_NAPTIME_MS,
+					WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
+
+			if (rc & WL_LATCH_SET)
+				ResetLatch(MyLatch);
+
+		}
+
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
@@ -1756,5 +2031,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 7553f6eacef..16b3b04d3c4 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
2.47.3

#74shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#73)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, Oct 2, 2025 at 4:53 PM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching patch v15 addressing these comments.

It seems v15 had a compilation issue. Resolved it. Attaching v15 again.
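
(For the record, the failure appears to be a missing closing parenthesis
and comma after the errdetail() call in slotsync_api_reread_config() in
the v15 attached above; the corrected call would look roughly like:

    errdetail("One or more of primary_conninfo,"
              " primary_slot_name or hot_standby_feedback"
              " were modified."),
    errhint("Retry pg_sync_replication_slots() to use the"
            " updated configuration.")

so that the errdetail() argument list closes before errhint() begins.)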

thanks
Shveta

Attachments:

v15-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v15-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 85dd977f1ad8e25765243823c36713eef579d45b Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Thu, 2 Oct 2025 21:08:17 +1000
Subject: [PATCH v15] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  35 +-
 src/backend/replication/logical/slotsync.c    | 367 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 4 files changed, 333 insertions(+), 75 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index b803a819cf1..504c79f2fd2 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -370,12 +370,16 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation.
-     Additionally, enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link> on the standby
-     is required. By enabling <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link>
-     on the standby, the failover slots can be synchronized periodically in
+     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
+     synchronization can be performed either manually by calling
+     <link linkend="pg-sync-replication-slots">
+     <function>pg_sync_replication_slots</function></link>
+     on the standby, or automatically by enabling
+     <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby.
+     When <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> is enabled
+     on the standby, the failover slots are periodically synchronized by
      the slotsync worker. For the synchronization to work, it is mandatory to
      have a physical replication slot between the primary and the standby (i.e.,
      <link linkend="guc-primary-slot-name"><varname>primary_slot_name</varname></link>
@@ -398,25 +402,6 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
-    <note>
-     <para>
-      While enabling <link linkend="guc-sync-replication-slots">
-      <varname>sync_replication_slots</varname></link> allows for automatic
-      periodic synchronization of failover slots, they can also be manually
-      synchronized using the <link linkend="pg-sync-replication-slots">
-      <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
-     </para>
-    </note>
-
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d55bdb..19815a55839 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the pg_sync_replication_slots() API is used to sync the slots, and if the slots
+ * are not ready to be synced and are marked as RS_TEMPORARY because of any of
+ * the reasons mentioned above, then the API also waits and retries until the
+ * slots are marked as RS_PERSISTENT (which means sync-ready). Refer to the
+ * comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -100,6 +107,16 @@ typedef struct SlotSyncCtxStruct
 	slock_t		mutex;
 } SlotSyncCtxStruct;
 
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+	WalReceiverConn *wrconn;
+	List			*slot_names;
+} SlotSyncApiFailureParams;
+
 static SlotSyncCtxStruct *SlotSyncCtx = NULL;
 
 /* GUC variable */
@@ -112,6 +129,7 @@ bool		sync_replication_slots = false;
  */
 #define MIN_SLOTSYNC_WORKER_NAPTIME_MS  200
 #define MAX_SLOTSYNC_WORKER_NAPTIME_MS  30000	/* 30s */
+#define SLOTSYNC_API_NAPTIME_MS         2000	/* 2s */
 
 static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
 
@@ -147,6 +165,7 @@ typedef struct RemoteSlot
 
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
+static void slotsync_api_reread_config(void);
 
 /*
  * If necessary, update the local synced slot's metadata based on the data
@@ -553,11 +572,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -576,11 +599,18 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the API can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -595,6 +625,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that API can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -618,10 +652,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +753,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -784,7 +823,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -795,15 +835,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
+ *
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -812,29 +860,47 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		ListCell   *lc;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach(lc, slot_names)
+		{
+			char *slot_name = (char *) lfirst(lc);
+
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoString(&query, ")");
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -885,7 +951,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -905,12 +970,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+	pfree(query.data);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by pg_sync_replication_slots
+ * 							  API to track if any slots could not be
+ * 							  persisted and need to be retried.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -926,19 +1017,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1186,6 +1270,26 @@ ProcessSlotSyncInterrupts(void)
 		slotsync_reread_config();
 }
 
+/*
+ * Interrupt handler for pg_sync_replication_slots() API.
+ */
+static void
+ProcessSlotSyncAPIInterrupts(void)
+{
+	CHECK_FOR_INTERRUPTS();
+
+	/* If we've been promoted, then no point continuing. */
+	if (SlotSyncCtx->stopSignaled)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot continue replication slots synchronization"
+						" as standby promotion is triggered")));
+
+	/* error out if configuration parameters changed */
+	if (ConfigReloadPending)
+		slotsync_api_reread_config();
+}
+
 /*
  * Connection cleanup function for slotsync worker.
  *
@@ -1275,7 +1379,7 @@ wait_for_slot_activity(bool some_slot_updated)
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1505,10 +1609,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1705,7 +1826,8 @@ SlotSyncShmemInit(void)
 static void
 slotsync_failure_callback(int code, Datum arg)
 {
-	WalReceiverConn *wrconn = (WalReceiverConn *) DatumGetPointer(arg);
+	SlotSyncApiFailureParams *fparams =
+		(SlotSyncApiFailureParams *) DatumGetPointer(arg);
 
 	/*
 	 * We need to do slots cleanup here just like WalSndErrorCleanup() does.
@@ -1732,23 +1854,176 @@ slotsync_failure_callback(int code, Datum arg)
 	if (syncing_slots)
 		reset_syncing_flag();
 
-	walrcv_disconnect(wrconn);
+	if (fparams->slot_names)
+		list_free_deep(fparams->slot_names);
+
+	walrcv_disconnect(fparams->wrconn);
+}
+
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+	ListCell	*lc;
+	MemoryContext oldcontext;
+
+	/* Switch to long-lived TopMemoryContext to store slot names */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+	foreach(lc, remote_slots)
+	{
+		RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return slot_names;
+}
+
+/*
+ * Re-read the config file and check for critical parameter changes.
+ *
+ */
+static void
+slotsync_api_reread_config(void)
+{
+	char       *old_primary_conninfo = pstrdup(PrimaryConnInfo);
+	char       *old_primary_slotname = pstrdup(PrimarySlotName);
+	bool        old_hot_standby_feedback = hot_standby_feedback;
+	bool        conninfo_changed;
+	bool        primary_slotname_changed;
+
+	ConfigReloadPending = false;
+	ProcessConfigFile(PGC_SIGHUP);
+
+	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
+	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
+	pfree(old_primary_conninfo);
+	pfree(old_primary_slotname);
+
+	/* throw error for certain parameter changes */
+	if (conninfo_changed ||
+		primary_slotname_changed ||
+		(old_hot_standby_feedback != hot_standby_feedback))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIG_FILE_ERROR),
+				 errmsg("cannot continue slot synchronization due"
+						" to parameter changes"),
+				 errdetail("One or more of primary_conninfo,"
+						   " primary_slot_name or hot_standby_feedback"
+						   " were modified"),
+				 errhint("Retry pg_sync_replication_slots() to use the"
+						 " updated configuration.")));
+	}
 }
 
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready". A retry is done after a
+ * SLOTSYNC_API_NAPTIME_MS wait. It exits early if promotion is triggered or
+ * certain critical configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
-	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	SlotSyncApiFailureParams fparams;
+
+	fparams.wrconn = wrconn;
+	fparams.slot_names = NIL;
+
+	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			int		rc;
+			bool	started_tx = false;
+			bool	slot_persistence_pending = false;
+
+			/* Reset flag before every iteration */
+			slot_persistence_pending = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncAPIInterrupts();
+
+			/*
+			 * The syscache access in fetch_remote_slots() needs a
+			 * transaction env.
+			 */
+			if (!IsTransactionState()) {
+				StartTransactionCommand();
+				started_tx = true;
+			}
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			synchronize_slots(wrconn, remote_slots, &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				slot_names = extract_slot_names(remote_slots);
+
+				/* Update the failure structure so that it can be freed on error */
+				fparams.slot_names = slot_names;
+			}
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Commit transaction if we started it */
+			if (started_tx)
+				CommitTransactionCommand();
+
+			/* Done if all slots are persisted, i.e., are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* Wait before retrying */
+			rc = WaitLatch(MyLatch,
+					WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+					SLOTSYNC_API_NAPTIME_MS,
+					WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
+
+			if (rc & WL_LATCH_SET)
+				ResetLatch(MyLatch);
+
+		}
+
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
@@ -1756,5 +2031,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 7553f6eacef..16b3b04d3c4 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
2.34.1

#75shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#74)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Mon, Oct 6, 2025 at 10:22 AM shveta malik <shveta.malik@gmail.com> wrote:

On Thu, Oct 2, 2025 at 4:53 PM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching patch v15 addressing these comments.

It seems v15 had a compilation issue. Resolved it. Attaching v15 again.

Verified patch, works well.

I have changed the doc as per Ashutosh's suggestion in [1]. Please include it
if you find it okay. I have attached the patch as a txt file.

[1]: /messages/by-id/CAExHW5vCLTMQcFKZXrT8bjZpQWvhBUL7Ge6Ufb5oSLh0bp10PA@mail.gmail.com

thanks
Shveta

Attachments:

0001-Doc-update.patch.txttext/plain; charset=UTF-8; name=0001-Doc-update.patch.txtDownload
From c62d15d4127308fce79ec27ae30fe9ccc59a6132 Mon Sep 17 00:00:00 2001
From: Shveta Malik <shveta.malik@gmail.com>
Date: Mon, 6 Oct 2025 11:34:04 +0530
Subject: [PATCH] Doc update

---
 doc/src/sgml/logicaldecoding.sgml | 33 +++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 504c79f2fd2..b964937d509 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -370,16 +370,12 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      <function>pg_create_logical_replication_slot</function></link>, or by
      using the <link linkend="sql-createsubscription-params-with-failover">
      <literal>failover</literal></link> option of
-     <command>CREATE SUBSCRIPTION</command> during slot creation. After that,
-     synchronization can be performed either manually by calling
-     <link linkend="pg-sync-replication-slots">
-     <function>pg_sync_replication_slots</function></link>
-     on the standby, or automatically by enabling
-     <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link> on the standby.
-     When <link linkend="guc-sync-replication-slots">
-     <varname>sync_replication_slots</varname></link> is enabled
-     on the standby, the failover slots are periodically synchronized by
+     <command>CREATE SUBSCRIPTION</command> during slot creation.
+     Additionally, enabling <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link> on the standby
+     is required. By enabling <link linkend="guc-sync-replication-slots">
+     <varname>sync_replication_slots</varname></link>
+     on the standby, the failover slots can be synchronized periodically in
      the slotsync worker. For the synchronization to work, it is mandatory to
      have a physical replication slot between the primary and the standby (i.e.,
      <link linkend="guc-primary-slot-name"><varname>primary_slot_name</varname></link>
@@ -402,6 +398,23 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
      receiving the WAL up to the latest flushed position on the primary server.
     </para>
 
+    <note>
+     <para>
+      While enabling <link linkend="guc-sync-replication-slots">
+      <varname>sync_replication_slots</varname></link> allows for automatic
+      periodic synchronization of failover slots, they can also be manually
+      synchronized using the <link linkend="pg-sync-replication-slots">
+      <function>pg_sync_replication_slots</function></link> function on the standby.
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries cyclically, continuing until all
+      the failover slots that existed on the primary at the start of the function
+      call are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
+      via <varname>sync_replication_slots</varname> provides continuous slot
+      updates, enabling seamless failover and supporting high availability.
+     </para>
+    </note>
+
     <para>
      When slot synchronization is configured as recommended,
      and the initial synchronization is performed either automatically or
-- 
2.34.1

#76Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#75)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

Hello Hackers,

In an offline discussion, I was considering adding a TAP test for this
patch. However, testing the pg_sync_replication_slots() API’s wait
logic requires a delay of at least 2 seconds, since that’s the
interval the API sleeps before retrying. I’m not sure it’s acceptable
to add a TAP test that increases runtime by 2 seconds.
I’m also wondering if 2 seconds is too long for the API to wait?
Should we reduce it to something like 200 ms instead? I’d appreciate
your feedback.

regards,
Ajin Cherian
Fujitsu Australia

#77shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#76)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Oct 7, 2025 at 3:24 PM Ajin Cherian <itsajin@gmail.com> wrote:

Hello Hackers,

In an offline discussion, I was considering adding a TAP test for this
patch. However, testing the pg_sync_replication_slots() API’s wait
logic requires a delay of at least 2 seconds, since that’s the
interval the API sleeps before retrying. I’m not sure it’s acceptable
to add a TAP test that increases runtime by 2 seconds.
I’m also wondering if 2 seconds is too long for the API to wait?
Should we reduce it to something like 200 ms instead? I’d appreciate
your feedback.

I feel a shorter nap will be good since it is an API and should finish
fast. But too short a nap may result in too many primary pings,
especially when primary slots are not advancing. But that case should
be a rare one. Shall we have a nap of, say, 500ms? It is neither too
short nor too long. Thoughts?

thanks
Shveta

#78Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: shveta malik (#77)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Oct 7, 2025 at 3:47 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Oct 7, 2025 at 3:24 PM Ajin Cherian <itsajin@gmail.com> wrote:

Hello Hackers,

In an offline discussion, I was considering adding a TAP test for this
patch. However, testing the pg_sync_replication_slots() API’s wait
logic requires a delay of at least 2 seconds, since that’s the
interval the API sleeps before retrying. I’m not sure it’s acceptable
to add a TAP test that increases runtime by 2 seconds.
I’m also wondering if 2 seconds is too long for the API to wait?
Should we reduce it to something like 200 ms instead? I’d appreciate
your feedback.

I feel a shorter nap will be good since it is an API and should finish
fast. But too short a nap may result in too many primary pings,
especially when primary slots are not advancing. But that case should
be a rare one. Shall we have a nap of, say, 500ms? It is neither too
short nor too long. Thoughts?

Shorter nap times mean a higher possibility of wasted CPU cycles - that
should be avoided. Doing that for a test's sake seems wrong. Is there
a way that the naptime can be controlled by external factors such as
the likelihood of an advanced slot (just firing bullets in the dark), or is
the naptime controllable through a user interface such as a GUC? The test
can use those interfaces.
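
For illustration, if we went the GUC route, a hypothetical entry in
guc_tables.c could look roughly like the sketch below. To be clear, the
name slot_sync_api_naptime, its default, and its bounds are all made up
here and are not taken from any posted patch; it also assumes an
"int slot_sync_api_naptime" variable declared alongside the other
slotsync globals:

/* Hypothetical knob for the API's retry interval, settable by tests. */
{
	{"slot_sync_api_naptime", PGC_SIGHUP, REPLICATION_STANDBY,
		gettext_noop("Sets the retry interval used by pg_sync_replication_slots()."),
		NULL,
		GUC_UNIT_MS
	},
	&slot_sync_api_naptime,
	2000, 1, INT_MAX,
	NULL, NULL, NULL
},

A test could then lower the interval with ALTER SYSTEM plus
pg_reload_conf() instead of depending on a hard-coded 2s constant.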

--
Best Wishes,
Ashutosh Bapat

#79shveta malik
shveta.malik@gmail.com
In reply to: Ashutosh Bapat (#78)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Oct 7, 2025 at 4:49 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Tue, Oct 7, 2025 at 3:47 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Oct 7, 2025 at 3:24 PM Ajin Cherian <itsajin@gmail.com> wrote:

Hello Hackers,

In an offline discussion, I was considering adding a TAP test for this
patch. However, testing the pg_sync_replication_slots() API’s wait
logic requires a delay of at least 2 seconds, since that’s the
interval the API sleeps before retrying. I’m not sure it’s acceptable
to add a TAP test that increases runtime by 2 seconds.
I’m also wondering if 2 seconds is too long for the API to wait?
Should we reduce it to something like 200 ms instead? I’d appreciate
your feedback.

I feel a shorter nap will be good since it is an API and should finish
fast. But too short a nap may result in too many primary pings,
especially when primary slots are not advancing. But that case should
be a rare one. Shall we have a nap of, say, 500ms? It is neither too
short nor too long. Thoughts?

Shorter nap times mean a higher possibility of wasted CPU cycles - that
should be avoided. Doing that for a test's sake seems wrong. Is there
a way that the naptime can be controlled by external factors such as
the likelihood of an advanced slot (just firing bullets in the dark), or is
the naptime controllable through a user interface such as a GUC? The test
can use those interfaces.

Yes, we can control the naptime based on whether any slots are
being advanced on the primary. This is what the slotsync worker does. It
keeps doubling the naptime while there is no activity on the primary,
starting from 200ms up to a maximum of 30 sec. As soon as activity happens,
the naptime is reduced to 200ms again.
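
For reference, the core of that strategy, as visible in
wait_for_slot_activity() in slotsync.c, is just an exponential backoff
around WaitLatch(). A simplified sketch (not a verbatim copy of the
function) looks like this:

static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;	/* 200ms */

static void
wait_for_slot_activity(bool some_slot_updated)
{
	int		rc;

	if (some_slot_updated)
		/* Activity on the primary: go back to the shortest nap. */
		sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
	else
		/* No activity: double the nap, capped at 30s. */
		sleep_ms = Min(sleep_ms * 2, MAX_SLOTSYNC_WORKER_NAPTIME_MS);

	rc = WaitLatch(MyLatch,
				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
				   sleep_ms,
				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);

	if (rc & WL_LATCH_SET)
		ResetLatch(MyLatch);
}

(The patch renames the wait event to
WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP, but the backoff logic
stays the same.)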

thanks
Shveta

#80Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#79)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Oct 7, 2025 at 5:13 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Oct 7, 2025 at 4:49 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Shorter nap times mean a higher possibility of wasted CPU cycles - that
should be avoided. Doing that for a test's sake seems wrong. Is there
a way that the naptime can be controlled by external factors such as
the likelihood of an advanced slot (just firing bullets in the dark), or is
the naptime controllable through a user interface such as a GUC? The test
can use those interfaces.

Yes, we can control the naptime based on whether any slots are
being advanced on the primary. This is what the slotsync worker does. It
keeps doubling the naptime while there is no activity on the primary,
starting from 200ms up to a maximum of 30 sec. As soon as activity happens,
the naptime is reduced to 200ms again.

Is there a reason why we don't want to use the same naptime strategy
for the API and the worker?

--
With Regards,
Amit Kapila.

#81shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#80)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, Oct 9, 2025 at 2:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Oct 7, 2025 at 5:13 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Oct 7, 2025 at 4:49 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Shorter nap times mean a higher possibility of wasted CPU cycles - that
should be avoided. Doing that for a test's sake seems wrong. Is there
a way that the naptime can be controlled by external factors such as
the likelihood of an advanced slot (just firing bullets in the dark), or is
the naptime controllable through a user interface such as a GUC? The test
can use those interfaces.

Yes, we can control the naptime based on whether any slots are
being advanced on the primary. This is what the slotsync worker does. It
keeps doubling the naptime while there is no activity on the primary,
starting from 200ms up to a maximum of 30 sec. As soon as activity happens,
the naptime is reduced to 200ms again.

Is there a reason why we don't want to use the same naptime strategy
for the API and the worker?

There was a suggestion at [1] for a shorter naptime in the case of the API.

[1]: /messages/by-id/CAExHW5sQLJGhEA+9ZFVwZUpqfFFP5KPn9w64t3uiHSuiEH-9mQ@mail.gmail.com

thanks
Shveta

#82Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#78)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Oct 7, 2025 at 4:49 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Tue, Oct 7, 2025 at 3:47 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Oct 7, 2025 at 3:24 PM Ajin Cherian <itsajin@gmail.com> wrote:

Hello Hackers,

In an offline discussion, I was considering adding a TAP test for this
patch. However, testing the pg_sync_replication_slots() API’s wait
logic requires a delay of at least 2 seconds, since that’s the
interval the API sleeps before retrying. I’m not sure it’s acceptable
to add a TAP test that increases runtime by 2 seconds.
I’m also wondering if 2 seconds is too long for the API to wait?
Should we reduce it to something like 200 ms instead? I’d appreciate
your feedback.

I feel a shorter nap will be good since it is an API and should finish
fast. But too short a nap may result in too many primary pings,
especially when primary slots are not advancing. But that case should
be a rare one. Shall we have a nap of, say, 500ms? It is neither too
short nor too long. Thoughts?

Shorter nap times mean a higher possibility of wasted CPU cycles - that
should be avoided.

This seems to be exactly opposite of what you argued previously in email [1].

Doing that for a test's sake seems wrong.

Yeah, if writing a test to cover this case is important, then we can
even consider using an injection point.
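
Something like the sketch below, placed just before the WaitLatch()
call in the API's retry loop, could make the wait observable without
relying on timing. The point name is invented here, and the
INJECTION_POINT macro's signature has changed across branches, so treat
this purely as an outline:

#ifdef USE_INJECTION_POINTS
	/* Let a TAP test pause/observe the retry path deterministically. */
	INJECTION_POINT("slotsync-api-before-retry", NULL);
#endif

A test could then attach a 'wait' action to that point via the
injection_points test module, advance the slot on the primary, and wake
the point up, independent of whatever naptime we pick.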

Is there
a way that the naptime can be controlled by external factors such as
the likelihood of an advanced slot

We already do this for the worker, where the naptime is increased
gradually when there is no activity on the primary. It is better to
use the same strategy here. This API is not going to be used
frequently; rather, I would say one would like to use it just before a
planned switchover. So, I feel it is okay even if the wait time is
slightly higher when actually required. This would avoid adding
additional code maintenance for the API and the worker.

[1]: /messages/by-id/CAExHW5sQLJGhEA+9ZFVwZUpqfFFP5KPn9w64t3uiHSuiEH-9mQ@mail.gmail.com
--
With Regards,
Amit Kapila.

#83Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Amit Kapila (#82)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, Oct 9, 2025 at 2:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Oct 7, 2025 at 4:49 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Tue, Oct 7, 2025 at 3:47 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Oct 7, 2025 at 3:24 PM Ajin Cherian <itsajin@gmail.com> wrote:

Hello Hackers,

In an offline discussion, I was considering adding a TAP test for this
patch. However, testing the pg_sync_replication_slots() API’s wait
logic requires a delay of at least 2 seconds, since that’s the
interval the API sleeps before retrying. I’m not sure it’s acceptable
to add a TAP test that increases runtime by 2 seconds.
I’m also wondering if 2 seconds is too long for the API to wait?
Should we reduce it to something like 200 ms instead? I’d appreciate
your feedback.

I feel a shorter nap will be good since it is an API and should finish
fast. But too short a nap may result in too many primary pings,
especially when primary slots are not advancing. But that case should
be a rare one. Shall we have a nap of, say, 500ms? It is neither too
short nor too long. Thoughts?

Shorter nap times mean a higher possibility of wasted CPU cycles - that
should be avoided.

This seems to be exactly opposite of what you argued previously in email [1].

Doing that for a test's sake seems wrong.

Yeah, if writing a test to cover this case is important, then we can
even consider using an injection point.

That observation was made to make my point that the logic to decide the
naptime in the function and in the worker should be separate. The naptime in
the function can be significantly smaller than the naptime in the
worker. But making it shorter just for the test's sake isn't a good
idea. If we could use injection points, even better.

Is there
a way that the naptime can be controlled by external factors such as
the likelihood of an advanced slot

We already do this for the worker, where the naptime is increased
gradually when there is no activity on the primary. It is better to
use the same strategy here. This API is not going to be used
frequently; rather, I would say one would like to use it just before a
planned switchover. So, I feel it is okay even if the wait time is
slightly higher when actually required. This would avoid adding
additional code maintenance for the API and the worker.

That makes sense.

--
Best Wishes,
Ashutosh Bapat

#84Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#79)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Oct 7, 2025 at 10:43 PM shveta malik <shveta.malik@gmail.com> wrote:

Yes, we can control the naptime based on whether any slots are
being advanced on the primary. This is what the slotsync worker does. It
keeps doubling the naptime while there is no activity on the primary,
starting from 200ms up to a maximum of 30 sec. As soon as activity happens,
the naptime is reduced to 200ms again.

I have modified the patch to use the same wait logic as the slotsync
worker. I have also incorporated the document changes that you shared.
Attaching v16 with the above changes.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v16-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v16-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From b7717587daa5487ae6ef325dd8e430975c2bc00a Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Mon, 13 Oct 2025 18:40:20 +1100
Subject: [PATCH v16] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for the initial sync
and retaining it even in case of failure. It will keep retrying until the slot
on the primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  12 +-
 src/backend/replication/logical/slotsync.c    | 359 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 4 files changed, 320 insertions(+), 57 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index b803a819cf1..b964937d509 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,15 +405,13 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries cyclically, continuing until all
+      the failover slots that existed on the primary at the start of the function
+      call are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
      </para>
     </note>
 
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d55bdb..e210d037afa 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the pg_sync_replication_slots() API is used to sync the slots, and if the slots
+ * are not ready to be synced and are marked as RS_TEMPORARY because of any of
+ * the reasons mentioned above, then the API also waits and retries until the
+ * slots are marked as RS_PERSISTENT (which means sync-ready). Refer to the
+ * comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -100,6 +107,16 @@ typedef struct SlotSyncCtxStruct
 	slock_t		mutex;
 } SlotSyncCtxStruct;
 
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+	WalReceiverConn *wrconn;
+	List			*slot_names;
+} SlotSyncApiFailureParams;
+
 static SlotSyncCtxStruct *SlotSyncCtx = NULL;
 
 /* GUC variable */
@@ -147,6 +164,7 @@ typedef struct RemoteSlot
 
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
+static void slotsync_api_reread_config(void);
 
 /*
  * If necessary, update the local synced slot's metadata based on the data
@@ -553,11 +571,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -576,11 +598,18 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the API can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -595,6 +624,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that API can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -618,10 +651,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +752,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -784,7 +822,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -795,15 +834,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
+ *
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -812,29 +859,47 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		ListCell   *lc;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach(lc, slot_names)
+		{
+			char *slot_name = (char *) lfirst(lc);
+
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoString(&query, ")");
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -885,7 +950,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -905,12 +969,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+	pfree(query.data);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by pg_sync_replication_slots
+ * 							  API to track if any slots could not be
+ * 							  persisted and need to be retried.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -926,19 +1016,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1186,6 +1269,26 @@ ProcessSlotSyncInterrupts(void)
 		slotsync_reread_config();
 }
 
+/*
+ * Interrupt handler for pg_sync_replication_slots() API.
+ */
+static void
+ProcessSlotSyncAPIInterrupts(void)
+{
+	CHECK_FOR_INTERRUPTS();
+
+	/* If we've been promoted, then no point continuing. */
+	if (SlotSyncCtx->stopSignaled)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot continue replication slots synchronization"
+						" as standby promotion is triggered")));
+
+	/* error out if configuration parameters changed */
+	if (ConfigReloadPending)
+		slotsync_api_reread_config();
+}
+
 /*
  * Connection cleanup function for slotsync worker.
  *
@@ -1275,7 +1378,7 @@ wait_for_slot_activity(bool some_slot_updated)
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1505,10 +1608,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1705,7 +1825,8 @@ SlotSyncShmemInit(void)
 static void
 slotsync_failure_callback(int code, Datum arg)
 {
-	WalReceiverConn *wrconn = (WalReceiverConn *) DatumGetPointer(arg);
+	SlotSyncApiFailureParams *fparams =
+		(SlotSyncApiFailureParams *) DatumGetPointer(arg);
 
 	/*
 	 * We need to do slots cleanup here just like WalSndErrorCleanup() does.
@@ -1732,23 +1853,169 @@ slotsync_failure_callback(int code, Datum arg)
 	if (syncing_slots)
 		reset_syncing_flag();
 
-	walrcv_disconnect(wrconn);
+	if (fparams->slot_names)
+		list_free_deep(fparams->slot_names);
+
+	walrcv_disconnect(fparams->wrconn);
+}
+
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+	ListCell	*lc;
+	MemoryContext oldcontext;
+
+	/* Switch to long-lived TopMemoryContext to store slot names */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+	foreach(lc, remote_slots)
+	{
+		RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return slot_names;
+}
+
+/*
+ * Re-read the config file and check for critical parameter changes.
+ */
+static void
+slotsync_api_reread_config(void)
+{
+	char       *old_primary_conninfo = pstrdup(PrimaryConnInfo);
+	char       *old_primary_slotname = pstrdup(PrimarySlotName);
+	bool        old_hot_standby_feedback = hot_standby_feedback;
+	bool        conninfo_changed;
+	bool        primary_slotname_changed;
+
+	ConfigReloadPending = false;
+	ProcessConfigFile(PGC_SIGHUP);
+
+	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
+	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
+	pfree(old_primary_conninfo);
+	pfree(old_primary_slotname);
+
+	/* throw error for certain parameter changes */
+	if (conninfo_changed ||
+		primary_slotname_changed ||
+		(old_hot_standby_feedback != hot_standby_feedback))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIG_FILE_ERROR),
+				 errmsg("cannot continue slot synchronization due"
+						" to parameter changes"),
+				 errdetail("One or more of primary_conninfo,"
+						   " primary_slot_name or hot_standby_feedback"
+						   " were modified"),
+				 errhint("Retry pg_sync_replication_slots() to use the"
+						 " updated configuration.")));
+	}
 }
 
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync-ready". Exits early if
+ * promotion is triggered or certain critical configuration parameters
+ * have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
-	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	SlotSyncApiFailureParams fparams;
+
+	fparams.wrconn = wrconn;
+	fparams.slot_names = NULL;
+
+	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	started_tx = false;
+			/* Reset before every iteration */
+			bool	slot_persistence_pending = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncAPIInterrupts();
+
+			/*
+			 * The syscache access in fetch_remote_slots() needs a
+			 * transaction env.
+			 */
+			if (!IsTransactionState())
+			{
+				StartTransactionCommand();
+				started_tx = true;
+			}
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			synchronize_slots(wrconn, remote_slots, &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				slot_names = extract_slot_names(remote_slots);
+
+				/* Update the failure structure so that it can be freed on error */
+				fparams.slot_names = slot_names;
+			}
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Commit transaction if we started it */
+			if (started_tx)
+				CommitTransactionCommand();
+
+			/* Done if all slots are persisted i.e are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* Wait before retrying */
+			wait_for_slot_activity(false);
+		}
+
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
@@ -1756,5 +2023,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 7553f6eacef..16b3b04d3c4 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
-- 
2.47.3

#85Ajin Cherian
itsajin@gmail.com
In reply to: Ajin Cherian (#84)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Mon, Oct 13, 2025 at 6:57 PM Ajin Cherian <itsajin@gmail.com> wrote:

I have modified the patch to use the same wait logic as the slotsync
worker. I have also incorporated the document changes that you shared.
Attaching v16 with the above changes.

I've updated the patch with a TAP test covering the new behaviour.
Attaching patch v17 with this change.
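
For reviewers, the retry loop added to SyncReplicationSlots() has roughly
the following shape (a condensed sketch of the attached diff; transaction
bookkeeping, list cleanup and error handling are as in the patch):

    /* Condensed sketch of the v17 retry loop; see the diff for the real hunk. */
    for (;;)
    {
        bool    slot_persistence_pending = false;

        /* Bail out on promotion or on critical config changes. */
        ProcessSlotSyncAPIInterrupts();

        /*
         * First cycle: slot_names is NIL, so all failover slots are
         * fetched. Later cycles re-fetch only the slots seen in the
         * first cycle, so slots created after the call started are
         * never picked up.
         */
        remote_slots = fetch_remote_slots(wrconn, slot_names);

        synchronize_slots(wrconn, remote_slots, &slot_persistence_pending);

        /* Remember the initial slot set before the first retry. */
        if (slot_names == NIL && slot_persistence_pending)
            slot_names = extract_slot_names(remote_slots);

        if (!slot_persistence_pending)
            break;              /* all slots persisted, i.e. sync-ready */

        wait_for_slot_activity(false);
    }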

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v17-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v17-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 64d19671aca903015c68d5479955af418ba4d61a Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Mon, 13 Oct 2025 18:40:20 +1100
Subject: [PATCH v17] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It keeps retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  12 +-
 src/backend/replication/logical/slotsync.c    | 359 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 .../t/040_standby_failover_slots_sync.pl      | 107 ++++--
 5 files changed, 393 insertions(+), 91 deletions(-)
 mode change 100644 => 100755 src/test/recovery/t/040_standby_failover_slots_sync.pl

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index b803a819cf1..b964937d509 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,15 +405,13 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries in cycles, continuing until all the failover
+      slots that existed on the primary at the start of the function
+      call are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
      </para>
     </note>
 
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d55bdb..e210d037afa 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the pg_sync_replication_slots() API is used to sync the slots, and the
+ * slots are not yet ready to be synced and are marked as RS_TEMPORARY for any
+ * of the reasons mentioned above, then the API also waits and retries until
+ * the slots are marked as RS_PERSISTENT (which means sync-ready). Refer to
+ * the comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -100,6 +107,16 @@ typedef struct SlotSyncCtxStruct
 	slock_t		mutex;
 } SlotSyncCtxStruct;
 
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+	WalReceiverConn *wrconn;
+	List			*slot_names;
+} SlotSyncApiFailureParams;
+
 static SlotSyncCtxStruct *SlotSyncCtx = NULL;
 
 /* GUC variable */
@@ -147,6 +164,7 @@ typedef struct RemoteSlot
 
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
+static void slotsync_api_reread_config(void);
 
 /*
  * If necessary, update the local synced slot's metadata based on the data
@@ -553,11 +571,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -576,11 +598,18 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the API can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -595,6 +624,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that API can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -618,10 +651,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +752,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -784,7 +822,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -795,15 +834,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
+ *
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -812,29 +859,47 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		ListCell   *lc;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach(lc, slot_names)
+		{
+			char *slot_name = (char *) lfirst(lc);
+
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoString(&query, ")");
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -885,7 +950,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -905,12 +969,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+	pfree(query.data);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by pg_sync_replication_slots
+ * 							  API to track if any slots could not be
+ * 							  persisted and need to be retried.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -926,19 +1016,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1186,6 +1269,26 @@ ProcessSlotSyncInterrupts(void)
 		slotsync_reread_config();
 }
 
+/*
+ * Interrupt handler for pg_sync_replication_slots() API.
+ */
+static void
+ProcessSlotSyncAPIInterrupts(void)
+{
+	CHECK_FOR_INTERRUPTS();
+
+	/* If we've been promoted, then no point continuing. */
+	if (SlotSyncCtx->stopSignaled)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot continue replication slots synchronization"
+						" as standby promotion is triggered")));
+
+	/* error out if configuration parameters changed */
+	if (ConfigReloadPending)
+		slotsync_api_reread_config();
+}
+
 /*
  * Connection cleanup function for slotsync worker.
  *
@@ -1275,7 +1378,7 @@ wait_for_slot_activity(bool some_slot_updated)
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1505,10 +1608,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1705,7 +1825,8 @@ SlotSyncShmemInit(void)
 static void
 slotsync_failure_callback(int code, Datum arg)
 {
-	WalReceiverConn *wrconn = (WalReceiverConn *) DatumGetPointer(arg);
+	SlotSyncApiFailureParams *fparams =
+		(SlotSyncApiFailureParams *) DatumGetPointer(arg);
 
 	/*
 	 * We need to do slots cleanup here just like WalSndErrorCleanup() does.
@@ -1732,23 +1853,169 @@ slotsync_failure_callback(int code, Datum arg)
 	if (syncing_slots)
 		reset_syncing_flag();
 
-	walrcv_disconnect(wrconn);
+	if (fparams->slot_names)
+		list_free_deep(fparams->slot_names);
+
+	walrcv_disconnect(fparams->wrconn);
+}
+
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+	ListCell	*lc;
+	MemoryContext oldcontext;
+
+	/* Switch to long-lived TopMemoryContext to store slot names */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+	foreach(lc, remote_slots)
+	{
+		RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return slot_names;
+}
+
+/*
+ * Re-read the config file and check for critical parameter changes.
+ */
+static void
+slotsync_api_reread_config(void)
+{
+	char       *old_primary_conninfo = pstrdup(PrimaryConnInfo);
+	char       *old_primary_slotname = pstrdup(PrimarySlotName);
+	bool        old_hot_standby_feedback = hot_standby_feedback;
+	bool        conninfo_changed;
+	bool        primary_slotname_changed;
+
+	ConfigReloadPending = false;
+	ProcessConfigFile(PGC_SIGHUP);
+
+	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
+	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
+	pfree(old_primary_conninfo);
+	pfree(old_primary_slotname);
+
+	/* throw error for certain parameter changes */
+	if (conninfo_changed ||
+		primary_slotname_changed ||
+		(old_hot_standby_feedback != hot_standby_feedback))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIG_FILE_ERROR),
+				 errmsg("cannot continue slot synchronization due"
+						" to parameter changes"),
+				 errdetail("One or more of primary_conninfo,"
+						   " primary_slot_name or hot_standby_feedback"
+						   " were modified"),
+				 errhint("Retry pg_sync_replication_slots() to use the"
+						 " updated configuration.")));
+	}
 }
 
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync-ready". Exits early if
+ * promotion is triggered or certain critical configuration parameters
+ * have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
-	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	SlotSyncApiFailureParams fparams;
+
+	fparams.wrconn = wrconn;
+	fparams.slot_names = NULL;
+
+	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	started_tx = false;
+			/* Reset before every iteration */
+			bool	slot_persistence_pending = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncAPIInterrupts();
+
+			/*
+			 * The syscache access in fetch_remote_slots() needs a
+			 * transaction env.
+			 */
+			if (!IsTransactionState())
+			{
+				StartTransactionCommand();
+				started_tx = true;
+			}
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			synchronize_slots(wrconn, remote_slots, &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				slot_names = extract_slot_names(remote_slots);
+
+				/* Update the failure structure so that it can be freed on error */
+				fparams.slot_names = slot_names;
+			}
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Commit transaction if we started it */
+			if (started_tx)
+				CommitTransactionCommand();
+
+			/* Done if all slots are persisted i.e are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* Wait before retrying */
+			wait_for_slot_activity(false);
+		}
+
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
@@ -1756,5 +2023,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 7553f6eacef..16b3b04d3c4 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
old mode 100644
new mode 100755
index 2c61c51e914..20b805b3d24
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -114,19 +114,10 @@ ok( $stderr =~
 	  /ERROR:  replication slots can only be synchronized to a standby server/,
 	"cannot sync slots on a non-standby server");
 
-##################################################
-# Test logical failover slots corresponding to different plugins can be
-# synced to the standby.
-#
-# Configure standby1 to replicate and synchronize logical slots configured
-# for failover on the primary
-#
-#              failover slot lsub1_slot   |       output_plugin: pgoutput
-#              failover slot lsub2_slot   |       output_plugin: test_decoding
-# primary --->                            |
-#              physical slot sb1_slot --->| ----> standby1 (connected via streaming replication)
-#                                         |                 lsub1_slot, lsub2_slot (synced_slot)
-##################################################
+#################################################
+# Test that pg_sync_replication_slots on the standby waits and retries
+# until the slot can be synced.
+#################################################
 
 my $primary = $publisher;
 my $backup_name = 'backup';
@@ -153,47 +144,63 @@ log_min_messages = 'debug2'
 $primary->append_conf('postgresql.conf', "log_min_messages = 'debug2'");
 $primary->reload;
 
-# Drop the subscription to prevent further advancement of the restart_lsn for
+# Disable the subscription to prevent further advancement of the restart_lsn for
 # the lsub1_slot.
-$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub1;");
-
-# To ensure that restart_lsn has moved to a recent WAL position, we re-create
-# the lsub1_slot.
-$primary->psql('postgres',
-	q{SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);}
-);
-
-$primary->psql('postgres',
-	q{SELECT pg_create_logical_replication_slot('lsub2_slot', 'test_decoding', false, false, true);}
-);
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 DISABLE;");
 
+# Create the physical slot for the standby
 $primary->psql('postgres',
 	q{SELECT pg_create_physical_replication_slot('sb1_slot');});
 
 # Start the standby so that slot syncing can begin
 $standby1->start;
 
+# Run some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+$subscriber1->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
 # Capture the inactive_since of the slot from the primary. Note that the slot
 # will be inactive since the corresponding subscription was dropped.
 my $inactive_since_on_primary =
   $primary->validate_slot_inactive_since('lsub1_slot',
 	$slot_creation_time_on_primary);
 
-# Wait for the standby to catch up so that the standby is not lagging behind
-# the failover slots.
-$primary->wait_for_replay_catchup($standby1);
+# Attempt to synchronize slots using the API. This will block because the
+# slots are not sync-ready, so call the API in a background process.
+my $log_offset = -s $standby1->logfile;
 
-# Synchronize the primary server slots to the standby.
-$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+my $h = $standby1->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr//, "SELECT pg_sync_replication_slots();\n");
+
+# Confirm that the slot could not be synced.
+$standby1->wait_for_log(
+    qr/could not synchronize replication slot \"lsub1_slot\"/,
+    $log_offset);
+
+$primary->safe_psql('postgres', "INSERT INTO push_wal values (1);");
 
-# Confirm that the logical failover slots are created on the standby and are
+# Re-enable the subscription to consume data in slot lsub1_slot
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE;");
+
+# Create xl_running_xacts records on the primary for which the standby is waiting
+$primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot();");
+
+# Confirm from the log that the slot has been synced.
+$standby1->wait_for_log(
+    qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
+    $log_offset);
+
+$h->quit;
+
+# Confirm that the logical failover slot is created on the standby and is
 # flagged as 'synced'
 is( $standby1->safe_psql(
 		'postgres',
-		q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'lsub2_slot') AND synced AND NOT temporary;}
+		q{SELECT count(*) = 1 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot') AND synced AND NOT temporary;}
 	),
 	"t",
-	'logical slots have synced as true on standby');
+	'logical slot is synced after API retry on standby');
 
 # Capture the inactive_since of the synced slot on the standby
 my $inactive_since_on_standby =
@@ -208,6 +215,38 @@ is( $standby1->safe_psql(
 	"t",
 	'synchronized slot has got its own inactive_since');
 
+# Drop the tables
+$primary->safe_psql('postgres', "DROP TABLE push_wal;");
+$subscriber1->safe_psql('postgres', "DROP TABLE push_wal;");
+
+##################################################
+# Test logical failover slots corresponding to different plugins can be
+# synced to the standby.
+#
+# Configure standby1 to replicate and synchronize logical slots configured
+# for failover on the primary
+#
+#              failover slot lsub1_slot   |       output_plugin: pgoutput
+#              failover slot lsub2_slot   |       output_plugin: test_decoding
+# primary --->                            |
+#              physical slot sb1_slot --->| ----> standby1 (connected via streaming replication)
+#                                         |                 lsub1_slot, lsub2_slot (synced_slot)
+##################################################
+
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub1;");
+
+# Re-create the lsub1_slot
+$primary->psql('postgres',
+	q{SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);}
+);
+
+$primary->psql('postgres',
+	q{SELECT pg_create_logical_replication_slot('lsub2_slot', 'test_decoding', false, false, true);}
+);
+
+$primary->psql('postgres',
+	q{SELECT pg_create_physical_replication_slot('sb1_slot');});
+
 ##################################################
 # Test that the synchronized slot will be dropped if the corresponding remote
 # slot on the primary server has been dropped.
@@ -279,7 +318,7 @@ $inactive_since_on_primary =
 # the failover slots.
 $primary->wait_for_replay_catchup($standby1);
 
-my $log_offset = -s $standby1->logfile;
+$log_offset = -s $standby1->logfile;
 
 # Synchronize the primary server slots to the standby.
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
-- 
2.47.3

#86shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#85)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Oct 15, 2025 at 9:57 AM Ajin Cherian <itsajin@gmail.com> wrote:

I've updated the patch with a TAP test covering the new behaviour.
Attaching patch v17 with this change.

Thanks for the patch. I noticed that in the API case we always pass
'some_slot_updated' as false to wait_for_slot_activity(). Shouldn't we
pass the actual value, just as the slotsync worker does? In a given
cycle one of the temporary slots may be persisted, or one of the
persisted slots may be updated; in such cases we should not double the
naptime. The naptime-doubling logic applies only when there is no
activity happening on the primary.
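
For reference, the back-off in wait_for_slot_activity() is roughly the
following (a simplified sketch of the logic in slotsync.c; see the source
for the exact constants and the WaitLatch() call):

    /* Sketch of the naptime handling in wait_for_slot_activity(). */
    if (!some_slot_updated)
    {
        /* Nothing changed this cycle: double the naptime, up to the max. */
        sleep_ms = Min(sleep_ms * 2, MAX_SLOTSYNC_WORKER_NAPTIME_MS);
    }
    else
    {
        /* A slot was updated: drop back to the minimum naptime. */
        sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
    }
    /* WaitLatch(MyLatch, ..., sleep_ms, ...) then sleeps for that long. */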

thanks
Shveta

#87shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#86)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Oct 15, 2025 at 2:08 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Oct 15, 2025 at 9:57 AM Ajin Cherian <itsajin@gmail.com> wrote:

I've updated the patch with a TAP test covering the new behaviour.
Attaching patch v17 with this change.

The test also needs correction. The existing test 'Test logical
failover slots corresponding to different plugins can be synced to the
standby' seems to have been disturbed. If that scenario is already
covered and need not be tested again, the comments should be updated
to say so; otherwise, the test needs to be brought back.

thanks
Shveta

#88Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#87)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Oct 15, 2025 at 7:38 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Oct 15, 2025 at 9:57 AM Ajin Cherian <itsajin@gmail.com> wrote:

I've updated the patch with a TAP test covering the new behaviour.
Attaching patch v17 with this change.

Thanks for the patch. I noticed that in the API case we always pass
'some_slot_updated' as false to wait_for_slot_activity(). Shouldn't we
pass the actual value, just as the slotsync worker does? In a given
cycle one of the temporary slots may be persisted, or one of the
persisted slots may be updated; in such cases we should not double the
naptime. The naptime-doubling logic applies only when there is no
activity happening on the primary.

I've modified this accordingly.
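
The loop body now captures the return value of synchronize_slots() and
feeds it to the wait, along these lines (a sketch; the exact hunk is in
the attached v18):

    some_slot_updated = synchronize_slots(wrconn, remote_slots,
                                          &slot_persistence_pending);
    /* ... free list, commit transaction, check for completion ... */

    /* Double the naptime only when nothing was updated in this cycle. */
    wait_for_slot_activity(some_slot_updated);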

On Wed, Oct 15, 2025 at 8:29 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Oct 15, 2025 at 2:08 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Oct 15, 2025 at 9:57 AM Ajin Cherian <itsajin@gmail.com> wrote:

I've updated the patch with a TAP test covering the new behaviour.
Attaching patch v17 with this change.

The test also needs correction. The existing test 'Test logical
failover slots corresponding to different plugins can be synced to the
standby' seems to have been disturbed. If that scenario is already
covered and need not be tested again, the comments should be updated
to say so; otherwise, the test needs to be brought back.

I've modified the comments to reflect the new changes.

Attaching patch v18 with the above changes.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v18-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v18-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 052715969b0bc5b42e111be30bcb0870a5f31bdf Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Mon, 13 Oct 2025 18:40:20 +1100
Subject: [PATCH v18] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It keeps retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  12 +-
 src/backend/replication/logical/slotsync.c    | 361 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 .../t/040_standby_failover_slots_sync.pl      |  93 +++--
 5 files changed, 386 insertions(+), 86 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index b803a819cf1..b964937d509 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,15 +405,13 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries in cycles, continuing until all the failover
+      slots that existed on the primary at the start of the function
+      call are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
      </para>
     </note>
 
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8c061d55bdb..c6bbc12675a 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the pg_sync_replication_slots() API is used to sync the slots, and the
+ * slots are not yet ready to be synced and are marked as RS_TEMPORARY for any
+ * of the reasons mentioned above, then the API also waits and retries until
+ * the slots are marked as RS_PERSISTENT (which means sync-ready). Refer to
+ * the comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -100,6 +107,16 @@ typedef struct SlotSyncCtxStruct
 	slock_t		mutex;
 } SlotSyncCtxStruct;
 
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+	WalReceiverConn *wrconn;
+	List			*slot_names;
+} SlotSyncApiFailureParams;
+
 static SlotSyncCtxStruct *SlotSyncCtx = NULL;
 
 /* GUC variable */
@@ -147,6 +164,7 @@ typedef struct RemoteSlot
 
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
+static void slotsync_api_reread_config(void);
 
 /*
  * If necessary, update the local synced slot's metadata based on the data
@@ -553,11 +571,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -576,11 +598,18 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the API can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -595,6 +624,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that API can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -618,10 +651,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +752,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -784,7 +822,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -795,15 +834,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
+ *
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -812,29 +859,47 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		ListCell   *lc;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach(lc, slot_names)
+		{
+			char *slot_name = (char *) lfirst(lc);
+
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoString(&query, ")");
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -885,7 +950,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -905,12 +969,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+	pfree(query.data);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by pg_sync_replication_slots
+ * 							  API to track if any slots could not be
+ * 							  persisted and need to be retried.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -926,19 +1016,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1186,6 +1269,26 @@ ProcessSlotSyncInterrupts(void)
 		slotsync_reread_config();
 }
 
+/*
+ * Interrupt handler for pg_sync_replication_slots() API.
+ */
+static void
+ProcessSlotSyncAPIInterrupts(void)
+{
+	CHECK_FOR_INTERRUPTS();
+
+	/* If we've been promoted, then no point continuing. */
+	if (SlotSyncCtx->stopSignaled)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot continue replication slot synchronization"
+						" as standby promotion is triggered")));
+
+	/* error out if configuration parameters changed */
+	if (ConfigReloadPending)
+		slotsync_api_reread_config();
+}
+
 /*
  * Connection cleanup function for slotsync worker.
  *
@@ -1275,7 +1378,7 @@ wait_for_slot_activity(bool some_slot_updated)
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1505,10 +1608,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1705,7 +1825,8 @@ SlotSyncShmemInit(void)
 static void
 slotsync_failure_callback(int code, Datum arg)
 {
-	WalReceiverConn *wrconn = (WalReceiverConn *) DatumGetPointer(arg);
+	SlotSyncApiFailureParams *fparams =
+		(SlotSyncApiFailureParams *) DatumGetPointer(arg);
 
 	/*
 	 * We need to do slots cleanup here just like WalSndErrorCleanup() does.
@@ -1732,23 +1853,171 @@ slotsync_failure_callback(int code, Datum arg)
 	if (syncing_slots)
 		reset_syncing_flag();
 
-	walrcv_disconnect(wrconn);
+	if (fparams->slot_names)
+		list_free_deep(fparams->slot_names);
+
+	walrcv_disconnect(fparams->wrconn);
+}
+
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+	ListCell	*lc;
+	MemoryContext oldcontext;
+
+	/* Switch to long-lived TopMemoryContext to store slot names */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+	foreach(lc, remote_slots)
+	{
+		RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return slot_names;
+}
+
+/*
+ * Re-read the config file and check for critical parameter changes.
+ * Errors out if any of them changed.
+ */
+static void
+slotsync_api_reread_config(void)
+{
+	char       *old_primary_conninfo = pstrdup(PrimaryConnInfo);
+	char       *old_primary_slotname = pstrdup(PrimarySlotName);
+	bool        old_hot_standby_feedback = hot_standby_feedback;
+	bool        conninfo_changed;
+	bool        primary_slotname_changed;
+
+	ConfigReloadPending = false;
+	ProcessConfigFile(PGC_SIGHUP);
+
+	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
+	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
+	pfree(old_primary_conninfo);
+	pfree(old_primary_slotname);
+
+	/* throw error for certain parameter changes */
+	if (conninfo_changed ||
+		primary_slotname_changed ||
+		(old_hot_standby_feedback != hot_standby_feedback))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIG_FILE_ERROR),
+				 errmsg("cannot continue slot synchronization due"
+						" to parameter changes"),
+				 errdetail("One or more of primary_conninfo,"
+						   " primary_slot_name or hot_standby_feedback"
+						   " were modified."),
+				 errhint("Retry pg_sync_replication_slots() to use the"
+						 " updated configuration.")));
+	}
 }
 
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync-ready".
+ * Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
-	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	SlotSyncApiFailureParams fparams;
+
+	fparams.wrconn = wrconn;
+	fparams.slot_names = NULL;
+
+	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	started_tx = false;
+			bool	slot_persistence_pending = false;
+			bool	some_slot_updated = false;
+
+			/* Reset flag before every iteration */
+			slot_persistence_pending = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncAPIInterrupts();
+
+			/*
+			 * The syscache access in fetch_remote_slots() needs a
+			 * transaction env.
+			 */
+			if (!IsTransactionState()) {
+				StartTransactionCommand();
+				started_tx = true;
+			}
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				slot_names = extract_slot_names(remote_slots);
+
+				/* Update the failure structure so that it can be freed on error */
+				fparams.slot_names = slot_names;
+			}
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Commit transaction if we started it */
+			if (started_tx)
+				CommitTransactionCommand();
+
+			/* Done if all slots are persisted i.e are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* Wait before retrying */
+			wait_for_slot_activity(some_slot_updated);
+
+		}
+
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
@@ -1756,5 +2025,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 7553f6eacef..16b3b04d3c4 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 2c61c51e914..44b43db6a56 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -115,17 +115,21 @@ ok( $stderr =~
 	"cannot sync slots on a non-standby server");
 
 ##################################################
-# Test logical failover slots corresponding to different plugins can be
-# synced to the standby.
+# Set up a standby server (standby1) to test slot synchronization.
 #
-# Configure standby1 to replicate and synchronize logical slots configured
-# for failover on the primary
+# Configure standby1 to replicate from the primary and synchronize
+# logical failover slots.
 #
-#              failover slot lsub1_slot   |       output_plugin: pgoutput
-#              failover slot lsub2_slot   |       output_plugin: test_decoding
+#              failover slot lsub1_slot   |
 # primary --->                            |
 #              physical slot sb1_slot --->| ----> standby1 (connected via streaming replication)
-#                                         |                 lsub1_slot, lsub2_slot (synced_slot)
+#                                         |
+##################################################
+
+##################################################
+# Test that pg_sync_replication_slots() on the standby waits and retries
+# until the slot becomes sync-ready (when the slot on the primary has
+# advanced to a position where the standby has the required data).
 ##################################################
 
 my $primary = $publisher;
@@ -153,47 +157,64 @@ log_min_messages = 'debug2'
 $primary->append_conf('postgresql.conf', "log_min_messages = 'debug2'");
 $primary->reload;
 
-# Drop the subscription to prevent further advancement of the restart_lsn for
+# Disable the subscription to prevent further advancement of the restart_lsn for
 # the lsub1_slot.
-$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub1;");
-
-# To ensure that restart_lsn has moved to a recent WAL position, we re-create
-# the lsub1_slot.
-$primary->psql('postgres',
-	q{SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);}
-);
-
-$primary->psql('postgres',
-	q{SELECT pg_create_logical_replication_slot('lsub2_slot', 'test_decoding', false, false, true);}
-);
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 DISABLE;");
 
+# create the physical slot for the standby
 $primary->psql('postgres',
 	q{SELECT pg_create_physical_replication_slot('sb1_slot');});
 
 # Start the standby so that slot syncing can begin
 $standby1->start;
 
+# Run some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+$subscriber1->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
 # Capture the inactive_since of the slot from the primary. Note that the slot
-# will be inactive since the corresponding subscription was dropped.
+# will be inactive since the corresponding subscription was disabled.
 my $inactive_since_on_primary =
   $primary->validate_slot_inactive_since('lsub1_slot',
 	$slot_creation_time_on_primary);
 
-# Wait for the standby to catch up so that the standby is not lagging behind
-# the failover slots.
-$primary->wait_for_replay_catchup($standby1);
+# Attempt to synchronize slots using the API, called in a background process.
+# This will initially fail because the slot is not yet sync-ready (the primary
+# slot has not yet advanced far enough), but the API will wait and retry.
+my $log_offset = -s $standby1->logfile;
 
-# Synchronize the primary server slots to the standby.
-$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+my $h = $standby1->background_psql('postgres', on_error_stop => 0);
 
-# Confirm that the logical failover slots are created on the standby and are
+$h->query_until(qr//, "SELECT pg_sync_replication_slots();\n");
+
+# Confirm that the slot could not be synced initially.
+$standby1->wait_for_log(
+	qr/could not synchronize replication slot \"lsub1_slot\"/,
+	$log_offset);
+
+$primary->safe_psql('postgres', "INSERT INTO push_wal values (1);");
+
+# Re-enable the subscription so that lsub1_slot starts advancing
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE;");
+
+# Write the xl_running_xacts record on the primary that the standby is waiting for
+$primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot();");
+
+# Confirm from the log that the slot has been synced after becoming sync-ready.
+$standby1->wait_for_log(
+	qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
+	$log_offset);
+
+$h->quit;
+
+# Confirm that the logical failover slot is created on the standby and is
 # flagged as 'synced'
 is( $standby1->safe_psql(
 		'postgres',
-		q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'lsub2_slot') AND synced AND NOT temporary;}
+		q{SELECT count(*) = 1 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot') AND synced AND NOT temporary;}
 	),
 	"t",
-	'logical slots have synced as true on standby');
+	'logical slot is synced after API retry on standby');
 
 # Capture the inactive_since of the synced slot on the standby
 my $inactive_since_on_standby =
@@ -208,6 +229,20 @@ is( $standby1->safe_psql(
 	"t",
 	'synchronized slot has got its own inactive_since');
 
+# Drop the tables and subscription
+$primary->safe_psql('postgres', "DROP TABLE push_wal;");
+$subscriber1->safe_psql('postgres', "DROP TABLE push_wal;");
+$subscriber1->safe_psql('postgres', "DROP SUBSCRIPTION regress_mysub1;");
+
+# Re-create lsub1_slot with the pgoutput plugin and lsub2_slot with test_decoding
+$primary->psql('postgres',
+	q{SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);}
+);
+
+$primary->psql('postgres',
+	q{SELECT pg_create_logical_replication_slot('lsub2_slot', 'test_decoding', false, false, true);}
+);
+
 ##################################################
 # Test that the synchronized slot will be dropped if the corresponding remote
 # slot on the primary server has been dropped.
@@ -279,7 +314,7 @@ $inactive_since_on_primary =
 # the failover slots.
 $primary->wait_for_replay_catchup($standby1);
 
-my $log_offset = -s $standby1->logfile;
+$log_offset = -s $standby1->logfile;
 
 # Synchronize the primary server slots to the standby.
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
-- 
2.47.3

#89shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#88)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Oct 22, 2025 at 10:25 AM Ajin Cherian <itsajin@gmail.com> wrote:

I've modified the comments to reflect the new changes.

attaching patch v18 with the above changes.

Thanks for the patch. The test is still not clear. Can we please add
the test after the "Test logical failover slots corresponding to
different plugins" test finishes, instead of adding it in between?

thanks
Shveta

#90Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#89)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Oct 24, 2025 at 8:29 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Oct 22, 2025 at 10:25 AM Ajin Cherian <itsajin@gmail.com> wrote:

I've modified the comments to reflect the new changes.

attaching patch v18 with the above changes.

Thanks for the patch. The test is still not clear. Can we please add
the test after the "Test logical failover slots corresponding to
different plugins" test finishes, instead of adding it in between?

I've rewritten the tests again to make this possible. Attaching v19
which has the modified tap test.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v19-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v19-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 77f91611811e6e0c92b8305e9b6d7efeb57d4493 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Mon, 27 Oct 2025 18:46:10 +1100
Subject: [PATCH v19] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  12 +-
 src/backend/replication/logical/slotsync.c    | 361 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 .../t/040_standby_failover_slots_sync.pl      |  71 +++-
 5 files changed, 388 insertions(+), 62 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index b803a819cf1..b964937d509 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,15 +405,13 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries cyclically to some extent, continuing until all
+      the failover slots that existed on the primary at the start of the function
+      call are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
      </para>
     </note>
 
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index b122d99b009..1b78ffc5ff1 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the pg_sync_replication_slots() API is used to sync the slots, and if the slots
+ * are not ready to be synced and are marked as RS_TEMPORARY because of any of
+ * the reasons mentioned above, then the API also waits and retries until the
+ * slots are marked as RS_PERSISTENT (which means sync-ready). Refer to the
+ * comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -100,6 +107,16 @@ typedef struct SlotSyncCtxStruct
 	slock_t		mutex;
 } SlotSyncCtxStruct;
 
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+	WalReceiverConn *wrconn;
+	List			*slot_names;
+} SlotSyncApiFailureParams;
+
 static SlotSyncCtxStruct *SlotSyncCtx = NULL;
 
 /* GUC variable */
@@ -147,6 +164,7 @@ typedef struct RemoteSlot
 
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
+static void slotsync_api_reread_config(void);
 
 /*
  * If necessary, update the local synced slot's metadata based on the data
@@ -553,11 +571,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -576,11 +598,18 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the API can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -595,6 +624,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this so that the API can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -618,10 +651,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +752,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -784,7 +822,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -795,15 +834,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
+ *
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -812,29 +859,47 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		ListCell   *lc;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach(lc, slot_names)
+		{
+			char *slot_name = (char *) lfirst(lc);
+
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoString(&query, ")");
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -885,7 +950,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -905,12 +969,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+	pfree(query.data);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by pg_sync_replication_slots
+ * 							  API to track if any slots could not be
+ * 							  persisted and need to be retried.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -926,19 +1016,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1186,6 +1269,26 @@ ProcessSlotSyncInterrupts(void)
 		slotsync_reread_config();
 }
 
+/*
+ * Interrupt handler for pg_sync_replication_slots() API.
+ */
+static void
+ProcessSlotSyncAPIInterrupts(void)
+{
+	CHECK_FOR_INTERRUPTS();
+
+	/* If we've been promoted, then no point continuing. */
+	if (SlotSyncCtx->stopSignaled)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot continue replication slot synchronization"
+						" as standby promotion is triggered")));
+
+	/* error out if configuration parameters changed */
+	if (ConfigReloadPending)
+		slotsync_api_reread_config();
+}
+
 /*
  * Connection cleanup function for slotsync worker.
  *
@@ -1275,7 +1378,7 @@ wait_for_slot_activity(bool some_slot_updated)
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1505,10 +1608,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1711,7 +1831,8 @@ SlotSyncShmemInit(void)
 static void
 slotsync_failure_callback(int code, Datum arg)
 {
-	WalReceiverConn *wrconn = (WalReceiverConn *) DatumGetPointer(arg);
+	SlotSyncApiFailureParams *fparams =
+		(SlotSyncApiFailureParams *) DatumGetPointer(arg);
 
 	/*
 	 * We need to do slots cleanup here just like WalSndErrorCleanup() does.
@@ -1738,23 +1859,171 @@ slotsync_failure_callback(int code, Datum arg)
 	if (syncing_slots)
 		reset_syncing_flag();
 
-	walrcv_disconnect(wrconn);
+	if (fparams->slot_names)
+		list_free_deep(fparams->slot_names);
+
+	walrcv_disconnect(fparams->wrconn);
+}
+
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+	ListCell	*lc;
+	MemoryContext oldcontext;
+
+	/* Switch to long-lived TopMemoryContext to store slot names */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+	foreach(lc, remote_slots)
+	{
+		RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return slot_names;
+}
+
+/*
+ * Re-read the config file and check for critical parameter changes.
+ * Errors out if any of them changed.
+ */
+static void
+slotsync_api_reread_config(void)
+{
+	char       *old_primary_conninfo = pstrdup(PrimaryConnInfo);
+	char       *old_primary_slotname = pstrdup(PrimarySlotName);
+	bool        old_hot_standby_feedback = hot_standby_feedback;
+	bool        conninfo_changed;
+	bool        primary_slotname_changed;
+
+	ConfigReloadPending = false;
+	ProcessConfigFile(PGC_SIGHUP);
+
+	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
+	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
+	pfree(old_primary_conninfo);
+	pfree(old_primary_slotname);
+
+	/* throw error for certain parameter changes */
+	if (conninfo_changed ||
+		primary_slotname_changed ||
+		(old_hot_standby_feedback != hot_standby_feedback))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIG_FILE_ERROR),
+				 errmsg("cannot continue slot synchronization due"
+						" to parameter changes"),
+				 errdetail("One or more of primary_conninfo,"
+						   " primary_slot_name or hot_standby_feedback"
+						   " were modified."),
+				 errhint("Retry pg_sync_replication_slots() to use the"
+						 " updated configuration.")));
+	}
 }
 
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync-ready".
+ * Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
-	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	SlotSyncApiFailureParams fparams;
+
+	fparams.wrconn = wrconn;
+	fparams.slot_names = NULL;
+
+	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	started_tx = false;
+			bool	slot_persistence_pending = false;
+			bool	some_slot_updated = false;
+
+			/* Reset flag before every iteration */
+			slot_persistence_pending = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncAPIInterrupts();
+
+			/*
+			 * The syscache access in fetch_remote_slots() needs a
+			 * transaction env.
+			 */
+			if (!IsTransactionState()) {
+				StartTransactionCommand();
+				started_tx = true;
+			}
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				slot_names = extract_slot_names(remote_slots);
+
+				/* Update the failure structure so that it can be freed on error */
+				fparams.slot_names = slot_names;
+			}
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Commit transaction if we started it */
+			if (started_tx)
+				CommitTransactionCommand();
+
+			/* Done if all slots are persisted i.e are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* Wait before retrying */
+			wait_for_slot_activity(some_slot_updated);
+
+		}
+
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
@@ -1762,5 +2031,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 7553f6eacef..16b3b04d3c4 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 3059bb8177b..4d42defa156 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -211,21 +211,82 @@ is( $standby1->safe_psql(
 	'synchronized slot has got its own inactive_since');
 
 ##################################################
-# Test that the synchronized slot will be dropped if the corresponding remote
-# slot on the primary server has been dropped.
+# Test that the synchronized slots will be dropped if the corresponding remote
+# slots on the primary server have been dropped.
 ##################################################
 
 $primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub2_slot');");
+$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub1_slot');");
 
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
 
 is( $standby1->safe_psql(
 		'postgres',
-		q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name = 'lsub2_slot';}
+		q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'lsub2_slot');}
 	),
 	"t",
 	'synchronized slot has been dropped');
 
+##################################################
+# Test that pg_sync_replication_slots() on the standby waits and retries
+# until the slot becomes sync-ready (when the slot on the primary has
+# advanced to a position where the standby has the required data).
+##################################################
+
+# Recreate the slot by creating a subscription on the subscriber, keeping it disabled.
+$subscriber1->safe_psql('postgres', qq[
+	CREATE TABLE push_wal (a int);
+	CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, failover = true, enabled = false);]);
+
+# Run some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Attempt to synchronize slots using the API, called in a background process.
+# This will initially fail because the slot is not yet sync-ready (the primary
+# slot has not yet advanced far enough), but the API will wait and retry.
+my $log_offset = -s $standby1->logfile;
+
+my $h = $standby1->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr//, "SELECT pg_sync_replication_slots();\n");
+
+# Confirm that the slot could not be synced initially.
+$standby1->wait_for_log(
+	qr/could not synchronize replication slot \"lsub1_slot\"/,
+	$log_offset);
+
+# Enable the subscription so that the slot catches up
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
+$subscriber1->wait_for_subscription_sync;
+
+# Write the xl_running_xacts record on the primary that the standby is waiting for
+$primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot();");
+
+# Confirm from the log that the slot has been synced after becoming sync-ready.
+$standby1->wait_for_log(
+	qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
+	$log_offset);
+
+$h->quit;
+
+# Confirm that the logical failover slot is created on the standby and is
+# flagged as 'synced'
+is( $standby1->safe_psql(
+		'postgres',
+		q{SELECT count(*) = 1 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot') AND synced AND NOT temporary;}
+	),
+	"t",
+	'logical slot is synced after API retry on standby');
+
+# Drop the subscription and the tables created, then create the slot again so
+# that it can be used later.
+$subscriber1->safe_psql('postgres',"DROP SUBSCRIPTION regress_mysub1");
+$primary->safe_psql('postgres',"DROP TABLE push_wal");
+$subscriber1->safe_psql('postgres',"DROP TABLE push_wal");
+$primary->psql('postgres',
+	q{SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);}
+);
+
 ##################################################
 # Test that if the synchronized slot is invalidated while the remote slot is
 # still valid, the slot will be dropped and re-created on the standby by
@@ -281,7 +342,7 @@ $inactive_since_on_primary =
 # the failover slots.
 $primary->wait_for_replay_catchup($standby1);
 
-my $log_offset = -s $standby1->logfile;
+$log_offset = -s $standby1->logfile;
 
 # Synchronize the primary server slots to the standby.
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
@@ -941,10 +1002,10 @@ my $standby1_conninfo = $standby1->connstr . ' dbname=postgres';
 $subscriber1->safe_psql('postgres',
 	"ALTER SUBSCRIPTION regress_mysub1 CONNECTION '$standby1_conninfo';");
 
-# Confirm the synced slot 'lsub1_slot' is retained on the new primary
+# Confirm that the synced slots 'lsub1_slot' and 'snap_test_slot' are retained on the new primary
 is( $standby1->safe_psql(
 		'postgres',
 		q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'snap_test_slot') AND synced AND NOT temporary;}
 	),
 	't',
 	'synced slot retained on the new primary');
-- 
2.47.3

#91Japin Li
japinli@hotmail.com
In reply to: Ajin Cherian (#90)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

Hi, Ajin

Thanks for updating the patch.

On Mon, 27 Oct 2025 at 18:47, Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Oct 24, 2025 at 8:29 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Oct 22, 2025 at 10:25 AM Ajin Cherian <itsajin@gmail.com> wrote:

I've modified the comments to reflect the new changes.

attaching patch v18 with the above changes.

Thanks for the patch. The test is still not clear. Can we please add
the test after the "Test logical failover slots corresponding to
different plugins" test finishes, instead of adding it in between?

I've rewritten the tests again to make this possible. Attaching v19
which has the modified tap test.

Here are some comments on the new patch.

1. Given the existence of the foreach_ptr macro, we can switch these loops
from foreach to foreach_ptr.

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 1b78ffc5ff1..5db51407a82 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -872,7 +872,6 @@ fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)

 	if (slot_names != NIL)
 	{
-		ListCell   *lc;
 		bool		first_slot = true;

 		/*
@@ -880,10 +879,8 @@ fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 		 */
 		appendStringInfoString(&query, " AND slot_name IN (");

-		foreach(lc, slot_names)
+		foreach_ptr(char, slot_name, slot_names)
 		{
-			char *slot_name = (char *) lfirst(lc);
-
 			if (!first_slot)
 				appendStringInfoString(&query, ", ");

@@ -1872,15 +1869,13 @@ static List *
 extract_slot_names(List *remote_slots)
 {
 	List		*slot_names = NIL;
-	ListCell	*lc;
 	MemoryContext oldcontext;

 	/* Switch to long-lived TopMemoryContext to store slot names */
 	oldcontext = MemoryContextSwitchTo(TopMemoryContext);

-	foreach(lc, remote_slots)
+	foreach_ptr(RemoteSlot, remote_slot, remote_slots)
 	{
-		RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
 		char       *slot_name;

 		slot_name = pstrdup(remote_slot->name);
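
The nice thing about foreach_ptr is that it declares the loop variable
itself, so the explicit lfirst() cast goes away. A minimal sketch with a
made-up list (not from the patch):

	List	   *names = list_make2(pstrdup("slot_a"), pstrdup("slot_b"));

	/* foreach_ptr declares "char *name" and performs the lfirst() cast */
	foreach_ptr(char, name, names)
		elog(LOG, "slot name: %s", name);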

2. To append a single character, switch from appendStringInfoString() to the
more efficient appendStringInfoChar().

+		appendStringInfoString(&query, ")");
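
That is, roughly:

	appendStringInfoChar(&query, ')');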

3. The query memory can be released immediately after walrcv_exec() because
there are no subsequent references.

@@ -895,6 +892,7 @@ fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)

 	/* Execute the query */
 	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
+	pfree(query.data);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
@@ -975,7 +973,6 @@ fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 	}

 	walrcv_clear_result(res);
-	pfree(query.data);

 	return remote_slot_list;
 }


[2. text/x-diff; v19-0001-Improve-initial-slot-synchronization-in-pg_sync_.patch]...

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.

#92Ajin Cherian
itsajin@gmail.com
In reply to: Japin Li (#91)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Mon, Oct 27, 2025 at 8:22 PM Japin Li <japinli@hotmail.com> wrote:

Hi, Ajin

Thanks for updating the patch.

On Mon, 27 Oct 2025 at 18:47, Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Oct 24, 2025 at 8:29 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Oct 22, 2025 at 10:25 AM Ajin Cherian <itsajin@gmail.com> wrote:

I've modified the comments to reflect the new changes.

attaching patch v18 with the above changes.

Thanks for the patch. The test is still not clear. Can we please add
the test after the "Test logical failover slots corresponding to
different plugins" test finishes, instead of adding it in between?

I've rewritten the tests again to make this possible. Attaching v19
which has the modified tap test.

Here are some comments on the new patch.

1. Given the existence of the foreach_ptr macro, we can switch these loops
from foreach to foreach_ptr.

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 1b78ffc5ff1..5db51407a82 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -872,7 +872,6 @@ fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)

 	if (slot_names != NIL)
 	{
-		ListCell   *lc;
 		bool		first_slot = true;

 		/*
@@ -880,10 +879,8 @@ fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 		 */
 		appendStringInfoString(&query, " AND slot_name IN (");

-		foreach(lc, slot_names)
+		foreach_ptr(char, slot_name, slot_names)
 		{
-			char *slot_name = (char *) lfirst(lc);
-
 			if (!first_slot)
 				appendStringInfoString(&query, ", ");

@@ -1872,15 +1869,13 @@ static List *
 extract_slot_names(List *remote_slots)
 {
 	List		*slot_names = NIL;
-	ListCell	*lc;
 	MemoryContext oldcontext;

 	/* Switch to long-lived TopMemoryContext to store slot names */
 	oldcontext = MemoryContextSwitchTo(TopMemoryContext);

-	foreach(lc, remote_slots)
+	foreach_ptr(RemoteSlot, remote_slot, remote_slots)
 	{
-		RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
 		char       *slot_name;

 		slot_name = pstrdup(remote_slot->name);

2. To append a single character, switch from appendStringInfoString() to the
more efficient appendStringInfoChar().

+		appendStringInfoString(&query, ")");

3. The query memory can be released immediately after walrcv_exec() because
there are no subsequent references.

@@ -895,6 +892,7 @@ fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)

 	/* Execute the query */
 	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
+	pfree(query.data);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
@@ -975,7 +973,6 @@ fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 	}

 	walrcv_clear_result(res);
-	pfree(query.data);

 	return remote_slot_list;
 }
Thanks for your review, Japin. Here's patch v20 addressing the comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v20-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v20-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From f96cd099461499d56fcde53419fd27a866bdd83d Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Mon, 27 Oct 2025 18:46:10 +1100
Subject: [PATCH v20] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  12 +-
 src/backend/replication/logical/slotsync.c    | 356 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 .../t/040_standby_failover_slots_sync.pl      |  72 +++-
 5 files changed, 384 insertions(+), 62 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index b803a819cf1..b964937d509 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,15 +405,13 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries cyclically to some extent, continuing until all
+      the failover slots that existed on the primary at the start of the function
+      call are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
      </para>
     </note>
 
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index b122d99b009..4d43a7eae21 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the pg_sync_replication_slots() API is used to sync the slots, and if the slots
+ * are not ready to be synced and are marked as RS_TEMPORARY because of any of
+ * the reasons mentioned above, then the API also waits and retries until the
+ * slots are marked as RS_PERSISTENT (which means sync-ready). Refer to the
+ * comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -100,6 +107,16 @@ typedef struct SlotSyncCtxStruct
 	slock_t		mutex;
 } SlotSyncCtxStruct;
 
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+	WalReceiverConn *wrconn;
+	List			*slot_names;
+} SlotSyncApiFailureParams;
+
 static SlotSyncCtxStruct *SlotSyncCtx = NULL;
 
 /* GUC variable */
@@ -147,6 +164,7 @@ typedef struct RemoteSlot
 
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
+static void slotsync_api_reread_config(void);
 
 /*
  * If necessary, update the local synced slot's metadata based on the data
@@ -553,11 +571,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -576,11 +598,18 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		/*
 		 * The remote slot didn't catch up to locally reserved position.
 		 *
-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the API can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -595,6 +624,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that the API can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -618,10 +651,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +752,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -784,7 +822,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -795,15 +834,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
+ *
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from the primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -812,29 +859,45 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach_ptr(char, slot_name, slot_names)
+		{
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoChar(&query, ')');
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
+	pfree(query.data);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -885,7 +948,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -905,12 +967,37 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by pg_sync_replication_slots
+ * 							  API to track if any slots could not be
+ * 							  persisted and need to be retried.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -926,19 +1013,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1186,6 +1266,26 @@ ProcessSlotSyncInterrupts(void)
 		slotsync_reread_config();
 }
 
+/*
+ * Interrupt handler for pg_sync_replication_slots() API.
+ */
+static void
+ProcessSlotSyncAPIInterrupts(void)
+{
+	CHECK_FOR_INTERRUPTS();
+
+	/* If we've been promoted, then no point continuing. */
+	if (SlotSyncCtx->stopSignaled)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot continue replication slots synchronization"
+						" as standby promotion is triggered")));
+
+	/* error out if configuration parameters changed */
+	if (ConfigReloadPending)
+		slotsync_api_reread_config();
+}
+
 /*
  * Connection cleanup function for slotsync worker.
  *
@@ -1275,7 +1375,7 @@ wait_for_slot_activity(bool some_slot_updated)
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1505,10 +1605,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1711,7 +1828,8 @@ SlotSyncShmemInit(void)
 static void
 slotsync_failure_callback(int code, Datum arg)
 {
-	WalReceiverConn *wrconn = (WalReceiverConn *) DatumGetPointer(arg);
+	SlotSyncApiFailureParams *fparams =
+		(SlotSyncApiFailureParams *) DatumGetPointer(arg);
 
 	/*
 	 * We need to do slots cleanup here just like WalSndErrorCleanup() does.
@@ -1738,23 +1856,169 @@ slotsync_failure_callback(int code, Datum arg)
 	if (syncing_slots)
 		reset_syncing_flag();
 
-	walrcv_disconnect(wrconn);
+	if (fparams->slot_names)
+		list_free_deep(fparams->slot_names);
+
+	walrcv_disconnect(fparams->wrconn);
+}
+
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+	MemoryContext oldcontext;
+
+	/* Switch to long-lived TopMemoryContext to store slot names */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+	foreach_ptr(RemoteSlot, remote_slot, remote_slots)
+	{
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return slot_names;
+}
+
+/*
+ * Re-read the config file and check for critical parameter changes.
+ *
+ */
+static void
+slotsync_api_reread_config(void)
+{
+	char       *old_primary_conninfo = pstrdup(PrimaryConnInfo);
+	char       *old_primary_slotname = pstrdup(PrimarySlotName);
+	bool        old_hot_standby_feedback = hot_standby_feedback;
+	bool        conninfo_changed;
+	bool        primary_slotname_changed;
+
+	ConfigReloadPending = false;
+	ProcessConfigFile(PGC_SIGHUP);
+
+	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
+	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
+	pfree(old_primary_conninfo);
+	pfree(old_primary_slotname);
+
+	/* throw error for certain parameter changes */
+	if (conninfo_changed ||
+		primary_slotname_changed ||
+		(old_hot_standby_feedback != hot_standby_feedback))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIG_FILE_ERROR),
+				 errmsg("cannot continue slot synchronization due"
+						" to parameter changes"),
+				 errdetail("One or more of primary_conninfo,"
+						   " primary_slot_name or hot_standby_feedback"
+						   " were modified"),
+				 errhint("Retry pg_sync_replication_slots() to use the"
+						 " updated configuration.")));
+	}
 }
 
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready".
+ * Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
-	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	SlotSyncApiFailureParams fparams;
+
+	fparams.wrconn = wrconn;
+	fparams.slot_names = NULL;
+
+	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	started_tx = false;
+			bool	slot_persistence_pending = false;
+			bool	some_slot_updated = false;
+
+			/* Reset flag before every iteration */
+			slot_persistence_pending = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncAPIInterrupts();
+
+			/*
+			 * The syscache access in fetch_remote_slots() needs a
+			 * transaction env.
+			 */
+			if (!IsTransactionState()) {
+				StartTransactionCommand();
+				started_tx = true;
+			}
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				slot_names = extract_slot_names(remote_slots);
+
+				/* Update the failure structure so that it can be freed on error */
+				fparams.slot_names = slot_names;
+			}
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Commit transaction if we started it */
+			if (started_tx)
+				CommitTransactionCommand();
+
+			/* Done if all slots are persisted i.e are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* wait before retrying again */
+			wait_for_slot_activity(some_slot_updated);
+
+		}
+
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
@@ -1762,5 +2026,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 7553f6eacef..16b3b04d3c4 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch up."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 3059bb8177b..4d42defa156 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -211,21 +211,82 @@ is( $standby1->safe_psql(
 	'synchronized slot has got its own inactive_since');
 
 ##################################################
-# Test that the synchronized slot will be dropped if the corresponding remote
-# slot on the primary server has been dropped.
+# Test that the synchronized slots will be dropped if the corresponding remote
+# slots on the primary server have been dropped.
 ##################################################
 
 $primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub2_slot');");
+$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub1_slot');");
 
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
 
 is( $standby1->safe_psql(
 		'postgres',
-		q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name = 'lsub2_slot';}
+		q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'lsub2_slot');}
 	),
 	"t",
 	'synchronized slot has been dropped');
 
+##################################################
+# Test that pg_sync_replication_slots() on the standby waits and retries
+# until the slot becomes sync-ready (when the standby catches up to the
+# slot's restart_lsn).
+##################################################
+
+# Recreate the slot by creating a subscription on the subscriber; keep it disabled.
+$subscriber1->safe_psql('postgres', qq[
+	CREATE TABLE push_wal (a int);
+	CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, failover = true, enabled = false);]);
+
+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Attempt to synchronize slots using API. This will initially fail because
+# the slot is not yet sync-ready (standby hasn't caught up to slot's restart_lsn),
+# but the API will wait and retry. Call the API in a background process.
+my $log_offset = -s $standby1->logfile;
+
+my $h = $standby1->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr//, "SELECT pg_sync_replication_slots();\n");
+
+# Confirm that the slot could not be synced initially.
+$standby1->wait_for_log(
+    qr/could not synchronize replication slot \"lsub1_slot\"/,
+    $log_offset);
+
+# Enable the Subscription, so that the slot catches up
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
+$subscriber1->wait_for_subscription_sync;
+
+# Create xl_running_xacts records on the primary for which the standby is waiting
+$primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot();");
+
+# Confirm from the log that the slot has been synced after becoming sync-ready.
+$standby1->wait_for_log(
+    qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
+    $log_offset);
+
+$h->quit;
+
+# Confirm that the logical failover slot is created on the standby and is
+# flagged as 'synced'
+is( $standby1->safe_psql(
+		'postgres',
+		q{SELECT count(*) = 1 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot') AND synced AND NOT temporary;}
+	),
+	"t",
+	'logical slots are synced after API retry on standby');
+
+# Drop the subscription and the tables created, and create the slot again so
+# that it can be used later.
+$subscriber1->safe_psql('postgres',"DROP SUBSCRIPTION regress_mysub1");
+$primary->safe_psql('postgres',"DROP TABLE push_wal");
+$subscriber1->safe_psql('postgres',"DROP TABLE push_wal");
+$primary->psql('postgres',
+	q{SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);}
+);
+
 ##################################################
 # Test that if the synchronized slot is invalidated while the remote slot is
 # still valid, the slot will be dropped and re-created on the standby by
@@ -281,7 +342,7 @@ $inactive_since_on_primary =
 # the failover slots.
 $primary->wait_for_replay_catchup($standby1);
 
-my $log_offset = -s $standby1->logfile;
+$log_offset = -s $standby1->logfile;
 
 # Synchronize the primary server slots to the standby.
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
@@ -941,10 +1002,11 @@ my $standby1_conninfo = $standby1->connstr . ' dbname=postgres';
 $subscriber1->safe_psql('postgres',
 	"ALTER SUBSCRIPTION regress_mysub1 CONNECTION '$standby1_conninfo';");
 
-# Confirm the synced slot 'lsub1_slot' is retained on the new primary
+# Confirm that the synced slots 'lsub1_slot' and 'snap_test_slot' are retained on the new primary
 is( $standby1->safe_psql(
 		'postgres',
 		q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'snap_test_slot') AND synced AND NOT temporary;}
+
 	),
 	't',
 	'synced slot retained on the new primary');
-- 
2.47.3

#93Chao Li
li.evan.chao@gmail.com
In reply to: Ajin Cherian (#92)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

Hi Ajin,

I have reviewed v20 and got a few comments:

On Oct 30, 2025, at 18:18, Ajin Cherian <itsajin@gmail.com> wrote:

<v20-0001-Improve-initial-slot-synchronization-in-pg_sync_.patch>

1 - slotsync.c
```
+		if (slot_names)
+			list_free_deep(slot_names);
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
@@ -1762,5 +2026,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
```

I am afraid there is a risk of a double free. slot_names is assigned to fparams.slot_names within the for loop, and it is freed after the loop. If something goes wrong and slotsync_failure_callback() is called, the function will free fparams.slot_names again.

2 - slotsync.c
```
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				slot_names = extract_slot_names(remote_slots);
+
+				/* Update the failure structure so that it can be freed on error */
+				fparams.slot_names = slot_names;
+			}
```

I am wondering whether that could be a problem. Since extract_slot_names() is now called only in the first iteration, if a slot is dropped and a new slot is created with the same name, will the new slot be incorrectly synced?

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#94Japin Li
japinli@hotmail.com
In reply to: Ajin Cherian (#92)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, 30 Oct 2025 at 21:18, Ajin Cherian <itsajin@gmail.com> wrote:

On Mon, Oct 27, 2025 at 8:22 PM Japin Li <japinli@hotmail.com> wrote:

Hi, Ajin

Thanks for updating the patch.

On Mon, 27 Oct 2025 at 18:47, Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Oct 24, 2025 at 8:29 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Oct 22, 2025 at 10:25 AM Ajin Cherian <itsajin@gmail.com> wrote:

I've modified the comments to reflect the new changes.

attaching patch v18 with the above changes.

Thanks for the patch. The test is still not clear. Can we please add
the test after the test of "Test logical failover slots corresponding
to different plugins" finishes instead of adding it in between?

I've rewritten the tests again to make this possible. Attaching v19
which has the modified tap test.

Here are some comments on the new patch.

1. Given the existence of the foreach_ptr macro, we can switch the usage of
foreach to foreach_ptr.

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 1b78ffc5ff1..5db51407a82 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -872,7 +872,6 @@ fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)

if (slot_names != NIL)
{
- ListCell *lc;
bool first_slot = true;

/*
@@ -880,10 +879,8 @@ fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
*/
appendStringInfoString(&query, " AND slot_name IN (");

-               foreach(lc, slot_names)
+               foreach_ptr(char, slot_name, slot_names)
{
-                       char *slot_name = (char *) lfirst(lc);
-
if (!first_slot)
appendStringInfoString(&query, ", ");

@@ -1872,15 +1869,13 @@ static List *
extract_slot_names(List *remote_slots)
{
List *slot_names = NIL;
- ListCell *lc;
MemoryContext oldcontext;

/* Switch to long-lived TopMemoryContext to store slot names */
oldcontext = MemoryContextSwitchTo(TopMemoryContext);

-       foreach(lc, remote_slots)
+       foreach_ptr(RemoteSlot, remote_slot, remote_slots)
{
-               RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
char       *slot_name;

slot_name = pstrdup(remote_slot->name);

2. To append a single character, switch from appendStringInfoString() to the
more efficient appendStringInfoChar().

+ appendStringInfoString(&query, ")");

3. The query memory can be released immediately after walrcv_exec() because
there are no subsequent references.

@@ -895,6 +892,7 @@ fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)

/* Execute the query */
res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
+ pfree(query.data);
if (res->status != WALRCV_OK_TUPLES)
ereport(ERROR,
errmsg("could not fetch failover logical slots info from the primary server: %s",
@@ -975,7 +973,6 @@ fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
}

walrcv_clear_result(res);
- pfree(query.data);

return remote_slot_list;
}

Thanks for your review, Japin. Here's patch v20 addressing the comments.

Thanks for updating the patch. Here are some comments on v20.

1. Since the content is unchanged, no modification is needed here.

-		 * We do not drop the slot because the restart_lsn can be ahead of the
-		 * current location when recreating the slot in the next cycle. It may
-		 * take more time to create such a slot. Therefore, we keep this slot
-		 * and attempt the synchronization in the next cycle.
+		 * We do not drop the slot because the restart_lsn can be
+		 * ahead of the current location when recreating the slot in
+		 * the next cycle. It may take more time to create such a
+		 * slot. Therefore, we keep this slot and attempt the
+		 * synchronization in the next cycle.

2. Could we align the parameter comment style for synchronize_slots() and
fetch_remote_slots() for better consistency?

3. Is this redundant? It was already initialized to false during declaration.

+			/* Reset flag before every iteration */
+			slot_persistence_pending = false;

4. A minor nitpick. The opening brace should be on a new line for style
consistency.

+			if (!IsTransactionState()) {
+				StartTransactionCommand();
+				started_tx = true;
+			}

5. Given that fparams.slot_names is a list, I suggest we replace NULL with NIL
for type consistency.

+ fparams.slot_names = NULL;

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.

#95Japin Li
japinli@hotmail.com
In reply to: Chao Li (#93)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, 30 Oct 2025 at 19:15, Chao Li <li.evan.chao@gmail.com> wrote:

Hi Ajin,

I have reviewed v20 and got a few comments:

On Oct 30, 2025, at 18:18, Ajin Cherian <itsajin@gmail.com> wrote:

<v20-0001-Improve-initial-slot-synchronization-in-pg_sync_.patch>

1 - slotsync.c
```
+		if (slot_names)
+			list_free_deep(slot_names);
/* Cleanup the synced temporary slots */
ReplicationSlotCleanup(true);
@@ -1762,5 +2026,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
/* We are done with sync, so reset sync flag */
reset_syncing_flag();
}
-	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
```

I am afraid there is a risk of a double free. slot_names is assigned to fparams.slot_names within the for loop, and it is freed after the loop. If something goes wrong and slotsync_failure_callback() is called, the function will free fparams.slot_names again.

Agreed.

Maybe we should set fparams.slot_names to NIL immediately after freeing
the memory.
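
Something like this minimal, untested sketch against the v20 code:

```
		if (slot_names)
		{
			list_free_deep(slot_names);

			/*
			 * Clear the copy kept in fparams as well, so that
			 * slotsync_failure_callback() cannot free the list a
			 * second time if an error is raised during the cleanup
			 * that follows.
			 */
			fparams.slot_names = NIL;
		}
```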

2 - slotsync.c
```
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				slot_names = extract_slot_names(remote_slots);
+
+				/* Update the failure structure so that it can be freed on error */
+				fparams.slot_names = slot_names;
+			}
```

I am wondering whether that could be a problem. Since extract_slot_names() is now called only in the first iteration, if a slot is dropped and a new slot is created with the same name, will the new slot be incorrectly synced?

The slot name alone is insufficient to distinguish between the old and new
slots. In this case, the new slot state will overwrite the old. I see no
harm in this behavior, but please confirm whether this is the desired behavior.

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.

#96shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#92)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, Oct 30, 2025 at 3:48 PM Ajin Cherian <itsajin@gmail.com> wrote:

Thanks for your review, Japin. Here's patch v20 addressing the comments.

Thank you for the patch. Please find a few comments on the test:

1)
+# until the slot becomes sync-ready (when the standby catches up to the
+# slot's restart_lsn).

I think it should be 'when the primary server catches up' or 'when the
remote slot catches up with the locally reserved position.'

2)
+# Attempt to synchronize slots using API. This will initially fail because
+# the slot is not yet sync-ready (standby hasn't caught up to slot's
restart_lsn),
+# but the API will wait and retry. Call the API in a background process.

a)
'This will initially fail ' seems like the API will give an error,
which is not the case

b) 'standby hasn't caught up to slot's restart_lsn' is not correct.

We can rephrase to:
# Attempt to synchronize slots using the API. The API will continue
retrying synchronization until the remote slot catches up with the
locally reserved position.

3)
+# Enable the Subscription, so that the slot catches up

slot --> remote slot

4)
+# Create xl_running_xacts records on the primary for which the
standby is waiting

Shall we rephrase to below or anything better if you have?:
Create xl_running_xacts on the primary to speed up restart_lsn advancement.

5)
+# Confirm that the logical failover slot is created on the standby and is
+# flagged as 'synced'

Suggestion:
Verify that the logical failover slot is created on the standby,
marked as 'synced', and persisted.

(It is important to mention persisted because even a temporary slot is
marked as synced)

thanks
Shveta

#97shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#96)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Oct 31, 2025 at 11:04 AM shveta malik <shveta.malik@gmail.com> wrote:

On Thu, Oct 30, 2025 at 3:48 PM Ajin Cherian <itsajin@gmail.com> wrote:

Thanks for your review, Japin. Here's patch v20 addressing the comments.

Thank you for the patch. Please find a few comments on the test:

1)
+# until the slot becomes sync-ready (when the standby catches up to the
+# slot's restart_lsn).

I think it should be 'when the primary server catches up' or 'when the
remote slot catches up with the locally reserved position.'

2)
+# Attempt to synchronize slots using API. This will initially fail because
+# the slot is not yet sync-ready (standby hasn't caught up to slot's
restart_lsn),
+# but the API will wait and retry. Call the API in a background process.

a)
'This will initially fail ' seems like the API will give an error,
which is not the case

b) 'standby hasn't caught up to slot's restart_lsn' is not correct.

We can rephrase to:
# Attempt to synchronize slots using the API. The API will continue
retrying synchronization until the remote slot catches up with the
locally reserved position.

3)
+# Enable the Subscription, so that the slot catches up

slot --> remote slot

4)
+# Create xl_running_xacts records on the primary for which the
standby is waiting

Shall we rephrase to below or anything better if you have?:
Create xl_running_xacts on the primary to speed up restart_lsn advancement.

5)
+# Confirm that the logical failover slot is created on the standby and is
+# flagged as 'synced'

Suggestion:
Verify that the logical failover slot is created on the standby,
marked as 'synced', and persisted.

(It is important to mention persisted because even a temporary slot is
marked as synced)

Shall we remove this change, as it does not directly belong to the current
patch? I think it was a suggestion earlier, but we should remove it now.

6)
-# Confirm the synced slot 'lsub1_slot' is retained on the new primary
+# Confirm that the synced slots 'lsub1_slot' and 'snap_test_slot' are
retained on the new primary
 is( $standby1->safe_psql(
  'postgres',
  q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN
('lsub1_slot', 'snap_test_slot') AND synced AND NOT temporary;}
+

thanks
Shveta

#98Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#97)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, Oct 30, 2025 at 10:16 PM Chao Li <li.evan.chao@gmail.com> wrote:

Hi Ajin,

I have reviewed v20 and got a few comments:

On Oct 30, 2025, at 18:18, Ajin Cherian <itsajin@gmail.com> wrote:

<v20-0001-Improve-initial-slot-synchronization-in-pg_sync_.patch>

1 - slotsync.c
```
+               if (slot_names)
+                       list_free_deep(slot_names);
/* Cleanup the synced temporary slots */
ReplicationSlotCleanup(true);
@@ -1762,5 +2026,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
/* We are done with sync, so reset sync flag */
reset_syncing_flag();
}
-       PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+       PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
```

I am afraid there is a risk of a double free. slot_names is assigned to fparams.slot_names within the for loop, and it is freed after the loop. If something goes wrong and slotsync_failure_callback() is called, the function will free fparams.slot_names again.

Yes, good catch. I have changed the code to set fparams.slot_names to NIL
after freeing it, so that it isn't freed again in slotsync_failure_callback().

2 - slotsync.c
```
+                       /*
+                        * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+                        * fetch all failover-enabled slots. Note that we reuse slot_names from
+                        * the first iteration; re-fetching all failover slots each time could
+                        * cause an endless loop. Instead of reprocessing only the pending slots
+                        * in each iteration, it's better to process all the slots received in
+                        * the first iteration. This ensures that by the time we're done, all
+                        * slots reflect the latest values.
+                        */
+                       remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+                       /* Attempt to synchronize slots */
+                       some_slot_updated = synchronize_slots(wrconn, remote_slots,
+                                                                                                 &slot_persistence_pending);
+
+                       /*
+                        * If slot_persistence_pending is true, extract slot names
+                        * for future iterations (only needed if we haven't done it yet)
+                        */
+                       if (slot_names == NIL && slot_persistence_pending)
+                       {
+                               slot_names = extract_slot_names(remote_slots);
+
+                               /* Update the failure structure so that it can be freed on error */
+                               fparams.slot_names = slot_names;
+                       }
```

I am wondering whether that could be a problem. Since extract_slot_names() is now called only in the first iteration, if a slot is dropped and a new slot is created with the same name, will the new slot be incorrectly synced?

It doesn't matter, because the new slot will have a later restart_lsn
and xmin anyway, and all other attributes of the slot are also updated
as part of the sync. So the old slot on the standby will end up
matching the new slot on the primary.
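
As an illustration, consider a hypothetical session that reuses the slot
and plugin names from the test:

```
-- On the primary: drop a failover slot and recreate one with the same name.
SELECT pg_drop_replication_slot('lsub1_slot');
SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput',
                                          false, false, true);

-- On the standby, the next retry iteration of pg_sync_replication_slots()
-- fetches 'lsub1_slot' again and overwrites the local slot's restart_lsn,
-- catalog_xmin and the other attributes with the new remote values.
```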

On Fri, Oct 31, 2025 at 3:42 PM Japin Li <japinli@hotmail.com> wrote:

Thanks for updating the patch. Here are some comments on v20.

1. Since the content is unchanged, no modification is needed here.

-                * We do not drop the slot because the restart_lsn can be ahead of the
-                * current location when recreating the slot in the next cycle. It may
-                * take more time to create such a slot. Therefore, we keep this slot
-                * and attempt the synchronization in the next cycle.
+                * We do not drop the slot because the restart_lsn can be
+                * ahead of the current location when recreating the slot in
+                * the next cycle. It may take more time to create such a
+                * slot. Therefore, we keep this slot and attempt the
+                * synchronization in the next cycle.

Changed.

2. Could we align the parameter comment style for synchronize_slots() and
fetch_remote_slots() for better consistency?

Fixed.

3. Is this redundant? It was already initialized to false during declaration.

+                       /* Reset flag before every iteration */
+                       slot_persistence_pending = false;

Removed.

4. A minor nitpick. The opening brace should be on a new line for style
consistency.

+                       if (!IsTransactionState()) {
+                               StartTransactionCommand();
+                               started_tx = true;
+                       }

Fixed.

5. Given that fparams.slot_names is a list, I suggest we replace NULL with NIL
for type consistency.

+ fparams.slot_names = NULL;

Changed.

On Fri, Oct 31, 2025 at 4:34 PM shveta malik <shveta.malik@gmail.com> wrote:

On Thu, Oct 30, 2025 at 3:48 PM Ajin Cherian <itsajin@gmail.com> wrote:

Thanks for your review, Japin. Here's patch v20 addressing the comments.

Thank you for the patch. Please find a few comments on the test:

1)
+# until the slot becomes sync-ready (when the standby catches up to the
+# slot's restart_lsn).

I think it should be 'when the primary server catches up' or 'when the
remote slot catches up with the locally reserved position.'

Changed.

2)
+# Attempt to synchronize slots using API. This will initially fail because
+# the slot is not yet sync-ready (standby hasn't caught up to slot's
restart_lsn),
+# but the API will wait and retry. Call the API in a background process.

a)
'This will initially fail ' seems like the API will give an error,
which is not the case

b) 'standby hasn't caught up to slot's restart_lsn' is not correct.

We can rephrase to:
# Attempt to synchronize slots using the API. The API will continue
retrying synchronization until the remote slot catches up with the
locally reserved position.

changed accordingly.

3)
+# Enable the Subscription, so that the slot catches up

slot --> remote slot

4)
+# Create xl_running_xacts records on the primary for which the
standby is waiting

Shall we rephrase to below or anything better if you have?:
Create xl_running_xacts on the primary to speed up restart_lsn advancement.

5)
+# Confirm that the logical failover slot is created on the standby and is
+# flagged as 'synced'

Suggestion:
Verify that the logical failover slot is created on the standby,
marked as 'synced', and persisted.

(It is important to mention persisted because even a temporary slot is
marked as synced)

changed as recommended.

I have addressed the above comments in patch v21.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v21-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v21-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 6787be3cd52ced6010a736203b0608ccc9212909 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Mon, 3 Nov 2025 20:50:14 +1100
Subject: [PATCH v21] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  12 +-
 src/backend/replication/logical/slotsync.c    | 349 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 .../t/040_standby_failover_slots_sync.pl      |  71 +++-
 5 files changed, 381 insertions(+), 57 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index b803a819cf1..b964937d509 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,15 +405,13 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries cyclically to some extent, continuing until all
+      the failover slots that existed on the primary at the start of the function
+      call are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
      </para>
     </note>
 
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index b122d99b009..ec25993346e 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the pg_sync_replication_slots() API is used to sync the slots, and if the slots
+ * are not ready to be synced and are marked as RS_TEMPORARY because of any of
+ * the reasons mentioned above, then the API also waits and retries until the
+ * slots are marked as RS_PERSISTENT (which means sync-ready). Refer to the
+ * comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -100,6 +107,16 @@ typedef struct SlotSyncCtxStruct
 	slock_t		mutex;
 } SlotSyncCtxStruct;
 
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+	WalReceiverConn *wrconn;
+	List			*slot_names;
+} SlotSyncApiFailureParams;
+
 static SlotSyncCtxStruct *SlotSyncCtx = NULL;
 
 /* GUC variable */
@@ -147,6 +164,7 @@ typedef struct RemoteSlot
 
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
+static void slotsync_api_reread_config(void);
 
 /*
  * If necessary, update the local synced slot's metadata based on the data
@@ -553,11 +571,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -580,7 +602,13 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		 * current location when recreating the slot in the next cycle. It may
 		 * take more time to create such a slot. Therefore, we keep this slot
 		 * and attempt the synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the API can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -595,6 +623,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that the API can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -618,10 +650,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +751,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -784,7 +821,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -795,15 +833,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
+ *
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from the primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -812,29 +858,45 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach_ptr(char, slot_name, slot_names)
+		{
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoChar(&query, ')');
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
+	pfree(query.data);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -885,7 +947,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -905,12 +966,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by pg_sync_replication_slots
+ * 							  API to track if any slots could not be
+ * 							  persisted and need to be retried.
+ *
+ * Returns:
+ * TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -926,19 +1013,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1186,6 +1266,26 @@ ProcessSlotSyncInterrupts(void)
 		slotsync_reread_config();
 }
 
+/*
+ * Interrupt handler for pg_sync_replication_slots() API.
+ */
+static void
+ProcessSlotSyncAPIInterrupts(void)
+{
+	CHECK_FOR_INTERRUPTS();
+
+	/* If we've been promoted, then no point continuing. */
+	if (SlotSyncCtx->stopSignaled)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot continue replication slots synchronization"
+						" as standby promotion is triggered")));
+
+	/* error out if configuration parameters changed */
+	if (ConfigReloadPending)
+		slotsync_api_reread_config();
+}
+
 /*
  * Connection cleanup function for slotsync worker.
  *
@@ -1275,7 +1375,7 @@ wait_for_slot_activity(bool some_slot_updated)
 	rc = WaitLatch(MyLatch,
 				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 				   sleep_ms,
-				   WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+				   WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);
 
 	if (rc & WL_LATCH_SET)
 		ResetLatch(MyLatch);
@@ -1505,10 +1605,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1711,7 +1828,8 @@ SlotSyncShmemInit(void)
 static void
 slotsync_failure_callback(int code, Datum arg)
 {
-	WalReceiverConn *wrconn = (WalReceiverConn *) DatumGetPointer(arg);
+	SlotSyncApiFailureParams *fparams =
+		(SlotSyncApiFailureParams *) DatumGetPointer(arg);
 
 	/*
 	 * We need to do slots cleanup here just like WalSndErrorCleanup() does.
@@ -1738,23 +1856,170 @@ slotsync_failure_callback(int code, Datum arg)
 	if (syncing_slots)
 		reset_syncing_flag();
 
-	walrcv_disconnect(wrconn);
+	if (fparams->slot_names)
+		list_free_deep(fparams->slot_names);
+
+	walrcv_disconnect(fparams->wrconn);
+}
+
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+	MemoryContext oldcontext;
+
+	/* Switch to long-lived TopMemoryContext to store slot names */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+	foreach_ptr(RemoteSlot, remote_slot, remote_slots)
+	{
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return slot_names;
+}
+
+/*
+ * Re-read the config file and check for critical parameter changes.
+ *
+ */
+static void
+slotsync_api_reread_config(void)
+{
+	char       *old_primary_conninfo = pstrdup(PrimaryConnInfo);
+	char       *old_primary_slotname = pstrdup(PrimarySlotName);
+	bool        old_hot_standby_feedback = hot_standby_feedback;
+	bool        conninfo_changed;
+	bool        primary_slotname_changed;
+
+	ConfigReloadPending = false;
+	ProcessConfigFile(PGC_SIGHUP);
+
+	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
+	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
+	pfree(old_primary_conninfo);
+	pfree(old_primary_slotname);
+
+	/* throw error for certain parameter changes */
+	if (conninfo_changed ||
+		primary_slotname_changed ||
+		(old_hot_standby_feedback != hot_standby_feedback))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIG_FILE_ERROR),
+				 errmsg("cannot continue slot synchronization due"
+						" to parameter changes"),
+				 errdetail("One or more of primary_conninfo,"
+						   " primary_slot_name or hot_standby_feedback"
+						   " were modified"),
+				 errhint("Retry pg_sync_replication_slots() to use the"
+						 " updated configuration.")));
+	}
 }
 
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready".
+ * Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
-	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	SlotSyncApiFailureParams fparams;
+
+	fparams.wrconn = wrconn;
+	fparams.slot_names = NIL;
+
+	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	started_tx = false;
+			bool	slot_persistence_pending = false;
+			bool	some_slot_updated = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncAPIInterrupts();
+
+			/*
+			 * The syscache access in fetch_remote_slots() needs a
+			 * transaction env.
+			 */
+			if (!IsTransactionState())
+			{
+				StartTransactionCommand();
+				started_tx = true;
+			}
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				slot_names = extract_slot_names(remote_slots);
+
+				/* Update the failure structure so that it can be freed on error */
+				fparams.slot_names = slot_names;
+			}
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Commit transaction if we started it */
+			if (started_tx)
+				CommitTransactionCommand();
+
+			/* Done if all slots are persisted i.e are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* wait before retrying again */
+			wait_for_slot_activity(some_slot_updated);
+
+		}
+
+		if (slot_names)
+		{
+			list_free_deep(slot_names);
+			fparams.slot_names = slot_names = NIL;
+		}
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
@@ -1762,5 +2027,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 7553f6eacef..16b3b04d3c4 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_PRIMARY_CATCHUP	"Waiting for the primary to catch-up."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 3059bb8177b..94660fabd7a 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -211,21 +211,83 @@ is( $standby1->safe_psql(
 	'synchronized slot has got its own inactive_since');
 
 ##################################################
-# Test that the synchronized slot will be dropped if the corresponding remote
-# slot on the primary server has been dropped.
+# Test that the synchronized slots will be dropped if the corresponding remote
+# slots on the primary server have been dropped.
 ##################################################
 
 $primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub2_slot');");
+$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub1_slot');");
 
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
 
 is( $standby1->safe_psql(
 		'postgres',
-		q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name = 'lsub2_slot';}
+		q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'lsub2_slot');}
 	),
 	"t",
 	'synchronized slot has been dropped');
 
+##################################################
+# Test that pg_sync_replication_slots() on the standby waits and retries
+# until the slot becomes sync-ready (when the remote slot catches up with
+# the locally reserved position).
+##################################################
+
+# Recreate the slot by creating a subscription on the subscriber, keep it disabled.
+$subscriber1->safe_psql('postgres', qq[
+	CREATE TABLE push_wal (a int);
+	CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, failover = true, enabled = false);]);
+
+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Attempt to synchronize slots using API. The API will continue retrying
+# synchronization until the remote slot catches up with the locally reserved
+# position. The API will not return until this happens, to be able to make
+# further calls, call the API in a background process.
+my $log_offset = -s $standby1->logfile;
+
+my $h = $standby1->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr//, "SELECT pg_sync_replication_slots();\n");
+
+# Confirm that the slot could not be synced initially.
+$standby1->wait_for_log(
+    qr/could not synchronize replication slot \"lsub1_slot\"/,
+    $log_offset);
+
+# Enable the Subscription, so that the remote slot catches up
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
+$subscriber1->wait_for_subscription_sync;
+
+# Create xl_running_xacts on the primary to speed up restart_lsn advancement.
+$primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot();");
+
+# Confirm log that the slot has been synced after becoming sync-ready.
+$standby1->wait_for_log(
+    qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
+    $log_offset);
+
+$h->quit;
+
+# Verify that the logical failover slot is created on the standby,
+# marked as 'synced', and persisted.
+is( $standby1->safe_psql(
+		'postgres',
+		q{SELECT count(*) = 1 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot') AND synced AND NOT temporary;}
+	),
+	"t",
+	'logical slots are synced after API retry on standby');
+
+# Drop the subscription and the tables created, create the slot again so that it can
+# be used later.
+$subscriber1->safe_psql('postgres',"DROP SUBSCRIPTION regress_mysub1");
+$primary->safe_psql('postgres',"DROP TABLE push_wal");
+$subscriber1->safe_psql('postgres',"DROP TABLE push_wal");
+$primary->psql('postgres',
+	q{SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);}
+);
+
 ##################################################
 # Test that if the synchronized slot is invalidated while the remote slot is
 # still valid, the slot will be dropped and re-created on the standby by
@@ -281,7 +343,7 @@ $inactive_since_on_primary =
 # the failover slots.
 $primary->wait_for_replay_catchup($standby1);
 
-my $log_offset = -s $standby1->logfile;
+$log_offset = -s $standby1->logfile;
 
 # Synchronize the primary server slots to the standby.
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
@@ -945,6 +1007,7 @@ $subscriber1->safe_psql('postgres',
 is( $standby1->safe_psql(
 		'postgres',
 		q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'snap_test_slot') AND synced AND NOT temporary;}
+
 	),
 	't',
 	'synced slot retained on the new primary');
-- 
2.47.3

#99shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#98)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

I have addressed the above comments in patch v21.

Thank You. Please find a few comments:

1)
+ fparams.slot_names = slot_names = NIL;

I think it is not needed to set slot_names to NIL.

2)
-    WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+    WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);

The new name does not seem appropriate. For the slotsync-worker case,
even when the primary is not behind, the worker still waits but it is
not waiting for primary to catch-up. I could not find a better name
except the original one 'WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN'. We can
change the explanation to :

"Waiting in main loop of slot sync worker and slot sync API."
Or
"Waiting in main loop of slot synchronization."

If anyone has any better name suggestions, we can consider changing.

3)

+# Attempt to synchronize slots using API. The API will continue retrying
+# synchronization until the remote slot catches up with the locally reserved
+# position. The API will not return until this happens, to be able to make
+# further calls, call the API in a background process.

Shall we remove 'with the locally reserved position', it’s already
explained in the test header and the comment is good enough even
without it.

4)
+# Confirm log that the slot has been synced after becoming sync-ready.

Shall we just say:
Confirm from the log that the slot is sync-ready now.

5)
# Synchronize the primary server slots to the standby.
$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
@@ -945,6 +1007,7 @@ $subscriber1->safe_psql('postgres',
is( $standby1->safe_psql(
'postgres',
q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN
('lsub1_slot', 'snap_test_slot') AND synced AND NOT temporary;}
+
),

Redundant change.

thanks
Shveta

#100Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#99)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Nov 4, 2025 at 5:23 PM shveta malik <shveta.malik@gmail.com> wrote:

I have addressed the above comments in patch v21.

Thank You. Please find a few comments:

1)
+ fparams.slot_names = slot_names = NIL;

I think it is not needed to set slot_names to NIL.

2)
-    WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+    WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);

The new name does not seem appropriate. For the slotsync-worker case,
even when the primary is not behind, the worker still waits but it is
not waiting for primary to catch-up. I could not find a better name
except the original one 'WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN'. We can
change the explanation to :

"Waiting in main loop of slot sync worker and slot sync API."
Or
"Waiting in main loop of slot synchronization."

If anyone has any better name suggestions, we can consider changing.

Changed as suggested above.

3)

+# Attempt to synchronize slots using API. The API will continue retrying
+# synchronization until the remote slot catches up with the locally reserved
+# position. The API will not return until this happens, to be able to make
+# further calls, call the API in a background process.

Shall we remove 'with the locally reserved position', it’s already
explained in the test header and the comment is good enough even
without it.

Changed.

4)
+# Confirm log that the slot has been synced after becoming sync-ready.

Shall we just say:
Confirm from the log that the slot is sync-ready now.

5)
# Synchronize the primary server slots to the standby.
$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
@@ -945,6 +1007,7 @@ $subscriber1->safe_psql('postgres',
is( $standby1->safe_psql(
'postgres',
q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN
('lsub1_slot', 'snap_test_slot') AND synced AND NOT temporary;}
+
),

Redundant change.

Removed.

Attaching patch v22 addressing the above comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v22-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v22-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 2ca473897d6e224a96266083b675000f825540c3 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Thu, 6 Nov 2025 16:05:00 +1100
Subject: [PATCH v22] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for the initial sync,
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  12 +-
 src/backend/replication/logical/slotsync.c    | 347 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 .../t/040_standby_failover_slots_sync.pl      |  70 +++-
 5 files changed, 379 insertions(+), 56 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index d5a5e22fe2c..54d1be3132e 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,15 +405,13 @@ postgres=# SELECT * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries cyclically, continuing until all the failover
+      slots that existed on the primary at the start of the function call
+      are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
      </para>
     </note>
 
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index b122d99b009..1559929eb2c 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the pg_sync_replication API is used to sync the slots, and if the slots
+ * are not ready to be synced and are marked as RS_TEMPORARY because of any of
+ * the reasons mentioned above, then the API also waits and retries until the
+ * slots are marked as RS_PERSISTENT (which means sync-ready). Refer to the
+ * comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -100,6 +107,16 @@ typedef struct SlotSyncCtxStruct
 	slock_t		mutex;
 } SlotSyncCtxStruct;
 
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+	WalReceiverConn *wrconn;
+	List			*slot_names;
+} SlotSyncApiFailureParams;
+
 static SlotSyncCtxStruct *SlotSyncCtx = NULL;
 
 /* GUC variable */
@@ -147,6 +164,7 @@ typedef struct RemoteSlot
 
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
+static void slotsync_api_reread_config(void);
 
 /*
  * If necessary, update the local synced slot's metadata based on the data
@@ -553,11 +571,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -580,7 +602,13 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		 * current location when recreating the slot in the next cycle. It may
 		 * take more time to create such a slot. Therefore, we keep this slot
 		 * and attempt the synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the API can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -595,6 +623,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that API can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -618,10 +650,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the pg_sync_replication_slots() API.
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +751,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -784,7 +821,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -795,15 +833,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
+ *
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -812,29 +858,45 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach_ptr(char, slot_name, slot_names)
+		{
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoChar(&query, ')');
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
+	pfree(query.data);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -885,7 +947,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -905,12 +966,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by pg_sync_replication_slots
+ * 							  API to track if any slots could not be
+ * 							  persisted and need to be retried.
+ *
+ * Returns:
+ * TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -926,19 +1013,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1186,6 +1266,26 @@ ProcessSlotSyncInterrupts(void)
 		slotsync_reread_config();
 }
 
+/*
+ * Interrupt handler for pg_sync_replication_slots() API.
+ */
+static void
+ProcessSlotSyncAPIInterrupts(void)
+{
+	CHECK_FOR_INTERRUPTS();
+
+	/* If we've been promoted, then no point continuing. */
+	if (SlotSyncCtx->stopSignaled)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot continue replication slots synchronization"
+						" as standby promotion is triggered")));
+
+	/* error out if configuration parameters changed */
+	if (ConfigReloadPending)
+		slotsync_api_reread_config();
+}
+
 /*
  * Connection cleanup function for slotsync worker.
  *
@@ -1505,10 +1605,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1711,7 +1828,8 @@ SlotSyncShmemInit(void)
 static void
 slotsync_failure_callback(int code, Datum arg)
 {
-	WalReceiverConn *wrconn = (WalReceiverConn *) DatumGetPointer(arg);
+	SlotSyncApiFailureParams *fparams =
+		(SlotSyncApiFailureParams *) DatumGetPointer(arg);
 
 	/*
 	 * We need to do slots cleanup here just like WalSndErrorCleanup() does.
@@ -1738,23 +1856,170 @@ slotsync_failure_callback(int code, Datum arg)
 	if (syncing_slots)
 		reset_syncing_flag();
 
-	walrcv_disconnect(wrconn);
+	if (fparams->slot_names)
+		list_free_deep(fparams->slot_names);
+
+	walrcv_disconnect(fparams->wrconn);
+}
+
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+	MemoryContext oldcontext;
+
+	/* Switch to long-lived TopMemoryContext to store slot names */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+	foreach_ptr(RemoteSlot, remote_slot, remote_slots)
+	{
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return slot_names;
+}
+
+/*
+ * Re-read the config file and check for critical parameter changes.
+ *
+ */
+static void
+slotsync_api_reread_config(void)
+{
+	char       *old_primary_conninfo = pstrdup(PrimaryConnInfo);
+	char       *old_primary_slotname = pstrdup(PrimarySlotName);
+	bool        old_hot_standby_feedback = hot_standby_feedback;
+	bool        conninfo_changed;
+	bool        primary_slotname_changed;
+
+	ConfigReloadPending = false;
+	ProcessConfigFile(PGC_SIGHUP);
+
+	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
+	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
+	pfree(old_primary_conninfo);
+	pfree(old_primary_slotname);
+
+	/* throw error for certain parameter changes */
+	if (conninfo_changed ||
+		primary_slotname_changed ||
+		(old_hot_standby_feedback != hot_standby_feedback))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIG_FILE_ERROR),
+				 errmsg("cannot continue slot synchronization due"
+						" to parameter changes"),
+				 errdetail("One or more of primary_conninfo,"
+						   " primary_slot_name or hot_standby_feedback"
+						   " were modified"),
+				 errhint("Retry pg_sync_replication_slots() to use the"
+						 " updated configuration.")));
+	}
 }
 
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready".
+ * Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
-	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	SlotSyncApiFailureParams fparams;
+
+	fparams.wrconn = wrconn;
+	fparams.slot_names = NIL;
+
+	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	started_tx = false;
+			bool	slot_persistence_pending = false;
+			bool	some_slot_updated = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncAPIInterrupts();
+
+			/*
+			 * The syscache access in fetch_remote_slots() needs a
+			 * transaction env.
+			 */
+			if (!IsTransactionState())
+			{
+				StartTransactionCommand();
+				started_tx = true;
+			}
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				slot_names = extract_slot_names(remote_slots);
+
+				/* Update the failure structure so that it can be freed on error */
+				fparams.slot_names = slot_names;
+			}
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Commit transaction if we started it */
+			if (started_tx)
+				CommitTransactionCommand();
+
+			/* Done if all slots are persisted i.e are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* wait before retrying again */
+			wait_for_slot_activity(some_slot_updated);
+
+		}
+
+		if (slot_names)
+		{
+			list_free_deep(slot_names);
+			fparams.slot_names = NIL;
+		}
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
@@ -1762,5 +2027,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index c1ac71ff7f2..8a91099ac86 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot synchronization."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 1627e619b1b..974a1983fff 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -211,21 +211,83 @@ is( $standby1->safe_psql(
 	'synchronized slot has got its own inactive_since');
 
 ##################################################
-# Test that the synchronized slot will be dropped if the corresponding remote
-# slot on the primary server has been dropped.
+# Test that the synchronized slots will be dropped if the corresponding remote
+# slots on the primary server have been dropped.
 ##################################################
 
 $primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub2_slot');");
+$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub1_slot');");
 
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
 
 is( $standby1->safe_psql(
 		'postgres',
-		q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name = 'lsub2_slot';}
+		q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'lsub2_slot');}
 	),
 	"t",
 	'synchronized slot has been dropped');
 
+##################################################
+# Test that pg_sync_replication_slots() on the standby waits and retries
+# until the slot becomes sync-ready (when the remote slot catches up with
+# the locally reserved position).
+##################################################
+
+# Recreate the slot by creating a subscription on the subscriber, keep it disabled.
+$subscriber1->safe_psql('postgres', qq[
+	CREATE TABLE push_wal (a int);
+	CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, failover = true, enabled = false);]);
+
+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Attempt to synchronize slots using API. The API will continue retrying
+# synchronization until the remote slot catches up.
+# The API will not return until this happens, so to be able to make
+# further calls, call the API in a background process.
+my $log_offset = -s $standby1->logfile;
+
+my $h = $standby1->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr//, "SELECT pg_sync_replication_slots();\n");
+
+# Confirm that the slot could not be synced initially.
+$standby1->wait_for_log(
+    qr/could not synchronize replication slot \"lsub1_slot\"/,
+    $log_offset);
+
+# Enable the Subscription, so that the remote slot catches up
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
+$subscriber1->wait_for_subscription_sync;
+
+# Create xl_running_xacts on the primary to speed up restart_lsn advancement.
+$primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot();");
+
+# Confirm from the log that the slot is sync-ready now.
+$standby1->wait_for_log(
+    qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
+    $log_offset);
+
+$h->quit;
+
+# Verify that the logical failover slot is created on the standby,
+# marked as 'synced', and persisted.
+is( $standby1->safe_psql(
+		'postgres',
+		q{SELECT count(*) = 1 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot') AND synced AND NOT temporary;}
+	),
+	"t",
+	'logical slots are synced after API retry on standby');
+
+# Drop the subscription and the tables created, create the slot again so that it can
+# be used later.
+$subscriber1->safe_psql('postgres',"DROP SUBSCRIPTION regress_mysub1");
+$primary->safe_psql('postgres',"DROP TABLE push_wal");
+$subscriber1->safe_psql('postgres',"DROP TABLE push_wal");
+$primary->psql('postgres',
+	q{SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);}
+);
+
 ##################################################
 # Test that if the synchronized slot is invalidated while the remote slot is
 # still valid, the slot will be dropped and re-created on the standby by
@@ -281,7 +343,7 @@ $inactive_since_on_primary =
 # the failover slots.
 $primary->wait_for_replay_catchup($standby1);
 
-my $log_offset = -s $standby1->logfile;
+$log_offset = -s $standby1->logfile;
 
 # Synchronize the primary server slots to the standby.
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
-- 
2.47.3

#101Japin Li
japinli@hotmail.com
In reply to: Ajin Cherian (#100)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, 06 Nov 2025 at 18:53, Ajin Cherian <itsajin@gmail.com> wrote:

On Tue, Nov 4, 2025 at 5:23 PM shveta malik <shveta.malik@gmail.com> wrote:

I have addressed the above comments in patch v21.

Thank You. Please find a few comments:

1)
+ fparams.slot_names = slot_names = NIL;

I think it is not needed to set slot_names to NIL.

2)
-    WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);
+    WAIT_EVENT_REPLICATION_SLOTSYNC_PRIMARY_CATCHUP);

The new name does not seem appropriate. For the slotsync-worker case,
even when the primary is not behind, the worker still waits but it is
not waiting for primary to catch-up. I could not find a better name
except the original one 'WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN'. We can
change the explanation to :

"Waiting in main loop of slot sync worker and slot sync API."
Or
"Waiting in main loop of slot synchronization."

If anyone has any better name suggestions, we can consider changing.

Changed as suggested above.

3)

+# Attempt to synchronize slots using API. The API will continue retrying
+# synchronization until the remote slot catches up with the locally reserved
+# position. The API will not return until this happens, to be able to make
+# further calls, call the API in a background process.

Shall we remove 'with the locally reserved position', it’s already
explained in the test header and the comment is good enough even
without it.

Changed.

4)
+# Confirm log that the slot has been synced after becoming sync-ready.

Shall we just say:
Confirm from the log that the slot is sync-ready now.

5)
# Synchronize the primary server slots to the standby.
$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
@@ -945,6 +1007,7 @@ $subscriber1->safe_psql('postgres',
is( $standby1->safe_psql(
'postgres',
q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN
('lsub1_slot', 'snap_test_slot') AND synced AND NOT temporary;}
+
),

Redundant change.

Removed.

Attaching patch v22 addressing the above comments.

@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot synchronization."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
 WAL_SENDER_MAIN	"Waiting in main loop of WAL sender process."

I've noticed that all events are sorted alphabetically. I think we should keep
the order of REPLICATION_SLOTSYNC_MAIN unchanged.
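That is, keep the alphabetical order and only update the description, e.g.:

REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot synchronization."
REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."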

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.

#102shveta malik
shveta.malik@gmail.com
In reply to: Japin Li (#101)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Nov 7, 2025 at 10:36 AM Japin Li <japinli@hotmail.com> wrote:

Attaching patch v22 addressing the above comments.

@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN    "Waiting in main loop of logical replication apply process."
LOGICAL_LAUNCHER_MAIN  "Waiting in main loop of logical replication launcher process."
LOGICAL_PARALLEL_APPLY_MAIN    "Waiting in main loop of logical replication parallel apply process."
RECOVERY_WAL_STREAM    "Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN      "Waiting in main loop of slot sync worker."
REPLICATION_SLOTSYNC_SHUTDOWN  "Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_MAIN      "Waiting in main loop of slot synchronization."
SYSLOGGER_MAIN "Waiting in main loop of syslogger process."
WAL_RECEIVER_MAIN      "Waiting in main loop of WAL receiver process."
WAL_SENDER_MAIN        "Waiting in main loop of WAL sender process."

I've noticed that all events are sorted alphabetically. I think we should keep
the order of REPLICATION_SLOTSYNC_MAIN unchanged.

+1.

Few trivial comments:

1)
Since we have always used the term 'SQL function' rather than API in
existing code, shall we change all references of API to 'SQL function'
in current patch:

+ * If the pg_sync_replication API is used to sync the slots, and if the slots
"If the SQL function pg_sync_replication_slots() is used.."

+ * the reasons mentioned above, then the API also waits and retries until the
API --> SQL function

+ * persist. It is utilized by the pg_sync_replication_slots() API.
pg_sync_replication_slots() API --> SQL function pg_sync_replication_slots()

+ * the API can retry.
API --> SQL function

+ /* Set this, so that API can retry */
API --> SQL function

+ * persist. It is utilized by the pg_sync_replication_slots() API.
pg_sync_replication_slots() API --> SQL function pg_sync_replication_slots()

+ * slot_persistence_pending - boolean used by pg_sync_replication_slots
+ *   API to track if any slots could not be
pg_sync_replication_slots  API --> SQL function pg_sync_replication_slots()

+ * Interrupt handler for pg_sync_replication_slots() API.
pg_sync_replication_slots() API --> SQL function pg_sync_replication_slots()

2)
ProcessSlotSyncAPIInterrupts
slotsync_api_reread_config
-- These also have API in it, but I do not have any better name
suggestions here, we can retain the current ones and see what others
say.

3)
/*
* Re-read the config file.
*
* Exit if any of the slot sync GUCs have changed. The postmaster will
* restart it.
*/
static void
slotsync_reread_config(void)

Shall we change this existing comment to: Re-read the config file for
slot sync worker.

4)

+/*
+ * Re-read the config file and check for critical parameter changes.
+ *
+ */
+static void
+slotsync_api_reread_config(void)

Shall we change comment to:
/*
* Re-read the config file for SQL function pg_sync_replication_slots()
*
* Emit error if any of the slot sync GUCs have changed.
*/

thanks
Shveta

#103Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: shveta malik (#102)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Mon, Nov 10, 2025 at 3:01 PM shveta malik <shveta.malik@gmail.com> wrote:

2)
ProcessSlotSyncAPIInterrupts
slotsync_api_reread_config
-- These also have API in it, but I do not have any better name
suggestions here, we can retain the current ones and see what others
say.

ProcessSlotSyncInterrupts() handles shutdown waiting, but
ProcessSlotSyncAPIInterrupts() doesn't. Why is there this difference? It
would be good to explain why we need two different functions for the
worker and the SQL function, and also to explain the difference between
them.

$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub2_slot');");
+$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub1_slot');");

I think the intention behind dropping the slot is to be able to
create it again for the next test, but there is no comment here
explaining that. There is also no comment explaining why we are
dropping both slots here when the test only needs to drop one.
That's going to create confusion. One might think that all the slots
need to be dropped at this stage and, for example, go on to drop and
recreate any future slots that are used by prior code. At the end of
this test, we recreate the slot using
pg_create_logical_replication_slot(), which is a different method of
creating the slot than this test uses. Though I can understand the
reason, it's not apparent. Generally, reusing slot names across
multiple tests (in this file) is a source of confusion. But at least
for the test you are adding, you could use a different slot name
to avoid confusion.
+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Attempt to synchronize slots using API. The API will continue retrying
+# synchronization until the remote slot catches up.
+# The API will not return until this happens, to be able to make
+# further calls, call the API in a background process.
+my $log_offset = -s $standby1->logfile;
+
+my $h = $standby1->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr//, "SELECT pg_sync_replication_slots();\n");

If the standby does not receive the WAL corresponding to the DDL
before this function is executed, the slot will get synchronized
immediately. I think we have to make sure that the standby has
received the DDL before executing this function.
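
For example, a sketch along these lines (reusing wait_for_replay_catchup(),
which this test file already uses elsewhere) would enforce that ordering:

# Make sure the standby has replayed the DDL before starting the sync
# call, so the remote slot is guaranteed to still lag behind the standby.
$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
$primary->wait_for_replay_catchup($standby1);

# Only then start pg_sync_replication_slots() in the background.
my $h = $standby1->background_psql('postgres', on_error_stop => 0);
$h->query_until(qr//, "SELECT pg_sync_replication_slots();\n");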

Also, most of the code that uses query_until has a pattern like this:
$h->query_until(qr/start/, q{\echo start
SQL command});
But we expect an empty pattern here. Why the difference?
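Applied here, that pattern would look something like:

$h->query_until(
	qr/start/, q{\echo start
SELECT pg_sync_replication_slots();
});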

I think we need a similar test to test promotion while the function is
waiting for the slot to become sync-ready.
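For instance, a rough sketch (hypothetical; it would likely need a
dedicated throwaway standby, since promoting $standby1 here would disturb
the later tests in this file):

# Start the SQL function while the slot cannot yet be persisted, then
# promote the standby and check that the function errors out instead
# of waiting forever.
my $h2 = $standby1->background_psql('postgres', on_error_stop => 0);
$h2->query_until(qr//, "SELECT pg_sync_replication_slots();\n");
$standby1->wait_for_log(
	qr/could not synchronize replication slot/, $log_offset);
$standby1->promote;
$standby1->wait_for_log(
	qr/cannot continue replication slots synchronization/, $log_offset);
$h2->quit;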

SyncReplicationSlots() and the main loop in ReplSlotSyncWorkerMain()
are similar, with some functional differences. Parts of their code will
need to be kept in sync in the future. How do we achieve that? At the
least, we need a comment saying so in each of them, and we should keep
the two pieces of code in proximity.

--
Best Wishes,
Ashutosh Bapat

#104Ajin Cherian
itsajin@gmail.com
In reply to: Ashutosh Bapat (#103)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Mon, Nov 10, 2025 at 8:31 PM shveta malik <shveta.malik@gmail.com> wrote:

On Fri, Nov 7, 2025 at 10:36 AM Japin Li <japinli@hotmail.com> wrote:

Attaching patch v22 addressing the above comments.

@@ -62,8 +62,8 @@ LOGICAL_APPLY_MAIN    "Waiting in main loop of logical replication apply process."
LOGICAL_LAUNCHER_MAIN  "Waiting in main loop of logical replication launcher process."
LOGICAL_PARALLEL_APPLY_MAIN    "Waiting in main loop of logical replication parallel apply process."
RECOVERY_WAL_STREAM    "Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN      "Waiting in main loop of slot sync worker."
REPLICATION_SLOTSYNC_SHUTDOWN  "Waiting for slot sync worker to shut down."
+REPLICATION_SLOTSYNC_MAIN      "Waiting in main loop of slot synchronization."
SYSLOGGER_MAIN "Waiting in main loop of syslogger process."
WAL_RECEIVER_MAIN      "Waiting in main loop of WAL receiver process."
WAL_SENDER_MAIN        "Waiting in main loop of WAL sender process."

I've noticed that all events are sorted alphabetically. I think we should keep
the order of REPLICATION_SLOTSYNC_MAIN unchanged.

+1.

Yes, changed.

Few trivial comments:

1)
Since we have always used the term 'SQL function' rather than API in
existing code, shall we change all references of API to 'SQL function'
in the current patch:

+ * If the pg_sync_replication API is used to sync the slots, and if the slots
"If the SQL function pg_sync_replication_slots() is used.."

+ * the reasons mentioned above, then the API also waits and retries until the
API --> SQL function

+ * persist. It is utilized by the pg_sync_replication_slots() API.
pg_sync_replication_slots() API --> SQL function pg_sync_replication_slots()

+ * the API can retry.
API --> SQL function

+ /* Set this, so that API can retry */
API --> SQL function

+ * persist. It is utilized by the pg_sync_replication_slots() API.
pg_sync_replication_slots() API --> SQL function pg_sync_replication_slots()

+ * slot_persistence_pending - boolean used by pg_sync_replication_slots
+ *   API to track if any slots could not be
pg_sync_replication_slots  API --> SQL function pg_sync_replication_slots()

+ * Interrupt handler for pg_sync_replication_slots() API.
pg_sync_replication_slots() API --> SQL function pg_sync_replication_slots()

2)
ProcessSlotSyncAPIInterrupts
slotsync_api_reread_config
-- These also have API in it, but I do not have any better name
suggestions here; we can retain the current ones and see what others
say.

Changed.

3)
/*
* Re-read the config file.
*
* Exit if any of the slot sync GUCs have changed. The postmaster will
* restart it.
*/
static void
slotsync_reread_config(void)

Shall we change this existing comment to: Re-read the config file for
slot sync worker.

4)

+/*
+ * Re-read the config file and check for critical parameter changes.
+ *
+ */
+static void
+slotsync_api_reread_config(void)

Shall we change comment to:
/*
* Re-read the config file for SQL function pg_sync_replication_slots()
*
* Emit error if any of the slot sync GUCs have changed.
*/

Changed.

On Mon, Nov 10, 2025 at 9:44 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Mon, Nov 10, 2025 at 3:01 PM shveta malik <shveta.malik@gmail.com> wrote:

2)
ProcessSlotSyncAPIInterrupts
slotsync_api_reread_config
-- These also have API in it, but I do not have any better name
suggestions here; we can retain the current ones and see what others
say.

ProcessSlotSyncInterrupts() handles shutdown waiting,
ProcessSlotSyncAPIInterrupts doesn't. Why is there this difference? It will
be good to explain why we need two different functions for worker and
SQL function and also explain the difference between them.

I've updated the function header to explain this. The slot sync worker
is a dedicated background worker, while the SQL function runs in a regular
backend, so the worker-specific shutdown handling is not needed.

$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub2_slot');");
+$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub1_slot');");

I think, the intention behind dropping the slot is to be able to
create it again for the next test. But there is no comment here
explaining that. There is also no comment explaining why we are
dropping both slots here; when the test only needs dropping one.
That's going to create confusion. One might think that all the slots
need to be dropped at this stage, and drop and create any future slots
that are used by prior code, for example. At the end of this test, we
recreate the slot using pg_create_logical_replication_slot(), which is
a different method of creating the slot than this test uses. Though I can
understand the reason, it's not apparent. Generally reusing slot names
across multiple tests (in this file) is a source of confusion. But at
least for the test you are adding, you could use a different slot name
to avoid confusion.

I've added a comment there noting that dropping both slots is required
for the next test. Also, I cannot change the name of the slot, as the
next tests need the same slot synced.

+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Attempt to synchronize slots using the API. The API will continue retrying
+# synchronization until the remote slot catches up.
+# The API will not return until this happens; to be able to make
+# further calls, call the API in a background process.
+my $log_offset = -s $standby1->logfile;
+
+my $h = $standby1->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr//, "SELECT pg_sync_replication_slots();\n");

If the standby does not receive the WAL corresponding to the DDL
before this function is executed, the slot will get synchronized
immediately. I think we have to make sure that the standby has
received the DDL before executing this function.

Yes, I've added a line to make sure that the standby has caught up.
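
For reference, this is the synchronization point added in the attached
patch, using the existing PostgreSQL::Test::Cluster helper:

# Make sure the DDL changes are synced to the standby
$primary->wait_for_replay_catchup($standby1);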

Also, most of the code which uses query_until has a pattern like this:
$h->query_until(qr/start/, q{\echo start
SQL command});
But we expect an empty string here. Why this difference?

I've modified it as suggested.

I think we need a similar test that exercises promotion while the function
is waiting for the slot to become sync-ready.

Unfortunately, that will make this test too long if I add one more
wait loop for slot sync.

SyncReplicationSlots() and the main loop in ReplSlotSyncWorkerMain()
are similar with some functional differences. Some part of their code
needs to be kept in sync in future. How do we achieve that? At least
we need a comment saying so in each of those patches and keep those
two codes in proximity.

I've added a comment in the header of ReplSlotSyncWorkerMain to suggest this.

Attaching patch v23 addressing these comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v23-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v23-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 768b78b3c9a02d382b8eaded301594a7d387e30d Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Wed, 12 Nov 2025 19:15:39 +1100
Subject: [PATCH v23] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  12 +-
 src/backend/replication/logical/slotsync.c    | 356 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 .../t/040_standby_failover_slots_sync.pl      |  77 +++-
 5 files changed, 394 insertions(+), 57 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index d5a5e22fe2c..54d1be3132e 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,15 +405,13 @@ postgres=# SELECT * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries cyclically to some extent, continuing until all
+      the failover slots that existed on the primary at the start of the function
+      call are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
      </para>
     </note>
 
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8b4afd87dc9..1f259d9cfec 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the SQL function pg_sync_replication_slots() is used to sync the slots, and if
+ * the slots are not ready to be synced and are marked as RS_TEMPORARY because
+ * of any of the reasons mentioned above, then the SQL function also waits and
+ * retries until the slots are marked as RS_PERSISTENT (which means sync-ready).
+ * Refer to the comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -100,6 +107,16 @@ typedef struct SlotSyncCtxStruct
 	slock_t		mutex;
 } SlotSyncCtxStruct;
 
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+	WalReceiverConn *wrconn;
+	List			*slot_names;
+} SlotSyncApiFailureParams;
+
 static SlotSyncCtxStruct *SlotSyncCtx = NULL;
 
 /* GUC variable */
@@ -147,6 +164,7 @@ typedef struct RemoteSlot
 
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
+static void slotsync_api_reread_config(void);
 
 /*
  * If necessary, update the local synced slot's metadata based on the data
@@ -553,11 +571,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -580,7 +602,13 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		 * current location when recreating the slot in the next cycle. It may
 		 * take more time to create such a slot. Therefore, we keep this slot
 		 * and attempt the synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the SQL function can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -595,6 +623,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that SQL function can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -618,10 +650,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +751,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -784,7 +821,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -795,15 +833,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
+ *
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -812,29 +858,45 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach_ptr(char, slot_name, slot_names)
+		{
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoChar(&query, ')');
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
+	pfree(query.data);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -885,7 +947,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -905,12 +966,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by SQL function
+ * 							  pg_sync_replication_slots to track if any slots
+ * 							  could not be persisted and need to be retried.
+ *
+ * Returns:
+ * TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -926,19 +1013,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1115,7 +1195,7 @@ ValidateSlotSyncParams(int elevel)
 }
 
 /*
- * Re-read the config file.
+ * Re-read the config file for slot sync worker.
  *
  * Exit if any of the slot sync GUCs have changed. The postmaster will
  * restart it.
@@ -1186,6 +1266,29 @@ ProcessSlotSyncInterrupts(void)
 		slotsync_reread_config();
 }
 
+/*
+ * Interrupt handler for SQL function pg_sync_replication_slots().
+ * This is different from the slot sync worker interrupt handler because
+ * we need not handle shutdown requests and need to worry about fewer
+ * config param changes.
+ */
+static void
+ProcessSlotSyncAPIInterrupts()
+{
+	CHECK_FOR_INTERRUPTS();
+
+	/* If we've been promoted, then no point continuing. */
+	if (SlotSyncCtx->stopSignaled)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot continue replication slots synchronization"
+						" as standby promotion is triggered")));
+
+	/* error out if configuration parameters changed */
+	if (ConfigReloadPending)
+		slotsync_api_reread_config();
+}
+
 /*
  * Connection cleanup function for slotsync worker.
  *
@@ -1344,6 +1447,9 @@ reset_syncing_flag()
  *
  * It connects to the primary server, fetches logical failover slots
  * information periodically in order to create and sync the slots.
+ *
+ * Note: If any changes are made here, check if the corresponding SQL
+ * function logic in SyncReplicationSlots also needs to be changed.
  */
 void
 ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
@@ -1505,10 +1611,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		 * The syscache access in fetch_remote_slots() needs a
+		 * The syscache access in fetch_or_refresh_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1711,7 +1834,8 @@ SlotSyncShmemInit(void)
 static void
 slotsync_failure_callback(int code, Datum arg)
 {
-	WalReceiverConn *wrconn = (WalReceiverConn *) DatumGetPointer(arg);
+	SlotSyncApiFailureParams *fparams =
+		(SlotSyncApiFailureParams *) DatumGetPointer(arg);
 
 	/*
 	 * We need to do slots cleanup here just like WalSndErrorCleanup() does.
@@ -1738,23 +1862,171 @@ slotsync_failure_callback(int code, Datum arg)
 	if (syncing_slots)
 		reset_syncing_flag();
 
-	walrcv_disconnect(wrconn);
+	if (fparams->slot_names)
+		list_free_deep(fparams->slot_names);
+
+	walrcv_disconnect(fparams->wrconn);
+}
+
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+	MemoryContext oldcontext;
+
+	/* Switch to long-lived TopMemoryContext to store slot names */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+	foreach_ptr(RemoteSlot, remote_slot, remote_slots)
+	{
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return slot_names;
+}
+
+/*
+ * Re-read the config file for SQL function pg_sync_replication_slots().
+ *
+ * Emit error if any of the slot sync GUCs have changed.
+ */
+static void
+slotsync_api_reread_config(void)
+{
+	char       *old_primary_conninfo = pstrdup(PrimaryConnInfo);
+	char       *old_primary_slotname = pstrdup(PrimarySlotName);
+	bool        old_hot_standby_feedback = hot_standby_feedback;
+	bool        conninfo_changed;
+	bool        primary_slotname_changed;
+
+	ConfigReloadPending = false;
+	ProcessConfigFile(PGC_SIGHUP);
+
+	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
+	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
+	pfree(old_primary_conninfo);
+	pfree(old_primary_slotname);
+
+	/* throw error for certain parameter changes */
+	if (conninfo_changed ||
+		primary_slotname_changed ||
+		(old_hot_standby_feedback != hot_standby_feedback))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_CONFIG_FILE_ERROR),
+				 errmsg("cannot continue slot synchronization due"
+						" to parameter changes"),
+				 errdetail("One or more of primary_conninfo,"
+						   " primary_slot_name or hot_standby_feedback"
+						   " were modified"),
+				 errhint("Retry pg_sync_replication_slots() to use the"
+						 " updated configuration.")));
+	}
 }
 
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready".
+ * Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
-	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	SlotSyncApiFailureParams fparams;
+
+	fparams.wrconn = wrconn;
+	fparams.slot_names = NIL;
+
+	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
 		check_and_set_sync_info(InvalidPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	started_tx = false;
+			bool	slot_persistence_pending = false;
+			bool	some_slot_updated = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncAPIInterrupts();
+
+			/*
+			 * The syscache access in fetch_remote_slots() needs a
+			 * transaction env.
+			 */
+			if (!IsTransactionState())
+			{
+				StartTransactionCommand();
+				started_tx = true;
+			}
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				slot_names = extract_slot_names(remote_slots);
+
+				/* Update the failure structure so that it can be freed on error */
+				fparams.slot_names = slot_names;
+			}
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Commit transaction if we started it */
+			if (started_tx)
+				CommitTransactionCommand();
+
+			/* Done if all slots are persisted, i.e., are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* wait before retrying again */
+			wait_for_slot_activity(some_slot_updated);
+
+		}
+
+		if (slot_names)
+		{
+			list_free_deep(slot_names);
+			fparams.slot_names = NIL;
+		}
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
@@ -1762,5 +2034,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index c1ac71ff7f2..92101e12cd6 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,7 +62,7 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot synchronization."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 1627e619b1b..1b0c5b7f331 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -211,21 +211,90 @@ is( $standby1->safe_psql(
 	'synchronized slot has got its own inactive_since');
 
 ##################################################
-# Test that the synchronized slot will be dropped if the corresponding remote
-# slot on the primary server has been dropped.
+# Test that the synchronized slots will be dropped if the corresponding remote
+# slots on the primary server has been dropped.
+# Note: Both slots need to be dropped for the next test to work
 ##################################################
 
 $primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub2_slot');");
+$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub1_slot');");
 
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
 
 is( $standby1->safe_psql(
 		'postgres',
-		q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name = 'lsub2_slot';}
+		q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'lsub2_slot');}
 	),
 	"t",
 	'synchronized slot has been dropped');
 
+##################################################
+# Test that pg_sync_replication_slots() on the standby waits and retries
+# until the slot becomes sync-ready (when the remote slot catches up with
+# the locally reserved position).
+##################################################
+
+# Recreate the slot by creating a subscription on the subscriber, keep it disabled.
+$subscriber1->safe_psql('postgres', qq[
+	CREATE TABLE push_wal (a int);
+	CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, failover = true, enabled = false);]);
+
+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Make sure the DDL changes are synced to the standby
+$primary->wait_for_replay_catchup($standby1);
+
+# Attempt to synchronize slots using the API. The API will continue retrying
+# synchronization until the remote slot catches up.
+# The API will not return until this happens; to be able to make
+# further calls, call the API in a background process.
+my $log_offset = -s $standby1->logfile;
+
+my $h = $standby1->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr/start/, q(
+	\echo start
+	SELECT pg_sync_replication_slots();
+	));
+
+# Confirm that the slot could not be synced initially.
+$standby1->wait_for_log(
+    qr/could not synchronize replication slot \"lsub1_slot\"/,
+    $log_offset);
+
+# Enable the Subscription, so that the remote slot catches up
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
+$subscriber1->wait_for_subscription_sync;
+
+# Create xl_running_xacts on the primary to speed up restart_lsn advancement.
+$primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot();");
+
+# Confirm from the log that the slot is sync-ready now.
+$standby1->wait_for_log(
+    qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
+    $log_offset);
+
+$h->quit;
+
+# Verify that the logical failover slot is created on the standby,
+# marked as 'synced', and persisted.
+is( $standby1->safe_psql(
+		'postgres',
+		q{SELECT count(*) = 1 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot') AND synced AND NOT temporary;}
+	),
+	"t",
+	'logical slots are synced after API retry on standby');
+
+# Drop the subscription and the tables created, then create the lsub1_slot slot again so that it can
+# be used later.
+$subscriber1->safe_psql('postgres',"DROP SUBSCRIPTION regress_mysub1");
+$primary->safe_psql('postgres',"DROP TABLE push_wal");
+$subscriber1->safe_psql('postgres',"DROP TABLE push_wal");
+$primary->psql('postgres',
+	q{SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);}
+);
+
 ##################################################
 # Test that if the synchronized slot is invalidated while the remote slot is
 # still valid, the slot will be dropped and re-created on the standby by
@@ -281,7 +350,7 @@ $inactive_since_on_primary =
 # the failover slots.
 $primary->wait_for_replay_catchup($standby1);
 
-my $log_offset = -s $standby1->logfile;
+$log_offset = -s $standby1->logfile;
 
 # Synchronize the primary server slots to the standby.
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
-- 
2.47.3

#105Amit Kapila
amit.kapila16@gmail.com
In reply to: Ajin Cherian (#104)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Nov 12, 2025 at 1:54 PM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching patch v23 addressing these comments.

Few comments:
=============
1.
In contrast, automatic synchronization
via <varname>sync_replication_slots</varname> provides continuous slot
updates, enabling seamless failover and supporting high availability.
- Therefore, it is the recommended method for synchronizing slots.

I think a slotsync worker should still be a recommended method. So, we
shouldn't remove the last line.

2. I think we can unify slotsync_api_reread_config() and
ProcessSlotSyncAPIInterrupts() with corresponding existing functions
for slotsync worker. Having separate functions for API and worker to
handle interrupts looks odd and bug-prone w.r.t future changes in this
area.

3.
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+ List *slot_names = NIL;
+ MemoryContext oldcontext;
+
+ /* Switch to long-lived TopMemoryContext to store slot names */
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+ foreach_ptr(RemoteSlot, remote_slot, remote_slots)

Why did we allocate this memory in TopMemoryContext here? We only need
it until this function executes; isn't CurrentMemoryContext (which I think
is ExprContext) sufficient, and if not, why? If we use some
function/query-level context then we don't need to make it part of
SlotSyncApiFailureParams. If we can get rid of slot_names from
SlotSyncApiFailureParams then we probably don't need struct
SlotSyncApiFailureParams.

--
With Regards,
Amit Kapila.

#106shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#104)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Nov 12, 2025 at 1:54 PM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching patch v23 addressing these comments.

Thanks for the patch.

I observed that if the API is taking a nap in between slot sync cycles
and a promotion is triggered during that time, the promotion has to
wait for the entire nap period to finish before slot-sync stops and
the process can continue. There should be a mechanism to wake up the
backend so the API can exit early once stopSignaled is set. How about
doing SetLatch for the process doing synchronization in
ShutDownSlotSync()?

thanks
Shveta

#107Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#106)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Nov 19, 2025 at 5:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Nov 12, 2025 at 1:54 PM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching patch v23 addressing these comments.

Few comments:
=============
1.
In contrast, automatic synchronization
via <varname>sync_replication_slots</varname> provides continuous slot
updates, enabling seamless failover and supporting high availability.
- Therefore, it is the recommended method for synchronizing slots.

I think a slotsync worker should still be a recommended method. So, we
shouldn't remove the last line.

Changed.

2. I think we can unify slotsync_api_reread_config() and
ProcessSlotSyncAPIInterrupts() with corresponding existing functions
for slotsync worker. Having separate functions for API and worker to
handle interrupts looks odd and bug-prone w.r.t future changes in this
area.

I’ve refactored and unified slotsync_api_reread_config() and
ProcessSlotSyncAPIInterrupts() with their existing counterparts. As
part of this, I switched the shutdown signal for slot-syncing workers
and backends from SIGINT to SIGUSR1, so that all slot-synchronization
processes use a consistent signaling model. Background workers handle
SIGINT via StatementCancelHandler, which is not suitable for
coordinated shutdowns, but SIGUSR1 reliably sets the process latch and
wakes up both worker types. Once awakened, these processes invoke
ProcessSlotSyncInterrupts(), which now checks
SlotSyncCtx->stopSignaled to perform a clean shutdown with appropriate
logs.

3.
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+ List *slot_names = NIL;
+ MemoryContext oldcontext;
+
+ /* Switch to long-lived TopMemoryContext to store slot names */
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+ foreach_ptr(RemoteSlot, remote_slot, remote_slots)

Why did we allocate this memory in TopMemoryContext here? We only need
it until this function executes; isn't CurrentMemoryContext (which I think
is ExprContext) sufficient, and if not, why? If we use some
function/query-level context then we don't need to make it part of
SlotSyncApiFailureParams. If we can get rid of slot_names from
SlotSyncApiFailureParams then we probably don't need struct
SlotSyncApiFailureParams.

In further testing, I've found that a transaction is always started
when pg_sync_replication_slots() is called, so there was no need to
start new transactions, nor was a separate memory context needed for
remote_slots. I think the confusion probably came from an earlier bug
in my code. I have added an assert to make sure that a transaction is
active when in the SQL function (should I make it an error instead?).

On Wed, Nov 19, 2025 at 6:05 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Nov 12, 2025 at 1:54 PM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching patch v23 addressing these comments.

Thanks for the patch.

I observed that if the API is taking a nap in between slot sync cycles
and a promotion is triggered during that time, the promotion has to
wait for the entire nap period to finish before slot-sync stops and
the process can continue. There should be a mechanism to wake up the
backend so the API can exit early once stopSignaled is set. How about
doing SetLatch for the process doing synchronization in
ShutDownSlotSync()?

Yes. To address this, I now also store the pid of the backend calling
pg_sync_replication_slots() in SlotSyncCtx->pid, and I've modified the
ShutDownSlotSync logic to issue a SIGUSR1 that sets the latch of the
process recorded in SlotSyncCtx->pid.
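
A rough sketch of that wakeup path, following the existing SlotSyncCtx
conventions (the actual code in ShutDownSlotSync() may differ; this
sketch is an assumption, not the patch text):

	pid_t		syncpid;

	/* Copy the pid out so we don't issue a syscall under the spinlock */
	SpinLockAcquire(&SlotSyncCtx->mutex);
	syncpid = SlotSyncCtx->pid;
	SpinLockRelease(&SlotSyncCtx->mutex);

	/* procsignal_sigusr1_handler sets the latch of the signaled process */
	if (syncpid != InvalidPid)
		kill(syncpid, SIGUSR1);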

Attaching patch v24, addressing the above comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v24-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v24-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 97bd936394feac1c3d828567ec76784b4389fdee Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Fri, 21 Nov 2025 14:17:04 +1100
Subject: [PATCH v24] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  11 +-
 src/backend/replication/logical/slotsync.c    | 357 ++++++++++++++----
 .../utils/activity/wait_event_names.txt       |   2 +-
 .../t/040_standby_failover_slots_sync.pl      |  79 +++-
 5 files changed, 366 insertions(+), 87 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index d5a5e22fe2c..33940504622 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,12 +405,11 @@ postgres=# SELECT * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries cyclically to some extent, continuing until all
+      the failover slots that existed on the primary at the start of the function
+      call are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
       Therefore, it is the recommended method for synchronizing slots.
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8b4afd87dc9..8f7138afff7 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the SQL function pg_sync_replication_slots() is used to sync the slots, and if
+ * the slots are not ready to be synced and are marked as RS_TEMPORARY because
+ * of any of the reasons mentioned above, then the SQL function also waits and
+ * retries until the slots are marked as RS_PERSISTENT (which means sync-ready).
+ * Refer to the comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -100,6 +107,16 @@ typedef struct SlotSyncCtxStruct
 	slock_t		mutex;
 } SlotSyncCtxStruct;
 
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+	WalReceiverConn *wrconn;
+	List			*slot_names;
+} SlotSyncApiFailureParams;
+
 static SlotSyncCtxStruct *SlotSyncCtx = NULL;
 
 /* GUC variable */
@@ -553,11 +570,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -580,7 +601,13 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		 * current location when recreating the slot in the next cycle. It may
 		 * take more time to create such a slot. Therefore, we keep this slot
 		 * and attempt the synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the SQL function can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -595,6 +622,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that SQL function can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -618,10 +649,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +750,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -784,7 +820,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -795,15 +832,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
+ *
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -812,29 +857,45 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach_ptr(char, slot_name, slot_names)
+		{
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoChar(&query, ')');
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
+	pfree(query.data);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -885,7 +946,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -905,12 +965,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by SQL function
+ * 							  pg_sync_replication_slots to track if any slots
+ * 							  could not be persisted and need to be retried.
+ *
+ * Returns:
+ * TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -926,19 +1012,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1115,13 +1194,14 @@ ValidateSlotSyncParams(int elevel)
 }
 
 /*
- * Re-read the config file.
+ * Re-read the config file for slot synchronization.
+ *
+ * from_api = true when called from SQL function pg_sync_replication_slots()
+ * from_api = false when called from slotsync worker
  *
- * Exit if any of the slot sync GUCs have changed. The postmaster will
- * restart it.
  */
 static void
-slotsync_reread_config(void)
+slotsync_reread_config(bool from_api)
 {
 	char	   *old_primary_conninfo = pstrdup(PrimaryConnInfo);
 	char	   *old_primary_slotname = pstrdup(PrimarySlotName);
@@ -1130,38 +1210,52 @@ slotsync_reread_config(void)
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
 
-	Assert(sync_replication_slots);
+	if (!from_api)
+		Assert(sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
 
 	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
 	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
 	pfree(old_primary_conninfo);
 	pfree(old_primary_slotname);
 
-	if (old_sync_replication_slots != sync_replication_slots)
+	/* Worker-specific check for sync_replication_slots change */
+	if (!from_api && old_sync_replication_slots != sync_replication_slots)
 	{
 		ereport(LOG,
-		/* translator: %s is a GUC variable name */
-				errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled", "sync_replication_slots"));
+				/* translator: %s is a GUC variable name */
+				errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled",
+					   "sync_replication_slots"));
 		proc_exit(0);
 	}
 
+	/* Check for parameter changes common to both API and worker */
 	if (conninfo_changed ||
 		primary_slotname_changed ||
 		(old_hot_standby_feedback != hot_standby_feedback))
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker will restart because of a parameter change"));
+		int elevel;
 
-		/*
-		 * Reset the last-start time for this worker so that the postmaster
-		 * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
-		 */
-		SlotSyncCtx->last_start_time = 0;
+		if (from_api)
+			elevel = ERROR;
+		else
+			elevel = LOG;
 
-		proc_exit(0);
+		ereport(elevel,
+				errmsg("replication slot synchronization will stop because of a parameter change"));
+
+		if (!from_api)
+		{
+			/*
+			 * Reset the last-start time for this worker so that the postmaster
+			 * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
+			 */
+			SlotSyncCtx->last_start_time = 0;
+			proc_exit(0);
+		}
 	}
 
 }
@@ -1170,20 +1264,28 @@ slotsync_reread_config(void)
  * Interrupt handler for main loop of slot sync worker.
  */
 static void
-ProcessSlotSyncInterrupts(void)
+ProcessSlotSyncInterrupts(bool from_api)
 {
 	CHECK_FOR_INTERRUPTS();
 
-	if (ShutdownRequestPending)
+	if (SlotSyncCtx->stopSignaled)
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker is shutting down on receiving SIGINT"));
+		if (from_api)
+			ereport(ERROR,
+					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					errmsg("cannot continue replication slots synchronization"
+						   " as standby promotion is triggered"));
+		else
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker is shutting down on receiving SIGUSR1"));
 
-		proc_exit(0);
+			proc_exit(0);
+		}
 	}
 
 	if (ConfigReloadPending)
-		slotsync_reread_config();
+		slotsync_reread_config(from_api);
 }
 
 /*
@@ -1290,9 +1392,6 @@ check_and_set_sync_info(pid_t worker_pid)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
-	/* The worker pid must not be already assigned in SlotSyncCtx */
-	Assert(worker_pid == InvalidPid || SlotSyncCtx->pid == InvalidPid);
-
 	/*
 	 * Emit an error if startup process signaled the slot sync machinery to
 	 * stop. See comments atop SlotSyncCtxStruct.
@@ -1315,6 +1414,9 @@ check_and_set_sync_info(pid_t worker_pid)
 
 	SlotSyncCtx->syncing = true;
 
+	/* The worker pid must not be already assigned in SlotSyncCtx */
+	Assert(SlotSyncCtx->pid == InvalidPid);
+
 	/*
 	 * Advertise the required PID so that the startup process can kill the
 	 * slot sync worker on promotion.
@@ -1334,6 +1436,7 @@ reset_syncing_flag()
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 	SlotSyncCtx->syncing = false;
+	SlotSyncCtx->pid = InvalidPid;
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	syncing_slots = false;
@@ -1344,6 +1447,9 @@ reset_syncing_flag()
  *
  * It connects to the primary server, fetches logical failover slots
  * information periodically in order to create and sync the slots.
+ *
+ * Note: If any changes are made here, check if the corresponding SQL
+ * function logic in SyncReplicationSlots also needs to be changed.
  */
 void
 ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
@@ -1408,7 +1514,6 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 
 	/* Setup signal handling */
 	pqsignal(SIGHUP, SignalHandlerForConfigReload);
-	pqsignal(SIGINT, SignalHandlerForShutdownRequest);
 	pqsignal(SIGTERM, die);
 	pqsignal(SIGFPE, FloatExceptionHandler);
 	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
@@ -1505,17 +1610,34 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
-		ProcessSlotSyncInterrupts();
+		ProcessSlotSyncInterrupts(false);
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
 
 	/*
 	 * The slot sync worker can't get here because it will only stop when it
-	 * receives a SIGINT from the startup process, or when there is an error.
+	 * receives a SIGUSR1 from the startup process, or when there is an error.
 	 */
 	Assert(false);
 }
@@ -1542,7 +1664,7 @@ update_synced_slots_inactive_since(void)
 		return;
 
 	/* The slot sync worker or SQL function mustn't be running by now */
-	Assert((SlotSyncCtx->pid == InvalidPid) && !SlotSyncCtx->syncing);
+	Assert(!SlotSyncCtx->syncing);
 
 	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
@@ -1601,7 +1723,7 @@ ShutDownSlotSync(void)
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	if (worker_pid != InvalidPid)
-		kill(worker_pid, SIGINT);
+		kill(worker_pid, SIGUSR1);
 
 	/* Wait for slot sync to end */
 	for (;;)
@@ -1711,7 +1833,8 @@ SlotSyncShmemInit(void)
 static void
 slotsync_failure_callback(int code, Datum arg)
 {
-	WalReceiverConn *wrconn = (WalReceiverConn *) DatumGetPointer(arg);
+	SlotSyncApiFailureParams *fparams =
+		(SlotSyncApiFailureParams *) DatumGetPointer(arg);
 
 	/*
 	 * We need to do slots cleanup here just like WalSndErrorCleanup() does.
@@ -1738,23 +1861,113 @@ slotsync_failure_callback(int code, Datum arg)
 	if (syncing_slots)
 		reset_syncing_flag();
 
-	walrcv_disconnect(wrconn);
+	if (fparams->slot_names)
+		list_free_deep(fparams->slot_names);
+
+	walrcv_disconnect(fparams->wrconn);
+}
+
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+
+	foreach_ptr(RemoteSlot, remote_slot, remote_slots)
+	{
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	return slot_names;
 }
 
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready".
+ * Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
-	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	SlotSyncApiFailureParams fparams;
+
+	fparams.wrconn = wrconn;
+	fparams.slot_names = NIL;
+
+	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 	{
-		check_and_set_sync_info(InvalidPid);
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
+		check_and_set_sync_info(MyProcPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	slot_persistence_pending = false;
+			bool	some_slot_updated = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncInterrupts(true);
+
+			/* We must be in a valid transaction state */
+			Assert(IsTransactionState());
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				slot_names = extract_slot_names(remote_slots);
+
+				/* Update the failure structure so that it can be freed on error */
+				fparams.slot_names = slot_names;
+			}
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Done if all slots are persisted, i.e., are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* Wait before retrying */
+			wait_for_slot_activity(some_slot_updated);
+
+		}
+
+		if (slot_names)
+		{
+			list_free_deep(slot_names);
+			fparams.slot_names = NIL;
+		}
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
@@ -1762,5 +1975,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index c1ac71ff7f2..92101e12cd6 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,7 +62,7 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot synchronization."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 1627e619b1b..29f7d865171 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -211,21 +211,90 @@ is( $standby1->safe_psql(
 	'synchronized slot has got its own inactive_since');
 
 ##################################################
-# Test that the synchronized slot will be dropped if the corresponding remote
-# slot on the primary server has been dropped.
+# Test that the synchronized slots will be dropped if the corresponding remote
+# slots on the primary server have been dropped.
+# Note: Both slots need to be dropped for the next test to work
 ##################################################
 
 $primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub2_slot');");
+$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub1_slot');");
 
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
 
 is( $standby1->safe_psql(
 		'postgres',
-		q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name = 'lsub2_slot';}
+		q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'lsub2_slot');}
 	),
 	"t",
 	'synchronized slot has been dropped');
 
+##################################################
+# Test that pg_sync_replication_slots() on the standby waits and retries
+# until the slot becomes sync-ready (when the remote slot catches up with
+# the locally reserved position).
+##################################################
+
+# Recreate the slot by creating a subscription on the subscriber, keeping it disabled.
+$subscriber1->safe_psql('postgres', qq[
+	CREATE TABLE push_wal (a int);
+	CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, failover = true, enabled = false);]);
+
+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Make sure the DDL changes are synced to the standby
+$primary->wait_for_replay_catchup($standby1);
+
+# Attempt to synchronize slots using the API. The API will continue retrying
+# synchronization until the remote slot catches up.
+# The API will not return until this happens, so to be able to make
+# further calls, run the API in a background process.
+my $log_offset = -s $standby1->logfile;
+
+my $h = $standby1->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr/start/, q(
+	\echo start
+	SELECT pg_sync_replication_slots();
+	));
+
+# Confirm that the slot could not be synced initially.
+$standby1->wait_for_log(
+    qr/could not synchronize replication slot \"lsub1_slot\"/,
+    $log_offset);
+
+# Enable the subscription so that the remote slot catches up
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
+$subscriber1->wait_for_subscription_sync;
+
+# Create xl_running_xacts on the primary to speed up restart_lsn advancement.
+$primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot();");
+
+# Confirm from the log that the slot is sync-ready now.
+$standby1->wait_for_log(
+    qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
+    $log_offset);
+
+$h->quit;
+
+# Verify that the logical failover slot is created on the standby,
+# marked as 'synced', and persisted.
+is( $standby1->safe_psql(
+		'postgres',
+		q{SELECT count(*) = 1 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot') AND synced AND NOT temporary;}
+	),
+	"t",
+	'logical slots are synced after API retry on standby');
+
+# Drop the subscription and the tables created, then recreate the lsub1_slot slot so that it can
+# be used later.
+$subscriber1->safe_psql('postgres',"DROP SUBSCRIPTION regress_mysub1");
+$primary->safe_psql('postgres',"DROP TABLE push_wal");
+$subscriber1->safe_psql('postgres',"DROP TABLE push_wal");
+$primary->psql('postgres',
+	q{SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);}
+);
+
 ##################################################
 # Test that if the synchronized slot is invalidated while the remote slot is
 # still valid, the slot will be dropped and re-created on the standby by
@@ -281,7 +350,7 @@ $inactive_since_on_primary =
 # the failover slots.
 $primary->wait_for_replay_catchup($standby1);
 
-my $log_offset = -s $standby1->logfile;
+$log_offset = -s $standby1->logfile;
 
 # Synchronize the primary server slots to the standby.
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
@@ -554,7 +623,7 @@ $standby1->reload;
 # Confirm that slot sync worker acknowledge the GUC change and logs the msg
 # about wrong configuration.
 $standby1->wait_for_log(
-	qr/slot synchronization worker will restart because of a parameter change/,
+	qr/replication slot synchronization will stop because of a parameter change/,
 	$log_offset);
 $standby1->wait_for_log(
 	qr/slot synchronization requires "hot_standby_feedback" to be enabled/,
-- 
2.47.3

#108shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#107)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Nov 21, 2025 at 9:14 AM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching patch v24, addressing the above comments.

Thanks for the patch. Please find a few comments:

1)
Instead of passing an argument to slotsync_reread_config and
ProcessSlotSyncInterrupts, we can use 'AmLogicalSlotSyncWorkerProcess'
to distinguish the worker and API.

2)
Also, since we are not using a separate memory context, we don't need
the structure 'SlotSyncApiFailureParams' to free slot_names on failure.
slot_names will be freed with the memory-context itself when
exec_simple_query finishes.

3)
- if (old_sync_replication_slots != sync_replication_slots)
+ /* Worker-specific check for sync_replication_slots change */
+ if (!from_api && old_sync_replication_slots != sync_replication_slots)
  {
  ereport(LOG,
- /* translator: %s is a GUC variable name */
- errmsg("replication slot synchronization worker will shut down
because \"%s\" is disabled", "sync_replication_slots"));
+ /* translator: %s is a GUC variable name */
+ errmsg("replication slot synchronization worker will shut down
because \"%s\" is disabled",
+    "sync_replication_slots"));
  proc_exit(0);
  }

Here, we need not have different flows for the API and the worker. Both
can quit syncing when this parameter is changed. The idea is that if
someone enables 'sync_replication_slots' while the API is working, that
means we need to start the slot-sync worker, so it is okay if the API
notices this and exits too.

4)
+ if (from_api)
+ elevel = ERROR;
+ else
+ elevel = LOG;
- proc_exit(0);
+ ereport(elevel,
+ errmsg("replication slot synchronization will stop because of a
parameter change"));
+

We can do:
ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR, ...);

5)
SlotSyncCtx->syncing = true;

+ /* The worker pid must not be already assigned in SlotSyncCtx */
+ Assert(SlotSyncCtx->pid == InvalidPid);
+

We can shift Assert before we set the shared-memory flag 'SlotSyncCtx->syncing'.
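
To make comments 1 and 4 concrete, the suggested shape is roughly as
follows (a sketch under the review's assumptions, not an actual hunk;
AmLogicalSlotSyncWorkerProcess() is the existing backend-type check):

	/*
	 * Sketch only: branch on the backend type instead of a from_api
	 * flag; LOG for the worker (which then exits), ERROR for the SQL
	 * function pg_sync_replication_slots().
	 */
	if (conninfo_changed ||
		primary_slotname_changed ||
		(old_hot_standby_feedback != hot_standby_feedback))
	{
		ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
				errmsg("replication slot synchronization will stop because of a parameter change"));

		/* Only the worker reaches here; ERROR above does not return */
		SlotSyncCtx->last_start_time = 0;
		proc_exit(0);
	}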

thanks
Shveta

#109Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#108)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Nov 21, 2025 at 8:10 PM shveta malik <shveta.malik@gmail.com> wrote:

On Fri, Nov 21, 2025 at 9:14 AM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching patch v24, addressing the above comments.

Thanks for the patch. Please find a few comments:

1)
Instead of passing an argument to slotsync_reread_config and
ProcessSlotSyncInterrupts, we can use 'AmLogicalSlotSyncWorkerProcess'
to distinguish the worker and API.

Changed as such.

2)
Also, since we are not using a separate memory context, we don't need
the structure 'SlotSyncApiFailureParams' to free slot_names on failure.
slot_names will be freed with the memory-context itself when
exec_simple_query finishes.

Removed.

3)
- if (old_sync_replication_slots != sync_replication_slots)
+ /* Worker-specific check for sync_replication_slots change */
+ if (!from_api && old_sync_replication_slots != sync_replication_slots)
{
ereport(LOG,
- /* translator: %s is a GUC variable name */
- errmsg("replication slot synchronization worker will shut down
because \"%s\" is disabled", "sync_replication_slots"));
+ /* translator: %s is a GUC variable name */
+ errmsg("replication slot synchronization worker will shut down
because \"%s\" is disabled",
+    "sync_replication_slots"));
proc_exit(0);
}

Here, we need not have different flows for the API and the worker. Both
can quit syncing when this parameter is changed. The idea is that if
someone enables 'sync_replication_slots' while the API is working, that
means we need to start the slot-sync worker, so it is okay if the API
notices this and exits too.

Changed, but used a different error message.

4)
+ if (from_api)
+ elevel = ERROR;
+ else
+ elevel = LOG;
- proc_exit(0);
+ ereport(elevel,
+ errmsg("replication slot synchronization will stop because of a
parameter change"));
+

We can do:
ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR, ...);

Changed as such.

5)
SlotSyncCtx->syncing = true;

+ /* The worker pid must not be already assigned in SlotSyncCtx */
+ Assert(SlotSyncCtx->pid == InvalidPid);
+

We can shift Assert before we set the shared-memory flag 'SlotSyncCtx->syncing'.

Done.

Attaching patch v25 addressing the above comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v25-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v25-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 02c594bf644fff58690150c91cf0a1bf6e23a499 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Mon, 24 Nov 2025 17:29:17 +1100
Subject: [PATCH v25] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  11 +-
 src/backend/replication/logical/slotsync.c    | 333 ++++++++++++++----
 .../utils/activity/wait_event_names.txt       |   2 +-
 .../t/040_standby_failover_slots_sync.pl      |  79 ++++-
 5 files changed, 347 insertions(+), 82 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index d5a5e22fe2c..33940504622 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,12 +405,11 @@ postgres=# SELECT * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries cyclically to some extent, continuing until all
+      the failover slots that existed on the primary at the start of the function
+      call are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
       Therefore, it is the recommended method for synchronizing slots.
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 8b4afd87dc9..1f3ceb695d1 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the SQL function pg_sync_replication_slots() is used to sync the slots, and if
+ * the slots are not ready to be synced and are marked as RS_TEMPORARY because
+ * of any of the reasons mentioned above, then the SQL function also waits and
+ * retries until the slots are marked as RS_PERSISTENT (which means sync-ready).
+ * Refer to the comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -100,6 +107,16 @@ typedef struct SlotSyncCtxStruct
 	slock_t		mutex;
 } SlotSyncCtxStruct;
 
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+	WalReceiverConn *wrconn;
+	List			*slot_names;
+} SlotSyncApiFailureParams;
+
 static SlotSyncCtxStruct *SlotSyncCtx = NULL;
 
 /* GUC variable */
@@ -553,11 +570,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -580,7 +601,13 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		 * current location when recreating the slot in the next cycle. It may
 		 * take more time to create such a slot. Therefore, we keep this slot
 		 * and attempt the synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the SQL function can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -595,6 +622,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this so that the SQL function can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -618,10 +649,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr;
@@ -715,7 +750,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -784,7 +820,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		ReplicationSlotsComputeRequiredXmin(true);
 		LWLockRelease(ProcArrayLock);
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -795,15 +832,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
+ *
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -812,29 +857,45 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach_ptr(char, slot_name, slot_names)
+		{
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoChar(&query, ')');
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
+	pfree(query.data);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -885,7 +946,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -905,12 +965,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by SQL function
+ * 							  pg_sync_replication_slots to track if any slots
+ * 							  could not be persisted and need to be retried.
+ *
+ * Returns:
+ * TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -926,19 +1012,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1115,13 +1194,14 @@ ValidateSlotSyncParams(int elevel)
 }
 
 /*
- * Re-read the config file.
+ * Re-read the config file for slot synchronization.
+ *
+ * Exit or throw errors if relevant GUCs have changed depending on whether
+ * called from slotsync worker or from SQL function pg_sync_replication_slots()
  *
- * Exit if any of the slot sync GUCs have changed. The postmaster will
- * restart it.
  */
 static void
-slotsync_reread_config(void)
+slotsync_reread_config()
 {
 	char	   *old_primary_conninfo = pstrdup(PrimaryConnInfo);
 	char	   *old_primary_slotname = pstrdup(PrimarySlotName);
@@ -1130,38 +1210,54 @@ slotsync_reread_config(void)
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
 
-	Assert(sync_replication_slots);
+	if (AmLogicalSlotSyncWorkerProcess())
+		Assert(sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
 
 	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
 	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
 	pfree(old_primary_conninfo);
 	pfree(old_primary_slotname);
 
+	/* Worker-specific check for sync_replication_slots change */
 	if (old_sync_replication_slots != sync_replication_slots)
 	{
-		ereport(LOG,
-		/* translator: %s is a GUC variable name */
-				errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled", "sync_replication_slots"));
-		proc_exit(0);
+		if (AmLogicalSlotSyncWorkerProcess())
+		{
+			ereport(LOG,
+					/* translator: %s is a GUC variable name */
+					errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled",
+						   "sync_replication_slots"));
+			proc_exit(0);
+		}
+		else
+			ereport(ERROR,
+					/* translator: %s is a GUC variable name */
+					errmsg("replication slot synchronization will stop because \"%s\" is enabled",
+						   "sync_replication_slots"));
 	}
 
+	/* Check for parameter changes common to both API and worker */
 	if (conninfo_changed ||
 		primary_slotname_changed ||
 		(old_hot_standby_feedback != hot_standby_feedback))
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker will restart because of a parameter change"));
 
-		/*
-		 * Reset the last-start time for this worker so that the postmaster
-		 * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
-		 */
-		SlotSyncCtx->last_start_time = 0;
+		ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
+				errmsg("replication slot synchronization will stop because of a parameter change"));
 
-		proc_exit(0);
+		if (AmLogicalSlotSyncWorkerProcess())
+		{
+			/*
+			 * Reset the last-start time for this worker so that the postmaster
+			 * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
+			 */
+			SlotSyncCtx->last_start_time = 0;
+			proc_exit(0);
+		}
 	}
 
 }
@@ -1170,16 +1266,24 @@ slotsync_reread_config(void)
  * Interrupt handler for main loop of slot sync worker.
  */
 static void
-ProcessSlotSyncInterrupts(void)
+ProcessSlotSyncInterrupts()
 {
 	CHECK_FOR_INTERRUPTS();
 
-	if (ShutdownRequestPending)
+	if (SlotSyncCtx->stopSignaled)
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker is shutting down on receiving SIGINT"));
+		if (!AmLogicalSlotSyncWorkerProcess())
+			ereport(ERROR,
+					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					errmsg("cannot continue replication slots synchronization"
+						   " as standby promotion is triggered"));
+		else
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker is shutting down on receiving SIGUSR1"));
 
-		proc_exit(0);
+			proc_exit(0);
+		}
 	}
 
 	if (ConfigReloadPending)
@@ -1290,9 +1394,6 @@ check_and_set_sync_info(pid_t worker_pid)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
-	/* The worker pid must not be already assigned in SlotSyncCtx */
-	Assert(worker_pid == InvalidPid || SlotSyncCtx->pid == InvalidPid);
-
 	/*
 	 * Emit an error if startup process signaled the slot sync machinery to
 	 * stop. See comments atop SlotSyncCtxStruct.
@@ -1313,6 +1414,9 @@ check_and_set_sync_info(pid_t worker_pid)
 				errmsg("cannot synchronize replication slots concurrently"));
 	}
 
+	/* The worker pid must not be already assigned in SlotSyncCtx */
+	Assert(SlotSyncCtx->pid == InvalidPid);
+
 	SlotSyncCtx->syncing = true;
 
 	/*
@@ -1334,6 +1438,7 @@ reset_syncing_flag()
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 	SlotSyncCtx->syncing = false;
+	SlotSyncCtx->pid = InvalidPid;
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	syncing_slots = false;
@@ -1344,6 +1449,9 @@ reset_syncing_flag()
  *
  * It connects to the primary server, fetches logical failover slots
  * information periodically in order to create and sync the slots.
+ *
+ * Note: If any changes are made here, check if the corresponding SQL
+ * function logic in SyncReplicationSlots also needs to be changed.
  */
 void
 ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
@@ -1408,7 +1516,6 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 
 	/* Setup signal handling */
 	pqsignal(SIGHUP, SignalHandlerForConfigReload);
-	pqsignal(SIGINT, SignalHandlerForShutdownRequest);
 	pqsignal(SIGTERM, die);
 	pqsignal(SIGFPE, FloatExceptionHandler);
 	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
@@ -1505,17 +1612,34 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
 
 	/*
 	 * The slot sync worker can't get here because it will only stop when it
-	 * receives a SIGINT from the startup process, or when there is an error.
+	 * receives a SIGUSR1 from the startup process, or when there is an error.
 	 */
 	Assert(false);
 }
@@ -1542,7 +1666,7 @@ update_synced_slots_inactive_since(void)
 		return;
 
 	/* The slot sync worker or SQL function mustn't be running by now */
-	Assert((SlotSyncCtx->pid == InvalidPid) && !SlotSyncCtx->syncing);
+	Assert(!SlotSyncCtx->syncing);
 
 	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
@@ -1601,7 +1725,7 @@ ShutDownSlotSync(void)
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	if (worker_pid != InvalidPid)
-		kill(worker_pid, SIGINT);
+		kill(worker_pid, SIGUSR1);
 
 	/* Wait for slot sync to end */
 	for (;;)
@@ -1741,20 +1865,95 @@ slotsync_failure_callback(int code, Datum arg)
 	walrcv_disconnect(wrconn);
 }
 
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+
+	foreach_ptr(RemoteSlot, remote_slot, remote_slots)
+	{
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	return slot_names;
+}
+
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready".
+ * Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
+
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
-		check_and_set_sync_info(InvalidPid);
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
+		check_and_set_sync_info(MyProcPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	slot_persistence_pending = false;
+			bool	some_slot_updated = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncInterrupts();
+
+			/* We must be in a valid transaction state */
+			Assert(IsTransactionState());
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+				slot_names = extract_slot_names(remote_slots);
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Done if all slots are persisted, i.e., are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* Wait before retrying */
+			wait_for_slot_activity(some_slot_updated);
+
+		}
+
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index c1ac71ff7f2..92101e12cd6 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,7 +62,7 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot synchronization."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 1627e619b1b..29f7d865171 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -211,21 +211,90 @@ is( $standby1->safe_psql(
 	'synchronized slot has got its own inactive_since');
 
 ##################################################
-# Test that the synchronized slot will be dropped if the corresponding remote
-# slot on the primary server has been dropped.
+# Test that the synchronized slots will be dropped if the corresponding remote
+# slots on the primary server have been dropped.
+# Note: Both slots need to be dropped for the next test to work
 ##################################################
 
 $primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub2_slot');");
+$primary->psql('postgres', "SELECT pg_drop_replication_slot('lsub1_slot');");
 
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
 
 is( $standby1->safe_psql(
 		'postgres',
-		q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name = 'lsub2_slot';}
+		q{SELECT count(*) = 0 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'lsub2_slot');}
 	),
 	"t",
 	'synchronized slot has been dropped');
 
+##################################################
+# Test that pg_sync_replication_slots() on the standby waits and retries
+# until the slot becomes sync-ready (when the remote slot catches up with
+# the locally reserved position).
+##################################################
+
+# Recreate the slot by creating a subscription on the subscriber, keeping it disabled.
+$subscriber1->safe_psql('postgres', qq[
+	CREATE TABLE push_wal (a int);
+	CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, failover = true, enabled = false);]);
+
+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Make sure the DDL changes are synced to the standby
+$primary->wait_for_replay_catchup($standby1);
+
+# Attempt to synchronize slots using the API. The API will continue retrying
+# synchronization until the remote slot catches up.
+# The API will not return until this happens, so to be able to make
+# further calls, run the API in a background process.
+my $log_offset = -s $standby1->logfile;
+
+my $h = $standby1->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr/start/, q(
+	\echo start
+	SELECT pg_sync_replication_slots();
+	));
+
+# Confirm that the slot could not be synced initially.
+$standby1->wait_for_log(
+    qr/could not synchronize replication slot \"lsub1_slot\"/,
+    $log_offset);
+
+# Enable the subscription so that the remote slot catches up
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
+$subscriber1->wait_for_subscription_sync;
+
+# Create xl_running_xacts on the primary to speed up restart_lsn advancement.
+$primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot();");
+
+# Confirm from the log that the slot is sync-ready now.
+$standby1->wait_for_log(
+    qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
+    $log_offset);
+
+$h->quit;
+
+# Verify that the logical failover slot is created on the standby,
+# marked as 'synced', and persisted.
+is( $standby1->safe_psql(
+		'postgres',
+		q{SELECT count(*) = 1 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot') AND synced AND NOT temporary;}
+	),
+	"t",
+	'logical slots are synced after API retry on standby');
+
+# Drop the subscription and the tables created, then recreate the lsub1_slot slot so that it can
+# be used later.
+$subscriber1->safe_psql('postgres',"DROP SUBSCRIPTION regress_mysub1");
+$primary->safe_psql('postgres',"DROP TABLE push_wal");
+$subscriber1->safe_psql('postgres',"DROP TABLE push_wal");
+$primary->psql('postgres',
+	q{SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);}
+);
+
 ##################################################
 # Test that if the synchronized slot is invalidated while the remote slot is
 # still valid, the slot will be dropped and re-created on the standby by
@@ -281,7 +350,7 @@ $inactive_since_on_primary =
 # the failover slots.
 $primary->wait_for_replay_catchup($standby1);
 
-my $log_offset = -s $standby1->logfile;
+$log_offset = -s $standby1->logfile;
 
 # Synchronize the primary server slots to the standby.
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
@@ -554,7 +623,7 @@ $standby1->reload;
 # Confirm that slot sync worker acknowledge the GUC change and logs the msg
 # about wrong configuration.
 $standby1->wait_for_log(
-	qr/slot synchronization worker will restart because of a parameter change/,
+	qr/replication slot synchronization will stop because of a parameter change/,
 	$log_offset);
 $standby1->wait_for_log(
 	qr/slot synchronization requires "hot_standby_feedback" to be enabled/,
-- 
2.47.3

#110shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#109)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Mon, Nov 24, 2025 at 12:12 PM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching patch v25 addressing the above comments.

The patch needs a rebase due to recent commit 76b78721.

thanks
Shveta

#111shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#110)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

A few comments:

1)
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+ WalReceiverConn *wrconn;
+ List *slot_names;
+} SlotSyncApiFailureParams;
+

We can get rid of it now as we do not use it.

2)
ProcessSlotSyncInterrupts():

+ if (!AmLogicalSlotSyncWorkerProcess())
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot continue replication slots synchronization"
+    " as standby promotion is triggered"));
+ else
+ {

Can we please reverse the if-else, i.e. handle the worker first and
then the API? The negated if-condition can be avoided that way.

3)

slotsync_reread_config():
+ /* Worker-specific check for sync_replication_slots change */

Now since we check for both API and worker, this comment is not needed.

4)
- ereport(LOG,
- errmsg("replication slot synchronization worker will restart because
of a parameter change"));

- /*
- * Reset the last-start time for this worker so that the postmaster
- * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
- */
- SlotSyncCtx->last_start_time = 0;
+ ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
+ errmsg("replication slot synchronization will stop because of a
parameter change"));

Here, we should retain the old message for the worker, i.e. 'worker
will restart...', instead of 'synchronization will stop'. I find the
old message better in this case.

5)
slotsync_reread_config() is slightly difficult to follow.
I think in the case of the API, we can display a common error message
instead of two different messages for the 'sync_replication_slots'
change and the rest of the parameters. We can mark whether any of the
parameters changed in both 'if' blocks, and if the current process has
not exited, then at the end, based on 'parameter_changed', we can deal
with the API by giving a common message. Something like:

/*
 * If we have reached here with a parameter change, we must be running
 * in the SQL function; emit an error in such a case.
 */
if (parameter_changed)	/* new variable */
{
	Assert(!AmLogicalSlotSyncWorkerProcess());
	ereport(ERROR,
			errmsg("replication slot synchronization will stop because of a parameter change"));
}
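
Spelled out with the worker branches included, the whole function might
then look like this (a sketch only; parameter_changed is the proposed
variable, and the details are assumptions rather than an actual hunk):

static void
slotsync_reread_config(void)
{
	char	   *old_primary_conninfo = pstrdup(PrimaryConnInfo);
	char	   *old_primary_slotname = pstrdup(PrimarySlotName);
	bool		old_sync_replication_slots = sync_replication_slots;
	bool		old_hot_standby_feedback = hot_standby_feedback;
	bool		parameter_changed = false;

	ConfigReloadPending = false;
	ProcessConfigFile(PGC_SIGHUP);

	/* The worker shuts down if sync_replication_slots is toggled */
	if (old_sync_replication_slots != sync_replication_slots)
	{
		if (AmLogicalSlotSyncWorkerProcess())
		{
			ereport(LOG,
					errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled",
						   "sync_replication_slots"));
			proc_exit(0);
		}
		parameter_changed = true;
	}

	if (strcmp(old_primary_conninfo, PrimaryConnInfo) != 0 ||
		strcmp(old_primary_slotname, PrimarySlotName) != 0 ||
		old_hot_standby_feedback != hot_standby_feedback)
	{
		if (AmLogicalSlotSyncWorkerProcess())
		{
			ereport(LOG,
					errmsg("replication slot synchronization worker will restart because of a parameter change"));

			/* Let the postmaster restart the worker without delay */
			SlotSyncCtx->last_start_time = 0;
			proc_exit(0);
		}
		parameter_changed = true;
	}

	pfree(old_primary_conninfo);
	pfree(old_primary_slotname);

	/*
	 * Reaching here with a parameter change means we are running in the
	 * SQL function; emit a single common error in that case.
	 */
	if (parameter_changed)
	{
		Assert(!AmLogicalSlotSyncWorkerProcess());
		ereport(ERROR,
				errmsg("replication slot synchronization will stop because of a parameter change"));
	}
}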

thanks
Shveta

#112Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#111)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Nov 26, 2025 at 7:42 PM shveta malik <shveta.malik@gmail.com> wrote:

A few comments:

1)
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+ WalReceiverConn *wrconn;
+ List *slot_names;
+} SlotSyncApiFailureParams;
+

We can get rid of it now as we do not use it.

Removed.

2)
ProcessSlotSyncInterrupts():

+ if (!AmLogicalSlotSyncWorkerProcess())
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot continue replication slots synchronization"
+    " as standby promotion is triggered"));
+ else
+ {

Can we please reverse the if-else, i.e. handle the worker first and
then the API? The negated if-condition can be avoided that way.

Changed.

3)

slotsync_reread_config():
+ /* Worker-specific check for sync_replication_slots change */

Now since we check for both API and worker, this comment is not needed.

Removed.

4)
- ereport(LOG,
- errmsg("replication slot synchronization worker will restart because
of a parameter change"));

- /*
- * Reset the last-start time for this worker so that the postmaster
- * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
- */
- SlotSyncCtx->last_start_time = 0;
+ ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
+ errmsg("replication slot synchronization will stop because of a
parameter change"));

Here, we should retain the old message for the worker, i.e. 'worker
will restart...', instead of 'synchronization will stop'. I find the
old message better in this case.

5)
slotsync_reread_config() is slightly difficult to follow.
I think in the case of the API, we can display a common error message
instead of two different messages for the 'sync_replication_slots'
change and the rest of the parameters. We can mark whether any of the
parameters changed in both 'if' blocks, and if the current process has
not exited, then at the end, based on 'parameter_changed', we can deal
with the API by giving a common message. Something like:

/*
 * If we have reached here with a parameter change, we must be running
 * in the SQL function; emit an error in such a case.
 */
if (parameter_changed)	/* new variable */
{
	Assert(!AmLogicalSlotSyncWorkerProcess());
	ereport(ERROR,
			errmsg("replication slot synchronization will stop because of a parameter change"));
}

Fixed as above.

I've addressed the above comments and rebased the patch on the changes
in commit 76b7872; attaching patch v26.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v26-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v26-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 17b02491ccf73f0d01d2fc3aa3492281d571f114 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Fri, 28 Nov 2025 15:04:04 +1100
Subject: [PATCH v26] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.

Author: Ajin Cherian <itsajin@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  11 +-
 src/backend/replication/logical/slotsync.c    | 337 ++++++++++++++----
 .../utils/activity/wait_event_names.txt       |   2 +-
 .../t/040_standby_failover_slots_sync.pl      |  59 ++-
 5 files changed, 323 insertions(+), 90 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index d5a5e22fe2c..33940504622 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,12 +405,11 @@ postgres=# SELECT * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries cyclically to some extent, continuing until all
+      the failover slots that existed on the primary at the start of the function
+      call are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
       Therefore, it is the recommended method for synchronizing slots.
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 1f4f06d467b..76fd8ff2dea 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the SQL function pg_sync_replication_slots() is used to sync the slots, and if
+ * the slots are not ready to be synced and are marked as RS_TEMPORARY because
+ * of any of the reasons mentioned above, then the SQL function also waits and
+ * retries until the slots are marked as RS_PERSISTENT (which means sync-ready).
+ * Refer to the comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -563,11 +570,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -591,7 +602,13 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		 * current location when recreating the slot in the next cycle. It may
 		 * take more time to create such a slot. Therefore, we keep this slot
 		 * and attempt the synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the SQL function can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -606,6 +623,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that the SQL function can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -629,10 +650,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr = GetStandbyFlushRecPtr(NULL);
@@ -734,7 +759,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -831,7 +857,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 			return false;
 		}
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -842,15 +869,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
+ *
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -859,29 +894,45 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach_ptr(char, slot_name, slot_names)
+		{
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoChar(&query, ')');
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
+	pfree(query.data);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -932,7 +983,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -952,12 +1002,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by SQL function
+ * 							  pg_sync_replication_slots to track if any slots
+ * 							  could not be persisted and need to be retried.
+ *
+ * Returns:
+ * TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -973,19 +1049,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1162,13 +1231,14 @@ ValidateSlotSyncParams(int elevel)
 }
 
 /*
- * Re-read the config file.
+ * Re-read the config file for slot synchronization.
+ *
+ * Exit or throw an error if relevant GUCs have changed, depending on whether
+ * this is called from the slotsync worker or the SQL function pg_sync_replication_slots().
  *
- * Exit if any of the slot sync GUCs have changed. The postmaster will
- * restart it.
  */
 static void
-slotsync_reread_config(void)
+slotsync_reread_config()
 {
 	char	   *old_primary_conninfo = pstrdup(PrimaryConnInfo);
 	char	   *old_primary_slotname = pstrdup(PrimarySlotName);
@@ -1176,39 +1246,69 @@ slotsync_reread_config(void)
 	bool		old_hot_standby_feedback = hot_standby_feedback;
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
+	bool		worker = AmLogicalSlotSyncWorkerProcess();
+	bool		parameter_changed = false;
 
-	Assert(sync_replication_slots);
+	if (AmLogicalSlotSyncWorkerProcess())
+		Assert(sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
 
 	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
 	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
 	pfree(old_primary_conninfo);
 	pfree(old_primary_slotname);
 
+	/* check for sync_replication_slots change */
 	if (old_sync_replication_slots != sync_replication_slots)
 	{
-		ereport(LOG,
-		/* translator: %s is a GUC variable name */
-				errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled", "sync_replication_slots"));
-		proc_exit(0);
+		if (worker)
+		{
+			ereport(LOG,
+					/* translator: %s is a GUC variable name */
+					errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled",
+						   "sync_replication_slots"));
+
+			proc_exit(0);
+		}
+
+		parameter_changed = true;
 	}
 
+	/* Check for parameter changes common to both API and worker */
 	if (conninfo_changed ||
 		primary_slotname_changed ||
 		(old_hot_standby_feedback != hot_standby_feedback))
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker will restart because of a parameter change"));
 
-		/*
-		 * Reset the last-start time for this worker so that the postmaster
-		 * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
-		 */
-		SlotSyncCtx->last_start_time = 0;
+		if (worker)
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker will restart because of a parameter change"));
 
-		proc_exit(0);
+			/*
+			 * Reset the last-start time for this worker so that the postmaster
+			 * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
+			 */
+			SlotSyncCtx->last_start_time = 0;
+
+			proc_exit(0);
+		}
+
+		parameter_changed = true;
+	}
+
+	/*
+	 * If we have reached here with a parameter change, we must be running in
+	 * the SQL function; emit an error in such a case.
+	 */
+	if (parameter_changed)
+	{
+		Assert (!worker);
+		ereport(ERROR,
+				errmsg("replication slot synchronization will stop because of a parameter change"));
 	}
 
 }
@@ -1217,16 +1317,24 @@ slotsync_reread_config(void)
  * Interrupt handler for main loop of slot sync worker.
  */
 static void
-ProcessSlotSyncInterrupts(void)
+ProcessSlotSyncInterrupts()
 {
 	CHECK_FOR_INTERRUPTS();
 
-	if (ShutdownRequestPending)
+	if (SlotSyncCtx->stopSignaled)
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker is shutting down on receiving SIGINT"));
+		if (AmLogicalSlotSyncWorkerProcess())
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker is shutting down on receiving SIGUSR1"));
 
-		proc_exit(0);
+			proc_exit(0);
+		}
+		else
+			ereport(ERROR,
+					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					errmsg("cannot continue replication slots synchronization"
+						   " as standby promotion is triggered"));
 	}
 
 	if (ConfigReloadPending)
@@ -1337,9 +1445,6 @@ check_and_set_sync_info(pid_t worker_pid)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
-	/* The worker pid must not be already assigned in SlotSyncCtx */
-	Assert(worker_pid == InvalidPid || SlotSyncCtx->pid == InvalidPid);
-
 	/*
 	 * Emit an error if startup process signaled the slot sync machinery to
 	 * stop. See comments atop SlotSyncCtxStruct.
@@ -1360,6 +1465,9 @@ check_and_set_sync_info(pid_t worker_pid)
 				errmsg("cannot synchronize replication slots concurrently"));
 	}
 
+	/* The worker pid must not be already assigned in SlotSyncCtx */
+	Assert(SlotSyncCtx->pid == InvalidPid);
+
 	SlotSyncCtx->syncing = true;
 
 	/*
@@ -1381,6 +1489,7 @@ reset_syncing_flag()
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 	SlotSyncCtx->syncing = false;
+	SlotSyncCtx->pid = InvalidPid;
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	syncing_slots = false;
@@ -1391,6 +1500,9 @@ reset_syncing_flag()
  *
  * It connects to the primary server, fetches logical failover slots
  * information periodically in order to create and sync the slots.
+ *
+ * Note: If any changes are made here, check if the corresponding SQL
+ * function logic in SyncReplicationSlots also needs to be changed.
  */
 void
 ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
@@ -1455,7 +1567,6 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 
 	/* Setup signal handling */
 	pqsignal(SIGHUP, SignalHandlerForConfigReload);
-	pqsignal(SIGINT, SignalHandlerForShutdownRequest);
 	pqsignal(SIGTERM, die);
 	pqsignal(SIGFPE, FloatExceptionHandler);
 	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
@@ -1552,17 +1663,34 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
 
 	/*
 	 * The slot sync worker can't get here because it will only stop when it
-	 * receives a SIGINT from the startup process, or when there is an error.
+	 * receives a SIGUSR1 from the startup process, or when there is an error.
 	 */
 	Assert(false);
 }
@@ -1589,7 +1717,7 @@ update_synced_slots_inactive_since(void)
 		return;
 
 	/* The slot sync worker or SQL function mustn't be running by now */
-	Assert((SlotSyncCtx->pid == InvalidPid) && !SlotSyncCtx->syncing);
+	Assert(!SlotSyncCtx->syncing);
 
 	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
@@ -1648,7 +1776,7 @@ ShutDownSlotSync(void)
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	if (worker_pid != InvalidPid)
-		kill(worker_pid, SIGINT);
+		kill(worker_pid, SIGUSR1);
 
 	/* Wait for slot sync to end */
 	for (;;)
@@ -1788,20 +1916,95 @@ slotsync_failure_callback(int code, Datum arg)
 	walrcv_disconnect(wrconn);
 }
 
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+
+	foreach_ptr(RemoteSlot, remote_slot, remote_slots)
+	{
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	return slot_names;
+}
+
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready".
+ * Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
+
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
-		check_and_set_sync_info(InvalidPid);
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
+		check_and_set_sync_info(MyProcPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	slot_persistence_pending = false;
+			bool	some_slot_updated = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncInterrupts();
+
+			/* We must be in a valid transaction state */
+			Assert(IsTransactionState());
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+				slot_names = extract_slot_names(remote_slots);
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Done if all slots are persisted i.e are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* Wait before retrying */
+			wait_for_slot_activity(some_slot_updated);
+
+		}
+
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index c1ac71ff7f2..92101e12cd6 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,7 +62,7 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot synchronization."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 7d3c82e0a29..abbb5ea5490 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -1000,6 +1000,12 @@ $primary->psql(
 ));
 
 $subscriber2->safe_psql('postgres', 'DROP SUBSCRIPTION regress_mysub2;');
+$subscriber1->safe_psql('postgres', 'DROP SUBSCRIPTION regress_mysub1;');
+
+# Remove the standby from the synchronized_standby_slots list and reload the
+# configuration.
+$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots', "''");
+$primary->reload;
 
 # Verify that all slots have been removed except the one necessary for standby2,
 # which is needed for further testing.
@@ -1016,34 +1022,47 @@ $primary->safe_psql('postgres', "COMMIT PREPARED 'test_twophase_slotsync';");
 $primary->wait_for_replay_catchup($standby2);
 
 ##################################################
-# Verify that slotsync skip statistics are correctly updated when the
+# Test that pg_sync_replication_slots() on the standby skips and retries
+# until the slot becomes sync-ready (when the remote slot catches up with
+# the locally reserved position).
+# Also verify that slotsync skip statistics are correctly updated when the
 # slotsync operation is skipped.
 ##################################################
 
-# Create a logical replication slot and create some DDL on the primary so
-# that the slot lags behind the standby.
-$primary->safe_psql(
-	'postgres', qq(
-	SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);
-	CREATE TABLE wal_push(a int);
-));
+# Recreate the slot by creating a subscription on the subscriber, keeping it disabled.
+$subscriber1->safe_psql('postgres', qq[
+	CREATE TABLE push_wal (a int);
+	TRUNCATE tab_int;
+	CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, failover = true, enabled = false);]);
+
+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Make sure the DDL changes are synced to the standby
 $primary->wait_for_replay_catchup($standby2);
 
+# Attempt to synchronize slots using API. The API will continue retrying
+# synchronization until the remote slot catches up.
+# The API will not return until this happens; to be able to make
+# further calls, call the API in a background process.
 $log_offset = -s $standby2->logfile;
 
-# Enable slot sync worker
+# Enable standby for slot synchronization
 $standby2->append_conf(
-	'postgresql.conf', qq(
+    'postgresql.conf', qq(
 hot_standby_feedback = on
 primary_conninfo = '$connstr_1 dbname=postgres'
 log_min_messages = 'debug2'
-sync_replication_slots = on
 ));
 
 $standby2->reload;
 
-# Confirm that the slot sync worker is able to start.
-$standby2->wait_for_log(qr/slot sync worker started/, $log_offset);
+my $h = $standby2->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr/start/, q(
+	\echo start
+	SELECT pg_sync_replication_slots();
+	));
 
 # Confirm that the slot sync is skipped due to the remote slot lagging behind
 $standby2->wait_for_log(
@@ -1055,4 +1074,18 @@ $result = $standby2->safe_psql('postgres',
 );
 is($result, 't', "check slot sync skip count increments");
 
+# Enable the Subscription, so that the remote slot catches up
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
+$subscriber1->wait_for_subscription_sync;
+
+# Create xl_running_xacts on the primary to speed up restart_lsn advancement.
+$primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot();");
+
+# Confirm from the log that the slot is sync-ready now.
+$standby2->wait_for_log(
+    qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
+    $log_offset);
+
+$h->quit;
+
 done_testing();
-- 
2.47.3

#113Japin Li
japinli@hotmail.com
In reply to: Ajin Cherian (#112)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, 28 Nov 2025 at 15:46, Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Nov 26, 2025 at 7:42 PM shveta malik <shveta.malik@gmail.com>
wrote:

A few comments:

1)
+/*
+ * Structure holding parameters that need to be freed on error in
+ * pg_sync_replication_slots()
+ */
+typedef struct SlotSyncApiFailureParams
+{
+ WalReceiverConn *wrconn;
+ List *slot_names;
+} SlotSyncApiFailureParams;
+

We can get rid of it now as we do not use it.

Removed.

2)
ProcessSlotSyncInterrupts():

+ if (!AmLogicalSlotSyncWorkerProcess())
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot continue replication slots synchronization"
+    " as standby promotion is triggered"));
+ else
+ {

Can we please reverse the if-else i.e. first worker and then API.
Negated if-condition can be avoided in this case.

Changed.

3)

slotsync_reread_config():
+ /* Worker-specific check for sync_replication_slots change */

Now since we check for both API and worker, this comment is not
needed.

Removed.

4)
- ereport(LOG,
- errmsg("replication slot synchronization worker will restart because of a parameter change"));

- /*
- * Reset the last-start time for this worker so that the postmaster
- * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
- */
- SlotSyncCtx->last_start_time = 0;
+ ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
+ errmsg("replication slot synchronization will stop because of a parameter change"));

Here, we should retain the same old message for the worker, i.e. 'worker
will restart...' instead of 'synchronization will stop'. I find the
old message better in this case.

5)
slotsync_reread_config() is slightly difficult to follow.
I think in the case of API, we can display a common error message
instead of 2 different messages for 'sync_replication_slot change' and
the rest of the parameters. We can mark whether any of the parameters
changed in both 'if' blocks, and if the current process has not exited,
then at the end, based on 'parameter_changed', we can deal with the API
by giving a common message. Something like:

/*
 * If we have reached here with a parameter change, we must be running
 * in the SQL function; emit an error in such a case.
 */
if (parameter_changed)		/* new local variable */
{
	Assert(!AmLogicalSlotSyncWorkerProcess());
	ereport(ERROR,
			errmsg("replication slot synchronization will stop because of a parameter change"));
}

Fixed as above.

I've addressed the above comments as well as rebased the patch based
on changes in commit 76b7872 in patch v26

1.
Initialize slot_persistence_pending to false (to avoid uninitialized values, or
it being initialized to true by mistake) in update_and_persist_local_synced_slot(). This
aligns with the handling of found_consistent_snapshot and remote_slot_precedes
in update_local_synced_slot().

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 20eada3393..c55ba11f17 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -617,6 +617,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
        bool            found_consistent_snapshot = false;
        bool            remote_slot_precedes = false;
+       if (slot_persistence_pending)
+               *slot_persistence_pending = false;
+
        /* Slotsync skip stats are handled in function update_local_synced_slot() */
        (void) update_local_synced_slot(remote_slot, remote_dbid,
                                                                        &found_consistent_snapshot,

2.
This change seems unnecessary.

 static void
-slotsync_reread_config(void)
+slotsync_reread_config()
 static void
-ProcessSlotSyncInterrupts(void)
+ProcessSlotSyncInterrupts()

3.
Since we are already caching the result of AmLogicalSlotSyncWorkerProcess() in
a local worker variable, how about applying this replacement:
s/if (AmLogicalSlotSyncWorkerProcess())/if (worker)/g?

+	bool		worker = AmLogicalSlotSyncWorkerProcess();
+	bool		parameter_changed = false;
-	Assert(sync_replication_slots);
+	if (AmLogicalSlotSyncWorkerProcess())
+		Assert(sync_replication_slots);

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.

#114shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#112)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Nov 28, 2025 at 10:16 AM Ajin Cherian <itsajin@gmail.com> wrote:

Fixed as above.

I've addressed the above comments as well as rebased the patch based
on changes in commit 76b7872 in patch v26

Thanks for the patch. Please find a few trivial comments:

1)
+ if (AmLogicalSlotSyncWorkerProcess())
+ Assert(sync_replication_slots);

Here too we can use 'worker'.

2)
+ /* check for sync_replication_slots change */

check --> Check

3)
Assert (!worker)
Extra space in between.

4)
check_and_set_sync_info() and ShutDownSlotSync() refer to the pid as
worker_pid. But now it could be a backend pid as well.
Using 'worker' in this variable name could be misleading. Shall we make it
sync_process_pid?

5)
/*
* Interrupt handler for main loop of slot sync worker.
*/
static void
ProcessSlotSyncInterrupts()

We can modify the comment to include API as well.

6)
/*
* Shut down the slot sync worker.
*
* This function sends signal to shutdown slot sync worker, if required. It
* also waits till the slot sync worker has exited or
* pg_sync_replication_slots() has finished.
*/
void
ShutDownSlotSync(void)

We should change comments to give details on API as well.

7)
+# Remove the standby from the synchronized_standby_slots list and reload the
+# configuration.
+$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots', "''");
+$primary->reload;

We can update the comment to below for better clarity:
Remove the dropped sb1_slot from the ...

8)
+# Attempt to synchronize slots using API. The API will continue retrying
+# synchronization until the remote slot catches up.
+# The API will not return until this happens, to be able to make
+# further calls, call the API in a background process.

We can move these comments atop:
my $h = $standby2->background_psql('postgres', on_error_stop => 0);

thanks
Shveta

#115Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#114)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Nov 28, 2025 at 5:03 PM Japin Li <japinli@hotmail.com> wrote:

1.
Initialize slot_persistence_pending to false (to avoid uninitialized values, or
it being initialized to true by mistake) in update_and_persist_local_synced_slot(). This
aligns with the handling of found_consistent_snapshot and remote_slot_precedes
in update_local_synced_slot().

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 20eada3393..c55ba11f17 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -617,6 +617,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
bool            found_consistent_snapshot = false;
bool            remote_slot_precedes = false;
+       if (slot_persistence_pending)
+               *slot_persistence_pending = false;
+
/* Slotsync skip stats are handled in function update_local_synced_slot() */
(void) update_local_synced_slot(remote_slot, remote_dbid,
&found_consistent_snapshot,

I don't understand what this comment is suggesting here.

2.
This change seems unnecessary。

static void
-slotsync_reread_config(void)
+slotsync_reread_config()
static void
-ProcessSlotSyncInterrupts(void)
+ProcessSlotSyncInterrupts()

Fixed.

3.
Since we are already caching the result of AmLogicalSlotSyncWorkerProcess() in
a local worker variable, how about applying this replacement:
s/if (AmLogicalSlotSyncWorkerProcess())/if (worker)/g?

+       bool            worker = AmLogicalSlotSyncWorkerProcess();
+       bool            parameter_changed = false;
-       Assert(sync_replication_slots);
+       if (AmLogicalSlotSyncWorkerProcess())
+               Assert(sync_replication_slots);

Fixed.

On Fri, Nov 28, 2025 at 5:25 PM shveta malik <shveta.malik@gmail.com> wrote:

Thanks for the patch. Please find a few trivial comments:

1)
+ if (AmLogicalSlotSyncWorkerProcess())
+ Assert(sync_replication_slots);

Here too we can use 'worker'.

Fixed.

2)
+ /* check for sync_replication_slots change */

check --> Check

Fixed.

3)
Assert (!worker)
Extra space in between.

Fixed.

4)
check_and_set_sync_info() and ShutDownSlotSync() refers to the pid as
worker_pid. But now it could be backend-pid as well.
Using 'worker' in this variable could be misleading. Shall we make it
sync_process_pid?

Changed.

5)
/*
* Interrupt handler for main loop of slot sync worker.
*/
static void
ProcessSlotSyncInterrupts()

We can modify the comment to include API as well.

Changed.

6)
/*
* Shut down the slot sync worker.
*
* This function sends signal to shutdown slot sync worker, if required. It
* also waits till the slot sync worker has exited or
* pg_sync_replication_slots() has finished.
*/
void
ShutDownSlotSync(void)

We should change comments to give details on API as well.

Changed.

7)
+# Remove the standby from the synchronized_standby_slots list and reload the
+# configuration.
+$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots', "''");
+$primary->reload;

We can update the comment to below for better clarity:
Remove the dropped sb1_slot from the ...

Changed.

8)
+# Attempt to synchronize slots using API. The API will continue retrying
+# synchronization until the remote slot catches up.
+# The API will not return until this happens, to be able to make
+# further calls, call the API in a background process.

We can move these comments atop:
my $h = $standby2->background_psql('postgres', on_error_stop => 0);

Changed.

Attached patch v27 addresses the above comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v27-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v27-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From ae74c15cfb4349a94284dea108fbe7a632e202d1 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Tue, 2 Dec 2025 18:27:25 +1100
Subject: [PATCH v27] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.

Author: Ajin Cherian <itsajin@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  11 +-
 src/backend/replication/logical/slotsync.c    | 350 ++++++++++++++----
 .../utils/activity/wait_event_names.txt       |   2 +-
 .../t/040_standby_failover_slots_sync.pl      |  59 ++-
 5 files changed, 330 insertions(+), 96 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index d5a5e22fe2c..33940504622 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,12 +405,11 @@ postgres=# SELECT * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries cyclically to some extent, continuing until all
+      the failover slots that existed on the primary at the start of the function
+      call are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
       Therefore, it is the recommended method for synchronizing slots.
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 53c7d629239..a57d38735c5 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the SQL function pg_sync_replication_slots() is used to sync the slots, and if
+ * the slots are not ready to be synced and are marked as RS_TEMPORARY because
+ * of any of the reasons mentioned above, then the SQL function also waits and
+ * retries until the slots are marked as RS_PERSISTENT (which means sync-ready).
+ * Refer to the comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -596,11 +603,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -624,7 +635,13 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		 * current location when recreating the slot in the next cycle. It may
 		 * take more time to create such a slot. Therefore, we keep this slot
 		 * and attempt the synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the SQL function can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -639,6 +656,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that the SQL function can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -662,10 +683,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr = GetStandbyFlushRecPtr(NULL);
@@ -767,7 +792,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -864,7 +890,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 			return false;
 		}
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -875,15 +902,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
+ *
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -892,29 +927,45 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach_ptr(char, slot_name, slot_names)
+		{
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoChar(&query, ')');
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
+	pfree(query.data);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -965,7 +1016,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -985,12 +1035,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by SQL function
+ * 							  pg_sync_replication_slots to track if any slots
+ * 							  could not be persisted and need to be retried.
+ *
+ * Returns:
+ * TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -1006,19 +1082,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1195,10 +1264,11 @@ ValidateSlotSyncParams(int elevel)
 }
 
 /*
- * Re-read the config file.
+ * Re-read the config file for slot synchronization.
+ *
+ * Exit or throw an error if relevant GUCs have changed, depending on whether
+ * this is called from the slotsync worker or the SQL function pg_sync_replication_slots().
  *
- * Exit if any of the slot sync GUCs have changed. The postmaster will
- * restart it.
  */
 static void
 slotsync_reread_config(void)
@@ -1209,57 +1279,96 @@ slotsync_reread_config(void)
 	bool		old_hot_standby_feedback = hot_standby_feedback;
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
+	bool		worker = AmLogicalSlotSyncWorkerProcess();
+	bool		parameter_changed = false;
 
-	Assert(sync_replication_slots);
+	if (worker)
+		Assert(sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
 
 	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
 	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
 	pfree(old_primary_conninfo);
 	pfree(old_primary_slotname);
 
+	/* Check for sync_replication_slots change */
 	if (old_sync_replication_slots != sync_replication_slots)
 	{
-		ereport(LOG,
-		/* translator: %s is a GUC variable name */
-				errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled", "sync_replication_slots"));
-		proc_exit(0);
+		if (worker)
+		{
+			ereport(LOG,
+					/* translator: %s is a GUC variable name */
+					errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled",
+						   "sync_replication_slots"));
+
+			proc_exit(0);
+		}
+
+		parameter_changed = true;
 	}
 
+	/* Check for parameter changes common to both API and worker */
 	if (conninfo_changed ||
 		primary_slotname_changed ||
 		(old_hot_standby_feedback != hot_standby_feedback))
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker will restart because of a parameter change"));
 
-		/*
-		 * Reset the last-start time for this worker so that the postmaster
-		 * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
-		 */
-		SlotSyncCtx->last_start_time = 0;
+		if (worker)
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker will restart because of a parameter change"));
 
-		proc_exit(0);
+			/*
+			 * Reset the last-start time for this worker so that the postmaster
+			 * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
+			 */
+			SlotSyncCtx->last_start_time = 0;
+
+			proc_exit(0);
+		}
+
+		parameter_changed = true;
+	}
+
+	/*
+	 * If we have reached here with a parameter change, we must be running in
+	 * the SQL function; emit an error in such a case.
+	 */
+	if (parameter_changed)
+	{
+		Assert(!worker);
+		ereport(ERROR,
+				errmsg("replication slot synchronization will stop because of a parameter change"));
 	}
 
 }
 
 /*
- * Interrupt handler for main loop of slot sync worker.
+ * Interrupt handler for main loop of slot sync worker and
+ * SQL function pg_sync_replication_slots().
  */
 static void
 ProcessSlotSyncInterrupts(void)
 {
 	CHECK_FOR_INTERRUPTS();
 
-	if (ShutdownRequestPending)
+	if (SlotSyncCtx->stopSignaled)
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker is shutting down on receiving SIGINT"));
+		if (AmLogicalSlotSyncWorkerProcess())
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker is shutting down on receiving SIGUSR1"));
 
-		proc_exit(0);
+			proc_exit(0);
+		}
+		else
+			ereport(ERROR,
+					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					errmsg("cannot continue replication slots synchronization"
+						   " as standby promotion is triggered"));
 	}
 
 	if (ConfigReloadPending)
@@ -1366,13 +1475,10 @@ wait_for_slot_activity(bool some_slot_updated)
  * Otherwise, advertise that a sync is in progress.
  */
 static void
-check_and_set_sync_info(pid_t worker_pid)
+check_and_set_sync_info(pid_t sync_process_pid)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
-	/* The worker pid must not be already assigned in SlotSyncCtx */
-	Assert(worker_pid == InvalidPid || SlotSyncCtx->pid == InvalidPid);
-
 	/*
 	 * Emit an error if startup process signaled the slot sync machinery to
 	 * stop. See comments atop SlotSyncCtxStruct.
@@ -1393,13 +1499,16 @@ check_and_set_sync_info(pid_t worker_pid)
 				errmsg("cannot synchronize replication slots concurrently"));
 	}
 
+	/* The sync process pid must not be already assigned in SlotSyncCtx */
+	Assert(SlotSyncCtx->pid == InvalidPid);
+
 	SlotSyncCtx->syncing = true;
 
 	/*
 	 * Advertise the required PID so that the startup process can kill the
-	 * slot sync worker on promotion.
+	 * slot sync process on promotion.
 	 */
-	SlotSyncCtx->pid = worker_pid;
+	SlotSyncCtx->pid = sync_process_pid;
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
@@ -1414,6 +1523,7 @@ reset_syncing_flag()
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 	SlotSyncCtx->syncing = false;
+	SlotSyncCtx->pid = InvalidPid;
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	syncing_slots = false;
@@ -1424,6 +1534,9 @@ reset_syncing_flag()
  *
  * It connects to the primary server, fetches logical failover slots
  * information periodically in order to create and sync the slots.
+ *
+ * Note: If any changes are made here, check if the corresponding SQL
+ * function logic in SyncReplicationSlots also needs to be changed.
  */
 void
 ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
@@ -1488,7 +1601,6 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 
 	/* Setup signal handling */
 	pqsignal(SIGHUP, SignalHandlerForConfigReload);
-	pqsignal(SIGINT, SignalHandlerForShutdownRequest);
 	pqsignal(SIGTERM, die);
 	pqsignal(SIGFPE, FloatExceptionHandler);
 	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
@@ -1585,17 +1697,34 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
 
 	/*
 	 * The slot sync worker can't get here because it will only stop when it
-	 * receives a SIGINT from the startup process, or when there is an error.
+	 * receives a SIGUSR1 from the startup process, or when there is an error.
 	 */
 	Assert(false);
 }
@@ -1622,7 +1751,7 @@ update_synced_slots_inactive_since(void)
 		return;
 
 	/* The slot sync worker or SQL function mustn't be running by now */
-	Assert((SlotSyncCtx->pid == InvalidPid) && !SlotSyncCtx->syncing);
+	Assert(!SlotSyncCtx->syncing);
 
 	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
@@ -1652,14 +1781,14 @@ update_synced_slots_inactive_since(void)
 /*
  * Shut down the slot sync worker.
  *
- * This function sends signal to shutdown slot sync worker, if required. It
+ * This function sends signal to shutdown the slot sync process, if required. It
  * also waits till the slot sync worker has exited or
  * pg_sync_replication_slots() has finished.
  */
 void
 ShutDownSlotSync(void)
 {
-	pid_t		worker_pid;
+	pid_t		sync_process_pid;
 
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
@@ -1676,12 +1805,12 @@ ShutDownSlotSync(void)
 		return;
 	}
 
-	worker_pid = SlotSyncCtx->pid;
+	sync_process_pid = SlotSyncCtx->pid;
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
-	if (worker_pid != InvalidPid)
-		kill(worker_pid, SIGINT);
+	if (sync_process_pid != InvalidPid)
+		kill(sync_process_pid, SIGUSR1);
 
 	/* Wait for slot sync to end */
 	for (;;)
@@ -1821,20 +1950,95 @@ slotsync_failure_callback(int code, Datum arg)
 	walrcv_disconnect(wrconn);
 }
 
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+
+	foreach_ptr(RemoteSlot, remote_slot, remote_slots)
+	{
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	return slot_names;
+}
+
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready".
+ * Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
+
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
-		check_and_set_sync_info(InvalidPid);
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
+		check_and_set_sync_info(MyProcPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	slot_persistence_pending = false;
+			bool	some_slot_updated = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncInterrupts();
+
+			/* We must be in a valid transaction state */
+			Assert(IsTransactionState());
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+				slot_names = extract_slot_names(remote_slots);
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Done if all slots are persisted, i.e., are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* Wait before retrying */
+			wait_for_slot_activity(some_slot_updated);
+
+		}
+
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index c1ac71ff7f2..92101e12cd6 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,7 +62,7 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot synchronization."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 25777fa188c..8f63bfbb977 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -1000,6 +1000,12 @@ $primary->psql(
 ));
 
 $subscriber2->safe_psql('postgres', 'DROP SUBSCRIPTION regress_mysub2;');
+$subscriber1->safe_psql('postgres', 'DROP SUBSCRIPTION regress_mysub1;');
+
+# Remove the dropped sb1_slot from the synchronized_standby_slots list and reload the
+# configuration.
+$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots', "''");
+$primary->reload;
 
 # Verify that all slots have been removed except the one necessary for standby2,
 # which is needed for further testing.
@@ -1016,34 +1022,47 @@ $primary->safe_psql('postgres', "COMMIT PREPARED 'test_twophase_slotsync';");
 $primary->wait_for_replay_catchup($standby2);
 
 ##################################################
-# Verify that slotsync skip statistics are correctly updated when the
+# Test that pg_sync_replication_slots() on the standby skips and retries
+# until the slot becomes sync-ready (when the remote slot catches up with
+# the locally reserved position).
+# Also verify that slotsync skip statistics are correctly updated when the
 # slotsync operation is skipped.
 ##################################################
 
-# Create a logical replication slot and create some DDL on the primary so
-# that the slot lags behind the standby.
-$primary->safe_psql(
-	'postgres', qq(
-	SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);
-	CREATE TABLE wal_push(a int);
-));
+# Recreate the slot by creating a subscription on the subscriber, keeping it disabled.
+$subscriber1->safe_psql('postgres', qq[
+	CREATE TABLE push_wal (a int);
+	TRUNCATE tab_int;
+	CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, failover = true, enabled = false);]);
+
+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Make sure the DDL changes are synced to the standby
 $primary->wait_for_replay_catchup($standby2);
 
 $log_offset = -s $standby2->logfile;
 
-# Enable slot sync worker
+# Enable standby for slot synchronization
 $standby2->append_conf(
-	'postgresql.conf', qq(
+    'postgresql.conf', qq(
 hot_standby_feedback = on
 primary_conninfo = '$connstr_1 dbname=postgres'
 log_min_messages = 'debug2'
-sync_replication_slots = on
 ));
 
 $standby2->reload;
 
-# Confirm that the slot sync worker is able to start.
-$standby2->wait_for_log(qr/slot sync worker started/, $log_offset);
+# Attempt to synchronize slots using the API. The API will keep retrying
+# synchronization until the remote slot catches up and will not return
+# until that happens, so run the API in a background process to be able
+# to issue further queries.
+my $h = $standby2->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr/start/, q(
+	\echo start
+	SELECT pg_sync_replication_slots();
+	));
 
 # Confirm that the slot sync is skipped due to the remote slot lagging behind
 $standby2->wait_for_log(
@@ -1061,4 +1080,18 @@ $result = $standby2->safe_psql('postgres',
 );
 is($result, 't', "check slot sync skip count increments");
 
+# Enable the subscription so that the remote slot catches up
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
+$subscriber1->wait_for_subscription_sync;
+
+# Create xl_running_xacts on the primary to speed up restart_lsn advancement.
+$primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot();");
+
+# Confirm from the log that the slot is sync-ready now.
+$standby2->wait_for_log(
+    qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
+    $log_offset);
+
+$h->quit;
+
 done_testing();
-- 
2.47.3

#116shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#115)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Dec 2, 2025 at 1:58 PM Ajin Cherian <itsajin@gmail.com> wrote:

Attached patch v27 addresses the above comments.

Thanks for the patch. Please find a few comments:

1)
+ /* The worker pid must not be already assigned in SlotSyncCtx */
+ Assert(SlotSyncCtx->pid == InvalidPid);
+

We can mention just 'pid' here instead of 'worker pid'

2)
+ /*
+ * The syscache access in fetch_or_refresh_remote_slots() needs a
+ * transaction env.
+ */
fetch_or_refresh_remote_slots --> fetch_remote_slots

3)
SyncReplicationSlots(WalReceiverConn *wrconn)
{
+

We can get rid of this blank line at the start of the function.

4)
 /*
  * Shut down the slot sync worker.
  *
- * This function sends signal to shutdown slot sync worker, if required. It
+ * This function sends signal to shutdown the slot sync process, if
required. It
  * also waits till the slot sync worker has exited or
  * pg_sync_replication_slots() has finished.
  */
Shall we change comment to something like (rephrase if required):

Shut down the slot synchronization.
This function wakes up the slot sync process (either worker or backend
running SQL function) and sets stopSignaled=true
so that worker can exit or SQL function pg_sync_replication_slots()
can finish. It also waits till the slot sync worker has exited or
pg_sync_replication_slots() has finished.

5)
We should change the comment atop 'SlotSyncCtxStruct' as well to
mention that this pid is either the slot sync worker's pid or
backend's pid running the SQL function. It is needed by the startup
process to wake these up, so that they can stop synchronization on
seeing stopSignaled. <please rephrase as needed>

6)
+ ereport(LOG,
+ errmsg("replication slot synchronization worker is shutting down on
receiving SIGUSR1"));

SIGUSR1 was actually just a wake-up signal. We may change the comment to:
replication slot synchronization worker is shutting down as promotion
is triggered.

7)
update_synced_slots_inactive_since:
/* The slot sync worker or SQL function mustn't be running by now */
- Assert((SlotSyncCtx->pid == InvalidPid) && !SlotSyncCtx->syncing);
+ Assert(!SlotSyncCtx->syncing);

Regarding this, I see that 'update_synced_slots_inactive_since' is
only called when we are sure that 'syncing' is false. So shouldn't pid
also be Invalid by that time? Even if it was the backend's pid to start
with, since the backend has stopped syncing (finished or errored out),
pid should be reset to Invalid in such a case, and this Assert need
not be changed.

8)

+ if (sync_process_pid != InvalidPid)
+ kill(sync_process_pid, SIGUSR1);

We can write comments to say wake-up slot sync process.
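
To make points 4, 6 and 8 concrete, here is a minimal generic sketch of
the pattern (this is not the patch code; SyncCtxSketch and the function
names are hypothetical, and a POSIX environment is assumed). The
signaling side sets the shared flag first and sends SIGUSR1 purely as a
wake-up, so the sleeping side always decides based on the flag:

#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

typedef struct
{
	volatile sig_atomic_t stopSignaled;	/* set before waking the peer */
	pid_t		pid;		/* worker or backend to wake, or -1 for none */
} SyncCtxSketch;

/* Promotion side: flag first, then wake whoever is syncing. */
static void
signal_stop(SyncCtxSketch *ctx)
{
	ctx->stopSignaled = 1;
	if (ctx->pid != (pid_t) -1)
		kill(ctx->pid, SIGUSR1);	/* wake-up only; carries no request */
}

/* Syncing side: checked at the top of each loop iteration. */
static bool
should_stop(const SyncCtxSketch *ctx)
{
	return ctx->stopSignaled != 0;
}

With this shape, the same SIGUSR1 works for both the worker (which
exits) and a backend in pg_sync_replication_slots() (which errors out),
since each reacts to the flag in its own way.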

thanks
Shveta

#117Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#116)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Dec 2, 2025 at 8:35 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Dec 2, 2025 at 1:58 PM Ajin Cherian <itsajin@gmail.com> wrote:

Attached patch v27 addresses the above comments.

Thanks for the patch. Please find a few comments:

1)
+ /* The worker pid must not be already assigned in SlotSyncCtx */
+ Assert(SlotSyncCtx->pid == InvalidPid);
+

We can mention just 'pid' here instead of 'worker pid'

Changed.

2)
+ /*
+ * The syscache access in fetch_or_refresh_remote_slots() needs a
+ * transaction env.
+ */
fetch_or_refresh_remote_slots --> fetch_remote_slots

3)
SyncReplicationSlots(WalReceiverConn *wrconn)
{
+

We can get rid of this blank line at the start of the function.

Fixed.

4)
/*
* Shut down the slot sync worker.
*
- * This function sends signal to shutdown slot sync worker, if required. It
+ * This function sends signal to shutdown the slot sync process, if
required. It
* also waits till the slot sync worker has exited or
* pg_sync_replication_slots() has finished.
*/
Shall we change comment to something like (rephrase if required):

Shut down the slot synchronization.
This function wakes up the slot sync process (either worker or backend
running SQL function) and sets stopSignaled=true
so that worker can exit or SQL function pg_sync_replication_slots()
can finish. It also waits till the slot sync worker has exited or
pg_sync_replication_slots() has finished.

Changed.

5)
We should change the comment atop 'SlotSyncCtxStruct' as well to
mention that this pid is either the slot sync worker's pid or
backend's pid running the SQL function. It is needed by the startup
process to wake these up, so that they can stop synchronization on
seeing stopSignaled. <please rephrase as needed>

Changed.

6)
+ ereport(LOG,
+ errmsg("replication slot synchronization worker is shutting down on
receiving SIGUSR1"));

SIGUSR1 was actually just a wake-up signal. We may change the comment to:
replication slot synchronization worker is shutting down as promotion
is triggered.

Changed.

7)
update_synced_slots_inactive_since:
/* The slot sync worker or SQL function mustn't be running by now */
- Assert((SlotSyncCtx->pid == InvalidPid) && !SlotSyncCtx->syncing);
+ Assert(!SlotSyncCtx->syncing);

Regarding this, I see that 'update_synced_slots_inactive_since' is
only called when we are sure that 'syncing' is false. So shouldn't pid
also be Invalid by that time? Even if it was the backend's pid to start
with, since the backend has stopped syncing (finished or errored out),
pid should be reset to Invalid in such a case, and this Assert need
not be changed.

Fixed.

8)

+ if (sync_process_pid != InvalidPid)
+ kill(sync_process_pid, SIGUSR1);

We can write comments to say wake-up slot sync process.

Added comments.

Attaching patch v28 addressing these comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v28-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v28-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 60d18dd19bed14e85673a73a01fc1e5a8df46921 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Wed, 3 Dec 2025 14:16:44 +1100
Subject: [PATCH v28] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances, we retry
synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.

Author: Ajin Cherian <itsajin@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  11 +-
 src/backend/replication/logical/slotsync.c    | 364 ++++++++++++++----
 .../utils/activity/wait_event_names.txt       |   2 +-
 .../t/040_standby_failover_slots_sync.pl      |  59 ++-
 5 files changed, 339 insertions(+), 101 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index d5a5e22fe2c..33940504622 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,12 +405,11 @@ postgres=# SELECT * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries cyclically to some extent, continuing until all
+      the failover slots that existed on the primary at the start of the function
+      call are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
       Therefore, it is the recommended method for synchronizing slots.
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 53c7d629239..36106dd35e1 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the SQL function pg_sync_replication_slots() is used to sync the slots,
+ * and the slots are not ready to be synced and are marked as RS_TEMPORARY because
+ * of any of the reasons mentioned above, then the SQL function also waits and
+ * retries until the slots are marked as RS_PERSISTENT (which means sync-ready).
+ * Refer to the comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -71,9 +78,11 @@
 /*
  * Struct for sharing information to control slot synchronization.
  *
- * The slot sync worker's pid is needed by the startup process to shut it
- * down during promotion. The startup process shuts down the slot sync worker
- * and also sets stopSignaled=true to handle the race condition when the
+ * The pid is either the slot sync worker's pid or the backend's pid running
+ * the SQL function pg_sync_replication_slots(). It is needed by the startup
+ * process to wake these up, so that they can stop synchronization on seeing
+ * stopSignaled on promotion.
+ * Setting stopSignaled is also used to handle the race condition when the
  * postmaster has not noticed the promotion yet and thus may end up restarting
  * the slot sync worker. If stopSignaled is set, the worker will exit in such a
  * case. The SQL function pg_sync_replication_slots() will also error out if
@@ -596,11 +605,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -624,7 +637,13 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		 * current location when recreating the slot in the next cycle. It may
 		 * take more time to create such a slot. Therefore, we keep this slot
 		 * and attempt the synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the SQL function can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -639,6 +658,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that SQL function can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -662,10 +685,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr = GetStandbyFlushRecPtr(NULL);
@@ -767,7 +794,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -864,7 +892,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 			return false;
 		}
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -875,15 +904,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
+ *
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -892,29 +929,45 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach_ptr(char, slot_name, slot_names)
+		{
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoChar(&query, ')');
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
+	pfree(query.data);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -965,7 +1018,6 @@ synchronize_slots(WalReceiverConn *wrconn)
 		remote_slot->invalidated = isnull ? RS_INVAL_NONE :
 			GetSlotInvalidationCause(TextDatumGetCString(d));
 
-		/* Sanity check */
 		Assert(col == SLOTSYNC_COLUMN_COUNT);
 
 		/*
@@ -985,12 +1037,38 @@ synchronize_slots(WalReceiverConn *wrconn)
 			remote_slot->invalidated == RS_INVAL_NONE)
 			pfree(remote_slot);
 		else
-			/* Create list of remote slots */
 			remote_slot_list = lappend(remote_slot_list, remote_slot);
 
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by SQL function
+ * 							  pg_sync_replication_slots to track if any slots
+ * 							  could not be persisted and need to be retried.
+ *
+ * Returns:
+ * TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -1006,19 +1084,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1195,10 +1266,11 @@ ValidateSlotSyncParams(int elevel)
 }
 
 /*
- * Re-read the config file.
+ * Re-read the config file for slot synchronization.
+ *
+ * Exit or throw errors if relevant GUCs have changed depending on whether
+ * called from the slotsync worker or from the SQL function
+ * pg_sync_replication_slots().
- * Exit if any of the slot sync GUCs have changed. The postmaster will
- * restart it.
  */
 static void
 slotsync_reread_config(void)
@@ -1209,57 +1281,96 @@ slotsync_reread_config(void)
 	bool		old_hot_standby_feedback = hot_standby_feedback;
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
+	bool		worker = AmLogicalSlotSyncWorkerProcess();
+	bool		parameter_changed = false;
 
-	Assert(sync_replication_slots);
+	if (worker)
+		Assert(sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
 
 	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
 	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
 	pfree(old_primary_conninfo);
 	pfree(old_primary_slotname);
 
+	/* Check for sync_replication_slots change */
 	if (old_sync_replication_slots != sync_replication_slots)
 	{
-		ereport(LOG,
-		/* translator: %s is a GUC variable name */
-				errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled", "sync_replication_slots"));
-		proc_exit(0);
+		if (worker)
+		{
+			ereport(LOG,
+					/* translator: %s is a GUC variable name */
+					errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled",
+						   "sync_replication_slots"));
+
+			proc_exit(0);
+		}
+
+		parameter_changed = true;
 	}
 
+	/* Check for parameter changes common to both API and worker */
 	if (conninfo_changed ||
 		primary_slotname_changed ||
 		(old_hot_standby_feedback != hot_standby_feedback))
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker will restart because of a parameter change"));
 
-		/*
-		 * Reset the last-start time for this worker so that the postmaster
-		 * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
-		 */
-		SlotSyncCtx->last_start_time = 0;
+		if (worker)
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker will restart because of a parameter change"));
 
-		proc_exit(0);
+			/*
+			 * Reset the last-start time for this worker so that the postmaster
+			 * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
+			 */
+			SlotSyncCtx->last_start_time = 0;
+
+			proc_exit(0);
+		}
+
+		parameter_changed = true;
+	}
+
+	/*
+	 * If we have reached here with a parameter change, we must be running
+	 * in the SQL function, so emit an error in that case.
+	 */
+	if (parameter_changed)
+	{
+		Assert(!worker);
+		ereport(ERROR,
+				errmsg("replication slot synchronization will stop because of a parameter change"));
 	}
 
 }
 
 /*
- * Interrupt handler for main loop of slot sync worker.
+ * Interrupt handler for main loop of slot sync worker and
+ * SQL function pg_sync_replication_slots().
  */
 static void
 ProcessSlotSyncInterrupts(void)
 {
 	CHECK_FOR_INTERRUPTS();
 
-	if (ShutdownRequestPending)
+	if (SlotSyncCtx->stopSignaled)
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker is shutting down on receiving SIGINT"));
+		if (AmLogicalSlotSyncWorkerProcess())
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker is shutting down as promotion is triggered"));
 
-		proc_exit(0);
+			proc_exit(0);
+		}
+		else
+			ereport(ERROR,
+					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					errmsg("cannot continue replication slots synchronization"
+						   " as standby promotion is triggered"));
 	}
 
 	if (ConfigReloadPending)
@@ -1366,13 +1477,10 @@ wait_for_slot_activity(bool some_slot_updated)
  * Otherwise, advertise that a sync is in progress.
  */
 static void
-check_and_set_sync_info(pid_t worker_pid)
+check_and_set_sync_info(pid_t sync_process_pid)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
-	/* The worker pid must not be already assigned in SlotSyncCtx */
-	Assert(worker_pid == InvalidPid || SlotSyncCtx->pid == InvalidPid);
-
 	/*
 	 * Emit an error if startup process signaled the slot sync machinery to
 	 * stop. See comments atop SlotSyncCtxStruct.
@@ -1393,13 +1501,16 @@ check_and_set_sync_info(pid_t worker_pid)
 				errmsg("cannot synchronize replication slots concurrently"));
 	}
 
+	/* The pid must not be already assigned in SlotSyncCtx */
+	Assert(SlotSyncCtx->pid == InvalidPid);
+
 	SlotSyncCtx->syncing = true;
 
 	/*
 	 * Advertise the required PID so that the startup process can kill the
-	 * slot sync worker on promotion.
+	 * slot sync process on promotion.
 	 */
-	SlotSyncCtx->pid = worker_pid;
+	SlotSyncCtx->pid = sync_process_pid;
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
@@ -1414,6 +1525,7 @@ reset_syncing_flag()
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 	SlotSyncCtx->syncing = false;
+	SlotSyncCtx->pid = InvalidPid;
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	syncing_slots = false;
@@ -1424,6 +1536,9 @@ reset_syncing_flag()
  *
  * It connects to the primary server, fetches logical failover slots
  * information periodically in order to create and sync the slots.
+ *
+ * Note: If any changes are made here, check if the corresponding SQL
+ * function logic in SyncReplicationSlots also needs to be changed.
  */
 void
 ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
@@ -1488,7 +1603,6 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 
 	/* Setup signal handling */
 	pqsignal(SIGHUP, SignalHandlerForConfigReload);
-	pqsignal(SIGINT, SignalHandlerForShutdownRequest);
 	pqsignal(SIGTERM, die);
 	pqsignal(SIGFPE, FloatExceptionHandler);
 	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
@@ -1585,17 +1699,34 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_or_refresh_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
 
 	/*
 	 * The slot sync worker can't get here because it will only stop when it
-	 * receives a SIGINT from the startup process, or when there is an error.
+	 * receives a SIGUSR1 from the startup process, or when there is an error.
 	 */
 	Assert(false);
 }
@@ -1650,16 +1781,18 @@ update_synced_slots_inactive_since(void)
 }
 
 /*
- * Shut down the slot sync worker.
+ * Shut down slot synchronization.
  *
- * This function sends signal to shutdown slot sync worker, if required. It
- * also waits till the slot sync worker has exited or
- * pg_sync_replication_slots() has finished.
+ * This function wakes up the slot sync process (either worker or backend
+ * running SQL function pg_sync_replication_slots()) and sets
+ * stopSignaled=true so that the worker can exit or the SQL function
+ * pg_sync_replication_slots() can finish. It also waits till the slot sync
+ * worker has exited or pg_sync_replication_slots() has finished.
  */
 void
 ShutDownSlotSync(void)
 {
-	pid_t		worker_pid;
+	pid_t		sync_process_pid;
 
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
@@ -1676,12 +1809,13 @@ ShutDownSlotSync(void)
 		return;
 	}
 
-	worker_pid = SlotSyncCtx->pid;
+	sync_process_pid = SlotSyncCtx->pid;
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
-	if (worker_pid != InvalidPid)
-		kill(worker_pid, SIGINT);
+	/* Wake up slot sync process */
+	if (sync_process_pid != InvalidPid)
+		kill(sync_process_pid, SIGUSR1);
 
 	/* Wait for slot sync to end */
 	for (;;)
@@ -1821,20 +1955,94 @@ slotsync_failure_callback(int code, Datum arg)
 	walrcv_disconnect(wrconn);
 }
 
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+
+	foreach_ptr(RemoteSlot, remote_slot, remote_slots)
+	{
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	return slot_names;
+}
+
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready".
+ * Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
-		check_and_set_sync_info(InvalidPid);
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
+		check_and_set_sync_info(MyProcPid);
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	slot_persistence_pending = false;
+			bool	some_slot_updated = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncInterrupts();
+
+			/* We must be in a valid transaction state */
+			Assert(IsTransactionState());
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+				slot_names = extract_slot_names(remote_slots);
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Done if all slots are persisted, i.e., are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* Wait before retrying */
+			wait_for_slot_activity(some_slot_updated);
+
+		}
+
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index c1ac71ff7f2..92101e12cd6 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,7 +62,7 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot synchronization."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 25777fa188c..8f63bfbb977 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -1000,6 +1000,12 @@ $primary->psql(
 ));
 
 $subscriber2->safe_psql('postgres', 'DROP SUBSCRIPTION regress_mysub2;');
+$subscriber1->safe_psql('postgres', 'DROP SUBSCRIPTION regress_mysub1;');
+
+# Remove the dropped sb1_slot from the synchronized_standby_slots list and reload the
+# configuration.
+$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots', "''");
+$primary->reload;
 
 # Verify that all slots have been removed except the one necessary for standby2,
 # which is needed for further testing.
@@ -1016,34 +1022,47 @@ $primary->safe_psql('postgres', "COMMIT PREPARED 'test_twophase_slotsync';");
 $primary->wait_for_replay_catchup($standby2);
 
 ##################################################
-# Verify that slotsync skip statistics are correctly updated when the
+# Test that pg_sync_replication_slots() on the standby skips and retries
+# until the slot becomes sync-ready (when the remote slot catches up with
+# the locally reserved position).
+# Also verify that slotsync skip statistics are correctly updated when the
 # slotsync operation is skipped.
 ##################################################
 
-# Create a logical replication slot and create some DDL on the primary so
-# that the slot lags behind the standby.
-$primary->safe_psql(
-	'postgres', qq(
-	SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);
-	CREATE TABLE wal_push(a int);
-));
+# Recreate the slot by creating a subscription on the subscriber, keeping it disabled.
+$subscriber1->safe_psql('postgres', qq[
+	CREATE TABLE push_wal (a int);
+	TRUNCATE tab_int;
+	CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, failover = true, enabled = false);]);
+
+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Make sure the DDL changes are synced to the standby
 $primary->wait_for_replay_catchup($standby2);
 
 $log_offset = -s $standby2->logfile;
 
-# Enable slot sync worker
+# Enable standby for slot synchronization
 $standby2->append_conf(
-	'postgresql.conf', qq(
+    'postgresql.conf', qq(
 hot_standby_feedback = on
 primary_conninfo = '$connstr_1 dbname=postgres'
 log_min_messages = 'debug2'
-sync_replication_slots = on
 ));
 
 $standby2->reload;
 
-# Confirm that the slot sync worker is able to start.
-$standby2->wait_for_log(qr/slot sync worker started/, $log_offset);
+# Attempt to synchronize slots using the API. The API will keep retrying
+# synchronization until the remote slot catches up and will not return
+# until that happens, so run the API in a background process to be able
+# to issue further queries.
+my $h = $standby2->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr/start/, q(
+	\echo start
+	SELECT pg_sync_replication_slots();
+	));
 
 # Confirm that the slot sync is skipped due to the remote slot lagging behind
 $standby2->wait_for_log(
@@ -1061,4 +1080,18 @@ $result = $standby2->safe_psql('postgres',
 );
 is($result, 't', "check slot sync skip count increments");
 
+# Enable the subscription so that the remote slot catches up
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
+$subscriber1->wait_for_subscription_sync;
+
+# Create xl_running_xacts on the primary to speed up restart_lsn advancement.
+$primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot();");
+
+# Confirm from the log that the slot is sync-ready now.
+$standby2->wait_for_log(
+    qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
+    $log_offset);
+
+$h->quit;
+
 done_testing();
-- 
2.47.3

#118shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#117)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Dec 3, 2025 at 8:51 AM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching patch v28 addressing these comments.

Thanks for the patch. A few trivial comments:

1)
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot continue replication slots synchronization"
+    " as standby promotion is triggered"));

slots->slot, since in every error message we use 'replication slot
synchronization'.

2)
+ * The pid is either the slot sync worker's pid or the backend's pid running
+ * the SQL function pg_sync_replication_slots(). It is needed by the startup
+ * process to wake these up, so that they can stop synchronization on seeing
+ * stopSignaled on promotion.
+ * Setting stopSignaled is also used to handle the race condition when the

Can we rephrase slightly to indicate clearly that it is the startup
process which sets 'stopSignaled' during promotion? Suggestion:

The pid is either the slot sync worker’s pid or the backend’s pid running
the SQL function pg_sync_replication_slots(). When the startup process
sets stopSignaled during promotion, it uses this pid to wake the
currently synchronizing process so that the process can immediately stop its
synchronization work upon seeing stopSignaled set to true.
Setting stopSignaled....

thanks
Shveta

#119Japin Li
japinli@hotmail.com
In reply to: Ajin Cherian (#115)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, 02 Dec 2025 at 19:27, Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Nov 28, 2025 at 5:03 PM Japin Li <japinli@hotmail.com> wrote:

1.
Initialize slot_persistence_pending to false (to avoid uninitialized values, or
initialize to true by mistake) in update_and_persist_local_synced_slot(). This
aligns with the handling of found_consistent_snapshot and remote_slot_precedes
in update_local_synced_slot().

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 20eada3393..c55ba11f17 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -617,6 +617,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
bool            found_consistent_snapshot = false;
bool            remote_slot_precedes = false;
+       if (slot_persistence_pending)
+               *slot_persistence_pending = false;
+
/* Slotsync skip stats are handled in function update_local_synced_slot() */
(void) update_local_synced_slot(remote_slot, remote_dbid,
&found_consistent_snapshot,

I don't understand what the comment here means.

I mean, we should always set the slot_persistence_pending variable to false
immediately when entering the update_and_persist_local_synced_slot() function.

For example:

bool slot_persistence_pending = true;

update_and_persist_local_synced_slot(..., &slot_persistence_pending);

/* Here the slot_persistence_pending is always true, is this expected? */
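
To make the hazard concrete, here is a tiny standalone illustration
(maybe_mark_pending is a hypothetical stand-in, not the patch function;
whether a reset-on-entry is actually right here also depends on whether
the flag is meant to accumulate across several slots within one sync
cycle):

#include <stdbool.h>
#include <stdio.h>

/* Callee that only ever sets the flag: clearing it on entry keeps a
 * stale value from the caller from surviving the call. */
static void
maybe_mark_pending(bool *pending, bool needs_retry)
{
	if (pending)
		*pending = false;	/* reset on entry, as suggested above */
	if (needs_retry && pending)
		*pending = true;
}

int
main(void)
{
	bool		pending = true;	/* stale value from a previous iteration */

	maybe_mark_pending(&pending, false);
	printf("pending = %s\n", pending ? "true" : "false");	/* prints "false" */
	return 0;
}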

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.

#120Amit Kapila
amit.kapila16@gmail.com
In reply to: Ajin Cherian (#117)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Dec 3, 2025 at 8:51 AM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching patch v28 addressing these comments.

Can we extract the part of the patch that handles the SIGUSR1 signal
separately as a first patch and the remaining as a second patch?
Please do mention the reason in the commit message as to why we are
changing the signal from SIGINT to SIGUSR1.

--
With Regards,
Amit Kapila.

#121Ajin Cherian
itsajin@gmail.com
In reply to: Amit Kapila (#120)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Dec 3, 2025 at 10:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 3, 2025 at 8:51 AM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching patch v28 addressing these comments.

Can we extract the part of the patch that handles the SIGUSR1 signal
separately as a first patch and the remaining as a second patch?
Please do mention the reason in the commit message as to why we are
changing the signal from SIGINT to SIGUSR1.

I have extracted the SIGUSR1 signal handling changes into a separate
patch, which I am sharing here. I will share the next patch later.
Let me know if there are any comments on this patch.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v29-0001-Use-SIGUSR1-to-interrupt-slot-synchronization-wo.patchapplication/octet-stream; name=v29-0001-Use-SIGUSR1-to-interrupt-slot-synchronization-wo.patchDownload
From 2ecc4a7658c795435edd07323378536bfeaac8a6 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Thu, 4 Dec 2025 16:01:25 +1100
Subject: [PATCH v29] Use SIGUSR1 to interrupt slot synchronization workers and
 backends.

Previously, during promotion, only the slot synchronization worker was
interrupted with SIGINT to shut it down. That meant backends
that perform slot synchronization via the pg_sync_replication_slots()
SQL function were not signalled at all because their PIDs were not
recorded in the slot-sync context.

This patch changes behaviour to:
1. Store the backend PID in SlotSyncCtxStruct so the backend performing
   slot synchronization can be signalled.
2. On promotion, send SIGUSR1 (not SIGINT) to the recorded PID - either
   the slot-sync worker or any backend currently syncing slots.
3. Backends invoking pg_sync_replication_slots() also call
   ProcessSlotSyncInterrupts() to handle the promotion signal as well as
   any configuration changes that might require stopping
   synchronization.

This patch also acts as a base for a larger patch that improves
pg_sync_replication_slots() to wait for slots to be persisted before
exiting.

Author: Ajin Cherian <itsajin@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
---
 src/backend/replication/logical/slotsync.c | 143 +++++++++++++--------
 1 file changed, 90 insertions(+), 53 deletions(-)

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 31d7cb3ca77..359b64f163e 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -71,9 +71,12 @@
 /*
  * Struct for sharing information to control slot synchronization.
  *
- * The slot sync worker's pid is needed by the startup process to shut it
- * down during promotion. The startup process shuts down the slot sync worker
- * and also sets stopSignaled=true to handle the race condition when the
+ * The pid is either the slot sync worker's pid or the backend's pid running
+ * the SQL function pg_sync_replication_slots(). When the startup process sets
+ * stopSignaled during promotion, it uses this pid to wake up the currently
+ * synchronizing process so that the process can immediately stop its
+ * synchronizing work on seeing stopSignaled set.
+ * Setting stopSignaled is also used to handle the race condition when the
  * postmaster has not noticed the promotion yet and thus may end up restarting
  * the slot sync worker. If stopSignaled is set, the worker will exit in such a
  * case. The SQL function pg_sync_replication_slots() will also error out if
@@ -1195,10 +1198,11 @@ ValidateSlotSyncParams(int elevel)
 }
 
 /*
- * Re-read the config file.
+ * Re-read the config file for slot synchronization.
+ *
+ * Exit or throw errors if relevant GUCs have changed depending on whether
+ * called from slotsync worker or from SQL function pg_sync_replication_slots()
  *
- * Exit if any of the slot sync GUCs have changed. The postmaster will
- * restart it.
  */
 static void
 slotsync_reread_config(void)
@@ -1209,57 +1213,96 @@ slotsync_reread_config(void)
 	bool		old_hot_standby_feedback = hot_standby_feedback;
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
+	bool		worker = AmLogicalSlotSyncWorkerProcess();
+	bool		parameter_changed = false;
 
-	Assert(sync_replication_slots);
+	if (worker)
+		Assert(sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
 
 	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
 	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
 	pfree(old_primary_conninfo);
 	pfree(old_primary_slotname);
 
+	/* Check for sync_replication_slots change */
 	if (old_sync_replication_slots != sync_replication_slots)
 	{
-		ereport(LOG,
-		/* translator: %s is a GUC variable name */
-				errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled", "sync_replication_slots"));
-		proc_exit(0);
+		if (worker)
+		{
+			ereport(LOG,
+					/* translator: %s is a GUC variable name */
+					errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled",
+						   "sync_replication_slots"));
+
+			proc_exit(0);
+		}
+
+		parameter_changed = true;
 	}
 
+	/* Check for parameter changes common to both API and worker */
 	if (conninfo_changed ||
 		primary_slotname_changed ||
 		(old_hot_standby_feedback != hot_standby_feedback))
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker will restart because of a parameter change"));
 
-		/*
-		 * Reset the last-start time for this worker so that the postmaster
-		 * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
-		 */
-		SlotSyncCtx->last_start_time = 0;
+		if (worker)
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker will restart because of a parameter change"));
 
-		proc_exit(0);
+			/*
+			 * Reset the last-start time for this worker so that the postmaster
+			 * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
+			 */
+			SlotSyncCtx->last_start_time = 0;
+
+			proc_exit(0);
+		}
+
+		parameter_changed = true;
+	}
+
+	/*
+	 * If we have reached here with a parameter change, we must be running in
+	 * the SQL function; emit an error in that case.
+	 */
+	if (parameter_changed)
+	{
+		Assert(!worker);
+		ereport(ERROR,
+				errmsg("replication slot synchronization will stop because of a parameter change"));
 	}
 
 }
 
 /*
- * Interrupt handler for main loop of slot sync worker.
+ * Interrupt handler for main loop of slot sync worker and
+ * SQL function pg_sync_replication_slots().
  */
 static void
 ProcessSlotSyncInterrupts(void)
 {
 	CHECK_FOR_INTERRUPTS();
 
-	if (ShutdownRequestPending)
+	if (SlotSyncCtx->stopSignaled)
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker is shutting down on receiving SIGINT"));
+		if (AmLogicalSlotSyncWorkerProcess())
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker is shutting down as promotion is triggered"));
 
-		proc_exit(0);
+			proc_exit(0);
+		}
+		else
+			ereport(ERROR,
+					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					errmsg("cannot continue replication slot synchronization"
+						   " as standby promotion is triggered"));
 	}
 
 	if (ConfigReloadPending)
@@ -1366,25 +1409,10 @@ wait_for_slot_activity(bool some_slot_updated)
  * Otherwise, advertise that a sync is in progress.
  */
 static void
-check_and_set_sync_info(pid_t worker_pid)
+check_and_set_sync_info(pid_t sync_process_pid)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
-	/* The worker pid must not be already assigned in SlotSyncCtx */
-	Assert(worker_pid == InvalidPid || SlotSyncCtx->pid == InvalidPid);
-
-	/*
-	 * Emit an error if startup process signaled the slot sync machinery to
-	 * stop. See comments atop SlotSyncCtxStruct.
-	 */
-	if (SlotSyncCtx->stopSignaled)
-	{
-		SpinLockRelease(&SlotSyncCtx->mutex);
-		ereport(ERROR,
-				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				errmsg("cannot synchronize replication slots when standby promotion is ongoing"));
-	}
-
 	if (SlotSyncCtx->syncing)
 	{
 		SpinLockRelease(&SlotSyncCtx->mutex);
@@ -1393,13 +1421,16 @@ check_and_set_sync_info(pid_t worker_pid)
 				errmsg("cannot synchronize replication slots concurrently"));
 	}
 
+	/* The pid must not be already assigned in SlotSyncCtx */
+	Assert(SlotSyncCtx->pid == InvalidPid);
+
 	SlotSyncCtx->syncing = true;
 
 	/*
 	 * Advertise the required PID so that the startup process can kill the
-	 * slot sync worker on promotion.
+	 * slot sync process on promotion.
 	 */
-	SlotSyncCtx->pid = worker_pid;
+	SlotSyncCtx->pid = sync_process_pid;
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
@@ -1414,6 +1445,7 @@ reset_syncing_flag(void)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 	SlotSyncCtx->syncing = false;
+	SlotSyncCtx->pid = InvalidPid;
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	syncing_slots = false;
@@ -1488,7 +1520,6 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 
 	/* Setup signal handling */
 	pqsignal(SIGHUP, SignalHandlerForConfigReload);
-	pqsignal(SIGINT, SignalHandlerForShutdownRequest);
 	pqsignal(SIGTERM, die);
 	pqsignal(SIGFPE, FloatExceptionHandler);
 	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
@@ -1595,7 +1626,7 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 
 	/*
 	 * The slot sync worker can't get here because it will only stop when it
-	 * receives a SIGINT from the startup process, or when there is an error.
+	 * receives a SIGUSR1 from the startup process, or when there is an error.
 	 */
 	Assert(false);
 }
@@ -1650,16 +1681,18 @@ update_synced_slots_inactive_since(void)
 }
 
 /*
- * Shut down the slot sync worker.
+ * Shut down slot synchronization.
  *
- * This function sends signal to shutdown slot sync worker, if required. It
- * also waits till the slot sync worker has exited or
- * pg_sync_replication_slots() has finished.
+ * This function wakes up the slot sync process (either worker or backend
+ * running SQL function pg_sync_replication_slots()) and sets
+ * stopSignaled=true so that worker can exit or SQL function
+ * pg_sync_replication_slots() can finish. It also waits till the slot sync
+ * worker has exited or pg_sync_replication_slots() has finished.
  */
 void
 ShutDownSlotSync(void)
 {
-	pid_t		worker_pid;
+	pid_t		sync_process_pid;
 
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
@@ -1676,12 +1709,13 @@ ShutDownSlotSync(void)
 		return;
 	}
 
-	worker_pid = SlotSyncCtx->pid;
+	sync_process_pid = SlotSyncCtx->pid;
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
-	if (worker_pid != InvalidPid)
-		kill(worker_pid, SIGINT);
+	/* Wake up slot sync process */
+	if (sync_process_pid != InvalidPid)
+		kill(sync_process_pid, SIGUSR1);
 
 	/* Wait for slot sync to end */
 	for (;;)
@@ -1830,7 +1864,10 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
-		check_and_set_sync_info(InvalidPid);
+		check_and_set_sync_info(MyProcPid);
+
+		/* Check for interrupts and config changes */
+		ProcessSlotSyncInterrupts();
 
 		validate_remote_info(wrconn);
 
-- 
2.47.3

#122shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#121)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, Dec 4, 2025 at 10:51 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Dec 3, 2025 at 10:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 3, 2025 at 8:51 AM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching patch v28 addressing these comments.

Can we extract the part of the patch that handles SIGUSR1 signal
separately as a first patch and the remaining as a second patch?
Please do mention the reason in the commit message as to why we are
changing the signal from SIGINT to SIGUSR1.

I have extracted out the SIGUSR1 signal handling changes separately
into a patch and am sharing it here. I will share the next patch later.
Let me know if there are any comments for this patch.

I have just 2 trivial comments for v29-001:

1)
-   * receives a SIGINT from the startup process, or when there is an error.
+   * receives a SIGUSR1 from the startup process, or when there is an error.

In the above, we should mention stopSignaled rather than SIGUSR1, as
SIGUSR1 is just a wakeup signal and not a termination signal.

 2)
+    else
+      ereport(ERROR,
+          errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+          errmsg("cannot continue replication slot synchronization"
+               " as standby promotion is triggered"));

Please mention that it is the SQL function in the comment for the else-block.
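
To illustrate the point in (1): SIGUSR1 only wakes the sleeping process;
it is the shared stopSignaled flag, set by the startup process before
sending the signal, that actually ends synchronization. A minimal
standalone sketch of that pattern (hypothetical code, not the slotsync
implementation; in PostgreSQL the wakeup goes through
procsignal_sigusr1_handler and the latch machinery):

#include <signal.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Stands in for SlotSyncCtx->stopSignaled in shared memory; the startup
 * process would set it to true before sending SIGUSR1.
 */
static volatile bool stopSignaled = false;

static void
wakeup_handler(int signo)
{
	(void) signo;				/* wake up only; never terminate here */
}

int
main(void)
{
	signal(SIGUSR1, wakeup_handler);

	while (!stopSignaled)		/* the real stop condition */
	{
		/* ... perform one slot sync cycle here ... */
		pause();				/* sleep until any signal arrives */
	}

	/* worker: proc_exit(0); SQL function: ereport(ERROR, ...) */
	return 0;
}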

~~

I tested the touched scenarios and here are the LOGs:

a)
When promotion is ongoing and the startup process has terminated the
slot-sync worker, but the postmaster has not yet noticed the promotion,
it may end up starting the slot-sync worker again. For that scenario, we
get these logs:

11:03:19.712 IST [151559] LOG: replication slot synchronization
worker is shutting down as promotion is triggered
11:03:19.726 IST [151629] LOG: slot sync worker started
11:03:19.795 IST [151629] LOG: replication slot synchronization
worker is shutting down as promotion is triggered

b)
On promotion, the API gets this (originating from ProcessSlotSyncInterrupts now):
postgres=# SELECT pg_sync_replication_slots();
ERROR: cannot continue replication slot synchronization as standby
promotion is triggered

c)
If any parameter is changed between ValidateSlotSyncParams() and
ProcessSlotSyncInterrupts() for the API, we get this:
postgres=# SELECT pg_sync_replication_slots();
ERROR: replication slot synchronization will stop because of a parameter change

--on re-run (originating from ValidateSlotSyncParams())
postgres=# SELECT pg_sync_replication_slots();
ERROR: replication slot synchronization requires
"hot_standby_feedback" to be enabled

~~

The tested scenarios' behaviour looks good to me.

thanks
Shveta

#123Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#122)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, Dec 4, 2025 at 5:04 PM shveta malik <shveta.malik@gmail.com> wrote:

I have just 2 trivial comments for v29-001:

1)
-   * receives a SIGINT from the startup process, or when there is an error.
+   * receives a SIGUSR1 from the startup process, or when there is an error.

In the above, we should mention stopSignaled rather than SIGUSR1, as
SIGUSR1 is just a wakeup signal and not a termination signal.

Fixed.

2)
+    else
+      ereport(ERROR,
+          errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+          errmsg("cannot continue replication slot synchronization"
+               " as standby promotion is triggered"));

Please mention that it is the SQL function in the comment for the else-block.

Fixed.

~~

I tested the touched scenarios and here are the LOGs:

Thanks for testing!

Attaching patch v30 with the above changes addressed. I've also run
pgindent on the changes.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v30-0001-Use-SIGUSR1-to-interrupt-slot-synchronization-wo.patchapplication/octet-stream; name=v30-0001-Use-SIGUSR1-to-interrupt-slot-synchronization-wo.patchDownload
From fbfd2df94ee7738cccca73429e43d2a217c292bc Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Thu, 4 Dec 2025 19:12:21 +1100
Subject: [PATCH v30] Use SIGUSR1 to interrupt slot synchronization workers and
 backends.

Previously, during promotion, only the slot synchronization worker was
interrupted with SIGINT to shut down for promotion. That meant backends
that perform slot synchronization via the pg_sync_replication_slots()
SQL function were not signalled at all because their PIDs were not
recorded in the slot-sync context.

This patch changes behaviour to:
1. Store the backend PID in SlotSyncCtxStruct so the backend performing
   slot synchronization can be signalled.
2. On promotion, send SIGUSR1 (not SIGINT) to the recorded PID - either
   the slot-sync worker or any backend currently syncing slots.
3. Backends invoking pg_sync_replication_slots() also call
   ProcessSlotSyncInterrupts() to handle the promotion signal as well as
   any configuration changes that might result in stopping
   synchronization.

This patch also acts as a base for a larger patch that improves
pg_sync_replication_slots() to wait for slots to be persisted before
exiting.

Author: Ajin Cherian <itsajin@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
---
 src/backend/replication/logical/slotsync.c | 148 +++++++++++++--------
 1 file changed, 95 insertions(+), 53 deletions(-)

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 31d7cb3ca77..64457f71de0 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -71,9 +71,12 @@
 /*
  * Struct for sharing information to control slot synchronization.
  *
- * The slot sync worker's pid is needed by the startup process to shut it
- * down during promotion. The startup process shuts down the slot sync worker
- * and also sets stopSignaled=true to handle the race condition when the
+ * The pid is either the slot sync worker's pid or the backend's pid running
+ * the SQL function pg_sync_replication_slots(). When the startup process sets
+ * stopSignaled during promotion, it uses this pid to wake up the currently
+ * synchronizing process so that the process can immediately stop its
+ * synchronizing work on seeing stopSignaled set.
+ * Setting stopSignaled is also used to handle the race condition when the
  * postmaster has not noticed the promotion yet and thus may end up restarting
  * the slot sync worker. If stopSignaled is set, the worker will exit in such a
  * case. The SQL function pg_sync_replication_slots() will also error out if
@@ -1195,10 +1198,11 @@ ValidateSlotSyncParams(int elevel)
 }
 
 /*
- * Re-read the config file.
+ * Re-read the config file for slot synchronization.
+ *
+ * Exit or throw errors if relevant GUCs have changed depending on whether
+ * called from slotsync worker or from SQL function pg_sync_replication_slots()
  *
- * Exit if any of the slot sync GUCs have changed. The postmaster will
- * restart it.
  */
 static void
 slotsync_reread_config(void)
@@ -1209,57 +1213,100 @@ slotsync_reread_config(void)
 	bool		old_hot_standby_feedback = hot_standby_feedback;
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
+	bool		worker = AmLogicalSlotSyncWorkerProcess();
+	bool		parameter_changed = false;
 
-	Assert(sync_replication_slots);
+	if (worker)
+		Assert(sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
 
 	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
 	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
 	pfree(old_primary_conninfo);
 	pfree(old_primary_slotname);
 
+	/* Check for sync_replication_slots change */
 	if (old_sync_replication_slots != sync_replication_slots)
 	{
-		ereport(LOG,
-		/* translator: %s is a GUC variable name */
-				errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled", "sync_replication_slots"));
-		proc_exit(0);
+		if (worker)
+		{
+			ereport(LOG,
+			/* translator: %s is a GUC variable name */
+					errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled",
+						   "sync_replication_slots"));
+
+			proc_exit(0);
+		}
+
+		parameter_changed = true;
 	}
 
+	/* Check for parameter changes common to both API and worker */
 	if (conninfo_changed ||
 		primary_slotname_changed ||
 		(old_hot_standby_feedback != hot_standby_feedback))
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker will restart because of a parameter change"));
 
-		/*
-		 * Reset the last-start time for this worker so that the postmaster
-		 * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
-		 */
-		SlotSyncCtx->last_start_time = 0;
+		if (worker)
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker will restart because of a parameter change"));
 
-		proc_exit(0);
+			/*
+			 * Reset the last-start time for this worker so that the
+			 * postmaster can restart it without waiting for
+			 * SLOTSYNC_RESTART_INTERVAL_SEC.
+			 */
+			SlotSyncCtx->last_start_time = 0;
+
+			proc_exit(0);
+		}
+
+		parameter_changed = true;
+	}
+
+	/*
+	 * If we have reached here with a parameter change, we must be running in
+	 * the SQL function; emit an error in that case.
+	 */
+	if (parameter_changed)
+	{
+		Assert(!worker);
+		ereport(ERROR,
+				errmsg("replication slot synchronization will stop because of a parameter change"));
 	}
 
 }
 
 /*
- * Interrupt handler for main loop of slot sync worker.
+ * Interrupt handler for main loop of slot sync worker and
+ * SQL function pg_sync_replication_slots().
  */
 static void
 ProcessSlotSyncInterrupts(void)
 {
 	CHECK_FOR_INTERRUPTS();
 
-	if (ShutdownRequestPending)
+	if (SlotSyncCtx->stopSignaled)
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker is shutting down on receiving SIGINT"));
+		if (AmLogicalSlotSyncWorkerProcess())
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker is shutting down as promotion is triggered"));
 
-		proc_exit(0);
+			proc_exit(0);
+		}
+		else
+		{
+			/* For SQL function */
+			ereport(ERROR,
+					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					errmsg("cannot continue replication slot synchronization"
+						   " as standby promotion is triggered"));
+		}
 	}
 
 	if (ConfigReloadPending)
@@ -1366,25 +1413,10 @@ wait_for_slot_activity(bool some_slot_updated)
  * Otherwise, advertise that a sync is in progress.
  */
 static void
-check_and_set_sync_info(pid_t worker_pid)
+check_and_set_sync_info(pid_t sync_process_pid)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
-	/* The worker pid must not be already assigned in SlotSyncCtx */
-	Assert(worker_pid == InvalidPid || SlotSyncCtx->pid == InvalidPid);
-
-	/*
-	 * Emit an error if startup process signaled the slot sync machinery to
-	 * stop. See comments atop SlotSyncCtxStruct.
-	 */
-	if (SlotSyncCtx->stopSignaled)
-	{
-		SpinLockRelease(&SlotSyncCtx->mutex);
-		ereport(ERROR,
-				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				errmsg("cannot synchronize replication slots when standby promotion is ongoing"));
-	}
-
 	if (SlotSyncCtx->syncing)
 	{
 		SpinLockRelease(&SlotSyncCtx->mutex);
@@ -1393,13 +1425,16 @@ check_and_set_sync_info(pid_t worker_pid)
 				errmsg("cannot synchronize replication slots concurrently"));
 	}
 
+	/* The pid must not be already assigned in SlotSyncCtx */
+	Assert(SlotSyncCtx->pid == InvalidPid);
+
 	SlotSyncCtx->syncing = true;
 
 	/*
 	 * Advertise the required PID so that the startup process can kill the
-	 * slot sync worker on promotion.
+	 * slot sync process on promotion.
 	 */
-	SlotSyncCtx->pid = worker_pid;
+	SlotSyncCtx->pid = sync_process_pid;
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
@@ -1414,6 +1449,7 @@ reset_syncing_flag(void)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 	SlotSyncCtx->syncing = false;
+	SlotSyncCtx->pid = InvalidPid;
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	syncing_slots = false;
@@ -1488,7 +1524,6 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 
 	/* Setup signal handling */
 	pqsignal(SIGHUP, SignalHandlerForConfigReload);
-	pqsignal(SIGINT, SignalHandlerForShutdownRequest);
 	pqsignal(SIGTERM, die);
 	pqsignal(SIGFPE, FloatExceptionHandler);
 	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
@@ -1595,7 +1630,8 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 
 	/*
 	 * The slot sync worker can't get here because it will only stop when it
-	 * receives a SIGINT from the startup process, or when there is an error.
+	 * receives a stopSignaled from the startup process, or when there is an
+	 * error.
 	 */
 	Assert(false);
 }
@@ -1650,16 +1686,18 @@ update_synced_slots_inactive_since(void)
 }
 
 /*
- * Shut down the slot sync worker.
+ * Shut down slot synchronization.
  *
- * This function sends signal to shutdown slot sync worker, if required. It
- * also waits till the slot sync worker has exited or
- * pg_sync_replication_slots() has finished.
+ * This function wakes up the slot sync process (either worker or backend
+ * running SQL function pg_sync_replication_slots()) and sets
+ * stopSignaled=true so that worker can exit or SQL function
+ * pg_sync_replication_slots() can finish. It also waits till the slot sync
+ * worker has exited or pg_sync_replication_slots() has finished.
  */
 void
 ShutDownSlotSync(void)
 {
-	pid_t		worker_pid;
+	pid_t		sync_process_pid;
 
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
@@ -1676,12 +1714,13 @@ ShutDownSlotSync(void)
 		return;
 	}
 
-	worker_pid = SlotSyncCtx->pid;
+	sync_process_pid = SlotSyncCtx->pid;
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
-	if (worker_pid != InvalidPid)
-		kill(worker_pid, SIGINT);
+	/* Wake up slot sync process */
+	if (sync_process_pid != InvalidPid)
+		kill(sync_process_pid, SIGUSR1);
 
 	/* Wait for slot sync to end */
 	for (;;)
@@ -1830,7 +1869,10 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
-		check_and_set_sync_info(InvalidPid);
+		check_and_set_sync_info(MyProcPid);
+
+		/* Check for interrupts and config changes */
+		ProcessSlotSyncInterrupts();
 
 		validate_remote_info(wrconn);
 
-- 
2.47.3

#124Ajin Cherian
itsajin@gmail.com
In reply to: Ajin Cherian (#123)
2 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

Hi all,

Since commit 04396ea [1] has been pushed, which included part of the
changes this patch set was addressing, I have updated and rebased the
patch set to incorporate those changes.

The patch set now contains two patches:

0001 – Modify the pg_sync_replication_slots API to also handle
promotion signals and stop synchronization, similar to the slot sync
worker.
0002 – Improve pg_sync_replication_slots to wait for and persist slots
until they are sync-ready.

Please review the updated patch set (v31).

Regards,
Ajin Cherian
Fujitsu Australia

[1]: https://github.com/postgres/postgres/commit/04396eacd3faeaa4fa3d084a6749e4e384bdf0db

Attachments:

v31-0001-Signal-backends-running-pg_sync_replication_slot.patchapplication/octet-stream; name=v31-0001-Signal-backends-running-pg_sync_replication_slot.patchDownload
From 2764cc594b7b8dc947b6c66193691853de5e83f3 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Tue, 9 Dec 2025 21:11:31 +1100
Subject: [PATCH v31 1/2] Signal backends running pg_sync_replication_slots()
 during promotion.

Previously, during promotion, only the slot synchronization worker was
interrupted to shut down for promotion. That meant backends
that perform slot synchronization via the pg_sync_replication_slots()
SQL function were not signalled at all because their PIDs were not
recorded in the slot-sync context.

This patch changes behaviour to:
1. Store the backend PID in SlotSyncCtxStruct so the backend performing
   slot synchronization can be signalled.
2. On promotion, send SIGUSR1 to the recorded PID - either
   the slot-sync worker or any backend currently syncing slots.
3. Backends invoking pg_sync_replication_slots() also call
   ProcessSlotSyncInterrupts() to handle the promotion signal as well as
   any configuration changes that might result in stopping
   synchronization.

This patch also acts as a base for a larger patch that improves
pg_sync_replication_slots() to wait for slots to be persisted before
exiting.

Author: Ajin Cherian <itsajin@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
---
 src/backend/replication/logical/slotsync.c | 145 +++++++++++++--------
 1 file changed, 93 insertions(+), 52 deletions(-)

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 7e3b4c4413e..327bb361d61 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -71,9 +71,12 @@
 /*
  * Struct for sharing information to control slot synchronization.
  *
- * The slot sync worker's pid is needed by the startup process to shut it
- * down during promotion. The startup process shuts down the slot sync worker
- * and also sets stopSignaled=true to handle the race condition when the
+ * The pid is either the slot sync worker's pid or the backend's pid running
+ * the SQL function pg_sync_replication_slots(). When the startup process sets
+ * stopSignaled during promotion, it uses this pid to wake up the currently
+ * synchronizing process so that the process can immediately stop its
+ * synchronizing work on seeing stopSignaled set.
+ * Setting stopSignaled is also used to handle the race condition when the
  * postmaster has not noticed the promotion yet and thus may end up restarting
  * the slot sync worker. If stopSignaled is set, the worker will exit in such a
  * case. The SQL function pg_sync_replication_slots() will also error out if
@@ -1195,10 +1198,11 @@ ValidateSlotSyncParams(int elevel)
 }
 
 /*
- * Re-read the config file.
+ * Re-read the config file for slot synchronization.
+ *
+ * Exit or throw errors if relevant GUCs have changed depending on whether
+ * called from slotsync worker or from SQL function pg_sync_replication_slots()
  *
- * Exit if any of the slot sync GUCs have changed. The postmaster will
- * restart it.
  */
 static void
 slotsync_reread_config(void)
@@ -1209,45 +1213,77 @@ slotsync_reread_config(void)
 	bool		old_hot_standby_feedback = hot_standby_feedback;
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
+	bool		worker = AmLogicalSlotSyncWorkerProcess();
+	bool		parameter_changed = false;
 
-	Assert(sync_replication_slots);
+	if (worker)
+		Assert(sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
 
 	conninfo_changed = strcmp(old_primary_conninfo, PrimaryConnInfo) != 0;
 	primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
 	pfree(old_primary_conninfo);
 	pfree(old_primary_slotname);
 
+	/* Check for sync_replication_slots change */
 	if (old_sync_replication_slots != sync_replication_slots)
 	{
-		ereport(LOG,
-		/* translator: %s is a GUC variable name */
-				errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled", "sync_replication_slots"));
-		proc_exit(0);
+		if (worker)
+		{
+			ereport(LOG,
+			/* translator: %s is a GUC variable name */
+					errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled",
+						   "sync_replication_slots"));
+
+			proc_exit(0);
+		}
+
+		parameter_changed = true;
 	}
 
+	/* Check for parameter changes common to both API and worker */
 	if (conninfo_changed ||
 		primary_slotname_changed ||
 		(old_hot_standby_feedback != hot_standby_feedback))
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker will restart because of a parameter change"));
 
-		/*
-		 * Reset the last-start time for this worker so that the postmaster
-		 * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
-		 */
-		SlotSyncCtx->last_start_time = 0;
+		if (worker)
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker will restart because of a parameter change"));
 
-		proc_exit(0);
+			/*
+			 * Reset the last-start time for this worker so that the
+			 * postmaster can restart it without waiting for
+			 * SLOTSYNC_RESTART_INTERVAL_SEC.
+			 */
+			SlotSyncCtx->last_start_time = 0;
+
+			proc_exit(0);
+		}
+
+		parameter_changed = true;
+	}
+
+	/*
+	 * If we have reached here with a parameter change, we must be running in
+	 * the SQL function; emit an error in that case.
+	 */
+	if (parameter_changed)
+	{
+		Assert(!worker);
+		ereport(ERROR,
+				errmsg("replication slot synchronization will stop because of a parameter change"));
 	}
 
 }
 
 /*
- * Interrupt handler for main loop of slot sync worker.
+ * Interrupt handler for main loop of slot sync worker and
+ * SQL function pg_sync_replication_slots().
  */
 static void
 ProcessSlotSyncInterrupts(void)
@@ -1256,10 +1292,20 @@ ProcessSlotSyncInterrupts(void)
 
 	if (SlotSyncCtx->stopSignaled)
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker is shutting down because promotion is triggered"));
+		if (AmLogicalSlotSyncWorkerProcess())
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker is shutting down because promotion is triggered"));
 
-		proc_exit(0);
+			proc_exit(0);
+		}
+		else
+		{
+			/* For SQL function */
+			ereport(ERROR,
+					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					errmsg("replication slot synchronization will stop because promotion is triggered"));
+		}
 	}
 
 	if (ConfigReloadPending)
@@ -1366,25 +1412,10 @@ wait_for_slot_activity(bool some_slot_updated)
  * Otherwise, advertise that a sync is in progress.
  */
 static void
-check_and_set_sync_info(pid_t worker_pid)
+check_and_set_sync_info(pid_t sync_process_pid)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
-	/* The worker pid must not be already assigned in SlotSyncCtx */
-	Assert(worker_pid == InvalidPid || SlotSyncCtx->pid == InvalidPid);
-
-	/*
-	 * Emit an error if startup process signaled the slot sync machinery to
-	 * stop. See comments atop SlotSyncCtxStruct.
-	 */
-	if (SlotSyncCtx->stopSignaled)
-	{
-		SpinLockRelease(&SlotSyncCtx->mutex);
-		ereport(ERROR,
-				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				errmsg("cannot synchronize replication slots when standby promotion is ongoing"));
-	}
-
 	if (SlotSyncCtx->syncing)
 	{
 		SpinLockRelease(&SlotSyncCtx->mutex);
@@ -1393,13 +1424,16 @@ check_and_set_sync_info(pid_t worker_pid)
 				errmsg("cannot synchronize replication slots concurrently"));
 	}
 
+	/* The pid must not be already assigned in SlotSyncCtx */
+	Assert(SlotSyncCtx->pid == InvalidPid);
+
 	SlotSyncCtx->syncing = true;
 
 	/*
 	 * Advertise the required PID so that the startup process can kill the
-	 * slot sync worker on promotion.
+	 * slot sync process on promotion.
 	 */
-	SlotSyncCtx->pid = worker_pid;
+	SlotSyncCtx->pid = sync_process_pid;
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
@@ -1414,6 +1448,7 @@ reset_syncing_flag(void)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 	SlotSyncCtx->syncing = false;
+	SlotSyncCtx->pid = InvalidPid;
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	syncing_slots = false;
@@ -1595,7 +1630,7 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 
 	/*
 	 * The slot sync worker can't get here because it will only stop when it
-	 * receives a stop request from the startup process, or when there is an
+	 * because receives a stop request from the startup process, or when there is an
 	 * error.
 	 */
 	Assert(false);
@@ -1651,16 +1686,18 @@ update_synced_slots_inactive_since(void)
 }
 
 /*
- * Shut down the slot sync worker.
+ * Shut down slot synchronization.
  *
- * This function sends signal to shutdown slot sync worker, if required. It
- * also waits till the slot sync worker has exited or
+ * This function sets stopSignaled=true and wakes up the slot sync process
+ * (either worker or backend running SQL function pg_sync_replication_slots())
+ * so that worker can exit or SQL function pg_sync_replication_slots() can
+ * finish. It also waits till the slot sync worker has exited or
  * pg_sync_replication_slots() has finished.
  */
 void
 ShutDownSlotSync(void)
 {
-	pid_t		worker_pid;
+	pid_t		sync_process_pid;
 
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
@@ -1677,16 +1714,17 @@ ShutDownSlotSync(void)
 		return;
 	}
 
-	worker_pid = SlotSyncCtx->pid;
+	sync_process_pid = SlotSyncCtx->pid;
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	/*
-	 * Signal slotsync worker if it was still running. The worker will stop
-	 * upon detecting that the stopSignaled flag is set to true.
+	 * Signal slotsync worker or backend process running pg_sync_replication_slots()
+	 * if running. The process will stop upon detecting that the stopSignaled
+	 * flag is set to true.
 	 */
-	if (worker_pid != InvalidPid)
-		kill(worker_pid, SIGUSR1);
+	if (sync_process_pid != InvalidPid)
+		kill(sync_process_pid, SIGUSR1);
 
 	/* Wait for slot sync to end */
 	for (;;)
@@ -1835,7 +1873,10 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
-		check_and_set_sync_info(InvalidPid);
+		check_and_set_sync_info(MyProcPid);
+
+		/* Check for interrupts and config changes */
+		ProcessSlotSyncInterrupts();
 
 		validate_remote_info(wrconn);
 
-- 
2.47.3

v31-0002-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v31-0002-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 3164f51439820dc7a1f452be6e7e901a00cd1ae9 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Tue, 9 Dec 2025 21:20:57 +1100
Subject: [PATCH v31 2/2] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retain it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.

Author: Ajin Cherian <itsajin@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  11 +-
 src/backend/replication/logical/slotsync.c    | 235 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 .../t/040_standby_failover_slots_sync.pl      |  59 ++++-
 5 files changed, 253 insertions(+), 58 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index d5a5e22fe2c..33940504622 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,12 +405,11 @@ postgres=# SELECT * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries cyclically to some extent—continuing until all
+      the failover slots that existed on primary at the start of the function
+      call are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
       Therefore, it is the recommended method for synchronizing slots.
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 327bb361d61..c44c6b1a5c5 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the SQL function pg_sync_replication_slots() is used to sync the slots, and if
+ * the slots are not ready to be synced and are marked as RS_TEMPORARY because
+ * of any of the reasons mentioned above, then the SQL function also waits and
+ * retries until the slots are marked as RS_PERSISTENT (which means sync-ready).
+ * Refer to the comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -599,11 +606,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -627,7 +638,13 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		 * current location when recreating the slot in the next cycle. It may
 		 * take more time to create such a slot. Therefore, we keep this slot
 		 * and attempt the synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the SQL function can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -642,6 +659,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that SQL function can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -665,10 +686,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr = GetStandbyFlushRecPtr(NULL);
@@ -770,7 +795,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -867,7 +893,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 			return false;
 		}
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -878,15 +905,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
+ *
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
+ *
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -895,29 +930,45 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach_ptr(char, slot_name, slot_names)
+		{
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoChar(&query, ')');
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
+	pfree(query.data);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -994,6 +1045,33 @@ synchronize_slots(WalReceiverConn *wrconn)
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by SQL function
+ * 							  pg_sync_replication_slots to track if any slots
+ * 							  could not be persisted and need to be retried.
+ *
+ * Returns:
+ * TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -1009,19 +1087,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1459,6 +1530,9 @@ reset_syncing_flag(void)
  *
  * It connects to the primary server, fetches logical failover slots
  * information periodically in order to create and sync the slots.
+ *
+ * Note: If any changes are made here, check if the corresponding SQL
+ * function logic in SyncReplicationSlots also needs to be changed.
  */
 void
 ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
@@ -1620,10 +1694,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1864,15 +1955,42 @@ slotsync_failure_callback(int code, Datum arg)
 	walrcv_disconnect(wrconn);
 }
 
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+
+	foreach_ptr(RemoteSlot, remote_slot, remote_slots)
+	{
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	return slot_names;
+}
+
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready".
+ * Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
 		check_and_set_sync_info(MyProcPid);
 
 		/* Check for interrupts and config changes */
@@ -1880,7 +1998,54 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	slot_persistence_pending = false;
+			bool	some_slot_updated = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncInterrupts();
+
+			/* We must be in a valid transaction state */
+			Assert(IsTransactionState());
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+				slot_names = extract_slot_names(remote_slots);
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Done if all slots are persisted i.e are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* wait before retrying again */
+			wait_for_slot_activity(some_slot_updated);
+
+		}
+
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 1e5e368a5dc..bfaa0eb0f25 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,7 +62,7 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot synchronization."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 25777fa188c..8f63bfbb977 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -1000,6 +1000,12 @@ $primary->psql(
 ));
 
 $subscriber2->safe_psql('postgres', 'DROP SUBSCRIPTION regress_mysub2;');
+$subscriber1->safe_psql('postgres', 'DROP SUBSCRIPTION regress_mysub1;');
+
+# Remove the dropped sb1_slot from the synchronized_standby_slots list and reload the
+# configuration.
+$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots', "''");
+$primary->reload;
 
 # Verify that all slots have been removed except the one necessary for standby2,
 # which is needed for further testing.
@@ -1016,34 +1022,47 @@ $primary->safe_psql('postgres', "COMMIT PREPARED 'test_twophase_slotsync';");
 $primary->wait_for_replay_catchup($standby2);
 
 ##################################################
-# Verify that slotsync skip statistics are correctly updated when the
+# Test that pg_sync_replication_slots() on the standby skips and retries
+# until the slot becomes sync-ready (when the remote slot catches up with
+# the locally reserved position).
+# Also verify that slotsync skip statistics are correctly updated when the
 # slotsync operation is skipped.
 ##################################################
 
-# Create a logical replication slot and create some DDL on the primary so
-# that the slot lags behind the standby.
-$primary->safe_psql(
-	'postgres', qq(
-	SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);
-	CREATE TABLE wal_push(a int);
-));
+# Recreate the slot by creating a subscription on the subscriber, keep it disabled.
+$subscriber1->safe_psql('postgres', qq[
+	CREATE TABLE push_wal (a int);
+	TRUNCATE tab_int;
+	CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, failover = true, enabled = false);]);
+
+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Make sure the DDL changes are synced to the standby
 $primary->wait_for_replay_catchup($standby2);
 
 $log_offset = -s $standby2->logfile;
 
-# Enable slot sync worker
+# Enable standby for slot synchronization
 $standby2->append_conf(
-	'postgresql.conf', qq(
+    'postgresql.conf', qq(
 hot_standby_feedback = on
 primary_conninfo = '$connstr_1 dbname=postgres'
 log_min_messages = 'debug2'
-sync_replication_slots = on
 ));
 
 $standby2->reload;
 
-# Confirm that the slot sync worker is able to start.
-$standby2->wait_for_log(qr/slot sync worker started/, $log_offset);
+# Attempt to synchronize slots using the API. The API will continue retrying
+# synchronization until the remote slot catches up.
+# The API will not return until this happens, so to be able to make
+# further calls, call the API in a background process.
+my $h = $standby2->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr/start/, q(
+	\echo start
+	SELECT pg_sync_replication_slots();
+	));
 
 # Confirm that the slot sync is skipped due to the remote slot lagging behind
 $standby2->wait_for_log(
@@ -1061,4 +1080,18 @@ $result = $standby2->safe_psql('postgres',
 );
 is($result, 't', "check slot sync skip count increments");
 
+# Enable the Subscription, so that the remote slot catches up
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
+$subscriber1->wait_for_subscription_sync;
+
+# Create xl_running_xacts on the primary to speed up restart_lsn advancement.
+$primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot();");
+
+# Confirm from the log that the slot is sync-ready now.
+$standby2->wait_for_log(
+    qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
+    $log_offset);
+
+$h->quit;
+
 done_testing();
-- 
2.47.3

#125shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#124)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Dec 9, 2025 at 4:04 PM Ajin Cherian <itsajin@gmail.com> wrote:

Hi all,

Since commit 04396ea [1] has been pushed, which included part of the
changes this patch set was addressing, I have updated and rebased the
patch set to incorporate those changes.

The patch set now contains two patches:

0001 – Modify the pg_sync_replication_slots API to also handle
promotion signals and stop synchronization, similar to the slot sync
worker.
0002 – Improve pg_sync_replication_slots to wait for and persist slots
until they are sync-ready.

Please review the updated patch set (v31).

Thanks. Please find a few comments on 001:

1)
Commit message:
"That meant backends
that perform slot synchronization via the pg_sync_replication_slots()
SQL function were not signalled at all because their PIDs were not
recorded in the slot-sync context."

We should phrase it in the singular ('backend'), since multiple
backends cannot run simultaneously because concurrent sync is not
allowed.
Using the term 'their PIDs' gives the impression that there could be
multiple PIDs, which is not the case.

2)
primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
pfree(old_primary_conninfo);

This change to put blank line is not needed.

3)

+ /* Check for sync_replication_slots change */
+ /* Check for parameter changes common to both API and worker */

IMO, these comments are not needed as the code is self-explanatory. Even if
we plan to keep them, both should be the same: either both mention the API
and worker, or neither does.

4)
- * receives a stop request from the startup process, or when there is an
+ * because receives a stop request from the startup process, or when
there is an

I think this change was made by mistake.

5)
+ * Signal slotsync worker or backend process running
pg_sync_replication_slots()
+ * if running. The process will stop upon detecting that the stopSignaled
+ * flag is set to true.

The comment looks slightly odd. Shall we say:

Signal the slotsync worker or the backend process running
pg_sync_replication_slots(), if either one is active. The process...

thanks
Shveta

#126Ajin Cherian
itsajin@gmail.com
In reply to: shveta malik (#125)
2 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Tue, Dec 9, 2025 at 10:45 PM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Dec 9, 2025 at 4:04 PM Ajin Cherian <itsajin@gmail.com> wrote:

Hi all,

Since commit 04396ea [1] has been pushed, which included part of the
changes this patch set was addressing, I have updated and rebased the
patch set to incorporate those changes.

The patch set now contains two patches:

0001 – Modify the pg_sync_replication_slots API to also handle
promotion signals and stop synchronization, similar to the slot sync
worker.
0002 – Improve pg_sync_replication_slots to wait for and persist slots
until they are sync-ready.

Please review the updated patch set (v31).

Thanks. Please find a few comments on 001:

1)
Commit message:
"That meant backends
that perform slot synchronization via the pg_sync_replication_slots()
SQL function were not signalled at all because their PIDs were not
recorded in the slot-sync context."

We should phrase it in the singular ('backend'), since multiple
backends cannot run simultaneously because concurrent sync is not
allowed.
Using the term 'their PIDs' gives the impression that there could be
multiple PIDs, which is not the case.

Fixed.

2)
primary_slotname_changed = strcmp(old_primary_slotname, PrimarySlotName) != 0;
+
pfree(old_primary_conninfo);

This change to put blank line is not needed.

Removed.

3)

+ /* Check for sync_replication_slots change */
+ /* Check for parameter changes common to both API and worker */

IMO, these comments are not needed as the code is self-explanatory. Even if
we plan to keep them, both should be the same: either both mention the API
and worker, or neither does.

Removed.

4)
- * receives a stop request from the startup process, or when there is an
+ * because receives a stop request from the startup process, or when
there is an

I think this change was made by mistake.

Yes, removed.

5)
+ * Signal slotsync worker or backend process running
pg_sync_replication_slots()
+ * if running. The process will stop upon detecting that the stopSignaled
+ * flag is set to true.

The comment looks slightly odd. Shall we say:

Signal the slotsync worker or the backend process running
pg_sync_replication_slots(), if either one is active. The process...

Changed.

Attaching patch v32 addressing the above comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v32-0001-Signal-backends-running-pg_sync_replication_slot.patchapplication/octet-stream; name=v32-0001-Signal-backends-running-pg_sync_replication_slot.patchDownload
From 9f61c459a646ff9ea1f3c016af6c3f2941159dcf Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Wed, 10 Dec 2025 13:14:18 +1100
Subject: [PATCH v32 1/2] Signal backends running pg_sync_replication_slots()
 during promotion.

Previously, during promotion, only the slot synchronization worker was
interrupted to shutdown for promotion. That meant a backend
that performs slot synchronization via the pg_sync_replication_slots()
SQL function was not signalled at all because its PID was not
recorded in the slot-sync context.

This patch changes behaviour to:
1. Store the backend PID in SlotSyncCtxStruct so the backend performing
   slot synchronization can be signalled.
2. On promotion, send SIGUSR1 to the recorded PID - either
   the slot-sync worker or any backend currently syncing slots.
3. Backend invoking pg_sync_replication_slots() also calls
   ProcessSlotSyncInterrupts() to handle the promotion signal as well as
   any configuration changes that might result in stopping
   synchronization.

This patch also acts as a base for a larger patch that improves
pg_sync_replication_slots() to wait for slots to be persisted before
exiting.

Author: Ajin Cherian <itsajin@gmail.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
---
 src/backend/replication/logical/slotsync.c | 141 +++++++++++++--------
 1 file changed, 90 insertions(+), 51 deletions(-)

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 9f92c21237e..1e7b131c0f8 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -71,9 +71,12 @@
 /*
  * Struct for sharing information to control slot synchronization.
  *
- * The slot sync worker's pid is needed by the startup process to shut it
- * down during promotion. The startup process shuts down the slot sync worker
- * and also sets stopSignaled=true to handle the race condition when the
+ * The pid is either the slot sync worker's pid or the backend's pid running
+ * the SQL function pg_sync_replication_slots(). When the startup process sets
+ * stopSignaled during promotion, it uses this pid to wake up the currently
+ * synchronizing process so that the process can immediately stop its
+ * synchronizing work on seeing stopSignaled set.
+ * Setting stopSignaled is also used to handle the race condition when the
  * postmaster has not noticed the promotion yet and thus may end up restarting
  * the slot sync worker. If stopSignaled is set, the worker will exit in such a
  * case. The SQL function pg_sync_replication_slots() will also error out if
@@ -1195,10 +1198,11 @@ ValidateSlotSyncParams(int elevel)
 }
 
 /*
- * Re-read the config file.
+ * Re-read the config file for slot synchronization.
+ *
+ * Exit or throw errors if relevant GUCs have changed depending on whether
+ * called from slotsync worker or from SQL function pg_sync_replication_slots()
  *
- * Exit if any of the slot sync GUCs have changed. The postmaster will
- * restart it.
  */
 static void
 slotsync_reread_config(void)
@@ -1209,8 +1213,11 @@ slotsync_reread_config(void)
 	bool		old_hot_standby_feedback = hot_standby_feedback;
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
+	bool		worker = AmLogicalSlotSyncWorkerProcess();
+	bool		parameter_changed = false;
 
-	Assert(sync_replication_slots);
+	if (worker)
+		Assert(sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
@@ -1222,32 +1229,58 @@ slotsync_reread_config(void)
 
 	if (old_sync_replication_slots != sync_replication_slots)
 	{
-		ereport(LOG,
-		/* translator: %s is a GUC variable name */
-				errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled", "sync_replication_slots"));
-		proc_exit(0);
+		if (worker)
+		{
+			ereport(LOG,
+			/* translator: %s is a GUC variable name */
+					errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled",
+						   "sync_replication_slots"));
+
+			proc_exit(0);
+		}
+
+		parameter_changed = true;
 	}
 
 	if (conninfo_changed ||
 		primary_slotname_changed ||
 		(old_hot_standby_feedback != hot_standby_feedback))
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker will restart because of a parameter change"));
 
-		/*
-		 * Reset the last-start time for this worker so that the postmaster
-		 * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
-		 */
-		SlotSyncCtx->last_start_time = 0;
+		if (worker)
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker will restart because of a parameter change"));
 
-		proc_exit(0);
+			/*
+			 * Reset the last-start time for this worker so that the
+			 * postmaster can restart it without waiting for
+			 * SLOTSYNC_RESTART_INTERVAL_SEC.
+			 */
+			SlotSyncCtx->last_start_time = 0;
+
+			proc_exit(0);
+		}
+
+		parameter_changed = true;
+	}
+
+	/*
+	 * If we have reached here with a parameter change, we must be running in
+	 * SQL function, emit error in such a case.
+	 */
+	if (parameter_changed)
+	{
+		Assert(!worker);
+		ereport(ERROR,
+				errmsg("replication slot synchronization will stop because of a parameter change"));
 	}
 
 }
 
 /*
- * Interrupt handler for main loop of slot sync worker.
+ * Interrupt handler for main loop of slot sync worker and
+ * SQL function pg_sync_replication_slots().
  */
 static void
 ProcessSlotSyncInterrupts(void)
@@ -1256,10 +1289,20 @@ ProcessSlotSyncInterrupts(void)
 
 	if (SlotSyncCtx->stopSignaled)
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker is shutting down because promotion is triggered"));
+		if (AmLogicalSlotSyncWorkerProcess())
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker is shutting down because promotion is triggered"));
 
-		proc_exit(0);
+			proc_exit(0);
+		}
+		else
+		{
+			/* For SQL function */
+			ereport(ERROR,
+					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					errmsg("replication slot synchronization will stop because promotion is triggered"));
+		}
 	}
 
 	if (ConfigReloadPending)
@@ -1366,25 +1409,10 @@ wait_for_slot_activity(bool some_slot_updated)
  * Otherwise, advertise that a sync is in progress.
  */
 static void
-check_and_set_sync_info(pid_t worker_pid)
+check_and_set_sync_info(pid_t sync_process_pid)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
-	/* The worker pid must not be already assigned in SlotSyncCtx */
-	Assert(worker_pid == InvalidPid || SlotSyncCtx->pid == InvalidPid);
-
-	/*
-	 * Emit an error if startup process signaled the slot sync machinery to
-	 * stop. See comments atop SlotSyncCtxStruct.
-	 */
-	if (SlotSyncCtx->stopSignaled)
-	{
-		SpinLockRelease(&SlotSyncCtx->mutex);
-		ereport(ERROR,
-				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				errmsg("cannot synchronize replication slots when standby promotion is ongoing"));
-	}
-
 	if (SlotSyncCtx->syncing)
 	{
 		SpinLockRelease(&SlotSyncCtx->mutex);
@@ -1393,13 +1421,16 @@ check_and_set_sync_info(pid_t worker_pid)
 				errmsg("cannot synchronize replication slots concurrently"));
 	}
 
+	/* The pid must not be already assigned in SlotSyncCtx */
+	Assert(SlotSyncCtx->pid == InvalidPid);
+
 	SlotSyncCtx->syncing = true;
 
 	/*
 	 * Advertise the required PID so that the startup process can kill the
-	 * slot sync worker on promotion.
+	 * slot sync process on promotion.
 	 */
-	SlotSyncCtx->pid = worker_pid;
+	SlotSyncCtx->pid = sync_process_pid;
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
@@ -1414,6 +1445,7 @@ reset_syncing_flag(void)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 	SlotSyncCtx->syncing = false;
+	SlotSyncCtx->pid = InvalidPid;
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	syncing_slots = false;
@@ -1651,16 +1683,18 @@ update_synced_slots_inactive_since(void)
 }
 
 /*
- * Shut down the slot sync worker.
+ * Shut down slot synchronization.
  *
- * This function sends signal to shutdown slot sync worker, if required. It
- * also waits till the slot sync worker has exited or
+ * This function sets stopSignaled=true and wakes up the slot sync process
+ * (either worker or backend running SQL function pg_sync_replication_slots())
+ * so that worker can exit or SQL function pg_sync_replication_slots() can
+ * finish. It also waits till the slot sync worker has exited or
  * pg_sync_replication_slots() has finished.
  */
 void
 ShutDownSlotSync(void)
 {
-	pid_t		worker_pid;
+	pid_t		sync_process_pid;
 
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
@@ -1677,16 +1711,18 @@ ShutDownSlotSync(void)
 		return;
 	}
 
-	worker_pid = SlotSyncCtx->pid;
+	sync_process_pid = SlotSyncCtx->pid;
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	/*
-	 * Signal slotsync worker if it was still running. The worker will stop
-	 * upon detecting that the stopSignaled flag is set to true.
+	 * Signal slotsync worker or the backend process running
+	 * pg_sync_replication_slots(), if either one is active.
+	 * The process will stop upon detecting that the stopSignaled
+	 * flag is set to true.
 	 */
-	if (worker_pid != InvalidPid)
-		kill(worker_pid, SIGUSR1);
+	if (sync_process_pid!= InvalidPid)
+		kill(sync_process_pid, SIGUSR1);
 
 	/* Wait for slot sync to end */
 	for (;;)
@@ -1835,7 +1871,10 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
-		check_and_set_sync_info(InvalidPid);
+		check_and_set_sync_info(MyProcPid);
+
+		/* Check for interrupts and config changes */
+		ProcessSlotSyncInterrupts();
 
 		validate_remote_info(wrconn);
 
-- 
2.47.3

v32-0002-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v32-0002-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 499039b76f01b9f6022910b42d098889fce5ae0c Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Wed, 10 Dec 2025 13:16:09 +1100
Subject: [PATCH v32 2/2] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retain it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.

Author: Ajin Cherian <itsajin@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  11 +-
 src/backend/replication/logical/slotsync.c    | 235 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 .../t/040_standby_failover_slots_sync.pl      |  59 ++++-
 5 files changed, 253 insertions(+), 58 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index d5a5e22fe2c..33940504622 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,12 +405,11 @@ postgres=# SELECT * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries cyclically to some extent, continuing until all
+      the failover slots that existed on the primary at the start of the function
+      call are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
       Therefore, it is the recommended method for synchronizing slots.
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 1e7b131c0f8..777f0635ed0 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the SQL function pg_sync_replication_slots() is used to sync the slots, and if
+ * the slots are not ready to be synced and are marked as RS_TEMPORARY because
+ * of any of the reasons mentioned above, then the SQL function also waits and
+ * retries until the slots are marked as RS_PERSISTENT (which means sync-ready).
+ * Refer to the comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -599,11 +606,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -627,7 +638,13 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		 * current location when recreating the slot in the next cycle. It may
 		 * take more time to create such a slot. Therefore, we keep this slot
 		 * and attempt the synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the SQL function can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -642,6 +659,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that SQL function can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -665,10 +686,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr = GetStandbyFlushRecPtr(NULL);
@@ -770,7 +795,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -867,7 +893,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 			return false;
 		}
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -878,15 +905,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
+ *
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
+ *
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -895,29 +930,45 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach_ptr(char, slot_name, slot_names)
+		{
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoChar(&query, ')');
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
+	pfree(query.data);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -994,6 +1045,33 @@ synchronize_slots(WalReceiverConn *wrconn)
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by SQL function
+ * 							  pg_sync_replication_slots to track if any slots
+ * 							  could not be persisted and need to be retried.
+ *
+ * Returns:
+ * TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -1009,19 +1087,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1456,6 +1527,9 @@ reset_syncing_flag(void)
  *
  * It connects to the primary server, fetches logical failover slots
  * information periodically in order to create and sync the slots.
+ *
+ * Note: If any changes are made here, check if the corresponding SQL
+ * function logic in SyncReplicationSlots also needs to be changed.
  */
 void
 ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
@@ -1617,10 +1691,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1862,15 +1953,42 @@ slotsync_failure_callback(int code, Datum arg)
 	walrcv_disconnect(wrconn);
 }
 
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+
+	foreach_ptr(RemoteSlot, remote_slot, remote_slots)
+	{
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	return slot_names;
+}
+
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready".
+ * Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
 		check_and_set_sync_info(MyProcPid);
 
 		/* Check for interrupts and config changes */
@@ -1878,7 +1996,54 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	slot_persistence_pending = false;
+			bool	some_slot_updated = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncInterrupts();
+
+			/* We must be in a valid transaction state */
+			Assert(IsTransactionState());
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+				slot_names = extract_slot_names(remote_slots);
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Done if all slots are persisted i.e are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* wait before retrying */
+			wait_for_slot_activity(some_slot_updated);
+
+		}
+
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index f39830dbb34..c0632bf901a 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,7 +62,7 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot synchronization."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 25777fa188c..8f63bfbb977 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -1000,6 +1000,12 @@ $primary->psql(
 ));
 
 $subscriber2->safe_psql('postgres', 'DROP SUBSCRIPTION regress_mysub2;');
+$subscriber1->safe_psql('postgres', 'DROP SUBSCRIPTION regress_mysub1;');
+
+# Remove the dropped sb1_slot from the synchronized_standby_slots list and reload the
+# configuration.
+$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots', "''");
+$primary->reload;
 
 # Verify that all slots have been removed except the one necessary for standby2,
 # which is needed for further testing.
@@ -1016,34 +1022,47 @@ $primary->safe_psql('postgres', "COMMIT PREPARED 'test_twophase_slotsync';");
 $primary->wait_for_replay_catchup($standby2);
 
 ##################################################
-# Verify that slotsync skip statistics are correctly updated when the
+# Test that pg_sync_replication_slots() on the standby skips and retries
+# until the slot becomes sync-ready (when the remote slot catches up with
+# the locally reserved position).
+# Also verify that slotsync skip statistics are correctly updated when the
 # slotsync operation is skipped.
 ##################################################
 
-# Create a logical replication slot and create some DDL on the primary so
-# that the slot lags behind the standby.
-$primary->safe_psql(
-	'postgres', qq(
-	SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);
-	CREATE TABLE wal_push(a int);
-));
+# Recreate the slot by creating a subscription on the subscriber, keeping it disabled.
+$subscriber1->safe_psql('postgres', qq[
+	CREATE TABLE push_wal (a int);
+	TRUNCATE tab_int;
+	CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, failover = true, enabled = false);]);
+
+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Make sure the DDL changes are synced to the standby
 $primary->wait_for_replay_catchup($standby2);
 
 $log_offset = -s $standby2->logfile;
 
-# Enable slot sync worker
+# Enable standby for slot synchronization
 $standby2->append_conf(
-	'postgresql.conf', qq(
+    'postgresql.conf', qq(
 hot_standby_feedback = on
 primary_conninfo = '$connstr_1 dbname=postgres'
 log_min_messages = 'debug2'
-sync_replication_slots = on
 ));
 
 $standby2->reload;
 
-# Confirm that the slot sync worker is able to start.
-$standby2->wait_for_log(qr/slot sync worker started/, $log_offset);
+# Attempt to synchronize slots using the API. The API will continue retrying
+# synchronization until the remote slot catches up.
+# The API will not return until this happens, so run it in a background
+# process to be able to make further calls.
+my $h = $standby2->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr/start/, q(
+	\echo start
+	SELECT pg_sync_replication_slots();
+	));
 
 # Confirm that the slot sync is skipped due to the remote slot lagging behind
 $standby2->wait_for_log(
@@ -1061,4 +1080,18 @@ $result = $standby2->safe_psql('postgres',
 );
 is($result, 't', "check slot sync skip count increments");
 
+# Enable the subscription so that the remote slot catches up
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
+$subscriber1->wait_for_subscription_sync;
+
+# Create xl_running_xacts on the primary to speed up restart_lsn advancement.
+$primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot();");
+
+# Confirm from the log that the slot is sync-ready now.
+$standby2->wait_for_log(
+    qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
+    $log_offset);
+
+$h->quit;
+
 done_testing();
-- 
2.47.3

#127Chao Li
li.evan.chao@gmail.com
In reply to: Ajin Cherian (#124)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Dec 9, 2025, at 18:33, Ajin Cherian <itsajin@gmail.com> wrote:

Hi all,

Since commit 04396ea [1] has been pushed, which included part of the
changes this patch set was addressing, I have updated and rebased the
patch set to incorporate those changes.

The patch set now contains two patches:

0001 – Modify the pg_sync_replication_slots API to also handle
promotion signals and stop synchronization, similar to the slot sync
worker.
0002 – Improve pg_sync_replication_slots to wait for and persist slots
until they are sync-ready.

Please review the updated patch set (v31).

Regards,
Ajin Cherian
Fujitsu Australia

[1] https://github.com/postgres/postgres/commit/04396eacd3faeaa4fa3d084a6749e4e384bdf0db
<v31-0001-Signal-backends-running-pg_sync_replication_slot.patch><v31-0002-Improve-initial-slot-synchronization-in-pg_sync_.patch>

Hi Ajin,

I’d like to revisit this patch, but it looks like 04396eacd3faeaa4fa3d084a6749e4e384bdf0db conflicts with this patch. So can you please rebase it?

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#128Ajin Cherian
itsajin@gmail.com
In reply to: Chao Li (#127)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Dec 10, 2025 at 1:29 PM Chao Li <li.evan.chao@gmail.com> wrote:

Hi Ajin,

I’d like to revisit this patch, but it looks like 04396eacd3faeaa4fa3d084a6749e4e384bdf0db conflicts with this patch. So can you please rebase it?

Best regards,
--

It's been rebased. Have a look at the latest version.

regards,
Ajin Cherian
Fujitsu Australia

#129shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#128)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Dec 10, 2025 at 8:10 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Dec 10, 2025 at 1:29 PM Chao Li <li.evan.chao@gmail.com> wrote:

Hi Ajin,

I’d like to revisit this patch, but it looks like 04396eacd3faeaa4fa3d084a6749e4e384bdf0db conflicts with this patch. So can you please rebase it?

Best regards,
--

It's been rebased. Have a look at the latest version.

Few comments on 001:

1)
/*
* Emit an error if a promotion or a concurrent sync call is in progress.
* Otherwise, advertise that a sync is in progress.
*/
static void
check_and_set_sync_info

We need to change this comment because now this function does not
handle the promotion case.

2)
+  if (sync_process_pid!= InvalidPid)
+    kill(sync_process_pid, SIGUSR1);

We need to have space between sync_process_pid and '!='

3)
+ * Exit or throw errors if relevant GUCs have changed depending on whether

errors->error

4)
In slotsync_reread_config(), even when we mark parameter_changed=true
in the first if-block, we still fall through to the second if-block,
which is not needed. So shall we make the second if-block an else-if
to avoid this? Thoughts?
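
Something like this untested sketch (reusing the existing locals from
the v32 patch):

	if (old_sync_replication_slots != sync_replication_slots)
	{
		if (worker)
		{
			/* keep the existing LOG message for the worker */
			proc_exit(0);
		}

		parameter_changed = true;
	}
	else if (conninfo_changed ||
			 primary_slotname_changed ||
			 (old_hot_standby_feedback != hot_standby_feedback))
	{
		if (worker)
		{
			/* keep the existing LOG message and last_start_time reset */
			proc_exit(0);
		}

		parameter_changed = true;
	}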

5)
As discussed in [1], we can make this change in ProcessSlotSyncInterrupts():

'replication slot synchronization worker is shutting down because
promotion is triggered'
to
'replication slot synchronization worker will stop because promotion
is triggered'

[1]: /messages/by-id/6AE56C64-F760-4CBD-BABF-72633D3F7B5E@gmail.com

thanks
Shveta

#130Chao Li
li.evan.chao@gmail.com
In reply to: Ajin Cherian (#128)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Dec 10, 2025, at 10:40, Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Dec 10, 2025 at 1:29 PM Chao Li <li.evan.chao@gmail.com> wrote:

Hi Ajin,

I’d like to revisit this patch, but it looks like 04396eacd3faeaa4fa3d084a6749e4e384bdf0db conflicts with this patch. So can you please rebase it?

Best regards,
--

It's been rebased. Have a look at the latest version.

Here are some comments for v32.

1 - 0001
```
- * The slot sync worker's pid is needed by the startup process to shut it
- * down during promotion. The startup process shuts down the slot sync worker
- * and also sets stopSignaled=true to handle the race condition when the
+ * The pid is either the slot sync worker's pid or the backend's pid running
```

I think we should add single quotes for “pid” and “stopSignaled”. Looking at other comment lines, structure fields are all single-quoted:
```
* The 'syncing' flag is needed to prevent concurrent slot syncs to avoid slot

* The 'last_start_time' is needed by postmaster to start the slot sync worker
```

2 - 0001 - the same code block as 1

I wonder how to distinguish whether the “pid” belongs to a slot sync worker or a backend process?

3 - 0001
```
+ bool worker = AmLogicalSlotSyncWorkerProcess();
```

The variable name “worker” doesn’t indicate a bool type; maybe rename it to “is_slotsync_worker”.

4 - 0001
```
+	/*
+	 * If we have reached here with a parameter change, we must be running in
+	 * SQL function, emit error in such a case.
+	 */
+	if (parameter_changed)
+	{
+		Assert(!worker);
+		ereport(ERROR,
+				errmsg("replication slot synchronization will stop because of a parameter change"));
 	}
```

The Assert(!worker) feels redundant, because it then immediately errors out.

5 - 0001
```
+ * Exit or throw errors if relevant GUCs have changed depending on whether
+ * called from slotsync worker or from SQL function pg_sync_replication_slots()
```

Let’s change “slotsync” to “slot sync” because comments elsewhere all use “slot sync”, just to keep things consistent.

6 - 0001
```
- * Interrupt handler for main loop of slot sync worker.
+ * Interrupt handler for main loop of slot sync worker and
+ * SQL function pg_sync_replication_slots().
```

Missing “the” before “SQL function”. This comment applies to multiple places.

7 - 0001
```
+ if (sync_process_pid!= InvalidPid)
+ kill(sync_process_pid, SIGUSR1);
```

Nit: missing a white space before “!=”.

8 - 0002
```
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach_ptr(char, slot_name, slot_names)
+		{
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoChar(&query, ')');
 	}
```

The logic of appending “, “ can be slightly simplified as:
```
if (slot_names != NIL)
{
	const char *sep = "";

	appendStringInfoString(&query, " AND slot_name IN (");
	foreach_ptr(char, slot_name, slot_names)
	{
		appendStringInfo(&query, "%s'%s'", sep, slot_name);
		sep = ", ";
	}
	appendStringInfoChar(&query, ')');
}
```

That saves an “if” check and an appendStringInfoString() call.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#131Ajin Cherian
itsajin@gmail.com
In reply to: Chao Li (#130)
2 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Dec 10, 2025 at 3:05 PM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Dec 10, 2025 at 8:10 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Dec 10, 2025 at 1:29 PM Chao Li <li.evan.chao@gmail.com> wrote:

Hi Ajin,

I’d like to revisit this patch, but looks like 04396eacd3faeaa4fa3d084a6749e4e384bdf0db has some conflicts to this patch. So can you please rebase this patch?

Best regards,
--

It's been rebased. Have a look at the latest version.

Few comments on 001:

1)
/*
* Emit an error if a promotion or a concurrent sync call is in progress.
* Otherwise, advertise that a sync is in progress.
*/
static void
check_and_set_sync_info

We need to change this comment because now this function does not
handle the promotion case.

Fixed.

2)
+  if (sync_process_pid!= InvalidPid)
+    kill(sync_process_pid, SIGUSR1);

We need to have space between sync_process_pid and '!='

Fixed.

3)
+ * Exit or throw errors if relevant GUCs have changed depending on whether

errors->error

Fixed.

4)
In slotsync_reread_config(), even when we mark parameter_changed=true
in the first if-block, we still fall through to the second if-block,
which is not needed. So shall we make the second if-block an else-if
to avoid this? Thoughts?

Changed as suggested.

5)
As discussed in [1], we can make this change in ProcessSlotSyncInterrupts():

'replication slot synchronization worker is shutting down because
promotion is triggered'
to
'replication slot synchronization worker will stop because promotion
is triggered'

[1]: /messages/by-id/6AE56C64-F760-4CBD-BABF-72633D3F7B5E@gmail.com

Changed.

On Wed, Dec 10, 2025 at 3:27 PM Chao Li <li.evan.chao@gmail.com> wrote:

Here are some comments for v32.

1 - 0001
```
- * The slot sync worker's pid is needed by the startup process to shut it
- * down during promotion. The startup process shuts down the slot sync worker
- * and also sets stopSignaled=true to handle the race condition when the
+ * The pid is either the slot sync worker's pid or the backend's pid running
```

I think we should add single quotes for “pid” and “stopSignaled”. Looking at other comment lines, structure fields are all single-quoted:
```
* The 'syncing' flag is needed to prevent concurrent slot syncs to avoid slot

* The 'last_start_time' is needed by postmaster to start the slot sync worker
```

Changed as suggested.

2 - 0001 - the same code block as 1

I wonder how to distinguish whether the “pid” belongs to a slot sync worker or a backend process?

No, there is no way currently, and it is not really required with the
current logic.

3 - 0001
```
+ bool worker = AmLogicalSlotSyncWorkerProcess();
```

The variable name “worker” doesn’t indicate a bool type; maybe rename it to “is_slotsync_worker”.

Changed as suggested.

4 - 0001
```
+       /*
+        * If we have reached here with a parameter change, we must be running in
+        * SQL function, emit error in such a case.
+        */
+       if (parameter_changed)
+       {
+               Assert(!worker);
+               ereport(ERROR,
+                               errmsg("replication slot synchronization will stop because of a parameter change"));
}
```

The Assert(!worker) feels redundant, because it then immediately errors out.

I don't think it is redundant, as Asserts are used to catch unexpected
code paths during testing.

5 - 0001
```
+ * Exit or throw errors if relevant GUCs have changed depending on whether
+ * called from slotsync worker or from SQL function pg_sync_replication_slots()
```

Let’s change “slotsync” to “slot sync” because comments elsewhere all use “slot sync”, just to keep things consistent.

Changed.

6 - 0001
```
- * Interrupt handler for main loop of slot sync worker.
+ * Interrupt handler for main loop of slot sync worker and
+ * SQL function pg_sync_replication_slots().
```

Missing “the” before “SQL function”. This comment applies to multiple places.

Changed.

7 - 0001
```
+       if (sync_process_pid!= InvalidPid)
+               kill(sync_process_pid, SIGUSR1);
```

Nit: missing a white space before “!=”.

Fixed.

8 - 0002
```
+       if (slot_names != NIL)
{
-               StartTransactionCommand();
-               started_tx = true;
+               bool            first_slot = true;
+
+               /*
+                * Construct the query to fetch only the specified slots
+                */
+               appendStringInfoString(&query, " AND slot_name IN (");
+
+               foreach_ptr(char, slot_name, slot_names)
+               {
+                       if (!first_slot)
+                               appendStringInfoString(&query, ", ");
+
+                       appendStringInfo(&query, "'%s'", slot_name);
+                       first_slot = false;
+               }
+               appendStringInfoChar(&query, ')');
}
```

The logic of appending “, “ can be slightly simplified as:
```
if (slot_names != NIL)
{
	const char *sep = "";

	appendStringInfoString(&query, " AND slot_name IN (");
	foreach_ptr(char, slot_name, slot_names)
	{
		appendStringInfo(&query, "%s'%s'", sep, slot_name);
		sep = ", ";
	}
	appendStringInfoChar(&query, ')');
}
```

That saves an “if” check and an appendStringInfoString() call.

I'm not sure this is much of an improvement; I like the current
approach, and it matches similar coding patterns in the code base.

Attaching v34 addressing the above comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v34-0001-Signal-backends-running-pg_sync_replication_slot.patchapplication/octet-stream; name=v34-0001-Signal-backends-running-pg_sync_replication_slot.patchDownload
From 3b1d1cb598312f9c748352c10f093f77f70fb6f6 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Wed, 10 Dec 2025 15:46:21 +1100
Subject: [PATCH v34 1/2] Signal backends running pg_sync_replication_slots()
 during promotion.

Previously, during promotion, only the slot synchronization worker was
interrupted to shut down. That meant a backend
that performs slot synchronization via the pg_sync_replication_slots()
SQL function was not signalled at all because its PID was not
recorded in the slot-sync context.

This patch changes behaviour to:
1. Store the backend PID in SlotSyncCtxStruct so the backend performing
   slot synchronization can be signalled.
2. On promotion, send SIGUSR1 to the recorded PID - either
   the slot-sync worker or any backend currently syncing slots.
3. Backend invoking pg_sync_replication_slots() also calls
   ProcessSlotSyncInterrupts() to handle the promotion signal as well as
   any configuration changes that might result in stopping
   synchronization.

This patch also acts as a base for a larger patch that improves
pg_sync_replication_slots() to wait for slots to be persisted before
exiting.

Author: Ajin Cherian <itsajin@gmail.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
---
 src/backend/replication/logical/slotsync.c | 157 +++++++++++++--------
 1 file changed, 99 insertions(+), 58 deletions(-)

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 9f92c21237e..bb623db25d4 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -71,11 +71,14 @@
 /*
  * Struct for sharing information to control slot synchronization.
  *
- * The slot sync worker's pid is needed by the startup process to shut it
- * down during promotion. The startup process shuts down the slot sync worker
- * and also sets stopSignaled=true to handle the race condition when the
+ * The 'pid' is either the slot sync worker's pid or the backend's pid running
+ * the SQL function pg_sync_replication_slots(). When the startup process sets
+ * 'stopSignaled' during promotion, it uses this 'pid' to wake up the currently
+ * synchronizing process so that the process can immediately stop its
+ * synchronizing work on seeing 'stopSignaled' set.
+ * Setting 'stopSignaled' is also used to handle the race condition when the
  * postmaster has not noticed the promotion yet and thus may end up restarting
- * the slot sync worker. If stopSignaled is set, the worker will exit in such a
+ * the slot sync worker. If 'stopSignaled' is set, the worker will exit in such a
  * case. The SQL function pg_sync_replication_slots() will also error out if
  * this flag is set. Note that we don't need to reset this variable as after
  * promotion the slot sync worker won't be restarted because the pmState
@@ -1195,10 +1198,11 @@ ValidateSlotSyncParams(int elevel)
 }
 
 /*
- * Re-read the config file.
+ * Re-read the config file for slot synchronization.
+ *
+ * Exit or throw error if relevant GUCs have changed depending on whether
+ * called from slot sync worker or from the SQL function pg_sync_replication_slots()
  *
- * Exit if any of the slot sync GUCs have changed. The postmaster will
- * restart it.
  */
 static void
 slotsync_reread_config(void)
@@ -1209,8 +1213,11 @@ slotsync_reread_config(void)
 	bool		old_hot_standby_feedback = hot_standby_feedback;
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
+	bool		is_slotsync_worker = AmLogicalSlotSyncWorkerProcess();
+	bool		parameter_changed = false;
 
-	Assert(sync_replication_slots);
+	if (is_slotsync_worker)
+		Assert(sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
@@ -1222,32 +1229,60 @@ slotsync_reread_config(void)
 
 	if (old_sync_replication_slots != sync_replication_slots)
 	{
-		ereport(LOG,
-		/* translator: %s is a GUC variable name */
-				errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled", "sync_replication_slots"));
-		proc_exit(0);
-	}
+		if (is_slotsync_worker)
+		{
+			ereport(LOG,
+			/* translator: %s is a GUC variable name */
+					errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled",
+						   "sync_replication_slots"));
 
-	if (conninfo_changed ||
-		primary_slotname_changed ||
-		(old_hot_standby_feedback != hot_standby_feedback))
+			proc_exit(0);
+		}
+
+		parameter_changed = true;
+	}
+	else
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker will restart because of a parameter change"));
+		if (conninfo_changed ||
+			primary_slotname_changed ||
+			(old_hot_standby_feedback != hot_standby_feedback))
+		{
 
-		/*
-		 * Reset the last-start time for this worker so that the postmaster
-		 * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
-		 */
-		SlotSyncCtx->last_start_time = 0;
+			if (is_slotsync_worker)
+			{
+				ereport(LOG,
+						errmsg("replication slot synchronization worker will restart because of a parameter change"));
 
-		proc_exit(0);
+				/*
+				 * Reset the last-start time for this worker so that the
+				 * postmaster can restart it without waiting for
+				 * SLOTSYNC_RESTART_INTERVAL_SEC.
+				 */
+				SlotSyncCtx->last_start_time = 0;
+
+				proc_exit(0);
+			}
+
+			parameter_changed = true;
+		}
+	}
+
+	/*
+	 * If we have reached here with a parameter change, we must be running in
+	 * the SQL function; emit an error in such a case.
+	 */
+	if (parameter_changed)
+	{
+		Assert(!is_slotsync_worker);
+		ereport(ERROR,
+				errmsg("replication slot synchronization will stop because of a parameter change"));
 	}
 
 }
 
 /*
- * Interrupt handler for main loop of slot sync worker.
+ * Interrupt handler for main loop of slot sync worker and
+ * the SQL function pg_sync_replication_slots().
  */
 static void
 ProcessSlotSyncInterrupts(void)
@@ -1256,10 +1291,20 @@ ProcessSlotSyncInterrupts(void)
 
 	if (SlotSyncCtx->stopSignaled)
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker is shutting down because promotion is triggered"));
+		if (AmLogicalSlotSyncWorkerProcess())
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker will stop because promotion is triggered"));
 
-		proc_exit(0);
+			proc_exit(0);
+		}
+		else
+		{
+			/* For the SQL function */
+			ereport(ERROR,
+					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					errmsg("replication slot synchronization will stop because promotion is triggered"));
+		}
 	}
 
 	if (ConfigReloadPending)
@@ -1362,29 +1407,14 @@ wait_for_slot_activity(bool some_slot_updated)
 }
 
 /*
- * Emit an error if a promotion or a concurrent sync call is in progress.
+ * Emit an error if a concurrent sync call is in progress.
  * Otherwise, advertise that a sync is in progress.
  */
 static void
-check_and_set_sync_info(pid_t worker_pid)
+check_and_set_sync_info(pid_t sync_process_pid)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
-	/* The worker pid must not be already assigned in SlotSyncCtx */
-	Assert(worker_pid == InvalidPid || SlotSyncCtx->pid == InvalidPid);
-
-	/*
-	 * Emit an error if startup process signaled the slot sync machinery to
-	 * stop. See comments atop SlotSyncCtxStruct.
-	 */
-	if (SlotSyncCtx->stopSignaled)
-	{
-		SpinLockRelease(&SlotSyncCtx->mutex);
-		ereport(ERROR,
-				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				errmsg("cannot synchronize replication slots when standby promotion is ongoing"));
-	}
-
 	if (SlotSyncCtx->syncing)
 	{
 		SpinLockRelease(&SlotSyncCtx->mutex);
@@ -1393,13 +1423,16 @@ check_and_set_sync_info(pid_t worker_pid)
 				errmsg("cannot synchronize replication slots concurrently"));
 	}
 
+	/* The pid must not be already assigned in SlotSyncCtx */
+	Assert(SlotSyncCtx->pid == InvalidPid);
+
 	SlotSyncCtx->syncing = true;
 
 	/*
 	 * Advertise the required PID so that the startup process can kill the
-	 * slot sync worker on promotion.
+	 * slot sync process on promotion.
 	 */
-	SlotSyncCtx->pid = worker_pid;
+	SlotSyncCtx->pid = sync_process_pid;
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
@@ -1414,6 +1447,7 @@ reset_syncing_flag(void)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 	SlotSyncCtx->syncing = false;
+	SlotSyncCtx->pid = InvalidPid;
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	syncing_slots = false;
@@ -1622,7 +1656,7 @@ update_synced_slots_inactive_since(void)
 	if (!StandbyMode)
 		return;
 
-	/* The slot sync worker or SQL function mustn't be running by now */
+	/* The slot sync worker or the SQL function mustn't be running by now */
 	Assert((SlotSyncCtx->pid == InvalidPid) && !SlotSyncCtx->syncing);
 
 	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
@@ -1651,16 +1685,18 @@ update_synced_slots_inactive_since(void)
 }
 
 /*
- * Shut down the slot sync worker.
+ * Shut down slot synchronization.
  *
- * This function sends signal to shutdown slot sync worker, if required. It
- * also waits till the slot sync worker has exited or
+ * This function sets stopSignaled=true and wakes up the slot sync process
+ * (either the worker or a backend running the SQL function
+ * pg_sync_replication_slots()) so that the worker can exit or the SQL
+ * function can finish. It also waits until the slot sync worker has exited
+ * or pg_sync_replication_slots() has finished.
  */
 void
 ShutDownSlotSync(void)
 {
-	pid_t		worker_pid;
+	pid_t		sync_process_pid;
 
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
@@ -1677,16 +1713,18 @@ ShutDownSlotSync(void)
 		return;
 	}
 
-	worker_pid = SlotSyncCtx->pid;
+	sync_process_pid = SlotSyncCtx->pid;
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	/*
-	 * Signal slotsync worker if it was still running. The worker will stop
-	 * upon detecting that the stopSignaled flag is set to true.
+	 * Signal the slotsync worker or the backend process running
+	 * pg_sync_replication_slots(), if either one is active.
+	 * The process will stop upon detecting that the stopSignaled
+	 * flag is set to true.
 	 */
-	if (worker_pid != InvalidPid)
-		kill(worker_pid, SIGUSR1);
+	if (sync_process_pid != InvalidPid)
+		kill(sync_process_pid, SIGUSR1);
 
 	/* Wait for slot sync to end */
 	for (;;)
@@ -1835,7 +1873,10 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
-		check_and_set_sync_info(InvalidPid);
+		check_and_set_sync_info(MyProcPid);
+
+		/* Check for interrupts and config changes */
+		ProcessSlotSyncInterrupts();
 
 		validate_remote_info(wrconn);
 
-- 
2.47.3

v34-0002-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v34-0002-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 6aa50daee0982b5b589ea80c553ad88b191538fa Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Wed, 10 Dec 2025 15:53:00 +1100
Subject: [PATCH v34 2/2] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.

Author: Ajin Cherian <itsajin@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  11 +-
 src/backend/replication/logical/slotsync.c    | 235 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 .../t/040_standby_failover_slots_sync.pl      |  59 ++++-
 5 files changed, 253 insertions(+), 58 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index d5a5e22fe2c..33940504622 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,12 +405,11 @@ postgres=# SELECT * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries cyclically to some extent, continuing until all
+      the failover slots that existed on the primary at the start of the function
+      call are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
       Therefore, it is the recommended method for synchronizing slots.
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index bb623db25d4..3a57e11687d 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the SQL function pg_sync_replication is used to sync the slots, and if
+ * the slots are not ready to be synced and are marked as RS_TEMPORARY because
+ * of any of the reasons mentioned above, then the SQL function also waits and
+ * retries until the slots are marked as RS_PERSISTENT (which means sync-ready).
+ * Refer to the comments in SyncReplicationSlots() for more details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +70,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -599,11 +606,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -627,7 +638,13 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		 * current location when recreating the slot in the next cycle. It may
 		 * take more time to create such a slot. Therefore, we keep this slot
 		 * and attempt the synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the SQL function can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -642,6 +659,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that SQL function can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -665,10 +686,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr = GetStandbyFlushRecPtr(NULL);
@@ -770,7 +795,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -867,7 +893,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 			return false;
 		}
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -878,15 +905,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
+ *
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
+ *
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -895,29 +930,45 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach_ptr(char, slot_name, slot_names)
+		{
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "'%s'", slot_name);
+			first_slot = false;
+		}
+		appendStringInfoChar(&query, ')');
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
+	pfree(query.data);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -994,6 +1045,33 @@ synchronize_slots(WalReceiverConn *wrconn)
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by SQL function
+ * 							  pg_sync_replication_slots() to track if any slots
+ * 							  could not be persisted and need to be retried.
+ *
+ * Returns:
+ * TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -1009,19 +1087,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1458,6 +1529,9 @@ reset_syncing_flag(void)
  *
  * It connects to the primary server, fetches logical failover slots
  * information periodically in order to create and sync the slots.
+ *
+ * Note: If any changes are made here, check if the corresponding SQL
+ * function logic in SyncReplicationSlots also needs to be changed.
  */
 void
 ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
@@ -1619,10 +1693,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_or_refresh_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1864,15 +1955,42 @@ slotsync_failure_callback(int code, Datum arg)
 	walrcv_disconnect(wrconn);
 }
 
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+
+	foreach_ptr(RemoteSlot, remote_slot, remote_slots)
+	{
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	return slot_names;
+}
+
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready".
+ * Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
 		check_and_set_sync_info(MyProcPid);
 
 		/* Check for interrupts and config changes */
@@ -1880,7 +1998,54 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	slot_persistence_pending = false;
+			bool	some_slot_updated = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncInterrupts();
+
+			/* We must be in a valid transaction state */
+			Assert(IsTransactionState());
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+				slot_names = extract_slot_names(remote_slots);
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Done if all slots are persisted i.e are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* wait before retrying again */
+			wait_for_slot_activity(some_slot_updated);
+
+		}
+
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index f39830dbb34..c0632bf901a 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,7 +62,7 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot synchronization."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 25777fa188c..8f63bfbb977 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -1000,6 +1000,12 @@ $primary->psql(
 ));
 
 $subscriber2->safe_psql('postgres', 'DROP SUBSCRIPTION regress_mysub2;');
+$subscriber1->safe_psql('postgres', 'DROP SUBSCRIPTION regress_mysub1;');
+
+# Remove the dropped sb1_slot from the synchronized_standby_slots list and reload the
+# configuration.
+$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots', "''");
+$primary->reload;
 
 # Verify that all slots have been removed except the one necessary for standby2,
 # which is needed for further testing.
@@ -1016,34 +1022,47 @@ $primary->safe_psql('postgres', "COMMIT PREPARED 'test_twophase_slotsync';");
 $primary->wait_for_replay_catchup($standby2);
 
 ##################################################
-# Verify that slotsync skip statistics are correctly updated when the
+# Test that pg_sync_replication_slots() on the standby skips and retries
+# until the slot becomes sync-ready (when the remote slot catches up with
+# the locally reserved position).
+# Also verify that slotsync skip statistics are correctly updated when the
 # slotsync operation is skipped.
 ##################################################
 
-# Create a logical replication slot and create some DDL on the primary so
-# that the slot lags behind the standby.
-$primary->safe_psql(
-	'postgres', qq(
-	SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);
-	CREATE TABLE wal_push(a int);
-));
+# Recreate the slot by creating a subscription on the subscriber, keeping it disabled.
+$subscriber1->safe_psql('postgres', qq[
+	CREATE TABLE push_wal (a int);
+	TRUNCATE tab_int;
+	CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, failover = true, enabled = false);]);
+
+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Make sure the DDL changes are synced to the standby
 $primary->wait_for_replay_catchup($standby2);
 
 $log_offset = -s $standby2->logfile;
 
-# Enable slot sync worker
+# Enable standby for slot synchronization
 $standby2->append_conf(
-	'postgresql.conf', qq(
+    'postgresql.conf', qq(
 hot_standby_feedback = on
 primary_conninfo = '$connstr_1 dbname=postgres'
 log_min_messages = 'debug2'
-sync_replication_slots = on
 ));
 
 $standby2->reload;
 
-# Confirm that the slot sync worker is able to start.
-$standby2->wait_for_log(qr/slot sync worker started/, $log_offset);
+# Attempt to synchronize slots using the API. The API will keep retrying
+# synchronization until the remote slot catches up.
+# The API will not return until this happens; to be able to make
+# further calls, run the API in a background process.
+my $h = $standby2->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr/start/, q(
+	\echo start
+	SELECT pg_sync_replication_slots();
+	));
 
 # Confirm that the slot sync is skipped due to the remote slot lagging behind
 $standby2->wait_for_log(
@@ -1061,4 +1080,18 @@ $result = $standby2->safe_psql('postgres',
 );
 is($result, 't', "check slot sync skip count increments");
 
+# Enable the subscription so that the remote slot catches up
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
+$subscriber1->wait_for_subscription_sync;
+
+# Create xl_running_xacts on the primary to speed up restart_lsn advancement.
+$primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot();");
+
+# Confirm from the log that the slot is sync-ready now.
+$standby2->wait_for_log(
+    qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
+    $log_offset);
+
+$h->quit;
+
 done_testing();
-- 
2.47.3

#132Yilin Zhang
jiezhilove@126.com
In reply to: Ajin Cherian (#131)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

At 2025-12-10 13:07:34, "Ajin Cherian" <itsajin@gmail.com> wrote:

I'm not sure if this is much of an improvement, I like the current
approach and matches with similar coding patterns in the code base.

Attaching v34 addressing the above comments.

Hi,
Few comments for v34.

1 - 0002

```
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,12 @@
 * the last cycle. Refer to the comments above wait_for_slot_activity() for
 * more details.
 *
+ * If the SQL function pg_sync_replication is used to sync the slots, and if
```

Typo, it should be "pg_sync_replication_slots()" instead of "pg_sync_replication".

2 - 0002

```
+ /*
+ * The syscache access in fetch_or_refresh_remote_slots() needs a
+ * transaction env.
+ */
```

Typo, it should be "fetch_remote_slots()" instead of "fetch_or_refresh_remote_slots()".

3 - 0002

```
+ appendStringInfo(&query, "'%s'", slot_name);
```

Instead of manually adding single quotes around the slot name, consider using quote_literal_cstr().
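
For instance, a minimal sketch of what I mean (untested; quote_literal_cstr() returns a palloc'd copy of the string with quoting and escaping applied):

```
+ appendStringInfoString(&query, quote_literal_cstr(slot_name));
```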

While I was reviewing patch v32, Ajin Cherian submitted patch v34, but these issues still persist there.

Best regards,

--

Yilin Zhang

#133Amit Kapila
amit.kapila16@gmail.com
In reply to: Ajin Cherian (#131)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Dec 10, 2025 at 10:37 AM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching v34 addressing the above comments.

0001 looks mostly good to me. I have made minor edits in the comments
and added error_code for one of the error messages. Please check
attached and let me know what you think?
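
For reference, this is the message that now carries an error code, as it appears in the attached patch's slotsync_reread_config():

```
ereport(ERROR,
        errcode(ERRCODE_INVALID_PARAMETER_VALUE),
        errmsg("replication slot synchronization will stop because of a parameter change"));
```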

--
With Regards,
Amit Kapila.

Attachments:

v35-0001-Signal-backends-running-pg_sync_replication_slot.patchapplication/octet-stream; name=v35-0001-Signal-backends-running-pg_sync_replication_slot.patchDownload
From ea51056b11ba648dbb65edbcebbdd49bd217d48f Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Wed, 10 Dec 2025 15:46:21 +1100
Subject: [PATCH v35] Signal backends running pg_sync_replication_slots()
 during promotion.

Previously, during promotion, only the slot synchronization worker was
interrupted and asked to shut down. That meant a backend
that performs slot synchronization via the pg_sync_replication_slots()
SQL function was not signalled at all because its PID was not
recorded in the slot-sync context.

This patch changes behaviour to:
1. Store the backend PID in SlotSyncCtxStruct so the backend performing
   slot synchronization can be signalled.
2. On promotion, send SIGUSR1 to the recorded PID, either
   the slot-sync worker or any backend currently syncing slots.
3. A backend invoking pg_sync_replication_slots() also calls
   ProcessSlotSyncInterrupts() to handle the promotion signal as well as any
   configuration changes that might require synchronization to stop.

This patch also acts as a base for a larger patch that improves
pg_sync_replication_slots() to wait for slots to be persisted before
exiting.

Author: Ajin Cherian <itsajin@gmail.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
---
 src/backend/replication/logical/slotsync.c | 157 +++++++++++++--------
 1 file changed, 99 insertions(+), 58 deletions(-)

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 9f92c21237e..873aa003eec 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -71,11 +71,14 @@
 /*
  * Struct for sharing information to control slot synchronization.
  *
- * The slot sync worker's pid is needed by the startup process to shut it
- * down during promotion. The startup process shuts down the slot sync worker
- * and also sets stopSignaled=true to handle the race condition when the
+ * The 'pid' is either the slot sync worker's pid or the backend's pid running
+ * the SQL function pg_sync_replication_slots(). When the startup process sets
+ * 'stopSignaled' during promotion, it uses this 'pid' to wake up the currently
+ * synchronizing process so that the process can immediately stop its
+ * synchronizing work on seeing 'stopSignaled' set.
+ * Setting 'stopSignaled' is also used to handle the race condition when the
  * postmaster has not noticed the promotion yet and thus may end up restarting
- * the slot sync worker. If stopSignaled is set, the worker will exit in such a
+ * the slot sync worker. If 'stopSignaled' is set, the worker will exit in such a
  * case. The SQL function pg_sync_replication_slots() will also error out if
  * this flag is set. Note that we don't need to reset this variable as after
  * promotion the slot sync worker won't be restarted because the pmState
@@ -1195,10 +1198,10 @@ ValidateSlotSyncParams(int elevel)
 }
 
 /*
- * Re-read the config file.
+ * Re-read the config file for slot synchronization.
  *
- * Exit if any of the slot sync GUCs have changed. The postmaster will
- * restart it.
+ * Exit or throw an error if relevant GUCs have changed, depending on whether
+ * we are called from the slot sync worker or from the SQL function pg_sync_replication_slots().
  */
 static void
 slotsync_reread_config(void)
@@ -1209,8 +1212,11 @@ slotsync_reread_config(void)
 	bool		old_hot_standby_feedback = hot_standby_feedback;
 	bool		conninfo_changed;
 	bool		primary_slotname_changed;
+	bool		is_slotsync_worker = AmLogicalSlotSyncWorkerProcess();
+	bool		parameter_changed = false;
 
-	Assert(sync_replication_slots);
+	if (is_slotsync_worker)
+		Assert(sync_replication_slots);
 
 	ConfigReloadPending = false;
 	ProcessConfigFile(PGC_SIGHUP);
@@ -1222,32 +1228,60 @@ slotsync_reread_config(void)
 
 	if (old_sync_replication_slots != sync_replication_slots)
 	{
-		ereport(LOG,
-		/* translator: %s is a GUC variable name */
-				errmsg("replication slot synchronization worker will shut down because \"%s\" is disabled", "sync_replication_slots"));
-		proc_exit(0);
-	}
+		if (is_slotsync_worker)
+		{
+			ereport(LOG,
+			/* translator: %s is a GUC variable name */
+					errmsg("replication slot synchronization worker will stop because \"%s\" is disabled",
+						   "sync_replication_slots"));
+
+			proc_exit(0);
+		}
 
-	if (conninfo_changed ||
-		primary_slotname_changed ||
-		(old_hot_standby_feedback != hot_standby_feedback))
+		parameter_changed = true;
+	}
+	else
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker will restart because of a parameter change"));
+		if (conninfo_changed ||
+			primary_slotname_changed ||
+			(old_hot_standby_feedback != hot_standby_feedback))
+		{
 
-		/*
-		 * Reset the last-start time for this worker so that the postmaster
-		 * can restart it without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
-		 */
-		SlotSyncCtx->last_start_time = 0;
+			if (is_slotsync_worker)
+			{
+				ereport(LOG,
+						errmsg("replication slot synchronization worker will restart because of a parameter change"));
 
-		proc_exit(0);
+				/*
+				 * Reset the last-start time for this worker so that the
+				 * postmaster can restart it without waiting for
+				 * SLOTSYNC_RESTART_INTERVAL_SEC.
+				 */
+				SlotSyncCtx->last_start_time = 0;
+
+				proc_exit(0);
+			}
+
+			parameter_changed = true;
+		}
+	}
+
+	/*
+	 * If we have reached here with a parameter change, we must be running in
+	 * the SQL function; emit an error in that case.
+	 */
+	if (parameter_changed)
+	{
+		Assert(!is_slotsync_worker);
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("replication slot synchronization will stop because of a parameter change"));
 	}
 
 }
 
 /*
- * Interrupt handler for main loop of slot sync worker.
+ * Interrupt handler for process performing slot synchronization.
  */
 static void
 ProcessSlotSyncInterrupts(void)
@@ -1256,10 +1290,23 @@ ProcessSlotSyncInterrupts(void)
 
 	if (SlotSyncCtx->stopSignaled)
 	{
-		ereport(LOG,
-				errmsg("replication slot synchronization worker is shutting down because promotion is triggered"));
+		if (AmLogicalSlotSyncWorkerProcess())
+		{
+			ereport(LOG,
+					errmsg("replication slot synchronization worker will stop because promotion is triggered"));
 
-		proc_exit(0);
+			proc_exit(0);
+		}
+		else
+		{
+			/*
+			 * For the backend executing SQL function
+			 * pg_sync_replication_slots().
+			 */
+			ereport(ERROR,
+					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					errmsg("replication slot synchronization will stop because promotion is triggered"));
+		}
 	}
 
 	if (ConfigReloadPending)
@@ -1362,29 +1409,14 @@ wait_for_slot_activity(bool some_slot_updated)
 }
 
 /*
- * Emit an error if a promotion or a concurrent sync call is in progress.
+ * Emit an error if a concurrent sync call is in progress.
  * Otherwise, advertise that a sync is in progress.
  */
 static void
-check_and_set_sync_info(pid_t worker_pid)
+check_and_set_sync_info(pid_t sync_process_pid)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
-	/* The worker pid must not be already assigned in SlotSyncCtx */
-	Assert(worker_pid == InvalidPid || SlotSyncCtx->pid == InvalidPid);
-
-	/*
-	 * Emit an error if startup process signaled the slot sync machinery to
-	 * stop. See comments atop SlotSyncCtxStruct.
-	 */
-	if (SlotSyncCtx->stopSignaled)
-	{
-		SpinLockRelease(&SlotSyncCtx->mutex);
-		ereport(ERROR,
-				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				errmsg("cannot synchronize replication slots when standby promotion is ongoing"));
-	}
-
 	if (SlotSyncCtx->syncing)
 	{
 		SpinLockRelease(&SlotSyncCtx->mutex);
@@ -1393,13 +1425,16 @@ check_and_set_sync_info(pid_t worker_pid)
 				errmsg("cannot synchronize replication slots concurrently"));
 	}
 
+	/* The pid must not be already assigned in SlotSyncCtx */
+	Assert(SlotSyncCtx->pid == InvalidPid);
+
 	SlotSyncCtx->syncing = true;
 
 	/*
 	 * Advertise the required PID so that the startup process can kill the
-	 * slot sync worker on promotion.
+	 * slot sync process on promotion.
 	 */
-	SlotSyncCtx->pid = worker_pid;
+	SlotSyncCtx->pid = sync_process_pid;
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
@@ -1414,6 +1449,7 @@ reset_syncing_flag(void)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 	SlotSyncCtx->syncing = false;
+	SlotSyncCtx->pid = InvalidPid;
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	syncing_slots = false;
@@ -1622,7 +1658,7 @@ update_synced_slots_inactive_since(void)
 	if (!StandbyMode)
 		return;
 
-	/* The slot sync worker or SQL function mustn't be running by now */
+	/* The slot sync worker or the SQL function mustn't be running by now */
 	Assert((SlotSyncCtx->pid == InvalidPid) && !SlotSyncCtx->syncing);
 
 	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
@@ -1651,16 +1687,18 @@ update_synced_slots_inactive_since(void)
 }
 
 /*
- * Shut down the slot sync worker.
+ * Shut down slot synchronization.
  *
- * This function sends signal to shutdown slot sync worker, if required. It
- * also waits till the slot sync worker has exited or
+ * This function sets stopSignaled=true and wakes up the slot sync process
+ * (either the worker or a backend running the SQL function
+ * pg_sync_replication_slots()) so that the worker can exit or the SQL
+ * function can finish. It also waits until the slot sync worker has exited
+ * or pg_sync_replication_slots() has finished.
  */
 void
 ShutDownSlotSync(void)
 {
-	pid_t		worker_pid;
+	pid_t		sync_process_pid;
 
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
@@ -1677,16 +1715,16 @@ ShutDownSlotSync(void)
 		return;
 	}
 
-	worker_pid = SlotSyncCtx->pid;
+	sync_process_pid = SlotSyncCtx->pid;
 
 	SpinLockRelease(&SlotSyncCtx->mutex);
 
 	/*
-	 * Signal slotsync worker if it was still running. The worker will stop
-	 * upon detecting that the stopSignaled flag is set to true.
+	 * Signal process doing slotsync, if any. The process will stop upon
+	 * detecting that the stopSignaled flag is set to true.
 	 */
-	if (worker_pid != InvalidPid)
-		kill(worker_pid, SIGUSR1);
+	if (sync_process_pid != InvalidPid)
+		kill(sync_process_pid, SIGUSR1);
 
 	/* Wait for slot sync to end */
 	for (;;)
@@ -1835,7 +1873,10 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
-		check_and_set_sync_info(InvalidPid);
+		check_and_set_sync_info(MyProcPid);
+
+		/* Check for interrupts and config changes */
+		ProcessSlotSyncInterrupts();
 
 		validate_remote_info(wrconn);
 
-- 
2.50.1.windows.1

#134Chao Li
li.evan.chao@gmail.com
In reply to: Amit Kapila (#133)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Dec 10, 2025, at 20:23, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 10, 2025 at 10:37 AM Ajin Cherian <itsajin@gmail.com> wrote:

Attaching v34 addressing the above comments.

0001 looks mostly good to me. I have made minor edits in the comments
and added error_code for one of the error messages. Please check
attached and let me know what you think?

--
With Regards,
Amit Kapila.
<v35-0001-Signal-backends-running-pg_sync_replication_slot.patch>

I saw you have integrated the missed change from the last push into v35, and v35 looks good to me.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#135shveta malik
shveta.malik@gmail.com
In reply to: Amit Kapila (#133)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Dec 10, 2025 at 5:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

0001 looks mostly good to me. I have made minor edits in the comments
and added error_code for one of the error messages. Please check
attached and let me know what you think?

v35 looks good to me.

thanks
Shveta

#136Ajin Cherian
itsajin@gmail.com
In reply to: Yilin Zhang (#132)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Dec 10, 2025 at 4:23 PM Yilin Zhang <jiezhilove@126.com> wrote:

Hi,
Few comments for v34.

Thanks for your review!

I've addressed your comments.

As patch 0001 has been pushed. I've rebased and created a new version
v36 with the remaining patch.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v36-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v36-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From 1b4be7e6514acb6ce4b88ee83b0a5ee707e5c975 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Thu, 11 Dec 2025 15:55:59 +1100
Subject: [PATCH v36] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.

Author: Ajin Cherian <itsajin@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Yilin Zhang <jiezhilove@126.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  11 +-
 src/backend/replication/logical/slotsync.c    | 236 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 .../t/040_standby_failover_slots_sync.pl      |  59 ++++-
 5 files changed, 254 insertions(+), 58 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index d5a5e22fe2c..33940504622 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,12 +405,11 @@ postgres=# SELECT * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries cyclically to some extent, continuing until all
+      the failover slots that existed on the primary at the start of the function
+      call are synchronized. Any slots created after the function begins will
+      not be synchronized. In contrast, automatic synchronization
       via <varname>sync_replication_slots</varname> provides continuous slot
       updates, enabling seamless failover and supporting high availability.
       Therefore, it is the recommended method for synchronizing slots.
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 873aa003eec..437e882fa81 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,13 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the SQL function pg_sync_replication_slots() is used to sync the slots,
+ * and if the slots are not ready to be synced and are marked as RS_TEMPORARY
+ * because of any of the reasons mentioned above, then the SQL function also
+ * waits and retries until the slots are marked as RS_PERSISTENT (which means
+ * sync-ready). Refer to the comments in SyncReplicationSlots() for more
+ * details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +71,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -599,11 +607,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -627,7 +639,13 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		 * current location when recreating the slot in the next cycle. It may
 		 * take more time to create such a slot. Therefore, we keep this slot
 		 * and attempt the synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the SQL function can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -642,6 +660,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that SQL function can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -665,10 +687,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr = GetStandbyFlushRecPtr(NULL);
@@ -770,7 +796,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -867,7 +894,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 			return false;
 		}
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -878,15 +906,23 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
+ *
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * Parameters:
+ *	wrconn - Connection to the primary server
+ *	slot_names - List of slot names (char *) to fetch from primary,
+ *				or NIL to fetch all failover logical slots.
+ *
+ * Returns:
+ *	List of remote slot information structures. Returns NIL if no slot
+ *	is found.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -895,29 +931,45 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach_ptr(char, slot_name, slot_names)
+		{
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "%s", quote_literal_cstr(slot_name));
+			first_slot = false;
+		}
+		appendStringInfoChar(&query, ')');
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
+	pfree(query.data);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -994,6 +1046,33 @@ synchronize_slots(WalReceiverConn *wrconn)
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * Takes a list of remote slots and synchronizes them locally. Creates the
+ * slots if not present on the standby and updates existing ones.
+ *
+ * Parameters:
+ * wrconn - Connection to the primary server
+ * remote_slot_list - List of RemoteSlot structures to synchronize.
+ * slot_persistence_pending - boolean used by SQL function
+ * 							  pg_sync_replication_slots() to track if any slots
+ * 							  could not be persisted and need to be retried.
+ *
+ * Returns:
+ * TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -1009,19 +1088,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1460,6 +1532,9 @@ reset_syncing_flag(void)
  *
  * It connects to the primary server, fetches logical failover slots
  * information periodically in order to create and sync the slots.
+ *
+ * Note: If any changes are made here, check if the corresponding SQL
+ * function logic in SyncReplicationSlots() also needs to be changed.
  */
 void
 ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
@@ -1621,10 +1696,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1864,15 +1956,42 @@ slotsync_failure_callback(int code, Datum arg)
 	walrcv_disconnect(wrconn);
 }
 
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+
+	foreach_ptr(RemoteSlot, remote_slot, remote_slots)
+	{
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	return slot_names;
+}
+
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready".
+ * Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
 		check_and_set_sync_info(MyProcPid);
 
 		/* Check for interrupts and config changes */
@@ -1880,7 +1999,54 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	slot_persistence_pending = false;
+			bool	some_slot_updated = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncInterrupts();
+
+			/* We must be in a valid transaction state */
+			Assert(IsTransactionState());
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+				slot_names = extract_slot_names(remote_slots);
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Done if all slots are persisted, i.e., are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* wait before retrying again */
+			wait_for_slot_activity(some_slot_updated);
+
+		}
+
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index f39830dbb34..c0632bf901a 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,7 +62,7 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot synchronization."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 25777fa188c..6bc938a0236 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -1000,6 +1000,13 @@ $primary->psql(
 ));
 
 $subscriber2->safe_psql('postgres', 'DROP SUBSCRIPTION regress_mysub2;');
+$subscriber1->safe_psql('postgres', 'DROP SUBSCRIPTION regress_mysub1;');
+$subscriber1->safe_psql('postgres', 'TRUNCATE tab_int;');
+
+# Remove the dropped sb1_slot from the synchronized_standby_slots list and reload the
+# configuration.
+$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots', "''");
+$primary->reload;
 
 # Verify that all slots have been removed except the one necessary for standby2,
 # which is needed for further testing.
@@ -1016,34 +1023,46 @@ $primary->safe_psql('postgres', "COMMIT PREPARED 'test_twophase_slotsync';");
 $primary->wait_for_replay_catchup($standby2);
 
 ##################################################
-# Verify that slotsync skip statistics are correctly updated when the
+# Test that pg_sync_replication_slots() on the standby skips and retries
+# until the slot becomes sync-ready (when the remote slot catches up with
+# the locally reserved position).
+# Also verify that slotsync skip statistics are correctly updated when the
 # slotsync operation is skipped.
 ##################################################
 
-# Create a logical replication slot and create some DDL on the primary so
-# that the slot lags behind the standby.
-$primary->safe_psql(
-	'postgres', qq(
-	SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);
-	CREATE TABLE wal_push(a int);
-));
+# Recreate the slot by creating a subscription on the subscriber, keeping it disabled.
+$subscriber1->safe_psql('postgres', qq[
+	CREATE TABLE push_wal (a int);
+	CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, failover = true, enabled = false);]);
+
+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Make sure the DDL changes are synced to the standby
 $primary->wait_for_replay_catchup($standby2);
 
 $log_offset = -s $standby2->logfile;
 
-# Enable slot sync worker
+# Enable standby for slot synchronization
 $standby2->append_conf(
-	'postgresql.conf', qq(
+    'postgresql.conf', qq(
 hot_standby_feedback = on
 primary_conninfo = '$connstr_1 dbname=postgres'
 log_min_messages = 'debug2'
-sync_replication_slots = on
 ));
 
 $standby2->reload;
 
-# Confirm that the slot sync worker is able to start.
-$standby2->wait_for_log(qr/slot sync worker started/, $log_offset);
+# Attempt to synchronize slots using API. The API will continue retrying
+# synchronization until the remote slot catches up.
+# The API will not return until this happens; to be able to make
+# further calls, call the API in a background process.
+my $h = $standby2->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr/start/, q(
+	\echo start
+	SELECT pg_sync_replication_slots();
+	));
 
 # Confirm that the slot sync is skipped due to the remote slot lagging behind
 $standby2->wait_for_log(
@@ -1061,4 +1080,18 @@ $result = $standby2->safe_psql('postgres',
 );
 is($result, 't', "check slot sync skip count increments");
 
+# Enable the Subscription, so that the remote slot catches up
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
+$subscriber1->wait_for_subscription_sync;
+
+# Create xl_running_xacts on the primary to speed up restart_lsn advancement.
+$primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot();");
+
+# Confirm from the log that the slot is sync-ready now.
+$standby2->wait_for_log(
+    qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
+    $log_offset);
+
+$h->quit;
+
 done_testing();
-- 
2.47.3

#137shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#136)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, Dec 11, 2025 at 10:45 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Dec 10, 2025 at 4:23 PM Yilin Zhang <jiezhilove@126.com> wrote:

Hi,
Few comments for v34.

Thanks for your review!

I've addressed your comments.

As patch 0001 has been pushed, I've rebased and created a new version,
v36, with the remaining patch.

Verified, the patch works well.

thanks
Shveta

#138Amit Kapila
amit.kapila16@gmail.com
In reply to: Ajin Cherian (#136)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, Dec 11, 2025 at 10:45 AM Ajin Cherian <itsajin@gmail.com> wrote:

As patch 0001 has been pushed, I've rebased and created a new version,
v36, with the remaining patch.

I have made a number of changes in code comments and docs. Kindly
review and if you are okay with these then include them in the next
version.

--
With Regards,
Amit Kapila.

Attachments:

v36_amit_1.patch.txttext/plain; charset=UTF-8; name=v36_amit_1.patch.txtDownload
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 33940504622..cae8a376c3b 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -406,13 +406,12 @@ postgres=# SELECT * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
       However, unlike automatic synchronization, it does not perform incremental
-      updates. It retries cyclically to some extent—continuing until all
-      the failover slots that existed on primary at the start of the function
-      call are synchronized. Any slots created after the function begins will
-      not be synchronized. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
+      updates. It retries cyclically until all the failover slots that existed on
+      primary at the start of the function call are synchronized. Any slots created
+      after the function begins will not be synchronized. In contrast, automatic
+      synchronization via <varname>sync_replication_slots</varname> provides
+      continuous slot updates, enabling seamless failover and supporting high
+      availability. Therefore, it is the recommended method for synchronizing slots.
      </para>
     </note>
 
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 437e882fa81..10a769ccf85 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -608,7 +608,7 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * future synchronization; otherwise, do nothing.
  *
  * *slot_persistence_pending is set to true if any of the slots fail to
- * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ * persist.
  *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
@@ -688,7 +688,7 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
  * periodic syncs.
  *
  * *slot_persistence_pending is set to true if any of the slots fail to
- * persist. It is utilized by the SQL function pg_sync_replication_slots().
+ * persist.
  *
  * Returns TRUE if the local slot is updated.
  */
@@ -911,15 +911,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
  * If slot_names is NIL, fetches all failover logical slots from the
  * primary server, otherwise fetches only the ones with names in slot_names.
  *
- * Parameters:
- *	wrconn - Connection to the primary server
- *	slot_names - List of slot names (char *) to fetch from primary,
- *				or NIL to fetch all failover logical slots.
- *
- * Returns:
- *	List of remote slot information structures. Returns NIL if no slot
- *	is found.
- *
+ * Returns list of remote slot information structures, if any, otherwise,
+ * NIL if no slot is found.
  */
 static List *
 fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
@@ -1054,18 +1047,13 @@ fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 /*
  * Synchronize slots.
  *
- * Takes a list of remote slots and synchronizes them locally. Creates the
- * slots if not present on the standby and updates existing ones.
+ * This function takes a list of remote slots and synchronizes them locally. It
+ * creates the slots if not present on the standby and updates existing ones.
  *
- * Parameters:
- * wrconn - Connection to the primary server
- * remote_slot_list - List of RemoteSlot structures to synchronize.
- * slot_persistence_pending - boolean used by SQL function
- * 							  pg_sync_replication_slots() to track if any slots
- * 							  could not be persisted and need to be retried.
+ * If slot_persistence_pending is not NULL, it will be set to true if one or
+ * more slots could not be persisted.
  *
- * Returns:
- * TRUE if any of the slots gets updated in this sync-cycle.
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
  */
 static bool
 synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
@@ -1981,6 +1969,7 @@ extract_slot_names(List *remote_slots)
  *
  * Repeatedly fetches and updates replication slot information from the
  * primary until all slots are at least "sync ready".
+ *
  * Exits early if promotion is triggered or certain critical
  * configuration parameters have changed.
  */
@@ -2042,7 +2031,6 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 
 			/* wait before retrying again */
 			wait_for_slot_activity(some_slot_updated);
-
 		}
 
 		if (slot_names)
#139Chao Li
li.evan.chao@gmail.com
In reply to: Amit Kapila (#138)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Dec 11, 2025, at 20:23, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 11, 2025 at 10:45 AM Ajin Cherian <itsajin@gmail.com> wrote:

As patch 0001 has been pushed, I've rebased and created a new version,
v36, with the remaining patch.

I have made a number of changes in code comments and docs. Kindly
review and if you are okay with these then include them in the next
version.

This diff enhances the docs and comments; overall it looks good to me. A few nit comments:

1
```
- * Returns:
- *	List of remote slot information structures. Returns NIL if no slot
- *	is found.
- *
+ * Returns list of remote slot information structures, if any, otherwise,
+ * NIL if no slot is found.
```

I think “a” is needed before “list”, and “if any, otherwise,” is rarely seen in code comments. So I suggest:
```
* Returns a list of remote slot information structures, or NIL if none
* are found.
```

2
```
- * Parameters:
- * wrconn - Connection to the primary server
- * remote_slot_list - List of RemoteSlot structures to synchronize.
- * slot_persistence_pending - boolean used by SQL function
- * 							  pg_sync_replication_slots() to track if any slots
- * 							  could not be persisted and need to be retried.
+ * If slot_persistence_pending is not NULL, it will be set to true if one or
+ * more slots could not be persisted.
```

The changed version drops the point about retrying. So I suggest:
```
* If slot_persistence_pending is not NULL, it will be set to true if one
* or more slots could not be persisted. This allows callers such as
* pg_sync_replication_slots() to retry those slots.
```

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#140Ajin Cherian
itsajin@gmail.com
In reply to: Amit Kapila (#138)
1 attachment(s)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Thu, Dec 11, 2025 at 11:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 11, 2025 at 10:45 AM Ajin Cherian <itsajin@gmail.com> wrote:

As patch 0001 has been pushed, I've rebased and created a new version,
v36, with the remaining patch.

I have made a number of changes in code comments and docs. Kindly
review and if you are okay with these then include them in the next
version.

I have included these changes as well as comments by Chao. Attaching
v37 with the changes.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v37-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchapplication/octet-stream; name=v37-0001-Improve-initial-slot-synchronization-in-pg_sync_.patchDownload
From ea3d7c56dc9c09057dd88496189cdff2202288e6 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <itsajin@gmail.com>
Date: Thu, 11 Dec 2025 15:55:59 +1100
Subject: [PATCH v37] Improve initial slot synchronization in
 pg_sync_replication_slots()

During initial slot synchronization on a standby, the operation may fail if
required catalog rows or WALs have been removed or are at risk of removal. The
slotsync worker handles this by creating a temporary slot for initial sync and
retaining it even in case of failure. It will keep retrying until the slot on the
primary has been advanced to a position where all the required data are also
available on the standby. However, pg_sync_replication_slots() had
no such protection mechanism.

The SQL API would fail immediately if synchronization requirements weren't
met. This could lead to permanent failure as the standby might continue
removing the still-required data.

To address this, we now make pg_sync_replication_slots() wait for the primary
slot to advance to a suitable position before completing synchronization and
before removing the temporary slot. Once the slot advances to a suitable
position, we retry synchronization. Additionally, if a promotion occurs on
the standby during this wait, the process exits gracefully and the
temporary slot is removed.

Author: Ajin Cherian <itsajin@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Yilin Zhang <jiezhilove@126.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
---
 doc/src/sgml/func/func-admin.sgml             |   4 +-
 doc/src/sgml/logicaldecoding.sgml             |  16 +-
 src/backend/replication/logical/slotsync.c    | 225 +++++++++++++++---
 .../utils/activity/wait_event_names.txt       |   2 +-
 .../t/040_standby_failover_slots_sync.pl      |  57 ++++-
 5 files changed, 244 insertions(+), 60 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..2896cd9e429 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1497,9 +1497,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         standby server. Temporary synced slots, if any, cannot be used for
         logical decoding and must be dropped after promotion. See
         <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-        Note that this function is primarily intended for testing and
-        debugging purposes and should be used with caution. Additionally,
-        this function cannot be executed if
+        Note that this function cannot be executed if
         <link linkend="guc-sync-replication-slots"><varname>
         sync_replication_slots</varname></link> is enabled and the slotsync
         worker is already running to perform the synchronization of slots.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index d5a5e22fe2c..cae8a376c3b 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -405,15 +405,13 @@ postgres=# SELECT * from pg_logical_slot_get_changes('regression_slot', NULL, NU
       periodic synchronization of failover slots, they can also be manually
       synchronized using the <link linkend="pg-sync-replication-slots">
       <function>pg_sync_replication_slots</function></link> function on the standby.
-      However, this function is primarily intended for testing and debugging and
-      should be used with caution. Unlike automatic synchronization, it does not
-      include cyclic retries, making it more prone to synchronization failures,
-      particularly during initial sync scenarios where the required WAL files
-      or catalog rows for the slot might have already been removed or are at risk
-      of being removed on the standby. In contrast, automatic synchronization
-      via <varname>sync_replication_slots</varname> provides continuous slot
-      updates, enabling seamless failover and supporting high availability.
-      Therefore, it is the recommended method for synchronizing slots.
+      However, unlike automatic synchronization, it does not perform incremental
+      updates. It retries cyclically until all the failover slots that existed on
+      primary at the start of the function call are synchronized. Any slots created
+      after the function begins will not be synchronized. In contrast, automatic
+      synchronization via <varname>sync_replication_slots</varname> provides
+      continuous slot updates, enabling seamless failover and supporting high
+      availability. Therefore, it is the recommended method for synchronizing slots.
      </para>
     </note>
 
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 873aa003eec..dc904c12419 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -39,6 +39,13 @@
  * the last cycle. Refer to the comments above wait_for_slot_activity() for
  * more details.
  *
+ * If the SQL function pg_sync_replication_slots() is used to sync the slots,
+ * and if the slots are not ready to be synced and are marked as RS_TEMPORARY
+ * because of any of the reasons mentioned above, then the SQL function also
+ * waits and retries until the slots are marked as RS_PERSISTENT (which means
+ * sync-ready). Refer to the comments in SyncReplicationSlots() for more
+ * details.
+ *
  * Any standby synchronized slots will be dropped if they no longer need
  * to be synchronized. See comment atop drop_local_obsolete_slots() for more
  * details.
@@ -64,6 +71,7 @@
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
@@ -599,11 +607,15 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
  * local ones, then update the LSNs and persist the local synced slot for
  * future synchronization; otherwise, do nothing.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist.
+ *
  * Return true if the slot is marked as RS_PERSISTENT (sync-ready), otherwise
  * false.
  */
 static bool
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+									 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot = MyReplicationSlot;
 	bool		found_consistent_snapshot = false;
@@ -627,7 +639,13 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		 * current location when recreating the slot in the next cycle. It may
 		 * take more time to create such a slot. Therefore, we keep this slot
 		 * and attempt the synchronization in the next cycle.
+		 *
+		 * We also update the slot_persistence_pending parameter, so
+		 * the SQL function can retry.
 		 */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -642,6 +660,10 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 				errdetail("Synchronization could lead to data loss, because the standby could not build a consistent snapshot to decode WALs at LSN %X/%08X.",
 						  LSN_FORMAT_ARGS(slot->data.restart_lsn)));
 
+		/* Set this, so that SQL function can retry */
+		if (slot_persistence_pending)
+			*slot_persistence_pending = true;
+
 		return false;
 	}
 
@@ -665,10 +687,14 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
  * updated. The slot is then persisted and is considered as sync-ready for
  * periodic syncs.
  *
+ * *slot_persistence_pending is set to true if any of the slots fail to
+ * persist.
+ *
  * Returns TRUE if the local slot is updated.
  */
 static bool
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
+					 bool *slot_persistence_pending)
 {
 	ReplicationSlot *slot;
 	XLogRecPtr	latestFlushPtr = GetStandbyFlushRecPtr(NULL);
@@ -770,7 +796,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 		if (slot->data.persistency == RS_TEMPORARY)
 		{
 			slot_updated = update_and_persist_local_synced_slot(remote_slot,
-																remote_dbid);
+																remote_dbid,
+																slot_persistence_pending);
 		}
 
 		/* Slot ready for sync, so sync it. */
@@ -867,7 +894,8 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 			return false;
 		}
 
-		update_and_persist_local_synced_slot(remote_slot, remote_dbid);
+		update_and_persist_local_synced_slot(remote_slot, remote_dbid,
+											 slot_persistence_pending);
 
 		slot_updated = true;
 	}
@@ -878,15 +906,16 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Synchronize slots.
+ * Fetch remote slots.
  *
- * Gets the failover logical slots info from the primary server and updates
- * the slots locally. Creates the slots if not present on the standby.
+ * If slot_names is NIL, fetches all failover logical slots from the
+ * primary server, otherwise fetches only the ones with names in slot_names.
  *
- * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ * Returns a list of remote slot information structures, or NIL if none
+ * are found.
  */
-static bool
-synchronize_slots(WalReceiverConn *wrconn)
+static List *
+fetch_remote_slots(WalReceiverConn *wrconn, List *slot_names)
 {
 #define SLOTSYNC_COLUMN_COUNT 10
 	Oid			slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
@@ -895,29 +924,45 @@ synchronize_slots(WalReceiverConn *wrconn)
 	WalRcvExecResult *res;
 	TupleTableSlot *tupslot;
 	List	   *remote_slot_list = NIL;
-	bool		some_slot_updated = false;
-	bool		started_tx = false;
-	const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
-		" restart_lsn, catalog_xmin, two_phase, two_phase_at, failover,"
-		" database, invalidation_reason"
-		" FROM pg_catalog.pg_replication_slots"
-		" WHERE failover and NOT temporary";
-
-	/* The syscache access in walrcv_exec() needs a transaction env. */
-	if (!IsTransactionState())
+	StringInfoData query;
+
+	initStringInfo(&query);
+	appendStringInfoString(&query,
+				"SELECT slot_name, plugin, confirmed_flush_lsn,"
+				" restart_lsn, catalog_xmin, two_phase,"
+				" two_phase_at, failover,"
+				" database, invalidation_reason"
+				" FROM pg_catalog.pg_replication_slots"
+				" WHERE failover and NOT temporary");
+
+	if (slot_names != NIL)
 	{
-		StartTransactionCommand();
-		started_tx = true;
+		bool		first_slot = true;
+
+		/*
+		 * Construct the query to fetch only the specified slots
+		 */
+		appendStringInfoString(&query, " AND slot_name IN (");
+
+		foreach_ptr(char, slot_name, slot_names)
+		{
+			if (!first_slot)
+				appendStringInfoString(&query, ", ");
+
+			appendStringInfo(&query, "%s", quote_literal_cstr(slot_name));
+			first_slot = false;
+		}
+		appendStringInfoChar(&query, ')');
 	}
 
 	/* Execute the query */
-	res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
+	res = walrcv_exec(wrconn, query.data, SLOTSYNC_COLUMN_COUNT, slotRow);
+	pfree(query.data);
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
 				errmsg("could not fetch failover logical slots info from the primary server: %s",
 					   res->err));
 
-	/* Construct the remote_slot tuple and synchronize each slot locally */
 	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
 	{
@@ -994,6 +1039,29 @@ synchronize_slots(WalReceiverConn *wrconn)
 		ExecClearTuple(tupslot);
 	}
 
+	walrcv_clear_result(res);
+
+	return remote_slot_list;
+}
+
+/*
+ * Synchronize slots.
+ *
+ * This function takes a list of remote slots and synchronizes them locally. It
+ * creates the slots if not present on the standby and updates existing ones.
+ *
+ * If slot_persistence_pending is not NULL, it will be set to true if one or
+ * more slots could not be persisted. This allows callers such as
+ * SyncReplicationSlots() to retry those slots.
+ *
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn, List *remote_slot_list,
+				  bool *slot_persistence_pending)
+{
+	bool		some_slot_updated = false;
+
 	/* Drop local slots that no longer need to be synced. */
 	drop_local_obsolete_slots(remote_slot_list);
 
@@ -1009,19 +1077,12 @@ synchronize_slots(WalReceiverConn *wrconn)
 		 */
 		LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 
-		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid);
+		some_slot_updated |= synchronize_one_slot(remote_slot, remote_dbid,
+												  slot_persistence_pending);
 
 		UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock);
 	}
 
-	/* We are done, free remote_slot_list elements */
-	list_free_deep(remote_slot_list);
-
-	walrcv_clear_result(res);
-
-	if (started_tx)
-		CommitTransactionCommand();
-
 	return some_slot_updated;
 }
 
@@ -1460,6 +1521,9 @@ reset_syncing_flag(void)
  *
  * It connects to the primary server, fetches logical failover slots
  * information periodically in order to create and sync the slots.
+ *
+ * Note: If any changes are made here, check if the corresponding SQL
+ * function logic in SyncReplicationSlots() also needs to be changed.
  */
 void
 ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
@@ -1621,10 +1685,27 @@ ReplSlotSyncWorkerMain(const void *startup_data, size_t startup_data_len)
 	for (;;)
 	{
 		bool		some_slot_updated = false;
+		bool		started_tx = false;
+		List	   *remote_slots;
 
 		ProcessSlotSyncInterrupts();
 
-		some_slot_updated = synchronize_slots(wrconn);
+		/*
+		 * The syscache access in fetch_remote_slots() needs a
+		 * transaction env.
+		 */
+		if (!IsTransactionState())
+		{
+			StartTransactionCommand();
+			started_tx = true;
+		}
+
+		remote_slots = fetch_remote_slots(wrconn, NIL);
+		some_slot_updated = synchronize_slots(wrconn, remote_slots, NULL);
+		list_free_deep(remote_slots);
+
+		if (started_tx)
+			CommitTransactionCommand();
 
 		wait_for_slot_activity(some_slot_updated);
 	}
@@ -1864,15 +1945,43 @@ slotsync_failure_callback(int code, Datum arg)
 	walrcv_disconnect(wrconn);
 }
 
+/*
+ * Helper function to extract slot names from a list of remote slots
+ */
+static List *
+extract_slot_names(List *remote_slots)
+{
+	List		*slot_names = NIL;
+
+	foreach_ptr(RemoteSlot, remote_slot, remote_slots)
+	{
+		char       *slot_name;
+
+		slot_name = pstrdup(remote_slot->name);
+		slot_names = lappend(slot_names, slot_name);
+	}
+
+	return slot_names;
+}
+
 /*
  * Synchronize the failover enabled replication slots using the specified
  * primary server connection.
+ *
+ * Repeatedly fetches and updates replication slot information from the
+ * primary until all slots are at least "sync ready".
+ *
+ * Exits early if promotion is triggered or certain critical
+ * configuration parameters have changed.
  */
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
 	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
 	{
+		List		*remote_slots = NIL;
+		List		*slot_names = NIL;  /* List of slot names to track */
+
 		check_and_set_sync_info(MyProcPid);
 
 		/* Check for interrupts and config changes */
@@ -1880,7 +1989,53 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 
 		validate_remote_info(wrconn);
 
-		synchronize_slots(wrconn);
+		/* Retry until all the slots are sync-ready */
+		for (;;)
+		{
+			bool	slot_persistence_pending = false;
+			bool	some_slot_updated = false;
+
+			/* Check for interrupts and config changes */
+			ProcessSlotSyncInterrupts();
+
+			/* We must be in a valid transaction state */
+			Assert(IsTransactionState());
+
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+				slot_names = extract_slot_names(remote_slots);
+
+			/* Free the current remote_slots list */
+			list_free_deep(remote_slots);
+
+			/* Done if all slots are persisted, i.e., are sync-ready */
+			if (!slot_persistence_pending)
+				break;
+
+			/* wait before retrying again */
+			wait_for_slot_activity(some_slot_updated);
+		}
+
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index f39830dbb34..c0632bf901a 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -62,7 +62,7 @@ LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
 RECOVERY_WAL_STREAM	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot sync worker."
+REPLICATION_SLOTSYNC_MAIN	"Waiting in main loop of slot synchronization."
 REPLICATION_SLOTSYNC_SHUTDOWN	"Waiting for slot sync worker to shut down."
 SYSLOGGER_MAIN	"Waiting in main loop of syslogger process."
 WAL_RECEIVER_MAIN	"Waiting in main loop of WAL receiver process."
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 25777fa188c..20f942cfd14 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -1000,6 +1000,13 @@ $primary->psql(
 ));
 
 $subscriber2->safe_psql('postgres', 'DROP SUBSCRIPTION regress_mysub2;');
+$subscriber1->safe_psql('postgres', 'DROP SUBSCRIPTION regress_mysub1;');
+$subscriber1->safe_psql('postgres', 'TRUNCATE tab_int;');
+
+# Remove the dropped sb1_slot from the synchronized_standby_slots list and reload the
+# configuration.
+$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots', "''");
+$primary->reload;
 
 # Verify that all slots have been removed except the one necessary for standby2,
 # which is needed for further testing.
@@ -1016,34 +1023,46 @@ $primary->safe_psql('postgres', "COMMIT PREPARED 'test_twophase_slotsync';");
 $primary->wait_for_replay_catchup($standby2);
 
 ##################################################
-# Verify that slotsync skip statistics are correctly updated when the
+# Test that pg_sync_replication_slots() on the standby skips and retries
+# until the slot becomes sync-ready (when the remote slot catches up with
+# the locally reserved position).
+# Also verify that slotsync skip statistics are correctly updated when the
 # slotsync operation is skipped.
 ##################################################
 
-# Create a logical replication slot and create some DDL on the primary so
-# that the slot lags behind the standby.
-$primary->safe_psql(
-	'postgres', qq(
-	SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, true);
-	CREATE TABLE wal_push(a int);
-));
+# Recreate the slot by creating a subscription on the subscriber, keeping it disabled.
+$subscriber1->safe_psql('postgres', qq[
+	CREATE TABLE push_wal (a int);
+	CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, failover = true, enabled = false);]);
+
+# Create some DDL on the primary so that the slot lags behind the standby
+$primary->safe_psql('postgres', "CREATE TABLE push_wal (a int);");
+
+# Make sure the DDL changes are synced to the standby
 $primary->wait_for_replay_catchup($standby2);
 
 $log_offset = -s $standby2->logfile;
 
-# Enable slot sync worker
+# Enable standby for slot synchronization
 $standby2->append_conf(
 	'postgresql.conf', qq(
 hot_standby_feedback = on
 primary_conninfo = '$connstr_1 dbname=postgres'
 log_min_messages = 'debug2'
-sync_replication_slots = on
 ));
 
 $standby2->reload;
 
-# Confirm that the slot sync worker is able to start.
-$standby2->wait_for_log(qr/slot sync worker started/, $log_offset);
+# Attempt to synchronize slots using API. The API will continue retrying
+# synchronization until the remote slot catches up.
+# The API will not return until this happens; to be able to make
+# further calls, call the API in a background process.
+my $h = $standby2->background_psql('postgres', on_error_stop => 0);
+
+$h->query_until(qr/start/, q(
+	\echo start
+	SELECT pg_sync_replication_slots();
+	));
 
 # Confirm that the slot sync is skipped due to the remote slot lagging behind
 $standby2->wait_for_log(
@@ -1061,4 +1080,18 @@ $result = $standby2->safe_psql('postgres',
 );
 is($result, 't', "check slot sync skip count increments");
 
+# Enable the Subscription, so that the remote slot catches up
+$subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
+$subscriber1->wait_for_subscription_sync;
+
+# Create xl_running_xacts on the primary to speed up restart_lsn advancement.
+$primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot();");
+
+# Confirm from the log that the slot is sync-ready now.
+$standby2->wait_for_log(
+    qr/newly created replication slot \"lsub1_slot\" is sync-ready now/,
+    $log_offset);
+
+$h->quit;
+
 done_testing();
-- 
2.47.3

#141shveta malik
shveta.malik@gmail.com
In reply to: Ajin Cherian (#140)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Dec 12, 2025 at 5:35 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Thu, Dec 11, 2025 at 11:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 11, 2025 at 10:45 AM Ajin Cherian <itsajin@gmail.com> wrote:

As patch 0001 has been pushed. I've rebased and created a new version
v36 with the remaining patch.

I have made a number of changes in code comments and docs. Kindly
review and if you are okay with these then include them in the next
version.

I have included these changes as well as comments by Chao. Attaching
v37 with the changes.

Thanks. v37 LGTM.

thanks
Shveta

#142Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#141)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Fri, Dec 12, 2025 at 8:53 AM shveta malik <shveta.malik@gmail.com> wrote:

On Fri, Dec 12, 2025 at 5:35 AM Ajin Cherian <itsajin@gmail.com> wrote:

I have included these changes as well as comments by Chao. Attaching
v37 with the changes.

Thanks. v37 LGTM.

Pushed.

--
With Regards,
Amit Kapila.

#143Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#142)
1 attachment(s)
RE: Improve pg_sync_replication_slots() to wait for primary to advance

On Monday, December 15, 2025 7:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Dec 12, 2025 at 8:53 AM shveta malik <shveta.malik@gmail.com>
wrote:

On Fri, Dec 12, 2025 at 5:35 AM Ajin Cherian <itsajin@gmail.com> wrote:

I have included these changes as well as comments by Chao. Attaching
v37 with the changes.

Thanks. v37 LGTM.

Pushed.

My colleague reported a related BF failure[1] to me off-list.

After analyzing, I think the issue is that the newly added test in
040_standby_failover_slots_sync synchronizes a replication slot to the standby
server without configuring synchronized_standby_slots. This omission allows
logical failover slots to advance beyond the designated physical replication
slot, resulting in intermittent synchronization failures.

I confirmed the same from the log where the slotsync failed due to the
reason mentioned above:

--
2025-12-15 12:30:33.502 CET [3015371][client backend][1/2:0] ERROR: skipping slot synchronization because the received slot sync LSN 0/06017C90 for slot "lsub1_slot" is ahead of the standby position 0/06017C58
2025-12-15 12:30:33.502 CET [3015371][client backend][1/2:0] STATEMENT: SELECT pg_sync_replication_slots();
--
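
The mismatch is easy to see on the primary with a query such as the
following (a quick check only; it assumes the test's physical slot
name sb2_slot, everything else is the standard pg_replication_slots
view):

SELECT slot_name, slot_type, failover, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE failover OR slot_name = 'sb2_slot';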

Here is a small patch to fix it.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2025-12-15%2011%3A25%3A38

Best Regards,
Hou zj

Attachments:

v1-0001-Fix-an-intermittent-BF-failure.patchapplication/octet-stream; name=v1-0001-Fix-an-intermittent-BF-failure.patchDownload
From 36863892b25adfcedbae0f8d6072a40bcb724fba Mon Sep 17 00:00:00 2001
From: Zhijie Hou <houzj.fnst@fujitsu.com>
Date: Wed, 17 Dec 2025 15:16:38 +0800
Subject: [PATCH v1] Fix an intermittent BF failure

The newly added test in 040_standby_failover_slots_sync synchronizes a
replication slot to the standby server without configuring
synchronized_standby_slots. This omission allows logical failover slots to
advance beyond the designated physical replication slot, resulting in
intermittent synchronization failures.
---
 src/test/recovery/t/040_standby_failover_slots_sync.pl | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 20f942cfd14..7dadd8647e5 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -1080,6 +1080,14 @@ $result = $standby2->safe_psql('postgres',
 );
 is($result, 't', "check slot sync skip count increments");
 
+# Configure primary to disallow any logical slots that have enabled failover
+# from getting ahead of the specified physical replication slot (sb2_slot).
+$primary->append_conf(
+	'postgresql.conf', qq(
+synchronized_standby_slots = 'sb2_slot'
+));
+$primary->reload;
+
 # Enable the Subscription, so that the remote slot catches up
 $subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE");
 $subscriber1->wait_for_subscription_sync;
-- 
2.51.1.windows.1

#144Andres Freund
andres@anarazel.de
In reply to: Zhijie Hou (Fujitsu) (#143)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

Hi,

On 2025-12-17 10:28:28 +0000, Zhijie Hou (Fujitsu) wrote:

On Monday, December 15, 2025 7:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Dec 12, 2025 at 8:53 AM shveta malik <shveta.malik@gmail.com>
wrote:

On Fri, Dec 12, 2025 at 5:35 AM Ajin Cherian <itsajin@gmail.com> wrote:

I have included these changes as well as comments by Chao. Attaching
v37 with the changes.

Thanks. v37 LGTM.

Pushed.

My college reported a related BF failure[1] to me off-list.

FWIW, this also fails semi-regularly in CI. E.g.
https://cirrus-ci.com/task/6281872222715904
https://cirrus-ci.com/task/5243530626465792

Greetings,

Andres Freund

#145Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#143)
Re: Improve pg_sync_replication_slots() to wait for primary to advance

On Wed, Dec 17, 2025 at 3:58 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Here is a small patch to fix it.

Thanks, I've pushed the patch. BTW, looking at the slot-sync API code
path, I can think of the following improvements.

*
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);

/*
* Can get here only if GUC 'synchronized_standby_slots' on the
* primary server was not configured correctly.
*/
ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

Can we change this ERROR to LOG even for the API, as the API now also
retries syncing the slots during initial sync? (A sketch of this change
follows after the last point below.)

* The use of the slot_persistence_pending flag in the internal APIs
seems to be the reverse of what it should be. I mean that it should
initially be true, and we should set it to false only when we actually
persist the slot.

* We can retry to sync all the slots present on the primary at the
start of the API call, not only temporary slots. If we do this then the
previous point may not be required. Also, please mention something
like: "It retries cyclically until all the failover slots that existed
on primary at the start of the function call are synchronized." in the
function description [1] as well.
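
A minimal sketch of the ERROR-to-LOG change from the first point above
(an illustration only, reusing the names from the quoted excerpt and
the message text from the server log shown in #143, not committed
code):

if (remote_slot->confirmed_lsn > latestFlushPtr)
{
	update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);

	/* Sketch: always LOG, since the worker and the SQL API both retry. */
	ereport(LOG,
			errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
			errmsg("skipping slot synchronization because the received slot sync LSN %X/%08X for slot \"%s\" is ahead of the standby position %X/%08X",
				   LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
				   remote_slot->name,
				   LSN_FORMAT_ARGS(latestFlushPtr)));
	return false;
}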

[1]: https://www.postgresql.org/docs/devel/functions-admin.html#FUNCTIONS-REPLICATION
--
With Regards,
Amit Kapila.