Proposal: GUC to control starting/stopping logical subscription workers

Started by SATYANARAYANA NARLAPURAM, 5 months ago, 6 messages
#1 SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
1 attachment(s)

Hi all,

I couldn't find a previous discussion on a new GUC to globally enable or
disable logical subscription workers at the instance level, so I am starting
a new thread on this.

In multi-region or high-availability setups, a promoted standby often
requires a controlled switchover before it should start applying logical
replication changes from upstream. Without such control, a promoted standby
may immediately attempt to connect to the publisher as a logical
subscriber, which can cause it to unexpectedly take over replication slots,
start pulling changes before the setup is ready, or even conflict with the
original primary that is still using those slots. Disabling the
subscription on the primary before promoting a standby is not possible in
all cases, for example during PITR or data center outages.

Providing a way to keep logical subscriptions globally disabled—via a GUC
setting—prior to promotion ensures that no changes are accidentally pulled
or applied before the system is fully prepared. This avoids race conditions
and the risk of data divergence.

I would like to propose adding a GUC with the following behavior:

1. The default value is on, which matches today's behavior without the GUC.
2. When off, no new apply workers start, and existing ones exit
gracefully, similar to when a subscription is disabled.
3. When turned on again, behavior is the same as the current
behavior.
4. Changing this GUC shouldn't require a restart.
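
As a rough illustration of points 1 to 3, here is a toy model of the intended semantics. It is a sketch under stated assumptions, not the attached patch: `ToySub`, `toy_launcher_pass` and `toy_worker_pass` are hypothetical names, and no PostgreSQL APIs are used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy state for one subscription; not a PostgreSQL struct. */
typedef struct ToySub
{
	bool		enabled;	/* per-subscription enabled flag */
	bool		running;	/* does an apply worker currently exist? */
} ToySub;

/*
 * One pass of a toy launcher. Starts a worker for every enabled
 * subscription that has none, but only while the global switch is on.
 * Returns the number of workers started.
 */
int
toy_launcher_pass(ToySub *subs, size_t nsubs, bool guc_on)
{
	int			started = 0;

	assert(subs != NULL || nsubs == 0);

	if (!guc_on)
		return 0;				/* point 2: no new apply workers start */

	for (size_t i = 0; i < nsubs; i++)
	{
		if (subs[i].enabled && !subs[i].running)
		{
			subs[i].running = true;
			started++;
		}
	}
	return started;
}

/*
 * One pass of the toy worker loops. A running worker exits when either
 * its own subscription is disabled or the global switch is off.
 * Returns the number of workers that exited.
 */
int
toy_worker_pass(ToySub *subs, size_t nsubs, bool guc_on)
{
	int			stopped = 0;

	assert(subs != NULL || nsubs == 0);

	for (size_t i = 0; i < nsubs; i++)
	{
		if (subs[i].running && (!guc_on || !subs[i].enabled))
		{
			subs[i].running = false;
			stopped++;
		}
	}
	return stopped;
}
```

In this model, turning the switch off and running one worker pass stops every running worker, and turning it back on lets the next launcher pass restart exactly the subscriptions that are themselves enabled.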

Attaching a draft patch. Please let me know your thoughts.

Thanks,
Satya

Attachments:

v1_guc_logical_replication_subscriptions.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 20ccb2d6b54..d8961977d40 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -12873,6 +12873,29 @@ LOG:  CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-logical-replication-subscriptions-enabled" xreflabel="logical_replication_subscriptions_enabled">
+      <term><varname>logical_replication_subscriptions_enabled</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary>logical_replication_subscriptions_enabled</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables all logical replication subscription apply workers on this server.
+        When set to <literal>off</literal>, running apply workers exit and no new ones are
+        started until the value is changed back to <literal>on</literal>.
+       </para>
+       <para>
+        This parameter can be changed without a restart, for example with <command>ALTER SYSTEM</command> followed by a configuration reload.
+       </para>
+       <para>
+        This setting is intended for operational maintenance, such as pausing
+        replication apply during upgrades or large schema changes. It is also useful for
+        keeping subscribers from starting immediately after a point-in-time restore.
+       </para>
+      </listitem>
+     </varlistentry>
+
     </variablelist>
   </sect1>
   <sect1 id="runtime-config-short">
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 37377f7eb63..281997e932f 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -53,6 +53,7 @@ int			max_sync_workers_per_subscription = 2;
 int			max_parallel_apply_workers_per_subscription = 2;
 
 LogicalRepWorker *MyLogicalRepWorker = NULL;
+bool logical_replication_subscriptions_enabled = true;
 
 typedef struct LogicalRepCtxStruct
 {
@@ -1193,6 +1194,8 @@ ApplyLauncherMain(Datum main_arg)
 									   ALLOCSET_DEFAULT_SIZES);
 		oldctx = MemoryContextSwitchTo(subctx);
 
+		if (logical_replication_subscriptions_enabled)
+		{
 			/*
 			* Start any missing workers for enabled subscriptions.
 			*
@@ -1314,6 +1317,7 @@ ApplyLauncherMain(Datum main_arg)
 									wal_retrieve_retry_interval - elapsed);
 				}
 			}
+		}
 
 		/*
 		 * Drop the CONFLICT_DETECTION_SLOT slot if there is no subscription
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 0fdc5de57ba..48b122c1c2d 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -3948,6 +3948,13 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 
 		CHECK_FOR_INTERRUPTS();
 		
+		if (!logical_replication_subscriptions_enabled)
+		{
+			ereport(LOG,
+				(errmsg("logical replication apply worker exiting because replication is disabled")));
+			proc_exit(0);
+		}
+
 		MemoryContextSwitchTo(ApplyMessageContext);
 
 		len = walrcv_receive(LogRepWorkerWalRcvConn, &buf, &fd);
@@ -4697,6 +4704,17 @@ maybe_reread_subscription(void)
 	Subscription *newsub;
 	bool		started_tx = false;
 
+	/*
+	 * Exit if logical_replication_subscriptions_enabled is set to false.
+	 */
+	if (!logical_replication_subscriptions_enabled)
+	{
+		ereport(LOG,
+				(errmsg("logical replication worker for subscription \"%s\" will stop because logical replication subscriptions were disabled",
+						MySubscription->name)));
+		proc_exit(0);
+	}
+
 	/* When cache state is valid there is nothing to do here. */
 	if (MySubscriptionValid)
 		return;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index d14b1678e7f..0724997934b 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2143,6 +2143,16 @@ struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"logical_replication_subscriptions_enabled", PGC_SIGHUP, REPLICATION_SUBSCRIBERS,
+			gettext_noop("Enables or disables launching logical replication subscriptions."),
+			NULL
+		},
+		&logical_replication_subscriptions_enabled,
+		true,
+		NULL, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a9d8293474a..cd88c9730ed 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -396,7 +396,7 @@
 					# (change requires restart)
 #max_sync_workers_per_subscription = 2	# taken from max_logical_replication_workers
 #max_parallel_apply_workers_per_subscription = 2	# taken from max_logical_replication_workers
-
+#logical_replication_subscriptions_enabled = on   # enable or disable logical replication subscriptions
 
 #------------------------------------------------------------------------------
 # QUERY TUNING
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index b29453e8e4f..ce5ff5c606f 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -15,6 +15,7 @@
 extern PGDLLIMPORT int max_logical_replication_workers;
 extern PGDLLIMPORT int max_sync_workers_per_subscription;
 extern PGDLLIMPORT int max_parallel_apply_workers_per_subscription;
+extern PGDLLIMPORT bool logical_replication_subscriptions_enabled;
 
 extern void ApplyLauncherRegister(void);
 extern void ApplyLauncherMain(Datum main_arg);
#2 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: SATYANARAYANA NARLAPURAM (#1)
Re: Proposal: GUC to control starting/stopping logical subscription workers

Hi,

On Tue, Aug 12, 2025 at 8:40 PM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

I couldn't find a previous discussion on a new GUC to globally enable or disable logical subscription workers at the instance level. So starting a new thread on this.

In multi-region or high-availability setups, a promoted standby often requires a controlled switchover before it should start applying logical replication changes from upstream. Without such control, a promoted standby may immediately attempt to connect to the publisher as a logical subscriber, which can cause it to unexpectedly take over replication slots, start pulling changes before the setup is ready, or even conflict with the original primary that is still using those slots. Disabling the subscription on the primary before promoting a standby is not possible in all cases, for example during PITR or data center outages.

Providing a way to keep logical subscriptions globally disabled—via a GUC setting—prior to promotion ensures that no changes are accidentally pulled or applied before the system is fully prepared. This avoids race conditions and the risk of data divergence.

I would like to propose adding a GUC with the following behavior:

1. Default value for the GUC is ON, same behavior as now without the GUC
2. When off, no new apply workers start and existing ones exit gracefully similar to when subscription disabled
3. When turned on again, behavior will be the same as the current behavior
4. This GUC shouldn't require a restart

If I understand correctly, the end effect is similar to disabling all
subscriptions. Why not just add ALTER SUBSCRIPTION ... DISABLE for all
subscriptions to the failover workflow? The docs on upgrading logical
replication clusters say so:
https://www.postgresql.org/docs/18/logical-replication-upgrade.html.
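
A failover script following this suggestion would issue one statement per subscription. Below is a minimal sketch of building those statements, assuming the subscription names have already been fetched (for example from the `subname` column of `pg_subscription`). `build_disable_stmt` is a hypothetical helper, not a PostgreSQL or libpq function; it quotes the name as an SQL identifier by doubling any embedded double quotes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: writes
 *     ALTER SUBSCRIPTION "<subname>" DISABLE;
 * into out, doubling embedded double quotes in the name.
 * Returns the statement length, or -1 if out is too small.
 */
int
build_disable_stmt(const char *subname, char *out, size_t outlen)
{
	static const char prefix[] = "ALTER SUBSCRIPTION \"";
	static const char suffix[] = "\" DISABLE;";
	size_t		quotes = 0;
	size_t		needed;
	size_t		pos;

	assert(subname != NULL && out != NULL);

	/* Each embedded double quote needs one extra byte. */
	for (const char *p = subname; *p; p++)
	{
		if (*p == '"')
			quotes++;
	}

	needed = strlen(prefix) + strlen(subname) + quotes + strlen(suffix) + 1;
	if (needed > outlen)
		return -1;

	memcpy(out, prefix, strlen(prefix));
	pos = strlen(prefix);

	for (const char *p = subname; *p; p++)
	{
		if (*p == '"')
			out[pos++] = '"';	/* double the embedded quote */
		out[pos++] = *p;
	}

	/* Copy the suffix together with its terminating NUL. */
	memcpy(out + pos, suffix, strlen(suffix) + 1);
	return (int) (needed - 1);
}
```

The generated statements still have to be executed on a writable server, which is the limitation raised in the reply that follows.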

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#3 SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Bharath Rupireddy (#2)
Re: Proposal: GUC to control starting/stopping logical subscription workers

Hi Bharath,

If I understand correctly, the end effect is similar to disabling all
subscriptions. Why not just add ALTER SUBSCRIPTION ... DISABLE for all
subscriptions in the failover work flow? Migration of logical
replication slots docs says so -
https://www.postgresql.org/docs/18/logical-replication-upgrade.html.

The scenarios I am talking about here are not major version upgrades, but
PITR and standby promotion.
Before promotion the server is read-only (the catalog cannot be updated),
so subscriptions cannot be disabled.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#4 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: SATYANARAYANA NARLAPURAM (#3)
Re: Proposal: GUC to control starting/stopping logical subscription workers

Hi,

On Tue, Sep 9, 2025 at 1:16 PM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

If I understand correctly, the end effect is similar to disabling all
subscriptions. Why not just add ALTER SUBSCRIPTION ... DISABLE for all
subscriptions in the failover work flow? Migration of logical
replication slots docs says so -
https://www.postgresql.org/docs/18/logical-replication-upgrade.html.

The scenarios I am talking in this case are no major version upgrade, but PITR and Standby promotion cases.
Server is in read only mode (catalog cannot be updated) before promotion and subscriptions cannot be disabled.

Thanks for clarifying. AFAICS, failover slots won't have this issue.
All replication connections start to fail during the standby's
promotion (StartLogicalReplication->CreateDecodingContext->errmsg("cannot
use replication slot \"%s\" for logical decoding")), and replication
from the publisher resumes automatically after promotion and after the
slots are fully synced.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#5 Euler Taveira
euler@eulerto.com
In reply to: SATYANARAYANA NARLAPURAM (#1)
Re: Proposal: GUC to control starting/stopping logical subscription workers

On Wed, Aug 13, 2025, at 12:40 AM, SATYANARAYANA NARLAPURAM wrote:

I couldn't find a previous discussion on a new GUC to globally enable
or disable logical subscription workers at the instance level. So
starting a new thread on this.

max_logical_replication_workers.

In multi-region or high-availability setups, a promoted standby often
requires a controlled switchover before it should start applying
logical replication changes from upstream. Without such control, a
promoted standby may immediately attempt to connect to the publisher as
a logical subscriber, which can cause it to unexpectedly take over
replication slots, start pulling changes before the setup is ready, or
even conflict with the original primary that is still using those
slots. Disabling the subscription on the primary before promoting a
standby is not possible in all cases, for example during PITR or data
center outages.
Providing a way to keep logical subscriptions globally disabled—via a
GUC setting—prior to promotion ensures that no changes are accidentally
pulled or applied before the system is fully prepared. This avoids race
conditions and the risk of data divergence.

Why do you need another GUC? The max_logical_replication_workers parameter is
useful for this exact scenario. For example, pg_createsubscriber uses it to not
start logical replication while converting a physical replica into a logical
one.

I would like to propose adding a GUC with the following behavior:
1. Default value for the GUC is ON, same behavior as now without the
GUC
2. When off, no new apply workers start and existing ones exit
gracefully similar to when subscription disabled
3. When turned on again, behavior will be the same as the current
behavior
4. This GUC shouldn't require a restart

That's the only point not covered by the current behavior. You don't explain
why it is a requirement.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#6 SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Euler Taveira (#5)
Re: Proposal: GUC to control starting/stopping logical subscription workers

Hi Euler,

On Wed, Sep 10, 2025 at 5:11 PM Euler Taveira <euler@eulerto.com> wrote:

On Wed, Aug 13, 2025, at 12:40 AM, SATYANARAYANA NARLAPURAM wrote:

I couldn't find a previous discussion on a new GUC to globally enable
or disable logical subscription workers at the instance level. So
starting a new thread on this.

max_logical_replication_workers.

Thanks for the pointer; it was not obvious to me earlier. This should work
in my scenario. Should the documentation state that setting this to zero has
the same effect as disabling the publishers and subscribers?

In multi-region or high-availability setups, a promoted standby often
requires a controlled switchover before it should start applying
logical replication changes from upstream. Without such control, a
promoted standby may immediately attempt to connect to the publisher as
a logical subscriber, which can cause it to unexpectedly take over
replication slots, start pulling changes before the setup is ready, or
even conflict with the original primary that is still using those
slots. Disabling the subscription on the primary before promoting a
standby is not possible in all cases, for example during PITR or data
center outages.
Providing a way to keep logical subscriptions globally disabled—via a
GUC setting—prior to promotion ensures that no changes are accidentally
pulled or applied before the system is fully prepared. This avoids race
conditions and the risk of data divergence.

Why do you need another GUC? The max_logical_replication_workers parameter
is useful for this exact scenario. For example, pg_createsubscriber uses it
to not start logical replication while converting a physical replica into a
logical one.

As mentioned above, given this explanation I don't have a scenario in which
a separate GUC is needed.

I would like to propose adding a GUC with the following behavior:
1. Default value for the GUC is ON, same behavior as now without the
GUC
2. When off, no new apply workers start and existing ones exit
gracefully similar to when subscription disabled
3. When turned on again, behavior will be the same as the current
behavior
4. This GUC shouldn't require a restart

That's the only point not covered by the current behavior. You don't
explain why it is a requirement.

Avoiding an instance restart may be the only remaining use case, but I can
live with that.
