Unexpected Standby Shutdown on sync_replication_slots change
Hello,
I'm not sure whether it's a bug, but I've encountered unexpected behavior when
dynamically changing the sync_replication_slots parameter on a PostgreSQL
17 standby server. Instead of logging an error and continuing to run, the
standby instance shuts down with a FATAL error, which is not the
anticipated behavior for a dynamic parameter change, especially since the
documentation doesn't indicate such an outcome.
Steps to Reproduce
1. Set up physical replication between two PostgreSQL 17.5 instances.
2. Ensure wal_level on the primary (and consequently on the standby) is set
to replica.
3. Start both the primary and standby instances, confirming replication is
active.
4. On the *standby* instance, dynamically change the sync_replication_slots
parameter (I ran ALTER SYSTEM SET sync_replication_slots = 'on'; followed
by SELECT pg_reload_conf();).
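For reference, the standby-side change in the last step can be condensed to the
following SQL, run in a psql session connected to the standby (the SHOW check is
my addition to confirm the precondition, not part of the original report):

```
-- Run on the standby. wal_level is inherited from the primary's WAL,
-- so it should report 'replica' in this setup.
SHOW wal_level;

-- Enable slot synchronization dynamically (sync_replication_slots is a
-- reloadable parameter, so no restart is needed for the change itself).
ALTER SYSTEM SET sync_replication_slots = 'on';
SELECT pg_reload_conf();
```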
Expected Behavior
I expected the standby instance to continue running and log an error
message (similar to how hot_standby_feedback behaves when not enabled,
e.g., a loop of LOG: replication slot synchronization requires
"hot_standby_feedback" to be enabled). A FATAL error leading to an
unexpected shutdown for a dynamic parameter change on a running standby is
not the anticipated behavior. The documentation for sync_replication_slots
also doesn't indicate that a misconfiguration or incompatible wal_level
would lead to a shutdown.
Actual Behavior
Upon attempting to set sync_replication_slots to on on the standby with
wal_level set to replica, the standby instance immediately shuts down with
the following log messages:
LOG: database system is ready to accept read-only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
LOG: received SIGHUP, reloading configuration files
LOG: parameter "sync_replication_slots" changed to "on"
FATAL: replication slot synchronization requires "wal_level" >= "logical"
Environment
- *PostgreSQL Version:* 17.5
On Thu, Jul 24, 2025 at 10:54 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:
Thanks for the report!
I was able to reproduce the issue even on the latest master (v19dev).
I agree that the current behavior—where changing a GUC parameter can
cause the server to shut down—is unexpected and should be avoided.
From what I’ve seen in the code, the problem stems from postmaster
calling ValidateSlotSyncParams() before starting the slot sync worker.
That function raises an ERROR if wal_level is not logical while
sync_replication_slots is enabled. Since ERROR is treated as FATAL
in postmaster, it causes the server to exit.
To fix this, we could modify ValidateSlotSyncParams() so it doesn’t
raise an ERROR in this case, as follows.
 ValidateSlotSyncParams(int elevel)
 {
 	/*
 	 * Logical slot sync/creation requires wal_level >= logical.
-	 *
-	 * Since altering the wal_level requires a server restart, so error out in
-	 * this case regardless of elevel provided by caller.
 	 */
 	if (wal_level < WAL_LEVEL_LOGICAL)
-		ereport(ERROR,
+	{
+		ereport(elevel,
 				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				errmsg("replication slot synchronization requires \"wal_level\" >= \"logical\""));
+		return false;
+	}
Regards,
--
Fujii Masao
On Fri, Jul 25, 2025 at 12:55 AM Fujii Masao <masao.fujii@gmail.com> wrote:
I've created a patch to implement the above—attached.
Note that this patch does not change the existing behavior when
the misconfiguration (sync_replication_slots enabled but wal_level not
set to logical) is detected at server startup. In that case, the server
still shuts down with a FATAL error, which is consistent with other
settings like summarize_wal.
Regards,
--
Fujii Masao
Attachments:
v1-0001-Avoid-unexpected-shutdown-when-sync_replication_s.patch
On Fri, Jul 25, 2025 at 12:20 AM Fujii Masao <masao.fujii@gmail.com> wrote:
Thank you for the patch.
Validated the behaviour, the patch looks good to me.
thanks
Shveta
Thanks for the quick patch. I've tested it on the REL_17_STABLE branch, and
it's working fine.
On a standby node, after dynamically setting the sync_replication_slots
parameter, I observed the following logs. The instance did not shut down,
which seems correct:
2025-07-25 09:26:34.613 UTC [4420] LOG: received SIGHUP, reloading
configuration files
2025-07-25 09:26:34.614 UTC [4420] LOG: parameter
"sync_replication_slots" changed to "on"
2025-07-25 09:26:34.615 UTC [4420] LOG: replication slot
synchronization requires "wal_level" >= "logical"
2025-07-25 09:27:34.662 UTC [4420] LOG: replication slot
synchronization requires "wal_level" >= "logical"
When restarted with this configuration, the instance failed to start as
expected, showing this FATAL log:
2025-07-25 09:27:45.668 UTC [4430] FATAL: replication slot
synchronization ("sync_replication_slots" = on) requires "wal_level"
= "logical"
I have a couple of observations:
- With this patch, a primary instance will not restart if the
configuration is incorrect.
- Only wal_level is checked, but the ValidateSlotSyncParams function
includes other mandatory parameters. These are not being checked during
startup.
Regards,
Hugo
On Fri, Jul 25, 2025 at 08:16, shveta malik <shveta.malik@gmail.com> wrote:
--
Regards,
*Hugo DUBOIS*
*DevOps Engineer*
hdubois@scaleway.com
On Fri, Jul 25, 2025 at 6:40 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:
Thanks for testing!
I have a couple of observations:
With this patch, a primary instance will not restart if the configuration is incorrect.
Right. In HEAD, the server shuts down due to the misconfiguration only when
starting up as a standby and entering hot standby mode. With this patch,
the server shuts down earlier—during startup—even when starting as a primary.
If starting as a standby, the shutdown now happens before entering hot standby.
It seems hard to match the current behavior exactly in the patch, and I’m not
sure it’s worth trying. I think we have two options:
#1. Make the server always shut down on misconfiguration, regardless of
whether it starts as a primary or standby, and before reaching hot standby.
#2. Allow the server to start even if the configuration is invalid.
The current patch follows option #1. But if there are valid use cases for
starting a primary with sync_replication_slots enabled and wal_level not set
to logical, then option #2 might be better. What do you think?
Only wal_level is checked, but the ValidateSlotSyncParams function includes other mandatory parameters. These are not being checked during startup.
The other parameters checked by ValidateSlotSyncParams() can be changed
without restarting the server, so users can fix them on the fly while
the server is running with the misconfiguration. But wal_level requires
a server restart, so it's important to catch that upfront.
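As a quick way to see this distinction, pg_settings exposes each parameter's
change context. This query is my own illustration, not part of the patch; in
v17, wal_level should report the context 'postmaster' (restart required) while
the other slot-sync prerequisites report 'sighup' (reloadable):

```
-- 'postmaster' = needs a server restart; 'sighup' = a config reload suffices.
SELECT name, context
FROM pg_settings
WHERE name IN ('wal_level', 'sync_replication_slots',
               'hot_standby_feedback', 'primary_slot_name');
```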
Regards,
--
Fujii Masao
On Fri, Jul 25, 2025 at 12:32, Fujii Masao <masao.fujii@gmail.com> wrote:
I'm not sure if there's a particular use case for wal_level and
sync_replication_slots not matching on a primary. So, for me, Option 1
seems correct.
Only wal_level is checked, but the ValidateSlotSyncParams function
includes other mandatory parameters. These are not being checked during
startup.
The other parameters checked by ValidateSlotSyncParams() can be changed
without restarting the server, so users can fix them on the fly while
the server is running with the misconfiguration. But wal_level requires
a server restart, so it's important to catch that upfront.
Ok.
On Fri, Jul 25, 2025 at 9:13 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:
I'm not sure if there's a particular use case for wal_level and sync_replication_slots not matching on a primary. So, for me, Option 1 seems correct.
I also prefer option #1.
However, on second thought, if some users are already running a server
(non-standby) with sync_replication_slots enabled and wal_level != logical
in v17, switching to option #1 could break their setup after a minor
version update. That would be surprising and confusing.
To avoid that, I think we should go with option #2—at least for v17.
Attached is an updated patch implementing option #2.
Regards,
--
Fujii Masao
Attachments:
v2-0001-Avoid-unexpected-shutdown-when-sync_replication_s.patch
I'm okay with that too, as long as the standby doesn't unexpectedly quit.
The patch looks good to me.
On Fri, Jul 25, 2025 at 18:01, Fujii Masao <masao.fujii@gmail.com> wrote:
On Fri, Jul 25, 2025 at 9:31 PM Fujii Masao <masao.fujii@gmail.com> wrote:
I still prefer option #1. On HEAD, option #1 makes sense because
wal_level is a GUC that requires a server restart and thus using
ERROR/FATAL in this context is appropriate. It is also consistent with
the rest of the code (other modules) wherever wal_level checks are
performed during startup.
Regarding the back branches, in the case of a primary server, if
sync_replication_slots is left on while wal_level < logical, then
issuing an ERROR/FATAL is still more suitable. The user can simply
correct the configuration (disable sync_replication_slots) and proceed
with startup after the minor version upgrade. Also, given that
option #1 is the better fit for HEAD, I don't think it's worth having
different behavior in the back branches. Thoughts?
thanks
Shveta
On Mon, Jul 28, 2025 at 10:18 AM shveta malik <shveta.malik@gmail.com> wrote:
+1.
--
With Regards,
Amit Kapila.
On Mon, Jul 28, 2025 at 6:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I think it's basically not acceptable for a server to start up
successfully before a minor version update, but then fail to start with
the same configuration after the update. If that's absolutely necessary
to fix a bug, it might be justifiable. But in this case, I don't think
it's required.
Blocking startup when sync_replication_slots is enabled and wal_level
is not logical could be helpful. But that feels more like an improvement
than a bug fix. I'm fine adding that to master, but I don't think we should
apply it to old branches.
That said, both Shveta and Amit support backpatching this change,
so I'd like to hear more opinions before we decide.
Regards,
--
Fujii Masao
On Tue, 2025-07-29 at 13:40 +0900, Fujii Masao wrote:
I side with you on that one.
I think it would be fine to backpatch the breaking change to v18,
if we mention the behavior change in the release notes.
Yours,
Laurenz Albe
On Tue, Jul 29, 2025 at 12:04 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
I am fine with the suggestion.
thanks
Shveta
Hi,
On Wed, Jul 30, 2025 at 11:15 AM shveta malik <shveta.malik@gmail.com> wrote:
On Tue, Jul 29, 2025 at 12:04 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2025-07-29 at 13:40 +0900, Fujii Masao wrote:
I think it's basically not acceptable for a server to start up
successfully before a minor version update, but then fail to start with
the same configuration after the update. If that's absolutely necessary
to fix a bug, it might be justifiable. But in this case, I don't think
it's required.Blocking startup when sync_replication_slots is enabled and wal_level
is not logical could be helpful. But that feels more like an improvement
than a bug fix. I'm fine adding that to master, but I don't think we should
apply it to old branches.That said, both Shveta and Amit support backpatching this change,
so I'd like to hear more opinions before we decide.I side with you on that one.
I think it would be fine to backpatch the breaking change to v18,
if we mention the behavior change in the release notes.I am fine with the suggestion.
+1

This fix feels more like an enhancement to user feedback and
safety nets than a correction of a broken or unsafe code path. Being
unable to start up with the same settings might cause
more surprise than necessary.
Best,
Xuneng
On Sat, Aug 2, 2025 at 1:28 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> This fix feels more like an enhancement to user feedback and
> safety nets than a correction of a broken or unsafe code path. Being
> unable to start up with the same settings might cause
> more surprise than necessary.
Agreed.
First, since there seems to be rough consensus on applying the current patch
(which prevents unexpected shutdown while the server is running),
I've pushed it to master and back-patched it to v17.
As for the follow-up change that prevents the server from starting with
an invalid configuration, let's continue the discussion. I've attached
the patch and agree it's a good fit for master. However, I'm afraid
it's too late to include it in v18, as beta2 has already been released
and this change is more of an improvement than a bug fix.
Regards,
--
Fujii Masao
Attachments:
v3-0001-Disallow-server-start-with-sync_replication_slots.patch
On Mon, Aug 4, 2025 at 5:29 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> On Sat, Aug 2, 2025 at 1:28 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > This fix feels more like an enhancement to user feedback and
> > safety nets than a correction of a broken or unsafe code path. Being
> > unable to start up with the same settings might cause
> > more surprise than necessary.
>
> Agreed.
>
> First, since there seems to be rough consensus on applying the current patch
> (which prevents unexpected shutdown while the server is running),
> I've pushed it to master and back-patched it to v17.
>
> As for the follow-up change that prevents the server from starting with
> an invalid configuration, let's continue the discussion. I've attached
> the patch and agree it's a good fit for master.
+1. The patch looks good.
> However, I'm afraid
> it's too late to include it in v18, as beta2 has already been released
> and this change is more of an improvement than a bug fix.
If it's not possible to get it to v18 now, then I am okay to have it
on master alone.
thanks
Shveta
On Tue, Aug 5, 2025 at 10:38 AM shveta malik <shveta.malik@gmail.com> wrote:
> On Mon, Aug 4, 2025 at 5:29 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> > On Sat, Aug 2, 2025 at 1:28 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > This fix feels more like an enhancement to user feedback and
> > > safety nets than a correction of a broken or unsafe code path. Being
> > > unable to start up with the same settings might cause
> > > more surprise than necessary.
> >
> > Agreed.
> >
> > First, since there seems to be rough consensus on applying the current patch
> > (which prevents unexpected shutdown while the server is running),
> > I've pushed it to master and back-patched it to v17.
> >
> > As for the follow-up change that prevents the server from starting with
> > an invalid configuration, let's continue the discussion. I've attached
> > the patch and agree it's a good fit for master.
>
> +1. The patch looks good.
>
> > However, I'm afraid
> > it's too late to include it in v18, as beta2 has already been released
> > and this change is more of an improvement than a bug fix.
>
> If it's not possible to get it to v18 now, then I am okay to have it
> on master alone.
Just wanted to confirm, do we plan to do it on master at least?
thanks
Shveta
On Wed, Aug 20, 2025 at 7:16 PM shveta malik <shveta.malik@gmail.com> wrote:
> Just wanted to confirm, do we plan to do it on master at least?
Yeah, barring any objections, I'll push the patch to master.
Regards,
--
Fujii Masao
On Thu, Aug 21, 2025 at 7:41 AM Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Aug 20, 2025 at 7:16 PM shveta malik <shveta.malik@gmail.com> wrote:
> > Just wanted to confirm, do we plan to do it on master at least?
>
> Yeah, barring any objections, I'll push the patch to master.
Okay, thanks!