Unexpected Standby Shutdown on sync_replication_slots change

Started by Hugo DUBOIS9 months ago21 messagesbugs
Jump to latest
#1Hugo DUBOIS
hdubois@scaleway.com

Hello,

I'm not sure if it's a bug but I've encountered an unexpected behavior when
dynamically changing the sync_replication_slots parameter on a PostgreSQL
17 standby server. Instead of logging an error and continuing to run, the
standby instance shuts down with a FATAL error, which is not the
anticipated behavior for a dynamic parameter change, especially when the
documentation doesn't indicate such an outcome.

Steps to Reproduce

1.

Set up a physical replication between two PostgreSQL 17.5 instances.
2.

Ensure wal_level on the primary (and consequently on the standby) is set
to replica.
3.

Start both the primary and standby instances, confirming replication is
active.
4.

On the *standby* instance, dynamically change the sync_replication_slots
parameter (I have run the following query: ALTER SYSTEM SET
sync_replication_slots = 'on'; followed by SELECT pg_reload_conf();)

Expected Behavior

I expected the standby instance to continue running and log an error
message (similar to how hot_standby_feedback behaves when not enabled,
e.g., a loop of LOG: replication slot synchronization requires
"hot_standby_feedback" to be enabled). A FATAL error leading to an
unexpected shutdown for a dynamic parameter change on a running standby is
not the anticipated behavior. The documentation for sync_replication_slots
also doesn't indicate that a misconfiguration or incompatible wal_level
would lead to a shutdown.

Actual Behavior

Upon attempting to set sync_replication_slots to on on the standby with
wal_level set to replica, the standby instance immediately shuts down with
the following log messages:

LOG: database system is ready to accept read-only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
LOG: received SIGHUP, reloading configuration files
LOG: parameter "sync_replication_slots" changed to "on"
FATAL: replication slot synchronization requires "wal_level" >= "logical"

Environment

-

*PostgreSQL Version:* 17.5

#2Fujii Masao
masao.fujii@gmail.com
In reply to: Hugo DUBOIS (#1)
Re: Unexpected Standby Shutdown on sync_replication_slots change

On Thu, Jul 24, 2025 at 10:54 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:

Hello,

I'm not sure if it's a bug but I've encountered an unexpected behavior when dynamically changing the sync_replication_slots parameter on a PostgreSQL 17 standby server. Instead of logging an error and continuing to run, the standby instance shuts down with a FATAL error, which is not the anticipated behavior for a dynamic parameter change, especially when the documentation doesn't indicate such an outcome.

Steps to Reproduce

Set up a physical replication between two PostgreSQL 17.5 instances.

Ensure wal_level on the primary (and consequently on the standby) is set to replica.

Start both the primary and standby instances, confirming replication is active.

On the standby instance, dynamically change the sync_replication_slots parameter (I have run the following query: ALTER SYSTEM SET sync_replication_slots = 'on'; followed by SELECT pg_reload_conf();)

Expected Behavior

I expected the standby instance to continue running and log an error message (similar to how hot_standby_feedback behaves when not enabled, e.g., a loop of LOG: replication slot synchronization requires "hot_standby_feedback" to be enabled). A FATAL error leading to an unexpected shutdown for a dynamic parameter change on a running standby is not the anticipated behavior. The documentation for sync_replication_slots also doesn't indicate that a misconfiguration or incompatible wal_level would lead to a shutdown.

Actual Behavior

Upon attempting to set sync_replication_slots to on on the standby with wal_level set to replica, the standby instance immediately shuts down with the following log messages:

LOG: database system is ready to accept read-only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
LOG: received SIGHUP, reloading configuration files
LOG: parameter "sync_replication_slots" changed to "on"
FATAL: replication slot synchronization requires "wal_level" >= "logical"

Environment

PostgreSQL Version: 17.5

Thanks for the report!

I was able to reproduce the issue even on the latest master (v19dev).
I agree that the current behavior—where changing a GUC parameter can
cause the server to shut down—is unexpected and should be avoided.

From what I’ve seen in the code, the problem stems from postmaster
calling ValidateSlotSyncParams() before starting the slot sync worker.
That function raises an ERROR if wal_level is not logical while
sync_replication_slots is enabled. Since ERROR is treated as FATAL
in postmaster, it causes the server to exit.

To fix this, we could modify ValidateSlotSyncParams() so it doesn’t
raise an ERROR in this case, as follows.

 ValidateSlotSyncParams(int elevel)
 {
  /*
  * Logical slot sync/creation requires wal_level >= logical.
- *
- * Since altering the wal_level requires a server restart, so error out in
- * this case regardless of elevel provided by caller.
  */
  if (wal_level < WAL_LEVEL_LOGICAL)
- ereport(ERROR,
+ {
+ ereport(elevel,
  errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("replication slot synchronization requires \"wal_level\" >=
\"logical\""));
+ return false;
+ }

Regards,

--
Fujii Masao

#3Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#2)
Re: Unexpected Standby Shutdown on sync_replication_slots change

On Fri, Jul 25, 2025 at 12:55 AM Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Jul 24, 2025 at 10:54 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:

Hello,

I'm not sure if it's a bug but I've encountered an unexpected behavior when dynamically changing the sync_replication_slots parameter on a PostgreSQL 17 standby server. Instead of logging an error and continuing to run, the standby instance shuts down with a FATAL error, which is not the anticipated behavior for a dynamic parameter change, especially when the documentation doesn't indicate such an outcome.

Steps to Reproduce

Set up a physical replication between two PostgreSQL 17.5 instances.

Ensure wal_level on the primary (and consequently on the standby) is set to replica.

Start both the primary and standby instances, confirming replication is active.

On the standby instance, dynamically change the sync_replication_slots parameter (I have run the following query: ALTER SYSTEM SET sync_replication_slots = 'on'; followed by SELECT pg_reload_conf();)

Expected Behavior

I expected the standby instance to continue running and log an error message (similar to how hot_standby_feedback behaves when not enabled, e.g., a loop of LOG: replication slot synchronization requires "hot_standby_feedback" to be enabled). A FATAL error leading to an unexpected shutdown for a dynamic parameter change on a running standby is not the anticipated behavior. The documentation for sync_replication_slots also doesn't indicate that a misconfiguration or incompatible wal_level would lead to a shutdown.

Actual Behavior

Upon attempting to set sync_replication_slots to on on the standby with wal_level set to replica, the standby instance immediately shuts down with the following log messages:

LOG: database system is ready to accept read-only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
LOG: received SIGHUP, reloading configuration files
LOG: parameter "sync_replication_slots" changed to "on"
FATAL: replication slot synchronization requires "wal_level" >= "logical"

Environment

PostgreSQL Version: 17.5

Thanks for the report!

I was able to reproduce the issue even on the latest master (v19dev).
I agree that the current behavior—where changing a GUC parameter can
cause the server to shut down—is unexpected and should be avoided.

From what I’ve seen in the code, the problem stems from postmaster
calling ValidateSlotSyncParams() before starting the slot sync worker.
That function raises an ERROR if wal_level is not logical while
sync_replication_slots is enabled. Since ERROR is treated as FATAL
in postmaster, it causes the server to exit.

To fix this, we could modify ValidateSlotSyncParams() so it doesn’t
raise an ERROR in this case, as follows.

ValidateSlotSyncParams(int elevel)
{
/*
* Logical slot sync/creation requires wal_level >= logical.
- *
- * Since altering the wal_level requires a server restart, so error out in
- * this case regardless of elevel provided by caller.
*/
if (wal_level < WAL_LEVEL_LOGICAL)
- ereport(ERROR,
+ {
+ ereport(elevel,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("replication slot synchronization requires \"wal_level\" >=
\"logical\""));
+ return false;
+ }

I've created a patch to implement the above—attached.

Note that this patch does not change the existing behavior when
the misconfiguration (sync_replication_slots enabled but wal_level not
set to logical) is detected at server startup. In that case, the server
still shuts down with a FATAL error, which is consistent with other
settings like summarize_wal.

Regards,

--
Fujii Masao

Attachments:

v1-0001-Avoid-unexpected-shutdown-when-sync_replication_s.patchapplication/octet-stream; name=v1-0001-Avoid-unexpected-shutdown-when-sync_replication_s.patchDownload+7-5
#4shveta malik
shveta.malik@gmail.com
In reply to: Fujii Masao (#3)
Re: Unexpected Standby Shutdown on sync_replication_slots change

On Fri, Jul 25, 2025 at 12:20 AM Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Jul 25, 2025 at 12:55 AM Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Jul 24, 2025 at 10:54 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:

Hello,

I'm not sure if it's a bug but I've encountered an unexpected behavior when dynamically changing the sync_replication_slots parameter on a PostgreSQL 17 standby server. Instead of logging an error and continuing to run, the standby instance shuts down with a FATAL error, which is not the anticipated behavior for a dynamic parameter change, especially when the documentation doesn't indicate such an outcome.

Steps to Reproduce

Set up a physical replication between two PostgreSQL 17.5 instances.

Ensure wal_level on the primary (and consequently on the standby) is set to replica.

Start both the primary and standby instances, confirming replication is active.

On the standby instance, dynamically change the sync_replication_slots parameter (I have run the following query: ALTER SYSTEM SET sync_replication_slots = 'on'; followed by SELECT pg_reload_conf();)

Expected Behavior

I expected the standby instance to continue running and log an error message (similar to how hot_standby_feedback behaves when not enabled, e.g., a loop of LOG: replication slot synchronization requires "hot_standby_feedback" to be enabled). A FATAL error leading to an unexpected shutdown for a dynamic parameter change on a running standby is not the anticipated behavior. The documentation for sync_replication_slots also doesn't indicate that a misconfiguration or incompatible wal_level would lead to a shutdown.

Actual Behavior

Upon attempting to set sync_replication_slots to on on the standby with wal_level set to replica, the standby instance immediately shuts down with the following log messages:

LOG: database system is ready to accept read-only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
LOG: received SIGHUP, reloading configuration files
LOG: parameter "sync_replication_slots" changed to "on"
FATAL: replication slot synchronization requires "wal_level" >= "logical"

Environment

PostgreSQL Version: 17.5

Thanks for the report!

I was able to reproduce the issue even on the latest master (v19dev).
I agree that the current behavior—where changing a GUC parameter can
cause the server to shut down—is unexpected and should be avoided.

From what I’ve seen in the code, the problem stems from postmaster
calling ValidateSlotSyncParams() before starting the slot sync worker.
That function raises an ERROR if wal_level is not logical while
sync_replication_slots is enabled. Since ERROR is treated as FATAL
in postmaster, it causes the server to exit.

To fix this, we could modify ValidateSlotSyncParams() so it doesn’t
raise an ERROR in this case, as follows.

ValidateSlotSyncParams(int elevel)
{
/*
* Logical slot sync/creation requires wal_level >= logical.
- *
- * Since altering the wal_level requires a server restart, so error out in
- * this case regardless of elevel provided by caller.
*/
if (wal_level < WAL_LEVEL_LOGICAL)
- ereport(ERROR,
+ {
+ ereport(elevel,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("replication slot synchronization requires \"wal_level\" >=
\"logical\""));
+ return false;
+ }

I've created a patch to implement the above—attached.

Thank You for the patch.

Note that this patch does not change the existing behavior when
the misconfiguration (sync_replication_slots enabled but wal_level not
set to logical) is detected at server startup. In that case, the server
still shuts down with a FATAL error, which is consistent with other
settings like summarize_wal.

Validated the behaviour, the patch looks good to me.

thanks
Shveta

#5Hugo DUBOIS
hdubois@scaleway.com
In reply to: shveta malik (#4)
Re: Unexpected Standby Shutdown on sync_replication_slots change

Thanks for the quick patch. I've tested it on the REL_17_STABLE branch, and
it's working fine.

On a standby node, after dynamically setting the sync_replication_slots
parameter, I observed the following logs. The instance did not shut down,
which seems correct:

2025-07-25 09:26:34.613 UTC [4420] LOG: received SIGHUP, reloading
configuration files
2025-07-25 09:26:34.614 UTC [4420] LOG: parameter
"sync_replication_slots" changed to "on"
2025-07-25 09:26:34.615 UTC [4420] LOG: replication slot
synchronization requires "wal_level" >= "logical"
2025-07-25 09:27:34.662 UTC [4420] LOG: replication slot
synchronization requires "wal_level" >= "logical"

The instance did not restart as expected, showing this fatal log:

2025-07-25 09:27:45.668 UTC [4430] FATAL: replication slot
synchronization ("sync_replication_slots" = on) requires "wal_level"

= "logical"

I have a couple of observations:

-

With this patch, a primary instance will not restart if the
configuration is incorrect.
-

Only wal_level is checked, but the ValidateSlotSyncParams function
includes other mandatory parameters. These are not being checked during
startup.

Regards,

Hugo

Le ven. 25 juil. 2025 à 08:16, shveta malik <shveta.malik@gmail.com> a
écrit :

On Fri, Jul 25, 2025 at 12:20 AM Fujii Masao <masao.fujii@gmail.com>
wrote:

On Fri, Jul 25, 2025 at 12:55 AM Fujii Masao <masao.fujii@gmail.com>

wrote:

On Thu, Jul 24, 2025 at 10:54 PM Hugo DUBOIS <hdubois@scaleway.com>

wrote:

Hello,

I'm not sure if it's a bug but I've encountered an unexpected

behavior when dynamically changing the sync_replication_slots parameter on
a PostgreSQL 17 standby server. Instead of logging an error and continuing
to run, the standby instance shuts down with a FATAL error, which is not
the anticipated behavior for a dynamic parameter change, especially when
the documentation doesn't indicate such an outcome.

Steps to Reproduce

Set up a physical replication between two PostgreSQL 17.5 instances.

Ensure wal_level on the primary (and consequently on the standby) is

set to replica.

Start both the primary and standby instances, confirming replication

is active.

On the standby instance, dynamically change the

sync_replication_slots parameter (I have run the following query: ALTER
SYSTEM SET sync_replication_slots = 'on'; followed by SELECT
pg_reload_conf();)

Expected Behavior

I expected the standby instance to continue running and log an error

message (similar to how hot_standby_feedback behaves when not enabled,
e.g., a loop of LOG: replication slot synchronization requires
"hot_standby_feedback" to be enabled). A FATAL error leading to an
unexpected shutdown for a dynamic parameter change on a running standby is
not the anticipated behavior. The documentation for sync_replication_slots
also doesn't indicate that a misconfiguration or incompatible wal_level
would lead to a shutdown.

Actual Behavior

Upon attempting to set sync_replication_slots to on on the standby

with wal_level set to replica, the standby instance immediately shuts down
with the following log messages:

LOG: database system is ready to accept read-only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
LOG: received SIGHUP, reloading configuration files
LOG: parameter "sync_replication_slots" changed to "on"
FATAL: replication slot synchronization requires "wal_level" >=

"logical"

Environment

PostgreSQL Version: 17.5

Thanks for the report!

I was able to reproduce the issue even on the latest master (v19dev).
I agree that the current behavior—where changing a GUC parameter can
cause the server to shut down—is unexpected and should be avoided.

From what I’ve seen in the code, the problem stems from postmaster
calling ValidateSlotSyncParams() before starting the slot sync worker.
That function raises an ERROR if wal_level is not logical while
sync_replication_slots is enabled. Since ERROR is treated as FATAL
in postmaster, it causes the server to exit.

To fix this, we could modify ValidateSlotSyncParams() so it doesn’t
raise an ERROR in this case, as follows.

ValidateSlotSyncParams(int elevel)
{
/*
* Logical slot sync/creation requires wal_level >= logical.
- *
- * Since altering the wal_level requires a server restart, so error

out in

- * this case regardless of elevel provided by caller.
*/
if (wal_level < WAL_LEVEL_LOGICAL)
- ereport(ERROR,
+ {
+ ereport(elevel,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("replication slot synchronization requires \"wal_level\" >=
\"logical\""));
+ return false;
+ }

I've created a patch to implement the above—attached.

Thank You for the patch.

Note that this patch does not change the existing behavior when
the misconfiguration (sync_replication_slots enabled but wal_level not
set to logical) is detected at server startup. In that case, the server
still shuts down with a FATAL error, which is consistent with other
settings like summarize_wal.

Validated the behaviour, the patch looks good to me.

thanks
Shveta

--

Cordialement,

[image: Scaleway]
<https://www-uploads.scaleway.com/Logo_Scaleway_2022_New_Tagline_Purple_CMYK_1_c52a040544.pdf?updated_at=2022-10-28T15:20:53.668Z&gt;

*Hugo DUBOIS*
*Devops Engineer*

hdubois@scaleway.com

#6Fujii Masao
masao.fujii@gmail.com
In reply to: Hugo DUBOIS (#5)
Re: Unexpected Standby Shutdown on sync_replication_slots change

On Fri, Jul 25, 2025 at 6:40 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:

Thanks for the quick patch. I've tested it on the REL_17_STABLE branch, and it's working fine.

On a standby node, after dynamically setting the sync_replication_slots parameter, I observed the following logs. The instance did not shut down, which seems correct:

2025-07-25 09:26:34.613 UTC [4420] LOG: received SIGHUP, reloading configuration files
2025-07-25 09:26:34.614 UTC [4420] LOG: parameter "sync_replication_slots" changed to "on"
2025-07-25 09:26:34.615 UTC [4420] LOG: replication slot synchronization requires "wal_level" >= "logical"
2025-07-25 09:27:34.662 UTC [4420] LOG: replication slot synchronization requires "wal_level" >= "logical"

The instance did not restart as expected, showing this fatal log:

2025-07-25 09:27:45.668 UTC [4430] FATAL: replication slot synchronization ("sync_replication_slots" = on) requires "wal_level" >= "logical"

Thanks for testing!

I have a couple of observations:

With this patch, a primary instance will not restart if the configuration is incorrect.

Right. In HEAD, the server shuts down due to the misconfiguration only when
starting up as a standby and entering hot standby mode. With this patch,
the server shuts down earlier—during startup—even when starting as a primary.
If starting as a standby, the shutdown now happens before entering hot standby.

It seems hard to match the current behavior exactly in the patch, and I’m not
sure it’s worth trying. I think we have two options:

#1. Make the server always shut down on misconfiguration, regardless of
whether it starts as a primary or standby, and before reaching hot standby.

#2. Allow the server to start even if the configuration is invalid.

The current patch follows option #1. But if there are valid use cases for
starting a primary with sync_replication_slots enabled and wal_level not set
to logical, then option #2 might be better. What do you think?

Only wal_level is checked, but the ValidateSlotSyncParams function includes other mandatory parameters. These are not being checked during startup.

The other parameters checked by ValidateSlotSyncParams() can be changed
without restarting the server, so users can fix them on the fly while
the server is running with the misconfiguration. But wal_level requires
a server restart, so it's important to catch that upfront.

Regards,

--
Fujii Masao

#7Hugo DUBOIS
hdubois@scaleway.com
In reply to: Fujii Masao (#6)
Re: Unexpected Standby Shutdown on sync_replication_slots change

Le ven. 25 juil. 2025 à 12:32, Fujii Masao <masao.fujii@gmail.com> a écrit :

On Fri, Jul 25, 2025 at 6:40 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:

Thanks for the quick patch. I've tested it on the REL_17_STABLE branch,

and it's working fine.

On a standby node, after dynamically setting the sync_replication_slots

parameter, I observed the following logs. The instance did not shut down,
which seems correct:

2025-07-25 09:26:34.613 UTC [4420] LOG: received SIGHUP, reloading

configuration files

2025-07-25 09:26:34.614 UTC [4420] LOG: parameter

"sync_replication_slots" changed to "on"

2025-07-25 09:26:34.615 UTC [4420] LOG: replication slot

synchronization requires "wal_level" >= "logical"

2025-07-25 09:27:34.662 UTC [4420] LOG: replication slot

synchronization requires "wal_level" >= "logical"

The instance did not restart as expected, showing this fatal log:

2025-07-25 09:27:45.668 UTC [4430] FATAL: replication slot

synchronization ("sync_replication_slots" = on) requires "wal_level" >=
"logical"

Thanks for testing!

I have a couple of observations:

With this patch, a primary instance will not restart if the

configuration is incorrect.

Right. In HEAD, the server shuts down due to the misconfiguration only when
starting up as a standby and entering hot standby mode. With this patch,
the server shuts down earlier—during startup—even when starting as a
primary.
If starting as a standby, the shutdown now happens before entering hot
standby.

It seems hard to match the current behavior exactly in the patch, and I’m
not
sure it’s worth trying. I think we have two options:

#1. Make the server always shut down on misconfiguration, regardless of
whether it starts as a primary or standby, and before reaching hot
standby.

#2. Allow the server to start even if the configuration is invalid.

The current patch follows option #1. But if there are valid use cases for
starting a primary with sync_replication_slots enabled and wal_level not
set
to logical, then option #2 might be better. What do you think?

I'm not sure if there's a particular use case for wal_level and
sync_replication_slots not matching on a primary. So, for me, Option 1
seems correct.

Only wal_level is checked, but the ValidateSlotSyncParams function
includes other mandatory parameters. These are not being checked during
startup.

The other parameters checked by ValidateSlotSyncParams() can be changed
without restarting the server, so users can fix them on the fly while
the server is running with the misconfiguration. But wal_level requires
a server restart, so it's important to catch that upfront.

Ok.

Show quoted text

Regards,

--
Fujii Masao

#8Fujii Masao
masao.fujii@gmail.com
In reply to: Hugo DUBOIS (#7)
Re: Unexpected Standby Shutdown on sync_replication_slots change

On Fri, Jul 25, 2025 at 9:13 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:

I'm not sure if there's a particular use case for wal_level and sync_replication_slots not matching on a primary. So, for me, Option 1 seems correct.

I also prefer option #1.

However, on second thought, if some users are already running a server
(non-standby) with sync_replication_slots enabled and wal_level != logical
in v17, switching to option #1 could break their setup after a minor
version update. That would be surprising and confusing.

To avoid that, I think we should go with option #2—at least for v17.

Attached is an updated patch implementing option #2.

Regards,

--
Fujii Masao

Attachments:

v2-0001-Avoid-unexpected-shutdown-when-sync_replication_s.patchapplication/octet-stream; name=v2-0001-Avoid-unexpected-shutdown-when-sync_replication_s.patchDownload+4-5
#9Hugo DUBOIS
hdubois@scaleway.com
In reply to: Fujii Masao (#8)
Re: Unexpected Standby Shutdown on sync_replication_slots change

I'm okay with that too, as long as the standby doesn't unexpectedly quit.
The patch looks good to me.

Le ven. 25 juil. 2025 à 18:01, Fujii Masao <masao.fujii@gmail.com> a écrit :

Show quoted text

On Fri, Jul 25, 2025 at 9:13 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:

I'm not sure if there's a particular use case for wal_level and

sync_replication_slots not matching on a primary. So, for me, Option 1
seems correct.

I also prefer option #1.

However, on second thought, if some users are already running a server
(non-standby) with sync_replication_slots enabled and wal_level != logical
in v17, switching to option #1 could break their setup after a minor
version update. That would be surprising and confusing.

To avoid that, I think we should go with option #2—at least for v17.

Attached is an updated patch implementing option #2.

Regards,

--
Fujii Masao

#10shveta malik
shveta.malik@gmail.com
In reply to: Fujii Masao (#8)
Re: Unexpected Standby Shutdown on sync_replication_slots change

On Fri, Jul 25, 2025 at 9:31 PM Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Jul 25, 2025 at 9:13 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:

I'm not sure if there's a particular use case for wal_level and sync_replication_slots not matching on a primary. So, for me, Option 1 seems correct.

I also prefer option #1.

However, on second thought, if some users are already running a server
(non-standby) with sync_replication_slots enabled and wal_level != logical
in v17, switching to option #1 could break their setup after a minor
version update. That would be surprising and confusing.

To avoid that, I think we should go with option #2—at least for v17.

I still prefer option #1. On HEAD, option #1 makes sense because
wal_level is a GUC that requires a server restart and thus using
ERROR/FATAL in this context is appropriate. It is also consistent with
the rest of the code (other modules) wherever wal_level checks are
performed during startup.

Regarding the back branches, in the case of a primary server, if
sync_replication_slots is left ON while wal_level < logical, then
issuing a ERROR/FATAL is still more suitable. The user can simply
correct the configuration (disable sync_replication_slots) and proceed
with the startup after the minor version upgrade. Also, given that
option #1 is the better fit for HEAD, I don't think it's worth having
different behavior in the back branches. Thoughts?

thanks
Shveta

#11Amit Kapila
amit.kapila16@gmail.com
In reply to: shveta malik (#10)
Re: Unexpected Standby Shutdown on sync_replication_slots change

On Mon, Jul 28, 2025 at 10:18 AM shveta malik <shveta.malik@gmail.com> wrote:

On Fri, Jul 25, 2025 at 9:31 PM Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Jul 25, 2025 at 9:13 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:

I'm not sure if there's a particular use case for wal_level and sync_replication_slots not matching on a primary. So, for me, Option 1 seems correct.

I also prefer option #1.

However, on second thought, if some users are already running a server
(non-standby) with sync_replication_slots enabled and wal_level != logical
in v17, switching to option #1 could break their setup after a minor
version update. That would be surprising and confusing.

To avoid that, I think we should go with option #2—at least for v17.

I still prefer option #1. On HEAD, option #1 makes sense because
wal_level is a GUC that requires a server restart and thus using
ERROR/FATAL in this context is appropriate. It is also consistent with
the rest of the code (other modules) wherever wal_level checks are
performed during startup.

Regarding the back branches, in the case of a primary server, if
sync_replication_slots is left ON while wal_level < logical, then
issuing a ERROR/FATAL is still more suitable. The user can simply
correct the configuration (disable sync_replication_slots) and proceed
with the startup after the minor version upgrade. Also, given that
option #1 is the better fit for HEAD, I don't think it's worth having
different behavior in the back branches.

+1.

--
With Regards,
Amit Kapila.

#12Fujii Masao
masao.fujii@gmail.com
In reply to: Amit Kapila (#11)
Re: Unexpected Standby Shutdown on sync_replication_slots change

On Mon, Jul 28, 2025 at 6:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jul 28, 2025 at 10:18 AM shveta malik <shveta.malik@gmail.com> wrote:

On Fri, Jul 25, 2025 at 9:31 PM Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Jul 25, 2025 at 9:13 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:

I'm not sure if there's a particular use case for wal_level and sync_replication_slots not matching on a primary. So, for me, Option 1 seems correct.

I also prefer option #1.

However, on second thought, if some users are already running a server
(non-standby) with sync_replication_slots enabled and wal_level != logical
in v17, switching to option #1 could break their setup after a minor
version update. That would be surprising and confusing.

To avoid that, I think we should go with option #2—at least for v17.

I still prefer option #1. On HEAD, option #1 makes sense because
wal_level is a GUC that requires a server restart and thus using
ERROR/FATAL in this context is appropriate. It is also consistent with
the rest of the code (other modules) wherever wal_level checks are
performed during startup.

Regarding the back branches, in the case of a primary server, if
sync_replication_slots is left ON while wal_level < logical, then
issuing a ERROR/FATAL is still more suitable. The user can simply
correct the configuration (disable sync_replication_slots) and proceed
with the startup after the minor version upgrade. Also, given that
option #1 is the better fit for HEAD, I don't think it's worth having
different behavior in the back branches.

+1.

I think it's basically not acceptable for a server to start up
successfully before a minor version update, but then fail to start with
the same configuration after the update. If that's absolutely necessary
to fix a bug, it might be justifiable. But in this case, I don't think
it's required.

Blocking startup when sync_replication_slots is enabled and wal_level
is not logical could be helpful. But that feels more like an improvement
than a bug fix. I'm fine adding that to master, but I don't think we should
apply it to old branches.

That said, both Shveta and Amit support backpatching this change,
so I'd like to hear more opinions before we decide.

Regards,

--
Fujii Masao

#13Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Fujii Masao (#12)
Re: Unexpected Standby Shutdown on sync_replication_slots change

On Tue, 2025-07-29 at 13:40 +0900, Fujii Masao wrote:

I think it's basically not acceptable for a server to start up
successfully before a minor version update, but then fail to start with
the same configuration after the update. If that's absolutely necessary
to fix a bug, it might be justifiable. But in this case, I don't think
it's required.

Blocking startup when sync_replication_slots is enabled and wal_level
is not logical could be helpful. But that feels more like an improvement
than a bug fix. I'm fine adding that to master, but I don't think we should
apply it to old branches.

That said, both Shveta and Amit support backpatching this change,
so I'd like to hear more opinions before we decide.

I side with you on that one.

I think it would be fine to backpatch the breaking change to v18,
if we mention the behavior change in the release notes.

Yours,
Laurenz Albe

#14shveta malik
shveta.malik@gmail.com
In reply to: Laurenz Albe (#13)
Re: Unexpected Standby Shutdown on sync_replication_slots change

On Tue, Jul 29, 2025 at 12:04 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Tue, 2025-07-29 at 13:40 +0900, Fujii Masao wrote:

I think it's basically not acceptable for a server to start up
successfully before a minor version update, but then fail to start with
the same configuration after the update. If that's absolutely necessary
to fix a bug, it might be justifiable. But in this case, I don't think
it's required.

Blocking startup when sync_replication_slots is enabled and wal_level
is not logical could be helpful. But that feels more like an improvement
than a bug fix. I'm fine adding that to master, but I don't think we should
apply it to old branches.

That said, both Shveta and Amit support backpatching this change,
so I'd like to hear more opinions before we decide.

I side with you on that one.

I think it would be fine to backpatch the breaking change to v18,
if we mention the behavior change in the release notes.

I am fine with the suggestion.

thanks
Shveta

#15Xuneng Zhou
xunengzhou@gmail.com
In reply to: shveta malik (#14)
Re: Unexpected Standby Shutdown on sync_replication_slots change

Hi,

On Wed, Jul 30, 2025 at 11:15 AM shveta malik <shveta.malik@gmail.com> wrote:

On Tue, Jul 29, 2025 at 12:04 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Tue, 2025-07-29 at 13:40 +0900, Fujii Masao wrote:

I think it's basically not acceptable for a server to start up
successfully before a minor version update, but then fail to start with
the same configuration after the update. If that's absolutely necessary
to fix a bug, it might be justifiable. But in this case, I don't think
it's required.

Blocking startup when sync_replication_slots is enabled and wal_level
is not logical could be helpful. But that feels more like an improvement
than a bug fix. I'm fine adding that to master, but I don't think we should
apply it to old branches.

That said, both Shveta and Amit support backpatching this change,
so I'd like to hear more opinions before we decide.

I side with you on that one.

I think it would be fine to backpatch the breaking change to v18,
if we mention the behavior change in the release notes.

I am fine with the suggestion.

+ 1

This fix does feel more like an enhancement to user feedback and
safety nets, not a correction of a broken or unsafe code path. Unable
to start up with the same settings for it might cause
more-than-necessary surprise.

Best,
Xuneng

#16Fujii Masao
masao.fujii@gmail.com
In reply to: Xuneng Zhou (#15)
Re: Unexpected Standby Shutdown on sync_replication_slots change

On Sat, Aug 2, 2025 at 1:28 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:

This fix does feel more like an enhancement to user feedback and
safety nets, not a correction of a broken or unsafe code path. Unable
to start up with the same settings for it might cause
more-than-necessary surprise.

Agreed.

At first, since there seems to be rough consensus on applying the current patch
(which prevents unexpected shutdown while the server is running),
I've pushed it to master and back-patched it to v17.

As for the follow-up change that prevents the server from starting with
an invalid configuration, let's continue the discussion. I've attached
the patch and agree it's a good fit for master. However, I'm afraid
it's too late to include it in v18, as beta2 has already been released
and this change is more of an improvement than a bug fix.

Regards,

--
Fujii Masao

Attachments:

v3-0001-Disallow-server-start-with-sync_replication_slots.patchapplication/octet-stream; name=v3-0001-Disallow-server-start-with-sync_replication_slots.patchDownload+3-1
#17shveta malik
shveta.malik@gmail.com
In reply to: Fujii Masao (#16)
Re: Unexpected Standby Shutdown on sync_replication_slots change

On Mon, Aug 4, 2025 at 5:29 PM Fujii Masao <masao.fujii@gmail.com> wrote:

On Sat, Aug 2, 2025 at 1:28 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:

This fix does feel more like an enhancement to user feedback and
safety nets, not a correction of a broken or unsafe code path. Unable
to start up with the same settings for it might cause
more-than-necessary surprise.

Agreed.

At first, since there seems to be rough consensus on applying the current patch
(which prevents unexpected shutdown while the server is running),
I've pushed it to master and back-patched it to v17.

As for the follow-up change that prevents the server from starting with
an invalid configuration, let's continue the discussion. I've attached
the patch and agree it's a good fit for master.

+1. The patch looks good.

However, I'm afraid
it's too late to include it in v18, as beta2 has already been released
and this change is more of an improvement than a bug fix.

If it's not possible to get it to v18 now, then I am okay to have it
on master alone.

thanks
Shveta

#18shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#17)
Re: Unexpected Standby Shutdown on sync_replication_slots change

On Tue, Aug 5, 2025 at 10:38 AM shveta malik <shveta.malik@gmail.com> wrote:

On Mon, Aug 4, 2025 at 5:29 PM Fujii Masao <masao.fujii@gmail.com> wrote:

On Sat, Aug 2, 2025 at 1:28 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:

This fix does feel more like an enhancement to user feedback and
safety nets, not a correction of a broken or unsafe code path. Unable
to start up with the same settings for it might cause
more-than-necessary surprise.

Agreed.

At first, since there seems to be rough consensus on applying the current patch
(which prevents unexpected shutdown while the server is running),
I've pushed it to master and back-patched it to v17.

As for the follow-up change that prevents the server from starting with
an invalid configuration, let's continue the discussion. I've attached
the patch and agree it's a good fit for master.

+1. The patch looks good.

However, I'm afraid
it's too late to include it in v18, as beta2 has already been released
and this change is more of an improvement than a bug fix.

If it's not possible to get it to v18 now, then I am okay to have it
on master alone.

Just wanted to confirm, do we plan to do it on master at-least?

thanks
Shveta

#19Fujii Masao
masao.fujii@gmail.com
In reply to: shveta malik (#18)
Re: Unexpected Standby Shutdown on sync_replication_slots change

On Wed, Aug 20, 2025 at 7:16 PM shveta malik <shveta.malik@gmail.com> wrote:

Just wanted to confirm, do we plan to do it on master at-least?

Yeah, barring any objections, I'll push the patch to the master.

Regards,

--
Fujii Masao

#20shveta malik
shveta.malik@gmail.com
In reply to: Fujii Masao (#19)
Re: Unexpected Standby Shutdown on sync_replication_slots change

On Thu, Aug 21, 2025 at 7:41 AM Fujii Masao <masao.fujii@gmail.com> wrote:

On Wed, Aug 20, 2025 at 7:16 PM shveta malik <shveta.malik@gmail.com> wrote:

Just wanted to confirm, do we plan to do it on master at-least?

Yeah, barring any objections, I'll push the patch to the master.

Okay, thanks!

#21Fujii Masao
masao.fujii@gmail.com
In reply to: shveta malik (#20)