issue with synchronized_standby_slots

Started by Fabrice Chapuis6 months ago41 messages
Jump to latest
#1Fabrice Chapuis
fabrice636861@gmail.com

Hi,
With PG 17.5 and using logical replication failover slots.
When trying to change the value of synchronized_standby_slots, node2 was
not running then the error * invalid value for parameter
"synchronized_standby_slots": "node1,node2" *was generated.
The problem is that statement were affected by this and they can't execute.

STATEMENT: select service_period,sp1_0.address_line_1 from tbl1 where
sp1_0.vn=$1 order by sp1_0.start_of_period
2025-08-24 13:14:29.417 CEST [848477]: [1-1] user=,db=,client=,application=
ERROR: invalid value for parameter "synchronized_standby_slots":
"node1,node2"
2025-08-24 13:14:29.417 CEST [848477]: [2-1] user=,db=,client=,application=
DETAIL: replication slot "s029054a" does not exist
2025-08-24 13:14:29.417 CEST [848477]: [3-1] user=,db=,client=,application=
CONTEXT: while setting parameter "synchronized_standby_slots" to
"node1,node2"
2025-08-24 13:14:29.418 CEST [777453]: [48-1]
user=,db=,client=,application= LOG: background worker "parallel worker"
(PID 848476) exited with exit code 1
2025-08-24 13:14:29.418 CEST [777453]: [49-1]
user=,db=,client=,application= LOG: background worker "parallel worker"
(PID 848477) exited with exit code 1

Is this issue already observed

Thanks for your feedback

Fabrice

#2Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Fabrice Chapuis (#1)
RE: issue with synchronized_standby_slots

On Thursday, September 4, 2025 9:27 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:

With PG 17.5 and using logical replication failover slots. When trying to
change the value of synchronized_standby_slots, node2 was not running then the
error invalid value for parameter "synchronized_standby_slots": "node1,node2"
was generated. The problem is that statement were affected by this and they
can't execute.

STATEMENT: select service_period,sp1_0.address_line_1 from tbl1 where http://sp1_0.vn=$1 order by sp1_0.start_of_period
2025-08-24 13:14:29.417 CEST [848477]: [1-1] user=,db=,client=,application= ERROR: invalid value for parameter "synchronized_standby_slots": "node1,node2"
2025-08-24 13:14:29.417 CEST [848477]: [2-1] user=,db=,client=,application= DETAIL: replication slot "s029054a" does not exist
2025-08-24 13:14:29.417 CEST [848477]: [3-1] user=,db=,client=,application= CONTEXT: while setting parameter "synchronized_standby_slots" to "node1,node2"
2025-08-24 13:14:29.418 CEST [777453]: [48-1] user=,db=,client=,application= LOG: background worker "parallel worker" (PID 848476) exited with exit code 1
2025-08-24 13:14:29.418 CEST [777453]: [49-1] user=,db=,client=,application= LOG: background worker "parallel worker" (PID 848477) exited with exit code 1

Is this issue already observed

Thank you for reporting this issue. It seems you've added a nonexistent slot to
synchronized_standby_slots before the server startup. The server does not verify
the existence of slots at startup due to the absence of slot shared information,
allowing the server to start successfully. However, when the parallel apply
worker starts, it re-verifies the GUC setting, resulting in the ERROR you saw.

I think this scenario is not necessarily a bug, as adding nonexistent slots to GUC is
disallowed. Such slots can block the logical failover slot's advancement,
increasing the risk of disk bloat due to WAL or dead rows, which is why we added
the ERROR. There are precedents for this kind of behavior, like
default_table_access_method and default_tablespace, which prevent queries if
invalid values are set before server startup.

To resolve the issue, you can remove the invalid slot from the GUC and add it
back after creating the physical slot.

I also thought about how to improve user experience for this, but it's not
feasible to verify slot existence at startup because replication has not been
restored to shared memory during GUC checks. Another option might be to simply
remove slot existence/type checks from GUC validation.

Best Regards,
Hou zj

#3Fabrice Chapuis
fabrice636861@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#2)
Re: issue with synchronized_standby_slots

Thanks for your reply Zhijie,

I understand that the error invalid value for parameter will be diplayed
in case of bad value for the GUC synchronized_standby_slots or if a standby
node configured is not up and running.
But the problem I noticed is that statements could not execute normally and
error code is returned to the applcation.
This append after an upgrade from PG 14 to PG 17.
I could try to reproduce the issue

Regards,

Fabrice

On Fri, Sep 5, 2025 at 6:07 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
wrote:

Show quoted text

On Thursday, September 4, 2025 9:27 PM Fabrice Chapuis <
fabrice636861@gmail.com> wrote:

With PG 17.5 and using logical replication failover slots. When trying to
change the value of synchronized_standby_slots, node2 was not running

then the

error invalid value for parameter "synchronized_standby_slots":

"node1,node2"

was generated. The problem is that statement were affected by this and

they

can't execute.

STATEMENT: select service_period,sp1_0.address_line_1 from tbl1 where

http://sp1_0.vn=$1 order by sp1_0.start_of_period

2025-08-24 13:14:29.417 CEST [848477]: [1-1]

user=,db=,client=,application= ERROR: invalid value for parameter
"synchronized_standby_slots": "node1,node2"

2025-08-24 13:14:29.417 CEST [848477]: [2-1]

user=,db=,client=,application= DETAIL: replication slot "s029054a" does
not exist

2025-08-24 13:14:29.417 CEST [848477]: [3-1]

user=,db=,client=,application= CONTEXT: while setting parameter
"synchronized_standby_slots" to "node1,node2"

2025-08-24 13:14:29.418 CEST [777453]: [48-1]

user=,db=,client=,application= LOG: background worker "parallel worker"
(PID 848476) exited with exit code 1

2025-08-24 13:14:29.418 CEST [777453]: [49-1]

user=,db=,client=,application= LOG: background worker "parallel worker"
(PID 848477) exited with exit code 1

Is this issue already observed

Thank you for reporting this issue. It seems you've added a nonexistent
slot to
synchronized_standby_slots before the server startup. The server does not
verify
the existence of slots at startup due to the absence of slot shared
information,
allowing the server to start successfully. However, when the parallel apply
worker starts, it re-verifies the GUC setting, resulting in the ERROR you
saw.

I think this scenario is not necessarily a bug, as adding nonexistent
slots to GUC is
disallowed. Such slots can block the logical failover slot's advancement,
increasing the risk of disk bloat due to WAL or dead rows, which is why we
added
the ERROR. There are precedents for this kind of behavior, like
default_table_access_method and default_tablespace, which prevent queries
if
invalid values are set before server startup.

To resolve the issue, you can remove the invalid slot from the GUC and add
it
back after creating the physical slot.

I also thought about how to improve user experience for this, but it's not
feasible to verify slot existence at startup because replication has not
been
restored to shared memory during GUC checks. Another option might be to
simply
remove slot existence/type checks from GUC validation.

Best Regards,
Hou zj

#4Alexander Kukushkin
cyberdemn@gmail.com
In reply to: Fabrice Chapuis (#3)
Re: issue with synchronized_standby_slots

Hi,

On Sun, 7 Sept 2025 at 10:15, Fabrice Chapuis <fabrice636861@gmail.com>
wrote:

Thanks for your reply Zhijie,

I understand that the error invalid value for parameter will be diplayed
in case of bad value for the GUC synchronized_standby_slots or if a
standby node configured is not up and running.
But the problem I noticed is that statements could not execute normally
and error code is returned to the applcation.
This append after an upgrade from PG 14 to PG 17.
I could try to reproduce the issue

STATEMENT: select service_period,sp1_0.address_line_1 from tbl1 where

http://sp1_0.vn=$1 order by sp1_0.start_of_period

2025-08-24 13:14:29.417 CEST [848477]: [1-1]

user=,db=,client=,application= ERROR: invalid value for parameter
"synchronized_standby_slots": "node1,node2"

2025-08-24 13:14:29.417 CEST [848477]: [2-1]

user=,db=,client=,application= DETAIL: replication slot "s029054a" does
not exist

2025-08-24 13:14:29.417 CEST [848477]: [3-1]

user=,db=,client=,application= CONTEXT: while setting parameter
"synchronized_standby_slots" to "node1,node2"

2025-08-24 13:14:29.418 CEST [777453]: [48-1]

user=,db=,client=,application= LOG: background worker "parallel worker"
(PID 848476) exited with exit code 1

2025-08-24 13:14:29.418 CEST [777453]: [49-1]

user=,db=,client=,application= LOG: background worker "parallel worker"
(PID 848477) exited with exit code 1

Is this issue already observed

Recently we also hit this problem.

I think in a current state check_synchronized_standby_slots()
and validate_sync_standby_slots() functions are not very useful:
- When the hook is executed from postmaster it only checks
that synchronized_standby_slots contains a valid list, but doesn't check
that replication slots exists, because MyProc is NULL. It happens both, on
start and on reload.
- When executed from other backends set_config_with_handle() is called
with elevel = 0, and therefore elevel becomes DEBUG3, which results in no
useful error/warning messages.

There are a couple of places where check_synchronized_standby_slots()
failure is not ignored:
1. alter system set synchronized_standby_slots='invalid value';
2. Parallel workers, because RestoreGUCState()
calls set_config_option_ext()->set_config_with_handle() with elevel=ERROR.
As a result, parallel workers fail to start with the error.

With parallel workers it is actually even worse - we get the error even in
case of standby:
1. start standby with synchronized_standby_slots referring to non-existing
slots
2. SET parallel_setup_cost, parallel_tuple_cost,
and min_parallel_table_scan_size to 0
3. Run select * from pg_namespace; and observe following error:
ERROR: invalid value for parameter "synchronized_standby_slots": "a1,b1"
DETAIL: replication slot "a1" does not exist
CONTEXT: while setting parameter "synchronized_standby_slots" to "a1,b1"
parallel worker

We may argue a lot that invalid configuration must not be used, but the
thing is that the main problem being solved by synchronized_standby_slots
is delaying logical decoding at certains LSN until it was sent to enough
physical standby.
This feature must not affect normal queries, only logical replication.

Please find attached patch fixing a problem for parallel workers.

Regards,
--
Alexander Kukushkin

Attachments:

0001-Exclude-parallel-workers-when-validating-synchronize.patchtext/x-patch; charset=US-ASCII; name=0001-Exclude-parallel-workers-when-validating-synchronize.patchDownload+2-2
#5Fujii Masao
masao.fujii@gmail.com
In reply to: Alexander Kukushkin (#4)
Re: issue with synchronized_standby_slots

On Mon, Sep 8, 2025 at 6:26 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:

Hi,

On Sun, 7 Sept 2025 at 10:15, Fabrice Chapuis <fabrice636861@gmail.com> wrote:

Thanks for your reply Zhijie,

I understand that the error invalid value for parameter will be diplayed in case of bad value for the GUC synchronized_standby_slots or if a standby node configured is not up and running.
But the problem I noticed is that statements could not execute normally and error code is returned to the applcation.
This append after an upgrade from PG 14 to PG 17.
I could try to reproduce the issue

STATEMENT: select service_period,sp1_0.address_line_1 from tbl1 where http://sp1_0.vn=$1 order by sp1_0.start_of_period

2025-08-24 13:14:29.417 CEST [848477]: [1-1] user=,db=,client=,application= ERROR: invalid value for parameter "synchronized_standby_slots": "node1,node2"
2025-08-24 13:14:29.417 CEST [848477]: [2-1] user=,db=,client=,application= DETAIL: replication slot "s029054a" does not exist
2025-08-24 13:14:29.417 CEST [848477]: [3-1] user=,db=,client=,application= CONTEXT: while setting parameter "synchronized_standby_slots" to "node1,node2"
2025-08-24 13:14:29.418 CEST [777453]: [48-1] user=,db=,client=,application= LOG: background worker "parallel worker" (PID 848476) exited with exit code 1
2025-08-24 13:14:29.418 CEST [777453]: [49-1] user=,db=,client=,application= LOG: background worker "parallel worker" (PID 848477) exited with exit code 1

Is this issue already observed

Recently we also hit this problem.

I think in a current state check_synchronized_standby_slots() and validate_sync_standby_slots() functions are not very useful:
- When the hook is executed from postmaster it only checks that synchronized_standby_slots contains a valid list, but doesn't check that replication slots exists, because MyProc is NULL. It happens both, on start and on reload.

This looks quite problematic. If a non-existent slot is specified in
synchronized_standby_slots in postgresql.conf and the configuration file
is reloaded, no error is reported. This happens because the postmaster
cannot detect that the slot doesn't exist (since it has no MyProc).
As a result, synchronized_standby_slots in the postmaster is set to
that slot. New backends then inherit this setting from the postmaster,
while already running backends correctly detect that the slot doesn't
exist and fail to apply it.

This leads to an inconsistent state: the reload succeeds with no error,
but some backends apply the new setting while others do not.
That inconsistency seems like an issue.

To fix this, ISTM that the GUC check hook for synchronized_standby_slots
should be revised so it doesn't rely on MyProc or perform slot existence
checks there....

Regards,

--
Fujii Masao

#6Amit Kapila
amit.kapila16@gmail.com
In reply to: Alexander Kukushkin (#4)
Re: issue with synchronized_standby_slots

On Mon, Sep 8, 2025 at 2:56 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:

Recently we also hit this problem.

I think in a current state check_synchronized_standby_slots() and validate_sync_standby_slots() functions are not very useful:
- When the hook is executed from postmaster it only checks that synchronized_standby_slots contains a valid list, but doesn't check that replication slots exists, because MyProc is NULL. It happens both, on start and on reload.
- When executed from other backends set_config_with_handle() is called with elevel = 0, and therefore elevel becomes DEBUG3, which results in no useful error/warning messages.

There are a couple of places where check_synchronized_standby_slots() failure is not ignored:
1. alter system set synchronized_standby_slots='invalid value';
2. Parallel workers, because RestoreGUCState() calls set_config_option_ext()->set_config_with_handle() with elevel=ERROR. As a result, parallel workers fail to start with the error.

With parallel workers it is actually even worse - we get the error even in case of standby:
1. start standby with synchronized_standby_slots referring to non-existing slots
2. SET parallel_setup_cost, parallel_tuple_cost, and min_parallel_table_scan_size to 0
3. Run select * from pg_namespace; and observe following error:
ERROR: invalid value for parameter "synchronized_standby_slots": "a1,b1"
DETAIL: replication slot "a1" does not exist
CONTEXT: while setting parameter "synchronized_standby_slots" to "a1,b1"
parallel worker

I see the same behaviour for default_table_access_method and
default_tablespace. For example, see failure cases:
postgres=# Alter system set default_table_access_method='missing';
ERROR: invalid value for parameter "default_table_access_method": "missing"
DETAIL: Table access method "missing" does not exist.

postgres=# SET parallel_setup_cost=0;
SET
postgres=# SET parallel_tuple_cost=0;
SET
postgres=# Set min_parallel_table_scan_size to 0;
SET
postgres=# select * from pg_namespace;
ERROR: invalid value for parameter "default_table_access_method": "missing"
DETAIL: Table access method "missing" does not exist.
CONTEXT: while setting parameter "default_table_access_method" to "missing"
parallel worker

OTOH, there is no ERROR on reload or restart.

It is fair to argue that invalid GUC values should be ignored in
certain cases like parallel query but we should have the same solution
for other similar parameters as well.

As for the synchronized_standby_slots, we can follow the behavior
similar to check_synchronous_standby_names and just give parsing
ERRORs. Any non-existent slot related errors can be given when that
parameter is later used.

--
With Regards,
Amit Kapila.

#7Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#6)
RE: issue with synchronized_standby_slots

On Tuesday, September 9, 2025 1:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Sep 8, 2025 at 2:56 PM Alexander Kukushkin
<cyberdemn@gmail.com> wrote:

Recently we also hit this problem.

I think in a current state check_synchronized_standby_slots() and

validate_sync_standby_slots() functions are not very useful:

- When the hook is executed from postmaster it only checks that

synchronized_standby_slots contains a valid list, but doesn't check that
replication slots exists, because MyProc is NULL. It happens both, on start and
on reload.

- When executed from other backends set_config_with_handle() is called

with elevel = 0, and therefore elevel becomes DEBUG3, which results in no
useful error/warning messages.

There are a couple of places where check_synchronized_standby_slots()

failure is not ignored:

1. alter system set synchronized_standby_slots='invalid value'; 2.
Parallel workers, because RestoreGUCState() calls

set_config_option_ext()->set_config_with_handle() with elevel=ERROR. As a
result, parallel workers fail to start with the error.

With parallel workers it is actually even worse - we get the error even in case

of standby:

1. start standby with synchronized_standby_slots referring to
non-existing slots 2. SET parallel_setup_cost, parallel_tuple_cost,
and min_parallel_table_scan_size to 0 3. Run select * from pg_namespace;

and observe following error:

ERROR: invalid value for parameter "synchronized_standby_slots": "a1,b1"
DETAIL: replication slot "a1" does not exist
CONTEXT: while setting parameter "synchronized_standby_slots" to

"a1,b1"

parallel worker

I see the same behaviour for default_table_access_method and
default_tablespace. For example, see failure cases:
postgres=# Alter system set default_table_access_method='missing';
ERROR: invalid value for parameter "default_table_access_method":
"missing"
DETAIL: Table access method "missing" does not exist.

postgres=# SET parallel_setup_cost=0;
SET
postgres=# SET parallel_tuple_cost=0;
SET
postgres=# Set min_parallel_table_scan_size to 0; SET postgres=# select *
from pg_namespace;
ERROR: invalid value for parameter "default_table_access_method":
"missing"
DETAIL: Table access method "missing" does not exist.
CONTEXT: while setting parameter "default_table_access_method" to
"missing"
parallel worker

OTOH, there is no ERROR on reload or restart.

It is fair to argue that invalid GUC values should be ignored in certain cases like
parallel query but we should have the same solution for other similar
parameters as well.

As for the synchronized_standby_slots, we can follow the behavior similar to
check_synchronous_standby_names and just give parsing ERRORs. Any
non-existent slot related errors can be given when that parameter is later used.

I agree. For synchronized_standby_slots, I think it is acceptable to report only
parsing errors, because slots could be dropped even after validating the slot
existence during GUC loading. Additionally, we would report WARNINGs for
non-existent slots during the wait function anyway (e.g., in
StandbySlotsHaveCaughtup()).

Best Regards,
Hou zj

#8Fujii Masao
masao.fujii@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#7)
Re: issue with synchronized_standby_slots

On Tue, Sep 9, 2025 at 2:39 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

I agree. For synchronized_standby_slots, I think it is acceptable to report only
parsing errors, because slots could be dropped even after validating the slot
existence during GUC loading. Additionally, we would report WARNINGs for
non-existent slots during the wait function anyway (e.g., in
StandbySlotsHaveCaughtup()).

+1, to also address the issue I reported upthread.

As I understand it, only the walsender and the slot sync worker actually
use synchronized_standby_slots. But in the current code, all processes check
for slot existence via the GUC check hook, which wastes cycles. The proposed
change would eliminate that overhead.

Regards,

--
Fujii Masao

#9Rahila Syed
rahilasyed90@gmail.com
In reply to: Amit Kapila (#6)
Re: issue with synchronized_standby_slots

Hi,

As for the synchronized_standby_slots, we can follow the behavior

similar to check_synchronous_standby_names and just give parsing
ERRORs. Any non-existent slot related errors can be given when that
parameter is later used.

--

Please find attached a patch that implements this. I will work on adding a
test for it.

Thank you,
Rahila Syed

Attachments:

0001-Remove-the-validation-from-the-GUC-check-hook.patchapplication/octet-stream; name=0001-Remove-the-validation-from-the-GUC-check-hook.patchDownload+0-37
#10Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Rahila Syed (#9)
Re: issue with synchronized_standby_slots

On Wed, Sep 10, 2025 at 3:33 PM Rahila Syed <rahilasyed90@gmail.com> wrote:

Hi,

As for the synchronized_standby_slots, we can follow the behavior
similar to check_synchronous_standby_names and just give parsing
ERRORs. Any non-existent slot related errors can be given when that
parameter is later used.

--

Please find attached a patch that implements this. I will work on adding a test for it.

I suggest removing validate_sync_standby_slots() entirely since there
isn’t much left in it.

The below logic currently in validate_sync_standby_slots() to parse
the list of slots can be directly moved into
check_synchronized_standby_slots().

/* Verify syntax and parse string into a list of identifiers */
ok = SplitIdentifierString(rawname, ',', elemlist);

if (!ok)
GUC_check_errdetail("List syntax is invalid.");

--
With Regards,
Ashutosh Sharma.

#11Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Rahila Syed (#9)
Re: issue with synchronized_standby_slots

On Wed, 10 Sept 2025 at 15:33, Rahila Syed <rahilasyed90@gmail.com> wrote:

Hi,

As for the synchronized_standby_slots, we can follow the behavior
similar to check_synchronous_standby_names and just give parsing
ERRORs. Any non-existent slot related errors can be given when that
parameter is later used.

--

Please find attached a patch that implements this. I will work on adding a test for it.

Hi Rahila,

I think we should also add a parsing check for slot names specified in
the GUC synchronize_standby_slots as suggested by Amit in [1]/messages/by-id/CAA4eK1Jprqfs_URBmZ5=OOL98D05rPiGup1sscWgcCGcWU=9iA@mail.gmail.com.
I think we should also update the comment message in function
StandbySlotsHaveCaughtup
/*
* If a slot name provided in synchronized_standby_slots does not
* exist, report a message and exit the loop.
*
* Though validate_sync_standby_slots (the GUC check_hook) tries to
* avoid this, it can nonetheless happen because the user can specify
* a nonexistent slot name before server startup. That function cannot
* validate such a slot during startup, as ReplicationSlotCtl is not
* initialized by then. Also, the user might have dropped one slot.
*/
I made the changes in the above for the same and attached the updated patch.

[1]: /messages/by-id/CAA4eK1Jprqfs_URBmZ5=OOL98D05rPiGup1sscWgcCGcWU=9iA@mail.gmail.com

Thanks,
Shlok Kyal

Attachments:

v2-0001-Remove-the-validation-from-the-GUC-check-hook-and.patchapplication/octet-stream; name=v2-0001-Remove-the-validation-from-the-GUC-check-hook-and.patchDownload+2-34
#12Alexander Kukushkin
cyberdemn@gmail.com
In reply to: Shlok Kyal (#11)
Re: issue with synchronized_standby_slots

Hi,

On Wed, 10 Sept 2025 at 13:34, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

I think we should also add a parsing check for slot names specified in
the GUC synchronize_standby_slots as suggested by Amit in [1].
I made the changes in the above for the same and attached the updated
patch.

I agree, validating that list contains valid replication slot names is a
good idea.
However, you used ReplicationSlotValidateName() function, which is not a
good fit for it, especially when it is called with elevel=ERROR in
postmaster.

--
Regards,
--
Alexander Kukushkin

#13Amit Kapila
amit.kapila16@gmail.com
In reply to: Alexander Kukushkin (#12)
Re: issue with synchronized_standby_slots

On Wed, Sep 10, 2025 at 5:23 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:

On Wed, 10 Sept 2025 at 13:34, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

I think we should also add a parsing check for slot names specified in
the GUC synchronize_standby_slots as suggested by Amit in [1].
I made the changes in the above for the same and attached the updated patch.

I agree, validating that list contains valid replication slot names is a good idea.
However, you used ReplicationSlotValidateName() function, which is not a good fit for it, especially when it is called with elevel=ERROR in postmaster.

Can you please explain why you think so? And what is your proposal for the same?

BTW, we should also try to conclude on my yesterday's point as to why
it is okay to have the same behavior for default_tablespace and
default_table_access_method and not for this parameter? I am asking
because if we change the current behavior, tomorrow, we can get
complaints that one expects the old behaviour as that was similar to
other GUCs like default_tablespace and default_table_access_method.

--
With Regards,
Amit Kapila.

#14Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#13)
Re: issue with synchronized_standby_slots

On Thu, Sep 11, 2025 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Sep 10, 2025 at 5:23 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:

On Wed, 10 Sept 2025 at 13:34, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

I think we should also add a parsing check for slot names specified in
the GUC synchronize_standby_slots as suggested by Amit in [1].
I made the changes in the above for the same and attached the updated patch.

I agree, validating that list contains valid replication slot names is a good idea.
However, you used ReplicationSlotValidateName() function, which is not a good fit for it, especially when it is called with elevel=ERROR in postmaster.

Can you please explain why you think so? And what is your proposal for the same?

You are right and I think we should use WARNING here as is used in
check_primary_slot_name() for the same function call. For ERROR
reporting, we need to use GUC_check_* functions. Also, probably the
ERROR during startup could lead to shutdown.

--
With Regards,
Amit Kapila.

#15Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Amit Kapila (#14)
Re: issue with synchronized_standby_slots

On Thu, 11 Sept 2025 at 09:20, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Sep 11, 2025 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Sep 10, 2025 at 5:23 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:

On Wed, 10 Sept 2025 at 13:34, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

I think we should also add a parsing check for slot names specified in
the GUC synchronize_standby_slots as suggested by Amit in [1].
I made the changes in the above for the same and attached the updated patch.

I agree, validating that list contains valid replication slot names is a good idea.
However, you used ReplicationSlotValidateName() function, which is not a good fit for it, especially when it is called with elevel=ERROR in postmaster.

Can you please explain why you think so? And what is your proposal for the same?

You are right and I think we should use WARNING here as is used in
check_primary_slot_name() for the same function call. For ERROR
reporting, we need to use GUC_check_* functions. Also, probably the
ERROR during startup could lead to shutdown.

I tested by setting elevel=ERROR and elevel=WARNING in the function
ReplicationSlotValidateName.

For elevel=ERROR,
After hitting ereport inside function ReplicationSlotValidateName, the
PG_CATCH() in 'call_string_check_hook' and process is terminated.
Server logs are
2025-09-11 10:01:17.909 IST [1995206] FATAL: replication slot name
"myslot1*" contains invalid character
2025-09-11 10:01:17.909 IST [1995206] HINT: Replication slot names
may only contain lower case letters, numbers, and the underscore
character.

For level=WARNING,
Even after hitting the ereport, it is continuing with the normal flow of code.
Server logs are:
2025-09-11 10:27:30.195 IST [2013341] WARNING: replication slot name
"myslot1*" contains invalid character
2025-09-11 10:27:30.195 IST [2013341] HINT: Replication slot names
may only contain lower case letters, numbers, and the underscore
character.
2025-09-11 10:28:13.863 IST [2013341] LOG: invalid value for
parameter "synchronized_standby_slots": "myslot1*"
2025-09-11 10:28:13.863 IST [2013341] FATAL: configuration file
"/home/ubuntu/Project/inst/pg_11_9_tmp_4/bin/primary/postgresql.conf"
contains errors

I think we can use ReplicationSlotValidateName with elevel=WARNING here.
I have attached an updated patch with this change.

Thanks,
Shlok Kyal

Attachments:

v3-0001-Remove-the-validation-from-the-GUC-check-hook-and.patchapplication/octet-stream; name=v3-0001-Remove-the-validation-from-the-GUC-check-hook-and.patchDownload+2-34
#16Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Shlok Kyal (#15)
Re: issue with synchronized_standby_slots

On Thu, Sep 11, 2025 at 11:00 AM Shlok Kyal <shlok.kyal.oss@gmail.com>
wrote:

On Thu, 11 Sept 2025 at 09:20, Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Thu, Sep 11, 2025 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Wed, Sep 10, 2025 at 5:23 PM Alexander Kukushkin <

cyberdemn@gmail.com> wrote:

On Wed, 10 Sept 2025 at 13:34, Shlok Kyal <shlok.kyal.oss@gmail.com>

wrote:

I think we should also add a parsing check for slot names

specified in

the GUC synchronize_standby_slots as suggested by Amit in [1].
I made the changes in the above for the same and attached the

updated patch.

I agree, validating that list contains valid replication slot names

is a good idea.

However, you used ReplicationSlotValidateName() function, which is

not a good fit for it, especially when it is called with elevel=ERROR in
postmaster.

Can you please explain why you think so? And what is your proposal

for the same?

You are right and I think we should use WARNING here as is used in
check_primary_slot_name() for the same function call. For ERROR
reporting, we need to use GUC_check_* functions. Also, probably the
ERROR during startup could lead to shutdown.

I tested by setting elevel=ERROR and elevel=WARNING in the function
ReplicationSlotValidateName.

For elevel=ERROR,
After hitting ereport inside function ReplicationSlotValidateName, the
PG_CATCH() in 'call_string_check_hook' and process is terminated.
Server logs are
2025-09-11 10:01:17.909 IST [1995206] FATAL: replication slot name
"myslot1*" contains invalid character
2025-09-11 10:01:17.909 IST [1995206] HINT: Replication slot names
may only contain lower case letters, numbers, and the underscore
character.

For level=WARNING,
Even after hitting the ereport, it is continuing with the normal flow of

code.

Server logs are:
2025-09-11 10:27:30.195 IST [2013341] WARNING: replication slot name
"myslot1*" contains invalid character
2025-09-11 10:27:30.195 IST [2013341] HINT: Replication slot names
may only contain lower case letters, numbers, and the underscore
character.
2025-09-11 10:28:13.863 IST [2013341] LOG: invalid value for
parameter "synchronized_standby_slots": "myslot1*"
2025-09-11 10:28:13.863 IST [2013341] FATAL: configuration file
"/home/ubuntu/Project/inst/pg_11_9_tmp_4/bin/primary/postgresql.conf"
contains errors

I think we can use ReplicationSlotValidateName with elevel=WARNING here.
I have attached an updated patch with this change.

I would suggest getting rid of the "*ok*" flag, it’s probably not needed.
I’d rather rewrite validate_sync_standby_slots() like this:

static bool
validate_sync_standby_slots(char *rawname, List **elemlist)
{
/* Verify syntax and parse string into a list of identifiers */
if (!SplitIdentifierString(rawname, ',', elemlist))
GUC_check_errdetail("List syntax is invalid.");
else
{
foreach_ptr(char, name, *elemlist)
{
if (!ReplicationSlotValidateName(name, false, WARNING))
return false;
}
}

return true;
}

--
With Regards,
Ashutosh Sharma.

#17Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Ashutosh Sharma (#16)
Re: issue with synchronized_standby_slots

On Thu, Sep 11, 2025 at 11:07 AM Ashutosh Sharma <ashu.coek88@gmail.com>
wrote:

On Thu, Sep 11, 2025 at 11:00 AM Shlok Kyal <shlok.kyal.oss@gmail.com>
wrote:

On Thu, 11 Sept 2025 at 09:20, Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Thu, Sep 11, 2025 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Wed, Sep 10, 2025 at 5:23 PM Alexander Kukushkin <

cyberdemn@gmail.com> wrote:

On Wed, 10 Sept 2025 at 13:34, Shlok Kyal <

shlok.kyal.oss@gmail.com> wrote:

I think we should also add a parsing check for slot names

specified in

the GUC synchronize_standby_slots as suggested by Amit in [1].
I made the changes in the above for the same and attached the

updated patch.

I agree, validating that list contains valid replication slot

names is a good idea.

However, you used ReplicationSlotValidateName() function, which is

not a good fit for it, especially when it is called with elevel=ERROR in
postmaster.

Can you please explain why you think so? And what is your proposal

for the same?

You are right and I think we should use WARNING here as is used in
check_primary_slot_name() for the same function call. For ERROR
reporting, we need to use GUC_check_* functions. Also, probably the
ERROR during startup could lead to shutdown.

I tested by setting elevel=ERROR and elevel=WARNING in the function
ReplicationSlotValidateName.

For elevel=ERROR,
After hitting ereport inside function ReplicationSlotValidateName, the
PG_CATCH() in 'call_string_check_hook' and process is terminated.
Server logs are
2025-09-11 10:01:17.909 IST [1995206] FATAL: replication slot name
"myslot1*" contains invalid character
2025-09-11 10:01:17.909 IST [1995206] HINT: Replication slot names
may only contain lower case letters, numbers, and the underscore
character.

For level=WARNING,
Even after hitting the ereport, it is continuing with the normal flow of

code.

Server logs are:
2025-09-11 10:27:30.195 IST [2013341] WARNING: replication slot name
"myslot1*" contains invalid character
2025-09-11 10:27:30.195 IST [2013341] HINT: Replication slot names
may only contain lower case letters, numbers, and the underscore
character.
2025-09-11 10:28:13.863 IST [2013341] LOG: invalid value for
parameter "synchronized_standby_slots": "myslot1*"
2025-09-11 10:28:13.863 IST [2013341] FATAL: configuration file
"/home/ubuntu/Project/inst/pg_11_9_tmp_4/bin/primary/postgresql.conf"
contains errors

I think we can use ReplicationSlotValidateName with elevel=WARNING here.
I have attached an updated patch with this change.

I would suggest getting rid of the "*ok*" flag, it’s probably not needed.
I’d rather rewrite validate_sync_standby_slots() like this:

static bool
validate_sync_standby_slots(char *rawname, List **elemlist)
{
/* Verify syntax and parse string into a list of identifiers */
if (!SplitIdentifierString(rawname, ',', elemlist))
GUC_check_errdetail("List syntax is invalid.");
else
{
foreach_ptr(char, name, *elemlist)
{
if (!ReplicationSlotValidateName(name, false, WARNING))
return false;
}
}

return true;
}

Apart from this change, if you agree, then you may also add a test-case
that sets synchronized_standby_slots to some reserved name like
"pg_conflict_detection" to verify for the WARNING message.

--
With Regards,
Ashutosh Sharma.

#18Rahila Syed
rahilasyed90@gmail.com
In reply to: Amit Kapila (#13)
Re: issue with synchronized_standby_slots

Hi,

BTW, we should also try to conclude on my yesterday's point as to why
it is okay to have the same behavior for default_tablespace and
default_table_access_method and not for this parameter? I am asking
because if we change the current behavior, tomorrow, we can get
complaints that one expects the old behaviour as that was similar to
other GUCs like default_tablespace and default_table_access_method.

Fair point. I haven't examined the validation of GUCs in parallel workers
closely,
but one argument for preventing parallel workers from failing due to an
incorrect
value of synchronized_standby_slots is that a select query works in this
situation
without parallel workers.

Whereas, for incorrect values of default_tablespace and
default_table_access_method,
most commands would fail regardless of whether parallel workers are enabled.

PFA a test for the original bug report on this thread. This applies on the
v3 version of the patch
that was shared.

Thank you,
Rahila Syed

Attachments:

0001-Add-a-test-for-a-disabled-GUC-check-for-parallel-wor.patchapplication/octet-stream; name=0001-Add-a-test-for-a-disabled-GUC-check-for-parallel-wor.patchDownload+29-1
#19Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Rahila Syed (#18)
Re: issue with synchronized_standby_slots

On Thu, 11 Sept 2025 at 12:00, Rahila Syed <rahilasyed90@gmail.com> wrote:

Hi,

BTW, we should also try to conclude on my yesterday's point as to why
it is okay to have the same behavior for default_tablespace and
default_table_access_method and not for this parameter? I am asking
because if we change the current behavior, tomorrow, we can get
complaints that one expects the old behaviour as that was similar to
other GUCs like default_tablespace and default_table_access_method.

Fair point. I haven't examined the validation of GUCs in parallel workers closely,
but one argument for preventing parallel workers from failing due to an incorrect
value of synchronized_standby_slots is that a select query works in this situation
without parallel workers.

Whereas, for incorrect values of default_tablespace and default_table_access_method,
most commands would fail regardless of whether parallel workers are enabled.

PFA a test for the original bug report on this thread. This applies on the v3 version of the patch
that was shared.

Thanks for sharing the patch. I checked the test and it looks good to
me. But I am not sure if we should have a new file for the test. I
have added the test in the '040_standby_failover_slots_sync.pl' file
along with some other tests.
Also I have addressed the comments by Ashutosh in [1]/messages/by-id/CAE9k0P=x3J3nmSmYKmTkiFXTDKLxJkXFO4+VHJyNu01Od6CZfg@mail.gmail.com[2]/messages/by-id/CAE9k0P=OFMFCRy9aDGWZ3bt91tbB1WnzsAbzXN72iWBaGVuMrw@mail.gmail.com.

I have attached the updated v4 patch

[1]: /messages/by-id/CAE9k0P=x3J3nmSmYKmTkiFXTDKLxJkXFO4+VHJyNu01Od6CZfg@mail.gmail.com
[2]: /messages/by-id/CAE9k0P=OFMFCRy9aDGWZ3bt91tbB1WnzsAbzXN72iWBaGVuMrw@mail.gmail.com

Thanks,
Shlok Kyal

Attachments:

v4-0001-Remove-the-validation-from-the-GUC-check-hook-and.patchapplication/octet-stream; name=v4-0001-Remove-the-validation-from-the-GUC-check-hook-and.patchDownload+41-47
#20Amit Kapila
amit.kapila16@gmail.com
In reply to: Shlok Kyal (#19)
Re: issue with synchronized_standby_slots

On Fri, Sep 12, 2025 at 2:34 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:

I have attached the updated v4 patch

+# Cannot be set synchronized_standby_slots to a reserved slot name
+($result, $stdout, $stderr) = $primary->psql('postgres',
+ "ALTER SYSTEM SET synchronized_standby_slots='pg_conflict_detection'");
+ok( $stderr =~
+   m/WARNING:  replication slot name "pg_conflict_detection" is reserved/,
+ "Cannot use a reserverd slot name");
+
+# Cannot be set synchronized_standby_slots to slot name with invalid characters
+($result, $stdout, $stderr) = $primary->psql('postgres',
+ "ALTER SYSTEM SET synchronized_standby_slots='invalid*'");
+ok( $stderr =~
+   m/WARNING:  replication slot name "invalid\*" contains invalid character/,
+ "Cannot use a invalid slot name");

These tests can be present in some sql file. I think you have kept
these in the .pl file to keep it along with other tests but I think
these are better suited for some .sql file.

--
With Regards,
Amit Kapila.

#21Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Amit Kapila (#20)
#22Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Shlok Kyal (#21)
#23Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Sharma (#22)
#24Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Amit Kapila (#23)
#25Fabrice Chapuis
fabrice636861@gmail.com
In reply to: Ashutosh Sharma (#24)
#26Amit Kapila
amit.kapila16@gmail.com
In reply to: Fabrice Chapuis (#25)
#27Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#26)
#28Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Amit Kapila (#27)
#29Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Shlok Kyal (#28)
#30Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Shlok Kyal (#28)
#31Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#30)
#32Amit Kapila
amit.kapila16@gmail.com
In reply to: Shlok Kyal (#31)
#33Fujii Masao
masao.fujii@gmail.com
In reply to: Amit Kapila (#32)
#34Amit Kapila
amit.kapila16@gmail.com
In reply to: Fujii Masao (#33)
#35Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Amit Kapila (#32)
#36Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Amit Kapila (#34)
#37Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Shlok Kyal (#35)
#38Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#37)
#39Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Amit Kapila (#38)
#40Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Shlok Kyal (#39)
#41Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#40)