Standby server with cascade logical replication could not be properly stopped under load

Started by Alexey Makhmutov · 11 months ago · 16 messages · bugs
#1Alexey Makhmutov
a.makhmutov@postgrespro.ru

Assume the following configuration with three connected servers, A->B->C: A
(primary), B (physical standby) and C (logical replica connected to B).
If server A is under load and B is applying incoming WAL records while
also streaming data via logical replication to C, then an attempt to stop
server B in 'fast' mode may be unsuccessful. In this case the server will
remain in PM_SHUTDOWN state indefinitely with all 'walsender' processes
running in an infinite busy loop (each consuming a CPU core). To get the
server out of this state it's required either to stop B using
'immediate' mode or to stop server C (which will cause the 'walsender'
processes on server B to exit). This issue is reproducible on latest
'master', as well as on the current PG 16/17 branches.

Attached is a test scenario to reproduce the issue: 'test_scenario.zip'.
This archive contains shell scripts to create the required environment
(all three servers) and then to execute the steps required to get the
server into the incorrect state. First, edit the 'test_env.sh' file and
specify the path to the PG binaries in the PG_PATH variable and,
optionally, the set of ports used by the test instances in the 'pg_port'
array. Then execute the 'test_prepare.sh' script, which will create,
configure and start all three PG instances. The servers can then be
started and stopped using the corresponding start and stop scripts. To
reproduce the issue, ensure that all three servers are running and
execute the 'test_execute.sh' script. This script starts a 'pgbench'
instance in the background for 30 seconds to create load on server A,
waits for 20 seconds and then tries to stop the B instance using the
default 'fast' mode. The expected behavior is a normal shutdown of B,
while the observed behavior is different: the shutdown attempt fails and
each remaining 'walsender' process consumes an entire CPU core. To get
out of this state one can use the 'stop-C.sh' script to stop server C,
which will complete the shutdown process of the B instance as well.

My understanding is that this issue is caused by the logic in the
'GetStandbyFlushRecPtr' function, which returns the current flush point
for received WAL data. This position is used in 'XLogSendLogical' to
determine whether the current walsender is in the 'caught up' state
(i.e. whether we have sent all the available data to the downstream
instance). During shutdown, walsenders are allowed to continue their
work until they are in the 'caught up' state, while the 'postmaster'
waits for their completion. Currently 'GetStandbyFlushRecPtr' returns
the position of the last stored record, rather than the last applied
record. This is correct for physical replication, as we can send data
to a downstream instance without applying it on the local system.
However, for logical replication this seems to be incorrect, as we
cannot decode data until it's applied on the current instance. So, if
the current stored WAL position differs from the applied position while
the server is being stopped, then the
'WalSndLoop'/'XLogSendLogical'/'XLogReadRecord' functions will spin in a
busy loop, waiting for the applied position to advance. The recovery
process is already stopped at this moment, so this will be an infinite
loop. Probably either 'GetStandbyFlushRecPtr' or the
'WalSndLoop'/'XLogSendLogical' logic needs to be adjusted to take such
a case with logical replication into consideration.

Also attached is a patch which aims to fix this issue:
0001-Use-only-replayed-position-as-target-flush-point-for.patch. It
modifies the behavior of the 'GetStandbyFlushRecPtr' function to
ensure that it returns only the applied position for logical replication.
This function can also be invoked from the slot synchronization routines,
and in that case it retains the current behavior by returning the last
stored position.

Thanks,
Alexey

Attachments:

- test_scenario.zip (application/zip)
- 0001-Use-only-replayed-position-as-target-flush-point-for.patch (text/x-patch, +17/-15)
#2Ajin Cherian
itsajin@gmail.com
In reply to: Alexey Makhmutov (#1)
Re: Standby server with cascade logical replication could not be properly stopped under load

On Thu, May 22, 2025 at 7:25 PM Alexey Makhmutov
<a.makhmutov@postgrespro.ru> wrote:


Good catch, I agree with the analysis. I've also verified that the fix
works as expected.
Just a small comment: can you explicitly mention in the comments that
"for logical replication we can only send records that have already
been replayed, else we might get stuck in shutdown", or something to
that effect, as that distinction is important for future developers in
this area.

regards,
Ajin Cherian
Fujitsu Australia

#3shveta malik
shveta.malik@gmail.com
In reply to: Alexey Makhmutov (#1)
Re: Standby server with cascade logical replication could not be properly stopped under load

On Thu, May 22, 2025 at 7:51 AM Alexey Makhmutov
<a.makhmutov@postgrespro.ru> wrote:


The problem described for a logical walsender on a physical standby
looks genuine. I agree with the analysis for slot-sync as well:
slot-sync does not need the fix, as it deals only with the flush
position and does not care about the replay position. Since the problem
area falls under 'Allow logical decoding on standbys', I am adding
Bertrand for further comments on this fix.

thanks
Shveta

#4Alexey Makhmutov
a.makhmutov@postgrespro.ru
In reply to: shveta malik (#3)
Re: Standby server with cascade logical replication could not be properly stopped under load

Hi Ajin and Shveta, thank you for looking at this issue.

On 5/22/25 12:48, Ajin Cherian wrote:

Just a small comment: can you explicitly mention in the comments that
"for logical replication we can only send records that have already
been replayed else we might get stuck in shutdown" or something to
that effect as that distinction is important for future developers in
this area.

Thank you. I've expanded the comment to clarify the expected logic in
this case. The updated patch is attached as
'0001-Use-only-replayed-position-as-target-flush-point-2.patch'.

Thanks,
Alexey

Attachments:

- 0001-Use-only-replayed-position-as-target-flush-point-2.patch (text/x-patch, +22/-15)
#5Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Alexey Makhmutov (#4)
Re: Standby server with cascade logical replication could not be properly stopped under load

Hi,

On Thu, May 22, 2025 at 04:29:37PM +0300, Alexey Makhmutov wrote:


Thanks for the report!

I agree with your analysis and I think that your proposed fix makes
sense. I wonder if the comment on top of GetStandbyFlushRecPtr() could
be updated a bit, though.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#6Alexey Makhmutov
a.makhmutov@postgrespro.ru
In reply to: Bertrand Drouvot (#5)
Re: Standby server with cascade logical replication could not be properly stopped under load

Hi Bertrand, thank you for looking at this issue.

On 5/22/25 18:31, Bertrand Drouvot wrote:

I agree with your analysis and I think that your proposed fix makes
sense. I wonder if the comment on top of GetStandbyFlushRecPtr() could
be updated a bit, though.

I've tried to update the function comment as well in the updated patch:
0001-Use-only-replayed-position-as-target-flush-point-3-pg-master.patch.
The same patch can be applied on top of PG 17 as well.

PG 16 does not have the slot synchronization logic, so its
implementation of GetStandbyFlushRecPtr is slightly different and the
comments need to be updated to reflect this discrepancy as well. So,
here is a variant of the same patch updated to be compatible with PG 16:
0002-Use-only-replayed-position-as-target-flush-point-3-pg-16.patch

Thanks,
Alexey

Attachments:

- 0001-Use-only-replayed-position-as-target-flush-point-3-pg-master.patch (text/x-patch, +28/-22)
- 0002-Use-only-replayed-position-as-target-flush-point-3-pg-16.patch (text/x-patch, +24/-20)
#7Michael Paquier
michael@paquier.xyz
In reply to: Alexey Makhmutov (#6)
Re: Standby server with cascade logical replication could not be properly stopped under load

On Thu, May 22, 2025 at 09:22:02PM +0300, Alexey Makhmutov wrote:


Hmm. The scenario you are pointing at, with a mix of cascading logical
and physical standbys, is tricky enough that I think this warrants
having a test in the long run. Do you think that it would be possible
to sort something out with ordered actions taken during the shutdown
sequences, at least for HEAD?

Are v15 and older versions something that we should worry about when
using setups made of physical standbys?
--
Michael

#8Michael Paquier
michael@paquier.xyz
In reply to: Alexey Makhmutov (#6)
Re: Standby server with cascade logical replication could not be properly stopped under load

On Thu, May 22, 2025 at 09:22:02PM +0300, Alexey Makhmutov wrote:


+	if (MyWalSnd->kind == REPLICATION_KIND_PHYSICAL)
+	{
+		TimeLineID	receiveTLI;
+		XLogRecPtr	receivePtr = GetWalRcvFlushRecPtr(NULL, &receiveTLI);
+
+		if (receiveTLI == replayTLI && receivePtr > result)
+			result = receivePtr;
+	}

Looking at the patch, I am not much of a fan of looking at MyWalSnd in
GetStandbyFlushRecPtr(), which does not have such a dependency yet, to
then decide whether the LSN retrieved by GetWalRcvFlushRecPtr() should
be ignored or used.

Wouldn't it be better to extend GetStandbyFlushRecPtr() so that the
caller can directly decide whether they want to include in the math the
last LSN flushed by the WAL receiver, depending on the code path from
which the routine is called? That makes for a larger patch, of course,
but it removes the need to look at MyWalSnd inside
GetStandbyFlushRecPtr().
--
Michael

#9Alexey Makhmutov
a.makhmutov@postgrespro.ru
In reply to: Michael Paquier (#8)
Re: Standby server with cascade logical replication could not be properly stopped under load

On 5/28/25 10:35, Michael Paquier wrote:

Wouldn't it be better to extend GetStandbyFlushRecPtr() so as the
caller can directly decide if they want to include in the maths the
last LSN flushed by the WAL receiver, depending on the code path where
the routine is called? That makes for a larger patch, of course, but
it removes the need to look at MyWalSnd inside
GetStandbyFlushRecPtr().

Well, this is a great question, because it made me think about whether
we need to change the behavior of THIS function at all. Extending
'GetStandbyFlushRecPtr' to properly support logical replication would
just make it equal to 'GetXLogReplayRecPtr'. So, basically, we just
need to replace the invocation of 'GetStandbyFlushRecPtr' with
'GetXLogReplayRecPtr' and this should solve the problem.

The only invoker of 'GetStandbyFlushRecPtr' outside of walsender.c is
the slot synchronization routine, which expects to get the 'physical'
position anyway, so it does not need to be changed. All other
invocations are performed from the 'walsender.c' file itself. Obviously,
both 'StartReplication' (physical) and 'XLogSendPhysical' need the
current logic, so we can skip them. 'XLogSendLogical' needs to be
changed to use 'GetXLogReplayRecPtr' (to get behavior consistent with
'WalSndWaitForWal' and actually fix the original problem discussed in
this thread). This leaves us with only a single function to consider:
'IdentifySystem'. This function can be called both in the physical and
the logical replication case, but I assume that we need to return the
flushed WAL position in it as well, as described in the documentation
for this command. This also seems to be the right behavior, as the
primary goal of the 'identify' command is to inform the client about the
potential replication start point, and we can definitely start
replication from the flushed position (as we already have this record),
just with some delay (once our recovery gets us to this point). But
maybe somebody with more knowledge of the logical replication
expectations could confirm this.

The fix seems to be very simple in this case (assuming the logic above
is correct): we only replace the 'GetStandbyFlushRecPtr' call in
'XLogSendLogical' with 'GetXLogReplayRecPtr', and that's it. Attached
is the modified patch. It can be applied on top of the 16, 17 and
master branches.

As for the test: do you want this shutdown case to be added to the
existing test plan for standby logical decoding? The condition to
trigger the problem is just to have the flushed and replayed positions
pointing to different values during standby shutdown. I could try to
add it to the test suite, but it could take some time.

Thanks,
Alexey

Attachments:

0001-Use-only-replayed-position-as-target-flush-point-4-pg-master.patchtext/x-patch; charset=UTF-8; name=0001-Use-only-replayed-position-as-target-flush-point-4-pg-master.patchDownload+10-2
#10Michael Paquier
michael@paquier.xyz
In reply to: Alexey Makhmutov (#9)
Re: Standby server with cascade logical replication could not be properly stopped under load

On Wed, May 28, 2025 at 07:12:25PM +0300, Alexey Makhmutov wrote:

Well, this is a great question, because it made me think about whether we
need to change the behavior of THIS function at all. Extending
'GetStandbyFlushRecPtr' to properly support logical replication would just
make it equal to 'GetXLogReplayRecPtr'. So, basically, we just need to
replace the invocation of 'GetStandbyFlushRecPtr' with 'GetXLogReplayRecPtr'
and this should solve the problem.

Exactly. I don't want to assume that all the callers are OK with this
forced change in the context of a logical WAL sender inside the
function GetStandbyFlushRecPtr(), especially since the change can be
isolated where it actually makes sense, aka inside the logical sender
loop.

This leaves us with only a single function
to consider: 'IdentifySystem'. This function can be called both in the
physical and the logical replication case, but I assume that we need to
return the flushed WAL position in it as well, as described in the
documentation for this command. This also seems to be the right behavior,
as the primary goal of the 'identify' command is to inform the client about
the potential replication start point, and we can definitely start
replication from the flushed position (as we already have this record),
just with some delay (once our recovery gets us to this point). But maybe
somebody with more knowledge of the logical replication expectations could
confirm this.

I don't think that there is any need to care about IDENTIFY_SYSTEM;
it would be a short-lived command. Note that we document that the LSN
returned should be the current flush LSN, which can be used as a
starting point for START_REPLICATION. START_REPLICATION has different
properties and assumptions; it's expected to last, with its inner logic
required for the shutdown sequence ordering with WAL senders.

The fix seems to be very simple in this case (assuming the logic above is
correct): we only replace the 'GetStandbyFlushRecPtr' call in
'XLogSendLogical' with 'GetXLogReplayRecPtr', and that's it. Attached is
the modified patch. It can be applied on top of the 16, 17 and master
branches.

Yeah, I think that this is a sensible move. Documenting the reason
why GetXLogReplayRecPtr() is called is important, but the comment you
are suggesting could be better, IMO, say:
"For cascading logical WAL senders, we use the replay LSN rather than
the flush LSN, as decoding would only happen with records already
replayed. This distinction is important especially during the
shutdown sequence, as cascading logical WAL senders can only catch up
with records that have been already replayed, not flushed."

That feels like I'm repeating myself a bit if formulated this way. If
you have a better suggestion, feel free.

As for the test: do you want this shutdown case to be added to the
existing test plan for standby logical decoding? The condition to trigger
the problem is just to have the flushed and replayed positions pointing to
different values during standby shutdown. I could try to add it to the
test suite, but it could take some time.

Anything I can think of for the moment would require the release of
an injection point, which is not something we can do during a shutdown
sequence with the existing facility, as we cannot connect to the server
while it's this late in its shutdown sequence: new connections are
prohibited once we expect the WAL senders to catch up. We could go
without that for now to get a fix in; your scripts are enough to
push all the logical WAL senders on the physical standby into an
infinite loop once its shutdown sequence initiates.

Bertrand and others, what do you think?
--
Michael

#11Ajin Cherian
itsajin@gmail.com
In reply to: Michael Paquier (#10)
Re: Standby server with cascade logical replication could not be properly stopped under load

On Thu, May 29, 2025 at 5:02 PM Michael Paquier <michael@paquier.xyz> wrote:

Yeah, I think that this is a sensible move. Documenting the reason
why GetXLogReplayRecPtr() is called is important, but the comment you
are suggesting could be better, IMO, say:
"For cascading logical WAL senders, we use the replay LSN rather than
the flush LSN, as decoding would only happen with records already
replayed. This distinction is important especially during the
shutdown sequence, as cascading logical WAL senders can only catch up
with records that have been already replayed, not flushed."

That feels that I'm repeating myself a bit twice if formulated this
way. If you have a better suggestion, feel free..

I think this new fix is much better and cleaner. A suggestion for the
comment:
"For cascading logical WAL senders, we use the replay LSN instead of the
flush LSN, since logical decoding on a standby only processes WAL that has
been replayed. This distinction becomes particularly important during
shutdown, as new WAL is no longer replayed and the last replayed LSN marks
the furthest point up to which decoding can proceed."

regards,
Ajin Cherian
Fujitsu Australia

#12Michael Paquier
michael@paquier.xyz
In reply to: Ajin Cherian (#11)
Re: Standby server with cascade logical replication could not be properly stopped under load

On Thu, May 29, 2025 at 08:28:01PM +1000, Ajin Cherian wrote:

I think this new fix is much better and cleaner. A suggestion for the
comment:
"For cascading logical WAL senders, we use the replay LSN instead of the
flush LSN, since logical decoding on a standby only processes WAL that has
been replayed. This distinction becomes particularly important during
shutdown, as new WAL is no longer replayed and the last replayed LSN marks
the furthest point up to which decoding can proceed."

Yes, that sounds better than my previous suggestion. Thanks.
--
Michael

#13Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#12)
Re: Standby server with cascade logical replication could not be properly stopped under load

On Fri, May 30, 2025 at 08:32:05AM +0900, Michael Paquier wrote:


I have reused that wording, did more tests on v16 and v17 to reproduce
the problems reported and to verify that the change fixes the shutdown
of the physical standby, then applied the change down to v16.

Thanks all.
--
Michael

#14G. Sl
skokoveshat@gmail.com
In reply to: Michael Paquier (#13)
Re: Standby server with cascade logical replication could not be properly stopped under load

Hi Michael!
I've found the same behaviour in version 15.12. Any chance of a
backpatch for this version?
--
Best regards
Gennadiy S.
On Fri, 26 Dec 2025 at 16:11, Michael Paquier <michael@paquier.xyz> wrote:


#15Michael Paquier
michael@paquier.xyz
In reply to: G. Sl (#14)
Re: Standby server with cascade logical replication could not be properly stopped under load

On Fri, Dec 26, 2025 at 04:14:17PM +0500, G. Sl wrote:

I've found the same behaviour in version 15.12. Any chance of a
backpatch for this version?

As presented on this thread, the problem reported involves at least a
cluster configuration A -> B -> C, where:
- A is a primary.
- B is a physical standby, streaming changes from A. In the test
setup, this node includes replication slots that were created while
the node was in standby mode. This is only possible in v16 and newer
versions, where logical decoding on standbys was added.
- C is a primary node, streaming logical changes from B.

Based on the state required for B, this would not apply to v15,
because it is not possible to create such replication slots on a
standby there. Also, you may want to update to 15.15 first.

I am open to arguments or discussion if you have found a new or
different problem, of course, but there is nothing we can do without
knowing more about your setup. Even better would be a reproducible
test case based on v15 to understand your problem, though I suspect we
may be dealing with something different. Please feel free to refer to
the top of the thread, which offers an excellent example of what a
reproducible test case can be: something that we can reuse to
reproduce the issue.

Also note that the change committed is 5231ed8262c9, which relies on
am_cascading_walsender in XLogSendLogical() to avoid using
GetFlushRecPtr() in all cases. Before the commit, we used
GetStandbyFlushRecPtr() for that in v16. Again, v15 just uses
GetFlushRecPtr(), but it should not be possible to find yourself in a
position where XLogSendLogical() is called while on a standby, no?
--
Michael

#16G. Sl
skokoveshat@gmail.com
In reply to: Michael Paquier (#15)
Re: Standby server with cascade logical replication could not be properly stopped under load

Hi Michael! Thanks for clarifying.
No, I don't have a replica with logical slots, just a primary with
logical replication (CDC via Debezium), and sometimes its shutdown
(restart) hangs until the replication processes are killed.
So it looks like I should investigate further and maybe create a new issue.

On Mon, 29 Dec 2025 at 05:10, Michael Paquier <michael@paquier.xyz> wrote:
