walsender.c fileheader comment
Hi,
I was reading the walsender.c fileheader comment while studying
another thread. I think if there is logical replication in progress
then the PROCSIG_WALSND_INIT_STOPPING handler will *always* switch to
a "stopping" state: e.g.,
/*
* Handle PROCSIG_WALSND_INIT_STOPPING signal.
*/
void
HandleWalSndInitStopping(void)
{
Assert(am_walsender);
/*
* If replication has not yet started, die like with SIGTERM. If
* replication is active, only set a flag and wake up the main loop. It
* will send any outstanding WAL, wait for it to be replicated to the
* standby, and then exit gracefully.
*/
if (!replication_active)
kill(MyProcPid, SIGTERM);
else
got_STOPPING = true;
}
~~~
But the walsender.c fileheader comment seems to be saying something
slightly different. IIUC, some minor rewording of the comment is
needed so it describes the code better.
HEAD
...
* shutdown, if logical replication is in progress all existing WAL records
* are processed followed by a shutdown. Otherwise this causes the walsender
* to switch to the "stopping" state. In this state, the walsender will reject
* any further replication commands. The checkpointer begins the shutdown
...
SUGGESTION
.. shutdown. If logical replication is in progress, the walsender
switches to a "stopping" state. In this state, the walsender will
reject any further replication commands - but all existing WAL records
are processed - followed by a shutdown.
~~~
I attached a patch for the above-suggested change.
Thoughts?
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v1-0001-walsender-fileheader-comment.patchapplication/octet-stream; name=v1-0001-walsender-fileheader-comment.patchDownload
From 7188680f0f3ce81df6bf8067940b51257fbf3b5a Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 11 Jun 2024 12:31:05 +1000
Subject: [PATCH v1] walsender fileheader comment
---
src/backend/replication/walsender.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index c623b07..1d41aea 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -27,14 +27,14 @@
* If the server is shut down, checkpointer sends us
* PROCSIG_WALSND_INIT_STOPPING after all regular backends have exited. If
* the backend is idle or runs an SQL query this causes the backend to
- * shutdown, if logical replication is in progress all existing WAL records
- * are processed followed by a shutdown. Otherwise this causes the walsender
- * to switch to the "stopping" state. In this state, the walsender will reject
- * any further replication commands. The checkpointer begins the shutdown
- * checkpoint once all walsenders are confirmed as stopping. When the shutdown
- * checkpoint finishes, the postmaster sends us SIGUSR2. This instructs
- * walsender to send any outstanding WAL, including the shutdown checkpoint
- * record, wait for it to be replicated to the standby, and then exit.
+ * shutdown. If logical replication is in progress, the walsender switches to a
+ * "stopping" state. In this state, the walsender will reject any further
+ * replication commands - but all existing WAL records are processed - followed
+ * by a shutdown. The checkpointer begins the shutdown checkpoint once all
+ * walsenders are confirmed as stopping. When the shutdown checkpoint finishes,
+ * the postmaster sends us SIGUSR2. This instructs walsender to send any
+ * outstanding WAL, including the shutdown checkpoint record, wait for it to be
+ * replicated to the standby, and then exit.
*
*
* Portions Copyright (c) 2010-2024, PostgreSQL Global Development Group
--
1.8.3.1
On 6/11/24 04:35, Peter Smith wrote:
Hi,
I was reading the walsender.c fileheader comment while studying
another thread. I think if there is logical replication in progress
then the PROCSIG_WALSND_INIT_STOPPING handler will *always* switch to
a "stopping" state: e.g.,/*
* Handle PROCSIG_WALSND_INIT_STOPPING signal.
*/
void
HandleWalSndInitStopping(void)
{
Assert(am_walsender);/*
* If replication has not yet started, die like with SIGTERM. If
* replication is active, only set a flag and wake up the main loop. It
* will send any outstanding WAL, wait for it to be replicated to the
* standby, and then exit gracefully.
*/
if (!replication_active)
kill(MyProcPid, SIGTERM);
else
got_STOPPING = true;
}~~~
But the walsender.c fileheader comment seems to be saying something
slightly different. IIUC, some minor rewording of the comment is
needed so it describes the code better.HEAD
...
* shutdown, if logical replication is in progress all existing WAL records
* are processed followed by a shutdown. Otherwise this causes the walsender
* to switch to the "stopping" state. In this state, the walsender will reject
* any further replication commands. The checkpointer begins the shutdown
...SUGGESTION
.. shutdown. If logical replication is in progress, the walsender
switches to a "stopping" state. In this state, the walsender will
reject any further replication commands - but all existing WAL records
are processed - followed by a shutdown.~~~
I attached a patch for the above-suggested change.
Thoughts?
I did look at this, and while the explanation in the current comment may
seem a bit confusing, I'm not sure the suggested changes improve the
situation very much.
This suggests the two comments somehow disagree, but it does not say in
what exactly, so perhaps I just missed it :-(
ISTM there's a bit of confusion what is meant by "stopping" state - you
seem to be interpreting it as a general concept, where the walsender is
requested to stop (through the signal), and starts doing stuff to exit.
But the comments actually talk about WalSnd->state, where "stopping"
means it needs to be set to WALSNDSTATE_STOPPING.
And we only ever switch to that state in two places - in WalSndPhysical
and exec_replication_command. And that does not happen in regular
logical replication (which is what "logical replication is in progress"
refers to) - if you have a walsender just replicating DML, it will never
see the WALSNDSTATE_STOPPING state. It will simply do the cleanup while
still in WALSNDSTATE_STREAMING state, and then just exit.
So from this point of view, the suggestion is actually wrong.
To conclude, I think this probably makes the comments more confusing. If
we want to make it clearer, I'd probably start by clarifying what the
"stopping" state means. Also, it's a bit surprising we may not actually
go through the "stopping" state during shutdown.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi, Thankyou for taking the time to look at this and reply.
I did look at this, and while the explanation in the current comment may
seem a bit confusing, I'm not sure the suggested changes improve the
situation very much.This suggests the two comments somehow disagree, but it does not say in
what exactly, so perhaps I just missed it :-(ISTM there's a bit of confusion what is meant by "stopping" state - you
seem to be interpreting it as a general concept, where the walsender is
requested to stop (through the signal), and starts doing stuff to exit.
But the comments actually talk about WalSnd->state, where "stopping"
means it needs to be set to WALSNDSTATE_STOPPING.
Yes, I interpreted the "stopping" state meaning as when the boolean
flag 'got_STOPPING' is assigned true.
And we only ever switch to that state in two places - in WalSndPhysical
and exec_replication_command. And that does not happen in regular
logical replication (which is what "logical replication is in progress"
refers to) - if you have a walsender just replicating DML, it will never
see the WALSNDSTATE_STOPPING state. It will simply do the cleanup while
still in WALSNDSTATE_STREAMING state, and then just exit.So from this point of view, the suggestion is actually wrong.
OK.
To conclude, I think this probably makes the comments more confusing. If
we want to make it clearer, I'd probably start by clarifying what the
"stopping" state means. Also, it's a bit surprising we may not actually
go through the "stopping" state during shutdown.
I agree. My interpretation of the (ambiguous) "stopping" state led me
to believe the comment was quite wrong. So, this thread was only
intended as a trivial comment fix in passing but clearly there is more
to this than I anticipated. I would be happy if someone with more
knowledge about the WALSNDSTATE_STOPPING versus got_STOPPING could
disambiguate the file header comment, but that's not me, so I have
withdrawn this from the Commitfest.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On 7/19/24 07:02, Peter Smith wrote:
...
To conclude, I think this probably makes the comments more confusing. If
we want to make it clearer, I'd probably start by clarifying what the
"stopping" state means. Also, it's a bit surprising we may not actually
go through the "stopping" state during shutdown.I agree. My interpretation of the (ambiguous) "stopping" state led me
to believe the comment was quite wrong. So, this thread was only
intended as a trivial comment fix in passing but clearly there is more
to this than I anticipated. I would be happy if someone with more
knowledge about the WALSNDSTATE_STOPPING versus got_STOPPING could
disambiguate the file header comment, but that's not me, so I have
withdrawn this from the Commitfest.
Understood. Thanks for the patch anyway, I appreciate you took the time
to try to improve the comments!
I agree the state transitions in walsender are not very clear, and the
fact that it may shutdown without ever going through STOPPING state is
quite confusing. That being said, I personally don't have ambition to
improve this.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company