Use standard SIGHUP and SIGTERM handlers in autoprewarm module
Hi,
Autoprewarm module is using its own SIGHUP (apw_sighup_handler,
got_sighup) and SIGTERM (apw_sigterm_handler, got_sigterm) handlers, which are
similar to the standard signal handlers (except for a difference [1]). Isn't it
good to remove them and use the standard SignalHandlerForConfigReload and
SignalHandlerForShutdownRequest?
Attaching the patch for the above changes.
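In essence the change is just a swap of the registered handlers plus using the
standard pending flags; a rough sketch (the attached patch is the authoritative
version):
/* autoprewarm.c, sketch of the change */
pqsignal(SIGHUP, SignalHandlerForConfigReload);     /* was apw_sighup_handler */
pqsignal(SIGTERM, SignalHandlerForShutdownRequest); /* was apw_sigterm_handler */
BackgroundWorkerUnblockSignals();
/* ...and the loops then check ConfigReloadPending / ShutdownRequestPending
 * instead of the module-local got_sighup / got_sigterm flags. */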
Looks like the commit [2] replaced custom handlers with standard handlers.
Thoughts?
[1]: apw_sigterm_handler() and apw_sighup_handler() use MyProc->procLatch:
if (MyProc)
    SetLatch(&MyProc->procLatch);
whereas the standard handlers use MyLatch:
SetLatch(MyLatch);
Both MyProc->procLatch and MyLatch point to the same latch; see this comment from global.c:
/*
 * MyLatch points to the latch that should be used for signal handling by the
 * current process. It will either point to a process local latch if the
 * current process does not have a PGPROC entry in that moment, or to
 * PGPROC->procLatch if it has. Thus it can always be used in signal handlers,
 * without checking for its existence.
 */
struct Latch *MyLatch;
(gdb) p MyProc->procLatch
$6 = {is_set = 0, is_shared = true, owner_pid = 1448807}
(gdb) p MyLatch
$7 = (struct Latch *) 0x7fcacc6d902c
(gdb) p &MyProc->procLatch
$8 = (Latch *) 0x7fcacc6d902c
(gdb) p *MyLatch
$9 = {is_set = 0, is_shared = true, owner_pid = 1448807}
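So the only real difference is which latch pointer the handler touches. For
comparison, the standard handlers in src/backend/postmaster/interrupt.c are
roughly of this shape (simplified; details may differ across versions):
void
SignalHandlerForConfigReload(SIGNAL_ARGS)
{
    int save_errno = errno;

    ConfigReloadPending = true;
    SetLatch(MyLatch);

    errno = save_errno;
}

void
SignalHandlerForShutdownRequest(SIGNAL_ARGS)
{
    int save_errno = errno;

    ShutdownRequestPending = true;
    SetLatch(MyLatch);

    errno = save_errno;
}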
[2]: commit 1e53fe0e70f610c34f4c9e770d108cd94151342c
Author: Robert Haas <rhaas@postgresql.org>
Date: 2019-12-17 13:03:57 -0500
Use PostgresSigHupHandler in more places.
There seems to be no reason for every background process to have
its own flag indicating that a config-file reload is needed.
Instead, let's just use ConfigFilePending for that purpose
everywhere.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
On 2020/10/05 19:45, Bharath Rupireddy wrote:
Hi,
Autoprewarm module is using its own SIGHUP (apw_sighup_handler, got_sighup) and SIGTERM (apw_sigterm_handler, got_sigterm) handlers which are similar to standard signal handlers (except for a difference [1]). Isn't it good to remove them and use standard SignalHandlerForConfigReload and SignalHandlerForShutdownRequest?
Attaching the patch for the above changes.
Looks like the commit[2] replaced custom handlers with standard handlers.
Thoughts?
+1
The patch looks good to me.
ISTM that we can also replace StartupProcSigHupHandler() in startup.c
with SignalHandlerForConfigReload() by making the startup process use
the general shared latch instead of its own one. POC patch attached.
Thought?
Probably we can also replace sigHupHandler() in syslogger.c with
SignalHandlerForConfigReload()? This would be separate patch, though.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
On Mon, Oct 5, 2020 at 8:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/10/05 19:45, Bharath Rupireddy wrote:
Hi,
Autoprewarm module is using its own SIGHUP (apw_sighup_handler, got_sighup) and SIGTERM (apw_sigterm_handler, got_sigterm) handlers which are similar to standard signal handlers (except for a difference [1]). Isn't it good to remove them and use standard SignalHandlerForConfigReload and SignalHandlerForShutdownRequest?
Attaching the patch for the above changes.
Looks like the commit[2] replaced custom handlers with standard handlers.
Thoughts?
+1
The patch looks good to me.
Thanks.
ISTM that we can also replace StartupProcSigHupHandler() in startup.c
with SignalHandlerForConfigReload() by making the startup process use
the general shared latch instead of its own one. POC patch attached.
Thought?
I'm not quite sure whether it breaks something or not. I see that
WakeupRecovery() with XLogCtl->recoveryWakeupLatch from the
startup process is also being used in the walreceiver process. I may
be wrong, but I have some concern about whether the behaviour is
different in case of EXEC_BACKEND and on Windows.
Another concern is that we are always using
XLogCtl->recoveryWakeupLatch in shared mode; this makes sense as this
latch is also being used in the walreceiver. But sometimes, MyLatch is
created in non-shared mode as well (see InitLatch(MyLatch)).
Others may have better thoughts though.
Probably we can also replace sigHupHandler() in syslogger.c with
SignalHandlerForConfigReload()? This would be separate patch, though.
+1 to replace sigHupHandler() with SignalHandlerForConfigReload() as
the latch and the functionality are pretty much the same.
WalReceiverMain(): I think we can also replace WalRcvShutdownHandler()
with SignalHandlerForShutdownRequest(), because walrcv->latch points to
&MyProc->procLatch, which is what MyLatch points to as well.
Thoughts? If okay, we can combine these into a single patch. I will
post an updated patch soon.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 2020/10/06 1:18, Bharath Rupireddy wrote:
On Mon, Oct 5, 2020 at 8:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/10/05 19:45, Bharath Rupireddy wrote:
Hi,
Autoprewarm module is using its own SIGHUP (apw_sighup_handler, got_sighup) and SIGTERM (apw_sigterm_handler, got_sigterm) handlers which are similar to standard signal handlers (except for a difference [1]). Isn't it good to remove them and use standard SignalHandlerForConfigReload and SignalHandlerForShutdownRequest?
Attaching the patch for the above changes.
Looks like the commit[2] replaced custom handlers with standard handlers.
Thoughts?
+1
The patch looks good to me.
Thanks.
ISTM that we can also replace StartupProcSigHupHandler() in startup.c
with SignalHandlerForConfigReload() by making the startup process use
the general shared latch instead of its own one. POC patch attached.
Thought?
I'm not quite sure whether it breaks something or not. I see that
WakeupRecovery() with XLogCtl->recoveryWakeupLatch latch from the
startup process is also being used in the walreceiver process. I may
be wrong, but have some concern if the behaviour is different in case
of EXEC_BACKEND and Windows?
Unless I'm wrong, regarding MyLatch, the behavior is not different
whether in EXEC_BACKEND or not. In both cases, SwitchToSharedLatch()
is called and MyLatch is set to the shared latch, i.e., MyProc->procLatch.
Another concern is that we are always using
XLogCtl->recoveryWakeupLatch in shared mode, this makes sense as this
latch is also being used in walreceiver. But sometimes, MyLatch is
created in non-shared mode as well(see InitLatch(MyLatch)).
Yes, and then the startup process calls SwitchToSharedLatch() and
repoints MyLatch to the shared one.
Others may have better thoughts though.
Okay, I will reconsider the patch and post it separately later if necessary.
Probably we can also replace sigHupHandler() in syslogger.c with
SignalHandlerForConfigReload()? This would be separate patch, though.
+1 to replace sigHupHandler() with SignalHandlerForConfigReload() as
the latch and the functionality are pretty much the same.
WalReceiverMain(): I think we can also replace WalRcvShutdownHandler()
with SignalHandlerForShutdownRequest(), because walrcv->latch points to
&MyProc->procLatch, which is what MyLatch points to as well.
Thoughts? If okay, we can combine these into a single patch. I will
post an updated patch soon.
+1 Or it's also ok to make each patch separately.
Anyway, thanks!
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Mon, Oct 05, 2020 at 11:34:05PM +0900, Fujii Masao wrote:
ISTM that we can also replace StartupProcSigHupHandler() in startup.c
with SignalHandlerForConfigReload() by making the startup process use
the general shared latch instead of its own one. POC patch attached.
Thought?
That looks good to me. Nice cleanup.
Probably we can also replace sigHupHandler() in syslogger.c with
SignalHandlerForConfigReload()? This would be separate patch, though.
+1.
--
Michael
On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Probably we can also replace sigHupHandler() in syslogger.c with
SignalHandlerForConfigReload()? This would be separate patch, though.
+1 to replace sigHupHandler() with SignalHandlerForConfigReload() as
the latch and the functionality are pretty much the same.
WalReceiverMain(): I think we can also replace WalRcvShutdownHandler()
with SignalHandlerForShutdownRequest(), because walrcv->latch points to
&MyProc->procLatch, which is what MyLatch points to as well.
Thoughts? If okay, we can combine these into a single patch. I will
post an updated patch soon.
+1 Or it's also ok to make each patch separately.
Anyway, thanks!
Thanks. +1 to have separate patches. I will post two separate patches
for autoprewarm and WalRcvShutdownHandler for further review. The
other two patches can be for startup.c and syslogger.c.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
+1 Or it's also ok to make each patch separately.
Anyway, thanks!
Thanks. +1 to have separate patches. I will post two separate patches
for autoprewarm and WalRcvShutdownHandler for further review. The
other two patches can be for startup.c and syslogger.c.
I'm attaching all the 4 patches here together.
1. v1-use-standard-SIGHUP-and-SIGTERM-handlers-in-autoprewarm-process.patch
-- This is the patch initially sent in this mail by me, I just
renamed it.
2. v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
-- This is the patch proposed by Fujii Masao, I just renamed it.
3. v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
4. v1-use-standard-SIGTERM-handler-in-walreceiver-process.patch
Please consider these patches for further review.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
On Wed, Oct 7, 2020 at 8:00 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
+1 Or it's also ok to make each patch separately.
Anyway, thanks!
Thanks. +1 to have separate patches. I will post two separate patches
for autoprewarm and WalRcvShutdownHandler for further review. The
other two patches can be for startup.c and syslogger.c.
I'm attaching all the 4 patches here together.
1. v1-use-standard-SIGHUP-and-SIGTERM-handlers-in-autoprewarm-process.patch
-- This is the patch initially sent in this mail by me, I just
renamed it.
2. v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
-- This is the patch proposed by Fujii Masao, I just renamed it.
3. v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
4. v1-use-standard-SIGTERM-handler-in-walreceiver-process.patch
Please consider these patches for further review.
Added this to the commitfest for further review.
https://commitfest.postgresql.org/30/2756/
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 7, 2020 at 8:39 PM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Oct 7, 2020 at 8:00 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <
masao.fujii@oss.nttdata.com> wrote:
+1 Or it's also ok to make each patch separately.
Anyway, thanks!
Thanks. +1 to have separate patches. I will post two separate patches
for autoprewarm and WalRcvShutdownHandler for further review. The
other two patches can be for startup.c and syslogger.c.
[ CC'd Robert Haas since he's the main name in interrupt.c,
test_shm_mq/worker.c, ]
src/test/modules/test_shm_mq/worker.c appears to do the right thing the
wrong way - it has its own custom handler instead of using die() or
SignalHandlerForShutdownRequest().
In contrast src/test/modules/worker_spi/worker_spi.c looks plain wrong.
Especially since it's quoted as an example of how to do things right. It
won't respond to SIGTERM at all while it's executing a query from its
queue, no matter how long that query takes or whether it blocks. It can
inhibit even postmaster shutdown as a result.
I was going to lob off a quick patch to fix this by making both use
quickdie() for SIGQUIT and die() for SIGTERM, but after reading for a bit
I'm no longer sure what the right choice even is. I'd welcome some opinions.
The problem is that interrupt.c and interrupt.h actually define and
recommend different and simpler handlers for these jobs - ones that don't
actually work properly for code that calls into Pg core and might rely on
CHECK_FOR_INTERRUPTS(), InterruptPending and ProcDiePending to properly
respect SIGTERM.
And to add to the confusion the bgworker infra adds its own different
default SIGTERM handler bgworker_die() that's weirdly in-between the
interrupt.c and postgres.c signal handling.
So I'm no longer sure how the example code should even be fixed. I'm not
convinced having everyone use die() and quickdie() is good given they currently
seem to be assumed to be mainly for user backends. Maybe we should move
them to interrupt.c along with CHECK_FOR_INTERRUPTS(), ProcessInterrupts,
etc and document them as for all database-connected or shmem-connected
backends to use.
So in the medium term, interrupt.c's SignalHandlerForShutdownRequest() and
SignalHandlerForCrashExit() should be combined with die() and quickdie(),
integrating properly with CHECK_FOR_INTERRUPTS(), ProcessInterrupts(), etc.
We can add a hook we call before we proc_exit() in response to
ProcDiePending so backends can choose to mask it if there's a period during
which they wish to defer responding to SIGTERM, but by default everything
will respect SIGTERM -> die() sets ProcDiePending -> CHECK_FOR_INTERRUPTS()
-> ProcessInterrupts() -> proc_exit() . interrupt.c's
SignalHandlerForCrashExit() and SignalHandlerForShutdownRequest() become
deprecated/legacy. We add a separate temporary handler that's installed by
init.c for early SIGQUIT handling but document it as to be replaced after
backends start properly. We'd delete the bgw-specific signal handlers and
install die() and quickdie() instead during StartBackgroundWorker - at least
if the bgw is connecting to shmem or a database. interrupt.c's
HandleMainLoopInterrupts() could be static inlined, and adopted in the
bgworker examples and all the other places that currently do
ConfigReloadPending / ProcessConfigFile() etc themselves.
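For illustration, adopting that in a bgworker main loop would look something
like this (sketch only; the exact wait events and timeout are placeholders):
pqsignal(SIGHUP, SignalHandlerForConfigReload);
pqsignal(SIGTERM, SignalHandlerForShutdownRequest);
BackgroundWorkerUnblockSignals();

for (;;)
{
    /* ... do one unit of work ... */

    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     10 * 1000L, PG_WAIT_EXTENSION);
    ResetLatch(MyLatch);

    /* handles ConfigReloadPending and ShutdownRequestPending */
    HandleMainLoopInterrupts();
}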
It wouldn't be a clean sweep of consistent signal handling, given all the
funky stuff we have in the checkpointer, walsender, etc. But I think it
might help...
(And maybe I could even combine the various am_foo and is_bar globals we
use to identify different sorts of backend, while doing such cleanups).
On 2020/10/07 11:30, Bharath Rupireddy wrote:
On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
+1 Or it's also ok to make each patch separately.
Anyway, thanks!
Thanks. +1 to have separate patches. I will post two separate patches
for autoprewarm and WalRcvShutdownHandler for further review. The
other two patches can be for startup.c and syslogger.c.
I'm attaching all the 4 patches here together.
1. v1-use-standard-SIGHUP-and-SIGTERM-handlers-in-autoprewarm-process.patch
-- This is the patch initially sent in this mail by me, I just
renamed it.
2. v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
-- This is the patch proposed by Fujii Masao, I just renamed it.
3. v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
4. v1-use-standard-SIGTERM-handler-in-walreceiver-process.patch
Please consider these patches for further review.
Thanks for the patches! They look good to me. So barring any objections,
I will commit them one by one.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/10/29 15:21, Fujii Masao wrote:
On 2020/10/07 11:30, Bharath Rupireddy wrote:
On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
+1 Or it's also ok to make each patch separately.
Anyway, thanks!
Thanks. +1 to have separate patches. I will post two separate patches
for autoprewarm and WalRcvShutdownHandler for further review. The
other two patches can be for startup.c and syslogger.c.
I'm attaching all the 4 patches here together.
1. v1-use-standard-SIGHUP-and-SIGTERM-handlers-in-autoprewarm-process.patch
-- This is the patch initially sent in this mail by me, I just
renamed it.
2. v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
-- This is the patch proposed by Fujii Masao, I just renamed it.
3. v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
4. v1-use-standard-SIGTERM-handler-in-walreceiver-process.patch
Please consider these patches for further review.
Thanks for the patches! They look good to me. So barring any objections,
I will commit them one by one.
I pushed the following two patches.
- v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
- v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
Regarding other two patches, I think that it's better to use MyLatch
rather than MyProc->procLatch or walrcv->latch in WaitLatch() and
ResetLatch(), like other code does. Attached are the updated versions
of the patches. Thought?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
On Wed, Nov 4, 2020 at 2:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Regarding other two patches, I think that it's better to use MyLatch
rather than MyProc->procLatch or walrcv->latch in WaitLatch() and
ResetLatch(), like other code does. Attached are the updated versions
of the patches. Thought?
+1 for replacing MyProc->procLatch with MyLatch in the autoprewarm
module, and the patch looks good to me.
I'm not quite sure about replacing all the places in the walreceiver
process; for instance, in WalRcvForceReply() we are using a spinlock to
acquire the latch pointer. Others may have better thoughts on
this.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
At Wed, 4 Nov 2020 21:16:29 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
On Wed, Nov 4, 2020 at 2:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Regarding other two patches, I think that it's better to use MyLatch
rather than MyProc->procLatch or walrcv->latch in WaitLatch() and
ResetLatch(), like other code does. Attached are the updated versions
of the patches. Thought?
+1 for replacing MyProc->procLatch with MyLatch in the autoprewarm
module, and the patch looks good to me.
Looks good to me, too.
I'm not quite sure to replace all the places in the walreceiver
process, for instance in WalRcvForceReply() we are using spinlock to
acquire the latch pointer. Others may have better thoughts on
this.
The SIGTERM part looks good. The only difference between
WalRcvSigHupHandler and SignalHandlerForConfigReload is whether latch
is set or not. I don't think it's a problem that config-reload causes
an extra wakeup. Couldn't we do the same thing for SIGHUP?
We might even be able to reload config file in
ProcessWalRcvInterrupts().
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020/11/05 12:12, Kyotaro Horiguchi wrote:
At Wed, 4 Nov 2020 21:16:29 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
On Wed, Nov 4, 2020 at 2:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Regarding other two patches, I think that it's better to use MyLatch
rather than MyProc->procLatch or walrcv->latch in WaitLatch() and
ResetLatch(), like other code does. Attached are the updated versions
of the patches. Thought?
+1 for replacing MyProc->procLatch with MyLatch in the autoprewarm
module, and the patch looks good to me.
Looks good to me, too.
Thanks for the review! I pushed the patch.
I'm not quite sure to replace all the places in the walreceiver
process, for instance in WalRcvForceReply() we are using spinlock to
acquire the latch pointer. Others may have better thoughts on
this.
The SIGTERM part looks good. The only difference between
WalRcvSigHupHandler and SignalHandlerForConfigReload is whether latch
is set or not. I don't think it's a problem that config-reload causes
an extra wakeup. Couldn't we do the same thing for SIGHUP?
I agree that we can use the standard SIGHUP handler even in walreceiver.
I'm not sure why SetLatch() was not called in walreceiver's SIGHUP
handler so far.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Fri, Nov 6, 2020 at 11:00 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
I'm not quite sure to replace all the places in the walreceiver
process, for instance in WalRcvForceReply() we are using spinlock to
acquire the latch pointer. Others may have better thoughts on
this.
The SIGTERM part looks good. The only difference between
WalRcvSigHupHandler and SignalHandlerForConfigReload is whether latch
is set or not. I don't think it's a problem that config-reload causes
an extra wakeup. Couldn't we do the same thing for SIGHUP?
I agree that we can use the standard SIGHUP handler even in walreceiver.
I'm not sure why SetLatch() was not called in walreceiver's SIGHUP
handler so far.
The main reason for having SetLatch() in
SignalHandlerForConfigReload() is to wake up the calling process if
waiting in WaitLatchOrSocket() or WaitLatch() and reload the new
config file and use the reloaded config variables. Maybe we should
give some thought to the scenarios in which the walreceiver process
waits, and what happens in case the latch is set when SIGHUP is
received.
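To illustrate, the pattern that SetLatch() in the handler is meant to wake up
promptly is roughly the following (a generic sketch, not walreceiver's actual
code; naptime_ms is just a placeholder):
for (;;)
{
    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     naptime_ms, PG_WAIT_EXTENSION);
    ResetLatch(MyLatch);

    if (ConfigReloadPending)
    {
        /* without SetLatch(MyLatch) in the SIGHUP handler, we would only
         * get here at the next timeout or unrelated wakeup */
        ConfigReloadPending = false;
        ProcessConfigFile(PGC_SIGHUP);
    }

    /* ... the process's real work ... */
}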
And also, I think it's worth having a look at the commit 40f908bdcdc7
that introduced WalRcvSigHupHandler() and 597a87ccc that replaced
custom latch with procLatch.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
At Sat, 7 Nov 2020 19:31:21 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
The main reason for having SetLatch() in
SignalHandlerForConfigReload() is to wake up the calling process if
waiting in WaitLatchOrSocket() or WaitLatch() and reload the new
config file and use the reloaded config variables. Maybe we should
give a thought on the scenarios in which the walreceiver process
waits, and what happens in case the latch is set when SIGHUP is
received.
The difference is whether the config file is processed at the next
wakeup (by force-reply-request or SIGTERM) of walreceiver or
immediately. If server-reload happened frequently, say, several times
per second(?), we should consider to reduce the useless reloading, but
actually that's not the case.
And also, I think it's worth having a look at the commit 40f908bdcdc7
that introduced WalRcvSigHupHandler() and 597a87ccc that replaced
custom latch with procLatch.
At the time of the first patch, PostgreSQL processes used arbitrarily
implemented SIGHUP handlers of their own, so it is natural that
walreceiver used its own one.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020/11/10 16:17, Kyotaro Horiguchi wrote:
At Sat, 7 Nov 2020 19:31:21 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
The main reason for having SetLatch() in
SignalHandlerForConfigReload() is to wake up the calling process if
waiting in WaitLatchOrSocket() or WaitLatch() and reload the new
config file and use the reloaded config variables. Maybe we should
give a thought on the scenarios in which the walreceiver process
waits, and what happens in case the latch is set when SIGHUP is
received.
The difference is whether the config file is processed at the next
wakeup (by force-reply-request or SIGTERM) of walreceiver or
immediately. If server-reload happened frequently, say, several times
per second(?), we should consider to reduce the useless reloading, but
actually that's not the case.
So, attached is the patch that makes walreceiver use both standard
SIGTERM and SIGHUP handlers. So far I've not found any actual
issues caused by making walreceiver use the standard SIGHUP handler.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
On Tue, Nov 10, 2020 at 3:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
The main reason for having SetLatch() in
SignalHandlerForConfigReload() is to wake up the calling process if
waiting in WaitLatchOrSocket() or WaitLatch() and reload the new
config file and use the reloaded config variables. Maybe we should
give a thought on the scenarios in which the walreceiver process
waits, and what happens in case the latch is set when SIGHUP is
received.
The difference is whether the config file is processed at the next
wakeup (by force-reply-request or SIGTERM) of walreceiver or
immediately. If server-reload happened frequently, say, several times
per second(?), we should consider to reduce the useless reloading, but
actually that's not the case.
So, attached is the patch that makes walreceiver use both standard
SIGTERM and SIGHUP handlers. Currently I've not found any actual
issues by making walreceiver use standard SIGHUP handler, yet.
I think it makes sense to replace WalRcv->latch with
MyProc->procLatch (as they point to the same latch) in the functions
that are called in the walreceiver process. However, we are using
walrcv->latch with a spinlock, say in WalRcvForceReply() and
RequestXLogStreaming(), both of which are called from the startup process. See
commit 45f9d08684, which made the access to the walreceiver's
latch (WalRcv->latch) by the other process (startup) spinlock protected.
And it looks like, in general, it's standard practice to set the latch to
wake up the process if it is waiting when a SIGHUP signal is received, and
reload the relevant config variables.
Going by the above analysis, the v3 patch looks good to me.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Hi,
Attaching a patch that replaces custom signal handlers for SIGHUP and
SIGTERM in worker_spi.c.
Thoughts?
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
On 2020/11/10 21:30, Bharath Rupireddy wrote:
On Tue, Nov 10, 2020 at 3:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
The main reason for having SetLatch() in
SignalHandlerForConfigReload() is to wake up the calling process if
waiting in WaitLatchOrSocket() or WaitLatch() and reload the new
config file and use the reloaded config variables. Maybe we should
give a thought on the scenarios in which the walreceiver process
waits, and what happens in case the latch is set when SIGHUP is
received.
The difference is whether the config file is processed at the next
wakeup (by force-reply-request or SIGTERM) of walreceiver or
immediately. If server-reload happened frequently, say, several times
per second(?), we should consider to reduce the useless reloading, but
actually that's not the case.
So, attached is the patch that makes walreceiver use both standard
SIGTERM and SIGHUP handlers. Currently I've not found any actual
issues by making walreceiver use standard SIGHUP handler, yet.
I think it makes sense to replace WalRcv->latch with
MyProc->procLatch(as they point to the same latch) in the functions
that are called in the walreceiver process. However, we are using
walrcv->latch with spinlock, say in WalRcvForceReply() and
RequestXLogStreaming() both are called from the startup process. See
commit 45f9d08684, that made the access to the walreceiver's
latch (WalRcv->latch) by the other process (startup) spinlock protected.
And it looks like, in general, it's standard practice to set the latch to
wake up the process if waiting in case a SIGHUP signal is received and
reload the relevant config variables.
Going by the above analysis, the v3 patch looks good to me.
Thanks for the analysis! I pushed the patch.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Thu, Nov 12, 2020 at 10:06 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
Thanks for the analysis! I pushed the patch.
Thanks! Since we are replacing custom SIGHUP and SIGTERM handlers with
standard ones, how about doing the same thing in worker_spi.c? I
posted a patch previously [1] in this mail thread. If it makes sense,
please review it.
[1]: /messages/by-id/CALj2ACXDEZhAFOTDcqO9cFSRvrFLyYOnPKrcA1UG4uZn9hUAVg@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 2020/11/13 20:24, Bharath Rupireddy wrote:
On Thu, Nov 12, 2020 at 10:06 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
Thanks for the analysis! I pushed the patch.
Thanks! Since we are replacing custom SIGHUP and SIGTERM handlers with
standard ones, how about doing the same thing in worker_spi.c? I
posted a patch previously [1] in this mail thread. If it makes sense,
please review it.
I agree with simplifying the worker_spi code by making it use the standard
signal handlers. But as far as I read Craig Ringer's comments upthread
about worker_spi, it's not enough to just replace the dedicated SIGTERM
handler with the standard one. ISTM that worker_spi should probably
use a signal handler that handles InterruptPending and ProcDiePending
like die() does. What do you think about Craig Ringer's comments?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Thanks Craig!
On Fri, Oct 23, 2020 at 9:37 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
src/test/modules/test_shm_mq/worker.c appears to do the right thing the wrong way - it has its own custom handler instead of using die() or SignalHandlerForShutdownRequest().
handle_sigterm() and die() are similar except that die() has extra
handling(below) for exiting immediately when waiting for input/command
from the client.
/*
* If we're in single user mode, we want to quit immediately - we can't
* rely on latches as they wouldn't work when stdin/stdout is a file.
* Rather ugly, but it's unlikely to be worthwhile to invest much more
* effort just for the benefit of single user mode.
*/
if (DoingCommandRead && whereToSendOutput != DestRemote)
ProcessInterrupts();
Having this extra handling is correct for normal backends: as they can
connect directly to clients for reading input commands, the above if
condition may become true and ProcessInterrupts() may be called to
handle the SIGTERM immediately.
Note that DoingCommandRead can never be true in bg workers as it's
set to true only in the normal backend's PostgresMain(). If we use
die() (instead of custom SIGTERM handlers such as handle_sigterm()) in
a bg worker/non-backend process, there is no chance that the above
part of the code gets hit and the interrupts are processed
immediately. And here are the processes that use die()
instead of their own custom handlers: parallel workers, autoprewarm
worker, autovacuum worker, logical replication launcher and apply
worker, wal sender process.
I think we can also use die() instead of handle_sigterm() in
test_shm_mq/worker.c.
I attached a patch for this change.
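As a sketch of what the change amounts to (the attached patch is the
authoritative version):
/* test_shm_mq/worker.c */
pqsignal(SIGTERM, die);        /* instead of the local handle_sigterm() */
BackgroundWorkerUnblockSignals();
/* The worker already relies on CHECK_FOR_INTERRUPTS() (see the comment in
 * worker.c), so die() setting ProcDiePending is enough for a prompt exit. */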
In contrast src/test/modules/worker_spi/worker_spi.c looks plain wrong. Especially since it's quoted as an example of how to do things right. It won't respond to SIGTERM at all while it's executing a query from its queue, no matter how long that query takes or whether it blocks. It can inhibit even postmaster shutdown as a result.
Postmaster sends SIGTERM to all children (backends and bgworkers) for
both smart shutdown (wait for children to end their work, then shut
down) and fast shutdown (rollback active transactions and shut down
when they are gone). For immediate shutdown, SIGQUIT is sent to the
children (see pmdie()).
For each bg worker (so also for worker_spi.c),
SignalHandlerForCrashExit() is set as the SIGQUIT handler in
InitPostmasterChild(), which does nothing but exit immediately. We
cannot use quickdie() here because as a bg worker we don't have
to/cannot send anything to the client. We are good with
SignalHandlerForCrashExit() as with all other bg workers.
Yes, with worker_spi_sigterm()/SignalHandlerForShutdownRequest(),
sometimes (as explained above) the worker_spi worker may not respond to
SIGTERM. I think we should have die() as the SIGTERM handler here (as
with normal backends and parallel workers).
Attaching another patch that has replaced custom SIGTERM handler
worker_spi_sigterm() with die() and custom SIGHUP handler
worker_spi_sighup() with standard SignalHandlerForConfigReload().
I was going to lob off a quick patch to fix this by making both use quickdie() for SIGQUIT and die() for SIGTERM, but after reading for a bit I'm no longer sure what the right choice even is. I'd welcome some opinions.
We cannot use quickdie() here because as a bg worker we don't have
to/cannot send anything to the client. We are good with
SignalHandlerForCrashExit() as with all other bg workers.
Thoughts?
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
On 2020/11/17 21:18, Bharath Rupireddy wrote:
Thanks Craig!
On Fri, Oct 23, 2020 at 9:37 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
src/test/modules/test_shm_mq/worker.c appears to do the right thing the wrong way - it has its own custom handler instead of using die() or SignalHandlerForShutdownRequest().
handle_sigterm() and die() are similar except that die() has extra
handling(below) for exiting immediately when waiting for input/command
from the client.
/*
* If we're in single user mode, we want to quit immediately - we can't
* rely on latches as they wouldn't work when stdin/stdout is a file.
* Rather ugly, but it's unlikely to be worthwhile to invest much more
* effort just for the benefit of single user mode.
*/
if (DoingCommandRead && whereToSendOutput != DestRemote)
ProcessInterrupts();
Having this extra handling is correct for normal backends as they can
connect directly to clients for reading input commands, the above if
condition may become true and ProcessInterrupts() may be called to
handle the SIGTERM immediately.
Note that DoingCommandRead can never be true in bg workers as it's
being set to true only in normal backend PostgresMain(). If we use
die()(instead of custom SIGTERM handlers such as handle_sigterm()) in
a bg worker/non-backend process, there are no chances that the above
part of the code gets hit and the interrupts are processed
immediately. And also here are the bg worker process that use die()
instead of their own custom handlers: parallel workers, autoprewarm
worker, autovacuum worker, logical replication launcher and apply
worker, wal sender process.
I think we can also use die() instead of handle_sigterm() in
test_shm_mq/worker.c.
I attached a patch for this change.
Thanks for the patch! It looks good to me.
BTW, I tried to find the past discussion about why die() was not used,
but I could not yet.
In contrast src/test/modules/worker_spi/worker_spi.c looks plain wrong. Especially since it's quoted as an example of how to do things right. It won't respond to SIGTERM at all while it's executing a query from its queue, no matter how long that query takes or whether it blocks. It can inhibit even postmaster shutdown as a result.
Postmaster sends SIGTERM to all children(backends and bgworkers) for
both smart shutdown(wait for children to end their work, then shut
down.) and fast shutdown(rollback active transactions and shutdown
when they are gone.) For immediate shutdown SIGQUIT is sent to
children (see pmdie()).
For each bg worker (so also for worker_spi.c),
SignalHandlerForCrashExit() is set as SIGQUIT handler in
InitPostmasterChild(), which does nothing but exits immediately. We
can not use quickdie() here because as a bg worker we don't have
to/can not send anything to client. We are good with
SignalHandlerForCrashExit() as with all other bg workers.
Yes, if having worker_spi_sigterm/SignalHandlerForShutdownRequest,
sometimes(as explained above) the worker_spi worker may not respond to
SIGTERM. I think we should be having die() as SIGTERM handler here (as
with normal backend and parallel workers).
Attaching another patch that has replaced custom SIGTERM handler
worker_spi_sigterm() with die() and custom SIGHUP handler
worker_spi_sighup() with standard SignalHandlerForConfigReload().
This patch also looks good to me.
I was going to lob off a quick patch to fix this by making both use quickdie() for SIGQUIT and die() for SIGTERM, but after reading for a bit I'm no longer sure what the right choice even is. I'd welcome some opinions.
We can not use quickdie() here because as a bg worker we don't have
to/can not send anything to client. We are good with
SignalHandlerForCrashExit() as with all other bg workers.
Thoughts?
I think you're right.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Wed, Nov 18, 2020 at 5:52 PM Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:
handle_sigterm() and die() are similar except that die() has extra
handling(below) for exiting immediately when waiting for input/command
from the client.
/*
 * If we're in single user mode, we want to quit immediately - we can't
 * rely on latches as they wouldn't work when stdin/stdout is a file.
 * Rather ugly, but it's unlikely to be worthwhile to invest much more
 * effort just for the benefit of single user mode.
 */
if (DoingCommandRead && whereToSendOutput != DestRemote)
    ProcessInterrupts();
Having this extra handling is correct for normal backends as they can
connect directly to clients for reading input commands, the above if
condition may become true and ProcessInterrupts() may be called to
handle the SIGTERM immediately.
Note that DoingCommandRead can never be true in bg workers as it's
being set to true only in normal backend PostgresMain(). If we use
die()(instead of custom SIGTERM handlers such as handle_sigterm()) in
a bg worker/non-backend process, there are no chances that the above
part of the code gets hit and the interrupts are processed
immediately. And also here are the bg worker process that use die()
instead of their own custom handlers: parallel workers, autoprewarm
worker, autovacuum worker, logical replication launcher and apply
worker, wal sender process.
I think we can also use die() instead of handle_sigterm() in
test_shm_mq/worker.c.
I attached a patch for this change.
Thanks for the patch! It looks good to me.
Thanks.
BTW, I tried to find the past discussion about why die() was not used,
but I could not yet.
I think the commit 2505ce0 that altered the comment has something:
handle_sigterm() is stripped down to remove the if (DoingCommandRead &&
whereToSendOutput != DestRemote) part from die(), since the test_shm_mq bg
worker, or for that matter any bg worker, does not have an equivalent of the
regular backend's command-read loop.
--- a/src/test/modules/test_shm_mq/worker.c
+++ b/src/test/modules/test_shm_mq/worker.c
@@ -60,13 +60,9 @@ test_shm_mq_main(Datum main_arg)
 *
 * We want CHECK_FOR_INTERRUPTS() to kill off this worker process just as
 * it would a normal user backend. To make that happen, we establish a
- * signal handler that is a stripped-down version of die(). We don't have
- * any equivalent of the backend's command-read loop, where interrupts can
- * be processed immediately, so make sure ImmediateInterruptOK is turned
- * off.
+ * signal handler that is a stripped-down version of die().
 */
 pqsignal(SIGTERM, handle_sigterm);
- ImmediateInterruptOK = false;
 BackgroundWorkerUnblockSignals();
Attaching another patch that has replaced custom SIGTERM handler
worker_spi_sigterm() with die() and custom SIGHUP handler
worker_spi_sighup() with standard SignalHandlerForConfigReload().
This patch also looks good to me.
Thanks.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 2020/11/20 19:33, Bharath Rupireddy wrote:
On Wed, Nov 18, 2020 at 5:52 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>> wrote:
handle_sigterm() and die() are similar except that die() has extra
handling(below) for exiting immediately when waiting for input/command
from the client.
/*
 * If we're in single user mode, we want to quit immediately - we can't
 * rely on latches as they wouldn't work when stdin/stdout is a file.
 * Rather ugly, but it's unlikely to be worthwhile to invest much more
 * effort just for the benefit of single user mode.
 */
if (DoingCommandRead && whereToSendOutput != DestRemote)
    ProcessInterrupts();
Having this extra handling is correct for normal backends as they can
connect directly to clients for reading input commands, the above if
condition may become true and ProcessInterrupts() may be called to
handle the SIGTERM immediately.
Note that DoingCommandRead can never be true in bg workers as it's
being set to true only in normal backend PostgresMain(). If we use
die()(instead of custom SIGTERM handlers such as handle_sigterm()) in
a bg worker/non-backend process, there are no chances that the above
part of the code gets hit and the interrupts are processed
immediately. And also here are the bg worker process that use die()
instead of their own custom handlers: parallel workers, autoprewarm
worker, autovacuum worker, logical replication launcher and apply
worker, wal sender process.
I think we can also use die() instead of handle_sigterm() in
test_shm_mq/worker.c.
I attached a patch for this change.
Thanks for the patch! It looks good to me.
Thanks.
When I read the patch again, I found that, with the patch, the shutdown
of worker_spi causes the following FATAL message to be reported.
FATAL: terminating connection due to administrator command
Isn't this message confusing because it's not a connection? If so,
we need to update ProcessInterrupts() so that the proper message is
reported like other bgworkers do.
BTW, though this is not directly related to this topic, it's strange to me
that the normal shutdown of a bgworker causes a FATAL message to be emitted,
plus the message "background worker ... exited with exit code 1",
at the normal shutdown. I've sometimes heard users complain that
it's confusing to report that message for the logical replication launcher
at normal shutdown. Maybe the mechanism to shut down a bgworker
normally has not been implemented well yet.
Without the patch, since worker_spi cannot handle the shutdown request
promptly while it's executing the backend code, maybe we can say this is
a bug. But worker_spi is just an example of the code, so I'm not thinking of
back-patching this for now. Thought?
BTW, I tried to find the past discussion about why die() was not used,
but I could not yet.
I think the commit 2505ce0 that altered the comment has something:
Thanks for the info! Will check that.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Wed, Nov 25, 2020 at 3:29 PM Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:
On 2020/11/20 19:33, Bharath Rupireddy wrote:
On Wed, Nov 18, 2020 at 5:52 PM Fujii Masao <masao.fujii@oss.nttdata.com
<mailto:masao.fujii@oss.nttdata.com>> wrote:
handle_sigterm() and die() are similar except that die() has extra
handling (below) for exiting immediately when waiting for input/command
from the client.
/*
 * If we're in single user mode, we want to quit immediately - we can't
 * rely on latches as they wouldn't work when stdin/stdout is a file.
 * Rather ugly, but it's unlikely to be worthwhile to invest much more
 * effort just for the benefit of single user mode.
 */
if (DoingCommandRead && whereToSendOutput != DestRemote)
    ProcessInterrupts();
Having this extra handling is correct for normal backends as they can
connect directly to clients for reading input commands, the above if
condition may become true and ProcessInterrupts() may be called to
handle the SIGTERM immediately.
Note that DoingCommandRead can never be true in bg workers as it's
being set to true only in normal backend PostgresMain(). If we use
die() (instead of custom SIGTERM handlers such as handle_sigterm()) in
a bg worker/non-backend process, there are no chances that the above
part of the code gets hit and the interrupts are processed
immediately. And also here are the bg worker processes that use die()
instead of their own custom handlers: parallel workers, autoprewarm
worker, autovacuum worker, logical replication launcher and apply
worker, wal sender process.
I think we can also use die() instead of handle_sigterm() in
test_shm_mq/worker.c.
I attached a patch for this change.
Thanks for the patch! It looks good to me.
Thanks.
When I read the patch again, I found that, with the patch, the shutdown
of worker_spi causes the following FATAL message to be reported.
FATAL: terminating connection due to administrator command
Isn't this message confusing because it's not a connection? If so,
we need to update ProcessInterrupts() so that the proper message is
reported like other bgworkers do.
This is also true for all the bgworkers that use the die() handler. How
about doing it the way bgworker_die() does in ProcessInterrupts()? This
would give meaningful information. Thoughts? If okay, I can make a separate
patch.
else if (RecoveryConflictPending)
{
    /* Currently there is only one non-retryable recovery conflict */
    Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE);
    pgstat_report_recovery_conflict(RecoveryConflictReason);
    ereport(FATAL,
            (errcode(ERRCODE_DATABASE_DROPPED),
             errmsg("terminating connection due to conflict with recovery"),
             errdetail_recovery_conflict()));
}
else if (IsBackgroundWorker)        /* proposed new branch */
{
    ereport(FATAL,
            (errcode(ERRCODE_ADMIN_SHUTDOWN),
             errmsg("terminating background worker \"%s\" due to administrator command",
                    MyBgworkerEntry->bgw_type)));
}
else
    ereport(FATAL,
            (errcode(ERRCODE_ADMIN_SHUTDOWN),
             errmsg("terminating connection due to administrator command")));
BTW, though this is not directly related to this topic, it's strange to me
that the normal shutdown of bgworker causes to emit FATAL message
and the message "background worker ... exited with exit code 1"
at the normal shutdown. I've heard sometimes that users complained that
it's confusing to report that message for logical replication launcher
at the normal shutdown. Maybe the mechanism to shutdown bgworker
normally seems to have not been implemented well yet.
What do you mean by normal shutdown of bgworker? Is it that bgworker has
exited successfully with exit code 0 or for some reason with exit code
other than 0? Or is it when the postmaster is shut down normally?
IIUC, when a bgworker exits either normally with an exit code 0 or other
than 0, then CleanupBackgroundWorker() is called in the postmaster, a
message (like below) is prepared, and LogChildExit() is called with
either DEBUG1 or LOG level, and for instance the message you specified gets
printed: "background worker ... exited with exit code 1". I have not seen a
FATAL message similar to "background worker ... exited with exit code 1" at
the normal shutdown.
snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""),
rw->rw_worker.bgw_type);
LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG, namebuf, pid,
exitstatus);
Am I missing something?
If my analysis is right, then for instance, when the logical replication
launcher exits, it logs "background worker "logical replication
launcher" exited with exit code X" with either DEBUG1 or LOG level but not
with FATAL level.
Without the patch, since worker_spi cannot handle the shutdown request
promptly while it's executing the backend code, maybe we can say this is
a bug. But worker_spi is just an example of the code, so I'm not thinking of
back-patching this for now. Thought?
+1 to not backpatch it.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 2020/11/25 23:38, Bharath Rupireddy wrote:
On Wed, Nov 25, 2020 at 3:29 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>> wrote:
On 2020/11/20 19:33, Bharath Rupireddy wrote:
On Wed, Nov 18, 2020 at 5:52 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com> <mailto:masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>>> wrote:
 >
 > > handle_sigterm() and die() are similar except that die() has extra
 > > handling(below) for exiting immediately when waiting for input/command
 > > from the client.
 > >    /*
 > >    * If we're in single user mode, we want to quit immediately - we can't
 > >    * rely on latches as they wouldn't work when stdin/stdout is a file.
 > >    * Rather ugly, but it's unlikely to be worthwhile to invest much more
 > >    * effort just for the benefit of single user mode.
 > >    */
 > >    if (DoingCommandRead && whereToSendOutput != DestRemote)
 > >      ProcessInterrupts();
 > >
 > > Having this extra handling is correct for normal backends as they can
 > > connect directly to clients for reading input commands, the above if
 > > condition may become true and ProcessInterrupts() may be called to
 > > handle the SIGTERM immediately.
 > >
 > > Note that DoingCommandRead can never be true in bg workers as it's
 > > being set to true only in normal backend PostgresMain(). If we use
 > > die()(instead of custom SIGTERM handlers such as handle_sigterm()) in
 > > a bg worker/non-backend process, there are no chances that the above
 > > part of the code gets hit and the interrupts are processed
 > > immediately. And also here are the bg worker process that use die()
 > > instead of their own custom handlers: parallel workers, autoprewarm
 > > worker, autovacuum worker, logical replication launcher and apply
 > > worker, wal sender process
 > >
 > > I think we can also use die() instead of handle_sigterm() in
 > > test_shm_mq/worker.c.
 > >
 > > I attached a patch for this change.
 >
 > Thanks for the patch! It looks good to me.
 >Thanks.
When I read the patch again, I found that, with the patch, the shutdown
of worker_spi causes the following FATAL message to be reported.
FATAL: terminating connection due to administrator command
Isn't this message confusing because it's not a connection? If so,
we need to update ProcessInterrupts() so that the proper message is
reported like other bgworkers do.
This is also true for all the bgworkers that use the die() handler. How about doing it the way bgworker_die() does in ProcessInterrupts()? This would give meaningful information. Thoughts? If okay, I can make a separate patch.
+1
Thanks!
else if (RecoveryConflictPending)
{
    /* Currently there is only one non-retryable recovery conflict */
    Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE);
    pgstat_report_recovery_conflict(RecoveryConflictReason);
    ereport(FATAL,
            (errcode(ERRCODE_DATABASE_DROPPED),
             errmsg("terminating connection due to conflict with recovery"),
             errdetail_recovery_conflict()));
}
else if (IsBackgroundWorker)        /* proposed new branch */
{
    ereport(FATAL,
            (errcode(ERRCODE_ADMIN_SHUTDOWN),
             errmsg("terminating background worker \"%s\" due to administrator command",
                    MyBgworkerEntry->bgw_type)));
}
else
    ereport(FATAL,
            (errcode(ERRCODE_ADMIN_SHUTDOWN),
             errmsg("terminating connection due to administrator command")));
BTW, though this is not directly related to this topic, it's strange to me
that the normal shutdown of bgworker causes to emit FATAL message
and the message "background worker ... exited with exit code 1"
at the normal shutdown. I've heard sometimes that users complained that
it's confusing to report that message for logical replication launcher
at the normal shutdown. Maybe the mechanism to shutdown bgworker
normally seems to have not been implemented well yet.
What do you mean by normal shutdown of bgworker? Is it that the bgworker has exited successfully with exit code 0 or for some reason with exit code other than 0? Or is it when the postmaster is shut down normally?
IIUC, when a bgworker exits either normally with an exit code 0 or other than 0, then CleanupBackgroundWorker() is called in the postmaster, a message (like below) is prepared, and LogChildExit() is called with either DEBUG1 or LOG level, and for instance the message you specified gets printed: "background worker ... exited with exit code 1". I have not seen a FATAL message similar to "background worker ... exited with exit code 1" at the normal shutdown.
snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""), rw->rw_worker.bgw_type);
LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG, namebuf, pid, exitstatus);
Am I missing something?
If my analysis is right, then for instance, when the logical replication launcher exits, it logs "background worker "logical replication launcher" exited with exit code X" with either DEBUG1 or LOG level but not with FATAL level.
Yes, it's not at FATAL level. But that message looks like it's
reporting an error message. This is why we have sometimes received
complaints (e.g., [1][2]) about that message.
[1]: /messages/by-id/CAKFQuwZHcZnnwoFaXX2YKNT3KhaNr_+vd95Oo=f045zfn7Tetw@mail.gmail.com
[2]: /messages/by-id/CAEepm=1c3hG1g3iKYwfa_PDsT49RBaBJsaot_qNhPSCXBm9rzA@mail.gmail.com
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Thu, Nov 26, 2020 at 7:37 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
What do you mean by normal shutdown of bgworker? Is it that bgworker has exited successfully with exit code 0 or for some reason with exit code other than 0? Or is it when the postmaster is shutdown normally?
IIUC, when a bgworker exists either normally with an exit code 0 or other than 0, then CleanupBackgroundWorker() is called in postmaster, a message(like below) is prepared, and the LogChildExit() is called with either DEBUG1 or LOG level and for instance the message you specified gets printed "background worker ... exited with exit code 1". I have not seen a FATAL message similar to "background worker ... exited with exit code 1" at the normal shutdown.
snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""), rw->rw_worker.bgw_type);
LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG, namebuf, pid, exitstatus);
Am I missing something?
If my analysis is right, then for instance, when a logical replication launcher is exited, it logs "background worker "logical replication launcher" exited with exit code X" with either DEBUG1 or LOG level but not with FATAL level.
Yes, it's not at FATAL level. But that message looks like it's
reporting an error message. This is why we have sometimes received
complaints (e.g., [1][2]) about that message.
Oh. Should we do something about it now? Any thoughts? I have not fully
read through the threads specified; I will go through them later.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 2020/11/27 0:15, Bharath Rupireddy wrote:
On Thu, Nov 26, 2020 at 7:37 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
What do you mean by normal shutdown of bgworker? Is it that bgworker has exited successfully with exit code 0 or for some reason with exit code other than 0? Or is it when the postmaster is shutdown normally?
IIUC, when a bgworker exists either normally with an exit code 0 or other than 0, then CleanupBackgroundWorker() is called in postmaster, a message(like below) is prepared, and the LogChildExit() is called with either DEBUG1 or LOG level and for instance the message you specified gets printed "background worker ... exited with exit code 1". I have not seen a FATAL message similar to "background worker ... exited with exit code 1" at the normal shutdown.
snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""), rw->rw_worker.bgw_type);
LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG, namebuf, pid, exitstatus);
Am I missing something?
If my analysis is right, then for instance, when a logical replication launcher is exited, it logs "background worker "logical replication launcher" exited with exit code X" with either DEBUG1 or LOG level but not with FATAL level.
Yes, it's not with FATAL level. But that message looks like that it's
reporting error message. This is why we sometimes received
the complaints (e.g., [1][2]) about that message.
Oh. Should we do something about it now?
No. This is not directly related to the issue that we are discussing,
as I told upthread. Of course, it would be better to work on this if we could
easily fix it. But it seems we can't... So please ignore this comment of mine.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Wed, Nov 25, 2020 at 8:08 PM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:
When I read the patch again, I found that, with the patch, the shutdown
of worker_spi causes to report the following FATAL message.
FATAL: terminating connection due to administrator command
Isn't this message confusing because it's not a connection? If so,
we need to update ProcessInterrupts() so that the proper message is
reported like other bgworkers do.
This is also true for all the bgworkers that use the die() handler. How
about doing it the way bgworker_die() does in ProcessInterrupts()? This
would give meaningful information. Thoughts? If okay, I can make a separate
patch.
Attaching the patch that improved the message for bg workers in
ProcessInterrupts(). For instance, now it looks like "FATAL: terminating
background worker "worker_spi" due to administrator command" or "FATAL:
terminating background worker "parallel worker" due to administrator
command" and so on for other bg workers.
Please review the patch.
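Roughly, the change amounts to something like the following in ProcessInterrupts() when a die() request is being serviced (a sketch only, assuming the existing IsBackgroundWorker flag and MyBgworkerEntry->bgw_type; not necessarily the exact patch):

/* Sketch: in ProcessInterrupts(), when ProcDiePending has been set by die() */
if (IsBackgroundWorker)
    ereport(FATAL,
            (errcode(ERRCODE_ADMIN_SHUTDOWN),
             errmsg("terminating background worker \"%s\" due to administrator command",
                    MyBgworkerEntry->bgw_type)));
else
    ereport(FATAL,
            (errcode(ERRCODE_ADMIN_SHUTDOWN),
             errmsg("terminating connection due to administrator command")));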
I'm also mentioning the 2 previous patches posted in [1]/messages/by-id/CALj2ACWWy1YcngpCUn09AsXMfOzwjfNqbVosfoRY0vhhVWhVBw@mail.gmail.com. One of the patch
is for using die() instead of handle_sigterm() in test_shm_mq/worker.c and
another is for replacing custom SIGTERM handler worker_spi_sigterm() with
die() and custom SIGHUP handler worker_spi_sighup() with standard
SignalHandlerForConfigReload()
[1]: /messages/by-id/CALj2ACWWy1YcngpCUn09AsXMfOzwjfNqbVosfoRY0vhhVWhVBw@mail.gmail.com
/messages/by-id/CALj2ACWWy1YcngpCUn09AsXMfOzwjfNqbVosfoRY0vhhVWhVBw@mail.gmail.com
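For reference, the effect of those two patches on a worker's main function is roughly the following (a sketch under the assumption that the worker handles config reload in its main loop; not the exact committed code):

/* Sketch: registering the standard handlers before unblocking signals */
pqsignal(SIGHUP, SignalHandlerForConfigReload); /* sets ConfigReloadPending and the latch */
pqsignal(SIGTERM, die);                         /* lets ProcessInterrupts() handle termination */
BackgroundWorkerUnblockSignals();

/* Sketch: reacting to a pending reload in the worker's main loop */
if (ConfigReloadPending)
{
    ConfigReloadPending = false;
    ProcessConfigFile(PGC_SIGHUP);
}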
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
On 2020/11/27 12:13, Bharath Rupireddy wrote:
On Wed, Nov 25, 2020 at 8:08 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
When I read the patch again, I found that, with the patch, the shutdown
of worker_spi causes to report the following FATAL message.
FATAL: terminating connection due to administrator command
Isn't this message confusing because it's not a connection? If so,
we need to update ProcessInterrupts() so that the proper message is
reported like other bgworkers do.
This is also true for all the bgworker that use the die() handler. How about doing it the way bgworker_die() does in ProcessInterrupts()? This would give meaningful information. Thoughts? If okay, I can make a separate patch.
Attaching the patch that improved the message for bg workers in ProcessInterrupts(). For instance, now it looks like "FATAL: terminating background worker "worker_spi" due to administrator command" or "FATAL: terminating background worker "parallel worker" due to administrator command" and so on for other bg workers.
Please review the patch.
Thanks for the patch! It looks good to me.
I'm also mentioning the 2 previous patches posted in [1]. One of the patch is for using die() instead of handle_sigterm() in test_shm_mq/worker.c and another is for replacing custom SIGTERM handler worker_spi_sigterm() with die() and custom SIGHUP handler worker_spi_sighup() with standard SignalHandlerForConfigReload()
Yeah, I pushed them. Thanks!
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Fri, Nov 27, 2020 at 12:26 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
When I read the patch again, I found that, with the patch, the shutdown
of worker_spi causes to report the following FATAL message.
FATAL: terminating connection due to administrator command
Isn't this message confusing because it's not a connection? If so,
we need to update ProcessInterrupts() so that the proper message is
reported like other bgworkers do.
This is also true for all the bgworker that use the die() handler. How about doing it the way bgworker_die() does in ProcessInterrupts()? This would give meaningful information. Thoughts? If okay, I can make a separate patch.
Attaching the patch that improved the message for bg workers in ProcessInterrupts(). For instance, now it looks like "FATAL: terminating background worker "worker_spi" due to administrator command" or "FATAL: terminating background worker "parallel worker" due to administrator command" and so on for other bg workers.
Please review the patch.
Thanks for the patch! It looks good to me.
Thanks!
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 2020/11/27 16:47, Bharath Rupireddy wrote:
On Fri, Nov 27, 2020 at 12:26 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
When I read the patch again, I found that, with the patch, the shutdown
of worker_spi causes to report the following FATAL message.
FATAL: terminating connection due to administrator command
Isn't this message confusing because it's not a connection? If so,
we need to update ProcessInterrupts() so that the proper message is
reported like other bgworkers do.
This is also true for all the bgworker that use the die() handler. How about doing it the way bgworker_die() does in ProcessInterrupts()? This would give meaningful information. Thoughts? If okay, I can make a separate patch.
Attaching the patch that improved the message for bg workers in ProcessInterrupts(). For instance, now it looks like "FATAL: terminating background worker "worker_spi" due to administrator command" or "FATAL: terminating background worker "parallel worker" due to administrator command" and so on for other bg workers.
Please review the patch.
Thanks for the patch! It looks good to me.
Thanks!
Pushed. Thanks!
Since all the patches proposed in this thread have been committed,
I marked the CF entry as committed.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/11/04 18:06, Fujii Masao wrote:
On 2020/10/29 15:21, Fujii Masao wrote:
On 2020/10/07 11:30, Bharath Rupireddy wrote:
On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
+1 Or it's also ok to make each patch separately.
Anyway, thanks!
Thanks. +1 to have separate patches. I will post two separate patches
for autoprewarm and WalRcvShutdownHandler for further review. The
other two patches can be for startup.c and syslogger.c.
I'm attaching all the 4 patches here together.
1. v1-use-standard-SIGHUP-and-SIGTERM-handlers-in-autoprewarm-process.patch
-- This is the patch initially sent in this mail by me, I just
renamed it.
2. v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
-- This is the patch proposed by Fuji Masao, I just renamed it.
3. v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
4. v1-use-standard-SIGTERM-handler-in-walreceiver-process.patch
Please consider these patches for further review.
Thanks for the patches! They look good to me. So barring any objections,
I will commit them one by one.
I pushed the following two patches.
- v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
- v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
As I told in the other thread [1], I'm thinking to revert this patch
because this change could have a bad side-effect on the startup
process waiting for a recovery conflict.
Before applying the patch, the latch that the startup process
used to wait for a recovery conflict was different from the latch
that the SIGHUP signal handler, the walreceiver process, etc. set to
wake the startup process up. So SIGHUP or the walreceiver didn't
wake the startup process waiting for a recovery conflict unnecessarily.
But the patch got rid of the dedicated latch for signaling
the startup process. This change forced us to use the same latch
both to make the startup process wait and to wake it up, which caused
the SIGHUP signal handler or the walreceiver process to wake the startup
process waiting on the latch for a recovery conflict unnecessarily often.
While waiting for a recovery conflict on a buffer pin, deadlocks need
to be checked at least every deadlock_timeout. But those frequent
wakeups could prevent the deadlock timer from being triggered and
could delay the deadlock checks.
Therefore, I'm thinking to revert commits ac22929a26 and 113d3591b8.
[1]: /messages/by-id/a802f1c0-58d9-bd3f-bc3e-bdad54726855@oss.nttdata.com
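To illustrate why both latches are needed (a simplified sketch based on the description above, not the actual source): recovery wakeups go to a dedicated latch, while the buffer-pin conflict wait uses the process latch, so one cannot spuriously end the other.

/* Sketch: walreceiver, SIGHUP handling, etc. wake up recovery via the dedicated latch */
void
WakeupRecovery(void)
{
    SetLatch(&XLogCtl->recoveryWakeupLatch);
}

/*
 * Sketch: the startup process waits out a buffer-pin conflict on its
 * process latch via ProcWaitForSignal(), relying on the deadlock_timeout
 * timer to fire.  If recovery wakeups used this same latch, SIGHUP or the
 * walreceiver could end this wait early and postpone the deadlock check.
 */
ProcWaitForSignal(PG_WAIT_BUFFER_PIN);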
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
At Tue, 15 Dec 2020 23:10:28 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
I pushed the following two patches.
- v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
- v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
As I told in other thread [1], I'm thinking to revert this patch
because this change could have bad side-effect on the startup
process waiting for recovery conflict.
Before applying the patch, the latch that the startup process
used to wait for recovery conflict was different from the latch
that SIGHUP signal handler or walreceiver process, etc set to
wake the startup process up. So SIGHUP or walreceiver didn't
wake the startup process waiting for recovery conflict up unnecessary.
But the patch got rid of the dedicated latch for signaling
the startup process. This change forced us to use the same latch
to make the startup process wait or wake up. Which caused SIGHUP
signal handler or walreceiver proces to wake the startup process
waiting on the latch for recovery conflict up unnecessarily
frequently.
While waiting for recovery conflict on buffer pin, deadlock needs
to be checked at least every deadlock_timeout. But that frequent
wakeups could prevent the deadlock timer from being triggered and
could delay that deadlock checks.
I thought that spurious wakeups were harmless. But actually
ResolveRecoveryConflictWithBufferPin doesn't consider spurious
wakeups. Only the timer woke up ResolveRecoveryConflictWithBufferPin
before the patch came in. Currently SIGHUP and XLogFlush() (in the
walreceiver) also wake up the startup process.
For a moment I thought that ResolveRecoveryConflictWithBufferPin
should be woken up at shutdown time via the old recovery latch, but that's
not the case since it wakes up only after all blockers go away. It seems
simpler to me to revert the patches than to make the function properly
handle spurious wakeups.
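(To spell out what "properly handle spurious wakeups" would roughly mean, something like the loop below would be needed; woken_by_timeout() and conflict_still_exists() are hypothetical placeholders, not existing functions.)

/* Hypothetical sketch only: a spurious-wakeup-tolerant buffer-pin wait */
for (;;)
{
    ProcWaitForSignal(PG_WAIT_BUFFER_PIN);

    if (woken_by_timeout())         /* deadlock_timeout or standby delay fired */
        break;
    if (!conflict_still_exists())   /* conflicting buffer pin has gone away */
        break;
    /* otherwise a spurious wakeup (SIGHUP, walreceiver, ...): wait again */
}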
Therefore, I'm thinking to revert the commit ac22929a26 and
113d3591b8.
[1]: /messages/by-id/a802f1c0-58d9-bd3f-bc3e-bdad54726855@oss.nttdata.com
As a result, I agree with reverting them. But I think we need to add a
comment explaining why we don't use MyLatch for recovery wakeups after
reverting them.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020/12/16 16:24, Kyotaro Horiguchi wrote:
At Tue, 15 Dec 2020 23:10:28 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
I pushed the following two patches.
- v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
- v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
As I told in other thread [1], I'm thinking to revert this patch
because this change could have bad side-effect on the startup
process waiting for recovery conflict.
Before applying the patch, the latch that the startup process
used to wait for recovery conflict was different from the latch
that SIGHUP signal handler or walreceiver process, etc set to
wake the startup process up. So SIGHUP or walreceiver didn't
wake the startup process waiting for recovery conflict up unnecessary.
But the patch got rid of the dedicated latch for signaling
the startup process. This change forced us to use the same latch
to make the startup process wait or wake up. Which caused SIGHUP
signal handler or walreceiver proces to wake the startup process
waiting on the latch for recovery conflict up unnecessarily
frequently.
While waiting for recovery conflict on buffer pin, deadlock needs
to be checked at least every deadlock_timeout. But that frequent
wakeups could prevent the deadlock timer from being triggered and
could delay that deadlock checks.
I thought that spurious wakeups don't harm. But actually
ResolveRecoveryConflictWithBufferPin doesn't consider spurious
wakeups. Only the timer woke up ResolveRecoveryConflictWithBufferPin
before the patch comes. Currently SIGHUP and XLogFlush() (on
walreceiver) also wake up startup process.
For a moment I thought that ResolveRecoveryConflictWithBufferPin
should wake up at shutdown time by the old recovery latch but it's not
the case since it wakes up after all blockers go away. It seems to me
simpler to revert the patches than making the function properly handle
spurious wakeups.
Therefore, I'm thinking to revert the commit ac22929a26 and
113d3591b8.
[1]: /messages/by-id/a802f1c0-58d9-bd3f-bc3e-bdad54726855@oss.nttdata.com
As the result, I agree to revert them. But I think we need to add a
comment for the reason we don't use MyLatch for recovery-wakeup after
reverting them.
Agreed. Attached is the patch that reverts those patches and
adds the comments about why both procLatch and recoveryWakeupLatch
are necessary.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
On 2020/12/16 18:12, Fujii Masao wrote:
On 2020/12/16 16:24, Kyotaro Horiguchi wrote:
At Tue, 15 Dec 2020 23:10:28 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
I pushed the following two patches.
- v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
- v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
As I told in other thread [1], I'm thinking to revert this patch
because this change could have bad side-effect on the startup
process waiting for recovery conflict.
Before applying the patch, the latch that the startup process
used to wait for recovery conflict was different from the latch
that SIGHUP signal handler or walreceiver process, etc set to
wake the startup process up. So SIGHUP or walreceiver didn't
wake the startup process waiting for recovery conflict up unnecessary.
But the patch got rid of the dedicated latch for signaling
the startup process. This change forced us to use the same latch
to make the startup process wait or wake up. Which caused SIGHUP
signal handler or walreceiver proces to wake the startup process
waiting on the latch for recovery conflict up unnecessarily
frequently.
While waiting for recovery conflict on buffer pin, deadlock needs
to be checked at least every deadlock_timeout. But that frequent
wakeups could prevent the deadlock timer from being triggered and
could delay that deadlock checks.
I thought that spurious wakeups don't harm. But actually
ResolveRecoveryConflictWithBufferPin doesn't consider spurious
wakeups. Only the timer woke up ResolveRecoveryConflictWithBufferPin
before the patch comes. Currently SIGHUP and XLogFlush() (on
walreceiver) also wake up startup process.
For a moment I thought that ResolveRecoveryConflictWithBufferPin
should wake up at shutdown time by the old recovery latch but it's not
the case since it wakes up after all blockers go away. It seems to me
simpler to revert the patches than making the function properly handle
spurious wakeups.
Therefore, I'm thinking to revert the commit ac22929a26 and
113d3591b8.
[1]: /messages/by-id/a802f1c0-58d9-bd3f-bc3e-bdad54726855@oss.nttdata.com
As the result, I agree to revert them. But I think we need to add a
comment for the reason we don't use MyLatch for recovery-wakeup after
reverting them.
Agreed. Attached is the patch that reverts those patches and
adds the comments about why both procLatch and recoveryWakeupLatch
are necessary.
Pushed. Thanks!
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Hi,
On 2020-12-16 18:12:39 +0900, Fujii Masao wrote:
-    /* Wait to be signaled by UnpinBuffer() */
+    /*
+     * Wait to be signaled by UnpinBuffer().
+     *
+     * We assume that only UnpinBuffer() and the timeout requests established
+     * above can wake us up here. WakeupRecovery() called by walreceiver or
+     * SIGHUP signal handler, etc cannot do that because it uses the different
+     * latch from that ProcWaitForSignal() waits on.
+     */
     ProcWaitForSignal(PG_WAIT_BUFFER_PIN);
Isn't this comment bogus? The latch could e.g. be set by
procsignal_sigusr1_handler(), which the startup process uses. Or it could
already be set, when entering ResolveRecoveryConflictWithBufferPin().
Why is it even relevant that we only get woken up by UnpinBuffer()?
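(For context: the startup process installs the generic procsignal SIGUSR1 handler, which ends by setting the process latch, so wakeups other than UnpinBuffer() and the timers are indeed possible. Abridged sketch below; the omitted branches and exact details are an approximation, not a verbatim quote of the source.)

/* Abridged sketch of procsignal_sigusr1_handler() in procsignal.c */
static void
procsignal_sigusr1_handler(SIGNAL_ARGS)
{
    int     save_errno = errno;

    /* CheckProcSignal() branches for the various procsignal reasons omitted */

    SetLatch(MyLatch);  /* wakes any latch wait, including ProcWaitForSignal() */

    errno = save_errno;
}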
Greetings,
Andres Freund