Redesigning postmaster death handling
Hi,
Here's an experimental patch to fix our shutdown strategy on
postmaster death, as discussed in a nearby report[1].
Maybe it's possible to switch to _exit() without also switching to
preemptive handling, but it seems fragile and painful for no gain.
Following that line of thinking, we might as well just ask the kernel
to hit our existing SIGQUIT handler at parent exit, on Linux/FreeBSD.
Job done.
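
For readers who haven't used the kernel facility, here is a minimal standalone sketch of the idea, not the patch itself. The helper name request_parent_death_signal() and its return convention are made up for illustration, and the getppid() comparison is only one way to close the "parent already gone" race (the patch polls PostmasterIsAlive() instead). It assumes prctl(PR_SET_PDEATHSIG) on Linux and procctl(PROC_PDEATHSIG_CTL) on FreeBSD.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

#if defined(__linux__)
#include <sys/prctl.h>
#elif defined(__FreeBSD__)
#include <sys/procctl.h>
#endif

/* Both facilities are exposed as macros, so we can probe for them. */
#if defined(PR_SET_PDEATHSIG) || defined(PROC_PDEATHSIG_CTL)
#define HAVE_PARENT_DEATH_SIGNAL 1
#endif

/*
 * Hypothetical helper: ask the kernel to send 'signum' to this process when
 * its parent exits.
 *
 * Returns 0 if the request is armed, 1 if the parent had already exited by
 * the time we armed it, and -1 if the platform lacks the facility or the
 * system call failed.
 */
static int
request_parent_death_signal(int signum, pid_t expected_parent)
{
    assert(signum > 0);

#if defined(PR_SET_PDEATHSIG)
    if (prctl(PR_SET_PDEATHSIG, (unsigned long) signum) < 0)
        return -1;
#elif defined(PROC_PDEATHSIG_CTL)
    if (procctl(P_PID, 0, PROC_PDEATHSIG_CTL, &signum) < 0)
        return -1;
#else
    (void) signum;
    (void) expected_parent;
    return -1;
#endif

#ifdef HAVE_PARENT_DEATH_SIGNAL
    /*
     * If the parent exited before the request was armed, we have already
     * been re-parented and no signal will ever arrive, so report that.
     */
    return (getppid() == expected_parent) ? 0 : 1;
#endif
}
```

A child would call this early with the parent's PID as recorded before fork(), e.g. request_parent_death_signal(SIGQUIT, parent_pid), and treat a return value of 1 the same as receiving the signal.
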
For systems lacking that facility, the idea I'm trying out here is
that backends that detect the condition in WaitEventSetWait() should
themselves blast all backends with SIGQUIT, in a sense taking over the
role of the departed postmaster. I didn't really want any
consensus/negotiation over who's going to do that, so... they all do.
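
Stripped of PostgreSQL specifics, the "everyone signals everyone" fallback looks roughly like the sketch below. This is an illustration only: signal_all_peers() and terminating_signal() are hypothetical names, and the plain pid_t array stands in for the shared process table that the patch walks. In the patch, the caller then raises SIGQUIT on itself as well.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: send 'signum' to every live peer in 'pids'.
 *
 * Slots holding 0 (unused) or any non-positive value are skipped, since
 * kill() gives those special meanings, and so is our own PID.  Returns the
 * number of peers successfully signaled.  Redundant calls from several
 * processes are harmless: a peer that is already gone just makes kill()
 * fail.
 */
static int
signal_all_peers(const pid_t *pids, size_t npids, int signum)
{
    pid_t self = getpid();
    int sent = 0;

    assert(pids != NULL || npids == 0);

    for (size_t i = 0; i < npids; i++)
    {
        if (pids[i] <= 0 || pids[i] == self)
            continue;
        if (kill(pids[i], signum) == 0)
            sent++;
    }
    return sent;
}

/*
 * Demo helper: reap a child and return the signal that terminated it, 0 if
 * it exited normally, or -1 if waitpid() failed.
 */
static int
terminating_signal(pid_t child)
{
    int status;

    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Because no process tries to find out whether someone else has already done the broadcast, there is no coordination state to get wrong; the cost is some duplicate kill() calls.
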
Most of the patch is just removing hundreds of lines of error reports,
conditions and comments that are now unreachable.
Better ideas, glaring holes in the plan, etc, welcome.
[1]: /messages/by-id/B3C69B86-7F82-4111-B97F-0005497BB745@yandex-team.ru
Attachments:
v1-0001-SIGQUIT-on-postmaster-death.patch (application/octet-stream)
From 1b476ae56cc10f5b6cb31b23ae9b2ab270034bba Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 20 Aug 2025 17:07:52 +1200
Subject: [PATCH v1] SIGQUIT on postmaster death.
Previously each backend would detect postmaster death at its next call
to WaitEventSetWait() and then call proc_exit(). That would reach
LWLockReleaseAll() to unblock other backends stuck in LWLockAcquire(),
which was unsafe: it exposed intermediate states in shared memory and
could corrupt data.
Use the existing SIGQUIT handler instead, which is simple, better tested
and preemptive. Platform differences:
1. On Linux and FreeBSD, all backends already received a signal from
the kernel when the postmaster exits. This is now redirected to
SIGQUIT.
2. On other Unix systems, as soon as WaitEventSetWait() sees that the
postmaster has exited in any backend, it blasts all backends with
SIGQUIT.
3. On Windows, fake signals are not really preemptive, but otherwise
work the same as 2 above.
Some API and behavior changes:
* WL_EXIT_ON_PM_DEATH is required in all processes under the postmaster
* WL_POSTMASTER_DEATH became unreachable and is removed
* several code paths that previously reported a special error message
to the client on WL_POSTMASTER_DEATH are removed and the socket will
simply be closed on _exit(); the messages were unreliable anyway,
as there are many other potential exit sites
* BGWH_POSTMASTER_DIED became unreachable and is removed
* the syslogger process, if configured, no longer waits for all
backends to close its pipe before exiting
* a few places that polled PostmasterIsAlive() explicitly in a loop
no longer need to do that
It might be reasonable to make WL_EXIT_ON_PM_DEATH implicit so that
callers don't have to mention it, but that is not done.
XXX experimental
Reported-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by:
Discussion: https://postgr.es/m/B3C69B86-7F82-4111-B97F-0005497BB745%40yandex-team.ru
---
doc/src/sgml/bgworker.sgml | 10 +-
src/backend/access/transam/parallel.c | 15 +--
src/backend/access/transam/xlogfuncs.c | 20 +--
src/backend/commands/vacuum.c | 9 --
src/backend/libpq/be-secure.c | 28 ----
src/backend/libpq/pqcomm.c | 2 +-
src/backend/postmaster/bgworker.c | 34 ++---
src/backend/postmaster/pgarch.c | 21 +--
src/backend/postmaster/startup.c | 28 ----
src/backend/postmaster/syslogger.c | 9 +-
src/backend/replication/syncrep.c | 18 +--
src/backend/replication/walsender.c | 7 +-
src/backend/storage/ipc/latch.c | 50 ++-----
src/backend/storage/ipc/pmsignal.c | 110 +++++++--------
src/backend/storage/ipc/waiteventset.c | 162 ++++-------------------
src/backend/tcop/postgres.c | 25 ++++
src/include/postmaster/bgworker.h | 1 -
src/include/storage/pmsignal.h | 30 +----
src/include/storage/waiteventset.h | 1 -
src/test/modules/test_shm_mq/setup.c | 4 +-
src/test/modules/worker_spi/worker_spi.c | 5 -
src/test/recovery/t/017_shm.pl | 6 +-
src/test/regress/regress.c | 21 +++
23 files changed, 164 insertions(+), 452 deletions(-)
diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml
index 2c393385a91..686526cc93f 100644
--- a/doc/src/sgml/bgworker.sgml
+++ b/doc/src/sgml/bgworker.sgml
@@ -228,9 +228,7 @@ typedef struct BackgroundWorker
to suspend execution only temporarily should use an interruptible sleep
rather than exiting; this can be achieved by calling
<function>WaitLatch()</function>. Make sure the
- <literal>WL_POSTMASTER_DEATH</literal> flag is set when calling that function, and
- verify the return code for a prompt exit in the emergency case that
- <command>postgres</command> itself has terminated.
+ <literal>WL_EXIT_ON_PM_DEATH</literal> flag is set when calling that function.
</para>
<para>
@@ -268,8 +266,7 @@ typedef struct BackgroundWorker
background worker, or until the postmaster dies. If the background worker
is running, the return value will be <literal>BGWH_STARTED</literal>, and
the PID will be written to the provided address. Otherwise, the return
- value will be <literal>BGWH_STOPPED</literal> or
- <literal>BGWH_POSTMASTER_DIED</literal>.
+ value will be <literal>BGWH_STOPPED</literal>.
</para>
<para>
@@ -279,8 +276,7 @@ typedef struct BackgroundWorker
<type>BackgroundWorkerHandle *</type> obtained at registration. This
function will block until the background worker exits, or postmaster dies.
When the background worker exits, the return value is
- <literal>BGWH_STOPPED</literal>, if postmaster dies it will return
- <literal>BGWH_POSTMASTER_DIED</literal>.
+ <literal>BGWH_STOPPED</literal>.
</para>
<para>
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 94db1ec3012..3abc201f33b 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -914,23 +914,10 @@ WaitForParallelWorkersToExit(ParallelContext *pcxt)
/* Wait until the workers actually die. */
for (i = 0; i < pcxt->nworkers_launched; ++i)
{
- BgwHandleStatus status;
-
if (pcxt->worker == NULL || pcxt->worker[i].bgwhandle == NULL)
continue;
- status = WaitForBackgroundWorkerShutdown(pcxt->worker[i].bgwhandle);
-
- /*
- * If the postmaster kicked the bucket, we have no chance of cleaning
- * up safely -- we won't be able to tell when our workers are actually
- * dead. This doesn't necessitate a PANIC since they will all abort
- * eventually, but we can't safely continue this session.
- */
- if (status == BGWH_POSTMASTER_DIED)
- ereport(FATAL,
- (errcode(ERRCODE_ADMIN_SHUTDOWN),
- errmsg("postmaster exited during a parallel transaction")));
+ WaitForBackgroundWorkerShutdown(pcxt->worker[i].bgwhandle);
/* Release memory. */
pfree(pcxt->worker[i].bgwhandle);
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 8c3090165f0..6a69e0c2447 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -716,8 +716,6 @@ pg_promote(PG_FUNCTION_ARGS)
#define WAITS_PER_SECOND 10
for (i = 0; i < WAITS_PER_SECOND * wait_seconds; i++)
{
- int rc;
-
ResetLatch(MyLatch);
if (!RecoveryInProgress())
@@ -725,20 +723,10 @@ pg_promote(PG_FUNCTION_ARGS)
CHECK_FOR_INTERRUPTS();
- rc = WaitLatch(MyLatch,
- WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
- 1000L / WAITS_PER_SECOND,
- WAIT_EVENT_PROMOTE);
-
- /*
- * Emergency bailout if postmaster has died. This is to avoid the
- * necessity for manual cleanup of all postmaster children.
- */
- if (rc & WL_POSTMASTER_DEATH)
- ereport(FATAL,
- (errcode(ERRCODE_ADMIN_SHUTDOWN),
- errmsg("terminating connection due to unexpected postmaster exit"),
- errcontext("while waiting on promotion")));
+ WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+ 1000L / WAITS_PER_SECOND,
+ WAIT_EVENT_PROMOTE);
}
ereport(WARNING,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 733ef40ae7c..5113f6f0121 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2523,15 +2523,6 @@ vacuum_delay_point(bool is_analyze)
INSTR_TIME_GET_NANOSEC(delay));
}
- /*
- * We don't want to ignore postmaster death during very long vacuums
- * with vacuum_cost_delay configured. We can't use the usual
- * WaitLatch() approach here because we want microsecond-based sleep
- * durations above.
- */
- if (IsUnderPostmaster && !PostmasterIsAlive())
- exit(1);
-
VacuumCostBalance = 0;
/*
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index d723e74e813..42c7d751431 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -218,28 +218,6 @@ retry:
WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
WAIT_EVENT_CLIENT_READ);
- /*
- * If the postmaster has died, it's not safe to continue running,
- * because it is the postmaster's job to kill us if some other backend
- * exits uncleanly. Moreover, we won't run very well in this state;
- * helper processes like walwriter and the bgwriter will exit, so
- * performance may be poor. Finally, if we don't exit, pg_ctl will be
- * unable to restart the postmaster without manual intervention, so no
- * new connections can be accepted. Exiting clears the deck for a
- * postmaster restart.
- *
- * (Note that we only make this check when we would otherwise sleep on
- * our latch. We might still continue running for a while if the
- * postmaster is killed in mid-query, or even through multiple queries
- * if we never have to wait for read. We don't want to burn too many
- * cycles checking for this very rare condition, and this should cause
- * us to exit quickly in most cases.)
- */
- if (event.events & WL_POSTMASTER_DEATH)
- ereport(FATAL,
- (errcode(ERRCODE_ADMIN_SHUTDOWN),
- errmsg("terminating connection due to unexpected postmaster exit")));
-
/* Handle interrupt. */
if (event.events & WL_LATCH_SET)
{
@@ -343,12 +321,6 @@ retry:
WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
WAIT_EVENT_CLIENT_WRITE);
- /* See comments in secure_read. */
- if (event.events & WL_POSTMASTER_DEATH)
- ereport(FATAL,
- (errcode(ERRCODE_ADMIN_SHUTDOWN),
- errmsg("terminating connection due to unexpected postmaster exit")));
-
/* Handle interrupt. */
if (event.events & WL_LATCH_SET)
{
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 25f739a6a17..0d4c70736dc 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -309,7 +309,7 @@ pq_init(ClientSocket *client_sock)
port->sock, NULL, NULL);
latch_pos = AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, PGINVALID_SOCKET,
MyLatch, NULL);
- AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, PGINVALID_SOCKET,
+ AddWaitEventToSet(FeBeWaitSet, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
NULL, NULL);
/*
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 1ad65c237c3..d91f1cf7322 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -1202,9 +1202,7 @@ GetBackgroundWorkerPid(BackgroundWorkerHandle *handle, pid_t *pidp)
*
* This is like GetBackgroundWorkerPid(), except that if the worker has not
* yet started, we wait for it to do so; thus, BGWH_NOT_YET_STARTED is never
- * returned. However, if the postmaster has died, we give up and return
- * BGWH_POSTMASTER_DIED, since it that case we know that startup will not
- * take place.
+ * returned.
*
* The caller *must* have set our PID as the worker's bgw_notify_pid,
* else we will not be awoken promptly when the worker's state changes.
@@ -1213,7 +1211,6 @@ BgwHandleStatus
WaitForBackgroundWorkerStartup(BackgroundWorkerHandle *handle, pid_t *pidp)
{
BgwHandleStatus status;
- int rc;
for (;;)
{
@@ -1227,15 +1224,9 @@ WaitForBackgroundWorkerStartup(BackgroundWorkerHandle *handle, pid_t *pidp)
if (status != BGWH_NOT_YET_STARTED)
break;
- rc = WaitLatch(MyLatch,
- WL_LATCH_SET | WL_POSTMASTER_DEATH, 0,
- WAIT_EVENT_BGWORKER_STARTUP);
-
- if (rc & WL_POSTMASTER_DEATH)
- {
- status = BGWH_POSTMASTER_DIED;
- break;
- }
+ WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
+ WAIT_EVENT_BGWORKER_STARTUP);
ResetLatch(MyLatch);
}
@@ -1247,9 +1238,7 @@ WaitForBackgroundWorkerStartup(BackgroundWorkerHandle *handle, pid_t *pidp)
* Wait for a background worker to stop.
*
* If the worker hasn't yet started, or is running, we wait for it to stop
- * and then return BGWH_STOPPED. However, if the postmaster has died, we give
- * up and return BGWH_POSTMASTER_DIED, because it's the postmaster that
- * notifies us when a worker's state changes.
+ * and then return BGWH_STOPPED.
*
* The caller *must* have set our PID as the worker's bgw_notify_pid,
* else we will not be awoken promptly when the worker's state changes.
@@ -1258,7 +1247,6 @@ BgwHandleStatus
WaitForBackgroundWorkerShutdown(BackgroundWorkerHandle *handle)
{
BgwHandleStatus status;
- int rc;
for (;;)
{
@@ -1270,15 +1258,9 @@ WaitForBackgroundWorkerShutdown(BackgroundWorkerHandle *handle)
if (status == BGWH_STOPPED)
break;
- rc = WaitLatch(MyLatch,
- WL_LATCH_SET | WL_POSTMASTER_DEATH, 0,
- WAIT_EVENT_BGWORKER_SHUTDOWN);
-
- if (rc & WL_POSTMASTER_DEATH)
- {
- status = BGWH_POSTMASTER_DIED;
- break;
- }
+ WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
+ WAIT_EVENT_BGWORKER_SHUTDOWN);
ResetLatch(MyLatch);
}
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 78e39e5f866..dd147c3d82e 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -353,16 +353,10 @@ pgarch_MainLoop(void)
* PGARCH_AUTOWAKE_INTERVAL, or until postmaster dies.
*/
if (!time_to_stop) /* Don't wait during last iteration */
- {
- int rc;
-
- rc = WaitLatch(MyLatch,
- WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
- PGARCH_AUTOWAKE_INTERVAL * 1000L,
- WAIT_EVENT_ARCHIVER_MAIN);
- if (rc & WL_POSTMASTER_DEATH)
- time_to_stop = true;
- }
+ WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+ PGARCH_AUTOWAKE_INTERVAL * 1000L,
+ WAIT_EVENT_ARCHIVER_MAIN);
/*
* The archiver quits either when the postmaster dies (not expected)
@@ -403,12 +397,9 @@ pgarch_ArchiverCopyLoop(void)
/*
* Do not initiate any more archive commands after receiving
- * SIGTERM, nor after the postmaster has died unexpectedly. The
- * first condition is to try to keep from having init SIGKILL the
- * command, and the second is to avoid conflicts with another
- * archiver spawned by a newer postmaster.
+ * SIGTERM, to try to keep from having init SIGKILL the command.
*/
- if (ShutdownRequestPending || !PostmasterIsAlive())
+ if (ShutdownRequestPending)
return;
/*
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..b2a04369654 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -35,17 +35,6 @@
#include "utils/timeout.h"
-#ifndef USE_POSTMASTER_DEATH_SIGNAL
-/*
- * On systems that need to make a system call to find out if the postmaster has
- * gone away, we'll do so only every Nth call to ProcessStartupProcInterrupts().
- * This only affects how long it takes us to detect the condition while we're
- * busy replaying WAL. Latch waits and similar which should react immediately
- * through the usual techniques.
- */
-#define POSTMASTER_POLL_RATE_LIMIT 1024
-#endif
-
/*
* Flags set by interrupt handlers for later service in the redo loop.
*/
@@ -153,10 +142,6 @@ StartupRereadConfig(void)
void
ProcessStartupProcInterrupts(void)
{
-#ifdef POSTMASTER_POLL_RATE_LIMIT
- static uint32 postmaster_poll_count = 0;
-#endif
-
/*
* Process any requests or signals received recently.
*/
@@ -172,19 +157,6 @@ ProcessStartupProcInterrupts(void)
if (shutdown_requested)
proc_exit(1);
- /*
- * Emergency bailout if postmaster has died. This is to avoid the
- * necessity for manual cleanup of all postmaster children. Do this less
- * frequently on systems for which we don't have signals to make that
- * cheap.
- */
- if (IsUnderPostmaster &&
-#ifdef POSTMASTER_POLL_RATE_LIMIT
- postmaster_poll_count++ % POSTMASTER_POLL_RATE_LIMIT == 0 &&
-#endif
- !PostmasterIsAlive())
- exit(1);
-
/* Process barrier events */
if (ProcSignalBarrierPending)
ProcessProcSignalBarrier();
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 50c2edec1f6..b397d926b57 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -330,14 +330,9 @@ SysLoggerMain(const void *startup_data, size_t startup_data_len)
/*
* Set up a reusable WaitEventSet object we'll use to wait for our latch,
* and (except on Windows) our socket.
- *
- * Unlike all other postmaster child processes, we'll ignore postmaster
- * death because we want to collect final log output from all backends and
- * then exit last. We'll do that by running until we see EOF on the
- * syslog pipe, which implies that all other backends have exited
- * (including the postmaster).
*/
- wes = CreateWaitEventSet(NULL, 2);
+ wes = CreateWaitEventSet(NULL, 3);
+ AddWaitEventToSet(wes, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET, NULL, NULL);
AddWaitEventToSet(wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
#ifndef WIN32
AddWaitEventToSet(wes, WL_SOCKET_READABLE, syslogPipe[0], NULL, NULL);
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 32cf3a48b89..3f46ea7c376 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -270,8 +270,6 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
*/
for (;;)
{
- int rc;
-
/* Must reset the latch before testing state. */
ResetLatch(MyLatch);
@@ -328,20 +326,8 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
* Wait on latch. Any condition that should wake us up will set the
* latch, so no need for timeout.
*/
- rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
- WAIT_EVENT_SYNC_REP);
-
- /*
- * If the postmaster dies, we'll probably never get an acknowledgment,
- * because all the wal sender processes will exit. So just bail out.
- */
- if (rc & WL_POSTMASTER_DEATH)
- {
- ProcDiePending = true;
- whereToSendOutput = DestNone;
- SyncRepCancelWait();
- break;
- }
+ WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, -1,
+ WAIT_EVENT_SYNC_REP);
}
/*
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 0855bae3535..40e9c124f15 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -3826,12 +3826,7 @@ WalSndWait(uint32 socket_events, long timeout, uint32 wait_event)
else if (MyWalSnd->kind == REPLICATION_KIND_LOGICAL)
ConditionVariablePrepareToSleep(&WalSndCtl->wal_replay_cv);
- if (WaitEventSetWait(FeBeWaitSet, timeout, &event, 1, wait_event) == 1 &&
- (event.events & WL_POSTMASTER_DEATH))
- {
- ConditionVariableCancelSleep();
- proc_exit(1);
- }
+ WaitEventSetWait(FeBeWaitSet, timeout, &event, 1, wait_event);
ConditionVariableCancelSleep();
}
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index beadeb5e46a..609a90c3c57 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -29,7 +29,6 @@ static WaitEventSet *LatchWaitSet;
/* The positions of the latch and PM death events in LatchWaitSet */
#define LatchWaitSetLatchPos 0
-#define LatchWaitSetPostmasterDeathPos 1
void
InitializeLatchWaitSet(void)
@@ -44,16 +43,9 @@ InitializeLatchWaitSet(void)
MyLatch, NULL);
Assert(latch_pos == LatchWaitSetLatchPos);
- /*
- * WaitLatch will modify this to WL_EXIT_ON_PM_DEATH or
- * WL_POSTMASTER_DEATH on each call.
- */
if (IsUnderPostmaster)
- {
- latch_pos = AddWaitEventToSet(LatchWaitSet, WL_EXIT_ON_PM_DEATH,
- PGINVALID_SOCKET, NULL, NULL);
- Assert(latch_pos == LatchWaitSetPostmasterDeathPos);
- }
+ AddWaitEventToSet(LatchWaitSet, WL_EXIT_ON_PM_DEATH,
+ PGINVALID_SOCKET, NULL, NULL);
}
/*
@@ -174,25 +166,17 @@ WaitLatch(Latch *latch, int wakeEvents, long timeout,
{
WaitEvent event;
- /* Postmaster-managed callers must handle postmaster death somehow. */
- Assert(!IsUnderPostmaster ||
- (wakeEvents & WL_EXIT_ON_PM_DEATH) ||
- (wakeEvents & WL_POSTMASTER_DEATH));
+ /* Postmaster-managed callers must exit on postmaster death. */
+ Assert(!IsUnderPostmaster || (wakeEvents & WL_EXIT_ON_PM_DEATH));
/*
- * Some callers may have a latch other than MyLatch, or no latch at all,
- * or want to handle postmaster death differently. It's cheap to assign
- * those, so just do it every time.
+ * Some callers may have a latch other than MyLatch, or no latch at all.
+ * It's cheap to assign that, so just do it every time.
*/
if (!(wakeEvents & WL_LATCH_SET))
latch = NULL;
ModifyWaitEvent(LatchWaitSet, LatchWaitSetLatchPos, WL_LATCH_SET, latch);
- if (IsUnderPostmaster)
- ModifyWaitEvent(LatchWaitSet, LatchWaitSetPostmasterDeathPos,
- (wakeEvents & (WL_EXIT_ON_PM_DEATH | WL_POSTMASTER_DEATH)),
- NULL);
-
if (WaitEventSetWait(LatchWaitSet,
(wakeEvents & WL_TIMEOUT) ? timeout : -1,
&event, 1,
@@ -210,10 +194,8 @@ WaitLatch(Latch *latch, int wakeEvents, long timeout,
* to be reported as readable/writable/connected, so that the caller can deal
* with the condition.
*
- * wakeEvents must include either WL_EXIT_ON_PM_DEATH for automatic exit
- * if the postmaster dies or WL_POSTMASTER_DEATH for a flag set in the
- * return value if the postmaster dies. The latter is useful for rare cases
- * where some behavior other than immediate exit is needed.
+ * wakeEvents must include WL_EXIT_ON_PM_DEATH for automatic exit
+ * if the postmaster dies.
*
* NB: These days this is just a wrapper around the WaitEventSet API. When
* using a latch very frequently, consider creating a longer living
@@ -237,14 +219,8 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
AddWaitEventToSet(set, WL_LATCH_SET, PGINVALID_SOCKET,
latch, NULL);
- /* Postmaster-managed callers must handle postmaster death somehow. */
- Assert(!IsUnderPostmaster ||
- (wakeEvents & WL_EXIT_ON_PM_DEATH) ||
- (wakeEvents & WL_POSTMASTER_DEATH));
-
- if ((wakeEvents & WL_POSTMASTER_DEATH) && IsUnderPostmaster)
- AddWaitEventToSet(set, WL_POSTMASTER_DEATH, PGINVALID_SOCKET,
- NULL, NULL);
+ /* Postmaster-managed callers must exit on postmaster death. */
+ Assert(!IsUnderPostmaster || (wakeEvents & WL_EXIT_ON_PM_DEATH));
if ((wakeEvents & WL_EXIT_ON_PM_DEATH) && IsUnderPostmaster)
AddWaitEventToSet(set, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
@@ -263,11 +239,7 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
if (rc == 0)
ret |= WL_TIMEOUT;
else
- {
- ret |= event.events & (WL_LATCH_SET |
- WL_POSTMASTER_DEATH |
- WL_SOCKET_MASK);
- }
+ ret |= event.events & (WL_LATCH_SET | WL_SOCKET_MASK);
FreeWaitEventSet(set);
diff --git a/src/backend/storage/ipc/pmsignal.c b/src/backend/storage/ipc/pmsignal.c
index f2ea01622f9..b1327685c18 100644
--- a/src/backend/storage/ipc/pmsignal.c
+++ b/src/backend/storage/ipc/pmsignal.c
@@ -22,14 +22,18 @@
#endif
#include "miscadmin.h"
+#include "port/atomics.h"
#include "postmaster/postmaster.h"
#include "replication/walsender.h"
#include "storage/ipc.h"
#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/procnumber.h"
#include "storage/shmem.h"
#include "utils/memutils.h"
+
/*
* The postmaster is signaled by its children by sending SIGUSR1. The
* specific reason is communicated via flags in shared memory. We keep
@@ -90,36 +94,12 @@ NON_EXEC_STATIC volatile PMSignalData *PMSignalState = NULL;
*/
static int num_child_flags;
-/*
- * Signal handler to be notified if postmaster dies.
- */
-#ifdef USE_POSTMASTER_DEATH_SIGNAL
-volatile sig_atomic_t postmaster_possibly_dead = false;
-
-static void
-postmaster_death_handler(SIGNAL_ARGS)
-{
- postmaster_possibly_dead = true;
-}
-
-/*
- * The available signals depend on the OS. SIGUSR1 and SIGUSR2 are already
- * used for other things, so choose another one.
- *
- * Currently, we assume that we can always find a signal to use. That
- * seems like a reasonable assumption for all platforms that are modern
- * enough to have a parent-death signaling mechanism.
- */
-#if defined(SIGINFO)
-#define POSTMASTER_DEATH_SIGNAL SIGINFO
-#elif defined(SIGPWR)
-#define POSTMASTER_DEATH_SIGNAL SIGPWR
-#else
-#error "cannot find a signal to use for postmaster death"
+/* Can we ask the kernel to signal children on postmaster death? */
+#if (defined(HAVE_SYS_PRCTL_H) && defined(PR_SET_PDEATHSIG)) || \
+ (defined(HAVE_SYS_PROCCTL_H) && defined(PROC_PDEATHSIG_CTL))
+#define USE_POSTMASTER_DEATH_SIGNAL
#endif
-#endif /* USE_POSTMASTER_DEATH_SIGNAL */
-
static void MarkPostmasterChildInactive(int code, Datum arg);
/*
@@ -336,24 +316,11 @@ MarkPostmasterChildInactive(int code, Datum arg)
/*
- * PostmasterIsAliveInternal - check whether postmaster process is still alive
- *
- * This is the slow path of PostmasterIsAlive(), where the caller has already
- * checked 'postmaster_possibly_dead'. (On platforms that don't support
- * a signal for parent death, PostmasterIsAlive() is just an alias for this.)
+ * PostmasterIsAlive - check whether postmaster process is still alive
*/
bool
-PostmasterIsAliveInternal(void)
+PostmasterIsAlive(void)
{
-#ifdef USE_POSTMASTER_DEATH_SIGNAL
- /*
- * Reset the flag before checking, so that we don't miss a signal if
- * postmaster dies right after the check. If postmaster was indeed dead,
- * we'll re-arm it before returning to caller.
- */
- postmaster_possibly_dead = false;
-#endif
-
#ifndef WIN32
{
char c;
@@ -374,10 +341,6 @@ PostmasterIsAliveInternal(void)
* call.
*/
-#ifdef USE_POSTMASTER_DEATH_SIGNAL
- postmaster_possibly_dead = true;
-#endif
-
if (rc < 0)
elog(FATAL, "read on postmaster death monitoring pipe failed: %m");
else if (rc > 0)
@@ -387,30 +350,19 @@ PostmasterIsAliveInternal(void)
}
}
-#else /* WIN32 */
- if (WaitForSingleObject(PostmasterHandle, 0) == WAIT_TIMEOUT)
- return true;
- else
- {
-#ifdef USE_POSTMASTER_DEATH_SIGNAL
- postmaster_possibly_dead = true;
-#endif
- return false;
- }
+#else /* !WIN32 */
+ return WaitForSingleObject(PostmasterHandle, 0) == WAIT_TIMEOUT;
#endif /* WIN32 */
}
/*
- * PostmasterDeathSignalInit - request signal on postmaster death if possible
+ * PostmasterDeathSignalInit - request SIGQUIT on postmaster death if possible
*/
void
PostmasterDeathSignalInit(void)
{
#ifdef USE_POSTMASTER_DEATH_SIGNAL
- int signum = POSTMASTER_DEATH_SIGNAL;
-
- /* Register our signal handler. */
- pqsignal(signum, postmaster_death_handler);
+ int signum = SIGQUIT;
/* Request a signal on parent exit. */
#if defined(PR_SET_PDEATHSIG)
@@ -424,9 +376,37 @@ PostmasterDeathSignalInit(void)
#endif
/*
- * Just in case the parent was gone already and we missed it, we'd better
- * check the slow way on the first call.
+ * If the postmaster exited concurrently, this process might have been
+ * re-parented to init or another reaper process already. Close that race
+ * with an explicit poll.
*/
- postmaster_possibly_dead = true;
+ if (!PostmasterIsAlive())
+ ExitOnPostmasterDeath();
#endif /* USE_POSTMASTER_DEATH_SIGNAL */
}
+
+/*
+ * Called when the postmaster is known to have exited.
+ */
+void
+ExitOnPostmasterDeath(void)
+{
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+
+ /*
+ * Propagate knowledge of postmaster exit by sending SIGQUIT to all other
+ * backends, on systems where the kernel won't do that.
+ */
+ pg_memory_barrier();
+ for (ProcNumber p = 0; p < ProcGlobal->allProcCount; p++)
+ {
+ PGPROC *proc = GetPGProcByNumber(p);
+ pid_t pid = proc->pid;
+
+ if (pid != 0 && pid != MyProcPid)
+ kill(pid, SIGQUIT);
+ }
+#endif
+
+ raise(SIGQUIT);
+}
diff --git a/src/backend/storage/ipc/waiteventset.c b/src/backend/storage/ipc/waiteventset.c
index 7c0e66900f9..e190bc2bee4 100644
--- a/src/backend/storage/ipc/waiteventset.c
+++ b/src/backend/storage/ipc/waiteventset.c
@@ -11,7 +11,7 @@
* - a latch being set from another process or from signal handler in the same
* process (WL_LATCH_SET)
* - data to become readable or writeable on a socket (WL_SOCKET_*)
- * - postmaster death (WL_POSTMASTER_DEATH or WL_EXIT_ON_PM_DEATH)
+ * - postmaster death (WL_EXIT_ON_PM_DEATH)
* - timeout (WL_TIMEOUT)
*
* Implementation
@@ -135,13 +135,6 @@ struct WaitEventSet
Latch *latch;
int latch_pos;
- /*
- * WL_EXIT_ON_PM_DEATH is converted to WL_POSTMASTER_DEATH, but this flag
- * is set so that we'll exit immediately if postmaster death is detected,
- * instead of returning.
- */
- bool exit_on_postmaster_death;
-
#if defined(WAIT_USE_EPOLL)
int epoll_fd;
/* epoll_wait returns events in a user provided arrays, allocate once */
@@ -150,7 +143,6 @@ struct WaitEventSet
int kqueue_fd;
/* kevent returns events in a user provided arrays, allocate once */
struct kevent *kqueue_ret_events;
- bool report_postmaster_not_running;
#elif defined(WAIT_USE_POLL)
/* poll expects events to be waited on every poll() call, prepare once */
struct pollfd *pollfds;
@@ -413,7 +405,6 @@ CreateWaitEventSet(ResourceOwner resowner, int nevents)
set->latch = NULL;
set->nevents_space = nevents;
- set->exit_on_postmaster_death = false;
if (resowner != NULL)
{
@@ -448,7 +439,6 @@ CreateWaitEventSet(ResourceOwner resowner, int nevents)
errno = save_errno;
elog(ERROR, "fcntl(F_SETFD) failed on kqueue descriptor: %m");
}
- set->report_postmaster_not_running = false;
#elif defined(WAIT_USE_WIN32)
/*
@@ -500,7 +490,7 @@ FreeWaitEventSet(WaitEventSet *set)
{
/* uses the latch's HANDLE */
}
- else if (cur_event->events & WL_POSTMASTER_DEATH)
+ else if (cur_event->events & WL_EXIT_ON_PM_DEATH)
{
/* uses PostmasterHandle */
}
@@ -536,7 +526,6 @@ FreeWaitEventSetAfterFork(WaitEventSet *set)
/* ---
* Add an event to the set. Possible events are:
* - WL_LATCH_SET: Wait for the latch to be set
- * - WL_POSTMASTER_DEATH: Wait for postmaster to die
* - WL_SOCKET_READABLE: Wait for socket to become readable,
* can be combined in one event with other WL_SOCKET_* events
* - WL_SOCKET_WRITEABLE: Wait for socket to become writeable,
@@ -574,12 +563,6 @@ AddWaitEventToSet(WaitEventSet *set, uint32 events, pgsocket fd, Latch *latch,
/* not enough space */
Assert(set->nevents < set->nevents_space);
- if (events == WL_EXIT_ON_PM_DEATH)
- {
- events = WL_POSTMASTER_DEATH;
- set->exit_on_postmaster_death = true;
- }
-
if (latch)
{
if (latch->owner_pid != MyProcPid)
@@ -623,7 +606,7 @@ AddWaitEventToSet(WaitEventSet *set, uint32 events, pgsocket fd, Latch *latch,
#endif
#endif
}
- else if (events == WL_POSTMASTER_DEATH)
+ else if (events == WL_EXIT_ON_PM_DEATH)
{
#ifndef WIN32
event->fd = postmaster_alive_fds[POSTMASTER_FD_WATCH];
@@ -666,21 +649,6 @@ ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch)
old_events = event->events;
#endif
- /*
- * Allow switching between WL_POSTMASTER_DEATH and WL_EXIT_ON_PM_DEATH.
- *
- * Note that because WL_EXIT_ON_PM_DEATH is mapped to WL_POSTMASTER_DEATH
- * in AddWaitEventToSet(), this needs to be checked before the fast-path
- * below that checks if 'events' has changed.
- */
- if (event->events == WL_POSTMASTER_DEATH)
- {
- if (events != WL_POSTMASTER_DEATH && events != WL_EXIT_ON_PM_DEATH)
- elog(ERROR, "cannot remove postmaster death event");
- set->exit_on_postmaster_death = ((events & WL_EXIT_ON_PM_DEATH) != 0);
- return;
- }
-
/*
* If neither the event mask nor the associated latch changes, return
* early. That's an important optimization for some sockets, where
@@ -750,7 +718,7 @@ WaitEventAdjustEpoll(WaitEventSet *set, WaitEvent *event, int action)
Assert(set->latch != NULL);
epoll_ev.events |= EPOLLIN;
}
- else if (event->events == WL_POSTMASTER_DEATH)
+ else if (event->events == WL_EXIT_ON_PM_DEATH)
{
epoll_ev.events |= EPOLLIN;
}
@@ -799,7 +767,7 @@ WaitEventAdjustPoll(WaitEventSet *set, WaitEvent *event)
Assert(set->latch != NULL);
pollfd->events = POLLIN;
}
- else if (event->events == WL_POSTMASTER_DEATH)
+ else if (event->events == WL_EXIT_ON_PM_DEATH)
{
pollfd->events = POLLIN;
}
@@ -888,12 +856,12 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events)
Assert(event->events != WL_LATCH_SET || set->latch != NULL);
Assert(event->events == WL_LATCH_SET ||
- event->events == WL_POSTMASTER_DEATH ||
+ event->events == WL_EXIT_ON_PM_DEATH ||
(event->events & (WL_SOCKET_READABLE |
WL_SOCKET_WRITEABLE |
WL_SOCKET_CLOSED)));
- if (event->events == WL_POSTMASTER_DEATH)
+ if (event->events == WL_EXIT_ON_PM_DEATH)
{
/*
* Unlike all the other implementations, we detect postmaster death
@@ -947,31 +915,29 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events)
/*
* When adding the postmaster's pid, we have to consider that it might
* already have exited and perhaps even been replaced by another process
- * with the same pid. If so, we have to defer reporting this as an event
- * until the next call to WaitEventSetWaitBlock().
+ * with the same pid.
*/
-
if (rc < 0)
{
- if (event->events == WL_POSTMASTER_DEATH &&
+ if (event->events == WL_EXIT_ON_PM_DEATH &&
(errno == ESRCH || errno == EACCES))
- set->report_postmaster_not_running = true;
+ ExitOnPostmasterDeath(); /* does not return */
else
ereport(ERROR,
(errcode_for_socket_access(),
errmsg("%s() failed: %m",
"kevent")));
}
- else if (event->events == WL_POSTMASTER_DEATH &&
+ else if (event->events == WL_EXIT_ON_PM_DEATH &&
PostmasterPid != getppid() &&
!PostmasterIsAlive())
{
/*
- * The extra PostmasterIsAliveInternal() check prevents false alarms
- * on systems that give a different value for getppid() while being
+ * The extra PostmasterIsAlive() check prevents false alarms on
+ * systems that give a different value for getppid() while being
* traced by a debugger.
*/
- set->report_postmaster_not_running = true;
+ ExitOnPostmasterDeath(); /* does not return */
}
}
@@ -988,7 +954,7 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event)
Assert(set->latch != NULL);
*handle = set->latch->event;
}
- else if (event->events == WL_POSTMASTER_DEATH)
+ else if (event->events == WL_EXIT_ON_PM_DEATH)
{
*handle = PostmasterHandle;
}
@@ -1241,29 +1207,12 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
returned_events++;
}
}
- else if (cur_event->events == WL_POSTMASTER_DEATH &&
+ else if (cur_event->events == WL_EXIT_ON_PM_DEATH &&
cur_epoll_event->events & (EPOLLIN | EPOLLERR | EPOLLHUP))
{
- /*
- * We expect an EPOLLHUP when the remote end is closed, but
- * because we don't expect the pipe to become readable or to have
- * any errors either, treat those cases as postmaster death, too.
- *
- * Be paranoid about a spurious event signaling the postmaster as
- * being dead. There have been reports about that happening with
- * older primitives (select(2) to be specific), and a spurious
- * WL_POSTMASTER_DEATH event would be painful. Re-checking doesn't
- * cost much.
- */
- if (!PostmasterIsAliveInternal())
- {
- if (set->exit_on_postmaster_death)
- proc_exit(1);
- occurred_events->fd = PGINVALID_SOCKET;
- occurred_events->events = WL_POSTMASTER_DEATH;
- occurred_events++;
- returned_events++;
- }
+ /* Double-check out of paranoia about spurious readiness events. */
+ if (!PostmasterIsAlive())
+ ExitOnPostmasterDeath(); /* does not return */
}
else if (cur_event->events & (WL_SOCKET_READABLE |
WL_SOCKET_WRITEABLE |
@@ -1333,19 +1282,6 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
timeout_p = &timeout;
}
- /*
- * Report postmaster events discovered by WaitEventAdjustKqueue() or an
- * earlier call to WaitEventSetWait().
- */
- if (unlikely(set->report_postmaster_not_running))
- {
- if (set->exit_on_postmaster_death)
- proc_exit(1);
- occurred_events->fd = PGINVALID_SOCKET;
- occurred_events->events = WL_POSTMASTER_DEATH;
- return 1;
- }
-
/* Sleep */
rc = kevent(set->kqueue_fd, NULL, 0,
set->kqueue_ret_events,
@@ -1400,23 +1336,11 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
returned_events++;
}
}
- else if (cur_event->events == WL_POSTMASTER_DEATH &&
+ else if (cur_event->events == WL_EXIT_ON_PM_DEATH &&
cur_kqueue_event->filter == EVFILT_PROC &&
(cur_kqueue_event->fflags & NOTE_EXIT) != 0)
{
- /*
- * The kernel will tell this kqueue object only once about the
- * exit of the postmaster, so let's remember that for next time so
- * that we provide level-triggered semantics.
- */
- set->report_postmaster_not_running = true;
-
- if (set->exit_on_postmaster_death)
- proc_exit(1);
- occurred_events->fd = PGINVALID_SOCKET;
- occurred_events->events = WL_POSTMASTER_DEATH;
- occurred_events++;
- returned_events++;
+ ExitOnPostmasterDeath(); /* does not return */
}
else if (cur_event->events & (WL_SOCKET_READABLE |
WL_SOCKET_WRITEABLE |
@@ -1525,29 +1449,12 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
returned_events++;
}
}
- else if (cur_event->events == WL_POSTMASTER_DEATH &&
+ else if (cur_event->events == WL_EXIT_ON_PM_DEATH &&
(cur_pollfd->revents & (POLLIN | POLLHUP | POLLERR | POLLNVAL)))
{
- /*
- * We expect an POLLHUP when the remote end is closed, but because
- * we don't expect the pipe to become readable or to have any
- * errors either, treat those cases as postmaster death, too.
- *
- * Be paranoid about a spurious event signaling the postmaster as
- * being dead. There have been reports about that happening with
- * older primitives (select(2) to be specific), and a spurious
- * WL_POSTMASTER_DEATH event would be painful. Re-checking doesn't
- * cost much.
- */
- if (!PostmasterIsAliveInternal())
- {
- if (set->exit_on_postmaster_death)
- proc_exit(1);
- occurred_events->fd = PGINVALID_SOCKET;
- occurred_events->events = WL_POSTMASTER_DEATH;
- occurred_events++;
- returned_events++;
- }
+ /* Double-check out of paranoia about spurious readiness events. */
+ if (!PostmasterIsAlive())
+ ExitOnPostmasterDeath(); /* does not return */
}
else if (cur_event->events & (WL_SOCKET_READABLE |
WL_SOCKET_WRITEABLE |
@@ -1741,24 +1648,9 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
returned_events++;
}
}
- else if (cur_event->events == WL_POSTMASTER_DEATH)
+ else if (cur_event->events == WL_EXIT_ON_PM_DEATH)
{
- /*
- * Postmaster apparently died. Since the consequences of falsely
- * returning WL_POSTMASTER_DEATH could be pretty unpleasant, we
- * take the trouble to positively verify this with
- * PostmasterIsAlive(), even though there is no known reason to
- * think that the event could be falsely set on Windows.
- */
- if (!PostmasterIsAliveInternal())
- {
- if (set->exit_on_postmaster_death)
- proc_exit(1);
- occurred_events->fd = PGINVALID_SOCKET;
- occurred_events->events = WL_POSTMASTER_DEATH;
- occurred_events++;
- returned_events++;
- }
+ ExitOnPostmasterDeath(); /* does not return */
}
else if (cur_event->events & WL_SOCKET_MASK)
{
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0cecd464902..9fb553c633e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2966,6 +2966,29 @@ quickdie(SIGNAL_ARGS)
*/
error_context_stack = NULL;
+ /*
+ * A signal might arrive from the kernel (see
+ * PostmasterDeathSignalInit()), or from another backend that has detected
+ * postmaster death (see ExitOnPostmasterDeath()).
+ */
+ if (!PostmasterIsAlive())
+ {
+ ereport(WARNING_CLIENT_ONLY,
+ (errcode(ERRCODE_ADMIN_SHUTDOWN),
+ errmsg("terminating connection because the postmaster has exited")));
+
+#ifdef HAVE_SETSID
+
+ /*
+ * Propagate the signal to the rest of the session. See
+ * signal_child() in postmaster.c for motivation.
+ */
+ kill(-MyProcPid, SIGQUIT);
+#endif
+
+ goto exit;
+ }
+
/*
* When responding to a postmaster-issued signal, we send the message only
* to the client; sending to the server log just creates log spam, plus
@@ -3002,6 +3025,8 @@ quickdie(SIGNAL_ARGS)
break;
}
+exit:
+
/*
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
* because shared memory may be corrupted, so we don't want to try to
diff --git a/src/include/postmaster/bgworker.h b/src/include/postmaster/bgworker.h
index 058667a47a0..f3221cdf450 100644
--- a/src/include/postmaster/bgworker.h
+++ b/src/include/postmaster/bgworker.h
@@ -105,7 +105,6 @@ typedef enum BgwHandleStatus
BGWH_STARTED, /* worker is running */
BGWH_NOT_YET_STARTED, /* worker hasn't been started yet */
BGWH_STOPPED, /* worker has exited */
- BGWH_POSTMASTER_DIED, /* postmaster died; worker status unclear */
} BgwHandleStatus;
struct BackgroundWorkerHandle;
diff --git a/src/include/storage/pmsignal.h b/src/include/storage/pmsignal.h
index 428aa3fd68a..b9f60bdae20 100644
--- a/src/include/storage/pmsignal.h
+++ b/src/include/storage/pmsignal.h
@@ -77,34 +77,8 @@ extern bool MarkPostmasterChildSlotUnassigned(int slot);
extern bool IsPostmasterChildWalSender(int slot);
extern void RegisterPostmasterChildActive(void);
extern void MarkPostmasterChildWalSender(void);
-extern bool PostmasterIsAliveInternal(void);
+extern bool PostmasterIsAlive(void);
extern void PostmasterDeathSignalInit(void);
-
-
-/*
- * Do we have a way to ask for a signal on parent death?
- *
- * If we do, pmsignal.c will set up a signal handler, that sets a flag when
- * the parent dies. Checking the flag first makes PostmasterIsAlive() a lot
- * cheaper in usual case that the postmaster is alive.
- */
-#if (defined(HAVE_SYS_PRCTL_H) && defined(PR_SET_PDEATHSIG)) || \
- (defined(HAVE_SYS_PROCCTL_H) && defined(PROC_PDEATHSIG_CTL))
-#define USE_POSTMASTER_DEATH_SIGNAL
-#endif
-
-#ifdef USE_POSTMASTER_DEATH_SIGNAL
-extern PGDLLIMPORT volatile sig_atomic_t postmaster_possibly_dead;
-
-static inline bool
-PostmasterIsAlive(void)
-{
- if (likely(!postmaster_possibly_dead))
- return true;
- return PostmasterIsAliveInternal();
-}
-#else
-#define PostmasterIsAlive() PostmasterIsAliveInternal()
-#endif
+extern void ExitOnPostmasterDeath(void);
#endif /* PMSIGNAL_H */
diff --git a/src/include/storage/waiteventset.h b/src/include/storage/waiteventset.h
index dd514d52991..8f1fd93e394 100644
--- a/src/include/storage/waiteventset.h
+++ b/src/include/storage/waiteventset.h
@@ -35,7 +35,6 @@
#define WL_SOCKET_READABLE (1 << 1)
#define WL_SOCKET_WRITEABLE (1 << 2)
#define WL_TIMEOUT (1 << 3) /* not for WaitEventSetWait() */
-#define WL_POSTMASTER_DEATH (1 << 4)
#define WL_EXIT_ON_PM_DEATH (1 << 5)
#ifdef WIN32
#define WL_SOCKET_CONNECTED (1 << 6)
diff --git a/src/test/modules/test_shm_mq/setup.c b/src/test/modules/test_shm_mq/setup.c
index 2a20ffb1273..e98b499a1b6 100644
--- a/src/test/modules/test_shm_mq/setup.c
+++ b/src/test/modules/test_shm_mq/setup.c
@@ -306,14 +306,14 @@ check_worker_status(worker_state *wstate)
{
int n;
- /* If any workers (or the postmaster) have died, we have failed. */
+ /* If any workers have died, we have failed. */
for (n = 0; n < wstate->nworkers; ++n)
{
BgwHandleStatus status;
pid_t pid;
status = GetBackgroundWorkerPid(wstate->handle[n], &pid);
- if (status == BGWH_STOPPED || status == BGWH_POSTMASTER_DIED)
+ if (status == BGWH_STOPPED)
return false;
}
diff --git a/src/test/modules/worker_spi/worker_spi.c b/src/test/modules/worker_spi/worker_spi.c
index 9c53d896b6a..86190cc803d 100644
--- a/src/test/modules/worker_spi/worker_spi.c
+++ b/src/test/modules/worker_spi/worker_spi.c
@@ -478,11 +478,6 @@ worker_spi_launch(PG_FUNCTION_ARGS)
(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
errmsg("could not start background process"),
errhint("More details may be available in the server log.")));
- if (status == BGWH_POSTMASTER_DIED)
- ereport(ERROR,
- (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
- errmsg("cannot start background processes without postmaster"),
- errhint("Kill all remaining database processes and restart the database.")));
Assert(status == BGWH_STARTED);
PG_RETURN_INT32(pid);
diff --git a/src/test/recovery/t/017_shm.pl b/src/test/recovery/t/017_shm.pl
index c73aa3f0c2c..7f76ee6b367 100644
--- a/src/test/recovery/t/017_shm.pl
+++ b/src/test/recovery/t/017_shm.pl
@@ -113,12 +113,12 @@ log_ipcs();
my $regress_shlib = $ENV{REGRESS_SHLIB};
$gnat->safe_psql('postgres', <<EOSQL);
-CREATE FUNCTION wait_pid(int)
+CREATE FUNCTION sleep_blocked(int)
RETURNS void
AS '$regress_shlib'
LANGUAGE C STRICT;
EOSQL
-my $slow_query = 'SELECT wait_pid(pg_backend_pid())';
+my $slow_query = "SELECT sleep_blocked(${PostgreSQL::Test::Utils::timeout_default})";
my ($stdout, $stderr);
my $slow_client = IPC::Run::start(
[
@@ -169,7 +169,7 @@ command_fails_like(
log_ipcs();
# cleanup slow backend
-PostgreSQL::Test::Utils::system_log('pg_ctl', 'kill', 'QUIT', $slow_pid);
+PostgreSQL::Test::Utils::system_log('pg_ctl', 'kill', 'KILL', $slow_pid);
$slow_client->finish; # client has detected backend termination
log_ipcs();
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index 465ac148ac9..c049208cb8c 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -18,6 +18,7 @@
#include <math.h>
#include <signal.h>
+#include <unistd.h>
#include "access/detoast.h"
#include "access/htup_details.h"
@@ -30,6 +31,7 @@
#include "executor/executor.h"
#include "executor/spi.h"
#include "funcapi.h"
+#include "libpq/pqsignal.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/supportnodes.h"
@@ -492,6 +494,25 @@ wait_pid(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+PG_FUNCTION_INFO_V1(sleep_blocked);
+
+/* Used by src/test/recovery/t/017_shm.pl to prevent exit after PM death */
+Datum
+sleep_blocked(PG_FUNCTION_ARGS)
+{
+ int seconds = PG_GETARG_INT32(0);
+ sigset_t save;
+ sigset_t mask;
+
+ /* Await SIGKILL or timeout. */
+ sigfillset(&mask);
+ sigprocmask(SIG_BLOCK, &mask, &save);
+ sleep(seconds);
+ sigprocmask(SIG_SETMASK, &save, NULL);
+
+ PG_RETURN_VOID();
+}
+
static void
test_atomic_flag(void)
{
--
2.39.5 (Apple Git-154)
Thomas Munro <thomas.munro@gmail.com> writes:
Here's an experimental patch to fix our shutdown strategy on
postmaster death, as discussed in a nearby report[1].
Thanks for tackling this topic.
For systems lacking that facility, the idea I'm trying out here is
that backends that detect the condition in WaitEventSetWait() should
themselves blast all backends with SIGQUIT, in a sense taking over the
role of the departed postmaster.
Hmm. Up to now, we have not had an assumption that postmaster
children are aware of every other postmaster child. In particular,
not all postmaster children have PGPROC entries. How much does
this matter? What happens if the shared PGPROC array is corrupt?
I didn't really want any
consensus/negotiation over who's going to do that, so... they all do.
Agreed on that point.
Most of the patch is just removing hundreds of lines of errors and
conditions and comments that were now unreachable.
The patch would likely be a lot more readable if you split out the
"delete unreachable code" part into a separate step.
regards, tom lane
Thomas Munro <thomas.munro@gmail.com> writes:
Following that line of thinking, we might as well just ask the kernel
to hit our existing SIGQUIT handler at parent exit, on Linux/FreeBSD.
Job done.
One other thought here: do we *really* want such a critical-and-hard-
to-test aspect of our behavior to be handled completely differently
on different platforms? I'd lean to ignoring the Linux/FreeBSD
facilities, because otherwise we're basically doubling our testing
problems in exchange for not much.
regards, tom lane
On Thu, Aug 21, 2025 at 5:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hmm. Up to now, we have not had an assumption that postmaster
children are aware of every other postmaster child. In particular,
not all postmaster children have PGPROC entries. How much does
this matter? What happens if the shared PGPROC array is corrupt?
It's also how we set latches, but yeah it's certainly an issue.
Other ideas:
1. My other patch that used O_ASYNC (= ask the kernel to send SIGIO
when the pipe becomes readable) worked, but required a pipe or socket
pair per backend and is not actually in any standard. I think it is
available almost everywhere anyway. I could rejuvenate that just to
try out again.
2. I wonder if we could make better use of session IDs. I understand
that we use them to signal eg archiver + its children, but I wonder if
we could use a different granularity. postmaster's sid for most
stuff, and per-backend sids when really needed, and then you just have
to signal a small number of sessions, perhaps more than one but not
much more. We pretend that setsid is optional but it's old POSIX and
everywhere. I also know that Windows has a similar thing, I just
haven't looked into it.
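
To make idea 1 concrete, here is a minimal sketch of signal-driven I/O on a pipe's read end. It is not the earlier O_ASYNC patch: the names arm_sigio_on_fd, on_sigio and parent_pipe_ready are hypothetical, and it assumes a platform where O_ASYNC works on pipes and where end-of-file (all writers gone) counts as the descriptor becoming readable.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_ASYNC on glibc in strict modes */
#endif
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler; a real backend would act on it (hypothetical name). */
static volatile sig_atomic_t parent_pipe_ready = 0;

static void
on_sigio(int signo)
{
    (void) signo;
    parent_pipe_ready = 1;
}

/*
 * Ask the kernel to send SIGIO to this process when 'fd' becomes readable.
 * The intended use is the read end of a pipe whose write end is held only
 * by the parent process.
 *
 * Returns 0 on success, -1 on failure.
 */
int
arm_sigio_on_fd(int fd)
{
    struct sigaction sa;
    int         flags;

    /* Install the handler first, since SIGIO's default action is to kill. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;

    /* Direct the signal at this process. */
    if (fcntl(fd, F_SETOWN, getpid()) < 0)
        return -1;

    /* Turn on signal-driven I/O for the descriptor. */
    flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_ASYNC) < 0)
        return -1;

    return 0;
}
```

The owner set with F_SETOWN is a property of the open file description, which every process inheriting the descriptor shares. That is presumably why the approach needs a pipe or socket pair per backend.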
Most of the patch is just removing hundreds of lines of errors and
conditions and comments that were now unreachable.
The patch would likely be a lot more readable if you split out the
"delete unreachable code" part into a separate step.
Will do.
On Thu, Aug 21, 2025 at 5:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
One other thought here: do we *really* want such a critical-and-hard-
to-test aspect of our behavior to be handled completely differently
on different platforms? I'd lean to ignoring the Linux/FreeBSD
facilities, because otherwise we're basically doubling our testing
problems in exchange for not much.
Yeah. The attraction is that it's extremely simple and reliable:
set-and-forget, adding one line that sends you into well tested
immediate shutdown code. Combined with the fact that most of our user
base has it, that seemed attractive. The reliability aspects I was
thinking of are: (1) the kernel's knowledge of the process tree is
infallible by definition, (2) it's handled asynchronously on
postmaster exit, not after a POLLHUP, EVFILT_PROC, or process
HANDLE event that must be consumed synchronously by at least one
child.
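
For readers unfamiliar with the facility, the set-and-forget request can be sketched roughly as follows. This is an illustration only, not the patch's PostmasterDeathSignalInit(): the function name request_parent_death_signal and its expected_parent parameter (standing in for PostmasterPid) are hypothetical. Only prctl(PR_SET_PDEATHSIG) on Linux and procctl(PROC_PDEATHSIG_CTL) on FreeBSD are real interfaces.

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

#if defined(__linux__)
#include <sys/prctl.h>
#elif defined(__FreeBSD__)
#include <sys/procctl.h>
#endif

/*
 * Ask the kernel to send 'signo' to this process when its parent exits.
 * Passing signo 0 cancels an earlier request.
 *
 * Returns 0 if the request was registered, 1 if it was registered but the
 * parent already appears to be gone, and -1 if the facility is missing or
 * the call failed.
 */
int
request_parent_death_signal(int signo, pid_t expected_parent)
{
    int         rc;

#if defined(__linux__)
    rc = prctl(PR_SET_PDEATHSIG, (unsigned long) signo);
#elif defined(__FreeBSD__)
    rc = procctl(P_PID, 0, PROC_PDEATHSIG_CTL, &signo);
#else
    (void) signo;
    rc = -1;                    /* no such facility on this platform */
#endif
    if (rc < 0)
        return -1;

    /*
     * The parent might have exited before the request was registered, in
     * which case no signal will ever arrive.  Close that window by checking
     * who our parent is now.
     */
    if (getppid() != expected_parent)
        return 1;

    return 0;
}
```

A child would call this once, early after fork, with SIGQUIT and the parent's pid. A return value of 1 means the caller has to start the shutdown path itself.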
For (2), in practice I think it's close to 100% certain that one
backend will currently or very soon be in WaitEventSetWait() and thus
drive the cleanup operation, and I think it's probably good enough.
For example, even if your backends are all busy, there's basically
always a bunch of "launchers" and other auxiliary processes ready and
waiting to deal with it. But it's possible to dream up extreme
theoretical scenarios where that bet fails: imagine if every single
backend except for one is currently waiting for a lock in sem_wait()
(let's say it's the same lock for simplicity). I previously said in
some throwaway comment that they can't all be blocked in sem_wait() or
you already have a deadlock (a programming bug that isn't this
system's fault), but if the postmaster AND the backend that holds the
lock are killed by the OOM killer, you lose. Those backends would
need to be cleaned up manually by an administrator in all released
versions of PostgreSQL, and it'd be no better with the v1 patch on
Windows and macOS. They'd all eat SIGQUIT on a Linux or FreeBSD
system with the v1 patch, so on paper at least it's more hole-proof.
I agree that it would be nice to have just one system though, and of
course to make it completely reliable everywhere without complicated
theories.
One argument I thought of against PROC_PDEATHSIG_CTL is that its
simplicity also takes away some possibilities. Yesterday I wrote
"taking over the role of the departed postmaster", and realised it's
not the whole enchilada: do we also want the "issuing SIGKILL to
recalcitrant children" bit? I don't want this system to be
complicated, rather the opposite, but I wonder if there is a nice way
to make it run *literally* the same code as the postmaster. We'd need
bulletproof data structure sharing, or preferably, no sharing of
modifiable data at all. Some ideas I'm looking into: better use of
process groups, or maybe doing the bookkeeping in memory that is not
even mapped into children until they need it. Or something.
Researching...
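
As a rough illustration of the process-group idea, signalling a whole group needs no shared list of children: one kill() with a negative pid reaches every member. The helper name signal_process_group below is hypothetical, and the sketch assumes the group was created earlier by its leader calling setsid() (or setpgid()), so that the group id equals the leader's pid.

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Deliver 'signo' to every process in the group led by 'leader' with a
 * single kill() call.  A negative pid argument addresses a process group.
 * Passing signo 0 performs only the existence and permission checks.
 *
 * Returns 0 on success, -1 on failure (errno is set by kill() unless the
 * argument was rejected here).
 */
int
signal_process_group(pid_t leader, int signo)
{
    /*
     * kill(0, ...) means "my own group" and kill(-1, ...) means "every
     * process I may signal", so refuse leader values that would turn into
     * either of those.
     */
    if (leader <= 1)
        return -1;

    return kill(-leader, signo);
}
```

The kernel does the group-membership bookkeeping, so nothing modifiable has to live in shared memory. What a group cannot express on its own is the follow-up step of escalating to SIGKILL for children that ignore SIGQUIT.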