Refactoring postmaster's code to cleanup after child exit

Started by Heikki Linnakangas over 1 year ago, 43 messages

hlinnaka@iki.fi

over 1 year ago

4 attachment(s)

Reading through postmaster code, I spotted some refactoring
opportunities to make it slightly more readable.

Currently, when a child process exits, the postmaster first scans
through BackgroundWorkerList to see if it was a bgworker process. If not
found, it scans through the BackendList to see if it was a backend
process (which it really should be then). That feels a bit silly,
because every running background worker process also has an entry in
BackendList. There's a lot of duplication between
CleanupBackgroundWorker and CleanupBackend.

Before commit 8a02b3d732, we used to create Backend entries only for
background worker processes that connected to a database, not for other
background worker processes. I think that's why we have the code
structure we have. But now that we have a Backend entry for all bgworker
processes, it's more natural to have a single function to deal with both
regular backends and bgworkers.
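
To make the proposed control flow concrete, here is a minimal sketch of a single-list lookup under simplified assumptions. The types `Child` and `Worker` and the functions `remove_child()` and `cleanup_child()` are hypothetical stand-ins for `Backend`, `RegisteredBgWorker` and the postmaster code, not PostgreSQL's actual definitions. Every child is found with one scan of one list, and a non-NULL back-pointer marks the background workers that need extra bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical, heavily simplified stand-in for RegisteredBgWorker. */
typedef struct Worker
{
    const char *name;
    pid_t       pid;            /* 0 if not running */
} Worker;

/* Hypothetical stand-in for a Backend entry; every child has one. */
typedef struct Child
{
    pid_t         pid;
    Worker       *rw;           /* non-NULL only for background workers */
    struct Child *next;
} Child;

/*
 * Unlink the entry for 'pid' from the list and return it, or return NULL if
 * the PID is untracked.  One scan covers both regular backends and bgworkers.
 */
static Child *
remove_child(Child **head, pid_t pid)
{
    for (Child **link = head; *link != NULL; link = &(*link)->next)
    {
        if ((*link)->pid == pid)
        {
            Child *found = *link;

            *link = found->next;
            return found;
        }
    }
    return NULL;
}

/*
 * Common cleanup for any child, plus bgworker-specific bookkeeping when the
 * back-pointer is set.  Frees the entry; returns true if it was a bgworker.
 */
static bool
cleanup_child(Child *c)
{
    bool        was_bgworker = (c->rw != NULL);

    if (was_bgworker)
    {
        assert(c->rw->pid == c->pid);
        c->rw->pid = 0;         /* the registered worker is no longer running */
    }
    free(c);
    return was_bgworker;
}
```

In the patch series itself, the lookup lives in process_pm_child_exit() and the per-child bookkeeping in CleanupBackend().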

So I came up with the attached patches. This doesn't make any meaningful
user-visible changes, except for some incidental changes in log messages
(see commit message for details).
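
The exit-status rules that the refactored CleanupBackend() applies (described in the patch 4 commit message and code comments) can be summarized as a small classification function. This is a sketch under stated assumptions: `exit_status_0()`, `exit_status_1()`, `classify_exit()` and `run_child_exit()` are hypothetical helpers modeled on, not copied from, the EXIT_STATUS_0/EXIT_STATUS_1 checks in postmaster.c. It assumes POSIX fork()/waitpid() and ignores the Windows ERROR_WAIT_NO_CHILDREN special case.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helpers modeled on postmaster.c's EXIT_STATUS_0/1 checks. */
static bool
exit_status_0(int st)
{
    return st == 0;
}

static bool
exit_status_1(int st)
{
    return WIFEXITED(st) && WEXITSTATUS(st) == 1;
}

typedef enum
{
    EXIT_CLEAN,                 /* normal exit; an exit-0 bgworker is forgotten */
    EXIT_RESTART_WORKER,        /* bgworker exit 1: eligible for restart later */
    EXIT_CRASH                  /* anything else: crash-restart all children */
} ExitAction;

static ExitAction
classify_exit(int st, bool is_bgworker)
{
    if (!exit_status_0(st) && !exit_status_1(st))
        return EXIT_CRASH;
    if (is_bgworker && !exit_status_0(st))
        return EXIT_RESTART_WORKER;
    return EXIT_CLEAN;
}

/* Fork a child that exits with 'code' and return its raw wait status. */
static int
run_child_exit(int code)
{
    int         st = -1;
    pid_t       pid = fork();

    if (pid == 0)
        _exit(code);            /* child: exit immediately */
    assert(pid > 0);
    if (waitpid(pid, &st, 0) != pid)
        return -1;
    return st;
}
```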

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v1-0001-Fix-outdated-comment-all-running-bgworkers-are-in.patch (text/x-patch; charset=UTF-8)

From dd0ee8533a3ab5d037ca7c070bf89ad94c96d4b2 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sat, 6 Jul 2024 20:55:19 +0300
Subject: [PATCH v1 1/4] Fix outdated comment; all running bgworkers are in
 BackendList

Before commit 8a02b3d732, only bgworkers that connected to a database
had an entry in the BackendList. Commit 8a02b3d732 changed that, but
forgot to update this comment.
---
 src/include/postmaster/bgworker_internals.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index 9106a0ef3f..61ba54117a 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -26,14 +26,14 @@
 /*
  * List of background workers, private to postmaster.
  *
- * A worker that requests a database connection during registration will have
- * rw_backend set, and will be present in BackendList.  Note: do not rely on
- * rw_backend being non-NULL for shmem-connected workers!
+ * All workers that are currently running will have rw_backend set, and will
+ * be present in BackendList.
  */
 typedef struct RegisteredBgWorker
 {
 	BackgroundWorker rw_worker; /* its registry entry */
-	struct bkend *rw_backend;	/* its BackendList entry, or NULL */
+	struct bkend *rw_backend;	/* its BackendList entry, or NULL if not
+								 * running */
 	pid_t		rw_pid;			/* 0 if not running */
 	int			rw_child_slot;
 	TimestampTz rw_crashed_at;	/* if not 0, time it last crashed */
-- 
2.39.2

v1-0002-Minor-refactoring-of-assign_backendlist_entry.patch (text/x-patch; charset=UTF-8)

From f0646000628359a1ce2a01be25fe993b50aeb396 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sat, 6 Jul 2024 20:55:28 +0300
Subject: [PATCH v1 2/4] Minor refactoring of assign_backendlist_entry()

Make assign_backendlist_entry() responsible just for allocating
Backend struct. Linking it to the RegisteredBgWorker is the caller's
responsibility now. Seems more clear that way.
---
 src/backend/postmaster/postmaster.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6f974a8d21..ac54798965 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -422,7 +422,7 @@ static void TerminateChildren(int signal);
 #define SignalChildren(sig)			   SignalSomeChildren(sig, BACKEND_TYPE_ALL)
 
 static int	CountChildren(int target);
-static bool assign_backendlist_entry(RegisteredBgWorker *rw);
+static Backend *assign_backendlist_entry(void);
 static void maybe_start_bgworkers(void);
 static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
 static pid_t StartChildProcess(BackendType type);
@@ -4160,6 +4160,7 @@ MaxLivePostmasterChildren(void)
 static bool
 do_start_bgworker(RegisteredBgWorker *rw)
 {
+	Backend    *bn;
 	pid_t		worker_pid;
 
 	Assert(rw->rw_pid == 0);
@@ -4174,11 +4175,14 @@ do_start_bgworker(RegisteredBgWorker *rw)
 	 * tried again right away, most likely we'd find ourselves hitting the
 	 * same resource-exhaustion condition.
 	 */
-	if (!assign_backendlist_entry(rw))
+	bn = assign_backendlist_entry();
+	if (bn == NULL)
 	{
 		rw->rw_crashed_at = GetCurrentTimestamp();
 		return false;
 	}
+	rw->rw_backend = bn;
+	rw->rw_child_slot = bn->child_slot;
 
 	ereport(DEBUG1,
 			(errmsg_internal("starting background worker process \"%s\"",
@@ -4254,12 +4258,10 @@ bgworker_should_start_now(BgWorkerStartTime start_time)
  * Allocate the Backend struct for a connected background worker, but don't
  * add it to the list of backends just yet.
  *
- * On failure, return false without changing any worker state.
- *
- * Some info from the Backend is copied into the passed rw.
+ * On failure, return NULL.
  */
-static bool
-assign_backendlist_entry(RegisteredBgWorker *rw)
+static Backend *
+assign_backendlist_entry(void)
 {
 	Backend    *bn;
 
@@ -4273,7 +4275,7 @@ assign_backendlist_entry(RegisteredBgWorker *rw)
 		ereport(LOG,
 				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
 				 errmsg("no slot available for new background worker process")));
-		return false;
+		return NULL;
 	}
 
 	/*
@@ -4287,7 +4289,7 @@ assign_backendlist_entry(RegisteredBgWorker *rw)
 		ereport(LOG,
 				(errcode(ERRCODE_INTERNAL_ERROR),
 				 errmsg("could not generate random cancel key")));
-		return false;
+		return NULL;
 	}
 
 	bn = palloc_extended(sizeof(Backend), MCXT_ALLOC_NO_OOM);
@@ -4296,7 +4298,7 @@ assign_backendlist_entry(RegisteredBgWorker *rw)
 		ereport(LOG,
 				(errcode(ERRCODE_OUT_OF_MEMORY),
 				 errmsg("out of memory")));
-		return false;
+		return NULL;
 	}
 
 	bn->cancel_key = MyCancelKey;
@@ -4305,10 +4307,7 @@ assign_backendlist_entry(RegisteredBgWorker *rw)
 	bn->dead_end = false;
 	bn->bgworker_notify = false;
 
-	rw->rw_backend = bn;
-	rw->rw_child_slot = bn->child_slot;
-
-	return true;
+	return bn;
 }
 
 /*
-- 
2.39.2
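
Patch 2 separates allocation from linking. A minimal sketch of that split follows, with hypothetical names (`Entry`, `Reg`, `alloc_entry()`, `start_worker()`) standing in for `Backend`, `RegisteredBgWorker`, assign_backendlist_entry() and do_start_bgworker(). The allocator only returns a new entry or NULL, so a failure leaves the worker untouched apart from the back-off timestamp that the caller records.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Backend entry. */
typedef struct Entry
{
    int         child_slot;
} Entry;

/* Hypothetical stand-in for RegisteredBgWorker. */
typedef struct Reg
{
    Entry      *backend;        /* NULL if not running */
    int         child_slot;
    long        crashed_at;     /* if not 0, time of the last failed start */
} Reg;

/*
 * Allocation only: return a new entry, or NULL on failure.  It does not touch
 * any worker state, so callers need no rollback.  A 'free_slot' <= 0
 * simulates "no slot available".
 */
static Entry *
alloc_entry(int free_slot)
{
    Entry      *e;

    if (free_slot <= 0)
        return NULL;
    e = malloc(sizeof(Entry));
    if (e == NULL)
        return NULL;
    e->child_slot = free_slot;
    return e;
}

/* The caller owns the linking step, as in the refactored do_start_bgworker(). */
static bool
start_worker(Reg *rw, int free_slot, long now)
{
    Entry      *e;

    assert(rw->backend == NULL);    /* must not already be running */
    e = alloc_entry(free_slot);
    if (e == NULL)
    {
        rw->crashed_at = now;       /* back off instead of retrying at once */
        return false;
    }
    rw->backend = e;
    rw->child_slot = e->child_slot;
    return true;
}
```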

v1-0003-Make-BackgroundWorkerList-doubly-linked.patch (text/x-patch; charset=UTF-8)

From d9873670381f3c523d4c32dd58d475f2eaeb7a94 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sat, 6 Jul 2024 20:56:51 +0300
Subject: [PATCH v1 3/4] Make BackgroundWorkerList doubly-linked

This allows ForgetBackgroundWorker() and ReportBackgroundWorkerExit()
to take a RegisteredBgWorker pointer as argument, rather than a list
iterator. That feels a little more natural. But more importantly, this
paves the way for more refactoring in the next commit.
---
 src/backend/postmaster/bgworker.c           | 62 ++++++++++-----------
 src/backend/postmaster/postmaster.c         | 40 ++++++-------
 src/include/postmaster/bgworker_internals.h | 10 ++--
 3 files changed, 54 insertions(+), 58 deletions(-)

diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 77707bb384..981d8177b0 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -37,7 +37,7 @@
 /*
  * The postmaster's list of registered background workers, in private memory.
  */
-slist_head	BackgroundWorkerList = SLIST_STATIC_INIT(BackgroundWorkerList);
+dlist_head	BackgroundWorkerList = DLIST_STATIC_INIT(BackgroundWorkerList);
 
 /*
  * BackgroundWorkerSlots exist in shared memory and can be accessed (via
@@ -168,7 +168,7 @@ BackgroundWorkerShmemInit(void)
 										   &found);
 	if (!IsUnderPostmaster)
 	{
-		slist_iter	siter;
+		dlist_iter	iter;
 		int			slotno = 0;
 
 		BackgroundWorkerData->total_slots = max_worker_processes;
@@ -181,12 +181,12 @@ BackgroundWorkerShmemInit(void)
 		 * correspondence between the postmaster's private list and the array
 		 * in shared memory.
 		 */
-		slist_foreach(siter, &BackgroundWorkerList)
+		dlist_foreach(iter, &BackgroundWorkerList)
 		{
 			BackgroundWorkerSlot *slot = &BackgroundWorkerData->slot[slotno];
 			RegisteredBgWorker *rw;
 
-			rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
+			rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 			Assert(slotno < max_worker_processes);
 			slot->in_use = true;
 			slot->terminate = false;
@@ -220,13 +220,13 @@ BackgroundWorkerShmemInit(void)
 static RegisteredBgWorker *
 FindRegisteredWorkerBySlotNumber(int slotno)
 {
-	slist_iter	siter;
+	dlist_iter	iter;
 
-	slist_foreach(siter, &BackgroundWorkerList)
+	dlist_foreach(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 		if (rw->rw_shmem_slot == slotno)
 			return rw;
 	}
@@ -413,29 +413,25 @@ BackgroundWorkerStateChange(bool allow_new_workers)
 				(errmsg_internal("registering background worker \"%s\"",
 								 rw->rw_worker.bgw_name)));
 
-		slist_push_head(&BackgroundWorkerList, &rw->rw_lnode);
+		dlist_push_head(&BackgroundWorkerList, &rw->rw_lnode);
 	}
 }
 
 /*
  * Forget about a background worker that's no longer needed.
  *
- * The worker must be identified by passing an slist_mutable_iter that
- * points to it.  This convention allows deletion of workers during
- * searches of the worker list, and saves having to search the list again.
+ * NOTE: The entry is unlinked from BackgroundWorkerList.  If the caller is
+ * iterating through it, better use a mutable iterator!
  *
  * Caller is responsible for notifying bgw_notify_pid, if appropriate.
  *
  * This function must be invoked only in the postmaster.
  */
 void
-ForgetBackgroundWorker(slist_mutable_iter *cur)
+ForgetBackgroundWorker(RegisteredBgWorker *rw)
 {
-	RegisteredBgWorker *rw;
 	BackgroundWorkerSlot *slot;
 
-	rw = slist_container(RegisteredBgWorker, rw_lnode, cur->cur);
-
 	Assert(rw->rw_shmem_slot < max_worker_processes);
 	slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
 	Assert(slot->in_use);
@@ -454,7 +450,7 @@ ForgetBackgroundWorker(slist_mutable_iter *cur)
 			(errmsg_internal("unregistering background worker \"%s\"",
 							 rw->rw_worker.bgw_name)));
 
-	slist_delete_current(cur);
+	dlist_delete(&rw->rw_lnode);
 	pfree(rw);
 }
 
@@ -480,17 +476,17 @@ ReportBackgroundWorkerPID(RegisteredBgWorker *rw)
  * Report that the PID of a background worker is now zero because a
  * previously-running background worker has exited.
  *
+ * NOTE: The entry may be unlinked from BackgroundWorkerList.  If the caller
+ * is iterating through it, better use a mutable iterator!
+ *
  * This function should only be called from the postmaster.
  */
 void
-ReportBackgroundWorkerExit(slist_mutable_iter *cur)
+ReportBackgroundWorkerExit(RegisteredBgWorker *rw)
 {
-	RegisteredBgWorker *rw;
 	BackgroundWorkerSlot *slot;
 	int			notify_pid;
 
-	rw = slist_container(RegisteredBgWorker, rw_lnode, cur->cur);
-
 	Assert(rw->rw_shmem_slot < max_worker_processes);
 	slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
 	slot->pid = rw->rw_pid;
@@ -505,7 +501,7 @@ ReportBackgroundWorkerExit(slist_mutable_iter *cur)
 	 */
 	if (rw->rw_terminate ||
 		rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
-		ForgetBackgroundWorker(cur);
+		ForgetBackgroundWorker(rw);
 
 	if (notify_pid != 0)
 		kill(notify_pid, SIGUSR1);
@@ -519,13 +515,13 @@ ReportBackgroundWorkerExit(slist_mutable_iter *cur)
 void
 BackgroundWorkerStopNotifications(pid_t pid)
 {
-	slist_iter	siter;
+	dlist_iter	iter;
 
-	slist_foreach(siter, &BackgroundWorkerList)
+	dlist_foreach(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 		if (rw->rw_worker.bgw_notify_pid == pid)
 			rw->rw_worker.bgw_notify_pid = 0;
 	}
@@ -546,14 +542,14 @@ BackgroundWorkerStopNotifications(pid_t pid)
 void
 ForgetUnstartedBackgroundWorkers(void)
 {
-	slist_mutable_iter iter;
+	dlist_mutable_iter iter;
 
-	slist_foreach_modify(iter, &BackgroundWorkerList)
+	dlist_foreach_modify(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 		BackgroundWorkerSlot *slot;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 		Assert(rw->rw_shmem_slot < max_worker_processes);
 		slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
 
@@ -564,7 +560,7 @@ ForgetUnstartedBackgroundWorkers(void)
 			/* ... then zap it, and notify the waiter */
 			int			notify_pid = rw->rw_worker.bgw_notify_pid;
 
-			ForgetBackgroundWorker(&iter);
+			ForgetBackgroundWorker(rw);
 			if (notify_pid != 0)
 				kill(notify_pid, SIGUSR1);
 		}
@@ -584,13 +580,13 @@ ForgetUnstartedBackgroundWorkers(void)
 void
 ResetBackgroundWorkerCrashTimes(void)
 {
-	slist_mutable_iter iter;
+	dlist_mutable_iter iter;
 
-	slist_foreach_modify(iter, &BackgroundWorkerList)
+	dlist_foreach_modify(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 
 		if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
 		{
@@ -601,7 +597,7 @@ ResetBackgroundWorkerCrashTimes(void)
 			 * parallel_terminate_count will get incremented after we've
 			 * already zeroed parallel_register_count, which would be bad.)
 			 */
-			ForgetBackgroundWorker(&iter);
+			ForgetBackgroundWorker(rw);
 		}
 		else
 		{
@@ -1036,7 +1032,7 @@ RegisterBackgroundWorker(BackgroundWorker *worker)
 	rw->rw_crashed_at = 0;
 	rw->rw_terminate = false;
 
-	slist_push_head(&BackgroundWorkerList, &rw->rw_lnode);
+	dlist_push_head(&BackgroundWorkerList, &rw->rw_lnode);
 }
 
 /*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ac54798965..f376d3b77b 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1543,7 +1543,7 @@ DetermineSleepTime(void)
 
 	if (HaveCrashedWorker)
 	{
-		slist_mutable_iter siter;
+		dlist_mutable_iter iter;
 
 		/*
 		 * When there are crashed bgworkers, we sleep just long enough that
@@ -1551,12 +1551,12 @@ DetermineSleepTime(void)
 		 * determine the minimum of all wakeup times according to most recent
 		 * crash time and requested restart interval.
 		 */
-		slist_foreach_modify(siter, &BackgroundWorkerList)
+		dlist_foreach_modify(iter, &BackgroundWorkerList)
 		{
 			RegisteredBgWorker *rw;
 			TimestampTz this_wakeup;
 
-			rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
+			rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 
 			if (rw->rw_crashed_at == 0)
 				continue;
@@ -1564,7 +1564,7 @@ DetermineSleepTime(void)
 			if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART
 				|| rw->rw_terminate)
 			{
-				ForgetBackgroundWorker(&siter);
+				ForgetBackgroundWorker(rw);
 				continue;
 			}
 
@@ -2696,13 +2696,13 @@ CleanupBackgroundWorker(int pid,
 						int exitstatus) /* child's exit status */
 {
 	char		namebuf[MAXPGPATH];
-	slist_mutable_iter iter;
+	dlist_mutable_iter iter;
 
-	slist_foreach_modify(iter, &BackgroundWorkerList)
+	dlist_foreach_modify(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 
 		if (rw->rw_pid != pid)
 			continue;
@@ -2768,7 +2768,7 @@ CleanupBackgroundWorker(int pid,
 		rw->rw_backend = NULL;
 		rw->rw_pid = 0;
 		rw->rw_child_slot = 0;
-		ReportBackgroundWorkerExit(&iter);	/* report child death */
+		ReportBackgroundWorkerExit(rw); /* report child death */
 
 		LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG,
 					 namebuf, pid, exitstatus);
@@ -2873,8 +2873,8 @@ CleanupBackend(int pid,
 static void
 HandleChildCrash(int pid, int exitstatus, const char *procname)
 {
-	dlist_mutable_iter iter;
-	slist_iter	siter;
+	dlist_iter	iter;
+	dlist_mutable_iter miter;
 	Backend    *bp;
 	bool		take_action;
 
@@ -2896,11 +2896,11 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 	}
 
 	/* Process background workers. */
-	slist_foreach(siter, &BackgroundWorkerList)
+	dlist_foreach(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 		if (rw->rw_pid == 0)
 			continue;			/* not running */
 		if (rw->rw_pid == pid)
@@ -2933,9 +2933,9 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 	}
 
 	/* Process regular backends */
-	dlist_foreach_modify(iter, &BackendList)
+	dlist_foreach_modify(miter, &BackendList)
 	{
-		bp = dlist_container(Backend, elem, iter.cur);
+		bp = dlist_container(Backend, elem, miter.cur);
 
 		if (bp->pid == pid)
 		{
@@ -2949,7 +2949,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 				ShmemBackendArrayRemove(bp);
 #endif
 			}
-			dlist_delete(iter.cur);
+			dlist_delete(miter.cur);
 			pfree(bp);
 			/* Keep looping so we can signal remaining backends */
 		}
@@ -4327,7 +4327,7 @@ maybe_start_bgworkers(void)
 #define MAX_BGWORKERS_TO_LAUNCH 100
 	int			num_launched = 0;
 	TimestampTz now = 0;
-	slist_mutable_iter iter;
+	dlist_mutable_iter iter;
 
 	/*
 	 * During crash recovery, we have no need to be called until the state
@@ -4344,11 +4344,11 @@ maybe_start_bgworkers(void)
 	StartWorkerNeeded = false;
 	HaveCrashedWorker = false;
 
-	slist_foreach_modify(iter, &BackgroundWorkerList)
+	dlist_foreach_modify(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 
 		/* ignore if already running */
 		if (rw->rw_pid != 0)
@@ -4357,7 +4357,7 @@ maybe_start_bgworkers(void)
 		/* if marked for death, clean up and remove from list */
 		if (rw->rw_terminate)
 		{
-			ForgetBackgroundWorker(&iter);
+			ForgetBackgroundWorker(rw);
 			continue;
 		}
 
@@ -4376,7 +4376,7 @@ maybe_start_bgworkers(void)
 
 				notify_pid = rw->rw_worker.bgw_notify_pid;
 
-				ForgetBackgroundWorker(&iter);
+				ForgetBackgroundWorker(rw);
 
 				/* Report worker is gone now. */
 				if (notify_pid != 0)
diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index 61ba54117a..e55e38af65 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -39,17 +39,17 @@ typedef struct RegisteredBgWorker
 	TimestampTz rw_crashed_at;	/* if not 0, time it last crashed */
 	int			rw_shmem_slot;
 	bool		rw_terminate;
-	slist_node	rw_lnode;		/* list link */
+	dlist_node	rw_lnode;		/* list link */
 } RegisteredBgWorker;
 
-extern PGDLLIMPORT slist_head BackgroundWorkerList;
+extern PGDLLIMPORT dlist_head BackgroundWorkerList;
 
 extern Size BackgroundWorkerShmemSize(void);
 extern void BackgroundWorkerShmemInit(void);
 extern void BackgroundWorkerStateChange(bool allow_new_workers);
-extern void ForgetBackgroundWorker(slist_mutable_iter *cur);
-extern void ReportBackgroundWorkerPID(RegisteredBgWorker *);
-extern void ReportBackgroundWorkerExit(slist_mutable_iter *cur);
+extern void ForgetBackgroundWorker(RegisteredBgWorker *rw);
+extern void ReportBackgroundWorkerPID(RegisteredBgWorker *rw);
+extern void ReportBackgroundWorkerExit(RegisteredBgWorker *rw);
 extern void BackgroundWorkerStopNotifications(pid_t pid);
 extern void ForgetUnstartedBackgroundWorkers(void);
 extern void ResetBackgroundWorkerCrashTimes(void);
-- 
2.39.2
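
Patch 3 relies on the fact that a doubly-linked node can unlink itself in O(1), so ForgetBackgroundWorker() can take the element rather than a list iterator. The sketch below illustrates this with a tiny circular list that uses a sentinel head. It is loosely modeled on the idea behind PostgreSQL's dlist in lib/ilist.h, but all names here (`node`, `list_delete()`, `forget_terminated()`, and so on) are hypothetical. It also shows why callers that delete while iterating need the equivalent of dlist_foreach_modify: the successor must be read before the current element is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Intrusive list link embedded in each element. */
typedef struct node
{
    struct node *prev;
    struct node *next;
} node;

/* Recover the containing struct from a pointer to its embedded node. */
#define container_of(ptr, type, member) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))

static void
list_init(node *head)
{
    head->prev = head->next = head;     /* empty circular list */
}

static void
list_push_head(node *head, node *n)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* O(1) unlink given only the node: no iterator or predecessor needed. */
static void
list_delete(node *n)
{
    assert(n->prev->next == n && n->next->prev == n);
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Hypothetical stand-in for RegisteredBgWorker. */
typedef struct Worker
{
    bool        terminate;      /* marked for removal */
    node        link;
} Worker;

/* Takes the element itself, like ForgetBackgroundWorker(rw) after patch 3. */
static void
forget_worker(Worker *w)
{
    list_delete(&w->link);
    free(w);
}

/* 'nxt' is read before the body may free 'cur', as a mutable iterator does. */
static int
forget_terminated(node *head)
{
    int         removed = 0;

    for (node *cur = head->next, *nxt = cur->next; cur != head;
         cur = nxt, nxt = cur->next)
    {
        Worker     *w = container_of(cur, Worker, link);

        if (w->terminate)
        {
            forget_worker(w);
            removed++;
        }
    }
    return removed;
}
```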

v1-0004-Refactor-code-to-handle-death-of-a-backend-or-bgw.patch (text/x-patch; charset=UTF-8)

From 8ba618344a13b166eb8f343d5a3c730a3eb3feb0 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sat, 6 Jul 2024 21:49:00 +0300
Subject: [PATCH v1 4/4] Refactor code to handle death of a backend or bgworker
 in postmaster

Currently, when a child process exits, the postmaster first scans
through BackgroundWorkerList to see if the child process was a
background worker. If not found, then it scans through BackendList to
see if it was a regular backend. That leads to some duplication
between the bgworker and regular backend cleanup code, as both have an
entry in the BackendList that needs to be cleaned up in the same way.
Refactor that so that we scan just the BackendList to find the child
process, and if it was a background worker, do the additional
bgworker-specific cleanup in addition to the normal Backend cleanup.

Change HandleChildCrash so that it doesn't try to handle the cleanup
of the process that already exited, only the signaling of all the
other processes. When called for any of the aux processes, the caller
cleared the *PID global variable, so the code in HandleChildCrash() to
do that was unused.

On Windows, if a child process exits with ERROR_WAIT_NO_CHILDREN, it's
now logged with that exit code, instead of 0. Also, if a bgworker
exits with ERROR_WAIT_NO_CHILDREN, it's now treated as crashed and is
restarted. Previously it was treated as a normal exit.

If a child process is not found in the BackendList, the log message
now calls it "untracked child process" rather than "server process".
Arguably that should be a PANIC, because we do track all the child
processes in the list, so failing to find a child process is highly
unexpected. But if we want to change that, let's discuss and do that
as a separate commit.
---
 src/backend/postmaster/bgworker.c           |   4 -
 src/backend/postmaster/postmaster.c         | 448 +++++++-------------
 src/include/postmaster/bgworker_internals.h |   7 +-
 3 files changed, 166 insertions(+), 293 deletions(-)

diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 981d8177b0..b83967cda3 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -401,9 +401,7 @@ BackgroundWorkerStateChange(bool allow_new_workers)
 		}
 
 		/* Initialize postmaster bookkeeping. */
-		rw->rw_backend = NULL;
 		rw->rw_pid = 0;
-		rw->rw_child_slot = 0;
 		rw->rw_crashed_at = 0;
 		rw->rw_shmem_slot = slotno;
 		rw->rw_terminate = false;
@@ -1026,9 +1024,7 @@ RegisterBackgroundWorker(BackgroundWorker *worker)
 	}
 
 	rw->rw_worker = *worker;
-	rw->rw_backend = NULL;
 	rw->rw_pid = 0;
-	rw->rw_child_slot = 0;
 	rw->rw_crashed_at = 0;
 	rw->rw_terminate = false;
 
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index f376d3b77b..35c92bfc26 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -172,6 +172,7 @@ typedef struct bkend
 	int			child_slot;		/* PMChildSlot for this backend, if any */
 	int			bkend_type;		/* child process flavor, see above */
 	bool		dead_end;		/* is it going to send an error and quit? */
+	RegisteredBgWorker *rw;		/* bgworker info, if this is a bgworker */
 	bool		bgworker_notify;	/* gets bgworker start/stop notifications */
 	dlist_node	elem;			/* list link in BackendList */
 } Backend;
@@ -401,8 +402,7 @@ static void process_pm_child_exit(void);
 static void process_pm_reload_request(void);
 static void process_pm_shutdown_request(void);
 static void dummy_handler(SIGNAL_ARGS);
-static void CleanupBackend(int pid, int exitstatus);
-static bool CleanupBackgroundWorker(int pid, int exitstatus);
+static void CleanupBackend(Backend *bp, int exitstatus);
 static void HandleChildCrash(int pid, int exitstatus, const char *procname);
 static void LogChildExit(int lev, const char *procname,
 						 int pid, int exitstatus);
@@ -2362,6 +2362,9 @@ process_pm_child_exit(void)
 
 	while ((pid = waitpid(-1, &exitstatus, WNOHANG)) > 0)
 	{
+		bool		found;
+		dlist_mutable_iter iter;
+
 		/*
 		 * Check if this child was a startup process.
 		 */
@@ -2661,18 +2664,34 @@ process_pm_child_exit(void)
 			continue;
 		}
 
-		/* Was it one of our background workers? */
-		if (CleanupBackgroundWorker(pid, exitstatus))
+		/*
+		 * Was it a backend or background worker?
+		 */
+		found = false;
+		dlist_foreach_modify(iter, &BackendList)
 		{
-			/* have it be restarted */
-			HaveCrashedWorker = true;
-			continue;
+			Backend    *bp = dlist_container(Backend, elem, iter.cur);
+
+			if (bp->pid == pid)
+			{
+				dlist_delete(iter.cur);
+				CleanupBackend(bp, exitstatus);
+				found = true;
+				break;
+			}
 		}
 
 		/*
-		 * Else do standard backend child cleanup.
+		 * We don't know anything about this child process.  That's highly
+		 * unexpected, as we do track all the child processes that we fork.
 		 */
-		CleanupBackend(pid, exitstatus);
+		if (!found)
+		{
+			if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
+				HandleChildCrash(pid, exitstatus, _("untracked child process"));
+			else
+				LogChildExit(LOG, _("untracked child process"), pid, exitstatus);
+		}
 	}							/* loop over pending child-death reports */
 
 	/*
@@ -2683,116 +2702,31 @@ process_pm_child_exit(void)
 }
 
 /*
- * Scan the bgworkers list and see if the given PID (which has just stopped
- * or crashed) is in it.  Handle its shutdown if so, and return true.  If not a
- * bgworker, return false.
+ * CleanupBackend -- cleanup after terminated backend or background worker.
  *
- * This is heavily based on CleanupBackend.  One important difference is that
- * we don't know yet that the dying process is a bgworker, so we must be silent
- * until we're sure it is.
+ * Remove all local state associated with backend.
  */
-static bool
-CleanupBackgroundWorker(int pid,
-						int exitstatus) /* child's exit status */
+static void
+CleanupBackend(Backend *bp,
+			   int exitstatus)	/* child's exit status. */
 {
 	char		namebuf[MAXPGPATH];
-	dlist_mutable_iter iter;
+	char	   *procname;
+	bool		crashed = false;
 
-	dlist_foreach_modify(iter, &BackgroundWorkerList)
+	/* Construct a process name for log message */
+	if (bp->dead_end)
+	{
+		procname = _("dead end backend");
+	}
+	else if (bp->bkend_type == BACKEND_TYPE_BGWORKER)
 	{
-		RegisteredBgWorker *rw;
-
-		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
-
-		if (rw->rw_pid != pid)
-			continue;
-
-#ifdef WIN32
-		/* see CleanupBackend */
-		if (exitstatus == ERROR_WAIT_NO_CHILDREN)
-			exitstatus = 0;
-#endif
-
 		snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""),
-				 rw->rw_worker.bgw_type);
-
-
-		if (!EXIT_STATUS_0(exitstatus))
-		{
-			/* Record timestamp, so we know when to restart the worker. */
-			rw->rw_crashed_at = GetCurrentTimestamp();
-		}
-		else
-		{
-			/* Zero exit status means terminate */
-			rw->rw_crashed_at = 0;
-			rw->rw_terminate = true;
-		}
-
-		/*
-		 * Additionally, just like a backend, any exit status other than 0 or
-		 * 1 is considered a crash and causes a system-wide restart.
-		 */
-		if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
-		{
-			HandleChildCrash(pid, exitstatus, namebuf);
-			return true;
-		}
-
-		/*
-		 * We must release the postmaster child slot. If the worker failed to
-		 * do so, it did not clean up after itself, requiring a crash-restart
-		 * cycle.
-		 */
-		if (!ReleasePostmasterChildSlot(rw->rw_child_slot))
-		{
-			HandleChildCrash(pid, exitstatus, namebuf);
-			return true;
-		}
-
-		/* Get it out of the BackendList and clear out remaining data */
-		dlist_delete(&rw->rw_backend->elem);
-#ifdef EXEC_BACKEND
-		ShmemBackendArrayRemove(rw->rw_backend);
-#endif
-
-		/*
-		 * It's possible that this background worker started some OTHER
-		 * background worker and asked to be notified when that worker started
-		 * or stopped.  If so, cancel any notifications destined for the
-		 * now-dead backend.
-		 */
-		if (rw->rw_backend->bgworker_notify)
-			BackgroundWorkerStopNotifications(rw->rw_pid);
-		pfree(rw->rw_backend);
-		rw->rw_backend = NULL;
-		rw->rw_pid = 0;
-		rw->rw_child_slot = 0;
-		ReportBackgroundWorkerExit(rw); /* report child death */
-
-		LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG,
-					 namebuf, pid, exitstatus);
-
-		return true;
+				 bp->rw->rw_worker.bgw_type);
+		procname = namebuf;
 	}
-
-	return false;
-}
-
-/*
- * CleanupBackend -- cleanup after terminated backend.
- *
- * Remove all local state associated with backend.
- *
- * If you change this, see also CleanupBackgroundWorker.
- */
-static void
-CleanupBackend(int pid,
-			   int exitstatus)	/* child's exit status. */
-{
-	dlist_mutable_iter iter;
-
-	LogChildExit(DEBUG2, _("server process"), pid, exitstatus);
+	else
+		procname = _("server process");
 
 	/*
 	 * If a backend dies in an ugly way then we must signal all other backends
@@ -2800,6 +2734,8 @@ CleanupBackend(int pid,
 	 * assume everything is all right and proceed to remove the backend from
 	 * the active backend list.
 	 */
+	if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
+		crashed = true;
 
 #ifdef WIN32
 
@@ -2812,55 +2748,79 @@ CleanupBackend(int pid,
 	 */
 	if (exitstatus == ERROR_WAIT_NO_CHILDREN)
 	{
-		LogChildExit(LOG, _("server process"), pid, exitstatus);
-		exitstatus = 0;
+		LogChildExit(LOG, procname, bp->pid, exitstatus);
+		crashed = false;
 	}
 #endif
 
-	if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
+	/*
+	 * If the process attached to shared memory, check that it detached
+	 * cleanly.
+	 */
+	if (!bp->dead_end)
+	{
+		if (!ReleasePostmasterChildSlot(bp->child_slot))
+		{
+			/*
+			 * Uh-oh, the child failed to clean itself up.  Treat as a crash
+			 * after all.
+			 */
+			crashed = true;
+		}
+#ifdef EXEC_BACKEND
+		ShmemBackendArrayRemove(bp);
+#endif
+	}
+
+	if (crashed)
 	{
-		HandleChildCrash(pid, exitstatus, _("server process"));
+		HandleChildCrash(bp->pid, exitstatus, procname);
+		pfree(bp);
 		return;
 	}
 
-	dlist_foreach_modify(iter, &BackendList)
+	/*
+	 * This backend may have been slated to receive SIGUSR1 when some
+	 * background worker started or stopped.  Cancel those notifications, as
+	 * we don't want to signal PIDs that are not PostgreSQL backends.  This
+	 * gets skipped in the (probably very common) case where the backend has
+	 * never requested any such notifications.
+	 */
+	if (bp->bgworker_notify)
+		BackgroundWorkerStopNotifications(bp->pid);
+
+	/*
+	 * If it was a background worker, also update its RegisteredBgWorker entry.
+	 */
+	if (bp->bkend_type == BACKEND_TYPE_BGWORKER)
 	{
-		Backend    *bp = dlist_container(Backend, elem, iter.cur);
+		RegisteredBgWorker *rw = bp->rw;
 
-		if (bp->pid == pid)
+		if (!EXIT_STATUS_0(exitstatus))
 		{
-			if (!bp->dead_end)
-			{
-				if (!ReleasePostmasterChildSlot(bp->child_slot))
-				{
-					/*
-					 * Uh-oh, the child failed to clean itself up.  Treat as a
-					 * crash after all.
-					 */
-					HandleChildCrash(pid, exitstatus, _("server process"));
-					return;
-				}
-#ifdef EXEC_BACKEND
-				ShmemBackendArrayRemove(bp);
-#endif
-			}
-			if (bp->bgworker_notify)
-			{
-				/*
-				 * This backend may have been slated to receive SIGUSR1 when
-				 * some background worker started or stopped.  Cancel those
-				 * notifications, as we don't want to signal PIDs that are not
-				 * PostgreSQL backends.  This gets skipped in the (probably
-				 * very common) case where the backend has never requested any
-				 * such notifications.
-				 */
-				BackgroundWorkerStopNotifications(bp->pid);
-			}
-			dlist_delete(iter.cur);
-			pfree(bp);
-			break;
+			/* Record timestamp, so we know when to restart the worker. */
+			rw->rw_crashed_at = GetCurrentTimestamp();
+		}
+		else
+		{
+			/* Zero exit status means terminate */
+			rw->rw_crashed_at = 0;
+			rw->rw_terminate = true;
 		}
+
+		rw->rw_pid = 0;
+		ReportBackgroundWorkerExit(rw); /* report child death */
+
+		LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG,
+					 procname, bp->pid, exitstatus);
+
+		/* have it be restarted */
+		HaveCrashedWorker = true;
 	}
+	else
+		LogChildExit(DEBUG2, procname, bp->pid, exitstatus);
+
+	pfree(bp);
 }
 
 /*
@@ -2869,13 +2829,14 @@ CleanupBackend(int pid,
  *
  * The objectives here are to clean up our local state about the child
  * process, and to signal all other remaining children to quickdie.
+ *
+ * If it's a backend, the caller has already removed it from the
+ * BackendList. If it's an aux process, the corresponding *PID global variable
+ * has been reset already.
  */
 static void
 HandleChildCrash(int pid, int exitstatus, const char *procname)
 {
-	dlist_iter	iter;
-	dlist_mutable_iter miter;
-	Backend    *bp;
 	bool		take_action;
 
 	/*
@@ -2895,145 +2856,64 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 		SetQuitSignalReason(PMQUIT_FOR_CRASH);
 	}
 
-	/* Process background workers. */
-	dlist_foreach(iter, &BackgroundWorkerList)
+	if (take_action)
 	{
-		RegisteredBgWorker *rw;
+		dlist_iter	iter;
 
-		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
-		if (rw->rw_pid == 0)
-			continue;			/* not running */
-		if (rw->rw_pid == pid)
-		{
-			/*
-			 * Found entry for freshly-dead worker, so remove it.
-			 */
-			(void) ReleasePostmasterChildSlot(rw->rw_child_slot);
-			dlist_delete(&rw->rw_backend->elem);
-#ifdef EXEC_BACKEND
-			ShmemBackendArrayRemove(rw->rw_backend);
-#endif
-			pfree(rw->rw_backend);
-			rw->rw_backend = NULL;
-			rw->rw_pid = 0;
-			rw->rw_child_slot = 0;
-			/* don't reset crashed_at */
-			/* don't report child stop, either */
-			/* Keep looping so we can signal remaining workers */
-		}
-		else
+		dlist_foreach(iter, &BackendList)
 		{
-			/*
-			 * This worker is still alive.  Unless we did so already, tell it
-			 * to commit hara-kiri.
-			 */
-			if (take_action)
-				sigquit_child(rw->rw_pid);
-		}
-	}
-
-	/* Process regular backends */
-	dlist_foreach_modify(miter, &BackendList)
-	{
-		bp = dlist_container(Backend, elem, miter.cur);
+			Backend    *bp = dlist_container(Backend, elem, iter.cur);
 
-		if (bp->pid == pid)
-		{
-			/*
-			 * Found entry for freshly-dead backend, so remove it.
-			 */
-			if (!bp->dead_end)
-			{
-				(void) ReleasePostmasterChildSlot(bp->child_slot);
-#ifdef EXEC_BACKEND
-				ShmemBackendArrayRemove(bp);
-#endif
-			}
-			dlist_delete(miter.cur);
-			pfree(bp);
-			/* Keep looping so we can signal remaining backends */
-		}
-		else
-		{
 			/*
 			 * This backend is still alive.  Unless we did so already, tell it
 			 * to commit hara-kiri.
 			 *
 			 * We could exclude dead_end children here, but at least when
 			 * sending SIGABRT it seems better to include them.
-			 *
-			 * Background workers were already processed above; ignore them
-			 * here.
 			 */
-			if (bp->bkend_type == BACKEND_TYPE_BGWORKER)
-				continue;
+			sigquit_child(bp->pid);
+		}
 
-			if (take_action)
-				sigquit_child(bp->pid);
+		if (StartupPID != 0)
+		{
+			sigquit_child(StartupPID);
+			StartupStatus = STARTUP_SIGNALED;
 		}
-	}
 
-	/* Take care of the startup process too */
-	if (pid == StartupPID)
-	{
-		StartupPID = 0;
-		/* Caller adjusts StartupStatus, so don't touch it here */
-	}
-	else if (StartupPID != 0 && take_action)
-	{
-		sigquit_child(StartupPID);
-		StartupStatus = STARTUP_SIGNALED;
-	}
+		/* Take care of the bgwriter too */
+		if (BgWriterPID != 0)
+			sigquit_child(BgWriterPID);
+
+		/* Take care of the checkpointer too */
+		if (CheckpointerPID != 0)
+			sigquit_child(CheckpointerPID);
+
+		/* Take care of the walwriter too */
+		if (WalWriterPID != 0)
+			sigquit_child(WalWriterPID);
+
+		/* Take care of the walreceiver too */
+		if (WalReceiverPID != 0)
+			sigquit_child(WalReceiverPID);
 
-	/* Take care of the bgwriter too */
-	if (pid == BgWriterPID)
-		BgWriterPID = 0;
-	else if (BgWriterPID != 0 && take_action)
-		sigquit_child(BgWriterPID);
-
-	/* Take care of the checkpointer too */
-	if (pid == CheckpointerPID)
-		CheckpointerPID = 0;
-	else if (CheckpointerPID != 0 && take_action)
-		sigquit_child(CheckpointerPID);
-
-	/* Take care of the walwriter too */
-	if (pid == WalWriterPID)
-		WalWriterPID = 0;
-	else if (WalWriterPID != 0 && take_action)
-		sigquit_child(WalWriterPID);
-
-	/* Take care of the walreceiver too */
-	if (pid == WalReceiverPID)
-		WalReceiverPID = 0;
-	else if (WalReceiverPID != 0 && take_action)
-		sigquit_child(WalReceiverPID);
-
-	/* Take care of the walsummarizer too */
-	if (pid == WalSummarizerPID)
-		WalSummarizerPID = 0;
-	else if (WalSummarizerPID != 0 && take_action)
-		sigquit_child(WalSummarizerPID);
-
-	/* Take care of the autovacuum launcher too */
-	if (pid == AutoVacPID)
-		AutoVacPID = 0;
-	else if (AutoVacPID != 0 && take_action)
-		sigquit_child(AutoVacPID);
-
-	/* Take care of the archiver too */
-	if (pid == PgArchPID)
-		PgArchPID = 0;
-	else if (PgArchPID != 0 && take_action)
-		sigquit_child(PgArchPID);
-
-	/* Take care of the slot sync worker too */
-	if (pid == SlotSyncWorkerPID)
-		SlotSyncWorkerPID = 0;
-	else if (SlotSyncWorkerPID != 0 && take_action)
-		sigquit_child(SlotSyncWorkerPID);
-
-	/* We do NOT restart the syslogger */
+		/* Take care of the walsummarizer too */
+		if (WalSummarizerPID != 0)
+			sigquit_child(WalSummarizerPID);
+
+		/* Take care of the autovacuum launcher too */
+		if (AutoVacPID != 0)
+			sigquit_child(AutoVacPID);
+
+		/* Take care of the archiver too */
+		if (PgArchPID != 0)
+			sigquit_child(PgArchPID);
+
+		/* Take care of the slot sync worker too */
+		if (SlotSyncWorkerPID != 0)
+			sigquit_child(SlotSyncWorkerPID);
+
+		/* We do NOT restart the syslogger */
+	}
 
 	if (Shutdown != ImmediateShutdown)
 		FatalError = true;
@@ -3578,6 +3458,7 @@ BackendStartup(ClientSocket *client_sock)
 	startup_data.canAcceptConnections = canAcceptConnections(BACKEND_TYPE_NORMAL);
 	bn->dead_end = (startup_data.canAcceptConnections != CAC_OK);
 	bn->cancel_key = MyCancelKey;
+	bn->rw = NULL;
 
 	/*
 	 * Unless it's a dead_end child, assign it a child slot number
@@ -3993,6 +3874,7 @@ StartAutovacuumWorker(void)
 			bn->dead_end = false;
 			bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
 			bn->bgworker_notify = false;
+			bn->rw = NULL;
 
 			bn->pid = StartChildProcess(B_AUTOVAC_WORKER);
 			if (bn->pid > 0)
@@ -4181,8 +4063,7 @@ do_start_bgworker(RegisteredBgWorker *rw)
 		rw->rw_crashed_at = GetCurrentTimestamp();
 		return false;
 	}
-	rw->rw_backend = bn;
-	rw->rw_child_slot = bn->child_slot;
+	bn->rw = rw;
 
 	ereport(DEBUG1,
 			(errmsg_internal("starting background worker process \"%s\"",
@@ -4195,10 +4076,9 @@ do_start_bgworker(RegisteredBgWorker *rw)
 		ereport(LOG,
 				(errmsg("could not fork background worker process: %m")));
 		/* undo what assign_backendlist_entry did */
-		ReleasePostmasterChildSlot(rw->rw_child_slot);
-		rw->rw_child_slot = 0;
-		pfree(rw->rw_backend);
-		rw->rw_backend = NULL;
+		ReleasePostmasterChildSlot(bn->child_slot);
+		pfree(bn);
+
 		/* mark entry as crashed, so we'll try again later */
 		rw->rw_crashed_at = GetCurrentTimestamp();
 		return false;
@@ -4206,12 +4086,12 @@ do_start_bgworker(RegisteredBgWorker *rw)
 
 	/* in postmaster, fork successful ... */
 	rw->rw_pid = worker_pid;
-	rw->rw_backend->pid = rw->rw_pid;
+	bn->pid = rw->rw_pid;
 	ReportBackgroundWorkerPID(rw);
 	/* add new worker to lists of backends */
-	dlist_push_head(&BackendList, &rw->rw_backend->elem);
+	dlist_push_head(&BackendList, &bn->elem);
 #ifdef EXEC_BACKEND
-	ShmemBackendArrayAdd(rw->rw_backend);
+	ShmemBackendArrayAdd(bn);
 #endif
 	return true;
 }
diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index e55e38af65..309a91124b 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -26,16 +26,13 @@
 /*
  * List of background workers, private to postmaster.
  *
- * All workers that are currently running will have rw_backend set, and will
- * be present in BackendList.
+ * All workers that are currently running will also have an entry in
+ * BackendList.
  */
 typedef struct RegisteredBgWorker
 {
 	BackgroundWorker rw_worker; /* its registry entry */
-	struct bkend *rw_backend;	/* its BackendList entry, or NULL if not
-								 * running */
 	pid_t		rw_pid;			/* 0 if not running */
-	int			rw_child_slot;
 	TimestampTz rw_crashed_at;	/* if not 0, time it last crashed */
 	int			rw_shmem_slot;
 	bool		rw_terminate;
-- 
2.39.2
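
The new control flow of CleanupBackend() in the v1 patch above can be sketched as follows. `Worker`, `Child`, the `slot_released_cleanly` flag (a stand-in for the result of ReleasePostmasterChildSlot()) and `cleanup_child()` are hypothetical simplifications, not the postmaster's real structs, and the caller is assumed to have already computed the initial `crashed` flag. The sketch shows the idea behind the patch: generic Backend cleanup runs first, and any bgworker-specific bookkeeping is reached through the new `rw` back-pointer instead of a separate scan of BackgroundWorkerList.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified stand-in for RegisteredBgWorker. */
typedef struct Worker
{
	int			pid;			/* 0 if not running */
	long		crashed_at;		/* if not 0, time it last crashed */
	bool		terminate;
} Worker;

/* Hypothetical simplified stand-in for Backend; rw is set only for bgworkers. */
typedef struct Child
{
	int			pid;
	bool		dead_end;
	bool		slot_released_cleanly;	/* stands in for ReleasePostmasterChildSlot() */
	Worker	   *rw;
} Child;

/*
 * Returns true if the exit must be escalated to crash handling, in which
 * case the bgworker bookkeeping is skipped, as in the patch.
 */
static bool
cleanup_child(Child *bp, int exitstatus, bool crashed, long now)
{
	assert(bp->pid > 0);

	/* The child failed to clean itself up: treat it as a crash after all. */
	if (!bp->dead_end && !bp->slot_released_cleanly)
		crashed = true;

	if (crashed)
		return true;

	if (bp->rw != NULL)
	{
		Worker	   *rw = bp->rw;

		if (exitstatus != 0)
			rw->crashed_at = now;	/* remember when, so it can be restarted */
		else
		{
			rw->crashed_at = 0;		/* zero exit status means terminate */
			rw->terminate = true;
		}
		rw->pid = 0;
	}
	return false;
}
```

A regular backend simply has `rw == NULL`, so it shares every step except the final block.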

Heikki Linnakangas

hlinnaka@iki.fi

over 1 year ago

In reply to: Heikki Linnakangas (#1)

Re: Refactoring postmaster's code to cleanup after child exit

On 06/07/2024 22:01, Heikki Linnakangas wrote:

Reading through postmaster code, I spotted some refactoring
opportunities to make it slightly more readable.

Currently, when a child process exits, the postmaster first scans
through BackgroundWorkerList to see if it was a bgworker process. If not
found, it scans through the BackendList to see if it was a backend
process (which it really should be then). That feels a bit silly,
because every running background worker process also has an entry in
BackendList. There's a lot of duplication between
CleanupBackgroundWorker and CleanupBackend.

Before commit 8a02b3d732, we used to create Backend entries only for
background worker processes that connected to a database, not for other
background worker processes. I think that's why we have the code
structure we have. But now that we have a Backend entry for all bgworker
processes, it's more natural to have a single function to deal with both
regular backends and bgworkers.

So I came up with the attached patches. This doesn't make any meaningful
user-visible changes, except for some incidental changes in log messages
(see commit message for details).

New patch version attached. Fixed conflicts with recent commits, no
other changes.
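
The single-scan lookup described above can be sketched like this. `Child`, `ChildKind` and `find_exited_child()` are hypothetical; the real code walks the BackendList dlist and hands the matching entry to CleanupBackend(). Here a simple singly-linked list stands in for BackendList, and the `is_bgworker` flag stands in for the patch's `rw` pointer. A pid that matches no entry is classified as untracked, which corresponds to the "untracked child process" log wording introduced in the patch.

```c
#include <assert.h>
#include <stddef.h>

typedef enum ChildKind
{
	CHILD_BACKEND,
	CHILD_BGWORKER,
	CHILD_UNTRACKED
} ChildKind;

/* Hypothetical stand-in for a BackendList entry. */
typedef struct Child
{
	int			pid;
	int			is_bgworker;	/* the patch uses a RegisteredBgWorker pointer */
	struct Child *next;
} Child;

/* One pass over a single list: unlink the exited child and report its kind. */
static ChildKind
find_exited_child(Child **head, int pid)
{
	assert(pid > 0);

	for (Child **link = head; *link != NULL; link = &(*link)->next)
	{
		Child	   *c = *link;

		if (c->pid == pid)
		{
			*link = c->next;	/* remove it from the list */
			return c->is_bgworker ? CHILD_BGWORKER : CHILD_BACKEND;
		}
	}
	return CHILD_UNTRACKED;
}
```

Because every running bgworker already has a Backend entry, one lookup covers both cases.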

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v2-0001-Fix-outdated-comment-all-running-bgworkers-are-in.patch (text/x-patch; charset=UTF-8)

From 84ca49efab16cc2699f8446684a5ebe63dad1c38 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 29 Jul 2024 23:13:16 +0300
Subject: [PATCH v2 1/4] Fix outdated comment; all running bgworkers are in
 BackendList

Before commit 8a02b3d732, only bgworkers that connected to a database
had an entry in the BackendList. Commit 8a02b3d732 changed that, but
forgot to update this comment.

Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi
---
 src/include/postmaster/bgworker_internals.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index 9106a0ef3f..61ba54117a 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -26,14 +26,14 @@
 /*
  * List of background workers, private to postmaster.
  *
- * A worker that requests a database connection during registration will have
- * rw_backend set, and will be present in BackendList.  Note: do not rely on
- * rw_backend being non-NULL for shmem-connected workers!
+ * All workers that are currently running will have rw_backend set, and will
+ * be present in BackendList.
  */
 typedef struct RegisteredBgWorker
 {
 	BackgroundWorker rw_worker; /* its registry entry */
-	struct bkend *rw_backend;	/* its BackendList entry, or NULL */
+	struct bkend *rw_backend;	/* its BackendList entry, or NULL if not
+								 * running */
 	pid_t		rw_pid;			/* 0 if not running */
 	int			rw_child_slot;
 	TimestampTz rw_crashed_at;	/* if not 0, time it last crashed */
-- 
2.39.2

v2-0002-Minor-refactoring-of-assign_backendlist_entry.patch (text/x-patch; charset=UTF-8)

From e541a6d2c8481cd9f9c75fb2328e8e6031cddac6 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 29 Jul 2024 23:13:56 +0300
Subject: [PATCH v2 2/4] Minor refactoring of assign_backendlist_entry()

Make assign_backendlist_entry() responsible just for allocating the
Backend struct. Linking it to the RegisteredBgWorker is the caller's
responsibility now. That seems clearer.

Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi
---
 src/backend/postmaster/postmaster.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 02442a4b85..a3e9e8fdc0 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -416,7 +416,7 @@ static void TerminateChildren(int signal);
 #define SignalChildren(sig)			   SignalSomeChildren(sig, BACKEND_TYPE_ALL)
 
 static int	CountChildren(int target);
-static bool assign_backendlist_entry(RegisteredBgWorker *rw);
+static Backend *assign_backendlist_entry(void);
 static void maybe_start_bgworkers(void);
 static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
 static pid_t StartChildProcess(BackendType type);
@@ -4028,6 +4028,7 @@ MaxLivePostmasterChildren(void)
 static bool
 do_start_bgworker(RegisteredBgWorker *rw)
 {
+	Backend    *bn;
 	pid_t		worker_pid;
 
 	Assert(rw->rw_pid == 0);
@@ -4042,11 +4043,14 @@ do_start_bgworker(RegisteredBgWorker *rw)
 	 * tried again right away, most likely we'd find ourselves hitting the
 	 * same resource-exhaustion condition.
 	 */
-	if (!assign_backendlist_entry(rw))
+	bn = assign_backendlist_entry();
+	if (bn == NULL)
 	{
 		rw->rw_crashed_at = GetCurrentTimestamp();
 		return false;
 	}
+	rw->rw_backend = bn;
+	rw->rw_child_slot = bn->child_slot;
 
 	ereport(DEBUG1,
 			(errmsg_internal("starting background worker process \"%s\"",
@@ -4119,12 +4123,10 @@ bgworker_should_start_now(BgWorkerStartTime start_time)
  * Allocate the Backend struct for a connected background worker, but don't
  * add it to the list of backends just yet.
  *
- * On failure, return false without changing any worker state.
- *
- * Some info from the Backend is copied into the passed rw.
+ * On failure, return NULL.
  */
-static bool
-assign_backendlist_entry(RegisteredBgWorker *rw)
+static Backend *
+assign_backendlist_entry(void)
 {
 	Backend    *bn;
 
@@ -4138,7 +4140,7 @@ assign_backendlist_entry(RegisteredBgWorker *rw)
 		ereport(LOG,
 				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
 				 errmsg("no slot available for new background worker process")));
-		return false;
+		return NULL;
 	}
 
 	bn = palloc_extended(sizeof(Backend), MCXT_ALLOC_NO_OOM);
@@ -4147,7 +4149,7 @@ assign_backendlist_entry(RegisteredBgWorker *rw)
 		ereport(LOG,
 				(errcode(ERRCODE_OUT_OF_MEMORY),
 				 errmsg("out of memory")));
-		return false;
+		return NULL;
 	}
 
 	bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
@@ -4155,10 +4157,7 @@ assign_backendlist_entry(RegisteredBgWorker *rw)
 	bn->dead_end = false;
 	bn->bgworker_notify = false;
 
-	rw->rw_backend = bn;
-	rw->rw_child_slot = bn->child_slot;
-
-	return true;
+	return bn;
 }
 
 /*
-- 
2.39.2
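
The split of responsibilities in this patch, where the allocator only allocates and returns NULL on failure while the caller does the linking, can be sketched as below. `free_slots`, `Child`, `Worker`, `alloc_child_entry()` and `start_worker()` are hypothetical stand-ins for the PMChildSlot pool, Backend, RegisteredBgWorker, assign_backendlist_entry() and do_start_bgworker(). As in the patch, the slot check comes before the allocation, so a failed allocation never consumes a slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int	free_slots = 1;		/* hypothetical stand-in for the PMChildSlot pool */

typedef struct Child
{
	int			child_slot;
	bool		dead_end;
} Child;

typedef struct Worker
{
	Child	   *backend;		/* NULL if not running */
	int			child_slot;
	long		crashed_at;
} Worker;

/* Only allocates; knows nothing about the worker. Returns NULL on failure. */
static Child *
alloc_child_entry(void)
{
	Child	   *bn;

	if (free_slots == 0)
		return NULL;			/* "no slot available" */
	bn = malloc(sizeof(Child));
	if (bn == NULL)
		return NULL;			/* out of memory */
	bn->child_slot = free_slots--;
	bn->dead_end = false;
	return bn;
}

static bool
start_worker(Worker *rw, long now)
{
	Child	   *bn;

	assert(rw->backend == NULL);	/* not already running */
	bn = alloc_child_entry();
	if (bn == NULL)
	{
		rw->crashed_at = now;	/* mark as crashed so we retry later */
		return false;
	}
	/* Linking the entry to the worker is now the caller's job. */
	rw->backend = bn;
	rw->child_slot = bn->child_slot;
	return true;
}
```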

v2-0003-Make-BackgroundWorkerList-doubly-linked.patch (text/x-patch; charset=UTF-8)

From c8b9c9a30121342ca15265d1d0c6ea533282f106 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 29 Jul 2024 23:14:00 +0300
Subject: [PATCH v2 3/4] Make BackgroundWorkerList doubly-linked

This allows ForgetBackgroundWorker() and ReportBackgroundWorkerExit()
to take a RegisteredBgWorker pointer as argument, rather than a list
iterator. That feels a little more natural. But more importantly, this
paves the way for more refactoring in the next commit.

Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi
---
 src/backend/postmaster/bgworker.c           | 62 ++++++++++-----------
 src/backend/postmaster/postmaster.c         | 40 ++++++-------
 src/include/postmaster/bgworker_internals.h | 10 ++--
 3 files changed, 54 insertions(+), 58 deletions(-)

diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 77707bb384..981d8177b0 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -37,7 +37,7 @@
 /*
  * The postmaster's list of registered background workers, in private memory.
  */
-slist_head	BackgroundWorkerList = SLIST_STATIC_INIT(BackgroundWorkerList);
+dlist_head	BackgroundWorkerList = DLIST_STATIC_INIT(BackgroundWorkerList);
 
 /*
  * BackgroundWorkerSlots exist in shared memory and can be accessed (via
@@ -168,7 +168,7 @@ BackgroundWorkerShmemInit(void)
 										   &found);
 	if (!IsUnderPostmaster)
 	{
-		slist_iter	siter;
+		dlist_iter	iter;
 		int			slotno = 0;
 
 		BackgroundWorkerData->total_slots = max_worker_processes;
@@ -181,12 +181,12 @@ BackgroundWorkerShmemInit(void)
 		 * correspondence between the postmaster's private list and the array
 		 * in shared memory.
 		 */
-		slist_foreach(siter, &BackgroundWorkerList)
+		dlist_foreach(iter, &BackgroundWorkerList)
 		{
 			BackgroundWorkerSlot *slot = &BackgroundWorkerData->slot[slotno];
 			RegisteredBgWorker *rw;
 
-			rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
+			rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 			Assert(slotno < max_worker_processes);
 			slot->in_use = true;
 			slot->terminate = false;
@@ -220,13 +220,13 @@ BackgroundWorkerShmemInit(void)
 static RegisteredBgWorker *
 FindRegisteredWorkerBySlotNumber(int slotno)
 {
-	slist_iter	siter;
+	dlist_iter	iter;
 
-	slist_foreach(siter, &BackgroundWorkerList)
+	dlist_foreach(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 		if (rw->rw_shmem_slot == slotno)
 			return rw;
 	}
@@ -413,29 +413,25 @@ BackgroundWorkerStateChange(bool allow_new_workers)
 				(errmsg_internal("registering background worker \"%s\"",
 								 rw->rw_worker.bgw_name)));
 
-		slist_push_head(&BackgroundWorkerList, &rw->rw_lnode);
+		dlist_push_head(&BackgroundWorkerList, &rw->rw_lnode);
 	}
 }
 
 /*
  * Forget about a background worker that's no longer needed.
  *
- * The worker must be identified by passing an slist_mutable_iter that
- * points to it.  This convention allows deletion of workers during
- * searches of the worker list, and saves having to search the list again.
+ * NOTE: The entry is unlinked from BackgroundWorkerList.  If the caller is
+ * iterating through it, better use a mutable iterator!
  *
  * Caller is responsible for notifying bgw_notify_pid, if appropriate.
  *
  * This function must be invoked only in the postmaster.
  */
 void
-ForgetBackgroundWorker(slist_mutable_iter *cur)
+ForgetBackgroundWorker(RegisteredBgWorker *rw)
 {
-	RegisteredBgWorker *rw;
 	BackgroundWorkerSlot *slot;
 
-	rw = slist_container(RegisteredBgWorker, rw_lnode, cur->cur);
-
 	Assert(rw->rw_shmem_slot < max_worker_processes);
 	slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
 	Assert(slot->in_use);
@@ -454,7 +450,7 @@ ForgetBackgroundWorker(slist_mutable_iter *cur)
 			(errmsg_internal("unregistering background worker \"%s\"",
 							 rw->rw_worker.bgw_name)));
 
-	slist_delete_current(cur);
+	dlist_delete(&rw->rw_lnode);
 	pfree(rw);
 }
 
@@ -480,17 +476,17 @@ ReportBackgroundWorkerPID(RegisteredBgWorker *rw)
  * Report that the PID of a background worker is now zero because a
  * previously-running background worker has exited.
  *
+ * NOTE: The entry may be unlinked from BackgroundWorkerList.  If the caller
+ * is iterating through it, better use a mutable iterator!
+ *
  * This function should only be called from the postmaster.
  */
 void
-ReportBackgroundWorkerExit(slist_mutable_iter *cur)
+ReportBackgroundWorkerExit(RegisteredBgWorker *rw)
 {
-	RegisteredBgWorker *rw;
 	BackgroundWorkerSlot *slot;
 	int			notify_pid;
 
-	rw = slist_container(RegisteredBgWorker, rw_lnode, cur->cur);
-
 	Assert(rw->rw_shmem_slot < max_worker_processes);
 	slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
 	slot->pid = rw->rw_pid;
@@ -505,7 +501,7 @@ ReportBackgroundWorkerExit(slist_mutable_iter *cur)
 	 */
 	if (rw->rw_terminate ||
 		rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
-		ForgetBackgroundWorker(cur);
+		ForgetBackgroundWorker(rw);
 
 	if (notify_pid != 0)
 		kill(notify_pid, SIGUSR1);
@@ -519,13 +515,13 @@ ReportBackgroundWorkerExit(slist_mutable_iter *cur)
 void
 BackgroundWorkerStopNotifications(pid_t pid)
 {
-	slist_iter	siter;
+	dlist_iter	iter;
 
-	slist_foreach(siter, &BackgroundWorkerList)
+	dlist_foreach(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 		if (rw->rw_worker.bgw_notify_pid == pid)
 			rw->rw_worker.bgw_notify_pid = 0;
 	}
@@ -546,14 +542,14 @@ BackgroundWorkerStopNotifications(pid_t pid)
 void
 ForgetUnstartedBackgroundWorkers(void)
 {
-	slist_mutable_iter iter;
+	dlist_mutable_iter iter;
 
-	slist_foreach_modify(iter, &BackgroundWorkerList)
+	dlist_foreach_modify(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 		BackgroundWorkerSlot *slot;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 		Assert(rw->rw_shmem_slot < max_worker_processes);
 		slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
 
@@ -564,7 +560,7 @@ ForgetUnstartedBackgroundWorkers(void)
 			/* ... then zap it, and notify the waiter */
 			int			notify_pid = rw->rw_worker.bgw_notify_pid;
 
-			ForgetBackgroundWorker(&iter);
+			ForgetBackgroundWorker(rw);
 			if (notify_pid != 0)
 				kill(notify_pid, SIGUSR1);
 		}
@@ -584,13 +580,13 @@ ForgetUnstartedBackgroundWorkers(void)
 void
 ResetBackgroundWorkerCrashTimes(void)
 {
-	slist_mutable_iter iter;
+	dlist_mutable_iter iter;
 
-	slist_foreach_modify(iter, &BackgroundWorkerList)
+	dlist_foreach_modify(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 
 		if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
 		{
@@ -601,7 +597,7 @@ ResetBackgroundWorkerCrashTimes(void)
 			 * parallel_terminate_count will get incremented after we've
 			 * already zeroed parallel_register_count, which would be bad.)
 			 */
-			ForgetBackgroundWorker(&iter);
+			ForgetBackgroundWorker(rw);
 		}
 		else
 		{
@@ -1036,7 +1032,7 @@ RegisterBackgroundWorker(BackgroundWorker *worker)
 	rw->rw_crashed_at = 0;
 	rw->rw_terminate = false;
 
-	slist_push_head(&BackgroundWorkerList, &rw->rw_lnode);
+	dlist_push_head(&BackgroundWorkerList, &rw->rw_lnode);
 }
 
 /*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a3e9e8fdc0..fc00e39c44 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1531,7 +1531,7 @@ DetermineSleepTime(void)
 
 	if (HaveCrashedWorker)
 	{
-		slist_mutable_iter siter;
+		dlist_mutable_iter iter;
 
 		/*
 		 * When there are crashed bgworkers, we sleep just long enough that
@@ -1539,12 +1539,12 @@ DetermineSleepTime(void)
 		 * determine the minimum of all wakeup times according to most recent
 		 * crash time and requested restart interval.
 		 */
-		slist_foreach_modify(siter, &BackgroundWorkerList)
+		dlist_foreach_modify(iter, &BackgroundWorkerList)
 		{
 			RegisteredBgWorker *rw;
 			TimestampTz this_wakeup;
 
-			rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
+			rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 
 			if (rw->rw_crashed_at == 0)
 				continue;
@@ -1552,7 +1552,7 @@ DetermineSleepTime(void)
 			if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART
 				|| rw->rw_terminate)
 			{
-				ForgetBackgroundWorker(&siter);
+				ForgetBackgroundWorker(rw);
 				continue;
 			}
 
@@ -2625,13 +2625,13 @@ CleanupBackgroundWorker(int pid,
 						int exitstatus) /* child's exit status */
 {
 	char		namebuf[MAXPGPATH];
-	slist_mutable_iter iter;
+	dlist_mutable_iter iter;
 
-	slist_foreach_modify(iter, &BackgroundWorkerList)
+	dlist_foreach_modify(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 
 		if (rw->rw_pid != pid)
 			continue;
@@ -2694,7 +2694,7 @@ CleanupBackgroundWorker(int pid,
 		rw->rw_backend = NULL;
 		rw->rw_pid = 0;
 		rw->rw_child_slot = 0;
-		ReportBackgroundWorkerExit(&iter);	/* report child death */
+		ReportBackgroundWorkerExit(rw); /* report child death */
 
 		LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG,
 					 namebuf, pid, exitstatus);
@@ -2796,8 +2796,8 @@ CleanupBackend(int pid,
 static void
 HandleChildCrash(int pid, int exitstatus, const char *procname)
 {
-	dlist_mutable_iter iter;
-	slist_iter	siter;
+	dlist_iter	iter;
+	dlist_mutable_iter miter;
 	Backend    *bp;
 	bool		take_action;
 
@@ -2819,11 +2819,11 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 	}
 
 	/* Process background workers. */
-	slist_foreach(siter, &BackgroundWorkerList)
+	dlist_foreach(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 		if (rw->rw_pid == 0)
 			continue;			/* not running */
 		if (rw->rw_pid == pid)
@@ -2853,9 +2853,9 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 	}
 
 	/* Process regular backends */
-	dlist_foreach_modify(iter, &BackendList)
+	dlist_foreach_modify(miter, &BackendList)
 	{
-		bp = dlist_container(Backend, elem, iter.cur);
+		bp = dlist_container(Backend, elem, miter.cur);
 
 		if (bp->pid == pid)
 		{
@@ -2866,7 +2866,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 			{
 				(void) ReleasePostmasterChildSlot(bp->child_slot);
 			}
-			dlist_delete(iter.cur);
+			dlist_delete(miter.cur);
 			pfree(bp);
 			/* Keep looping so we can signal remaining backends */
 		}
@@ -4177,7 +4177,7 @@ maybe_start_bgworkers(void)
 #define MAX_BGWORKERS_TO_LAUNCH 100
 	int			num_launched = 0;
 	TimestampTz now = 0;
-	slist_mutable_iter iter;
+	dlist_mutable_iter iter;
 
 	/*
 	 * During crash recovery, we have no need to be called until the state
@@ -4194,11 +4194,11 @@ maybe_start_bgworkers(void)
 	StartWorkerNeeded = false;
 	HaveCrashedWorker = false;
 
-	slist_foreach_modify(iter, &BackgroundWorkerList)
+	dlist_foreach_modify(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 
 		/* ignore if already running */
 		if (rw->rw_pid != 0)
@@ -4207,7 +4207,7 @@ maybe_start_bgworkers(void)
 		/* if marked for death, clean up and remove from list */
 		if (rw->rw_terminate)
 		{
-			ForgetBackgroundWorker(&iter);
+			ForgetBackgroundWorker(rw);
 			continue;
 		}
 
@@ -4226,7 +4226,7 @@ maybe_start_bgworkers(void)
 
 				notify_pid = rw->rw_worker.bgw_notify_pid;
 
-				ForgetBackgroundWorker(&iter);
+				ForgetBackgroundWorker(rw);
 
 				/* Report worker is gone now. */
 				if (notify_pid != 0)
diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index 61ba54117a..e55e38af65 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -39,17 +39,17 @@ typedef struct RegisteredBgWorker
 	TimestampTz rw_crashed_at;	/* if not 0, time it last crashed */
 	int			rw_shmem_slot;
 	bool		rw_terminate;
-	slist_node	rw_lnode;		/* list link */
+	dlist_node	rw_lnode;		/* list link */
 } RegisteredBgWorker;
 
-extern PGDLLIMPORT slist_head BackgroundWorkerList;
+extern PGDLLIMPORT dlist_head BackgroundWorkerList;
 
 extern Size BackgroundWorkerShmemSize(void);
 extern void BackgroundWorkerShmemInit(void);
 extern void BackgroundWorkerStateChange(bool allow_new_workers);
-extern void ForgetBackgroundWorker(slist_mutable_iter *cur);
-extern void ReportBackgroundWorkerPID(RegisteredBgWorker *);
-extern void ReportBackgroundWorkerExit(slist_mutable_iter *cur);
+extern void ForgetBackgroundWorker(RegisteredBgWorker *rw);
+extern void ReportBackgroundWorkerPID(RegisteredBgWorker *rw);
+extern void ReportBackgroundWorkerExit(RegisteredBgWorker *rw);
 extern void BackgroundWorkerStopNotifications(pid_t pid);
 extern void ForgetUnstartedBackgroundWorkers(void);
 extern void ResetBackgroundWorkerCrashTimes(void);
-- 
2.39.2
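
Why a doubly-linked list lets ForgetBackgroundWorker() take the element itself: each node knows its predecessor, so it can be unlinked in O(1) without an iterator. The sketch below is a minimal circular intrusive list modeled loosely on the idea behind PostgreSQL's dlist; `dnode`, `dhead`, `dpush_head()`, `ddelete()`, `forget_worker()` and the `container_of` macro are hypothetical names, not the lib/ilist.h API. Since the real ForgetBackgroundWorker() also pfree()s the entry, reading the node's next pointer after the call would be a use-after-free, which is why the patch's comments ask callers that are iterating to use a mutable iterator.

```c
#include <assert.h>
#include <stddef.h>

/* Circular list: an empty list has head.next == head.prev == &head. */
typedef struct dnode
{
	struct dnode *prev;
	struct dnode *next;
} dnode;

typedef struct
{
	dnode		head;
} dhead;

#define container_of(ptr, type, member) \
	((type *) ((char *) (ptr) - offsetof(type, member)))

static void
dhead_init(dhead *h)
{
	h->head.prev = h->head.next = &h->head;
}

static void
dpush_head(dhead *h, dnode *n)
{
	n->next = h->head.next;
	n->prev = &h->head;
	n->next->prev = n;
	h->head.next = n;
}

/* O(1) unlink given only the node; no predecessor search needed. */
static void
ddelete(dnode *n)
{
	assert(n->prev != NULL && n->next != NULL);
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Hypothetical stand-in for RegisteredBgWorker with an embedded link. */
typedef struct Worker
{
	int			shmem_slot;
	dnode		lnode;
} Worker;

/* Like the patched ForgetBackgroundWorker(rw): takes the element, not an iterator. */
static void
forget_worker(Worker *rw)
{
	ddelete(&rw->lnode);
}

static int
count(dhead *h)
{
	int			n = 0;

	for (dnode *cur = h->head.next; cur != &h->head; cur = cur->next)
		n++;
	return n;
}
```

With a singly-linked slist, the same deletion needs the predecessor, which is why the old API had to be handed an slist_mutable_iter.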

v2-0004-Refactor-code-to-handle-death-of-a-backend-or-bgw.patch (text/x-patch; charset=UTF-8)

From 8ad8a09e65ce52efa34de638a7a4a0161945574b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 29 Jul 2024 23:14:04 +0300
Subject: [PATCH v2 4/4] Refactor code to handle death of a backend or bgworker
 in postmaster

Currently, when a child process exits, the postmaster first scans
through BackgroundWorkerList to see whether the child process was a
background worker. If not found, then it scans through BackendList to
see if it was a regular backend. That leads to some duplication
between the bgworker and regular backend cleanup code, as both have an
entry in the BackendList that needs to be cleaned up in the same way.
Refactor that so that we scan just the BackendList to find the child
process, and if it was a background worker, do the bgworker-specific
cleanup in addition to the normal Backend cleanup.

Change HandleChildCrash so that it doesn't try to handle the cleanup
of the process that already exited, only the signaling of all the
other processes. When called for any of the aux processes, the caller
had already cleared the *PID global variable, so the code in
HandleChildCrash() to do that was unused.

On Windows, if a child process exits with ERROR_WAIT_NO_CHILDREN, it's
now logged with that exit code, instead of 0. Also, if a bgworker
exits with ERROR_WAIT_NO_CHILDREN, it's now treated as crashed and is
restarted. Previously it was treated as a normal exit.

If a child process is not found in the BackendList, the log message
now calls it "untracked child process" rather than "server process".
Arguably that should be a PANIC, because we do track all the child
processes in the list, so failing to find a child process is highly
unexpected. But if we want to change that, let's discuss and do that
as a separate commit.

Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi
---
 src/backend/postmaster/bgworker.c           |   4 -
 src/backend/postmaster/postmaster.c         | 434 ++++++++------------
 src/include/postmaster/bgworker_internals.h |   7 +-
 3 files changed, 165 insertions(+), 280 deletions(-)

diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 981d8177b0..b83967cda3 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -401,9 +401,7 @@ BackgroundWorkerStateChange(bool allow_new_workers)
 		}
 
 		/* Initialize postmaster bookkeeping. */
-		rw->rw_backend = NULL;
 		rw->rw_pid = 0;
-		rw->rw_child_slot = 0;
 		rw->rw_crashed_at = 0;
 		rw->rw_shmem_slot = slotno;
 		rw->rw_terminate = false;
@@ -1026,9 +1024,7 @@ RegisterBackgroundWorker(BackgroundWorker *worker)
 	}
 
 	rw->rw_worker = *worker;
-	rw->rw_backend = NULL;
 	rw->rw_pid = 0;
-	rw->rw_child_slot = 0;
 	rw->rw_crashed_at = 0;
 	rw->rw_terminate = false;
 
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index fc00e39c44..2c23a402b0 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -171,6 +171,7 @@ typedef struct bkend
 	int			child_slot;		/* PMChildSlot for this backend, if any */
 	int			bkend_type;		/* child process flavor, see above */
 	bool		dead_end;		/* is it going to send an error and quit? */
+	RegisteredBgWorker *rw;		/* bgworker info, if this is a bgworker */
 	bool		bgworker_notify;	/* gets bgworker start/stop notifications */
 	dlist_node	elem;			/* list link in BackendList */
 } Backend;
@@ -396,8 +397,7 @@ static void process_pm_child_exit(void);
 static void process_pm_reload_request(void);
 static void process_pm_shutdown_request(void);
 static void dummy_handler(SIGNAL_ARGS);
-static void CleanupBackend(int pid, int exitstatus);
-static bool CleanupBackgroundWorker(int pid, int exitstatus);
+static void CleanupBackend(Backend *bp, int exitstatus);
 static void HandleChildCrash(int pid, int exitstatus, const char *procname);
 static void LogChildExit(int lev, const char *procname,
 						 int pid, int exitstatus);
@@ -2291,6 +2291,9 @@ process_pm_child_exit(void)
 
 	while ((pid = waitpid(-1, &exitstatus, WNOHANG)) > 0)
 	{
+		bool		found;
+		dlist_mutable_iter iter;
+
 		/*
 		 * Check if this child was a startup process.
 		 */
@@ -2590,18 +2593,34 @@ process_pm_child_exit(void)
 			continue;
 		}
 
-		/* Was it one of our background workers? */
-		if (CleanupBackgroundWorker(pid, exitstatus))
+		/*
+		 * Was it a backend or background worker?
+		 */
+		found = false;
+		dlist_foreach_modify(iter, &BackendList)
 		{
-			/* have it be restarted */
-			HaveCrashedWorker = true;
-			continue;
+			Backend    *bp = dlist_container(Backend, elem, iter.cur);
+
+			if (bp->pid == pid)
+			{
+				dlist_delete(iter.cur);
+				CleanupBackend(bp, exitstatus);
+				found = true;
+				break;
+			}
 		}
 
 		/*
-		 * Else do standard backend child cleanup.
+		 * We don't know anything about this child process.  That's highly
+		 * unexpected, as we do track all the child processes that we fork.
 		 */
-		CleanupBackend(pid, exitstatus);
+		if (!found)
+		{
+			if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
+				HandleChildCrash(pid, exitstatus, _("untracked child process"));
+			else
+				LogChildExit(LOG, _("untracked child process"), pid, exitstatus);
+		}
 	}							/* loop over pending child-death reports */
 
 	/*
@@ -2612,113 +2631,31 @@ process_pm_child_exit(void)
 }
 
 /*
- * Scan the bgworkers list and see if the given PID (which has just stopped
- * or crashed) is in it.  Handle its shutdown if so, and return true.  If not a
- * bgworker, return false.
+ * CleanupBackend -- cleanup after terminated backend or background worker.
  *
- * This is heavily based on CleanupBackend.  One important difference is that
- * we don't know yet that the dying process is a bgworker, so we must be silent
- * until we're sure it is.
+ * Remove all local state associated with backend.
  */
-static bool
-CleanupBackgroundWorker(int pid,
-						int exitstatus) /* child's exit status */
+static void
+CleanupBackend(Backend *bp,
+			   int exitstatus)	/* child's exit status. */
 {
 	char		namebuf[MAXPGPATH];
-	dlist_mutable_iter iter;
+	char	   *procname;
+	bool		crashed = false;
 
-	dlist_foreach_modify(iter, &BackgroundWorkerList)
+	/* Construct a process name for log message */
+	if (bp->dead_end)
+	{
+		procname = _("dead end backend");
+	}
+	else if (bp->bkend_type == BACKEND_TYPE_BGWORKER)
 	{
-		RegisteredBgWorker *rw;
-
-		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
-
-		if (rw->rw_pid != pid)
-			continue;
-
-#ifdef WIN32
-		/* see CleanupBackend */
-		if (exitstatus == ERROR_WAIT_NO_CHILDREN)
-			exitstatus = 0;
-#endif
-
 		snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""),
-				 rw->rw_worker.bgw_type);
-
-
-		if (!EXIT_STATUS_0(exitstatus))
-		{
-			/* Record timestamp, so we know when to restart the worker. */
-			rw->rw_crashed_at = GetCurrentTimestamp();
-		}
-		else
-		{
-			/* Zero exit status means terminate */
-			rw->rw_crashed_at = 0;
-			rw->rw_terminate = true;
-		}
-
-		/*
-		 * Additionally, just like a backend, any exit status other than 0 or
-		 * 1 is considered a crash and causes a system-wide restart.
-		 */
-		if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
-		{
-			HandleChildCrash(pid, exitstatus, namebuf);
-			return true;
-		}
-
-		/*
-		 * We must release the postmaster child slot. If the worker failed to
-		 * do so, it did not clean up after itself, requiring a crash-restart
-		 * cycle.
-		 */
-		if (!ReleasePostmasterChildSlot(rw->rw_child_slot))
-		{
-			HandleChildCrash(pid, exitstatus, namebuf);
-			return true;
-		}
-
-		/* Get it out of the BackendList and clear out remaining data */
-		dlist_delete(&rw->rw_backend->elem);
-
-		/*
-		 * It's possible that this background worker started some OTHER
-		 * background worker and asked to be notified when that worker started
-		 * or stopped.  If so, cancel any notifications destined for the
-		 * now-dead backend.
-		 */
-		if (rw->rw_backend->bgworker_notify)
-			BackgroundWorkerStopNotifications(rw->rw_pid);
-		pfree(rw->rw_backend);
-		rw->rw_backend = NULL;
-		rw->rw_pid = 0;
-		rw->rw_child_slot = 0;
-		ReportBackgroundWorkerExit(rw); /* report child death */
-
-		LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG,
-					 namebuf, pid, exitstatus);
-
-		return true;
+				 bp->rw->rw_worker.bgw_type);
+		procname = namebuf;
 	}
-
-	return false;
-}
-
-/*
- * CleanupBackend -- cleanup after terminated backend.
- *
- * Remove all local state associated with backend.
- *
- * If you change this, see also CleanupBackgroundWorker.
- */
-static void
-CleanupBackend(int pid,
-			   int exitstatus)	/* child's exit status. */
-{
-	dlist_mutable_iter iter;
-
-	LogChildExit(DEBUG2, _("server process"), pid, exitstatus);
+	else
+		procname = _("server process");
 
 	/*
 	 * If a backend dies in an ugly way then we must signal all other backends
@@ -2726,6 +2663,8 @@ CleanupBackend(int pid,
 	 * assume everything is all right and proceed to remove the backend from
 	 * the active backend list.
 	 */
+	if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
+		crashed = true;
 
 #ifdef WIN32
 
@@ -2738,52 +2677,79 @@ CleanupBackend(int pid,
 	 */
 	if (exitstatus == ERROR_WAIT_NO_CHILDREN)
 	{
-		LogChildExit(LOG, _("server process"), pid, exitstatus);
-		exitstatus = 0;
+		LogChildExit(LOG, procname, bp->pid, exitstatus);
+		crashed = false;
 	}
 #endif
 
-	if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
+	/*
+	 * If the process attached to shared memory, check that it detached
+	 * cleanly.
+	 */
+	if (!bp->dead_end)
+	{
+		if (!ReleasePostmasterChildSlot(bp->child_slot))
+		{
+			/*
+			 * Uh-oh, the child failed to clean itself up.  Treat as a crash
+			 * after all.
+			 */
+			crashed = true;
+		}
+#ifdef EXEC_BACKEND
+		ShmemBackendArrayRemove(bp);
+#endif
+	}
+
+	if (crashed)
 	{
-		HandleChildCrash(pid, exitstatus, _("server process"));
+		HandleChildCrash(bp->pid, exitstatus, procname);
+		pfree(bp);
 		return;
 	}
 
-	dlist_foreach_modify(iter, &BackendList)
+	/*
+	 * This backend may have been slated to receive SIGUSR1 when some
+	 * background worker started or stopped.  Cancel those notifications, as
+	 * we don't want to signal PIDs that are not PostgreSQL backends.  This
+	 * gets skipped in the (probably very common) case where the backend has
+	 * never requested any such notifications.
+	 */
+	if (bp->bgworker_notify)
+		BackgroundWorkerStopNotifications(bp->pid);
+
+	/*
+	 * If it was a background worker, also update its RegisteredWorker entry.
+	 */
+	if (bp->bkend_type == BACKEND_TYPE_BGWORKER)
 	{
-		Backend    *bp = dlist_container(Backend, elem, iter.cur);
+		RegisteredBgWorker *rw = bp->rw;
 
-		if (bp->pid == pid)
+		if (!EXIT_STATUS_0(exitstatus))
 		{
-			if (!bp->dead_end)
-			{
-				if (!ReleasePostmasterChildSlot(bp->child_slot))
-				{
-					/*
-					 * Uh-oh, the child failed to clean itself up.  Treat as a
-					 * crash after all.
-					 */
-					HandleChildCrash(pid, exitstatus, _("server process"));
-					return;
-				}
-			}
-			if (bp->bgworker_notify)
-			{
-				/*
-				 * This backend may have been slated to receive SIGUSR1 when
-				 * some background worker started or stopped.  Cancel those
-				 * notifications, as we don't want to signal PIDs that are not
-				 * PostgreSQL backends.  This gets skipped in the (probably
-				 * very common) case where the backend has never requested any
-				 * such notifications.
-				 */
-				BackgroundWorkerStopNotifications(bp->pid);
-			}
-			dlist_delete(iter.cur);
-			pfree(bp);
-			break;
+			/* Record timestamp, so we know when to restart the worker. */
+			rw->rw_crashed_at = GetCurrentTimestamp();
+		}
+		else
+		{
+			/* Zero exit status means terminate */
+			rw->rw_crashed_at = 0;
+			rw->rw_terminate = true;
 		}
+
+		rw->rw_pid = 0;
+		ReportBackgroundWorkerExit(rw); /* report child death */
+
+		LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG,
+					 procname, bp->pid, exitstatus);
+
+		/* have it be restarted */
+		HaveCrashedWorker = true;
 	}
+	else
+		LogChildExit(DEBUG2, procname, bp->pid, exitstatus);
+
+	pfree(bp);
 }
 
 /*
@@ -2792,13 +2758,14 @@ CleanupBackend(int pid,
  *
  * The objectives here are to clean up our local state about the child
  * process, and to signal all other remaining children to quickdie.
+ *
+ * If it's a backend, the caller has already removed it from the
+ * BackendList. If it's an aux process, the corresponding *PID global variable
+ * has been reset already.
  */
 static void
 HandleChildCrash(int pid, int exitstatus, const char *procname)
 {
-	dlist_iter	iter;
-	dlist_mutable_iter miter;
-	Backend    *bp;
 	bool		take_action;
 
 	/*
@@ -2818,139 +2785,64 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 		SetQuitSignalReason(PMQUIT_FOR_CRASH);
 	}
 
-	/* Process background workers. */
-	dlist_foreach(iter, &BackgroundWorkerList)
+	if (take_action)
 	{
-		RegisteredBgWorker *rw;
+		dlist_iter	iter;
 
-		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
-		if (rw->rw_pid == 0)
-			continue;			/* not running */
-		if (rw->rw_pid == pid)
-		{
-			/*
-			 * Found entry for freshly-dead worker, so remove it.
-			 */
-			(void) ReleasePostmasterChildSlot(rw->rw_child_slot);
-			dlist_delete(&rw->rw_backend->elem);
-			pfree(rw->rw_backend);
-			rw->rw_backend = NULL;
-			rw->rw_pid = 0;
-			rw->rw_child_slot = 0;
-			/* don't reset crashed_at */
-			/* don't report child stop, either */
-			/* Keep looping so we can signal remaining workers */
-		}
-		else
+		dlist_foreach(iter, &BackendList)
 		{
-			/*
-			 * This worker is still alive.  Unless we did so already, tell it
-			 * to commit hara-kiri.
-			 */
-			if (take_action)
-				sigquit_child(rw->rw_pid);
-		}
-	}
-
-	/* Process regular backends */
-	dlist_foreach_modify(miter, &BackendList)
-	{
-		bp = dlist_container(Backend, elem, miter.cur);
+			Backend    *bp = dlist_container(Backend, elem, iter.cur);
 
-		if (bp->pid == pid)
-		{
-			/*
-			 * Found entry for freshly-dead backend, so remove it.
-			 */
-			if (!bp->dead_end)
-			{
-				(void) ReleasePostmasterChildSlot(bp->child_slot);
-			}
-			dlist_delete(miter.cur);
-			pfree(bp);
-			/* Keep looping so we can signal remaining backends */
-		}
-		else
-		{
 			/*
 			 * This backend is still alive.  Unless we did so already, tell it
 			 * to commit hara-kiri.
 			 *
 			 * We could exclude dead_end children here, but at least when
 			 * sending SIGABRT it seems better to include them.
-			 *
-			 * Background workers were already processed above; ignore them
-			 * here.
 			 */
-			if (bp->bkend_type == BACKEND_TYPE_BGWORKER)
-				continue;
+			sigquit_child(bp->pid);
+		}
 
-			if (take_action)
-				sigquit_child(bp->pid);
+		if (StartupPID != 0)
+		{
+			sigquit_child(StartupPID);
+			StartupStatus = STARTUP_SIGNALED;
 		}
-	}
 
-	/* Take care of the startup process too */
-	if (pid == StartupPID)
-	{
-		StartupPID = 0;
-		/* Caller adjusts StartupStatus, so don't touch it here */
-	}
-	else if (StartupPID != 0 && take_action)
-	{
-		sigquit_child(StartupPID);
-		StartupStatus = STARTUP_SIGNALED;
-	}
+		/* Take care of the bgwriter too */
+		if (BgWriterPID != 0)
+			sigquit_child(BgWriterPID);
+
+		/* Take care of the checkpointer too */
+		if (CheckpointerPID != 0)
+			sigquit_child(CheckpointerPID);
+
+		/* Take care of the walwriter too */
+		if (WalWriterPID != 0)
+			sigquit_child(WalWriterPID);
 
-	/* Take care of the bgwriter too */
-	if (pid == BgWriterPID)
-		BgWriterPID = 0;
-	else if (BgWriterPID != 0 && take_action)
-		sigquit_child(BgWriterPID);
-
-	/* Take care of the checkpointer too */
-	if (pid == CheckpointerPID)
-		CheckpointerPID = 0;
-	else if (CheckpointerPID != 0 && take_action)
-		sigquit_child(CheckpointerPID);
-
-	/* Take care of the walwriter too */
-	if (pid == WalWriterPID)
-		WalWriterPID = 0;
-	else if (WalWriterPID != 0 && take_action)
-		sigquit_child(WalWriterPID);
-
-	/* Take care of the walreceiver too */
-	if (pid == WalReceiverPID)
-		WalReceiverPID = 0;
-	else if (WalReceiverPID != 0 && take_action)
-		sigquit_child(WalReceiverPID);
-
-	/* Take care of the walsummarizer too */
-	if (pid == WalSummarizerPID)
-		WalSummarizerPID = 0;
-	else if (WalSummarizerPID != 0 && take_action)
-		sigquit_child(WalSummarizerPID);
-
-	/* Take care of the autovacuum launcher too */
-	if (pid == AutoVacPID)
-		AutoVacPID = 0;
-	else if (AutoVacPID != 0 && take_action)
-		sigquit_child(AutoVacPID);
-
-	/* Take care of the archiver too */
-	if (pid == PgArchPID)
-		PgArchPID = 0;
-	else if (PgArchPID != 0 && take_action)
-		sigquit_child(PgArchPID);
-
-	/* Take care of the slot sync worker too */
-	if (pid == SlotSyncWorkerPID)
-		SlotSyncWorkerPID = 0;
-	else if (SlotSyncWorkerPID != 0 && take_action)
-		sigquit_child(SlotSyncWorkerPID);
-
-	/* We do NOT restart the syslogger */
+		/* Take care of the walreceiver too */
+		if (WalReceiverPID != 0)
+			sigquit_child(WalReceiverPID);
+
+		/* Take care of the walsummarizer too */
+		if (WalSummarizerPID != 0)
+			sigquit_child(WalSummarizerPID);
+
+		/* Take care of the autovacuum launcher too */
+		if (AutoVacPID != 0)
+			sigquit_child(AutoVacPID);
+
+		/* Take care of the archiver too */
+		if (PgArchPID != 0)
+			sigquit_child(PgArchPID);
+
+		/* Take care of the slot sync worker too */
+		if (SlotSyncWorkerPID != 0)
+			sigquit_child(SlotSyncWorkerPID);
+
+		/* We do NOT restart the syslogger */
+	}
 
 	if (Shutdown != ImmediateShutdown)
 		FatalError = true;
@@ -3480,6 +3372,7 @@ BackendStartup(ClientSocket *client_sock)
 	/* Pass down canAcceptConnections state */
 	startup_data.canAcceptConnections = canAcceptConnections(BACKEND_TYPE_NORMAL);
 	bn->dead_end = (startup_data.canAcceptConnections != CAC_OK);
+	bn->rw = NULL;
 
 	/*
 	 * Unless it's a dead_end child, assign it a child slot number
@@ -3865,6 +3758,7 @@ StartAutovacuumWorker(void)
 			bn->dead_end = false;
 			bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
 			bn->bgworker_notify = false;
+			bn->rw = NULL;
 
 			bn->pid = StartChildProcess(B_AUTOVAC_WORKER);
 			if (bn->pid > 0)
@@ -4049,8 +3943,7 @@ do_start_bgworker(RegisteredBgWorker *rw)
 		rw->rw_crashed_at = GetCurrentTimestamp();
 		return false;
 	}
-	rw->rw_backend = bn;
-	rw->rw_child_slot = bn->child_slot;
+	bn->rw = rw;
 
 	ereport(DEBUG1,
 			(errmsg_internal("starting background worker process \"%s\"",
@@ -4063,10 +3956,9 @@ do_start_bgworker(RegisteredBgWorker *rw)
 		ereport(LOG,
 				(errmsg("could not fork background worker process: %m")));
 		/* undo what assign_backendlist_entry did */
-		ReleasePostmasterChildSlot(rw->rw_child_slot);
-		rw->rw_child_slot = 0;
-		pfree(rw->rw_backend);
-		rw->rw_backend = NULL;
+		ReleasePostmasterChildSlot(bn->child_slot);
+		pfree(bn);
+
 		/* mark entry as crashed, so we'll try again later */
 		rw->rw_crashed_at = GetCurrentTimestamp();
 		return false;
@@ -4074,10 +3966,10 @@ do_start_bgworker(RegisteredBgWorker *rw)
 
 	/* in postmaster, fork successful ... */
 	rw->rw_pid = worker_pid;
-	rw->rw_backend->pid = rw->rw_pid;
+	bn->pid = rw->rw_pid;
 	ReportBackgroundWorkerPID(rw);
 	/* add new worker to lists of backends */
-	dlist_push_head(&BackendList, &rw->rw_backend->elem);
+	dlist_push_head(&BackendList, &bn->elem);
 	return true;
 }
 
diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index e55e38af65..309a91124b 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -26,16 +26,13 @@
 /*
  * List of background workers, private to postmaster.
  *
- * All workers that are currently running will have rw_backend set, and will
- * be present in BackendList.
+ * All workers that are currently running will also have an entry in
+ * BackendList.
  */
 typedef struct RegisteredBgWorker
 {
 	BackgroundWorker rw_worker; /* its registry entry */
-	struct bkend *rw_backend;	/* its BackendList entry, or NULL if not
-								 * running */
 	pid_t		rw_pid;			/* 0 if not running */
-	int			rw_child_slot;
 	TimestampTz rw_crashed_at;	/* if not 0, time it last crashed */
 	int			rw_shmem_slot;
 	bool		rw_terminate;
-- 
2.39.2

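The single-list lookup described in the commit message above can be sketched as follows. This is a minimal, self-contained illustration, not the patch's code: `Child`, `RegisteredWorker`, `ChildList` and `cleanup_child()` are hypothetical stand-ins for `Backend`, `RegisteredBgWorker`, `BackendList` and `CleanupBackend()`, and the sketch leaves out crash handling, child-slot release and logging. The point it shows is that a background worker is found through the same list as any other child, and only the non-NULL back-pointer triggers the extra bgworker bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-in for RegisteredBgWorker. */
typedef struct RegisteredWorker
{
	pid_t		pid;			/* 0 if not running */
	bool		terminate;		/* clean exit: forget the worker */
	bool		crashed;		/* non-zero exit: schedule a restart */
} RegisteredWorker;

/* Hypothetical, simplified stand-in for Backend. */
typedef struct Child
{
	pid_t		pid;
	RegisteredWorker *rw;		/* non-NULL only for background workers */
	struct Child *prev;
	struct Child *next;
} Child;

typedef struct
{
	Child	   *head;
} ChildList;

static void
child_list_push(ChildList *list, Child *c)
{
	c->prev = NULL;
	c->next = list->head;
	if (list->head)
		list->head->prev = c;
	list->head = c;
}

static void
child_list_delete(ChildList *list, Child *c)
{
	if (c->prev)
		c->prev->next = c->next;
	else
		list->head = c->next;
	if (c->next)
		c->next->prev = c->prev;
}

/*
 * Handle the exit of 'pid'; exit_code stands in for the decoded wait
 * status.  Returns false if the PID is not in the list.
 */
static bool
cleanup_child(ChildList *list, pid_t pid, int exit_code)
{
	for (Child *c = list->head; c != NULL; c = c->next)
	{
		if (c->pid != pid)
			continue;

		/* Unlink first; the entry is freed below. */
		child_list_delete(list, c);

		/* Extra bookkeeping only for background workers. */
		if (c->rw != NULL)
		{
			assert(c->rw->pid == pid);
			if (exit_code == 0)
				c->rw->terminate = true;
			else
				c->rw->crashed = true;
			c->rw->pid = 0;
		}
		free(c);
		return true;
	}
	return false;				/* caller logs an "untracked child process" */
}
```

The sketch returns right after unlinking, so it never follows `c->next` of a freed entry; a loop that kept scanning after deleting would need a mutable iterator such as `dlist_foreach_modify`.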
Heikki Linnakangas

hlinnaka@iki.fi

over 1 year ago

In reply to: Heikki Linnakangas (#2)

11 attachment(s)

Re: Refactoring postmaster's code to cleanup after child exit

I committed the first two trivial patches, and have continued to work on
postmaster.c, and how it manages all the child processes.

This is a lot of patches. They're built on top of each other, because
that's the order I developed them in, but they probably could be applied
in different order too. Please help me by reviewing these, before the
stack grows even larger :-). Even partial reviews would be very helpful.
I suggest reading them in order, and when you get tired, just
send any comments you have up to that point.

* v3-0001-Make-BackgroundWorkerList-doubly-linked.patch

This is the same refactoring patch I started this thread with.

* v3-0003-Fix-comment-on-processes-being-kept-over-a-restar.patch
* v3-0004-Consolidate-postmaster-code-to-launch-background-.patch

A little refactoring of how the postmaster launches the background processes.

* v3-0005-Add-test-for-connection-limits.patch
* v3-0006-Add-test-for-dead-end-backends.patch

A few new TAP tests for dead-end backends and enforcing connection
limits. We didn't have much coverage for these before.

* v3-0007-Use-an-shmem_exit-callback-to-remove-backend-from.patch
* v3-0008-Introduce-a-separate-BackendType-for-dead-end-chi.patch

Some preliminary refactoring towards patch
v3-0010-Assign-a-child-slot-to-every-postmaster-child-pro.patch

* v3-0009-Kill-dead-end-children-when-there-s-nothing-else-.patch

I noticed that we never send SIGTERM or SIGQUIT to dead-end backends,
which seems silly. If the server is shutting down, dead-end backends
might prevent the shutdown from completing. Dead-end backends will
expire after authentication_timeout (default 60s), so they won't last
too long, but it still seems like we should kill dead-end backends if
they're the only children preventing shutdown from completing.

* v3-0010-Assign-a-child-slot-to-every-postmaster-child-pro.patch

This is what I consider the main patch in this series. Currently, only
regular backends, bgworkers and autovacuum workers have a PMChildFlags
slot, which is used to detect when a postmaster child exits in an
unclean way (in addition to the exit code). This patch assigns a child
slot for all processes, except for dead-end backends. That includes all
the aux processes.

While we're at it, I created separate pools of child slots for different
kinds of backends, which fixes the issue that opening a lot of client
connections can exhaust all the slots, so that background workers or
autovacuum workers cannot start either [1].

[1]: /messages/by-id/55d2f50c-0b81-4b33-b202-cd2a406d69a3@iki.fi

* v3-0011-Pass-MyPMChildSlot-as-an-explicit-argument-to-chi.patch

One more little refactoring, to pass MyPMChildSlot to the child process
differently.
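The separate child-slot pools described for v3-0010 can be illustrated with a small allocator. This is a hypothetical sketch, not the patch's implementation: the `ChildKind` values, the pool sizes, `TOTAL_SLOTS` and the `init_pools()`, `assign_slot()` and `release_slot()` helpers are invented for illustration, and slots here are 0-based. Each kind of process owns a contiguous range of one global slot array, so slot numbers stay unique across kinds while exhausting one pool cannot starve another.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical process kinds, for illustration only. */
typedef enum
{
	KIND_BACKEND,
	KIND_BGWORKER,
	KIND_AUTOVAC,
	NUM_KINDS
} ChildKind;

#define TOTAL_SLOTS 16

/* One global array of slots; each pool is a contiguous range of it. */
static bool slot_in_use[TOTAL_SLOTS];

typedef struct
{
	int			first;			/* first global slot number in this pool */
	int			size;			/* number of slots in this pool */
} SlotPool;

static SlotPool pools[NUM_KINDS];

static void
init_pools(const int sizes[NUM_KINDS])
{
	int			next = 0;

	for (int k = 0; k < NUM_KINDS; k++)
	{
		pools[k].first = next;
		pools[k].size = sizes[k];
		next += sizes[k];
	}
	assert(next <= TOTAL_SLOTS);
	for (int i = 0; i < TOTAL_SLOTS; i++)
		slot_in_use[i] = false;
}

/* Returns a global slot number, or -1 if this kind's pool is exhausted. */
static int
assign_slot(ChildKind kind)
{
	const SlotPool *pool = &pools[kind];

	for (int i = pool->first; i < pool->first + pool->size; i++)
	{
		if (!slot_in_use[i])
		{
			slot_in_use[i] = true;
			return i;
		}
	}
	return -1;
}

static void
release_slot(int slot)
{
	assert(slot >= 0 && slot < TOTAL_SLOTS && slot_in_use[slot]);
	slot_in_use[slot] = false;
}
```

With pool sizes {2, 1, 1}, a third client connection gets -1 from `assign_slot(KIND_BACKEND)`, while a background worker can still be assigned slot 2.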

Where is all this leading? I'm not sure exactly, but having a postmaster
child slot for every postmaster child seems highly useful. We could move
the ProcSignal machinery to use those slot numbers as the indexes into
the ProcSignal array, instead of ProcNumber, for example. That would
allow all processes to participate in the signalling, even before they
have a PGPROC entry. (Or with Thomas's interrupts refactoring, the
interrupts array). With the multithreading work, PMChild struct could
store a thread id, or whatever is needed for threads to communicate with
each other. In any case, it seems like it will come in handy.
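As a rough illustration of indexing a signalling array by postmaster child slot, here is a sketch using C11 atomics. Everything in it is hypothetical (`SignalReason`, `pending`, `send_signal_reason()`, `consume_signal_reasons()`), it is not how ProcSignal works today, and a real version would place the array in shared memory and wake the target process, for example with SIGUSR1. Because the index is the child slot rather than a PGPROC-based number, a process could receive requests before it has a PGPROC entry.

```c
#include <assert.h>
#include <stdatomic.h>

#define NUM_CHILD_SLOTS 16

/* Hypothetical reasons a process might be signalled. */
typedef enum
{
	REASON_RELOAD = 0,
	REASON_SHUTDOWN = 1,
	REASON_BARRIER = 2
} SignalReason;

/*
 * One pending-reasons bitmask per child slot.  Static storage, so it starts
 * out zeroed; a real implementation would allocate it in shared memory.
 */
static atomic_uint pending[NUM_CHILD_SLOTS];

/* Sender side: mark a reason as pending for the process in 'slot'. */
static void
send_signal_reason(int slot, SignalReason reason)
{
	assert(slot >= 0 && slot < NUM_CHILD_SLOTS);
	atomic_fetch_or(&pending[slot], 1u << reason);
	/* A real implementation would also wake the target process here. */
}

/* Receiver side: atomically fetch and clear all pending reasons. */
static unsigned
consume_signal_reasons(int my_slot)
{
	assert(my_slot >= 0 && my_slot < NUM_CHILD_SLOTS);
	return atomic_exchange(&pending[my_slot], 0u);
}
```

Fetching and clearing in one `atomic_exchange` means a reason set concurrently by another sender is either returned now or left pending for the next call, never lost.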

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v3-0001-Make-BackgroundWorkerList-doubly-linked.patch (text/x-patch)

From c5881b9f4b89b762cd8ef925936eec3602b565b5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 29 Jul 2024 23:14:00 +0300
Subject: [PATCH v3 01/12] Make BackgroundWorkerList doubly-linked

This allows ForgetBackgroundWorker() and ReportBackgroundWorkerExit()
to take a RegisteredBgWorker pointer as argument, rather than a list
iterator. That feels a little more natural. But more importantly, this
paves the way for more refactoring in the next commit.

Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi
---
 src/backend/postmaster/bgworker.c           | 62 ++++++++++-----------
 src/backend/postmaster/postmaster.c         | 40 ++++++-------
 src/include/postmaster/bgworker_internals.h | 10 ++--
 3 files changed, 54 insertions(+), 58 deletions(-)

diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 77707bb384..981d8177b0 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -37,7 +37,7 @@
 /*
  * The postmaster's list of registered background workers, in private memory.
  */
-slist_head	BackgroundWorkerList = SLIST_STATIC_INIT(BackgroundWorkerList);
+dlist_head	BackgroundWorkerList = DLIST_STATIC_INIT(BackgroundWorkerList);
 
 /*
  * BackgroundWorkerSlots exist in shared memory and can be accessed (via
@@ -168,7 +168,7 @@ BackgroundWorkerShmemInit(void)
 										   &found);
 	if (!IsUnderPostmaster)
 	{
-		slist_iter	siter;
+		dlist_iter	iter;
 		int			slotno = 0;
 
 		BackgroundWorkerData->total_slots = max_worker_processes;
@@ -181,12 +181,12 @@ BackgroundWorkerShmemInit(void)
 		 * correspondence between the postmaster's private list and the array
 		 * in shared memory.
 		 */
-		slist_foreach(siter, &BackgroundWorkerList)
+		dlist_foreach(iter, &BackgroundWorkerList)
 		{
 			BackgroundWorkerSlot *slot = &BackgroundWorkerData->slot[slotno];
 			RegisteredBgWorker *rw;
 
-			rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
+			rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 			Assert(slotno < max_worker_processes);
 			slot->in_use = true;
 			slot->terminate = false;
@@ -220,13 +220,13 @@ BackgroundWorkerShmemInit(void)
 static RegisteredBgWorker *
 FindRegisteredWorkerBySlotNumber(int slotno)
 {
-	slist_iter	siter;
+	dlist_iter	iter;
 
-	slist_foreach(siter, &BackgroundWorkerList)
+	dlist_foreach(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 		if (rw->rw_shmem_slot == slotno)
 			return rw;
 	}
@@ -413,29 +413,25 @@ BackgroundWorkerStateChange(bool allow_new_workers)
 				(errmsg_internal("registering background worker \"%s\"",
 								 rw->rw_worker.bgw_name)));
 
-		slist_push_head(&BackgroundWorkerList, &rw->rw_lnode);
+		dlist_push_head(&BackgroundWorkerList, &rw->rw_lnode);
 	}
 }
 
 /*
  * Forget about a background worker that's no longer needed.
  *
- * The worker must be identified by passing an slist_mutable_iter that
- * points to it.  This convention allows deletion of workers during
- * searches of the worker list, and saves having to search the list again.
+ * NOTE: The entry is unlinked from BackgroundWorkerList.  If the caller is
+ * iterating through it, better use a mutable iterator!
  *
  * Caller is responsible for notifying bgw_notify_pid, if appropriate.
  *
  * This function must be invoked only in the postmaster.
  */
 void
-ForgetBackgroundWorker(slist_mutable_iter *cur)
+ForgetBackgroundWorker(RegisteredBgWorker *rw)
 {
-	RegisteredBgWorker *rw;
 	BackgroundWorkerSlot *slot;
 
-	rw = slist_container(RegisteredBgWorker, rw_lnode, cur->cur);
-
 	Assert(rw->rw_shmem_slot < max_worker_processes);
 	slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
 	Assert(slot->in_use);
@@ -454,7 +450,7 @@ ForgetBackgroundWorker(slist_mutable_iter *cur)
 			(errmsg_internal("unregistering background worker \"%s\"",
 							 rw->rw_worker.bgw_name)));
 
-	slist_delete_current(cur);
+	dlist_delete(&rw->rw_lnode);
 	pfree(rw);
 }
 
@@ -480,17 +476,17 @@ ReportBackgroundWorkerPID(RegisteredBgWorker *rw)
  * Report that the PID of a background worker is now zero because a
  * previously-running background worker has exited.
  *
+ * NOTE: The entry may be unlinked from BackgroundWorkerList.  If the caller
+ * is iterating through it, better use a mutable iterator!
+ *
  * This function should only be called from the postmaster.
  */
 void
-ReportBackgroundWorkerExit(slist_mutable_iter *cur)
+ReportBackgroundWorkerExit(RegisteredBgWorker *rw)
 {
-	RegisteredBgWorker *rw;
 	BackgroundWorkerSlot *slot;
 	int			notify_pid;
 
-	rw = slist_container(RegisteredBgWorker, rw_lnode, cur->cur);
-
 	Assert(rw->rw_shmem_slot < max_worker_processes);
 	slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
 	slot->pid = rw->rw_pid;
@@ -505,7 +501,7 @@ ReportBackgroundWorkerExit(slist_mutable_iter *cur)
 	 */
 	if (rw->rw_terminate ||
 		rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
-		ForgetBackgroundWorker(cur);
+		ForgetBackgroundWorker(rw);
 
 	if (notify_pid != 0)
 		kill(notify_pid, SIGUSR1);
@@ -519,13 +515,13 @@ ReportBackgroundWorkerExit(slist_mutable_iter *cur)
 void
 BackgroundWorkerStopNotifications(pid_t pid)
 {
-	slist_iter	siter;
+	dlist_iter	iter;
 
-	slist_foreach(siter, &BackgroundWorkerList)
+	dlist_foreach(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 		if (rw->rw_worker.bgw_notify_pid == pid)
 			rw->rw_worker.bgw_notify_pid = 0;
 	}
@@ -546,14 +542,14 @@ BackgroundWorkerStopNotifications(pid_t pid)
 void
 ForgetUnstartedBackgroundWorkers(void)
 {
-	slist_mutable_iter iter;
+	dlist_mutable_iter iter;
 
-	slist_foreach_modify(iter, &BackgroundWorkerList)
+	dlist_foreach_modify(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 		BackgroundWorkerSlot *slot;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 		Assert(rw->rw_shmem_slot < max_worker_processes);
 		slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
 
@@ -564,7 +560,7 @@ ForgetUnstartedBackgroundWorkers(void)
 			/* ... then zap it, and notify the waiter */
 			int			notify_pid = rw->rw_worker.bgw_notify_pid;
 
-			ForgetBackgroundWorker(&iter);
+			ForgetBackgroundWorker(rw);
 			if (notify_pid != 0)
 				kill(notify_pid, SIGUSR1);
 		}
@@ -584,13 +580,13 @@ ForgetUnstartedBackgroundWorkers(void)
 void
 ResetBackgroundWorkerCrashTimes(void)
 {
-	slist_mutable_iter iter;
+	dlist_mutable_iter iter;
 
-	slist_foreach_modify(iter, &BackgroundWorkerList)
+	dlist_foreach_modify(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 
 		if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
 		{
@@ -601,7 +597,7 @@ ResetBackgroundWorkerCrashTimes(void)
 			 * parallel_terminate_count will get incremented after we've
 			 * already zeroed parallel_register_count, which would be bad.)
 			 */
-			ForgetBackgroundWorker(&iter);
+			ForgetBackgroundWorker(rw);
 		}
 		else
 		{
@@ -1036,7 +1032,7 @@ RegisterBackgroundWorker(BackgroundWorker *worker)
 	rw->rw_crashed_at = 0;
 	rw->rw_terminate = false;
 
-	slist_push_head(&BackgroundWorkerList, &rw->rw_lnode);
+	dlist_push_head(&BackgroundWorkerList, &rw->rw_lnode);
 }
 
 /*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a3e9e8fdc0..fc00e39c44 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1531,7 +1531,7 @@ DetermineSleepTime(void)
 
 	if (HaveCrashedWorker)
 	{
-		slist_mutable_iter siter;
+		dlist_mutable_iter iter;
 
 		/*
 		 * When there are crashed bgworkers, we sleep just long enough that
@@ -1539,12 +1539,12 @@ DetermineSleepTime(void)
 		 * determine the minimum of all wakeup times according to most recent
 		 * crash time and requested restart interval.
 		 */
-		slist_foreach_modify(siter, &BackgroundWorkerList)
+		dlist_foreach_modify(iter, &BackgroundWorkerList)
 		{
 			RegisteredBgWorker *rw;
 			TimestampTz this_wakeup;
 
-			rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
+			rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 
 			if (rw->rw_crashed_at == 0)
 				continue;
@@ -1552,7 +1552,7 @@ DetermineSleepTime(void)
 			if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART
 				|| rw->rw_terminate)
 			{
-				ForgetBackgroundWorker(&siter);
+				ForgetBackgroundWorker(rw);
 				continue;
 			}
 
@@ -2625,13 +2625,13 @@ CleanupBackgroundWorker(int pid,
 						int exitstatus) /* child's exit status */
 {
 	char		namebuf[MAXPGPATH];
-	slist_mutable_iter iter;
+	dlist_mutable_iter iter;
 
-	slist_foreach_modify(iter, &BackgroundWorkerList)
+	dlist_foreach_modify(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 
 		if (rw->rw_pid != pid)
 			continue;
@@ -2694,7 +2694,7 @@ CleanupBackgroundWorker(int pid,
 		rw->rw_backend = NULL;
 		rw->rw_pid = 0;
 		rw->rw_child_slot = 0;
-		ReportBackgroundWorkerExit(&iter);	/* report child death */
+		ReportBackgroundWorkerExit(rw); /* report child death */
 
 		LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG,
 					 namebuf, pid, exitstatus);
@@ -2796,8 +2796,8 @@ CleanupBackend(int pid,
 static void
 HandleChildCrash(int pid, int exitstatus, const char *procname)
 {
-	dlist_mutable_iter iter;
-	slist_iter	siter;
+	dlist_iter	iter;
+	dlist_mutable_iter miter;
 	Backend    *bp;
 	bool		take_action;
 
@@ -2819,11 +2819,11 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 	}
 
 	/* Process background workers. */
-	slist_foreach(siter, &BackgroundWorkerList)
+	dlist_foreach(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 		if (rw->rw_pid == 0)
 			continue;			/* not running */
 		if (rw->rw_pid == pid)
@@ -2853,9 +2853,9 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 	}
 
 	/* Process regular backends */
-	dlist_foreach_modify(iter, &BackendList)
+	dlist_foreach_modify(miter, &BackendList)
 	{
-		bp = dlist_container(Backend, elem, iter.cur);
+		bp = dlist_container(Backend, elem, miter.cur);
 
 		if (bp->pid == pid)
 		{
@@ -2866,7 +2866,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 			{
 				(void) ReleasePostmasterChildSlot(bp->child_slot);
 			}
-			dlist_delete(iter.cur);
+			dlist_delete(miter.cur);
 			pfree(bp);
 			/* Keep looping so we can signal remaining backends */
 		}
@@ -4177,7 +4177,7 @@ maybe_start_bgworkers(void)
 #define MAX_BGWORKERS_TO_LAUNCH 100
 	int			num_launched = 0;
 	TimestampTz now = 0;
-	slist_mutable_iter iter;
+	dlist_mutable_iter iter;
 
 	/*
 	 * During crash recovery, we have no need to be called until the state
@@ -4194,11 +4194,11 @@ maybe_start_bgworkers(void)
 	StartWorkerNeeded = false;
 	HaveCrashedWorker = false;
 
-	slist_foreach_modify(iter, &BackgroundWorkerList)
+	dlist_foreach_modify(iter, &BackgroundWorkerList)
 	{
 		RegisteredBgWorker *rw;
 
-		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
+		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 
 		/* ignore if already running */
 		if (rw->rw_pid != 0)
@@ -4207,7 +4207,7 @@ maybe_start_bgworkers(void)
 		/* if marked for death, clean up and remove from list */
 		if (rw->rw_terminate)
 		{
-			ForgetBackgroundWorker(&iter);
+			ForgetBackgroundWorker(rw);
 			continue;
 		}
 
@@ -4226,7 +4226,7 @@ maybe_start_bgworkers(void)
 
 				notify_pid = rw->rw_worker.bgw_notify_pid;
 
-				ForgetBackgroundWorker(&iter);
+				ForgetBackgroundWorker(rw);
 
 				/* Report worker is gone now. */
 				if (notify_pid != 0)
diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index 61ba54117a..e55e38af65 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -39,17 +39,17 @@ typedef struct RegisteredBgWorker
 	TimestampTz rw_crashed_at;	/* if not 0, time it last crashed */
 	int			rw_shmem_slot;
 	bool		rw_terminate;
-	slist_node	rw_lnode;		/* list link */
+	dlist_node	rw_lnode;		/* list link */
 } RegisteredBgWorker;
 
-extern PGDLLIMPORT slist_head BackgroundWorkerList;
+extern PGDLLIMPORT dlist_head BackgroundWorkerList;
 
 extern Size BackgroundWorkerShmemSize(void);
 extern void BackgroundWorkerShmemInit(void);
 extern void BackgroundWorkerStateChange(bool allow_new_workers);
-extern void ForgetBackgroundWorker(slist_mutable_iter *cur);
-extern void ReportBackgroundWorkerPID(RegisteredBgWorker *);
-extern void ReportBackgroundWorkerExit(slist_mutable_iter *cur);
+extern void ForgetBackgroundWorker(RegisteredBgWorker *rw);
+extern void ReportBackgroundWorkerPID(RegisteredBgWorker *rw);
+extern void ReportBackgroundWorkerExit(RegisteredBgWorker *rw);
 extern void BackgroundWorkerStopNotifications(pid_t pid);
 extern void ForgetUnstartedBackgroundWorkers(void);
 extern void ResetBackgroundWorkerCrashTimes(void);
-- 
2.39.2

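Switching BackgroundWorkerList from slist to dlist is what allows ForgetBackgroundWorker() and ReportBackgroundWorkerExit() to take a plain RegisteredBgWorker pointer instead of a mutable iterator. A doubly-linked node knows its predecessor, so it can be unlinked in O(1) without walking the list. The sketch below shows that property. The dnode/dhead types and the helper names are simplified, hypothetical stand-ins, not the real lib/ilist.h definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for dlist_node/dlist_head; NOT the real lib/ilist.h. */
typedef struct dnode
{
	struct dnode *prev;
	struct dnode *next;
} dnode;

typedef struct
{
	dnode		head;			/* circular sentinel: next = first, prev = last */
} dhead;

static void
dinit(dhead *h)
{
	h->head.prev = h->head.next = &h->head;
}

static void
dpush_head(dhead *h, dnode *n)
{
	n->next = h->head.next;
	n->prev = &h->head;
	h->head.next->prev = n;
	h->head.next = n;
}

/* O(1) unlink using only the node: no iterator and no list head needed. */
static void
ddelete(dnode *n)
{
	assert(n->prev != NULL && n->next != NULL);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

#define dcontainer(type, field, ptr) \
	((type *) ((char *) (ptr) - offsetof(type, field)))

/* Hypothetical worker entry embedding its list link, like rw_lnode. */
typedef struct
{
	int			pid;
	dnode		lnode;
} worker;

/* Analogue of ForgetBackgroundWorker(rw): needs only the entry itself. */
static void
forget_worker(worker *w)
{
	ddelete(&w->lnode);
}

static int
dcount(const dhead *h)
{
	int			n = 0;

	for (const dnode *cur = h->head.next; cur != &h->head; cur = cur->next)
		n++;
	return n;
}
```

With a singly-linked list, the same removal needs the predecessor. That is why the old signatures took an slist_mutable_iter.
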
v3-0002-Refactor-code-to-handle-death-of-a-backend-or-bgw.patch (text/x-patch; charset=UTF-8)

From 6f59e21f98c9c95ef1d39f13718916f4ed571dbf Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 30 Jul 2024 14:28:19 +0300
Subject: [PATCH v3 02/12] Refactor code to handle death of a backend or
 bgworker in postmaster

Currently, when a child process exits, the postmaster first scans
through BackgroundWorkerList to see if the child process was a
background worker. If not found, then it scans through BackendList to
see if it was a regular backend. That leads to some duplication
between the bgworker and regular backend cleanup code, as both have an
entry in the BackendList that needs to be cleaned up in the same way.
Refactor that so that we scan just the BackendList to find the child
process, and if it was a background worker, do the bgworker-specific
cleanup in addition to the normal Backend cleanup.
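
The single-lookup flow described above can be sketched roughly as follows. This is a simplified model under stated assumptions. The child and bgworker structs, cleanup_child(), and the plain singly-linked list with a pointer-to-pointer unlink are all hypothetical, not the postmaster's Backend or BackendList. The point is that one lookup finds the child, and the bgworker-specific step hangs off an optional back-pointer, like the new Backend.rw field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified model of postmaster bookkeeping. */
typedef struct bgworker
{
	int			pid;			/* 0 if not running */
	bool		crashed;		/* non-zero exit: schedule a restart */
} bgworker;

typedef struct child
{
	int			pid;
	bgworker   *rw;				/* non-NULL only for background workers */
	struct child *next;
} child;

/*
 * Find the exited child in the one list, unlink it, and do the common
 * cleanup.  The bgworker-specific step runs only when c->rw is set.
 * Returns false for an untracked PID.
 */
static bool
cleanup_child(child **list, int pid, int exitcode)
{
	for (child **link = list; *link != NULL; link = &(*link)->next)
	{
		child	   *c = *link;

		if (c->pid != pid)
			continue;
		*link = c->next;		/* unlink: common to every child */
		if (c->rw != NULL)		/* extra step only for bgworkers */
		{
			c->rw->crashed = (exitcode != 0);
			c->rw->pid = 0;
		}
		free(c);
		return true;
	}
	return false;
}
```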

Change HandleChildCrash so that it doesn't try to handle the cleanup
of the process that already exited, only the signaling of all the
other processes. When called for any of the aux processes, the caller
cleared the *PID global variable, so the code in HandleChildCrash() to
do that was unused.
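
Here is a rough sketch of the narrower HandleChildCrash() responsibility. The caller has already forgotten the dead child, so the function only signals the survivors, and it does so only on the first crash. The signal hook, the counter, and signal_survivors() are hypothetical. The take_action test is simplified: the real code also skips signalling during an immediate shutdown.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical signal hook so the sketch can run without calling kill(). */
typedef void (*signal_fn) (pid_t pid, int sig);

static int	signals_sent = 0;

static void
count_signal(pid_t pid, int sig)
{
	(void) pid;
	(void) sig;
	signals_sent++;
}

/*
 * Signal every still-tracked child (and the startup process, if any), but
 * only if no crash has been handled yet.  The dead child is not in
 * 'tracked' anymore; the caller already removed it.
 */
static void
signal_survivors(const pid_t *tracked, size_t n, pid_t startup_pid,
				 bool *fatal_error, signal_fn send)
{
	bool		take_action = !*fatal_error;	/* simplified */

	if (take_action)
	{
		for (size_t i = 0; i < n; i++)
			send(tracked[i], SIGQUIT);
		if (startup_pid != 0)
			send(startup_pid, SIGQUIT);
	}
	*fatal_error = true;
}
```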

On Windows, if a child process exits with ERROR_WAIT_NO_CHILDREN, it's
now logged with that exit code, instead of 0. Also, if a bgworker
exits with ERROR_WAIT_NO_CHILDREN, it's now treated as crashed and is
restarted. Previously it was treated as a normal exit.

If a child process is not found in the BackendList, the log message
now calls it "untracked child process" rather than "server process".
Arguably that should be a PANIC, because we do track all the child
processes in the list, so failing to find a child process is highly
unexpected. But if we want to change that, let's discuss and do that
as a separate commit.
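
For reference, the crash test applied to untracked children treats only a clean exit(0) or a FATAL-style exit(1) as normal. The sketch below re-defines EXIT_STATUS_0/EXIT_STATUS_1 locally, as an assumption about their meaning, and checks them against real forked children using waitpid(). wait_status_of() is a hypothetical helper for the demo. The sketch is POSIX-only and does not model the Windows ERROR_WAIT_NO_CHILDREN case.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Local re-definitions, assumed to match the postmaster's macros. */
#define EXIT_STATUS_0(st)  ((st) == 0)
#define EXIT_STATUS_1(st)  (WIFEXITED(st) && WEXITSTATUS(st) == 1)

/* Anything other than exit(0) or exit(1) counts as a crash. */
static bool
is_crash(int status)
{
	return !EXIT_STATUS_0(status) && !EXIT_STATUS_1(status);
}

/* Fork a child that dies by 'sig' (if non-zero) or exits with 'code'. */
static int
wait_status_of(int code, int sig)
{
	pid_t		pid = fork();

	if (pid == 0)
	{
		if (sig != 0)
			raise(sig);
		_exit(code);
	}

	int			status = 0;

	if (pid < 0 || waitpid(pid, &status, 0) != pid)
		abort();
	return status;
}
```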

Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi
---
 src/backend/postmaster/bgworker.c           |   4 -
 src/backend/postmaster/postmaster.c         | 431 ++++++++------------
 src/include/postmaster/bgworker_internals.h |   7 +-
 3 files changed, 162 insertions(+), 280 deletions(-)

diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 981d8177b0..b83967cda3 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -401,9 +401,7 @@ BackgroundWorkerStateChange(bool allow_new_workers)
 		}
 
 		/* Initialize postmaster bookkeeping. */
-		rw->rw_backend = NULL;
 		rw->rw_pid = 0;
-		rw->rw_child_slot = 0;
 		rw->rw_crashed_at = 0;
 		rw->rw_shmem_slot = slotno;
 		rw->rw_terminate = false;
@@ -1026,9 +1024,7 @@ RegisterBackgroundWorker(BackgroundWorker *worker)
 	}
 
 	rw->rw_worker = *worker;
-	rw->rw_backend = NULL;
 	rw->rw_pid = 0;
-	rw->rw_child_slot = 0;
 	rw->rw_crashed_at = 0;
 	rw->rw_terminate = false;
 
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index fc00e39c44..6e753f3865 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -171,6 +171,7 @@ typedef struct bkend
 	int			child_slot;		/* PMChildSlot for this backend, if any */
 	int			bkend_type;		/* child process flavor, see above */
 	bool		dead_end;		/* is it going to send an error and quit? */
+	RegisteredBgWorker *rw;		/* bgworker info, if this is a bgworker */
 	bool		bgworker_notify;	/* gets bgworker start/stop notifications */
 	dlist_node	elem;			/* list link in BackendList */
 } Backend;
@@ -396,8 +397,7 @@ static void process_pm_child_exit(void);
 static void process_pm_reload_request(void);
 static void process_pm_shutdown_request(void);
 static void dummy_handler(SIGNAL_ARGS);
-static void CleanupBackend(int pid, int exitstatus);
-static bool CleanupBackgroundWorker(int pid, int exitstatus);
+static void CleanupBackend(Backend *bp, int exitstatus);
 static void HandleChildCrash(int pid, int exitstatus, const char *procname);
 static void LogChildExit(int lev, const char *procname,
 						 int pid, int exitstatus);
@@ -2291,6 +2291,9 @@ process_pm_child_exit(void)
 
 	while ((pid = waitpid(-1, &exitstatus, WNOHANG)) > 0)
 	{
+		bool		found;
+		dlist_mutable_iter iter;
+
 		/*
 		 * Check if this child was a startup process.
 		 */
@@ -2590,18 +2593,34 @@ process_pm_child_exit(void)
 			continue;
 		}
 
-		/* Was it one of our background workers? */
-		if (CleanupBackgroundWorker(pid, exitstatus))
+		/*
+		 * Was it a backend or background worker?
+		 */
+		found = false;
+		dlist_foreach_modify(iter, &BackendList)
 		{
-			/* have it be restarted */
-			HaveCrashedWorker = true;
-			continue;
+			Backend    *bp = dlist_container(Backend, elem, iter.cur);
+
+			if (bp->pid == pid)
+			{
+				dlist_delete(iter.cur);
+				CleanupBackend(bp, exitstatus);
+				found = true;
+				break;
+			}
 		}
 
 		/*
-		 * Else do standard backend child cleanup.
+		 * We don't know anything about this child process.  That's highly
+		 * unexpected, as we do track all the child processes that we fork.
 		 */
-		CleanupBackend(pid, exitstatus);
+		if (!found)
+		{
+			if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
+				HandleChildCrash(pid, exitstatus, _("untracked child process"));
+			else
+				LogChildExit(LOG, _("untracked child process"), pid, exitstatus);
+		}
 	}							/* loop over pending child-death reports */
 
 	/*
@@ -2612,113 +2631,31 @@ process_pm_child_exit(void)
 }
 
 /*
- * Scan the bgworkers list and see if the given PID (which has just stopped
- * or crashed) is in it.  Handle its shutdown if so, and return true.  If not a
- * bgworker, return false.
+ * CleanupBackend -- cleanup after terminated backend or background worker.
  *
- * This is heavily based on CleanupBackend.  One important difference is that
- * we don't know yet that the dying process is a bgworker, so we must be silent
- * until we're sure it is.
+ * Remove all local state associated with backend.
  */
-static bool
-CleanupBackgroundWorker(int pid,
-						int exitstatus) /* child's exit status */
+static void
+CleanupBackend(Backend *bp,
+			   int exitstatus)	/* child's exit status. */
 {
 	char		namebuf[MAXPGPATH];
-	dlist_mutable_iter iter;
+	char	   *procname;
+	bool		crashed = false;
 
-	dlist_foreach_modify(iter, &BackgroundWorkerList)
+	/* Construct a process name for log message */
+	if (bp->dead_end)
+	{
+		procname = _("dead end backend");
+	}
+	else if (bp->bkend_type == BACKEND_TYPE_BGWORKER)
 	{
-		RegisteredBgWorker *rw;
-
-		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
-
-		if (rw->rw_pid != pid)
-			continue;
-
-#ifdef WIN32
-		/* see CleanupBackend */
-		if (exitstatus == ERROR_WAIT_NO_CHILDREN)
-			exitstatus = 0;
-#endif
-
 		snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""),
-				 rw->rw_worker.bgw_type);
-
-
-		if (!EXIT_STATUS_0(exitstatus))
-		{
-			/* Record timestamp, so we know when to restart the worker. */
-			rw->rw_crashed_at = GetCurrentTimestamp();
-		}
-		else
-		{
-			/* Zero exit status means terminate */
-			rw->rw_crashed_at = 0;
-			rw->rw_terminate = true;
-		}
-
-		/*
-		 * Additionally, just like a backend, any exit status other than 0 or
-		 * 1 is considered a crash and causes a system-wide restart.
-		 */
-		if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
-		{
-			HandleChildCrash(pid, exitstatus, namebuf);
-			return true;
-		}
-
-		/*
-		 * We must release the postmaster child slot. If the worker failed to
-		 * do so, it did not clean up after itself, requiring a crash-restart
-		 * cycle.
-		 */
-		if (!ReleasePostmasterChildSlot(rw->rw_child_slot))
-		{
-			HandleChildCrash(pid, exitstatus, namebuf);
-			return true;
-		}
-
-		/* Get it out of the BackendList and clear out remaining data */
-		dlist_delete(&rw->rw_backend->elem);
-
-		/*
-		 * It's possible that this background worker started some OTHER
-		 * background worker and asked to be notified when that worker started
-		 * or stopped.  If so, cancel any notifications destined for the
-		 * now-dead backend.
-		 */
-		if (rw->rw_backend->bgworker_notify)
-			BackgroundWorkerStopNotifications(rw->rw_pid);
-		pfree(rw->rw_backend);
-		rw->rw_backend = NULL;
-		rw->rw_pid = 0;
-		rw->rw_child_slot = 0;
-		ReportBackgroundWorkerExit(rw); /* report child death */
-
-		LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG,
-					 namebuf, pid, exitstatus);
-
-		return true;
+				 bp->rw->rw_worker.bgw_type);
+		procname = namebuf;
 	}
-
-	return false;
-}
-
-/*
- * CleanupBackend -- cleanup after terminated backend.
- *
- * Remove all local state associated with backend.
- *
- * If you change this, see also CleanupBackgroundWorker.
- */
-static void
-CleanupBackend(int pid,
-			   int exitstatus)	/* child's exit status. */
-{
-	dlist_mutable_iter iter;
-
-	LogChildExit(DEBUG2, _("server process"), pid, exitstatus);
+	else
+		procname = _("server process");
 
 	/*
 	 * If a backend dies in an ugly way then we must signal all other backends
@@ -2726,6 +2663,8 @@ CleanupBackend(int pid,
 	 * assume everything is all right and proceed to remove the backend from
 	 * the active backend list.
 	 */
+	if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
+		crashed = true;
 
 #ifdef WIN32
 
@@ -2738,52 +2677,76 @@ CleanupBackend(int pid,
 	 */
 	if (exitstatus == ERROR_WAIT_NO_CHILDREN)
 	{
-		LogChildExit(LOG, _("server process"), pid, exitstatus);
-		exitstatus = 0;
+		LogChildExit(LOG, procname, bp->pid, exitstatus);
+		crashed = false;
 	}
 #endif
 
-	if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
+	/*
+	 * If the process attached to shared memory, check that it detached
+	 * cleanly.
+	 */
+	if (!bp->dead_end)
 	{
-		HandleChildCrash(pid, exitstatus, _("server process"));
+		if (!ReleasePostmasterChildSlot(bp->child_slot))
+		{
+			/*
+			 * Uh-oh, the child failed to clean itself up.  Treat as a crash
+			 * after all.
+			 */
+			crashed = true;
+		}
+	}
+
+	if (crashed)
+	{
+		HandleChildCrash(bp->pid, exitstatus, procname);
+		pfree(bp);
 		return;
 	}
 
-	dlist_foreach_modify(iter, &BackendList)
+	/*
+	 * This backend may have been slated to receive SIGUSR1 when some
+	 * background worker started or stopped.  Cancel those notifications, as
+	 * we don't want to signal PIDs that are not PostgreSQL backends.  This
+	 * gets skipped in the (probably very common) case where the backend has
+	 * never requested any such notifications.
+	 */
+	if (bp->bgworker_notify)
+		BackgroundWorkerStopNotifications(bp->pid);
+
+	/*
+	 * If it was a background worker, also update its RegisteredBgWorker entry.
+	 */
+	if (bp->bkend_type == BACKEND_TYPE_BGWORKER)
 	{
-		Backend    *bp = dlist_container(Backend, elem, iter.cur);
+		RegisteredBgWorker *rw = bp->rw;
 
-		if (bp->pid == pid)
+		if (!EXIT_STATUS_0(exitstatus))
 		{
-			if (!bp->dead_end)
-			{
-				if (!ReleasePostmasterChildSlot(bp->child_slot))
-				{
-					/*
-					 * Uh-oh, the child failed to clean itself up.  Treat as a
-					 * crash after all.
-					 */
-					HandleChildCrash(pid, exitstatus, _("server process"));
-					return;
-				}
-			}
-			if (bp->bgworker_notify)
-			{
-				/*
-				 * This backend may have been slated to receive SIGUSR1 when
-				 * some background worker started or stopped.  Cancel those
-				 * notifications, as we don't want to signal PIDs that are not
-				 * PostgreSQL backends.  This gets skipped in the (probably
-				 * very common) case where the backend has never requested any
-				 * such notifications.
-				 */
-				BackgroundWorkerStopNotifications(bp->pid);
-			}
-			dlist_delete(iter.cur);
-			pfree(bp);
-			break;
+			/* Record timestamp, so we know when to restart the worker. */
+			rw->rw_crashed_at = GetCurrentTimestamp();
+		}
+		else
+		{
+			/* Zero exit status means terminate */
+			rw->rw_crashed_at = 0;
+			rw->rw_terminate = true;
 		}
+
+		rw->rw_pid = 0;
+		ReportBackgroundWorkerExit(rw); /* report child death */
+
+		LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG,
+					 procname, bp->pid, exitstatus);
+
+		/* have it be restarted */
+		HaveCrashedWorker = true;
 	}
+	else
+		LogChildExit(DEBUG2, procname, bp->pid, exitstatus);
+
+	pfree(bp);
 }
 
 /*
@@ -2792,13 +2755,14 @@ CleanupBackend(int pid,
  *
  * The objectives here are to clean up our local state about the child
  * process, and to signal all other remaining children to quickdie.
+ *
+ * If it's a backend, the caller has already removed it from the
+ * BackendList. If it's an aux process, the corresponding *PID global variable
+ * has been reset already.
  */
 static void
 HandleChildCrash(int pid, int exitstatus, const char *procname)
 {
-	dlist_iter	iter;
-	dlist_mutable_iter miter;
-	Backend    *bp;
 	bool		take_action;
 
 	/*
@@ -2818,139 +2782,64 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 		SetQuitSignalReason(PMQUIT_FOR_CRASH);
 	}
 
-	/* Process background workers. */
-	dlist_foreach(iter, &BackgroundWorkerList)
+	if (take_action)
 	{
-		RegisteredBgWorker *rw;
+		dlist_iter	iter;
 
-		rw = dlist_container(RegisteredBgWorker, rw_lnode, iter.cur);
-		if (rw->rw_pid == 0)
-			continue;			/* not running */
-		if (rw->rw_pid == pid)
+		dlist_foreach(iter, &BackendList)
 		{
-			/*
-			 * Found entry for freshly-dead worker, so remove it.
-			 */
-			(void) ReleasePostmasterChildSlot(rw->rw_child_slot);
-			dlist_delete(&rw->rw_backend->elem);
-			pfree(rw->rw_backend);
-			rw->rw_backend = NULL;
-			rw->rw_pid = 0;
-			rw->rw_child_slot = 0;
-			/* don't reset crashed_at */
-			/* don't report child stop, either */
-			/* Keep looping so we can signal remaining workers */
-		}
-		else
-		{
-			/*
-			 * This worker is still alive.  Unless we did so already, tell it
-			 * to commit hara-kiri.
-			 */
-			if (take_action)
-				sigquit_child(rw->rw_pid);
-		}
-	}
-
-	/* Process regular backends */
-	dlist_foreach_modify(miter, &BackendList)
-	{
-		bp = dlist_container(Backend, elem, miter.cur);
+			Backend    *bp = dlist_container(Backend, elem, iter.cur);
 
-		if (bp->pid == pid)
-		{
-			/*
-			 * Found entry for freshly-dead backend, so remove it.
-			 */
-			if (!bp->dead_end)
-			{
-				(void) ReleasePostmasterChildSlot(bp->child_slot);
-			}
-			dlist_delete(miter.cur);
-			pfree(bp);
-			/* Keep looping so we can signal remaining backends */
-		}
-		else
-		{
 			/*
 			 * This backend is still alive.  Unless we did so already, tell it
 			 * to commit hara-kiri.
 			 *
 			 * We could exclude dead_end children here, but at least when
 			 * sending SIGABRT it seems better to include them.
-			 *
-			 * Background workers were already processed above; ignore them
-			 * here.
 			 */
-			if (bp->bkend_type == BACKEND_TYPE_BGWORKER)
-				continue;
+			sigquit_child(bp->pid);
+		}
 
-			if (take_action)
-				sigquit_child(bp->pid);
+		if (StartupPID != 0)
+		{
+			sigquit_child(StartupPID);
+			StartupStatus = STARTUP_SIGNALED;
 		}
-	}
 
-	/* Take care of the startup process too */
-	if (pid == StartupPID)
-	{
-		StartupPID = 0;
-		/* Caller adjusts StartupStatus, so don't touch it here */
-	}
-	else if (StartupPID != 0 && take_action)
-	{
-		sigquit_child(StartupPID);
-		StartupStatus = STARTUP_SIGNALED;
-	}
+		/* Take care of the bgwriter too */
+		if (BgWriterPID != 0)
+			sigquit_child(BgWriterPID);
+
+		/* Take care of the checkpointer too */
+		if (CheckpointerPID != 0)
+			sigquit_child(CheckpointerPID);
+
+		/* Take care of the walwriter too */
+		if (WalWriterPID != 0)
+			sigquit_child(WalWriterPID);
+
+		/* Take care of the walreceiver too */
+		if (WalReceiverPID != 0)
+			sigquit_child(WalReceiverPID);
+
+		/* Take care of the walsummarizer too */
+		if (WalSummarizerPID != 0)
+			sigquit_child(WalSummarizerPID);
+
+		/* Take care of the autovacuum launcher too */
+		if (AutoVacPID != 0)
+			sigquit_child(AutoVacPID);
 
-	/* Take care of the bgwriter too */
-	if (pid == BgWriterPID)
-		BgWriterPID = 0;
-	else if (BgWriterPID != 0 && take_action)
-		sigquit_child(BgWriterPID);
-
-	/* Take care of the checkpointer too */
-	if (pid == CheckpointerPID)
-		CheckpointerPID = 0;
-	else if (CheckpointerPID != 0 && take_action)
-		sigquit_child(CheckpointerPID);
-
-	/* Take care of the walwriter too */
-	if (pid == WalWriterPID)
-		WalWriterPID = 0;
-	else if (WalWriterPID != 0 && take_action)
-		sigquit_child(WalWriterPID);
-
-	/* Take care of the walreceiver too */
-	if (pid == WalReceiverPID)
-		WalReceiverPID = 0;
-	else if (WalReceiverPID != 0 && take_action)
-		sigquit_child(WalReceiverPID);
-
-	/* Take care of the walsummarizer too */
-	if (pid == WalSummarizerPID)
-		WalSummarizerPID = 0;
-	else if (WalSummarizerPID != 0 && take_action)
-		sigquit_child(WalSummarizerPID);
-
-	/* Take care of the autovacuum launcher too */
-	if (pid == AutoVacPID)
-		AutoVacPID = 0;
-	else if (AutoVacPID != 0 && take_action)
-		sigquit_child(AutoVacPID);
-
-	/* Take care of the archiver too */
-	if (pid == PgArchPID)
-		PgArchPID = 0;
-	else if (PgArchPID != 0 && take_action)
-		sigquit_child(PgArchPID);
-
-	/* Take care of the slot sync worker too */
-	if (pid == SlotSyncWorkerPID)
-		SlotSyncWorkerPID = 0;
-	else if (SlotSyncWorkerPID != 0 && take_action)
-		sigquit_child(SlotSyncWorkerPID);
-
-	/* We do NOT restart the syslogger */
+		/* Take care of the archiver too */
+		if (PgArchPID != 0)
+			sigquit_child(PgArchPID);
+
+		/* Take care of the slot sync worker too */
+		if (SlotSyncWorkerPID != 0)
+			sigquit_child(SlotSyncWorkerPID);
+
+		/* We do NOT restart the syslogger */
+	}
 
 	if (Shutdown != ImmediateShutdown)
 		FatalError = true;
@@ -3480,6 +3369,7 @@ BackendStartup(ClientSocket *client_sock)
 	/* Pass down canAcceptConnections state */
 	startup_data.canAcceptConnections = canAcceptConnections(BACKEND_TYPE_NORMAL);
 	bn->dead_end = (startup_data.canAcceptConnections != CAC_OK);
+	bn->rw = NULL;
 
 	/*
 	 * Unless it's a dead_end child, assign it a child slot number
@@ -3865,6 +3755,7 @@ StartAutovacuumWorker(void)
 			bn->dead_end = false;
 			bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
 			bn->bgworker_notify = false;
+			bn->rw = NULL;
 
 			bn->pid = StartChildProcess(B_AUTOVAC_WORKER);
 			if (bn->pid > 0)
@@ -4049,8 +3940,7 @@ do_start_bgworker(RegisteredBgWorker *rw)
 		rw->rw_crashed_at = GetCurrentTimestamp();
 		return false;
 	}
-	rw->rw_backend = bn;
-	rw->rw_child_slot = bn->child_slot;
+	bn->rw = rw;
 
 	ereport(DEBUG1,
 			(errmsg_internal("starting background worker process \"%s\"",
@@ -4063,10 +3953,9 @@ do_start_bgworker(RegisteredBgWorker *rw)
 		ereport(LOG,
 				(errmsg("could not fork background worker process: %m")));
 		/* undo what assign_backendlist_entry did */
-		ReleasePostmasterChildSlot(rw->rw_child_slot);
-		rw->rw_child_slot = 0;
-		pfree(rw->rw_backend);
-		rw->rw_backend = NULL;
+		ReleasePostmasterChildSlot(bn->child_slot);
+		pfree(bn);
+
 		/* mark entry as crashed, so we'll try again later */
 		rw->rw_crashed_at = GetCurrentTimestamp();
 		return false;
@@ -4074,10 +3963,10 @@ do_start_bgworker(RegisteredBgWorker *rw)
 
 	/* in postmaster, fork successful ... */
 	rw->rw_pid = worker_pid;
-	rw->rw_backend->pid = rw->rw_pid;
+	bn->pid = rw->rw_pid;
 	ReportBackgroundWorkerPID(rw);
 	/* add new worker to lists of backends */
-	dlist_push_head(&BackendList, &rw->rw_backend->elem);
+	dlist_push_head(&BackendList, &bn->elem);
 	return true;
 }
 
diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index e55e38af65..309a91124b 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -26,16 +26,13 @@
 /*
  * List of background workers, private to postmaster.
  *
- * All workers that are currently running will have rw_backend set, and will
- * be present in BackendList.
+ * All workers that are currently running will also have an entry in
+ * BackendList.
  */
 typedef struct RegisteredBgWorker
 {
 	BackgroundWorker rw_worker; /* its registry entry */
-	struct bkend *rw_backend;	/* its BackendList entry, or NULL if not
-								 * running */
 	pid_t		rw_pid;			/* 0 if not running */
-	int			rw_child_slot;
 	TimestampTz rw_crashed_at;	/* if not 0, time it last crashed */
 	int			rw_shmem_slot;
 	bool		rw_terminate;
-- 
2.39.2

v3-0003-Fix-comment-on-processes-being-kept-over-a-restar.patch (text/x-patch; charset=UTF-8)

From 3db4626d6ec21518682d38bd259a80c76f759be5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 1 Aug 2024 21:53:44 +0300
Subject: [PATCH v3 03/12] Fix comment on processes being kept over a restart

All child processes except the syslogger are killed on a restart. The
archiver might already be running, though, if it was started during
recovery.

The split in the comments between "other special children" and the
first group of "background tasks" seemed really arbitrary, so I just
merged them all into one group.
---
 src/backend/postmaster/postmaster.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6e753f3865..b0152a0068 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2386,9 +2386,9 @@ process_pm_child_exit(void)
 			connsAllowed = true;
 
 			/*
-			 * Crank up the background tasks, if we didn't do that already
-			 * when we entered consistent recovery state.  It doesn't matter
-			 * if this fails, we'll just try again later.
+			 * Crank up any background tasks that we didn't start earlier
+			 * already.  It doesn't matter if any of these fail, we'll just
+			 * try again later.
 			 */
 			if (CheckpointerPID == 0)
 				CheckpointerPID = StartChildProcess(B_CHECKPOINTER);
@@ -2397,18 +2397,11 @@ process_pm_child_exit(void)
 			if (WalWriterPID == 0)
 				WalWriterPID = StartChildProcess(B_WAL_WRITER);
 			MaybeStartWalSummarizer();
-
-			/*
-			 * Likewise, start other special children as needed.  In a restart
-			 * situation, some of them may be alive already.
-			 */
 			if (!IsBinaryUpgrade && AutoVacuumingActive() && AutoVacPID == 0)
 				AutoVacPID = StartChildProcess(B_AUTOVAC_LAUNCHER);
 			if (PgArchStartupAllowed() && PgArchPID == 0)
 				PgArchPID = StartChildProcess(B_ARCHIVER);
 			MaybeStartSlotSyncWorker();
-
-			/* workers may be scheduled to start now */
 			maybe_start_bgworkers();
 
 			/* at this point we are really open for business */
-- 
2.39.2

v3-0004-Consolidate-postmaster-code-to-launch-background-.patch (text/x-patch; charset=UTF-8)

From 41a6a279f16855b4e2e6978f172eb0c1c299aa50 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 1 Aug 2024 21:13:09 +0300
Subject: [PATCH v3 04/12] Consolidate postmaster code to launch background
 processes

Much of the code in process_pm_child_exit() to launch replacement
processes when one exits or when progressing to the next postmaster state
was unnecessary, because the ServerLoop will launch any missing
background processes anyway. Remove the redundant code and let
ServerLoop handle it.

In ServerLoop, move the code to launch all the processes to a new
subroutine, to group it all together.
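
The consolidation relies on launching being idempotent. ServerLoop can call the launcher on every iteration, and it starts only what should be running in the current state but isn't, so a failed start is simply retried next time. Below is a simplified, hypothetical sketch of that reconciliation pattern. The states, the aux_proc table, and fake_start() are illustrative and do not start real processes.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical, simplified postmaster states and process table. */
typedef enum
{
	S_STARTUP, S_RECOVERY, S_RUN, S_SHUTDOWN
} pm_state;

typedef struct
{
	const char *name;
	pid_t		pid;			/* 0 if not running */
	bool		(*wanted) (pm_state st);	/* should it run in this state? */
} aux_proc;

static bool
wanted_unless_shutdown(pm_state st)
{
	return st != S_SHUTDOWN;
}

static bool
wanted_in_run(pm_state st)
{
	return st == S_RUN;
}

/* Hypothetical launcher; a real one would fork and may return 0 on failure. */
static pid_t next_pid = 1000;

static pid_t
fake_start(const char *name)
{
	(void) name;
	return next_pid++;
}

/*
 * Start whatever should be running in state 'st' but is not.  A failed
 * start leaves pid == 0, so the next main-loop iteration simply retries.
 */
static int
launch_missing(aux_proc *procs, int n, pm_state st)
{
	int			launched = 0;

	for (int i = 0; i < n; i++)
	{
		if (procs[i].pid == 0 && procs[i].wanted(st))
		{
			procs[i].pid = fake_start(procs[i].name);
			if (procs[i].pid != 0)
				launched++;
		}
	}
	return launched;
}
```

Because the function is driven only by the current state and the PID table, process_pm_child_exit() no longer needs its own copies of the launch logic. It only has to clear the PID or change the state.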
---
 src/backend/postmaster/postmaster.c | 278 ++++++++++++----------------
 1 file changed, 121 insertions(+), 157 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b0152a0068..d9a2783fb6 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -417,26 +417,12 @@ static void TerminateChildren(int signal);
 
 static int	CountChildren(int target);
 static Backend *assign_backendlist_entry(void);
+static void LaunchMissingBackgroundProcesses(void);
 static void maybe_start_bgworkers(void);
 static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
 static pid_t StartChildProcess(BackendType type);
 static void StartAutovacuumWorker(void);
-static void MaybeStartWalReceiver(void);
-static void MaybeStartWalSummarizer(void);
 static void InitPostmasterDeathWatchHandle(void);
-static void MaybeStartSlotSyncWorker(void);
-
-/*
- * Archiver is allowed to start up at the current postmaster state?
- *
- * If WAL archiving is enabled always, we are allowed to start archiver
- * even during recovery.
- */
-#define PgArchStartupAllowed()	\
-	(((XLogArchivingActive() && pmState == PM_RUN) ||			\
-	  (XLogArchivingAlways() &&									  \
-	   (pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY))) && \
-	 PgArchCanRestart())
 
 #ifdef WIN32
 #define WNOHANG 0				/* ignored, so any integer value will do */
@@ -1670,53 +1656,11 @@ ServerLoop(void)
 			}
 		}
 
-		/* If we have lost the log collector, try to start a new one */
-		if (SysLoggerPID == 0 && Logging_collector)
-			SysLoggerPID = SysLogger_Start();
-
-		/*
-		 * If no background writer process is running, and we are not in a
-		 * state that prevents it, start one.  It doesn't matter if this
-		 * fails, we'll just try again later.  Likewise for the checkpointer.
-		 */
-		if (pmState == PM_RUN || pmState == PM_RECOVERY ||
-			pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
-		{
-			if (CheckpointerPID == 0)
-				CheckpointerPID = StartChildProcess(B_CHECKPOINTER);
-			if (BgWriterPID == 0)
-				BgWriterPID = StartChildProcess(B_BG_WRITER);
-		}
-
 		/*
-		 * Likewise, if we have lost the walwriter process, try to start a new
-		 * one.  But this is needed only in normal operation (else we cannot
-		 * be writing any new WAL).
+		 * If we have need to launch any background processes after changing
+		 * state or because some exited, do so now.
 		 */
-		if (WalWriterPID == 0 && pmState == PM_RUN)
-			WalWriterPID = StartChildProcess(B_WAL_WRITER);
-
-		/*
-		 * If we have lost the autovacuum launcher, try to start a new one. We
-		 * don't want autovacuum to run in binary upgrade mode because
-		 * autovacuum might update relfrozenxid for empty tables before the
-		 * physical files are put in place.
-		 */
-		if (!IsBinaryUpgrade && AutoVacPID == 0 &&
-			(AutoVacuumingActive() || start_autovac_launcher) &&
-			pmState == PM_RUN)
-		{
-			AutoVacPID = StartChildProcess(B_AUTOVAC_LAUNCHER);
-			if (AutoVacPID != 0)
-				start_autovac_launcher = false; /* signal processed */
-		}
-
-		/* If we have lost the archiver, try to start a new one. */
-		if (PgArchPID == 0 && PgArchStartupAllowed())
-			PgArchPID = StartChildProcess(B_ARCHIVER);
-
-		/* If we need to start a slot sync worker, try to do that now */
-		MaybeStartSlotSyncWorker();
+		LaunchMissingBackgroundProcesses();
 
 		/* If we need to signal the autovacuum launcher, do so now */
 		if (avlauncher_needs_signal)
@@ -1726,17 +1670,6 @@ ServerLoop(void)
 				kill(AutoVacPID, SIGUSR2);
 		}
 
-		/* If we need to start a WAL receiver, try to do that now */
-		if (WalReceiverRequested)
-			MaybeStartWalReceiver();
-
-		/* If we need to start a WAL summarizer, try to do that now */
-		MaybeStartWalSummarizer();
-
-		/* Get other worker processes running, if needed */
-		if (StartWorkerNeeded || HaveCrashedWorker)
-			maybe_start_bgworkers();
-
 #ifdef HAVE_PTHREAD_IS_THREADED_NP
 
 		/*
@@ -2386,23 +2319,12 @@ process_pm_child_exit(void)
 			connsAllowed = true;
 
 			/*
-			 * Crank up any background tasks that we didn't start earlier
-			 * already.  It doesn't matter if any of these fail, we'll just
-			 * try again later.
+			 * At the next iteration of the postmaster's main loop, we will
+			 * crank up the background tasks like the checkpointer, autovacuum
+			 * launcher, and background workers that were not started earlier
+			 * already.
 			 */
-			if (CheckpointerPID == 0)
-				CheckpointerPID = StartChildProcess(B_CHECKPOINTER);
-			if (BgWriterPID == 0)
-				BgWriterPID = StartChildProcess(B_BG_WRITER);
-			if (WalWriterPID == 0)
-				WalWriterPID = StartChildProcess(B_WAL_WRITER);
-			MaybeStartWalSummarizer();
-			if (!IsBinaryUpgrade && AutoVacuumingActive() && AutoVacPID == 0)
-				AutoVacPID = StartChildProcess(B_AUTOVAC_LAUNCHER);
-			if (PgArchStartupAllowed() && PgArchPID == 0)
-				PgArchPID = StartChildProcess(B_ARCHIVER);
-			MaybeStartSlotSyncWorker();
-			maybe_start_bgworkers();
+			StartWorkerNeeded = true;
 
 			/* at this point we are really open for business */
 			ereport(LOG,
@@ -2541,11 +2463,8 @@ process_pm_child_exit(void)
 		/*
 		 * Was it the archiver?  If exit status is zero (normal) or one (FATAL
 		 * exit), we assume everything is all right just like normal backends
-		 * and just try to restart a new one so that we immediately retry
-		 * archiving remaining files. (If fail, we'll try again in future
-		 * cycles of the postmaster's main loop.) Unless we were waiting for
-		 * it to shut down; don't restart it in that case, and
-		 * PostmasterStateMachine() will advance to the next shutdown step.
+		 * and just try to start a new one on the next cycle of the
+		 * postmaster's main loop, to retry archiving remaining files.
 		 */
 		if (pid == PgArchPID)
 		{
@@ -2553,8 +2472,6 @@ process_pm_child_exit(void)
 			if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("archiver process"));
-			if (PgArchStartupAllowed())
-				PgArchPID = StartChildProcess(B_ARCHIVER);
 			continue;
 		}
 
@@ -3199,6 +3116,116 @@ PostmasterStateMachine(void)
 	}
 }
 
+/*
+ * Check the current pmState and the status of any background processes, and
+ * launch any background processes that should be running in the current
+ * state but are not.
+ */
+static void
+LaunchMissingBackgroundProcesses(void)
+{
+	/* If we have lost the log collector, try to start a new one */
+	if (SysLoggerPID == 0 && Logging_collector)
+		SysLoggerPID = SysLogger_Start();
+
+	/*
+	 * If no background writer process is running, and we are not in a state
+	 * that prevents it, start one.  It doesn't matter if this fails, we'll
+	 * just try again later.  Likewise for the checkpointer.
+	 */
+	if (pmState == PM_RUN || pmState == PM_RECOVERY ||
+		pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
+	{
+		if (CheckpointerPID == 0)
+			CheckpointerPID = StartChildProcess(B_CHECKPOINTER);
+		if (BgWriterPID == 0)
+			BgWriterPID = StartChildProcess(B_BG_WRITER);
+	}
+
+	/*
+	 * Likewise, if we have lost the walwriter process, try to start a new
+	 * one.  But this is needed only in normal operation (else we cannot be
+	 * writing any new WAL).
+	 */
+	if (WalWriterPID == 0 && pmState == PM_RUN)
+		WalWriterPID = StartChildProcess(B_WAL_WRITER);
+
+	/*
+	 * If we have lost the autovacuum launcher, try to start a new one.  We
+	 * don't want autovacuum to run in binary upgrade mode because autovacuum
+	 * might update relfrozenxid for empty tables before the physical files
+	 * are put in place.
+	 */
+	if (!IsBinaryUpgrade && AutoVacPID == 0 &&
+		(AutoVacuumingActive() || start_autovac_launcher) &&
+		pmState == PM_RUN)
+	{
+		AutoVacPID = StartChildProcess(B_AUTOVAC_LAUNCHER);
+		if (AutoVacPID != 0)
+			start_autovac_launcher = false; /* signal processed */
+	}
+
+	/*
+	 * If we have lost the archiver, try to start a new one.
+	 *
+	 * If WAL archiving is enabled always, we are allowed to start archiver
+	 * even during recovery.
+	 */
+	if (PgArchPID == 0 &&
+		((XLogArchivingActive() && pmState == PM_RUN) ||
+		 (XLogArchivingAlways() && (pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY))) &&
+		PgArchCanRestart())
+		PgArchPID = StartChildProcess(B_ARCHIVER);
+
+	/*
+	 * If we need to start a slot sync worker, try to do that now.
+	 *
+	 * We allow the slot sync worker to start when we are on a hot standby,
+	 * fast or immediate shutdown is not in progress, slot sync parameters
+	 * are configured correctly, and this is either the worker's first launch
+	 * or enough time has passed since it was last launched.
+	 */
+	if (SlotSyncWorkerPID == 0 && pmState == PM_HOT_STANDBY &&
+		Shutdown <= SmartShutdown && sync_replication_slots &&
+		ValidateSlotSyncParams(LOG) && SlotSyncWorkerCanRestart())
+		SlotSyncWorkerPID = StartChildProcess(B_SLOTSYNC_WORKER);
+
+	/*
+	 * If we need to start a WAL receiver, try to do that now.
+	 *
+	 * Note: if WalReceiverPID is already nonzero, it might seem that we
+	 * should clear WalReceiverRequested.  However, there's a race condition
+	 * if the walreceiver terminates and the startup process immediately
+	 * requests a new one: it's quite possible to get the signal for the
+	 * request before reaping the dead walreceiver process.  Better to risk
+	 * launching an extra walreceiver than to miss launching one we need.
+	 * (The walreceiver code has logic to recognize that it should go away if
+	 * not needed.)
+	 */
+	if (WalReceiverRequested)
+	{
+		if (WalReceiverPID == 0 &&
+			(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
+			 pmState == PM_HOT_STANDBY) &&
+			Shutdown <= SmartShutdown)
+		{
+			WalReceiverPID = StartChildProcess(B_WAL_RECEIVER);
+			if (WalReceiverPID != 0)
+				WalReceiverRequested = false;
+			/* else leave the flag set, so we'll try again later */
+		}
+	}
+
+	/* If we need to start a WAL summarizer, try to do that now */
+	if (summarize_wal && WalSummarizerPID == 0 &&
+		(pmState == PM_RUN || pmState == PM_HOT_STANDBY) &&
+		Shutdown <= SmartShutdown)
+		WalSummarizerPID = StartChildProcess(B_WAL_SUMMARIZER);
+
+	/* Get other worker processes running, if needed */
+	if (StartWorkerNeeded || HaveCrashedWorker)
+		maybe_start_bgworkers();
+}
 
 /*
  * Send a signal to a postmaster child process
@@ -3550,9 +3577,6 @@ process_pm_pmsignal(void)
 		StartWorkerNeeded = true;
 	}
 
-	if (StartWorkerNeeded || HaveCrashedWorker)
-		maybe_start_bgworkers();
-
 	/* Tell syslogger to rotate logfile if requested */
 	if (SysLoggerPID != 0)
 	{
@@ -3592,9 +3616,7 @@ process_pm_pmsignal(void)
 	if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER))
 	{
 		/* Startup Process wants us to start the walreceiver process. */
-		/* Start immediately if possible, else remember request for later. */
 		WalReceiverRequested = true;
-		MaybeStartWalReceiver();
 	}
 
 	/*
@@ -3788,64 +3810,6 @@ StartAutovacuumWorker(void)
 	}
 }
 
-/*
- * MaybeStartWalReceiver
- *		Start the WAL receiver process, if not running and our state allows.
- *
- * Note: if WalReceiverPID is already nonzero, it might seem that we should
- * clear WalReceiverRequested.  However, there's a race condition if the
- * walreceiver terminates and the startup process immediately requests a new
- * one: it's quite possible to get the signal for the request before reaping
- * the dead walreceiver process.  Better to risk launching an extra
- * walreceiver than to miss launching one we need.  (The walreceiver code
- * has logic to recognize that it should go away if not needed.)
- */
-static void
-MaybeStartWalReceiver(void)
-{
-	if (WalReceiverPID == 0 &&
-		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
-		 pmState == PM_HOT_STANDBY) &&
-		Shutdown <= SmartShutdown)
-	{
-		WalReceiverPID = StartChildProcess(B_WAL_RECEIVER);
-		if (WalReceiverPID != 0)
-			WalReceiverRequested = false;
-		/* else leave the flag set, so we'll try again later */
-	}
-}
-
-/*
- * MaybeStartWalSummarizer
- *		Start the WAL summarizer process, if not running and our state allows.
- */
-static void
-MaybeStartWalSummarizer(void)
-{
-	if (summarize_wal && WalSummarizerPID == 0 &&
-		(pmState == PM_RUN || pmState == PM_HOT_STANDBY) &&
-		Shutdown <= SmartShutdown)
-		WalSummarizerPID = StartChildProcess(B_WAL_SUMMARIZER);
-}
-
-
-/*
- * MaybeStartSlotSyncWorker
- * 		Start the slot sync worker, if not running and our state allows.
- *
- * We allow to start the slot sync worker when we are on a hot standby,
- * fast or immediate shutdown is not in progress, slot sync parameters
- * are configured correctly, and it is the first time of worker's launch,
- * or enough time has passed since the worker was launched last.
- */
-static void
-MaybeStartSlotSyncWorker(void)
-{
-	if (SlotSyncWorkerPID == 0 && pmState == PM_HOT_STANDBY &&
-		Shutdown <= SmartShutdown && sync_replication_slots &&
-		ValidateSlotSyncParams(LOG) && SlotSyncWorkerCanRestart())
-		SlotSyncWorkerPID = StartChildProcess(B_SLOTSYNC_WORKER);
-}
 
 /*
  * Create the opts file
-- 
2.39.2
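
To make the shape of the 0004 change concrete, here is a minimal sketch of the pattern, not postmaster code: exit and signal handling only record state (clear a PID, set a flag), and one idempotent function, called on every main-loop iteration, starts whatever should be running in the current state but is not. All names (SketchPMState, launch_missing(), child_exited(), the fake PIDs) are hypothetical, and start_child() stands in for StartChildProcess() by pretending that fork always succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical, heavily simplified postmaster states. */
typedef enum
{
	SK_PM_STARTUP,
	SK_PM_RECOVERY,
	SK_PM_RUN,
	SK_PM_SHUTDOWN
} SketchPMState;

static SketchPMState pm_state = SK_PM_STARTUP;
static pid_t checkpointer_pid = 0;
static pid_t walwriter_pid = 0;
static pid_t next_fake_pid = 1000;
static int	launches = 0;

/* Stand-in for StartChildProcess(): pretends the fork always succeeds. */
static pid_t
start_child(void)
{
	launches++;
	return next_fake_pid++;
}

/*
 * Start every child that should be running in the current state but is not.
 * Calling it when nothing is missing does nothing, so the main loop can call
 * it unconditionally on each iteration.
 */
static void
launch_missing(void)
{
	if (checkpointer_pid == 0 &&
		(pm_state == SK_PM_STARTUP || pm_state == SK_PM_RECOVERY ||
		 pm_state == SK_PM_RUN))
		checkpointer_pid = start_child();
	if (walwriter_pid == 0 && pm_state == SK_PM_RUN)
		walwriter_pid = start_child();
}

/* The exit handler only forgets the PID; it does not restart anything. */
static void
child_exited(pid_t pid)
{
	if (pid == checkpointer_pid)
		checkpointer_pid = 0;
	if (pid == walwriter_pid)
		walwriter_pid = 0;
}
```

The trade-off is that a restart now waits for the next loop iteration instead of happening inside the exit handler, in exchange for having the "should this be running now?" conditions in exactly one place.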

v3-0005-Add-test-for-connection-limits.patch (text/x-patch; charset=UTF-8)

From 5d9c11df6ad75ed1cde8de223f9875b729245f37 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 1 Aug 2024 18:24:08 +0300
Subject: [PATCH v3 05/12] Add test for connection limits

---
 src/test/Makefile                             |  2 +-
 src/test/meson.build                          |  1 +
 src/test/postmaster/Makefile                  | 23 ++++++
 src/test/postmaster/README                    | 27 +++++++
 src/test/postmaster/meson.build               | 12 +++
 .../postmaster/t/001_connection_limits.pl     | 79 +++++++++++++++++++
 6 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100644 src/test/postmaster/Makefile
 create mode 100644 src/test/postmaster/README
 create mode 100644 src/test/postmaster/meson.build
 create mode 100644 src/test/postmaster/t/001_connection_limits.pl

diff --git a/src/test/Makefile b/src/test/Makefile
index dbd3192874..abdd6e5a98 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,7 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl postmaster regress isolation modules authentication recovery subscription
 
 ifeq ($(with_icu),yes)
 SUBDIRS += icu
diff --git a/src/test/meson.build b/src/test/meson.build
index c3d0dfedf1..67376e4b7f 100644
--- a/src/test/meson.build
+++ b/src/test/meson.build
@@ -4,6 +4,7 @@ subdir('regress')
 subdir('isolation')
 
 subdir('authentication')
+subdir('postmaster')
 subdir('recovery')
 subdir('subscription')
 subdir('modules')
diff --git a/src/test/postmaster/Makefile b/src/test/postmaster/Makefile
new file mode 100644
index 0000000000..dfcce9c9ee
--- /dev/null
+++ b/src/test/postmaster/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/postmaster
+#
+# Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/postmaster/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/postmaster
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean:
+	rm -rf tmp_check
diff --git a/src/test/postmaster/README b/src/test/postmaster/README
new file mode 100644
index 0000000000..7e47bf5cff
--- /dev/null
+++ b/src/test/postmaster/README
@@ -0,0 +1,27 @@
+src/test/postmaster/README
+
+Regression tests for postmaster
+===============================
+
+This directory contains a test suite for postmaster's handling of
+connections, connection limits, and startup/shutdown sequence.
+
+
+Running the tests
+=================
+
+NOTE: You must have given the --enable-tap-tests argument to configure.
+
+Run
+    make check
+or
+    make installcheck
+You can use "make installcheck" if you previously did "make install".
+In that case, the code in the installation tree is tested.  With
+"make check", a temporary installation tree is built from the current
+sources and then tested.
+
+Either way, this test initializes, starts, and stops a test Postgres
+cluster.
+
+See src/test/perl/README for more info about running these tests.
diff --git a/src/test/postmaster/meson.build b/src/test/postmaster/meson.build
new file mode 100644
index 0000000000..c2de2e0eb5
--- /dev/null
+++ b/src/test/postmaster/meson.build
@@ -0,0 +1,12 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+tests += {
+  'name': 'postmaster',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'tap': {
+    'tests': [
+      't/001_connection_limits.pl',
+    ],
+  },
+}
diff --git a/src/test/postmaster/t/001_connection_limits.pl b/src/test/postmaster/t/001_connection_limits.pl
new file mode 100644
index 0000000000..f50aae4949
--- /dev/null
+++ b/src/test/postmaster/t/001_connection_limits.pl
@@ -0,0 +1,79 @@
+
+# Copyright (c) 2021-2024, PostgreSQL Global Development Group
+
+# Test connection limits, i.e. max_connections, reserved_connections
+# and superuser_reserved_connections.
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize the server with specific low connection limits
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init;
+$node->append_conf('postgresql.conf', "max_connections = 6");
+$node->append_conf('postgresql.conf', "reserved_connections = 2");
+$node->append_conf('postgresql.conf', "superuser_reserved_connections = 1");
+$node->append_conf('postgresql.conf', "log_connections = on");
+$node->start;
+
+$node->safe_psql(
+	'postgres', qq{
+CREATE USER regress_regular LOGIN;
+CREATE USER regress_reserved LOGIN;
+GRANT pg_use_reserved_connections TO regress_reserved;
+CREATE USER regress_superuser LOGIN SUPERUSER;
+});
+
+# With the limits we set in postgresql.conf, we can establish:
+# - 3 connections for any user with no special privileges
+# - 2 more connections for users belonging to "pg_use_reserved_connections"
+# - 1 more connection for superuser
+
+sub background_psql_as_user
+{
+	my $user = shift;
+
+	return $node->background_psql(
+		'postgres',
+		on_error_die => 1,
+		extra_params => [ '-U', $user ]);
+}
+
+my @sessions = ();
+
+push(@sessions, background_psql_as_user('regress_regular'));
+push(@sessions, background_psql_as_user('regress_regular'));
+push(@sessions, background_psql_as_user('regress_regular'));
+$node->connect_fails(
+	"dbname=postgres user=regress_regular",
+	"reserved_connections limit",
+	expected_stderr =>
+	  qr/FATAL:  remaining connection slots are reserved for roles with privileges of the "pg_use_reserved_connections" role/
+);
+
+push(@sessions, background_psql_as_user('regress_reserved'));
+push(@sessions, background_psql_as_user('regress_reserved'));
+$node->connect_fails(
+	"dbname=postgres user=regress_regular",
+	"reserved_connections limit",
+	expected_stderr =>
+	  qr/FATAL:  remaining connection slots are reserved for roles with the SUPERUSER attribute/
+);
+
+push(@sessions, background_psql_as_user('regress_superuser'));
+$node->connect_fails(
+	"dbname=postgres user=regress_superuser",
+	"superuser_reserved_connections limit",
+	expected_stderr => qr/FATAL:  sorry, too many clients already/);
+
+# TODO: test that query cancellation is still possible
+
+foreach my $session (@sessions)
+{
+	$session->quit;
+}
+
+done_testing();
-- 
2.39.2
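
The limits in 001_connection_limits.pl work out as the test comment says: 3 connections for anyone, 2 more for members of pg_use_reserved_connections, 1 more for superusers. Below is a simplified model of that arithmetic. It assumes, as the expected error messages imply, that the new backend already holds its own slot when the reserved-slot checks run, and that the superuser-reserve check comes before the role check. SlotModel, try_connect() and the ConnOutcome values are hypothetical names; this is a sketch, not the server's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of a (simplified) connection attempt. */
typedef enum
{
	CONN_OK,
	CONN_REJECT_RESERVED_ROLE,	/* "...reserved for roles with privileges of ..." */
	CONN_REJECT_SUPERUSER,		/* "...reserved for roles with the SUPERUSER attribute" */
	CONN_REJECT_TOO_MANY		/* "sorry, too many clients already" */
} ConnOutcome;

typedef struct
{
	int			max_connections;
	int			reserved_connections;
	int			superuser_reserved_connections;
	int			in_use;			/* connection slots currently taken */
} SlotModel;

static bool
have_n_free(const SlotModel *m, int n)
{
	return m->max_connections - m->in_use >= n;
}

static ConnOutcome
try_connect(SlotModel *m, bool is_superuser, bool has_reserved_role)
{
	int			reserve = m->superuser_reserved_connections +
		m->reserved_connections;
	ConnOutcome result = CONN_OK;

	/* No slot at all: nobody gets in, not even a superuser. */
	if (m->in_use >= m->max_connections)
		return CONN_REJECT_TOO_MANY;

	/* The new backend takes its slot before the reserved-slot checks. */
	m->in_use++;

	if (!is_superuser && reserve > 0 && !have_n_free(m, reserve))
	{
		if (!have_n_free(m, m->superuser_reserved_connections))
			result = CONN_REJECT_SUPERUSER;
		else if (!has_reserved_role)
			result = CONN_REJECT_RESERVED_ROLE;
	}

	/* A rejected backend exits and gives its slot back. */
	if (result != CONN_OK)
		m->in_use--;
	return result;
}
```

Feeding the test's sequence of sessions through the model reproduces its three expected failures in the same order.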

v3-0006-Add-test-for-dead-end-backends.patch (text/x-patch; charset=UTF-8)

From 67dca31131f526be4c04feb941b453a320f29830 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 1 Aug 2024 18:36:10 +0300
Subject: [PATCH v3 06/12] Add test for dead-end backends

The code path for launching a dead-end backend because we're out of
slots was not covered by any tests, so add one. (Some tests did hit
the case of launching a dead-end backend because the server is still
starting up, though, so the gap in our test coverage wasn't as big as
it sounds.)
---
 src/test/perl/PostgreSQL/Test/Cluster.pm      | 39 +++++++++++++++++++
 .../postmaster/t/001_connection_limits.pl     | 17 +++++++-
 2 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 32ee98aebc..6d09f9c5f8 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -104,6 +104,7 @@ use File::Path qw(rmtree mkpath);
 use File::Spec;
 use File::stat qw(stat);
 use File::Temp ();
+use IO::Socket::INET;
 use IPC::Run;
 use PostgreSQL::Version;
 use PostgreSQL::Test::RecursiveCopy;
@@ -284,6 +285,44 @@ sub connstr
 	return "port=$pgport host=$pghost dbname='$dbname'";
 }
 
+=pod
+
+=item $node->raw_connect()
+
+Open a raw TCP or Unix domain socket connection to the server. This is
+used by low-level protocol and connection limit tests.
+
+=cut
+
+sub raw_connect
+{
+	my ($self) = @_;
+	my $pgport = $self->port;
+	my $pghost = $self->host;
+
+	my $socket;
+	if ($PostgreSQL::Test::Utils::use_unix_sockets)
+	{
+		require IO::Socket::UNIX;
+		my $path = "$pghost/.s.PGSQL.$pgport";
+
+		$socket = IO::Socket::UNIX->new(
+			Type => SOCK_STREAM(),
+			Peer => $path,
+		) or die "Cannot create socket - $IO::Socket::errstr\n";
+	}
+	else
+	{
+		$socket = IO::Socket::INET->new(
+			PeerHost => $pghost,
+			PeerPort => $pgport,
+			Proto => 'tcp'
+		) or die "Cannot create socket - $IO::Socket::errstr\n";
+	}
+	return $socket;
+}
+
+
 =pod
 
 =item $node->group_access()
diff --git a/src/test/postmaster/t/001_connection_limits.pl b/src/test/postmaster/t/001_connection_limits.pl
index f50aae4949..3547b28bdd 100644
--- a/src/test/postmaster/t/001_connection_limits.pl
+++ b/src/test/postmaster/t/001_connection_limits.pl
@@ -43,6 +43,7 @@ sub background_psql_as_user
 }
 
 my @sessions = ();
+my @raw_connections = ();
 
 push(@sessions, background_psql_as_user('regress_regular'));
 push(@sessions, background_psql_as_user('regress_regular'));
@@ -69,11 +70,25 @@ $node->connect_fails(
 	"superuser_reserved_connections limit",
 	expected_stderr => qr/FATAL:  sorry, too many clients already/);
 
-# TODO: test that query cancellation is still possible
+# We can still open TCP (or Unix domain socket) connections, but
+# beyond a certain number (roughly 2x max_connections), they will be
+# "dead-end backends".
+for (my $i = 0; $i <= 20; $i++)
+{
+	push(@raw_connections, $node->raw_connect());
+}
+
+# TODO: test that query cancellation is still possible. A dead-end
+# backend can process a query cancellation packet.
 
+# Clean up
 foreach my $session (@sessions)
 {
 	$session->quit;
 }
+foreach my $socket (@raw_connections)
+{
+	$socket->close();
+}
 
 done_testing();
-- 
2.39.2
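
For readers new to dead-end backends: as the BackendList comments in postmaster.c put it, a dead-end child is forked only to send a friendly rejection message to the would-be client, and it is never assigned a PMChildSlot. A small sketch of that bookkeeping follows, under simplifying assumptions: the fixed-size slot array, SketchBackend and the sketch_* functions are hypothetical, and the canAcceptConnections() result is passed in rather than computed.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_MAX_CHILD_SLOTS 4

typedef enum
{
	SK_CAC_OK,
	SK_CAC_TOOMANY
} SketchCACState;

typedef enum
{
	SK_B_BACKEND,
	SK_B_DEAD_END_BACKEND
} SketchBackendType;

typedef struct
{
	SketchBackendType bkend_type;
	int			child_slot;		/* 1-based slot number; 0 means "no slot" */
} SketchBackend;

static bool slot_in_use[SK_MAX_CHILD_SLOTS];

/* Returns a 1-based free slot number, or 0 if every slot is taken. */
static int
assign_child_slot(void)
{
	for (int i = 0; i < SK_MAX_CHILD_SLOTS; i++)
	{
		if (!slot_in_use[i])
		{
			slot_in_use[i] = true;
			return i + 1;
		}
	}
	return 0;
}

static void
release_child_slot(int slot)
{
	assert(slot >= 1 && slot <= SK_MAX_CHILD_SLOTS);
	slot_in_use[slot - 1] = false;
}

/* Only an accepted connection consumes a slot; a dead-end child never does. */
static SketchBackend
sketch_backend_startup(SketchCACState cac)
{
	SketchBackend bn;

	if (cac == SK_CAC_OK)
	{
		bn.bkend_type = SK_B_BACKEND;
		bn.child_slot = assign_child_slot();
	}
	else
	{
		bn.bkend_type = SK_B_DEAD_END_BACKEND;
		bn.child_slot = 0;
	}
	return bn;
}

/* On child exit, only a child that held a slot gives one back. */
static void
sketch_cleanup_backend(const SketchBackend *bn)
{
	if (bn->bkend_type != SK_B_DEAD_END_BACKEND)
		release_child_slot(bn->child_slot);
}
```

The asymmetry is the point: cleanup has to know which kind of child exited before touching the slot array, which is why the exit path checks the child's type first.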

v3-0007-Use-an-shmem_exit-callback-to-remove-backend-from.patch (text/x-patch; charset=UTF-8)

From e71056398cd19c2800e174e9a4356eb2dd5456f1 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 1 Aug 2024 15:58:41 +0300
Subject: [PATCH v3 07/12] Use an shmem_exit callback to remove backend from
 PMChildFlags on exit

This seems nicer than having to duplicate the logic between
InitProcess() and ProcKill() for which child processes have a
PMChildFlags slot.

Move the MarkPostmasterChildActive() call earlier in InitProcess(),
out of the section protected by the spinlock.
---
 src/backend/storage/ipc/pmsignal.c | 10 ++++++--
 src/backend/storage/lmgr/proc.c    | 38 ++++++++++--------------------
 src/include/storage/pmsignal.h     |  1 -
 3 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/src/backend/storage/ipc/pmsignal.c b/src/backend/storage/ipc/pmsignal.c
index 27844b46a2..cb99e77476 100644
--- a/src/backend/storage/ipc/pmsignal.c
+++ b/src/backend/storage/ipc/pmsignal.c
@@ -24,6 +24,7 @@
 #include "miscadmin.h"
 #include "postmaster/postmaster.h"
 #include "replication/walsender.h"
+#include "storage/ipc.h"
 #include "storage/pmsignal.h"
 #include "storage/shmem.h"
 #include "utils/memutils.h"
@@ -121,6 +122,8 @@ postmaster_death_handler(SIGNAL_ARGS)
 
 #endif							/* USE_POSTMASTER_DEATH_SIGNAL */
 
+static void MarkPostmasterChildInactive(int code, Datum arg);
+
 /*
  * PMSignalShmemSize
  *		Compute space needed for pmsignal.c's shared memory
@@ -328,6 +331,9 @@ MarkPostmasterChildActive(void)
 	slot--;
 	Assert(PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED);
 	PMSignalState->PMChildFlags[slot] = PM_CHILD_ACTIVE;
+
+	/* Arrange to clean up at exit. */
+	on_shmem_exit(MarkPostmasterChildInactive, 0);
 }
 
 /*
@@ -352,8 +358,8 @@ MarkPostmasterChildWalSender(void)
  * MarkPostmasterChildInactive - mark a postmaster child as done using
  * shared memory.  This is called in the child process.
  */
-void
-MarkPostmasterChildInactive(void)
+static void
+MarkPostmasterChildInactive(int code, Datum arg)
 {
 	int			slot = MyPMChildSlot;
 
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 1b23efb26f..37b1c67600 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -307,6 +307,19 @@ InitProcess(void)
 	if (MyProc != NULL)
 		elog(ERROR, "you already exist");
 
+	/*
+	 * Before we start accessing the shared memory in a serious way, mark
+	 * ourselves as an active postmaster child; this is so that the postmaster
+	 * can detect it if we exit without cleaning up.  (XXX autovac launcher
+	 * currently doesn't participate in this; it probably should.)
+	 *
+	 * Slot sync worker also does not participate in it, see comments atop
+	 * 'struct bkend' in postmaster.c.
+	 */
+	if (IsUnderPostmaster && !AmAutoVacuumLauncherProcess() &&
+		!AmLogicalSlotSyncWorkerProcess())
+		MarkPostmasterChildActive();
+
 	/* Decide which list should supply our PGPROC. */
 	if (AmAutoVacuumLauncherProcess() || AmAutoVacuumWorkerProcess())
 		procgloballist = &ProcGlobal->autovacFreeProcs;
@@ -359,19 +372,6 @@ InitProcess(void)
 	 */
 	Assert(MyProc->procgloballist == procgloballist);
 
-	/*
-	 * Now that we have a PGPROC, mark ourselves as an active postmaster
-	 * child; this is so that the postmaster can detect it if we exit without
-	 * cleaning up.  (XXX autovac launcher currently doesn't participate in
-	 * this; it probably should.)
-	 *
-	 * Slot sync worker also does not participate in it, see comments atop
-	 * 'struct bkend' in postmaster.c.
-	 */
-	if (IsUnderPostmaster && !AmAutoVacuumLauncherProcess() &&
-		!AmLogicalSlotSyncWorkerProcess())
-		MarkPostmasterChildActive();
-
 	/*
 	 * Initialize all fields of MyProc, except for those previously
 	 * initialized by InitProcGlobal.
@@ -941,18 +941,6 @@ ProcKill(int code, Datum arg)
 
 	SpinLockRelease(ProcStructLock);
 
-	/*
-	 * This process is no longer present in shared memory in any meaningful
-	 * way, so tell the postmaster we've cleaned up acceptably well. (XXX
-	 * autovac launcher should be included here someday)
-	 *
-	 * Slot sync worker is also not a postmaster child, so skip this shared
-	 * memory related processing here.
-	 */
-	if (IsUnderPostmaster && !AmAutoVacuumLauncherProcess() &&
-		!AmLogicalSlotSyncWorkerProcess())
-		MarkPostmasterChildInactive();
-
 	/* wake autovac launcher if needed -- see comments in FreeWorkerInfo */
 	if (AutovacuumLauncherPid != 0)
 		kill(AutovacuumLauncherPid, SIGUSR2);
diff --git a/src/include/storage/pmsignal.h b/src/include/storage/pmsignal.h
index 0c9a7e32a8..3b9336b83c 100644
--- a/src/include/storage/pmsignal.h
+++ b/src/include/storage/pmsignal.h
@@ -74,7 +74,6 @@ extern int	AssignPostmasterChildSlot(void);
 extern bool ReleasePostmasterChildSlot(int slot);
 extern bool IsPostmasterChildWalSender(int slot);
 extern void MarkPostmasterChildActive(void);
-extern void MarkPostmasterChildInactive(void);
 extern void MarkPostmasterChildWalSender(void);
 extern bool PostmasterIsAliveInternal(void);
 extern void PostmasterDeathSignalInit(void);
-- 
2.39.2
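
The idea in 0007 is that the conditions deciding which children participate now live only at the registration site: MarkPostmasterChildActive() registers MarkPostmasterChildInactive() with on_shmem_exit(), so a process that never marked itself active has nothing to undo. A minimal sketch of that pattern follows. The callback registry is a hypothetical stand-in for on_shmem_exit() (like on_shmem_exit(), it runs callbacks in reverse order of registration), and the flag is a simplified stand-in for the PMChildFlags slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t SketchDatum;	/* hypothetical stand-in for Datum */
typedef void (*SketchExitCallback) (int code, SketchDatum arg);

#define SK_MAX_EXIT_CALLBACKS 8

static SketchExitCallback exit_callbacks[SK_MAX_EXIT_CALLBACKS];
static SketchDatum exit_callback_args[SK_MAX_EXIT_CALLBACKS];
static int	ncallbacks = 0;

/* Hypothetical analogue of on_shmem_exit(): remember fn for process exit. */
static void
sketch_on_exit(SketchExitCallback fn, SketchDatum arg)
{
	assert(ncallbacks < SK_MAX_EXIT_CALLBACKS);
	exit_callbacks[ncallbacks] = fn;
	exit_callback_args[ncallbacks] = arg;
	ncallbacks++;
}

/* Run the registered callbacks, most recently registered first. */
static void
sketch_run_exit_callbacks(int code)
{
	while (ncallbacks > 0)
	{
		ncallbacks--;
		exit_callbacks[ncallbacks] (code, exit_callback_args[ncallbacks]);
	}
}

/* Simplified model of this child's PMChildFlags entry. */
typedef enum
{
	SK_FLAG_UNUSED,
	SK_FLAG_ASSIGNED,
	SK_FLAG_ACTIVE
} SketchChildFlag;

static SketchChildFlag my_flag = SK_FLAG_ASSIGNED;

static void
mark_child_inactive(int code, SketchDatum arg)
{
	(void) code;
	(void) arg;
	assert(my_flag == SK_FLAG_ACTIVE);
	my_flag = SK_FLAG_ASSIGNED;
}

/* Only children that call this ever get the cleanup callback. */
static void
mark_child_active(void)
{
	assert(my_flag == SK_FLAG_ASSIGNED);
	my_flag = SK_FLAG_ACTIVE;
	sketch_on_exit(mark_child_inactive, 0);
}

/* Postmaster side: a slot still ACTIVE means the child exited uncleanly. */
static bool
sketch_release_slot(void)
{
	bool		clean = (my_flag == SK_FLAG_ASSIGNED);

	my_flag = SK_FLAG_UNUSED;
	return clean;
}
```

Nothing in the exit path has to repeat the "is this process a participant?" test; if the callback was never registered, it simply never runs.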

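Regarding the XXX in the 0008 commit message about accidentally writing "foo |= B_SOMETHING" instead of "foo |= 1 << B_SOMETHING": one possible shape for the "macro magic" is to wrap the bits in a one-member struct, so that OR-ing a bare enum value into a mask fails to compile ("invalid operands") instead of silently producing a wrong mask. This is only a sketch. The enum is a hypothetical subset of BackendType, and BackendTypeMaskSketch and the btmask_*() helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of BackendType. */
typedef enum
{
	SK_B_BACKEND,
	SK_B_DEAD_END_BACKEND,
	SK_B_AUTOVAC_WORKER,
	SK_B_BG_WORKER,
	SK_B_WAL_SENDER,
	SK_BACKEND_NUM_TYPES
} SketchBackendType;

_Static_assert(SK_BACKEND_NUM_TYPES < 32, "too many backend types for uint32");

/* A struct, not a plain integer, so "mask |= SK_B_WAL_SENDER" won't compile. */
typedef struct
{
	uint32_t	mask;
} BackendTypeMaskSketch;

static inline BackendTypeMaskSketch
btmask_of(SketchBackendType t)
{
	BackendTypeMaskSketch m = {UINT32_C(1) << t};

	return m;
}

static inline BackendTypeMaskSketch
btmask_all(void)
{
	/* Only the bits for defined types are set. */
	BackendTypeMaskSketch m = {(UINT32_C(1) << SK_BACKEND_NUM_TYPES) - 1};

	return m;
}

static inline BackendTypeMaskSketch
btmask_add(BackendTypeMaskSketch m, SketchBackendType t)
{
	m.mask |= UINT32_C(1) << t;
	return m;
}

static inline BackendTypeMaskSketch
btmask_del(BackendTypeMaskSketch m, SketchBackendType t)
{
	m.mask &= ~(UINT32_C(1) << t);
	return m;
}

static inline bool
btmask_contains(BackendTypeMaskSketch m, SketchBackendType t)
{
	return (m.mask & (UINT32_C(1) << t)) != 0;
}
```

With helpers like these, "BACKEND_TYPE_ALL & ~(1 << B_WAL_SENDER | 1 << B_DEAD_END_BACKEND)" would read as two btmask_del() calls on btmask_all(). Note that btmask_all() here differs from the patch's 0xffffffff, so a fast path comparing against BACKEND_TYPE_ALL would need adjusting.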
v3-0008-Introduce-a-separate-BackendType-for-dead-end-chi.patch (text/x-patch; charset=UTF-8)

From ae9d3e8646a250bc93fc50bded359c703baa9977 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 1 Aug 2024 17:24:12 +0300
Subject: [PATCH v3 08/12] Introduce a separate BackendType for dead-end
 children

And replace postmaster.c's own "backend type" codes with BackendType

XXX: While working on this, many times I accidentally did something
like "foo |= B_SOMETHING" instead of "foo |= 1 << B_SOMETHING", when
constructing arguments to SignalSomeChildren or CountChildren, and
things broke in very subtle ways taking a long time to debug. The old
constants that were already bitmasks avoided that. Maybe we need some
macro magic or something to make this less error-prone.
---
 src/backend/postmaster/postmaster.c    | 106 ++++++++++++-------------
 src/backend/utils/activity/pgstat_io.c |   3 +
 src/backend/utils/init/miscinit.c      |   3 +
 src/include/miscadmin.h                |   1 +
 4 files changed, 56 insertions(+), 57 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index d9a2783fb6..2e917c5ea9 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -129,15 +129,11 @@
 
 
 /*
- * Possible types of a backend. Beyond being the possible bkend_type values in
- * struct bkend, these are OR-able request flag bits for SignalSomeChildren()
- * and CountChildren().
+ * CountChildren and SignalSomeChildren use a uint32 bitmask argument to
+ * represent BackendTypes to count or signal.
  */
-#define BACKEND_TYPE_NORMAL		0x0001	/* normal backend */
-#define BACKEND_TYPE_AUTOVAC	0x0002	/* autovacuum worker process */
-#define BACKEND_TYPE_WALSND		0x0004	/* walsender process */
-#define BACKEND_TYPE_BGWORKER	0x0008	/* bgworker process */
-#define BACKEND_TYPE_ALL		0x000F	/* OR of all the above */
+#define BACKEND_TYPE_ALL 0xffffffff
+StaticAssertDecl(BACKEND_NUM_TYPES < 32, "too many backend types for uint32");
 
 /*
  * List of active backends (or child processes anyway; we don't actually
@@ -148,7 +144,7 @@
  * As shown in the above set of backend types, this list includes not only
  * "normal" client sessions, but also autovacuum workers, walsenders, and
  * background workers.  (Note that at the time of launch, walsenders are
- * labeled BACKEND_TYPE_NORMAL; we relabel them to BACKEND_TYPE_WALSND
+ * labeled B_BACKEND; we relabel them to B_WAL_SENDER
  * upon noticing they've changed their PMChildFlags entry.  Hence that check
  * must be done before any operation that needs to distinguish walsenders
  * from normal backends.)
@@ -157,7 +153,8 @@
  * the purpose of sending a friendly rejection message to a would-be client.
  * We must track them because they are attached to shared memory, but we know
  * they will never become live backends.  dead_end children are not assigned a
- * PMChildSlot.  dead_end children have bkend_type NORMAL.
+ * PMChildSlot.  dead_end children have bkend_type B_DEAD_END_BACKEND.
+ * FIXME: a dead-end backend can send query cancel?
  *
  * "Special" children such as the startup, bgwriter, autovacuum launcher, and
  * slot sync worker tasks are not in this list.  They are tracked via StartupPID
@@ -169,8 +166,7 @@ typedef struct bkend
 {
 	pid_t		pid;			/* process id of backend */
 	int			child_slot;		/* PMChildSlot for this backend, if any */
-	int			bkend_type;		/* child process flavor, see above */
-	bool		dead_end;		/* is it going to send an error and quit? */
+	BackendType bkend_type;		/* child process flavor, see above */
 	RegisteredBgWorker *rw;		/* bgworker info, if this is a bgworker */
 	bool		bgworker_notify;	/* gets bgworker start/stop notifications */
 	dlist_node	elem;			/* list link in BackendList */
@@ -410,12 +406,13 @@ static void report_fork_failure_to_client(ClientSocket *client_sock, int errnum)
 static CAC_state canAcceptConnections(int backend_type);
 static void signal_child(pid_t pid, int signal);
 static void sigquit_child(pid_t pid);
-static bool SignalSomeChildren(int signal, int target);
+static bool SignalSomeChildren(int signal, uint32 targetMask);
 static void TerminateChildren(int signal);
 
-#define SignalChildren(sig)			   SignalSomeChildren(sig, BACKEND_TYPE_ALL)
+#define SignalChildren(sig)		\
+	SignalSomeChildren(sig, BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND))
 
-static int	CountChildren(int target);
+static int	CountChildren(uint32 targetMask);
 static Backend *assign_backendlist_entry(void);
 static void LaunchMissingBackgroundProcesses(void);
 static void maybe_start_bgworkers(void);
@@ -1765,7 +1762,7 @@ canAcceptConnections(int backend_type)
 	 * bgworker_should_start_now() decided whether the DB state allows them.
 	 */
 	if (pmState != PM_RUN && pmState != PM_HOT_STANDBY &&
-		backend_type != BACKEND_TYPE_BGWORKER)
+		backend_type != B_BG_WORKER)
 	{
 		if (Shutdown > NoShutdown)
 			return CAC_SHUTDOWN;	/* shutdown is pending */
@@ -1782,7 +1779,7 @@ canAcceptConnections(int backend_type)
 	 * "Smart shutdown" restrictions are applied only to normal connections,
 	 * not to autovac workers or bgworkers.
 	 */
-	if (!connsAllowed && backend_type == BACKEND_TYPE_NORMAL)
+	if (!connsAllowed && backend_type == B_BACKEND)
 		return CAC_SHUTDOWN;	/* shutdown is pending */
 
 	/*
@@ -1797,7 +1794,7 @@ canAcceptConnections(int backend_type)
 	 * The limit here must match the sizes of the per-child-process arrays;
 	 * see comments for MaxLivePostmasterChildren().
 	 */
-	if (CountChildren(BACKEND_TYPE_ALL) >= MaxLivePostmasterChildren())
+	if (CountChildren(BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND)) >= MaxLivePostmasterChildren())
 		result = CAC_TOOMANY;
 
 	return result;
@@ -2554,11 +2551,11 @@ CleanupBackend(Backend *bp,
 	bool		crashed = false;
 
 	/* Construct a process name for log message */
-	if (bp->dead_end)
+	if (bp->bkend_type == B_DEAD_END_BACKEND)
 	{
 		procname = _("dead end backend");
 	}
-	else if (bp->bkend_type == BACKEND_TYPE_BGWORKER)
+	else if (bp->bkend_type == B_BG_WORKER)
 	{
 		snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""),
 				 bp->rw->rw_worker.bgw_type);
@@ -2596,7 +2593,7 @@ CleanupBackend(Backend *bp,
 	 * If the process attached to shared memory, check that it detached
 	 * cleanly.
 	 */
-	if (!bp->dead_end)
+	if (bp->bkend_type != B_DEAD_END_BACKEND)
 	{
 		if (!ReleasePostmasterChildSlot(bp->child_slot))
 		{
@@ -2628,7 +2625,7 @@ CleanupBackend(Backend *bp,
 	/*
 	 * If it was a background worker, also update its RegisteredWorker entry.
 	 */
-	if (bp->bkend_type == BACKEND_TYPE_BGWORKER)
+	if (bp->bkend_type == B_BG_WORKER)
 	{
 		RegisteredBgWorker *rw = bp->rw;
 
@@ -2851,7 +2848,7 @@ PostmasterStateMachine(void)
 			 * This state ends when we have no normal client backends running.
 			 * Then we're ready to stop other children.
 			 */
-			if (CountChildren(BACKEND_TYPE_NORMAL) == 0)
+			if (CountChildren(1 << B_BACKEND) == 0)
 				pmState = PM_STOP_BACKENDS;
 		}
 	}
@@ -2872,7 +2869,7 @@ PostmasterStateMachine(void)
 
 		/* Signal all backend children except walsenders */
 		SignalSomeChildren(SIGTERM,
-						   BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND);
+						   BACKEND_TYPE_ALL & ~(1 << B_WAL_SENDER | 1 << B_DEAD_END_BACKEND));
 		/* and the autovac launcher too */
 		if (AutoVacPID != 0)
 			signal_child(AutoVacPID, SIGTERM);
@@ -2914,7 +2911,7 @@ PostmasterStateMachine(void)
 		 * here. Walsenders and archiver are also disregarded, they will be
 		 * terminated later after writing the checkpoint record.
 		 */
-		if (CountChildren(BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND) == 0 &&
+		if (CountChildren(BACKEND_TYPE_ALL & ~(1 << B_WAL_SENDER | 1 << B_DEAD_END_BACKEND)) == 0 &&
 			StartupPID == 0 &&
 			WalReceiverPID == 0 &&
 			WalSummarizerPID == 0 &&
@@ -2988,7 +2985,7 @@ PostmasterStateMachine(void)
 		 * left by now anyway; what we're really waiting for is walsenders and
 		 * archiver.
 		 */
-		if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL) == 0)
+		if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND)) == 0)
 		{
 			pmState = PM_WAIT_DEAD_END;
 		}
@@ -3285,10 +3282,10 @@ sigquit_child(pid_t pid)
 
 /*
  * Send a signal to the targeted children (but NOT special children;
- * dead_end children are never signaled, either).
+ * dead_end children are never signaled, either XXX).
  */
 static bool
-SignalSomeChildren(int signal, int target)
+SignalSomeChildren(int signal, uint32 targetMask)
 {
 	dlist_iter	iter;
 	bool		signaled = false;
@@ -3297,24 +3294,21 @@ SignalSomeChildren(int signal, int target)
 	{
 		Backend    *bp = dlist_container(Backend, elem, iter.cur);
 
-		if (bp->dead_end)
-			continue;
-
 		/*
 		 * Since target == BACKEND_TYPE_ALL is the most common case, we test
 		 * it first and avoid touching shared memory for every child.
 		 */
-		if (target != BACKEND_TYPE_ALL)
+		if (targetMask != BACKEND_TYPE_ALL)
 		{
 			/*
 			 * Assign bkend_type for any recently announced WAL Sender
 			 * processes.
 			 */
-			if (bp->bkend_type == BACKEND_TYPE_NORMAL &&
+			if (bp->bkend_type == B_BACKEND &&
 				IsPostmasterChildWalSender(bp->child_slot))
-				bp->bkend_type = BACKEND_TYPE_WALSND;
+				bp->bkend_type = B_WAL_SENDER;
 
-			if (!(target & bp->bkend_type))
+			if ((targetMask & (1 << bp->bkend_type)) == 0)
 				continue;
 		}
 
@@ -3387,17 +3381,22 @@ BackendStartup(ClientSocket *client_sock)
 	}
 
 	/* Pass down canAcceptConnections state */
-	startup_data.canAcceptConnections = canAcceptConnections(BACKEND_TYPE_NORMAL);
-	bn->dead_end = (startup_data.canAcceptConnections != CAC_OK);
+	startup_data.canAcceptConnections = canAcceptConnections(B_BACKEND);
 	bn->rw = NULL;
 
 	/*
 	 * Unless it's a dead_end child, assign it a child slot number
 	 */
-	if (!bn->dead_end)
+	if (startup_data.canAcceptConnections == CAC_OK)
+	{
+		bn->bkend_type = B_BACKEND;	/* Can change later to WALSND */
 		bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
+	}
 	else
+	{
+		bn->bkend_type = B_DEAD_END_BACKEND;
 		bn->child_slot = 0;
+	}
 
 	/* Hasn't asked to be notified about any bgworkers yet */
 	bn->bgworker_notify = false;
@@ -3410,7 +3409,7 @@ BackendStartup(ClientSocket *client_sock)
 		/* in parent, fork failed */
 		int			save_errno = errno;
 
-		if (!bn->dead_end)
+		if (bn->child_slot != 0)
 			(void) ReleasePostmasterChildSlot(bn->child_slot);
 		pfree(bn);
 		errno = save_errno;
@@ -3430,7 +3429,6 @@ BackendStartup(ClientSocket *client_sock)
 	 * of backends.
 	 */
 	bn->pid = pid;
-	bn->bkend_type = BACKEND_TYPE_NORMAL;	/* Can change later to WALSND */
 	dlist_push_head(&BackendList, &bn->elem);
 
 	return STATUS_OK;
@@ -3664,11 +3662,10 @@ dummy_handler(SIGNAL_ARGS)
 }
 
 /*
- * Count up number of child processes of specified types (dead_end children
- * are always excluded).
+ * Count up number of child processes of specified types.
  */
 static int
-CountChildren(int target)
+CountChildren(uint32 targetMask)
 {
 	dlist_iter	iter;
 	int			cnt = 0;
@@ -3677,24 +3674,21 @@ CountChildren(int target)
 	{
 		Backend    *bp = dlist_container(Backend, elem, iter.cur);
 
-		if (bp->dead_end)
-			continue;
-
 		/*
 		 * Since target == BACKEND_TYPE_ALL is the most common case, we test
 		 * it first and avoid touching shared memory for every child.
 		 */
-		if (target != BACKEND_TYPE_ALL)
+		if (targetMask != BACKEND_TYPE_ALL)
 		{
 			/*
 			 * Assign bkend_type for any recently announced WAL Sender
 			 * processes.
 			 */
-			if (bp->bkend_type == BACKEND_TYPE_NORMAL &&
+			if (bp->bkend_type == B_BACKEND &&
 				IsPostmasterChildWalSender(bp->child_slot))
-				bp->bkend_type = BACKEND_TYPE_WALSND;
+				bp->bkend_type = B_WAL_SENDER;
 
-			if (!(target & bp->bkend_type))
+			if ((targetMask & (1 << bp->bkend_type)) == 0)
 				continue;
 		}
 
@@ -3761,13 +3755,13 @@ StartAutovacuumWorker(void)
 	 * we have to check to avoid race-condition problems during DB state
 	 * changes.
 	 */
-	if (canAcceptConnections(BACKEND_TYPE_AUTOVAC) == CAC_OK)
+	if (canAcceptConnections(B_AUTOVAC_WORKER) == CAC_OK)
 	{
 		bn = (Backend *) palloc_extended(sizeof(Backend), MCXT_ALLOC_NO_OOM);
 		if (bn)
 		{
-			/* Autovac workers are not dead_end and need a child slot */
-			bn->dead_end = false;
+			/* Autovac workers need a child slot */
+			bn->bkend_type = B_AUTOVAC_WORKER;
 			bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
 			bn->bgworker_notify = false;
 			bn->rw = NULL;
@@ -3775,7 +3769,6 @@ StartAutovacuumWorker(void)
 			bn->pid = StartChildProcess(B_AUTOVAC_WORKER);
 			if (bn->pid > 0)
 			{
-				bn->bkend_type = BACKEND_TYPE_AUTOVAC;
 				dlist_push_head(&BackendList, &bn->elem);
 				/* all OK */
 				return;
@@ -3981,7 +3974,7 @@ assign_backendlist_entry(void)
 	 * only possible failure is CAC_TOOMANY, so we just log an error message
 	 * based on that rather than checking the error code precisely.
 	 */
-	if (canAcceptConnections(BACKEND_TYPE_BGWORKER) != CAC_OK)
+	if (canAcceptConnections(B_BG_WORKER) != CAC_OK)
 	{
 		ereport(LOG,
 				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
@@ -3999,8 +3992,7 @@ assign_backendlist_entry(void)
 	}
 
 	bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
-	bn->bkend_type = BACKEND_TYPE_BGWORKER;
-	bn->dead_end = false;
+	bn->bkend_type = B_BG_WORKER;
 	bn->bgworker_notify = false;
 
 	return bn;
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index 8af55989ee..9bad1040d6 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -312,6 +312,8 @@ pgstat_io_snapshot_cb(void)
 *
 * The following BackendTypes do not participate in the cumulative stats
 * subsystem or do not perform IO on which we currently track:
+* - Dead-end backend because it is not connected to shared memory and
+*   doesn't do any IO
 * - Syslogger because it is not connected to shared memory
 * - Archiver because most relevant archiving IO is delegated to a
 *   specialized command or module
@@ -334,6 +336,7 @@ pgstat_tracks_io_bktype(BackendType bktype)
 	switch (bktype)
 	{
 		case B_INVALID:
+		case B_DEAD_END_BACKEND:
 		case B_ARCHIVER:
 		case B_LOGGER:
 		case B_WAL_RECEIVER:
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 537d92c0cf..ae8b1a4331 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -281,6 +281,9 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_BACKEND:
 			backendDesc = "client backend";
 			break;
+		case B_DEAD_END_BACKEND:
+			backendDesc = "dead-end client backend";
+			break;
 		case B_BG_WORKER:
 			backendDesc = "background worker";
 			break;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index ac16233b71..b21c4d43b9 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -337,6 +337,7 @@ typedef enum BackendType
 
 	/* Backends and other backend-like processes */
 	B_BACKEND,
+	B_DEAD_END_BACKEND,
 	B_AUTOVAC_LAUNCHER,
 	B_AUTOVAC_WORKER,
 	B_BG_WORKER,
-- 
2.39.2
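
The patch above replaces the old `BACKEND_TYPE_*` flag constants with masks built from `BackendType` values. `SignalSomeChildren()` and `CountChildren()` now take a `uint32 targetMask` and test `targetMask & (1 << bp->bkend_type)`. Below is a minimal, self-contained sketch of that technique. The enum is a hypothetical, trimmed-down stand-in for the real `BackendType` in miscadmin.h. The helpers `btype_mask()` and `mask_includes()` are illustrative names, not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, trimmed-down stand-in for BackendType (miscadmin.h);
 * the real enum has more members.
 */
typedef enum
{
	B_INVALID = 0,
	B_BACKEND,
	B_DEAD_END_BACKEND,
	B_AUTOVAC_WORKER,
	B_BG_WORKER,
	B_WAL_SENDER,
	BACKEND_NUM_TYPES
} BackendType;

/* Every bit set: matches any backend type */
#define BACKEND_TYPE_ALL UINT32_C(0xffffffff)

/* One bit per type, so all types must fit in a uint32 */
static_assert(BACKEND_NUM_TYPES < 32, "too many backend types for uint32");

/* Mask selecting a single type (hypothetical helper; the patch inlines it) */
static inline uint32_t
btype_mask(BackendType t)
{
	return UINT32_C(1) << t;
}

/* The membership test the patched loops use to skip non-matching children */
static inline bool
mask_includes(uint32_t targetMask, BackendType t)
{
	return (targetMask & btype_mask(t)) != 0;
}
```

To exclude types, you need the complement. `BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND)` selects every type except dead-end backends, whereas `BACKEND_TYPE_ALL & (1 << B_DEAD_END_BACKEND)` selects only dead-end backends. The sketch shifts an unsigned constant (`UINT32_C(1)`) rather than a plain `1`, so a shift into bit 31 would stay well defined.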

v3-0009-Kill-dead-end-children-when-there-s-nothing-else-.patch (text/x-patch; charset=UTF-8)

From a425f6f89afcf2bc9cf95b291bb17084bf79d21e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 1 Aug 2024 18:40:20 +0300
Subject: [PATCH v3 09/12] Kill dead-end children when there's nothing else
 left

Previously, the postmaster would never try to kill dead-end child
processes, even if there were no other processes left. A dead-end
backend will eventually exit when authentication_timeout expires, but
if a dead-end backend is the only thing preventing the server from
shutting down, it seems better to kill it immediately. This is
particularly important if a bug in the early startup code prevents a
dead-end child from timing out and exiting normally.

Includes a test for the case where a dead-end backend previously kept
the server from shutting down.
---
 src/backend/postmaster/postmaster.c     | 35 +++++++-------
 src/test/postmaster/meson.build         |  1 +
 src/test/postmaster/t/002_start_stop.pl | 64 +++++++++++++++++++++++++
 3 files changed, 81 insertions(+), 19 deletions(-)
 create mode 100644 src/test/postmaster/t/002_start_stop.pl

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 2e917c5ea9..bfadb995cb 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -409,9 +409,6 @@ static void sigquit_child(pid_t pid);
 static bool SignalSomeChildren(int signal, uint32 targetMask);
 static void TerminateChildren(int signal);
 
-#define SignalChildren(sig)		\
-	SignalSomeChildren(sig, BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND))
-
 static int	CountChildren(uint32 targetMask);
 static Backend *assign_backendlist_entry(void);
 static void LaunchMissingBackgroundProcesses(void);
@@ -1963,7 +1960,7 @@ process_pm_reload_request(void)
 		ereport(LOG,
 				(errmsg("received SIGHUP, reloading configuration files")));
 		ProcessConfigFile(PGC_SIGHUP);
-		SignalChildren(SIGHUP);
+		SignalSomeChildren(SIGHUP, BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND));
 		if (StartupPID != 0)
 			signal_child(StartupPID, SIGHUP);
 		if (BgWriterPID != 0)
@@ -2382,7 +2379,7 @@ process_pm_child_exit(void)
 				 * Waken walsenders for the last time. No regular backends
 				 * should be around anymore.
 				 */
-				SignalChildren(SIGUSR2);
+				SignalSomeChildren(SIGUSR2, BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND));
 
 				pmState = PM_SHUTDOWN_2;
 			}
@@ -2867,7 +2864,7 @@ PostmasterStateMachine(void)
 		 */
 		ForgetUnstartedBackgroundWorkers();
 
-		/* Signal all backend children except walsenders */
+		/* Signal all backend children except walsenders and dead-end backends */
 		SignalSomeChildren(SIGTERM,
 						   BACKEND_TYPE_ALL & ~(1 << B_WAL_SENDER | 1 << B_DEAD_END_BACKEND));
 		/* and the autovac launcher too */
@@ -2925,10 +2922,11 @@ PostmasterStateMachine(void)
 			if (Shutdown >= ImmediateShutdown || FatalError)
 			{
 				/*
-				 * Start waiting for dead_end children to die.  This state
-				 * change causes ServerLoop to stop creating new ones.
+				 * Stop any dead_end children and stop creating new ones.
 				 */
 				pmState = PM_WAIT_DEAD_END;
+				ConfigurePostmasterWaitSet(false);
+				SignalSomeChildren(SIGQUIT, 1 << B_DEAD_END_BACKEND);
 
 				/*
 				 * We already SIGQUIT'd the archiver and stats processes, if
@@ -2967,9 +2965,10 @@ PostmasterStateMachine(void)
 					 */
 					FatalError = true;
 					pmState = PM_WAIT_DEAD_END;
+					ConfigurePostmasterWaitSet(false);
 
-					/* Kill the walsenders and archiver too */
-					SignalChildren(SIGQUIT);
+					/* Kill the walsenders and archiver, too */
+					SignalSomeChildren(SIGQUIT, BACKEND_TYPE_ALL);
 					if (PgArchPID != 0)
 						signal_child(PgArchPID, SIGQUIT);
 				}
@@ -2987,15 +2986,14 @@ PostmasterStateMachine(void)
 		 */
 		if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND)) == 0)
 		{
+			ConfigurePostmasterWaitSet(false);
+			SignalSomeChildren(SIGTERM, 1 << B_DEAD_END_BACKEND);
 			pmState = PM_WAIT_DEAD_END;
 		}
 	}
 
 	if (pmState == PM_WAIT_DEAD_END)
 	{
-		/* Don't allow any new socket connection events. */
-		ConfigurePostmasterWaitSet(false);
-
 		/*
 		 * PM_WAIT_DEAD_END state ends when the BackendList is entirely empty
 		 * (ie, no dead_end children remain), and the archiver is gone too.
@@ -3281,8 +3279,7 @@ sigquit_child(pid_t pid)
 }
 
 /*
- * Send a signal to the targeted children (but NOT special children;
- * dead_end children are never signaled, either XXX).
+ * Send a signal to the targeted children (but NOT special children).
  */
 static bool
 SignalSomeChildren(int signal, uint32 targetMask)
@@ -3313,8 +3310,8 @@ SignalSomeChildren(int signal, uint32 targetMask)
 		}
 
 		ereport(DEBUG4,
-				(errmsg_internal("sending signal %d to process %d",
-								 signal, (int) bp->pid)));
+				(errmsg_internal("sending signal %d to %s process %d",
+								 signal, GetBackendTypeDesc(bp->bkend_type), (int) bp->pid)));
 		signal_child(bp->pid, signal);
 		signaled = true;
 	}
@@ -3323,12 +3320,12 @@ SignalSomeChildren(int signal, uint32 targetMask)
 
 /*
  * Send a termination signal to children.  This considers all of our children
- * processes, except syslogger and dead_end backends.
+ * processes, except syslogger.
  */
 static void
 TerminateChildren(int signal)
 {
-	SignalChildren(signal);
+	SignalSomeChildren(signal, BACKEND_TYPE_ALL);
 	if (StartupPID != 0)
 	{
 		signal_child(StartupPID, signal);
diff --git a/src/test/postmaster/meson.build b/src/test/postmaster/meson.build
index c2de2e0eb5..2d89adf520 100644
--- a/src/test/postmaster/meson.build
+++ b/src/test/postmaster/meson.build
@@ -7,6 +7,7 @@ tests += {
   'tap': {
     'tests': [
       't/001_connection_limits.pl',
+      't/002_start_stop.pl',
     ],
   },
 }
diff --git a/src/test/postmaster/t/002_start_stop.pl b/src/test/postmaster/t/002_start_stop.pl
new file mode 100644
index 0000000000..6f114659fa
--- /dev/null
+++ b/src/test/postmaster/t/002_start_stop.pl
@@ -0,0 +1,64 @@
+
+# Copyright (c) 2021-2024, PostgreSQL Global Development Group
+
+# Test that dead-end backends don't prevent the server from shutting
+# down promptly.
+
+use IO::Socket::INET;
+use IO::Socket::UNIX;
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+use Time::HiRes qw(time);
+
+# Initialize the server with low connection limits, to test dead-end backends
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init;
+$node->append_conf('postgresql.conf', "max_connections = 5");
+$node->append_conf('postgresql.conf', "log_connections = on");
+$node->append_conf('postgresql.conf', "log_min_messages = debug2");
+
+# Long enough that dead-end backends won't exit on their own during the test
+$node->append_conf('postgresql.conf', "authentication_timeout = '120 s'");
+
+$node->start;
+
+my @sessions = ();
+my @raw_connections = ();
+
+#for (my $i=0; $i <= 5; $i++) {
+#	push(@sessions, $node->background_psql('postgres', on_error_die => 1));
+#}
+#$node->connect_fails("dbname=postgres", "max_connections reached",
+#					 expected_stderr => qr/FATAL:  sorry, too many clients already/);
+
+# We can still open TCP (or Unix domain socket) connections, but beyond a
+# certain number (roughly 2x max_connections), they will be "dead-end backends"
+for (my $i = 0; $i <= 20; $i++)
+{
+	push(@raw_connections, $node->raw_connect());
+}
+
+# Test that the dead-end backends don't prevent the server from stopping.
+my $before = time();
+$node->stop();
+my $elapsed = time() - $before;
+ok($elapsed < 60);
+
+$node->start();
+
+$node->connect_ok("dbname=postgres", "works after restart");
+
+# Clean up
+foreach my $session (@sessions)
+{
+	$session->quit;
+}
+foreach my $socket (@raw_connections)
+{
+	$socket->close();
+}
+
+done_testing();
-- 
2.39.2
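
The next attachment gives each kind of child process its own pool of slots, so that pre-authorization backends can no longer use up the slots that autovacuum and background workers need. Below is a minimal sketch of that allocation scheme, under stated assumptions. It uses a plain singly linked free-list instead of PostgreSQL's `dlist` API. `PoolKind`, `Slot`, `init_pools()`, `assign_slot()` and `release_slot()` are hypothetical names, not the functions in pmchild.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool kinds; pmchild.c picks the pool from a BackendType */
typedef enum
{
	POOL_BACKEND,				/* client backends (and walsenders) */
	POOL_AUTOVAC,				/* autovacuum workers */
	POOL_BGWORKER,				/* background workers */
	POOL_AUX,					/* auxiliary processes */
	NUM_POOLS
} PoolKind;

typedef struct Slot
{
	int			child_slot;		/* 1-based number, unique across all pools */
	PoolKind	pool;			/* pool this slot belongs to */
	int			in_use;			/* nonzero while assigned to a child */
	struct Slot *next_free;		/* link in the pool's free-list */
} Slot;

#define MAX_SLOTS 64			/* arbitrary capacity for the sketch */

static Slot slots[MAX_SLOTS];
static Slot *free_list[NUM_POOLS];

/*
 * Carve one fixed array into per-pool free-lists, sizes[k] slots for pool k.
 * Returns the total number of slots, or -1 if the sizes exceed MAX_SLOTS.
 */
static int
init_pools(const int sizes[NUM_POOLS])
{
	int			total = 0;
	int			slotno = 0;

	for (int k = 0; k < NUM_POOLS; k++)
		total += sizes[k];
	if (total > MAX_SLOTS)
		return -1;

	for (int k = 0; k < NUM_POOLS; k++)
	{
		free_list[k] = NULL;
		for (int i = 0; i < sizes[k]; i++)
		{
			Slot	   *s = &slots[slotno];

			s->child_slot = slotno + 1;	/* fixed once, reused forever */
			s->pool = (PoolKind) k;
			s->in_use = 0;
			s->next_free = free_list[k];	/* push onto pool k */
			free_list[k] = s;
			slotno++;
		}
	}
	return slotno;
}

/* Take a slot from pool k; NULL when that pool is exhausted */
static Slot *
assign_slot(PoolKind k)
{
	Slot	   *s = free_list[k];

	if (s == NULL)
		return NULL;
	free_list[k] = s->next_free;
	s->next_free = NULL;
	s->in_use = 1;
	return s;
}

/* Give a slot back to the pool it came from */
static void
release_slot(Slot *s)
{
	assert(s->in_use);
	s->in_use = 0;
	s->next_free = free_list[s->pool];
	free_list[s->pool] = s;
}
```

Exhausting the backend pool leaves the autovacuum pool untouched, which is the property the commit message below relies on. Slot numbers are fixed when the array is carved up, so a slot keeps the same `child_slot` number every time it is reused.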

v3-0010-Assign-a-child-slot-to-every-postmaster-child-pro.patch (text/x-patch; charset=UTF-8)

From 7cf06ed8729c6be02a9fbc77f595491683dd0651 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 1 Aug 2024 17:54:23 +0300
Subject: [PATCH v3 10/12] Assign a child slot to every postmaster child
 process

Previously, only backends, autovacuum workers, and background workers
had an entry in the PMChildFlags array. With this commit, all
postmaster child processes, including all the aux processes, have an
entry.

We now maintain separate free-lists for different kinds of
backends. That ensures that there are always slots available for
autovacuum and background workers. Previously, pre-authorization
backends could prevent autovacuum or background workers from starting
up by using up all the slots.

The code to manage the slots in the postmaster process is in a new
pmchild.c source file, because postmaster.c is already so large.

Assigning pmsignal slot numbers is now pmchild.c's responsibility.
This replaces the PMChildInUse array in pmsignal.c.
---
 src/backend/postmaster/Makefile         |   1 +
 src/backend/postmaster/launch_backend.c |   1 +
 src/backend/postmaster/meson.build      |   1 +
 src/backend/postmaster/pmchild.c        | 287 ++++++++++
 src/backend/postmaster/postmaster.c     | 711 ++++++++++--------------
 src/backend/storage/ipc/pmsignal.c      |  83 +--
 src/backend/storage/lmgr/proc.c         |  12 +-
 src/include/postmaster/postmaster.h     |  40 ++
 src/include/storage/pmsignal.h          |   2 +-
 src/tools/pgindent/typedefs.list        |   2 +-
 10 files changed, 653 insertions(+), 487 deletions(-)
 create mode 100644 src/backend/postmaster/pmchild.c

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index db08543d19..c977d91785 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	launch_backend.o \
 	pgarch.o \
 	postmaster.o \
+	pmchild.o \
 	startup.o \
 	syslogger.o \
 	walsummarizer.o \
diff --git a/src/backend/postmaster/launch_backend.c b/src/backend/postmaster/launch_backend.c
index 0ae23fdf55..b0b91dc97f 100644
--- a/src/backend/postmaster/launch_backend.c
+++ b/src/backend/postmaster/launch_backend.c
@@ -182,6 +182,7 @@ static child_process_kind child_process_kinds[] = {
 	[B_INVALID] = {"invalid", NULL, false},
 
 	[B_BACKEND] = {"backend", BackendMain, true},
+	[B_DEAD_END_BACKEND] = {"dead-end backend", BackendMain, true},
 	[B_AUTOVAC_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
 	[B_AUTOVAC_WORKER] = {"autovacuum worker", AutoVacWorkerMain, true},
 	[B_BG_WORKER] = {"bgworker", BackgroundWorkerMain, true},
diff --git a/src/backend/postmaster/meson.build b/src/backend/postmaster/meson.build
index 0ea4bbe084..388848bb52 100644
--- a/src/backend/postmaster/meson.build
+++ b/src/backend/postmaster/meson.build
@@ -11,6 +11,7 @@ backend_sources += files(
   'launch_backend.c',
   'pgarch.c',
   'postmaster.c',
+  'pmchild.c',
   'startup.c',
   'syslogger.c',
   'walsummarizer.c',
diff --git a/src/backend/postmaster/pmchild.c b/src/backend/postmaster/pmchild.c
new file mode 100644
index 0000000000..e86982d6d1
--- /dev/null
+++ b/src/backend/postmaster/pmchild.c
@@ -0,0 +1,287 @@
+/*-------------------------------------------------------------------------
+ *
+ * pmchild.c
+ *	  Functions for keeping track of postmaster child processes.
+ *
+ * Keep track of all child processes, so that when a process exits, we know
+ * what kind of process it was and can clean up accordingly.  Every child process
+ * is allocated a PMChild struct, from a fixed pool of structs.  The size of
+ * the pool is determined by various settings that configure how many worker
+ * processes and backend connections are allowed, i.e. autovacuum_max_workers,
+ * max_worker_processes, max_wal_senders, and max_connections.
+ *
+ * The structures and functions in this file are private to the postmaster
+ * process.  But note that there is an array in shared memory, managed by
+ * pmsignal.c, that mirrors this.
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/pmchild.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "postmaster/autovacuum.h"
+#include "postmaster/postmaster.h"
+#include "replication/walsender.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+
+/*
+ * Freelists for different kinds of child processes.  We maintain separate
+ * pools for them, so that launching a lot of backends cannot exhaust all the
+ * slots and thereby prevent autovacuum or an aux process from launching.
+ */
+static dlist_head freeBackendList;
+static dlist_head freeAutoVacWorkerList;
+static dlist_head freeBgWorkerList;
+static dlist_head freeAuxList;
+
+/*
+ * List of active child processes.  This includes dead-end children.
+ */
+dlist_head	ActiveChildList;
+
+/*
+ * MaxLivePostmasterChildren
+ *
+ * This reports the number of postmaster child processes that can be active.
+ * It includes all children except for dead_end children.  This allows the
+ * array in shared memory (PMChildFlags) to have a fixed maximum size.
+ */
+int
+MaxLivePostmasterChildren(void)
+{
+	int			n = 0;
+
+	/* We know exactly how many worker and aux processes can be active */
+	n += autovacuum_max_workers;
+	n += max_worker_processes;
+	n += NUM_AUXILIARY_PROCS;
+
+	/*
+	 * We allow more connections here than we can have backends because some
+	 * might still be authenticating; they might fail auth, or some existing
+	 * backend might exit before the auth cycle is completed.  The exact
+	 * MaxBackends limit is enforced when a new backend tries to join the
+	 * shared-inval backend array.
+	 */
+	n += 2 * (MaxConnections + max_wal_senders);
+
+	return n;
+}
+
+static void
+init_slot(PMChild *pmchild, int slotno, dlist_head *freelist)
+{
+	pmchild->pid = 0;
+	pmchild->child_slot = slotno + 1;
+	pmchild->bkend_type = B_INVALID;
+	pmchild->rw = NULL;
+	pmchild->bgworker_notify = false;
+	dlist_push_tail(freelist, &pmchild->elem);
+}
+
+/*
+ * Initialize at postmaster startup
+ */
+void
+InitPostmasterChildSlots(void)
+{
+	int			num_pmchild_slots;
+	int			slotno;
+	PMChild    *slots;
+
+	dlist_init(&freeBackendList);
+	dlist_init(&freeAutoVacWorkerList);
+	dlist_init(&freeBgWorkerList);
+	dlist_init(&freeAuxList);
+	dlist_init(&ActiveChildList);
+
+	num_pmchild_slots = MaxLivePostmasterChildren();
+
+	slots = palloc(num_pmchild_slots * sizeof(PMChild));
+
+	slotno = 0;
+	for (int i = 0; i < 2 * (MaxConnections + max_wal_senders); i++)
+	{
+		init_slot(&slots[slotno], slotno, &freeBackendList);
+		slotno++;
+	}
+	for (int i = 0; i < autovacuum_max_workers; i++)
+	{
+		init_slot(&slots[slotno], slotno, &freeAutoVacWorkerList);
+		slotno++;
+	}
+	for (int i = 0; i < max_worker_processes; i++)
+	{
+		init_slot(&slots[slotno], slotno, &freeBgWorkerList);
+		slotno++;
+	}
+	for (int i = 0; i < NUM_AUXILIARY_PROCS; i++)
+	{
+		init_slot(&slots[slotno], slotno, &freeAuxList);
+		slotno++;
+	}
+	Assert(slotno == num_pmchild_slots);
+}
+
+/* Return the appropriate free-list for the given backend type */
+static dlist_head *
+GetFreeList(BackendType btype)
+{
+	switch (btype)
+	{
+		case B_BACKEND:
+		case B_WAL_SENDER:
+		case B_SLOTSYNC_WORKER:
+			return &freeBackendList;
+		case B_BG_WORKER:
+			return &freeBgWorkerList;
+		case B_AUTOVAC_WORKER:
+			return &freeAutoVacWorkerList;
+			/*
+			 * Auxiliary processes.  There can be only one of each of these
+			 * running at a time.
+			 */
+		case B_AUTOVAC_LAUNCHER:
+		case B_ARCHIVER:
+		case B_BG_WRITER:
+		case B_CHECKPOINTER:
+		case B_STARTUP:
+		case B_WAL_RECEIVER:
+		case B_WAL_SUMMARIZER:
+		case B_WAL_WRITER:
+			return &freeAuxList;
+
+			/*
+			 * Logger is not connected to shared memory, and does not have a
+			 * PGPROC entry, but we still allocate a child slot for it.
+			 */
+		case B_LOGGER:
+			return &freeAuxList;
+
+		case B_STANDALONE_BACKEND:
+		case B_INVALID:
+		case B_DEAD_END_BACKEND:
+			break;
+	}
+	elog(ERROR, "unexpected BackendType: %d", (int) btype);
+	return NULL;
+}
+
+/*
+ * Allocate a PMChild entry for a backend of given type.
+ *
+ * The entry is taken from the right pool.
+ *
+ * pmchild->child_slot is unique among all active child processes
+ */
+PMChild *
+AssignPostmasterChildSlot(BackendType btype)
+{
+	dlist_head *freelist;
+	PMChild    *pmchild;
+
+	freelist = GetFreeList(btype);
+
+	if (dlist_is_empty(freelist))
+		return NULL;
+
+	pmchild = dlist_container(PMChild, elem, dlist_pop_head_node(freelist));
+	pmchild->pid = 0;
+	pmchild->bkend_type = btype;
+	pmchild->rw = NULL;
+	pmchild->bgworker_notify = true;
+
+	/*
+	 * pmchild->child_slot for each entry was initialized when the array of
+	 * slots was allocated.
+	 */
+
+	dlist_push_head(&ActiveChildList, &pmchild->elem);
+
+	ReservePostmasterChildSlot(pmchild->child_slot);
+
+	/* FIXME: find a more elegant way to pass this */
+	MyPMChildSlot = pmchild->child_slot;
+
+	elog(DEBUG2, "assigned pm child slot %d for %s", pmchild->child_slot, PostmasterChildName(btype));
+
+	return pmchild;
+}
+
+/*
+ * Release a PMChild slot, after the child process has exited.
+ *
+ * Returns true if the child detached cleanly from shared memory, false
+ * otherwise (see ReleasePostmasterChildSlot).
+ */
+bool
+FreePostmasterChildSlot(PMChild *pmchild)
+{
+	elog(LOG, "releasing pm child slot %d", pmchild->child_slot);
+
+	dlist_delete(&pmchild->elem);
+	if (pmchild->bkend_type == B_DEAD_END_BACKEND)
+	{
+		pfree(pmchild);
+		return true;
+	}
+	else
+	{
+		dlist_head *freelist;
+
+		freelist = GetFreeList(pmchild->bkend_type);
+		dlist_push_head(freelist, &pmchild->elem);
+		return ReleasePostmasterChildSlot(pmchild->child_slot);
+	}
+}
+
+PMChild *
+FindPostmasterChildByPid(int pid)
+{
+	dlist_iter	iter;
+
+	dlist_foreach(iter, &ActiveChildList)
+	{
+		PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
+
+		if (bp->pid == pid)
+			return bp;
+	}
+	return NULL;
+}
+
+/*
+ * Allocate a PMChild struct for a dead-end backend.  Dead-end children are
+ * not assigned a child_slot number.  The struct is palloc'd; returns NULL if
+ * out of memory.
+ */
+PMChild *
+AllocDeadEndChild(void)
+{
+	PMChild    *pmchild;
+
+	elog(LOG, "allocating dead-end child");
+
+	pmchild = (PMChild *) palloc_extended(sizeof(PMChild), MCXT_ALLOC_NO_OOM);
+	if (pmchild)
+	{
+		pmchild->pid = 0;
+		pmchild->child_slot = 0;
+		pmchild->bkend_type = B_DEAD_END_BACKEND;
+		pmchild->rw = NULL;
+		pmchild->bgworker_notify = false;
+
+		dlist_push_head(&ActiveChildList, &pmchild->elem);
+	}
+
+	return pmchild;
+}
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index bfadb995cb..a928a04c7a 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -135,49 +135,8 @@
 #define BACKEND_TYPE_ALL 0xffffffff
 StaticAssertDecl(BACKEND_NUM_TYPES < 32, "too many backend types for uint32");
 
-/*
- * List of active backends (or child processes anyway; we don't actually
- * know whether a given child has become a backend or is still in the
- * authorization phase).  This is used mainly to keep track of how many
- * children we have and send them appropriate signals when necessary.
- *
- * As shown in the above set of backend types, this list includes not only
- * "normal" client sessions, but also autovacuum workers, walsenders, and
- * background workers.  (Note that at the time of launch, walsenders are
- * labeled B_BACKEND; we relabel them to B_WAL_SENDER
- * upon noticing they've changed their PMChildFlags entry.  Hence that check
- * must be done before any operation that needs to distinguish walsenders
- * from normal backends.)
- *
- * Also, "dead_end" children are in it: these are children launched just for
- * the purpose of sending a friendly rejection message to a would-be client.
- * We must track them because they are attached to shared memory, but we know
- * they will never become live backends.  dead_end children are not assigned a
- * PMChildSlot.  dead_end children have bkend_type B_DEAD_END_BACKEND.
- * FIXME: a dead-end backend can send query cancel?
- *
- * "Special" children such as the startup, bgwriter, autovacuum launcher, and
- * slot sync worker tasks are not in this list.  They are tracked via StartupPID
- * and other pid_t variables below.  (Thus, there can't be more than one of any
- * given "special" child process type.  We use BackendList entries for any
- * child process there can be more than one of.)
- */
-typedef struct bkend
-{
-	pid_t		pid;			/* process id of backend */
-	int			child_slot;		/* PMChildSlot for this backend, if any */
-	BackendType bkend_type;		/* child process flavor, see above */
-	RegisteredBgWorker *rw;		/* bgworker info, if this is a bgworker */
-	bool		bgworker_notify;	/* gets bgworker start/stop notifications */
-	dlist_node	elem;			/* list link in BackendList */
-} Backend;
-
-static dlist_head BackendList = DLIST_STATIC_INIT(BackendList);
-
 BackgroundWorker *MyBgworkerEntry = NULL;
 
-
-
 /* The socket number we are listening for connections on */
 int			PostPortNumber = DEF_PGPORT;
 
@@ -229,17 +188,17 @@ bool		remove_temp_files_after_crash = true;
 bool		send_abort_for_crash = false;
 bool		send_abort_for_kill = false;
 
-/* PIDs of special child processes; 0 when not running */
-static pid_t StartupPID = 0,
-			BgWriterPID = 0,
-			CheckpointerPID = 0,
-			WalWriterPID = 0,
-			WalReceiverPID = 0,
-			WalSummarizerPID = 0,
-			AutoVacPID = 0,
-			PgArchPID = 0,
-			SysLoggerPID = 0,
-			SlotSyncWorkerPID = 0;
+/* special child processes; NULL when not running */
+static PMChild *StartupPMChild = NULL,
+		   *BgWriterPMChild = NULL,
+		   *CheckpointerPMChild = NULL,
+		   *WalWriterPMChild = NULL,
+		   *WalReceiverPMChild = NULL,
+		   *WalSummarizerPMChild = NULL,
+		   *AutoVacLauncherPMChild = NULL,
+		   *PgArchPMChild = NULL,
+		   *SysLoggerPMChild = NULL,
+		   *SlotSyncWorkerPMChild = NULL;
 
 /* Startup process's status */
 typedef enum
@@ -287,7 +246,7 @@ static bool FatalError = false; /* T if recovering from backend crash */
  * PM_HOT_STANDBY state.  (connsAllowed can also restrict launching.)
  * In other states we handle connection requests by launching "dead_end"
  * child processes, which will simply send the client an error message and
- * quit.  (We track these in the BackendList so that we can know when they
+ * quit.  (We track these in the ActiveChildList so that we can know when they
  * are all gone; this is important because they're still connected to shared
  * memory, and would interfere with an attempt to destroy the shmem segment,
  * possibly leading to SHMALL failure when we try to make a new one.)
@@ -393,7 +352,7 @@ static void process_pm_child_exit(void);
 static void process_pm_reload_request(void);
 static void process_pm_shutdown_request(void);
 static void dummy_handler(SIGNAL_ARGS);
-static void CleanupBackend(Backend *bp, int exitstatus);
+static void CleanupBackend(PMChild *bp, int exitstatus);
 static void HandleChildCrash(int pid, int exitstatus, const char *procname);
 static void LogChildExit(int lev, const char *procname,
 						 int pid, int exitstatus);
@@ -403,18 +362,18 @@ static void ExitPostmaster(int status) pg_attribute_noreturn();
 static int	ServerLoop(void);
 static int	BackendStartup(ClientSocket *client_sock);
 static void report_fork_failure_to_client(ClientSocket *client_sock, int errnum);
-static CAC_state canAcceptConnections(int backend_type);
-static void signal_child(pid_t pid, int signal);
-static void sigquit_child(pid_t pid);
+static CAC_state canAcceptConnections(BackendType backend_type);
+static void signal_child(PMChild *pmchild, int signal);
+static void sigquit_child(PMChild *pmchild);
 static bool SignalSomeChildren(int signal, uint32 targetMask);
 static void TerminateChildren(int signal);
 
 static int	CountChildren(uint32 targetMask);
-static Backend *assign_backendlist_entry(void);
+static PMChild *assign_backendlist_entry(void);
 static void LaunchMissingBackgroundProcesses(void);
 static void maybe_start_bgworkers(void);
 static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
-static pid_t StartChildProcess(BackendType type);
+static PMChild *StartChildProcess(BackendType type);
 static void StartAutovacuumWorker(void);
 static void InitPostmasterDeathWatchHandle(void);
 
@@ -893,9 +852,11 @@ PostmasterMain(int argc, char *argv[])
 
 	/*
 	 * Now that loadable modules have had their chance to alter any GUCs,
-	 * calculate MaxBackends.
+	 * calculate MaxBackends, and initialize the machinery to track child
+	 * processes.
 	 */
 	InitializeMaxBackends();
+	InitPostmasterChildSlots();
 
 	/*
 	 * Give preloaded libraries a chance to request additional shared memory.
@@ -1019,7 +980,15 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * If enabled, start up syslogger collection subprocess
 	 */
-	SysLoggerPID = SysLogger_Start();
+	SysLoggerPMChild = AssignPostmasterChildSlot(B_LOGGER);
+	if (!SysLoggerPMChild)
+		elog(ERROR, "no postmaster child slot available for syslogger");
+	SysLoggerPMChild->pid = SysLogger_Start();
+	if (SysLoggerPMChild->pid == 0)
+	{
+		FreePostmasterChildSlot(SysLoggerPMChild);
+		SysLoggerPMChild = NULL;
+	}
 
 	/*
 	 * Reset whereToSendOutput from DestDebug (its starting state) to
@@ -1321,16 +1290,16 @@ PostmasterMain(int argc, char *argv[])
 	AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STARTING);
 
 	/* Start bgwriter and checkpointer so they can help with recovery */
-	if (CheckpointerPID == 0)
-		CheckpointerPID = StartChildProcess(B_CHECKPOINTER);
-	if (BgWriterPID == 0)
-		BgWriterPID = StartChildProcess(B_BG_WRITER);
+	if (CheckpointerPMChild == NULL)
+		CheckpointerPMChild = StartChildProcess(B_CHECKPOINTER);
+	if (BgWriterPMChild == NULL)
+		BgWriterPMChild = StartChildProcess(B_BG_WRITER);
 
 	/*
 	 * We're ready to rock and roll...
 	 */
-	StartupPID = StartChildProcess(B_STARTUP);
-	Assert(StartupPID != 0);
+	StartupPMChild = StartChildProcess(B_STARTUP);
+	Assert(StartupPMChild != NULL);
 	StartupStatus = STARTUP_RUNNING;
 	pmState = PM_STARTUP;
 
@@ -1660,8 +1629,8 @@ ServerLoop(void)
 		if (avlauncher_needs_signal)
 		{
 			avlauncher_needs_signal = false;
-			if (AutoVacPID != 0)
-				kill(AutoVacPID, SIGUSR2);
+			if (AutoVacLauncherPMChild != NULL)
+				kill(AutoVacLauncherPMChild->pid, SIGUSR2);
 		}
 
 #ifdef HAVE_PTHREAD_IS_THREADED_NP
@@ -1748,7 +1717,7 @@ ServerLoop(void)
  * know whether a NORMAL connection might turn into a walsender.)
  */
 static CAC_state
-canAcceptConnections(int backend_type)
+canAcceptConnections(BackendType backend_type)
 {
 	CAC_state	result = CAC_OK;
 
@@ -1779,21 +1748,6 @@ canAcceptConnections(int backend_type)
 	if (!connsAllowed && backend_type == B_BACKEND)
 		return CAC_SHUTDOWN;	/* shutdown is pending */
 
-	/*
-	 * Don't start too many children.
-	 *
-	 * We allow more connections here than we can have backends because some
-	 * might still be authenticating; they might fail auth, or some existing
-	 * backend might exit before the auth cycle is completed.  The exact
-	 * MaxBackends limit is enforced when a new backend tries to join the
-	 * shared-inval backend array.
-	 *
-	 * The limit here must match the sizes of the per-child-process arrays;
-	 * see comments for MaxLivePostmasterChildren().
-	 */
-	if (CountChildren(BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND)) >= MaxLivePostmasterChildren())
-		result = CAC_TOOMANY;
-
 	return result;
 }
 
@@ -1961,26 +1915,6 @@ process_pm_reload_request(void)
 				(errmsg("received SIGHUP, reloading configuration files")));
 		ProcessConfigFile(PGC_SIGHUP);
 		SignalSomeChildren(SIGHUP, BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND));
-		if (StartupPID != 0)
-			signal_child(StartupPID, SIGHUP);
-		if (BgWriterPID != 0)
-			signal_child(BgWriterPID, SIGHUP);
-		if (CheckpointerPID != 0)
-			signal_child(CheckpointerPID, SIGHUP);
-		if (WalWriterPID != 0)
-			signal_child(WalWriterPID, SIGHUP);
-		if (WalReceiverPID != 0)
-			signal_child(WalReceiverPID, SIGHUP);
-		if (WalSummarizerPID != 0)
-			signal_child(WalSummarizerPID, SIGHUP);
-		if (AutoVacPID != 0)
-			signal_child(AutoVacPID, SIGHUP);
-		if (PgArchPID != 0)
-			signal_child(PgArchPID, SIGHUP);
-		if (SysLoggerPID != 0)
-			signal_child(SysLoggerPID, SIGHUP);
-		if (SlotSyncWorkerPID != 0)
-			signal_child(SlotSyncWorkerPID, SIGHUP);
 
 		/* Reload authentication config files too */
 		if (!load_hba())
@@ -2218,15 +2152,15 @@ process_pm_child_exit(void)
 
 	while ((pid = waitpid(-1, &exitstatus, WNOHANG)) > 0)
 	{
-		bool		found;
-		dlist_mutable_iter iter;
+		PMChild    *pmchild;
 
 		/*
 		 * Check if this child was a startup process.
 		 */
-		if (pid == StartupPID)
+		if (StartupPMChild && pid == StartupPMChild->pid)
 		{
-			StartupPID = 0;
+			FreePostmasterChildSlot(StartupPMChild);
+			StartupPMChild = NULL;
 
 			/*
 			 * Startup process exited in response to a shutdown request (or it
@@ -2338,9 +2272,10 @@ process_pm_child_exit(void)
 		 * one at the next iteration of the postmaster's main loop, if
 		 * necessary.  Any other exit condition is treated as a crash.
 		 */
-		if (pid == BgWriterPID)
+		if (BgWriterPMChild && pid == BgWriterPMChild->pid)
 		{
-			BgWriterPID = 0;
+			FreePostmasterChildSlot(BgWriterPMChild);
+			BgWriterPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("background writer process"));
@@ -2350,9 +2285,10 @@ process_pm_child_exit(void)
 		/*
 		 * Was it the checkpointer?
 		 */
-		if (pid == CheckpointerPID)
+		if (CheckpointerPMChild && pid == CheckpointerPMChild->pid)
 		{
-			CheckpointerPID = 0;
+			FreePostmasterChildSlot(CheckpointerPMChild);
+			CheckpointerPMChild = NULL;
 			if (EXIT_STATUS_0(exitstatus) && pmState == PM_SHUTDOWN)
 			{
 				/*
@@ -2372,14 +2308,14 @@ process_pm_child_exit(void)
 				Assert(Shutdown > NoShutdown);
 
 				/* Waken archiver for the last time */
-				if (PgArchPID != 0)
-					signal_child(PgArchPID, SIGUSR2);
+				if (PgArchPMChild != NULL)
+					signal_child(PgArchPMChild, SIGUSR2);
 
 				/*
 				 * Waken walsenders for the last time. No regular backends
 				 * should be around anymore.
 				 */
-				SignalSomeChildren(SIGUSR2, BACKEND_TYPE_ALL & (1 << B_DEAD_END_BACKEND));
+				SignalSomeChildren(SIGUSR2, (1 << B_WAL_SENDER));
 
 				pmState = PM_SHUTDOWN_2;
 			}
@@ -2401,9 +2337,10 @@ process_pm_child_exit(void)
 		 * new one at the next iteration of the postmaster's main loop, if
 		 * necessary.  Any other exit condition is treated as a crash.
 		 */
-		if (pid == WalWriterPID)
+		if (WalWriterPMChild && pid == WalWriterPMChild->pid)
 		{
-			WalWriterPID = 0;
+			FreePostmasterChildSlot(WalWriterPMChild);
+			WalWriterPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("WAL writer process"));
@@ -2416,9 +2353,10 @@ process_pm_child_exit(void)
 		 * backends.  (If we need a new wal receiver, we'll start one at the
 		 * next iteration of the postmaster's main loop.)
 		 */
-		if (pid == WalReceiverPID)
+		if (WalReceiverPMChild && pid == WalReceiverPMChild->pid)
 		{
-			WalReceiverPID = 0;
+			FreePostmasterChildSlot(WalReceiverPMChild);
+			WalReceiverPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("WAL receiver process"));
@@ -2430,9 +2368,10 @@ process_pm_child_exit(void)
 		 * a new one at the next iteration of the postmaster's main loop, if
 		 * necessary.  Any other exit condition is treated as a crash.
 		 */
-		if (pid == WalSummarizerPID)
+		if (WalSummarizerPMChild && pid == WalSummarizerPMChild->pid)
 		{
-			WalSummarizerPID = 0;
+			FreePostmasterChildSlot(WalSummarizerPMChild);
+			WalSummarizerPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("WAL summarizer process"));
@@ -2445,9 +2384,10 @@ process_pm_child_exit(void)
 		 * loop, if necessary.  Any other exit condition is treated as a
 		 * crash.
 		 */
-		if (pid == AutoVacPID)
+		if (AutoVacLauncherPMChild && pid == AutoVacLauncherPMChild->pid)
 		{
-			AutoVacPID = 0;
+			FreePostmasterChildSlot(AutoVacLauncherPMChild);
+			AutoVacLauncherPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("autovacuum launcher process"));
@@ -2460,9 +2400,10 @@ process_pm_child_exit(void)
 		 * and just try to start a new one on the next cycle of the
 		 * postmaster's main loop, to retry archiving remaining files.
 		 */
-		if (pid == PgArchPID)
+		if (PgArchPMChild && pid == PgArchPMChild->pid)
 		{
-			PgArchPID = 0;
+			FreePostmasterChildSlot(PgArchPMChild);
+			PgArchPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("archiver process"));
@@ -2470,11 +2411,15 @@ process_pm_child_exit(void)
 		}
 
 		/* Was it the system logger?  If so, try to start a new one */
-		if (pid == SysLoggerPID)
+		if (SysLoggerPMChild && pid == SysLoggerPMChild->pid)
 		{
-			SysLoggerPID = 0;
 			/* for safety's sake, launch new logger *first* */
-			SysLoggerPID = SysLogger_Start();
+			SysLoggerPMChild->pid = SysLogger_Start();
+			if (SysLoggerPMChild->pid == 0)
+			{
+				FreePostmasterChildSlot(SysLoggerPMChild);
+				SysLoggerPMChild = NULL;
+			}
 			if (!EXIT_STATUS_0(exitstatus))
 				LogChildExit(LOG, _("system logger process"),
 							 pid, exitstatus);
@@ -2488,9 +2433,10 @@ process_pm_child_exit(void)
 		 * start a new one at the next iteration of the postmaster's main
 		 * loop, if necessary. Any other exit condition is treated as a crash.
 		 */
-		if (pid == SlotSyncWorkerPID)
+		if (SlotSyncWorkerPMChild && pid == SlotSyncWorkerPMChild->pid)
 		{
-			SlotSyncWorkerPID = 0;
+			FreePostmasterChildSlot(SlotSyncWorkerPMChild);
+			SlotSyncWorkerPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("slot sync worker process"));
@@ -2500,25 +2446,17 @@ process_pm_child_exit(void)
 		/*
 		 * Was it a backend or background worker?
 		 */
-		found = false;
-		dlist_foreach_modify(iter, &BackendList)
+		pmchild = FindPostmasterChildByPid(pid);
+		if (pmchild)
 		{
-			Backend    *bp = dlist_container(Backend, elem, iter.cur);
-
-			if (bp->pid == pid)
-			{
-				dlist_delete(iter.cur);
-				CleanupBackend(bp, exitstatus);
-				found = true;
-				break;
-			}
+			CleanupBackend(pmchild, exitstatus);
 		}
 
 		/*
 		 * We don't know anything about this child process.  That's highly
 		 * unexpected, as we do track all the child processes that we fork.
 		 */
-		if (!found)
+		else
 		{
 			if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
 				HandleChildCrash(pid, exitstatus, _("untracked child process"));
@@ -2540,14 +2478,19 @@ process_pm_child_exit(void)
  * Remove all local state associated with backend.
  */
 static void
-CleanupBackend(Backend *bp,
+CleanupBackend(PMChild *bp,
 			   int exitstatus)	/* child's exit status. */
 {
 	char		namebuf[MAXPGPATH];
 	char	   *procname;
 	bool		crashed = false;
+	pid_t		bp_pid;
+	bool		bp_bgworker_notify;
+	BackendType bp_bkend_type;
+	RegisteredBgWorker *rw;
 
 	/* Construct a process name for log message */
+	/* FIXME: use GetBackendTypeDesc here? How does the localization of that work? */
 	if (bp->bkend_type == B_DEAD_END_BACKEND)
 	{
 		procname = _("dead end backend");
@@ -2587,25 +2530,28 @@ CleanupBackend(Backend *bp,
 #endif
 
 	/*
-	 * If the process attached to shared memory, check that it detached
-	 * cleanly.
+	 * Release the PMChild entry.
+	 *
+	 * If the process attached to shared memory, this also checks that it
+	 * detached cleanly.
 	 */
-	if (bp->bkend_type != B_DEAD_END_BACKEND)
+	bp_pid = bp->pid;
+	bp_bgworker_notify = bp->bgworker_notify;
+	bp_bkend_type = bp->bkend_type;
+	rw = bp->rw;
+	if (!FreePostmasterChildSlot(bp))
 	{
-		if (!ReleasePostmasterChildSlot(bp->child_slot))
-		{
-			/*
-			 * Uh-oh, the child failed to clean itself up.  Treat as a crash
-			 * after all.
-			 */
-			crashed = true;
-		}
+		/*
+		 * Uh-oh, the child failed to clean itself up.  Treat as a crash after
+		 * all.
+		 */
+		crashed = true;
 	}
+	bp = NULL;
 
 	if (crashed)
 	{
-		HandleChildCrash(bp->pid, exitstatus, namebuf);
-		pfree(bp);
+		HandleChildCrash(bp_pid, exitstatus, namebuf);
 		return;
 	}
 
@@ -2616,16 +2562,14 @@ CleanupBackend(Backend *bp,
 	 * gets skipped in the (probably very common) case where the backend has
 	 * never requested any such notifications.
 	 */
-	if (bp->bgworker_notify)
-		BackgroundWorkerStopNotifications(bp->pid);
+	if (bp_bgworker_notify)
+		BackgroundWorkerStopNotifications(bp_pid);
 
 	/*
 	 * If it was a background worker, also update its RegisteredWorker entry.
 	 */
-	if (bp->bkend_type == B_BG_WORKER)
+	if (bp_bkend_type == B_BG_WORKER)
 	{
-		RegisteredBgWorker *rw = bp->rw;
-
 		if (!EXIT_STATUS_0(exitstatus))
 		{
 			/* Record timestamp, so we know when to restart the worker. */
@@ -2642,15 +2586,13 @@ CleanupBackend(Backend *bp,
 		ReportBackgroundWorkerExit(rw); /* report child death */
 
 		LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG,
-					 procname, bp->pid, exitstatus);
+					 procname, bp_pid, exitstatus);
 
 		/* have it be restarted */
 		HaveCrashedWorker = true;
 	}
 	else
-		LogChildExit(DEBUG2, procname, bp->pid, exitstatus);
-
-	pfree(bp);
+		LogChildExit(DEBUG2, procname, bp_pid, exitstatus);
 }
 
 /*
@@ -2690,9 +2632,16 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 	{
 		dlist_iter	iter;
 
-		dlist_foreach(iter, &BackendList)
+		dlist_foreach(iter, &ActiveChildList)
 		{
-			Backend    *bp = dlist_container(Backend, elem, iter.cur);
+			PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
+
+			/* We do NOT restart the syslogger */
+			if (bp == SysLoggerPMChild)
+				continue;
+
+			if (bp == StartupPMChild)
+				StartupStatus = STARTUP_SIGNALED;
 
 			/*
 			 * This backend is still alive.  Unless we did so already, tell it
@@ -2701,48 +2650,8 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 			 * We could exclude dead_end children here, but at least when
 			 * sending SIGABRT it seems better to include them.
 			 */
-			sigquit_child(bp->pid);
+			sigquit_child(bp);
 		}
-
-		if (StartupPID != 0)
-		{
-			sigquit_child(StartupPID);
-			StartupStatus = STARTUP_SIGNALED;
-		}
-
-		/* Take care of the bgwriter too */
-		if (BgWriterPID != 0)
-			sigquit_child(BgWriterPID);
-
-		/* Take care of the checkpointer too */
-		if (CheckpointerPID != 0)
-			sigquit_child(CheckpointerPID);
-
-		/* Take care of the walwriter too */
-		if (WalWriterPID != 0)
-			sigquit_child(WalWriterPID);
-
-		/* Take care of the walreceiver too */
-		if (WalReceiverPID != 0)
-			sigquit_child(WalReceiverPID);
-
-		/* Take care of the walsummarizer too */
-		if (WalSummarizerPID != 0)
-			sigquit_child(WalSummarizerPID);
-
-		/* Take care of the autovacuum launcher too */
-		if (AutoVacPID != 0)
-			sigquit_child(AutoVacPID);
-
-		/* Take care of the archiver too */
-		if (PgArchPID != 0)
-			sigquit_child(PgArchPID);
-
-		/* Take care of the slot sync worker too */
-		if (SlotSyncWorkerPID != 0)
-			sigquit_child(SlotSyncWorkerPID);
-
-		/* We do NOT restart the syslogger */
 	}
 
 	if (Shutdown != ImmediateShutdown)
@@ -2845,7 +2754,7 @@ PostmasterStateMachine(void)
 			 * This state ends when we have no normal client backends running.
 			 * Then we're ready to stop other children.
 			 */
-			if (CountChildren(1 << B_BACKEND) == 0)
+			if (CountChildren(1 << B_BACKEND) == 0)
 				pmState = PM_STOP_BACKENDS;
 		}
 	}
@@ -2857,6 +2766,8 @@ PostmasterStateMachine(void)
 	 */
 	if (pmState == PM_STOP_BACKENDS)
 	{
+		uint32		targetMask;
+
 		/*
 		 * Forget any pending requests for background workers, since we're no
 		 * longer willing to launch any new workers.  (If additional requests
@@ -2864,29 +2775,27 @@ PostmasterStateMachine(void)
 		 */
 		ForgetUnstartedBackgroundWorkers();
 
-		/* Signal all backend children except walsenders and dead-end backends */
-		SignalSomeChildren(SIGTERM,
-						   BACKEND_TYPE_ALL & ~(1 << B_WAL_SENDER | 1 << B_DEAD_END_BACKEND));
+		/* Signal all backend children except walsenders */
+		/* dead-end children are not signalled yet */
+		targetMask = (1 << B_BACKEND);
+		targetMask |= (1 << B_BG_WORKER);
+
 		/* and the autovac launcher too */
-		if (AutoVacPID != 0)
-			signal_child(AutoVacPID, SIGTERM);
+		targetMask |= (1 << B_AUTOVAC_LAUNCHER);
 		/* and the bgwriter too */
-		if (BgWriterPID != 0)
-			signal_child(BgWriterPID, SIGTERM);
+		targetMask |= (1 << B_BG_WRITER);
 		/* and the walwriter too */
-		if (WalWriterPID != 0)
-			signal_child(WalWriterPID, SIGTERM);
+		targetMask |= (1 << B_WAL_WRITER);
 		/* If we're in recovery, also stop startup and walreceiver procs */
-		if (StartupPID != 0)
-			signal_child(StartupPID, SIGTERM);
-		if (WalReceiverPID != 0)
-			signal_child(WalReceiverPID, SIGTERM);
-		if (WalSummarizerPID != 0)
-			signal_child(WalSummarizerPID, SIGTERM);
-		if (SlotSyncWorkerPID != 0)
-			signal_child(SlotSyncWorkerPID, SIGTERM);
+		targetMask |= (1 << B_STARTUP);
+		targetMask |= (1 << B_WAL_RECEIVER);
+
+		targetMask |= (1 << B_WAL_SUMMARIZER);
+		targetMask |= (1 << B_SLOTSYNC_WORKER);
-		/* checkpointer, archiver, stats, and syslogger may continue for now */
+		/* checkpointer, archiver, and syslogger may continue for now */
 
+		SignalSomeChildren(SIGTERM, targetMask);
+
 		/* Now transition to PM_WAIT_BACKENDS state to wait for them to die */
 		pmState = PM_WAIT_BACKENDS;
 	}
@@ -2908,16 +2817,14 @@ PostmasterStateMachine(void)
 		 * here. Walsenders and archiver are also disregarded, they will be
 		 * terminated later after writing the checkpoint record.
 		 */
-		if (CountChildren(BACKEND_TYPE_ALL & ~(1 << B_WAL_SENDER | 1 << B_DEAD_END_BACKEND)) == 0 &&
-			StartupPID == 0 &&
-			WalReceiverPID == 0 &&
-			WalSummarizerPID == 0 &&
-			BgWriterPID == 0 &&
-			(CheckpointerPID == 0 ||
-			 (!FatalError && Shutdown < ImmediateShutdown)) &&
-			WalWriterPID == 0 &&
-			AutoVacPID == 0 &&
-			SlotSyncWorkerPID == 0)
+		uint32		remaining;
+
+		remaining = (1 << B_WAL_SENDER) | (1 << B_ARCHIVER) | (1 << B_LOGGER);
+		remaining |= (1 << B_DEAD_END_BACKEND);
+		if (!FatalError && Shutdown < ImmediateShutdown)
+			remaining |= (1 << B_CHECKPOINTER);
+
+		if (CountChildren(BACKEND_TYPE_ALL & ~remaining) == 0)
 		{
 			if (Shutdown >= ImmediateShutdown || FatalError)
 			{
@@ -2943,12 +2850,12 @@ PostmasterStateMachine(void)
 				 */
 				Assert(Shutdown > NoShutdown);
 				/* Start the checkpointer if not running */
-				if (CheckpointerPID == 0)
-					CheckpointerPID = StartChildProcess(B_CHECKPOINTER);
+				if (CheckpointerPMChild == NULL)
+					CheckpointerPMChild = StartChildProcess(B_CHECKPOINTER);
 				/* And tell it to shut down */
-				if (CheckpointerPID != 0)
+				if (CheckpointerPMChild != NULL)
 				{
-					signal_child(CheckpointerPID, SIGUSR2);
+					signal_child(CheckpointerPMChild, SIGUSR2);
 					pmState = PM_SHUTDOWN;
 				}
 				else
@@ -2969,8 +2876,8 @@ PostmasterStateMachine(void)
 
 					/* Kill the walsenders and archiver, too */
 					SignalSomeChildren(SIGQUIT, BACKEND_TYPE_ALL);
-					if (PgArchPID != 0)
-						signal_child(PgArchPID, SIGQUIT);
+					if (PgArchPMChild != NULL)
+						signal_child(PgArchPMChild, SIGQUIT);
 				}
 			}
 		}
@@ -2984,7 +2891,10 @@ PostmasterStateMachine(void)
 		 * left by now anyway; what we're really waiting for is walsenders and
 		 * archiver.
 		 */
-		if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND)) == 0)
+		uint32		remaining;
+
+		remaining = (1 << B_LOGGER) | (1 << B_DEAD_END_BACKEND);
+		if (CountChildren(BACKEND_TYPE_ALL & ~remaining) == 0)
 		{
 			ConfigurePostmasterWaitSet(false);
 			SignalSomeChildren(SIGTERM, 1 << B_DEAD_END_BACKEND);
@@ -3006,17 +2916,19 @@ PostmasterStateMachine(void)
 		 * normal state transition leading up to PM_WAIT_DEAD_END, or during
 		 * FatalError processing.
 		 */
-		if (dlist_is_empty(&BackendList) && PgArchPID == 0)
+		if (dlist_is_empty(&ActiveChildList) ||
+			(SysLoggerPMChild != NULL && dlist_head_node(&ActiveChildList) == &SysLoggerPMChild->elem &&
+			 dlist_tail_node(&ActiveChildList) == &SysLoggerPMChild->elem))
 		{
 			/* These other guys should be dead already */
-			Assert(StartupPID == 0);
-			Assert(WalReceiverPID == 0);
-			Assert(WalSummarizerPID == 0);
-			Assert(BgWriterPID == 0);
-			Assert(CheckpointerPID == 0);
-			Assert(WalWriterPID == 0);
-			Assert(AutoVacPID == 0);
-			Assert(SlotSyncWorkerPID == 0);
+			Assert(StartupPMChild == NULL);
+			Assert(WalReceiverPMChild == NULL);
+			Assert(WalSummarizerPMChild == NULL);
+			Assert(BgWriterPMChild == NULL);
+			Assert(CheckpointerPMChild == NULL);
+			Assert(WalWriterPMChild == NULL);
+			Assert(AutoVacLauncherPMChild == NULL);
+			Assert(SlotSyncWorkerPMChild == NULL);
 			/* syslogger is not considered here */
 			pmState = PM_NO_CHILDREN;
 		}
@@ -3099,8 +3011,8 @@ PostmasterStateMachine(void)
 		/* re-create shared memory and semaphores */
 		CreateSharedMemoryAndSemaphores();
 
-		StartupPID = StartChildProcess(B_STARTUP);
-		Assert(StartupPID != 0);
+		StartupPMChild = StartChildProcess(B_STARTUP);
+		Assert(StartupPMChild != NULL);
 		StartupStatus = STARTUP_RUNNING;
 		pmState = PM_STARTUP;
 		/* crash recovery started, reset SIGKILL flag */
@@ -3120,8 +3032,21 @@ static void
 LaunchMissingBackgroundProcesses(void)
 {
 	/* If we have lost the log collector, try to start a new one */
-	if (SysLoggerPID == 0 && Logging_collector)
-		SysLoggerPID = SysLogger_Start();
+	if (SysLoggerPMChild == NULL && Logging_collector)
+	{
+		SysLoggerPMChild = AssignPostmasterChildSlot(B_LOGGER);
+		if (!SysLoggerPMChild)
+			elog(LOG, "no postmaster child slot available for syslogger");
+		else
+		{
+			SysLoggerPMChild->pid = SysLogger_Start();
+			if (SysLoggerPMChild->pid == 0)
+			{
+				FreePostmasterChildSlot(SysLoggerPMChild);
+				SysLoggerPMChild = NULL;
+			}
+		}
+	}
 
 	/*
 	 * If no background writer process is running, and we are not in a state
@@ -3131,10 +3056,10 @@ LaunchMissingBackgroundProcesses(void)
 	if (pmState == PM_RUN || pmState == PM_RECOVERY ||
 		pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
 	{
-		if (CheckpointerPID == 0)
-			CheckpointerPID = StartChildProcess(B_CHECKPOINTER);
-		if (BgWriterPID == 0)
-			BgWriterPID = StartChildProcess(B_BG_WRITER);
+		if (CheckpointerPMChild == NULL)
+			CheckpointerPMChild = StartChildProcess(B_CHECKPOINTER);
+		if (BgWriterPMChild == NULL)
+			BgWriterPMChild = StartChildProcess(B_BG_WRITER);
 	}
 
 	/*
@@ -3142,8 +3067,8 @@ LaunchMissingBackgroundProcesses(void)
 	 * one.  But this is needed only in normal operation (else we cannot be
 	 * writing any new WAL).
 	 */
-	if (WalWriterPID == 0 && pmState == PM_RUN)
-		WalWriterPID = StartChildProcess(B_WAL_WRITER);
+	if (WalWriterPMChild == NULL && pmState == PM_RUN)
+		WalWriterPMChild = StartChildProcess(B_WAL_WRITER);
 
 	/*
 	 * If we have lost the autovacuum launcher, try to start a new one.  We
@@ -3151,12 +3076,12 @@ LaunchMissingBackgroundProcesses(void)
 	 * might update relfrozenxid for empty tables before the physical files
 	 * are put in place.
 	 */
-	if (!IsBinaryUpgrade && AutoVacPID == 0 &&
+	if (!IsBinaryUpgrade && AutoVacLauncherPMChild == NULL &&
 		(AutoVacuumingActive() || start_autovac_launcher) &&
 		pmState == PM_RUN)
 	{
-		AutoVacPID = StartChildProcess(B_AUTOVAC_LAUNCHER);
-		if (AutoVacPID != 0)
+		AutoVacLauncherPMChild = StartChildProcess(B_AUTOVAC_LAUNCHER);
+		if (AutoVacLauncherPMChild != NULL)
 			start_autovac_launcher = false; /* signal processed */
 	}
 
@@ -3166,24 +3091,24 @@ LaunchMissingBackgroundProcesses(void)
 	 * If WAL archiving is enabled always, we are allowed to start archiver
 	 * even during recovery.
 	 */
-	if (PgArchPID == 0 &&
+	if (PgArchPMChild == NULL &&
 		((XLogArchivingActive() && pmState == PM_RUN) ||
 		 (XLogArchivingAlways() && (pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY))) &&
 		PgArchCanRestart())
-		PgArchPID = StartChildProcess(B_ARCHIVER);
+		PgArchPMChild = StartChildProcess(B_ARCHIVER);
 
 	/*
 	 * If we need to start a slot sync worker, try to do that now
 	 *
-	 * We allow to start the slot sync worker when we are on a hot standby,
-	 * fast or immediate shutdown is not in progress, slot sync parameters
-	 * are configured correctly, and it is the first time of worker's launch,
-	 * or enough time has passed since the worker was launched last.
+	 * We allow the slot sync worker to start when we are on a hot standby,
+	 * fast or immediate shutdown is not in progress, slot sync parameters are
+	 * configured correctly, and either this is the worker's first launch or
+	 * enough time has passed since it was last launched.
 	 */
-	if (SlotSyncWorkerPID == 0 && pmState == PM_HOT_STANDBY &&
+	if (SlotSyncWorkerPMChild == NULL && pmState == PM_HOT_STANDBY &&
 		Shutdown <= SmartShutdown && sync_replication_slots &&
 		ValidateSlotSyncParams(LOG) && SlotSyncWorkerCanRestart())
-		SlotSyncWorkerPID = StartChildProcess(B_SLOTSYNC_WORKER);
+		SlotSyncWorkerPMChild = StartChildProcess(B_SLOTSYNC_WORKER);
 
 	/*
 	 * If we need to start a WAL receiver, try to do that now
@@ -3199,23 +3124,23 @@ LaunchMissingBackgroundProcesses(void)
 	 */
 	if (WalReceiverRequested)
 	{
-		if (WalReceiverPID == 0 &&
+		if (WalReceiverPMChild == NULL &&
 			(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
 			 pmState == PM_HOT_STANDBY) &&
 			Shutdown <= SmartShutdown)
 		{
-			WalReceiverPID = StartChildProcess(B_WAL_RECEIVER);
-			if (WalReceiverPID != 0)
+			WalReceiverPMChild = StartChildProcess(B_WAL_RECEIVER);
+			if (WalReceiverPMChild != NULL)
 				WalReceiverRequested = false;
 			/* else leave the flag set, so we'll try again later */
 		}
 	}
 
 	/* If we need to start a WAL summarizer, try to do that now */
-	if (summarize_wal && WalSummarizerPID == 0 &&
+	if (summarize_wal && WalSummarizerPMChild == NULL &&
 		(pmState == PM_RUN || pmState == PM_HOT_STANDBY) &&
 		Shutdown <= SmartShutdown)
-		WalSummarizerPID = StartChildProcess(B_WAL_SUMMARIZER);
+		WalSummarizerPMChild = StartChildProcess(B_WAL_SUMMARIZER);
 
 	/* Get other worker processes running, if needed */
 	if (StartWorkerNeeded || HaveCrashedWorker)
@@ -3239,8 +3164,14 @@ LaunchMissingBackgroundProcesses(void)
  * child twice will not cause any problems.
  */
 static void
-signal_child(pid_t pid, int signal)
+signal_child(PMChild *pmchild, int signal)
 {
+	pid_t		pid;
+
+	if (pmchild == NULL || pmchild->pid == 0)
+		return;
+	pid = pmchild->pid;
+
 	if (kill(pid, signal) < 0)
 		elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) pid, signal);
 #ifdef HAVE_SETSID
@@ -3269,13 +3200,13 @@ signal_child(pid_t pid, int signal)
  * to use SIGABRT to collect per-child core dumps.
  */
 static void
-sigquit_child(pid_t pid)
+sigquit_child(PMChild *pmchild)
 {
 	ereport(DEBUG2,
 			(errmsg_internal("sending %s to process %d",
 							 (send_abort_for_crash ? "SIGABRT" : "SIGQUIT"),
-							 (int) pid)));
-	signal_child(pid, (send_abort_for_crash ? SIGABRT : SIGQUIT));
+							 (int) pmchild->pid)));
+	signal_child(pmchild, (send_abort_for_crash ? SIGABRT : SIGQUIT));
 }
 
 /*
@@ -3287,13 +3218,13 @@ SignalSomeChildren(int signal, uint32 targetMask)
 	dlist_iter	iter;
 	bool		signaled = false;
 
-	dlist_foreach(iter, &BackendList)
+	dlist_foreach(iter, &ActiveChildList)
 	{
-		Backend    *bp = dlist_container(Backend, elem, iter.cur);
+		PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
 
 		/*
-		 * Since target == BACKEND_TYPE_ALL is the most common case, we test
-		 * it first and avoid touching shared memory for every child.
+		 * Since targetMask == BACKEND_TYPE_ALL is the most common case, we
+		 * test it first and avoid touching shared memory for every child.
 		 */
 		if (targetMask != BACKEND_TYPE_ALL)
 		{
@@ -3312,7 +3243,7 @@ SignalSomeChildren(int signal, uint32 targetMask)
 		ereport(DEBUG4,
 				(errmsg_internal("sending signal %d to %s process %d",
 								 signal, GetBackendTypeDesc(bp->bkend_type), (int) bp->pid)));
-		signal_child(bp->pid, signal);
+		signal_child(bp, signal);
 		signaled = true;
 	}
 	return signaled;
@@ -3325,29 +3256,12 @@ SignalSomeChildren(int signal, uint32 targetMask)
 static void
 TerminateChildren(int signal)
 {
-	SignalSomeChildren(signal, BACKEND_TYPE_ALL);
-	if (StartupPID != 0)
+	SignalSomeChildren(signal, BACKEND_TYPE_ALL & ~(1 << B_LOGGER));
+	if (StartupPMChild != NULL)
 	{
-		signal_child(StartupPID, signal);
 		if (signal == SIGQUIT || signal == SIGKILL || signal == SIGABRT)
 			StartupStatus = STARTUP_SIGNALED;
 	}
-	if (BgWriterPID != 0)
-		signal_child(BgWriterPID, signal);
-	if (CheckpointerPID != 0)
-		signal_child(CheckpointerPID, signal);
-	if (WalWriterPID != 0)
-		signal_child(WalWriterPID, signal);
-	if (WalReceiverPID != 0)
-		signal_child(WalReceiverPID, signal);
-	if (WalSummarizerPID != 0)
-		signal_child(WalSummarizerPID, signal);
-	if (AutoVacPID != 0)
-		signal_child(AutoVacPID, signal);
-	if (PgArchPID != 0)
-		signal_child(PgArchPID, signal);
-	if (SlotSyncWorkerPID != 0)
-		signal_child(SlotSyncWorkerPID, signal);
 }
 
 /*
@@ -3360,45 +3274,45 @@ TerminateChildren(int signal)
 static int
 BackendStartup(ClientSocket *client_sock)
 {
-	Backend    *bn;				/* for backend cleanup */
+	PMChild    *bn = NULL;
 	pid_t		pid;
 	BackendStartupData startup_data;
+	CAC_state	cac;
 
-	/*
-	 * Create backend data structure.  Better before the fork() so we can
-	 * handle failure cleanly.
-	 */
-	bn = (Backend *) palloc_extended(sizeof(Backend), MCXT_ALLOC_NO_OOM);
+	cac = canAcceptConnections(B_BACKEND);
+	if (cac == CAC_OK)
+	{
+		bn = AssignPostmasterChildSlot(B_BACKEND);
+		if (!bn)
+		{
+			/*
+			 * Too many regular child processes; launch a dead-end child
+			 * process instead.
+			 */
+			cac = CAC_TOOMANY;
+		}
+	}
 	if (!bn)
 	{
-		ereport(LOG,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory")));
-		return STATUS_ERROR;
+		bn = AllocDeadEndChild();
+		if (!bn)
+		{
+			ereport(LOG,
+					(errcode(ERRCODE_OUT_OF_MEMORY),
+					 errmsg("out of memory")));
+			return STATUS_ERROR;
+		}
 	}
 
 	/* Pass down canAcceptConnections state */
-	startup_data.canAcceptConnections = canAcceptConnections(B_BACKEND);
+	startup_data.canAcceptConnections = cac;
 	bn->rw = NULL;
 
-	/*
-	 * Unless it's a dead_end child, assign it a child slot number
-	 */
-	if (startup_data.canAcceptConnections == CAC_OK)
-	{
-		bn->bkend_type = B_BACKEND;	/* Can change later to WALSND */
-		bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
-	}
-	else
-	{
-		bn->bkend_type = B_DEAD_END_BACKEND;
-		bn->child_slot = 0;
-	}
-
 	/* Hasn't asked to be notified about any bgworkers yet */
 	bn->bgworker_notify = false;
 
-	pid = postmaster_child_launch(B_BACKEND,
+	MyPMChildSlot = bn->child_slot;
+	pid = postmaster_child_launch(bn->bkend_type,
 								  (char *) &startup_data, sizeof(startup_data),
 								  client_sock);
 	if (pid < 0)
@@ -3406,9 +3320,7 @@ BackendStartup(ClientSocket *client_sock)
 		/* in parent, fork failed */
 		int			save_errno = errno;
 
-		if (bn->child_slot != 0)
-			(void) ReleasePostmasterChildSlot(bn->child_slot);
-		pfree(bn);
+		(void) FreePostmasterChildSlot(bn);
 		errno = save_errno;
 		ereport(LOG,
 				(errmsg("could not fork new process for connection: %m")));
@@ -3426,7 +3338,6 @@ BackendStartup(ClientSocket *client_sock)
 	 * of backends.
 	 */
 	bn->pid = pid;
-	dlist_push_head(&BackendList, &bn->elem);
 
 	return STATUS_OK;
 }
@@ -3525,9 +3436,9 @@ process_pm_pmsignal(void)
 		 * Start the archiver if we're responsible for (re-)archiving received
 		 * files.
 		 */
-		Assert(PgArchPID == 0);
+		Assert(PgArchPMChild == NULL);
 		if (XLogArchivingAlways())
-			PgArchPID = StartChildProcess(B_ARCHIVER);
+			PgArchPMChild = StartChildProcess(B_ARCHIVER);
 
 		/*
 		 * If we aren't planning to enter hot standby mode later, treat
@@ -3573,16 +3484,16 @@ process_pm_pmsignal(void)
 	}
 
 	/* Tell syslogger to rotate logfile if requested */
-	if (SysLoggerPID != 0)
+	if (SysLoggerPMChild != NULL)
 	{
 		if (CheckLogrotateSignal())
 		{
-			signal_child(SysLoggerPID, SIGUSR1);
+			signal_child(SysLoggerPMChild, SIGUSR1);
 			RemoveLogrotateSignalFiles();
 		}
 		else if (CheckPostmasterSignal(PMSIGNAL_ROTATE_LOGFILE))
 		{
-			signal_child(SysLoggerPID, SIGUSR1);
+			signal_child(SysLoggerPMChild, SIGUSR1);
 		}
 	}
 
@@ -3629,7 +3540,7 @@ process_pm_pmsignal(void)
 		PostmasterStateMachine();
 	}
 
-	if (StartupPID != 0 &&
+	if (StartupPMChild != NULL &&
 		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
 		 pmState == PM_HOT_STANDBY) &&
 		CheckPromoteSignal())
@@ -3640,7 +3551,7 @@ process_pm_pmsignal(void)
 		 * Leave the promote signal file in place and let the Startup process
 		 * do the unlink.
 		 */
-		signal_child(StartupPID, SIGUSR2);
+		signal_child(StartupPMChild, SIGUSR2);
 	}
 }
 
@@ -3667,13 +3578,13 @@ CountChildren(uint32 targetMask)
 	dlist_iter	iter;
 	int			cnt = 0;
 
-	dlist_foreach(iter, &BackendList)
+	dlist_foreach(iter, &ActiveChildList)
 	{
-		Backend    *bp = dlist_container(Backend, elem, iter.cur);
+		PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
 
 		/*
-		 * Since target == BACKEND_TYPE_ALL is the most common case, we test
-		 * it first and avoid touching shared memory for every child.
+		 * Since targetMask == BACKEND_TYPE_ALL is the most common case, we
+		 * test it first and avoid touching shared memory for every child.
 		 */
 		if (targetMask != BACKEND_TYPE_ALL)
 		{
@@ -3704,15 +3615,33 @@ CountChildren(uint32 targetMask)
  * Return value of StartChildProcess is subprocess' PID, or 0 if failed
  * to start subprocess.
  */
-static pid_t
+static PMChild *
 StartChildProcess(BackendType type)
 {
+	PMChild    *pmchild;
 	pid_t		pid;
 
+	pmchild = AssignPostmasterChildSlot(type);
+	if (!pmchild)
+	{
+		if (type == B_AUTOVAC_WORKER)
+			ereport(LOG,
+					(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+					 errmsg("no slot available for new autovacuum worker process")));
+		else
+		{
+			/* shouldn't happen because we allocate enough slots */
+			elog(LOG, "no postmaster child slot available for aux process");
+		}
+		return NULL;
+	}
+
+	MyPMChildSlot = pmchild->child_slot;
 	pid = postmaster_child_launch(type, NULL, 0, NULL);
 	if (pid < 0)
 	{
 		/* in parent, fork failed */
+		FreePostmasterChildSlot(pmchild);
 		ereport(LOG,
 				(errmsg("could not fork \"%s\" process: %m", PostmasterChildName(type))));
 
@@ -3722,13 +3651,14 @@ StartChildProcess(BackendType type)
 		 */
 		if (type == B_STARTUP)
 			ExitPostmaster(1);
-		return 0;
+		return NULL;
 	}
 
 	/*
 	 * in parent, successful fork
 	 */
-	return pid;
+	pmchild->pid = pid;
+	return pmchild;
 }
 
 /*
@@ -3743,7 +3673,7 @@ StartChildProcess(BackendType type)
 static void
 StartAutovacuumWorker(void)
 {
-	Backend    *bn;
+	PMChild    *bn;
 
 	/*
 	 * If not in condition to run a process, don't try, but handle it like a
@@ -3754,34 +3684,20 @@ StartAutovacuumWorker(void)
 	 */
 	if (canAcceptConnections(B_AUTOVAC_WORKER) == CAC_OK)
 	{
-		bn = (Backend *) palloc_extended(sizeof(Backend), MCXT_ALLOC_NO_OOM);
+		bn = StartChildProcess(B_AUTOVAC_WORKER);
 		if (bn)
 		{
-			/* Autovac workers need a child slot */
-			bn->bkend_type = B_AUTOVAC_WORKER;
-			bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
 			bn->bgworker_notify = false;
 			bn->rw = NULL;
-
-			bn->pid = StartChildProcess(B_AUTOVAC_WORKER);
-			if (bn->pid > 0)
-			{
-				dlist_push_head(&BackendList, &bn->elem);
-				/* all OK */
-				return;
-			}
-
+			return;
+		}
+		else
+		{
 			/*
 			 * fork failed, fall through to report -- actual error message was
 			 * logged by StartChildProcess
 			 */
-			(void) ReleasePostmasterChildSlot(bn->child_slot);
-			pfree(bn);
 		}
-		else
-			ereport(LOG,
-					(errcode(ERRCODE_OUT_OF_MEMORY),
-					 errmsg("out of memory")));
 	}
 
 	/*
@@ -3793,7 +3709,7 @@ StartAutovacuumWorker(void)
 	 * quick succession between the autovac launcher and postmaster in case
 	 * things get ugly.
 	 */
-	if (AutoVacPID != 0)
+	if (AutoVacLauncherPMChild != NULL)
 	{
 		AutoVacWorkerFailed();
 		avlauncher_needs_signal = true;
@@ -3837,23 +3753,6 @@ CreateOptsFile(int argc, char *argv[], char *fullprogname)
 }
 
 
-/*
- * MaxLivePostmasterChildren
- *
- * This reports the number of entries needed in the per-child-process array
- * (PMChildFlags).  It includes regular backends, autovac workers, walsenders
- * and background workers, but not special children nor dead_end children.
- * This allows the array to have a fixed maximum size, to wit the same
- * too-many-children limit enforced by canAcceptConnections().  The exact value
- * isn't too critical as long as it's more than MaxBackends.
- */
-int
-MaxLivePostmasterChildren(void)
-{
-	return 2 * (MaxConnections + autovacuum_max_workers + 1 +
-				max_wal_senders + max_worker_processes);
-}
-
 /*
  * Start a new bgworker.
  * Starting time conditions must have been checked already.
@@ -3866,7 +3765,7 @@ MaxLivePostmasterChildren(void)
 static bool
 do_start_bgworker(RegisteredBgWorker *rw)
 {
-	Backend    *bn;
+	PMChild    *bn;
 	pid_t		worker_pid;
 
 	Assert(rw->rw_pid == 0);
@@ -3893,6 +3792,7 @@ do_start_bgworker(RegisteredBgWorker *rw)
 			(errmsg_internal("starting background worker process \"%s\"",
 							 rw->rw_worker.bgw_name)));
 
+	MyPMChildSlot = bn->child_slot;
 	worker_pid = postmaster_child_launch(B_BG_WORKER, (char *) &rw->rw_worker, sizeof(BackgroundWorker), NULL);
 	if (worker_pid == -1)
 	{
@@ -3900,8 +3800,7 @@ do_start_bgworker(RegisteredBgWorker *rw)
 		ereport(LOG,
 				(errmsg("could not fork background worker process: %m")));
 		/* undo what assign_backendlist_entry did */
-		ReleasePostmasterChildSlot(bn->child_slot);
-		pfree(bn);
+		FreePostmasterChildSlot(bn);
 
 		/* mark entry as crashed, so we'll try again later */
 		rw->rw_crashed_at = GetCurrentTimestamp();
@@ -3912,8 +3811,6 @@ do_start_bgworker(RegisteredBgWorker *rw)
 	rw->rw_pid = worker_pid;
 	bn->pid = rw->rw_pid;
 	ReportBackgroundWorkerPID(rw);
-	/* add new worker to lists of backends */
-	dlist_push_head(&BackendList, &bn->elem);
 	return true;
 }
 
@@ -3961,17 +3858,13 @@ bgworker_should_start_now(BgWorkerStartTime start_time)
  *
  * On failure, return NULL.
  */
-static Backend *
+static PMChild *
 assign_backendlist_entry(void)
 {
-	Backend    *bn;
+	PMChild    *bn;
 
-	/*
-	 * Check that database state allows another connection.  Currently the
-	 * only possible failure is CAC_TOOMANY, so we just log an error message
-	 * based on that rather than checking the error code precisely.
-	 */
-	if (canAcceptConnections(B_BG_WORKER) != CAC_OK)
+	bn = AssignPostmasterChildSlot(B_BG_WORKER);
+	if (bn == NULL)
 	{
 		ereport(LOG,
 				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
@@ -3979,16 +3872,6 @@ assign_backendlist_entry(void)
 		return NULL;
 	}
 
-	bn = palloc_extended(sizeof(Backend), MCXT_ALLOC_NO_OOM);
-	if (bn == NULL)
-	{
-		ereport(LOG,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory")));
-		return NULL;
-	}
-
-	bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
 	bn->bkend_type = B_BG_WORKER;
 	bn->bgworker_notify = false;
 
@@ -4129,11 +4012,11 @@ bool
 PostmasterMarkPIDForWorkerNotify(int pid)
 {
 	dlist_iter	iter;
-	Backend    *bp;
+	PMChild    *bp;
 
-	dlist_foreach(iter, &BackendList)
+	dlist_foreach(iter, &ActiveChildList)
 	{
-		bp = dlist_container(Backend, elem, iter.cur);
+		bp = dlist_container(PMChild, elem, iter.cur);
 		if (bp->pid == pid)
 		{
 			bp->bgworker_notify = true;
diff --git a/src/backend/storage/ipc/pmsignal.c b/src/backend/storage/ipc/pmsignal.c
index cb99e77476..86970bf69b 100644
--- a/src/backend/storage/ipc/pmsignal.c
+++ b/src/backend/storage/ipc/pmsignal.c
@@ -47,11 +47,11 @@
  * exited without performing proper shutdown.  The per-child-process flags
  * have three possible states: UNUSED, ASSIGNED, ACTIVE.  An UNUSED slot is
  * available for assignment.  An ASSIGNED slot is associated with a postmaster
- * child process, but either the process has not touched shared memory yet,
- * or it has successfully cleaned up after itself.  A ACTIVE slot means the
- * process is actively using shared memory.  The slots are assigned to
- * child processes at random, and postmaster.c is responsible for tracking
- * which one goes with which PID.
+ * child process, but either the process has not touched shared memory yet, or
+ * it has successfully cleaned up after itself.  An ACTIVE slot means the
+ * process is actively using shared memory.  The slots are assigned to child
+ * processes by postmaster, and postmaster.c is responsible for tracking which
+ * one goes with which PID.
  *
  * Actually there is a fourth state, WALSENDER.  This is just like ACTIVE,
  * but carries the extra information that the child is a WAL sender.
@@ -83,15 +83,6 @@ struct PMSignalData
 /* PMSignalState pointer is valid in both postmaster and child processes */
 NON_EXEC_STATIC volatile PMSignalData *PMSignalState = NULL;
 
-/*
- * These static variables are valid only in the postmaster.  We keep a
- * duplicative private array so that we can trust its state even if some
- * failing child has clobbered the PMSignalData struct in shared memory.
- */
-static int	num_child_inuse;	/* # of entries in PMChildInUse[] */
-static int	next_child_inuse;	/* next slot to try to assign */
-static bool *PMChildInUse;		/* true if i'th flag slot is assigned */
-
 /*
  * Signal handler to be notified if postmaster dies.
  */
@@ -155,25 +146,7 @@ PMSignalShmemInit(void)
 	{
 		/* initialize all flags to zeroes */
 		MemSet(unvolatize(PMSignalData *, PMSignalState), 0, PMSignalShmemSize());
-		num_child_inuse = MaxLivePostmasterChildren();
-		PMSignalState->num_child_flags = num_child_inuse;
-
-		/*
-		 * Also allocate postmaster's private PMChildInUse[] array.  We
-		 * might've already done that in a previous shared-memory creation
-		 * cycle, in which case free the old array to avoid a leak.  (Do it
-		 * like this to support the possibility that MaxLivePostmasterChildren
-		 * changed.)  In a standalone backend, we do not need this.
-		 */
-		if (PostmasterContext != NULL)
-		{
-			if (PMChildInUse)
-				pfree(PMChildInUse);
-			PMChildInUse = (bool *)
-				MemoryContextAllocZero(PostmasterContext,
-									   num_child_inuse * sizeof(bool));
-		}
-		next_child_inuse = 0;
+		PMSignalState->num_child_flags = MaxLivePostmasterChildren();
 	}
 }
 
@@ -239,41 +212,22 @@ GetQuitSignalReason(void)
 
 
 /*
- * AssignPostmasterChildSlot - select an unused slot for a new postmaster
- * child process, and set its state to ASSIGNED.  Returns a slot number
- * (one to N).
+ * ReservePostmasterChildSlot - mark the given slot as ASSIGNED for a new
+ * postmaster child process.
  *
  * Only the postmaster is allowed to execute this routine, so we need no
  * special locking.
  */
-int
-AssignPostmasterChildSlot(void)
+void
+ReservePostmasterChildSlot(int slot)
 {
-	int			slot = next_child_inuse;
-	int			n;
+	Assert(slot > 0 && slot <= PMSignalState->num_child_flags);
+	slot--;
 
-	/*
-	 * Scan for a free slot.  Notice that we trust nothing about the contents
-	 * of PMSignalState, but use only postmaster-local data for this decision.
-	 * We track the last slot assigned so as not to waste time repeatedly
-	 * rescanning low-numbered slots.
-	 */
-	for (n = num_child_inuse; n > 0; n--)
-	{
-		if (--slot < 0)
-			slot = num_child_inuse - 1;
-		if (!PMChildInUse[slot])
-		{
-			PMChildInUse[slot] = true;
-			PMSignalState->PMChildFlags[slot] = PM_CHILD_ASSIGNED;
-			next_child_inuse = slot;
-			return slot + 1;
-		}
-	}
+	if (PMSignalState->PMChildFlags[slot] != PM_CHILD_UNUSED)
+		elog(FATAL, "postmaster child slot is already in use");
 
-	/* Out of slots ... should never happen, else postmaster.c messed up */
-	elog(FATAL, "no free slots in PMChildFlags array");
-	return 0;					/* keep compiler quiet */
+	PMSignalState->PMChildFlags[slot] = PM_CHILD_ASSIGNED;
 }
 
 /*
@@ -288,17 +242,18 @@ ReleasePostmasterChildSlot(int slot)
 {
 	bool		result;
 
-	Assert(slot > 0 && slot <= num_child_inuse);
+	Assert(slot > 0 && slot <= PMSignalState->num_child_flags);
 	slot--;
 
 	/*
 	 * Note: the slot state might already be unused, because the logic in
 	 * postmaster.c is such that this might get called twice when a child
 	 * crashes.  So we don't try to Assert anything about the state.
+	 *
+	 * FIXME: does that still happen?
 	 */
 	result = (PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED);
 	PMSignalState->PMChildFlags[slot] = PM_CHILD_UNUSED;
-	PMChildInUse[slot] = false;
 	return result;
 }
 
@@ -309,7 +264,7 @@ ReleasePostmasterChildSlot(int slot)
 bool
 IsPostmasterChildWalSender(int slot)
 {
-	Assert(slot > 0 && slot <= num_child_inuse);
+	Assert(slot > 0 && slot <= PMSignalState->num_child_flags);
 	slot--;
 
 	if (PMSignalState->PMChildFlags[slot] == PM_CHILD_WALSENDER)
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 37b1c67600..24892d31a5 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -310,14 +310,9 @@ InitProcess(void)
 	/*
 	 * Before we start accessing the shared memory in a serious way, mark
 	 * ourselves as an active postmaster child; this is so that the postmaster
-	 * can detect it if we exit without cleaning up.  (XXX autovac launcher
-	 * currently doesn't participate in this; it probably should.)
-	 *
-	 * Slot sync worker also does not participate in it, see comments atop
-	 * 'struct bkend' in postmaster.c.
+	 * can detect it if we exit without cleaning up.
 	 */
-	if (IsUnderPostmaster && !AmAutoVacuumLauncherProcess() &&
-		!AmLogicalSlotSyncWorkerProcess())
+	if (IsUnderPostmaster)
 		MarkPostmasterChildActive();
 
 	/* Decide which list should supply our PGPROC. */
@@ -535,6 +530,9 @@ InitAuxiliaryProcess(void)
 	if (MyProc != NULL)
 		elog(ERROR, "you already exist");
 
+	if (IsUnderPostmaster)
+		MarkPostmasterChildActive();
+
 	/*
 	 * We use the ProcStructLock to protect assignment and releasing of
 	 * AuxiliaryProcs entries.
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index 63c12917cf..deca2e8370 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -13,8 +13,39 @@
 #ifndef _POSTMASTER_H
 #define _POSTMASTER_H
 
+#include "lib/ilist.h"
 #include "miscadmin.h"
 
+/*
+ * A struct representing an active postmaster child process.  This is used
+ * mainly to keep track of how many children we have and send them appropriate
+ * signals when necessary.  All postmaster child processes are assigned a
+ * PMChild entry. That includes "normal" client sessions, but also autovacuum
+ * workers, walsenders, background workers, and aux processes.  (Note that at
+ * the time of launch, walsenders are labeled B_BACKEND; we relabel them to
+ * B_WAL_SENDER upon noticing they've changed their PMChildFlags entry.  Hence
+ * that check must be done before any operation that needs to distinguish
+ * walsenders from normal backends.)
+ *
+ * "dead_end" children are also allocated a PMChild entry: these are children
+ * launched just for the purpose of sending a friendly rejection message to a
+ * would-be client.  We must track them because they are attached to shared
+ * memory, but we know they will never become live backends.
+ *
+ * 'child_slot' is an identifier that is unique across all running child
+ * processes.  It is used as an index into the PMChildFlags array. dead_end
+ * children are not assigned a child_slot.
+ */
+typedef struct
+{
+	pid_t		pid;			/* process id of backend */
+	int			child_slot;		/* PMChildSlot for this backend, if any */
+	BackendType bkend_type;		/* child process flavor, see above */
+	struct RegisteredBgWorker *rw;	/* bgworker info, if this is a bgworker */
+	bool		bgworker_notify;	/* gets bgworker start/stop notifications */
+	dlist_node	elem;			/* list link in BackendList */
+} PMChild;
+
 /* GUC options */
 extern PGDLLIMPORT bool EnableSSL;
 extern PGDLLIMPORT int SuperuserReservedConnections;
@@ -80,6 +111,15 @@ const char *PostmasterChildName(BackendType child_type);
 extern void SubPostmasterMain(int argc, char *argv[]) pg_attribute_noreturn();
 #endif
 
+/* prototypes for functions in pmchild.c */
+extern dlist_head ActiveChildList;
+
+extern void InitPostmasterChildSlots(void);
+extern PMChild *AssignPostmasterChildSlot(BackendType btype);
+extern bool FreePostmasterChildSlot(PMChild *pmchild);
+extern PMChild *FindPostmasterChildByPid(int pid);
+extern PMChild *AllocDeadEndChild(void);
+
 /*
  * Note: MAX_BACKENDS is limited to 2^18-1 because that's the width reserved
  * for buffer references in buf_internals.h.  This limitation could be lifted
diff --git a/src/include/storage/pmsignal.h b/src/include/storage/pmsignal.h
index 3b9336b83c..2ab198fc31 100644
--- a/src/include/storage/pmsignal.h
+++ b/src/include/storage/pmsignal.h
@@ -70,7 +70,7 @@ extern void SendPostmasterSignal(PMSignalReason reason);
 extern bool CheckPostmasterSignal(PMSignalReason reason);
 extern void SetQuitSignalReason(QuitSignalReason reason);
 extern QuitSignalReason GetQuitSignalReason(void);
-extern int	AssignPostmasterChildSlot(void);
+extern void ReservePostmasterChildSlot(int slot);
 extern bool ReleasePostmasterChildSlot(int slot);
 extern bool IsPostmasterChildWalSender(int slot);
 extern void MarkPostmasterChildActive(void);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8de9978ad8..b43d6eb558 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -230,7 +230,6 @@ BTWriteState
 BUF_MEM
 BYTE
 BY_HANDLE_FILE_INFORMATION
-Backend
 BackendParameters
 BackendStartupData
 BackendState
@@ -1927,6 +1926,7 @@ PLyTransformToOb
 PLyTupleToOb
 PLyUnicode_FromStringAndSize_t
 PLy_elog_impl_t
+PMChild
 PMINIDUMP_CALLBACK_INFORMATION
 PMINIDUMP_EXCEPTION_INFORMATION
 PMINIDUMP_USER_STREAM_INFORMATION
-- 
2.39.2
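
A note on the slot bookkeeping in the patch above. pmsignal.c used to keep its own PMChildInUse[] array and scan it for a free slot. The patch moves allocation into pmchild.c, where AssignPostmasterChildSlot(BackendType) returns a PMChild (or NULL) and FreePostmasterChildSlot() gives it back. pmchild.c itself is not shown in this excerpt, so what follows is only a minimal sketch of one way such a pool can work: a fixed array of entries threaded onto a freelist. All names here (Child, NUM_SLOTS, init_child_slots, assign_child_slot, free_child_slot) are hypothetical. The sketch ignores the BackendType argument and does not touch the shared-memory PMChildFlags array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for PMChild; not the real struct. */
typedef struct Child
{
    pid_t        pid;           /* 0 until the fork succeeds */
    int          child_slot;    /* 1-based, fixed for each array entry */
    bool         in_use;
    struct Child *next_free;    /* freelist link while unused */
} Child;

#define NUM_SLOTS 4             /* illustrative size only */

static Child  slots[NUM_SLOTS];
static Child *freelist = NULL;

/* Put every entry on the freelist, lowest slot number at the head. */
static void
init_child_slots(void)
{
    freelist = NULL;
    for (int i = NUM_SLOTS - 1; i >= 0; i--)
    {
        slots[i].pid = 0;
        slots[i].child_slot = i + 1;
        slots[i].in_use = false;
        slots[i].next_free = freelist;
        freelist = &slots[i];
    }
}

/* Take an entry off the freelist; NULL means every slot is busy. */
static Child *
assign_child_slot(void)
{
    Child *c = freelist;

    if (c == NULL)
        return NULL;
    freelist = c->next_free;
    c->next_free = NULL;
    c->in_use = true;
    c->pid = 0;
    return c;
}

/* Give an entry back, e.g. after a failed fork or when the child exits. */
static void
free_child_slot(Child *c)
{
    c->in_use = false;
    c->pid = 0;
    c->next_free = freelist;
    freelist = c;
}
```

A freelist makes assignment constant-time instead of a scan. The real code must also keep the shared-memory flags in step (ReservePostmasterChildSlot() / ReleasePostmasterChildSlot() in the patch), and this sketch leaves that out.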

v3-0011-Pass-MyPMChildSlot-as-an-explicit-argument-to-chi.patch

From 35ffc568518383110fac9aee6316219d5708ac0d Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 1 Aug 2024 22:58:17 +0300
Subject: [PATCH v3 11/12] Pass MyPMChildSlot as an explicit argument to child
 process

All the other global variables passed from postmaster to child
have the same value in all the processes, while MyPMChildSlot is more
like a parameter to each child process.
---
 src/backend/postmaster/launch_backend.c | 32 ++++++++++++++++---------
 src/backend/postmaster/pmchild.c        |  3 ---
 src/backend/postmaster/postmaster.c     | 16 ++++++-------
 src/backend/postmaster/syslogger.c      |  8 ++++---
 src/include/postmaster/postmaster.h     |  1 +
 src/include/postmaster/syslogger.h      |  2 +-
 6 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/src/backend/postmaster/launch_backend.c b/src/backend/postmaster/launch_backend.c
index b0b91dc97f..4e93bd1d94 100644
--- a/src/backend/postmaster/launch_backend.c
+++ b/src/backend/postmaster/launch_backend.c
@@ -96,7 +96,6 @@ typedef int InheritableSocket;
 typedef struct
 {
 	char		DataDir[MAXPGPATH];
-	int			MyPMChildSlot;
 #ifndef WIN32
 	unsigned long UsedShmemSegID;
 #else
@@ -137,6 +136,8 @@ typedef struct
 	char		my_exec_path[MAXPGPATH];
 	char		pkglib_path[MAXPGPATH];
 
+	int			MyPMChildSlot;
+
 	/*
 	 * These are only used by backend processes, but are here because passing
 	 * a socket needs some special handling on Windows. 'client_sock' is an
@@ -158,13 +159,16 @@ typedef struct
 static void read_backend_variables(char *id, char **startup_data, size_t *startup_data_len);
 static void restore_backend_variables(BackendParameters *param);
 
-static bool save_backend_variables(BackendParameters *param, ClientSocket *client_sock,
+static bool save_backend_variables(BackendParameters *param, int child_slot,
+								   ClientSocket *client_sock,
 #ifdef WIN32
 								   HANDLE childProcess, pid_t childPid,
 #endif
 								   char *startup_data, size_t startup_data_len);
 
-static pid_t internal_forkexec(const char *child_kind, char *startup_data, size_t startup_data_len, ClientSocket *client_sock);
+static pid_t internal_forkexec(const char *child_kind, int child_slot,
+							   char *startup_data, size_t startup_data_len,
+							   ClientSocket *client_sock);
 
 #endif							/* EXEC_BACKEND */
 
@@ -226,7 +230,7 @@ PostmasterChildName(BackendType child_type)
  * the child process.
  */
 pid_t
-postmaster_child_launch(BackendType child_type,
+postmaster_child_launch(BackendType child_type, int child_slot,
 						char *startup_data, size_t startup_data_len,
 						ClientSocket *client_sock)
 {
@@ -235,7 +239,7 @@ postmaster_child_launch(BackendType child_type,
 	Assert(IsPostmasterEnvironment && !IsUnderPostmaster);
 
 #ifdef EXEC_BACKEND
-	pid = internal_forkexec(child_process_kinds[child_type].name,
+	pid = internal_forkexec(child_process_kinds[child_type].name, child_slot,
 							startup_data, startup_data_len, client_sock);
 	/* the child process will arrive in SubPostmasterMain */
 #else							/* !EXEC_BACKEND */
@@ -263,6 +267,7 @@ postmaster_child_launch(BackendType child_type,
 		 */
 		MemoryContextSwitchTo(TopMemoryContext);
 
+		MyPMChildSlot = child_slot;
 		if (client_sock)
 		{
 			MyClientSocket = palloc(sizeof(ClientSocket));
@@ -289,7 +294,8 @@ postmaster_child_launch(BackendType child_type,
  * - fork():s, and then exec():s the child process
  */
 static pid_t
-internal_forkexec(const char *child_kind, char *startup_data, size_t startup_data_len, ClientSocket *client_sock)
+internal_forkexec(const char *child_kind, int child_slot,
+				  char *startup_data, size_t startup_data_len, ClientSocket *client_sock)
 {
 	static unsigned long tmpBackendFileNum = 0;
 	pid_t		pid;
@@ -309,7 +315,7 @@ internal_forkexec(const char *child_kind, char *startup_data, size_t startup_dat
 	 */
 	paramsz = SizeOfBackendParameters(startup_data_len);
 	param = palloc0(paramsz);
-	if (!save_backend_variables(param, client_sock, startup_data, startup_data_len))
+	if (!save_backend_variables(param, child_slot, client_sock, startup_data, startup_data_len))
 	{
 		pfree(param);
 		return -1;				/* log made by save_backend_variables */
@@ -398,7 +404,8 @@ internal_forkexec(const char *child_kind, char *startup_data, size_t startup_dat
  *	 file is complete.
  */
 static pid_t
-internal_forkexec(const char *child_kind, char *startup_data, size_t startup_data_len, ClientSocket *client_sock)
+internal_forkexec(const char *child_kind, int child_slot,
+				  char *startup_data, size_t startup_data_len, ClientSocket *client_sock)
 {
 	int			retry_count = 0;
 	STARTUPINFO si;
@@ -479,7 +486,9 @@ retry:
 		return -1;
 	}
 
-	if (!save_backend_variables(param, client_sock, pi.hProcess, pi.dwProcessId, startup_data, startup_data_len))
+	if (!save_backend_variables(param, child_slot, client_sock,
+								pi.hProcess, pi.dwProcessId,
+								startup_data, startup_data_len))
 	{
 		/*
 		 * log made by save_backend_variables, but we have to clean up the
@@ -691,7 +700,8 @@ static void read_inheritable_socket(SOCKET *dest, InheritableSocket *src);
 
 /* Save critical backend variables into the BackendParameters struct */
 static bool
-save_backend_variables(BackendParameters *param, ClientSocket *client_sock,
+save_backend_variables(BackendParameters *param,
+					   int child_slot, ClientSocket *client_sock,
 #ifdef WIN32
 					   HANDLE childProcess, pid_t childPid,
 #endif
@@ -708,7 +718,7 @@ save_backend_variables(BackendParameters *param, ClientSocket *client_sock,
 
 	strlcpy(param->DataDir, DataDir, MAXPGPATH);
 
-	param->MyPMChildSlot = MyPMChildSlot;
+	param->MyPMChildSlot = child_slot;
 
 #ifdef WIN32
 	param->ShmemProtectiveRegion = ShmemProtectiveRegion;
diff --git a/src/backend/postmaster/pmchild.c b/src/backend/postmaster/pmchild.c
index e86982d6d1..25cccc5514 100644
--- a/src/backend/postmaster/pmchild.c
+++ b/src/backend/postmaster/pmchild.c
@@ -209,9 +209,6 @@ AssignPostmasterChildSlot(BackendType btype)
 
 	ReservePostmasterChildSlot(pmchild->child_slot);
 
-	/* FIXME: find a more elegant way to pass this */
-	MyPMChildSlot = pmchild->child_slot;
-
 	elog(DEBUG2, "assigned pm child slot %d for %s", pmchild->child_slot, PostmasterChildName(btype));
 
 	return pmchild;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a928a04c7a..67035241b3 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -983,7 +983,7 @@ PostmasterMain(int argc, char *argv[])
 	SysLoggerPMChild = AssignPostmasterChildSlot(B_LOGGER);
 	if (!SysLoggerPMChild)
 		elog(ERROR, "no postmaster child slot available for syslogger");
-	SysLoggerPMChild->pid = SysLogger_Start();
+	SysLoggerPMChild->pid = SysLogger_Start(SysLoggerPMChild->child_slot);
 	if (SysLoggerPMChild->pid == 0)
 	{
 		FreePostmasterChildSlot(SysLoggerPMChild);
@@ -2414,7 +2414,7 @@ process_pm_child_exit(void)
 		if (SysLoggerPMChild && pid == SysLoggerPMChild->pid)
 		{
 			/* for safety's sake, launch new logger *first* */
-			SysLoggerPMChild->pid = SysLogger_Start();
+			SysLoggerPMChild->pid = SysLogger_Start(SysLoggerPMChild->child_slot);
 			if (SysLoggerPMChild->pid == 0)
 			{
 				FreePostmasterChildSlot(SysLoggerPMChild);
@@ -3039,7 +3039,7 @@ LaunchMissingBackgroundProcesses(void)
 			elog(LOG, "no postmaster child slot available for syslogger");
 		else
 		{
-			SysLoggerPMChild->pid = SysLogger_Start();
+			SysLoggerPMChild->pid = SysLogger_Start(SysLoggerPMChild->child_slot);
 			if (SysLoggerPMChild->pid == 0)
 			{
 				FreePostmasterChildSlot(SysLoggerPMChild);
@@ -3311,8 +3311,7 @@ BackendStartup(ClientSocket *client_sock)
 	/* Hasn't asked to be notified about any bgworkers yet */
 	bn->bgworker_notify = false;
 
-	MyPMChildSlot = bn->child_slot;
-	pid = postmaster_child_launch(bn->bkend_type,
+	pid = postmaster_child_launch(bn->bkend_type, bn->child_slot,
 								  (char *) &startup_data, sizeof(startup_data),
 								  client_sock);
 	if (pid < 0)
@@ -3636,8 +3635,7 @@ StartChildProcess(BackendType type)
 		return NULL;
 	}
 
-	MyPMChildSlot = pmchild->child_slot;
-	pid = postmaster_child_launch(type, NULL, 0, NULL);
+	pid = postmaster_child_launch(type, pmchild->child_slot, NULL, 0, NULL);
 	if (pid < 0)
 	{
 		/* in parent, fork failed */
@@ -3792,8 +3790,8 @@ do_start_bgworker(RegisteredBgWorker *rw)
 			(errmsg_internal("starting background worker process \"%s\"",
 							 rw->rw_worker.bgw_name)));
 
-	MyPMChildSlot = bn->child_slot;
-	worker_pid = postmaster_child_launch(B_BG_WORKER, (char *) &rw->rw_worker, sizeof(BackgroundWorker), NULL);
+	worker_pid = postmaster_child_launch(B_BG_WORKER, bn->child_slot,
+										 (char *) &rw->rw_worker, sizeof(BackgroundWorker), NULL);
 	if (worker_pid == -1)
 	{
 		/* in postmaster, fork failed ... */
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 7951599fa8..d68853d429 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -590,7 +590,7 @@ SysLoggerMain(char *startup_data, size_t startup_data_len)
  * Postmaster subroutine to start a syslogger subprocess.
  */
 int
-SysLogger_Start(void)
+SysLogger_Start(int child_slot)
 {
 	pid_t		sysloggerPid;
 	char	   *filename;
@@ -699,9 +699,11 @@ SysLogger_Start(void)
 	startup_data.syslogFile = syslogger_fdget(syslogFile);
 	startup_data.csvlogFile = syslogger_fdget(csvlogFile);
 	startup_data.jsonlogFile = syslogger_fdget(jsonlogFile);
-	sysloggerPid = postmaster_child_launch(B_LOGGER, (char *) &startup_data, sizeof(startup_data), NULL);
+	sysloggerPid = postmaster_child_launch(B_LOGGER, child_slot,
+										   (char *) &startup_data, sizeof(startup_data), NULL);
 #else
-	sysloggerPid = postmaster_child_launch(B_LOGGER, NULL, 0, NULL);
+	sysloggerPid = postmaster_child_launch(B_LOGGER, child_slot,
+										   NULL, 0, NULL);
 #endif							/* EXEC_BACKEND */
 
 	if (sysloggerPid == -1)
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index deca2e8370..81a3520021 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -103,6 +103,7 @@ extern PGDLLIMPORT struct ClientSocket *MyClientSocket;
 
 /* prototypes for functions in launch_backend.c */
 extern pid_t postmaster_child_launch(BackendType child_type,
+									 int child_slot,
 									 char *startup_data,
 									 size_t startup_data_len,
 									 struct ClientSocket *client_sock);
diff --git a/src/include/postmaster/syslogger.h b/src/include/postmaster/syslogger.h
index b5fc239ba9..d72b978b0a 100644
--- a/src/include/postmaster/syslogger.h
+++ b/src/include/postmaster/syslogger.h
@@ -86,7 +86,7 @@ extern PGDLLIMPORT HANDLE syslogPipe[2];
 #endif
 
 
-extern int	SysLogger_Start(void);
+extern int	SysLogger_Start(int child_slot);
 
 extern void write_syslogger_file(const char *buffer, int count, int destination);
 
-- 
2.39.2
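
The v3-0011 patch above stops routing the slot number through the MyPMChildSlot global on the postmaster side. postmaster_child_launch() and internal_forkexec() take child_slot as an explicit argument. In the EXEC_BACKEND case, save_backend_variables() copies it into BackendParameters so the child can restore it. Below is a minimal sketch of that round trip. The names (LaunchParams, save_params, restore_params, my_child_slot) are hypothetical, and a plain memcpy stands in for writing and reading the temporary parameter file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down analogue of BackendParameters. */
typedef struct
{
    char        data_dir[64];   /* same value for every child */
    int         my_child_slot;  /* differs per child */
} LaunchParams;

/* Child-side global, analogous to MyPMChildSlot. */
static int my_child_slot = 0;

/* Parent side: the slot comes in as an argument, not from a global. */
static void
save_params(LaunchParams *p, const char *data_dir, int child_slot)
{
    memset(p, 0, sizeof(*p));
    snprintf(p->data_dir, sizeof(p->data_dir), "%s", data_dir);
    p->my_child_slot = child_slot;
}

/* Child side: restore the per-child value into the global. */
static void
restore_params(const LaunchParams *p)
{
    my_child_slot = p->my_child_slot;
}
```

The parent never has to assign its own copy of the global just before launching, so there is no window in which a stale value could leak into the next child.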

Thomas Munro

thomas.munro@gmail.com

over 1 year ago

In reply to: Heikki Linnakangas (#3)

Re: Refactoring postmaster's code to cleanup after child exit

On Fri, Aug 2, 2024 at 11:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

* v3-0001-Make-BackgroundWorkerList-doubly-linked.patch

LGTM.

[v3-0002-Refactor-code-to-handle-death-of-a-backend-or-bgw.patch]

Currently, when a child process exits, the postmaster first scans
through BackgroundWorkerList, to see if the child process was a
background worker. If not found, then it scans through BackendList to
see if it was a regular backend. That leads to some duplication
between the bgworker and regular backend cleanup code, as both have an
entry in the BackendList that needs to be cleaned up in the same way.
Refactor that so that we scan just the BackendList to find the child
process, and if it was a background worker, do the bgworker-specific
cleanup in addition to the normal Backend cleanup.

Makes sense.
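
The v3-0002 refactoring works like this: the postmaster looks up the exiting PID once, in the single list of all children, and does worker-specific cleanup only if that entry is a background worker. Here is a minimal sketch of that shape using a hypothetical singly linked list. Child, Worker, cleanup_child and the ExitKind values are illustrative names, not the postmaster's. Treating any non-zero status as a crash is a simplification of the real bgworker rules.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

typedef struct Worker
{
    const char *name;
    bool        crashed;        /* set when it must be restarted later */
} Worker;

typedef struct Child
{
    pid_t       pid;
    Worker     *rw;             /* non-NULL only for background workers */
    struct Child *next;
} Child;

typedef enum { EXIT_UNTRACKED, EXIT_BACKEND, EXIT_BGWORKER } ExitKind;

/* One pass over one list; bgworker cleanup is an extra step, not a second scan. */
static ExitKind
cleanup_child(Child **head, pid_t pid, int exitstatus)
{
    for (Child **link = head; *link != NULL; link = &(*link)->next)
    {
        Child *c = *link;

        if (c->pid != pid)
            continue;

        *link = c->next;        /* common cleanup: unlink the entry */
        if (c->rw != NULL)
        {
            /* bgworker-specific extra step (simplified crash rule) */
            c->rw->crashed = (exitstatus != 0);
            return EXIT_BGWORKER;
        }
        return EXIT_BACKEND;
    }
    return EXIT_UNTRACKED;      /* not found: an "untracked child process" */
}
```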

On Windows, if a child process exits with ERROR_WAIT_NO_CHILDREN, it's
now logged with that exit code, instead of 0. Also, if a bgworker
exits with ERROR_WAIT_NO_CHILDREN, it's now treated as crashed and is
restarted. Previously it was treated as a normal exit.

Interesting. So when that error was first specially handled in this thread:

/messages/by-id/AANLkTimCTkNKKrHCd3Ot6kAsrSS7SeDpOTcaLsEP7i+M@mail.gmail.com

... it went from being considered a crash, to being considered like
exit(0). It's true that shared memory can't be corrupted by a process
that never enters main(), but it's better not to hide the true reason
for the failure (if it is still possible -- I don't find many
references to that phenomenon in recent times). Clobbering exitstatus
with 0 doesn't seem right at all, now that we have background workers
whose restart behaviour is affected by that.
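
Here is the behavioural change in concrete terms. Before the patch, the Windows-only special case rewrote an ERROR_WAIT_NO_CHILDREN exit status to 0, so a background worker that died that way looked like a clean exit. Now the status is kept and the worker counts as crashed. The tiny sketch below contrasts the two rules. The constant and all three functions are hypothetical: the constant is a placeholder value, not the real Windows code. The sketch keeps only the rule stated in the commit message (non-zero means crashed), not PostgreSQL's full bgworker exit-code semantics.

```c
#include <stdbool.h>

/* Placeholder standing in for ERROR_WAIT_NO_CHILDREN; value chosen for illustration. */
#define SKETCH_WAIT_NO_CHILDREN 128

/* Old rule: the special status was clobbered to 0 before anyone looked at it. */
static int
old_normalize_status(int exitstatus)
{
    return exitstatus == SKETCH_WAIT_NO_CHILDREN ? 0 : exitstatus;
}

/* New rule: the status is passed through unchanged. */
static int
new_normalize_status(int exitstatus)
{
    return exitstatus;
}

/* Simplified restart decision: any non-zero status counts as a crash. */
static bool
treated_as_crash(int exitstatus)
{
    return exitstatus != 0;
}
```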

If a child process is not found in the BackendList, the log message
now calls it "untracked child process" rather than "server process".
Arguably that should be a PANIC, because we do track all the child
processes in the list, so failing to find a child process is highly
unexpected. But if we want to change that, let's discuss and do that
as a separate commit.

Yeah, it would be highly unexpected if waitpid() told you about some
random other process (or we screwed up the bookkeeping and didn't
recognise it). So at least having a different message seems good.

* v3-0003-Fix-comment-on-processes-being-kept-over-a-restar.patch

* v3-0004-Consolidate-postmaster-code-to-launch-background-.patch

Much of the code in process_pm_child_exit() to launch replacement
processes when one exits or when progressing to next postmaster state
was unnecessary, because the ServerLoop will launch any missing
background processes anyway. Remove the redundant code and let
ServerLoop handle it.

+1, makes sense.

In ServerLoop, move the code to launch all the processes to a new
subroutine, to group it all together.

+1, makes sense.
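
The v3-0004 consolidation relies on launching being idempotent. On each ServerLoop iteration, the postmaster checks every auxiliary process the current state calls for and starts only those whose handle is NULL. As a result, process_pm_child_exit() only has to clear the handle. Below is a minimal sketch of that pattern. The names (SketchChild, start_child, launch_missing, checkpointer_child, archiver_child) are hypothetical, and a boolean stands in for the real pmState checks.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { int id; } SketchChild;

static SketchChild pool[8];
static int n_started = 0;

/* Hypothetical launcher; a real one would fork and could fail (return NULL). */
static SketchChild *
start_child(void)
{
    if (n_started >= (int) (sizeof(pool) / sizeof(pool[0])))
        return NULL;
    pool[n_started].id = n_started;
    return &pool[n_started++];
}

static SketchChild *checkpointer_child = NULL;
static SketchChild *archiver_child = NULL;

/* Called on every loop iteration: start whatever is wanted but not running. */
static void
launch_missing(bool want_archiver)
{
    if (checkpointer_child == NULL)
        checkpointer_child = start_child();
    if (want_archiver && archiver_child == NULL)
        archiver_child = start_child();
}
```

If start_child() fails, the handle stays NULL and the next iteration simply tries again. That is one reason this design needs no separate relaunch logic in the exit handler.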

More soon...

Heikki Linnakangas

hlinnaka@iki.fi

over 1 year ago

In reply to: Thomas Munro (#4)

Re: Refactoring postmaster's code to cleanup after child exit

On 08/08/2024 13:47, Thomas Munro wrote:

On Windows, if a child process exits with ERROR_WAIT_NO_CHILDREN, it's
now logged with that exit code, instead of 0. Also, if a bgworker
exits with ERROR_WAIT_NO_CHILDREN, it's now treated as crashed and is
restarted. Previously it was treated as a normal exit.

Interesting. So when that error was first specially handled in this thread:

/messages/by-id/AANLkTimCTkNKKrHCd3Ot6kAsrSS7SeDpOTcaLsEP7i+M@mail.gmail.com

... it went from being considered a crash, to being considered like
exit(0). It's true that shared memory can't be corrupted by a process
that never enters main(), but it's better not to hide the true reason
for the failure (if it is still possible -- I don't find many
references to that phenomenon in recent times). Clobbering exitstatus
with 0 doesn't seem right at all, now that we have background workers
whose restart behaviour is affected by that.

I adjusted this ERROR_WAIT_NO_CHILDREN a little more, to avoid logging
the death of the child twice in some cases.

* v3-0003-Fix-comment-on-processes-being-kept-over-a-restar.patch

+1

Committed the patches up to and including this one, with tiny comment
changes.

* v3-0004-Consolidate-postmaster-code-to-launch-background-.patch

Much of the code in process_pm_child_exit() to launch replacement
processes when one exits or when progressing to next postmaster state
was unnecessary, because the ServerLoop will launch any missing
background processes anyway. Remove the redundant code and let
ServerLoop handle it.

I'm going to work a little more on the comments on this one before
committing; I had just moved all the "If we have lost the XXX, try to
start a new one" comments as is, but they look pretty repetitive now.

Thanks for the review!

--
Heikki Linnakangas
Neon (https://neon.tech)

Heikki Linnakangas

hlinnaka@iki.fi

over 1 year ago

In reply to: Heikki Linnakangas (#5)

7 attachment(s)

Re: Refactoring postmaster's code to cleanup after child exit

On 10/08/2024 00:13, Heikki Linnakangas wrote:

On 08/08/2024 13:47, Thomas Munro wrote:

* v3-0004-Consolidate-postmaster-code-to-launch-background-.patch

     Much of the code in process_pm_child_exit() to launch replacement
     processes when one exits or when progressing to next postmaster state
     was unnecessary, because the ServerLoop will launch any missing
     background processes anyway. Remove the redundant code and let
     ServerLoop handle it.

I'm going to work a little more on the comments on this one before
committing; I had just moved all the "If we have lost the XXX, try to
start a new one" comments as is, but they look pretty repetitive now.

Pushed this now, after adjusting the comments a bit. Thanks again for
the review!

Here are the remaining patches, rebased.

commit a1c43d65907d20a999b203e465db1277ec842a0a
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu Aug 1 17:24:12 2024 +0300

Introduce a separate BackendType for dead-end children

And replace postmaster.c's own "backend type" codes with BackendType

XXX: While working on this, many times I accidentally did something
like "foo |= B_SOMETHING" instead of "foo |= 1 << B_SOMETHING", when
constructing arguments to SignalSomeChildren or CountChildren, and
things broke in very subtle ways taking a long time to debug. The old
constants that were already bitmasks avoided that. Maybe we need some
macro magic or something to make this less error-prone.
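The failure mode in the XXX note is easy to reproduce in isolation. The sketch below uses a hypothetical, trimmed ToyBackendType enum and a CountChildren()-style counter. Passing a bare enum value where a mask is expected compiles without complaint, but it selects a different type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-in for BackendType. */
typedef enum ToyBackendType
{
	TOY_INVALID = 0,
	TOY_BACKEND,
	TOY_DEAD_END_BACKEND,
	TOY_AUTOVAC_WORKER,
	TOY_BG_WORKER,
	TOY_WAL_SENDER,
	TOY_NUM_TYPES
} ToyBackendType;

/* One-bit mask for a type; this explicit shift is the step that is easy to forget. */
#define TOY_MASK(t) ((uint32_t) 1 << (t))

static const ToyBackendType children[] = {
	TOY_BACKEND, TOY_BACKEND, TOY_WAL_SENDER, TOY_DEAD_END_BACKEND, TOY_BG_WORKER
};

/* Count children whose type bit is set in target_mask. */
static int
count_children(uint32_t target_mask)
{
	int			cnt = 0;

	for (size_t i = 0; i < sizeof(children) / sizeof(children[0]); i++)
	{
		assert(children[i] < TOY_NUM_TYPES);
		if (target_mask & TOY_MASK(children[i]))
			cnt++;
	}
	return cnt;
}
```

count_children(TOY_BACKEND) passes mask 0x1, which selects TOY_INVALID and returns 0 instead of 2. count_children(TOY_DEAD_END_BACKEND) passes 0x2, which silently counts the two regular backends.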

While rebasing this today, I spotted another instance of that mistake
mentioned in the XXX comment above. I called "CountChildren(B_BACKEND)"
instead of "CountChildren(1 << B_BACKEND)". Some ideas on how to make
that less error-prone:

1. Add a separate typedef for the bitmasks, and macros/functions to work
with it. Something like:

typedef struct {
	uint32 mask;
} BackendTypeMask;

static const BackendTypeMask BTMASK_ALL = { 0xffffffff };
static const BackendTypeMask BTMASK_NONE = { 0 };

static inline BackendTypeMask
BTMASK_ADD(BackendTypeMask mask, BackendType t)
{
	mask.mask |= 1 << t;
	return mask;
}

static inline BackendTypeMask
BTMASK_DEL(BackendTypeMask mask, BackendType t)
{
	mask.mask &= ~(1 << t);
	return mask;
}

Now the compiler will complain if you try to pass a BackendType for the
mask. We could do this just for BackendType, or we could put this in
src/include/lib/ with a more generic name, like "bitmask_u32".

2. Another idea is to redefine the BackendType values to be separate
bits, like the current BACKEND_TYPE_* values in postmaster.c:

typedef enum BackendType
{
	B_INVALID = 0,

	/* Backends and other backend-like processes */
	B_BACKEND = 1 << 1,
	B_DEAD_END_BACKEND = 1 << 2,
	B_AUTOVAC_LAUNCHER = 1 << 3,
	B_AUTOVAC_WORKER = 1 << 4,

	...
} BackendType;

Then you can use | and & on BackendTypes directly. It makes it less
clear which function arguments are a BackendType and which are a
bitmask, however.
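Idea 1 can be expanded into a compilable sketch with a "contains" helper and a consumer that only accepts the wrapper type. The names below (SkBackendType, SkBackendTypeMask, sk_btmask_*, sk_count_children) are hypothetical. The shifts use an unsigned constant, since 1 << 31 on a 32-bit int is undefined behaviour in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-in for BackendType. */
typedef enum
{
	SK_INVALID = 0,
	SK_BACKEND,
	SK_DEAD_END_BACKEND,
	SK_WAL_SENDER,
	SK_BG_WORKER,
} SkBackendType;

/* Wrapping the bits in a struct stops a bare enum from being passed as a mask. */
typedef struct
{
	uint32_t	mask;
} SkBackendTypeMask;

static const SkBackendTypeMask SK_BTMASK_NONE = {0};
static const SkBackendTypeMask SK_BTMASK_ALL = {0xffffffff};

static inline SkBackendTypeMask
sk_btmask_add(SkBackendTypeMask m, SkBackendType t)
{
	assert((unsigned) t < 32);
	m.mask |= (uint32_t) 1 << t;	/* unsigned shift: defined for bit 31 */
	return m;
}

static inline SkBackendTypeMask
sk_btmask_del(SkBackendTypeMask m, SkBackendType t)
{
	assert((unsigned) t < 32);
	m.mask &= ~((uint32_t) 1 << t);
	return m;
}

static inline bool
sk_btmask_contains(SkBackendTypeMask m, SkBackendType t)
{
	return (m.mask & ((uint32_t) 1 << t)) != 0;
}

/* A CountChildren()-style consumer that only accepts the wrapper type. */
static int
sk_count_children(SkBackendTypeMask target, const SkBackendType *types, int n)
{
	int			cnt = 0;

	for (int i = 0; i < n; i++)
		if (sk_btmask_contains(target, types[i]))
			cnt++;
	return cnt;
}
```

A call like sk_count_children(SK_BACKEND, ...) is rejected at compile time, because an enum value does not convert to a struct. The enum itself keeps dense values, so it can still index per-type arrays.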

Thoughts, other ideas?

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v4-0001-Add-test-for-connection-limits.patch

From 57216e6203deb99bed7a9cc5ab1b07bbdcf808cc Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Aug 2024 10:55:05 +0300
Subject: [PATCH v4 1/8] Add test for connection limits

---
 src/test/Makefile                             |  2 +-
 src/test/meson.build                          |  1 +
 src/test/postmaster/Makefile                  | 23 ++++++
 src/test/postmaster/README                    | 27 +++++++
 src/test/postmaster/meson.build               | 12 +++
 .../postmaster/t/001_connection_limits.pl     | 79 +++++++++++++++++++
 6 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100644 src/test/postmaster/Makefile
 create mode 100644 src/test/postmaster/README
 create mode 100644 src/test/postmaster/meson.build
 create mode 100644 src/test/postmaster/t/001_connection_limits.pl

diff --git a/src/test/Makefile b/src/test/Makefile
index dbd3192874..abdd6e5a98 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,7 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl postmaster regress isolation modules authentication recovery subscription
 
 ifeq ($(with_icu),yes)
 SUBDIRS += icu
diff --git a/src/test/meson.build b/src/test/meson.build
index c3d0dfedf1..67376e4b7f 100644
--- a/src/test/meson.build
+++ b/src/test/meson.build
@@ -4,6 +4,7 @@ subdir('regress')
 subdir('isolation')
 
 subdir('authentication')
+subdir('postmaster')
 subdir('recovery')
 subdir('subscription')
 subdir('modules')
diff --git a/src/test/postmaster/Makefile b/src/test/postmaster/Makefile
new file mode 100644
index 0000000000..dfcce9c9ee
--- /dev/null
+++ b/src/test/postmaster/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/postmaster
+#
+# Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/postmaster/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/postmaster
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean:
+	rm -rf tmp_check
diff --git a/src/test/postmaster/README b/src/test/postmaster/README
new file mode 100644
index 0000000000..7e47bf5cff
--- /dev/null
+++ b/src/test/postmaster/README
@@ -0,0 +1,27 @@
+src/test/postmaster/README
+
+Regression tests for postmaster
+===============================
+
+This directory contains a test suite for postmaster's handling of
+connections, connection limits, and startup/shutdown sequence.
+
+
+Running the tests
+=================
+
+NOTE: You must have given the --enable-tap-tests argument to configure.
+
+Run
+    make check
+or
+    make installcheck
+You can use "make installcheck" if you previously did "make install".
+In that case, the code in the installation tree is tested.  With
+"make check", a temporary installation tree is built from the current
+sources and then tested.
+
+Either way, this test initializes, starts, and stops a test Postgres
+cluster.
+
+See src/test/perl/README for more info about running these tests.
diff --git a/src/test/postmaster/meson.build b/src/test/postmaster/meson.build
new file mode 100644
index 0000000000..c2de2e0eb5
--- /dev/null
+++ b/src/test/postmaster/meson.build
@@ -0,0 +1,12 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+tests += {
+  'name': 'postmaster',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'tap': {
+    'tests': [
+      't/001_connection_limits.pl',
+    ],
+  },
+}
diff --git a/src/test/postmaster/t/001_connection_limits.pl b/src/test/postmaster/t/001_connection_limits.pl
new file mode 100644
index 0000000000..f50aae4949
--- /dev/null
+++ b/src/test/postmaster/t/001_connection_limits.pl
@@ -0,0 +1,79 @@
+
+# Copyright (c) 2021-2024, PostgreSQL Global Development Group
+
+# Test connection limits, i.e. max_connections, reserved_connections
+# and superuser_reserved_connections.
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize the server with specific low connection limits
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init;
+$node->append_conf('postgresql.conf', "max_connections = 6");
+$node->append_conf('postgresql.conf', "reserved_connections = 2");
+$node->append_conf('postgresql.conf', "superuser_reserved_connections = 1");
+$node->append_conf('postgresql.conf', "log_connections = on");
+$node->start;
+
+$node->safe_psql(
+	'postgres', qq{
+CREATE USER regress_regular LOGIN;
+CREATE USER regress_reserved LOGIN;
+GRANT pg_use_reserved_connections TO regress_reserved;
+CREATE USER regress_superuser LOGIN SUPERUSER;
+});
+
+# With the limits we set in postgresql.conf, we can establish:
+# - 3 connections for any user with no special privileges
+# - 2 more connections for users belonging to "pg_use_reserved_connections"
+# - 1 more connection for superuser
+
+sub background_psql_as_user
+{
+	my $user = shift;
+
+	return $node->background_psql(
+		'postgres',
+		on_error_die => 1,
+		extra_params => [ '-U', $user ]);
+}
+
+my @sessions = ();
+
+push(@sessions, background_psql_as_user('regress_regular'));
+push(@sessions, background_psql_as_user('regress_regular'));
+push(@sessions, background_psql_as_user('regress_regular'));
+$node->connect_fails(
+	"dbname=postgres user=regress_regular",
+	"reserved_connections limit",
+	expected_stderr =>
+	  qr/FATAL:  remaining connection slots are reserved for roles with privileges of the "pg_use_reserved_connections" role/
+);
+
+push(@sessions, background_psql_as_user('regress_reserved'));
+push(@sessions, background_psql_as_user('regress_reserved'));
+$node->connect_fails(
+	"dbname=postgres user=regress_regular",
+	"reserved_connections limit",
+	expected_stderr =>
+	  qr/FATAL:  remaining connection slots are reserved for roles with the SUPERUSER attribute/
+);
+
+push(@sessions, background_psql_as_user('regress_superuser'));
+$node->connect_fails(
+	"dbname=postgres user=regress_superuser",
+	"superuser_reserved_connections limit",
+	expected_stderr => qr/FATAL:  sorry, too many clients already/);
+
+# TODO: test that query cancellation is still possible
+
+foreach my $session (@sessions)
+{
+	$session->quit;
+}
+
+done_testing();
-- 
2.39.2
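The comment in 001_connection_limits.pl gives the slot budget for max_connections = 6: 3 slots for anyone, 2 more for pg_use_reserved_connections members, and 1 more for superusers. The sketch below shows the admission arithmetic that the three connect_fails() checks rely on. admit() and RoleClass are hypothetical, and the sketch models only the thresholds, not how the server actually counts free slots.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical role classes for the sketch. */
typedef enum { ROLE_REGULAR, ROLE_RESERVED, ROLE_SUPERUSER } RoleClass;

/*
 * in_use is the number of sessions already connected before the new one.
 * Returns whether a new session of the given class would be admitted.
 */
static bool
admit(RoleClass role, int in_use, int max_conn, int reserved, int su_reserved)
{
	int			free_slots = max_conn - in_use;

	assert(in_use >= 0);
	if (free_slots <= 0)
		return false;			/* "sorry, too many clients already" */
	if (role == ROLE_SUPERUSER)
		return true;
	if (free_slots <= su_reserved)
		return false;			/* reserved for roles with SUPERUSER */
	if (role == ROLE_RESERVED)
		return true;
	/* regular users must leave both reserved pools untouched */
	return free_slots > su_reserved + reserved;
}
```

With (6, 2, 1), a fourth regular session is refused, the fourth and fifth sessions succeed for a reserved role, a sixth session is refused for a regular role with the SUPERUSER message, and a seventh is refused even for a superuser. This matches the order of checks in the test.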

v4-0002-Add-test-for-dead-end-backends.patch

From 93b9e9b6e072f63af9009e0d66ab6d0d62ea8c15 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Aug 2024 10:55:11 +0300
Subject: [PATCH v4 2/8] Add test for dead-end backends

The code path for launching a dead-end backend because we're out of
slots was not covered by any tests, so add one. (Some tests did hit
the case of launching a dead-end backend because the server is still
starting up, though, so the gap in our test coverage wasn't as big as
it sounds.)
---
 src/test/perl/PostgreSQL/Test/Cluster.pm      | 39 +++++++++++++++++++
 .../postmaster/t/001_connection_limits.pl     | 17 +++++++-
 2 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 32ee98aebc..6d09f9c5f8 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -104,6 +104,7 @@ use File::Path qw(rmtree mkpath);
 use File::Spec;
 use File::stat qw(stat);
 use File::Temp ();
+use IO::Socket::INET;
 use IPC::Run;
 use PostgreSQL::Version;
 use PostgreSQL::Test::RecursiveCopy;
@@ -284,6 +285,44 @@ sub connstr
 	return "port=$pgport host=$pghost dbname='$dbname'";
 }
 
+=pod
+
+=item $node->raw_connect()
+
+Open a raw TCP or Unix domain socket connection to the server. This is
+used by low-level protocol and connection limit tests.
+
+=cut
+
+sub raw_connect
+{
+	my ($self) = @_;
+	my $pgport = $self->port;
+	my $pghost = $self->host;
+
+	my $socket;
+	if ($PostgreSQL::Test::Utils::use_unix_sockets)
+	{
+		require IO::Socket::UNIX;
+		my $path = "$pghost/.s.PGSQL.$pgport";
+
+		$socket = IO::Socket::UNIX->new(
+			Type => SOCK_STREAM(),
+			Peer => $path,
+		) or die "Cannot create socket - $IO::Socket::errstr\n";
+	}
+	else
+	{
+		$socket = IO::Socket::INET->new(
+			PeerHost => $pghost,
+			PeerPort => $pgport,
+			Proto => 'tcp'
+		) or die "Cannot create socket - $IO::Socket::errstr\n";
+	}
+	return $socket;
+}
+
+
 =pod
 
 =item $node->group_access()
diff --git a/src/test/postmaster/t/001_connection_limits.pl b/src/test/postmaster/t/001_connection_limits.pl
index f50aae4949..3547b28bdd 100644
--- a/src/test/postmaster/t/001_connection_limits.pl
+++ b/src/test/postmaster/t/001_connection_limits.pl
@@ -43,6 +43,7 @@ sub background_psql_as_user
 }
 
 my @sessions = ();
+my @raw_connections = ();
 
 push(@sessions, background_psql_as_user('regress_regular'));
 push(@sessions, background_psql_as_user('regress_regular'));
@@ -69,11 +70,25 @@ $node->connect_fails(
 	"superuser_reserved_connections limit",
 	expected_stderr => qr/FATAL:  sorry, too many clients already/);
 
-# TODO: test that query cancellation is still possible
+# We can still open TCP (or Unix domain socket) connections, but
+# beyond a certain number (roughly 2x max_connections), they will be
+# "dead-end backends".
+for (my $i = 0; $i <= 20; $i++)
+{
+	push(@raw_connections, $node->raw_connect());
+}
+
+# TODO: test that query cancellation is still possible. A dead-end
+# backend can process a query cancellation packet.
 
+# Clean up
 foreach my $session (@sessions)
 {
 	$session->quit;
 }
+foreach my $socket (@raw_connections)
+{
+	$socket->close();
+}
 
 done_testing();
-- 
2.39.2
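raw_connect() in patch 0002 opens a socket without sending a startup packet, so each connection still occupies a postmaster child. That is what lets the test push the server into launching dead-end backends. Below is a C equivalent for the Unix-domain case, assuming the usual "<socket dir>/.s.PGSQL.<port>" naming. pg_socket_path() and raw_connect_unix() are hypothetical helpers, not libpq functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Build "<dir>/.s.PGSQL.<port>"; returns 0 on success, -1 if it does not fit. */
static int
pg_socket_path(char *buf, size_t buflen, const char *dir, int port)
{
	int			n = snprintf(buf, buflen, "%s/.s.PGSQL.%d", dir, port);

	return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}

/* Open a raw stream connection, sending nothing; returns the fd, or -1. */
static int
raw_connect_unix(const char *dir, int port)
{
	struct sockaddr_un addr;
	int			fd;

	assert(dir != NULL);
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	if (pg_socket_path(addr.sun_path, sizeof(addr.sun_path), dir, port) != 0)
		return -1;

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) != 0)
	{
		close(fd);
		return -1;
	}
	return fd;					/* caller closes it, as the test does */
}
```

Like the Perl helper, this fails hard rather than retrying. A test that needs many connections would collect the descriptors and close them at the end.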

v4-0003-Use-an-shmem_exit-callback-to-remove-backend-from.patch

From 88287a2db95e584018f1c7fa9e992feb7d179ce8 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Aug 2024 10:58:35 +0300
Subject: [PATCH v4 3/8] Use an shmem_exit callback to remove backend from
 PMChildFlags on exit

This seems nicer than having to duplicate the logic between
InitProcess() and ProcKill() for which child processes have a
PMChildFlags slot.

Move the MarkPostmasterChildActive() call earlier in InitProcess(),
out of the section protected by the spinlock.
---
 src/backend/storage/ipc/pmsignal.c | 10 ++++++--
 src/backend/storage/lmgr/proc.c    | 38 ++++++++++--------------------
 src/include/storage/pmsignal.h     |  1 -
 3 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/src/backend/storage/ipc/pmsignal.c b/src/backend/storage/ipc/pmsignal.c
index 27844b46a2..cb99e77476 100644
--- a/src/backend/storage/ipc/pmsignal.c
+++ b/src/backend/storage/ipc/pmsignal.c
@@ -24,6 +24,7 @@
 #include "miscadmin.h"
 #include "postmaster/postmaster.h"
 #include "replication/walsender.h"
+#include "storage/ipc.h"
 #include "storage/pmsignal.h"
 #include "storage/shmem.h"
 #include "utils/memutils.h"
@@ -121,6 +122,8 @@ postmaster_death_handler(SIGNAL_ARGS)
 
 #endif							/* USE_POSTMASTER_DEATH_SIGNAL */
 
+static void MarkPostmasterChildInactive(int code, Datum arg);
+
 /*
  * PMSignalShmemSize
  *		Compute space needed for pmsignal.c's shared memory
@@ -328,6 +331,9 @@ MarkPostmasterChildActive(void)
 	slot--;
 	Assert(PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED);
 	PMSignalState->PMChildFlags[slot] = PM_CHILD_ACTIVE;
+
+	/* Arrange to clean up at exit. */
+	on_shmem_exit(MarkPostmasterChildInactive, 0);
 }
 
 /*
@@ -352,8 +358,8 @@ MarkPostmasterChildWalSender(void)
  * MarkPostmasterChildInactive - mark a postmaster child as done using
  * shared memory.  This is called in the child process.
  */
-void
-MarkPostmasterChildInactive(void)
+static void
+MarkPostmasterChildInactive(int code, Datum arg)
 {
 	int			slot = MyPMChildSlot;
 
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index ac66da8638..9536469e89 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -308,6 +308,19 @@ InitProcess(void)
 	if (MyProc != NULL)
 		elog(ERROR, "you already exist");
 
+	/*
+	 * Before we start accessing the shared memory in a serious way, mark
+	 * ourselves as an active postmaster child; this is so that the postmaster
+	 * can detect it if we exit without cleaning up.  (XXX autovac launcher
+	 * currently doesn't participate in this; it probably should.)
+	 *
+	 * Slot sync worker also does not participate in it, see comments atop
+	 * 'struct bkend' in postmaster.c.
+	 */
+	if (IsUnderPostmaster && !AmAutoVacuumLauncherProcess() &&
+		!AmLogicalSlotSyncWorkerProcess())
+		MarkPostmasterChildActive();
+
 	/* Decide which list should supply our PGPROC. */
 	if (AmAutoVacuumLauncherProcess() || AmAutoVacuumWorkerProcess())
 		procgloballist = &ProcGlobal->autovacFreeProcs;
@@ -360,19 +373,6 @@ InitProcess(void)
 	 */
 	Assert(MyProc->procgloballist == procgloballist);
 
-	/*
-	 * Now that we have a PGPROC, mark ourselves as an active postmaster
-	 * child; this is so that the postmaster can detect it if we exit without
-	 * cleaning up.  (XXX autovac launcher currently doesn't participate in
-	 * this; it probably should.)
-	 *
-	 * Slot sync worker also does not participate in it, see comments atop
-	 * 'struct bkend' in postmaster.c.
-	 */
-	if (IsUnderPostmaster && !AmAutoVacuumLauncherProcess() &&
-		!AmLogicalSlotSyncWorkerProcess())
-		MarkPostmasterChildActive();
-
 	/*
 	 * Initialize all fields of MyProc, except for those previously
 	 * initialized by InitProcGlobal.
@@ -947,18 +947,6 @@ ProcKill(int code, Datum arg)
 
 	SpinLockRelease(ProcStructLock);
 
-	/*
-	 * This process is no longer present in shared memory in any meaningful
-	 * way, so tell the postmaster we've cleaned up acceptably well. (XXX
-	 * autovac launcher should be included here someday)
-	 *
-	 * Slot sync worker is also not a postmaster child, so skip this shared
-	 * memory related processing here.
-	 */
-	if (IsUnderPostmaster && !AmAutoVacuumLauncherProcess() &&
-		!AmLogicalSlotSyncWorkerProcess())
-		MarkPostmasterChildInactive();
-
 	/* wake autovac launcher if needed -- see comments in FreeWorkerInfo */
 	if (AutovacuumLauncherPid != 0)
 		kill(AutovacuumLauncherPid, SIGUSR2);
diff --git a/src/include/storage/pmsignal.h b/src/include/storage/pmsignal.h
index 0c9a7e32a8..3b9336b83c 100644
--- a/src/include/storage/pmsignal.h
+++ b/src/include/storage/pmsignal.h
@@ -74,7 +74,6 @@ extern int	AssignPostmasterChildSlot(void);
 extern bool ReleasePostmasterChildSlot(int slot);
 extern bool IsPostmasterChildWalSender(int slot);
 extern void MarkPostmasterChildActive(void);
-extern void MarkPostmasterChildInactive(void);
 extern void MarkPostmasterChildWalSender(void);
 extern bool PostmasterIsAliveInternal(void);
 extern void PostmasterDeathSignalInit(void);
-- 
2.39.2
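Patch 0003 registers the cleanup at the point where the slot becomes active. Every exit path that runs the shmem_exit callbacks then undoes it, so ProcKill() no longer has to repeat InitProcess()'s conditions. The sketch below is a minimal model of that pattern. The callback registry, the flag array and the function names are hypothetical. The callbacks do run in reverse registration order, as PostgreSQL's on_shmem_exit() list does.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*exit_cb) (int code, uintptr_t arg);

#define MAX_EXIT_CBS 8
static exit_cb exit_cbs[MAX_EXIT_CBS];
static uintptr_t exit_args[MAX_EXIT_CBS];
static int	n_exit_cbs = 0;

static void
register_exit_cb(exit_cb fn, uintptr_t arg)
{
	assert(n_exit_cbs < MAX_EXIT_CBS);
	exit_cbs[n_exit_cbs] = fn;
	exit_args[n_exit_cbs] = arg;
	n_exit_cbs++;
}

/* Run the callbacks last-in, first-out. */
static void
run_exit_cbs(int code)
{
	while (n_exit_cbs > 0)
	{
		n_exit_cbs--;
		exit_cbs[n_exit_cbs] (code, exit_args[n_exit_cbs]);
	}
}

/* Hypothetical per-child flags, loosely modelled on PMChildFlags. */
enum { SLOT_UNUSED, SLOT_ASSIGNED, SLOT_ACTIVE };
static int	child_flags[4] = {SLOT_ASSIGNED, SLOT_ASSIGNED, SLOT_UNUSED, SLOT_UNUSED};

static void
mark_inactive(int code, uintptr_t slot)
{
	(void) code;
	assert(child_flags[slot] == SLOT_ACTIVE);
	child_flags[slot] = SLOT_ASSIGNED;	/* parent releases it after exit */
}

/* Activation and its cleanup are set up together, in one place. */
static void
mark_active(int slot)
{
	assert(child_flags[slot] == SLOT_ASSIGNED);
	child_flags[slot] = SLOT_ACTIVE;
	register_exit_cb(mark_inactive, (uintptr_t) slot);
}
```

Because the callback is only registered after the slot is marked active, a process that exits before that point has nothing to undo. That is the duplication the patch removes from ProcKill().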

v4-0004-Introduce-a-separate-BackendType-for-dead-end-chi.patch

From dc53f89edbeec99f8633def8aa5f47cd98e7a150 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Aug 2024 10:59:04 +0300
Subject: [PATCH v4 4/8] Introduce a separate BackendType for dead-end children

And replace postmaster.c's own "backend type" codes with BackendType

XXX: While working on this, many times I accidentally did something
like "foo |= B_SOMETHING" instead of "foo |= 1 << B_SOMETHING", when
constructing arguments to SignalSomeChildren or CountChildren, and
things broke in very subtle ways taking a long time to debug. The old
constants that were already bitmasks avoided that. Maybe we need some
macro magic or something to make this less error-prone.
---
 src/backend/postmaster/postmaster.c    | 106 ++++++++++++-------------
 src/backend/utils/activity/pgstat_io.c |   3 +
 src/backend/utils/init/miscinit.c      |   3 +
 src/include/miscadmin.h                |   1 +
 4 files changed, 56 insertions(+), 57 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 2c8e7fa7d6..9bbbbfe55f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -129,15 +129,11 @@
 
 
 /*
- * Possible types of a backend. Beyond being the possible bkend_type values in
- * struct bkend, these are OR-able request flag bits for SignalSomeChildren()
- * and CountChildren().
+ * CountChildren and SignalSomeChildren use a uint32 bitmask argument to
+ * represent BackendTypes to count or signal.
  */
-#define BACKEND_TYPE_NORMAL		0x0001	/* normal backend */
-#define BACKEND_TYPE_AUTOVAC	0x0002	/* autovacuum worker process */
-#define BACKEND_TYPE_WALSND		0x0004	/* walsender process */
-#define BACKEND_TYPE_BGWORKER	0x0008	/* bgworker process */
-#define BACKEND_TYPE_ALL		0x000F	/* OR of all the above */
+#define BACKEND_TYPE_ALL 0xffffffff
+StaticAssertDecl(BACKEND_NUM_TYPES < 32, "too many backend types for uint32");
 
 /*
  * List of active backends (or child processes anyway; we don't actually
@@ -148,7 +144,7 @@
  * As shown in the above set of backend types, this list includes not only
  * "normal" client sessions, but also autovacuum workers, walsenders, and
  * background workers.  (Note that at the time of launch, walsenders are
- * labeled BACKEND_TYPE_NORMAL; we relabel them to BACKEND_TYPE_WALSND
+ * labeled B_BACKEND; we relabel them to B_WAL_SENDER
  * upon noticing they've changed their PMChildFlags entry.  Hence that check
  * must be done before any operation that needs to distinguish walsenders
  * from normal backends.)
@@ -157,7 +153,8 @@
  * the purpose of sending a friendly rejection message to a would-be client.
  * We must track them because they are attached to shared memory, but we know
  * they will never become live backends.  dead_end children are not assigned a
- * PMChildSlot.  dead_end children have bkend_type NORMAL.
+ * PMChildSlot.  dead_end children have bkend_type B_DEAD_END_BACKEND.
+ * FIXME: a dead-end backend can send query cancel?
  *
  * "Special" children such as the startup, bgwriter, autovacuum launcher, and
  * slot sync worker tasks are not in this list.  They are tracked via StartupPID
@@ -169,8 +166,7 @@ typedef struct bkend
 {
 	pid_t		pid;			/* process id of backend */
 	int			child_slot;		/* PMChildSlot for this backend, if any */
-	int			bkend_type;		/* child process flavor, see above */
-	bool		dead_end;		/* is it going to send an error and quit? */
+	BackendType bkend_type;		/* child process flavor, see above */
 	RegisteredBgWorker *rw;		/* bgworker info, if this is a bgworker */
 	bool		bgworker_notify;	/* gets bgworker start/stop notifications */
 	dlist_node	elem;			/* list link in BackendList */
@@ -410,12 +406,13 @@ static void report_fork_failure_to_client(ClientSocket *client_sock, int errnum)
 static CAC_state canAcceptConnections(int backend_type);
 static void signal_child(pid_t pid, int signal);
 static void sigquit_child(pid_t pid);
-static bool SignalSomeChildren(int signal, int target);
+static bool SignalSomeChildren(int signal, uint32 targetMask);
 static void TerminateChildren(int signal);
 
-#define SignalChildren(sig)			   SignalSomeChildren(sig, BACKEND_TYPE_ALL)
+#define SignalChildren(sig)		\
+	SignalSomeChildren(sig, BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND))
 
-static int	CountChildren(int target);
+static int	CountChildren(uint32 targetMask);
 static Backend *assign_backendlist_entry(void);
 static void LaunchMissingBackgroundProcesses(void);
 static void maybe_start_bgworkers(void);
@@ -1765,7 +1762,7 @@ canAcceptConnections(int backend_type)
 	 * bgworker_should_start_now() decided whether the DB state allows them.
 	 */
 	if (pmState != PM_RUN && pmState != PM_HOT_STANDBY &&
-		backend_type != BACKEND_TYPE_BGWORKER)
+		backend_type != B_BG_WORKER)
 	{
 		if (Shutdown > NoShutdown)
 			return CAC_SHUTDOWN;	/* shutdown is pending */
@@ -1782,7 +1779,7 @@ canAcceptConnections(int backend_type)
 	 * "Smart shutdown" restrictions are applied only to normal connections,
 	 * not to autovac workers or bgworkers.
 	 */
-	if (!connsAllowed && backend_type == BACKEND_TYPE_NORMAL)
+	if (!connsAllowed && backend_type == B_BACKEND)
 		return CAC_SHUTDOWN;	/* shutdown is pending */
 
 	/*
@@ -1797,7 +1794,7 @@ canAcceptConnections(int backend_type)
 	 * The limit here must match the sizes of the per-child-process arrays;
 	 * see comments for MaxLivePostmasterChildren().
 	 */
-	if (CountChildren(BACKEND_TYPE_ALL) >= MaxLivePostmasterChildren())
+	if (CountChildren(BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND)) >= MaxLivePostmasterChildren())
 		result = CAC_TOOMANY;
 
 	return result;
@@ -2555,11 +2552,11 @@ CleanupBackend(Backend *bp,
 	bool		logged = false;
 
 	/* Construct a process name for log message */
-	if (bp->dead_end)
+	if (bp->bkend_type == B_DEAD_END_BACKEND)
 	{
 		procname = _("dead end backend");
 	}
-	else if (bp->bkend_type == BACKEND_TYPE_BGWORKER)
+	else if (bp->bkend_type == B_BG_WORKER)
 	{
 		snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""),
 				 bp->rw->rw_worker.bgw_type);
@@ -2598,7 +2595,7 @@ CleanupBackend(Backend *bp,
 	 * If the process attached to shared memory, check that it detached
 	 * cleanly.
 	 */
-	if (!bp->dead_end)
+	if (bp->bkend_type != B_DEAD_END_BACKEND)
 	{
 		if (!ReleasePostmasterChildSlot(bp->child_slot))
 		{
@@ -2630,7 +2627,7 @@ CleanupBackend(Backend *bp,
 	/*
 	 * If it was a background worker, also update its RegisteredWorker entry.
 	 */
-	if (bp->bkend_type == BACKEND_TYPE_BGWORKER)
+	if (bp->bkend_type == B_BG_WORKER)
 	{
 		RegisteredBgWorker *rw = bp->rw;
 
@@ -2858,7 +2855,7 @@ PostmasterStateMachine(void)
 			 * This state ends when we have no normal client backends running.
 			 * Then we're ready to stop other children.
 			 */
-			if (CountChildren(BACKEND_TYPE_NORMAL) == 0)
+			if (CountChildren(1 << B_BACKEND) == 0)
 				pmState = PM_STOP_BACKENDS;
 		}
 	}
@@ -2879,7 +2876,7 @@ PostmasterStateMachine(void)
 
 		/* Signal all backend children except walsenders */
 		SignalSomeChildren(SIGTERM,
-						   BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND);
+						   BACKEND_TYPE_ALL & ~(1 << B_WAL_SENDER | 1 << B_DEAD_END_BACKEND));
 		/* and the autovac launcher too */
 		if (AutoVacPID != 0)
 			signal_child(AutoVacPID, SIGTERM);
@@ -2921,7 +2918,7 @@ PostmasterStateMachine(void)
 		 * here. Walsenders and archiver are also disregarded, they will be
 		 * terminated later after writing the checkpoint record.
 		 */
-		if (CountChildren(BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND) == 0 &&
+		if (CountChildren(BACKEND_TYPE_ALL & ~(1 << B_WAL_SENDER | 1 << B_DEAD_END_BACKEND)) == 0 &&
 			StartupPID == 0 &&
 			WalReceiverPID == 0 &&
 			WalSummarizerPID == 0 &&
@@ -2995,7 +2992,7 @@ PostmasterStateMachine(void)
 		 * left by now anyway; what we're really waiting for is walsenders and
 		 * archiver.
 		 */
-		if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL) == 0)
+		if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND)) == 0)
 		{
 			pmState = PM_WAIT_DEAD_END;
 		}
@@ -3294,10 +3291,10 @@ sigquit_child(pid_t pid)
 
 /*
  * Send a signal to the targeted children (but NOT special children;
- * dead_end children are never signaled, either).
+ * dead_end children are never signaled, either XXX).
  */
 static bool
-SignalSomeChildren(int signal, int target)
+SignalSomeChildren(int signal, uint32 targetMask)
 {
 	dlist_iter	iter;
 	bool		signaled = false;
@@ -3306,24 +3303,21 @@ SignalSomeChildren(int signal, int target)
 	{
 		Backend    *bp = dlist_container(Backend, elem, iter.cur);
 
-		if (bp->dead_end)
-			continue;
-
 		/*
 		 * Since target == BACKEND_TYPE_ALL is the most common case, we test
 		 * it first and avoid touching shared memory for every child.
 		 */
-		if (target != BACKEND_TYPE_ALL)
+		if (targetMask != BACKEND_TYPE_ALL)
 		{
 			/*
 			 * Assign bkend_type for any recently announced WAL Sender
 			 * processes.
 			 */
-			if (bp->bkend_type == BACKEND_TYPE_NORMAL &&
+			if (bp->bkend_type == B_BACKEND &&
 				IsPostmasterChildWalSender(bp->child_slot))
-				bp->bkend_type = BACKEND_TYPE_WALSND;
+				bp->bkend_type = B_WAL_SENDER;
 
-			if (!(target & bp->bkend_type))
+			if ((targetMask & (1 << bp->bkend_type)) == 0)
 				continue;
 		}
 
@@ -3396,17 +3390,22 @@ BackendStartup(ClientSocket *client_sock)
 	}
 
 	/* Pass down canAcceptConnections state */
-	startup_data.canAcceptConnections = canAcceptConnections(BACKEND_TYPE_NORMAL);
-	bn->dead_end = (startup_data.canAcceptConnections != CAC_OK);
+	startup_data.canAcceptConnections = canAcceptConnections(B_BACKEND);
 	bn->rw = NULL;
 
 	/*
 	 * Unless it's a dead_end child, assign it a child slot number
 	 */
-	if (!bn->dead_end)
+	if (startup_data.canAcceptConnections == CAC_OK)
+	{
+		bn->bkend_type = B_BACKEND; /* Can change later to WALSND */
 		bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
+	}
 	else
+	{
+		bn->bkend_type = B_DEAD_END_BACKEND;
 		bn->child_slot = 0;
+	}
 
 	/* Hasn't asked to be notified about any bgworkers yet */
 	bn->bgworker_notify = false;
@@ -3419,7 +3418,7 @@ BackendStartup(ClientSocket *client_sock)
 		/* in parent, fork failed */
 		int			save_errno = errno;
 
-		if (!bn->dead_end)
+		if (bn->child_slot != 0)
 			(void) ReleasePostmasterChildSlot(bn->child_slot);
 		pfree(bn);
 		errno = save_errno;
@@ -3439,7 +3438,6 @@ BackendStartup(ClientSocket *client_sock)
 	 * of backends.
 	 */
 	bn->pid = pid;
-	bn->bkend_type = BACKEND_TYPE_NORMAL;	/* Can change later to WALSND */
 	dlist_push_head(&BackendList, &bn->elem);
 
 	return STATUS_OK;
@@ -3673,11 +3671,10 @@ dummy_handler(SIGNAL_ARGS)
 }
 
 /*
- * Count up number of child processes of specified types (dead_end children
- * are always excluded).
+ * Count up number of child processes of specified types.
  */
 static int
-CountChildren(int target)
+CountChildren(uint32 targetMask)
 {
 	dlist_iter	iter;
 	int			cnt = 0;
@@ -3686,24 +3683,21 @@ CountChildren(int target)
 	{
 		Backend    *bp = dlist_container(Backend, elem, iter.cur);
 
-		if (bp->dead_end)
-			continue;
-
 		/*
 		 * Since target == BACKEND_TYPE_ALL is the most common case, we test
 		 * it first and avoid touching shared memory for every child.
 		 */
-		if (target != BACKEND_TYPE_ALL)
+		if (targetMask != BACKEND_TYPE_ALL)
 		{
 			/*
 			 * Assign bkend_type for any recently announced WAL Sender
 			 * processes.
 			 */
-			if (bp->bkend_type == BACKEND_TYPE_NORMAL &&
+			if (bp->bkend_type == B_BACKEND &&
 				IsPostmasterChildWalSender(bp->child_slot))
-				bp->bkend_type = BACKEND_TYPE_WALSND;
+				bp->bkend_type = B_WAL_SENDER;
 
-			if (!(target & bp->bkend_type))
+			if ((targetMask & (1 << bp->bkend_type)) == 0)
 				continue;
 		}
 
@@ -3770,13 +3764,13 @@ StartAutovacuumWorker(void)
 	 * we have to check to avoid race-condition problems during DB state
 	 * changes.
 	 */
-	if (canAcceptConnections(BACKEND_TYPE_AUTOVAC) == CAC_OK)
+	if (canAcceptConnections(B_AUTOVAC_WORKER) == CAC_OK)
 	{
 		bn = (Backend *) palloc_extended(sizeof(Backend), MCXT_ALLOC_NO_OOM);
 		if (bn)
 		{
-			/* Autovac workers are not dead_end and need a child slot */
-			bn->dead_end = false;
+			/* Autovac workers need a child slot */
+			bn->bkend_type = B_AUTOVAC_WORKER;
 			bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
 			bn->bgworker_notify = false;
 			bn->rw = NULL;
@@ -3784,7 +3778,6 @@ StartAutovacuumWorker(void)
 			bn->pid = StartChildProcess(B_AUTOVAC_WORKER);
 			if (bn->pid > 0)
 			{
-				bn->bkend_type = BACKEND_TYPE_AUTOVAC;
 				dlist_push_head(&BackendList, &bn->elem);
 				/* all OK */
 				return;
@@ -3990,7 +3983,7 @@ assign_backendlist_entry(void)
 	 * only possible failure is CAC_TOOMANY, so we just log an error message
 	 * based on that rather than checking the error code precisely.
 	 */
-	if (canAcceptConnections(BACKEND_TYPE_BGWORKER) != CAC_OK)
+	if (canAcceptConnections(B_BG_WORKER) != CAC_OK)
 	{
 		ereport(LOG,
 				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
@@ -4008,8 +4001,7 @@ assign_backendlist_entry(void)
 	}
 
 	bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
-	bn->bkend_type = BACKEND_TYPE_BGWORKER;
-	bn->dead_end = false;
+	bn->bkend_type = B_BG_WORKER;
 	bn->bgworker_notify = false;
 
 	return bn;
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index 8af55989ee..9bad1040d6 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -312,6 +312,8 @@ pgstat_io_snapshot_cb(void)
 *
 * The following BackendTypes do not participate in the cumulative stats
 * subsystem or do not perform IO on which we currently track:
+* - Dead-end backend because it is not connected to shared memory and
+*   doesn't do any IO
 * - Syslogger because it is not connected to shared memory
 * - Archiver because most relevant archiving IO is delegated to a
 *   specialized command or module
@@ -334,6 +336,7 @@ pgstat_tracks_io_bktype(BackendType bktype)
 	switch (bktype)
 	{
 		case B_INVALID:
+		case B_DEAD_END_BACKEND:
 		case B_ARCHIVER:
 		case B_LOGGER:
 		case B_WAL_RECEIVER:
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 537d92c0cf..ae8b1a4331 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -281,6 +281,9 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_BACKEND:
 			backendDesc = "client backend";
 			break;
+		case B_DEAD_END_BACKEND:
+			backendDesc = "dead-end client backend";
+			break;
 		case B_BG_WORKER:
 			backendDesc = "background worker";
 			break;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index ac16233b71..b21c4d43b9 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -337,6 +337,7 @@ typedef enum BackendType
 
 	/* Backends and other backend-like processes */
 	B_BACKEND,
+	B_DEAD_END_BACKEND,
 	B_AUTOVAC_LAUNCHER,
 	B_AUTOVAC_WORKER,
 	B_BG_WORKER,
-- 
2.39.2

v4-0005-Kill-dead-end-children-when-there-s-nothing-else-.patch (text/x-patch; charset=UTF-8)

From 9c832ce33667abc5aef128a17fa9c27daaad872a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Aug 2024 10:59:27 +0300
Subject: [PATCH v4 5/8] Kill dead-end children when there's nothing else left

Previously, the postmaster would never try to kill dead-end child
processes, even if there were no other processes left. A dead-end
backend will eventually exit when authentication_timeout expires, but
if a dead-end backend is the only thing preventing the server from
shutting down, it seems better to kill it immediately. That is
particularly important if a bug in the early startup code prevents a
dead-end child from timing out and exiting normally.

Includes a test for the case where a dead-end backend previously kept
the server from shutting down.
---
 src/backend/postmaster/postmaster.c     | 35 +++++++-------
 src/test/postmaster/meson.build         |  1 +
 src/test/postmaster/t/002_start_stop.pl | 64 +++++++++++++++++++++++++
 3 files changed, 81 insertions(+), 19 deletions(-)
 create mode 100644 src/test/postmaster/t/002_start_stop.pl

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9bbbbfe55f..99c588ee0b 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -409,9 +409,6 @@ static void sigquit_child(pid_t pid);
 static bool SignalSomeChildren(int signal, uint32 targetMask);
 static void TerminateChildren(int signal);
 
-#define SignalChildren(sig)		\
-	SignalSomeChildren(sig, BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND))
-
 static int	CountChildren(uint32 targetMask);
 static Backend *assign_backendlist_entry(void);
 static void LaunchMissingBackgroundProcesses(void);
@@ -1963,7 +1960,7 @@ process_pm_reload_request(void)
 		ereport(LOG,
 				(errmsg("received SIGHUP, reloading configuration files")));
 		ProcessConfigFile(PGC_SIGHUP);
-		SignalChildren(SIGHUP);
+		SignalSomeChildren(SIGHUP, BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND));
 		if (StartupPID != 0)
 			signal_child(StartupPID, SIGHUP);
 		if (BgWriterPID != 0)
@@ -2381,7 +2378,7 @@ process_pm_child_exit(void)
 				 * Waken walsenders for the last time. No regular backends
 				 * should be around anymore.
 				 */
-				SignalChildren(SIGUSR2);
+				SignalSomeChildren(SIGUSR2, BACKEND_TYPE_ALL & (1 << B_DEAD_END_BACKEND));
 
 				pmState = PM_SHUTDOWN_2;
 			}
@@ -2874,7 +2871,7 @@ PostmasterStateMachine(void)
 		 */
 		ForgetUnstartedBackgroundWorkers();
 
-		/* Signal all backend children except walsenders */
+		/* Signal all backend children except walsenders and dead-end backends */
 		SignalSomeChildren(SIGTERM,
 						   BACKEND_TYPE_ALL & ~(1 << B_WAL_SENDER | 1 << B_DEAD_END_BACKEND));
 		/* and the autovac launcher too */
@@ -2932,10 +2929,11 @@ PostmasterStateMachine(void)
 			if (Shutdown >= ImmediateShutdown || FatalError)
 			{
 				/*
-				 * Start waiting for dead_end children to die.  This state
-				 * change causes ServerLoop to stop creating new ones.
+				 * Stop any dead_end children and stop creating new ones.
 				 */
 				pmState = PM_WAIT_DEAD_END;
+				ConfigurePostmasterWaitSet(false);
+				SignalSomeChildren(SIGQUIT, 1 << B_DEAD_END_BACKEND);
 
 				/*
 				 * We already SIGQUIT'd the archiver and stats processes, if
@@ -2974,9 +2972,10 @@ PostmasterStateMachine(void)
 					 */
 					FatalError = true;
 					pmState = PM_WAIT_DEAD_END;
+					ConfigurePostmasterWaitSet(false);
 
-					/* Kill the walsenders and archiver too */
-					SignalChildren(SIGQUIT);
+					/* Kill the walsenders and archiver, too */
+					SignalSomeChildren(SIGQUIT, BACKEND_TYPE_ALL);
 					if (PgArchPID != 0)
 						signal_child(PgArchPID, SIGQUIT);
 				}
@@ -2994,15 +2993,14 @@ PostmasterStateMachine(void)
 		 */
 		if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND)) == 0)
 		{
+			ConfigurePostmasterWaitSet(false);
+			SignalSomeChildren(SIGTERM, 1 << B_DEAD_END_BACKEND);
 			pmState = PM_WAIT_DEAD_END;
 		}
 	}
 
 	if (pmState == PM_WAIT_DEAD_END)
 	{
-		/* Don't allow any new socket connection events. */
-		ConfigurePostmasterWaitSet(false);
-
 		/*
 		 * PM_WAIT_DEAD_END state ends when the BackendList is entirely empty
 		 * (ie, no dead_end children remain), and the archiver is gone too.
@@ -3290,8 +3288,7 @@ sigquit_child(pid_t pid)
 }
 
 /*
- * Send a signal to the targeted children (but NOT special children;
- * dead_end children are never signaled, either XXX).
+ * Send a signal to the targeted children (but NOT special children).
  */
 static bool
 SignalSomeChildren(int signal, uint32 targetMask)
@@ -3322,8 +3319,8 @@ SignalSomeChildren(int signal, uint32 targetMask)
 		}
 
 		ereport(DEBUG4,
-				(errmsg_internal("sending signal %d to process %d",
-								 signal, (int) bp->pid)));
+				(errmsg_internal("sending signal %d to %s process %d",
+								 signal, GetBackendTypeDesc(bp->bkend_type), (int) bp->pid)));
 		signal_child(bp->pid, signal);
 		signaled = true;
 	}
@@ -3332,12 +3329,12 @@ SignalSomeChildren(int signal, uint32 targetMask)
 
 /*
  * Send a termination signal to children.  This considers all of our children
- * processes, except syslogger and dead_end backends.
+ * processes, except syslogger.
  */
 static void
 TerminateChildren(int signal)
 {
-	SignalChildren(signal);
+	SignalSomeChildren(signal, BACKEND_TYPE_ALL);
 	if (StartupPID != 0)
 	{
 		signal_child(StartupPID, signal);
diff --git a/src/test/postmaster/meson.build b/src/test/postmaster/meson.build
index c2de2e0eb5..2d89adf520 100644
--- a/src/test/postmaster/meson.build
+++ b/src/test/postmaster/meson.build
@@ -7,6 +7,7 @@ tests += {
   'tap': {
     'tests': [
       't/001_connection_limits.pl',
+      't/002_start_stop.pl',
     ],
   },
 }
diff --git a/src/test/postmaster/t/002_start_stop.pl b/src/test/postmaster/t/002_start_stop.pl
new file mode 100644
index 0000000000..6f114659fa
--- /dev/null
+++ b/src/test/postmaster/t/002_start_stop.pl
@@ -0,0 +1,64 @@
+
+# Copyright (c) 2021-2024, PostgreSQL Global Development Group
+
+# Test that dead-end backends do not prevent the server from shutting
+# down promptly.
+
+use IO::Socket::INET;
+use IO::Socket::UNIX;
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+use Time::HiRes qw(time);
+
+# Initialize the server with low connection limits, to test dead-end backends
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init;
+$node->append_conf('postgresql.conf', "max_connections = 5");
+$node->append_conf('postgresql.conf', "log_connections = on");
+$node->append_conf('postgresql.conf', "log_min_messages = debug2");
+
+# Long timeout, so dead-end backends won't exit on their own during the test
+$node->append_conf('postgresql.conf', "authentication_timeout = '120 s'");
+
+$node->start;
+
+my @sessions = ();
+my @raw_connections = ();
+
+#for (my $i=0; $i <= 5; $i++) {
+#	push(@sessions, $node->background_psql('postgres', on_error_die => 1));
+#}
+#$node->connect_fails("dbname=postgres", "max_connections reached",
+#					 expected_stderr => qr/FATAL:  sorry, too many clients already/);
+
+# We can still open TCP (or Unix domain socket) connections, but beyond a
+# certain number (roughly 2x max_connections), they will be "dead-end backends"
+for (my $i = 0; $i <= 20; $i++)
+{
+	push(@raw_connections, $node->raw_connect());
+}
+
+# Test that the dead-end backends don't prevent the server from stopping.
+my $before = time();
+$node->stop();
+my $elapsed = time() - $before;
+ok($elapsed < 60);
+
+$node->start();
+
+$node->connect_ok("dbname=postgres", "works after restart");
+
+# Clean up
+foreach my $session (@sessions)
+{
+	$session->quit;
+}
+foreach my $socket (@raw_connections)
+{
+	$socket->close();
+}
+
+done_testing();
-- 
2.39.2

v4-0006-Assign-a-child-slot-to-every-postmaster-child-pro.patch (text/x-patch; charset=UTF-8)

From a4a0e77f90e5e2e69cd7280b65d0e198cf6067e7 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Aug 2024 12:43:22 +0300
Subject: [PATCH v4 6/8] Assign a child slot to every postmaster child process

Previously, only backends, autovacuum workers, and background workers
had an entry in the PMChildFlags array. With this commit, all
postmaster child processes, including all the aux processes, have an
entry.

We now maintain separate free-lists for different kinds of
backends. That ensures that there are always slots available for
autovacuum and background workers. Previously, pre-authorization
backends could prevent autovacuum or background workers from starting
up, by using up all the slots.

The code to manage the slots in the postmaster process is in a new
pmchild.c source file. Because postmaster.c is just so large.

Assigning pmsignal slot numbers is now pmchild.c's responsibility.
This replaces the PMChildInUse array in pmsignal.c.
---
 src/backend/postmaster/Makefile         |   1 +
 src/backend/postmaster/launch_backend.c |   1 +
 src/backend/postmaster/meson.build      |   1 +
 src/backend/postmaster/pmchild.c        | 287 ++++++++++
 src/backend/postmaster/postmaster.c     | 708 ++++++++++--------------
 src/backend/storage/ipc/pmsignal.c      |  83 +--
 src/backend/storage/lmgr/proc.c         |  12 +-
 src/include/postmaster/postmaster.h     |  40 ++
 src/include/storage/pmsignal.h          |   2 +-
 src/tools/pgindent/typedefs.list        |   2 +-
 10 files changed, 654 insertions(+), 483 deletions(-)
 create mode 100644 src/backend/postmaster/pmchild.c

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index db08543d19..c977d91785 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -23,6 +23,7 @@ OBJS = \
 	launch_backend.o \
 	pgarch.o \
 	postmaster.o \
+	pmchild.o \
 	startup.o \
 	syslogger.o \
 	walsummarizer.o \
diff --git a/src/backend/postmaster/launch_backend.c b/src/backend/postmaster/launch_backend.c
index 0ae23fdf55..b0b91dc97f 100644
--- a/src/backend/postmaster/launch_backend.c
+++ b/src/backend/postmaster/launch_backend.c
@@ -182,6 +182,7 @@ static child_process_kind child_process_kinds[] = {
 	[B_INVALID] = {"invalid", NULL, false},
 
 	[B_BACKEND] = {"backend", BackendMain, true},
+	[B_DEAD_END_BACKEND] = {"dead-end backend", BackendMain, true},
 	[B_AUTOVAC_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
 	[B_AUTOVAC_WORKER] = {"autovacuum worker", AutoVacWorkerMain, true},
 	[B_BG_WORKER] = {"bgworker", BackgroundWorkerMain, true},
diff --git a/src/backend/postmaster/meson.build b/src/backend/postmaster/meson.build
index 0ea4bbe084..388848bb52 100644
--- a/src/backend/postmaster/meson.build
+++ b/src/backend/postmaster/meson.build
@@ -11,6 +11,7 @@ backend_sources += files(
   'launch_backend.c',
   'pgarch.c',
   'postmaster.c',
+  'pmchild.c',
   'startup.c',
   'syslogger.c',
   'walsummarizer.c',
diff --git a/src/backend/postmaster/pmchild.c b/src/backend/postmaster/pmchild.c
new file mode 100644
index 0000000000..735c66f8e7
--- /dev/null
+++ b/src/backend/postmaster/pmchild.c
@@ -0,0 +1,287 @@
+/*-------------------------------------------------------------------------
+ *
+ * pmchild.c
+ *	  Functions for keeping track of postmaster child processes.
+ *
+ * Keep track of all child processes, so that when a process exits, we know
+ * what kind of process it was and can clean up accordingly.  Every child process
+ * is allocated a PMChild struct, from a fixed pool of structs.  The size of
+ * the pool is determined by various settings that configure how many worker
+ * processes and backend connections are allowed, i.e. autovacuum_max_workers,
+ * max_worker_processes, max_wal_senders, and max_connections.
+ *
+ * The structures and functions in this file are private to the postmaster
+ * process.  But note that there is an array in shared memory, managed by
+ * pmsignal.c, that mirrors this.
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/pmchild.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "postmaster/autovacuum.h"
+#include "postmaster/postmaster.h"
+#include "replication/walsender.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+
+/*
+ * Freelists for different kinds of child processes.  We maintain separate
+ * pools for them, so that launching a lot of backends cannot exhaust all the
+ * slots and prevent autovacuum or an aux process from launching.
+ */
+static dlist_head freeBackendList;
+static dlist_head freeAutoVacWorkerList;
+static dlist_head freeBgWorkerList;
+static dlist_head freeAuxList;
+
+/*
+ * List of active child processes.  This includes dead-end children.
+ */
+dlist_head	ActiveChildList;
+
+/*
+ * MaxLivePostmasterChildren
+ *
+ * This reports the number of postmaster child processes that can be active.  It
+ * includes all children except for dead_end children.  This allows the array
+ * in shared memory (PMChildFlags) to have a fixed maximum size.
+ */
+int
+MaxLivePostmasterChildren(void)
+{
+	int			n = 0;
+
+	/* We know exactly how many worker and aux processes can be active */
+	n += autovacuum_max_workers;
+	n += max_worker_processes;
+	n += NUM_AUXILIARY_PROCS;
+
+	/*
+	 * We allow more connections here than we can have backends because some
+	 * might still be authenticating; they might fail auth, or some existing
+	 * backend might exit before the auth cycle is completed.  The exact
+	 * MaxBackends limit is enforced when a new backend tries to join the
+	 * shared-inval backend array.
+	 */
+	n += 2 * (MaxConnections + max_wal_senders);
+
+	return n;
+}
+
+static void
+init_slot(PMChild *pmchild, int slotno, dlist_head *freelist)
+{
+	pmchild->pid = 0;
+	pmchild->child_slot = slotno + 1;
+	pmchild->bkend_type = B_INVALID;
+	pmchild->rw = NULL;
+	pmchild->bgworker_notify = false;
+	dlist_push_tail(freelist, &pmchild->elem);
+}
+
+/*
+ * Initialize at postmaster startup
+ */
+void
+InitPostmasterChildSlots(void)
+{
+	int			num_pmchild_slots;
+	int			slotno;
+	PMChild    *slots;
+
+	dlist_init(&freeBackendList);
+	dlist_init(&freeAutoVacWorkerList);
+	dlist_init(&freeBgWorkerList);
+	dlist_init(&freeAuxList);
+	dlist_init(&ActiveChildList);
+
+	num_pmchild_slots = MaxLivePostmasterChildren();
+
+	slots = palloc(num_pmchild_slots * sizeof(PMChild));
+
+	slotno = 0;
+	for (int i = 0; i < 2 * (MaxConnections + max_wal_senders); i++)
+	{
+		init_slot(&slots[slotno], slotno, &freeBackendList);
+		slotno++;
+	}
+	for (int i = 0; i < autovacuum_max_workers; i++)
+	{
+		init_slot(&slots[slotno], slotno, &freeAutoVacWorkerList);
+		slotno++;
+	}
+	for (int i = 0; i < max_worker_processes; i++)
+	{
+		init_slot(&slots[slotno], slotno, &freeBgWorkerList);
+		slotno++;
+	}
+	for (int i = 0; i < NUM_AUXILIARY_PROCS; i++)
+	{
+		init_slot(&slots[slotno], slotno, &freeAuxList);
+		slotno++;
+	}
+	Assert(slotno == num_pmchild_slots);
+}
+
+/* Return the appropriate free-list for the given backend type */
+static dlist_head *
+GetFreeList(BackendType btype)
+{
+	switch (btype)
+	{
+		case B_BACKEND:
+		case B_BG_WORKER:
+		case B_WAL_SENDER:
+		case B_SLOTSYNC_WORKER:
+			return &freeBackendList;
+		case B_AUTOVAC_WORKER:
+			return &freeAutoVacWorkerList;
+
+			/*
+			 * Auxiliary processes.  There can be only one of each of these
+			 * running at a time.
+			 */
+		case B_AUTOVAC_LAUNCHER:
+		case B_ARCHIVER:
+		case B_BG_WRITER:
+		case B_CHECKPOINTER:
+		case B_STARTUP:
+		case B_WAL_RECEIVER:
+		case B_WAL_SUMMARIZER:
+		case B_WAL_WRITER:
+			return &freeAuxList;
+
+			/*
+			 * Logger is not connected to shared memory, and does not have a
+			 * PGPROC entry, but we still allocate a child slot for it.
+			 */
+		case B_LOGGER:
+			return &freeAuxList;
+
+		case B_STANDALONE_BACKEND:
+		case B_INVALID:
+		case B_DEAD_END_BACKEND:
+			break;
+	}
+	elog(ERROR, "unexpected BackendType: %d", (int) btype);
+	return NULL;
+}
+
+/*
+ * Allocate a PMChild entry for a backend of given type.
+ *
+ * The entry is taken from the right pool.
+ *
+ * pmchild->child_slot is unique among all active child processes.
+ */
+PMChild *
+AssignPostmasterChildSlot(BackendType btype)
+{
+	dlist_head *freelist;
+	PMChild    *pmchild;
+
+	freelist = GetFreeList(btype);
+
+	if (dlist_is_empty(freelist))
+		return NULL;
+
+	pmchild = dlist_container(PMChild, elem, dlist_pop_head_node(freelist));
+	pmchild->pid = 0;
+	pmchild->bkend_type = btype;
+	pmchild->rw = NULL;
+	pmchild->bgworker_notify = true;
+
+	/*
+	 * pmchild->child_slot for each entry was initialized when the array of
+	 * slots was allocated.
+	 */
+
+	dlist_push_head(&ActiveChildList, &pmchild->elem);
+
+	ReservePostmasterChildSlot(pmchild->child_slot);
+
+	/* FIXME: find a more elegant way to pass this */
+	MyPMChildSlot = pmchild->child_slot;
+
+	elog(DEBUG2, "assigned pm child slot %d for %s", pmchild->child_slot, PostmasterChildName(btype));
+
+	return pmchild;
+}
+
+/*
+ * Release a PMChild slot, after the child process has exited.
+ *
+ * Returns true if the child detached cleanly from shared memory, false
+ * otherwise (see ReleasePostmasterChildSlot).
+ */
+bool
+FreePostmasterChildSlot(PMChild *pmchild)
+{
+	elog(DEBUG2, "releasing pm child slot %d", pmchild->child_slot);
+
+	dlist_delete(&pmchild->elem);
+	if (pmchild->bkend_type == B_DEAD_END_BACKEND)
+	{
+		pfree(pmchild);
+		return true;
+	}
+	else
+	{
+		dlist_head *freelist;
+
+		freelist = GetFreeList(pmchild->bkend_type);
+		dlist_push_head(freelist, &pmchild->elem);
+		return ReleasePostmasterChildSlot(pmchild->child_slot);
+	}
+}
+
+PMChild *
+FindPostmasterChildByPid(int pid)
+{
+	dlist_iter	iter;
+
+	dlist_foreach(iter, &ActiveChildList)
+	{
+		PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
+
+		if (bp->pid == pid)
+			return bp;
+	}
+	return NULL;
+}
+
+/*
+ * Allocate a PMChild struct for a dead-end backend.  Dead-end children are
+ * not assigned a child_slot number.  The struct is palloc'd; returns NULL if
+ * out of memory.
+ */
+PMChild *
+AllocDeadEndChild(void)
+{
+	PMChild    *pmchild;
+
+	elog(DEBUG2, "allocating dead-end child");
+
+	pmchild = (PMChild *) palloc_extended(sizeof(PMChild), MCXT_ALLOC_NO_OOM);
+	if (pmchild)
+	{
+		pmchild->pid = 0;
+		pmchild->child_slot = 0;
+		pmchild->bkend_type = B_DEAD_END_BACKEND;
+		pmchild->rw = NULL;
+		pmchild->bgworker_notify = false;
+
+		dlist_push_head(&ActiveChildList, &pmchild->elem);
+	}
+
+	return pmchild;
+}
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 99c588ee0b..a029e28786 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -135,49 +135,8 @@
 #define BACKEND_TYPE_ALL 0xffffffff
 StaticAssertDecl(BACKEND_NUM_TYPES < 32, "too many backend types for uint32");
 
-/*
- * List of active backends (or child processes anyway; we don't actually
- * know whether a given child has become a backend or is still in the
- * authorization phase).  This is used mainly to keep track of how many
- * children we have and send them appropriate signals when necessary.
- *
- * As shown in the above set of backend types, this list includes not only
- * "normal" client sessions, but also autovacuum workers, walsenders, and
- * background workers.  (Note that at the time of launch, walsenders are
- * labeled B_BACKEND; we relabel them to B_WAL_SENDER
- * upon noticing they've changed their PMChildFlags entry.  Hence that check
- * must be done before any operation that needs to distinguish walsenders
- * from normal backends.)
- *
- * Also, "dead_end" children are in it: these are children launched just for
- * the purpose of sending a friendly rejection message to a would-be client.
- * We must track them because they are attached to shared memory, but we know
- * they will never become live backends.  dead_end children are not assigned a
- * PMChildSlot.  dead_end children have bkend_type B_DEAD_END_BACKEND.
- * FIXME: a dead-end backend can send query cancel?
- *
- * "Special" children such as the startup, bgwriter, autovacuum launcher, and
- * slot sync worker tasks are not in this list.  They are tracked via StartupPID
- * and other pid_t variables below.  (Thus, there can't be more than one of any
- * given "special" child process type.  We use BackendList entries for any
- * child process there can be more than one of.)
- */
-typedef struct bkend
-{
-	pid_t		pid;			/* process id of backend */
-	int			child_slot;		/* PMChildSlot for this backend, if any */
-	BackendType bkend_type;		/* child process flavor, see above */
-	RegisteredBgWorker *rw;		/* bgworker info, if this is a bgworker */
-	bool		bgworker_notify;	/* gets bgworker start/stop notifications */
-	dlist_node	elem;			/* list link in BackendList */
-} Backend;
-
-static dlist_head BackendList = DLIST_STATIC_INIT(BackendList);
-
 BackgroundWorker *MyBgworkerEntry = NULL;
 
-
-
 /* The socket number we are listening for connections on */
 int			PostPortNumber = DEF_PGPORT;
 
@@ -229,17 +188,17 @@ bool		remove_temp_files_after_crash = true;
 bool		send_abort_for_crash = false;
 bool		send_abort_for_kill = false;
 
-/* PIDs of special child processes; 0 when not running */
-static pid_t StartupPID = 0,
-			BgWriterPID = 0,
-			CheckpointerPID = 0,
-			WalWriterPID = 0,
-			WalReceiverPID = 0,
-			WalSummarizerPID = 0,
-			AutoVacPID = 0,
-			PgArchPID = 0,
-			SysLoggerPID = 0,
-			SlotSyncWorkerPID = 0;
+/* special child processes; NULL when not running */
+static PMChild *StartupPMChild = NULL,
+		   *BgWriterPMChild = NULL,
+		   *CheckpointerPMChild = NULL,
+		   *WalWriterPMChild = NULL,
+		   *WalReceiverPMChild = NULL,
+		   *WalSummarizerPMChild = NULL,
+		   *AutoVacLauncherPMChild = NULL,
+		   *PgArchPMChild = NULL,
+		   *SysLoggerPMChild = NULL,
+		   *SlotSyncWorkerPMChild = NULL;
 
 /* Startup process's status */
 typedef enum
@@ -287,7 +246,7 @@ static bool FatalError = false; /* T if recovering from backend crash */
  * PM_HOT_STANDBY state.  (connsAllowed can also restrict launching.)
  * In other states we handle connection requests by launching "dead_end"
  * child processes, which will simply send the client an error message and
- * quit.  (We track these in the BackendList so that we can know when they
+ * quit.  (We track these in the ActiveChildList so that we can know when they
  * are all gone; this is important because they're still connected to shared
  * memory, and would interfere with an attempt to destroy the shmem segment,
  * possibly leading to SHMALL failure when we try to make a new one.)
@@ -393,7 +352,7 @@ static void process_pm_child_exit(void);
 static void process_pm_reload_request(void);
 static void process_pm_shutdown_request(void);
 static void dummy_handler(SIGNAL_ARGS);
-static void CleanupBackend(Backend *bp, int exitstatus);
+static void CleanupBackend(PMChild *bp, int exitstatus);
 static void HandleChildCrash(int pid, int exitstatus, const char *procname);
 static void LogChildExit(int lev, const char *procname,
 						 int pid, int exitstatus);
@@ -403,18 +362,18 @@ static void ExitPostmaster(int status) pg_attribute_noreturn();
 static int	ServerLoop(void);
 static int	BackendStartup(ClientSocket *client_sock);
 static void report_fork_failure_to_client(ClientSocket *client_sock, int errnum);
-static CAC_state canAcceptConnections(int backend_type);
-static void signal_child(pid_t pid, int signal);
-static void sigquit_child(pid_t pid);
+static CAC_state canAcceptConnections(BackendType backend_type);
+static void signal_child(PMChild *pmchild, int signal);
+static void sigquit_child(PMChild *pmchild);
 static bool SignalSomeChildren(int signal, uint32 targetMask);
 static void TerminateChildren(int signal);
 
 static int	CountChildren(uint32 targetMask);
-static Backend *assign_backendlist_entry(void);
+static PMChild *assign_backendlist_entry(void);
 static void LaunchMissingBackgroundProcesses(void);
 static void maybe_start_bgworkers(void);
 static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
-static pid_t StartChildProcess(BackendType type);
+static PMChild *StartChildProcess(BackendType type);
 static void StartAutovacuumWorker(void);
 static void InitPostmasterDeathWatchHandle(void);
 
@@ -893,9 +852,11 @@ PostmasterMain(int argc, char *argv[])
 
 	/*
 	 * Now that loadable modules have had their chance to alter any GUCs,
-	 * calculate MaxBackends.
+	 * calculate MaxBackends, and initialize the machinery to track child
+	 * processes.
 	 */
 	InitializeMaxBackends();
+	InitPostmasterChildSlots();
 
 	/*
 	 * Give preloaded libraries a chance to request additional shared memory.
@@ -1019,7 +980,15 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * If enabled, start up syslogger collection subprocess
 	 */
-	SysLoggerPID = SysLogger_Start();
+	SysLoggerPMChild = AssignPostmasterChildSlot(B_LOGGER);
+	if (!SysLoggerPMChild)
+		elog(ERROR, "no postmaster child slot available for syslogger");
+	SysLoggerPMChild->pid = SysLogger_Start();
+	if (SysLoggerPMChild->pid == 0)
+	{
+		FreePostmasterChildSlot(SysLoggerPMChild);
+		SysLoggerPMChild = NULL;
+	}
 
 	/*
 	 * Reset whereToSendOutput from DestDebug (its starting state) to
@@ -1321,16 +1290,16 @@ PostmasterMain(int argc, char *argv[])
 	AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STARTING);
 
 	/* Start bgwriter and checkpointer so they can help with recovery */
-	if (CheckpointerPID == 0)
-		CheckpointerPID = StartChildProcess(B_CHECKPOINTER);
-	if (BgWriterPID == 0)
-		BgWriterPID = StartChildProcess(B_BG_WRITER);
+	if (CheckpointerPMChild == NULL)
+		CheckpointerPMChild = StartChildProcess(B_CHECKPOINTER);
+	if (BgWriterPMChild == NULL)
+		BgWriterPMChild = StartChildProcess(B_BG_WRITER);
 
 	/*
 	 * We're ready to rock and roll...
 	 */
-	StartupPID = StartChildProcess(B_STARTUP);
-	Assert(StartupPID != 0);
+	StartupPMChild = StartChildProcess(B_STARTUP);
+	Assert(StartupPMChild != NULL);
 	StartupStatus = STARTUP_RUNNING;
 	pmState = PM_STARTUP;
 
@@ -1660,8 +1629,8 @@ ServerLoop(void)
 		if (avlauncher_needs_signal)
 		{
 			avlauncher_needs_signal = false;
-			if (AutoVacPID != 0)
-				kill(AutoVacPID, SIGUSR2);
+			if (AutoVacLauncherPMChild != NULL)
+				kill(AutoVacLauncherPMChild->pid, SIGUSR2);
 		}
 
 #ifdef HAVE_PTHREAD_IS_THREADED_NP
@@ -1748,7 +1717,7 @@ ServerLoop(void)
  * know whether a NORMAL connection might turn into a walsender.)
  */
 static CAC_state
-canAcceptConnections(int backend_type)
+canAcceptConnections(BackendType backend_type)
 {
 	CAC_state	result = CAC_OK;
 
@@ -1779,21 +1748,6 @@ canAcceptConnections(int backend_type)
 	if (!connsAllowed && backend_type == B_BACKEND)
 		return CAC_SHUTDOWN;	/* shutdown is pending */
 
-	/*
-	 * Don't start too many children.
-	 *
-	 * We allow more connections here than we can have backends because some
-	 * might still be authenticating; they might fail auth, or some existing
-	 * backend might exit before the auth cycle is completed.  The exact
-	 * MaxBackends limit is enforced when a new backend tries to join the
-	 * shared-inval backend array.
-	 *
-	 * The limit here must match the sizes of the per-child-process arrays;
-	 * see comments for MaxLivePostmasterChildren().
-	 */
-	if (CountChildren(BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND)) >= MaxLivePostmasterChildren())
-		result = CAC_TOOMANY;
-
 	return result;
 }
 
@@ -1961,26 +1915,6 @@ process_pm_reload_request(void)
 				(errmsg("received SIGHUP, reloading configuration files")));
 		ProcessConfigFile(PGC_SIGHUP);
 		SignalSomeChildren(SIGHUP, BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND));
-		if (StartupPID != 0)
-			signal_child(StartupPID, SIGHUP);
-		if (BgWriterPID != 0)
-			signal_child(BgWriterPID, SIGHUP);
-		if (CheckpointerPID != 0)
-			signal_child(CheckpointerPID, SIGHUP);
-		if (WalWriterPID != 0)
-			signal_child(WalWriterPID, SIGHUP);
-		if (WalReceiverPID != 0)
-			signal_child(WalReceiverPID, SIGHUP);
-		if (WalSummarizerPID != 0)
-			signal_child(WalSummarizerPID, SIGHUP);
-		if (AutoVacPID != 0)
-			signal_child(AutoVacPID, SIGHUP);
-		if (PgArchPID != 0)
-			signal_child(PgArchPID, SIGHUP);
-		if (SysLoggerPID != 0)
-			signal_child(SysLoggerPID, SIGHUP);
-		if (SlotSyncWorkerPID != 0)
-			signal_child(SlotSyncWorkerPID, SIGHUP);
 
 		/* Reload authentication config files too */
 		if (!load_hba())
@@ -2218,15 +2152,15 @@ process_pm_child_exit(void)
 
 	while ((pid = waitpid(-1, &exitstatus, WNOHANG)) > 0)
 	{
-		bool		found;
-		dlist_mutable_iter iter;
+		PMChild    *pmchild;
 
 		/*
 		 * Check if this child was a startup process.
 		 */
-		if (pid == StartupPID)
+		if (StartupPMChild && pid == StartupPMChild->pid)
 		{
-			StartupPID = 0;
+			FreePostmasterChildSlot(StartupPMChild);
+			StartupPMChild = NULL;
 
 			/*
 			 * Startup process exited in response to a shutdown request (or it
@@ -2337,9 +2271,10 @@ process_pm_child_exit(void)
 		 * one at the next iteration of the postmaster's main loop, if
 		 * necessary.  Any other exit condition is treated as a crash.
 		 */
-		if (pid == BgWriterPID)
+		if (BgWriterPMChild && pid == BgWriterPMChild->pid)
 		{
-			BgWriterPID = 0;
+			FreePostmasterChildSlot(BgWriterPMChild);
+			BgWriterPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("background writer process"));
@@ -2349,9 +2284,10 @@ process_pm_child_exit(void)
 		/*
 		 * Was it the checkpointer?
 		 */
-		if (pid == CheckpointerPID)
+		if (CheckpointerPMChild && pid == CheckpointerPMChild->pid)
 		{
-			CheckpointerPID = 0;
+			FreePostmasterChildSlot(CheckpointerPMChild);
+			CheckpointerPMChild = NULL;
 			if (EXIT_STATUS_0(exitstatus) && pmState == PM_SHUTDOWN)
 			{
 				/*
@@ -2371,14 +2307,14 @@ process_pm_child_exit(void)
 				Assert(Shutdown > NoShutdown);
 
 				/* Waken archiver for the last time */
-				if (PgArchPID != 0)
-					signal_child(PgArchPID, SIGUSR2);
+				if (PgArchPMChild != NULL)
+					signal_child(PgArchPMChild, SIGUSR2);
 
 				/*
 				 * Waken walsenders for the last time. No regular backends
 				 * should be around anymore.
 				 */
-				SignalSomeChildren(SIGUSR2, BACKEND_TYPE_ALL & (1 << B_DEAD_END_BACKEND));
+				SignalSomeChildren(SIGUSR2, (1 << B_WAL_SENDER));
 
 				pmState = PM_SHUTDOWN_2;
 			}
@@ -2400,9 +2336,10 @@ process_pm_child_exit(void)
 		 * new one at the next iteration of the postmaster's main loop, if
 		 * necessary.  Any other exit condition is treated as a crash.
 		 */
-		if (pid == WalWriterPID)
+		if (WalWriterPMChild && pid == WalWriterPMChild->pid)
 		{
-			WalWriterPID = 0;
+			FreePostmasterChildSlot(WalWriterPMChild);
+			WalWriterPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("WAL writer process"));
@@ -2415,9 +2352,10 @@ process_pm_child_exit(void)
 		 * backends.  (If we need a new wal receiver, we'll start one at the
 		 * next iteration of the postmaster's main loop.)
 		 */
-		if (pid == WalReceiverPID)
+		if (WalReceiverPMChild && pid == WalReceiverPMChild->pid)
 		{
-			WalReceiverPID = 0;
+			FreePostmasterChildSlot(WalReceiverPMChild);
+			WalReceiverPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("WAL receiver process"));
@@ -2429,9 +2367,10 @@ process_pm_child_exit(void)
 		 * a new one at the next iteration of the postmaster's main loop, if
 		 * necessary.  Any other exit condition is treated as a crash.
 		 */
-		if (pid == WalSummarizerPID)
+		if (WalSummarizerPMChild && pid == WalSummarizerPMChild->pid)
 		{
-			WalSummarizerPID = 0;
+			FreePostmasterChildSlot(WalSummarizerPMChild);
+			WalSummarizerPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("WAL summarizer process"));
@@ -2444,9 +2383,10 @@ process_pm_child_exit(void)
 		 * loop, if necessary.  Any other exit condition is treated as a
 		 * crash.
 		 */
-		if (pid == AutoVacPID)
+		if (AutoVacLauncherPMChild && pid == AutoVacLauncherPMChild->pid)
 		{
-			AutoVacPID = 0;
+			FreePostmasterChildSlot(AutoVacLauncherPMChild);
+			AutoVacLauncherPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("autovacuum launcher process"));
@@ -2459,9 +2399,10 @@ process_pm_child_exit(void)
 		 * and just try to start a new one on the next cycle of the
 		 * postmaster's main loop, to retry archiving remaining files.
 		 */
-		if (pid == PgArchPID)
+		if (PgArchPMChild && pid == PgArchPMChild->pid)
 		{
-			PgArchPID = 0;
+			FreePostmasterChildSlot(PgArchPMChild);
+			PgArchPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("archiver process"));
@@ -2469,11 +2410,15 @@ process_pm_child_exit(void)
 		}
 
 		/* Was it the system logger?  If so, try to start a new one */
-		if (pid == SysLoggerPID)
+		if (SysLoggerPMChild && pid == SysLoggerPMChild->pid)
 		{
-			SysLoggerPID = 0;
 			/* for safety's sake, launch new logger *first* */
-			SysLoggerPID = SysLogger_Start();
+			SysLoggerPMChild->pid = SysLogger_Start();
+			if (SysLoggerPMChild->pid == 0)
+			{
+				FreePostmasterChildSlot(SysLoggerPMChild);
+				SysLoggerPMChild = NULL;
+			}
 			if (!EXIT_STATUS_0(exitstatus))
 				LogChildExit(LOG, _("system logger process"),
 							 pid, exitstatus);
@@ -2487,9 +2432,10 @@ process_pm_child_exit(void)
 		 * start a new one at the next iteration of the postmaster's main
 		 * loop, if necessary. Any other exit condition is treated as a crash.
 		 */
-		if (pid == SlotSyncWorkerPID)
+		if (SlotSyncWorkerPMChild && pid == SlotSyncWorkerPMChild->pid)
 		{
-			SlotSyncWorkerPID = 0;
+			FreePostmasterChildSlot(SlotSyncWorkerPMChild);
+			SlotSyncWorkerPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("slot sync worker process"));
@@ -2499,25 +2445,17 @@ process_pm_child_exit(void)
 		/*
 		 * Was it a backend or a background worker?
 		 */
-		found = false;
-		dlist_foreach_modify(iter, &BackendList)
+		pmchild = FindPostmasterChildByPid(pid);
+		if (pmchild)
 		{
-			Backend    *bp = dlist_container(Backend, elem, iter.cur);
-
-			if (bp->pid == pid)
-			{
-				dlist_delete(iter.cur);
-				CleanupBackend(bp, exitstatus);
-				found = true;
-				break;
-			}
+			CleanupBackend(pmchild, exitstatus);
 		}
 
 		/*
 		 * We don't know anything about this child process.  That's highly
 		 * unexpected, as we do track all the child processes that we fork.
 		 */
-		if (!found)
+		else
 		{
 			if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
 				HandleChildCrash(pid, exitstatus, _("untracked child process"));
@@ -2540,15 +2478,24 @@ process_pm_child_exit(void)
  * already been unlinked from BackendList, but we will free it here.
  */
 static void
-CleanupBackend(Backend *bp,
+CleanupBackend(PMChild *bp,
 			   int exitstatus)	/* child's exit status. */
 {
 	char		namebuf[MAXPGPATH];
 	char	   *procname;
 	bool		crashed = false;
 	bool		logged = false;
+	pid_t		bp_pid;
+	bool		bp_bgworker_notify;
+	BackendType bp_bkend_type;
+	RegisteredBgWorker *rw;
 
 	/* Construct a process name for log message */
+
+	/*
+	 * FIXME: use GetBackendTypeDesc here? How does the localization of that
+	 * work?
+	 */
 	if (bp->bkend_type == B_DEAD_END_BACKEND)
 	{
 		procname = _("dead end backend");
@@ -2589,25 +2536,28 @@ CleanupBackend(Backend *bp,
 #endif
 
 	/*
-	 * If the process attached to shared memory, check that it detached
-	 * cleanly.
+	 * Release the PMChild entry.
+	 *
+	 * If the process attached to shared memory, this also checks that it
+	 * detached cleanly.
 	 */
-	if (bp->bkend_type != B_DEAD_END_BACKEND)
+	bp_pid = bp->pid;
+	bp_bgworker_notify = bp->bgworker_notify;
+	bp_bkend_type = bp->bkend_type;
+	rw = bp->rw;
+	if (!FreePostmasterChildSlot(bp))
 	{
-		if (!ReleasePostmasterChildSlot(bp->child_slot))
-		{
-			/*
-			 * Uh-oh, the child failed to clean itself up.  Treat as a crash
-			 * after all.
-			 */
-			crashed = true;
-		}
+		/*
+		 * Uh-oh, the child failed to clean itself up.  Treat as a crash after
+		 * all.
+		 */
+		crashed = true;
 	}
+	bp = NULL;
 
 	if (crashed)
 	{
-		HandleChildCrash(bp->pid, exitstatus, namebuf);
-		pfree(bp);
+		HandleChildCrash(bp_pid, exitstatus, namebuf);
 		return;
 	}
 
@@ -2618,16 +2568,14 @@ CleanupBackend(Backend *bp,
 	 * gets skipped in the (probably very common) case where the backend has
 	 * never requested any such notifications.
 	 */
-	if (bp->bgworker_notify)
-		BackgroundWorkerStopNotifications(bp->pid);
+	if (bp_bgworker_notify)
+		BackgroundWorkerStopNotifications(bp_pid);
 
 	/*
 	 * If it was a background worker, also update its RegisteredWorker entry.
 	 */
-	if (bp->bkend_type == B_BG_WORKER)
+	if (bp_bkend_type == B_BG_WORKER)
 	{
-		RegisteredBgWorker *rw = bp->rw;
-
 		if (!EXIT_STATUS_0(exitstatus))
 		{
 			/* Record timestamp, so we know when to restart the worker. */
@@ -2646,7 +2594,7 @@ CleanupBackend(Backend *bp,
 		if (!logged)
 		{
 			LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG,
-						 procname, bp->pid, exitstatus);
+						 procname, bp_pid, exitstatus);
 			logged = true;
 		}
 
@@ -2655,9 +2603,7 @@ CleanupBackend(Backend *bp,
 	}
 
 	if (!logged)
-		LogChildExit(DEBUG2, procname, bp->pid, exitstatus);
-
-	pfree(bp);
+		LogChildExit(DEBUG2, procname, bp_pid, exitstatus);
 }
 
 /*
@@ -2697,9 +2643,16 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 	{
 		dlist_iter	iter;
 
-		dlist_foreach(iter, &BackendList)
+		dlist_foreach(iter, &ActiveChildList)
 		{
-			Backend    *bp = dlist_container(Backend, elem, iter.cur);
+			PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
+
+			/* We do NOT restart the syslogger */
+			if (bp == SysLoggerPMChild)
+				continue;
+
+			if (bp == StartupPMChild)
+				StartupStatus = STARTUP_SIGNALED;
 
 			/*
 			 * This backend is still alive.  Unless we did so already, tell it
@@ -2708,48 +2661,8 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 			 * We could exclude dead_end children here, but at least when
 			 * sending SIGABRT it seems better to include them.
 			 */
-			sigquit_child(bp->pid);
+			sigquit_child(bp);
 		}
-
-		if (StartupPID != 0)
-		{
-			sigquit_child(StartupPID);
-			StartupStatus = STARTUP_SIGNALED;
-		}
-
-		/* Take care of the bgwriter too */
-		if (BgWriterPID != 0)
-			sigquit_child(BgWriterPID);
-
-		/* Take care of the checkpointer too */
-		if (CheckpointerPID != 0)
-			sigquit_child(CheckpointerPID);
-
-		/* Take care of the walwriter too */
-		if (WalWriterPID != 0)
-			sigquit_child(WalWriterPID);
-
-		/* Take care of the walreceiver too */
-		if (WalReceiverPID != 0)
-			sigquit_child(WalReceiverPID);
-
-		/* Take care of the walsummarizer too */
-		if (WalSummarizerPID != 0)
-			sigquit_child(WalSummarizerPID);
-
-		/* Take care of the autovacuum launcher too */
-		if (AutoVacPID != 0)
-			sigquit_child(AutoVacPID);
-
-		/* Take care of the archiver too */
-		if (PgArchPID != 0)
-			sigquit_child(PgArchPID);
-
-		/* Take care of the slot sync worker too */
-		if (SlotSyncWorkerPID != 0)
-			sigquit_child(SlotSyncWorkerPID);
-
-		/* We do NOT restart the syslogger */
 	}
 
 	if (Shutdown != ImmediateShutdown)
@@ -2864,6 +2777,8 @@ PostmasterStateMachine(void)
 	 */
 	if (pmState == PM_STOP_BACKENDS)
 	{
+		uint32		targetMask;
+
 		/*
 		 * Forget any pending requests for background workers, since we're no
 		 * longer willing to launch any new workers.  (If additional requests
@@ -2871,29 +2786,27 @@ PostmasterStateMachine(void)
 		 */
 		ForgetUnstartedBackgroundWorkers();
 
-		/* Signal all backend children except walsenders and dead-end backends */
-		SignalSomeChildren(SIGTERM,
-						   BACKEND_TYPE_ALL & ~(1 << B_WAL_SENDER | 1 << B_DEAD_END_BACKEND));
+		/* Signal all backend children except walsenders */
+		/* dead-end children are not signalled yet */
+		targetMask = (1 << B_BACKEND);
+		targetMask |= (1 << B_BG_WORKER);
+
 		/* and the autovac launcher too */
-		if (AutoVacPID != 0)
-			signal_child(AutoVacPID, SIGTERM);
+		targetMask |= (1 << B_AUTOVAC_LAUNCHER);
 		/* and the bgwriter too */
-		if (BgWriterPID != 0)
-			signal_child(BgWriterPID, SIGTERM);
+		targetMask |= (1 << B_BG_WRITER);
 		/* and the walwriter too */
-		if (WalWriterPID != 0)
-			signal_child(WalWriterPID, SIGTERM);
+		targetMask |= (1 << B_WAL_WRITER);
 		/* If we're in recovery, also stop startup and walreceiver procs */
-		if (StartupPID != 0)
-			signal_child(StartupPID, SIGTERM);
-		if (WalReceiverPID != 0)
-			signal_child(WalReceiverPID, SIGTERM);
-		if (WalSummarizerPID != 0)
-			signal_child(WalSummarizerPID, SIGTERM);
-		if (SlotSyncWorkerPID != 0)
-			signal_child(SlotSyncWorkerPID, SIGTERM);
+		targetMask |= (1 << B_STARTUP);
+		targetMask |= (1 << B_WAL_RECEIVER);
+
+		targetMask |= (1 << B_WAL_SUMMARIZER);
+		targetMask |= (1 << B_SLOTSYNC_WORKER);
 		/* checkpointer, archiver, stats, and syslogger may continue for now */
 
+		SignalSomeChildren(SIGTERM, targetMask);
+
 		/* Now transition to PM_WAIT_BACKENDS state to wait for them to die */
 		pmState = PM_WAIT_BACKENDS;
 	}
@@ -2915,16 +2828,14 @@ PostmasterStateMachine(void)
 		 * here. Walsenders and archiver are also disregarded, they will be
 		 * terminated later after writing the checkpoint record.
 		 */
-		if (CountChildren(BACKEND_TYPE_ALL & ~(1 << B_WAL_SENDER | 1 << B_DEAD_END_BACKEND)) == 0 &&
-			StartupPID == 0 &&
-			WalReceiverPID == 0 &&
-			WalSummarizerPID == 0 &&
-			BgWriterPID == 0 &&
-			(CheckpointerPID == 0 ||
-			 (!FatalError && Shutdown < ImmediateShutdown)) &&
-			WalWriterPID == 0 &&
-			AutoVacPID == 0 &&
-			SlotSyncWorkerPID == 0)
+		uint32		remaining;
+
+		remaining = (1 << B_WAL_SENDER) | (1 << B_ARCHIVER) | (1 << B_LOGGER);
+		remaining |= (1 << B_DEAD_END_BACKEND);
+		if (!FatalError && Shutdown < ImmediateShutdown)
+			remaining |= (1 << B_CHECKPOINTER);
+
+		if (CountChildren(BACKEND_TYPE_ALL & ~remaining) == 0)
 		{
 			if (Shutdown >= ImmediateShutdown || FatalError)
 			{
@@ -2950,12 +2861,12 @@ PostmasterStateMachine(void)
 				 */
 				Assert(Shutdown > NoShutdown);
 				/* Start the checkpointer if not running */
-				if (CheckpointerPID == 0)
-					CheckpointerPID = StartChildProcess(B_CHECKPOINTER);
+				if (CheckpointerPMChild == NULL)
+					CheckpointerPMChild = StartChildProcess(B_CHECKPOINTER);
 				/* And tell it to shut down */
-				if (CheckpointerPID != 0)
+				if (CheckpointerPMChild != NULL)
 				{
-					signal_child(CheckpointerPID, SIGUSR2);
+					signal_child(CheckpointerPMChild, SIGUSR2);
 					pmState = PM_SHUTDOWN;
 				}
 				else
@@ -2976,8 +2887,8 @@ PostmasterStateMachine(void)
 
 					/* Kill the walsenders and archiver, too */
 					SignalSomeChildren(SIGQUIT, BACKEND_TYPE_ALL);
-					if (PgArchPID != 0)
-						signal_child(PgArchPID, SIGQUIT);
+					if (PgArchPMChild != NULL)
+						signal_child(PgArchPMChild, SIGQUIT);
 				}
 			}
 		}
@@ -2991,7 +2902,10 @@ PostmasterStateMachine(void)
 		 * left by now anyway; what we're really waiting for is walsenders and
 		 * archiver.
 		 */
-		if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND)) == 0)
+		uint32		remaining;
+
+		remaining = (1 << B_LOGGER) | (1 << B_DEAD_END_BACKEND);
+		if (CountChildren(BACKEND_TYPE_ALL & ~remaining) == 0)
 		{
 			ConfigurePostmasterWaitSet(false);
 			SignalSomeChildren(SIGTERM, 1 << B_DEAD_END_BACKEND);
@@ -3013,17 +2927,20 @@ PostmasterStateMachine(void)
 		 * normal state transition leading up to PM_WAIT_DEAD_END, or during
 		 * FatalError processing.
 		 */
-		if (dlist_is_empty(&BackendList) && PgArchPID == 0)
+		if (dlist_is_empty(&ActiveChildList) ||
+			(SysLoggerPMChild != NULL &&
+			 dlist_head_node(&ActiveChildList) == &SysLoggerPMChild->elem &&
+			 dlist_tail_node(&ActiveChildList) == &SysLoggerPMChild->elem))
 		{
 			/* These other guys should be dead already */
-			Assert(StartupPID == 0);
-			Assert(WalReceiverPID == 0);
-			Assert(WalSummarizerPID == 0);
-			Assert(BgWriterPID == 0);
-			Assert(CheckpointerPID == 0);
-			Assert(WalWriterPID == 0);
-			Assert(AutoVacPID == 0);
-			Assert(SlotSyncWorkerPID == 0);
+			Assert(StartupPMChild == NULL);
+			Assert(WalReceiverPMChild == NULL);
+			Assert(WalSummarizerPMChild == NULL);
+			Assert(BgWriterPMChild == NULL);
+			Assert(CheckpointerPMChild == NULL);
+			Assert(WalWriterPMChild == NULL);
+			Assert(AutoVacLauncherPMChild == NULL);
+			Assert(SlotSyncWorkerPMChild == NULL);
 			/* syslogger is not considered here */
 			pmState = PM_NO_CHILDREN;
 		}
@@ -3106,8 +3023,8 @@ PostmasterStateMachine(void)
 		/* re-create shared memory and semaphores */
 		CreateSharedMemoryAndSemaphores();
 
-		StartupPID = StartChildProcess(B_STARTUP);
-		Assert(StartupPID != 0);
+		StartupPMChild = StartChildProcess(B_STARTUP);
+		Assert(StartupPMChild != NULL);
 		StartupStatus = STARTUP_RUNNING;
 		pmState = PM_STARTUP;
 		/* crash recovery started, reset SIGKILL flag */
@@ -3130,8 +3047,21 @@ static void
 LaunchMissingBackgroundProcesses(void)
 {
 	/* Syslogger is active in all states */
-	if (SysLoggerPID == 0 && Logging_collector)
-		SysLoggerPID = SysLogger_Start();
+	if (SysLoggerPMChild == NULL && Logging_collector)
+	{
+		SysLoggerPMChild = AssignPostmasterChildSlot(B_LOGGER);
+		if (!SysLoggerPMChild)
+			elog(LOG, "no postmaster child slot available for syslogger");
+		else
+		{
+			SysLoggerPMChild->pid = SysLogger_Start();
+			if (SysLoggerPMChild->pid == 0)
+			{
+				FreePostmasterChildSlot(SysLoggerPMChild);
+				SysLoggerPMChild = NULL;
+			}
+		}
+	}
 
 	/*
 	 * The checkpointer and the background writer are active from the start,
@@ -3144,30 +3074,30 @@ LaunchMissingBackgroundProcesses(void)
 	if (pmState == PM_RUN || pmState == PM_RECOVERY ||
 		pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
 	{
-		if (CheckpointerPID == 0)
-			CheckpointerPID = StartChildProcess(B_CHECKPOINTER);
-		if (BgWriterPID == 0)
-			BgWriterPID = StartChildProcess(B_BG_WRITER);
+		if (CheckpointerPMChild == NULL)
+			CheckpointerPMChild = StartChildProcess(B_CHECKPOINTER);
+		if (BgWriterPMChild == NULL)
+			BgWriterPMChild = StartChildProcess(B_BG_WRITER);
 	}
 
 	/*
 	 * WAL writer is needed only in normal operation (else we cannot be
 	 * writing any new WAL).
 	 */
-	if (WalWriterPID == 0 && pmState == PM_RUN)
-		WalWriterPID = StartChildProcess(B_WAL_WRITER);
+	if (WalWriterPMChild == NULL && pmState == PM_RUN)
+		WalWriterPMChild = StartChildProcess(B_WAL_WRITER);
 
 	/*
 	 * We don't want autovacuum to run in binary upgrade mode because
 	 * autovacuum might update relfrozenxid for empty tables before the
 	 * physical files are put in place.
 	 */
-	if (!IsBinaryUpgrade && AutoVacPID == 0 &&
+	if (!IsBinaryUpgrade && AutoVacLauncherPMChild == NULL &&
 		(AutoVacuumingActive() || start_autovac_launcher) &&
 		pmState == PM_RUN)
 	{
-		AutoVacPID = StartChildProcess(B_AUTOVAC_LAUNCHER);
-		if (AutoVacPID != 0)
+		AutoVacLauncherPMChild = StartChildProcess(B_AUTOVAC_LAUNCHER);
+		if (AutoVacLauncherPMChild != NULL)
 			start_autovac_launcher = false; /* signal processed */
 	}
 
@@ -3175,11 +3105,11 @@ LaunchMissingBackgroundProcesses(void)
 	 * If WAL archiving is enabled always, we are allowed to start archiver
 	 * even during recovery.
 	 */
-	if (PgArchPID == 0 &&
+	if (PgArchPMChild == NULL &&
 		((XLogArchivingActive() && pmState == PM_RUN) ||
 		 (XLogArchivingAlways() && (pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY))) &&
 		PgArchCanRestart())
-		PgArchPID = StartChildProcess(B_ARCHIVER);
+		PgArchPMChild = StartChildProcess(B_ARCHIVER);
 
 	/*
 	 * If we need to start a slot sync worker, try to do that now
@@ -3189,10 +3119,10 @@ LaunchMissingBackgroundProcesses(void)
 	 * configured correctly, and it is the first time of worker's launch, or
 	 * enough time has passed since the worker was launched last.
 	 */
-	if (SlotSyncWorkerPID == 0 && pmState == PM_HOT_STANDBY &&
+	if (SlotSyncWorkerPMChild == NULL && pmState == PM_HOT_STANDBY &&
 		Shutdown <= SmartShutdown && sync_replication_slots &&
 		ValidateSlotSyncParams(LOG) && SlotSyncWorkerCanRestart())
-		SlotSyncWorkerPID = StartChildProcess(B_SLOTSYNC_WORKER);
+		SlotSyncWorkerPMChild = StartChildProcess(B_SLOTSYNC_WORKER);
 
 	/*
 	 * If we need to start a WAL receiver, try to do that now
@@ -3208,23 +3138,23 @@ LaunchMissingBackgroundProcesses(void)
 	 */
 	if (WalReceiverRequested)
 	{
-		if (WalReceiverPID == 0 &&
+		if (WalReceiverPMChild == NULL &&
 			(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
 			 pmState == PM_HOT_STANDBY) &&
 			Shutdown <= SmartShutdown)
 		{
-			WalReceiverPID = StartChildProcess(B_WAL_RECEIVER);
-			if (WalReceiverPID != 0)
+			WalReceiverPMChild = StartChildProcess(B_WAL_RECEIVER);
+			if (WalReceiverPMChild != NULL)
 				WalReceiverRequested = false;
 			/* else leave the flag set, so we'll try again later */
 		}
 	}
 
 	/* If we need to start a WAL summarizer, try to do that now */
-	if (summarize_wal && WalSummarizerPID == 0 &&
+	if (summarize_wal && WalSummarizerPMChild == NULL &&
 		(pmState == PM_RUN || pmState == PM_HOT_STANDBY) &&
 		Shutdown <= SmartShutdown)
-		WalSummarizerPID = StartChildProcess(B_WAL_SUMMARIZER);
+		WalSummarizerPMChild = StartChildProcess(B_WAL_SUMMARIZER);
 
 	/* Get other worker processes running, if needed */
 	if (StartWorkerNeeded || HaveCrashedWorker)
@@ -3248,8 +3178,14 @@ LaunchMissingBackgroundProcesses(void)
  * child twice will not cause any problems.
  */
 static void
-signal_child(pid_t pid, int signal)
+signal_child(PMChild *pmchild, int signal)
 {
+	pid_t		pid;
+
+	if (pmchild == NULL || pmchild->pid == 0)
+		return;
+	pid = pmchild->pid;
+
 	if (kill(pid, signal) < 0)
 		elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) pid, signal);
 #ifdef HAVE_SETSID
@@ -3278,13 +3214,13 @@ signal_child(pid_t pid, int signal)
  * to use SIGABRT to collect per-child core dumps.
  */
 static void
-sigquit_child(pid_t pid)
+sigquit_child(PMChild *pmchild)
 {
 	ereport(DEBUG2,
 			(errmsg_internal("sending %s to process %d",
 							 (send_abort_for_crash ? "SIGABRT" : "SIGQUIT"),
-							 (int) pid)));
-	signal_child(pid, (send_abort_for_crash ? SIGABRT : SIGQUIT));
+							 (int) pmchild->pid)));
+	signal_child(pmchild, (send_abort_for_crash ? SIGABRT : SIGQUIT));
 }
 
 /*
@@ -3296,13 +3232,13 @@ SignalSomeChildren(int signal, uint32 targetMask)
 	dlist_iter	iter;
 	bool		signaled = false;
 
-	dlist_foreach(iter, &BackendList)
+	dlist_foreach(iter, &ActiveChildList)
 	{
-		Backend    *bp = dlist_container(Backend, elem, iter.cur);
+		PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
 
 		/*
-		 * Since target == BACKEND_TYPE_ALL is the most common case, we test
-		 * it first and avoid touching shared memory for every child.
+		 * Since targetMask == BACKEND_TYPE_ALL is the most common case, we
+		 * test it first and avoid touching shared memory for every child.
 		 */
 		if (targetMask != BACKEND_TYPE_ALL)
 		{
@@ -3321,7 +3257,7 @@ SignalSomeChildren(int signal, uint32 targetMask)
 		ereport(DEBUG4,
 				(errmsg_internal("sending signal %d to %s process %d",
 								 signal, GetBackendTypeDesc(bp->bkend_type), (int) bp->pid)));
-		signal_child(bp->pid, signal);
+		signal_child(bp, signal);
 		signaled = true;
 	}
 	return signaled;
@@ -3334,29 +3270,12 @@ SignalSomeChildren(int signal, uint32 targetMask)
 static void
 TerminateChildren(int signal)
 {
-	SignalSomeChildren(signal, BACKEND_TYPE_ALL);
-	if (StartupPID != 0)
+	SignalSomeChildren(signal, BACKEND_TYPE_ALL & ~(1 << B_LOGGER));
+	if (StartupPMChild != NULL)
 	{
-		signal_child(StartupPID, signal);
 		if (signal == SIGQUIT || signal == SIGKILL || signal == SIGABRT)
 			StartupStatus = STARTUP_SIGNALED;
 	}
-	if (BgWriterPID != 0)
-		signal_child(BgWriterPID, signal);
-	if (CheckpointerPID != 0)
-		signal_child(CheckpointerPID, signal);
-	if (WalWriterPID != 0)
-		signal_child(WalWriterPID, signal);
-	if (WalReceiverPID != 0)
-		signal_child(WalReceiverPID, signal);
-	if (WalSummarizerPID != 0)
-		signal_child(WalSummarizerPID, signal);
-	if (AutoVacPID != 0)
-		signal_child(AutoVacPID, signal);
-	if (PgArchPID != 0)
-		signal_child(PgArchPID, signal);
-	if (SlotSyncWorkerPID != 0)
-		signal_child(SlotSyncWorkerPID, signal);
 }
 
 /*
@@ -3369,45 +3288,45 @@ TerminateChildren(int signal)
 static int
 BackendStartup(ClientSocket *client_sock)
 {
-	Backend    *bn;				/* for backend cleanup */
+	PMChild    *bn = NULL;
 	pid_t		pid;
 	BackendStartupData startup_data;
+	CAC_state	cac;
 
-	/*
-	 * Create backend data structure.  Better before the fork() so we can
-	 * handle failure cleanly.
-	 */
-	bn = (Backend *) palloc_extended(sizeof(Backend), MCXT_ALLOC_NO_OOM);
+	cac = canAcceptConnections(B_BACKEND);
+	if (cac == CAC_OK)
+	{
+		bn = AssignPostmasterChildSlot(B_BACKEND);
+		if (!bn)
+		{
+			/*
+			 * Too many regular child processes; launch a dead-end child
+			 * process instead.
+			 */
+			cac = CAC_TOOMANY;
+		}
+	}
 	if (!bn)
 	{
-		ereport(LOG,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory")));
-		return STATUS_ERROR;
+		bn = AllocDeadEndChild();
+		if (!bn)
+		{
+			ereport(LOG,
+					(errcode(ERRCODE_OUT_OF_MEMORY),
+					 errmsg("out of memory")));
+			return STATUS_ERROR;
+		}
 	}
 
 	/* Pass down canAcceptConnections state */
-	startup_data.canAcceptConnections = canAcceptConnections(B_BACKEND);
+	startup_data.canAcceptConnections = cac;
 	bn->rw = NULL;
 
-	/*
-	 * Unless it's a dead_end child, assign it a child slot number
-	 */
-	if (startup_data.canAcceptConnections == CAC_OK)
-	{
-		bn->bkend_type = B_BACKEND; /* Can change later to WALSND */
-		bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
-	}
-	else
-	{
-		bn->bkend_type = B_DEAD_END_BACKEND;
-		bn->child_slot = 0;
-	}
-
 	/* Hasn't asked to be notified about any bgworkers yet */
 	bn->bgworker_notify = false;
 
-	pid = postmaster_child_launch(B_BACKEND,
+	MyPMChildSlot = bn->child_slot;
+	pid = postmaster_child_launch(bn->bkend_type,
 								  (char *) &startup_data, sizeof(startup_data),
 								  client_sock);
 	if (pid < 0)
@@ -3415,9 +3334,7 @@ BackendStartup(ClientSocket *client_sock)
 		/* in parent, fork failed */
 		int			save_errno = errno;
 
-		if (bn->child_slot != 0)
-			(void) ReleasePostmasterChildSlot(bn->child_slot);
-		pfree(bn);
+		(void) FreePostmasterChildSlot(bn);
 		errno = save_errno;
 		ereport(LOG,
 				(errmsg("could not fork new process for connection: %m")));
@@ -3435,7 +3352,6 @@ BackendStartup(ClientSocket *client_sock)
 	 * of backends.
 	 */
 	bn->pid = pid;
-	dlist_push_head(&BackendList, &bn->elem);
 
 	return STATUS_OK;
 }
@@ -3534,9 +3450,9 @@ process_pm_pmsignal(void)
 		 * Start the archiver if we're responsible for (re-)archiving received
 		 * files.
 		 */
-		Assert(PgArchPID == 0);
+		Assert(PgArchPMChild == NULL);
 		if (XLogArchivingAlways())
-			PgArchPID = StartChildProcess(B_ARCHIVER);
+			PgArchPMChild = StartChildProcess(B_ARCHIVER);
 
 		/*
 		 * If we aren't planning to enter hot standby mode later, treat
@@ -3582,16 +3498,16 @@ process_pm_pmsignal(void)
 	}
 
 	/* Tell syslogger to rotate logfile if requested */
-	if (SysLoggerPID != 0)
+	if (SysLoggerPMChild != NULL)
 	{
 		if (CheckLogrotateSignal())
 		{
-			signal_child(SysLoggerPID, SIGUSR1);
+			signal_child(SysLoggerPMChild, SIGUSR1);
 			RemoveLogrotateSignalFiles();
 		}
 		else if (CheckPostmasterSignal(PMSIGNAL_ROTATE_LOGFILE))
 		{
-			signal_child(SysLoggerPID, SIGUSR1);
+			signal_child(SysLoggerPMChild, SIGUSR1);
 		}
 	}
 
@@ -3638,7 +3554,7 @@ process_pm_pmsignal(void)
 		PostmasterStateMachine();
 	}
 
-	if (StartupPID != 0 &&
+	if (StartupPMChild != NULL &&
 		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
 		 pmState == PM_HOT_STANDBY) &&
 		CheckPromoteSignal())
@@ -3649,7 +3565,7 @@ process_pm_pmsignal(void)
 		 * Leave the promote signal file in place and let the Startup process
 		 * do the unlink.
 		 */
-		signal_child(StartupPID, SIGUSR2);
+		signal_child(StartupPMChild, SIGUSR2);
 	}
 }
 
@@ -3676,13 +3592,13 @@ CountChildren(uint32 targetMask)
 	dlist_iter	iter;
 	int			cnt = 0;
 
-	dlist_foreach(iter, &BackendList)
+	dlist_foreach(iter, &ActiveChildList)
 	{
-		Backend    *bp = dlist_container(Backend, elem, iter.cur);
+		PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
 
 		/*
-		 * Since target == BACKEND_TYPE_ALL is the most common case, we test
-		 * it first and avoid touching shared memory for every child.
+		 * Since targetMask == BACKEND_TYPE_ALL is the most common case, we
+		 * test it first and avoid touching shared memory for every child.
 		 */
 		if (targetMask != BACKEND_TYPE_ALL)
 		{
@@ -3713,15 +3629,33 @@ CountChildren(uint32 targetMask)
  * Return value of StartChildProcess is subprocess' PID, or 0 if failed
  * to start subprocess.
  */
-static pid_t
+static PMChild *
 StartChildProcess(BackendType type)
 {
+	PMChild    *pmchild;
 	pid_t		pid;
 
+	pmchild = AssignPostmasterChildSlot(type);
+	if (!pmchild)
+	{
+		if (type == B_AUTOVAC_WORKER)
+			ereport(LOG,
+					(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+					 errmsg("no slot available for new autovacuum worker process")));
+		else
+		{
+			/* shouldn't happen because we allocate enough slots */
+			elog(LOG, "no postmaster child slot available for aux process");
+		}
+		return NULL;
+	}
+
+	MyPMChildSlot = pmchild->child_slot;
 	pid = postmaster_child_launch(type, NULL, 0, NULL);
 	if (pid < 0)
 	{
 		/* in parent, fork failed */
+		FreePostmasterChildSlot(pmchild);
 		ereport(LOG,
 				(errmsg("could not fork \"%s\" process: %m", PostmasterChildName(type))));
 
@@ -3731,13 +3665,14 @@ StartChildProcess(BackendType type)
 		 */
 		if (type == B_STARTUP)
 			ExitPostmaster(1);
-		return 0;
+		return NULL;
 	}
 
 	/*
 	 * in parent, successful fork
 	 */
-	return pid;
+	pmchild->pid = pid;
+	return pmchild;
 }
 
 /*
@@ -3752,7 +3687,7 @@ StartChildProcess(BackendType type)
 static void
 StartAutovacuumWorker(void)
 {
-	Backend    *bn;
+	PMChild    *bn;
 
 	/*
 	 * If not in condition to run a process, don't try, but handle it like a
@@ -3763,34 +3698,20 @@ StartAutovacuumWorker(void)
 	 */
 	if (canAcceptConnections(B_AUTOVAC_WORKER) == CAC_OK)
 	{
-		bn = (Backend *) palloc_extended(sizeof(Backend), MCXT_ALLOC_NO_OOM);
+		bn = StartChildProcess(B_AUTOVAC_WORKER);
 		if (bn)
 		{
-			/* Autovac workers need a child slot */
-			bn->bkend_type = B_AUTOVAC_WORKER;
-			bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
 			bn->bgworker_notify = false;
 			bn->rw = NULL;
-
-			bn->pid = StartChildProcess(B_AUTOVAC_WORKER);
-			if (bn->pid > 0)
-			{
-				dlist_push_head(&BackendList, &bn->elem);
-				/* all OK */
-				return;
-			}
-
+			return;
+		}
+		else
+		{
 			/*
 			 * fork failed, fall through to report -- actual error message was
 			 * logged by StartChildProcess
 			 */
-			(void) ReleasePostmasterChildSlot(bn->child_slot);
-			pfree(bn);
 		}
-		else
-			ereport(LOG,
-					(errcode(ERRCODE_OUT_OF_MEMORY),
-					 errmsg("out of memory")));
 	}
 
 	/*
@@ -3802,7 +3723,7 @@ StartAutovacuumWorker(void)
 	 * quick succession between the autovac launcher and postmaster in case
 	 * things get ugly.
 	 */
-	if (AutoVacPID != 0)
+	if (AutoVacLauncherPMChild != NULL)
 	{
 		AutoVacWorkerFailed();
 		avlauncher_needs_signal = true;
@@ -3846,23 +3767,6 @@ CreateOptsFile(int argc, char *argv[], char *fullprogname)
 }
 
 
-/*
- * MaxLivePostmasterChildren
- *
- * This reports the number of entries needed in the per-child-process array
- * (PMChildFlags).  It includes regular backends, autovac workers, walsenders
- * and background workers, but not special children nor dead_end children.
- * This allows the array to have a fixed maximum size, to wit the same
- * too-many-children limit enforced by canAcceptConnections().  The exact value
- * isn't too critical as long as it's more than MaxBackends.
- */
-int
-MaxLivePostmasterChildren(void)
-{
-	return 2 * (MaxConnections + autovacuum_max_workers + 1 +
-				max_wal_senders + max_worker_processes);
-}
-
 /*
  * Start a new bgworker.
  * Starting time conditions must have been checked already.
@@ -3875,7 +3779,7 @@ MaxLivePostmasterChildren(void)
 static bool
 do_start_bgworker(RegisteredBgWorker *rw)
 {
-	Backend    *bn;
+	PMChild    *bn;
 	pid_t		worker_pid;
 
 	Assert(rw->rw_pid == 0);
@@ -3902,6 +3806,7 @@ do_start_bgworker(RegisteredBgWorker *rw)
 			(errmsg_internal("starting background worker process \"%s\"",
 							 rw->rw_worker.bgw_name)));
 
+	MyPMChildSlot = bn->child_slot;
 	worker_pid = postmaster_child_launch(B_BG_WORKER, (char *) &rw->rw_worker, sizeof(BackgroundWorker), NULL);
 	if (worker_pid == -1)
 	{
@@ -3909,8 +3814,7 @@ do_start_bgworker(RegisteredBgWorker *rw)
 		ereport(LOG,
 				(errmsg("could not fork background worker process: %m")));
 		/* undo what assign_backendlist_entry did */
-		ReleasePostmasterChildSlot(bn->child_slot);
-		pfree(bn);
+		FreePostmasterChildSlot(bn);
 
 		/* mark entry as crashed, so we'll try again later */
 		rw->rw_crashed_at = GetCurrentTimestamp();
@@ -3921,8 +3825,6 @@ do_start_bgworker(RegisteredBgWorker *rw)
 	rw->rw_pid = worker_pid;
 	bn->pid = rw->rw_pid;
 	ReportBackgroundWorkerPID(rw);
-	/* add new worker to lists of backends */
-	dlist_push_head(&BackendList, &bn->elem);
 	return true;
 }
 
@@ -3970,17 +3872,13 @@ bgworker_should_start_now(BgWorkerStartTime start_time)
  *
  * On failure, return NULL.
  */
-static Backend *
+static PMChild *
 assign_backendlist_entry(void)
 {
-	Backend    *bn;
+	PMChild    *bn;
 
-	/*
-	 * Check that database state allows another connection.  Currently the
-	 * only possible failure is CAC_TOOMANY, so we just log an error message
-	 * based on that rather than checking the error code precisely.
-	 */
-	if (canAcceptConnections(B_BG_WORKER) != CAC_OK)
+	bn = AssignPostmasterChildSlot(B_BG_WORKER);
+	if (bn == NULL)
 	{
 		ereport(LOG,
 				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
@@ -3988,16 +3886,6 @@ assign_backendlist_entry(void)
 		return NULL;
 	}
 
-	bn = palloc_extended(sizeof(Backend), MCXT_ALLOC_NO_OOM);
-	if (bn == NULL)
-	{
-		ereport(LOG,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory")));
-		return NULL;
-	}
-
-	bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
 	bn->bkend_type = B_BG_WORKER;
 	bn->bgworker_notify = false;
 
@@ -4138,11 +4026,11 @@ bool
 PostmasterMarkPIDForWorkerNotify(int pid)
 {
 	dlist_iter	iter;
-	Backend    *bp;
+	PMChild    *bp;
 
-	dlist_foreach(iter, &BackendList)
+	dlist_foreach(iter, &ActiveChildList)
 	{
-		bp = dlist_container(Backend, elem, iter.cur);
+		bp = dlist_container(PMChild, elem, iter.cur);
 		if (bp->pid == pid)
 		{
 			bp->bgworker_notify = true;
diff --git a/src/backend/storage/ipc/pmsignal.c b/src/backend/storage/ipc/pmsignal.c
index cb99e77476..86970bf69b 100644
--- a/src/backend/storage/ipc/pmsignal.c
+++ b/src/backend/storage/ipc/pmsignal.c
@@ -47,11 +47,11 @@
  * exited without performing proper shutdown.  The per-child-process flags
  * have three possible states: UNUSED, ASSIGNED, ACTIVE.  An UNUSED slot is
  * available for assignment.  An ASSIGNED slot is associated with a postmaster
- * child process, but either the process has not touched shared memory yet,
- * or it has successfully cleaned up after itself.  A ACTIVE slot means the
- * process is actively using shared memory.  The slots are assigned to
- * child processes at random, and postmaster.c is responsible for tracking
- * which one goes with which PID.
+ * child process, but either the process has not touched shared memory yet, or
+ * it has successfully cleaned up after itself.  An ACTIVE slot means the
+ * process is actively using shared memory.  The slots are assigned to child
+ * processes by postmaster, and postmaster.c is responsible for tracking which
+ * one goes with which PID.
  *
  * Actually there is a fourth state, WALSENDER.  This is just like ACTIVE,
  * but carries the extra information that the child is a WAL sender.
@@ -83,15 +83,6 @@ struct PMSignalData
 /* PMSignalState pointer is valid in both postmaster and child processes */
 NON_EXEC_STATIC volatile PMSignalData *PMSignalState = NULL;
 
-/*
- * These static variables are valid only in the postmaster.  We keep a
- * duplicative private array so that we can trust its state even if some
- * failing child has clobbered the PMSignalData struct in shared memory.
- */
-static int	num_child_inuse;	/* # of entries in PMChildInUse[] */
-static int	next_child_inuse;	/* next slot to try to assign */
-static bool *PMChildInUse;		/* true if i'th flag slot is assigned */
-
 /*
  * Signal handler to be notified if postmaster dies.
  */
@@ -155,25 +146,7 @@ PMSignalShmemInit(void)
 	{
 		/* initialize all flags to zeroes */
 		MemSet(unvolatize(PMSignalData *, PMSignalState), 0, PMSignalShmemSize());
-		num_child_inuse = MaxLivePostmasterChildren();
-		PMSignalState->num_child_flags = num_child_inuse;
-
-		/*
-		 * Also allocate postmaster's private PMChildInUse[] array.  We
-		 * might've already done that in a previous shared-memory creation
-		 * cycle, in which case free the old array to avoid a leak.  (Do it
-		 * like this to support the possibility that MaxLivePostmasterChildren
-		 * changed.)  In a standalone backend, we do not need this.
-		 */
-		if (PostmasterContext != NULL)
-		{
-			if (PMChildInUse)
-				pfree(PMChildInUse);
-			PMChildInUse = (bool *)
-				MemoryContextAllocZero(PostmasterContext,
-									   num_child_inuse * sizeof(bool));
-		}
-		next_child_inuse = 0;
+		PMSignalState->num_child_flags = MaxLivePostmasterChildren();
 	}
 }
 
@@ -239,41 +212,22 @@ GetQuitSignalReason(void)
 
 
 /*
- * AssignPostmasterChildSlot - select an unused slot for a new postmaster
- * child process, and set its state to ASSIGNED.  Returns a slot number
- * (one to N).
+ * ReservePostmasterChildSlot - mark the given slot as ASSIGNED for a new
+ * postmaster child process.
  *
  * Only the postmaster is allowed to execute this routine, so we need no
  * special locking.
  */
-int
-AssignPostmasterChildSlot(void)
+void
+ReservePostmasterChildSlot(int slot)
 {
-	int			slot = next_child_inuse;
-	int			n;
+	Assert(slot > 0 && slot <= PMSignalState->num_child_flags);
+	slot--;
 
-	/*
-	 * Scan for a free slot.  Notice that we trust nothing about the contents
-	 * of PMSignalState, but use only postmaster-local data for this decision.
-	 * We track the last slot assigned so as not to waste time repeatedly
-	 * rescanning low-numbered slots.
-	 */
-	for (n = num_child_inuse; n > 0; n--)
-	{
-		if (--slot < 0)
-			slot = num_child_inuse - 1;
-		if (!PMChildInUse[slot])
-		{
-			PMChildInUse[slot] = true;
-			PMSignalState->PMChildFlags[slot] = PM_CHILD_ASSIGNED;
-			next_child_inuse = slot;
-			return slot + 1;
-		}
-	}
+	if (PMSignalState->PMChildFlags[slot] != PM_CHILD_UNUSED)
+		elog(FATAL, "postmaster child slot is already in use");
 
-	/* Out of slots ... should never happen, else postmaster.c messed up */
-	elog(FATAL, "no free slots in PMChildFlags array");
-	return 0;					/* keep compiler quiet */
+	PMSignalState->PMChildFlags[slot] = PM_CHILD_ASSIGNED;
 }
 
 /*
@@ -288,17 +242,18 @@ ReleasePostmasterChildSlot(int slot)
 {
 	bool		result;
 
-	Assert(slot > 0 && slot <= num_child_inuse);
+	Assert(slot > 0 && slot <= PMSignalState->num_child_flags);
 	slot--;
 
 	/*
 	 * Note: the slot state might already be unused, because the logic in
 	 * postmaster.c is such that this might get called twice when a child
 	 * crashes.  So we don't try to Assert anything about the state.
+	 *
+	 * FIXME: does that still happen?
 	 */
 	result = (PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED);
 	PMSignalState->PMChildFlags[slot] = PM_CHILD_UNUSED;
-	PMChildInUse[slot] = false;
 	return result;
 }
 
@@ -309,7 +264,7 @@ ReleasePostmasterChildSlot(int slot)
 bool
 IsPostmasterChildWalSender(int slot)
 {
-	Assert(slot > 0 && slot <= num_child_inuse);
+	Assert(slot > 0 && slot <= PMSignalState->num_child_flags);
 	slot--;
 
 	if (PMSignalState->PMChildFlags[slot] == PM_CHILD_WALSENDER)
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 9536469e89..2fa709137f 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -311,14 +311,9 @@ InitProcess(void)
 	/*
 	 * Before we start accessing the shared memory in a serious way, mark
 	 * ourselves as an active postmaster child; this is so that the postmaster
-	 * can detect it if we exit without cleaning up.  (XXX autovac launcher
-	 * currently doesn't participate in this; it probably should.)
-	 *
-	 * Slot sync worker also does not participate in it, see comments atop
-	 * 'struct bkend' in postmaster.c.
+	 * can detect it if we exit without cleaning up.
 	 */
-	if (IsUnderPostmaster && !AmAutoVacuumLauncherProcess() &&
-		!AmLogicalSlotSyncWorkerProcess())
+	if (IsUnderPostmaster)
 		MarkPostmasterChildActive();
 
 	/* Decide which list should supply our PGPROC. */
@@ -536,6 +531,9 @@ InitAuxiliaryProcess(void)
 	if (MyProc != NULL)
 		elog(ERROR, "you already exist");
 
+	if (IsUnderPostmaster)
+		MarkPostmasterChildActive();
+
 	/*
 	 * We use the ProcStructLock to protect assignment and releasing of
 	 * AuxiliaryProcs entries.
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index 63c12917cf..deca2e8370 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -13,8 +13,39 @@
 #ifndef _POSTMASTER_H
 #define _POSTMASTER_H
 
+#include "lib/ilist.h"
 #include "miscadmin.h"
 
+/*
+ * A struct representing an active postmaster child process.  This is used
+ * mainly to keep track of how many children we have and send them appropriate
+ * signals when necessary.  All postmaster child processes are assigned a
+ * PMChild entry. That includes "normal" client sessions, but also autovacuum
+ * workers, walsenders, background workers, and aux processes.  (Note that at
+ * the time of launch, walsenders are labeled B_BACKEND; we relabel them to
+ * B_WAL_SENDER upon noticing they've changed their PMChildFlags entry.  Hence
+ * that check must be done before any operation that needs to distinguish
+ * walsenders from normal backends.)
+ *
+ * "dead_end" children are also allocated a PMChild entry: these are children
+ * launched just for the purpose of sending a friendly rejection message to a
+ * would-be client.  We must track them because they are attached to shared
+ * memory, but we know they will never become live backends.
+ *
+ * 'child_slot' is an identifier that is unique across all running child
+ * processes.  It is used as an index into the PMChildFlags array. dead_end
+ * children are not assigned a child_slot.
+ */
+typedef struct
+{
+	pid_t		pid;			/* process id of backend */
+	int			child_slot;		/* PMChildSlot for this backend, if any */
+	BackendType bkend_type;		/* child process flavor, see above */
+	struct RegisteredBgWorker *rw;	/* bgworker info, if this is a bgworker */
+	bool		bgworker_notify;	/* gets bgworker start/stop notifications */
+	dlist_node	elem;			/* list link in BackendList */
+} PMChild;
+
 /* GUC options */
 extern PGDLLIMPORT bool EnableSSL;
 extern PGDLLIMPORT int SuperuserReservedConnections;
@@ -80,6 +111,15 @@ const char *PostmasterChildName(BackendType child_type);
 extern void SubPostmasterMain(int argc, char *argv[]) pg_attribute_noreturn();
 #endif
 
+/* prototypes for functions in pmchild.c */
+extern dlist_head ActiveChildList;
+
+extern void InitPostmasterChildSlots(void);
+extern PMChild *AssignPostmasterChildSlot(BackendType btype);
+extern bool FreePostmasterChildSlot(PMChild *pmchild);
+extern PMChild *FindPostmasterChildByPid(int pid);
+extern PMChild *AllocDeadEndChild(void);
+
 /*
  * Note: MAX_BACKENDS is limited to 2^18-1 because that's the width reserved
  * for buffer references in buf_internals.h.  This limitation could be lifted
diff --git a/src/include/storage/pmsignal.h b/src/include/storage/pmsignal.h
index 3b9336b83c..2ab198fc31 100644
--- a/src/include/storage/pmsignal.h
+++ b/src/include/storage/pmsignal.h
@@ -70,7 +70,7 @@ extern void SendPostmasterSignal(PMSignalReason reason);
 extern bool CheckPostmasterSignal(PMSignalReason reason);
 extern void SetQuitSignalReason(QuitSignalReason reason);
 extern QuitSignalReason GetQuitSignalReason(void);
-extern int	AssignPostmasterChildSlot(void);
+extern void ReservePostmasterChildSlot(int slot);
 extern bool ReleasePostmasterChildSlot(int slot);
 extern bool IsPostmasterChildWalSender(int slot);
 extern void MarkPostmasterChildActive(void);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 547d14b3e7..ad39c0741a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -230,7 +230,6 @@ BTWriteState
 BUF_MEM
 BYTE
 BY_HANDLE_FILE_INFORMATION
-Backend
 BackendParameters
 BackendStartupData
 BackendState
@@ -1927,6 +1926,7 @@ PLyTransformToOb
 PLyTupleToOb
 PLyUnicode_FromStringAndSize_t
 PLy_elog_impl_t
+PMChild
 PMINIDUMP_CALLBACK_INFORMATION
 PMINIDUMP_EXCEPTION_INFORMATION
 PMINIDUMP_USER_STREAM_INFORMATION
-- 
2.39.2

v4-0007-Pass-MyPMChildSlot-as-an-explicit-argument-to-chi.patch

From 088f7911d2e300bad23e5afe66370340a96275e4 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 1 Aug 2024 22:58:17 +0300
Subject: [PATCH v4 7/8] Pass MyPMChildSlot as an explicit argument to child
 process

All the other global variables passed from postmaster to child have
the same value in all the processes, while MyPMChildSlot is more
like a parameter to each child process.
---
 src/backend/postmaster/launch_backend.c | 32 ++++++++++++++++---------
 src/backend/postmaster/pmchild.c        |  3 ---
 src/backend/postmaster/postmaster.c     | 16 ++++++-------
 src/backend/postmaster/syslogger.c      |  8 ++++---
 src/include/postmaster/postmaster.h     |  1 +
 src/include/postmaster/syslogger.h      |  2 +-
 6 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/src/backend/postmaster/launch_backend.c b/src/backend/postmaster/launch_backend.c
index b0b91dc97f..4e93bd1d94 100644
--- a/src/backend/postmaster/launch_backend.c
+++ b/src/backend/postmaster/launch_backend.c
@@ -96,7 +96,6 @@ typedef int InheritableSocket;
 typedef struct
 {
 	char		DataDir[MAXPGPATH];
-	int			MyPMChildSlot;
 #ifndef WIN32
 	unsigned long UsedShmemSegID;
 #else
@@ -137,6 +136,8 @@ typedef struct
 	char		my_exec_path[MAXPGPATH];
 	char		pkglib_path[MAXPGPATH];
 
+	int			MyPMChildSlot;
+
 	/*
 	 * These are only used by backend processes, but are here because passing
 	 * a socket needs some special handling on Windows. 'client_sock' is an
@@ -158,13 +159,16 @@ typedef struct
 static void read_backend_variables(char *id, char **startup_data, size_t *startup_data_len);
 static void restore_backend_variables(BackendParameters *param);
 
-static bool save_backend_variables(BackendParameters *param, ClientSocket *client_sock,
+static bool save_backend_variables(BackendParameters *param, int child_slot,
+								   ClientSocket *client_sock,
 #ifdef WIN32
 								   HANDLE childProcess, pid_t childPid,
 #endif
 								   char *startup_data, size_t startup_data_len);
 
-static pid_t internal_forkexec(const char *child_kind, char *startup_data, size_t startup_data_len, ClientSocket *client_sock);
+static pid_t internal_forkexec(const char *child_kind, int child_slot,
+							   char *startup_data, size_t startup_data_len,
+							   ClientSocket *client_sock);
 
 #endif							/* EXEC_BACKEND */
 
@@ -226,7 +230,7 @@ PostmasterChildName(BackendType child_type)
  * the child process.
  */
 pid_t
-postmaster_child_launch(BackendType child_type,
+postmaster_child_launch(BackendType child_type, int child_slot,
 						char *startup_data, size_t startup_data_len,
 						ClientSocket *client_sock)
 {
@@ -235,7 +239,7 @@ postmaster_child_launch(BackendType child_type,
 	Assert(IsPostmasterEnvironment && !IsUnderPostmaster);
 
 #ifdef EXEC_BACKEND
-	pid = internal_forkexec(child_process_kinds[child_type].name,
+	pid = internal_forkexec(child_process_kinds[child_type].name, child_slot,
 							startup_data, startup_data_len, client_sock);
 	/* the child process will arrive in SubPostmasterMain */
 #else							/* !EXEC_BACKEND */
@@ -263,6 +267,7 @@ postmaster_child_launch(BackendType child_type,
 		 */
 		MemoryContextSwitchTo(TopMemoryContext);
 
+		MyPMChildSlot = child_slot;
 		if (client_sock)
 		{
 			MyClientSocket = palloc(sizeof(ClientSocket));
@@ -289,7 +294,8 @@ postmaster_child_launch(BackendType child_type,
  * - fork():s, and then exec():s the child process
  */
 static pid_t
-internal_forkexec(const char *child_kind, char *startup_data, size_t startup_data_len, ClientSocket *client_sock)
+internal_forkexec(const char *child_kind, int child_slot,
+				  char *startup_data, size_t startup_data_len, ClientSocket *client_sock)
 {
 	static unsigned long tmpBackendFileNum = 0;
 	pid_t		pid;
@@ -309,7 +315,7 @@ internal_forkexec(const char *child_kind, char *startup_data, size_t startup_dat
 	 */
 	paramsz = SizeOfBackendParameters(startup_data_len);
 	param = palloc0(paramsz);
-	if (!save_backend_variables(param, client_sock, startup_data, startup_data_len))
+	if (!save_backend_variables(param, child_slot, client_sock, startup_data, startup_data_len))
 	{
 		pfree(param);
 		return -1;				/* log made by save_backend_variables */
@@ -398,7 +404,8 @@ internal_forkexec(const char *child_kind, char *startup_data, size_t startup_dat
  *	 file is complete.
  */
 static pid_t
-internal_forkexec(const char *child_kind, char *startup_data, size_t startup_data_len, ClientSocket *client_sock)
+internal_forkexec(const char *child_kind, int child_slot,
+				  char *startup_data, size_t startup_data_len, ClientSocket *client_sock)
 {
 	int			retry_count = 0;
 	STARTUPINFO si;
@@ -479,7 +486,9 @@ retry:
 		return -1;
 	}
 
-	if (!save_backend_variables(param, client_sock, pi.hProcess, pi.dwProcessId, startup_data, startup_data_len))
+	if (!save_backend_variables(param, child_slot, client_sock,
+								pi.hProcess, pi.dwProcessId,
+								startup_data, startup_data_len))
 	{
 		/*
 		 * log made by save_backend_variables, but we have to clean up the
@@ -691,7 +700,8 @@ static void read_inheritable_socket(SOCKET *dest, InheritableSocket *src);
 
 /* Save critical backend variables into the BackendParameters struct */
 static bool
-save_backend_variables(BackendParameters *param, ClientSocket *client_sock,
+save_backend_variables(BackendParameters *param,
+					   int child_slot, ClientSocket *client_sock,
 #ifdef WIN32
 					   HANDLE childProcess, pid_t childPid,
 #endif
@@ -708,7 +718,7 @@ save_backend_variables(BackendParameters *param, ClientSocket *client_sock,
 
 	strlcpy(param->DataDir, DataDir, MAXPGPATH);
 
-	param->MyPMChildSlot = MyPMChildSlot;
+	param->MyPMChildSlot = child_slot;
 
 #ifdef WIN32
 	param->ShmemProtectiveRegion = ShmemProtectiveRegion;
diff --git a/src/backend/postmaster/pmchild.c b/src/backend/postmaster/pmchild.c
index 735c66f8e7..ac8776bd95 100644
--- a/src/backend/postmaster/pmchild.c
+++ b/src/backend/postmaster/pmchild.c
@@ -209,9 +209,6 @@ AssignPostmasterChildSlot(BackendType btype)
 
 	ReservePostmasterChildSlot(pmchild->child_slot);
 
-	/* FIXME: find a more elegant way to pass this */
-	MyPMChildSlot = pmchild->child_slot;
-
 	elog(DEBUG2, "assigned pm child slot %d for %s", pmchild->child_slot, PostmasterChildName(btype));
 
 	return pmchild;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a029e28786..798cd330f3 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -983,7 +983,7 @@ PostmasterMain(int argc, char *argv[])
 	SysLoggerPMChild = AssignPostmasterChildSlot(B_LOGGER);
 	if (!SysLoggerPMChild)
 		elog(ERROR, "no postmaster child slot available for syslogger");
-	SysLoggerPMChild->pid = SysLogger_Start();
+	SysLoggerPMChild->pid = SysLogger_Start(SysLoggerPMChild->child_slot);
 	if (SysLoggerPMChild->pid == 0)
 	{
 		FreePostmasterChildSlot(SysLoggerPMChild);
@@ -2413,7 +2413,7 @@ process_pm_child_exit(void)
 		if (SysLoggerPMChild && pid == SysLoggerPMChild->pid)
 		{
 			/* for safety's sake, launch new logger *first* */
-			SysLoggerPMChild->pid = SysLogger_Start();
+			SysLoggerPMChild->pid = SysLogger_Start(SysLoggerPMChild->child_slot);
 			if (SysLoggerPMChild->pid == 0)
 			{
 				FreePostmasterChildSlot(SysLoggerPMChild);
@@ -3054,7 +3054,7 @@ LaunchMissingBackgroundProcesses(void)
 			elog(LOG, "no postmaster child slot available for syslogger");
 		else
 		{
-			SysLoggerPMChild->pid = SysLogger_Start();
+			SysLoggerPMChild->pid = SysLogger_Start(SysLoggerPMChild->child_slot);
 			if (SysLoggerPMChild->pid == 0)
 			{
 				FreePostmasterChildSlot(SysLoggerPMChild);
@@ -3325,8 +3325,7 @@ BackendStartup(ClientSocket *client_sock)
 	/* Hasn't asked to be notified about any bgworkers yet */
 	bn->bgworker_notify = false;
 
-	MyPMChildSlot = bn->child_slot;
-	pid = postmaster_child_launch(bn->bkend_type,
+	pid = postmaster_child_launch(bn->bkend_type, bn->child_slot,
 								  (char *) &startup_data, sizeof(startup_data),
 								  client_sock);
 	if (pid < 0)
@@ -3650,8 +3649,7 @@ StartChildProcess(BackendType type)
 		return NULL;
 	}
 
-	MyPMChildSlot = pmchild->child_slot;
-	pid = postmaster_child_launch(type, NULL, 0, NULL);
+	pid = postmaster_child_launch(type, pmchild->child_slot, NULL, 0, NULL);
 	if (pid < 0)
 	{
 		/* in parent, fork failed */
@@ -3806,8 +3804,8 @@ do_start_bgworker(RegisteredBgWorker *rw)
 			(errmsg_internal("starting background worker process \"%s\"",
 							 rw->rw_worker.bgw_name)));
 
-	MyPMChildSlot = bn->child_slot;
-	worker_pid = postmaster_child_launch(B_BG_WORKER, (char *) &rw->rw_worker, sizeof(BackgroundWorker), NULL);
+	worker_pid = postmaster_child_launch(B_BG_WORKER, bn->child_slot,
+										 (char *) &rw->rw_worker, sizeof(BackgroundWorker), NULL);
 	if (worker_pid == -1)
 	{
 		/* in postmaster, fork failed ... */
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 7951599fa8..d68853d429 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -590,7 +590,7 @@ SysLoggerMain(char *startup_data, size_t startup_data_len)
  * Postmaster subroutine to start a syslogger subprocess.
  */
 int
-SysLogger_Start(void)
+SysLogger_Start(int child_slot)
 {
 	pid_t		sysloggerPid;
 	char	   *filename;
@@ -699,9 +699,11 @@ SysLogger_Start(void)
 	startup_data.syslogFile = syslogger_fdget(syslogFile);
 	startup_data.csvlogFile = syslogger_fdget(csvlogFile);
 	startup_data.jsonlogFile = syslogger_fdget(jsonlogFile);
-	sysloggerPid = postmaster_child_launch(B_LOGGER, (char *) &startup_data, sizeof(startup_data), NULL);
+	sysloggerPid = postmaster_child_launch(B_LOGGER, child_slot,
+										   (char *) &startup_data, sizeof(startup_data), NULL);
 #else
-	sysloggerPid = postmaster_child_launch(B_LOGGER, NULL, 0, NULL);
+	sysloggerPid = postmaster_child_launch(B_LOGGER, child_slot,
+										   NULL, 0, NULL);
 #endif							/* EXEC_BACKEND */
 
 	if (sysloggerPid == -1)
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index deca2e8370..81a3520021 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -103,6 +103,7 @@ extern PGDLLIMPORT struct ClientSocket *MyClientSocket;
 
 /* prototypes for functions in launch_backend.c */
 extern pid_t postmaster_child_launch(BackendType child_type,
+									 int child_slot,
 									 char *startup_data,
 									 size_t startup_data_len,
 									 struct ClientSocket *client_sock);
diff --git a/src/include/postmaster/syslogger.h b/src/include/postmaster/syslogger.h
index b5fc239ba9..d72b978b0a 100644
--- a/src/include/postmaster/syslogger.h
+++ b/src/include/postmaster/syslogger.h
@@ -86,7 +86,7 @@ extern PGDLLIMPORT HANDLE syslogPipe[2];
 #endif
 
 
-extern int	SysLogger_Start(void);
+extern int	SysLogger_Start(int child_slot);
 
 extern void write_syslogger_file(const char *buffer, int count, int destination);
 
-- 
2.39.2

Alexander Lakhin

exclusion@gmail.com

over 1 year ago

In reply to: Heikki Linnakangas (#5)

Re: Refactoring postmaster's code to cleanup after child exit

Hello Heikki,

10.08.2024 00:13, Heikki Linnakangas wrote:

Committed the patches up to and including this one, with tiny comment changes.

I've noticed that on the current HEAD server.log contains binary data
(an invalid process name) after a child crash. For example, while playing
with -ftapv, I've got:
SELECT to_date('2024 613566758 1', 'IYYY IW ID');
server closed the connection unexpectedly

grep -a 'was terminated' server.log
2024-08-18 07:07:06.482 UTC|||66c19d96.3482f6|LOG: `�!x� (PID 3441407) was terminated by signal 6: Aborted

It looks like this was introduced by commit 28a520c0b (IIUC, namebuf in
CleanupBackend() may stay uninitialized in some code paths).
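For illustration, a minimal sketch of the defensive pattern that avoids this class of bug: every code path that builds the process name writes the buffer before it is logged. `ChildKind` and `describe_child()` are hypothetical names, not the actual CleanupBackend() code, and the message text only mimics the log line above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the kinds of child the postmaster reports on. */
typedef enum
{
	CHILD_BACKEND,
	CHILD_BGWORKER,
	CHILD_OTHER
} ChildKind;

/*
 * Format the process name used in "... was terminated by signal N" log
 * messages.  Every branch writes namebuf, so the caller can never print
 * uninitialized stack memory.
 */
static const char *
describe_child(ChildKind kind, const char *bgw_name,
			   char *namebuf, size_t buflen)
{
	assert(buflen > 0);

	switch (kind)
	{
		case CHILD_BGWORKER:
			snprintf(namebuf, buflen, "background worker \"%s\"",
					 bgw_name ? bgw_name : "unknown");
			break;
		case CHILD_BACKEND:
			snprintf(namebuf, buflen, "server process");
			break;
		default:
			/* Without this fallback, namebuf would be left uninitialized. */
			snprintf(namebuf, buflen, "child process");
			break;
	}
	return namebuf;
}
```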

Best regards,
Alexander

Heikki Linnakangas

hlinnaka@iki.fi

over 1 year ago

In reply to: Alexander Lakhin (#7)

Re: Refactoring postmaster's code to cleanup after child exit

On 18/08/2024 11:00, Alexander Lakhin wrote:

10.08.2024 00:13, Heikki Linnakangas wrote:

Committed the patches up to and including this one, with tiny comment changes.

I've noticed that on the current HEAD server.log contains binary data
(an invalid process name) after a child crash. For example, while playing
with -ftapv, I've got:
SELECT to_date('2024 613566758 1', 'IYYY IW ID');
server closed the connection unexpectedly

grep -a 'was terminated' server.log
2024-08-18 07:07:06.482 UTC|||66c19d96.3482f6|LOG: `�!x� (PID 3441407) was terminated by signal 6: Aborted

It looks like this was introduced by commit 28a520c0b (IIUC, namebuf in
CleanupBackend() may stay uninitialized in some code paths).

Fixed, thanks!

--
Heikki Linnakangas
Neon (https://neon.tech)

Andres Freund

andres@anarazel.de

over 1 year ago

In reply to: Heikki Linnakangas (#6)

Re: Refactoring postmaster's code to cleanup after child exit

Hi,

On 2024-08-12 12:55:00 +0300, Heikki Linnakangas wrote:

While rebasing this today, I spotted another instance of that mistake
mentioned in the XXX comment above. I called "CountChildren(B_BACKEND)"
instead of "CountChildren(1 << B_BACKEND)". Some ideas on how to make that
less error-prone:

1. Add a separate typedef for the bitmasks, and macros/functions to work
with it. Something like:

typedef struct {
	uint32 mask;
} BackendTypeMask;

static const BackendTypeMask BTMASK_ALL = { 0xffffffff };
static const BackendTypeMask BTMASK_NONE = { 0 };

static inline BackendTypeMask
BTMASK_ADD(BackendTypeMask mask, BackendType t)
{
	mask.mask |= 1 << t;
	return mask;
}

static inline BackendTypeMask
BTMASK_DEL(BackendTypeMask mask, BackendType t)
{
	mask.mask &= ~(1 << t);
	return mask;
}

Now the compiler will complain if you try to pass a BackendType for the
mask. We could do this just for BackendType, or we could put this in
src/include/lib/ with a more generic name, like "bitmask_u32".

I don't like the second suggestion - that just ends up creating a similar
problem in the future because flag values for one thing can be passed to
something else.
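A minimal sketch of the BackendType-specific variant of option 1 (not the generic "bitmask_u32" that Andres objects to), showing that the wrapper struct turns the CountChildren(B_BACKEND) mistake into a compile error. The enum is a hypothetical subset of BackendType, and `BTMASK_HAS()` and `count_children()` are illustrative helpers that are not part of the proposal. This `BTMASK_ADD()` shifts `UINT32_C(1)` rather than a plain int, so that bit 31 would not overflow a signed int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of BackendType; the real enum lives in miscadmin.h. */
typedef enum
{
	B_INVALID = 0,
	B_BACKEND,
	B_AUTOVAC_WORKER,
	B_BG_WORKER,
	B_WAL_SENDER
} BackendType;

/* Wrapping the bits in a struct makes masks and BackendType incompatible. */
typedef struct
{
	uint32_t	mask;
} BackendTypeMask;

static const BackendTypeMask BTMASK_NONE = {0};

static inline BackendTypeMask
BTMASK_ADD(BackendTypeMask m, BackendType t)
{
	m.mask |= UINT32_C(1) << t;
	return m;
}

static inline int
BTMASK_HAS(BackendTypeMask m, BackendType t)
{
	return (m.mask & (UINT32_C(1) << t)) != 0;
}

/*
 * Count children whose type is in 'targets'.  Writing
 * count_children(kids, n, B_BACKEND) no longer compiles, because an
 * enum value cannot be converted to a BackendTypeMask struct.
 */
static int
count_children(const BackendType *children, int n, BackendTypeMask targets)
{
	int			count = 0;

	for (int i = 0; i < n; i++)
		if (BTMASK_HAS(targets, children[i]))
			count++;
	return count;
}
```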

+Running the tests
+=================
+
+NOTE: You must have given the --enable-tap-tests argument to configure.
+
+Run
+    make check
+or
+    make installcheck
+You can use "make installcheck" if you previously did "make install".
+In that case, the code in the installation tree is tested.  With
+"make check", a temporary installation tree is built from the current
+sources and then tested.
+
+Either way, this test initializes, starts, and stops a test Postgres
+cluster.
+
+See src/test/perl/README for more info about running these tests.

Is it really useful to have such instructions all over the tree?

From 93b9e9b6e072f63af9009e0d66ab6d0d62ea8c15 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Aug 2024 10:55:11 +0300
Subject: [PATCH v4 2/8] Add test for dead-end backends

The code path for launching a dead-end backend because we're out of
slots was not covered by any tests, so add one. (Some tests did hit
the case of launching a dead-end backend because the server is still
starting up, though, so the gap in our test coverage wasn't as big as
it sounds.)
---
src/test/perl/PostgreSQL/Test/Cluster.pm | 39 +++++++++++++++++++
.../postmaster/t/001_connection_limits.pl | 17 +++++++-
2 files changed, 55 insertions(+), 1 deletion(-)

Why does this need to use "raw" connections? Can't you just create a bunch of
connections with BackgroundPsql?

From 88287a2db95e584018f1c7fa9e992feb7d179ce8 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Aug 2024 10:58:35 +0300
Subject: [PATCH v4 3/8] Use an shmem_exit callback to remove backend from
PMChildFlags on exit

This seems nicer than having to duplicate the logic between
InitProcess() and ProcKill() for which child processes have a
PMChildFlags slot.

Move the MarkPostmasterChildActive() call earlier in InitProcess(),
out of the section protected by the spinlock.

---
src/backend/storage/ipc/pmsignal.c | 10 ++++++--
src/backend/storage/lmgr/proc.c | 38 ++++++++++--------------------
src/include/storage/pmsignal.h | 1 -
3 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/src/backend/storage/ipc/pmsignal.c b/src/backend/storage/ipc/pmsignal.c
index 27844b46a2..cb99e77476 100644
--- a/src/backend/storage/ipc/pmsignal.c
+++ b/src/backend/storage/ipc/pmsignal.c
@@ -24,6 +24,7 @@
#include "miscadmin.h"
#include "postmaster/postmaster.h"
#include "replication/walsender.h"
+#include "storage/ipc.h"
#include "storage/pmsignal.h"
#include "storage/shmem.h"
#include "utils/memutils.h"
@@ -121,6 +122,8 @@ postmaster_death_handler(SIGNAL_ARGS)

#endif /* USE_POSTMASTER_DEATH_SIGNAL */

+static void MarkPostmasterChildInactive(int code, Datum arg);
+
/*
* PMSignalShmemSize
*		Compute space needed for pmsignal.c's shared memory
@@ -328,6 +331,9 @@ MarkPostmasterChildActive(void)
slot--;
Assert(PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED);
PMSignalState->PMChildFlags[slot] = PM_CHILD_ACTIVE;
+
+	/* Arrange to clean up at exit. */
+	on_shmem_exit(MarkPostmasterChildInactive, 0);
}

/*
@@ -352,8 +358,8 @@ MarkPostmasterChildWalSender(void)
* MarkPostmasterChildInactive - mark a postmaster child as done using
* shared memory.  This is called in the child process.
*/
-void
-MarkPostmasterChildInactive(void)
+static void
+MarkPostmasterChildInactive(int code, Datum arg)
{
int			slot = MyPMChildSlot;

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index ac66da8638..9536469e89 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -308,6 +308,19 @@ InitProcess(void)
if (MyProc != NULL)
elog(ERROR, "you already exist");

+	/*
+	 * Before we start accessing the shared memory in a serious way, mark
+	 * ourselves as an active postmaster child; this is so that the postmaster
+	 * can detect it if we exit without cleaning up.  (XXX autovac launcher
+	 * currently doesn't participate in this; it probably should.)
+	 *
+	 * Slot sync worker also does not participate in it, see comments atop
+	 * 'struct bkend' in postmaster.c.
+	 */
+	if (IsUnderPostmaster && !AmAutoVacuumLauncherProcess() &&
+		!AmLogicalSlotSyncWorkerProcess())
+		MarkPostmasterChildActive();

I'd not necessarily expect a call to MarkPostmasterChildActive() to register
an shmem exit hook - but I guess it's unlikely to be moved around in a
problematic way. Perhaps something like RegisterPostmasterChild() or such
would be a bit clearer?
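To make the naming point concrete, here is a toy model of the pattern in patch 3: the call that marks the slot ACTIVE also registers the cleanup that flips it back to ASSIGNED, which the postmaster treats as a clean exit. `register_postmaster_child()` follows Andres's suggested name, and the callback list is a hypothetical stand-in for on_shmem_exit(); none of this is the real pmsignal.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of pmsignal.c's per-child flag states. */
typedef enum
{
	PM_CHILD_UNUSED,
	PM_CHILD_ASSIGNED,
	PM_CHILD_ACTIVE
} ChildFlag;

#define NUM_SLOTS 8
#define MAX_EXIT_CALLBACKS 4

static ChildFlag child_flags[NUM_SLOTS];
static int	my_slot;			/* 1-based, like MyPMChildSlot */

/* Toy stand-in for on_shmem_exit(): callbacks run in reverse order. */
typedef void (*exit_callback) (int code);
static exit_callback exit_callbacks[MAX_EXIT_CALLBACKS];
static int	num_exit_callbacks;

static void
register_exit_callback(exit_callback cb)
{
	assert(num_exit_callbacks < MAX_EXIT_CALLBACKS);
	exit_callbacks[num_exit_callbacks++] = cb;
}

static void
run_exit_callbacks(int code)
{
	while (num_exit_callbacks > 0)
		exit_callbacks[--num_exit_callbacks] (code);
}

/* Cleanup hook: on a clean exit, flip our slot back to ASSIGNED. */
static void
mark_child_inactive(int code)
{
	(void) code;
	assert(child_flags[my_slot - 1] == PM_CHILD_ACTIVE);
	child_flags[my_slot - 1] = PM_CHILD_ASSIGNED;
}

/*
 * Mark ourselves ACTIVE and register the matching cleanup in one call,
 * so a caller cannot do one half without the other.
 */
static void
register_postmaster_child(int slot)
{
	assert(slot > 0 && slot <= NUM_SLOTS);
	my_slot = slot;
	assert(child_flags[slot - 1] == PM_CHILD_ASSIGNED);
	child_flags[slot - 1] = PM_CHILD_ACTIVE;
	register_exit_callback(mark_child_inactive);
}
```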

From dc53f89edbeec99f8633def8aa5f47cd98e7a150 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Aug 2024 10:59:04 +0300
Subject: [PATCH v4 4/8] Introduce a separate BackendType for dead-end children

And replace postmaster.c's own "backend type" codes with BackendType

Hm - it seems a bit odd to open-code this when we actually have a "table
driven configuration" available? Why isn't the type a field in
child_process_kind?

That'd not solve the bitmask confusion issue, but it does seem like a better
direction to me?

From 9c832ce33667abc5aef128a17fa9c27daaad872a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Aug 2024 10:59:27 +0300
Subject: [PATCH v4 5/8] Kill dead-end children when there's nothing else left

Previously, the postmaster would never try to kill dead-end child
processes, even if there were no other processes left. A dead-end
backend will eventually exit, when authentication_timeout expires, but
if a dead-end backend is the only thing that's preventing the server
from shutting down, it seems better to kill it immediately. It's
particularly important if there was a bug in the early startup code
that prevented a dead-end child from timing out and exiting normally.

I do wonder if we shouldn't instead get rid of dead-end children. We now have
an event-based loop in postmaster; it'd perform vastly better to just handle
these connections in postmaster. And we'd get rid of these weird backend
types. But I guess this is a worthwhile improvement on its own...

Includes a test for that case where a dead-end backend previously kept
the server from shutting down.

The test hardcodes timeouts; I think we've largely come to regret that whenever we
did so. Should probably just be a multiplier based on
PostgreSQL::Test::Utils::timeout_default?

+/*
+ * MaxLivePostmasterChildren
+ *
+ * This reports the number of postmaster child processes that can be active.  It
+ * includes all children except for dead_end children.  This allows the array
+ * in shared memory (PMChildFlags) to have a fixed maximum size.
+ */
+int
+MaxLivePostmasterChildren(void)
+{
+	int			n = 0;
+
+	/* We know exactly how many worker and aux processes can be active */
+	n += autovacuum_max_workers;
+	n += max_worker_processes;
+	n += NUM_AUXILIARY_PROCS;
+
+	/*
+	 * We allow more connections here than we can have backends because some
+	 * might still be authenticating; they might fail auth, or some existing
+	 * backend might exit before the auth cycle is completed.  The exact
+	 * MaxBackends limit is enforced when a new backend tries to join the
+	 * shared-inval backend array.
+	 */
+	n += 2 * (MaxConnections + max_wal_senders);
+
+	return n;
+}

I wonder if we could instead maintain at least some of this in
child_process_kinds? Manually listing different types of processes in
different places doesn't seem particularly sustainable.

+/*
+ * Initialize at postmaster startup
+ */
+void
+InitPostmasterChildSlots(void)
+{
+	int			num_pmchild_slots;
+	int			slotno;
+	PMChild    *slots;
+
+	dlist_init(&freeBackendList);
+	dlist_init(&freeAutoVacWorkerList);
+	dlist_init(&freeBgWorkerList);
+	dlist_init(&freeAuxList);
+	dlist_init(&ActiveChildList);
+
+	num_pmchild_slots = MaxLivePostmasterChildren();
+
+	slots = palloc(num_pmchild_slots * sizeof(PMChild));
+
+	slotno = 0;
+	for (int i = 0; i < 2 * (MaxConnections + max_wal_senders); i++)
+	{
+		init_slot(&slots[slotno], slotno, &freeBackendList);
+		slotno++;
+	}
+	for (int i = 0; i < autovacuum_max_workers; i++)
+	{
+		init_slot(&slots[slotno], slotno, &freeAutoVacWorkerList);
+		slotno++;
+	}
+	for (int i = 0; i < max_worker_processes; i++)
+	{
+		init_slot(&slots[slotno], slotno, &freeBgWorkerList);
+		slotno++;
+	}
+	for (int i = 0; i < NUM_AUXILIARY_PROCS; i++)
+	{
+		init_slot(&slots[slotno], slotno, &freeAuxList);
+		slotno++;
+	}
+	Assert(slotno == num_pmchild_slots);
+}

Along the same vein - could we generalize this into one array of different
slot types and then loop over that to initialize / acquire the slots?
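A minimal sketch of that table-driven idea, assuming a hypothetical `PMChildPool` table with one row per free list. The GUC values are stand-ins, and plain index stacks replace the `dlist` free lists so the example compiles on its own; none of these names come from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for the GUCs and constant the patch uses (values assumed). */
static int	MaxConnections = 100;
static int	max_wal_senders = 10;
static int	autovacuum_max_workers = 3;
static int	max_worker_processes = 8;
#define NUM_AUXILIARY_PROCS 6

/* Hypothetical pool descriptor: one row per kind of child slot. */
typedef struct PMChildPool
{
	const char *name;			/* for debugging / log messages */
	int			(*size) (void);	/* number of slots this pool needs */
	int			first_slot;		/* set by InitPostmasterChildSlotsSketch() */
	int			nfree;			/* number of entries in free_slots */
	int		   *free_slots;		/* stack of free slot numbers */
} PMChildPool;

static int	backend_pool_size(void) { return 2 * (MaxConnections + max_wal_senders); }
static int	autovac_pool_size(void) { return autovacuum_max_workers; }
static int	bgworker_pool_size(void) { return max_worker_processes; }
static int	aux_pool_size(void) { return NUM_AUXILIARY_PROCS; }

static PMChildPool pools[] = {
	{"backend", backend_pool_size},
	{"autovacuum worker", autovac_pool_size},
	{"background worker", bgworker_pool_size},
	{"auxiliary", aux_pool_size},
};
#define NUM_POOLS (sizeof(pools) / sizeof(pools[0]))

/* Total slot count, derived from the table rather than a hand-written sum. */
static int
MaxLivePostmasterChildrenSketch(void)
{
	int			n = 0;

	for (size_t i = 0; i < NUM_POOLS; i++)
		n += pools[i].size();
	return n;
}

/* Hand out consecutive slot numbers to each pool, in table order. */
static void
InitPostmasterChildSlotsSketch(void)
{
	int			slotno = 0;

	for (size_t i = 0; i < NUM_POOLS; i++)
	{
		int			n = pools[i].size();

		pools[i].first_slot = slotno;
		pools[i].free_slots = malloc((n > 0 ? n : 1) * sizeof(int));
		assert(pools[i].free_slots != NULL);
		pools[i].nfree = 0;
		/* push in reverse so that the lowest slot number is popped first */
		for (int j = n - 1; j >= 0; j--)
			pools[i].free_slots[pools[i].nfree++] = slotno + j;
		slotno += n;
	}
	assert(slotno == MaxLivePostmasterChildrenSketch());
}

/* Pop a free slot from the given pool, or return -1 if it is exhausted. */
static int
AssignSlotSketch(PMChildPool *pool)
{
	if (pool->nfree == 0)
		return -1;
	return pool->free_slots[--pool->nfree];
}
```

Adding a new kind of child would then mean adding one row to the table, and the total could no longer drift out of sync with the initialization loop.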

+/* Return the appropriate free-list for the given backend type */
+static dlist_head *
+GetFreeList(BackendType btype)
+{
+	switch (btype)
+	{
+		case B_BACKEND:
+		case B_BG_WORKER:
+		case B_WAL_SENDER:
+		case B_SLOTSYNC_WORKER:
+			return &freeBackendList;

Maybe a daft question - but why are all of these in the same list? Sure,
they're essentially all full backends, but they're covered by different GUCs?

+			/*
+			 * Auxiliary processes.  There can be only one of each of these
+			 * running at a time.
+			 */
+		case B_AUTOVAC_LAUNCHER:
+		case B_ARCHIVER:
+		case B_BG_WRITER:
+		case B_CHECKPOINTER:
+		case B_STARTUP:
+		case B_WAL_RECEIVER:
+		case B_WAL_SUMMARIZER:
+		case B_WAL_WRITER:
+			return &freeAuxList;
+
+			/*
+			 * Logger is not connected to shared memory, and does not have a
+			 * PGPROC entry, but we still allocate a child slot for it.
+			 */

Tangential: Why do we need a freelist for these and why do we choose a random
pgproc for these instead of assigning one statically?

Background: I'd like to not provide AIO workers with "bounce buffers" (for IO
of buffers that can't be done in-place, like writes when checksums are
enabled). The varying proc numbers make that harder than it'd have to be...

+PMChild *
+AssignPostmasterChildSlot(BackendType btype)
+{
+	dlist_head *freelist;
+	PMChild    *pmchild;
+
+	freelist = GetFreeList(btype);
+
+	if (dlist_is_empty(freelist))
+		return NULL;
+
+	pmchild = dlist_container(PMChild, elem, dlist_pop_head_node(freelist));
+	pmchild->pid = 0;
+	pmchild->bkend_type = btype;
+	pmchild->rw = NULL;
+	pmchild->bgworker_notify = true;
+
+	/*
+	 * pmchild->child_slot for each entry was initialized when the array of
+	 * slots was allocated.
+	 */
+
+	dlist_push_head(&ActiveChildList, &pmchild->elem);
+
+	ReservePostmasterChildSlot(pmchild->child_slot);
+
+	/* FIXME: find a more elegant way to pass this */
+	MyPMChildSlot = pmchild->child_slot;

What if we assigned one offset for each process and assigned its ID here and
also used that for its ProcNumber - that way we wouldn't need to manage
freelists in two places.

+PMChild *
+FindPostmasterChildByPid(int pid)
+{
+	dlist_iter	iter;
+
+	dlist_foreach(iter, &ActiveChildList)
+	{
+		PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
+
+		if (bp->pid == pid)
+			return bp;
+	}
+	return NULL;
+}

It's not new, but it's quite sad that postmaster's process exit handling is
effectively O(Backends^2)...

@@ -1019,7 +980,15 @@ PostmasterMain(int argc, char *argv[])
/*
* If enabled, start up syslogger collection subprocess
*/
-	SysLoggerPID = SysLogger_Start();
+	SysLoggerPMChild = AssignPostmasterChildSlot(B_LOGGER);
+	if (!SysLoggerPMChild)
+		elog(ERROR, "no postmaster child slot available for syslogger");
+	SysLoggerPMChild->pid = SysLogger_Start();
+	if (SysLoggerPMChild->pid == 0)
+	{
+		FreePostmasterChildSlot(SysLoggerPMChild);
+		SysLoggerPMChild = NULL;
+	}

Maybe it's a bit obsessive, but this seems long enough to make it worth not
doing inline in the already long PostmasterMain().

/*
* We're ready to rock and roll...
*/
-	StartupPID = StartChildProcess(B_STARTUP);
-	Assert(StartupPID != 0);
+	StartupPMChild = StartChildProcess(B_STARTUP);
+	Assert(StartupPMChild != NULL);

This (not new) assertion is ... odd.

@@ -1779,21 +1748,6 @@ canAcceptConnections(int backend_type)
if (!connsAllowed && backend_type == B_BACKEND)
return CAC_SHUTDOWN; /* shutdown is pending */

- /*
- * Don't start too many children.
- *
- * We allow more connections here than we can have backends because some
- * might still be authenticating; they might fail auth, or some existing
- * backend might exit before the auth cycle is completed. The exact
- * MaxBackends limit is enforced when a new backend tries to join the
- * shared-inval backend array.
- *
- * The limit here must match the sizes of the per-child-process arrays;
- * see comments for MaxLivePostmasterChildren().
- */
- if (CountChildren(BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND)) >= MaxLivePostmasterChildren())
- result = CAC_TOOMANY;
-
return result;
}

It's nice to get rid of this source of O(N^2).

@@ -1961,26 +1915,6 @@ process_pm_reload_request(void)
(errmsg("received SIGHUP, reloading configuration files")));
ProcessConfigFile(PGC_SIGHUP);
SignalSomeChildren(SIGHUP, BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND));
-		if (StartupPID != 0)
-			signal_child(StartupPID, SIGHUP);
-		if (BgWriterPID != 0)
-			signal_child(BgWriterPID, SIGHUP);
-		if (CheckpointerPID != 0)
-			signal_child(CheckpointerPID, SIGHUP);
-		if (WalWriterPID != 0)
-			signal_child(WalWriterPID, SIGHUP);
-		if (WalReceiverPID != 0)
-			signal_child(WalReceiverPID, SIGHUP);
-		if (WalSummarizerPID != 0)
-			signal_child(WalSummarizerPID, SIGHUP);
-		if (AutoVacPID != 0)
-			signal_child(AutoVacPID, SIGHUP);
-		if (PgArchPID != 0)
-			signal_child(PgArchPID, SIGHUP);
-		if (SysLoggerPID != 0)
-			signal_child(SysLoggerPID, SIGHUP);
-		if (SlotSyncWorkerPID != 0)
-			signal_child(SlotSyncWorkerPID, SIGHUP);

/* Reload authentication config files too */
if (!load_hba())

For a moment I wondered why this change was part of this commit - but I guess
we didn't have any of these in an array/list before this change...

@@ -2469,11 +2410,15 @@ process_pm_child_exit(void)
}

/* Was it the system logger?  If so, try to start a new one */
-		if (pid == SysLoggerPID)
+		if (SysLoggerPMChild && pid == SysLoggerPMChild->pid)
{
-			SysLoggerPID = 0;
/* for safety's sake, launch new logger *first* */
-			SysLoggerPID = SysLogger_Start();
+			SysLoggerPMChild->pid = SysLogger_Start();
+			if (SysLoggerPMChild->pid == 0)
+			{
+				FreePostmasterChildSlot(SysLoggerPMChild);
+				SysLoggerPMChild = NULL;
+			}
if (!EXIT_STATUS_0(exitstatus))
LogChildExit(LOG, _("system logger process"),

Seems a bit weird to have one place with a different memory lifetime handling
than other places. Why don't we just do this the same way as in other places
but continue to defer the logging until after we tried to start the new
logger?

Might be worth having a test ensuring that loggers restart OK.

/* Construct a process name for log message */
+
+	/*
+	 * FIXME: use GetBackendTypeDesc here? How does the localization of that
+	 * work?
+	 */
if (bp->bkend_type == B_DEAD_END_BACKEND)
{
procname = _("dead end backend");

Might be worth having a version of GetBackendTypeDesc() that returns a
translated string?

@@ -2697,9 +2643,16 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
{
dlist_iter iter;

-		dlist_foreach(iter, &BackendList)
+		dlist_foreach(iter, &ActiveChildList)
{
-			Backend    *bp = dlist_container(Backend, elem, iter.cur);
+			PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
+
+			/* We do NOT restart the syslogger */
+			if (bp == SysLoggerPMChild)
+				continue;

That comment seems a bit misleading - we do restart syslogger, we just don't
do it here, no? I realize it's an old comment, but it still seems like it's
worth fixing given that you touch all the code here...

@@ -2708,48 +2661,8 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
* We could exclude dead_end children here, but at least when
* sending SIGABRT it seems better to include them.
*/
-			sigquit_child(bp->pid);
+			sigquit_child(bp);
}
-
-		if (StartupPID != 0)
-		{
-			sigquit_child(StartupPID);
-			StartupStatus = STARTUP_SIGNALED;
-		}
-
-		/* Take care of the bgwriter too */
-		if (BgWriterPID != 0)
-			sigquit_child(BgWriterPID);
-
-		/* Take care of the checkpointer too */
-		if (CheckpointerPID != 0)
-			sigquit_child(CheckpointerPID);
-
-		/* Take care of the walwriter too */
-		if (WalWriterPID != 0)
-			sigquit_child(WalWriterPID);
-
-		/* Take care of the walreceiver too */
-		if (WalReceiverPID != 0)
-			sigquit_child(WalReceiverPID);
-
-		/* Take care of the walsummarizer too */
-		if (WalSummarizerPID != 0)
-			sigquit_child(WalSummarizerPID);
-
-		/* Take care of the autovacuum launcher too */
-		if (AutoVacPID != 0)
-			sigquit_child(AutoVacPID);
-
-		/* Take care of the archiver too */
-		if (PgArchPID != 0)
-			sigquit_child(PgArchPID);
-
-		/* Take care of the slot sync worker too */
-		if (SlotSyncWorkerPID != 0)
-			sigquit_child(SlotSyncWorkerPID);
-
-		/* We do NOT restart the syslogger */
}

Yay.

@@ -2871,29 +2786,27 @@ PostmasterStateMachine(void)
<snip>
-		if (WalSummarizerPID != 0)
-			signal_child(WalSummarizerPID, SIGTERM);
-		if (SlotSyncWorkerPID != 0)
-			signal_child(SlotSyncWorkerPID, SIGTERM);
+		targetMask |= (1 << B_STARTUP);
+		targetMask |= (1 << B_WAL_RECEIVER);
+
+		targetMask |= (1 << B_WAL_SUMMARIZER);
+		targetMask |= (1 << B_SLOTSYNC_WORKER);
/* checkpointer, archiver, stats, and syslogger may continue for now */

+		SignalSomeChildren(SIGTERM, targetMask);
+
/* Now transition to PM_WAIT_BACKENDS state to wait for them to die */
pmState = PM_WAIT_BACKENDS;
<snip>

It's likely the right thing to not do as one patch, but IMO this really wants
to be a state table. Perhaps as part of child_process_kinds, perhaps separate
from that.
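One possible shape for such a state table, as a hedged sketch: each backend type records in which shutdown phase it gets SIGTERM, and the target mask is computed from the table instead of being assembled by hand. The enum subset, `ShutdownPhase`, and `TargetMaskForPhase()` are hypothetical, and the row assignments only loosely follow the "checkpointer, archiver, stats, and syslogger may continue for now" comment in the hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of BackendType, for illustration only. */
typedef enum BackendTypeSketch
{
	B_BACKEND,
	B_STARTUP,
	B_WAL_RECEIVER,
	B_WAL_SUMMARIZER,
	B_SLOTSYNC_WORKER,
	B_CHECKPOINTER,
	B_ARCHIVER,
	B_LOGGER,
	NUM_BACKEND_TYPES_SKETCH
} BackendTypeSketch;

/* Hypothetical shutdown policy: when each kind of child receives SIGTERM. */
typedef enum ShutdownPhase
{
	STOP_WITH_BACKENDS,			/* signaled when entering PM_WAIT_BACKENDS */
	STOP_LATER,					/* may continue for now */
} ShutdownPhase;

/* One row per backend type; the assignments are illustrative. */
static const ShutdownPhase shutdown_phase[NUM_BACKEND_TYPES_SKETCH] = {
	[B_BACKEND] = STOP_WITH_BACKENDS,
	[B_STARTUP] = STOP_WITH_BACKENDS,
	[B_WAL_RECEIVER] = STOP_WITH_BACKENDS,
	[B_WAL_SUMMARIZER] = STOP_WITH_BACKENDS,
	[B_SLOTSYNC_WORKER] = STOP_WITH_BACKENDS,
	[B_CHECKPOINTER] = STOP_LATER,
	[B_ARCHIVER] = STOP_LATER,
	[B_LOGGER] = STOP_LATER,
};

/* Build the SignalSomeChildren()-style target mask for one phase. */
static uint32_t
TargetMaskForPhase(ShutdownPhase phase)
{
	uint32_t	mask = 0;

	for (int t = 0; t < NUM_BACKEND_TYPES_SKETCH; t++)
	{
		if (shutdown_phase[t] == phase)
			mask |= (uint32_t) 1 << t;
	}
	return mask;
}
```

The state machine would then call something like `SignalSomeChildren(SIGTERM, TargetMaskForPhase(STOP_WITH_BACKENDS))`, and a new process type would only need a table row.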

@@ -3130,8 +3047,21 @@ static void
LaunchMissingBackgroundProcesses(void)
{
/* Syslogger is active in all states */
-	if (SysLoggerPID == 0 && Logging_collector)
-		SysLoggerPID = SysLogger_Start();
+	if (SysLoggerPMChild == NULL && Logging_collector)
+	{
+		SysLoggerPMChild = AssignPostmasterChildSlot(B_LOGGER);
+		if (!SysLoggerPMChild)
+			elog(LOG, "no postmaster child slot available for syslogger");

How could this elog() be reached? Something would have to have gone seriously
wrong to get here - in which case a LOG that might not even be visible (due to
the logger not working) doesn't seem like the right response.

@@ -3334,29 +3270,12 @@ SignalSomeChildren(int signal, uint32 targetMask)
static void
TerminateChildren(int signal)
{

The comment for TerminateChildren() says "except syslogger and dead_end
backends." - aren't you including the latter here?

@@ -311,14 +311,9 @@ InitProcess(void)
/*
* Before we start accessing the shared memory in a serious way, mark
* ourselves as an active postmaster child; this is so that the postmaster
-	 * can detect it if we exit without cleaning up.  (XXX autovac launcher
-	 * currently doesn't participate in this; it probably should.)
-	 *
-	 * Slot sync worker also does not participate in it, see comments atop
-	 * 'struct bkend' in postmaster.c.
+	 * can detect it if we exit without cleaning up.
*/
-	if (IsUnderPostmaster && !AmAutoVacuumLauncherProcess() &&
-		!AmLogicalSlotSyncWorkerProcess())
+	if (IsUnderPostmaster)
MarkPostmasterChildActive();

/* Decide which list should supply our PGPROC. */
@@ -536,6 +531,9 @@ InitAuxiliaryProcess(void)
if (MyProc != NULL)
elog(ERROR, "you already exist");

+	if (IsUnderPostmaster)
+		MarkPostmasterChildActive();
+
/*
* We use the ProcStructLock to protect assignment and releasing of
* AuxiliaryProcs entries.

Probably worth, at some point soon, to have an InitProcessCommon() or such.

Greetings,

Andres Freund

#10

Heikki Linnakangas

hlinnaka@iki.fi

over 1 year ago

In reply to: Andres Freund (#9)

Re: Refactoring postmaster's code to cleanup after child exit

On 04/09/2024 17:35, Andres Freund wrote:

On 2024-08-12 12:55:00 +0300, Heikki Linnakangas wrote:

+Running the tests
+=================
+
+NOTE: You must have given the --enable-tap-tests argument to configure.
+
+Run
+    make check
+or
+    make installcheck
+You can use "make installcheck" if you previously did "make install".
+In that case, the code in the installation tree is tested.  With
+"make check", a temporary installation tree is built from the current
+sources and then tested.
+
+Either way, this test initializes, starts, and stops a test Postgres
+cluster.
+
+See src/test/perl/README for more info about running these tests.

Is it really useful to have such instructions all over the tree?

That's debatable but I didn't want to go down that rabbit hole with this
patch.

It's repetitive for sure. But there are small variations in which
PG_TEST_EXTRA options you need, whether "make installcheck" runs against
a running server or still creates a temporary cluster, etc.

I tried to deduplicate those instructions by moving the above
boilerplate to src/test/README, and only noting the variations in the
subdirectory READMEs. I didn't like the result. It's very helpful to
have full copy-pasteable commands with all the right "PG_TEST_EXTRA"
options for each test.

These instructions also don't mention how to run the tests with Meson.
The first time I wanted to run individual tests with Meson, it took me a
while to figure it out.

I'll think a little more about how to improve these READMEs, but let's
take that to a separate thread.

From 93b9e9b6e072f63af9009e0d66ab6d0d62ea8c15 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Aug 2024 10:55:11 +0300
Subject: [PATCH v4 2/8] Add test for dead-end backends

The code path for launching a dead-end backend because we're out of
slots was not covered by any tests, so add one. (Some tests did hit
the case of launching a dead-end backend because the server is still
starting up, though, so the gap in our test coverage wasn't as big as
it sounds.)
---
src/test/perl/PostgreSQL/Test/Cluster.pm | 39 +++++++++++++++++++
.../postmaster/t/001_connection_limits.pl | 17 +++++++-
2 files changed, 55 insertions(+), 1 deletion(-)

Why does this need to use "raw" connections? Can't you just create a bunch of
connections with BackgroundPsql?

No, these need to be connections that haven't sent the startup packet
yet.

With Andrew's PqFFI work [1], we could do better. The latest version on
that thread doesn't expose async functions like PQconnectStart() and
PQconnectPoll() yet, but they can be added.

[1]: /messages/by-id/97d1d1b9-d147-f69d-1991-d8794efed41c@dunslane.net

Unless you have comments on these first two patches which just add
tests, I'll commit them shortly. Still processing the rest of your
comments...

--
Heikki Linnakangas
Neon (https://neon.tech)

#11

Heikki Linnakangas

hlinnaka@iki.fi

over 1 year ago

In reply to: Andres Freund (#9)

Re: Refactoring postmaster's code to cleanup after child exit

On 04/09/2024 17:35, Andres Freund wrote:

On 2024-08-12 12:55:00 +0300, Heikki Linnakangas wrote:

From dc53f89edbeec99f8633def8aa5f47cd98e7a150 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Aug 2024 10:59:04 +0300
Subject: [PATCH v4 4/8] Introduce a separate BackendType for dead-end children

And replace postmaster.c's own "backend type" codes with BackendType

Hm - it seems a bit odd to open-code this when we actually have a "table
driven configuration" available? Why isn't the type a field in
child_process_kind?

Sorry, I didn't understand this. What exactly would you add to
child_process_kind? Where would you use it?

+/*
+ * MaxLivePostmasterChildren
+ *
+ * This reports the number of postmaster child processes that can be active.  It
+ * includes all children except for dead_end children.  This allows the array
+ * in shared memory (PMChildFlags) to have a fixed maximum size.
+ */
+int
+MaxLivePostmasterChildren(void)
+{
+	int			n = 0;
+
+	/* We know exactly how many worker and aux processes can be active */
+	n += autovacuum_max_workers;
+	n += max_worker_processes;
+	n += NUM_AUXILIARY_PROCS;
+
+	/*
+	 * We allow more connections here than we can have backends because some
+	 * might still be authenticating; they might fail auth, or some existing
+	 * backend might exit before the auth cycle is completed.  The exact
+	 * MaxBackends limit is enforced when a new backend tries to join the
+	 * shared-inval backend array.
+	 */
+	n += 2 * (MaxConnections + max_wal_senders);
+
+	return n;
+}

I wonder if we could instead maintain at least some of this in
child_process_kinds? Manually listing different types of processes in
different places doesn't seem particularly sustainable.

Hmm, you mean adding "max this kind of children" field to
child_process_kinds? Perhaps.

+/*
+ * Initialize at postmaster startup
+ */
+void
+InitPostmasterChildSlots(void)
+{
+	int			num_pmchild_slots;
+	int			slotno;
+	PMChild    *slots;
+
+	dlist_init(&freeBackendList);
+	dlist_init(&freeAutoVacWorkerList);
+	dlist_init(&freeBgWorkerList);
+	dlist_init(&freeAuxList);
+	dlist_init(&ActiveChildList);
+
+	num_pmchild_slots = MaxLivePostmasterChildren();
+
+	slots = palloc(num_pmchild_slots * sizeof(PMChild));
+
+	slotno = 0;
+	for (int i = 0; i < 2 * (MaxConnections + max_wal_senders); i++)
+	{
+		init_slot(&slots[slotno], slotno, &freeBackendList);
+		slotno++;
+	}
+	for (int i = 0; i < autovacuum_max_workers; i++)
+	{
+		init_slot(&slots[slotno], slotno, &freeAutoVacWorkerList);
+		slotno++;
+	}
+	for (int i = 0; i < max_worker_processes; i++)
+	{
+		init_slot(&slots[slotno], slotno, &freeBgWorkerList);
+		slotno++;
+	}
+	for (int i = 0; i < NUM_AUXILIARY_PROCS; i++)
+	{
+		init_slot(&slots[slotno], slotno, &freeAuxList);
+		slotno++;
+	}
+	Assert(slotno == num_pmchild_slots);
+}

Along the same vein - could we generalize this into one array of different
slot types and then loop over that to initialize / acquire the slots?

Makes sense.

+/* Return the appropriate free-list for the given backend type */
+static dlist_head *
+GetFreeList(BackendType btype)
+{
+	switch (btype)
+	{
+		case B_BACKEND:
+		case B_BG_WORKER:
+		case B_WAL_SENDER:
+		case B_SLOTSYNC_WORKER:
+			return &freeBackendList;
Maybe a daft question - but why are all of these in the same list? Sure,
they're essentially all full backends, but they're covered by different GUCs?

No reason. No particular reason they should *not* share the same list
either though.

+			/*
+			 * Auxiliary processes.  There can be only one of each of these
+			 * running at a time.
+			 */
+		case B_AUTOVAC_LAUNCHER:
+		case B_ARCHIVER:
+		case B_BG_WRITER:
+		case B_CHECKPOINTER:
+		case B_STARTUP:
+		case B_WAL_RECEIVER:
+		case B_WAL_SUMMARIZER:
+		case B_WAL_WRITER:
+			return &freeAuxList;
+
+			/*
+			 * Logger is not connected to shared memory, and does not have a
+			 * PGPROC entry, but we still allocate a child slot for it.
+			 */
Tangential: Why do we need a freelist for these and why do we choose a random
pgproc for these instead of assigning one statically?

Background: I'd like to not provide AIO workers with "bounce buffers" (for IO
of buffers that can't be done in-place, like writes when checksums are
enabled). The varying proc numbers make that harder than it'd have to be...

Yeah, we can make these fixed. Currently, the # of slots reserved for aux
processes is sized by NUM_AUXILIARY_PROCS, which is one smaller than the
number of different aux process kinds:

/*
* We set aside some extra PGPROC structures for auxiliary processes,
* ie things that aren't full-fledged backends but need shmem access.
*
* Background writer, checkpointer, WAL writer, WAL summarizer, and archiver
* run during normal operation. Startup process and WAL receiver also consume
* 2 slots, but WAL writer is launched only after startup has exited, so we
* only need 6 slots.
*/
#define NUM_AUXILIARY_PROCS 6

For PMChildSlot numbers, we could certainly just allocate one more slot.

It would probably make sense for PGPROCs too, even though PGPROC is a
much larger struct.
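A minimal sketch of what "fixed" could look like, assuming one dedicated slot per auxiliary process kind (seven kinds per the comment above, i.e. one more than NUM_AUXILIARY_PROCS today). The slot number then becomes a pure function of the kind and no free list is needed. `AuxProcKind`, `AuxChildSlot()` and `AuxKindForSlot()` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical enumeration of auxiliary process kinds that need a slot. */
typedef enum AuxProcKind
{
	AUX_STARTUP,
	AUX_BG_WRITER,
	AUX_CHECKPOINTER,
	AUX_WAL_WRITER,
	AUX_WAL_RECEIVER,
	AUX_WAL_SUMMARIZER,
	AUX_ARCHIVER,
	NUM_AUX_PROC_KINDS
} AuxProcKind;

/* With one slot per kind, the slot is first_aux_slot + kind; no freelist. */
static int
AuxChildSlot(int first_aux_slot, AuxProcKind kind)
{
	assert((int) kind >= 0 && kind < NUM_AUX_PROC_KINDS);
	return first_aux_slot + (int) kind;
}

/* Reverse mapping, e.g. to tell which aux process owned a slot; -1 if none. */
static int
AuxKindForSlot(int first_aux_slot, int slot)
{
	int			kind = slot - first_aux_slot;

	return (kind >= 0 && kind < NUM_AUX_PROC_KINDS) ? kind : -1;
}
```

The cost is one extra, mostly idle slot (startup and WAL writer never run at the same time), in exchange for stable numbers that other subsystems can rely on.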

+PMChild *
+AssignPostmasterChildSlot(BackendType btype)
+{
+	dlist_head *freelist;
+	PMChild    *pmchild;
+
+	freelist = GetFreeList(btype);
+
+	if (dlist_is_empty(freelist))
+		return NULL;
+
+	pmchild = dlist_container(PMChild, elem, dlist_pop_head_node(freelist));
+	pmchild->pid = 0;
+	pmchild->bkend_type = btype;
+	pmchild->rw = NULL;
+	pmchild->bgworker_notify = true;
+
+	/*
+	 * pmchild->child_slot for each entry was initialized when the array of
+	 * slots was allocated.
+	 */
+
+	dlist_push_head(&ActiveChildList, &pmchild->elem);
+
+	ReservePostmasterChildSlot(pmchild->child_slot);
+
+	/* FIXME: find a more elegant way to pass this */
+	MyPMChildSlot = pmchild->child_slot;

What if we assigned one offset for each process and assigned its ID here and
also used that for its ProcNumber - that way we wouldn't need to manage
freelists in two places.

It's currently possible to have up to 2 * max_connections backends in
the authentication phase. We would have to change that behaviour, or
make the PGPROC array 2x larger.

It might well be worth it, I don't know how sensible the current
behaviour is. But I'd like to punt that to later patch, to keep the
scope of this patch set reasonable. It's pretty straightforward to do
later on top of this if we want to.

+PMChild *
+FindPostmasterChildByPid(int pid)
+{
+	dlist_iter	iter;
+
+	dlist_foreach(iter, &ActiveChildList)
+	{
+		PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
+
+		if (bp->pid == pid)
+			return bp;
+	}
+	return NULL;
+}
It's not new, but it's quite sad that postmaster's process exit handling is
effectively O(Backends^2)...

It would be straightforward to turn ActiveChildList into a hash table.
But I'd like to leave that to a followup patch too.
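To illustrate the follow-up, here is a standalone sketch of a PID-keyed hash table that would make FindPostmasterChildByPid() O(1) on average. It uses linear probing with tombstones; in the tree one would presumably reach for lib/simplehash.h instead. `PMChildSketch`, the `pidmap_*` functions and the table size are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define PIDMAP_BITS 10
#define PIDMAP_SIZE (1 << PIDMAP_BITS)	/* assumed > 2x max live children */
#define PID_EMPTY	0
#define PID_DELETED (-1)

/* Minimal stand-in for the patch's PMChild struct. */
typedef struct PMChildSketch
{
	pid_t		pid;
	int			child_slot;
} PMChildSketch;

typedef struct PidMapEntry
{
	pid_t		pid;			/* PID_EMPTY, PID_DELETED, or a live child's pid */
	PMChildSketch *child;
} PidMapEntry;

static PidMapEntry pidmap[PIDMAP_SIZE];	/* zero-initialized: all PID_EMPTY */

/* Fibonacci hashing: top PIDMAP_BITS bits of pid * floor(2^32 / phi). */
static uint32_t
pid_hash(pid_t pid)
{
	return ((uint32_t) pid * 2654435769u) >> (32 - PIDMAP_BITS);
}

/* Insert a child; live pids are unique, so tombstones can be reused. */
static bool
pidmap_insert(PMChildSketch *child)
{
	uint32_t	i = pid_hash(child->pid);

	for (int probes = 0; probes < PIDMAP_SIZE; probes++)
	{
		if (pidmap[i].pid == PID_EMPTY || pidmap[i].pid == PID_DELETED)
		{
			pidmap[i].pid = child->pid;
			pidmap[i].child = child;
			return true;
		}
		i = (i + 1) & (PIDMAP_SIZE - 1);
	}
	return false;				/* table full */
}

/* Find a child by pid; an empty bucket ends the probe chain. */
static PMChildSketch *
pidmap_lookup(pid_t pid)
{
	uint32_t	i = pid_hash(pid);

	assert(pid > 0);
	for (int probes = 0; probes < PIDMAP_SIZE; probes++)
	{
		if (pidmap[i].pid == PID_EMPTY)
			return NULL;
		if (pidmap[i].pid == pid)
			return pidmap[i].child;
		i = (i + 1) & (PIDMAP_SIZE - 1);
	}
	return NULL;
}

/* Remove a child, leaving a tombstone so later entries stay reachable. */
static PMChildSketch *
pidmap_delete(pid_t pid)
{
	uint32_t	i = pid_hash(pid);

	for (int probes = 0; probes < PIDMAP_SIZE; probes++)
	{
		if (pidmap[i].pid == PID_EMPTY)
			return NULL;
		if (pidmap[i].pid == pid)
		{
			PMChildSketch *child = pidmap[i].child;

			pidmap[i].pid = PID_DELETED;
			pidmap[i].child = NULL;
			return child;
		}
		i = (i + 1) & (PIDMAP_SIZE - 1);
	}
	return NULL;
}
```

In a long-running postmaster the tombstones would accumulate, so a real implementation would need periodic rehashing or backward-shift deletion.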

/*
* We're ready to rock and roll...
*/
-	StartupPID = StartChildProcess(B_STARTUP);
-	Assert(StartupPID != 0);
+	StartupPMChild = StartChildProcess(B_STARTUP);
+	Assert(StartupPMChild != NULL);

This (not new) assertion is ... odd.

Yeah, it's an assertion because StartChildProcess has this:

/*
* fork failure is fatal during startup, but there's no need to choke
* immediately if starting other child types fails.
*/
if (type == B_STARTUP)
ExitPostmaster(1);

@@ -1961,26 +1915,6 @@ process_pm_reload_request(void)
(errmsg("received SIGHUP, reloading configuration files")));
ProcessConfigFile(PGC_SIGHUP);
SignalSomeChildren(SIGHUP, BACKEND_TYPE_ALL & ~(1 << B_DEAD_END_BACKEND));
-		if (StartupPID != 0)
-			signal_child(StartupPID, SIGHUP);
-		if (BgWriterPID != 0)
-			signal_child(BgWriterPID, SIGHUP);
-		if (CheckpointerPID != 0)
-			signal_child(CheckpointerPID, SIGHUP);
-		if (WalWriterPID != 0)
-			signal_child(WalWriterPID, SIGHUP);
-		if (WalReceiverPID != 0)
-			signal_child(WalReceiverPID, SIGHUP);
-		if (WalSummarizerPID != 0)
-			signal_child(WalSummarizerPID, SIGHUP);
-		if (AutoVacPID != 0)
-			signal_child(AutoVacPID, SIGHUP);
-		if (PgArchPID != 0)
-			signal_child(PgArchPID, SIGHUP);
-		if (SysLoggerPID != 0)
-			signal_child(SysLoggerPID, SIGHUP);
-		if (SlotSyncWorkerPID != 0)
-			signal_child(SlotSyncWorkerPID, SIGHUP);

/* Reload authentication config files too */
if (!load_hba())

For a moment I wondered why this change was part of this commit - but I guess
we didn't have any of these in an array/list before this change...

Correct.

@@ -2469,11 +2410,15 @@ process_pm_child_exit(void)
}
/* Was it the system logger?  If so, try to start a new one */
-		if (pid == SysLoggerPID)
+		if (SysLoggerPMChild && pid == SysLoggerPMChild->pid)
{
-			SysLoggerPID = 0;
/* for safety's sake, launch new logger *first* */
-			SysLoggerPID = SysLogger_Start();
+			SysLoggerPMChild->pid = SysLogger_Start();
+			if (SysLoggerPMChild->pid == 0)
+			{
+				FreePostmasterChildSlot(SysLoggerPMChild);
+				SysLoggerPMChild = NULL;
+			}
if (!EXIT_STATUS_0(exitstatus))
LogChildExit(LOG, _("system logger process"),
Seems a bit weird to have one place with a different memory lifetime handling
than other places. Why don't we just do this the same way as in other places
but continue to defer the logging until after we tried to start the new
logger?

Hmm, you mean let LaunchMissingBackgroundProcesses() handle the restart?

I'm a little scared of changing the existing logic. We don't have a
mechanism for deferring logging, so we would have to invent that, or the
logs would just accumulate in the pipe until syslogger starts up.
There's some code between here and LaunchMissingBackgroundProcesses(),
so could the postmaster get blocked writing to the syslogger pipe
before it has restarted the syslogger?

If forking the syslogger process fails, that can happen anyway, though.

Might be worth having a test ensuring that loggers restart OK.

Yeah..

/* Construct a process name for log message */
+
+	/*
+	 * FIXME: use GetBackendTypeDesc here? How does the localization of that
+	 * work?
+	 */
if (bp->bkend_type == B_DEAD_END_BACKEND)
{
procname = _("dead end backend");

Might be worth having a version of GetBackendTypeDesc() that returns a
translated string?

Constructing the string for background workers is a little more complicated:

snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""),
bp->rw->rw_worker.bgw_type);

We could still do that for background workers and use the translated
variant of GetBackendTypeDesc() for everything else though.
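A sketch of that split, assuming the usual PostgreSQL pattern of marking table strings with gettext_noop() and translating them with _() at lookup time. The macros are stubbed to the identity so the example compiles alone; the enum subset and `GetBackendTypeDescTranslated()` are hypothetical, and the message strings are the ones quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Stand-ins so the sketch compiles on its own.  In the PostgreSQL tree,
 * gettext_noop() only marks a string for message extraction, and _() looks
 * up the translation at run time.
 */
#define gettext_noop(x) (x)
#define _(x) (x)

/* Hypothetical subset of BackendType. */
typedef enum BackendTypeSketch
{
	B_BACKEND,
	B_DEAD_END_BACKEND,
	B_BG_WORKER,
	B_LOGGER,
	NUM_BACKEND_TYPES_SKETCH
} BackendTypeSketch;

/* Untranslated descriptions, marked for extraction into the message catalog. */
static const char *const backend_type_desc[NUM_BACKEND_TYPES_SKETCH] = {
	[B_BACKEND] = gettext_noop("server process"),
	[B_DEAD_END_BACKEND] = gettext_noop("dead end backend"),
	[B_BG_WORKER] = NULL,		/* needs the worker's bgw_type, see below */
	[B_LOGGER] = gettext_noop("system logger process"),
};

/*
 * Hypothetical translated variant of GetBackendTypeDesc().  Background
 * workers still get a formatted name, written into the caller's buffer.
 */
static const char *
GetBackendTypeDescTranslated(BackendTypeSketch btype, const char *bgw_type,
							 char *buf, size_t buflen)
{
	if (btype == B_BG_WORKER)
	{
		snprintf(buf, buflen, _("background worker \"%s\""), bgw_type);
		return buf;
	}
	return _(backend_type_desc[btype]);
}
```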

@@ -2697,9 +2643,16 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
{
dlist_iter iter;
-		dlist_foreach(iter, &BackendList)
+		dlist_foreach(iter, &ActiveChildList)
{
-			Backend    *bp = dlist_container(Backend, elem, iter.cur);
+			PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
+
+			/* We do NOT restart the syslogger */
+			if (bp == SysLoggerPMChild)
+				continue;
That comment seems a bit misleading - we do restart syslogger, we just don't
do it here, no? I realize it's an old comment, but it still seems like it's
worth fixing given that you touch all the code here...

No, we really do not restart the syslogger. This code runs when
*another* process has crashed unexpectedly. We kill all other processes,
reinitialize shared memory and restart, but the old syslogger keeps
running through all that.

I'll add a note on that to InitPostmasterChildSlots(), as it's a bit
surprising.

@@ -2871,29 +2786,27 @@ PostmasterStateMachine(void)
<snip>
-		if (WalSummarizerPID != 0)
-			signal_child(WalSummarizerPID, SIGTERM);
-		if (SlotSyncWorkerPID != 0)
-			signal_child(SlotSyncWorkerPID, SIGTERM);
+		targetMask |= (1 << B_STARTUP);
+		targetMask |= (1 << B_WAL_RECEIVER);
+
+		targetMask |= (1 << B_WAL_SUMMARIZER);
+		targetMask |= (1 << B_SLOTSYNC_WORKER);
/* checkpointer, archiver, stats, and syslogger may continue for now */

+		SignalSomeChildren(SIGTERM, targetMask);
+
/* Now transition to PM_WAIT_BACKENDS state to wait for them to die */
pmState = PM_WAIT_BACKENDS;
<snip>

It's likely the right thing to not do as one patch, but IMO this really wants
to be a state table. Perhaps as part of child_process_kinds, perhaps separate
from that.

Yeah. I've tried to refactor this into a table before, but didn't come
up with anything that I was happy with. I also feel there must be a
better way to organize this, but not sure what exactly. I hope that will
become more apparent after these other changes.

@@ -3130,8 +3047,21 @@ static void
LaunchMissingBackgroundProcesses(void)
{
/* Syslogger is active in all states */
-	if (SysLoggerPID == 0 && Logging_collector)
-		SysLoggerPID = SysLogger_Start();
+	if (SysLoggerPMChild == NULL && Logging_collector)
+	{
+		SysLoggerPMChild = AssignPostmasterChildSlot(B_LOGGER);
+		if (!SysLoggerPMChild)
+			elog(LOG, "no postmaster child slot available for syslogger");
How could this elog() be reached? Something would have to have gone seriously
wrong to get here - in which case a LOG that might not even be visible (due to
the logger not working) doesn't seem like the right response.

I'll turn it into an assertion or PANIC.

@@ -3334,29 +3270,12 @@ SignalSomeChildren(int signal, uint32 targetMask)
static void
TerminateChildren(int signal)
{

The comment for TerminateChildren() says "except syslogger and dead_end
backends." - aren't you including the latter here?

The comment is adjusted in
v4-0004-Introduce-a-separate-BackendType-for-dead-end-chi.patch. Before
that, SignalChildren() does ignore dead-end children.

Thanks for the review!

--
Heikki Linnakangas
Neon (https://neon.tech)

#12

Robert Haas

robertmhaas@gmail.com

over 1 year ago

In reply to: Heikki Linnakangas (#11)

Re: Refactoring postmaster's code to cleanup after child exit

On Fri, Sep 6, 2024 at 9:13 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

It's currently possible to have up to 2 * max_connections backends in
the authentication phase. We would have to change that behaviour, or
make the PGPROC array 2x larger.

I know I already said this elsewhere, but in case it got lost in the
shuffle, +1 for changing this, unless somebody can make a compelling
argument why 2 * max_connections isn't WAY too many.

--
Robert Haas
EDB: http://www.enterprisedb.com

#13

Andres Freund

andres@anarazel.de

over 1 year ago

In reply to: Heikki Linnakangas (#11)

Re: Refactoring postmaster's code to cleanup after child exit

Hi,

On 2024-09-06 16:13:43 +0300, Heikki Linnakangas wrote:

On 04/09/2024 17:35, Andres Freund wrote:

On 2024-08-12 12:55:00 +0300, Heikki Linnakangas wrote:

From dc53f89edbeec99f8633def8aa5f47cd98e7a150 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Aug 2024 10:59:04 +0300
Subject: [PATCH v4 4/8] Introduce a separate BackendType for dead-end children

And replace postmaster.c's own "backend type" codes with BackendType

Hm - it seems a bit odd to open-code this when we actually have a "table
driven configuration" available? Why isn't the type a field in
child_process_kind?

Sorry, I didn't understand this. What exactly would you add to
child_process_kind? Where would you use it?

I'm not entirely sure what I was thinking of. It might be partially triggering
a prior complaint I had about manually assigning things to MyBackendType,
despite actually having all the information already.

One thing that I just noticed is that this patch orphans comment references to
BACKEND_TYPE_AUTOVAC and BACKEND_TYPE_BGWORKER.

Seems a tad odd to have BACKEND_TYPE_ALL after removing everything else from
the BACKEND_TYPE_* "namespace".

To deal with the issue around bitmasks you had mentioned, I think we should at
least have a static inline function to convert B_* values to the bitmask
index.
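One possible shape for such a helper, sketched under assumptions: the enum below is a trimmed, hypothetical stand-in for PostgreSQL's BackendType (the real one has more members, possibly in a different order), and `btmask()` / `btmask_contains()` are made-up names, not existing functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for PostgreSQL's BackendType enum. */
typedef enum BackendType
{
	B_INVALID = 0,
	B_BACKEND,
	B_DEAD_END_BACKEND,
	B_AUTOVAC_LAUNCHER,
	B_AUTOVAC_WORKER,
	B_BG_WORKER,
	B_WAL_SENDER,
	B_SLOTSYNC_WORKER,
	B_BG_WRITER,
	B_WAL_WRITER,
	B_STARTUP,
	B_WAL_RECEIVER,
	B_WAL_SUMMARIZER,
	B_LOGGER,
} BackendType;

/* Convert a single BackendType into its bit in a uint32 mask. */
static inline uint32_t
btmask(BackendType t)
{
	/* every BackendType must fit into the 32-bit mask */
	assert((int) t >= 0 && (int) t < 32);
	return (uint32_t) 1 << t;
}

/* Does 'mask' include the given BackendType? */
static inline int
btmask_contains(uint32_t mask, BackendType t)
{
	return (mask & btmask(t)) != 0;
}
```

With that, a call site would read e.g. `btmask(B_BACKEND) | btmask(B_BG_WORKER)` instead of open-coded shifts, and the range check lives in one place.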

+/*
+ * MaxLivePostmasterChildren
+ *
+ * This reports the number of postmaster child processes that can be active.  It
+ * includes all children except for dead_end children.  This allows the array
+ * in shared memory (PMChildFlags) to have a fixed maximum size.
+ */
+int
+MaxLivePostmasterChildren(void)
+{
+	int			n = 0;
+
+	/* We know exactly how many worker and aux processes can be active */
+	n += autovacuum_max_workers;
+	n += max_worker_processes;
+	n += NUM_AUXILIARY_PROCS;
+
+	/*
+	 * We allow more connections here than we can have backends because some
+	 * might still be authenticating; they might fail auth, or some existing
+	 * backend might exit before the auth cycle is completed.  The exact
+	 * MaxBackends limit is enforced when a new backend tries to join the
+	 * shared-inval backend array.
+	 */
+	n += 2 * (MaxConnections + max_wal_senders);
+
+	return n;
+}

I wonder if we could instead maintain at least some of this in
child_process_kinds? Manually listing different types of processes in
different places doesn't seem particularly sustainable.

Hmm, you mean adding "max this kind of children" field to
child_process_kinds? Perhaps.

Yep, that's what I meant.
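To make the idea concrete, here is a sketch of a table that carries a per-kind maximum, with the total derived by summing the table rather than by listing process types by hand. Everything here is hypothetical: `ChildKindLimit`, the table rows and the GUC values are illustrative only, and the real child_process_kinds array has different fields. Unlike NUM_AUXILIARY_PROCS, this gives every auxiliary process kind its own slot.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative GUC values; in PostgreSQL these come from postgresql.conf. */
static int	MaxConnections = 100;
static int	max_wal_senders = 10;
static int	autovacuum_max_workers = 3;
static int	max_worker_processes = 8;

/* Hypothetical per-kind row: a name plus "how many of these can be alive". */
typedef struct ChildKindLimit
{
	const char *name;
	int			(*max_children) (void);
} ChildKindLimit;

/* Regular backends and walsenders; 2x to allow for clients still authenticating. */
static int
regular_max(void)
{
	return 2 * (MaxConnections + max_wal_senders);
}

static int
autovac_max(void)
{
	return autovacuum_max_workers;
}

static int
bgworker_max(void)
{
	return max_worker_processes;
}

/* Singleton auxiliary processes. */
static int
exactly_one(void)
{
	return 1;
}

static const ChildKindLimit child_kind_limits[] = {
	{"backend", regular_max},
	{"autovacuum worker", autovac_max},
	{"background worker", bgworker_max},
	{"startup", exactly_one},
	{"background writer", exactly_one},
	{"checkpointer", exactly_one},
	{"walwriter", exactly_one},
	{"walreceiver", exactly_one},
	{"walsummarizer", exactly_one},
	{"archiver", exactly_one},
};

/* Sum the per-kind maxima instead of listing process kinds by hand. */
static int
MaxLivePostmasterChildrenFromTable(void)
{
	int			n = 0;

	for (size_t i = 0; i < sizeof(child_kind_limits) / sizeof(child_kind_limits[0]); i++)
	{
		int			m = child_kind_limits[i].max_children();

		assert(m >= 0);
		n += m;
	}
	return n;
}
```

Adding a new process kind then means adding one row, and the total follows automatically.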

+/* Return the appropriate free-list for the given backend type */
+static dlist_head *
+GetFreeList(BackendType btype)
+{
+	switch (btype)
+	{
+		case B_BACKEND:
+		case B_BG_WORKER:
+		case B_WAL_SENDER:
+		case B_SLOTSYNC_WORKER:
+			return &freeBackendList;
Maybe a daft question - but why are all of these in the same list? Sure,
they're essentially all full backends, but they're covered by different GUCs?
No reason. No particular reason they should *not* share the same list either
though.

Aren't they controlled by distinct connection limits? Isn't there a danger
that we could use up entries and fail connections due to that, despite not
actually being above the limit?

Tangential: Why do we need a freelist for these and why do we choose a random
pgproc for these instead of assigning one statically?

Background: I'd like to not provide AIO workers with "bounce buffers" (for IO
of buffers that can't be done in-place, like writes when checksums are
enabled). The varying proc numbers make that harder than it'd have to be...

Yeah, we can make these fixed.

Cool.
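A sketch of what fixed assignment could look like, assuming each kind's maximum is known up front: every kind owns a contiguous range of slot numbers, so a singleton such as the checkpointer always gets the same number, and one kind can never use up another kind's slots (which also sidesteps the concern about sharing one freelist across kinds with different limits). The kinds, their maxima and the function names are hypothetical, and a linear scan stands in for whatever per-kind freelist a real implementation would keep.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical process kinds; the real list is longer. */
enum ProcKind
{
	KIND_BACKEND,
	KIND_AUTOVAC_WORKER,
	KIND_BGWORKER,
	KIND_CHECKPOINTER,
	KIND_BGWRITER,
	NUM_KINDS
};

/* Hypothetical per-kind maxima; in reality these would be derived from GUCs. */
static const int kind_max[NUM_KINDS] = {
	[KIND_BACKEND] = 4,
	[KIND_AUTOVAC_WORKER] = 2,
	[KIND_BGWORKER] = 2,
	[KIND_CHECKPOINTER] = 1,
	[KIND_BGWRITER] = 1,
};

#define TOTAL_SLOTS 10			/* sum of kind_max[] */

static bool slot_in_use[TOTAL_SLOTS];

/* First slot number of a kind's range: the sum of the maxima before it. */
static int
kind_base(int kind)
{
	int			base = 0;

	for (int k = 0; k < kind; k++)
		base += kind_max[k];
	return base;
}

/* Returns a slot from the kind's own range, or -1 if that range is full. */
static int
assign_slot(int kind)
{
	int			base = kind_base(kind);

	assert(base + kind_max[kind] <= TOTAL_SLOTS);
	for (int i = 0; i < kind_max[kind]; i++)
	{
		if (!slot_in_use[base + i])
		{
			slot_in_use[base + i] = true;
			return base + i;
		}
	}
	return -1;
}

static void
release_slot(int slot)
{
	assert(slot >= 0 && slot < TOTAL_SLOTS && slot_in_use[slot]);
	slot_in_use[slot] = false;
}
```

If the slot number doubled as the ProcNumber, per-process resources such as the AIO bounce buffers mentioned above could be indexed by it directly.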

Currently, the # of slots reserved for aux processes is sized by
NUM_AUXILIARY_PROCS, which is one smaller than the number of different aux
process kinds:

/*
 * We set aside some extra PGPROC structures for auxiliary processes,
 * ie things that aren't full-fledged backends but need shmem access.
 *
 * Background writer, checkpointer, WAL writer, WAL summarizer, and archiver
 * run during normal operation. Startup process and WAL receiver also consume
 * 2 slots, but WAL writer is launched only after startup has exited, so we
 * only need 6 slots.
 */
#define NUM_AUXILIARY_PROCS		6

For PMChildSlot numbers, we could certainly just allocate one more slot.

It would probably make sense for PGPROCs too, even though PGPROC is a much
larger struct.

I don't think it's worth worrying about that much. PGPROC is large, but not
*that* large. And the robustness win of properly detecting when there's a
problem around starting/stopping aux workers seems to outweigh that to me.

+PMChild *
+AssignPostmasterChildSlot(BackendType btype)
+{
+	dlist_head *freelist;
+	PMChild    *pmchild;
+
+	freelist = GetFreeList(btype);
+
+	if (dlist_is_empty(freelist))
+		return NULL;
+
+	pmchild = dlist_container(PMChild, elem, dlist_pop_head_node(freelist));
+	pmchild->pid = 0;
+	pmchild->bkend_type = btype;
+	pmchild->rw = NULL;
+	pmchild->bgworker_notify = true;
+
+	/*
+	 * pmchild->child_slot for each entry was initialized when the array of
+	 * slots was allocated.
+	 */
+
+	dlist_push_head(&ActiveChildList, &pmchild->elem);
+
+	ReservePostmasterChildSlot(pmchild->child_slot);
+
+	/* FIXME: find a more elegant way to pass this */
+	MyPMChildSlot = pmchild->child_slot;
What if we assigned one offset for each process and assigned its ID here and
also used that for its ProcNumber - that way we wouldn't need to manage
freelists in two places.
It's currently possible to have up to 2 * max_connections backends in the
authentication phase. We would have to change that behaviour, or make the
PGPROC array 2x larger.

That however, might be too much...

It might well be worth it, I don't know how sensible the current behaviour
is. But I'd like to punt that to later patch, to keep the scope of this
patch set reasonable. It's pretty straightforward to do later on top of this
if we want to.

Makes sense.

I still think that we'd be better off to just return an error to the client in
postmaster, rather than deal with this dead-end children mess. That was
perhaps justified at some point, but now it seems to add way more complexity
than it's worth. And it's absurdly expensive to fork to return an error. Way
more expensive than just having postmaster send an error and close the socket.

@@ -2469,11 +2410,15 @@ process_pm_child_exit(void)
}
/* Was it the system logger?  If so, try to start a new one */
-		if (pid == SysLoggerPID)
+		if (SysLoggerPMChild && pid == SysLoggerPMChild->pid)
{
-			SysLoggerPID = 0;
/* for safety's sake, launch new logger *first* */
-			SysLoggerPID = SysLogger_Start();
+			SysLoggerPMChild->pid = SysLogger_Start();
+			if (SysLoggerPMChild->pid == 0)
+			{
+				FreePostmasterChildSlot(SysLoggerPMChild);
+				SysLoggerPMChild = NULL;
+			}
if (!EXIT_STATUS_0(exitstatus))
LogChildExit(LOG, _("system logger process"),
Seems a bit weird to have one place with a different memory lifetime handling
than other places. Why don't we just do this the same way as in other places
but continue to defer the logging until after we tried to start the new
logger?
Hmm, you mean let LaunchMissingBackgroundProcesses() handle the restart?

Yea, which it already can do, presumably to handle the case of
logging_collector. It just seems odd to have three places calling
SysLogger_Start(), with some mild variations of the code.

Perhaps we can at least centralize some of that?

But you have a point with:

I'm a little scared of changing the existing logic. We don't have a
mechanism for deferring logging, so we would have to invent that, or the
logs would just accumulate in the pipe until syslogger starts up. There's
some code between here and LaunchMissingBackgroundProcesses(), so might
postmaster get blocked writing to the syslogger pipe before having
restarted it?

If forking the syslogger process fails, that can happen anyway, though.

/* Construct a process name for log message */
+
+	/*
+	 * FIXME: use GetBackendTypeDesc here? How does the localization of that
+	 * work?
+	 */
if (bp->bkend_type == B_DEAD_END_BACKEND)
{
procname = _("dead end backend");
Might be worth having a version of GetBackendTypeDesc() that returns a
translated string?
Constructing the string for background workers is a little more complicated:

Random aside: I *hate* that there's no trivial way to recognize background workers
in pg_stat_activity, because somebody made pg_stat_activity.backend_type
report something completely under control of extensions...

@@ -2697,9 +2643,16 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
{
dlist_iter	iter;
-		dlist_foreach(iter, &BackendList)
+		dlist_foreach(iter, &ActiveChildList)
{
-			Backend    *bp = dlist_container(Backend, elem, iter.cur);
+			PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
+
+			/* We do NOT restart the syslogger */
+			if (bp == SysLoggerPMChild)
+				continue;
That comment seems a bit misleading - we do restart syslogger, we just don't
do it here, no? I realize it's an old comment, but it still seems like it's
worth fixing given that you touch all the code here...
No, we really do not restart the syslogger.

Hm?

	/* Was it the system logger? If so, try to start a new one */
	if (SysLoggerPMChild && pid == SysLoggerPMChild->pid)
	{
		/* for safety's sake, launch new logger *first* */
		SysLoggerPMChild->pid = SysLogger_Start(SysLoggerPMChild->child_slot);
		if (SysLoggerPMChild->pid == 0)
		{
			FreePostmasterChildSlot(SysLoggerPMChild);
			SysLoggerPMChild = NULL;
		}
		if (!EXIT_STATUS_0(exitstatus))
			LogChildExit(LOG, _("system logger process"),
						 pid, exitstatus);
		continue;
	}

We don't do it in reaction to other processes crashing, but we still restart it
if it dies. Perhaps it's clear from context, but I had to think about it for
a moment.

@@ -2871,29 +2786,27 @@ PostmasterStateMachine(void)
<snip>
-		if (WalSummarizerPID != 0)
-			signal_child(WalSummarizerPID, SIGTERM);
-		if (SlotSyncWorkerPID != 0)
-			signal_child(SlotSyncWorkerPID, SIGTERM);
+		targetMask |= (1 << B_STARTUP);
+		targetMask |= (1 << B_WAL_RECEIVER);
+
+		targetMask |= (1 << B_WAL_SUMMARIZER);
+		targetMask |= (1 << B_SLOTSYNC_WORKER);
/* checkpointer, archiver, stats, and syslogger may continue for now */
+		SignalSomeChildren(SIGTERM, targetMask);
+
/* Now transition to PM_WAIT_BACKENDS state to wait for them to die */
pmState = PM_WAIT_BACKENDS;
<snip>
It's likely the right thing to not do as one patch, but IMO this really wants
to be a state table. Perhaps as part of child_process_kinds, perhaps separate
from that.
Yeah. I've tried to refactor this into a table before, but didn't come up
with anything that I was happy with. I also feel there must be a better way
to organize this, but not sure what exactly. I hope that will become more
apparent after these other changes.

What I'm imagining is something like:
1) Make PMState values each have a distinct bit
2) Move PMState to some (new?) header
3) Add a "uint32 should_run" member to child_process_kind that's a bitmask of
PMStates
4) Add a new function in launch_backend.c that gets passed the "target"
PMState and returns a bitmask of the tasks that should be running (or the
inverse, doesn't really matter).
5) Instead of open-coding the targetMask "computation", use the new function
from 4).

I think that might not look too bad?
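A rough, self-contained sketch of steps 1) to 5). Everything here is hypothetical: both enums are trimmed stand-ins (the real PMState has more states), and the per-kind `should_run` masks are illustrative rules, not PostgreSQL's actual ones.

```c
#include <assert.h>
#include <stdint.h>

/* Step 1: trimmed, hypothetical PMState values, each with a distinct bit. */
typedef enum PMState
{
	PM_INIT = 1 << 0,
	PM_STARTUP = 1 << 1,
	PM_RECOVERY = 1 << 2,
	PM_HOT_STANDBY = 1 << 3,
	PM_RUN = 1 << 4,
	PM_STOP_BACKENDS = 1 << 5,
	PM_WAIT_BACKENDS = 1 << 6,
	PM_SHUTDOWN = 1 << 7,
} PMState;

#define PM_ALL_STATES 0xFFu

/* Trimmed, hypothetical stand-in for BackendType. */
typedef enum SketchBackendType
{
	B_BACKEND,
	B_AUTOVAC_LAUNCHER,
	B_BG_WRITER,
	B_CHECKPOINTER,
	B_STARTUP,
	B_LOGGER,
	NUM_SKETCH_TYPES
} SketchBackendType;

static_assert(NUM_SKETCH_TYPES <= 32, "BackendType must fit in a uint32 mask");

/* Step 3: each kind carries a bitmask of the PMStates it may run in. */
typedef struct ChildKindSketch
{
	const char *name;
	uint32_t	should_run;
} ChildKindSketch;

/* Illustrative rules only. */
static const ChildKindSketch child_kinds[NUM_SKETCH_TYPES] = {
	[B_BACKEND] = {"backend", PM_HOT_STANDBY | PM_RUN},
	[B_AUTOVAC_LAUNCHER] = {"autovacuum launcher", PM_RUN},
	[B_BG_WRITER] = {"background writer", PM_RECOVERY | PM_HOT_STANDBY | PM_RUN},
	[B_CHECKPOINTER] = {"checkpointer", PM_RECOVERY | PM_HOT_STANDBY | PM_RUN |
						PM_STOP_BACKENDS | PM_WAIT_BACKENDS | PM_SHUTDOWN},
	[B_STARTUP] = {"startup", PM_STARTUP | PM_RECOVERY | PM_HOT_STANDBY},
	[B_LOGGER] = {"syslogger", PM_ALL_STATES},
};

/* Step 4: bitmask of BackendTypes that should be running in 'target'. */
static uint32_t
ChildrenToRunInState(PMState target)
{
	uint32_t	mask = 0;

	for (int t = 0; t < NUM_SKETCH_TYPES; t++)
	{
		if (child_kinds[t].should_run & (uint32_t) target)
			mask |= (uint32_t) 1 << t;
	}
	return mask;
}

/* Step 5: instead of open-coding targetMask, signal whatever must not run. */
static uint32_t
ChildrenToSignal(PMState target, uint32_t running)
{
	return running & ~ChildrenToRunInState(target);
}
```

In PM_STOP_BACKENDS, the state machine would then call something like `SignalSomeChildren(SIGTERM, ChildrenToSignal(PM_WAIT_BACKENDS, ...))`, and the "checkpointer, archiver, and syslogger may continue" rule lives in the table rather than in a comment.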

Greetings,

Andres Freund

#14

Robert Haas

robertmhaas@gmail.com

over 1 year ago

In reply to: Andres Freund (#13)

Re: Refactoring postmaster's code to cleanup after child exit

On Tue, Sep 10, 2024 at 12:59 PM Andres Freund <andres@anarazel.de> wrote:

I still think that we'd be better off to just return an error to the client in
postmaster, rather than deal with this dead-end children mess. That was
perhaps justified at some point, but now it seems to add way more complexity
than it's worth. And it's absurdly expensive to fork to return an error. Way
more expensive than just having postmaster send an error and close the socket.

The tricky case is the one where the client write() -- or SSL_write() -- blocks.

--
Robert Haas
EDB: http://www.enterprisedb.com

#15

Andres Freund

andres@anarazel.de

over 1 year ago

In reply to: Heikki Linnakangas (#6)

Re: Refactoring postmaster's code to cleanup after child exit

Hi,

On 2024-08-12 12:55:00 +0300, Heikki Linnakangas wrote:

@@ -2864,6 +2777,8 @@ PostmasterStateMachine(void)
*/
if (pmState == PM_STOP_BACKENDS)
{
+ uint32 targetMask;
+
/*
* Forget any pending requests for background workers, since we're no
* longer willing to launch any new workers. (If additional requests
@@ -2871,29 +2786,27 @@ PostmasterStateMachine(void)
*/
ForgetUnstartedBackgroundWorkers();

-		/* Signal all backend children except walsenders and dead-end backends */
-		SignalSomeChildren(SIGTERM,
-						   BACKEND_TYPE_ALL & ~(1 << B_WAL_SENDER | 1 << B_DEAD_END_BACKEND));
+		/* Signal all backend children except walsenders */
+		/* dead-end children are not signalled yet */
+		targetMask = (1 << B_BACKEND);
+		targetMask |= (1 << B_BG_WORKER);
+
/* and the autovac launcher too */
-		if (AutoVacPID != 0)
-			signal_child(AutoVacPID, SIGTERM);
+		targetMask |= (1 << B_AUTOVAC_LAUNCHER);
/* and the bgwriter too */
-		if (BgWriterPID != 0)
-			signal_child(BgWriterPID, SIGTERM);
+		targetMask |= (1 << B_BG_WRITER);
/* and the walwriter too */
-		if (WalWriterPID != 0)
-			signal_child(WalWriterPID, SIGTERM);
+		targetMask |= (1 << B_WAL_WRITER);
/* If we're in recovery, also stop startup and walreceiver procs */
-		if (StartupPID != 0)
-			signal_child(StartupPID, SIGTERM);
-		if (WalReceiverPID != 0)
-			signal_child(WalReceiverPID, SIGTERM);
-		if (WalSummarizerPID != 0)
-			signal_child(WalSummarizerPID, SIGTERM);
-		if (SlotSyncWorkerPID != 0)
-			signal_child(SlotSyncWorkerPID, SIGTERM);
+		targetMask |= (1 << B_STARTUP);
+		targetMask |= (1 << B_WAL_RECEIVER);
+
+		targetMask |= (1 << B_WAL_SUMMARIZER);
+		targetMask |= (1 << B_SLOTSYNC_WORKER);
/* checkpointer, archiver, stats, and syslogger may continue for now */

+		SignalSomeChildren(SIGTERM, targetMask);
+
/* Now transition to PM_WAIT_BACKENDS state to wait for them to die */
pmState = PM_WAIT_BACKENDS;
}

I think this might now omit shutting down at least autovac workers, which
afaict previously were included.
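If that's right, the fix is presumably one more bit in the mask. The self-contained sketch below uses a trimmed, hypothetical copy of the BackendType names and a made-up `BTMASK` macro. It also checks the new mask against the set of kinds the old code is assumed to have signalled: everything in the backend list except walsenders and dead-end children. Which kinds lived in that list is itself an assumption here.

```c
#include <assert.h>
#include <stdint.h>

/* Trimmed, hypothetical copy of the BackendType names used above. */
typedef enum BackendType
{
	B_INVALID = 0,
	B_BACKEND,
	B_DEAD_END_BACKEND,
	B_AUTOVAC_LAUNCHER,
	B_AUTOVAC_WORKER,
	B_BG_WORKER,
	B_WAL_SENDER,
	B_SLOTSYNC_WORKER,
	B_ARCHIVER,
	B_BG_WRITER,
	B_CHECKPOINTER,
	B_STARTUP,
	B_WAL_RECEIVER,
	B_WAL_SUMMARIZER,
	B_WAL_WRITER,
	B_LOGGER,
} BackendType;

static_assert(B_LOGGER < 32, "every BackendType must fit in a uint32 mask");

#define BTMASK(t) ((uint32_t) 1 << (t))

/* Assumption: the kinds that lived in the old backend list. */
#define OLD_BACKEND_LIST_KINDS \
	(BTMASK(B_BACKEND) | BTMASK(B_DEAD_END_BACKEND) | BTMASK(B_AUTOVAC_WORKER) | \
	 BTMASK(B_BG_WORKER) | BTMASK(B_WAL_SENDER))

/* What the old code signalled: that list minus walsenders and dead-end children. */
#define OLD_STOP_BACKENDS_MASK \
	(OLD_BACKEND_LIST_KINDS & ~(BTMASK(B_WAL_SENDER) | BTMASK(B_DEAD_END_BACKEND)))

/* Children to SIGTERM when entering PM_STOP_BACKENDS. */
static uint32_t
StopBackendsTargetMask(void)
{
	uint32_t	targetMask = 0;

	/* all backend children except walsenders; dead-end children come later */
	targetMask |= BTMASK(B_BACKEND);
	targetMask |= BTMASK(B_AUTOVAC_WORKER);	/* the kind that went missing */
	targetMask |= BTMASK(B_BG_WORKER);

	targetMask |= BTMASK(B_AUTOVAC_LAUNCHER);
	targetMask |= BTMASK(B_BG_WRITER);
	targetMask |= BTMASK(B_WAL_WRITER);
	targetMask |= BTMASK(B_STARTUP);
	targetMask |= BTMASK(B_WAL_RECEIVER);
	targetMask |= BTMASK(B_WAL_SUMMARIZER);
	targetMask |= BTMASK(B_SLOTSYNC_WORKER);

	/* checkpointer, archiver, and syslogger may continue for now */
	return targetMask;
}
```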

Greetings,

Andres Freund

#16

Andres Freund

andres@anarazel.de

over 1 year ago

In reply to: Robert Haas (#14)

Re: Refactoring postmaster's code to cleanup after child exit

Hi,

On 2024-09-10 13:33:36 -0400, Robert Haas wrote:

On Tue, Sep 10, 2024 at 12:59 PM Andres Freund <andres@anarazel.de> wrote:

I still think that we'd be better off to just return an error to the client in
postmaster, rather than deal with this dead-end children mess. That was
perhaps justified at some point, but now it seems to add way more complexity
than it's worth. And it's absurdly expensive to fork to return an error. Way
more expensive than just having postmaster send an error and close the socket.

The tricky case is the one where the client write() -- or SSL_write() -- blocks.

Yea, SSL definitely does make it harder. But it's not exactly rocket science
to do non-blocking SSL connection establishment. After all, we do manage to
do so in libpq...
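For the plain-socket case, the idea can be sketched as follows. The accepted socket is made non-blocking, and a single send() of an already-formatted error message is attempted. The socket is then closed whether or not the kernel took all of the message, so a client that isn't reading can never stall postmaster. This is a hypothetical helper, not existing PostgreSQL code. It deliberately leaves out SSL, where SSL_write() can also report SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE and would need the same non-blocking treatment.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Avoid SIGPIPE where the platform supports it; a real caller would also ignore SIGPIPE. */
#ifdef MSG_NOSIGNAL
#define SEND_FLAGS MSG_NOSIGNAL
#else
#define SEND_FLAGS 0
#endif

/*
 * Hypothetical helper: hand 'msg' to the kernel without ever blocking, then
 * close the socket.  Returns true only if the whole message was accepted.
 */
static bool
send_error_and_close(int sock, const char *msg, size_t len)
{
	bool		sent_all = false;
	int			flags;

	assert(sock >= 0);
	flags = fcntl(sock, F_GETFL, 0);
	if (flags >= 0 && fcntl(sock, F_SETFL, flags | O_NONBLOCK) == 0)
	{
		ssize_t		rc;

		do
			rc = send(sock, msg, len, SEND_FLAGS);
		while (rc < 0 && errno == EINTR);

		/* EAGAIN or a short write: give up instead of waiting for the client */
		sent_all = (rc >= 0 && (size_t) rc == len);
	}
	close(sock);
	return sent_all;
}
```

An error message is small enough that it normally fits in an empty socket send buffer, so the give-up path should only matter for misbehaving clients.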

Greetings,

Andres Freund

#17

Heikki Linnakangas

hlinnaka@iki.fi

over 1 year ago

In reply to: Heikki Linnakangas (#10)

2 attachment(s)

Re: Refactoring postmaster's code to cleanup after child exit

On 06/09/2024 12:52, Heikki Linnakangas wrote:

Unless you have comments on these first two patches which just add
tests, I'll commit them shortly. Still processing the rest of your
comments...

Didn't happen as "shortly" as I thought..

My test for dead-end backends opens 20 TCP (or unix domain) connections
to the server, in quick succession. That works fine on my system, and it
passed cirrus CI on other platforms, but on FreeBSD it failed
repeatedly. The behavior in that scenario is apparently
platform-dependent: it depends on the accept queue size, but what
happens when you reach the queue size also seems to depend on the
platform. On my Linux system, the connect() calls in the client are
blocked if the server doesn't call accept() fast enough, but
apparently you get an error on *BSD systems.

I'm not sure of the exact details, but in any case, platform-dependent
behavior needs to be avoided in tests. So I changed the test so that it
sends an SSLRequest packet on each connection and waits for reply (which
is always 'N' to reject it in this test), before opening the next
connection. This way, each connection is still left hanging, which is
what I want in this test, but only after postmaster has successfully
accept()ed it and forked the backend.
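For reference, the SSLRequest packet the test sends is 8 bytes: an int32 length (8) followed by the int32 code 80877103, i.e. 1234 in the high 16 bits and 5679 in the low 16 bits, all in network byte order. That is what `pack("Nnn", 8, 1234, 5679)` produces in the Perl test. A small C sketch of the same exchange follows; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define SSL_REQUEST_LEN 8
/* Same value as NEGOTIATE_SSL_CODE: "protocol version" 1234.5679 */
#define SSL_REQUEST_CODE ((1234u << 16) | 5679u)

/* Store 'v' at 'p' in network (big-endian) byte order. */
static void
put_uint32_be(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char) (v >> 24);
	p[1] = (unsigned char) (v >> 16);
	p[2] = (unsigned char) (v >> 8);
	p[3] = (unsigned char) v;
}

/* Build the packet: int32 length (including itself) + int32 request code. */
static void
build_ssl_request(unsigned char *buf)
{
	put_uint32_be(buf, SSL_REQUEST_LEN);
	put_uint32_be(buf + 4, SSL_REQUEST_CODE);
}

/*
 * Send an SSLRequest and wait for the one-byte answer ('S' or 'N').  Getting
 * any answer means postmaster has accept()ed the connection and forked a
 * child to handle it.  Returns 0 on success, -1 on error.
 */
static int
ssl_request_roundtrip(int sock, char *reply)
{
	unsigned char pkt[SSL_REQUEST_LEN];

	assert(reply != NULL);
	build_ssl_request(pkt);
	if (send(sock, pkt, sizeof(pkt), 0) != (ssize_t) sizeof(pkt))
		return -1;
	if (recv(sock, reply, 1, 0) != 1)
		return -1;
	return 0;
}
```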

So here are these test patches again, with that addition.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v5-0001-Add-test-for-connection-limits.patch (text/x-patch; charset=UTF-8)

From 16993fde8153b4feced18e6a49d8b470e0a49241 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 4 Oct 2024 20:15:36 +0300
Subject: [PATCH v5 1/3] Add test for connection limits

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi
---
 src/test/Makefile                             |  2 +-
 src/test/meson.build                          |  1 +
 src/test/postmaster/Makefile                  | 23 ++++++
 src/test/postmaster/README                    | 27 +++++++
 src/test/postmaster/meson.build               | 12 +++
 .../postmaster/t/001_connection_limits.pl     | 79 +++++++++++++++++++
 6 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100644 src/test/postmaster/Makefile
 create mode 100644 src/test/postmaster/README
 create mode 100644 src/test/postmaster/meson.build
 create mode 100644 src/test/postmaster/t/001_connection_limits.pl

diff --git a/src/test/Makefile b/src/test/Makefile
index dbd3192874..abdd6e5a98 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,7 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl postmaster regress isolation modules authentication recovery subscription
 
 ifeq ($(with_icu),yes)
 SUBDIRS += icu
diff --git a/src/test/meson.build b/src/test/meson.build
index c3d0dfedf1..67376e4b7f 100644
--- a/src/test/meson.build
+++ b/src/test/meson.build
@@ -4,6 +4,7 @@ subdir('regress')
 subdir('isolation')
 
 subdir('authentication')
+subdir('postmaster')
 subdir('recovery')
 subdir('subscription')
 subdir('modules')
diff --git a/src/test/postmaster/Makefile b/src/test/postmaster/Makefile
new file mode 100644
index 0000000000..dfcce9c9ee
--- /dev/null
+++ b/src/test/postmaster/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/postmaster
+#
+# Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/postmaster/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/postmaster
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean:
+	rm -rf tmp_check
diff --git a/src/test/postmaster/README b/src/test/postmaster/README
new file mode 100644
index 0000000000..7e47bf5cff
--- /dev/null
+++ b/src/test/postmaster/README
@@ -0,0 +1,27 @@
+src/test/postmaster/README
+
+Regression tests for postmaster
+===============================
+
+This directory contains a test suite for postmaster's handling of
+connections, connection limits, and startup/shutdown sequence.
+
+
+Running the tests
+=================
+
+NOTE: You must have given the --enable-tap-tests argument to configure.
+
+Run
+    make check
+or
+    make installcheck
+You can use "make installcheck" if you previously did "make install".
+In that case, the code in the installation tree is tested.  With
+"make check", a temporary installation tree is built from the current
+sources and then tested.
+
+Either way, this test initializes, starts, and stops a test Postgres
+cluster.
+
+See src/test/perl/README for more info about running these tests.
diff --git a/src/test/postmaster/meson.build b/src/test/postmaster/meson.build
new file mode 100644
index 0000000000..c2de2e0eb5
--- /dev/null
+++ b/src/test/postmaster/meson.build
@@ -0,0 +1,12 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+tests += {
+  'name': 'postmaster',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'tap': {
+    'tests': [
+      't/001_connection_limits.pl',
+    ],
+  },
+}
diff --git a/src/test/postmaster/t/001_connection_limits.pl b/src/test/postmaster/t/001_connection_limits.pl
new file mode 100644
index 0000000000..f50aae4949
--- /dev/null
+++ b/src/test/postmaster/t/001_connection_limits.pl
@@ -0,0 +1,79 @@
+
+# Copyright (c) 2021-2024, PostgreSQL Global Development Group
+
+# Test connection limits, i.e. max_connections, reserved_connections
+# and superuser_reserved_connections.
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize the server with specific low connection limits
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init;
+$node->append_conf('postgresql.conf', "max_connections = 6");
+$node->append_conf('postgresql.conf', "reserved_connections = 2");
+$node->append_conf('postgresql.conf', "superuser_reserved_connections = 1");
+$node->append_conf('postgresql.conf', "log_connections = on");
+$node->start;
+
+$node->safe_psql(
+	'postgres', qq{
+CREATE USER regress_regular LOGIN;
+CREATE USER regress_reserved LOGIN;
+GRANT pg_use_reserved_connections TO regress_reserved;
+CREATE USER regress_superuser LOGIN SUPERUSER;
+});
+
+# With the limits we set in postgresql.conf, we can establish:
+# - 3 connections for any user with no special privileges
+# - 2 more connections for users belonging to "pg_use_reserved_connections"
+# - 1 more connection for superuser
+
+sub background_psql_as_user
+{
+	my $user = shift;
+
+	return $node->background_psql(
+		'postgres',
+		on_error_die => 1,
+		extra_params => [ '-U', $user ]);
+}
+
+my @sessions = ();
+
+push(@sessions, background_psql_as_user('regress_regular'));
+push(@sessions, background_psql_as_user('regress_regular'));
+push(@sessions, background_psql_as_user('regress_regular'));
+$node->connect_fails(
+	"dbname=postgres user=regress_regular",
+	"reserved_connections limit",
+	expected_stderr =>
+	  qr/FATAL:  remaining connection slots are reserved for roles with privileges of the "pg_use_reserved_connections" role/
+);
+
+push(@sessions, background_psql_as_user('regress_reserved'));
+push(@sessions, background_psql_as_user('regress_reserved'));
+$node->connect_fails(
+	"dbname=postgres user=regress_regular",
+	"reserved_connections limit",
+	expected_stderr =>
+	  qr/FATAL:  remaining connection slots are reserved for roles with the SUPERUSER attribute/
+);
+
+push(@sessions, background_psql_as_user('regress_superuser'));
+$node->connect_fails(
+	"dbname=postgres user=regress_superuser",
+	"superuser_reserved_connections limit",
+	expected_stderr => qr/FATAL:  sorry, too many clients already/);
+
+# TODO: test that query cancellation is still possible
+
+foreach my $session (@sessions)
+{
+	$session->quit;
+}
+
+done_testing();
-- 
2.39.5

v5-0002-Add-test-for-dead-end-backends.patch (text/x-patch; charset=UTF-8)

From cd53192ea66a5b07c93941bd36e9f38bf8005956 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 4 Oct 2024 20:15:44 +0300
Subject: [PATCH v5 2/3] Add test for dead-end backends

The code path for launching a dead-end backend because we're out of
slots was not covered by any tests, so add one. (Some tests did hit
the case of launching a dead-end backend because the server is still
starting up, though, so the gap in our test coverage wasn't as big as
it sounds.)

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi
---
 src/test/perl/PostgreSQL/Test/Cluster.pm      | 38 +++++++++++++++++++
 .../postmaster/t/001_connection_limits.pl     | 37 +++++++++++++++++-
 2 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 90a842f96a..c278765fb0 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -104,6 +104,7 @@ use File::Path qw(rmtree mkpath);
 use File::Spec;
 use File::stat qw(stat);
 use File::Temp ();
+use IO::Socket::INET;
 use IPC::Run;
 use PostgreSQL::Version;
 use PostgreSQL::Test::RecursiveCopy;
@@ -286,6 +287,43 @@ sub connstr
 
 =pod
 
+=item $node->raw_connect()
+
+Open a raw TCP or Unix domain socket connection to the server. This is
+used by low-level protocol and connection limit tests.
+
+=cut
+
+sub raw_connect
+{
+	my ($self) = @_;
+	my $pgport = $self->port;
+	my $pghost = $self->host;
+
+	my $socket;
+	if ($PostgreSQL::Test::Utils::use_unix_sockets)
+	{
+		require IO::Socket::UNIX;
+		my $path = "$pghost/.s.PGSQL.$pgport";
+
+		$socket = IO::Socket::UNIX->new(
+			Type => SOCK_STREAM(),
+			Peer => $path,
+		) or die "Cannot create socket - $IO::Socket::errstr\n";
+	}
+	else
+	{
+		$socket = IO::Socket::INET->new(
+			PeerHost => $pghost,
+			PeerPort => $pgport,
+			Proto => 'tcp'
+		) or die "Cannot create socket - $IO::Socket::errstr\n";
+	}
+	return $socket;
+}
+
+=pod
+
 =item $node->group_access()
 
 Does the data dir allow group access?
diff --git a/src/test/postmaster/t/001_connection_limits.pl b/src/test/postmaster/t/001_connection_limits.pl
index f50aae4949..158464fe03 100644
--- a/src/test/postmaster/t/001_connection_limits.pl
+++ b/src/test/postmaster/t/001_connection_limits.pl
@@ -43,6 +43,7 @@ sub background_psql_as_user
 }
 
 my @sessions = ();
+my @raw_connections = ();
 
 push(@sessions, background_psql_as_user('regress_regular'));
 push(@sessions, background_psql_as_user('regress_regular'));
@@ -69,11 +70,45 @@ $node->connect_fails(
 	"superuser_reserved_connections limit",
 	expected_stderr => qr/FATAL:  sorry, too many clients already/);
 
-# TODO: test that query cancellation is still possible
+# We can still open TCP (or Unix domain socket) connections, but
+# beyond a certain number (roughly 2x max_connections), they will be
+# "dead-end backends".
+for (my $i = 0; $i <= 20; $i++)
+{
+	my $sock = $node->raw_connect();
+
+	# On a busy system, the server might reject connections if
+	# postmaster cannot accept() them fast enough. The exact limit and
+	# behavior depends on the platform. To make this reliable, we
+	# attempt SSL negotiation on each connection before opening the next
+	# one. The server will reject the SSL negotiations, but when it
+	# does so, we know that the backend has been launched and we
+	# should be able to open another connection.
+
+	# SSLRequest packet consists of packet length followed by
+	# NEGOTIATE_SSL_CODE.
+	my $negotiate_ssl_code = pack("Nnn", 8, 1234, 5679);
+	my $sent = $sock->send($negotiate_ssl_code);
 
+	# Read reply. We expect the server to reject it with 'N'
+	my $reply = "";
+	$sock->recv($reply, 1);
+	is($reply, "N", "dead-end connection $i");
+
+	push(@raw_connections, $sock);
+}
+
+# TODO: test that query cancellation is still possible. A dead-end
+# backend can process a query cancellation packet.
+
+# Clean up
 foreach my $session (@sessions)
 {
 	$session->quit;
 }
+foreach my $socket (@raw_connections)
+{
+	$socket->close();
+}
 
 done_testing();
-- 
2.39.5

#18

Thomas Munro

thomas.munro@gmail.com

over 1 year ago

In reply to: Heikki Linnakangas (#17)

Re: Refactoring postmaster's code to cleanup after child exit

On Sat, Oct 5, 2024 at 7:41 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

My test for dead-end backends opens 20 TCP (or unix domain) connections
to the server, in quick succession. That works fine on my system, and it
passed cirrus CI on other platforms, but on FreeBSD it failed
repeatedly. The behavior in that scenario is apparently
platform-dependent: it depends on the accept queue size, but what
happens when you reach the queue size also seems to depend on the
platform. On my Linux system, the connect() calls in the client are
blocked if the server doesn't call accept() fast enough, but
apparently you get an error on *BSD systems.

Right, we've analysed that difference in the AF_UNIX implementation
before[1]. It shows up in the real world, where client sockets (i.e.
libpq's) are usually non-blocking, as EAGAIN on Linux (which is not
valid per POSIX) vs ECONNREFUSED on other OSes. All fail to connect,
but the error message is different.

For blocking AF_UNIX client sockets like in your test, Linux
effectively has an infinite queue made from two layers. The listen
queue (a queue of connecting sockets) does respect the requested
backlog size, but when it's full it has an extra trick: the connect()
call waits (in a queue of threads) for space to become free in the
listen queue, so it's effectively unlimited (but only for blocking
sockets), while FreeBSD and I suspect any other implementation
deriving from or reimplementing the BSD socket code gives you
ECONNREFUSED. macOS behaves just the same as FreeBSD AFAICT, so I
don't know why you didn't see the same thing... I guess it's just
racing against accept() draining the queue.

It's possible that Windows copied the Linux behaviour for AF_UNIX,
given that it probably has something to do with the WSL project for
emulating Linux, but IDK.
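A small experiment, not part of the patch and assuming a POSIX system, that reproduces the difference. It listens on an AF_UNIX socket that never calls accept(), then opens non-blocking client connections until one fails. Going by the analysis above, the first failure should be EAGAIN on Linux and ECONNREFUSED on FreeBSD and macOS; other platforms are untested. The function name and the socket path are made up.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define MAX_ATTEMPTS 64

/*
 * Listen on an AF_UNIX socket that never calls accept(), then open
 * non-blocking client connections until connect() fails.  Returns the errno
 * of the first failure, 0 if all MAX_ATTEMPTS succeeded, or -1 on setup error.
 */
static int
first_connect_errno(const char *path, int backlog)
{
	struct sockaddr_un addr;
	int			clients[MAX_ATTEMPTS];
	int			nclients = 0;
	int			result = 0;
	int			listener;

	assert(strlen(path) < sizeof(addr.sun_path));
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);
	unlink(path);

	listener = socket(AF_UNIX, SOCK_STREAM, 0);
	if (listener < 0)
		return -1;
	if (bind(listener, (struct sockaddr *) &addr, sizeof(addr)) != 0 ||
		listen(listener, backlog) != 0)
	{
		close(listener);
		return -1;
	}

	while (nclients < MAX_ATTEMPTS)
	{
		int			fd = socket(AF_UNIX, SOCK_STREAM, 0);

		if (fd < 0)
		{
			result = -1;
			break;
		}
		/* non-blocking, like libpq's client sockets */
		(void) fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
		if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) != 0)
		{
			result = errno;		/* save before close() can clobber it */
			close(fd);
			break;
		}
		clients[nclients++] = fd;
	}

	for (int i = 0; i < nclients; i++)
		close(clients[i]);
	close(listener);
	unlink(path);
	return result;
}
```

With blocking client sockets, as in the Perl test, Linux would instead make connect() wait for space in the listen queue, which is why the failure only showed up on other platforms.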

[1]: /messages/by-id/CADc_NKg2d+oZY9mg4DdQdoUcGzN2kOYXBu-3--RW_hEe0tUV=g@mail.gmail.com

I'm not sure of the exact details, but in any case, platform-dependent
behavior needs to be avoided in tests. So I changed the test so that it
sends an SSLRequest packet on each connection and waits for reply (which
is always 'N' to reject it in this test), before opening the next
connection. This way, each connection is still left hanging, which is
what I want in this test, but only after postmaster has successfully
accept()ed it and forked the backend.

Makes sense.

#19

Heikki Linnakangas

hlinnaka@iki.fi

over 1 year ago

In reply to: Thomas Munro (#18)

Re: Refactoring postmaster's code to cleanup after child exit

On 05/10/2024 01:03, Thomas Munro wrote:

On Sat, Oct 5, 2024 at 7:41 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

My test for dead-end backends opens 20 TCP (or unix domain) connections
to the server, in quick succession. That works fine on my system, and it
passed cirrus CI on other platforms, but on FreeBSD it failed
repeatedly. The behavior in that scenario is apparently
platform-dependent: it depends on the accept queue size, but what
happens when you reach the queue size also seems to depend on the
platform. On my Linux system, the connect() calls in the client are
blocked if the server doesn't call accept() fast enough, but
apparently you get an error on *BSD systems.

Right, we've analysed that difference in the AF_UNIX implementation
before[1]. It shows up in the real world, where client sockets (i.e.
libpq's) are usually non-blocking, as EAGAIN on Linux (which is not
valid per POSIX) vs ECONNREFUSED on other OSes. All fail to connect,
but the error message is different.

Thanks for the pointer!

For blocking AF_UNIX client sockets like in your test, Linux
effectively has an infinite queue made from two layers. The listen
queue (a queue of connecting sockets) does respect the requested
backlog size, but when it's full it has an extra trick: the connect()
call waits (in a queue of threads) for space to become free in the
listen queue, so it's effectively unlimited (but only for blocking
sockets), while FreeBSD and I suspect any other implementation
deriving from or reimplementing the BSD socket code gives you
ECONNREFUSED. macOS behaves just the same as FreeBSD AFAICT, so I
don't know why you didn't see the same thing... I guess it's just
racing against accept() draining the queue.

In fact I misremembered: the failure happened on macOS, *not* FreeBSD.
It could be just luck I didn't see it on FreeBSD though.
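The platform difference Thomas describes can be probed directly. The sketch below uses a hypothetical helper, `probe_unix_backlog()`, which is not part of the test suite: it listens on an AF_UNIX socket with a small backlog, never calls accept(), and opens non-blocking client connections (the way libpq does) until one fails. Going by the analysis above, one would expect the failing connect() to report EAGAIN on Linux and ECONNREFUSED on FreeBSD and macOS. The socket path and backlog size are arbitrary assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define MAX_PROBE_CLIENTS 64

/*
 * Listen on an AF_UNIX socket with a small backlog, never accept(), and open
 * non-blocking client connections until one fails.  Returns the number of
 * connections that were queued, or -1 if the listening socket could not be
 * set up.  The errno of the first failed attempt is stored in *fail_errno
 * (0 if every attempt succeeded).
 */
static int
probe_unix_backlog(const char *path, int backlog, int attempts, int *fail_errno)
{
	struct sockaddr_un addr;
	int			clients[MAX_PROBE_CLIENTS];
	int			nqueued = 0;
	int			listener;

	assert(strlen(path) < sizeof(addr.sun_path));
	if (attempts > MAX_PROBE_CLIENTS)
		attempts = MAX_PROBE_CLIENTS;
	*fail_errno = 0;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);
	unlink(path);				/* remove a stale socket file, if any */

	listener = socket(AF_UNIX, SOCK_STREAM, 0);
	if (listener < 0)
		return -1;
	if (bind(listener, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
		listen(listener, backlog) < 0)
	{
		close(listener);
		return -1;
	}

	for (int i = 0; i < attempts; i++)
	{
		int			fd = socket(AF_UNIX, SOCK_STREAM, 0);

		if (fd < 0)
		{
			*fail_errno = errno;
			break;
		}
		/* non-blocking, like libpq's client sockets */
		fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
		if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
		{
			/* expected: EAGAIN on Linux, ECONNREFUSED on BSD-derived stacks */
			*fail_errno = errno;
			close(fd);
			break;
		}
		clients[nqueued++] = fd;
	}

	for (int i = 0; i < nqueued; i++)
		close(clients[i]);
	close(listener);
	unlink(path);
	return nqueued;
}
```

With blocking client sockets, as in the Perl test, the Linux connect() would instead wait for room in the listen queue, which would explain why the test passed on Linux but not everywhere else.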

It's possible that Windows copied the Linux behaviour for AF_UNIX,
given that it probably has something to do with the WSL project for
emulating Linux, but IDK.

Sadly, IO::Socket::UNIX hasn't been implemented on Windows (or at
least not in the Perl distribution we're using in Cirrus CI):

Socket::pack_sockaddr_un not implemented on this architecture at
C:/strawberry/5.26.3.1/perl/lib/Socket.pm line 872.

so I'll have to disable this test on Windows anyway.

--
Heikki Linnakangas
Neon (https://neon.tech)

#20

Dagfinn Ilmari Mannsåker

ilmari@ilmari.org

over 1 year ago

In reply to: Heikki Linnakangas (#19)

Re: Refactoring postmaster's code to cleanup after child exit

Heikki Linnakangas <hlinnaka@iki.fi> writes:

On 05/10/2024 01:03, Thomas Munro wrote:

It's possible that Windows copied the Linux behaviour for AF_UNIX,
given that it probably has something to do with the WSL project for
emulating Linux, but IDK.

Sadly, IO::Socket::UNIX hasn't been implemented on Windows (or at
least not in the Perl distribution we're using in Cirrus CI):

Socket::pack_sockaddr_un not implemented on this architecture at
C:/strawberry/5.26.3.1/perl/lib/Socket.pm line 872.

so I'll have to disable this test on Windows anyway.

Socket version 2.028 (included in Perl 5.32) provides pack_sockaddr_un()
on Windows, so that can be fixed by bumping the Perl version in
https://github.com/anarazel/pg-vm-images/blob/main/packer/windows.pkr.hcl
to something more modern (such as 5.40.0.1), and only skipping the test
on Windows if Socket is too old.

The decision to use 5.26 seems to come from the initial creation of the
CI images in 2021 (when 5.34 was current), with the comment «newer
versions don't currently work correctly for plperl». That claim is
worth revisiting, and fixing if it's still the case.

- ilmari

#21

Andres Freund

andres@anarazel.de

over 1 year ago

In reply to: Dagfinn Ilmari Mannsåker (#20)

Re: Refactoring postmaster's code to cleanup after child exit

Hi,

On 2024-10-05 20:51:50 +0100, Dagfinn Ilmari Mannsåker wrote:

Socket version 2.028 (included in Perl 5.32) provides pack_sockaddr_un()
on Windows, so that can be fixed by bumping the Perl version in
https://github.com/anarazel/pg-vm-images/blob/main/packer/windows.pkr.hcl
to something more modern (such as 5.40.0.1), and only skipping the test
on Windows if Socket is too old.

The decision to use 5.26 seems to come from the initial creation of the
CI images in 2021 (when 5.34 was current), with the comment «newer
versions don't currently work correctly for plperl». That claim is
worth revisiting, and fixing if it's still the case.

I think we fixed the issues that were known at the time. I think I tried
upgrading to something newer at some point and there were some weird, but
fixable, encoding issues. Unfortunately I don't have the bandwidth to tackle
this rn.

Greetings,

Andres Freund

#22

Heikki Linnakangas

hlinnaka@iki.fi

over 1 year ago

In reply to: Dagfinn Ilmari Mannsåker (#20)

2 attachment(s)

Re: Refactoring postmaster's code to cleanup after child exit

On 05/10/2024 22:51, Dagfinn Ilmari Mannsåker wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

Sadly, IO::Socket::UNIX hasn't been implemented on Windows (or at
least not in the Perl distribution we're using in Cirrus CI):

Socket::pack_sockaddr_un not implemented on this architecture at
C:/strawberry/5.26.3.1/perl/lib/Socket.pm line 872.

so I'll have to disable this test on Windows anyway.

Socket version 2.028 (included in Perl 5.32) provides pack_sockaddr_un()
on Windows, so that can be fixed by bumping the Perl version in
https://github.com/anarazel/pg-vm-images/blob/main/packer/windows.pkr.hcl
to something more modern (such as 5.40.0.1), and only skipping the test
on Windows if Socket is too old.

The decision to use 5.26 seems to come from the initial creation of the
CI images in 2021 (when 5.34 was current), with the comment «newer
versions don't currently work correctly for plperl». That claim is
worth revisiting, and fixing if it's still the case.

Yeah, it would be nice to update it. I wonder if commit
341f4e002d461a3c5513cb864490cddae2b43a64 fixed whatever the problem was.

In the meanwhile, here is one more version of the test patches, with a
SKIP that checks that IO::Socket::UNIX works.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v6-0001-Add-test-for-connection-limits.patch (text/x-patch)

From 9b499e8cdf09a127d7506837d5e2a697dd342820 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 8 Oct 2024 00:54:30 +0300
Subject: [PATCH v6 1/3] Add test for connection limits

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi
---
 src/test/Makefile                             |  2 +-
 src/test/meson.build                          |  1 +
 src/test/postmaster/Makefile                  | 23 ++++++
 src/test/postmaster/README                    | 27 +++++++
 src/test/postmaster/meson.build               | 12 +++
 .../postmaster/t/001_connection_limits.pl     | 79 +++++++++++++++++++
 6 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100644 src/test/postmaster/Makefile
 create mode 100644 src/test/postmaster/README
 create mode 100644 src/test/postmaster/meson.build
 create mode 100644 src/test/postmaster/t/001_connection_limits.pl

diff --git a/src/test/Makefile b/src/test/Makefile
index dbd3192874..abdd6e5a98 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,7 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl postmaster regress isolation modules authentication recovery subscription
 
 ifeq ($(with_icu),yes)
 SUBDIRS += icu
diff --git a/src/test/meson.build b/src/test/meson.build
index c3d0dfedf1..67376e4b7f 100644
--- a/src/test/meson.build
+++ b/src/test/meson.build
@@ -4,6 +4,7 @@ subdir('regress')
 subdir('isolation')
 
 subdir('authentication')
+subdir('postmaster')
 subdir('recovery')
 subdir('subscription')
 subdir('modules')
diff --git a/src/test/postmaster/Makefile b/src/test/postmaster/Makefile
new file mode 100644
index 0000000000..dfcce9c9ee
--- /dev/null
+++ b/src/test/postmaster/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/postmaster
+#
+# Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/postmaster/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/postmaster
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean:
+	rm -rf tmp_check
diff --git a/src/test/postmaster/README b/src/test/postmaster/README
new file mode 100644
index 0000000000..7e47bf5cff
--- /dev/null
+++ b/src/test/postmaster/README
@@ -0,0 +1,27 @@
+src/test/postmaster/README
+
+Regression tests for postmaster
+===============================
+
+This directory contains a test suite for postmaster's handling of
+connections, connection limits, and startup/shutdown sequence.
+
+
+Running the tests
+=================
+
+NOTE: You must have given the --enable-tap-tests argument to configure.
+
+Run
+    make check
+or
+    make installcheck
+You can use "make installcheck" if you previously did "make install".
+In that case, the code in the installation tree is tested.  With
+"make check", a temporary installation tree is built from the current
+sources and then tested.
+
+Either way, this test initializes, starts, and stops a test Postgres
+cluster.
+
+See src/test/perl/README for more info about running these tests.
diff --git a/src/test/postmaster/meson.build b/src/test/postmaster/meson.build
new file mode 100644
index 0000000000..c2de2e0eb5
--- /dev/null
+++ b/src/test/postmaster/meson.build
@@ -0,0 +1,12 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+tests += {
+  'name': 'postmaster',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'tap': {
+    'tests': [
+      't/001_connection_limits.pl',
+    ],
+  },
+}
diff --git a/src/test/postmaster/t/001_connection_limits.pl b/src/test/postmaster/t/001_connection_limits.pl
new file mode 100644
index 0000000000..f50aae4949
--- /dev/null
+++ b/src/test/postmaster/t/001_connection_limits.pl
@@ -0,0 +1,79 @@
+
+# Copyright (c) 2021-2024, PostgreSQL Global Development Group
+
+# Test connection limits, i.e. max_connections, reserved_connections
+# and superuser_reserved_connections.
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize the server with specific low connection limits
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init;
+$node->append_conf('postgresql.conf', "max_connections = 6");
+$node->append_conf('postgresql.conf', "reserved_connections = 2");
+$node->append_conf('postgresql.conf', "superuser_reserved_connections = 1");
+$node->append_conf('postgresql.conf', "log_connections = on");
+$node->start;
+
+$node->safe_psql(
+	'postgres', qq{
+CREATE USER regress_regular LOGIN;
+CREATE USER regress_reserved LOGIN;
+GRANT pg_use_reserved_connections TO regress_reserved;
+CREATE USER regress_superuser LOGIN SUPERUSER;
+});
+
+# With the limits we set in postgresql.conf, we can establish:
+# - 3 connections for any user with no special privileges
+# - 2 more connections for users belonging to "pg_use_reserved_connections"
+# - 1 more connection for superuser
+
+sub background_psql_as_user
+{
+	my $user = shift;
+
+	return $node->background_psql(
+		'postgres',
+		on_error_die => 1,
+		extra_params => [ '-U', $user ]);
+}
+
+my @sessions = ();
+
+push(@sessions, background_psql_as_user('regress_regular'));
+push(@sessions, background_psql_as_user('regress_regular'));
+push(@sessions, background_psql_as_user('regress_regular'));
+$node->connect_fails(
+	"dbname=postgres user=regress_regular",
+	"reserved_connections limit",
+	expected_stderr =>
+	  qr/FATAL:  remaining connection slots are reserved for roles with privileges of the "pg_use_reserved_connections" role/
+);
+
+push(@sessions, background_psql_as_user('regress_reserved'));
+push(@sessions, background_psql_as_user('regress_reserved'));
+$node->connect_fails(
+	"dbname=postgres user=regress_regular",
+	"reserved_connections limit",
+	expected_stderr =>
+	  qr/FATAL:  remaining connection slots are reserved for roles with the SUPERUSER attribute/
+);
+
+push(@sessions, background_psql_as_user('regress_superuser'));
+$node->connect_fails(
+	"dbname=postgres user=regress_superuser",
+	"superuser_reserved_connections limit",
+	expected_stderr => qr/FATAL:  sorry, too many clients already/);
+
+# TODO: test that query cancellation is still possible
+
+foreach my $session (@sessions)
+{
+	$session->quit;
+}
+
+done_testing();
-- 
2.39.5

v6-0002-Add-test-for-dead-end-backends.patch (text/x-patch)

From 5fe25211724859bfa29bff73ac780322ab95181c Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 8 Oct 2024 00:54:33 +0300
Subject: [PATCH v6 2/3] Add test for dead-end backends

The code path for launching a dead-end backend because we're out of
slots was not covered by any tests, so add one. (Some tests did hit
the case of launching a dead-end backend because the server is still
starting up, though, so the gap in our test coverage wasn't as big as
it sounds.)

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi
---
 src/test/perl/PostgreSQL/Test/Cluster.pm      | 78 +++++++++++++++++++
 .../postmaster/t/001_connection_limits.pl     | 42 +++++++++-
 2 files changed, 119 insertions(+), 1 deletion(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 30857f34bf..63c25eeb83 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -104,6 +104,7 @@ use File::Path qw(rmtree mkpath);
 use File::Spec;
 use File::stat qw(stat);
 use File::Temp ();
+use IO::Socket::INET;
 use IPC::Run;
 use PostgreSQL::Version;
 use PostgreSQL::Test::RecursiveCopy;
@@ -291,6 +292,83 @@ sub connstr
 
 =pod
 
+=item $node->raw_connect()
+
+Open a raw TCP or Unix domain socket connection to the server. This is
+used by low-level protocol and connection limit tests.
+
+=cut
+
+sub raw_connect
+{
+	my ($self) = @_;
+	my $pgport = $self->port;
+	my $pghost = $self->host;
+
+	my $socket;
+	if ($PostgreSQL::Test::Utils::use_unix_sockets)
+	{
+		require IO::Socket::UNIX;
+		my $path = "$pghost/.s.PGSQL.$pgport";
+
+		$socket = IO::Socket::UNIX->new(
+			Type => SOCK_STREAM(),
+			Peer => $path,
+		) or die "Cannot create socket - $IO::Socket::errstr\n";
+	}
+	else
+	{
+		$socket = IO::Socket::INET->new(
+			PeerHost => $pghost,
+			PeerPort => $pgport,
+			Proto => 'tcp'
+		) or die "Cannot create socket - $IO::Socket::errstr\n";
+	}
+	return $socket;
+}
+
+=pod
+
+=item $node->raw_connect_works()
+
+Check if raw_connect() function works on this platform. This should
+be called to SKIP any tests that require raw_connect().
+
+This tries to connect to the server to test whether it works or not,
+so the server must be up and running. Otherwise this can return 0 even if
+there's nothing wrong with raw_connect() itself.
+
+Notably, raw_connect() does not work on Unix domain sockets on
+Strawberry perl 5.26.3.1 on Windows, which we use in Cirrus CI images
+as of this writing. It dies with "not implemented on this
+architecture".
+
+=cut
+
+sub raw_connect_works
+{
+	my ($self) = @_;
+
+	# If we're using Unix domain sockets, we need a working
+	# IO::Socket::UNIX implementation.
+	if ($PostgreSQL::Test::Utils::use_unix_sockets)
+	{
+		diag "checking if IO::Socket::UNIX works";
+		eval {
+			my $sock = $self->raw_connect();
+			$sock->close();
+		};
+		if ($@ =~ /not implemented/)
+		{
+			diag "IO::Socket::UNIX does not work: $@";
+			return 0;
+		}
+	}
+	return 1;
+}
+
+=pod
+
 =item $node->group_access()
 
 Does the data dir allow group access?
diff --git a/src/test/postmaster/t/001_connection_limits.pl b/src/test/postmaster/t/001_connection_limits.pl
index f50aae4949..f8d24bcf24 100644
--- a/src/test/postmaster/t/001_connection_limits.pl
+++ b/src/test/postmaster/t/001_connection_limits.pl
@@ -43,6 +43,7 @@ sub background_psql_as_user
 }
 
 my @sessions = ();
+my @raw_connections = ();
 
 push(@sessions, background_psql_as_user('regress_regular'));
 push(@sessions, background_psql_as_user('regress_regular'));
@@ -69,11 +70,50 @@ $node->connect_fails(
 	"superuser_reserved_connections limit",
 	expected_stderr => qr/FATAL:  sorry, too many clients already/);
 
-# TODO: test that query cancellation is still possible
+# We can still open TCP (or Unix domain socket) connections, but
+# beyond a certain number (roughly 2x max_connections), they will be
+# "dead-end backends".
+SKIP:
+{
+	skip "this test requires working raw_connect()" unless $node->raw_connect_works();
+
+	for (my $i = 0; $i <= 20; $i++)
+	{
+		my $sock = $node->raw_connect();
+
+		# On a busy system, the server might reject connections if
+		# postmaster cannot accept() them fast enough. The exact limit
+		# and behavior depends on the platform. To make this reliable,
+		# we attempt SSL negotiation on each connection before opening
+		# the next one. The server will reject the SSL negotiations, but
+		# when it does so, we know that the backend has been launched
+		# and we should be able to open another connection.
+
+		# SSLRequest packet consists of packet length followed by
+		# NEGOTIATE_SSL_CODE.
+		my $negotiate_ssl_code = pack("Nnn", 8, 1234, 5679);
+		my $sent = $sock->send($negotiate_ssl_code);
+
+		# Read reply. We expect the server to reject it with 'N'
+		my $reply = "";
+		$sock->recv($reply, 1);
+		is($reply, "N", "dead-end connection $i");
 
+		push(@raw_connections, $sock);
+	}
+}
+
+# TODO: test that query cancellation is still possible. A dead-end
+# backend can process a query cancellation packet.
+
+# Clean up
 foreach my $session (@sessions)
 {
 	$session->quit;
 }
+foreach my $socket (@raw_connections)
+{
+	$socket->close();
+}
 
 done_testing();
-- 
2.39.5
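The SSLRequest that the test sends with `pack("Nnn", 8, 1234, 5679)` is 8 bytes: an int32 length of 8 (which counts itself), followed by the int32 request code with 1234 in the high 16 bits and 5679 in the low 16 bits, i.e. 80877103, all in network byte order. A C sketch of the same encoding; the helper name `build_ssl_request()` is hypothetical:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 1234 in the most significant 16 bits, 5679 in the least significant */
#define SSL_REQUEST_CODE ((UINT32_C(1234) << 16) | UINT32_C(5679))

static_assert(SSL_REQUEST_CODE == 80877103u, "NEGOTIATE_SSL_CODE value");

/*
 * Fill buf with an SSLRequest packet: an int32 message length (8, counting
 * itself) followed by the int32 request code, both in network byte order.
 * This mirrors pack("Nnn", 8, 1234, 5679) in the Perl test.
 */
static void
build_ssl_request(unsigned char buf[8])
{
	uint32_t	len = htonl(8);
	uint32_t	code = htonl(SSL_REQUEST_CODE);

	memcpy(buf, &len, 4);
	memcpy(buf + 4, &code, 4);
}
```

The server answers with a single byte. The test expects 'N', and, as the comment in the patch explains, getting any reply at all shows that the connection was accepted and a backend launched for it.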

#23

Nazir Bilal Yavuz

byavuz81@gmail.com

over 1 year ago

In reply to: Heikki Linnakangas (#22)

Re: Refactoring postmaster's code to cleanup after child exit

Hi,

On Tue, 8 Oct 2024 at 00:55, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 05/10/2024 22:51, Dagfinn Ilmari Mannsåker wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

Sadly, IO::Socket::UNIX hasn't been implemented on Windows (or at
least not in the Perl distribution we're using in Cirrus CI):

Socket::pack_sockaddr_un not implemented on this architecture at
C:/strawberry/5.26.3.1/perl/lib/Socket.pm line 872.

so I'll have to disable this test on Windows anyway.

Socket version 2.028 (included in Perl 5.32) provides pack_sockaddr_un()
on Windows, so that can be fixed by bumping the Perl version in
https://github.com/anarazel/pg-vm-images/blob/main/packer/windows.pkr.hcl
to something more modern (such as 5.40.0.1), and only skipping the test
on Windows if Socket is too old.

The decision to use 5.26 seems to come from the initial creation of the
CI images in 2021 (when 5.34 was current), with the comment «newer
versions don't currently work correctly for plperl». That claim is
worth revisiting, and fixing if it's still the case.

Yeah, it would be nice to update it. I wonder if commit
341f4e002d461a3c5513cb864490cddae2b43a64 fixed whatever the problem was.

The Perl version in the Windows CI image has been bumped to 5.40.0.1 [1], so
the related test now passes on Windows [2].

[1]: https://github.com/anarazel/pg-vm-images/commit/cbd5d46f2fb7b28efb126ddac64d12711247dfa8
[2]: https://cirrus-ci.com/task/5682393120505856?logs=check_world#L241

--
Regards,
Nazir Bilal Yavuz
Microsoft

#24

Heikki Linnakangas

hlinnaka@iki.fi

over 1 year ago

In reply to: Andres Freund (#15)

4 attachment(s)

Re: Refactoring postmaster's code to cleanup after child exit

I pushed the first three patches, with the new test and one of the small
refactoring patches. Thanks for all the comments so far! Here is a new
version of the remaining patches.

Lots of little cleanups and changes here and there since the last
versions, but the notable bigger changes are:

- There is now a BackendTypeMask datatype, so that if you try to mix up
bitmasks and plain BackendType values, the compiler will complain.

- pmchild.c has been rewritten per feedback, so that the "pools" of
PMChild structs are more explicit. The size of each pool is only stated
once, whereas before the same logic was duplicated in
MaxLivePostmasterChildren() which calculates the number of slots and in
InitPostmasterChildSlots() which allocates them.

- In PostmasterStateMachine(), I combined the code to handle
PM_STOP_BACKENDS and PM_WAIT_BACKENDS. They are essentially the same
state, except that PM_STOP_BACKENDS first sends the signal to all the
child processes that it will then wait for. They both needed to build
the same bitmask of processes to signal or wait for; this eliminates the
duplication.
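The combined PM_STOP_BACKENDS / PM_WAIT_BACKENDS handling can be pictured roughly as follows. This is a simplified model, not the postmaster code: the trimmed backend types, the `Child` struct and the `PM_NEXT_STATE` placeholder are hypothetical, and setting `got_sigterm` stands in for actually sending SIGTERM. The point is that the target mask is built once and serves both for signalling (only in the STOP state) and for waiting (in both states).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for BackendType and pmState */
typedef enum
{
	B_BACKEND,
	B_BG_WORKER,
	B_AUTOVAC_WORKER,
	B_WAL_SENDER,
	B_CHECKPOINTER,
	NUM_BACKEND_TYPES
} BackendType;

typedef enum
{
	PM_STOP_BACKENDS,			/* signal the targets, then wait for them */
	PM_WAIT_BACKENDS,			/* just wait for the targets to exit */
	PM_NEXT_STATE				/* placeholder for whatever comes next */
} PMState;

typedef struct
{
	BackendType type;
	bool		alive;
	bool		got_sigterm;	/* stand-in for kill(pid, SIGTERM) */
} Child;

static_assert(NUM_BACKEND_TYPES < 32, "mask must fit in uint32_t");

#define BT(t) (UINT32_C(1) << (t))

static PMState
stop_or_wait_backends(PMState state, Child *children, int nchildren)
{
	/* Built once, used both to signal and to decide whether to keep waiting */
	uint32_t	target = BT(B_BACKEND) | BT(B_BG_WORKER) | BT(B_AUTOVAC_WORKER);
	int			remaining = 0;

	/* walsenders and the checkpointer are deliberately not in the mask */
	assert(nchildren >= 0);
	for (int i = 0; i < nchildren; i++)
	{
		if (!children[i].alive || (target & BT(children[i].type)) == 0)
			continue;
		if (state == PM_STOP_BACKENDS)
			children[i].got_sigterm = true;
		remaining++;
	}

	if (state == PM_STOP_BACKENDS)
		state = PM_WAIT_BACKENDS;	/* signals sent; from now on just wait */
	if (state == PM_WAIT_BACKENDS && remaining == 0)
		state = PM_NEXT_STATE;
	return state;
}
```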

Responses to some specific comments below:

On 10/09/2024 20:53, Andres Freund wrote:

On 2024-08-12 12:55:00 +0300, Heikki Linnakangas wrote:

-		/* Signal all backend children except walsenders and dead-end backends */
-		SignalSomeChildren(SIGTERM,
-						   BACKEND_TYPE_ALL & ~(1 << B_WAL_SENDER | 1 << B_DEAD_END_BACKEND));
+		/* Signal all backend children except walsenders */
+		/* dead-end children are not signalled yet */
+		targetMask = (1 << B_BACKEND);
+		targetMask |= (1 << B_BG_WORKER);
+
 		/* and the autovac launcher too */
-		if (AutoVacPID != 0)
-			signal_child(AutoVacPID, SIGTERM);
+		targetMask |= (1 << B_AUTOVAC_LAUNCHER);
 		/* and the bgwriter too */
-		if (BgWriterPID != 0)
-			signal_child(BgWriterPID, SIGTERM);
+		targetMask |= (1 << B_BG_WRITER);
 		/* and the walwriter too */
-		if (WalWriterPID != 0)
-			signal_child(WalWriterPID, SIGTERM);
+		targetMask |= (1 << B_WAL_WRITER);
 		/* If we're in recovery, also stop startup and walreceiver procs */
-		if (StartupPID != 0)
-			signal_child(StartupPID, SIGTERM);
-		if (WalReceiverPID != 0)
-			signal_child(WalReceiverPID, SIGTERM);
-		if (WalSummarizerPID != 0)
-			signal_child(WalSummarizerPID, SIGTERM);
-		if (SlotSyncWorkerPID != 0)
-			signal_child(SlotSyncWorkerPID, SIGTERM);
+		targetMask |= (1 << B_STARTUP);
+		targetMask |= (1 << B_WAL_RECEIVER);
+
+		targetMask |= (1 << B_WAL_SUMMARIZER);
+		targetMask |= (1 << B_SLOTSYNC_WORKER);
 		/* checkpointer, archiver, stats, and syslogger may continue for now */

+		SignalSomeChildren(SIGTERM, targetMask);
+
 		/* Now transition to PM_WAIT_BACKENDS state to wait for them to die */
 		pmState = PM_WAIT_BACKENDS;
 	}

I think this might now omit shutting down at least autovac workers, which
afaict previously were included.

Fixed. And this code now also explicitly lists backend types that are
*not* signaled, and there is an assertion that all backend types are
accounted for. Thanks to that, if someone adds a new backend type, they
will be forced to decide if the new backend type should be signaled here
or not. That's not quite table-driven like you suggested, but it's
closer to that.
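A minimal sketch of the "every backend type must be accounted for" check, assuming a trimmed-down, hypothetical BackendType enum: the types to signal and the types deliberately left running are listed separately, and assertions require that together they cover every type exactly once. Which types go on which side follows the quoted diff; the real enum has more members.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the BackendType enum */
typedef enum
{
	B_INVALID,
	B_BACKEND,
	B_DEAD_END_BACKEND,
	B_AUTOVAC_LAUNCHER,
	B_AUTOVAC_WORKER,
	B_BG_WORKER,
	B_WAL_SENDER,
	B_BG_WRITER,
	B_CHECKPOINTER,
	B_STARTUP,
	B_WAL_RECEIVER,
	B_WAL_WRITER,
	BACKEND_NUM_TYPES
} BackendType;

#define BT(t) (UINT32_C(1) << (t))
#define ALL_TYPES (BT(BACKEND_NUM_TYPES) - 1)

static_assert(BACKEND_NUM_TYPES < 32, "too many backend types for uint32_t");

/*
 * Return the set of types to SIGTERM.  The set deliberately left alone is
 * listed too, so that adding a new enum value without classifying it trips
 * the second assertion.
 */
static uint32_t
shutdown_signal_mask(void)
{
	uint32_t	signal_mask = BT(B_BACKEND) | BT(B_AUTOVAC_WORKER) |
		BT(B_BG_WORKER) | BT(B_AUTOVAC_LAUNCHER) | BT(B_BG_WRITER) |
		BT(B_WAL_WRITER) | BT(B_STARTUP) | BT(B_WAL_RECEIVER);
	uint32_t	keep_mask = BT(B_INVALID) | BT(B_DEAD_END_BACKEND) |
		BT(B_WAL_SENDER) | BT(B_CHECKPOINTER);

	assert((signal_mask & keep_mask) == 0);		/* no type on both sides */
	assert((signal_mask | keep_mask) == ALL_TYPES);	/* no type forgotten */
	return signal_mask;
}
```

assert() compiles to nothing under NDEBUG, much as PostgreSQL's Assert() is only active in assert-enabled builds, so an unclassified new type would be caught during development rather than in production.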

+/* Return the appropriate free-list for the given backend type */
+static dlist_head *
+GetFreeList(BackendType btype)
+{
+	switch (btype)
+	{
+		case B_BACKEND:
+		case B_BG_WORKER:
+		case B_WAL_SENDER:
+		case B_SLOTSYNC_WORKER:
+			return &freeBackendList;

Maybe a daft question - but why are all of these in the same list? Sure,
they're essentially all full backends, but they're covered by different GUCs?

No reason. No particular reason they should *not* share the same list either
though.

Aren't they controlled by distinct connection limits? Isn't there a danger
that we could use up entries and fail connections due to that, despite not
actually being above the limit?

Yes, this was in fact just wrong. Slotsync worker is a special process
and should not be allocated from the same pool as backends, and
previously it was not. And indeed bgworkers have a separate connection
limit, and should have a separate pool. Fixed.

Tangential: Why do we need a freelist for these and why do we choose a random
pgproc for these instead of assigning one statically?

Background: I'd like to not provide AIO workers with "bounce buffers" (for IO
of buffers that can't be done in-place, like writes when checksums are
enabled). The varying proc numbers make that harder than it'd have to be...

Yeah, we can make these fixed.

Cool.

All the aux processes now have their own "free list", i.e. a pool of a
single entry, so after postmaster startup their child_slot never changes.
They're still not constants across server startups though, because the
numbering depends on max_connections etc. If it matters, we could
allocate the aux process slot numbers first, so that they would be truly
static, but I didn't do that.

I did not change how ProcNumbers are allocated. They are still separate
from PMChildSlots.
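The idea of stating each pool's size exactly once, and deriving both the total slot count and each pool's slot range from that single table, could be sketched like this. The `ChildPool` struct, the pool names and sizes, and the 1-based numbering are hypothetical, not pmchild.c's actual layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool descriptor; sizes are stated once, in the table */
typedef struct
{
	const char *name;
	int			size;
	int			first_slot;		/* filled in by assign_slot_ranges() */
} ChildPool;

/*
 * Lay the pools out consecutively and return the total number of slots.
 * Both the total and each pool's range come from the same table, so the
 * counting code and the allocating code cannot disagree.
 */
static int
assign_slot_ranges(ChildPool *pools, size_t npools)
{
	int			next = 1;		/* slot numbers start at 1 in this sketch */

	for (size_t i = 0; i < npools; i++)
	{
		assert(pools[i].size >= 0);
		pools[i].first_slot = next;
		next += pools[i].size;
	}
	return next - 1;
}
```

Because the backend pool, whose size would come from max_connections, is laid out first here, the slot of a single-entry aux pool such as the checkpointer's shifts whenever max_connections changes. Placing the aux pools first would make their slot numbers identical across restarts, which is the alternative mentioned above.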

Includes a test for the case where a dead-end backend previously kept
the server from shutting down.

The test hardcodes timeouts, I think we've largely come to regret that when we
did. Should probably just be a multiplier based on
PostgreSQL::Test::Utils::timeout_default?

Hmm, what the test really needs is that the authentication_timeout is >>
the "pg_ctl stop" timeout. The idea is that if a dead-end backend isn't
killed, but needs to wait for authentication_timeout to expire, the test
should fail. The default pg_ctl stop timeout is actually only 60 s,
while PostgreSQL::Test::Utils::timeout_default is 180 s.

I changed authentication_timeout in the test to
PostgreSQL::Test::Utils::timeout_default, but also added an explicit
timeout to "$node->stop", and set that to authentication_timeout / 2.
That ensures that the stop timeout is smaller than
authentication_timeout, regardless of
PostgreSQL::Test::Utils::timeout_default or the default pg_ctl timeout.

 	/* Construct a process name for log message */
+
+	/*
+	 * FIXME: use GetBackendTypeDesc here? How does the localization of that
+	 * work?
+	 */
 	if (bp->bkend_type == B_DEAD_END_BACKEND)
 	{
 		procname = _("dead end backend");

Might be worth having a version of GetBackendTypeDesc() that returns a
translated string?

I made the translation of GetBackendTypeDesc() work the same as for
error_severity(int elevel).
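The error_severity() approach in elog.c is, roughly, to return English strings marked with gettext_noop() and let callers apply _() when they want the translated form. The sketch below imitates that with stub macros and a toy message catalog in place of real gettext; `toy_gettext()`, `ToyBackendType` and the "[xx]" pseudo-translations are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* gettext_noop() only marks a literal for extraction; it returns it as-is */
#define gettext_noop(x) (x)

/* Toy runtime lookup standing in for gettext() and a real message catalog */
static const char *
toy_gettext(const char *msgid)
{
	static const struct
	{
		const char *msgid;
		const char *msgstr;
	}			catalog[] = {
		{"dead-end backend", "[xx] dead-end backend"},
		{"background writer", "[xx] background writer"},
	};

	assert(msgid != NULL);
	for (size_t i = 0; i < sizeof(catalog) / sizeof(catalog[0]); i++)
	{
		if (strcmp(catalog[i].msgid, msgid) == 0)
			return catalog[i].msgstr;
	}
	return msgid;				/* untranslated: fall back to the English text */
}

#define _(x) toy_gettext(x)

typedef enum
{
	TOY_DEAD_END_BACKEND,
	TOY_BG_WRITER
} ToyBackendType;

/* Returns the untranslated description; callers wrap it in _() if needed */
static const char *
toy_backend_type_desc(ToyBackendType t)
{
	switch (t)
	{
		case TOY_DEAD_END_BACKEND:
			return gettext_noop("dead-end backend");
		case TOY_BG_WRITER:
			return gettext_noop("background writer");
	}
	return gettext_noop("unknown process type");
}
```

Keeping the function itself untranslated lets the same string serve as a stable identifier in some places and, through _(), as a localized message in others.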

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

0001-Replace-postmaster.c-s-own-backend-type-codes-with-B.patch (text/x-patch)

From 80c098bd242ab6550be4ebdf9e7bd66c9d32bc55 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 9 Oct 2024 20:30:16 +0300
Subject: [PATCH 1/4] Replace postmaster.c's own backend type codes with
 BackendType

Introduce a separate BackendType for dead-end children, so that we
don't need a separate dead_end flag.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi
---
 src/backend/postmaster/launch_backend.c |   1 +
 src/backend/postmaster/postmaster.c     | 234 ++++++++++++++----------
 src/backend/utils/activity/pgstat_io.c  |   3 +
 src/backend/utils/init/miscinit.c       |  43 +++--
 src/include/miscadmin.h                 |   1 +
 src/tools/pgindent/typedefs.list        |   1 +
 6 files changed, 167 insertions(+), 116 deletions(-)

diff --git a/src/backend/postmaster/launch_backend.c b/src/backend/postmaster/launch_backend.c
index 0ae23fdf55..b0b91dc97f 100644
--- a/src/backend/postmaster/launch_backend.c
+++ b/src/backend/postmaster/launch_backend.c
@@ -182,6 +182,7 @@ static child_process_kind child_process_kinds[] = {
 	[B_INVALID] = {"invalid", NULL, false},
 
 	[B_BACKEND] = {"backend", BackendMain, true},
+	[B_DEAD_END_BACKEND] = {"dead-end backend", BackendMain, true},
 	[B_AUTOVAC_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
 	[B_AUTOVAC_WORKER] = {"autovacuum worker", AutoVacWorkerMain, true},
 	[B_BG_WORKER] = {"bgworker", BackgroundWorkerMain, true},
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 85fd24e828..14a0ce91b2 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -129,15 +129,71 @@
 
 
 /*
- * Possible types of a backend. Beyond being the possible bkend_type values in
- * struct bkend, these are OR-able request flag bits for SignalSomeChildren()
- * and CountChildren().
+ * CountChildren and SignalChildren take a bitmask argument to represent
+ * BackendTypes to count or signal.  Define a separate type and functions to
+ * work with the bitmasks, to avoid accidentally passing a plain BackendType
+ * in place of a bitmask or vice versa.
  */
-#define BACKEND_TYPE_NORMAL		0x0001	/* normal backend */
-#define BACKEND_TYPE_AUTOVAC	0x0002	/* autovacuum worker process */
-#define BACKEND_TYPE_WALSND		0x0004	/* walsender process */
-#define BACKEND_TYPE_BGWORKER	0x0008	/* bgworker process */
-#define BACKEND_TYPE_ALL		0x000F	/* OR of all the above */
+typedef struct
+{
+	uint32		mask;
+} BackendTypeMask;
+
+StaticAssertDecl(BACKEND_NUM_TYPES < 32, "too many backend types for uint32");
+
+static const BackendTypeMask BTYPE_MASK_ALL = {(1 << BACKEND_NUM_TYPES) - 1};
+#if 0							/* unused */
+static const BackendTypeMask BTYPE_MASK_NONE = {0};
+#endif
+
+static inline BackendTypeMask
+btmask(BackendType t)
+{
+	BackendTypeMask mask = {.mask = 1 << t};
+
+	return mask;
+}
+
+#if 0							/* unused */
+static inline BackendTypeMask
+btmask_add(BackendTypeMask mask, BackendType t)
+{
+	mask.mask |= 1 << t;
+	return mask;
+}
+#endif
+
+static inline BackendTypeMask
+btmask_del(BackendTypeMask mask, BackendType t)
+{
+	mask.mask &= ~(1 << t);
+	return mask;
+}
+
+static inline BackendTypeMask
+btmask_all_except(BackendType t)
+{
+	BackendTypeMask mask = BTYPE_MASK_ALL;
+
+	mask = btmask_del(mask, t);
+	return mask;
+}
+
+static inline BackendTypeMask
+btmask_all_except2(BackendType t1, BackendType t2)
+{
+	BackendTypeMask mask = BTYPE_MASK_ALL;
+
+	mask = btmask_del(mask, t1);
+	mask = btmask_del(mask, t2);
+	return mask;
+}
+
+static inline bool
+btmask_contains(BackendTypeMask mask, BackendType t)
+{
+	return (mask.mask & (1 << t)) != 0;
+}
 
 /*
  * List of active backends (or child processes anyway; we don't actually
@@ -148,7 +204,7 @@
  * As shown in the above set of backend types, this list includes not only
  * "normal" client sessions, but also autovacuum workers, walsenders, and
  * background workers.  (Note that at the time of launch, walsenders are
- * labeled BACKEND_TYPE_NORMAL; we relabel them to BACKEND_TYPE_WALSND
+ * labeled B_BACKEND; we relabel them to B_WAL_SENDER
  * upon noticing they've changed their PMChildFlags entry.  Hence that check
  * must be done before any operation that needs to distinguish walsenders
  * from normal backends.)
@@ -157,7 +213,7 @@
  * the purpose of sending a friendly rejection message to a would-be client.
  * We must track them because they are attached to shared memory, but we know
  * they will never become live backends.  dead_end children are not assigned a
- * PMChildSlot.  dead_end children have bkend_type NORMAL.
+ * PMChildSlot.  dead_end children have bkend_type B_DEAD_END_BACKEND.
  *
  * "Special" children such as the startup, bgwriter, autovacuum launcher, and
  * slot sync worker tasks are not in this list.  They are tracked via StartupPID
@@ -169,8 +225,7 @@ typedef struct bkend
 {
 	pid_t		pid;			/* process id of backend */
 	int			child_slot;		/* PMChildSlot for this backend, if any */
-	int			bkend_type;		/* child process flavor, see above */
-	bool		dead_end;		/* is it going to send an error and quit? */
+	BackendType bkend_type;		/* child process flavor, see above */
 	RegisteredBgWorker *rw;		/* bgworker info, if this is a bgworker */
 	bool		bgworker_notify;	/* gets bgworker start/stop notifications */
 	dlist_node	elem;			/* list link in BackendList */
@@ -407,15 +462,12 @@ static void ExitPostmaster(int status) pg_attribute_noreturn();
 static int	ServerLoop(void);
 static int	BackendStartup(ClientSocket *client_sock);
 static void report_fork_failure_to_client(ClientSocket *client_sock, int errnum);
-static CAC_state canAcceptConnections(int backend_type);
+static CAC_state canAcceptConnections(BackendType backend_type);
 static void signal_child(pid_t pid, int signal);
 static void sigquit_child(pid_t pid);
-static bool SignalSomeChildren(int signal, int target);
+static bool SignalChildren(int signal, BackendTypeMask targetMask);
 static void TerminateChildren(int signal);
-
-#define SignalChildren(sig)			   SignalSomeChildren(sig, BACKEND_TYPE_ALL)
-
-static int	CountChildren(int target);
+static int	CountChildren(BackendTypeMask targetMask);
 static Backend *assign_backendlist_entry(void);
 static void LaunchMissingBackgroundProcesses(void);
 static void maybe_start_bgworkers(void);
@@ -1754,12 +1806,12 @@ ServerLoop(void)
 
 /*
  * canAcceptConnections --- check to see if database state allows connections
- * of the specified type.  backend_type can be BACKEND_TYPE_NORMAL,
- * BACKEND_TYPE_AUTOVAC, or BACKEND_TYPE_BGWORKER.  (Note that we don't yet
- * know whether a NORMAL connection might turn into a walsender.)
+ * of the specified type.  backend_type can be B_BACKEND, B_AUTOVAC_WORKER, or
+ * B_BG_WORKER.  (Note that we don't yet know whether a normal B_BACKEND
+ * connection might turn into a walsender.)
  */
 static CAC_state
-canAcceptConnections(int backend_type)
+canAcceptConnections(BackendType backend_type)
 {
 	CAC_state	result = CAC_OK;
 
@@ -1770,7 +1822,7 @@ canAcceptConnections(int backend_type)
 	 * bgworker_should_start_now() decided whether the DB state allows them.
 	 */
 	if (pmState != PM_RUN && pmState != PM_HOT_STANDBY &&
-		backend_type != BACKEND_TYPE_BGWORKER)
+		backend_type != B_BG_WORKER)
 	{
 		if (Shutdown > NoShutdown)
 			return CAC_SHUTDOWN;	/* shutdown is pending */
@@ -1787,7 +1839,7 @@ canAcceptConnections(int backend_type)
 	 * "Smart shutdown" restrictions are applied only to normal connections,
 	 * not to autovac workers or bgworkers.
 	 */
-	if (!connsAllowed && backend_type == BACKEND_TYPE_NORMAL)
+	if (!connsAllowed && backend_type == B_BACKEND)
 		return CAC_SHUTDOWN;	/* shutdown is pending */
 
 	/*
@@ -1802,7 +1854,7 @@ canAcceptConnections(int backend_type)
 	 * The limit here must match the sizes of the per-child-process arrays;
 	 * see comments for MaxLivePostmasterChildren().
 	 */
-	if (CountChildren(BACKEND_TYPE_ALL) >= MaxLivePostmasterChildren())
+	if (CountChildren(btmask_all_except(B_DEAD_END_BACKEND)) >= MaxLivePostmasterChildren())
 		result = CAC_TOOMANY;
 
 	return result;
@@ -1971,7 +2023,7 @@ process_pm_reload_request(void)
 		ereport(LOG,
 				(errmsg("received SIGHUP, reloading configuration files")));
 		ProcessConfigFile(PGC_SIGHUP);
-		SignalChildren(SIGHUP);
+		SignalChildren(SIGHUP, btmask_all_except(B_DEAD_END_BACKEND));
 		if (StartupPID != 0)
 			signal_child(StartupPID, SIGHUP);
 		if (BgWriterPID != 0)
@@ -2389,7 +2441,7 @@ process_pm_child_exit(void)
 				 * Waken walsenders for the last time. No regular backends
 				 * should be around anymore.
 				 */
-				SignalChildren(SIGUSR2);
+				SignalChildren(SIGUSR2, btmask(B_WAL_SENDER));
 
 				pmState = PM_SHUTDOWN_2;
 			}
@@ -2555,23 +2607,19 @@ CleanupBackend(Backend *bp,
 			   int exitstatus)	/* child's exit status. */
 {
 	char		namebuf[MAXPGPATH];
-	char	   *procname;
+	const char *procname;
 	bool		crashed = false;
 	bool		logged = false;
 
-	/* Construct a process name for log message */
-	if (bp->dead_end)
-	{
-		procname = _("dead end backend");
-	}
-	else if (bp->bkend_type == BACKEND_TYPE_BGWORKER)
+	/* Construct a process name for the log message */
+	if (bp->bkend_type == B_BG_WORKER)
 	{
 		snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""),
 				 bp->rw->rw_worker.bgw_type);
 		procname = namebuf;
 	}
 	else
-		procname = _("server process");
+		procname = _(GetBackendTypeDesc(bp->bkend_type));
 
 	/*
 	 * If a backend dies in an ugly way then we must signal all other backends
@@ -2603,7 +2651,7 @@ CleanupBackend(Backend *bp,
 	 * If the process attached to shared memory, check that it detached
 	 * cleanly.
 	 */
-	if (!bp->dead_end)
+	if (bp->bkend_type != B_DEAD_END_BACKEND)
 	{
 		if (!ReleasePostmasterChildSlot(bp->child_slot))
 		{
@@ -2636,7 +2684,7 @@ CleanupBackend(Backend *bp,
 	 * If it was a background worker, also update its RegisteredBgWorker
 	 * entry.
 	 */
-	if (bp->bkend_type == BACKEND_TYPE_BGWORKER)
+	if (bp->bkend_type == B_BG_WORKER)
 	{
 		RegisteredBgWorker *rw = bp->rw;
 
@@ -2864,7 +2912,7 @@ PostmasterStateMachine(void)
 			 * This state ends when we have no normal client backends running.
 			 * Then we're ready to stop other children.
 			 */
-			if (CountChildren(BACKEND_TYPE_NORMAL) == 0)
+			if (CountChildren(btmask(B_BACKEND)) == 0)
 				pmState = PM_STOP_BACKENDS;
 		}
 	}
@@ -2883,9 +2931,8 @@ PostmasterStateMachine(void)
 		 */
 		ForgetUnstartedBackgroundWorkers();
 
-		/* Signal all backend children except walsenders */
-		SignalSomeChildren(SIGTERM,
-						   BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND);
+		/* Signal all backend children except walsenders and dead-end backends */
+		SignalChildren(SIGTERM, btmask_all_except2(B_WAL_SENDER, B_DEAD_END_BACKEND));
 		/* and the autovac launcher too */
 		if (AutoVacPID != 0)
 			signal_child(AutoVacPID, SIGTERM);
@@ -2927,7 +2974,7 @@ PostmasterStateMachine(void)
 		 * here. Walsenders and archiver are also disregarded, they will be
 		 * terminated later after writing the checkpoint record.
 		 */
-		if (CountChildren(BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND) == 0 &&
+		if (CountChildren(btmask_all_except2(B_WAL_SENDER, B_DEAD_END_BACKEND)) == 0 &&
 			StartupPID == 0 &&
 			WalReceiverPID == 0 &&
 			WalSummarizerPID == 0 &&
@@ -2985,7 +3032,7 @@ PostmasterStateMachine(void)
 					pmState = PM_WAIT_DEAD_END;
 
 					/* Kill the walsenders and archiver too */
-					SignalChildren(SIGQUIT);
+					SignalChildren(SIGQUIT, btmask_all_except(B_DEAD_END_BACKEND));
 					if (PgArchPID != 0)
 						signal_child(PgArchPID, SIGQUIT);
 				}
@@ -3001,7 +3048,7 @@ PostmasterStateMachine(void)
 		 * left by now anyway; what we're really waiting for is walsenders and
 		 * archiver.
 		 */
-		if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL) == 0)
+		if (PgArchPID == 0 && CountChildren(btmask_all_except(B_DEAD_END_BACKEND)) == 0)
 		{
 			pmState = PM_WAIT_DEAD_END;
 		}
@@ -3299,11 +3346,10 @@ sigquit_child(pid_t pid)
 }
 
 /*
- * Send a signal to the targeted children (but NOT special children;
- * dead_end children are never signaled, either).
+ * Send a signal to the targeted children (but NOT special children).
  */
 static bool
-SignalSomeChildren(int signal, int target)
+SignalChildren(int signal, BackendTypeMask targetMask)
 {
 	dlist_iter	iter;
 	bool		signaled = false;
@@ -3312,30 +3358,24 @@ SignalSomeChildren(int signal, int target)
 	{
 		Backend    *bp = dlist_container(Backend, elem, iter.cur);
 
-		if (bp->dead_end)
-			continue;
-
 		/*
-		 * Since target == BACKEND_TYPE_ALL is the most common case, we test
-		 * it first and avoid touching shared memory for every child.
+		 * If we need to distinguish between B_BACKEND and B_WAL_SENDER, check
+		 * if any B_BACKEND backends have recently announced that they are
+		 * actually WAL senders.
 		 */
-		if (target != BACKEND_TYPE_ALL)
+		if (btmask_contains(targetMask, B_WAL_SENDER) != btmask_contains(targetMask, B_BACKEND) &&
+			bp->bkend_type == B_BACKEND)
 		{
-			/*
-			 * Assign bkend_type for any recently announced WAL Sender
-			 * processes.
-			 */
-			if (bp->bkend_type == BACKEND_TYPE_NORMAL &&
-				IsPostmasterChildWalSender(bp->child_slot))
-				bp->bkend_type = BACKEND_TYPE_WALSND;
-
-			if (!(target & bp->bkend_type))
-				continue;
+			if (IsPostmasterChildWalSender(bp->child_slot))
+				bp->bkend_type = B_WAL_SENDER;
 		}
 
+		if (!btmask_contains(targetMask, bp->bkend_type))
+			continue;
+
 		ereport(DEBUG4,
-				(errmsg_internal("sending signal %d to process %d",
-								 signal, (int) bp->pid)));
+				(errmsg_internal("sending signal %d to %s process %d",
+								 signal, GetBackendTypeDesc(bp->bkend_type), (int) bp->pid)));
 		signal_child(bp->pid, signal);
 		signaled = true;
 	}
@@ -3349,7 +3389,7 @@ SignalSomeChildren(int signal, int target)
 static void
 TerminateChildren(int signal)
 {
-	SignalChildren(signal);
+	SignalChildren(signal, btmask_all_except(B_DEAD_END_BACKEND));
 	if (StartupPID != 0)
 	{
 		signal_child(StartupPID, signal);
@@ -3402,22 +3442,27 @@ BackendStartup(ClientSocket *client_sock)
 	}
 
 	/* Pass down canAcceptConnections state */
-	startup_data.canAcceptConnections = canAcceptConnections(BACKEND_TYPE_NORMAL);
-	bn->dead_end = (startup_data.canAcceptConnections != CAC_OK);
+	startup_data.canAcceptConnections = canAcceptConnections(B_BACKEND);
 	bn->rw = NULL;
 
 	/*
 	 * Unless it's a dead_end child, assign it a child slot number
 	 */
-	if (!bn->dead_end)
+	if (startup_data.canAcceptConnections == CAC_OK)
+	{
+		bn->bkend_type = B_BACKEND; /* Can change later to B_WAL_SENDER */
 		bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
+	}
 	else
+	{
+		bn->bkend_type = B_DEAD_END_BACKEND;
 		bn->child_slot = 0;
+	}
 
 	/* Hasn't asked to be notified about any bgworkers yet */
 	bn->bgworker_notify = false;
 
-	pid = postmaster_child_launch(B_BACKEND,
+	pid = postmaster_child_launch(bn->bkend_type,
 								  (char *) &startup_data, sizeof(startup_data),
 								  client_sock);
 	if (pid < 0)
@@ -3425,7 +3470,7 @@ BackendStartup(ClientSocket *client_sock)
 		/* in parent, fork failed */
 		int			save_errno = errno;
 
-		if (!bn->dead_end)
+		if (bn->child_slot != 0)
 			(void) ReleasePostmasterChildSlot(bn->child_slot);
 		pfree(bn);
 		errno = save_errno;
@@ -3437,7 +3482,8 @@ BackendStartup(ClientSocket *client_sock)
 
 	/* in parent, successful fork */
 	ereport(DEBUG2,
-			(errmsg_internal("forked new backend, pid=%d socket=%d",
+			(errmsg_internal("forked new %s, pid=%d socket=%d",
+							 GetBackendTypeDesc(bn->bkend_type),
 							 (int) pid, (int) client_sock->sock)));
 
 	/*
@@ -3445,7 +3491,6 @@ BackendStartup(ClientSocket *client_sock)
 	 * of backends.
 	 */
 	bn->pid = pid;
-	bn->bkend_type = BACKEND_TYPE_NORMAL;	/* Can change later to WALSND */
 	dlist_push_head(&BackendList, &bn->elem);
 
 	return STATUS_OK;
@@ -3679,11 +3724,10 @@ dummy_handler(SIGNAL_ARGS)
 }
 
 /*
- * Count up number of child processes of specified types (dead_end children
- * are always excluded).
+ * Count up number of child processes of specified types.
  */
 static int
-CountChildren(int target)
+CountChildren(BackendTypeMask targetMask)
 {
 	dlist_iter	iter;
 	int			cnt = 0;
@@ -3692,27 +3736,21 @@ CountChildren(int target)
 	{
 		Backend    *bp = dlist_container(Backend, elem, iter.cur);
 
-		if (bp->dead_end)
-			continue;
-
 		/*
-		 * Since target == BACKEND_TYPE_ALL is the most common case, we test
-		 * it first and avoid touching shared memory for every child.
+		 * If we need to distinguish between B_BACKEND and B_WAL_SENDER, check
+		 * if any B_BACKEND backends have recently announced that they are
+		 * actually WAL senders.
 		 */
-		if (target != BACKEND_TYPE_ALL)
+		if (btmask_contains(targetMask, B_WAL_SENDER) != btmask_contains(targetMask, B_BACKEND) &&
+			bp->bkend_type == B_BACKEND)
 		{
-			/*
-			 * Assign bkend_type for any recently announced WAL Sender
-			 * processes.
-			 */
-			if (bp->bkend_type == BACKEND_TYPE_NORMAL &&
-				IsPostmasterChildWalSender(bp->child_slot))
-				bp->bkend_type = BACKEND_TYPE_WALSND;
-
-			if (!(target & bp->bkend_type))
-				continue;
+			if (IsPostmasterChildWalSender(bp->child_slot))
+				bp->bkend_type = B_WAL_SENDER;
 		}
 
+		if (!btmask_contains(targetMask, bp->bkend_type))
+			continue;
+
 		cnt++;
 	}
 	return cnt;
@@ -3776,13 +3814,13 @@ StartAutovacuumWorker(void)
 	 * we have to check to avoid race-condition problems during DB state
 	 * changes.
 	 */
-	if (canAcceptConnections(BACKEND_TYPE_AUTOVAC) == CAC_OK)
+	if (canAcceptConnections(B_AUTOVAC_WORKER) == CAC_OK)
 	{
 		bn = (Backend *) palloc_extended(sizeof(Backend), MCXT_ALLOC_NO_OOM);
 		if (bn)
 		{
-			/* Autovac workers are not dead_end and need a child slot */
-			bn->dead_end = false;
+			/* Autovac workers need a child slot */
+			bn->bkend_type = B_AUTOVAC_WORKER;
 			bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
 			bn->bgworker_notify = false;
 			bn->rw = NULL;
@@ -3790,7 +3828,6 @@ StartAutovacuumWorker(void)
 			bn->pid = StartChildProcess(B_AUTOVAC_WORKER);
 			if (bn->pid > 0)
 			{
-				bn->bkend_type = BACKEND_TYPE_AUTOVAC;
 				dlist_push_head(&BackendList, &bn->elem);
 				/* all OK */
 				return;
@@ -3996,7 +4033,7 @@ assign_backendlist_entry(void)
 	 * only possible failure is CAC_TOOMANY, so we just log an error message
 	 * based on that rather than checking the error code precisely.
 	 */
-	if (canAcceptConnections(BACKEND_TYPE_BGWORKER) != CAC_OK)
+	if (canAcceptConnections(B_BG_WORKER) != CAC_OK)
 	{
 		ereport(LOG,
 				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
@@ -4014,8 +4051,7 @@ assign_backendlist_entry(void)
 	}
 
 	bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
-	bn->bkend_type = BACKEND_TYPE_BGWORKER;
-	bn->dead_end = false;
+	bn->bkend_type = B_BG_WORKER;
 	bn->bgworker_notify = false;
 
 	return bn;
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index cc2ffc78aa..f9883af2b3 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -330,6 +330,8 @@ pgstat_io_snapshot_cb(void)
 *
 * The following BackendTypes do not participate in the cumulative stats
 * subsystem or do not perform IO on which we currently track:
+* - Dead-end backend because it is not connected to shared memory and
+*   doesn't do any IO
 * - Syslogger because it is not connected to shared memory
 * - Archiver because most relevant archiving IO is delegated to a
 *   specialized command or module
@@ -352,6 +354,7 @@ pgstat_tracks_io_bktype(BackendType bktype)
 	switch (bktype)
 	{
 		case B_INVALID:
+		case B_DEAD_END_BACKEND:
 		case B_ARCHIVER:
 		case B_LOGGER:
 		case B_WAL_RECEIVER:
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index ef60f41b8c..920cf46d66 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -259,60 +259,69 @@ SwitchBackToLocalLatch(void)
 	SetLatch(MyLatch);
 }
 
+/*
+ * Return a human-readable string representation of a BackendType.
+ *
+ * The string is not localized here, but we mark the strings for translation
+ * so that callers can invoke _() on the result.
+ */
 const char *
 GetBackendTypeDesc(BackendType backendType)
 {
-	const char *backendDesc = "unknown process type";
+	const char *backendDesc = gettext_noop("unknown process type");
 
 	switch (backendType)
 	{
 		case B_INVALID:
-			backendDesc = "not initialized";
+			backendDesc = gettext_noop("not initialized");
 			break;
 		case B_ARCHIVER:
-			backendDesc = "archiver";
+			backendDesc = gettext_noop("archiver");
 			break;
 		case B_AUTOVAC_LAUNCHER:
-			backendDesc = "autovacuum launcher";
+			backendDesc = gettext_noop("autovacuum launcher");
 			break;
 		case B_AUTOVAC_WORKER:
-			backendDesc = "autovacuum worker";
+			backendDesc = gettext_noop("autovacuum worker");
 			break;
 		case B_BACKEND:
-			backendDesc = "client backend";
+			backendDesc = gettext_noop("client backend");
+			break;
+		case B_DEAD_END_BACKEND:
+			backendDesc = gettext_noop("dead-end client backend");
 			break;
 		case B_BG_WORKER:
-			backendDesc = "background worker";
+			backendDesc = gettext_noop("background worker");
 			break;
 		case B_BG_WRITER:
-			backendDesc = "background writer";
+			backendDesc = gettext_noop("background writer");
 			break;
 		case B_CHECKPOINTER:
-			backendDesc = "checkpointer";
+			backendDesc = gettext_noop("checkpointer");
 			break;
 		case B_LOGGER:
-			backendDesc = "logger";
+			backendDesc = gettext_noop("logger");
 			break;
 		case B_SLOTSYNC_WORKER:
-			backendDesc = "slotsync worker";
+			backendDesc = gettext_noop("slotsync worker");
 			break;
 		case B_STANDALONE_BACKEND:
-			backendDesc = "standalone backend";
+			backendDesc = gettext_noop("standalone backend");
 			break;
 		case B_STARTUP:
-			backendDesc = "startup";
+			backendDesc = gettext_noop("startup");
 			break;
 		case B_WAL_RECEIVER:
-			backendDesc = "walreceiver";
+			backendDesc = gettext_noop("walreceiver");
 			break;
 		case B_WAL_SENDER:
-			backendDesc = "walsender";
+			backendDesc = gettext_noop("walsender");
 			break;
 		case B_WAL_SUMMARIZER:
-			backendDesc = "walsummarizer";
+			backendDesc = gettext_noop("walsummarizer");
 			break;
 		case B_WAL_WRITER:
-			backendDesc = "walwriter";
+			backendDesc = gettext_noop("walwriter");
 			break;
 	}
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index e26d108a47..67e7717d98 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -333,6 +333,7 @@ typedef enum BackendType
 
 	/* Backends and other backend-like processes */
 	B_BACKEND,
+	B_DEAD_END_BACKEND,
 	B_AUTOVAC_LAUNCHER,
 	B_AUTOVAC_WORKER,
 	B_BG_WORKER,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a65e1c07c5..6f98fbc6f5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -235,6 +235,7 @@ BackendParameters
 BackendStartupData
 BackendState
 BackendType
+BackendTypeMask
 BackgroundWorker
 BackgroundWorkerArray
 BackgroundWorkerHandle
-- 
2.39.5

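The patch above replaces the old BACKEND_TYPE_* bit flags with a BackendTypeMask over the BackendType enum. Callers query it with helpers such as btmask(), btmask_all_except() and btmask_contains(). Those definitions are not part of this excerpt, so what follows is only a standalone sketch of the idea. The SK_* enum, SketchTypeMask and every sk_* function are hypothetical stand-ins, not the patch's code. The sketch also mimics the lazy reclassification in SignalChildren() and CountChildren(): a B_BACKEND entry is re-checked for walsender status only when the mask treats B_BACKEND and B_WAL_SENDER differently. The announced_walsender field stands in for IsPostmasterChildWalSender().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for BackendType (miscadmin.h). */
typedef enum SketchBackendType
{
	SK_INVALID = 0,
	SK_BACKEND,
	SK_DEAD_END_BACKEND,
	SK_AUTOVAC_WORKER,
	SK_BG_WORKER,
	SK_WAL_SENDER,
	SK_NUM_TYPES
} SketchBackendType;

/* One bit per backend type; wrapping it in a struct keeps it type-safe. */
typedef struct SketchTypeMask
{
	uint32_t	mask;
} SketchTypeMask;

/* Mask with every type's bit set. */
static inline SketchTypeMask
sk_mask_all(void)
{
	SketchTypeMask m = {(UINT32_C(1) << SK_NUM_TYPES) - 1};

	return m;
}

/* Mask containing exactly one type. */
static inline SketchTypeMask
sk_mask(SketchBackendType t)
{
	SketchTypeMask m;

	assert(t < SK_NUM_TYPES);
	m.mask = UINT32_C(1) << t;
	return m;
}

/* Mask containing every type except one. */
static inline SketchTypeMask
sk_mask_all_except(SketchBackendType t)
{
	SketchTypeMask m = sk_mask_all();

	m.mask &= ~sk_mask(t).mask;
	return m;
}

static inline bool
sk_mask_contains(SketchTypeMask m, SketchBackendType t)
{
	return (m.mask & sk_mask(t).mask) != 0;
}

typedef struct SketchChild
{
	SketchBackendType type;
	bool		announced_walsender;	/* stands in for IsPostmasterChildWalSender() */
} SketchChild;

/*
 * Count the children whose type is in target.  A client backend that has
 * announced itself as a walsender is retyped only when the mask includes
 * exactly one of SK_BACKEND and SK_WAL_SENDER, the only case where the
 * distinction changes the answer.
 */
int
sk_count_children(SketchChild *children, int n, SketchTypeMask target)
{
	int			cnt = 0;

	for (int i = 0; i < n; i++)
	{
		SketchChild *c = &children[i];

		if (sk_mask_contains(target, SK_WAL_SENDER) !=
			sk_mask_contains(target, SK_BACKEND) &&
			c->type == SK_BACKEND && c->announced_walsender)
			c->type = SK_WAL_SENDER;

		if (sk_mask_contains(target, c->type))
			cnt++;
	}
	return cnt;
}
```

Consider an array holding a client backend, a backend that has announced itself as a walsender, a dead-end backend and a background worker. Counting it with sk_mask_all_except(SK_DEAD_END_BACKEND) gives 3 and leaves the walsender entry typed as SK_BACKEND. A later count with sk_mask(SK_BACKEND) retypes that entry and gives 1. Skipping the check when both types are in the mask plays the same role as the old code's shortcut for BACKEND_TYPE_ALL, which avoided touching shared memory for every child.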
0002-Kill-dead-end-children-when-there-s-nothing-else-lef.patch

From 0e009083262ffca7ad1a6a99dfc03fd3988f76ef Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 9 Oct 2024 21:28:08 +0300
Subject: [PATCH 2/4] Kill dead-end children when there's nothing else left

Previously, the postmaster would never try to kill dead-end child
processes, even if there were no other processes left. A dead-end
backend will eventually exit when authentication_timeout expires, but
if a dead-end backend is the only thing preventing the server from
shutting down, it seems better to kill it immediately. This is
particularly important if there is a bug in the early startup code
that prevents a dead-end child from timing out and exiting normally.

Includes a test for the case where a dead-end backend previously
prevented the server from shutting down.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi
---
 src/backend/postmaster/postmaster.c      | 17 ++--
 src/test/perl/PostgreSQL/Test/Cluster.pm | 10 ++-
 src/test/postmaster/meson.build          |  1 +
 src/test/postmaster/t/002_start_stop.pl  | 98 ++++++++++++++++++++++++
 4 files changed, 116 insertions(+), 10 deletions(-)
 create mode 100644 src/test/postmaster/t/002_start_stop.pl

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 14a0ce91b2..7d3074a2a8 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2988,10 +2988,11 @@ PostmasterStateMachine(void)
 			if (Shutdown >= ImmediateShutdown || FatalError)
 			{
 				/*
-				 * Start waiting for dead_end children to die.  This state
-				 * change causes ServerLoop to stop creating new ones.
+				 * Stop any dead_end children and stop creating new ones.
 				 */
 				pmState = PM_WAIT_DEAD_END;
+				ConfigurePostmasterWaitSet(false);
+				SignalChildren(SIGQUIT, btmask(B_DEAD_END_BACKEND));
 
 				/*
 				 * We already SIGQUIT'd the archiver and stats processes, if
@@ -3030,9 +3031,10 @@ PostmasterStateMachine(void)
 					 */
 					FatalError = true;
 					pmState = PM_WAIT_DEAD_END;
+					ConfigurePostmasterWaitSet(false);
 
 					/* Kill the walsenders and archiver too */
-					SignalChildren(SIGQUIT, btmask_all_except(B_DEAD_END_BACKEND));
+					SignalChildren(SIGQUIT, BTYPE_MASK_ALL);
 					if (PgArchPID != 0)
 						signal_child(PgArchPID, SIGQUIT);
 				}
@@ -3051,14 +3053,13 @@ PostmasterStateMachine(void)
 		if (PgArchPID == 0 && CountChildren(btmask_all_except(B_DEAD_END_BACKEND)) == 0)
 		{
 			pmState = PM_WAIT_DEAD_END;
+			ConfigurePostmasterWaitSet(false);
+			SignalChildren(SIGTERM, BTYPE_MASK_ALL);
 		}
 	}
 
 	if (pmState == PM_WAIT_DEAD_END)
 	{
-		/* Don't allow any new socket connection events. */
-		ConfigurePostmasterWaitSet(false);
-
 		/*
 		 * PM_WAIT_DEAD_END state ends when the BackendList is entirely empty
 		 * (ie, no dead_end children remain), and the archiver is gone too.
@@ -3384,12 +3385,12 @@ SignalChildren(int signal, BackendTypeMask targetMask)
 
 /*
  * Send a termination signal to children.  This considers all of our children
- * processes, except syslogger and dead_end backends.
+ * processes, except syslogger.
  */
 static void
 TerminateChildren(int signal)
 {
-	SignalChildren(signal, btmask_all_except(B_DEAD_END_BACKEND));
+	SignalChildren(signal, BTYPE_MASK_ALL);
 	if (StartupPID != 0)
 	{
 		signal_child(StartupPID, signal);
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index c793f2135d..6b77128db0 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1186,6 +1186,9 @@ this to fail.  Otherwise, tests might fail to detect server crashes.
 With optional extra param fail_ok => 1, returns 0 for failure
 instead of bailing out.
 
+The optional extra param timeout can be used to pass the pg_ctl
+--timeout option.
+
 =cut
 
 sub stop
@@ -1201,8 +1204,11 @@ sub stop
 	return 1 unless defined $self->{_pid};
 
 	print "### Stopping node \"$name\" using mode $mode\n";
-	$ret = PostgreSQL::Test::Utils::system_log('pg_ctl', '-D', $pgdata,
-		'-m', $mode, 'stop');
+	my @cmd = ('pg_ctl', '-D', $pgdata, '-m', $mode, 'stop');
+	if ($params{timeout}) {
+		push(@cmd, ('--timeout', $params{timeout}));
+	}
+	$ret = PostgreSQL::Test::Utils::system_log(@cmd);
 
 	if ($ret != 0)
 	{
diff --git a/src/test/postmaster/meson.build b/src/test/postmaster/meson.build
index c2de2e0eb5..2d89adf520 100644
--- a/src/test/postmaster/meson.build
+++ b/src/test/postmaster/meson.build
@@ -7,6 +7,7 @@ tests += {
   'tap': {
     'tests': [
       't/001_connection_limits.pl',
+      't/002_start_stop.pl',
     ],
   },
 }
diff --git a/src/test/postmaster/t/002_start_stop.pl b/src/test/postmaster/t/002_start_stop.pl
new file mode 100644
index 0000000000..0b956d1184
--- /dev/null
+++ b/src/test/postmaster/t/002_start_stop.pl
@@ -0,0 +1,98 @@
+
+# Copyright (c) 2021-2024, PostgreSQL Global Development Group
+
+# Test postmaster start and stop state machine.
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+#
+# Test that dead-end backends don't prevent the server from shutting
+# down.
+#
+# Dead-end backends can linger until they reach
+# 'authentication_timeout'. We use a long authentication_timeout and a
+# much shorter timeout for the "pg_ctl stop" operation, to test that
+# if dead-end backends are not killed at fast shut down, "pg_ctl stop"
+# will error out before the authentication timeout kicks in and cleans
+# up the dead-end backends.
+my $authentication_timeout = $PostgreSQL::Test::Utils::timeout_default;
+my $stop_timeout = $authentication_timeout / 2;
+
+# Initialize the server with low connection limits, to test dead-end backends
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->append_conf('postgresql.conf', "max_connections = 5");
+$node->append_conf('postgresql.conf', "max_wal_senders = 0");
+$node->append_conf('postgresql.conf', "autovacuum_max_workers = 1");
+$node->append_conf('postgresql.conf', "max_worker_processes = 1");
+$node->append_conf('postgresql.conf', "log_connections = on");
+$node->append_conf('postgresql.conf', "log_min_messages = debug2");
+$node->append_conf('postgresql.conf',
+	"authentication_timeout = '$authentication_timeout s'");
+$node->append_conf('postgresql.conf', 'trace_connection_negotiation=on');
+$node->start;
+
+if (!$node->raw_connect_works())
+{
+	plan skip_all => "this test requires working raw_connect()";
+}
+
+my @raw_connections = ();
+
+# Open a lot of TCP (or Unix domain socket) connections to use up all
+# the connection slots. Beyond a certain number (roughly 2x
+# max_connections), they will be "dead-end backends".
+for (my $i = 0; $i <= 20; $i++)
+{
+	my $sock = $node->raw_connect();
+
+	# On a busy system, the server might reject connections if
+	# postmaster cannot accept() them fast enough. The exact limit
+	# and behavior depend on the platform. To make this reliable,
+	# we attempt SSL negotiation on each connection before opening
+	# the next one. The server will reject the SSL negotiation, but
+	# when it does so, we know that the backend has been launched
+	# and we should be able to open another connection.
+
+	# SSLRequest packet consists of packet length followed by
+	# NEGOTIATE_SSL_CODE.
+	my $negotiate_ssl_code = pack("Nnn", 8, 1234, 5679);
+	my $sent = $sock->send($negotiate_ssl_code);
+
+	# Read reply. We expect the server to reject it with 'N'.
+	my $reply = "";
+	$sock->recv($reply, 1);
+	is($reply, "N", "dead-end connection $i");
+
+	push(@raw_connections, $sock);
+}
+
+# When all the connection slots are in use, new connections will fail
+# before even looking up the user. Hence you now get "sorry, too many
+# clients already" instead of a "role does not exist" error. Test that
+# to ensure that we have used up all the slots.
+$node->connect_fails("dbname=postgres user=invalid_user",
+	"connect ",
+	expected_stderr => qr/FATAL:  sorry, too many clients already/);
+
+# Open one more connection, to really ensure that we have at least one
+# dead-end backend.
+my $sock = $node->raw_connect();
+
+# Test that the dead-end backends don't prevent the server from stopping.
+$node->stop('fast', timeout => $stop_timeout);
+
+$node->start();
+$node->connect_ok("dbname=postgres", "works after restart");
+
+# Clean up
+foreach my $socket (@raw_connections)
+{
+	$socket->close();
+}
+
+done_testing();
-- 
2.39.5

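The 002_start_stop.pl test above sends an SSLRequest on each raw connection with pack("Nnn", 8, 1234, 5679). That is a 32-bit big-endian length of 8 followed by the 32-bit code 80877103 (1234 in the high 16 bits, 5679 in the low 16 bits). The server normally answers with a single byte: 'S' to proceed with SSL, or 'N' to refuse. Here is a sketch of the same bytes in C, using only the standard library. SKETCH_SSL_REQUEST_CODE, build_ssl_request() and interpret_ssl_reply() are hypothetical names; PostgreSQL itself calls this code NEGOTIATE_SSL_CODE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: 1234 in the high 16 bits, 5679 in the low 16 bits. */
#define SKETCH_SSL_REQUEST_CODE ((uint32_t) ((1234u << 16) | 5679u))

static_assert(SKETCH_SSL_REQUEST_CODE == 80877103u, "SSLRequest code mismatch");

/* Store a 32-bit value in network (big-endian) byte order, like Perl's pack("N"). */
static void
put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t) (v >> 24);
	p[1] = (uint8_t) (v >> 16);
	p[2] = (uint8_t) (v >> 8);
	p[3] = (uint8_t) v;
}

/*
 * Fill buf with an 8-byte SSLRequest message and return its length.
 * The length word counts itself, so it is 8 for this message.
 */
size_t
build_ssl_request(uint8_t buf[8])
{
	put_be32(buf, 8);
	put_be32(buf + 4, SKETCH_SSL_REQUEST_CODE);
	return 8;
}

/* Returns 1 if the one-byte reply accepts SSL ('S'), 0 if it refuses ('N'), -1 otherwise. */
int
interpret_ssl_reply(char reply)
{
	switch (reply)
	{
		case 'S':
			return 1;
		case 'N':
			return 0;
		default:
			return -1;
	}
}
```

The resulting bytes are 00 00 00 08 04 D2 16 2F. The test does not need SSL at all. It treats the server's 'N' reply as evidence that a backend process has been launched for the connection, which is what lets it open the next connection reliably.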
0003-Assign-a-child-slot-to-every-postmaster-child-proces.patch

From 134128baabf6d99be888acf40ee15770b3252505 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 9 Oct 2024 22:14:13 +0300
Subject: [PATCH 3/4] Assign a child slot to every postmaster child process

Previously, only backends, autovacuum workers, and background workers
had an entry in the PMChildFlags array. With this commit, all
postmaster child processes, including all the aux processes, have an
entry. Dead-end backends still don't get an entry, though, and other
processes that don't touch shared memory will never mark their
PMChildFlags entry as active.

We now maintain separate free-lists for different kinds of child
processes. That ensures that there are always slots available for
autovacuum and background workers. Previously, pre-authorization
backends could prevent autovacuum or background workers from starting
up by using up all the slots.

The code to manage the slots in the postmaster process is in a new
pmchild.c source file, because postmaster.c is already so large.
Assigning pmsignal slot numbers is now pmchild.c's responsibility.
This replaces the PMChildInUse array in pmsignal.c.

Some of the comments in postmaster.c still talked about the "stats
process", but that was removed in commit 5891c7a8ed. Fix those while
we're at it.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi
---
 src/backend/postmaster/Makefile         |   1 +
 src/backend/postmaster/launch_backend.c |   3 +
 src/backend/postmaster/meson.build      |   1 +
 src/backend/postmaster/pmchild.c        | 282 ++++++++
 src/backend/postmaster/postmaster.c     | 815 +++++++++++-------------
 src/backend/postmaster/syslogger.c      |   6 +-
 src/backend/storage/ipc/pmsignal.c      |  83 +--
 src/backend/storage/lmgr/proc.c         |  12 +-
 src/include/postmaster/postmaster.h     |  45 ++
 src/include/postmaster/syslogger.h      |   2 +-
 src/include/storage/pmsignal.h          |   2 +-
 src/tools/pgindent/typedefs.list        |   3 +-
 12 files changed, 725 insertions(+), 530 deletions(-)
 create mode 100644 src/backend/postmaster/pmchild.c

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index db08543d19..0f4435d2d9 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	interrupt.o \
 	launch_backend.o \
 	pgarch.o \
+	pmchild.o \
 	postmaster.o \
 	startup.o \
 	syslogger.o \
diff --git a/src/backend/postmaster/launch_backend.c b/src/backend/postmaster/launch_backend.c
index b0b91dc97f..02755b6448 100644
--- a/src/backend/postmaster/launch_backend.c
+++ b/src/backend/postmaster/launch_backend.c
@@ -126,6 +126,7 @@ typedef struct
 	bool		query_id_enabled;
 	int			max_safe_fds;
 	int			MaxBackends;
+	int			num_pmchild_slots;
 #ifdef WIN32
 	HANDLE		PostmasterHandle;
 	HANDLE		initial_signal_pipe;
@@ -743,6 +744,7 @@ save_backend_variables(BackendParameters *param, ClientSocket *client_sock,
 	param->max_safe_fds = max_safe_fds;
 
 	param->MaxBackends = MaxBackends;
+	param->num_pmchild_slots = num_pmchild_slots;
 
 #ifdef WIN32
 	param->PostmasterHandle = PostmasterHandle;
@@ -1002,6 +1004,7 @@ restore_backend_variables(BackendParameters *param)
 	max_safe_fds = param->max_safe_fds;
 
 	MaxBackends = param->MaxBackends;
+	num_pmchild_slots = param->num_pmchild_slots;
 
 #ifdef WIN32
 	PostmasterHandle = param->PostmasterHandle;
diff --git a/src/backend/postmaster/meson.build b/src/backend/postmaster/meson.build
index 0ea4bbe084..0e80f20986 100644
--- a/src/backend/postmaster/meson.build
+++ b/src/backend/postmaster/meson.build
@@ -10,6 +10,7 @@ backend_sources += files(
   'interrupt.c',
   'launch_backend.c',
   'pgarch.c',
+  'pmchild.c',
   'postmaster.c',
   'startup.c',
   'syslogger.c',
diff --git a/src/backend/postmaster/pmchild.c b/src/backend/postmaster/pmchild.c
new file mode 100644
index 0000000000..849abbf092
--- /dev/null
+++ b/src/backend/postmaster/pmchild.c
@@ -0,0 +1,282 @@
+/*-------------------------------------------------------------------------
+ *
+ * pmchild.c
+ *	  Functions for keeping track of postmaster child processes.
+ *
+ * We keep track of all child processes so that when a process exits, we
+ * know what kind of a process it was and can clean up accordingly.  Every
+ * child process is allocated a PMChild struct from a fixed pool of
+ * structs.  The size of the pool is determined by various settings that
+ * configure how many worker processes and backend connections are
+ * allowed, i.e. autovacuum_max_workers, max_worker_processes,
+ * max_wal_senders, and max_connections.
+ *
+ * Dead-end backends are handled slightly differently.  There is no limit
+ * on the number of dead-end backends, and they do not need unique IDs, so
+ * their PMChild structs are allocated dynamically, not from a pool.
+ *
+ * The structures and functions in this file are private to the postmaster
+ * process.  But note that there is an array in shared memory, managed by
+ * pmsignal.c, that mirrors this.
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/pmchild.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "postmaster/autovacuum.h"
+#include "postmaster/postmaster.h"
+#include "replication/walsender.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+
+/*
+ * Freelists for different kinds of child processes.  We maintain separate
+ * pools for each, so that for example launching a lot of regular backends
+ * cannot prevent autovacuum or an aux process from launching.
+ */
+typedef struct PMChildPool
+{
+	int			size;			/* number of PMChild slots reserved for this
+								 * kind of process */
+	int			first_slotno;	/* first slot belonging to this pool */
+	dlist_head	freelist;		/* currently unused PMChild entries */
+} PMChildPool;
+
+static PMChildPool pmchildPools[BACKEND_NUM_TYPES];
+NON_EXEC_STATIC int	num_pmchild_slots;
+
+/*
+ * List of active child processes.  This includes dead-end children.
+ */
+dlist_head	ActiveChildList;
+
+/*
+ * MaxLivePostmasterChildren
+ *
+ * This reports the number of postmaster child processes that can be active.
+ * It includes all children except for dead_end children.  This allows the
+ * array in shared memory (PMChildFlags) to have a fixed maximum size.
+ */
+int
+MaxLivePostmasterChildren(void)
+{
+	return num_pmchild_slots;
+}
+
+/*
+ * Initialize at postmaster startup
+ *
+ * Note: This is not called on crash restart.  We rely on PMChild entries to
+ * remain valid through the restart process.  This is important because the
+ * syslogger survives through the crash restart process, so we must not
+ * invalidate its PMChild slot.
+ */
+void
+InitPostmasterChildSlots(void)
+{
+	int			slotno;
+	PMChild    *slots;
+
+	/*
+	 * We allow more connections here than we can have backends because some
+	 * might still be authenticating; they might fail auth, or some existing
+	 * backend might exit before the auth cycle is completed.  The exact
+	 * MaxConnections limit is enforced when a new backend tries to join the
+	 * PGPROC array.
+	 *
+	 * WAL senders start out as regular backends, so they share the same pool.
+	 */
+	pmchildPools[B_BACKEND].size = 2 * (MaxConnections + max_wal_senders);
+
+	pmchildPools[B_AUTOVAC_WORKER].size = autovacuum_max_workers;
+	pmchildPools[B_BG_WORKER].size = max_worker_processes;
+
+	/*
+	 * There can be only one of each of these running at a time.  They each
+	 * get their own pool of just one entry.
+	 */
+	pmchildPools[B_AUTOVAC_LAUNCHER].size = 1;
+	pmchildPools[B_SLOTSYNC_WORKER].size = 1;
+	pmchildPools[B_ARCHIVER].size = 1;
+	pmchildPools[B_BG_WRITER].size = 1;
+	pmchildPools[B_CHECKPOINTER].size = 1;
+	pmchildPools[B_STARTUP].size = 1;
+	pmchildPools[B_WAL_RECEIVER].size = 1;
+	pmchildPools[B_WAL_SUMMARIZER].size = 1;
+	pmchildPools[B_WAL_WRITER].size = 1;
+	pmchildPools[B_LOGGER].size = 1;
+
+	/* The rest of the pmchildPools are left at zero size */
+
+	/* Count the total number of slots */
+	num_pmchild_slots = 0;
+	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
+		num_pmchild_slots += pmchildPools[i].size;
+
+	/* Initialize them */
+	slots = palloc(num_pmchild_slots * sizeof(PMChild));
+	slotno = 0;
+	for (int btype = 0; btype < BACKEND_NUM_TYPES; btype++)
+	{
+		pmchildPools[btype].first_slotno = slotno + 1;
+		dlist_init(&pmchildPools[btype].freelist);
+
+		for (int j = 0; j < pmchildPools[btype].size; j++)
+		{
+			slots[slotno].pid = 0;
+			slots[slotno].child_slot = slotno + 1;
+			slots[slotno].bkend_type = B_INVALID;
+			slots[slotno].rw = NULL;
+			slots[slotno].bgworker_notify = false;
+			dlist_push_tail(&pmchildPools[btype].freelist, &slots[slotno].elem);
+			slotno++;
+		}
+	}
+	Assert(slotno == num_pmchild_slots);
+
+	/* Initialize other structures */
+	dlist_init(&ActiveChildList);
+}
+
+/*
+ * Allocate a PMChild entry for a postmaster child process of given type.
+ *
+ * The entry is taken from the right pool for the type.
+ *
+ * pmchild->child_slot in the returned struct is unique among all active child
+ * processes.
+ */
+PMChild *
+AssignPostmasterChildSlot(BackendType btype)
+{
+	dlist_head *freelist;
+	PMChild    *pmchild;
+
+	if (pmchildPools[btype].size == 0)
+		elog(ERROR, "cannot allocate a PMChild slot for backend type %d", btype);
+
+	freelist = &pmchildPools[btype].freelist;
+	if (dlist_is_empty(freelist))
+		return NULL;
+
+	pmchild = dlist_container(PMChild, elem, dlist_pop_head_node(freelist));
+	pmchild->pid = 0;
+	pmchild->bkend_type = btype;
+	pmchild->rw = NULL;
+	pmchild->bgworker_notify = false;
+
+	/*
+	 * pmchild->child_slot for each entry was initialized when the array of
+	 * slots was allocated.  Sanity check it.
+	 */
+	if (!(pmchild->child_slot >= pmchildPools[btype].first_slotno &&
+		  pmchild->child_slot < pmchildPools[btype].first_slotno + pmchildPools[btype].size))
+	{
+		elog(ERROR, "pmchild freelist for backend type %d is corrupt",
+			 pmchild->bkend_type);
+	}
+
+	dlist_push_head(&ActiveChildList, &pmchild->elem);
+
+	ReservePostmasterChildSlot(pmchild->child_slot);
+
+	elog(DEBUG2, "assigned pm child slot %d for %s",
+		 pmchild->child_slot, PostmasterChildName(btype));
+
+	return pmchild;
+}
+
+/*
+ * Allocate a PMChild struct for a dead-end backend.  Dead-end children are
+ * not assigned a child_slot number.  The struct is palloc'd; returns NULL if
+ * out of memory.
+ */
+PMChild *
+AllocDeadEndChild(void)
+{
+	PMChild    *pmchild;
+
+	elog(DEBUG2, "allocating dead-end child");
+
+	pmchild = (PMChild *) palloc_extended(sizeof(PMChild), MCXT_ALLOC_NO_OOM);
+	if (pmchild)
+	{
+		pmchild->pid = 0;
+		pmchild->child_slot = 0;
+		pmchild->bkend_type = B_DEAD_END_BACKEND;
+		pmchild->rw = NULL;
+		pmchild->bgworker_notify = false;
+
+		dlist_push_head(&ActiveChildList, &pmchild->elem);
+	}
+
+	return pmchild;
+}
+
+/*
+ * Release a PMChild slot, after the child process has exited.
+ *
+ * Returns true if the child detached cleanly from shared memory, false
+ * otherwise (see ReleasePostmasterChildSlot).
+ */
+bool
+FreePostmasterChildSlot(PMChild *pmchild)
+{
+	dlist_delete(&pmchild->elem);
+	if (pmchild->bkend_type == B_DEAD_END_BACKEND)
+	{
+		elog(DEBUG2, "releasing dead-end backend");
+		pfree(pmchild);
+		return true;
+	}
+	else
+	{
+		PMChildPool *pool;
+
+		elog(DEBUG2, "releasing pm child slot %d", pmchild->child_slot);
+
+		/* WAL senders start out as regular backends, and share the pool */
+		if (pmchild->bkend_type == B_WAL_SENDER)
+			pool = &pmchildPools[B_BACKEND];
+		else
+			pool = &pmchildPools[pmchild->bkend_type];
+
+		/* sanity check that we return the entry to the right pool */
+		if (!(pmchild->child_slot >= pool->first_slotno &&
+			  pmchild->child_slot < pool->first_slotno + pool->size))
+		{
+			elog(ERROR, "pmchild freelist for backend type %d is corrupt",
+				 pmchild->bkend_type);
+		}
+
+		dlist_push_head(&pool->freelist, &pmchild->elem);
+		return ReleasePostmasterChildSlot(pmchild->child_slot);
+	}
+}
+
+/*
+ * Find the PMChild entry of a running child process by PID.
+ */
+PMChild *
+FindPostmasterChildByPid(int pid)
+{
+	dlist_iter	iter;
+
+	dlist_foreach(iter, &ActiveChildList)
+	{
+		PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
+
+		if (bp->pid == pid)
+			return bp;
+	}
+	return NULL;
+}
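The per-type pool scheme in pmchild.c can be illustrated outside the server. The following is a minimal standalone sketch, not PostgreSQL code. The type names (SketchChild, SketchPool, CHILD_BACKEND and friends) and the functions init_pools(), assign_slot() and free_slot() are hypothetical, and a plain singly linked freelist stands in for dlist. It is meant to show two properties the patch relies on: slot numbers are 1-based and each pool owns one contiguous range of them, and exhausting one pool (here, regular backends) does not prevent allocating from another.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical child types; stand-ins for BackendType, not the real enum. */
typedef enum
{
	CHILD_BACKEND,
	CHILD_AUTOVAC_WORKER,
	CHILD_CHECKPOINTER,
	NUM_CHILD_TYPES
} ChildType;

typedef struct SketchChild
{
	int			child_slot;		/* 1-based, unique across all pools */
	ChildType	type;
	struct SketchChild *next;	/* freelist link while unused */
} SketchChild;

typedef struct SketchPool
{
	int			size;			/* number of slots reserved for this type */
	int			first_slotno;	/* first slot number owned by this pool */
	SketchChild *freelist;		/* currently unused entries */
} SketchPool;

static SketchPool pools[NUM_CHILD_TYPES];
static SketchChild *slots;

/* Carve one array into contiguous per-type ranges of slot numbers. */
static void
init_pools(const int sizes[NUM_CHILD_TYPES])
{
	int			total = 0;
	int			slotno = 0;

	for (int t = 0; t < NUM_CHILD_TYPES; t++)
		total += sizes[t];
	slots = calloc(total > 0 ? total : 1, sizeof(SketchChild));
	assert(slots != NULL);

	for (int t = 0; t < NUM_CHILD_TYPES; t++)
	{
		pools[t].size = sizes[t];
		pools[t].first_slotno = slotno + 1;
		pools[t].freelist = NULL;

		/* Push in reverse so the lowest slot number is handed out first. */
		for (int j = sizes[t] - 1; j >= 0; j--)
		{
			SketchChild *c = &slots[slotno + j];

			c->child_slot = slotno + j + 1;
			c->type = (ChildType) t;
			c->next = pools[t].freelist;
			pools[t].freelist = c;
		}
		slotno += sizes[t];
	}
}

static bool
slot_in_pool(const SketchPool *p, int slot)
{
	return slot >= p->first_slotno && slot < p->first_slotno + p->size;
}

/* Returns NULL when this type's pool is exhausted; other pools are unaffected. */
static SketchChild *
assign_slot(ChildType t)
{
	SketchPool *p = &pools[t];
	SketchChild *c = p->freelist;

	if (c == NULL)
		return NULL;
	p->freelist = c->next;
	c->next = NULL;
	c->type = t;
	assert(slot_in_pool(p, c->child_slot));	/* freelist sanity check */
	return c;
}

/* Return an entry to the head of its own pool's freelist. */
static void
free_slot(SketchChild *c)
{
	SketchPool *p = &pools[c->type];

	assert(slot_in_pool(p, c->child_slot));
	c->next = p->freelist;
	p->freelist = c;
}
```

With pool sizes {2, 1, 1}, the backend pool owns slots 1 and 2, the autovacuum-worker pool owns slot 3, and the checkpointer pool owns slot 4. A third assign_slot(CHILD_BACKEND) returns NULL while assign_slot(CHILD_AUTOVAC_WORKER) still succeeds, and a freed slot is handed out again by the next allocation from the same pool. Unlike the patch, the sketch does not fold WAL senders into the backend pool, has no dead-end children, and does not mirror slots into a shared-memory array.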
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 7d3074a2a8..97a1b7ae1a 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -142,9 +142,7 @@ typedef struct
 StaticAssertDecl(BACKEND_NUM_TYPES < 32, "too many backend types for uint32");
 
 static const BackendTypeMask BTYPE_MASK_ALL = {(1 << BACKEND_NUM_TYPES) - 1};
-#if 0							/* unused */
 static const BackendTypeMask BTYPE_MASK_NONE = {0};
-#endif
 
 static inline BackendTypeMask
 btmask(BackendType t)
@@ -154,14 +152,12 @@ btmask(BackendType t)
 	return mask;
 }
 
-#if 0							/* unused */
 static inline BackendTypeMask
 btmask_add(BackendTypeMask mask, BackendType t)
 {
 	mask.mask |= 1 << t;
 	return mask;
 }
-#endif
 
 static inline BackendTypeMask
 btmask_del(BackendTypeMask mask, BackendType t)
@@ -195,48 +191,9 @@ btmask_contains(BackendTypeMask mask, BackendType t)
 	return (mask.mask & (1 << t)) != 0;
 }
 
-/*
- * List of active backends (or child processes anyway; we don't actually
- * know whether a given child has become a backend or is still in the
- * authorization phase).  This is used mainly to keep track of how many
- * children we have and send them appropriate signals when necessary.
- *
- * As shown in the above set of backend types, this list includes not only
- * "normal" client sessions, but also autovacuum workers, walsenders, and
- * background workers.  (Note that at the time of launch, walsenders are
- * labeled B_BACKEND; we relabel them to B_WAL_SENDER
- * upon noticing they've changed their PMChildFlags entry.  Hence that check
- * must be done before any operation that needs to distinguish walsenders
- * from normal backends.)
- *
- * Also, "dead_end" children are in it: these are children launched just for
- * the purpose of sending a friendly rejection message to a would-be client.
- * We must track them because they are attached to shared memory, but we know
- * they will never become live backends.  dead_end children are not assigned a
- * PMChildSlot.  dead_end children have bkend_type B_DEAD_END_BACKEND.
- *
- * "Special" children such as the startup, bgwriter, autovacuum launcher, and
- * slot sync worker tasks are not in this list.  They are tracked via StartupPID
- * and other pid_t variables below.  (Thus, there can't be more than one of any
- * given "special" child process type.  We use BackendList entries for any
- * child process there can be more than one of.)
- */
-typedef struct bkend
-{
-	pid_t		pid;			/* process id of backend */
-	int			child_slot;		/* PMChildSlot for this backend, if any */
-	BackendType bkend_type;		/* child process flavor, see above */
-	RegisteredBgWorker *rw;		/* bgworker info, if this is a bgworker */
-	bool		bgworker_notify;	/* gets bgworker start/stop notifications */
-	dlist_node	elem;			/* list link in BackendList */
-} Backend;
-
-static dlist_head BackendList = DLIST_STATIC_INIT(BackendList);
 
 BackgroundWorker *MyBgworkerEntry = NULL;
 
-
-
 /* The socket number we are listening for connections on */
 int			PostPortNumber = DEF_PGPORT;
 
@@ -288,17 +245,17 @@ bool		remove_temp_files_after_crash = true;
 bool		send_abort_for_crash = false;
 bool		send_abort_for_kill = false;
 
-/* PIDs of special child processes; 0 when not running */
-static pid_t StartupPID = 0,
-			BgWriterPID = 0,
-			CheckpointerPID = 0,
-			WalWriterPID = 0,
-			WalReceiverPID = 0,
-			WalSummarizerPID = 0,
-			AutoVacPID = 0,
-			PgArchPID = 0,
-			SysLoggerPID = 0,
-			SlotSyncWorkerPID = 0;
+/* special child processes; NULL when not running */
+static PMChild *StartupPMChild = NULL,
+		   *BgWriterPMChild = NULL,
+		   *CheckpointerPMChild = NULL,
+		   *WalWriterPMChild = NULL,
+		   *WalReceiverPMChild = NULL,
+		   *WalSummarizerPMChild = NULL,
+		   *AutoVacLauncherPMChild = NULL,
+		   *PgArchPMChild = NULL,
+		   *SysLoggerPMChild = NULL,
+		   *SlotSyncWorkerPMChild = NULL;
 
 /* Startup process's status */
 typedef enum
@@ -346,7 +303,7 @@ static bool FatalError = false; /* T if recovering from backend crash */
  * PM_HOT_STANDBY state.  (connsAllowed can also restrict launching.)
  * In other states we handle connection requests by launching "dead_end"
  * child processes, which will simply send the client an error message and
- * quit.  (We track these in the BackendList so that we can know when they
+ * quit.  (We track these in the ActiveChildList so that we can know when they
  * are all gone; this is important because they're still connected to shared
  * memory, and would interfere with an attempt to destroy the shmem segment,
  * possibly leading to SHMALL failure when we try to make a new one.)
@@ -452,7 +409,7 @@ static void process_pm_child_exit(void);
 static void process_pm_reload_request(void);
 static void process_pm_shutdown_request(void);
 static void dummy_handler(SIGNAL_ARGS);
-static void CleanupBackend(Backend *bp, int exitstatus);
+static void CleanupBackend(PMChild *bp, int exitstatus);
 static void HandleChildCrash(int pid, int exitstatus, const char *procname);
 static void LogChildExit(int lev, const char *procname,
 						 int pid, int exitstatus);
@@ -463,16 +420,17 @@ static int	ServerLoop(void);
 static int	BackendStartup(ClientSocket *client_sock);
 static void report_fork_failure_to_client(ClientSocket *client_sock, int errnum);
 static CAC_state canAcceptConnections(BackendType backend_type);
-static void signal_child(pid_t pid, int signal);
-static void sigquit_child(pid_t pid);
+static void signal_child(PMChild *pmchild, int signal);
+static void sigquit_child(PMChild *pmchild);
 static bool SignalChildren(int signal, BackendTypeMask targetMask);
 static void TerminateChildren(int signal);
 static int	CountChildren(BackendTypeMask targetMask);
-static Backend *assign_backendlist_entry(void);
+static PMChild *assign_backendlist_entry(void);
 static void LaunchMissingBackgroundProcesses(void);
 static void maybe_start_bgworkers(void);
 static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
-static pid_t StartChildProcess(BackendType type);
+static PMChild *StartChildProcess(BackendType type);
+static void StartSysLogger(void);
 static void StartAutovacuumWorker(void);
 static void InitPostmasterDeathWatchHandle(void);
 
@@ -951,9 +909,11 @@ PostmasterMain(int argc, char *argv[])
 
 	/*
 	 * Now that loadable modules have had their chance to alter any GUCs,
-	 * calculate MaxBackends.
+	 * calculate MaxBackends and initialize the machinery to track child
+	 * processes.
 	 */
 	InitializeMaxBackends();
+	InitPostmasterChildSlots();
 
 	/*
 	 * Calculate the size of the PGPROC fast-path lock arrays.
@@ -1082,7 +1042,8 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * If enabled, start up syslogger collection subprocess
 	 */
-	SysLoggerPID = SysLogger_Start();
+	if (Logging_collector)
+		StartSysLogger();
 
 	/*
 	 * Reset whereToSendOutput from DestDebug (its starting state) to
@@ -1384,16 +1345,16 @@ PostmasterMain(int argc, char *argv[])
 	AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STARTING);
 
 	/* Start bgwriter and checkpointer so they can help with recovery */
-	if (CheckpointerPID == 0)
-		CheckpointerPID = StartChildProcess(B_CHECKPOINTER);
-	if (BgWriterPID == 0)
-		BgWriterPID = StartChildProcess(B_BG_WRITER);
+	if (CheckpointerPMChild == NULL)
+		CheckpointerPMChild = StartChildProcess(B_CHECKPOINTER);
+	if (BgWriterPMChild == NULL)
+		BgWriterPMChild = StartChildProcess(B_BG_WRITER);
 
 	/*
 	 * We're ready to rock and roll...
 	 */
-	StartupPID = StartChildProcess(B_STARTUP);
-	Assert(StartupPID != 0);
+	StartupPMChild = StartChildProcess(B_STARTUP);
+	Assert(StartupPMChild != NULL);
 	StartupStatus = STARTUP_RUNNING;
 	pmState = PM_STARTUP;
 
@@ -1723,8 +1684,8 @@ ServerLoop(void)
 		if (avlauncher_needs_signal)
 		{
 			avlauncher_needs_signal = false;
-			if (AutoVacPID != 0)
-				kill(AutoVacPID, SIGUSR2);
+			if (AutoVacLauncherPMChild != NULL)
+				kill(AutoVacLauncherPMChild->pid, SIGUSR2);
 		}
 
 #ifdef HAVE_PTHREAD_IS_THREADED_NP
@@ -1842,21 +1803,6 @@ canAcceptConnections(BackendType backend_type)
 	if (!connsAllowed && backend_type == B_BACKEND)
 		return CAC_SHUTDOWN;	/* shutdown is pending */
 
-	/*
-	 * Don't start too many children.
-	 *
-	 * We allow more connections here than we can have backends because some
-	 * might still be authenticating; they might fail auth, or some existing
-	 * backend might exit before the auth cycle is completed.  The exact
-	 * MaxBackends limit is enforced when a new backend tries to join the
-	 * shared-inval backend array.
-	 *
-	 * The limit here must match the sizes of the per-child-process arrays;
-	 * see comments for MaxLivePostmasterChildren().
-	 */
-	if (CountChildren(btmask_all_except(B_DEAD_END_BACKEND)) >= MaxLivePostmasterChildren())
-		result = CAC_TOOMANY;
-
 	return result;
 }
 
@@ -2024,26 +1970,6 @@ process_pm_reload_request(void)
 				(errmsg("received SIGHUP, reloading configuration files")));
 		ProcessConfigFile(PGC_SIGHUP);
 		SignalChildren(SIGHUP, btmask_all_except(B_DEAD_END_BACKEND));
-		if (StartupPID != 0)
-			signal_child(StartupPID, SIGHUP);
-		if (BgWriterPID != 0)
-			signal_child(BgWriterPID, SIGHUP);
-		if (CheckpointerPID != 0)
-			signal_child(CheckpointerPID, SIGHUP);
-		if (WalWriterPID != 0)
-			signal_child(WalWriterPID, SIGHUP);
-		if (WalReceiverPID != 0)
-			signal_child(WalReceiverPID, SIGHUP);
-		if (WalSummarizerPID != 0)
-			signal_child(WalSummarizerPID, SIGHUP);
-		if (AutoVacPID != 0)
-			signal_child(AutoVacPID, SIGHUP);
-		if (PgArchPID != 0)
-			signal_child(PgArchPID, SIGHUP);
-		if (SysLoggerPID != 0)
-			signal_child(SysLoggerPID, SIGHUP);
-		if (SlotSyncWorkerPID != 0)
-			signal_child(SlotSyncWorkerPID, SIGHUP);
 
 		/* Reload authentication config files too */
 		if (!load_hba())
@@ -2281,15 +2207,15 @@ process_pm_child_exit(void)
 
 	while ((pid = waitpid(-1, &exitstatus, WNOHANG)) > 0)
 	{
-		bool		found;
-		dlist_mutable_iter iter;
+		PMChild    *pmchild;
 
 		/*
 		 * Check if this child was a startup process.
 		 */
-		if (pid == StartupPID)
+		if (StartupPMChild && pid == StartupPMChild->pid)
 		{
-			StartupPID = 0;
+			FreePostmasterChildSlot(StartupPMChild);
+			StartupPMChild = NULL;
 
 			/*
 			 * Startup process exited in response to a shutdown request (or it
@@ -2400,9 +2326,10 @@ process_pm_child_exit(void)
 		 * one at the next iteration of the postmaster's main loop, if
 		 * necessary.  Any other exit condition is treated as a crash.
 		 */
-		if (pid == BgWriterPID)
+		if (BgWriterPMChild && pid == BgWriterPMChild->pid)
 		{
-			BgWriterPID = 0;
+			FreePostmasterChildSlot(BgWriterPMChild);
+			BgWriterPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("background writer process"));
@@ -2412,9 +2339,10 @@ process_pm_child_exit(void)
 		/*
 		 * Was it the checkpointer?
 		 */
-		if (pid == CheckpointerPID)
+		if (CheckpointerPMChild && pid == CheckpointerPMChild->pid)
 		{
-			CheckpointerPID = 0;
+			FreePostmasterChildSlot(CheckpointerPMChild);
+			CheckpointerPMChild = NULL;
 			if (EXIT_STATUS_0(exitstatus) && pmState == PM_SHUTDOWN)
 			{
 				/*
@@ -2434,8 +2362,8 @@ process_pm_child_exit(void)
 				Assert(Shutdown > NoShutdown);
 
 				/* Waken archiver for the last time */
-				if (PgArchPID != 0)
-					signal_child(PgArchPID, SIGUSR2);
+				if (PgArchPMChild != NULL)
+					signal_child(PgArchPMChild, SIGUSR2);
 
 				/*
 				 * Waken walsenders for the last time. No regular backends
@@ -2463,9 +2391,10 @@ process_pm_child_exit(void)
 		 * new one at the next iteration of the postmaster's main loop, if
 		 * necessary.  Any other exit condition is treated as a crash.
 		 */
-		if (pid == WalWriterPID)
+		if (WalWriterPMChild && pid == WalWriterPMChild->pid)
 		{
-			WalWriterPID = 0;
+			FreePostmasterChildSlot(WalWriterPMChild);
+			WalWriterPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("WAL writer process"));
@@ -2478,9 +2407,10 @@ process_pm_child_exit(void)
 		 * backends.  (If we need a new wal receiver, we'll start one at the
 		 * next iteration of the postmaster's main loop.)
 		 */
-		if (pid == WalReceiverPID)
+		if (WalReceiverPMChild && pid == WalReceiverPMChild->pid)
 		{
-			WalReceiverPID = 0;
+			FreePostmasterChildSlot(WalReceiverPMChild);
+			WalReceiverPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("WAL receiver process"));
@@ -2492,9 +2422,10 @@ process_pm_child_exit(void)
 		 * a new one at the next iteration of the postmaster's main loop, if
 		 * necessary.  Any other exit condition is treated as a crash.
 		 */
-		if (pid == WalSummarizerPID)
+		if (WalSummarizerPMChild && pid == WalSummarizerPMChild->pid)
 		{
-			WalSummarizerPID = 0;
+			FreePostmasterChildSlot(WalSummarizerPMChild);
+			WalSummarizerPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("WAL summarizer process"));
@@ -2507,9 +2438,10 @@ process_pm_child_exit(void)
 		 * loop, if necessary.  Any other exit condition is treated as a
 		 * crash.
 		 */
-		if (pid == AutoVacPID)
+		if (AutoVacLauncherPMChild && pid == AutoVacLauncherPMChild->pid)
 		{
-			AutoVacPID = 0;
+			FreePostmasterChildSlot(AutoVacLauncherPMChild);
+			AutoVacLauncherPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("autovacuum launcher process"));
@@ -2522,9 +2454,10 @@ process_pm_child_exit(void)
 		 * and just try to start a new one on the next cycle of the
 		 * postmaster's main loop, to retry archiving remaining files.
 		 */
-		if (pid == PgArchPID)
+		if (PgArchPMChild && pid == PgArchPMChild->pid)
 		{
-			PgArchPID = 0;
+			FreePostmasterChildSlot(PgArchPMChild);
+			PgArchPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("archiver process"));
@@ -2532,11 +2465,15 @@ process_pm_child_exit(void)
 		}
 
 		/* Was it the system logger?  If so, try to start a new one */
-		if (pid == SysLoggerPID)
+		if (SysLoggerPMChild && pid == SysLoggerPMChild->pid)
 		{
-			SysLoggerPID = 0;
+			FreePostmasterChildSlot(SysLoggerPMChild);
+			SysLoggerPMChild = NULL;
+
 			/* for safety's sake, launch new logger *first* */
-			SysLoggerPID = SysLogger_Start();
+			if (Logging_collector)
+				StartSysLogger();
+
 			if (!EXIT_STATUS_0(exitstatus))
 				LogChildExit(LOG, _("system logger process"),
 							 pid, exitstatus);
@@ -2550,9 +2487,10 @@ process_pm_child_exit(void)
 		 * start a new one at the next iteration of the postmaster's main
 		 * loop, if necessary. Any other exit condition is treated as a crash.
 		 */
-		if (pid == SlotSyncWorkerPID)
+		if (SlotSyncWorkerPMChild && pid == SlotSyncWorkerPMChild->pid)
 		{
-			SlotSyncWorkerPID = 0;
+			FreePostmasterChildSlot(SlotSyncWorkerPMChild);
+			SlotSyncWorkerPMChild = NULL;
 			if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
 				HandleChildCrash(pid, exitstatus,
 								 _("slot sync worker process"));
@@ -2562,25 +2500,17 @@ process_pm_child_exit(void)
 		/*
 		 * Was it a backend or a background worker?
 		 */
-		found = false;
-		dlist_foreach_modify(iter, &BackendList)
+		pmchild = FindPostmasterChildByPid(pid);
+		if (pmchild)
 		{
-			Backend    *bp = dlist_container(Backend, elem, iter.cur);
-
-			if (bp->pid == pid)
-			{
-				dlist_delete(iter.cur);
-				CleanupBackend(bp, exitstatus);
-				found = true;
-				break;
-			}
+			CleanupBackend(pmchild, exitstatus);
 		}
 
 		/*
 		 * We don't know anything about this child process.  That's highly
 		 * unexpected, as we do track all the child processes that we fork.
 		 */
-		if (!found)
+		else
 		{
 			if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
 				HandleChildCrash(pid, exitstatus, _("untracked child process"));
@@ -2603,13 +2533,17 @@ process_pm_child_exit(void)
  * already been unlinked from BackendList, but we will free it here.
  */
 static void
-CleanupBackend(Backend *bp,
+CleanupBackend(PMChild *bp,
 			   int exitstatus)	/* child's exit status. */
 {
 	char		namebuf[MAXPGPATH];
 	const char *procname;
 	bool		crashed = false;
 	bool		logged = false;
+	pid_t		bp_pid;
+	bool		bp_bgworker_notify;
+	BackendType bp_bkend_type;
+	RegisteredBgWorker *rw;
 
 	/* Construct a process name for the log message */
 	if (bp->bkend_type == B_BG_WORKER)
@@ -2648,25 +2582,28 @@ CleanupBackend(Backend *bp,
 #endif
 
 	/*
-	 * If the process attached to shared memory, check that it detached
-	 * cleanly.
+	 * Release the PMChild entry.
+	 *
+	 * If the process attached to shared memory, this also checks that it
+	 * detached cleanly.
 	 */
-	if (bp->bkend_type != B_DEAD_END_BACKEND)
+	bp_pid = bp->pid;
+	bp_bgworker_notify = bp->bgworker_notify;
+	bp_bkend_type = bp->bkend_type;
+	rw = bp->rw;
+	if (!FreePostmasterChildSlot(bp))
 	{
-		if (!ReleasePostmasterChildSlot(bp->child_slot))
-		{
-			/*
-			 * Uh-oh, the child failed to clean itself up.  Treat as a crash
-			 * after all.
-			 */
-			crashed = true;
-		}
+		/*
+		 * Uh-oh, the child failed to clean itself up.  Treat as a crash after
+		 * all.
+		 */
+		crashed = true;
 	}
+	bp = NULL;
 
 	if (crashed)
 	{
-		HandleChildCrash(bp->pid, exitstatus, procname);
-		pfree(bp);
+		HandleChildCrash(bp_pid, exitstatus, procname);
 		return;
 	}
 
@@ -2677,17 +2614,15 @@ CleanupBackend(Backend *bp,
 	 * gets skipped in the (probably very common) case where the backend has
 	 * never requested any such notifications.
 	 */
-	if (bp->bgworker_notify)
-		BackgroundWorkerStopNotifications(bp->pid);
+	if (bp_bgworker_notify)
+		BackgroundWorkerStopNotifications(bp_pid);
 
 	/*
 	 * If it was a background worker, also update its RegisteredBgWorker
 	 * entry.
 	 */
-	if (bp->bkend_type == B_BG_WORKER)
+	if (bp_bkend_type == B_BG_WORKER)
 	{
-		RegisteredBgWorker *rw = bp->rw;
-
 		if (!EXIT_STATUS_0(exitstatus))
 		{
 			/* Record timestamp, so we know when to restart the worker. */
@@ -2706,7 +2641,7 @@ CleanupBackend(Backend *bp,
 		if (!logged)
 		{
 			LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG,
-						 procname, bp->pid, exitstatus);
+						 procname, bp_pid, exitstatus);
 			logged = true;
 		}
 
@@ -2715,9 +2650,7 @@ CleanupBackend(Backend *bp,
 	}
 
 	if (!logged)
-		LogChildExit(DEBUG2, procname, bp->pid, exitstatus);
-
-	pfree(bp);
+		LogChildExit(DEBUG2, procname, bp_pid, exitstatus);
 }
 
 /*
@@ -2757,9 +2690,16 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 	{
 		dlist_iter	iter;
 
-		dlist_foreach(iter, &BackendList)
+		dlist_foreach(iter, &ActiveChildList)
 		{
-			Backend    *bp = dlist_container(Backend, elem, iter.cur);
+			PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
+
+			/* We do NOT restart the syslogger */
+			if (bp == SysLoggerPMChild)
+				continue;
+
+			if (bp == StartupPMChild)
+				StartupStatus = STARTUP_SIGNALED;
 
 			/*
 			 * This backend is still alive.  Unless we did so already, tell it
@@ -2768,48 +2708,8 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 			 * We could exclude dead_end children here, but at least when
 			 * sending SIGABRT it seems better to include them.
 			 */
-			sigquit_child(bp->pid);
-		}
-
-		if (StartupPID != 0)
-		{
-			sigquit_child(StartupPID);
-			StartupStatus = STARTUP_SIGNALED;
+			sigquit_child(bp);
 		}
-
-		/* Take care of the bgwriter too */
-		if (BgWriterPID != 0)
-			sigquit_child(BgWriterPID);
-
-		/* Take care of the checkpointer too */
-		if (CheckpointerPID != 0)
-			sigquit_child(CheckpointerPID);
-
-		/* Take care of the walwriter too */
-		if (WalWriterPID != 0)
-			sigquit_child(WalWriterPID);
-
-		/* Take care of the walreceiver too */
-		if (WalReceiverPID != 0)
-			sigquit_child(WalReceiverPID);
-
-		/* Take care of the walsummarizer too */
-		if (WalSummarizerPID != 0)
-			sigquit_child(WalSummarizerPID);
-
-		/* Take care of the autovacuum launcher too */
-		if (AutoVacPID != 0)
-			sigquit_child(AutoVacPID);
-
-		/* Take care of the archiver too */
-		if (PgArchPID != 0)
-			sigquit_child(PgArchPID);
-
-		/* Take care of the slot sync worker too */
-		if (SlotSyncWorkerPID != 0)
-			sigquit_child(SlotSyncWorkerPID);
-
-		/* We do NOT restart the syslogger */
 	}
 
 	if (Shutdown != ImmediateShutdown)
@@ -2918,72 +2818,94 @@ PostmasterStateMachine(void)
 	}
 
 	/*
-	 * If we're ready to do so, signal child processes to shut down.  (This
-	 * isn't a persistent state, but treating it as a distinct pmState allows
-	 * us to share this code across multiple shutdown code paths.)
+	 * In the PM_WAIT_BACKENDS state, wait for all regular backends to exit,
+	 * along with the processes comparable to backends, such as autovacuum
+	 * and background workers.
+	 *
+	 * PM_STOP_BACKENDS is a transient state that means the same as
+	 * PM_WAIT_BACKENDS, but we signal the processes first, before waiting for
+	 * them.  Treating it as a distinct pmState allows us to share this code
+	 * across multiple shutdown code paths.
 	 */
-	if (pmState == PM_STOP_BACKENDS)
+	if (pmState == PM_STOP_BACKENDS || pmState == PM_WAIT_BACKENDS)
 	{
+		BackendTypeMask targetMask = BTYPE_MASK_NONE;
+
 		/*
-		 * Forget any pending requests for background workers, since we're no
-		 * longer willing to launch any new workers.  (If additional requests
-		 * arrive, BackgroundWorkerStateChange will reject them.)
+		 * PM_WAIT_BACKENDS state ends when we have no regular backends, no
+		 * autovac launcher or workers, and no bgworkers (including
+		 * unconnected ones).  No walwriter, bgwriter, slot sync worker, or
+		 * WAL summarizer either.
 		 */
-		ForgetUnstartedBackgroundWorkers();
-
-		/* Signal all backend children except walsenders and dead-end backends */
-		SignalChildren(SIGTERM, btmask_all_except2(B_WAL_SENDER, B_DEAD_END_BACKEND));
-		/* and the autovac launcher too */
-		if (AutoVacPID != 0)
-			signal_child(AutoVacPID, SIGTERM);
-		/* and the bgwriter too */
-		if (BgWriterPID != 0)
-			signal_child(BgWriterPID, SIGTERM);
-		/* and the walwriter too */
-		if (WalWriterPID != 0)
-			signal_child(WalWriterPID, SIGTERM);
+		targetMask = btmask_add(targetMask, B_BACKEND);
+		targetMask = btmask_add(targetMask, B_AUTOVAC_LAUNCHER);
+		targetMask = btmask_add(targetMask, B_AUTOVAC_WORKER);
+		targetMask = btmask_add(targetMask, B_BG_WORKER);
+
+		targetMask = btmask_add(targetMask, B_WAL_WRITER);
+		targetMask = btmask_add(targetMask, B_BG_WRITER);
+		targetMask = btmask_add(targetMask, B_SLOTSYNC_WORKER);
+		targetMask = btmask_add(targetMask, B_WAL_SUMMARIZER);
+
 		/* If we're in recovery, also stop startup and walreceiver procs */
-		if (StartupPID != 0)
-			signal_child(StartupPID, SIGTERM);
-		if (WalReceiverPID != 0)
-			signal_child(WalReceiverPID, SIGTERM);
-		if (WalSummarizerPID != 0)
-			signal_child(WalSummarizerPID, SIGTERM);
-		if (SlotSyncWorkerPID != 0)
-			signal_child(SlotSyncWorkerPID, SIGTERM);
-		/* checkpointer, archiver, stats, and syslogger may continue for now */
-
-		/* Now transition to PM_WAIT_BACKENDS state to wait for them to die */
-		pmState = PM_WAIT_BACKENDS;
-	}
+		targetMask = btmask_add(targetMask, B_STARTUP);
+		targetMask = btmask_add(targetMask, B_WAL_RECEIVER);
 
-	/*
-	 * If we are in a state-machine state that implies waiting for backends to
-	 * exit, see if they're all gone, and change state if so.
-	 */
-	if (pmState == PM_WAIT_BACKENDS)
-	{
 		/*
-		 * PM_WAIT_BACKENDS state ends when we have no regular backends
-		 * (including autovac workers), no bgworkers (including unconnected
-		 * ones), and no walwriter, autovac launcher, bgwriter or slot sync
-		 * worker.  If we are doing crash recovery or an immediate shutdown
-		 * then we expect the checkpointer to exit as well, otherwise not. The
-		 * stats and syslogger processes are disregarded since they are not
-		 * connected to shared memory; we also disregard dead_end children
-		 * here. Walsenders and archiver are also disregarded, they will be
-		 * terminated later after writing the checkpoint record.
+		 * If we are doing crash recovery or an immediate shutdown then we
+		 * expect the checkpointer to exit as well, otherwise not.
 		 */
-		if (CountChildren(btmask_all_except2(B_WAL_SENDER, B_DEAD_END_BACKEND)) == 0 &&
-			StartupPID == 0 &&
-			WalReceiverPID == 0 &&
-			WalSummarizerPID == 0 &&
-			BgWriterPID == 0 &&
-			(CheckpointerPID == 0 ||
-			 (!FatalError && Shutdown < ImmediateShutdown)) &&
-			WalWriterPID == 0 &&
-			AutoVacPID == 0 &&
-			SlotSyncWorkerPID == 0)
+		if (FatalError || Shutdown >= ImmediateShutdown)
+			targetMask = btmask_add(targetMask, B_CHECKPOINTER);
+
+		/*
+		 * Walsenders and archiver will continue running; they will be
+		 * terminated later after writing the checkpoint record.  We also let
+		 * dead_end children keep running for now.  The syslogger process
+		 * exits last.
+		 *
+		 * This assertion checks that we have covered all backend types,
+		 * either by including them in targetMask, or by noting here that they
+		 * are allowed to continue running.
+		 */
+#ifdef USE_ASSERT_CHECKING
+		{
+			BackendTypeMask remainMask = BTYPE_MASK_NONE;
+
+			remainMask = btmask_add(remainMask, B_WAL_SENDER);
+			remainMask = btmask_add(remainMask, B_ARCHIVER);
+			remainMask = btmask_add(remainMask, B_DEAD_END_BACKEND);
+			remainMask = btmask_add(remainMask, B_LOGGER);
+
+			/* checkpointer may or may not be in targetMask already */
+			remainMask = btmask_add(remainMask, B_CHECKPOINTER);
+
+			/* these are not real postmaster children */
+			remainMask = btmask_add(remainMask, B_INVALID);
+			remainMask = btmask_add(remainMask, B_STANDALONE_BACKEND);
+
+			/* All types should be included in targetMask or remainMask */
+			Assert((remainMask.mask | targetMask.mask) == BTYPE_MASK_ALL.mask);
+		}
+#endif
+
+		/* If we had not yet signaled the processes to exit, do so now */
+		if (pmState == PM_STOP_BACKENDS)
+		{
+			/*
+			 * Forget any pending requests for background workers, since we're
+			 * no longer willing to launch any new workers.  (If additional
+			 * requests arrive, BackgroundWorkerStateChange will reject them.)
+			 */
+			ForgetUnstartedBackgroundWorkers();
+
+			SignalChildren(SIGTERM, targetMask);
+
+			pmState = PM_WAIT_BACKENDS;
+		}
+
+		/* Are any of the target processes still running? */
+		if (CountChildren(targetMask) == 0)
 		{
 			if (Shutdown >= ImmediateShutdown || FatalError)
 			{
@@ -2991,13 +2913,14 @@ PostmasterStateMachine(void)
 				 * Stop any dead_end children and stop creating new ones.
 				 */
 				pmState = PM_WAIT_DEAD_END;
+				elog(DEBUG1, "entering PM_WAIT_DEAD_END");
 				ConfigurePostmasterWaitSet(false);
 				SignalChildren(SIGQUIT, btmask(B_DEAD_END_BACKEND));
 
 				/*
-				 * We already SIGQUIT'd the archiver and stats processes, if
-				 * any, when we started immediate shutdown or entered
-				 * FatalError state.
+				 * We already SIGQUIT'd walsenders and the archiver, if any,
+				 * when we started immediate shutdown or entered FatalError
+				 * state.
 				 */
 			}
 			else
@@ -3009,13 +2932,14 @@ PostmasterStateMachine(void)
 				 */
 				Assert(Shutdown > NoShutdown);
 				/* Start the checkpointer if not running */
-				if (CheckpointerPID == 0)
-					CheckpointerPID = StartChildProcess(B_CHECKPOINTER);
+				if (CheckpointerPMChild == NULL)
+					CheckpointerPMChild = StartChildProcess(B_CHECKPOINTER);
 				/* And tell it to shut down */
-				if (CheckpointerPID != 0)
+				if (CheckpointerPMChild != NULL)
 				{
-					signal_child(CheckpointerPID, SIGUSR2);
+					signal_child(CheckpointerPMChild, SIGUSR2);
 					pmState = PM_SHUTDOWN;
+					elog(DEBUG1, "entering PM_SHUTDOWN");
 				}
 				else
 				{
@@ -3031,12 +2955,11 @@ PostmasterStateMachine(void)
 					 */
 					FatalError = true;
 					pmState = PM_WAIT_DEAD_END;
+					elog(DEBUG1, "entering PM_WAIT_DEAD_END");
 					ConfigurePostmasterWaitSet(false);
 
 					/* Kill the walsenders and archiver too */
-					SignalChildren(SIGQUIT, BTYPE_MASK_ALL);
-					if (PgArchPID != 0)
-						signal_child(PgArchPID, SIGQUIT);
+					SignalChildren(SIGQUIT, btmask_all_except(B_LOGGER));
 				}
 			}
 		}
@@ -3050,39 +2973,40 @@ PostmasterStateMachine(void)
 		 * left by now anyway; what we're really waiting for is walsenders and
 		 * archiver.
 		 */
-		if (PgArchPID == 0 && CountChildren(btmask_all_except(B_DEAD_END_BACKEND)) == 0)
+		if (CountChildren(btmask_all_except2(B_LOGGER, B_DEAD_END_BACKEND)) == 0)
 		{
 			pmState = PM_WAIT_DEAD_END;
 			ConfigurePostmasterWaitSet(false);
-			SignalChildren(SIGTERM, BTYPE_MASK_ALL);
+			SignalChildren(SIGTERM, btmask_all_except(B_LOGGER));
 		}
 	}
 
 	if (pmState == PM_WAIT_DEAD_END)
 	{
 		/*
-		 * PM_WAIT_DEAD_END state ends when the BackendList is entirely empty
-		 * (ie, no dead_end children remain), and the archiver is gone too.
-		 *
-		 * The reason we wait for those two is to protect them against a new
-		 * postmaster starting conflicting subprocesses; this isn't an
-		 * ironclad protection, but it at least helps in the
-		 * shutdown-and-immediately-restart scenario.  Note that they have
-		 * already been sent appropriate shutdown signals, either during a
-		 * normal state transition leading up to PM_WAIT_DEAD_END, or during
+		 * PM_WAIT_DEAD_END state ends when all other children are gone except
+		 * for the logger.  During normal shutdown, all that remains are
+		 * dead-end backends, but in FatalError processing we jump straight
+		 * here with more processes remaining.  Note that they have already
+		 * been sent appropriate shutdown signals, either during a normal
+		 * state transition leading up to PM_WAIT_DEAD_END, or during
 		 * FatalError processing.
+		 *
+		 * The reason we wait is to protect against a new postmaster starting
+		 * conflicting subprocesses; this isn't an ironclad protection, but it
+		 * at least helps in the shutdown-and-immediately-restart scenario.
 		 */
-		if (dlist_is_empty(&BackendList) && PgArchPID == 0)
+		if (CountChildren(btmask_all_except(B_LOGGER)) == 0)
 		{
 			/* These other guys should be dead already */
-			Assert(StartupPID == 0);
-			Assert(WalReceiverPID == 0);
-			Assert(WalSummarizerPID == 0);
-			Assert(BgWriterPID == 0);
-			Assert(CheckpointerPID == 0);
-			Assert(WalWriterPID == 0);
-			Assert(AutoVacPID == 0);
-			Assert(SlotSyncWorkerPID == 0);
+			Assert(StartupPMChild == NULL);
+			Assert(WalReceiverPMChild == NULL);
+			Assert(WalSummarizerPMChild == NULL);
+			Assert(BgWriterPMChild == NULL);
+			Assert(CheckpointerPMChild == NULL);
+			Assert(WalWriterPMChild == NULL);
+			Assert(AutoVacLauncherPMChild == NULL);
+			Assert(SlotSyncWorkerPMChild == NULL);
 			/* syslogger is not considered here */
 			pmState = PM_NO_CHILDREN;
 		}
@@ -3165,8 +3089,8 @@ PostmasterStateMachine(void)
 		/* re-create shared memory and semaphores */
 		CreateSharedMemoryAndSemaphores();
 
-		StartupPID = StartChildProcess(B_STARTUP);
-		Assert(StartupPID != 0);
+		StartupPMChild = StartChildProcess(B_STARTUP);
+		Assert(StartupPMChild != NULL);
 		StartupStatus = STARTUP_RUNNING;
 		pmState = PM_STARTUP;
 		/* crash recovery started, reset SIGKILL flag */
@@ -3189,8 +3113,8 @@ static void
 LaunchMissingBackgroundProcesses(void)
 {
 	/* Syslogger is active in all states */
-	if (SysLoggerPID == 0 && Logging_collector)
-		SysLoggerPID = SysLogger_Start();
+	if (SysLoggerPMChild == NULL && Logging_collector)
+		StartSysLogger();
 
 	/*
 	 * The checkpointer and the background writer are active from the start,
@@ -3203,30 +3127,30 @@ LaunchMissingBackgroundProcesses(void)
 	if (pmState == PM_RUN || pmState == PM_RECOVERY ||
 		pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
 	{
-		if (CheckpointerPID == 0)
-			CheckpointerPID = StartChildProcess(B_CHECKPOINTER);
-		if (BgWriterPID == 0)
-			BgWriterPID = StartChildProcess(B_BG_WRITER);
+		if (CheckpointerPMChild == NULL)
+			CheckpointerPMChild = StartChildProcess(B_CHECKPOINTER);
+		if (BgWriterPMChild == NULL)
+			BgWriterPMChild = StartChildProcess(B_BG_WRITER);
 	}
 
 	/*
 	 * WAL writer is needed only in normal operation (else we cannot be
 	 * writing any new WAL).
 	 */
-	if (WalWriterPID == 0 && pmState == PM_RUN)
-		WalWriterPID = StartChildProcess(B_WAL_WRITER);
+	if (WalWriterPMChild == NULL && pmState == PM_RUN)
+		WalWriterPMChild = StartChildProcess(B_WAL_WRITER);
 
 	/*
 	 * We don't want autovacuum to run in binary upgrade mode because
 	 * autovacuum might update relfrozenxid for empty tables before the
 	 * physical files are put in place.
 	 */
-	if (!IsBinaryUpgrade && AutoVacPID == 0 &&
+	if (!IsBinaryUpgrade && AutoVacLauncherPMChild == NULL &&
 		(AutoVacuumingActive() || start_autovac_launcher) &&
 		pmState == PM_RUN)
 	{
-		AutoVacPID = StartChildProcess(B_AUTOVAC_LAUNCHER);
-		if (AutoVacPID != 0)
+		AutoVacLauncherPMChild = StartChildProcess(B_AUTOVAC_LAUNCHER);
+		if (AutoVacLauncherPMChild != NULL)
 			start_autovac_launcher = false; /* signal processed */
 	}
 
@@ -3234,11 +3158,11 @@ LaunchMissingBackgroundProcesses(void)
 	 * If WAL archiving is enabled always, we are allowed to start archiver
 	 * even during recovery.
 	 */
-	if (PgArchPID == 0 &&
+	if (PgArchPMChild == NULL &&
 		((XLogArchivingActive() && pmState == PM_RUN) ||
 		 (XLogArchivingAlways() && (pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY))) &&
 		PgArchCanRestart())
-		PgArchPID = StartChildProcess(B_ARCHIVER);
+		PgArchPMChild = StartChildProcess(B_ARCHIVER);
 
 	/*
 	 * If we need to start a slot sync worker, try to do that now
@@ -3248,10 +3172,10 @@ LaunchMissingBackgroundProcesses(void)
 	 * configured correctly, and it is the first time of worker's launch, or
 	 * enough time has passed since the worker was launched last.
 	 */
-	if (SlotSyncWorkerPID == 0 && pmState == PM_HOT_STANDBY &&
+	if (SlotSyncWorkerPMChild == NULL && pmState == PM_HOT_STANDBY &&
 		Shutdown <= SmartShutdown && sync_replication_slots &&
 		ValidateSlotSyncParams(LOG) && SlotSyncWorkerCanRestart())
-		SlotSyncWorkerPID = StartChildProcess(B_SLOTSYNC_WORKER);
+		SlotSyncWorkerPMChild = StartChildProcess(B_SLOTSYNC_WORKER);
 
 	/*
 	 * If we need to start a WAL receiver, try to do that now
@@ -3267,23 +3191,23 @@ LaunchMissingBackgroundProcesses(void)
 	 */
 	if (WalReceiverRequested)
 	{
-		if (WalReceiverPID == 0 &&
+		if (WalReceiverPMChild == NULL &&
 			(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
 			 pmState == PM_HOT_STANDBY) &&
 			Shutdown <= SmartShutdown)
 		{
-			WalReceiverPID = StartChildProcess(B_WAL_RECEIVER);
-			if (WalReceiverPID != 0)
+			WalReceiverPMChild = StartChildProcess(B_WAL_RECEIVER);
+			if (WalReceiverPMChild != NULL)
 				WalReceiverRequested = false;
 			/* else leave the flag set, so we'll try again later */
 		}
 	}
 
 	/* If we need to start a WAL summarizer, try to do that now */
-	if (summarize_wal && WalSummarizerPID == 0 &&
+	if (summarize_wal && WalSummarizerPMChild == NULL &&
 		(pmState == PM_RUN || pmState == PM_HOT_STANDBY) &&
 		Shutdown <= SmartShutdown)
-		WalSummarizerPID = StartChildProcess(B_WAL_SUMMARIZER);
+		WalSummarizerPMChild = StartChildProcess(B_WAL_SUMMARIZER);
 
 	/* Get other worker processes running, if needed */
 	if (StartWorkerNeeded || HaveCrashedWorker)
@@ -3307,8 +3231,14 @@ LaunchMissingBackgroundProcesses(void)
  * child twice will not cause any problems.
  */
 static void
-signal_child(pid_t pid, int signal)
+signal_child(PMChild *pmchild, int signal)
 {
+	pid_t		pid;
+
+	if (pmchild == NULL || pmchild->pid == 0)
+		return;
+	pid = pmchild->pid;
+
 	if (kill(pid, signal) < 0)
 		elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) pid, signal);
 #ifdef HAVE_SETSID
@@ -3337,17 +3267,17 @@ signal_child(pid_t pid, int signal)
  * to use SIGABRT to collect per-child core dumps.
  */
 static void
-sigquit_child(pid_t pid)
+sigquit_child(PMChild *pmchild)
 {
 	ereport(DEBUG2,
 			(errmsg_internal("sending %s to process %d",
 							 (send_abort_for_crash ? "SIGABRT" : "SIGQUIT"),
-							 (int) pid)));
-	signal_child(pid, (send_abort_for_crash ? SIGABRT : SIGQUIT));
+							 (int) pmchild->pid)));
+	signal_child(pmchild, (send_abort_for_crash ? SIGABRT : SIGQUIT));
 }
 
 /*
- * Send a signal to the targeted children (but NOT special children).
+ * Send a signal to the targeted children.
  */
 static bool
 SignalChildren(int signal, BackendTypeMask targetMask)
@@ -3355,9 +3285,9 @@ SignalChildren(int signal, BackendTypeMask targetMask)
 	dlist_iter	iter;
 	bool		signaled = false;
 
-	dlist_foreach(iter, &BackendList)
+	dlist_foreach(iter, &ActiveChildList)
 	{
-		Backend    *bp = dlist_container(Backend, elem, iter.cur);
+		PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
 
 		/*
 		 * If we need to distinguish between B_BACKEND and B_WAL_SENDER, check
@@ -3377,7 +3307,7 @@ SignalChildren(int signal, BackendTypeMask targetMask)
 		ereport(DEBUG4,
 				(errmsg_internal("sending signal %d to %s process %d",
 								 signal, GetBackendTypeDesc(bp->bkend_type), (int) bp->pid)));
-		signal_child(bp->pid, signal);
+		signal_child(bp, signal);
 		signaled = true;
 	}
 	return signaled;
@@ -3390,29 +3320,12 @@ SignalChildren(int signal, BackendTypeMask targetMask)
 static void
 TerminateChildren(int signal)
 {
-	SignalChildren(signal, BTYPE_MASK_ALL);
-	if (StartupPID != 0)
+	SignalChildren(signal, btmask_all_except(B_LOGGER));
+	if (StartupPMChild != NULL)
 	{
-		signal_child(StartupPID, signal);
 		if (signal == SIGQUIT || signal == SIGKILL || signal == SIGABRT)
 			StartupStatus = STARTUP_SIGNALED;
 	}
-	if (BgWriterPID != 0)
-		signal_child(BgWriterPID, signal);
-	if (CheckpointerPID != 0)
-		signal_child(CheckpointerPID, signal);
-	if (WalWriterPID != 0)
-		signal_child(WalWriterPID, signal);
-	if (WalReceiverPID != 0)
-		signal_child(WalReceiverPID, signal);
-	if (WalSummarizerPID != 0)
-		signal_child(WalSummarizerPID, signal);
-	if (AutoVacPID != 0)
-		signal_child(AutoVacPID, signal);
-	if (PgArchPID != 0)
-		signal_child(PgArchPID, signal);
-	if (SlotSyncWorkerPID != 0)
-		signal_child(SlotSyncWorkerPID, signal);
 }
 
 /*
@@ -3425,44 +3338,45 @@ TerminateChildren(int signal)
 static int
 BackendStartup(ClientSocket *client_sock)
 {
-	Backend    *bn;				/* for backend cleanup */
+	PMChild    *bn = NULL;
 	pid_t		pid;
 	BackendStartupData startup_data;
+	CAC_state	cac;
 
-	/*
-	 * Create backend data structure.  Better before the fork() so we can
-	 * handle failure cleanly.
-	 */
-	bn = (Backend *) palloc_extended(sizeof(Backend), MCXT_ALLOC_NO_OOM);
+	cac = canAcceptConnections(B_BACKEND);
+	if (cac == CAC_OK)
+	{
+		/* Can change later to B_WAL_SENDER */
+		bn = AssignPostmasterChildSlot(B_BACKEND);
+		if (!bn)
+		{
+			/*
+			 * Too many regular child processes; launch a dead-end child
+			 * process instead.
+			 */
+			cac = CAC_TOOMANY;
+		}
+	}
 	if (!bn)
 	{
-		ereport(LOG,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory")));
-		return STATUS_ERROR;
+		bn = AllocDeadEndChild();
+		if (!bn)
+		{
+			ereport(LOG,
+					(errcode(ERRCODE_OUT_OF_MEMORY),
+					 errmsg("out of memory")));
+			return STATUS_ERROR;
+		}
 	}
 
 	/* Pass down canAcceptConnections state */
-	startup_data.canAcceptConnections = canAcceptConnections(B_BACKEND);
+	startup_data.canAcceptConnections = cac;
 	bn->rw = NULL;
 
-	/*
-	 * Unless it's a dead_end child, assign it a child slot number
-	 */
-	if (startup_data.canAcceptConnections == CAC_OK)
-	{
-		bn->bkend_type = B_BACKEND; /* Can change later to B_WAL_SENDER */
-		bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
-	}
-	else
-	{
-		bn->bkend_type = B_DEAD_END_BACKEND;
-		bn->child_slot = 0;
-	}
-
 	/* Hasn't asked to be notified about any bgworkers yet */
 	bn->bgworker_notify = false;
 
+	MyPMChildSlot = bn->child_slot;
 	pid = postmaster_child_launch(bn->bkend_type,
 								  (char *) &startup_data, sizeof(startup_data),
 								  client_sock);
@@ -3471,9 +3385,7 @@ BackendStartup(ClientSocket *client_sock)
 		/* in parent, fork failed */
 		int			save_errno = errno;
 
-		if (bn->child_slot != 0)
-			(void) ReleasePostmasterChildSlot(bn->child_slot);
-		pfree(bn);
+		(void) FreePostmasterChildSlot(bn);
 		errno = save_errno;
 		ereport(LOG,
 				(errmsg("could not fork new process for connection: %m")));
@@ -3492,8 +3404,6 @@ BackendStartup(ClientSocket *client_sock)
 	 * of backends.
 	 */
 	bn->pid = pid;
-	dlist_push_head(&BackendList, &bn->elem);
-
 	return STATUS_OK;
 }
 
@@ -3591,9 +3501,9 @@ process_pm_pmsignal(void)
 		 * Start the archiver if we're responsible for (re-)archiving received
 		 * files.
 		 */
-		Assert(PgArchPID == 0);
+		Assert(PgArchPMChild == NULL);
 		if (XLogArchivingAlways())
-			PgArchPID = StartChildProcess(B_ARCHIVER);
+			PgArchPMChild = StartChildProcess(B_ARCHIVER);
 
 		/*
 		 * If we aren't planning to enter hot standby mode later, treat
@@ -3639,16 +3549,16 @@ process_pm_pmsignal(void)
 	}
 
 	/* Tell syslogger to rotate logfile if requested */
-	if (SysLoggerPID != 0)
+	if (SysLoggerPMChild != NULL)
 	{
 		if (CheckLogrotateSignal())
 		{
-			signal_child(SysLoggerPID, SIGUSR1);
+			signal_child(SysLoggerPMChild, SIGUSR1);
 			RemoveLogrotateSignalFiles();
 		}
 		else if (CheckPostmasterSignal(PMSIGNAL_ROTATE_LOGFILE))
 		{
-			signal_child(SysLoggerPID, SIGUSR1);
+			signal_child(SysLoggerPMChild, SIGUSR1);
 		}
 	}
 
@@ -3695,7 +3605,7 @@ process_pm_pmsignal(void)
 		PostmasterStateMachine();
 	}
 
-	if (StartupPID != 0 &&
+	if (StartupPMChild != NULL &&
 		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
 		 pmState == PM_HOT_STANDBY) &&
 		CheckPromoteSignal())
@@ -3706,7 +3616,7 @@ process_pm_pmsignal(void)
 		 * Leave the promote signal file in place and let the Startup process
 		 * do the unlink.
 		 */
-		signal_child(StartupPID, SIGUSR2);
+		signal_child(StartupPMChild, SIGUSR2);
 	}
 }
 
@@ -3733,9 +3643,9 @@ CountChildren(BackendTypeMask targetMask)
 	dlist_iter	iter;
 	int			cnt = 0;
 
-	dlist_foreach(iter, &BackendList)
+	dlist_foreach(iter, &ActiveChildList)
 	{
-		Backend    *bp = dlist_container(Backend, elem, iter.cur);
+		PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
 
 		/*
 		 * If we need to distinguish between B_BACKEND and B_WAL_SENDER, check
@@ -3752,6 +3662,10 @@ CountChildren(BackendTypeMask targetMask)
 		if (!btmask_contains(targetMask, bp->bkend_type))
 			continue;
 
+		ereport(DEBUG4,
+				(errmsg_internal("%s process %d is still running",
+								 GetBackendTypeDesc(bp->bkend_type), (int) bp->pid)));
+
 		cnt++;
 	}
 	return cnt;
@@ -3764,18 +3678,36 @@ CountChildren(BackendTypeMask targetMask)
  * "type" determines what kind of child will be started.  All child types
  * initially go to AuxiliaryProcessMain, which will handle common setup.
  *
- * Return value of StartChildProcess is subprocess' PID, or 0 if failed
- * to start subprocess.
+ * Return value of StartChildProcess is subprocess' PMChild entry, or NULL on
+ * failure.
  */
-static pid_t
+static PMChild *
 StartChildProcess(BackendType type)
 {
+	PMChild    *pmchild;
 	pid_t		pid;
 
+	pmchild = AssignPostmasterChildSlot(type);
+	if (!pmchild)
+	{
+		if (type == B_AUTOVAC_WORKER)
+			ereport(LOG,
+					(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+					 errmsg("no slot available for new autovacuum worker process")));
+		else
+		{
+			/* shouldn't happen because we allocate enough slots */
+			elog(LOG, "no postmaster child slot available for aux process");
+		}
+		return NULL;
+	}
+
+	MyPMChildSlot = pmchild->child_slot;
 	pid = postmaster_child_launch(type, NULL, 0, NULL);
 	if (pid < 0)
 	{
 		/* in parent, fork failed */
+		FreePostmasterChildSlot(pmchild);
 		ereport(LOG,
 				(errmsg("could not fork \"%s\" process: %m", PostmasterChildName(type))));
 
@@ -3785,13 +3717,31 @@ StartChildProcess(BackendType type)
 		 */
 		if (type == B_STARTUP)
 			ExitPostmaster(1);
-		return 0;
+		return NULL;
 	}
 
-	/*
-	 * in parent, successful fork
-	 */
-	return pid;
+	/* in parent, successful fork */
+	pmchild->pid = pid;
+	return pmchild;
+}
+
+/*
+ * StartSysLogger -- start the syslogger process
+ */
+void
+StartSysLogger(void)
+{
+	Assert(SysLoggerPMChild == NULL);
+
+	SysLoggerPMChild = AssignPostmasterChildSlot(B_LOGGER);
+	if (!SysLoggerPMChild)
+		elog(PANIC, "no postmaster child slot available for syslogger");
+	SysLoggerPMChild->pid = SysLogger_Start(SysLoggerPMChild->child_slot);
+	if (SysLoggerPMChild->pid == 0)
+	{
+		FreePostmasterChildSlot(SysLoggerPMChild);
+		SysLoggerPMChild = NULL;
+	}
 }
 
 /*
@@ -3806,7 +3756,7 @@ StartChildProcess(BackendType type)
 static void
 StartAutovacuumWorker(void)
 {
-	Backend    *bn;
+	PMChild    *bn;
 
 	/*
 	 * If not in condition to run a process, don't try, but handle it like a
@@ -3817,34 +3767,20 @@ StartAutovacuumWorker(void)
 	 */
 	if (canAcceptConnections(B_AUTOVAC_WORKER) == CAC_OK)
 	{
-		bn = (Backend *) palloc_extended(sizeof(Backend), MCXT_ALLOC_NO_OOM);
+		bn = StartChildProcess(B_AUTOVAC_WORKER);
 		if (bn)
 		{
-			/* Autovac workers need a child slot */
-			bn->bkend_type = B_AUTOVAC_WORKER;
-			bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
 			bn->bgworker_notify = false;
 			bn->rw = NULL;
-
-			bn->pid = StartChildProcess(B_AUTOVAC_WORKER);
-			if (bn->pid > 0)
-			{
-				dlist_push_head(&BackendList, &bn->elem);
-				/* all OK */
-				return;
-			}
-
+			return;
+		}
+		else
+		{
 			/*
 			 * fork failed, fall through to report -- actual error message was
 			 * logged by StartChildProcess
 			 */
-			(void) ReleasePostmasterChildSlot(bn->child_slot);
-			pfree(bn);
 		}
-		else
-			ereport(LOG,
-					(errcode(ERRCODE_OUT_OF_MEMORY),
-					 errmsg("out of memory")));
 	}
 
 	/*
@@ -3856,7 +3792,7 @@ StartAutovacuumWorker(void)
 	 * quick succession between the autovac launcher and postmaster in case
 	 * things get ugly.
 	 */
-	if (AutoVacPID != 0)
+	if (AutoVacLauncherPMChild != NULL)
 	{
 		AutoVacWorkerFailed();
 		avlauncher_needs_signal = true;
@@ -3900,23 +3836,6 @@ CreateOptsFile(int argc, char *argv[], char *fullprogname)
 }
 
 
-/*
- * MaxLivePostmasterChildren
- *
- * This reports the number of entries needed in the per-child-process array
- * (PMChildFlags).  It includes regular backends, autovac workers, walsenders
- * and background workers, but not special children nor dead_end children.
- * This allows the array to have a fixed maximum size, to wit the same
- * too-many-children limit enforced by canAcceptConnections().  The exact value
- * isn't too critical as long as it's more than MaxBackends.
- */
-int
-MaxLivePostmasterChildren(void)
-{
-	return 2 * (MaxConnections + autovacuum_max_workers + 1 +
-				max_wal_senders + max_worker_processes);
-}
-
 /*
  * Start a new bgworker.
  * Starting time conditions must have been checked already.
@@ -3929,7 +3848,7 @@ MaxLivePostmasterChildren(void)
 static bool
 do_start_bgworker(RegisteredBgWorker *rw)
 {
-	Backend    *bn;
+	PMChild    *bn;
 	pid_t		worker_pid;
 
 	Assert(rw->rw_pid == 0);
@@ -3956,6 +3875,7 @@ do_start_bgworker(RegisteredBgWorker *rw)
 			(errmsg_internal("starting background worker process \"%s\"",
 							 rw->rw_worker.bgw_name)));
 
+	MyPMChildSlot = bn->child_slot;
 	worker_pid = postmaster_child_launch(B_BG_WORKER, (char *) &rw->rw_worker, sizeof(BackgroundWorker), NULL);
 	if (worker_pid == -1)
 	{
@@ -3963,8 +3883,7 @@ do_start_bgworker(RegisteredBgWorker *rw)
 		ereport(LOG,
 				(errmsg("could not fork background worker process: %m")));
 		/* undo what assign_backendlist_entry did */
-		ReleasePostmasterChildSlot(bn->child_slot);
-		pfree(bn);
+		FreePostmasterChildSlot(bn);
 
 		/* mark entry as crashed, so we'll try again later */
 		rw->rw_crashed_at = GetCurrentTimestamp();
@@ -3975,8 +3894,6 @@ do_start_bgworker(RegisteredBgWorker *rw)
 	rw->rw_pid = worker_pid;
 	bn->pid = rw->rw_pid;
 	ReportBackgroundWorkerPID(rw);
-	/* add new worker to lists of backends */
-	dlist_push_head(&BackendList, &bn->elem);
 	return true;
 }
 
@@ -4024,17 +3941,13 @@ bgworker_should_start_now(BgWorkerStartTime start_time)
  *
  * On failure, return NULL.
  */
-static Backend *
+static PMChild *
 assign_backendlist_entry(void)
 {
-	Backend    *bn;
+	PMChild    *bn;
 
-	/*
-	 * Check that database state allows another connection.  Currently the
-	 * only possible failure is CAC_TOOMANY, so we just log an error message
-	 * based on that rather than checking the error code precisely.
-	 */
-	if (canAcceptConnections(B_BG_WORKER) != CAC_OK)
+	bn = AssignPostmasterChildSlot(B_BG_WORKER);
+	if (bn == NULL)
 	{
 		ereport(LOG,
 				(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
@@ -4042,16 +3955,6 @@ assign_backendlist_entry(void)
 		return NULL;
 	}
 
-	bn = palloc_extended(sizeof(Backend), MCXT_ALLOC_NO_OOM);
-	if (bn == NULL)
-	{
-		ereport(LOG,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory")));
-		return NULL;
-	}
-
-	bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
 	bn->bkend_type = B_BG_WORKER;
 	bn->bgworker_notify = false;
 
@@ -4192,11 +4095,11 @@ bool
 PostmasterMarkPIDForWorkerNotify(int pid)
 {
 	dlist_iter	iter;
-	Backend    *bp;
+	PMChild    *bp;
 
-	dlist_foreach(iter, &BackendList)
+	dlist_foreach(iter, &ActiveChildList)
 	{
-		bp = dlist_container(Backend, elem, iter.cur);
+		bp = dlist_container(PMChild, elem, iter.cur);
 		if (bp->pid == pid)
 		{
 			bp->bgworker_notify = true;
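The rewritten PostmasterStateMachine code describes "which children must be gone" as a BackendTypeMask. Its USE_ASSERT_CHECKING block then checks that every backend type is either in targetMask or explicitly allowed to keep running. The following is a minimal standalone sketch of that coverage check. It assumes a trimmed-down, hypothetical enum in place of BackendType and reimplements the mask helpers under hypothetical names (tm_mask_add and friends) instead of using PostgreSQL's headers. In the sketch, only regular backends are targeted, plus the checkpointer on crash or immediate shutdown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for PostgreSQL's BackendType enum. */
typedef enum
{
	TM_INVALID = 0,
	TM_BACKEND,
	TM_WAL_SENDER,
	TM_DEAD_END_BACKEND,
	TM_CHECKPOINTER,
	TM_ARCHIVER,
	TM_LOGGER,
	TM_NUM_TYPES				/* must stay last */
} tm_backend_type;

/* One bit per backend type, modelled on the patch's BackendTypeMask. */
typedef struct
{
	uint32_t	mask;
} tm_type_mask;

static tm_type_mask
tm_mask_none(void)
{
	tm_type_mask m = {0};

	return m;
}

static tm_type_mask
tm_mask_all(void)
{
	tm_type_mask m = {(UINT32_C(1) << TM_NUM_TYPES) - 1};

	return m;
}

static tm_type_mask
tm_mask_add(tm_type_mask m, tm_backend_type t)
{
	m.mask |= UINT32_C(1) << t;
	return m;
}

static tm_type_mask
tm_mask_all_except(tm_backend_type t)
{
	tm_type_mask m = tm_mask_all();

	m.mask &= ~(UINT32_C(1) << t);
	return m;
}

static bool
tm_mask_contains(tm_type_mask m, tm_backend_type t)
{
	return (m.mask & (UINT32_C(1) << t)) != 0;
}

/* Children to signal and wait for; the checkpointer only on crash/immediate. */
static tm_type_mask
tm_shutdown_targets(bool fatal_or_immediate)
{
	tm_type_mask target = tm_mask_add(tm_mask_none(), TM_BACKEND);

	if (fatal_or_immediate)
		target = tm_mask_add(target, TM_CHECKPOINTER);
	return target;
}

/* Every type must be either targeted or explicitly allowed to remain. */
static bool
tm_covers_all_types(tm_type_mask target)
{
	tm_type_mask remain = tm_mask_none();

	remain = tm_mask_add(remain, TM_WAL_SENDER);
	remain = tm_mask_add(remain, TM_ARCHIVER);
	remain = tm_mask_add(remain, TM_DEAD_END_BACKEND);
	remain = tm_mask_add(remain, TM_LOGGER);
	remain = tm_mask_add(remain, TM_CHECKPOINTER);	/* may or may not be targeted */
	remain = tm_mask_add(remain, TM_INVALID);	/* not a real child */

	return (remain.mask | target.mask) == tm_mask_all().mask;
}
```

With the "allowed to remain" list kept next to the assertion, adding a new backend type without deciding how it shuts down trips the assertion. Without that check, the process would silently be left running.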
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 7951599fa8..7ca24c6663 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -590,7 +590,7 @@ SysLoggerMain(char *startup_data, size_t startup_data_len)
  * Postmaster subroutine to start a syslogger subprocess.
  */
 int
-SysLogger_Start(void)
+SysLogger_Start(int child_slot)
 {
 	pid_t		sysloggerPid;
 	char	   *filename;
@@ -598,8 +598,7 @@ SysLogger_Start(void)
 	SysloggerStartupData startup_data;
 #endif							/* EXEC_BACKEND */
 
-	if (!Logging_collector)
-		return 0;
+	Assert(Logging_collector);
 
 	/*
 	 * If first time through, create the pipe which will receive stderr
@@ -695,6 +694,7 @@ SysLogger_Start(void)
 		pfree(filename);
 	}
 
+	MyPMChildSlot = child_slot;
 #ifdef EXEC_BACKEND
 	startup_data.syslogFile = syslogger_fdget(syslogFile);
 	startup_data.csvlogFile = syslogger_fdget(csvlogFile);
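StartSysLogger and StartChildProcess now follow the same order. They reserve a PMChild slot, launch the process with that slot number (SysLogger_Start now takes child_slot as an argument), and free the slot again if the launch fails. The sketch below shows only that failure-path bookkeeping. The slot array, the launcher callback type and the fake launchers are all hypothetical. A launcher returning 0 stands for a failed start, matching how the patch checks SysLogger_Start's return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define ST_NUM_SLOTS 4

/* Hypothetical stand-in for a PMChild entry. */
typedef struct
{
	pid_t		pid;			/* 0 until the process has been launched */
	int			child_slot;		/* 1-based slot number */
	bool		in_use;
} st_child;

static st_child st_children[ST_NUM_SLOTS];

/* Launcher callback: returns the new PID, or 0 if the start failed. */
typedef pid_t (*st_launch_fn) (int child_slot);

static st_child *
st_assign_slot(void)
{
	for (int i = 0; i < ST_NUM_SLOTS; i++)
	{
		if (!st_children[i].in_use)
		{
			st_children[i].in_use = true;
			st_children[i].child_slot = i + 1;
			st_children[i].pid = 0;
			return &st_children[i];
		}
	}
	return NULL;
}

static void
st_free_slot(st_child *child)
{
	child->in_use = false;
	child->pid = 0;
}

/*
 * Reserve the slot first, pass its number to the launcher, and give the
 * slot back if the launch fails, so a failed start never leaks a slot.
 */
static st_child *
st_start_child(st_launch_fn launch)
{
	st_child   *child = st_assign_slot();

	if (child == NULL)
		return NULL;			/* no free slot */
	child->pid = launch(child->child_slot);
	if (child->pid == 0)
	{
		st_free_slot(child);
		return NULL;
	}
	return child;
}

static int
st_slots_in_use(void)
{
	int			n = 0;

	for (int i = 0; i < ST_NUM_SLOTS; i++)
		n += st_children[i].in_use ? 1 : 0;
	return n;
}

/* Fake launchers, used only to exercise the sketch. */
static pid_t
st_fake_launch_ok(int child_slot)
{
	return (pid_t) (1000 + child_slot);
}

static pid_t
st_fake_launch_fail(int child_slot)
{
	(void) child_slot;
	return 0;
}
```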
diff --git a/src/backend/storage/ipc/pmsignal.c b/src/backend/storage/ipc/pmsignal.c
index c801e9bec5..929ab570c1 100644
--- a/src/backend/storage/ipc/pmsignal.c
+++ b/src/backend/storage/ipc/pmsignal.c
@@ -47,11 +47,11 @@
  * exited without performing proper shutdown.  The per-child-process flags
  * have three possible states: UNUSED, ASSIGNED, ACTIVE.  An UNUSED slot is
  * available for assignment.  An ASSIGNED slot is associated with a postmaster
- * child process, but either the process has not touched shared memory yet,
- * or it has successfully cleaned up after itself.  A ACTIVE slot means the
- * process is actively using shared memory.  The slots are assigned to
- * child processes at random, and postmaster.c is responsible for tracking
- * which one goes with which PID.
+ * child process, but either the process has not touched shared memory yet, or
+ * it has successfully cleaned up after itself.  An ACTIVE slot means the
+ * process is actively using shared memory.  The slots are assigned to child
+ * processes by postmaster, and pmchild.c is responsible for tracking which
+ * one goes with which PID.
  *
  * Actually there is a fourth state, WALSENDER.  This is just like ACTIVE,
  * but carries the extra information that the child is a WAL sender.
@@ -84,13 +84,11 @@ struct PMSignalData
 NON_EXEC_STATIC volatile PMSignalData *PMSignalState = NULL;
 
 /*
- * These static variables are valid only in the postmaster.  We keep a
- * duplicative private array so that we can trust its state even if some
- * failing child has clobbered the PMSignalData struct in shared memory.
+ * Local copy of PMSignalState->num_child_flags, only valid in the
+ * postmaster.  Postmaster keeps a local copy so that it doesn't need to
+ * trust the value in shared memory.
  */
-static int	num_child_inuse;	/* # of entries in PMChildInUse[] */
-static int	next_child_inuse;	/* next slot to try to assign */
-static bool *PMChildInUse;		/* true if i'th flag slot is assigned */
+static int	num_child_flags;
 
 /*
  * Signal handler to be notified if postmaster dies.
@@ -155,25 +153,8 @@ PMSignalShmemInit(void)
 	{
 		/* initialize all flags to zeroes */
 		MemSet(unvolatize(PMSignalData *, PMSignalState), 0, PMSignalShmemSize());
-		num_child_inuse = MaxLivePostmasterChildren();
-		PMSignalState->num_child_flags = num_child_inuse;
-
-		/*
-		 * Also allocate postmaster's private PMChildInUse[] array.  We
-		 * might've already done that in a previous shared-memory creation
-		 * cycle, in which case free the old array to avoid a leak.  (Do it
-		 * like this to support the possibility that MaxLivePostmasterChildren
-		 * changed.)  In a standalone backend, we do not need this.
-		 */
-		if (PostmasterContext != NULL)
-		{
-			if (PMChildInUse)
-				pfree(PMChildInUse);
-			PMChildInUse = (bool *)
-				MemoryContextAllocZero(PostmasterContext,
-									   num_child_inuse * sizeof(bool));
-		}
-		next_child_inuse = 0;
+		num_child_flags = MaxLivePostmasterChildren();
+		PMSignalState->num_child_flags = num_child_flags;
 	}
 }
 
@@ -239,41 +220,22 @@ GetQuitSignalReason(void)
 
 
 /*
- * AssignPostmasterChildSlot - select an unused slot for a new postmaster
- * child process, and set its state to ASSIGNED.  Returns a slot number
- * (one to N).
+ * ReservePostmasterChildSlot - mark the given slot as ASSIGNED for a new
+ * postmaster child process.
  *
  * Only the postmaster is allowed to execute this routine, so we need no
  * special locking.
  */
-int
-AssignPostmasterChildSlot(void)
+void
+ReservePostmasterChildSlot(int slot)
 {
-	int			slot = next_child_inuse;
-	int			n;
+	Assert(slot > 0 && slot <= num_child_flags);
+	slot--;
 
-	/*
-	 * Scan for a free slot.  Notice that we trust nothing about the contents
-	 * of PMSignalState, but use only postmaster-local data for this decision.
-	 * We track the last slot assigned so as not to waste time repeatedly
-	 * rescanning low-numbered slots.
-	 */
-	for (n = num_child_inuse; n > 0; n--)
-	{
-		if (--slot < 0)
-			slot = num_child_inuse - 1;
-		if (!PMChildInUse[slot])
-		{
-			PMChildInUse[slot] = true;
-			PMSignalState->PMChildFlags[slot] = PM_CHILD_ASSIGNED;
-			next_child_inuse = slot;
-			return slot + 1;
-		}
-	}
+	if (PMSignalState->PMChildFlags[slot] != PM_CHILD_UNUSED)
+		elog(FATAL, "postmaster child slot is already in use");
 
-	/* Out of slots ... should never happen, else postmaster.c messed up */
-	elog(FATAL, "no free slots in PMChildFlags array");
-	return 0;					/* keep compiler quiet */
+	PMSignalState->PMChildFlags[slot] = PM_CHILD_ASSIGNED;
 }
 
 /*
@@ -288,7 +250,7 @@ ReleasePostmasterChildSlot(int slot)
 {
 	bool		result;
 
-	Assert(slot > 0 && slot <= num_child_inuse);
+	Assert(slot > 0 && slot <= num_child_flags);
 	slot--;
 
 	/*
@@ -298,7 +260,6 @@ ReleasePostmasterChildSlot(int slot)
 	 */
 	result = (PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED);
 	PMSignalState->PMChildFlags[slot] = PM_CHILD_UNUSED;
-	PMChildInUse[slot] = false;
 	return result;
 }
 
@@ -309,7 +270,7 @@ ReleasePostmasterChildSlot(int slot)
 bool
 IsPostmasterChildWalSender(int slot)
 {
-	Assert(slot > 0 && slot <= num_child_inuse);
+	Assert(slot > 0 && slot <= num_child_flags);
 	slot--;
 
 	if (PMSignalState->PMChildFlags[slot] == PM_CHILD_WALSENDER)
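The pmsignal.c comment describes the per-child flags as UNUSED, ASSIGNED and ACTIVE, plus WALSENDER. The following is a small sketch of that protocol, assuming a plain static array in place of PMSignalState->PMChildFlags:

1. The postmaster reserves a 1-based slot.
2. The child marks the slot ACTIVE before touching shared memory, and back to ASSIGNED after cleaning up.
3. When the postmaster releases the slot, the result tells it whether the child cleaned up.

The function names are hypothetical. Where ReservePostmasterChildSlot raises FATAL, the sketch returns false instead.

```c
#include <assert.h>
#include <stdbool.h>

/* The per-child states described in the pmsignal.c header comment. */
typedef enum
{
	CF_CHILD_UNUSED = 0,
	CF_CHILD_ASSIGNED,
	CF_CHILD_ACTIVE,
	CF_CHILD_WALSENDER
} cf_child_flag;

#define CF_NUM_CHILD_FLAGS 8

/* Stand-in for the PMChildFlags array in shared memory. */
static cf_child_flag cf_child_flags[CF_NUM_CHILD_FLAGS];

/* Postmaster: mark a 1-based slot ASSIGNED; false if it was already taken. */
static bool
cf_reserve_slot(int slot)
{
	assert(slot > 0 && slot <= CF_NUM_CHILD_FLAGS);
	slot--;
	if (cf_child_flags[slot] != CF_CHILD_UNUSED)
		return false;
	cf_child_flags[slot] = CF_CHILD_ASSIGNED;
	return true;
}

/* Child: about to start using shared memory in earnest. */
static void
cf_mark_active(int slot)
{
	assert(cf_child_flags[slot - 1] == CF_CHILD_ASSIGNED);
	cf_child_flags[slot - 1] = CF_CHILD_ACTIVE;
}

/* Child: has cleaned up after itself during an orderly exit. */
static void
cf_mark_inactive(int slot)
{
	assert(cf_child_flags[slot - 1] == CF_CHILD_ACTIVE);
	cf_child_flags[slot - 1] = CF_CHILD_ASSIGNED;
}

/* Postmaster: free the slot; true if the child had cleaned up (ASSIGNED). */
static bool
cf_release_slot(int slot)
{
	bool		clean;

	assert(slot > 0 && slot <= CF_NUM_CHILD_FLAGS);
	slot--;
	clean = (cf_child_flags[slot] == CF_CHILD_ASSIGNED);
	cf_child_flags[slot] = CF_CHILD_UNUSED;
	return clean;
}

/* Walk one child through its lifetime and report what the postmaster sees. */
static bool
cf_simulate_child(int slot, bool exits_cleanly)
{
	if (!cf_reserve_slot(slot))
		return false;
	cf_mark_active(slot);
	if (exits_cleanly)
		cf_mark_inactive(slot);
	return cf_release_slot(slot);
}
```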
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index eaf3916f28..8b7c1fafc5 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -357,14 +357,9 @@ InitProcess(void)
 	/*
 	 * Before we start accessing the shared memory in a serious way, mark
 	 * ourselves as an active postmaster child; this is so that the postmaster
-	 * can detect it if we exit without cleaning up.  (XXX autovac launcher
-	 * currently doesn't participate in this; it probably should.)
-	 *
-	 * Slot sync worker also does not participate in it, see comments atop
-	 * 'struct bkend' in postmaster.c.
+	 * can detect it if we exit without cleaning up.
 	 */
-	if (IsUnderPostmaster && !AmAutoVacuumLauncherProcess() &&
-		!AmLogicalSlotSyncWorkerProcess())
+	if (IsUnderPostmaster)
 		RegisterPostmasterChildActive();
 
 	/* Decide which list should supply our PGPROC. */
@@ -582,6 +577,9 @@ InitAuxiliaryProcess(void)
 	if (MyProc != NULL)
 		elog(ERROR, "you already exist");
 
+	if (IsUnderPostmaster)
+		RegisterPostmasterChildActive();
+
 	/*
 	 * We use the ProcStructLock to protect assignment and releasing of
 	 * AuxiliaryProcs entries.
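With the exemptions removed, InitProcess and InitAuxiliaryProcess register every child started under the postmaster as active, including the autovacuum launcher and the slot sync worker. An unclean exit from any of them can therefore now be detected through its slot. The hypothetical helper below sketches how that slot check might combine with the exit status when deciding whether crash recovery is needed. It assumes that exit status 0 (normal) or 1 (FATAL) counts as an orderly exit, which is how regular backends have traditionally been treated. Auxiliary processes may be held to a stricter rule in the real postmaster.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical decision helper: does this child's exit require crash
 * recovery?  exit_code is meaningful only if the child was not killed by a
 * signal.  slot_released_cleanly is the result of releasing the child's
 * PMChildFlags slot (true if the slot was back in the ASSIGNED state).
 */
static bool
cr_child_crashed(bool killed_by_signal, int exit_code,
				 bool slot_released_cleanly)
{
	bool		crashed;

	/* Exit status 0 (normal) or 1 (FATAL) is treated as an orderly exit */
	crashed = killed_by_signal || (exit_code != 0 && exit_code != 1);

	/* A child that exits while still ACTIVE did not clean up after itself */
	if (!slot_released_cleanly)
		crashed = true;
	return crashed;
}
```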
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index 63c12917cf..5a40aee3ce 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -13,8 +13,44 @@
 #ifndef _POSTMASTER_H
 #define _POSTMASTER_H
 
+#include "lib/ilist.h"
 #include "miscadmin.h"
 
+/*
+ * A struct representing an active postmaster child process.  This is used
+ * mainly to keep track of how many children we have and send them appropriate
+ * signals when necessary.  All postmaster child processes are assigned a
+ * PMChild entry. That includes "normal" client sessions, but also autovacuum
+ * workers, walsenders, background workers, and aux processes.  (Note that at
+ * the time of launch, walsenders are labeled B_BACKEND; we relabel them to
+ * B_WAL_SENDER upon noticing they've changed their PMChildFlags entry.  Hence
+ * that check must be done before any operation that needs to distinguish
+ * walsenders from normal backends.)
+ *
+ * "dead_end" children are also allocated a PMChild entry: these are children
+ * launched just for the purpose of sending a friendly rejection message to a
+ * would-be client.  We must track them because they are attached to shared
+ * memory, but we know they will never become live backends.
+ *
+ * 'child_slot' is an identifier that is unique across all running child
+ * processes.  It is used as an index into the PMChildFlags array. dead_end
+ * children are not assigned a child_slot and have child_slot == 0 (valid
+ * child_slot ids start from 1).
+ */
+typedef struct
+{
+	pid_t		pid;			/* process id of backend */
+	int			child_slot;		/* PMChildSlot for this backend, if any */
+	BackendType bkend_type;		/* child process flavor, see above */
+	struct RegisteredBgWorker *rw;	/* bgworker info, if this is a bgworker */
+	bool		bgworker_notify;	/* gets bgworker start/stop notifications */
+	dlist_node	elem;			/* list link in BackendList */
+} PMChild;
+
+#ifdef EXEC_BACKEND
+extern int	num_pmchild_slots;
+#endif
+
 /* GUC options */
 extern PGDLLIMPORT bool EnableSSL;
 extern PGDLLIMPORT int SuperuserReservedConnections;
@@ -80,6 +116,15 @@ const char *PostmasterChildName(BackendType child_type);
 extern void SubPostmasterMain(int argc, char *argv[]) pg_attribute_noreturn();
 #endif
 
+/* prototypes for functions in pmchild.c */
+extern dlist_head ActiveChildList;
+
+extern void InitPostmasterChildSlots(void);
+extern PMChild *AssignPostmasterChildSlot(BackendType btype);
+extern bool FreePostmasterChildSlot(PMChild *pmchild);
+extern PMChild *FindPostmasterChildByPid(int pid);
+extern PMChild *AllocDeadEndChild(void);
+
 /*
  * Note: MAX_BACKENDS is limited to 2^18-1 because that's the width reserved
  * for buffer references in buf_internals.h.  This limitation could be lifted
diff --git a/src/include/postmaster/syslogger.h b/src/include/postmaster/syslogger.h
index 94ea263f2b..27bd16ae1d 100644
--- a/src/include/postmaster/syslogger.h
+++ b/src/include/postmaster/syslogger.h
@@ -86,7 +86,7 @@ extern PGDLLIMPORT HANDLE syslogPipe[2];
 #endif
 
 
-extern int	SysLogger_Start(void);
+extern int	SysLogger_Start(int child_slot);
 
 extern void write_syslogger_file(const char *buffer, int count, int destination);
 
diff --git a/src/include/storage/pmsignal.h b/src/include/storage/pmsignal.h
index ce4620af1f..4a32f7fed0 100644
--- a/src/include/storage/pmsignal.h
+++ b/src/include/storage/pmsignal.h
@@ -70,7 +70,7 @@ extern void SendPostmasterSignal(PMSignalReason reason);
 extern bool CheckPostmasterSignal(PMSignalReason reason);
 extern void SetQuitSignalReason(QuitSignalReason reason);
 extern QuitSignalReason GetQuitSignalReason(void);
-extern int	AssignPostmasterChildSlot(void);
+extern void ReservePostmasterChildSlot(int slot);
 extern bool ReleasePostmasterChildSlot(int slot);
 extern bool IsPostmasterChildWalSender(int slot);
 extern void RegisterPostmasterChildActive(void);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6f98fbc6f5..6d77a9fc17 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -230,7 +230,6 @@ BTWriteState
 BUF_MEM
 BYTE
 BY_HANDLE_FILE_INFORMATION
-Backend
 BackendParameters
 BackendStartupData
 BackendState
@@ -1931,6 +1930,8 @@ PLyTransformToOb
 PLyTupleToOb
 PLyUnicode_FromStringAndSize_t
 PLy_elog_impl_t
+PMChild
+PMChildPool
 PMINIDUMP_CALLBACK_INFORMATION
 PMINIDUMP_EXCEPTION_INFORMATION
 PMINIDUMP_USER_STREAM_INFORMATION
-- 
2.39.5
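The PMChild comment in the patch above says that every postmaster child gets an entry, that valid child_slot ids start from 1, and that dead-end children get no slot (child_slot == 0). Below is a minimal sketch of such a slot pool, assuming a fixed pool size and a linear scan for brevity. The names `SketchChild`, `assign_slot`, `free_slot` and `alloc_dead_end` are hypothetical; this is not the actual pmchild.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

#define NUM_SLOTS 4             /* hypothetical pool size */

typedef struct SketchChild
{
    pid_t       pid;
    int         child_slot;     /* 1..NUM_SLOTS, or 0 for dead-end children */
    bool        in_use;
} SketchChild;

static SketchChild pool[NUM_SLOTS];

/* Number the slots once; ids start from 1 so that 0 can mean "no slot". */
static void
init_slots(void)
{
    for (int i = 0; i < NUM_SLOTS; i++)
    {
        pool[i].pid = 0;
        pool[i].child_slot = i + 1;
        pool[i].in_use = false;
    }
}

/* Return a free entry, or NULL if every slot is taken. */
static SketchChild *
assign_slot(void)
{
    for (int i = 0; i < NUM_SLOTS; i++)
    {
        if (!pool[i].in_use)
        {
            pool[i].in_use = true;
            return &pool[i];
        }
    }
    return NULL;
}

/* Release a pooled entry; returns false for dead-end or unused entries. */
static bool
free_slot(SketchChild *child)
{
    if (child->child_slot == 0 || !child->in_use)
        return false;
    child->in_use = false;
    child->pid = 0;
    return true;
}

/* Dead-end children are tracked, but never consume a slot (child_slot == 0). */
static SketchChild *
alloc_dead_end(void)
{
    SketchChild *c = calloc(1, sizeof(SketchChild));

    if (c != NULL)
        c->in_use = true;
    return c;
}
```

A freed slot is handed out again on the next assignment, so slot ids stay within 1..NUM_SLOTS no matter how many children come and go.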

0004-Pass-MyPMChildSlot-as-an-explicit-argument-to-child-.patch (text/x-patch)

From 935a3708fae9bc721ea23ebb5955705252ab4b37 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 9 Oct 2024 22:15:41 +0300
Subject: [PATCH 4/4] Pass MyPMChildSlot as an explicit argument to child
 process

All the other global variables passed from postmaster to child have
the same value in all the processes, while MyPMChildSlot is more like
a parameter to each child process.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi
---
 src/backend/postmaster/launch_backend.c | 32 ++++++++++++++++---------
 src/backend/postmaster/postmaster.c     | 10 ++++----
 src/backend/postmaster/syslogger.c      |  7 +++---
 src/include/postmaster/postmaster.h     |  1 +
 4 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/src/backend/postmaster/launch_backend.c b/src/backend/postmaster/launch_backend.c
index 02755b6448..20c8d1e038 100644
--- a/src/backend/postmaster/launch_backend.c
+++ b/src/backend/postmaster/launch_backend.c
@@ -96,7 +96,6 @@ typedef int InheritableSocket;
 typedef struct
 {
 	char		DataDir[MAXPGPATH];
-	int			MyPMChildSlot;
 #ifndef WIN32
 	unsigned long UsedShmemSegID;
 #else
@@ -138,6 +137,8 @@ typedef struct
 	char		my_exec_path[MAXPGPATH];
 	char		pkglib_path[MAXPGPATH];
 
+	int			MyPMChildSlot;
+
 	/*
 	 * These are only used by backend processes, but are here because passing
 	 * a socket needs some special handling on Windows. 'client_sock' is an
@@ -159,13 +160,16 @@ typedef struct
 static void read_backend_variables(char *id, char **startup_data, size_t *startup_data_len);
 static void restore_backend_variables(BackendParameters *param);
 
-static bool save_backend_variables(BackendParameters *param, ClientSocket *client_sock,
+static bool save_backend_variables(BackendParameters *param, int child_slot,
+								   ClientSocket *client_sock,
 #ifdef WIN32
 								   HANDLE childProcess, pid_t childPid,
 #endif
 								   char *startup_data, size_t startup_data_len);
 
-static pid_t internal_forkexec(const char *child_kind, char *startup_data, size_t startup_data_len, ClientSocket *client_sock);
+static pid_t internal_forkexec(const char *child_kind, int child_slot,
+							   char *startup_data, size_t startup_data_len,
+							   ClientSocket *client_sock);
 
 #endif							/* EXEC_BACKEND */
 
@@ -227,7 +231,7 @@ PostmasterChildName(BackendType child_type)
  * the child process.
  */
 pid_t
-postmaster_child_launch(BackendType child_type,
+postmaster_child_launch(BackendType child_type, int child_slot,
 						char *startup_data, size_t startup_data_len,
 						ClientSocket *client_sock)
 {
@@ -236,7 +240,7 @@ postmaster_child_launch(BackendType child_type,
 	Assert(IsPostmasterEnvironment && !IsUnderPostmaster);
 
 #ifdef EXEC_BACKEND
-	pid = internal_forkexec(child_process_kinds[child_type].name,
+	pid = internal_forkexec(child_process_kinds[child_type].name, child_slot,
 							startup_data, startup_data_len, client_sock);
 	/* the child process will arrive in SubPostmasterMain */
 #else							/* !EXEC_BACKEND */
@@ -264,6 +268,7 @@ postmaster_child_launch(BackendType child_type,
 		 */
 		MemoryContextSwitchTo(TopMemoryContext);
 
+		MyPMChildSlot = child_slot;
 		if (client_sock)
 		{
 			MyClientSocket = palloc(sizeof(ClientSocket));
@@ -290,7 +295,8 @@ postmaster_child_launch(BackendType child_type,
  * - fork():s, and then exec():s the child process
  */
 static pid_t
-internal_forkexec(const char *child_kind, char *startup_data, size_t startup_data_len, ClientSocket *client_sock)
+internal_forkexec(const char *child_kind, int child_slot,
+				  char *startup_data, size_t startup_data_len, ClientSocket *client_sock)
 {
 	static unsigned long tmpBackendFileNum = 0;
 	pid_t		pid;
@@ -310,7 +316,7 @@ internal_forkexec(const char *child_kind, char *startup_data, size_t startup_dat
 	 */
 	paramsz = SizeOfBackendParameters(startup_data_len);
 	param = palloc0(paramsz);
-	if (!save_backend_variables(param, client_sock, startup_data, startup_data_len))
+	if (!save_backend_variables(param, child_slot, client_sock, startup_data, startup_data_len))
 	{
 		pfree(param);
 		return -1;				/* log made by save_backend_variables */
@@ -399,7 +405,8 @@ internal_forkexec(const char *child_kind, char *startup_data, size_t startup_dat
  *	 file is complete.
  */
 static pid_t
-internal_forkexec(const char *child_kind, char *startup_data, size_t startup_data_len, ClientSocket *client_sock)
+internal_forkexec(const char *child_kind, int child_slot,
+				  char *startup_data, size_t startup_data_len, ClientSocket *client_sock)
 {
 	int			retry_count = 0;
 	STARTUPINFO si;
@@ -480,7 +487,9 @@ retry:
 		return -1;
 	}
 
-	if (!save_backend_variables(param, client_sock, pi.hProcess, pi.dwProcessId, startup_data, startup_data_len))
+	if (!save_backend_variables(param, child_slot, client_sock,
+								pi.hProcess, pi.dwProcessId,
+								startup_data, startup_data_len))
 	{
 		/*
 		 * log made by save_backend_variables, but we have to clean up the
@@ -692,7 +701,8 @@ static void read_inheritable_socket(SOCKET *dest, InheritableSocket *src);
 
 /* Save critical backend variables into the BackendParameters struct */
 static bool
-save_backend_variables(BackendParameters *param, ClientSocket *client_sock,
+save_backend_variables(BackendParameters *param,
+					   int child_slot, ClientSocket *client_sock,
 #ifdef WIN32
 					   HANDLE childProcess, pid_t childPid,
 #endif
@@ -709,7 +719,7 @@ save_backend_variables(BackendParameters *param, ClientSocket *client_sock,
 
 	strlcpy(param->DataDir, DataDir, MAXPGPATH);
 
-	param->MyPMChildSlot = MyPMChildSlot;
+	param->MyPMChildSlot = child_slot;
 
 #ifdef WIN32
 	param->ShmemProtectiveRegion = ShmemProtectiveRegion;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 97a1b7ae1a..6eab012dc2 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3376,8 +3376,7 @@ BackendStartup(ClientSocket *client_sock)
 	/* Hasn't asked to be notified about any bgworkers yet */
 	bn->bgworker_notify = false;
 
-	MyPMChildSlot = bn->child_slot;
-	pid = postmaster_child_launch(bn->bkend_type,
+	pid = postmaster_child_launch(bn->bkend_type, bn->child_slot,
 								  (char *) &startup_data, sizeof(startup_data),
 								  client_sock);
 	if (pid < 0)
@@ -3702,8 +3701,7 @@ StartChildProcess(BackendType type)
 		return NULL;
 	}
 
-	MyPMChildSlot = pmchild->child_slot;
-	pid = postmaster_child_launch(type, NULL, 0, NULL);
+	pid = postmaster_child_launch(type, pmchild->child_slot, NULL, 0, NULL);
 	if (pid < 0)
 	{
 		/* in parent, fork failed */
@@ -3875,8 +3873,8 @@ do_start_bgworker(RegisteredBgWorker *rw)
 			(errmsg_internal("starting background worker process \"%s\"",
 							 rw->rw_worker.bgw_name)));
 
-	MyPMChildSlot = bn->child_slot;
-	worker_pid = postmaster_child_launch(B_BG_WORKER, (char *) &rw->rw_worker, sizeof(BackgroundWorker), NULL);
+	worker_pid = postmaster_child_launch(B_BG_WORKER, bn->child_slot,
+										 (char *) &rw->rw_worker, sizeof(BackgroundWorker), NULL);
 	if (worker_pid == -1)
 	{
 		/* in postmaster, fork failed ... */
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 7ca24c6663..f12639056f 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -694,14 +694,15 @@ SysLogger_Start(int child_slot)
 		pfree(filename);
 	}
 
-	MyPMChildSlot = child_slot;
 #ifdef EXEC_BACKEND
 	startup_data.syslogFile = syslogger_fdget(syslogFile);
 	startup_data.csvlogFile = syslogger_fdget(csvlogFile);
 	startup_data.jsonlogFile = syslogger_fdget(jsonlogFile);
-	sysloggerPid = postmaster_child_launch(B_LOGGER, (char *) &startup_data, sizeof(startup_data), NULL);
+	sysloggerPid = postmaster_child_launch(B_LOGGER, child_slot,
+										   (char *) &startup_data, sizeof(startup_data), NULL);
 #else
-	sysloggerPid = postmaster_child_launch(B_LOGGER, NULL, 0, NULL);
+	sysloggerPid = postmaster_child_launch(B_LOGGER, child_slot,
+										   NULL, 0, NULL);
 #endif							/* EXEC_BACKEND */
 
 	if (sysloggerPid == -1)
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index 5a40aee3ce..df18e6cfc5 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -108,6 +108,7 @@ extern PGDLLIMPORT struct ClientSocket *MyClientSocket;
 
 /* prototypes for functions in launch_backend.c */
 extern pid_t postmaster_child_launch(BackendType child_type,
+									 int child_slot,
 									 char *startup_data,
 									 size_t startup_data_len,
 									 struct ClientSocket *client_sock);
-- 
2.39.5
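Patch 4 stops setting the postmaster's own MyPMChildSlot global before launching a child. The slot number is passed as an argument instead, and only the child assigns it. The following sketch covers only the plain fork() path. Under EXEC_BACKEND, the patch carries the value through BackendParameters via save_backend_variables(). `launch_child` and `my_child_slot` are hypothetical stand-ins for postmaster_child_launch() and MyPMChildSlot.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Per-process global; the parent never modifies it. */
static int my_child_slot = 0;

/*
 * Fork a child and hand it its slot explicitly.  The child sets its own
 * global after fork() and, for demonstration only, exits with the slot
 * number as its exit status.  Returns the child's pid, or -1 on failure.
 */
static pid_t
launch_child(int child_slot)
{
    pid_t       pid = fork();

    if (pid == 0)
    {
        /* in child: the slot is a parameter of this process alone */
        my_child_slot = child_slot;
        _exit(my_child_slot);
    }
    return pid;                 /* in parent: child's pid, or -1 */
}

/* Wait for a child and return its exit status, or -1 on error. */
static int
wait_for_child(pid_t pid)
{
    int         status;

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because the parent never writes the global, it cannot leak a stale slot number into a later launch. That is the asymmetry the commit message describes.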

#25

Heikki Linnakangas

hlinnaka@iki.fi

about 1 year ago

In reply to: Heikki Linnakangas (#24)

Re: Refactoring postmaster's code to cleanup after child exit

On 09/10/2024 23:40, Heikki Linnakangas wrote:

I pushed the first three patches, with the new test and one of the small
refactoring patches. Thanks for all the comments so far! Here is a new
version of the remaining patches.

Lots of little cleanups and changes here and there since the last
versions, but the notable bigger changes are:

- There is now a BackendTypeMask datatype, so that if you try to mix up
bitmasks and plain BackendType values, the compiler will complain.
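One common way to get that compile-time check in C is to wrap the bitmask in a single-member struct. The compiler will then reject a plain enum value where a mask is expected, and vice versa. Below is a sketch of the idea. The enum values, the `mask` field and the helper names are assumptions for illustration and may not match the committed BackendTypeMask code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of backend types. */
typedef enum SketchBackendType
{
    SK_BACKEND,
    SK_AUTOVAC_WORKER,
    SK_BG_WORKER,
    SK_WAL_SENDER,
} SketchBackendType;

/* Wrapping the bits in a struct makes the mask a distinct type. */
typedef struct SketchTypeMask
{
    uint32_t    mask;
} SketchTypeMask;

static const SketchTypeMask SK_MASK_NONE = {0};

static inline SketchTypeMask
sk_mask(SketchBackendType t)
{
    SketchTypeMask m = {UINT32_C(1) << t};

    return m;
}

static inline SketchTypeMask
sk_mask_add(SketchTypeMask m, SketchBackendType t)
{
    m.mask |= UINT32_C(1) << t;
    return m;
}

static inline bool
sk_mask_contains(SketchTypeMask m, SketchBackendType t)
{
    return (m.mask & (UINT32_C(1) << t)) != 0;
}

/*
 * sk_mask_contains(SK_BACKEND, SK_BG_WORKER) does not compile: an enum
 * value cannot be passed where a SketchTypeMask struct is expected.
 */
```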

- pmchild.c has been rewritten per feedback, so that the "pools" of
PMChild structs are more explicit. The size of each pool is only stated
once, whereas before the same logic was duplicated in
MaxLivePostmasterChildren() which calculates the number of slots and in
InitPostmasterChildSlots() which allocates them.
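The idea behind the rewrite is that each pool's size is stated in exactly one table, and both the total slot count and the slot allocation are derived from that table. Below is a sketch of that single-source-of-truth layout. The pool names, the hard-coded sizes (which stand in for GUC-derived values) and the function names are all hypothetical, not the real pmchild.c definitions.

```c
#include <assert.h>

#define NUM_POOLS 3

/* Hypothetical sizing inputs; the real values come from GUCs. */
static int max_connections = 10;
static int autovac_workers = 3;
static int bg_workers = 8;

typedef struct SketchPool
{
    const char *name;
    int         size;           /* stated exactly once, in define_pools() */
    int         first_slot;     /* filled in by init_pools() */
} SketchPool;

static SketchPool pools[NUM_POOLS];

/* The only place where pool sizes are decided. */
static void
define_pools(void)
{
    pools[0] = (SketchPool) {"backends", max_connections, 0};
    pools[1] = (SketchPool) {"autovacuum workers", autovac_workers, 0};
    pools[2] = (SketchPool) {"background workers", bg_workers, 0};
}

/* Total slot count, summed from the table rather than recomputed by hand. */
static int
max_live_children(void)
{
    int         total = 0;

    define_pools();
    for (int i = 0; i < NUM_POOLS; i++)
        total += pools[i].size;
    return total;
}

/* Give each pool a contiguous range of slot ids, starting from 1. */
static void
init_pools(void)
{
    int         next = 1;

    define_pools();
    for (int i = 0; i < NUM_POOLS; i++)
    {
        pools[i].first_slot = next;
        next += pools[i].size;
    }
}
```

Adding a new kind of child then means adding one row to `define_pools()`, and the two consumers cannot disagree.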

- In PostmasterStateMachine(), I combined the code to handle
PM_STOP_BACKENDS and PM_WAIT_BACKENDS. They are essentially the same
state, except that PM_STOP_BACKENDS first sends the signal to all the
child processes that it will then wait for. They both needed to build
the same bitmask of processes to signal or wait for; this eliminates the
duplication.
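After the merge, the mask of target child types is built once. The STOP state sends the signal, moves to WAIT, and then falls through into the same waiting check. Below is a minimal sketch of that control flow. The state names mirror the email, but the mask value, the `signal_children`/`count_children` stubs and the `SK_PM_NEXT` successor state are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum SketchPMState
{
    SK_PM_STOP_BACKENDS,
    SK_PM_WAIT_BACKENDS,
    SK_PM_NEXT,                 /* hypothetical successor state */
} SketchPMState;

/* Stubs standing in for the real child bookkeeping. */
static int signals_sent = 0;
static int live_children = 2;

static void
signal_children(uint32_t mask)
{
    (void) mask;
    signals_sent++;
}

static int
count_children(uint32_t mask)
{
    (void) mask;
    return live_children;
}

static SketchPMState
step(SketchPMState state)
{
    if (state == SK_PM_STOP_BACKENDS || state == SK_PM_WAIT_BACKENDS)
    {
        /* Built once for both states: which child types to stop/wait for. */
        uint32_t    targets = 0x0F;     /* hypothetical set of types */

        if (state == SK_PM_STOP_BACKENDS)
        {
            signal_children(targets);
            state = SK_PM_WAIT_BACKENDS;
        }
        if (count_children(targets) == 0)
            state = SK_PM_NEXT;
    }
    return state;
}
```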

Made a few more changes since last patch version:

- Fixed initialization in pmchild.c in single-user and bootstrapping mode
- inlined assign_backendlist_entry() into its only caller; it wasn't
doing much anymore
- cleaned up some leftovers in canAcceptConnections()
- Renamed some functions for clarity, fixed some leftover comments that
still talked about Backend structs and BackendList

With those changes, committed. Thanks for the review!

--
Heikki Linnakangas
Neon (https://neon.tech)

#26

Tomas Vondra

tomas@vondra.me

about 1 year ago

In reply to: Heikki Linnakangas (#25)

Re: Refactoring postmaster's code to cleanup after child exit

On 11/14/24 15:13, Heikki Linnakangas wrote:

On 09/10/2024 23:40, Heikki Linnakangas wrote:

I pushed the first three patches, with the new test and one of the small
refactoring patches. Thanks for all the comments so far! Here is a new
version of the remaining patches.

Hi, the TAP test 001_connection_limits.pl introduced by 6a1d0d470e84
seems to have problems with valgrind :-( I reliably get this failure:

t/001_connection_limits.pl .. 3/? # Tests were run but no plan was
declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 4.
t/001_connection_limits.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
All 4 subtests passed

and tmp_check/log/regress_log_001_connection_limits says:

[23:48:44.444](1.129s) ok 3 - reserved_connections limit
[23:48:44.445](0.001s) ok 4 - reserved_connections limit: matches
process ended prematurely at
/home/user/work/postgres/src/test/postmaster/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
line 154.
# Postmaster PID for node "primary" is 198592

That BackgroundPsql.pm line is this in wait_connect()

$self->{run}->pump()
until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;

By trial and error, I found that it fails on line 70:

push(@sessions, background_psql_as_user('regress_superuser'));

but I have no idea why. There are multiple similar calls a couple
lines earlier, and those work fine. And various other TAP tests with
background_psql() work fine too.

So what's so special about this particular line?

regards

--
Tomas Vondra

#27

Heikki Linnakangas

hlinnaka@iki.fi

about 1 year ago

In reply to: Tomas Vondra (#26)

Re: Refactoring postmaster's code to cleanup after child exit

On 09/12/2024 01:12, Tomas Vondra wrote:

On 11/14/24 15:13, Heikki Linnakangas wrote:

On 09/10/2024 23:40, Heikki Linnakangas wrote:

I pushed the first three patches, with the new test and one of the small
refactoring patches. Thanks for all the comments so far! Here is a new
version of the remaining patches.

Hi, the TAP test 001_connection_limits.pl introduced by 6a1d0d470e84
seems to have problems with valgrind :-( I reliably get this failure:

How exactly do you run the test with valgrind? What platform?

It works for me, with this:

(cd build && ninja && rm -rf tmp_install && meson test --suite setup &&
  valgrind --leak-check=no --gen-suppressions=all \
    --suppressions=/home/heikki/git-sandbox/postgresql/src/tools/valgrind.supp \
    --time-stamp=yes \
    --error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END \
    --log-file=$HOME/pg-valgrind/%p.log --trace-children=yes \
    meson test --suite postmaster )

t/001_connection_limits.pl .. 3/? # Tests were run but no plan was
declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 4.
t/001_connection_limits.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
All 4 subtests passed

and tmp_check/log/regress_log_001_connection_limits says:

[23:48:44.444](1.129s) ok 3 - reserved_connections limit
[23:48:44.445](0.001s) ok 4 - reserved_connections limit: matches
process ended prematurely at
/home/user/work/postgres/src/test/postmaster/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
line 154.
# Postmaster PID for node "primary" is 198592

That BackgroundPsql.pm line is this in wait_connect()

$self->{run}->pump()
until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;

By trial and error, I found that it fails on line 70:

push(@sessions, background_psql_as_user('regress_superuser'));

but I have no idea why. There are multiple similar calls a couple
lines earlier, and those work fine. And various other TAP tests with
background_psql() work fine too.

So what's so special about this particular line?

Weird. Valgrind makes everything slow; is it a timeout? Any other clues
in the logs?

--
Heikki Linnakangas
Neon (https://neon.tech)

#28

Tomas Vondra

tomas@vondra.me

about 1 year ago

In reply to: Heikki Linnakangas (#27)

1 attachment(s)

Re: Refactoring postmaster's code to cleanup after child exit

On 12/9/24 13:30, Heikki Linnakangas wrote:

On 09/12/2024 01:12, Tomas Vondra wrote:

On 11/14/24 15:13, Heikki Linnakangas wrote:

On 09/10/2024 23:40, Heikki Linnakangas wrote:

I pushed the first three patches, with the new test and one of the
small
refactoring patches. Thanks for all the comments so far! Here is a new
version of the remaining patches.

Hi, the TAP test 001_connection_limits.pl introduced by 6a1d0d470e84
seems to have problems with valgrind :-( I reliably get this failure:

How exactly do you run the test with valgrind? What platform?

It failed for me on both amd64 (Fedora 41) and rpi5 32/64-bit (Debian).

It works for me, with this:

(cd build && ninja && rm -rf tmp_install && meson test --suite setup &&
  valgrind --leak-check=no --gen-suppressions=all \
    --suppressions=/home/heikki/git-sandbox/postgresql/src/tools/valgrind.supp \
    --time-stamp=yes \
    --error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END \
    --log-file=$HOME/pg-valgrind/%p.log --trace-children=yes \
    meson test --suite postmaster )

I have a patch that tweaks pg_ctl/pg_regress to execute valgrind, so I
just do

./configure --enable-debug --prefix=/home/user/builds/master \
  --enable-depend --enable-cassert --enable-tap-tests \
  CPPFLAGS="-O0 -ggdb3 -DUSE_VALGRIND"

and then the usual "make check" or whatever.

The patch has a hardcoded path to the .supp file, and places the
valgrind log into /tmp. It has worked for me fine up until that commit,
and it still seems to be working in every other test directory.

t/001_connection_limits.pl .. 3/? # Tests were run but no plan was
declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 4.
t/001_connection_limits.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
All 4 subtests passed

and tmp_check/log/regress_log_001_connection_limits says:

[23:48:44.444](1.129s) ok 3 - reserved_connections limit
[23:48:44.445](0.001s) ok 4 - reserved_connections limit: matches
process ended prematurely at
/home/user/work/postgres/src/test/postmaster/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
line 154.
# Postmaster PID for node "primary" is 198592

That BackgroundPsql.pm line is this in wait_connect()

   $self->{run}->pump()
     until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;

By trial and error, I found that it fails on line 70:

   push(@sessions, background_psql_as_user('regress_superuser'));

but I have no idea why. There are multiple similar calls a couple
lines earlier, and those work fine. And various other TAP tests with
background_psql() work fine too.

So what's so special about this particular line?

Weird. Valgrind makes everything slow; is it a timeout? Any other clues
in the logs?

Yeah, weird.

Timeouts were the first thing I thought about, but it fails even if I
set PGCTLTIMEOUT/PG_TEST_TIMEOUT_DEFAULT to 3600. And it doesn't seem to
be waiting for anything for that long :-(

regards

--
Tomas Vondra

Attachments:

valgrind-master.patch (text/x-patch)

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index d6bb2c33119..83544aeb5e2 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -487,12 +487,12 @@ start_postmaster(void)
 	 * has the same PID as the current child process.
 	 */
 	if (log_file != NULL)
-		cmd = psprintf("exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
-					   exec_path, pgdata_opt, post_opts,
+		cmd = psprintf("exec valgrind --quiet --trace-children=yes --track-origins=yes --read-var-info=yes --num-callers=20 --leak-check=no --gen-suppressions=all --suppressions=/home/user/work/postgres/src/tools/valgrind.supp --error-limit=no --log-file=/tmp/valgrind.%d.log \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
+					   getpid(), exec_path, pgdata_opt, post_opts,
 					   DEVNULL, log_file);
 	else
-		cmd = psprintf("exec \"%s\" %s%s < \"%s\" 2>&1",
-					   exec_path, pgdata_opt, post_opts, DEVNULL);
+		cmd = psprintf("exec valgrind --quiet --trace-children=yes --track-origins=yes --read-var-info=yes --num-callers=20 --leak-check=no --gen-suppressions=all --suppressions=/home/user/work/postgres/src/tools/valgrind.supp --error-limit=no --log-file=/tmp/valgrind.%d.log \"%s\" %s%s < \"%s\" 2>&1",
+					   getpid(), exec_path, pgdata_opt, post_opts, DEVNULL);
 
 	(void) execl("/bin/sh", "/bin/sh", "-c", cmd, (char *) NULL);
 
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 0e40ed32a21..97d880c3deb 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2478,10 +2478,10 @@ regression_main(int argc, char *argv[],
 		 * Start the temp postmaster
 		 */
 		snprintf(buf, sizeof(buf),
-				 "\"%s%spostgres\" -D \"%s/data\" -F%s "
+				 "valgrind --quiet --trace-children=yes --track-origins=yes --read-var-info=yes --num-callers=20 --leak-check=no --gen-suppressions=all --suppressions=/home/user/work/postgres/src/tools/valgrind.supp --error-limit=no --log-file=/tmp/valgrind.%d.log \"%s%spostgres\" -D \"%s/data\" -F%s "
 				 "-c \"listen_addresses=%s\" -k \"%s\" "
 				 "> \"%s/log/postmaster.log\" 2>&1",
-				 bindir ? bindir : "",
+				 getpid(), bindir ? bindir : "",
 				 bindir ? "/" : "",
 				 temp_instance, debug ? " -d 5" : "",
 				 hostname ? hostname : "", sockdir ? sockdir : "",

#29

Heikki Linnakangas

hlinnaka@iki.fi

about 1 year ago

In reply to: Tomas Vondra (#28)

Re: Refactoring postmaster's code to cleanup after child exit

On 09/12/2024 14:47, Tomas Vondra wrote:

On 12/9/24 13:30, Heikki Linnakangas wrote:

On 09/12/2024 01:12, Tomas Vondra wrote:

On 11/14/24 15:13, Heikki Linnakangas wrote:

On 09/10/2024 23:40, Heikki Linnakangas wrote:

I pushed the first three patches, with the new test and one of the
small
refactoring patches. Thanks for all the comments so far! Here is a new
version of the remaining patches.

Hi, the TAP test 001_connection_limits.pl introduced by 6a1d0d470e84
seems to have problems with valgrind :-( I reliably get this failure:

How exactly do you run the test with valgrind? What platform?

It failed for me on both amd64 (Fedora 41) and rpi5 32/64-bit (Debian).

It works for me, with this:

(cd build && ninja && rm -rf tmp_install && meson test --suite setup &&
  valgrind --leak-check=no --gen-suppressions=all \
    --suppressions=/home/heikki/git-sandbox/postgresql/src/tools/valgrind.supp \
    --time-stamp=yes \
    --error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END \
    --log-file=$HOME/pg-valgrind/%p.log --trace-children=yes \
    meson test --suite postmaster )

I have a patch that tweaks pg_ctl/pg_regress to execute valgrind, so I
just do

./configure --enable-debug --prefix=/home/user/builds/master \
  --enable-depend --enable-cassert --enable-tap-tests \
  CPPFLAGS="-O0 -ggdb3 -DUSE_VALGRIND"

and then the usual "make check" or whatever.

The patch has a hardcoded path to the .supp file, and places the
valgrind log into /tmp. It has worked for me fine up until that commit,
and it still seems to be working in every other test directory.

Ok, I was able to reproduce this with that setup.

Unsurprisingly, it's a timing issue. It can be reproduced without
valgrind by adding this delay:

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 289059435a9..1eb6bad72ca 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -583,6 +583,7 @@ errfinish(const char *filename, int lineno, const char *funcname)
  		 * FATAL termination.  The postmaster may or may not consider this
  		 * worthy of panic, depending on which subprocess returns it.
  		 */
+		sleep(1);
  		proc_exit(1);
  	}

The test opens a connection that is expected to fail with the "remaining
connection slots are reserved for roles with the SUPERUSER attribute"
error. Right after that, it opens a new connection as superuser, and
expects it to succeed. But if the previous backend hasn't exited yet,
the new connection fails with "too many clients already".

Not sure how to fix this. A small sleep in the test would work, but in
principle there's no delay that's guaranteed to be enough. A more robust
solution would be to run a "select count(*) from pg_stat_activity" and
wait until the number of connections is what's expected. I'll try that
and see how complicated that gets...
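The more robust alternative to a fixed sleep is to poll for the expected condition, with an overall deadline. Below is a generic C sketch of that loop. The condition callback, the 10 ms poll interval and the function names are assumptions. In the actual TAP test this logic would be written in Perl, and it still needs a condition that reliably reflects the exiting backend.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

typedef bool (*condition_fn) (void *arg);

static double
now_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/*
 * Re-check 'cond' every 10 ms until it holds or 'timeout_s' seconds pass.
 * Returns true if the condition was met before the deadline.
 */
static bool
poll_until(condition_fn cond, void *arg, double timeout_s)
{
    double      deadline = now_seconds() + timeout_s;
    struct timespec interval = {0, 10 * 1000 * 1000};

    for (;;)
    {
        if (cond(arg))
            return true;
        if (now_seconds() >= deadline)
            return false;
        nanosleep(&interval, NULL);
    }
}

/* Example condition: true once the counter has been decremented to 0. */
static bool
countdown_done(void *arg)
{
    int        *remaining = arg;

    if (*remaining > 0)
        (*remaining)--;
    return *remaining == 0;
}

/* Example condition that never holds, to exercise the timeout path. */
static bool
never(void *arg)
{
    (void) arg;
    return false;
}
```

Unlike a fixed sleep, this returns as soon as the condition holds and only waits the full timeout in the failure case.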

--
Heikki Linnakangas
Neon (https://neon.tech)

#30

Heikki Linnakangas

hlinnaka@iki.fi

about 1 year ago

In reply to: Heikki Linnakangas (#29)

Re: Refactoring postmaster's code to cleanup after child exit

On 09/12/2024 22:55, Heikki Linnakangas wrote:

Not sure how to fix this. A small sleep in the test would work, but in
principle there's no delay that's guaranteed to be enough. A more robust
solution would be to run a "select count(*) from pg_stat_activity" and
wait until the number of connections is what's expected. I'll try that
and see how complicated that gets...

Checking pg_stat_activity doesn't help, because the backend doesn't
register itself in pg_stat_activity until later. A connection that's
rejected due to connection limits never shows up in pg_stat_activity.

Some options:

0. Do nothing

1. Add a small sleep to the test

2. Move the pgstat_bestart() call earlier in the startup sequence, so
that a backend shows up in pg_stat_activity before it acquires a PGPROC
entry, and stays visible until after it has released its PGPROC entry.
This would give more visibility to backends that are starting up.

3. Rearrange the FATAL error handling so that the process removes itself
from PGPROC before sending the error to the client. That would be kind
of nice anyway. Currently, if sending the rejection error message to the
client blocks, you are holding up a PGPROC slot until the message is
sent. The error message packet is short, so it's highly unlikely to
block, but still.

Option 3 seems kind of nice in principle, but looking at the code, it's
a bit awkward to implement. Easiest way to implement it would be to
modify send_message_to_frontend() to not call pq_flush() on FATAL
errors, and flush the data in socket_close() instead. Not a lot of code,
but it's a pretty ugly special case.

Option 2 seems nice too, but seems like a lot of work.
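Option 3 comes down to an ordering change: queue the FATAL message without flushing it, release the shared slot, and flush only when the connection is closed. Below is a toy model of that ordering, with an in-memory buffer standing in for the socket. `send_error`, `release_slot` and `close_conn` are hypothetical names, not the functions in elog.c or pqcomm.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct SketchConn
{
    char        pending[128];   /* buffered, not yet sent */
    char        sent[128];      /* what the client has received */
    bool        holds_slot;
    bool        slot_held_at_flush;
} SketchConn;

/* Move buffered bytes to the "client", recording whether a slot was held. */
static void
flush_conn(SketchConn *c)
{
    c->slot_held_at_flush = c->holds_slot;
    strncat(c->sent, c->pending, sizeof(c->sent) - strlen(c->sent) - 1);
    c->pending[0] = '\0';
}

/* Queue a message; only non-FATAL messages are flushed immediately. */
static void
send_error(SketchConn *c, const char *msg, bool fatal)
{
    strncat(c->pending, msg, sizeof(c->pending) - strlen(c->pending) - 1);
    if (!fatal)
        flush_conn(c);
}

static void
release_slot(SketchConn *c)
{
    c->holds_slot = false;
}

/* Closing the connection flushes whatever is still buffered. */
static void
close_conn(SketchConn *c)
{
    flush_conn(c);
}
```

With this ordering, a client that blocks while receiving the rejection message no longer holds up a slot. The cost is the FATAL special case in the send path that the email calls ugly.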

--
Heikki Linnakangas
Neon (https://neon.tech)

#31

Tomas Vondra

tomas@vondra.me

about 1 year ago

In reply to: Heikki Linnakangas (#30)

Re: Refactoring postmaster's code to cleanup after child exit

On 12/10/24 11:00, Heikki Linnakangas wrote:

On 09/12/2024 22:55, Heikki Linnakangas wrote:

Not sure how to fix this. A small sleep in the test would work, but in
principle there's no delay that's guaranteed to be enough. A more
robust solution would be to run a "select count(*) from
pg_stat_activity" and wait until the number of connections is what's
expected. I'll try that and see how complicated that gets...

Checking pg_stat_activity doesn't help, because the backend doesn't
register itself in pg_stat_activity until later. A connection that's
rejected due to connection limits never shows up in pg_stat_activity.

Some options:

0. Do nothing

1. Add a small sleep to the test

I'd just add a short sleep. Yeah, sleeps are not great, but everything
else seems like a lot of effort just to make this one test pass under
valgrind, and I don't think it's worth it.

Can we make the sleep conditional on valgrind, so that regular builds
are not affected? I guess regular builds could fail too, but I don't
think we've seen such failures until now.

regards

--
Tomas Vondra

#32

Andres Freund

andres@anarazel.de

10 months ago

In reply to: Tomas Vondra (#26)

Re: Refactoring postmaster's code to cleanup after child exit

Hi,

On 2024-12-09 00:12:32 +0100, Tomas Vondra wrote:

Hi, the TAP test 001_connection_limits.pl introduced by 6a1d0d470e84
seems to have problems with valgrind :-( I reliably get this failure:

t/001_connection_limits.pl .. 3/? # Tests were run but no plan was
declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 4.
t/001_connection_limits.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
All 4 subtests passed

and tmp_check/log/regress_log_001_connection_limits says:

[23:48:44.444](1.129s) ok 3 - reserved_connections limit
[23:48:44.445](0.001s) ok 4 - reserved_connections limit: matches
process ended prematurely at
/home/user/work/postgres/src/test/postmaster/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
line 154.
# Postmaster PID for node "primary" is 198592

I just saw this failure on skink in the BF:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-04%2015%3A43%3A23

[17:05:56.438](0.247s) ok 3 - reserved_connections limit
[17:05:56.438](0.000s) ok 4 - reserved_connections limit: matches
process ended prematurely at /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm line 160.

That BackgroundPsql.pm line is this in wait_connect()

$self->{run}->pump()
until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;

A big part of the problem here imo is the exception behaviour that
IPC::Run::pump() has:

If pump() is called after all harnessed activities have completed, a "process
ended prematurely" exception to be thrown. This allows for simple scripting
of external applications without having to add lots of error handling code at
each step of the script:

Which is, uh, not very compatible with how we use IPC::Run (here and
elsewhere). Just ending the test because a connection failed is pretty awful.

This behaviour makes it really hard to debug problems. It'd have been a lot
easier to understand the problem if we'd seen psql's stderr before the test
died.

I guess that means at the very least we'd need to put an eval {} around the
->pump() call, print $self->{stdout} and ->{stderr}, and reraise an error?
Presumably not just in wait_connect(), but also at least in pump_until()?
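The idea of catching the failure, showing what the child printed, and only then failing can be sketched outside Perl too. The C program below is a minimal sketch, not the TAP framework's code: `wait_for_banner()` is a hypothetical helper, it runs a shell command instead of psql, and for brevity it merges stdout and stderr into one pipe. If the child exits before the banner appears, it reports the captured output and exit code instead of failing with no context.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run "/bin/sh -c cmd" and read its combined
 * stdout/stderr until `banner` shows up. If the child exits first, print
 * everything it wrote plus its exit code and return -1, rather than failing
 * without context. `buf` receives the captured output (NUL-terminated).
 */
static int
wait_for_banner(const char *cmd, const char *banner, char *buf, size_t bufsize)
{
	int			fds[2];
	size_t		len = 0;
	int			found = 0;
	int			eof = 0;
	int			status = 0;
	pid_t		pid;

	if (bufsize == 0 || pipe(fds) != 0)
		return -1;
	buf[0] = '\0';

	pid = fork();
	if (pid < 0)
	{
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0)
	{
		/* child: route both output streams into the pipe, then exec */
		dup2(fds[1], STDOUT_FILENO);
		dup2(fds[1], STDERR_FILENO);
		close(fds[0]);
		close(fds[1]);
		execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
		_exit(127);
	}

	close(fds[1]);				/* parent keeps only the read end */
	while (!found && len + 1 < bufsize)
	{
		ssize_t		n = read(fds[0], buf + len, bufsize - 1 - len);

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
		{
			eof = (n == 0);		/* EOF: child closed its output, normally by exiting */
			break;
		}
		len += (size_t) n;
		buf[len] = '\0';
		found = strstr(buf, banner) != NULL;
	}
	close(fds[0]);

	/* we stopped reading before EOF; don't block forever on the child */
	if (!eof)
		kill(pid, SIGTERM);
	waitpid(pid, &status, 0);

	if (found)
		return 0;

	/* the equivalent of the diag() output: show what the child said */
	fprintf(stderr, "child ended before printing \"%s\"\n", banner);
	if (WIFEXITED(status))
		fprintf(stderr, "  exit code: %d\n", WEXITSTATUS(status));
	fprintf(stderr, "  captured output:\n%s\n", buf);
	return -1;
}
```

As with the eval {} approach for wait_connect(), this only improves the diagnostics; it does nothing about why the child exited.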

Will respond downthread to a potential workaround for the issue.

Greetings,

Andres Freund

#33

Andres Freund

andres@anarazel.de

10 months ago

In reply to: Heikki Linnakangas (#30)

Re: Refactoring postmaster's code to cleanup after child exit

Hi,

On 2024-12-10 12:00:12 +0200, Heikki Linnakangas wrote:

On 09/12/2024 22:55, Heikki Linnakangas wrote:

Not sure how to fix this. A small sleep in the test would work, but in
principle there's no delay that's guaranteed to be enough. A more robust
solution would be to run a "select count(*) from pg_stat_activity" and
wait until the number of connections is what's expected. I'll try that
and see how complicated that gets.

Checking pg_stat_activity doesn't help, because the backend doesn't register
itself in pg_stat_activity until later. A connection that's rejected due to
connection limits never shows up in pg_stat_activity.

Some options:

0. Do nothing

1. Add a small sleep to the test

2. Move the pgstat_bestart() call earlier in the startup sequence, so that a
backend shows up in pg_stat_activity before it acquires a PGPROC entry, and
stays visible until after it has released its PGPROC entry. This would give
more visibility to backends that are starting up.

We don't necessarily *have* a PGPROC entry for that backend when we run out of
connections, no?

3. Rearrange the FATAL error handling so that the process removes itself
from PGPROC before sending the error to the client. That would be kind of
nice anyway. Currently, if sending the rejection error message to the client
blocks, you are holding up a PGPROC slot until the message is sent. The
error message packet is short, so it's highly unlikely to block, but still.

This is definitely a problem, there was even a recent thread about it. It can
be triggered even with just an ERROR message though :(

For this test, could we perhaps rely on the log messages postmaster logs when
child processes exit?

2025-03-04 17:56:12.528 EST [3509838][not initialized][:0][[unknown]] LOG: connection received: host=[local]
2025-03-04 17:56:12.528 EST [3509838][client backend][:0][[unknown]] FATAL: sorry, too many clients already
2025-03-04 17:56:12.529 EST [3509817][postmaster][:0][] DEBUG: releasing pm child slot 2
2025-03-04 17:56:12.529 EST [3509817][postmaster][:0][] DEBUG: client backend (PID 3509838) exited with exit code 1

I.e. the test could wait for the 'client backend exited' message using
->wait_for_log()?
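Waiting on the log is a poll-from-offset pattern: record the log size before the action, then repeatedly scan only the text appended after that offset until a pattern matches or a deadline passes. Below is a minimal C sketch of that pattern. `wait_for_log_match()` is a hypothetical stand-in for the Perl TAP helper, not its implementation; it uses POSIX extended regular expressions with `REG_NEWLINE` so that `.` does not cross line boundaries, which is similar to a Perl regex without `/s`.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical helper modeled on the wait_for_log() idea: scan the part of
 * the log file after byte `offset` for `pattern` (POSIX ERE), re-checking
 * every 100 ms. Returns 1 on a match, 0 on timeout, -1 on a bad pattern.
 */
static int
wait_for_log_match(const char *path, const char *pattern, long offset,
				   int timeout_s)
{
	regex_t		re;
	int			matched = 0;
	time_t		deadline = time(NULL) + timeout_s;
	struct timespec interval = {0, 100 * 1000 * 1000};	/* 100 ms */

	/* REG_NEWLINE: '.' never matches '\n', so a match stays on one line */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | REG_NEWLINE) != 0)
		return -1;

	for (;;)
	{
		FILE	   *f = fopen(path, "r");

		if (f != NULL)
		{
			long		size = 0;
			char	   *text = NULL;

			/* read only what was appended after `offset` */
			if (fseek(f, 0, SEEK_END) == 0 &&
				(size = ftell(f)) > offset &&
				fseek(f, offset, SEEK_SET) == 0 &&
				(text = malloc((size_t) (size - offset) + 1)) != NULL)
			{
				size_t		n = fread(text, 1, (size_t) (size - offset), f);

				text[n] = '\0';
				matched = (regexec(&re, text, 0, NULL, 0) == 0);
			}
			free(text);
			fclose(f);
		}
		if (matched || time(NULL) >= deadline)
			break;
		nanosleep(&interval, NULL);
	}

	regfree(&re);
	return matched;
}
```

The offset is what keeps an earlier backend's exit message from satisfying the wait; in the Perl version the offset would come from the log file's size taken just before the failed connection attempt.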

Greetings,

Andres Freund

#34

Michael Paquier

michael@paquier.xyz

10 months ago

In reply to: Andres Freund (#33)

Re: Refactoring postmaster's code to cleanup after child exit

On Tue, Mar 04, 2025 at 05:58:42PM -0500, Andres Freund wrote:

On 2024-12-10 12:00:12 +0200, Heikki Linnakangas wrote:

2. Move the pgstat_bestart() call earlier in the startup sequence, so that a
backend shows up in pg_stat_activity before it acquires a PGPROC entry, and
stays visible until after it has released its PGPROC entry. This would give
more visibility to backends that are starting up.

We don't necessarily *have* a PGPROC entry for that backend when we run out of
connections, no?

Exactly. If I got this thread's argument right, you cannot have a
PGPROC entry that could be plugged into pg_stat_activity that early
during the startup process when collecting the startup packet.

For this test, could we perhaps rely on the log messages postmaster logs when
child processes exit?

2025-03-04 17:56:12.528 EST [3509838][not initialized][:0][[unknown]] LOG: connection received: host=[local]
2025-03-04 17:56:12.528 EST [3509838][client backend][:0][[unknown]] FATAL: sorry, too many clients already
2025-03-04 17:56:12.529 EST [3509817][postmaster][:0][] DEBUG: releasing pm child slot 2
2025-03-04 17:56:12.529 EST [3509817][postmaster][:0][] DEBUG: client backend (PID 3509838) exited with exit code 1

I.e. the test could wait for the 'client backend exited' message using
->wait_for_log()?

Matching expected contents in the server logs is a practice I've found
to be rather reliable, with wait_for_log(). Why not add an
injection point with a WARNING or a LOG generated, then check the
server logs for the code path taken based on the elog() generated with
the point name?
--
Michael

#35

Noah Misch

noah@leadboat.com

10 months ago

In reply to: Andres Freund (#32)

Re: Refactoring postmaster's code to cleanup after child exit

On Tue, Mar 04, 2025 at 05:50:34PM -0500, Andres Freund wrote:

On 2024-12-09 00:12:32 +0100, Tomas Vondra wrote:

[23:48:44.444](1.129s) ok 3 - reserved_connections limit
[23:48:44.445](0.001s) ok 4 - reserved_connections limit: matches
process ended prematurely at
/home/user/work/postgres/src/test/postmaster/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
line 154.
# Postmaster PID for node "primary" is 198592

I just saw this failure on skink in the BF:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-04%2015%3A43%3A23

[17:05:56.438](0.247s) ok 3 - reserved_connections limit
[17:05:56.438](0.000s) ok 4 - reserved_connections limit: matches
process ended prematurely at /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm line 160.

That BackgroundPsql.pm line is this in wait_connect()

$self->{run}->pump()
until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;

A big part of the problem here imo is the exception behaviour that
IPC::Run::pump() has:

If pump() is called after all harnessed activities have completed, a "process
ended prematurely" exception to be thrown. This allows for simple scripting
of external applications without having to add lots of error handling code at
each step of the script:

Which is, uh, not very compatible with how we use IPC::Run (here and
elsewhere). Just ending the test because a connection failed is pretty awful.

Historically, I think we've avoided this sort of trouble by doing pipe I/O
only on processes where we feel able to predict when the process will exit.
Commit f44b9b6 is one example (simpler case, not involving pump()). It would
be a nice improvement to do better, since there's always some risk of
unexpected exit.

This behaviour makes it really hard to debug problems. It'd have been a lot
easier to understand the problem if we'd seen psql's stderr before the test
died.

I guess that means at the very least we'd need to put an eval {} around the
->pump() call, print $self->{stdout} and ->{stderr}, and reraise an error?

That sounds right.

Officially, you could call ->pumpable() before ->pump(). It's defined as
'Returns TRUE if calling pump() won't throw an immediate "process ended
prematurely" exception.' I lack high confidence that it avoids the exception,
because the pump() still calls pumpable()->reap_nb()->waitpid(WNOHANG) and may
decide "process ended prematurely" based on the new finding. In other words,
I bet there would be a TOCTOU defect in "$h->pump if $h->pumpable".

Presumably not just in wait_connect(), but also at least in pump_until()?

If the goal is to have it capture maximum data from processes that exit when
we don't expect it (seems good to me), yes.

#36

Andres Freund

andres@anarazel.de

10 months ago

In reply to: Noah Misch (#35)

4 attachment(s)

Re: Refactoring postmaster's code to cleanup after child exit

Hi,

On 2025-03-05 20:49:33 -0800, Noah Misch wrote:

This behaviour makes it really hard to debug problems. It'd have been a lot
easier to understand the problem if we'd seen psql's stderr before the test
died.

I guess that means at the very least we'd need to put an eval {} around the
->pump() call, print $self->{stdout} and ->{stderr}, and reraise an error?

That sounds right.

In the attached patch I did that for wait_connect(). I did verify that it
works by implementing the wait_connect() fix before fixing
002_connection_limits.pl, which fails if a sleep(1) is added just before the
proc_exit(1) for FATAL.

I haven't tackled pump_until() yet, as it

a) uses pumpable() to check if it's safe to pump() and should kinda sometimes
maybe report an error, even though the fact that it doesn't display stderr
(if stdout is waited on) makes it harder to debug.

b) Fixing the error report seems like it'd require an interface change to
pump_until().

Officially, you could call ->pumpable() before ->pump(). It's defined as
'Returns TRUE if calling pump() won't throw an immediate "process ended
prematurely" exception.'

It's also documented to be internal only...

I do share your doubts re pumpable():

I lack high confidence that it avoids the exception,
because the pump() still calls pumpable()->reap_nb()->waitpid(WNOHANG) and may
decide "process ended prematurely" based on the new finding. In other words,
I bet there would be a TOCTOU defect in "$h->pump if $h->pumpable".

On 2025-03-05 08:23:32 +0900, Michael Paquier wrote:

For this test, could we perhaps rely on the log messages postmaster logs when
child processes exit?

2025-03-04 17:56:12.528 EST [3509838][not initialized][:0][[unknown]] LOG: connection received: host=[local]
2025-03-04 17:56:12.528 EST [3509838][client backend][:0][[unknown]] FATAL: sorry, too many clients already
2025-03-04 17:56:12.529 EST [3509817][postmaster][:0][] DEBUG: releasing pm child slot 2
2025-03-04 17:56:12.529 EST [3509817][postmaster][:0][] DEBUG: client backend (PID 3509838) exited with exit code 1

I.e. the test could wait for the 'client backend exited' message using
->wait_for_log()?

Matching expected contents in the server logs is a practice I've found
to be rather reliable, with wait_for_log().

The attached patch implements that approach. It does fix the problem from what
I can tell. It's not great that it requires log_min_messages = DEBUG2, but
that seems ok for this test.

Why not add an injection point with a WARNING or a LOG generated, then
check the server logs for the code path taken based on the elog() generated
with the point name?

I think the log_min_messages approach is a lot simpler. If we need something
like this more widely we can reconsider injection points...

I also attached a patch to improve connect_fails()/connect_ok() test names a
bit. They weren't symmetric and I felt they were lacking in detail for the
psql return code check.

Another annoying and also funny problem I saw is this failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-06%2009%3A18%3A21
2025-03-06 10:42:02.552 UTC [372451][postmaster][:0] LOG: 1800 s is outside the valid range for parameter "authentication_timeout" (1 s .. 600 s)

I had to increase PG_TEST_TIMEOUT_DEFAULT due to some other test timing out
when run under valgrind (due to having to insert a lot of rows). But then this
test runs into the above issue.

The easiest way seems to be to just limit PG_TEST_TIMEOUT_DEFAULT in this
test.
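The arithmetic behind that workaround: the log line shows the test ended up with authentication_timeout = 1800 s, outside the server's valid range of 1 s .. 600 s. Capping it at 600 keeps the setting valid, and the stop timeout the test derives from it (half the authentication timeout) becomes 300 s. A small C sketch of that computation follows; the function names are hypothetical, and unlike the Perl change in this thread it also enforces the 1 s lower bound.

```c
/* Valid range of authentication_timeout, per the server's error message. */
#define AUTH_TIMEOUT_MIN_S 1
#define AUTH_TIMEOUT_MAX_S 600

/* Hypothetical helper: derive a valid authentication_timeout (seconds)
 * from the test suite's default timeout. */
static int
pick_authentication_timeout(int timeout_default_s)
{
	if (timeout_default_s > AUTH_TIMEOUT_MAX_S)
		return AUTH_TIMEOUT_MAX_S;
	if (timeout_default_s < AUTH_TIMEOUT_MIN_S)
		return AUTH_TIMEOUT_MIN_S;
	return timeout_default_s;
}

/* The test gives "pg_ctl stop" half of the authentication timeout. */
static int
pick_stop_timeout(int authentication_timeout_s)
{
	return authentication_timeout_s / 2;
}
```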

Greetings,

Andres Freund

Attachments:

v1-0001-tests-Improve-test-names-in-connect_fails-connect.patch (text/x-diff; charset=us-ascii)

From 9655d7a40e0d410b15457d69392de847ddf141ba Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 6 Mar 2025 10:36:24 -0500
Subject: [PATCH v1 1/4] tests: Improve test names in
 connect_fails()/connect_ok()

connect_fails() didn't mention that stderr matched, whereas connect_ok() did.

Neither connect_fails() nor connect_ok() mentioned what they were checking
when checking psql's return status.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index b105cba05a6..883532e1cd3 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2554,7 +2554,7 @@ sub connect_ok
 		connstr => "$connstr",
 		on_error_stop => 0);
 
-	is($ret, 0, $test_name);
+	is($ret, 0, "$test_name: connect succeeds, as expected");
 
 	if (defined($params{expected_stdout}))
 	{
@@ -2619,11 +2619,11 @@ sub connect_fails
 		extra_params => ['-w'],
 		connstr => "$connstr");
 
-	isnt($ret, 0, $test_name);
+	isnt($ret, 0, "$test_name: connect fails, as expected");
 
 	if (defined($params{expected_stderr}))
 	{
-		like($stderr, $params{expected_stderr}, "$test_name: matches");
+		like($stderr, $params{expected_stderr}, "$test_name: stderr matches");
 	}
 
 	$self->log_check($test_name, $log_location, %params);
-- 
2.48.1.76.g4e746b1a31.dirty

v1-0002-tests-Add-note-if-BackgroundPsql-wait_connect-fai.patch (text/x-diff; charset=us-ascii)

From e080bcccd441c7e3a993ef72b7cd842ae939ecce Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 6 Mar 2025 10:38:44 -0500
Subject: [PATCH v1 2/4] tests: Add note if BackgroundPsql::wait_connect()
 fails

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 .../perl/PostgreSQL/Test/BackgroundPsql.pm    | 26 ++++++++++++++++---
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
index c611a61cf4e..1deb410c133 100644
--- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
+++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
@@ -154,10 +154,28 @@ sub wait_connect
 	my $banner = "background_psql: ready";
 	my $banner_match = qr/(^|\n)$banner\r?\n/;
 	$self->{stdin} .= "\\echo $banner\n\\warn $banner\n";
-	$self->{run}->pump()
-	  until ($self->{stdout} =~ /$banner_match/
-		  && $self->{stderr} =~ /$banner\r?\n/)
-	  || $self->{timeout}->is_expired;
+
+	# IPC::Run throws in case psql exits while we're pumping. To make it
+	# easier to diagnose that, catch the error, report stdout/stderr at time
+	# of death and reraise.
+	eval {
+		$self->{run}->pump()
+		  until ($self->{stdout} =~ /$banner_match/
+			  && $self->{stderr} =~ /$banner\r?\n/)
+		  || $self->{timeout}->is_expired;
+	};
+	if ($@)
+	{
+		chomp(my $stdout = $self->{stdout});
+		chomp(my $stderr = $self->{stderr});
+		chomp(my $err = $@);
+		diag qq(psql died while connecting:
+  stdout: $stdout
+  stderr: $stderr
+  perl error: $err
+);
+		die "psql died while connecting";
+	}
 
 	note "connect output:\n",
 	  explain {
-- 
2.48.1.76.g4e746b1a31.dirty

v1-0003-tests-Try-to-fix-race-condition-in-postmaster-002.patch (text/x-diff; charset=us-ascii)

From 6ba326b1bb39a8d181137482df425a109539aa15 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 6 Mar 2025 10:30:21 -0500
Subject: [PATCH v1 3/4] tests: Try to fix race condition in
 postmaster/002_connection_limits

We need to wait for process exit.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 .../postmaster/t/002_connection_limits.pl     | 35 +++++++++++++++++--
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/src/test/postmaster/t/002_connection_limits.pl b/src/test/postmaster/t/002_connection_limits.pl
index 8cfa6e0ced5..2c185eef6eb 100644
--- a/src/test/postmaster/t/002_connection_limits.pl
+++ b/src/test/postmaster/t/002_connection_limits.pl
@@ -20,6 +20,7 @@ $node->append_conf('postgresql.conf', "max_connections = 6");
 $node->append_conf('postgresql.conf', "reserved_connections = 2");
 $node->append_conf('postgresql.conf', "superuser_reserved_connections = 1");
 $node->append_conf('postgresql.conf', "log_connections = on");
+$node->append_conf('postgresql.conf', "log_min_messages=debug2");
 $node->start;
 
 $node->safe_psql(
@@ -45,13 +46,39 @@ sub background_psql_as_user
 		extra_params => [ '-U', $user ]);
 }
 
+# Like connect_fails(), except that we also wait for the failed backend to
+# have exited.
+#
+# This test needs to wait for client processes to exit because the error
+# message for a failed connection is reported before the backend has detached
+# from shared memory. If we didn't wait, subsequent tests might hit connection
+# limits spuriously.
+#
+# This can't easily be generalized, as detecting process exit requires
+# log_min_messages to be at least DEBUG2 and is not concurrency safe, as we
+# can't easily be sure the right process exited. In this test that's not a
+# problem though, we only have one new connection at a time.
+sub connect_fails_wait
+{
+	local $Test::Builder::Level = $Test::Builder::Level + 1;
+	my ($node, $connstr, $test_name, %params) = @_;
+
+	my $log_location = -s $node->logfile;
+
+	$node->connect_fails($connstr, $test_name, %params);
+	$node->wait_for_log(qr/DEBUG:  client backend.*exited with exit code 1/,
+		$log_location);
+	ok(1, "$test_name: client backend process exited");
+}
+
 my @sessions = ();
 my @raw_connections = ();
 
 push(@sessions, background_psql_as_user('regress_regular'));
 push(@sessions, background_psql_as_user('regress_regular'));
 push(@sessions, background_psql_as_user('regress_regular'));
-$node->connect_fails(
+connect_fails_wait(
+	$node,
 	"dbname=postgres user=regress_regular",
 	"reserved_connections limit",
 	expected_stderr =>
@@ -60,7 +87,8 @@ $node->connect_fails(
 
 push(@sessions, background_psql_as_user('regress_reserved'));
 push(@sessions, background_psql_as_user('regress_reserved'));
-$node->connect_fails(
+connect_fails_wait(
+	$node,
 	"dbname=postgres user=regress_regular",
 	"reserved_connections limit",
 	expected_stderr =>
@@ -68,7 +96,8 @@ $node->connect_fails(
 );
 
 push(@sessions, background_psql_as_user('regress_superuser'));
-$node->connect_fails(
+connect_fails_wait(
+	$node,
 	"dbname=postgres user=regress_superuser",
 	"superuser_reserved_connections limit",
 	expected_stderr => qr/FATAL:  sorry, too many clients already/);
-- 
2.48.1.76.g4e746b1a31.dirty

v1-0004-tests-Don-t-fail-due-to-high-default-timeout-in-p.patch (text/x-diff; charset=us-ascii)

From b05b8d2031e430f425cb8c23b829002d7277c520 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 6 Mar 2025 15:13:10 -0500
Subject: [PATCH v1 4/4] tests: Don't fail due to high default timeout in
 postmaster/003_start_stop

Per buildfarm animal skink.

Discussion: https://postgr.es/m/20250306044933.7a.nmisch@google.com
---
 src/test/postmaster/t/003_start_stop.pl | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/test/postmaster/t/003_start_stop.pl b/src/test/postmaster/t/003_start_stop.pl
index 036b296f72b..4dc394139d9 100644
--- a/src/test/postmaster/t/003_start_stop.pl
+++ b/src/test/postmaster/t/003_start_stop.pl
@@ -20,6 +20,10 @@ use Test::More;
 # "pg_ctl stop" will error out before the authentication timeout kicks
 # in and cleans up the dead-end backends.
 my $authentication_timeout = $PostgreSQL::Test::Utils::timeout_default;
+
+# Don't fail due to hitting the max value allowed for authentication_timeout.
+$authentication_timeout = 600 unless $authentication_timeout < 600;
+
 my $stop_timeout = $authentication_timeout / 2;
 
 # Initialize the server with low connection limits, to test dead-end backends
-- 
2.48.1.76.g4e746b1a31.dirty

#37

Heikki Linnakangas

hlinnaka@iki.fi

10 months ago

In reply to: Michael Paquier (#34)

Re: Refactoring postmaster's code to cleanup after child exit

On 05/03/2025 01:23, Michael Paquier wrote:

On Tue, Mar 04, 2025 at 05:58:42PM -0500, Andres Freund wrote:

On 2024-12-10 12:00:12 +0200, Heikki Linnakangas wrote:

2. Move the pgstat_bestart() call earlier in the startup sequence, so that a
backend shows up in pg_stat_activity before it acquires a PGPROC entry, and
stays visible until after it has released its PGPROC entry. This would give
more visibility to backends that are starting up.

We don't necessarily *have* a PGPROC entry for that backend when we run out of
connections, no?

Exactly. If I got this thread's argument right, you cannot have a
PGPROC entry that could be plugged into pg_stat_activity that early
during the startup process when collecting the startup packet.

That's true in general; once you start running out of connections, you
can indeed run out of PGPROC slots too. In this particular case, though,
there were still PGPROC slots available, reserved for superuser
connections, so it would've helped.

We could also have more pg_stat_activity slots than PGPROC slots, or
just have a few more PGPROC slots than what is required by MaxBackends.
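To see why a rejected backend that has not yet exited matters, here is a simplified model of the admission checks as this test exercises them (max_connections = 6, reserved_connections = 2, superuser_reserved_connections = 1). It is a hedged approximation for illustration, not PostgreSQL's code. It assumes the new backend must first obtain a PGPROC slot of its own (otherwise "sorry, too many clients already"), and that non-superusers are then rejected when fewer than reserved_connections + superuser_reserved_connections slots remain free. All names are hypothetical.

```c
/*
 * Simplified model (not PostgreSQL source) of the connection-limit checks
 * exercised by 002_connection_limits.pl.
 */
enum role_kind
{
	ROLE_REGULAR,				/* e.g. regress_regular */
	ROLE_RESERVED,				/* has privileges of pg_use_reserved_connections */
	ROLE_SUPERUSER
};

enum admit_result
{
	ADMIT_OK,
	REJECT_TOO_MANY_CLIENTS,	/* "sorry, too many clients already" */
	REJECT_SUPERUSER_ONLY,		/* "...reserved for roles with the SUPERUSER attribute" */
	REJECT_NEEDS_RESERVED_ROLE	/* "...privileges of the pg_use_reserved_connections role" */
};

struct conn_limits
{
	int			max_connections;
	int			reserved_connections;
	int			superuser_reserved_connections;
};

/*
 * in_use: PGPROC slots held by other backends, including a rejected backend
 * that has already reported FATAL but has not exited yet.
 */
static enum admit_result
try_connect(const struct conn_limits *lim, int in_use, enum role_kind role)
{
	int			nfree;

	/* the new backend needs a slot of its own before any other check */
	if (in_use >= lim->max_connections)
		return REJECT_TOO_MANY_CLIENTS;
	nfree = lim->max_connections - in_use - 1;	/* free after taking ours */

	if (role != ROLE_SUPERUSER &&
		nfree < lim->reserved_connections + lim->superuser_reserved_connections)
	{
		if (nfree < lim->superuser_reserved_connections)
			return REJECT_SUPERUSER_ONLY;
		if (role != ROLE_RESERVED)
			return REJECT_NEEDS_RESERVED_ROLE;
	}
	return ADMIT_OK;
}
```

In this model, if the refused fourth regular backend still holds its slot, the second regress_reserved session sees in_use = 5 instead of 4 and is rejected with the SUPERUSER message. That is consistent with the background psql failure reported at the top of this excerpt.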

For this test, could we perhaps rely on the log messages postmaster logs when
child processes exit?

2025-03-04 17:56:12.528 EST [3509838][not initialized][:0][[unknown]] LOG: connection received: host=[local]
2025-03-04 17:56:12.528 EST [3509838][client backend][:0][[unknown]] FATAL: sorry, too many clients already
2025-03-04 17:56:12.529 EST [3509817][postmaster][:0][] DEBUG: releasing pm child slot 2
2025-03-04 17:56:12.529 EST [3509817][postmaster][:0][] DEBUG: client backend (PID 3509838) exited with exit code 1

I.e. the test could wait for the 'client backend exited' message using
->wait_for_log()?

Matching expected contents in the server logs is a practice I've found
to be rather reliable, with wait_for_log(). Why not add an
injection point with a WARNING or a LOG generated, then check the
server logs for the code path taken based on the elog() generated with
the point name?

Hmm, yeah, watching for "releasing pm child slot" or an explicit
injection point would work.

--
Heikki Linnakangas
Neon (https://neon.tech)

#38

Heikki Linnakangas

hlinnaka@iki.fi

10 months ago

In reply to: Andres Freund (#36)

1 attachment(s)

Re: Refactoring postmaster's code to cleanup after child exit

In short, all 4 patches look good to me. Thanks for picking this up!

On 06/03/2025 22:16, Andres Freund wrote:

On 2025-03-05 20:49:33 -0800, Noah Misch wrote:

This behaviour makes it really hard to debug problems. It'd have been a lot
easier to understand the problem if we'd seen psql's stderr before the test
died.

I guess that means at the very least we'd need to put an eval {} around the
->pump() call, print $self->{stdout} and ->{stderr}, and reraise an error?

That sounds right.

In the attached patch I did that for wait_connect(). I did verify that it
works by implementing the wait_connect() fix before fixing
002_connection_limits.pl, which fails if a sleep(1) is added just before the
proc_exit(1) for FATAL.

+1. For the archives' sake, I just want to clarify that this pump stuff
is all about getting better error messages on a test failure. It doesn't
help with the original issue.

This is all annoyingly complicated, but getting good error messages is
worth it.

On 2025-03-05 08:23:32 +0900, Michael Paquier wrote:

Why not add an injection point with a WARNING or a LOG generated, then
check the server logs for the code path taken based on the elog() generated
with the point name?

I think the log_min_messages approach is a lot simpler. If we need something
like this more widely we can reconsider injection points...

+1. It's a little annoying to depend on a detail like the "client
backend process exited" debug message, but seems like the best fix for now.

I also attached a patch to improve connect_fails()/connect_ok() test names a
bit. They weren't symmetric and I felt they were lacking in detail for the
psql return code check.

+1.

While we're at it, attached are a few more cleanups I noticed.

Another annoying and also funny problem I saw is this failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-06%2009%3A18%3A21
2025-03-06 10:42:02.552 UTC [372451][postmaster][:0] LOG: 1800 s is outside the valid range for parameter "authentication_timeout" (1 s .. 600 s)

I had to increase PG_TEST_TIMEOUT_DEFAULT due to some other test timing out
when run under valgrind (due to having to insert a lot of rows). But then this
test runs into the above issue.

The easiest way seems to be to just limit PG_TEST_TIMEOUT_DEFAULT in this
test.

LGTM

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

0001-Fix-test-name-and-username-used-in-failed-connection.patch (text/x-patch; charset=UTF-8)

From a4871adb5de6f363b96e9a2d5723c32330ad1e6e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 6 Mar 2025 22:45:55 +0200
Subject: [PATCH 1/1] Fix test name and username used in failed connection
 attempt

The first failed connection tests the "regular" connections limit, not
the reserved limit.

The username doesn't really matter, but since the previous successful
connections used "regress_reserved", it seems weird to switch back to
"regress_regular" for the expected-to-fail attempt.
---
 src/test/postmaster/t/002_connection_limits.pl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/test/postmaster/t/002_connection_limits.pl b/src/test/postmaster/t/002_connection_limits.pl
index 8cfa6e0ced5..94c087a2751 100644
--- a/src/test/postmaster/t/002_connection_limits.pl
+++ b/src/test/postmaster/t/002_connection_limits.pl
@@ -53,7 +53,7 @@ push(@sessions, background_psql_as_user('regress_regular'));
 push(@sessions, background_psql_as_user('regress_regular'));
 $node->connect_fails(
 	"dbname=postgres user=regress_regular",
-	"reserved_connections limit",
+	"regular connections limit",
 	expected_stderr =>
 	  qr/FATAL:  remaining connection slots are reserved for roles with privileges of the "pg_use_reserved_connections" role/
 );
@@ -61,7 +61,7 @@ $node->connect_fails(
 push(@sessions, background_psql_as_user('regress_reserved'));
 push(@sessions, background_psql_as_user('regress_reserved'));
 $node->connect_fails(
-	"dbname=postgres user=regress_regular",
+	"dbname=postgres user=regress_reserved",
 	"reserved_connections limit",
 	expected_stderr =>
 	  qr/FATAL:  remaining connection slots are reserved for roles with the SUPERUSER attribute/
-- 
2.39.5

#39

Andres Freund

andres@anarazel.de

10 months ago

In reply to: Heikki Linnakangas (#38)

4 attachment(s)

Re: Refactoring postmaster's code to cleanup after child exit

Hi,

On 2025-03-06 22:49:20 +0200, Heikki Linnakangas wrote:

In short, all 4 patches look good to me. Thanks for picking this up!

On 06/03/2025 22:16, Andres Freund wrote:

On 2025-03-05 20:49:33 -0800, Noah Misch wrote:

This behaviour makes it really hard to debug problems. It'd have been a lot
easier to understand the problem if we'd seen psql's stderr before the test
died.

I guess that means at the very least we'd need to put an eval {} around the
->pump() call, print $self->{stdout} and ->{stderr}, and reraise an error?

That sounds right.

In the attached patch I did that for wait_connect(). I did verify that it
works by implementing the wait_connect() fix before fixing
002_connection_limits.pl, which fails if a sleep(1) is added just before the
proc_exit(1) for FATAL.

+1. For the archives' sake, I just want to clarify that this pump stuff is
all about getting better error messages on a test failure. It doesn't help
with the original issue.

Agreed.

This is all annoyingly complicated, but getting good error messages is worth
it.

Yea. I really look forward to having a way to write stuff like this that
doesn't involve hackily driving psql from 100m away using rubber bands.

On 2025-03-05 08:23:32 +0900, Michael Paquier wrote:

Why not add an injection point with a WARNING or a LOG generated, then
check the server logs for the code path taken based on the elog() generated
with the point name?

I think the log_min_messages approach is a lot simpler. If we need something
like this more widely we can reconsider injection points...

+1. It's a little annoying to depend on a detail like the "client backend
process exited" debug message, but seems like the best fix for now.

We use the same message for LOG messages too, for other types of backends, so
I think it's not that likely to change. But still not great.

While we're at it, attached are a few more cleanups I noticed.

I assume you'll apply that yourself?

Commits with updated commit messages attached.

I wonder if we should apply the polishing of connect_ok()/connect_fails() and
the wait_connect() debuggability improvements to the backbranches? Keeping TAP
infrastructure as similar as possible between branches has proven worthwhile
IME.

Greetings,

Andres Freund

Attachments:

v2-0001-tests-Improve-test-names-in-connect_fails-connect.patch (text/x-diff; charset=us-ascii)

From d6dbf4c4a1e723a27df8a08b7e75352b8fb29d05 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 7 Mar 2025 09:44:00 -0500
Subject: [PATCH v2 1/4] tests: Improve test names in
 connect_fails()/connect_ok()

connect_fails() didn't mention that stderr matched, whereas connect_ok() did.

Neither connect_fails() nor connect_ok() mentioned what they were checking
when checking psql's return status.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/ggflhkciwdyotpoie323chu2c2idpjk5qimrn462encwx2io7s@thmcxl7i6dpw
---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index b105cba05a6..883532e1cd3 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2554,7 +2554,7 @@ sub connect_ok
 		connstr => "$connstr",
 		on_error_stop => 0);
 
-	is($ret, 0, $test_name);
+	is($ret, 0, "$test_name: connect succeeds, as expected");
 
 	if (defined($params{expected_stdout}))
 	{
@@ -2619,11 +2619,11 @@ sub connect_fails
 		extra_params => ['-w'],
 		connstr => "$connstr");
 
-	isnt($ret, 0, $test_name);
+	isnt($ret, 0, "$test_name: connect fails, as expected");
 
 	if (defined($params{expected_stderr}))
 	{
-		like($stderr, $params{expected_stderr}, "$test_name: matches");
+		like($stderr, $params{expected_stderr}, "$test_name: stderr matches");
 	}
 
 	$self->log_check($test_name, $log_location, %params);
-- 
2.48.1.76.g4e746b1a31.dirty

v2-0002-tests-Add-note-if-BackgroundPsql-wait_connect-fai.patch (text/x-diff; charset=us-ascii)

From e514fe32c2566c524f1f18410266a1e2efdc7644 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 7 Mar 2025 09:44:00 -0500
Subject: [PATCH v2 2/4] tests: Add note if BackgroundPsql::wait_connect()
 fails

If wait_connect() failed due to psql exiting, all that we'd see is a "process
ended prematurely" error thrown by IPC::Run, without ever seeing psql's error
message.

Address that by wrapping the pump() call in eval and taking note of stdout &
stderr in case of failure.

We might want to do that in pump_until() as well, but that seems to require
API changes, so let's do the easily achievable bit first.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/ggflhkciwdyotpoie323chu2c2idpjk5qimrn462encwx2io7s@thmcxl7i6dpw
---
 .../perl/PostgreSQL/Test/BackgroundPsql.pm    | 26 ++++++++++++++++---
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
index c611a61cf4e..1deb410c133 100644
--- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
+++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
@@ -154,10 +154,28 @@ sub wait_connect
 	my $banner = "background_psql: ready";
 	my $banner_match = qr/(^|\n)$banner\r?\n/;
 	$self->{stdin} .= "\\echo $banner\n\\warn $banner\n";
-	$self->{run}->pump()
-	  until ($self->{stdout} =~ /$banner_match/
-		  && $self->{stderr} =~ /$banner\r?\n/)
-	  || $self->{timeout}->is_expired;
+
+	# IPC::Run throws in case psql exits while we're pumping. To make it
+	# easier to diagnose that, catch the error, report stdout/stderr at time
+	# of death and reraise.
+	eval {
+		$self->{run}->pump()
+		  until ($self->{stdout} =~ /$banner_match/
+			  && $self->{stderr} =~ /$banner\r?\n/)
+		  || $self->{timeout}->is_expired;
+	};
+	if ($@)
+	{
+		chomp(my $stdout = $self->{stdout});
+		chomp(my $stderr = $self->{stderr});
+		chomp(my $err = $@);
+		diag qq(psql died while connecting:
+  stdout: $stdout
+  stderr: $stderr
+  perl error: $err
+);
+		die "psql died while connecting";
+	}
 
 	note "connect output:\n",
 	  explain {
-- 
2.48.1.76.g4e746b1a31.dirty

v2-0003-tests-Fix-race-condition-in-postmaster-002_connec.patch (text/x-diff; charset=us-ascii)

From 6bd317053557030d7d5b1818b1254aebf2230f08 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 7 Mar 2025 09:44:00 -0500
Subject: [PATCH v2 3/4] tests: Fix race condition in
 postmaster/002_connection_limits

The test occasionally failed due to unexpected connection limit errors being
encountered after having waited for FATAL errors on another connection. These
spurious failures were caused by the backend reporting FATAL errors to the
client before detaching from the PGPROC entry. Adding a sleep(1) before
proc_exit() makes it easy to reproduce that problem.

To fix the issue, add a helper function that waits for postmaster to notice
that the process has exited. For now this is implemented by waiting for the
DEBUG2 message that postmaster logs in that case. That's not the prettiest
fix, but it is simple. If we notice this problem elsewhere, it might be worthwhile
to make this more general, e.g. by adding an injection point.

Reported-by: Tomas Vondra <tomas@vondra.me>
Diagnosed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/ggflhkciwdyotpoie323chu2c2idpjk5qimrn462encwx2io7s@thmcxl7i6dpw
---
 .../postmaster/t/002_connection_limits.pl     | 35 +++++++++++++++++--
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/src/test/postmaster/t/002_connection_limits.pl b/src/test/postmaster/t/002_connection_limits.pl
index 8cfa6e0ced5..2c185eef6eb 100644
--- a/src/test/postmaster/t/002_connection_limits.pl
+++ b/src/test/postmaster/t/002_connection_limits.pl
@@ -20,6 +20,7 @@ $node->append_conf('postgresql.conf', "max_connections = 6");
 $node->append_conf('postgresql.conf', "reserved_connections = 2");
 $node->append_conf('postgresql.conf', "superuser_reserved_connections = 1");
 $node->append_conf('postgresql.conf', "log_connections = on");
+$node->append_conf('postgresql.conf', "log_min_messages=debug2");
 $node->start;
 
 $node->safe_psql(
@@ -45,13 +46,39 @@ sub background_psql_as_user
 		extra_params => [ '-U', $user ]);
 }
 
+# Like connect_fails(), except that we also wait for the failed backend to
+# have exited.
+#
+# This test needs to wait for client processes to exit because the error
+# message for a failed connection is reported before the backend has detached
+# from shared memory. If we didn't wait, subsequent tests might hit connection
+# limits spuriously.
+#
+# This can't easily be generalized, as detecting process exit requires
+# log_min_messages to be at least DEBUG2 and is not concurrency safe, as we
+# can't easily be sure the right process exited. In this test that's not a
+# problem though, as we only have one new connection at a time.
+sub connect_fails_wait
+{
+	local $Test::Builder::Level = $Test::Builder::Level + 1;
+	my ($node, $connstr, $test_name, %params) = @_;
+
+	my $log_location = -s $node->logfile;
+
+	$node->connect_fails($connstr, $test_name, %params);
+	$node->wait_for_log(qr/DEBUG:  client backend.*exited with exit code 1/,
+		$log_location);
+	ok(1, "$test_name: client backend process exited");
+}
+
 my @sessions = ();
 my @raw_connections = ();
 
 push(@sessions, background_psql_as_user('regress_regular'));
 push(@sessions, background_psql_as_user('regress_regular'));
 push(@sessions, background_psql_as_user('regress_regular'));
-$node->connect_fails(
+connect_fails_wait(
+	$node,
 	"dbname=postgres user=regress_regular",
 	"reserved_connections limit",
 	expected_stderr =>
@@ -60,7 +87,8 @@ $node->connect_fails(
 
 push(@sessions, background_psql_as_user('regress_reserved'));
 push(@sessions, background_psql_as_user('regress_reserved'));
-$node->connect_fails(
+connect_fails_wait(
+	$node,
 	"dbname=postgres user=regress_regular",
 	"reserved_connections limit",
 	expected_stderr =>
@@ -68,7 +96,8 @@ $node->connect_fails(
 );
 
 push(@sessions, background_psql_as_user('regress_superuser'));
-$node->connect_fails(
+connect_fails_wait(
+	$node,
 	"dbname=postgres user=regress_superuser",
 	"superuser_reserved_connections limit",
 	expected_stderr => qr/FATAL:  sorry, too many clients already/);
-- 
2.48.1.76.g4e746b1a31.dirty
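The wait added in 0003 has two steps: remember the log file's size before attempting the connection (`-s $node->logfile`), then poll the log from that byte offset until the postmaster's DEBUG2 "exited with exit code 1" line appears. The C sketch below illustrates only that polling pattern; it is not the Perl `wait_for_log()` implementation. The names `log_file_size()`, `read_log_from()` and `wait_for_log_substring()` are hypothetical. The sketch matches a plain substring rather than a regex, and the 100 ms poll interval is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX declarations such as nanosleep() */

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Current size of the log in bytes (like Perl's -s), or -1 on error. */
long log_file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long) st.st_size;
}

/*
 * Read everything written to `path` after byte `offset` into a freshly
 * malloc'd, NUL-terminated buffer. Returns NULL on error.
 */
char *read_log_from(const char *path, long offset)
{
    FILE *fp = fopen(path, "rb");
    char *buf = NULL;
    long end = -1;

    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0, SEEK_END) == 0 && (end = ftell(fp)) >= offset &&
        fseek(fp, offset, SEEK_SET) == 0)
    {
        size_t len = (size_t) (end - offset);

        buf = malloc(len + 1);
        if (buf != NULL)
            buf[fread(buf, 1, len, fp)] = '\0';
    }
    fclose(fp);
    return buf;
}

/*
 * Poll until `needle` appears in the log after `offset`, giving up once
 * `timeout_secs` have elapsed. Any matching line counts, whichever process
 * wrote it, so this is only safe when one process of interest exits at a time.
 */
bool wait_for_log_substring(const char *path, long offset,
                            const char *needle, int timeout_secs)
{
    const struct timespec poll_interval = {0, 100 * 1000 * 1000}; /* 100 ms */
    time_t deadline = time(NULL) + timeout_secs;

    assert(path != NULL && needle != NULL && offset >= 0);
    for (;;)
    {
        char *tail = read_log_from(path, offset);
        bool found = tail != NULL && strstr(tail, needle) != NULL;

        free(tail);
        if (found)
            return true;
        if (time(NULL) >= deadline)
            return false;
        nanosleep(&poll_interval, NULL);
    }
}
```

Call `log_file_size()` before triggering the failing connection, then call `wait_for_log_substring(path, offset, "exited with exit code 1", timeout)`. As the comment in the patch says, this approach is not concurrency safe, because any matching exit line written after the offset satisfies the wait. That is acceptable here only because the test opens one new connection at a time.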

v2-0004-tests-Don-t-fail-due-to-high-default-timeout-in-p.patch (text/x-diff; charset=us-ascii)

From f8c5ee4adae736b275185c837a0971a5b2dbbc40 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 7 Mar 2025 09:44:00 -0500
Subject: [PATCH v2 4/4] tests: Don't fail due to high default timeout in
 postmaster/003_start_stop

Some buildfarm animals use very high timeouts due to their slowness. Unfortunately
postmaster/003_start_stop fails if a high timeout is configured, because
authentication_timeout has a fairly low maximum (600 seconds).

As this test is reasonably fast, the easiest fix seems to be to cap the
timeout at 600 seconds.

Per buildfarm animal skink.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/ggflhkciwdyotpoie323chu2c2idpjk5qimrn462encwx2io7s@thmcxl7i6dpw
---
 src/test/postmaster/t/003_start_stop.pl | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/test/postmaster/t/003_start_stop.pl b/src/test/postmaster/t/003_start_stop.pl
index 036b296f72b..4dc394139d9 100644
--- a/src/test/postmaster/t/003_start_stop.pl
+++ b/src/test/postmaster/t/003_start_stop.pl
@@ -20,6 +20,10 @@ use Test::More;
 # "pg_ctl stop" will error out before the authentication timeout kicks
 # in and cleans up the dead-end backends.
 my $authentication_timeout = $PostgreSQL::Test::Utils::timeout_default;
+
+# Don't fail due to hitting the max value allowed for authentication_timeout.
+$authentication_timeout = 600 unless $authentication_timeout < 600;
+
 my $stop_timeout = $authentication_timeout / 2;
 
 # Initialize the server with low connection limits, to test dead-end backends
-- 
2.48.1.76.g4e746b1a31.dirty
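Patch 0004 amounts to clamping the harness timeout to the largest value authentication_timeout accepts (600 seconds), then deriving the stop timeout from the result. The following C sketch shows that arithmetic for illustration only. The names `capped_auth_timeout()` and `stop_timeout_for()` are hypothetical. `stop_timeout_for()` divides as a double to mirror Perl's non-integer division.

```c
#include <assert.h>

/* Largest value (in seconds) that authentication_timeout accepts. */
#define AUTH_TIMEOUT_MAX_SECS 600

/* Equivalent of: $t = 600 unless $t < 600;  i.e. min(t, 600). */
int capped_auth_timeout(int timeout_default)
{
    assert(timeout_default > 0);
    return timeout_default < AUTH_TIMEOUT_MAX_SECS ? timeout_default
                                                   : AUTH_TIMEOUT_MAX_SECS;
}

/* The test gives "pg_ctl stop" half of the authentication timeout. */
double stop_timeout_for(int auth_timeout)
{
    return auth_timeout / 2.0;
}
```

A timeout of 180 seconds passes through unchanged, giving a stop timeout of 90 seconds. A slow animal configured with, say, 3000 seconds gets 600 and 300 instead of an authentication_timeout value the server would reject.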

#40

Tomas Vondra

tomas@vondra.me

10 months ago

In reply to: Andres Freund (#39)

Re: Refactoring postmaster's code to cleanup after child exit

On 3/7/25 15:53, Andres Freund wrote:

Hi,

On 2025-03-06 22:49:20 +0200, Heikki Linnakangas wrote:

In short, all the 4 patches look good to me. Thanks for picking this up!

On 06/03/2025 22:16, Andres Freund wrote:

On 2025-03-05 20:49:33 -0800, Noah Misch wrote:

This behaviour makes it really hard to debug problems. It'd have been a lot
easier to understand the problem if we'd seen psql's stderr before the test
died.

I guess that means at the very least we'd need to put an eval {} around the
->pump() call, print $self->{stdout} and ->{stderr}, and reraise an error?

That sounds right.

In the attached patch I did that for wait_connect(). I did verify that it
works by implementing the wait_connect() fix before fixing
002_connection_limits.pl, which fails if a sleep(1) is added just before the
proc_exit(1) for FATAL.

+1. For the archives' sake, I just want to clarify that this pump stuff is
all about getting better error messages on a test failure. It doesn't help
with the original issue.

Agreed.

FWIW I keep running into this (and skink seems unhappy too). I ended up
just adding a sleep(1), right before

push(@sessions, background_psql_as_user('regress_superuser'));

and that makes it work on all my machines (including rpi5).

regards

--
Tomas Vondra

#41

Andres Freund

andres@anarazel.de

10 months ago

In reply to: Tomas Vondra (#40)

Re: Refactoring postmaster's code to cleanup after child exit

Hi,

On 2025-03-07 16:25:09 +0100, Tomas Vondra wrote:

FWIW I keep running into this (and skink seems unhappy too). I ended up
just adding a sleep(1), right before

push(@sessions, background_psql_as_user('regress_superuser'));

and that makes it work on all my machines (including rpi5).

Can you confirm that the fix attached to my prior email suffices to address
the issue on your machine too? I'm planning to push the fixes soon.

Greetings,

Andres Freund

#42

Tomas Vondra

tomas@vondra.me

10 months ago

In reply to: Andres Freund (#41)

Re: Refactoring postmaster's code to cleanup after child exit

On 3/7/25 16:49, Andres Freund wrote:

Hi,

On 2025-03-07 16:25:09 +0100, Tomas Vondra wrote:

FWIW I keep running into this (and skink seems unhappy too). I ended up
just adding a sleep(1), right before

push(@sessions, background_psql_as_user('regress_superuser'));

and that makes it work on all my machines (including rpi5).

Can you confirm that the fix attached to my prior email suffices to address
the issue on your machine too? I'm planning to push the fixes soon.

Yes, the v2 fixes that too. I got confused by the message suggesting

... this pump stuff is all about getting better error messages
on a test failure. It doesn't help with the original issue.

which made me believe the tests would still fail, so I hadn't tried the
patches before. But that doesn't seem to be the case.

regards

--
Tomas Vondra

#43

Andres Freund

andres@anarazel.de

10 months ago

In reply to: Tomas Vondra (#42)

Re: Refactoring postmaster's code to cleanup after child exit

Hi,

On 2025-03-07 18:03:04 +0100, Tomas Vondra wrote:

On 3/7/25 16:49, Andres Freund wrote:

Hi,

On 2025-03-07 16:25:09 +0100, Tomas Vondra wrote:

FWIW I keep running into this (and skink seems unhappy too). I ended up
just adding a sleep(1), right before

push(@sessions, background_psql_as_user('regress_superuser'));

and that makes it work on all my machines (including rpi5).

Can you confirm that the fix attached to my prior email suffices to address
the issue on your machine too? I'm planning to push the fixes soon.

Yes, the v2 fixes that too.

Cool, thanks for testing.

I got confused by the message suggesting

... this pump stuff is all about getting better error messages
on a test failure. It doesn't help with the original issue.

which made me believe the tests would still fail, so I hadn't tried the
patches before.

That remark was only about 0002 (and 0001): neither of them fixes the race by
itself, and neither is required to fix it. 0002 just makes it easier to
understand what went wrong, that's all...

Greetings,

Andres Freund