Using WaitEventSet in the postmaster

Started by Thomas Munro about 3 years ago, 28 messages
#1 Thomas Munro
thomas.munro@gmail.com
1 attachment(s)

Hi,

Here's a work-in-progress patch that uses WaitEventSet for the main
event loop in the postmaster, with a latch as the wakeup mechanism for
"PM signals" (requests from backends to do things like start a
background worker, etc). There are still raw signals that are part of
the external interface (SIGHUP etc), but those handlers just set a
flag and set the latch, instead of doing the state machine work. Some
advantages I can think of:

1. Inherits various slightly more efficient modern kernel APIs for
multiplexing.
2. Will automatically benefit from later improvements to WaitEventSet.
3. Looks much more like the rest of our code.
4. Requires no obscure signal programming knowledge to understand.
5. Removes the strange call stacks we have, where most of postgres is
forked from inside a signal handler.
6. Might help with weirdness and bugs in some signal implementations
(Cygwin, NetBSD?).
7. Removes the need to stat() PROMOTE_SIGNAL_FILE and
LOGROTATE_SIGNAL_FILE whenever PM signals are sent, now that SIGUSR1
is less overloaded.
8. It's a small step towards removing the need to emulate signals on Windows.

In order to avoid adding a new dependency on the contents of shared
memory, I introduced SetLatchRobustly() that will always use the slow
path kernel wakeup primitive, even in cases where SetLatch() would
not. The idea here is that if one backend trashes shared memory,
other backends can still wake the postmaster even though it may
appear that the postmaster isn't waiting or the latch is already set.
It would be possible to go further and have a robust wait mode that
doesn't read is_set too. It was indecision here that stopped me
proposing this sooner...

One thing that might need more work is cleanup of the PM's WES in
child processes. Also I noticed in passing that the Windows kernel
event handles for latches are probably leaked on crash-reinit, but
that was already true, this just adds one more. Also the way I re-add
the latch every time through the event loop in case there was a
crash-reinit is stupid, I'll tidy that up.

This is something I extracted and rejuvenated from a larger set of
patches I was hacking on a year or two ago to try to get rid of lots
of users of raw signals. The recent thread about mamba's struggles
and the possibility that latches might help reminded me to dust this
part off, and potentially avoid some duplicated effort.

I'm not saying this is free of bugs, but it's passing on CI and seems
like enough to share and see what people think.

(Some other ideas I thought about back then: we could invent
WL_SIGNAL, and not need all those global flag variables or the
handlers that set the latch. For kqueue, for example, it's trivial, and for
ancient Unixes you could do a sort of home-made signalfd with a single
generic signal handler that just does write(self_pipe, &siginfo,
sizeof(siginfo)). But that starts to seem like refactoring for
refactoring's sake; that's probably how I'd write a native kqueue
program, but it's not even that obvious to me that we really should be
pretending that Windows has signals, which put me off that idea.
Perhaps in a post-fake-signals world we just have the postmaster's
event loop directly consume commands from pg_ctl from a control pipe?
Too many decisions at once, I gave that line of thinking up for now.)

Attachments:

0001-Give-the-postmaster-a-WaitEventSet-and-a-latch.patch (text/x-patch; charset=US-ASCII)
From 978aa358885312372f842cd47549bb04a78477ab Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 9 Nov 2022 22:59:58 +1300
Subject: [PATCH] Give the postmaster a WaitEventSet and a latch.

Traditionally, the postmaster's architecture was quite unusual.  It did
its main work entirely inside signal handlers, which were only unblocked
while waiting in select().

Switch to a more typical architecture, where signal handlers just set
flags and use a latch to close races.  Now the postmaster looks like
all other PostgreSQL processes, multiplexing its event processing in
epoll_wait()/kevent()/poll()/WaitForMultipleObjects() depending on the
OS.

Work in progress!
---
 src/backend/libpq/pqsignal.c        |  40 ----
 src/backend/postmaster/postmaster.c | 336 ++++++++++++++--------------
 src/backend/storage/ipc/latch.c     |  54 ++++-
 src/backend/storage/ipc/pmsignal.c  |  26 ++-
 src/backend/utils/init/miscinit.c   |  13 +-
 src/include/libpq/pqsignal.h        |   3 -
 src/include/miscadmin.h             |   1 +
 src/include/storage/latch.h         |   9 +-
 src/include/storage/pmsignal.h      |   3 +
 9 files changed, 259 insertions(+), 226 deletions(-)

diff --git a/src/backend/libpq/pqsignal.c b/src/backend/libpq/pqsignal.c
index 1ab34c5214..718043a39d 100644
--- a/src/backend/libpq/pqsignal.c
+++ b/src/backend/libpq/pqsignal.c
@@ -97,43 +97,3 @@ pqinitmask(void)
 	sigdelset(&StartupBlockSig, SIGALRM);
 #endif
 }
-
-/*
- * Set up a postmaster signal handler for signal "signo"
- *
- * Returns the previous handler.
- *
- * This is used only in the postmaster, which has its own odd approach to
- * signal handling.  For signals with handlers, we block all signals for the
- * duration of signal handler execution.  We also do not set the SA_RESTART
- * flag; this should be safe given the tiny range of code in which the
- * postmaster ever unblocks signals.
- *
- * pqinitmask() must have been invoked previously.
- */
-pqsigfunc
-pqsignal_pm(int signo, pqsigfunc func)
-{
-	struct sigaction act,
-				oact;
-
-	act.sa_handler = func;
-	if (func == SIG_IGN || func == SIG_DFL)
-	{
-		/* in these cases, act the same as pqsignal() */
-		sigemptyset(&act.sa_mask);
-		act.sa_flags = SA_RESTART;
-	}
-	else
-	{
-		act.sa_mask = BlockSig;
-		act.sa_flags = 0;
-	}
-#ifdef SA_NOCLDSTOP
-	if (signo == SIGCHLD)
-		act.sa_flags |= SA_NOCLDSTOP;
-#endif
-	if (sigaction(signo, &act, &oact) < 0)
-		return SIG_ERR;
-	return oact.sa_handler;
-}
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a8a246921f..528ad494c3 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -362,6 +362,13 @@ static volatile sig_atomic_t WalReceiverRequested = false;
 static volatile bool StartWorkerNeeded = true;
 static volatile bool HaveCrashedWorker = false;
 
+/* set when signals arrive */
+static volatile sig_atomic_t pending_reload_request;
+static volatile sig_atomic_t pending_shutdown_request;
+static volatile sig_atomic_t pending_child_exit;
+static volatile sig_atomic_t pending_logrotate_check_request;
+static volatile sig_atomic_t pending_promote_check_request;
+
 #ifdef USE_SSL
 /* Set when and if SSL has been initialized properly */
 static bool LoadedSSL = false;
@@ -380,10 +387,14 @@ static void getInstallationPaths(const char *argv0);
 static void checkControlFile(void);
 static Port *ConnCreate(int serverFd);
 static void ConnFree(Port *port);
-static void SIGHUP_handler(SIGNAL_ARGS);
-static void pmdie(SIGNAL_ARGS);
-static void reaper(SIGNAL_ARGS);
-static void sigusr1_handler(SIGNAL_ARGS);
+static void handle_file_check_request_signal(SIGNAL_ARGS);
+static void handle_reload_request_signal(SIGNAL_ARGS);
+static void process_reload_request(void);
+static void handle_shutdown_request_signal(SIGNAL_ARGS);
+static void process_shutdown_request(void);
+static void handle_child_exit_signal(SIGNAL_ARGS);
+static void process_child_exit(void);
+static void process_backend_request(void);
 static void process_startup_packet_die(SIGNAL_ARGS);
 static void dummy_handler(SIGNAL_ARGS);
 static void StartupPacketTimeoutHandler(void);
@@ -401,7 +412,6 @@ static int	BackendStartup(Port *port);
 static int	ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done);
 static void SendNegotiateProtocolVersion(List *unrecognized_protocol_options);
 static void processCancelRequest(Port *port, void *pkt);
-static int	initMasks(fd_set *rmask);
 static void report_fork_failure_to_client(Port *port, int errnum);
 static CAC_state canAcceptConnections(int backend_type);
 static bool RandomCancelKey(int32 *cancel_key);
@@ -575,6 +585,7 @@ PostmasterMain(int argc, char *argv[])
 
 	IsPostmasterEnvironment = true;
 
+
 	/*
 	 * Start our win32 signal implementation
 	 */
@@ -609,26 +620,6 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * Set up signal handlers for the postmaster process.
 	 *
-	 * In the postmaster, we use pqsignal_pm() rather than pqsignal() (which
-	 * is used by all child processes and client processes).  That has a
-	 * couple of special behaviors:
-	 *
-	 * 1. We tell sigaction() to block all signals for the duration of the
-	 * signal handler.  This is faster than our old approach of
-	 * blocking/unblocking explicitly in the signal handler, and it should also
-	 * prevent excessive stack consumption if signals arrive quickly.
-	 *
-	 * 2. We do not set the SA_RESTART flag.  This is because signals will be
-	 * blocked at all times except when ServerLoop is waiting for something to
-	 * happen, and during that window, we want signals to exit the select(2)
-	 * wait so that ServerLoop can respond if anything interesting happened.
-	 * On some platforms, signals marked SA_RESTART would not cause the
-	 * select() wait to end.
-	 *
-	 * Child processes will generally want SA_RESTART, so pqsignal() sets that
-	 * flag.  We expect children to set up their own handlers before
-	 * unblocking signals.
-	 *
 	 * CAUTION: when changing this list, check for side-effects on the signal
 	 * handling setup of child processes.  See tcop/postgres.c,
 	 * bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/walwriter.c,
@@ -638,26 +629,21 @@ PostmasterMain(int argc, char *argv[])
 	pqinitmask();
 	PG_SETMASK(&BlockSig);
 
-	pqsignal_pm(SIGHUP, SIGHUP_handler);	/* reread config file and have
-											 * children do same */
-	pqsignal_pm(SIGINT, pmdie); /* send SIGTERM and shut down */
-	pqsignal_pm(SIGQUIT, pmdie);	/* send SIGQUIT and die */
-	pqsignal_pm(SIGTERM, pmdie);	/* wait for children and shut down */
-	pqsignal_pm(SIGALRM, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGPIPE, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGUSR1, sigusr1_handler);	/* message from child process */
-	pqsignal_pm(SIGUSR2, dummy_handler);	/* unused, reserve for children */
-	pqsignal_pm(SIGCHLD, reaper);	/* handle child termination */
+	pqsignal(SIGHUP, handle_reload_request_signal);
+	pqsignal(SIGINT, handle_shutdown_request_signal);
+	pqsignal(SIGQUIT, handle_shutdown_request_signal);
+	pqsignal(SIGTERM, handle_shutdown_request_signal);
+	pqsignal(SIGALRM, SIG_IGN);	/* ignored */
+	pqsignal(SIGPIPE, SIG_IGN);	/* ignored */
+	pqsignal(SIGUSR1, handle_file_check_request_signal);
+	pqsignal(SIGUSR2, dummy_handler);	/* unused, reserve for children */
+	pqsignal(SIGCHLD, handle_child_exit_signal);
 
-#ifdef SIGURG
+	/* This may configure SIGURG, depending on platform. */
+	InitializeLatchSupport();
+	InitLocalLatch();
 
-	/*
-	 * Ignore SIGURG for now.  Child processes may change this (see
-	 * InitializeLatchSupport), but they will not receive any such signals
-	 * until they wait on a latch.
-	 */
-	pqsignal_pm(SIGURG, SIG_IGN);	/* ignored */
-#endif
+	PG_SETMASK(&UnBlockSig);
 
 	/*
 	 * No other place in Postgres should touch SIGTTIN/SIGTTOU handling.  We
@@ -667,15 +653,15 @@ PostmasterMain(int argc, char *argv[])
 	 * child processes should just allow the inherited settings to stand.
 	 */
 #ifdef SIGTTIN
-	pqsignal_pm(SIGTTIN, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTIN, SIG_IGN);	/* ignored */
 #endif
 #ifdef SIGTTOU
-	pqsignal_pm(SIGTTOU, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTOU, SIG_IGN);	/* ignored */
 #endif
 
 	/* ignore SIGXFSZ, so that ulimit violations work like disk full */
 #ifdef SIGXFSZ
-	pqsignal_pm(SIGXFSZ, SIG_IGN);	/* ignored */
+	pqsignal(SIGXFSZ, SIG_IGN);	/* ignored */
 #endif
 
 	/*
@@ -1706,97 +1692,90 @@ DetermineSleepTime(struct timeval *timeout)
 static int
 ServerLoop(void)
 {
-	fd_set		readmask;
-	int			nSockets;
 	time_t		last_lockfile_recheck_time,
 				last_touch_time;
+	WaitEventSet *wes;
+	WaitEvent	events[MAXLISTEN];
+	int			nevents;
 
-	last_lockfile_recheck_time = last_touch_time = time(NULL);
+	/* Set up a WaitEventSet for our latch and listening sockets. */
+	MyLatch = GetPostmasterLatch();
+	OwnLatch(MyLatch);
+	wes = CreateWaitEventSet(CurrentMemoryContext, 1 + MAXLISTEN);
+	AddWaitEventToSet(wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
+	for (int i = 0; i < MAXLISTEN; i++)
+	{
+		int			fd = ListenSocket[i];
 
-	nSockets = initMasks(&readmask);
+		if (fd == PGINVALID_SOCKET)
+			break;
+		AddWaitEventToSet(wes, WL_SOCKET_ACCEPT, fd, NULL, NULL);
+	}
+
+	last_lockfile_recheck_time = last_touch_time = time(NULL);
 
 	for (;;)
 	{
-		fd_set		rmask;
-		int			selres;
 		time_t		now;
 
 		/*
 		 * Wait for a connection request to arrive.
 		 *
-		 * We block all signals except while sleeping. That makes it safe for
-		 * signal handlers, which again block all signals while executing, to
-		 * do nontrivial work.
-		 *
 		 * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
-		 * any new connections, so we don't call select(), and just sleep.
+		 * any new connections, so we just sleep.
 		 */
-		memcpy((char *) &rmask, (char *) &readmask, sizeof(fd_set));
-
 		if (pmState == PM_WAIT_DEAD_END)
 		{
-			PG_SETMASK(&UnBlockSig);
-
 			pg_usleep(100000L); /* 100 msec seems reasonable */
-			selres = 0;
-
-			PG_SETMASK(&BlockSig);
+			nevents = 0;
 		}
 		else
 		{
-			/* must set timeout each time; some OSes change it! */
 			struct timeval timeout;
 
-			/* Needs to run with blocked signals! */
 			DetermineSleepTime(&timeout);
 
-			PG_SETMASK(&UnBlockSig);
-
-			selres = select(nSockets, &rmask, NULL, NULL, &timeout);
-
-			PG_SETMASK(&BlockSig);
+			ModifyWaitEvent(wes, 0, WL_LATCH_SET, MyLatch);		/* XXX because address changes on crash! */
+			nevents = WaitEventSetWait(wes,
+									   timeout.tv_sec * 1000 + timeout.tv_usec / 1000,
+									   events,
+									   lengthof(events),
+									   0 /* postmaster posts no wait_events */);
 		}
 
-		/* Now check the select() result */
-		if (selres < 0)
-		{
-			if (errno != EINTR && errno != EWOULDBLOCK)
-			{
-				ereport(LOG,
-						(errcode_for_socket_access(),
-						 errmsg("select() failed in postmaster: %m")));
-				return STATUS_ERROR;
-			}
-		}
+		if (pending_reload_request)
+			process_reload_request();
+		if (pending_shutdown_request)
+			process_shutdown_request();
+		if (pending_child_exit)
+			process_child_exit();
 
 		/*
-		 * New connection pending on any of our sockets? If so, fork a child
-		 * process to deal with it.
+		 * Latch set, or new connection pending on any of our sockets? If so,
+		 * fork a child process to deal with it.
 		 */
-		if (selres > 0)
+		for (int i = 0; i < nevents; i++)
 		{
-			int			i;
-
-			for (i = 0; i < MAXLISTEN; i++)
+			if (events[i].events & WL_LATCH_SET)
 			{
-				if (ListenSocket[i] == PGINVALID_SOCKET)
-					break;
-				if (FD_ISSET(ListenSocket[i], &rmask))
+				ResetLatch(MyLatch);
+				process_backend_request();
+			}
+			else if (events[i].events & WL_SOCKET_ACCEPT)
+			{
+				Port	   *port;
+
+				port = ConnCreate(events[i].fd);
+				if (port)
 				{
-					Port	   *port;
+					BackendStartup(port);
 
-					port = ConnCreate(ListenSocket[i]);
-					if (port)
-					{
-						BackendStartup(port);
-
-						/*
-						 * We no longer need the open socket or port structure
-						 * in this process
-						 */
-						StreamClose(port->sock);
-						ConnFree(port);
-					}
+					/*
+					 * We no longer need the open socket or port structure
+					 * in this process
+					 */
+					StreamClose(port->sock);
+					ConnFree(port);
 				}
 			}
 		}
@@ -1939,34 +1918,6 @@ ServerLoop(void)
 	}
 }
 
-/*
- * Initialise the masks for select() for the ports we are listening on.
- * Return the number of sockets to listen on.
- */
-static int
-initMasks(fd_set *rmask)
-{
-	int			maxsock = -1;
-	int			i;
-
-	FD_ZERO(rmask);
-
-	for (i = 0; i < MAXLISTEN; i++)
-	{
-		int			fd = ListenSocket[i];
-
-		if (fd == PGINVALID_SOCKET)
-			break;
-		FD_SET(fd, rmask);
-
-		if (fd > maxsock)
-			maxsock = fd;
-	}
-
-	return maxsock + 1;
-}
-
-
 /*
  * Read a client's startup packet and do something according to it.
  *
@@ -2707,14 +2658,41 @@ InitProcessGlobals(void)
 #endif
 }
 
+/*
+ * pg_ctl uses SIGUSR1 to ask postmaster to check for logrotate and promote
+ * files.
+ */
+static void
+handle_file_check_request_signal(SIGNAL_ARGS)
+{
+	int save_errno = errno;
+
+	/* Schedule a check for both signal files. */
+	pending_logrotate_check_request = true;
+	pending_promote_check_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+static void
+handle_reload_request_signal(SIGNAL_ARGS)
+{
+	int save_errno = errno;
+
+	pending_reload_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
 
 /*
- * SIGHUP -- reread config files, and tell children to do same
+ * Re-read config files, and tell children to do same.
  */
 static void
-SIGHUP_handler(SIGNAL_ARGS)
+process_reload_request(void)
 {
-	int			save_errno = errno;
+	pending_reload_request = false;
 
 	if (Shutdown <= SmartShutdown)
 	{
@@ -2771,27 +2749,46 @@ SIGHUP_handler(SIGNAL_ARGS)
 		write_nondefault_variables(PGC_SIGHUP);
 #endif
 	}
-
-	errno = save_errno;
 }
 
-
 /*
- * pmdie -- signal handler for processing various postmaster signals.
+ * Handler for the three shutdown signals.
  */
 static void
-pmdie(SIGNAL_ARGS)
+handle_shutdown_request_signal(SIGNAL_ARGS)
 {
-	int			save_errno = errno;
-
-	ereport(DEBUG2,
-			(errmsg_internal("postmaster received signal %d",
-							 postgres_signal_arg)));
+	int save_errno = errno;
 
 	switch (postgres_signal_arg)
 	{
 		case SIGTERM:
+			pending_shutdown_request = SmartShutdown;
+			break;
+		case SIGINT:
+			pending_shutdown_request = FastShutdown;
+			break;
+		case SIGQUIT:
+			pending_shutdown_request = ImmediateShutdown;
+			break;
+	}
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Process shutdown request.
+ */
+static void
+process_shutdown_request(void)
+{
+	int		mode = pending_shutdown_request;
 
+	pending_shutdown_request = NoShutdown;
+
+	switch (mode)
+	{
+		case SmartShutdown:
 			/*
 			 * Smart Shutdown:
 			 *
@@ -2830,8 +2827,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGINT:
-
+		case FastShutdown:
 			/*
 			 * Fast Shutdown:
 			 *
@@ -2871,8 +2867,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGQUIT:
-
+		case ImmediateShutdown:
 			/*
 			 * Immediate Shutdown:
 			 *
@@ -2908,20 +2903,30 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 	}
+}
+
+static void
+handle_child_exit_signal(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	pending_child_exit = true;
+	SetLatch(MyLatch);
 
 	errno = save_errno;
 }
 
 /*
- * Reaper -- signal handler to cleanup after a child process dies.
+ * Cleanup after a child process dies.
  */
 static void
-reaper(SIGNAL_ARGS)
+process_child_exit(void)
 {
-	int			save_errno = errno;
 	int			pid;			/* process id of dead child process */
 	int			exitstatus;		/* its exit status */
 
+	pending_child_exit = false;
+
 	ereport(DEBUG4,
 			(errmsg_internal("reaping dead processes")));
 
@@ -3213,8 +3218,6 @@ reaper(SIGNAL_ARGS)
 	 * or actions to make.
 	 */
 	PostmasterStateMachine();
-
-	errno = save_errno;
 }
 
 /*
@@ -3642,8 +3645,9 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 /*
  * Advance the postmaster's state machine and take actions as appropriate
  *
- * This is common code for pmdie(), reaper() and sigusr1_handler(), which
- * receive the signals that might mean we need to change state.
+ * This is common code for process_shutdown_request(), process_child_exit() and
+ * process_backend_request(), which process the signals and latches that might
+ * mean we need to change state.
  */
 static void
 PostmasterStateMachine(void)
@@ -3899,6 +3903,9 @@ PostmasterStateMachine(void)
 		/* re-create shared memory and semaphores */
 		CreateSharedMemoryAndSemaphores();
 
+		MyLatch = GetPostmasterLatch();
+		OwnLatch(MyLatch);
+
 		StartupPID = StartupDataBase();
 		Assert(StartupPID != 0);
 		StartupStatus = STARTUP_RUNNING;
@@ -4094,6 +4101,7 @@ BackendStartup(Port *port)
 	/* Hasn't asked to be notified about any bgworkers yet */
 	bn->bgworker_notify = false;
 
+	PG_SETMASK(&BlockSig);
 #ifdef EXEC_BACKEND
 	pid = backend_forkexec(port);
 #else							/* !EXEC_BACKEND */
@@ -4137,6 +4145,7 @@ BackendStartup(Port *port)
 		ereport(LOG,
 				(errmsg("could not fork new process for connection: %m")));
 		report_fork_failure_to_client(port, save_errno);
+		PG_SETMASK(&UnBlockSig);
 		return STATUS_ERROR;
 	}
 
@@ -4158,6 +4167,7 @@ BackendStartup(Port *port)
 		ShmemBackendArrayAdd(bn);
 #endif
 
+	PG_SETMASK(&UnBlockSig);
 	return STATUS_OK;
 }
 
@@ -5013,13 +5023,11 @@ ExitPostmaster(int status)
 }
 
 /*
- * sigusr1_handler - handle signal conditions from child processes
+ * Handle pmsignal conditions representing requests from backends.
  */
 static void
-sigusr1_handler(SIGNAL_ARGS)
+process_backend_request(void)
 {
-	int			save_errno = errno;
-
 	/*
 	 * RECOVERY_STARTED and BEGIN_HOT_STANDBY signals are ignored in
 	 * unexpected states. If the startup process quickly starts up, completes
@@ -5088,8 +5096,9 @@ sigusr1_handler(SIGNAL_ARGS)
 		maybe_start_bgworkers();
 
 	/* Tell syslogger to rotate logfile if requested */
-	if (SysLoggerPID != 0)
+	if (SysLoggerPID != 0 && pending_logrotate_check_request)
 	{
+		pending_logrotate_check_request = false;
 		if (CheckLogrotateSignal())
 		{
 			signal_child(SysLoggerPID, SIGUSR1);
@@ -5149,7 +5158,8 @@ sigusr1_handler(SIGNAL_ARGS)
 	if (StartupPID != 0 &&
 		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
 		 pmState == PM_HOT_STANDBY) &&
-		CheckPromoteSignal())
+		pending_promote_check_request &&
+		(pending_promote_check_request = false, CheckPromoteSignal()))
 	{
 		/*
 		 * Tell startup process to finish recovery.
@@ -5159,8 +5169,6 @@ sigusr1_handler(SIGNAL_ARGS)
 		 */
 		signal_child(StartupPID, SIGUSR2);
 	}
-
-	errno = save_errno;
 }
 
 /*
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index eb3a569aae..a27c6a1752 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -197,6 +197,8 @@ static void WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event);
 static inline int WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 										WaitEvent *occurred_events, int nevents);
 
+static void WakeLatch(Latch *latch);
+
 /*
  * Initialize the process-local latch infrastructure.
  *
@@ -283,6 +285,17 @@ InitializeLatchSupport(void)
 #ifdef WAIT_USE_SIGNALFD
 	sigset_t	signalfd_mask;
 
+	if (IsUnderPostmaster)
+	{
+		if (signal_fd != -1)
+		{
+			/* Release postmaster's signal FD; ignore any error */
+			(void) close(signal_fd);
+			signal_fd = -1;
+			ReleaseExternalFD();
+		}
+	}
+
 	/* Block SIGURG, because we'll receive it through a signalfd. */
 	sigaddset(&UnBlockSig, SIGURG);
 
@@ -590,12 +603,6 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 void
 SetLatch(Latch *latch)
 {
-#ifndef WIN32
-	pid_t		owner_pid;
-#else
-	HANDLE		handle;
-#endif
-
 	/*
 	 * The memory barrier has to be placed here to ensure that any flag
 	 * variables possibly changed by this process have been flushed to main
@@ -613,6 +620,33 @@ SetLatch(Latch *latch)
 	if (!latch->maybe_sleeping)
 		return;
 
+	WakeLatch(latch);
+}
+
+/*
+ * A variant of SetLatch() used when waking the postmaster.  This skips the
+ * optimizations that normally avoid a system call if the owner isn't currently
+ * waiting or the latch is already set.  This is intended for waking the
+ * postmaster, which couldn't often benefit from such optimizations anyway
+ * because it spends its whole time waiting.  This should reduce opportunities
+ * for memory corruption to prevent the delivery of a wakeup.
+ */
+void
+SetLatchRobustly(Latch *latch)
+{
+	latch->is_set = true;
+	WakeLatch(latch);
+}
+
+static void
+WakeLatch(Latch *latch)
+{
+#ifndef WIN32
+	pid_t		owner_pid;
+#else
+	HANDLE		handle;
+#endif
+
 #ifndef WIN32
 
 	/*
@@ -1312,6 +1346,8 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event)
 			flags |= FD_WRITE;
 		if (event->events & WL_SOCKET_CONNECTED)
 			flags |= FD_CONNECT;
+		if (event->events & WL_SOCKET_ACCEPT)
+			flags |= FD_ACCEPT;
 
 		if (*handle == WSA_INVALID_EVENT)
 		{
@@ -2067,6 +2103,12 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 			/* connected */
 			occurred_events->events |= WL_SOCKET_CONNECTED;
 		}
+		if ((cur_event->events & WL_SOCKET_ACCEPT) &&
+			(resEvents.lNetworkEvents & FD_ACCEPT))
+		{
+			/* connected */
+			occurred_events->events |= WL_SOCKET_ACCEPT;
+		}
 		if (resEvents.lNetworkEvents & FD_CLOSE)
 		{
 			/* EOF/error, so signal all caller-requested socket flags */
diff --git a/src/backend/storage/ipc/pmsignal.c b/src/backend/storage/ipc/pmsignal.c
index c85521d364..7a7bf593c4 100644
--- a/src/backend/storage/ipc/pmsignal.c
+++ b/src/backend/storage/ipc/pmsignal.c
@@ -24,13 +24,14 @@
 #include "miscadmin.h"
 #include "postmaster/postmaster.h"
 #include "replication/walsender.h"
+#include "storage/latch.h"
 #include "storage/pmsignal.h"
 #include "storage/shmem.h"
 #include "utils/memutils.h"
 
 
 /*
- * The postmaster is signaled by its children by sending SIGUSR1.  The
+ * The postmaster is signaled by its children by setting its latch.  The
  * specific reason is communicated via flags in shared memory.  We keep
  * a boolean flag for each possible "reason", so that different reasons
  * can be signaled by different backends at the same time.  (However,
@@ -70,17 +71,19 @@
 /* "typedef struct PMSignalData PMSignalData" appears in pmsignal.h */
 struct PMSignalData
 {
+	/* latch for waking postmaster */
+	Latch		pm_latch;
 	/* per-reason flags for signaling the postmaster */
-	sig_atomic_t PMSignalFlags[NUM_PMSIGNALS];
+	volatile sig_atomic_t PMSignalFlags[NUM_PMSIGNALS];
 	/* global flags for signals from postmaster to children */
-	QuitSignalReason sigquit_reason;	/* why SIGQUIT was sent */
+	volatile QuitSignalReason sigquit_reason;	/* why SIGQUIT was sent */
 	/* per-child-process flags */
 	int			num_child_flags;	/* # of entries in PMChildFlags[] */
-	sig_atomic_t PMChildFlags[FLEXIBLE_ARRAY_MEMBER];
+	volatile sig_atomic_t PMChildFlags[FLEXIBLE_ARRAY_MEMBER];
 };
 
 /* PMSignalState pointer is valid in both postmaster and child processes */
-NON_EXEC_STATIC volatile PMSignalData *PMSignalState = NULL;
+NON_EXEC_STATIC PMSignalData *PMSignalState = NULL;
 
 /*
  * These static variables are valid only in the postmaster.  We keep a
@@ -151,9 +154,10 @@ PMSignalShmemInit(void)
 	if (!found)
 	{
 		/* initialize all flags to zeroes */
-		MemSet(unvolatize(PMSignalData *, PMSignalState), 0, PMSignalShmemSize());
+		MemSet(PMSignalState, 0, PMSignalShmemSize());
 		num_child_inuse = MaxLivePostmasterChildren();
 		PMSignalState->num_child_flags = num_child_inuse;
+		InitSharedLatch(&PMSignalState->pm_latch);
 
 		/*
 		 * Also allocate postmaster's private PMChildInUse[] array.  We
@@ -186,13 +190,13 @@ SendPostmasterSignal(PMSignalReason reason)
 	/* Atomically set the proper flag */
 	PMSignalState->PMSignalFlags[reason] = true;
 	/* Send signal to postmaster */
-	kill(PostmasterPid, SIGUSR1);
+	SetLatchRobustly(&PMSignalState->pm_latch);
 }
 
 /*
  * CheckPostmasterSignal - check to see if a particular reason has been
  * signaled, and clear the signal flag.  Should be called by postmaster
- * after receiving SIGUSR1.
+ * after its latch is set.
  */
 bool
 CheckPostmasterSignal(PMSignalReason reason)
@@ -206,6 +210,12 @@ CheckPostmasterSignal(PMSignalReason reason)
 	return false;
 }
 
+Latch *
+GetPostmasterLatch(void)
+{
+	return &PMSignalState->pm_latch;
+}
+
 /*
  * SetQuitSignalReason - broadcast the reason for a system shutdown.
  * Should be called by postmaster before sending SIGQUIT to children.
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index eb1046450b..1348261220 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -135,8 +135,7 @@ InitPostmasterChild(void)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -189,8 +188,7 @@ InitStandaloneProcess(const char *argv0)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -232,6 +230,13 @@ SwitchToSharedLatch(void)
 	SetLatch(MyLatch);
 }
 
+void
+InitLocalLatch(void)
+{
+	MyLatch = &LocalLatchData;
+	InitLatch(MyLatch);
+}
+
 void
 SwitchBackToLocalLatch(void)
 {
diff --git a/src/include/libpq/pqsignal.h b/src/include/libpq/pqsignal.h
index 7890b426a8..76eb380a4f 100644
--- a/src/include/libpq/pqsignal.h
+++ b/src/include/libpq/pqsignal.h
@@ -53,7 +53,4 @@ extern PGDLLIMPORT sigset_t StartupBlockSig;
 
 extern void pqinitmask(void);
 
-/* pqsigfunc is declared in src/include/port.h */
-extern pqsigfunc pqsignal_pm(int signo, pqsigfunc func);
-
 #endif							/* PQSIGNAL_H */
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 795182fa51..0975867197 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -310,6 +310,7 @@ extern PGDLLIMPORT char *DatabasePath;
 /* now in utils/init/miscinit.c */
 extern void InitPostmasterChild(void);
 extern void InitStandaloneProcess(const char *argv0);
+extern void InitLocalLatch(void);
 extern void SwitchToSharedLatch(void);
 extern void SwitchBackToLocalLatch(void);
 
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 68ab740f16..2e8b64cef2 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -135,10 +135,16 @@ typedef struct Latch
 #define WL_SOCKET_CONNECTED  WL_SOCKET_WRITEABLE
 #endif
 #define WL_SOCKET_CLOSED 	 (1 << 7)
+#ifdef WIN32
+#define WL_SOCKET_ACCEPT	 (1 << 8)
+#else
+#define WL_SOCKET_ACCEPT	 WL_SOCKET_READABLE
+#endif
 #define WL_SOCKET_MASK		(WL_SOCKET_READABLE | \
 							 WL_SOCKET_WRITEABLE | \
 							 WL_SOCKET_CONNECTED | \
-							 WL_SOCKET_CLOSED)
+							 WL_SOCKET_CLOSED | \
+							 WL_SOCKET_ACCEPT)
 
 typedef struct WaitEvent
 {
@@ -163,6 +169,7 @@ extern void InitSharedLatch(Latch *latch);
 extern void OwnLatch(Latch *latch);
 extern void DisownLatch(Latch *latch);
 extern void SetLatch(Latch *latch);
+extern void SetLatchRobustly(Latch *latch);
 extern void ResetLatch(Latch *latch);
 extern void ShutdownLatchSupport(void);
 
diff --git a/src/include/storage/pmsignal.h b/src/include/storage/pmsignal.h
index 58f4ddf476..ba657f716b 100644
--- a/src/include/storage/pmsignal.h
+++ b/src/include/storage/pmsignal.h
@@ -24,6 +24,8 @@
 #include "sys/procctl.h"
 #endif
 
+struct Latch;
+
 /*
  * Reasons for signaling the postmaster.  We can cope with simultaneous
  * signals for different reasons.  If the same reason is signaled multiple
@@ -74,6 +76,7 @@ extern void MarkPostmasterChildInactive(void);
 extern void MarkPostmasterChildWalSender(void);
 extern bool PostmasterIsAliveInternal(void);
 extern void PostmasterDeathSignalInit(void);
+extern struct Latch *GetPostmasterLatch(void);
 
 
 /*
-- 
2.38.1

#2 Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#1)
Re: Using WaitEventSet in the postmaster

Hi,

On 2022-12-02 10:12:25 +1300, Thomas Munro wrote:

> Here's a work-in-progress patch that uses WaitEventSet for the main
> event loop in the postmaster

Wee!

> with a latch as the wakeup mechanism for "PM signals" (requests from
> backends to do things like start a background worker, etc).

Hm - is that directly related? ISTM that using a WES in the main loop, and
changing pmsignal.c to a latch are somewhat separate things?

Using a latch for pmsignal.c seems like a larger lift, because it means that
all of latch.c needs to be robust against a corrupted struct Latch.

In order to avoid adding a new dependency on the contents of shared
memory, I introduced SetLatchRobustly() that will always use the slow
path kernel wakeup primitive, even in cases where SetLatch() would
not. The idea here is that if one backend trashes shared memory,
other backends can still wake the postmaster even though it may
appear that the postmaster isn't waiting or the latch is already set.

Why is that a concern that needs to be addressed?

ISTM that the important thing is that either a) the postmaster's latch can't
be corrupted, because it's not shared with backends or b) struct Latch can be
overwritten with random contents without causing additional problems in
postmaster.

I don't think b) is the case as the patch stands. Imagine some process
overwriting pm_latch->owner_pid. That'd then break the SetLatch() in
postmaster's signal handler, because it wouldn't realize that it itself needs to
be woken up, and we'd just signal some random process.

It doesn't seem trivial (but not impossible either) to make SetLatch() robust
against arbitrary corruption. So it seems easier to me to just put the latch
in process local memory, and do a SetLatch() in postmaster's SIGUSR1 handler.
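A minimal POSIX sketch of that arrangement, under stated assumptions: the "latch" is a process-local flag plus a non-blocking self-pipe (one of the wakeup mechanisms a latch can use on Unix), and the SIGUSR1 handler only records the request and wakes the main loop. The names `local_latch_init()`, `local_set_latch()`, `wait_for_latch()` and `pending_action_request` are illustrative, not PostgreSQL's API.

```c
#define _XOPEN_SOURCE 700		/* sigaction(), SA_RESTART, poll() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int	selfpipe[2] = {-1, -1};	/* [0] read end, [1] write end */
static volatile sig_atomic_t latch_is_set;
static volatile sig_atomic_t pending_action_request;

/* Async-signal-safe "set latch" for a latch in process-local memory. */
static void
local_set_latch(void)
{
	int			save_errno = errno;
	ssize_t		rc;

	latch_is_set = 1;
	rc = write(selfpipe[1], "x", 1);	/* EAGAIN is fine: a byte is already queued */
	(void) rc;
	errno = save_errno;
}

/* SIGUSR1 handler: record the request and wake the main loop, nothing more. */
static void
handle_sigusr1(int signo)
{
	(void) signo;
	pending_action_request = 1;
	local_set_latch();
}

/* Create the non-blocking self-pipe and install the handler. */
static int
local_latch_init(void)
{
	struct sigaction sa;

	if (pipe(selfpipe) != 0)
		return -1;
	for (int i = 0; i < 2; i++)
		if (fcntl(selfpipe[i], F_SETFL, O_NONBLOCK) != 0)
			return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = handle_sigusr1;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;
	return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Sleep until the latch is set or timeout_ms elapses.  Returns 1 if the
 * latch is set, 0 on timeout, -1 on error.  A signal arriving after the
 * latch_is_set test but before poll() still wakes us: its byte is in the pipe.
 */
static int
wait_for_latch(int timeout_ms)
{
	struct pollfd pfd = {.fd = selfpipe[0], .events = POLLIN};
	char		buf[64];

	assert(selfpipe[0] >= 0);
	if (!latch_is_set && poll(&pfd, 1, timeout_ms) < 0 && errno != EINTR)
		return -1;
	while (read(selfpipe[0], buf, sizeof(buf)) > 0)
		;						/* drain so the next poll() can block */
	return latch_is_set ? 1 : 0;
}
```

The caller then clears latch_is_set (the ResetLatch() step) before testing and clearing each pending flag, so a signal that lands while work is being processed sets the latch again and is handled on the next iteration instead of being lost.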

Greetings,

Andres Freund

#3Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#2)
Re: Using WaitEventSet in the postmaster

On Fri, Dec 2, 2022 at 2:40 PM Andres Freund <andres@anarazel.de> wrote:

On 2022-12-02 10:12:25 +1300, Thomas Munro wrote:

with a latch as the wakeup mechanism for "PM signals" (requests from
backends to do things like start a background worker, etc).

Hm - is that directly related? ISTM that using a WES in the main loop, and
changing pmsignal.c to a latch are somewhat separate things?

Yeah, that's a good question. This comes from a larger patch set
where my *goal* was to use latches everywhere possible for
interprocess wakeups, but it does indeed make a lot of sense to do the
postmaster WaitEventSet retrofit completely independently of that, and
leaving the associated robustness problems for later proposals (the
posted patch clearly fails to solve them).

I don't think b) is the case as the patch stands. Imagine some process
overwriting pm_latch->owner_pid. That'd then break the SetLatch() in
postmaster's signal handler, because it wouldn't realize that it itself needs to
be woken up, and we'd just signal some random process.

Right. At some point I had an idea about a non-shared table of
latches where OS-specific things like pids and HANDLEs live, so only
the maybe_waiting and is_set flags are in shared memory, and even
those are ignored when accessing the latch in 'robust' mode (they're
only optimisations after all). I didn't try it though. First you
might have to switch to a model with a finite set of latches
identified by index, or something like that. But I like your idea of
separating that whole problem.
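The non-shared latch table idea was not tried in this thread, so the following is purely a hypothetical sketch of its shape: shared memory holds only the is_set/maybe_waiting optimisation flags for a fixed set of latches identified by index, while the wakeup targets (here pipe write ends standing in for pids or Windows HANDLEs) live in a process-private table that other processes cannot corrupt. `RobustLatchShared`, `local_wakeup_fd`, `set_latch()` and `set_latch_robustly()` are illustrative names only.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

#define NUM_LATCHES 8			/* hypothetical fixed set, identified by index */

/* In shared memory: optimisation hints only, possibly garbage. */
typedef struct RobustLatchShared
{
	volatile bool is_set;
	volatile bool maybe_waiting;
} RobustLatchShared;

/*
 * Process-private wakeup targets, e.g. set up before fork() so that every
 * child inherits its own copy.
 */
static int	local_wakeup_fd[NUM_LATCHES];

/* Ordinary path: trusts the shared flags to avoid a syscall.
 * Returns true if a kernel wakeup was sent. */
static bool
set_latch(RobustLatchShared *shared, int latchno)
{
	RobustLatchShared *l;

	assert(latchno >= 0 && latchno < NUM_LATCHES);
	l = &shared[latchno];
	if (l->is_set)
		return false;
	l->is_set = true;
	if (!l->maybe_waiting)
		return false;
	return write(local_wakeup_fd[latchno], "x", 1) == 1;
}

/* Robust path: never reads shared memory before deciding to wake. */
static bool
set_latch_robustly(RobustLatchShared *shared, int latchno)
{
	assert(latchno >= 0 && latchno < NUM_LATCHES);
	shared[latchno].is_set = true;	/* written, but never trusted */
	return write(local_wakeup_fd[latchno], "x", 1) == 1;
}
```

If the shared flags have been trashed (is_set stuck true, maybe_waiting false), set_latch() silently skips the wakeup, while set_latch_robustly() still writes to the private wakeup target.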

It doesn't seem trivial (but not impossible either) to make SetLatch() robust
against arbitrary corruption. So it seems easier to me to just put the latch
in process local memory, and do a SetLatch() in postmaster's SIGUSR1 handler.

Alright, good idea, I'll do a v2 like that.

#4Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#3)
1 attachment(s)
Re: Using WaitEventSet in the postmaster

On Fri, Dec 2, 2022 at 3:36 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Fri, Dec 2, 2022 at 2:40 PM Andres Freund <andres@anarazel.de> wrote:

It doesn't seem trivial (but not impossible either) to make SetLatch() robust
against arbitrary corruption. So it seems easier to me to just put the latch
in process local memory, and do a SetLatch() in postmaster's SIGUSR1 handler.

Alright, good idea, I'll do a v2 like that.

Here's an iteration like that. Still WIP grade. It passes, but there
must be something I don't understand about this computer program yet,
because if I move the "if (pending_..." section up into the block
where WL_LATCH_SET has arrived (instead of testing those variables
every time through the loop), a couple of tests leave zombie
(unreaped) processes behind, indicating that something funky happened
to the state machine that I haven't yet grokked. Will look more next
week.

By the way, I think if we do this and then also do
s/select(/WaitLatchOrSocket(/ in auth.c's RADIUS code, then we could
drop a chunk of newly unreachable code in
src/backend/port/win32/socket.c (though maybe I missed something; it's
quite hard to grep for "select" in a SQL database :-D). There's also
a bunch of suspect stuff in there about UDP that is already dead
thanks to the pgstats work.
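For readers who haven't seen that code: the suggested change replaces an fd_set/select() wait with a single call that also watches the process latch. WaitLatchOrSocket()'s real signature isn't shown in this thread, so the sketch below uses plain poll() as a stand-in; `wait_latch_or_socket_sketch()` and the `SKETCH_*` flags are hypothetical, and the latch is assumed to be the read end of a self-pipe.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>			/* socketpair() in the usage example */
#include <unistd.h>				/* pipe(), write() in the usage example */

#define SKETCH_LATCH_SET		(1 << 0)
#define SKETCH_SOCKET_READABLE	(1 << 1)

/*
 * Wait until latch_fd or sock becomes readable, or timeout_ms elapses.
 * Returns a mask of what happened, 0 on timeout (or EINTR), -1 on error.
 * No fd_set, so no FD_SETSIZE limit and no select() emulation needed.
 */
static int
wait_latch_or_socket_sketch(int latch_fd, int sock, int timeout_ms)
{
	struct pollfd pfds[2] = {
		{.fd = latch_fd, .events = POLLIN},
		{.fd = sock, .events = POLLIN},
	};
	int			rc;
	int			result = 0;

	assert(latch_fd >= 0 && sock >= 0);
	rc = poll(pfds, 2, timeout_ms);
	if (rc < 0)
		return errno == EINTR ? 0 : -1;
	if (pfds[0].revents & POLLIN)
		result |= SKETCH_LATCH_SET;
	if (pfds[1].revents & (POLLIN | POLLHUP | POLLERR))
		result |= SKETCH_SOCKET_READABLE;
	return result;
}
```

A RADIUS-style caller would loop on this until the reply socket is readable or its deadline passes, handling the latch bit (for example, a pending interrupt) in between.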

Attachments:

v2-0001-Give-the-postmaster-a-WaitEventSet-and-a-latch.patch (text/x-patch; charset=US-ASCII)
From 9f60cb42b222952ab94d0d4d89017c1390400196 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 9 Nov 2022 22:59:58 +1300
Subject: [PATCH v2] Give the postmaster a WaitEventSet and a latch.

Traditionally, the postmaster's architecture was quite unusual.  It did
a lot of work inside signal handlers, which were only unblocked while
waiting in select() to make that safe.

Switch to a more typical architecture, where signal handlers just set
flags and use a latch to close races.  Now the postmaster looks like
all other PostgreSQL processes, multiplexing its event processing in
epoll_wait()/kevent()/poll()/WaitForMultipleObjects() depending on the
OS.

Changes:

 * WL_SOCKET_ACCEPT is a new event for an incoming connection (on Unix, this is
   just another name for WL_SOCKET_READABLE, but Windows has a different
   underlying event; this mirrors WL_SOCKET_CONNECTED on the other
   end of a connection)

 * Small adjustments to WES to allow it to run in the postmaster.

 * Allow the postmaster to set up its own local latch.  For now we don't
   want other backends setting the postmaster's latch directly (perhaps
   later we'll figure out how to use a shared latch "robustly", so that
   memory corruption can't interfere with the postmaster's
   cleanup-and-restart responsibilities, but for now there is a two-step
   signal protocol SIGUSR1 -> SIGURG).

 * The existing signal handlers are cut in two: a handle_XXX part that
   sets a pending_XXX variable and sets the local latch, and a
   process_XXX part.

 * Signal handlers are now installed with the regular pqsignal()
   function rather than the special pqsignal_pm() function; the concerns
   about the portability of SA_RESTART vs select() are no longer
   relevant: SUSv2 left it implementation-defined whether select()
   restarts, but didn't add that qualification for poll(), and it doesn't
   matter anyway because we call SetLatch() creating a new reason to wake
   up.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/libpq/pqsignal.c        |  40 ----
 src/backend/postmaster/postmaster.c | 330 ++++++++++++++--------------
 src/backend/storage/ipc/latch.c     |  19 ++
 src/backend/tcop/postgres.c         |   1 -
 src/backend/utils/init/miscinit.c   |  13 +-
 src/include/libpq/pqsignal.h        |   3 -
 src/include/miscadmin.h             |   1 +
 src/include/storage/latch.h         |   8 +-
 8 files changed, 203 insertions(+), 212 deletions(-)

diff --git a/src/backend/libpq/pqsignal.c b/src/backend/libpq/pqsignal.c
index 1ab34c5214..718043a39d 100644
--- a/src/backend/libpq/pqsignal.c
+++ b/src/backend/libpq/pqsignal.c
@@ -97,43 +97,3 @@ pqinitmask(void)
 	sigdelset(&StartupBlockSig, SIGALRM);
 #endif
 }
-
-/*
- * Set up a postmaster signal handler for signal "signo"
- *
- * Returns the previous handler.
- *
- * This is used only in the postmaster, which has its own odd approach to
- * signal handling.  For signals with handlers, we block all signals for the
- * duration of signal handler execution.  We also do not set the SA_RESTART
- * flag; this should be safe given the tiny range of code in which the
- * postmaster ever unblocks signals.
- *
- * pqinitmask() must have been invoked previously.
- */
-pqsigfunc
-pqsignal_pm(int signo, pqsigfunc func)
-{
-	struct sigaction act,
-				oact;
-
-	act.sa_handler = func;
-	if (func == SIG_IGN || func == SIG_DFL)
-	{
-		/* in these cases, act the same as pqsignal() */
-		sigemptyset(&act.sa_mask);
-		act.sa_flags = SA_RESTART;
-	}
-	else
-	{
-		act.sa_mask = BlockSig;
-		act.sa_flags = 0;
-	}
-#ifdef SA_NOCLDSTOP
-	if (signo == SIGCHLD)
-		act.sa_flags |= SA_NOCLDSTOP;
-#endif
-	if (sigaction(signo, &act, &oact) < 0)
-		return SIG_ERR;
-	return oact.sa_handler;
-}
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a8a246921f..1813939b4e 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -70,7 +70,6 @@
 #include <time.h>
 #include <sys/wait.h>
 #include <ctype.h>
-#include <sys/select.h>
 #include <sys/stat.h>
 #include <sys/socket.h>
 #include <fcntl.h>
@@ -362,6 +361,12 @@ static volatile sig_atomic_t WalReceiverRequested = false;
 static volatile bool StartWorkerNeeded = true;
 static volatile bool HaveCrashedWorker = false;
 
+/* set when signals arrive */
+static volatile sig_atomic_t pending_action_request;
+static volatile sig_atomic_t pending_child_exit;
+static volatile sig_atomic_t pending_reload_request;
+static volatile sig_atomic_t pending_shutdown_request;
+
 #ifdef USE_SSL
 /* Set when and if SSL has been initialized properly */
 static bool LoadedSSL = false;
@@ -380,10 +385,14 @@ static void getInstallationPaths(const char *argv0);
 static void checkControlFile(void);
 static Port *ConnCreate(int serverFd);
 static void ConnFree(Port *port);
-static void SIGHUP_handler(SIGNAL_ARGS);
-static void pmdie(SIGNAL_ARGS);
-static void reaper(SIGNAL_ARGS);
-static void sigusr1_handler(SIGNAL_ARGS);
+static void handle_action_request_signal(SIGNAL_ARGS);
+static void handle_child_exit_signal(SIGNAL_ARGS);
+static void handle_reload_request_signal(SIGNAL_ARGS);
+static void handle_shutdown_request_signal(SIGNAL_ARGS);
+static void process_action_request(void);
+static void process_child_exit(void);
+static void process_reload_request(void);
+static void process_shutdown_request(void);
 static void process_startup_packet_die(SIGNAL_ARGS);
 static void dummy_handler(SIGNAL_ARGS);
 static void StartupPacketTimeoutHandler(void);
@@ -401,7 +410,6 @@ static int	BackendStartup(Port *port);
 static int	ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done);
 static void SendNegotiateProtocolVersion(List *unrecognized_protocol_options);
 static void processCancelRequest(Port *port, void *pkt);
-static int	initMasks(fd_set *rmask);
 static void report_fork_failure_to_client(Port *port, int errnum);
 static CAC_state canAcceptConnections(int backend_type);
 static bool RandomCancelKey(int32 *cancel_key);
@@ -609,26 +617,6 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * Set up signal handlers for the postmaster process.
 	 *
-	 * In the postmaster, we use pqsignal_pm() rather than pqsignal() (which
-	 * is used by all child processes and client processes).  That has a
-	 * couple of special behaviors:
-	 *
-	 * 1. We tell sigaction() to block all signals for the duration of the
-	 * signal handler.  This is faster than our old approach of
-	 * blocking/unblocking explicitly in the signal handler, and it should also
-	 * prevent excessive stack consumption if signals arrive quickly.
-	 *
-	 * 2. We do not set the SA_RESTART flag.  This is because signals will be
-	 * blocked at all times except when ServerLoop is waiting for something to
-	 * happen, and during that window, we want signals to exit the select(2)
-	 * wait so that ServerLoop can respond if anything interesting happened.
-	 * On some platforms, signals marked SA_RESTART would not cause the
-	 * select() wait to end.
-	 *
-	 * Child processes will generally want SA_RESTART, so pqsignal() sets that
-	 * flag.  We expect children to set up their own handlers before
-	 * unblocking signals.
-	 *
 	 * CAUTION: when changing this list, check for side-effects on the signal
 	 * handling setup of child processes.  See tcop/postgres.c,
 	 * bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/walwriter.c,
@@ -638,26 +626,21 @@ PostmasterMain(int argc, char *argv[])
 	pqinitmask();
 	PG_SETMASK(&BlockSig);
 
-	pqsignal_pm(SIGHUP, SIGHUP_handler);	/* reread config file and have
-											 * children do same */
-	pqsignal_pm(SIGINT, pmdie); /* send SIGTERM and shut down */
-	pqsignal_pm(SIGQUIT, pmdie);	/* send SIGQUIT and die */
-	pqsignal_pm(SIGTERM, pmdie);	/* wait for children and shut down */
-	pqsignal_pm(SIGALRM, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGPIPE, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGUSR1, sigusr1_handler);	/* message from child process */
-	pqsignal_pm(SIGUSR2, dummy_handler);	/* unused, reserve for children */
-	pqsignal_pm(SIGCHLD, reaper);	/* handle child termination */
+	pqsignal(SIGHUP, handle_reload_request_signal);
+	pqsignal(SIGINT, handle_shutdown_request_signal);
+	pqsignal(SIGQUIT, handle_shutdown_request_signal);
+	pqsignal(SIGTERM, handle_shutdown_request_signal);
+	pqsignal(SIGALRM, SIG_IGN);	/* ignored */
+	pqsignal(SIGPIPE, SIG_IGN);	/* ignored */
+	pqsignal(SIGUSR1, handle_action_request_signal);
+	pqsignal(SIGUSR2, dummy_handler);	/* unused, reserve for children */
+	pqsignal(SIGCHLD, handle_child_exit_signal);
 
-#ifdef SIGURG
+	/* This may configure SIGURG, depending on platform. */
+	InitializeLatchSupport();
+	InitLocalLatch();
 
-	/*
-	 * Ignore SIGURG for now.  Child processes may change this (see
-	 * InitializeLatchSupport), but they will not receive any such signals
-	 * until they wait on a latch.
-	 */
-	pqsignal_pm(SIGURG, SIG_IGN);	/* ignored */
-#endif
+	PG_SETMASK(&UnBlockSig);
 
 	/*
 	 * No other place in Postgres should touch SIGTTIN/SIGTTOU handling.  We
@@ -667,15 +650,15 @@ PostmasterMain(int argc, char *argv[])
 	 * child processes should just allow the inherited settings to stand.
 	 */
 #ifdef SIGTTIN
-	pqsignal_pm(SIGTTIN, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTIN, SIG_IGN);	/* ignored */
 #endif
 #ifdef SIGTTOU
-	pqsignal_pm(SIGTTOU, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTOU, SIG_IGN);	/* ignored */
 #endif
 
 	/* ignore SIGXFSZ, so that ulimit violations work like disk full */
 #ifdef SIGXFSZ
-	pqsignal_pm(SIGXFSZ, SIG_IGN);	/* ignored */
+	pqsignal(SIGXFSZ, SIG_IGN);	/* ignored */
 #endif
 
 	/*
@@ -1706,101 +1689,93 @@ DetermineSleepTime(struct timeval *timeout)
 static int
 ServerLoop(void)
 {
-	fd_set		readmask;
-	int			nSockets;
 	time_t		last_lockfile_recheck_time,
 				last_touch_time;
+	WaitEventSet *wes;
+	WaitEvent	events[MAXLISTEN];
+	int			nevents;
 
-	last_lockfile_recheck_time = last_touch_time = time(NULL);
+	/* Set up a WaitEventSet for our latch and listening sockets. */
+	wes = CreateWaitEventSet(CurrentMemoryContext, 1 + MAXLISTEN);
+	AddWaitEventToSet(wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
+	for (int i = 0; i < MAXLISTEN; i++)
+	{
+		int			fd = ListenSocket[i];
+
+		if (fd == PGINVALID_SOCKET)
+			break;
+		AddWaitEventToSet(wes, WL_SOCKET_ACCEPT, fd, NULL, NULL);
+	}
 
-	nSockets = initMasks(&readmask);
+	last_lockfile_recheck_time = last_touch_time = time(NULL);
 
 	for (;;)
 	{
-		fd_set		rmask;
-		int			selres;
 		time_t		now;
 
 		/*
 		 * Wait for a connection request to arrive.
 		 *
-		 * We block all signals except while sleeping. That makes it safe for
-		 * signal handlers, which again block all signals while executing, to
-		 * do nontrivial work.
-		 *
 		 * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
-		 * any new connections, so we don't call select(), and just sleep.
+		 * any new connections, so we just sleep.
 		 */
-		memcpy((char *) &rmask, (char *) &readmask, sizeof(fd_set));
-
 		if (pmState == PM_WAIT_DEAD_END)
 		{
-			PG_SETMASK(&UnBlockSig);
-
 			pg_usleep(100000L); /* 100 msec seems reasonable */
-			selres = 0;
-
-			PG_SETMASK(&BlockSig);
+			nevents = 0;
 		}
 		else
 		{
-			/* must set timeout each time; some OSes change it! */
 			struct timeval timeout;
 
-			/* Needs to run with blocked signals! */
 			DetermineSleepTime(&timeout);
 
-			PG_SETMASK(&UnBlockSig);
-
-			selres = select(nSockets, &rmask, NULL, NULL, &timeout);
-
-			PG_SETMASK(&BlockSig);
-		}
-
-		/* Now check the select() result */
-		if (selres < 0)
-		{
-			if (errno != EINTR && errno != EWOULDBLOCK)
-			{
-				ereport(LOG,
-						(errcode_for_socket_access(),
-						 errmsg("select() failed in postmaster: %m")));
-				return STATUS_ERROR;
-			}
+			nevents = WaitEventSetWait(wes,
+									   timeout.tv_sec * 1000 + timeout.tv_usec / 1000,
+									   events,
+									   lengthof(events),
+									   0 /* postmaster posts no wait_events */);
 		}
 
 		/*
-		 * New connection pending on any of our sockets? If so, fork a child
-		 * process to deal with it.
+		 * Latch set by signal handler, or new connection pending on any of our
+		 * sockets? If the latter, fork a child process to deal with it.
 		 */
-		if (selres > 0)
+		for (int i = 0; i < nevents; i++)
 		{
-			int			i;
-
-			for (i = 0; i < MAXLISTEN; i++)
+			if (events[i].events & WL_LATCH_SET)
 			{
-				if (ListenSocket[i] == PGINVALID_SOCKET)
-					break;
-				if (FD_ISSET(ListenSocket[i], &rmask))
+				ResetLatch(MyLatch);
+			}
+			else if (events[i].events & WL_SOCKET_ACCEPT)
+			{
+				Port	   *port;
+
+				port = ConnCreate(events[i].fd);
+				if (port)
 				{
-					Port	   *port;
+					BackendStartup(port);
 
-					port = ConnCreate(ListenSocket[i]);
-					if (port)
-					{
-						BackendStartup(port);
-
-						/*
-						 * We no longer need the open socket or port structure
-						 * in this process
-						 */
-						StreamClose(port->sock);
-						ConnFree(port);
-					}
+					/*
+					 * We no longer need the open socket or port structure
+					 * in this process
+					 */
+					StreamClose(port->sock);
+					ConnFree(port);
 				}
 			}
 		}
 
+		/* Process work scheduled by signal handlers. */
+		if (pending_action_request)
+			process_action_request();
+		if (pending_child_exit)
+			process_child_exit();
+		if (pending_reload_request)
+			process_reload_request();
+		if (pending_shutdown_request)
+			process_shutdown_request();
+
 		/* If we have lost the log collector, try to start a new one */
 		if (SysLoggerPID == 0 && Logging_collector)
 			SysLoggerPID = SysLogger_Start();
@@ -1939,34 +1914,6 @@ ServerLoop(void)
 	}
 }
 
-/*
- * Initialise the masks for select() for the ports we are listening on.
- * Return the number of sockets to listen on.
- */
-static int
-initMasks(fd_set *rmask)
-{
-	int			maxsock = -1;
-	int			i;
-
-	FD_ZERO(rmask);
-
-	for (i = 0; i < MAXLISTEN; i++)
-	{
-		int			fd = ListenSocket[i];
-
-		if (fd == PGINVALID_SOCKET)
-			break;
-		FD_SET(fd, rmask);
-
-		if (fd > maxsock)
-			maxsock = fd;
-	}
-
-	return maxsock + 1;
-}
-
-
 /*
  * Read a client's startup packet and do something according to it.
  *
@@ -2707,14 +2654,42 @@ InitProcessGlobals(void)
 #endif
 }
 
+/*
+ * Child processes use SIGUSR1 to send pmsignals.  pg_ctl uses SIGUSR1 to ask
+ * postmaster to check for logrotate and promote files.
+ */
+static void
+handle_action_request_signal(SIGNAL_ARGS)
+{
+	int save_errno = errno;
+
+	pending_action_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
 
 /*
- * SIGHUP -- reread config files, and tell children to do same
+ * pg_ctl uses SIGHUP to request a reload of the configuration files.
  */
 static void
-SIGHUP_handler(SIGNAL_ARGS)
+handle_reload_request_signal(SIGNAL_ARGS)
 {
-	int			save_errno = errno;
+	int save_errno = errno;
+
+	pending_reload_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Re-read config files, and tell children to do same.
+ */
+static void
+process_reload_request(void)
+{
+	pending_reload_request = false;
 
 	if (Shutdown <= SmartShutdown)
 	{
@@ -2771,27 +2746,47 @@ SIGHUP_handler(SIGNAL_ARGS)
 		write_nondefault_variables(PGC_SIGHUP);
 #endif
 	}
-
-	errno = save_errno;
 }
 
-
 /*
- * pmdie -- signal handler for processing various postmaster signals.
+ * pg_ctl uses SIGTERM, SIGINT and SIGQUIT to request different types of
+ * shutdown.
  */
 static void
-pmdie(SIGNAL_ARGS)
+handle_shutdown_request_signal(SIGNAL_ARGS)
 {
-	int			save_errno = errno;
-
-	ereport(DEBUG2,
-			(errmsg_internal("postmaster received signal %d",
-							 postgres_signal_arg)));
+	int save_errno = errno;
 
 	switch (postgres_signal_arg)
 	{
 		case SIGTERM:
+			pending_shutdown_request = SmartShutdown;
+			break;
+		case SIGINT:
+			pending_shutdown_request = FastShutdown;
+			break;
+		case SIGQUIT:
+			pending_shutdown_request = ImmediateShutdown;
+			break;
+	}
+	SetLatch(MyLatch);
 
+	errno = save_errno;
+}
+
+/*
+ * Process shutdown request.
+ */
+static void
+process_shutdown_request(void)
+{
+	int		mode = pending_shutdown_request;
+
+	pending_shutdown_request = NoShutdown;
+
+	switch (mode)
+	{
+		case SmartShutdown:
 			/*
 			 * Smart Shutdown:
 			 *
@@ -2830,8 +2825,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGINT:
-
+		case FastShutdown:
 			/*
 			 * Fast Shutdown:
 			 *
@@ -2871,8 +2865,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGQUIT:
-
+		case ImmediateShutdown:
 			/*
 			 * Immediate Shutdown:
 			 *
@@ -2908,20 +2901,30 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 	}
+}
+
+static void
+handle_child_exit_signal(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	pending_child_exit = true;
+	SetLatch(MyLatch);
 
 	errno = save_errno;
 }
 
 /*
- * Reaper -- signal handler to cleanup after a child process dies.
+ * Cleanup after a child process dies.
  */
 static void
-reaper(SIGNAL_ARGS)
+process_child_exit(void)
 {
-	int			save_errno = errno;
 	int			pid;			/* process id of dead child process */
 	int			exitstatus;		/* its exit status */
 
+	pending_child_exit = false;
+
 	ereport(DEBUG4,
 			(errmsg_internal("reaping dead processes")));
 
@@ -3213,8 +3216,6 @@ reaper(SIGNAL_ARGS)
 	 * or actions to make.
 	 */
 	PostmasterStateMachine();
-
-	errno = save_errno;
 }
 
 /*
@@ -3642,8 +3643,9 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 /*
  * Advance the postmaster's state machine and take actions as appropriate
  *
- * This is common code for pmdie(), reaper() and sigusr1_handler(), which
- * receive the signals that might mean we need to change state.
+ * This is common code for process_shutdown_request(), process_child_exit() and
+ * process_action_request(), which process the signals that might mean we need
+ * to change state.
  */
 static void
 PostmasterStateMachine(void)
@@ -4094,6 +4096,7 @@ BackendStartup(Port *port)
 	/* Hasn't asked to be notified about any bgworkers yet */
 	bn->bgworker_notify = false;
 
+	PG_SETMASK(&BlockSig);
 #ifdef EXEC_BACKEND
 	pid = backend_forkexec(port);
 #else							/* !EXEC_BACKEND */
@@ -4137,6 +4140,7 @@ BackendStartup(Port *port)
 		ereport(LOG,
 				(errmsg("could not fork new process for connection: %m")));
 		report_fork_failure_to_client(port, save_errno);
+		PG_SETMASK(&UnBlockSig);
 		return STATUS_ERROR;
 	}
 
@@ -4158,6 +4162,7 @@ BackendStartup(Port *port)
 		ShmemBackendArrayAdd(bn);
 #endif
 
+	PG_SETMASK(&UnBlockSig);
 	return STATUS_OK;
 }
 
@@ -5013,12 +5018,13 @@ ExitPostmaster(int status)
 }
 
 /*
- * sigusr1_handler - handle signal conditions from child processes
+ * Handle pmsignal conditions representing requests from backends,
+ * and check for promote and logrotate requests from pg_ctl.
  */
 static void
-sigusr1_handler(SIGNAL_ARGS)
+process_action_request(void)
 {
-	int			save_errno = errno;
+	pending_action_request = false;
 
 	/*
 	 * RECOVERY_STARTED and BEGIN_HOT_STANDBY signals are ignored in
@@ -5159,8 +5165,6 @@ sigusr1_handler(SIGNAL_ARGS)
 		 */
 		signal_child(StartupPID, SIGUSR2);
 	}
-
-	errno = save_errno;
 }
 
 /*
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index eb3a569aae..3134a4dd04 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -283,6 +283,17 @@ InitializeLatchSupport(void)
 #ifdef WAIT_USE_SIGNALFD
 	sigset_t	signalfd_mask;
 
+	if (IsUnderPostmaster)
+	{
+		if (signal_fd != -1)
+		{
+			/* Release postmaster's signal FD; ignore any error */
+			(void) close(signal_fd);
+			signal_fd = -1;
+			ReleaseExternalFD();
+		}
+	}
+
 	/* Block SIGURG, because we'll receive it through a signalfd. */
 	sigaddset(&UnBlockSig, SIGURG);
 
@@ -1312,6 +1323,8 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event)
 			flags |= FD_WRITE;
 		if (event->events & WL_SOCKET_CONNECTED)
 			flags |= FD_CONNECT;
+		if (event->events & WL_SOCKET_ACCEPT)
+			flags |= FD_ACCEPT;
 
 		if (*handle == WSA_INVALID_EVENT)
 		{
@@ -2067,6 +2080,12 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 			/* connected */
 			occurred_events->events |= WL_SOCKET_CONNECTED;
 		}
+		if ((cur_event->events & WL_SOCKET_ACCEPT) &&
+			(resEvents.lNetworkEvents & FD_ACCEPT))
+		{
+			/* incoming connection ready to accept */
+			occurred_events->events |= WL_SOCKET_ACCEPT;
+		}
 		if (resEvents.lNetworkEvents & FD_CLOSE)
 		{
 			/* EOF/error, so signal all caller-requested socket flags */
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3082093d1e..655e881688 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -24,7 +24,6 @@
 #include <signal.h>
 #include <unistd.h>
 #include <sys/resource.h>
-#include <sys/select.h>
 #include <sys/socket.h>
 #include <sys/time.h>
 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index eb1046450b..1348261220 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -135,8 +135,7 @@ InitPostmasterChild(void)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -189,8 +188,7 @@ InitStandaloneProcess(const char *argv0)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -232,6 +230,13 @@ SwitchToSharedLatch(void)
 	SetLatch(MyLatch);
 }
 
+void
+InitLocalLatch(void)
+{
+	MyLatch = &LocalLatchData;
+	InitLatch(MyLatch);
+}
+
 void
 SwitchBackToLocalLatch(void)
 {
diff --git a/src/include/libpq/pqsignal.h b/src/include/libpq/pqsignal.h
index 7890b426a8..76eb380a4f 100644
--- a/src/include/libpq/pqsignal.h
+++ b/src/include/libpq/pqsignal.h
@@ -53,7 +53,4 @@ extern PGDLLIMPORT sigset_t StartupBlockSig;
 
 extern void pqinitmask(void);
 
-/* pqsigfunc is declared in src/include/port.h */
-extern pqsigfunc pqsignal_pm(int signo, pqsigfunc func);
-
 #endif							/* PQSIGNAL_H */
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 795182fa51..0975867197 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -310,6 +310,7 @@ extern PGDLLIMPORT char *DatabasePath;
 /* now in utils/init/miscinit.c */
 extern void InitPostmasterChild(void);
 extern void InitStandaloneProcess(const char *argv0);
+extern void InitLocalLatch(void);
 extern void SwitchToSharedLatch(void);
 extern void SwitchBackToLocalLatch(void);
 
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 68ab740f16..88f1354714 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -135,10 +135,16 @@ typedef struct Latch
 #define WL_SOCKET_CONNECTED  WL_SOCKET_WRITEABLE
 #endif
 #define WL_SOCKET_CLOSED 	 (1 << 7)
+#ifdef WIN32
+#define WL_SOCKET_ACCEPT	 (1 << 8)
+#else
+#define WL_SOCKET_ACCEPT	 WL_SOCKET_READABLE
+#endif
 #define WL_SOCKET_MASK		(WL_SOCKET_READABLE | \
 							 WL_SOCKET_WRITEABLE | \
 							 WL_SOCKET_CONNECTED | \
-							 WL_SOCKET_CLOSED)
+							 WL_SOCKET_CLOSED | \
+							 WL_SOCKET_ACCEPT)
 
 typedef struct WaitEvent
 {
-- 
2.30.2
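The v2 commit message says that on Unix WL_SOCKET_ACCEPT is just another name for WL_SOCKET_READABLE, while Windows needs the separate FD_ACCEPT network event. The Unix half can be checked with plain POSIX calls: a listening TCP socket polls as POLLIN once a connection is waiting in its accept queue. The helper names below are hypothetical, and the sketch assumes IPv4 loopback is available.

```c
#define _XOPEN_SOURCE 700
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a listening socket on an ephemeral 127.0.0.1 port; returns the fd. */
static int
listen_on_loopback(struct sockaddr_in *addr)
{
	socklen_t	len = sizeof(*addr);
	int			fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	memset(addr, 0, sizeof(*addr));
	addr->sin_family = AF_INET;
	addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr->sin_port = 0;			/* let the kernel choose */
	if (bind(fd, (struct sockaddr *) addr, sizeof(*addr)) != 0 ||
		listen(fd, 8) != 0 ||
		getsockname(fd, (struct sockaddr *) addr, &len) != 0)
	{
		close(fd);
		return -1;
	}
	return fd;
}

/* Is a connection waiting to be accepted?  On Unix this is plain POLLIN. */
static int
accept_pending(int listen_fd, int timeout_ms)
{
	struct pollfd pfd = {.fd = listen_fd, .events = POLLIN};

	assert(listen_fd >= 0);
	return poll(&pfd, 1, timeout_ms) == 1 && (pfd.revents & POLLIN) != 0;
}
```

Once accept() has drained the queue, the socket stops polling readable again, which is why an unaccepted backlog would keep waking a loop that doesn't accept.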

#5Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#4)
1 attachment(s)
Re: Using WaitEventSet in the postmaster

On Sat, Dec 3, 2022 at 10:41 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Here's an iteration like that. Still WIP grade. It passes, but there
must be something I don't understand about this computer program yet,
because if I move the "if (pending_..." section up into the block
where WL_LATCH_SET has arrived (instead of testing those variables
every time through the loop), a couple of tests leave zombie
(unreaped) processes behind, indicating that something funky happened
to the state machine that I haven't yet grokked. Will look more next
week.

Duh. The reason for that was the pre-existing special case for
PM_WAIT_DEAD_END, which used a sleep(100ms) loop to wait for children
to exit, which I needed to change to a latch wait. Fixed in the next
iteration, attached.

The reason for the existing sleep-based approach was that we didn't
want to accept any more connections (or spin furiously because the
listen queue was non-empty). So in this version I invented a way to
suppress socket events temporarily with WL_SOCKET_IGNORE, and then
reactivate them after crash reinit.
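WL_SOCKET_IGNORE is this patch's own invention, but the underlying idea has a direct analogue in plain poll(): POSIX specifies that an entry whose fd is negative is ignored and gets revents of 0. A listening socket can therefore be switched off and back on without rebuilding the set, and a non-empty listen queue stops causing a busy loop. This sketch assumes a pollfd array whose slot 0 is the latch (self-pipe) and whose later slots are listening sockets; `set_listen_sockets_ignored()` is a hypothetical helper.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>				/* pipe(), write() in the usage example */

/*
 * Suppress or restore events for the listening sockets in pfds[1..n-1];
 * pfds[0] (the latch) is always left alone.  ~fd is negative for any valid
 * fd, including 0, and is its own inverse, so the descriptor is preserved.
 */
static void
set_listen_sockets_ignored(struct pollfd *pfds, int n, bool ignore)
{
	assert(pfds != NULL && n >= 1);
	for (int i = 1; i < n; i++)
	{
		bool		ignored = pfds[i].fd < 0;

		if (ignored != ignore)
			pfds[i].fd = ~pfds[i].fd;
	}
}
```

While connections are suppressed, the loop still wakes promptly for the latch (for example, on SIGCHLD), so there is no need for a fixed-interval sleep.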

Still WIP, but I hope it's travelling in the right direction.

Attachments:

v3-0001-Give-the-postmaster-a-WaitEventSet-and-a-latch.patch (text/x-patch; charset=US-ASCII)
From 65b5fa1f7024cb78cee9ba57d36a78dc17ffe492 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 9 Nov 2022 22:59:58 +1300
Subject: [PATCH v3] Give the postmaster a WaitEventSet and a latch.

Traditionally, the postmaster's architecture was quite unusual.  It did
a lot of work inside signal handlers, which were only unblocked while
waiting in select() to make that safe.

Switch to a more typical architecture, where signal handlers just set
flags and use a latch to close races.  Now the postmaster looks like
all other PostgreSQL processes, multiplexing its event processing in
epoll_wait()/kevent()/poll()/WaitForMultipleObjects() depending on the
OS.

Changes:

 * WL_SOCKET_ACCEPT is a new event for an incoming connection (on Unix,
   this is just another name for WL_SOCKET_READABLE, but Windows has a
   different underlying event; this mirrors WL_SOCKET_CONNECTED on the
   other end of a connection)

 * WL_SOCKET_IGNORE is a new way to stop waking up for new incoming
   connections while shutting down.

 * Small adjustments to WaitEventSet to allow running in the postmaster.

 * Allow the postmaster to set up its own local latch.  For now we don't
   want other backends setting the postmaster's latch directly (perhaps
   later we'll figure out how to use a shared latch "robustly", so that
   memory corruption can't interfere with the postmaster's
   cleanup-and-restart responsibilities, but for now there is a two-step
   signal protocol SIGUSR1 -> SIGURG).

 * The existing signal handlers are cut in two: a handle_XXX part that
   sets a pending_XXX variable and sets the local latch, and a
   process_XXX part.

 * ServerLoop(), the process_XXX() functions and
   PostmasterStateMachine() now all take as a parameter a pointer to a
   Postmaster object that lives on the stack; initially it holds just the
   WaitEventSet they need to do their job.  Many other global variables
   could be moved into it, but that's not done here.

 * Signal handlers are now installed with the regular pqsignal()
   function rather than the special pqsignal_pm() function; the concerns
   about the portability of SA_RESTART vs select() are no longer
   relevant: SUSv2 left it implementation-defined whether select()
   restarts, but didn't add that qualification for poll(), and it doesn't
   matter anyway because we call SetLatch() creating a new reason to wake
   up.
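Here is a rough sketch of the handle_XXX/process_XXX split for the shutdown signals, following the SIGTERM/SIGINT/SIGQUIT mapping used in the patch. Several details are assumptions made to keep the example self-contained: the enum values, install_shutdown_handlers(), and calling sigaction() directly instead of pqsignal(). The SetLatch() call is only indicated in a comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Shutdown modes in increasing severity (numeric values are assumed). */
enum
{
	NoShutdown = 0,
	SmartShutdown,
	FastShutdown,
	ImmediateShutdown
};

static volatile sig_atomic_t pending_shutdown_request = NoShutdown;

/* handle_XXX half: runs in signal context and only records the request. */
static void
handle_shutdown_request_signal(int signo)
{
	int			save_errno = errno;

	switch (signo)
	{
		case SIGTERM:
			pending_shutdown_request = SmartShutdown;
			break;
		case SIGINT:
			pending_shutdown_request = FastShutdown;
			break;
		case SIGQUIT:
			pending_shutdown_request = ImmediateShutdown;
			break;
	}
	/* The real handler also calls SetLatch(MyLatch) here. */
	errno = save_errno;
}

/* process_XXX half: runs from the main loop, where any work is allowed. */
static int
process_shutdown_request(void)
{
	int			mode = pending_shutdown_request;

	pending_shutdown_request = NoShutdown;
	/* ... signal children and advance the state machine for 'mode' ... */
	return mode;
}

static int
install_shutdown_handlers(void)
{
	static const int signos[] = {SIGTERM, SIGINT, SIGQUIT};
	struct sigaction act;
	sigset_t	unblock;

	memset(&act, 0, sizeof(act));
	act.sa_handler = handle_shutdown_request_signal;
	sigemptyset(&act.sa_mask);
	act.sa_flags = SA_RESTART;
	sigemptyset(&unblock);
	for (int i = 0; i < 3; i++)
	{
		if (sigaction(signos[i], &act, NULL) != 0)
			return -1;
		sigaddset(&unblock, signos[i]);
	}
	return sigprocmask(SIG_UNBLOCK, &unblock, NULL);
}
```

The handler never touches shared state beyond one sig_atomic_t. Everything that might allocate memory, log, or fork is deferred to the process_XXX half.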

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/libpq/pqsignal.c        |  40 ---
 src/backend/postmaster/postmaster.c | 413 +++++++++++++++-------------
 src/backend/storage/ipc/latch.c     |  22 ++
 src/backend/tcop/postgres.c         |   1 -
 src/backend/utils/init/miscinit.c   |  13 +-
 src/include/libpq/pqsignal.h        |   3 -
 src/include/miscadmin.h             |   1 +
 src/include/storage/latch.h         |   9 +-
 8 files changed, 266 insertions(+), 236 deletions(-)

diff --git a/src/backend/libpq/pqsignal.c b/src/backend/libpq/pqsignal.c
index 1ab34c5214..718043a39d 100644
--- a/src/backend/libpq/pqsignal.c
+++ b/src/backend/libpq/pqsignal.c
@@ -97,43 +97,3 @@ pqinitmask(void)
 	sigdelset(&StartupBlockSig, SIGALRM);
 #endif
 }
-
-/*
- * Set up a postmaster signal handler for signal "signo"
- *
- * Returns the previous handler.
- *
- * This is used only in the postmaster, which has its own odd approach to
- * signal handling.  For signals with handlers, we block all signals for the
- * duration of signal handler execution.  We also do not set the SA_RESTART
- * flag; this should be safe given the tiny range of code in which the
- * postmaster ever unblocks signals.
- *
- * pqinitmask() must have been invoked previously.
- */
-pqsigfunc
-pqsignal_pm(int signo, pqsigfunc func)
-{
-	struct sigaction act,
-				oact;
-
-	act.sa_handler = func;
-	if (func == SIG_IGN || func == SIG_DFL)
-	{
-		/* in these cases, act the same as pqsignal() */
-		sigemptyset(&act.sa_mask);
-		act.sa_flags = SA_RESTART;
-	}
-	else
-	{
-		act.sa_mask = BlockSig;
-		act.sa_flags = 0;
-	}
-#ifdef SA_NOCLDSTOP
-	if (signo == SIGCHLD)
-		act.sa_flags |= SA_NOCLDSTOP;
-#endif
-	if (sigaction(signo, &act, &oact) < 0)
-		return SIG_ERR;
-	return oact.sa_handler;
-}
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a8a246921f..5000fb987d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -70,7 +70,6 @@
 #include <time.h>
 #include <sys/wait.h>
 #include <ctype.h>
-#include <sys/select.h>
 #include <sys/stat.h>
 #include <sys/socket.h>
 #include <fcntl.h>
@@ -325,6 +324,16 @@ typedef enum
 	PM_NO_CHILDREN				/* all important children have exited */
 } PMState;
 
+/*
+ * Object representing the state of a postmaster.
+ *
+ * XXX Lots of global variables could move in here.
+ */
+typedef struct
+{
+	WaitEventSet	*wes;
+} Postmaster;
+
 static PMState pmState = PM_INIT;
 
 /*
@@ -362,6 +371,14 @@ static volatile sig_atomic_t WalReceiverRequested = false;
 static volatile bool StartWorkerNeeded = true;
 static volatile bool HaveCrashedWorker = false;
 
+/* set when signals arrive */
+static volatile sig_atomic_t pending_action_request;
+static volatile sig_atomic_t pending_child_exit;
+static volatile sig_atomic_t pending_reload_request;
+static volatile sig_atomic_t pending_shutdown_request;
+
+static bool		reenable_server_socket_events;
+
 #ifdef USE_SSL
 /* Set when and if SSL has been initialized properly */
 static bool LoadedSSL = false;
@@ -380,10 +397,14 @@ static void getInstallationPaths(const char *argv0);
 static void checkControlFile(void);
 static Port *ConnCreate(int serverFd);
 static void ConnFree(Port *port);
-static void SIGHUP_handler(SIGNAL_ARGS);
-static void pmdie(SIGNAL_ARGS);
-static void reaper(SIGNAL_ARGS);
-static void sigusr1_handler(SIGNAL_ARGS);
+static void handle_action_request_signal(SIGNAL_ARGS);
+static void handle_child_exit_signal(SIGNAL_ARGS);
+static void handle_reload_request_signal(SIGNAL_ARGS);
+static void handle_shutdown_request_signal(SIGNAL_ARGS);
+static void process_action_request(Postmaster *postmaster);
+static void process_child_exit(Postmaster *postmaster);
+static void process_reload_request(void);
+static void process_shutdown_request(Postmaster *postmaster);
 static void process_startup_packet_die(SIGNAL_ARGS);
 static void dummy_handler(SIGNAL_ARGS);
 static void StartupPacketTimeoutHandler(void);
@@ -392,16 +413,15 @@ static bool CleanupBackgroundWorker(int pid, int exitstatus);
 static void HandleChildCrash(int pid, int exitstatus, const char *procname);
 static void LogChildExit(int lev, const char *procname,
 						 int pid, int exitstatus);
-static void PostmasterStateMachine(void);
+static void PostmasterStateMachine(Postmaster *postmaster);
 static void BackendInitialize(Port *port);
 static void BackendRun(Port *port) pg_attribute_noreturn();
 static void ExitPostmaster(int status) pg_attribute_noreturn();
-static int	ServerLoop(void);
+static int	ServerLoop(Postmaster *postmaster);
 static int	BackendStartup(Port *port);
 static int	ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done);
 static void SendNegotiateProtocolVersion(List *unrecognized_protocol_options);
 static void processCancelRequest(Port *port, void *pkt);
-static int	initMasks(fd_set *rmask);
 static void report_fork_failure_to_client(Port *port, int errnum);
 static CAC_state canAcceptConnections(int backend_type);
 static bool RandomCancelKey(int32 *cancel_key);
@@ -568,6 +588,7 @@ PostmasterMain(int argc, char *argv[])
 	bool		listen_addr_saved = false;
 	int			i;
 	char	   *output_config_variable = NULL;
+	Postmaster	postmaster = {0};
 
 	InitProcessGlobals();
 
@@ -609,26 +630,6 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * Set up signal handlers for the postmaster process.
 	 *
-	 * In the postmaster, we use pqsignal_pm() rather than pqsignal() (which
-	 * is used by all child processes and client processes).  That has a
-	 * couple of special behaviors:
-	 *
-	 * 1. We tell sigaction() to block all signals for the duration of the
-	 * signal handler.  This is faster than our old approach of
-	 * blocking/unblocking explicitly in the signal handler, and it should also
-	 * prevent excessive stack consumption if signals arrive quickly.
-	 *
-	 * 2. We do not set the SA_RESTART flag.  This is because signals will be
-	 * blocked at all times except when ServerLoop is waiting for something to
-	 * happen, and during that window, we want signals to exit the select(2)
-	 * wait so that ServerLoop can respond if anything interesting happened.
-	 * On some platforms, signals marked SA_RESTART would not cause the
-	 * select() wait to end.
-	 *
-	 * Child processes will generally want SA_RESTART, so pqsignal() sets that
-	 * flag.  We expect children to set up their own handlers before
-	 * unblocking signals.
-	 *
 	 * CAUTION: when changing this list, check for side-effects on the signal
 	 * handling setup of child processes.  See tcop/postgres.c,
 	 * bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/walwriter.c,
@@ -638,26 +639,21 @@ PostmasterMain(int argc, char *argv[])
 	pqinitmask();
 	PG_SETMASK(&BlockSig);
 
-	pqsignal_pm(SIGHUP, SIGHUP_handler);	/* reread config file and have
-											 * children do same */
-	pqsignal_pm(SIGINT, pmdie); /* send SIGTERM and shut down */
-	pqsignal_pm(SIGQUIT, pmdie);	/* send SIGQUIT and die */
-	pqsignal_pm(SIGTERM, pmdie);	/* wait for children and shut down */
-	pqsignal_pm(SIGALRM, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGPIPE, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGUSR1, sigusr1_handler);	/* message from child process */
-	pqsignal_pm(SIGUSR2, dummy_handler);	/* unused, reserve for children */
-	pqsignal_pm(SIGCHLD, reaper);	/* handle child termination */
+	pqsignal(SIGHUP, handle_reload_request_signal);
+	pqsignal(SIGINT, handle_shutdown_request_signal);
+	pqsignal(SIGQUIT, handle_shutdown_request_signal);
+	pqsignal(SIGTERM, handle_shutdown_request_signal);
+	pqsignal(SIGALRM, SIG_IGN);	/* ignored */
+	pqsignal(SIGPIPE, SIG_IGN);	/* ignored */
+	pqsignal(SIGUSR1, handle_action_request_signal);
+	pqsignal(SIGUSR2, dummy_handler);	/* unused, reserve for children */
+	pqsignal(SIGCHLD, handle_child_exit_signal);
 
-#ifdef SIGURG
+	/* This may configure SIGURG, depending on platform. */
+	InitializeLatchSupport();
+	InitLocalLatch();
 
-	/*
-	 * Ignore SIGURG for now.  Child processes may change this (see
-	 * InitializeLatchSupport), but they will not receive any such signals
-	 * until they wait on a latch.
-	 */
-	pqsignal_pm(SIGURG, SIG_IGN);	/* ignored */
-#endif
+	PG_SETMASK(&UnBlockSig);
 
 	/*
 	 * No other place in Postgres should touch SIGTTIN/SIGTTOU handling.  We
@@ -667,15 +663,15 @@ PostmasterMain(int argc, char *argv[])
 	 * child processes should just allow the inherited settings to stand.
 	 */
 #ifdef SIGTTIN
-	pqsignal_pm(SIGTTIN, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTIN, SIG_IGN);	/* ignored */
 #endif
 #ifdef SIGTTOU
-	pqsignal_pm(SIGTTOU, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTOU, SIG_IGN);	/* ignored */
 #endif
 
 	/* ignore SIGXFSZ, so that ulimit violations work like disk full */
 #ifdef SIGXFSZ
-	pqsignal_pm(SIGXFSZ, SIG_IGN);	/* ignored */
+	pqsignal(SIGXFSZ, SIG_IGN);	/* ignored */
 #endif
 
 	/*
@@ -1460,7 +1456,7 @@ PostmasterMain(int argc, char *argv[])
 	/* Some workers may be scheduled to start now */
 	maybe_start_bgworkers();
 
-	status = ServerLoop();
+	status = ServerLoop(&postmaster);
 
 	/*
 	 * ServerLoop probably shouldn't ever return, but if it does, close down.
@@ -1698,105 +1694,112 @@ DetermineSleepTime(struct timeval *timeout)
 	}
 }
 
+/*
+ * Initialize the WaitEventSet we'll use in our main event loop.
+ */
+static void
+InitializeWaitSet(Postmaster *postmaster)
+{
+	/* Set up a WaitEventSet for our latch and listening sockets. */
+	postmaster->wes = CreateWaitEventSet(CurrentMemoryContext, 1 + MAXLISTEN);
+	AddWaitEventToSet(postmaster->wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
+	for (int i = 0; i < MAXLISTEN; i++)
+	{
+		int			fd = ListenSocket[i];
+
+		if (fd == PGINVALID_SOCKET)
+			break;
+		AddWaitEventToSet(postmaster->wes, WL_SOCKET_ACCEPT, fd, NULL, NULL);
+	}
+}
+
+/*
+ * Activate or deactivate the server socket events.
+ */
+static void
+AdjustServerSocketEvents(Postmaster *postmaster, bool active)
+{
+	for (int pos = 1; pos < GetNumRegisteredWaitEvents(postmaster->wes); ++pos)
+		ModifyWaitEvent(postmaster->wes,
+						pos, active ? WL_SOCKET_ACCEPT : WL_SOCKET_IGNORE,
+						NULL);
+}
+
 /*
  * Main idle loop of postmaster
  *
  * NB: Needs to be called with signals blocked
  */
 static int
-ServerLoop(void)
+ServerLoop(Postmaster *postmaster)
 {
-	fd_set		readmask;
-	int			nSockets;
 	time_t		last_lockfile_recheck_time,
 				last_touch_time;
+	WaitEvent	events[MAXLISTEN];
+	int			nevents;
 
+	InitializeWaitSet(postmaster);
 	last_lockfile_recheck_time = last_touch_time = time(NULL);
 
-	nSockets = initMasks(&readmask);
-
 	for (;;)
 	{
-		fd_set		rmask;
-		int			selres;
 		time_t		now;
+		struct timeval timeout;
 
-		/*
-		 * Wait for a connection request to arrive.
-		 *
-		 * We block all signals except while sleeping. That makes it safe for
-		 * signal handlers, which again block all signals while executing, to
-		 * do nontrivial work.
-		 *
-		 * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
-		 * any new connections, so we don't call select(), and just sleep.
-		 */
-		memcpy((char *) &rmask, (char *) &readmask, sizeof(fd_set));
-
-		if (pmState == PM_WAIT_DEAD_END)
-		{
-			PG_SETMASK(&UnBlockSig);
-
-			pg_usleep(100000L); /* 100 msec seems reasonable */
-			selres = 0;
-
-			PG_SETMASK(&BlockSig);
-		}
-		else
-		{
-			/* must set timeout each time; some OSes change it! */
-			struct timeval timeout;
-
-			/* Needs to run with blocked signals! */
-			DetermineSleepTime(&timeout);
-
-			PG_SETMASK(&UnBlockSig);
+		DetermineSleepTime(&timeout);
 
-			selres = select(nSockets, &rmask, NULL, NULL, &timeout);
+		nevents = WaitEventSetWait(postmaster->wes,
+								   timeout.tv_sec * 1000 + timeout.tv_usec / 1000,
+								   events,
+								   lengthof(events),
+								   0 /* postmaster posts no wait_events */);
 
-			PG_SETMASK(&BlockSig);
-		}
-
-		/* Now check the select() result */
-		if (selres < 0)
+		/*
+		 * Latch set by signal handler, or new connection pending on any of our
+		 * sockets? If the latter, fork a child process to deal with it.
+		 */
+		for (int i = 0; i < nevents; i++)
 		{
-			if (errno != EINTR && errno != EWOULDBLOCK)
+			if (events[i].events & WL_LATCH_SET)
 			{
-				ereport(LOG,
-						(errcode_for_socket_access(),
-						 errmsg("select() failed in postmaster: %m")));
-				return STATUS_ERROR;
+				ResetLatch(MyLatch);
+
+				/* Process work scheduled by signal handlers. */
+				if (pending_action_request)
+					process_action_request(postmaster);
+				if (pending_child_exit)
+					process_child_exit(postmaster);
+				if (pending_reload_request)
+					process_reload_request();
+				if (pending_shutdown_request)
+					process_shutdown_request(postmaster);
 			}
-		}
+			else if (events[i].events & WL_SOCKET_ACCEPT)
+			{
+				Port	   *port;
 
-		/*
-		 * New connection pending on any of our sockets? If so, fork a child
-		 * process to deal with it.
-		 */
-		if (selres > 0)
-		{
-			int			i;
+				/*
+				 * If we are in PM_WAIT_DEAD_END state, then we don't want to
+				 * accept any new connections.  Lazily silence all socket
+				 * events.
+				 */
+				if (pmState == PM_WAIT_DEAD_END)
+				{
+					AdjustServerSocketEvents(postmaster, false);
+					continue;
+				}
 
-			for (i = 0; i < MAXLISTEN; i++)
-			{
-				if (ListenSocket[i] == PGINVALID_SOCKET)
-					break;
-				if (FD_ISSET(ListenSocket[i], &rmask))
+				port = ConnCreate(events[i].fd);
+				if (port)
 				{
-					Port	   *port;
+					BackendStartup(port);
 
-					port = ConnCreate(ListenSocket[i]);
-					if (port)
-					{
-						BackendStartup(port);
-
-						/*
-						 * We no longer need the open socket or port structure
-						 * in this process
-						 */
-						StreamClose(port->sock);
-						ConnFree(port);
-					}
+					/*
+					 * We no longer need the open socket or port structure
+					 * in this process
+					 */
+					StreamClose(port->sock);
+					ConnFree(port);
 				}
 			}
 		}
@@ -1939,34 +1942,6 @@ ServerLoop(void)
 	}
 }
 
-/*
- * Initialise the masks for select() for the ports we are listening on.
- * Return the number of sockets to listen on.
- */
-static int
-initMasks(fd_set *rmask)
-{
-	int			maxsock = -1;
-	int			i;
-
-	FD_ZERO(rmask);
-
-	for (i = 0; i < MAXLISTEN; i++)
-	{
-		int			fd = ListenSocket[i];
-
-		if (fd == PGINVALID_SOCKET)
-			break;
-		FD_SET(fd, rmask);
-
-		if (fd > maxsock)
-			maxsock = fd;
-	}
-
-	return maxsock + 1;
-}
-
-
 /*
  * Read a client's startup packet and do something according to it.
  *
@@ -2707,14 +2682,42 @@ InitProcessGlobals(void)
 #endif
 }
 
+/*
+ * Child processes use SIGUSR1 for pmsignals.  pg_ctl uses SIGUSR1 to ask
+ * postmaster to check for logrotate and promote files.
+ */
+static void
+handle_action_request_signal(SIGNAL_ARGS)
+{
+	int save_errno = errno;
+
+	pending_action_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * pg_ctl uses SIGHUP to request a reload of the configuration files.
+ */
+static void
+handle_reload_request_signal(SIGNAL_ARGS)
+{
+	int save_errno = errno;
+
+	pending_reload_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
 
 /*
- * SIGHUP -- reread config files, and tell children to do same
+ * Re-read config files, and tell children to do same.
  */
 static void
-SIGHUP_handler(SIGNAL_ARGS)
+process_reload_request(void)
 {
-	int			save_errno = errno;
+	pending_reload_request = false;
 
 	if (Shutdown <= SmartShutdown)
 	{
@@ -2771,27 +2774,47 @@ SIGHUP_handler(SIGNAL_ARGS)
 		write_nondefault_variables(PGC_SIGHUP);
 #endif
 	}
-
-	errno = save_errno;
 }
 
-
 /*
- * pmdie -- signal handler for processing various postmaster signals.
+ * pg_ctl uses SIGTERM, SIGINT and SIGQUIT to request different types of
+ * shutdown.
  */
 static void
-pmdie(SIGNAL_ARGS)
+handle_shutdown_request_signal(SIGNAL_ARGS)
 {
-	int			save_errno = errno;
-
-	ereport(DEBUG2,
-			(errmsg_internal("postmaster received signal %d",
-							 postgres_signal_arg)));
+	int save_errno = errno;
 
 	switch (postgres_signal_arg)
 	{
 		case SIGTERM:
+			pending_shutdown_request = SmartShutdown;
+			break;
+		case SIGINT:
+			pending_shutdown_request = FastShutdown;
+			break;
+		case SIGQUIT:
+			pending_shutdown_request = ImmediateShutdown;
+			break;
+	}
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
 
+/*
+ * Process shutdown request.
+ */
+static void
+process_shutdown_request(Postmaster *postmaster)
+{
+	int		mode = pending_shutdown_request;
+
+	pending_shutdown_request = NoShutdown;
+
+	switch (mode)
+	{
+		case SmartShutdown:
 			/*
 			 * Smart Shutdown:
 			 *
@@ -2827,11 +2850,10 @@ pmdie(SIGNAL_ARGS)
 			 * that is already the case, PostmasterStateMachine will take the
 			 * next step.
 			 */
-			PostmasterStateMachine();
+			PostmasterStateMachine(postmaster);
 			break;
 
-		case SIGINT:
-
+		case FastShutdown:
 			/*
 			 * Fast Shutdown:
 			 *
@@ -2868,11 +2890,10 @@ pmdie(SIGNAL_ARGS)
 			 * PostmasterStateMachine will issue any necessary signals, or
 			 * take the next step if no child processes need to be killed.
 			 */
-			PostmasterStateMachine();
+			PostmasterStateMachine(postmaster);
 			break;
 
-		case SIGQUIT:
-
+		case ImmediateShutdown:
 			/*
 			 * Immediate Shutdown:
 			 *
@@ -2905,23 +2926,33 @@ pmdie(SIGNAL_ARGS)
 			 * Now wait for backends to exit.  If there are none,
 			 * PostmasterStateMachine will take the next step.
 			 */
-			PostmasterStateMachine();
+			PostmasterStateMachine(postmaster);
 			break;
 	}
+}
+
+static void
+handle_child_exit_signal(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	pending_child_exit = true;
+	SetLatch(MyLatch);
 
 	errno = save_errno;
 }
 
 /*
- * Reaper -- signal handler to cleanup after a child process dies.
+ * Cleanup after a child process dies.
  */
 static void
-reaper(SIGNAL_ARGS)
+process_child_exit(Postmaster *postmaster)
 {
-	int			save_errno = errno;
 	int			pid;			/* process id of dead child process */
 	int			exitstatus;		/* its exit status */
 
+	pending_child_exit = false;
+
 	ereport(DEBUG4,
 			(errmsg_internal("reaping dead processes")));
 
@@ -3212,9 +3243,7 @@ reaper(SIGNAL_ARGS)
 	 * After cleaning out the SIGCHLD queue, see if we have any state changes
 	 * or actions to make.
 	 */
-	PostmasterStateMachine();
-
-	errno = save_errno;
+	PostmasterStateMachine(postmaster);
 }
 
 /*
@@ -3642,11 +3671,12 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 /*
  * Advance the postmaster's state machine and take actions as appropriate
  *
- * This is common code for pmdie(), reaper() and sigusr1_handler(), which
- * receive the signals that might mean we need to change state.
+ * This is common code for process_shutdown_request(), process_child_exit() and
+ * process_action_request(), which process the signals that might mean we need
+ * to change state.
  */
 static void
-PostmasterStateMachine(void)
+PostmasterStateMachine(Postmaster *postmaster)
 {
 	/* If we're doing a smart shutdown, try to advance that state. */
 	if (pmState == PM_RUN || pmState == PM_HOT_STANDBY)
@@ -3819,6 +3849,9 @@ PostmasterStateMachine(void)
 			Assert(AutoVacPID == 0);
 			/* syslogger is not considered here */
 			pmState = PM_NO_CHILDREN;
+
+			/* re-activate server socket events */
+			AdjustServerSocketEvents(postmaster, true);
 		}
 	}
 
@@ -3905,6 +3938,9 @@ PostmasterStateMachine(void)
 		pmState = PM_STARTUP;
 		/* crash recovery started, reset SIGKILL flag */
 		AbortStartTime = 0;
+
+		/* start accepting server socket connection events again */
+		reenable_server_socket_events = true;
 	}
 }
 
@@ -4094,6 +4130,7 @@ BackendStartup(Port *port)
 	/* Hasn't asked to be notified about any bgworkers yet */
 	bn->bgworker_notify = false;
 
+	PG_SETMASK(&BlockSig);
 #ifdef EXEC_BACKEND
 	pid = backend_forkexec(port);
 #else							/* !EXEC_BACKEND */
@@ -4124,6 +4161,7 @@ BackendStartup(Port *port)
 		BackendRun(port);
 	}
 #endif							/* EXEC_BACKEND */
+	PG_SETMASK(&UnBlockSig);
 
 	if (pid < 0)
 	{
@@ -5013,12 +5051,13 @@ ExitPostmaster(int status)
 }
 
 /*
- * sigusr1_handler - handle signal conditions from child processes
+ * Handle pmsignal conditions representing requests from backends,
+ * and check for promote and logrotate requests from pg_ctl.
  */
 static void
-sigusr1_handler(SIGNAL_ARGS)
+process_action_request(Postmaster *postmaster)
 {
-	int			save_errno = errno;
+	pending_action_request = false;
 
 	/*
 	 * RECOVERY_STARTED and BEGIN_HOT_STANDBY signals are ignored in
@@ -5143,7 +5182,7 @@ sigusr1_handler(SIGNAL_ARGS)
 	 */
 	if (CheckPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE))
 	{
-		PostmasterStateMachine();
+		PostmasterStateMachine(postmaster);
 	}
 
 	if (StartupPID != 0 &&
@@ -5159,8 +5198,6 @@ sigusr1_handler(SIGNAL_ARGS)
 		 */
 		signal_child(StartupPID, SIGUSR2);
 	}
-
-	errno = save_errno;
 }
 
 /*
@@ -5271,6 +5308,7 @@ StartChildProcess(AuxProcType type)
 {
 	pid_t		pid;
 
+	PG_SETMASK(&BlockSig);
 #ifdef EXEC_BACKEND
 	{
 		char	   *av[10];
@@ -5310,6 +5348,7 @@ StartChildProcess(AuxProcType type)
 		AuxiliaryProcessMain(type); /* does not return */
 	}
 #endif							/* EXEC_BACKEND */
+	PG_SETMASK(&UnBlockSig);
 
 	if (pid < 0)
 	{
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index eb3a569aae..3bfef592eb 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -283,6 +283,17 @@ InitializeLatchSupport(void)
 #ifdef WAIT_USE_SIGNALFD
 	sigset_t	signalfd_mask;
 
+	if (IsUnderPostmaster)
+	{
+		if (signal_fd != -1)
+		{
+			/* Release postmaster's signal FD; ignore any error */
+			(void) close(signal_fd);
+			signal_fd = -1;
+			ReleaseExternalFD();
+		}
+	}
+
 	/* Block SIGURG, because we'll receive it through a signalfd. */
 	sigaddset(&UnBlockSig, SIGURG);
 
@@ -1069,6 +1080,7 @@ WaitEventAdjustEpoll(WaitEventSet *set, WaitEvent *event, int action)
 		Assert(event->fd != PGINVALID_SOCKET);
 		Assert(event->events & (WL_SOCKET_READABLE |
 								WL_SOCKET_WRITEABLE |
+								WL_SOCKET_IGNORE |
 								WL_SOCKET_CLOSED));
 
 		if (event->events & WL_SOCKET_READABLE)
@@ -1117,6 +1129,7 @@ WaitEventAdjustPoll(WaitEventSet *set, WaitEvent *event)
 	{
 		Assert(event->events & (WL_SOCKET_READABLE |
 								WL_SOCKET_WRITEABLE |
+								WL_SOCKET_IGNORE |
 								WL_SOCKET_CLOSED));
 		pollfd->events = 0;
 		if (event->events & WL_SOCKET_READABLE)
@@ -1201,6 +1214,7 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events)
 		   event->events == WL_POSTMASTER_DEATH ||
 		   (event->events & (WL_SOCKET_READABLE |
 							 WL_SOCKET_WRITEABLE |
+							 WL_SOCKET_IGNORE |
 							 WL_SOCKET_CLOSED)));
 
 	if (event->events == WL_POSTMASTER_DEATH)
@@ -1312,6 +1326,8 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event)
 			flags |= FD_WRITE;
 		if (event->events & WL_SOCKET_CONNECTED)
 			flags |= FD_CONNECT;
+		if (event->events & WL_SOCKET_ACCEPT)
+			flags |= FD_ACCEPT;
 
 		if (*handle == WSA_INVALID_EVENT)
 		{
@@ -2067,6 +2083,12 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 			/* connected */
 			occurred_events->events |= WL_SOCKET_CONNECTED;
 		}
+		if ((cur_event->events & WL_SOCKET_ACCEPT) &&
+			(resEvents.lNetworkEvents & FD_ACCEPT))
+		{
+			/* incoming connection ready to accept */
+			occurred_events->events |= WL_SOCKET_ACCEPT;
+		}
 		if (resEvents.lNetworkEvents & FD_CLOSE)
 		{
 			/* EOF/error, so signal all caller-requested socket flags */
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3082093d1e..655e881688 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -24,7 +24,6 @@
 #include <signal.h>
 #include <unistd.h>
 #include <sys/resource.h>
-#include <sys/select.h>
 #include <sys/socket.h>
 #include <sys/time.h>
 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index eb1046450b..1348261220 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -135,8 +135,7 @@ InitPostmasterChild(void)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -189,8 +188,7 @@ InitStandaloneProcess(const char *argv0)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -232,6 +230,13 @@ SwitchToSharedLatch(void)
 	SetLatch(MyLatch);
 }
 
+void
+InitLocalLatch(void)
+{
+	MyLatch = &LocalLatchData;
+	InitLatch(MyLatch);
+}
+
 void
 SwitchBackToLocalLatch(void)
 {
diff --git a/src/include/libpq/pqsignal.h b/src/include/libpq/pqsignal.h
index 7890b426a8..76eb380a4f 100644
--- a/src/include/libpq/pqsignal.h
+++ b/src/include/libpq/pqsignal.h
@@ -53,7 +53,4 @@ extern PGDLLIMPORT sigset_t StartupBlockSig;
 
 extern void pqinitmask(void);
 
-/* pqsigfunc is declared in src/include/port.h */
-extern pqsigfunc pqsignal_pm(int signo, pqsigfunc func);
-
 #endif							/* PQSIGNAL_H */
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 795182fa51..0975867197 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -310,6 +310,7 @@ extern PGDLLIMPORT char *DatabasePath;
 /* now in utils/init/miscinit.c */
 extern void InitPostmasterChild(void);
 extern void InitStandaloneProcess(const char *argv0);
+extern void InitLocalLatch(void);
 extern void SwitchToSharedLatch(void);
 extern void SwitchBackToLocalLatch(void);
 
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 68ab740f16..ce1f4bd44e 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -135,10 +135,17 @@ typedef struct Latch
 #define WL_SOCKET_CONNECTED  WL_SOCKET_WRITEABLE
 #endif
 #define WL_SOCKET_CLOSED 	 (1 << 7)
+#ifdef WIN32
+#define WL_SOCKET_ACCEPT	 (1 << 8)
+#else
+#define WL_SOCKET_ACCEPT	 WL_SOCKET_READABLE
+#endif
+#define WL_SOCKET_IGNORE	 (1 << 9)
 #define WL_SOCKET_MASK		(WL_SOCKET_READABLE | \
 							 WL_SOCKET_WRITEABLE | \
 							 WL_SOCKET_CONNECTED | \
-							 WL_SOCKET_CLOSED)
+							 WL_SOCKET_CLOSED | \
+							 WL_SOCKET_ACCEPT)
 
 typedef struct WaitEvent
 {
-- 
2.38.1

#6Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#5)
Re: Using WaitEventSet in the postmaster

Hi,

On 2022-12-05 22:45:57 +1300, Thomas Munro wrote:

On Sat, Dec 3, 2022 at 10:41 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Here's an iteration like that. Still WIP grade. It passes, but there
must be something I don't understand about this computer program yet,
because if I move the "if (pending_..." section up into the block
where WL_LATCH_SET has arrived (instead of testing those variables
every time through the loop), a couple of tests leave zombie
(unreaped) processes behind, indicating that something funky happened
to the state machine that I haven't yet grokked. Will look more next
week.

Duh. The reason for that was the pre-existing special case for
PM_WAIT_DEAD_END, which used a sleep(100ms) loop to wait for children
to exit, which I needed to change to a latch wait. Fixed in the next
iteration, attached.

The reason for the existing sleep-based approach was that we didn't
want to accept any more connections (or spin furiously because the
listen queue was non-empty). So in this version I invented a way to
suppress socket events temporarily with WL_SOCKET_IGNORE, and then
reactivate them after crash reinit.
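The idea of temporarily suppressing and then reactivating the listening sockets can be pictured with plain poll(), because POSIX specifies that poll() ignores entries with a negative fd. This is an analogy only and uses a different mechanism from the patch's WL_SOCKET_IGNORE. adjust_listen_events() is a hypothetical name. The sketch assumes slot 0 holds the latch and every other slot holds a real descriptor. On Unix a pending connection makes a listening socket readable, so the test uses a pipe with unread data as a stand-in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical poll()-level analogue of AdjustServerSocketEvents().
 * Storing ~fd makes the entry negative, so poll() skips it without the
 * descriptor being closed; applying ~ again restores it.  (~ is used
 * rather than unary minus so that fd 0 also maps to a negative value.)
 * Slot 0 is assumed to be the latch/wakeup entry and is left alone;
 * slots 1..npfds-1 are assumed to hold real descriptors.
 */
static void
adjust_listen_events(struct pollfd *pfds, int npfds, int active)
{
	for (int i = 1; i < npfds; i++)
	{
		int			fd = pfds[i].fd < 0 ? ~pfds[i].fd : pfds[i].fd;

		pfds[i].fd = active ? fd : ~fd;
	}
}
```

While the entries are disabled, a non-empty listen queue can no longer make the loop spin. The descriptors stay open, so re-enabling them after crash reinit needs no re-registration.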

That seems like an odd special flag. Why do we need it? Is it just because we
want to have assertions ensuring that something is being queried?

* WL_SOCKET_ACCEPT is a new event for an incoming connection (on Unix,
this is just another name for WL_SOCKET_READABLE, but Windows has a
different underlying event; this mirrors WL_SOCKET_CONNECTED on the
other end of a connection)

Perhaps worth committing separately and soon? Seems pretty uncontroversial
from here.

+/*
+ * Object representing the state of a postmaster.
+ *
+ * XXX Lots of global variables could move in here.
+ */
+typedef struct
+{
+	WaitEventSet	*wes;
+} Postmaster;
+

Seems weird to introduce this but then basically have it be unused. I'd say
either have a preceding patch move at least a few members into it, or just
omit it for now.

+	/* This may configure SIGURG, depending on platform. */
+	InitializeLatchSupport();
+	InitLocalLatch();

I mildly prefer InitProcessLocalLatch(), but I'm not sure why; there's not
really a conflicting meaning of "local" here.

+/*
+ * Initialize the WaitEventSet we'll use in our main event loop.
+ */
+static void
+InitializeWaitSet(Postmaster *postmaster)
+{
+	/* Set up a WaitEventSet for our latch and listening sockets. */
+	postmaster->wes = CreateWaitEventSet(CurrentMemoryContext, 1 + MAXLISTEN);
+	AddWaitEventToSet(postmaster->wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
+	for (int i = 0; i < MAXLISTEN; i++)
+	{
+		int			fd = ListenSocket[i];
+
+		if (fd == PGINVALID_SOCKET)
+			break;
+		AddWaitEventToSet(postmaster->wes, WL_SOCKET_ACCEPT, fd, NULL, NULL);
+	}
+}

The naming seems like it could be confused with latch.h
infrastructure. InitPostmasterWaitSet()?

+/*
+ * Activate or deactivate the server socket events.
+ */
+static void
+AdjustServerSocketEvents(Postmaster *postmaster, bool active)
+{
+	for (int pos = 1; pos < GetNumRegisteredWaitEvents(postmaster->wes); ++pos)
+		ModifyWaitEvent(postmaster->wes,
+						pos, active ? WL_SOCKET_ACCEPT : WL_SOCKET_IGNORE,
+						NULL);
+}

This seems to hardcode the specific wait events we're waiting for based on
latch.c infrastructure. Not really convinced that's a good idea.

+		/*
+		 * Latch set by signal handler, or new connection pending on any of our
+		 * sockets? If the latter, fork a child process to deal with it.
+		 */
+		for (int i = 0; i < nevents; i++)
 		{
-			if (errno != EINTR && errno != EWOULDBLOCK)
+			if (events[i].events & WL_LATCH_SET)
 			{
-				ereport(LOG,
-						(errcode_for_socket_access(),
-						 errmsg("select() failed in postmaster: %m")));
-				return STATUS_ERROR;
+				ResetLatch(MyLatch);
+
+				/* Process work scheduled by signal handlers. */
+				if (pending_action_request)
+					process_action_request(postmaster);
+				if (pending_child_exit)
+					process_child_exit(postmaster);
+				if (pending_reload_request)
+					process_reload_request();
+				if (pending_shutdown_request)
+					process_shutdown_request(postmaster);
 			}

Is the order of operations here quite right? Shouldn't we process a shutdown
request before the others? And a child exit before the request to start an
autovac worker etc?

ISTM it should be 1) shutdown request 2) child exit 3) config reload 4) action
request.
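The ordering proposed above could look roughly like the sketch below. The flags mirror the patch's pending_XXX variables, but the dispatcher and the handled-order trace are hypothetical scaffolding. One plausible reading of the motivation is that a shutdown request should take effect before anything tries to start new children. On the same reading, child exits should be reaped before action requests that may launch replacements.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t pending_shutdown_request;
static volatile sig_atomic_t pending_child_exit;
static volatile sig_atomic_t pending_reload_request;
static volatile sig_atomic_t pending_action_request;

/* Records the order in which requests were handled (illustration only). */
static char handled[8];
static size_t nhandled;

static void
record(char c)
{
	if (nhandled < sizeof(handled) - 1)
	{
		handled[nhandled++] = c;
		handled[nhandled] = '\0';
	}
}

/*
 * Called after ResetLatch().  Order as proposed in the review:
 * 1) shutdown request 2) child exit 3) config reload 4) action request.
 */
static void
process_pending_requests(void)
{
	if (pending_shutdown_request)
	{
		pending_shutdown_request = 0;
		record('S');			/* process_shutdown_request() */
	}
	if (pending_child_exit)
	{
		pending_child_exit = 0;
		record('C');			/* process_child_exit() */
	}
	if (pending_reload_request)
	{
		pending_reload_request = 0;
		record('R');			/* process_reload_request() */
	}
	if (pending_action_request)
	{
		pending_action_request = 0;
		record('A');			/* process_action_request() */
	}
}
```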

 /*
- * pmdie -- signal handler for processing various postmaster signals.
+ * pg_ctl uses SIGTERM, SIGINT and SIGQUIT to request different types of
+ * shutdown.
  */
 static void
-pmdie(SIGNAL_ARGS)
+handle_shutdown_request_signal(SIGNAL_ARGS)
 {
-	int			save_errno = errno;
-
-	ereport(DEBUG2,
-			(errmsg_internal("postmaster received signal %d",
-							 postgres_signal_arg)));
+	int save_errno = errno;
 	switch (postgres_signal_arg)
 	{
 		case SIGTERM:
+			pending_shutdown_request = SmartShutdown;
+			break;
+		case SIGINT:
+			pending_shutdown_request = FastShutdown;
+			break;
+		case SIGQUIT:
+			pending_shutdown_request = ImmediateShutdown;
+			break;
+	}
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}

Hm, not having the "postmaster received signal" message anymore seems like a
loss when debugging things. I think process_shutdown_request() should emit
something like it.

I wonder if we should have an elog_sighand() that's written to be signal
safe. I've written versions of that numerous times for debugging, and it's a
bit silly to do that over and over again.
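A minimal sketch of what such a helper could look like, assuming all it needs to print is a fixed prefix plus a signal number. elog_sighand_sketch() and format_signal_message() are hypothetical names. The point is to rely only on write(2), which POSIX lists as async-signal-safe, and to avoid snprintf() and the ereport() machinery.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: format "<prefix><signo>\n" into buf without
 * snprintf(), which is not async-signal-safe.  Output is truncated to fit
 * and always NUL-terminated (buflen must be at least 1).  Returns the
 * number of bytes stored, excluding the terminator.
 */
static size_t
format_signal_message(char *buf, size_t buflen, const char *prefix, int signo)
{
	char		digits[16];
	size_t		ndigits = 0;
	size_t		len = 0;
	unsigned int value = (unsigned int) signo;	/* signal numbers are positive */

	for (const char *p = prefix; *p != '\0' && len < buflen - 1; p++)
		buf[len++] = *p;

	/* Collect decimal digits in reverse order. */
	do
	{
		digits[ndigits++] = (char) ('0' + value % 10);
		value /= 10;
	} while (value != 0);

	while (ndigits > 0 && len < buflen - 1)
		buf[len++] = digits[--ndigits];
	if (len < buflen - 1)
		buf[len++] = '\n';
	buf[len] = '\0';
	return len;
}

/* Hypothetical elog_sighand()-style helper: safe to call from a handler. */
static void
elog_sighand_sketch(const char *prefix, int signo)
{
	int			save_errno = errno;
	char		buf[128];
	size_t		len = format_signal_message(buf, sizeof(buf), prefix, signo);
	ssize_t		rc = write(STDERR_FILENO, buf, len);	/* write(2) is async-signal-safe */

	(void) rc;
	errno = save_errno;
}
```

A handler would then call, say, elog_sighand_sketch("postmaster received signal ", postgres_signal_arg) and leave errno untouched.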

@@ -2905,23 +2926,33 @@ pmdie(SIGNAL_ARGS)
* Now wait for backends to exit. If there are none,
* PostmasterStateMachine will take the next step.
*/
- PostmasterStateMachine();
+ PostmasterStateMachine(postmaster);
break;

I'm by now fairly certain that it's a bad idea to have this change mixed in
with the rest of this large-ish change.

static void
-PostmasterStateMachine(void)
+PostmasterStateMachine(Postmaster *postmaster)
{
/* If we're doing a smart shutdown, try to advance that state. */
if (pmState == PM_RUN || pmState == PM_HOT_STANDBY)
@@ -3819,6 +3849,9 @@ PostmasterStateMachine(void)
Assert(AutoVacPID == 0);
/* syslogger is not considered here */
pmState = PM_NO_CHILDREN;
+
+			/* re-activate server socket events */
+			AdjustServerSocketEvents(postmaster, true);
}
}
@@ -3905,6 +3938,9 @@ PostmasterStateMachine(void)
pmState = PM_STARTUP;
/* crash recovery started, reset SIGKILL flag */
AbortStartTime = 0;
+
+		/* start accepting server socket connection events again */
+		reenable_server_socket_events = true;
}
}

I don't think reenable_server_socket_events does anything as the patch stands
- I don't see it being checked anywhere? And in the path above, you're using
AdjustServerSocketEvents() directly.

@@ -4094,6 +4130,7 @@ BackendStartup(Port *port)
/* Hasn't asked to be notified about any bgworkers yet */
bn->bgworker_notify = false;

+ PG_SETMASK(&BlockSig);
#ifdef EXEC_BACKEND
pid = backend_forkexec(port);
#else /* !EXEC_BACKEND */

There are other calls to fork_process() - why don't they need the same
treatment?

Perhaps we should add an assertion to fork_process() ensuring that signals are
masked?

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index eb3a569aae..3bfef592eb 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -283,6 +283,17 @@ InitializeLatchSupport(void)
#ifdef WAIT_USE_SIGNALFD
sigset_t	signalfd_mask;
+	if (IsUnderPostmaster)
+	{
+		if (signal_fd != -1)
+		{
+			/* Release postmaster's signal FD; ignore any error */
+			(void) close(signal_fd);
+			signal_fd = -1;
+			ReleaseExternalFD();
+		}
+	}
+

Hm - arguably it's a bug that we don't do this right now, correct?

@@ -1201,6 +1214,7 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events)
event->events == WL_POSTMASTER_DEATH ||
(event->events & (WL_SOCKET_READABLE |
WL_SOCKET_WRITEABLE |
+ WL_SOCKET_IGNORE |
WL_SOCKET_CLOSED)));

if (event->events == WL_POSTMASTER_DEATH)
@@ -1312,6 +1326,8 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event)
flags |= FD_WRITE;
if (event->events & WL_SOCKET_CONNECTED)
flags |= FD_CONNECT;
+ if (event->events & WL_SOCKET_ACCEPT)
+ flags |= FD_ACCEPT;

if (*handle == WSA_INVALID_EVENT)
{

I wonder if the code would end up easier to understand if we handled
WL_SOCKET_CONNECTED, WL_SOCKET_ACCEPT explicitly in the !WIN32 cases, rather
than redefining it to WL_SOCKET_READABLE.
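Roughly what that could look like for a poll()-based implementation. This is only a sketch: the bit values below are hypothetical and the function is not the latch.c code. The readiness semantics it encodes are standard on Unix, though. A listening socket becomes readable when a connection is pending, and a non-blocking connect() has finished when the socket becomes writable.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>

/*
 * Hypothetical bit values, chosen for this sketch only.  WL_SOCKET_CONNECTED
 * and WL_SOCKET_ACCEPT get their own bits here instead of being #defined as
 * aliases of WRITEABLE/READABLE on non-Windows builds.
 */
#define WL_SOCKET_READABLE	(1 << 1)
#define WL_SOCKET_WRITEABLE	(1 << 2)
#define WL_SOCKET_CONNECTED	(1 << 6)
#define WL_SOCKET_ACCEPT	(1 << 8)

/* Hypothetical helper: translate socket wait events into poll(2) bits. */
static short
socket_events_to_poll(uint32_t events)
{
	short		mask = 0;

	/* A pending connection makes a listening socket readable. */
	if (events & (WL_SOCKET_READABLE | WL_SOCKET_ACCEPT))
		mask |= POLLIN;
	/* A finished non-blocking connect() makes the socket writable. */
	if (events & (WL_SOCKET_WRITEABLE | WL_SOCKET_CONNECTED))
		mask |= POLLOUT;
	return mask;
}
```

Spelling the mapping out keeps each platform's meaning next to the code that implements it, at the cost of one more bit in the flags word.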

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3082093d1e..655e881688 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -24,7 +24,6 @@
#include <signal.h>
#include <unistd.h>
#include <sys/resource.h>
-#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

Do you know why this include even existed?

Greetings,

Andres Freund

#7Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#6)
5 attachment(s)
Re: Using WaitEventSet in the postmaster

On Tue, Dec 6, 2022 at 7:09 AM Andres Freund <andres@anarazel.de> wrote:

On 2022-12-05 22:45:57 +1300, Thomas Munro wrote:

The reason for the existing sleep-based approach was that we didn't
want to accept any more connections (or spin furiously because the
listen queue was non-empty). So in this version I invented a way to
suppress socket events temporarily with WL_SOCKET_IGNORE, and then
reactivate them after crash reinit.

That seems like an odd special flag. Why do we need it? Is it just because we
want to have assertions ensuring that something is being queried?

Yeah. Perhaps 0 would be a less clumsy way to say "no events please".
I removed the assertions and did it that way in this next iteration.

I realised that the previous approach didn't actually suppress POLLHUP
and POLLERR in the poll and epoll implementations (even though our
code seems to think it needs to ask for those events, it's not
necessary, you get them anyway), and, being level-triggered, if those
were ever reported we'd finish up pegging the CPU to 100% until the
children exited. Unlikely to happen with a server socket, but wrong
on principle, and maybe a problem for other potential users of this
temporary event suppression mode.

One way to fix that for the epoll version is to EPOLL_CTL_DEL and
EPOLL_CTL_ADD, whenever transitioning to/from a zero event mask.
Tried like that in this version. Another approach would be to
(finally) write DeleteWaitEvent() to do the same thing at a higher
level... seems overkill for this.
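The epoll approach described above could look roughly like this. It is a Linux-only sketch with a hypothetical helper name, epoll_adjust(), not the patch's code. It relies on epoll(7) always reporting EPOLLHUP and EPOLLERR for any registered descriptor, which is why a zero mask has to mean "not registered at all".

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: change fd's interest mask in an epoll set.  Because
 * EPOLLHUP and EPOLLERR are reported for any registered fd, a zero mask is
 * implemented by removing the fd, and a nonzero mask after zero by adding
 * it back.  Returns 0 on success, -1 with errno set on failure.
 */
static int
epoll_adjust(int epfd, int fd, uint32_t old_events, uint32_t new_events)
{
	struct epoll_event ev = {0};

	ev.events = new_events;
	ev.data.fd = fd;

	if (old_events == 0 && new_events == 0)
		return 0;				/* not registered; nothing to do */
	if (old_events == 0)
		return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
	if (new_events == 0)
		return epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
	return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* Hypothetical helper: number of ready fds right now (zero timeout). */
static int
ready_now(int epfd)
{
	struct epoll_event out;

	return epoll_wait(epfd, &out, 1, 0);
}
```

For example, with a pipe whose write end has been closed, ready_now() reports the read end as ready while it is registered and stops reporting it once epoll_adjust() has removed it. With level triggering, that persistent readiness is exactly what would otherwise peg the CPU.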

The kqueue version was already doing that because of the way it was
implemented, and the poll and Windows versions needed only a small
adjustment. I'm not too sure about the Windows change; my two ideas
are passing the 0 through as shown in this version (not sure if it
really works the way I want, but it makes some sense and the
WSAEventSelect() call doesn't fail...), or sticking a dummy unsignaled
event in the array passed to WaitForMultipleObjects().
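For poll(), one portable way to suppress an entry entirely, including POLLHUP and POLLERR, is the POSIX rule that entries with a negative fd are ignored and get revents set to 0. The following is a sketch of that technique with a hypothetical helper. It is not necessarily the adjustment the patch makes.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: switch one pollfd entry on or off.  poll() ignores
 * entries with a negative fd, which also silences POLLHUP and POLLERR
 * (those are reported even when events is 0).  Storing the fd as -fd - 1
 * keeps it recoverable and also works for fd 0.
 */
static void
pollfd_set_active(struct pollfd *pfd, int active)
{
	if (active && pfd->fd < 0)
		pfd->fd = -pfd->fd - 1;		/* restore the original fd */
	else if (!active && pfd->fd >= 0)
		pfd->fd = -pfd->fd - 1;		/* hide it from poll() */
}
```

The encoding is its own inverse, so toggling twice restores the original descriptor.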

To make sure this code is exercised, I made the state machine code
eager about silencing the socket events during PM_WAIT_DEAD_END, so
crash TAP tests go through the cycle. Regular non-crash shutdown also
runs EPOLL_CTL_DEL/EV_DELETE, which stands out if you trace the
postmaster.

* WL_SOCKET_ACCEPT is a new event for an incoming connection (on Unix,
this is just another name for WL_SOCKET_READABLE, but Windows has a
different underlying event; this mirrors WL_SOCKET_CONNECTED on the
other end of a connection)
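A small self-contained check of the Unix half of that statement, assuming an IPv4 loopback interface is available. A listening socket is not readable while its queue is empty. Once a client connects, it polls as readable (POLLIN) and accept() then returns the connection. The function name is hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical demo.  Returns 1 if the listener polls readable only after a
 * client connects and accept() then succeeds, 0 if not, -1 on setup failure.
 */
static int
accept_is_readability(void)
{
	struct sockaddr_in addr = {0};
	socklen_t	addrlen = sizeof(addr);
	struct pollfd pfd;
	int			listener,
				client,
				conn,
				result = -1;

	listener = socket(AF_INET, SOCK_STREAM, 0);
	if (listener < 0)
		return -1;

	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;			/* let the kernel pick a free port */
	if (bind(listener, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
		listen(listener, 1) < 0 ||
		getsockname(listener, (struct sockaddr *) &addr, &addrlen) < 0)
		goto out;

	/* Nothing queued yet, so the listener must not be readable. */
	pfd.fd = listener;
	pfd.events = POLLIN;
	pfd.revents = 0;
	result = 0;
	if (poll(&pfd, 1, 0) != 0)
		goto out;

	client = socket(AF_INET, SOCK_STREAM, 0);
	if (client < 0)
		goto out;
	/* Blocking connect(); on loopback the handshake completes quickly. */
	if (connect(client, (struct sockaddr *) &addr, sizeof(addr)) == 0 &&
		poll(&pfd, 1, 1000) == 1 && (pfd.revents & POLLIN))
	{
		/* The queued connection is what made the listener readable. */
		conn = accept(listener, NULL, NULL);
		if (conn >= 0)
		{
			result = 1;
			close(conn);
		}
	}
	close(client);
out:
	close(listener);
	return result;
}
```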

Perhaps worth committing separately and soon? Seems pretty uncontroversial
from here.

Alright, I split this into a separate patch.

+/*
+ * Object representing the state of a postmaster.
+ *
+ * XXX Lots of global variables could move in here.
+ */
+typedef struct
+{
+     WaitEventSet    *wes;
+} Postmaster;
+

Seems weird to introduce this but then basically have it be unused. I'd say
either have a preceding patch move at least a few members into it, or just
omit it for now.

Alright, I'll just have to make a global variable wait_set for now to
keep things simple.

+     /* This may configure SIGURG, depending on platform. */
+     InitializeLatchSupport();
+     InitLocalLatch();

I mildly prefer InitProcessLocalLatch(), but not sure why - there's not
really a conflicting meaning of "local" here.

Done.

+/*
+ * Initialize the WaitEventSet we'll use in our main event loop.
+ */
+static void
+InitializeWaitSet(Postmaster *postmaster)
+{
+     /* Set up a WaitEventSet for our latch and listening sockets. */
+     postmaster->wes = CreateWaitEventSet(CurrentMemoryContext, 1 + MAXLISTEN);
+     AddWaitEventToSet(postmaster->wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
+     for (int i = 0; i < MAXLISTEN; i++)
+     {
+             int                     fd = ListenSocket[i];
+
+             if (fd == PGINVALID_SOCKET)
+                     break;
+             AddWaitEventToSet(postmaster->wes, WL_SOCKET_ACCEPT, fd, NULL, NULL);
+     }
+}

The naming seems like it could be confused with latch.h
infrastructure. InitPostmasterWaitSet()?

OK.

+/*
+ * Activate or deactivate the server socket events.
+ */
+static void
+AdjustServerSocketEvents(Postmaster *postmaster, bool active)
+{
+     for (int pos = 1; pos < GetNumRegisteredWaitEvents(postmaster->wes); ++pos)
+             ModifyWaitEvent(postmaster->wes,
+                                             pos, active ? WL_SOCKET_ACCEPT : WL_SOCKET_IGNORE,
+                                             NULL);
+}

This seems to hardcode the specific wait events we're waiting for based on
latch.c infrastructure. Not really convinced that's a good idea.

What are you objecting to? The assumption that the first socket is at
position 1? The use of GetNumRegisteredWaitEvents()?

+                             /* Process work scheduled by signal handlers. */
+                             if (pending_action_request)
+                                     process_action_request(postmaster);
+                             if (pending_child_exit)
+                                     process_child_exit(postmaster);
+                             if (pending_reload_request)
+                                     process_reload_request();
+                             if (pending_shutdown_request)
+                                     process_shutdown_request(postmaster);

Is the order of operations here quite right? Shouldn't we process a shutdown
request before the others? And a child exit before the request to start an
autovac worker etc?

ISTM it should be 1) shutdown request 2) child exit 3) config reload 4) action
request.

OK, reordered like that.

- ereport(DEBUG2,
- (errmsg_internal("postmaster received signal %d",
- postgres_signal_arg)));

Hm, not having the "postmaster received signal" message anymore seems like a
loss when debugging things. I think process_shutdown_request() should emit
something like it.

I added some of these.

I wonder if we should have an elog_sighand() that's written to be signal
safe. I've written versions of that numerous times for debugging, and it's a
bit silly to do that over and over again.

Right, I was being dogmatic about kicking everything that doesn't have
a great big neon "async-signal-safe" sign on it out of the handlers.

+
+             /* start accepting server socket connection events again */
+             reenable_server_socket_events = true;
}
}

I don't think reenable_server_socket_events does anything as the patch stands
- I don't see it being checked anywhere? And in the path above, you're using
AdjustServerSocketEvents() directly.

Sorry, that was a left over unused variable from an earlier attempt,
which I only noticed after clicking send. Removed.

@@ -4094,6 +4130,7 @@ BackendStartup(Port *port)
/* Hasn't asked to be notified about any bgworkers yet */
bn->bgworker_notify = false;

+ PG_SETMASK(&BlockSig);
#ifdef EXEC_BACKEND
pid = backend_forkexec(port);
#else /* !EXEC_BACKEND */

There are other calls to fork_process() - why don't they need the same
treatment?

Perhaps we should add an assertion to fork_process() ensuring that signals are
masked?

If we're going to put an assertion in there, we might as well consider
setting and restoring the mask in that wrapper. Tried like that in
this version.

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index eb3a569aae..3bfef592eb 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -283,6 +283,17 @@ InitializeLatchSupport(void)
#ifdef WAIT_USE_SIGNALFD
sigset_t        signalfd_mask;
+     if (IsUnderPostmaster)
+     {
+             if (signal_fd != -1)
+             {
+                     /* Release postmaster's signal FD; ignore any error */
+                     (void) close(signal_fd);
+                     signal_fd = -1;
+                     ReleaseExternalFD();
+             }
+     }
+

Hm - arguably it's a bug that we don't do this right now, correct?

Yes, I would say it's a non-live bug. A signalfd descriptor inherited
by a child process isn't dangerous (it doesn't see the parent's
signals, it sees the child's signals), but it's a waste because we'd
leak it. I guess we could re-use it instead but that seems a little
weird. I've put this into a separate commit in case someone wants to
argue for back-patching, but it's a pretty hypothetical concern since
the postmaster never initialised latch support before...

One thing that does seem a bit odd to me, though, is why we're
cleaning up inherited descriptors in a function called
InitializeLatchSupport(). I wonder if we should move it into
FreeLatchSupportAfterFork()?

We should also close the postmaster's epoll fd, so I invented
FreeWaitEventSetAfterFork(). I found that ClosePostmasterPorts() was
a good place to call that, though it doesn't really fit the name of
that function too well...

@@ -1201,6 +1214,7 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events)
event->events == WL_POSTMASTER_DEATH ||
(event->events & (WL_SOCKET_READABLE |
WL_SOCKET_WRITEABLE |
+                                                      WL_SOCKET_IGNORE |
WL_SOCKET_CLOSED)));
if (event->events == WL_POSTMASTER_DEATH)
@@ -1312,6 +1326,8 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event)
flags |= FD_WRITE;
if (event->events & WL_SOCKET_CONNECTED)
flags |= FD_CONNECT;
+             if (event->events & WL_SOCKET_ACCEPT)
+                     flags |= FD_ACCEPT;

if (*handle == WSA_INVALID_EVENT)
{

I wonder if the code would end up easier to understand if we handled
WL_SOCKET_CONNECTED, WL_SOCKET_ACCEPT explicitly in the !WIN32 cases, rather
than redefining it to WL_SOCKET_READABLE.

Yeah maybe we could try that separately.

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3082093d1e..655e881688 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -24,7 +24,6 @@
#include <signal.h>
#include <unistd.h>
#include <sys/resource.h>
-#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

Do you know why this include even existed?

That turned out to be a fun question to answer: apparently there used
to be an optional 'multiplexed backend' mode, removed by commit
d5bbe2aca5 in 1998. A single backend could be connected to multiple
frontends.

Attachments:

v4-0005-Give-the-postmaster-a-WaitEventSet-and-a-latch.patch
From 435a15c4f017ce2f058abebc35b4d22c04e9f48a Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 9 Nov 2022 22:59:58 +1300
Subject: [PATCH v4 5/5] Give the postmaster a WaitEventSet and a latch.

Traditionally, the postmaster's architecture was quite unusual.  It did
a lot of work inside signal handlers, which were only unblocked while
waiting in select() to make that safe.

Switch to a more typical architecture, where signal handlers just set
flags and use a latch to close races.  Now the postmaster looks like
all other PostgreSQL processes, multiplexing its event processing in
epoll_wait()/kevent()/poll()/WaitForMultipleObjects() depending on the
OS.

Changes:

 * Allow the postmaster to set up its own local latch.  For now we don't
   want other backends setting the postmaster's latch directly (that
   would require latches robust against arbitrary corruption of shared
   memory).

 * The existing signal handlers are cut in two: a handle_XXX part that
   sets a pending_XXX variable and sets the local latch, and a
   process_XXX part.

 * Signal handlers are now installed with the regular pqsignal()
   function rather than the special pqsignal_pm() function; the concerns
   about the portability of SA_RESTART vs select() are no longer
   relevant: SUSv2 left it implementation-defined whether select()
   restarts, but didn't add that qualification for poll(), and it doesn't
   matter anyway because we call SetLatch() creating a new reason to wake
   up.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/libpq/pqsignal.c          |  40 ---
 src/backend/postmaster/fork_process.c |  12 +-
 src/backend/postmaster/postmaster.c   | 377 ++++++++++++++------------
 src/backend/tcop/postgres.c           |   1 -
 src/backend/utils/init/miscinit.c     |  13 +-
 src/include/libpq/pqsignal.h          |   3 -
 src/include/miscadmin.h               |   1 +
 7 files changed, 223 insertions(+), 224 deletions(-)

diff --git a/src/backend/libpq/pqsignal.c b/src/backend/libpq/pqsignal.c
index 1ab34c5214..718043a39d 100644
--- a/src/backend/libpq/pqsignal.c
+++ b/src/backend/libpq/pqsignal.c
@@ -97,43 +97,3 @@ pqinitmask(void)
 	sigdelset(&StartupBlockSig, SIGALRM);
 #endif
 }
-
-/*
- * Set up a postmaster signal handler for signal "signo"
- *
- * Returns the previous handler.
- *
- * This is used only in the postmaster, which has its own odd approach to
- * signal handling.  For signals with handlers, we block all signals for the
- * duration of signal handler execution.  We also do not set the SA_RESTART
- * flag; this should be safe given the tiny range of code in which the
- * postmaster ever unblocks signals.
- *
- * pqinitmask() must have been invoked previously.
- */
-pqsigfunc
-pqsignal_pm(int signo, pqsigfunc func)
-{
-	struct sigaction act,
-				oact;
-
-	act.sa_handler = func;
-	if (func == SIG_IGN || func == SIG_DFL)
-	{
-		/* in these cases, act the same as pqsignal() */
-		sigemptyset(&act.sa_mask);
-		act.sa_flags = SA_RESTART;
-	}
-	else
-	{
-		act.sa_mask = BlockSig;
-		act.sa_flags = 0;
-	}
-#ifdef SA_NOCLDSTOP
-	if (signo == SIGCHLD)
-		act.sa_flags |= SA_NOCLDSTOP;
-#endif
-	if (sigaction(signo, &act, &oact) < 0)
-		return SIG_ERR;
-	return oact.sa_handler;
-}
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index ec67761487..e1e7d91c52 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -12,24 +12,28 @@
 #include "postgres.h"
 
 #include <fcntl.h>
+#include <signal.h>
 #include <time.h>
 #include <sys/stat.h>
 #include <sys/time.h>
 #include <unistd.h>
 
+#include "libpq/pqsignal.h"
 #include "postmaster/fork_process.h"
 
 #ifndef WIN32
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
- * child in the parent process.
+ * child in the parent process.  Signals are blocked while forking, so
+ * the child must unblock.
  */
 pid_t
 fork_process(void)
 {
 	pid_t		result;
 	const char *oomfilename;
+	sigset_t	save_mask;
 
 #ifdef LINUX_PROFILE
 	struct itimerval prof_itimer;
@@ -51,6 +55,7 @@ fork_process(void)
 	getitimer(ITIMER_PROF, &prof_itimer);
 #endif
 
+	sigprocmask(SIG_SETMASK, &BlockSig, &save_mask);
 	result = fork();
 	if (result == 0)
 	{
@@ -103,6 +108,11 @@ fork_process(void)
 		/* do post-fork initialization for random number generation */
 		pg_strong_random_init();
 	}
+	else
+	{
+		/* in parent, restore signal mask */
+		sigprocmask(SIG_SETMASK, &save_mask, NULL);
+	}
 
 	return result;
 }
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a8a246921f..08dc80fcea 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -70,7 +70,6 @@
 #include <time.h>
 #include <sys/wait.h>
 #include <ctype.h>
-#include <sys/select.h>
 #include <sys/stat.h>
 #include <sys/socket.h>
 #include <fcntl.h>
@@ -362,6 +361,15 @@ static volatile sig_atomic_t WalReceiverRequested = false;
 static volatile bool StartWorkerNeeded = true;
 static volatile bool HaveCrashedWorker = false;
 
+/* set when signals arrive */
+static volatile sig_atomic_t pending_action_request;
+static volatile sig_atomic_t pending_child_exit;
+static volatile sig_atomic_t pending_reload_request;
+static volatile sig_atomic_t pending_shutdown_request;
+
+/* I/O multiplexing event */
+static WaitEventSet *wait_set;
+
 #ifdef USE_SSL
 /* Set when and if SSL has been initialized properly */
 static bool LoadedSSL = false;
@@ -380,10 +388,14 @@ static void getInstallationPaths(const char *argv0);
 static void checkControlFile(void);
 static Port *ConnCreate(int serverFd);
 static void ConnFree(Port *port);
-static void SIGHUP_handler(SIGNAL_ARGS);
-static void pmdie(SIGNAL_ARGS);
-static void reaper(SIGNAL_ARGS);
-static void sigusr1_handler(SIGNAL_ARGS);
+static void handle_action_request_signal(SIGNAL_ARGS);
+static void handle_child_exit_signal(SIGNAL_ARGS);
+static void handle_reload_request_signal(SIGNAL_ARGS);
+static void handle_shutdown_request_signal(SIGNAL_ARGS);
+static void process_action_request(void);
+static void process_child_exit(void);
+static void process_reload_request(void);
+static void process_shutdown_request(void);
 static void process_startup_packet_die(SIGNAL_ARGS);
 static void dummy_handler(SIGNAL_ARGS);
 static void StartupPacketTimeoutHandler(void);
@@ -401,7 +413,6 @@ static int	BackendStartup(Port *port);
 static int	ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done);
 static void SendNegotiateProtocolVersion(List *unrecognized_protocol_options);
 static void processCancelRequest(Port *port, void *pkt);
-static int	initMasks(fd_set *rmask);
 static void report_fork_failure_to_client(Port *port, int errnum);
 static CAC_state canAcceptConnections(int backend_type);
 static bool RandomCancelKey(int32 *cancel_key);
@@ -609,26 +620,6 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * Set up signal handlers for the postmaster process.
 	 *
-	 * In the postmaster, we use pqsignal_pm() rather than pqsignal() (which
-	 * is used by all child processes and client processes).  That has a
-	 * couple of special behaviors:
-	 *
-	 * 1. We tell sigaction() to block all signals for the duration of the
-	 * signal handler.  This is faster than our old approach of
-	 * blocking/unblocking explicitly in the signal handler, and it should also
-	 * prevent excessive stack consumption if signals arrive quickly.
-	 *
-	 * 2. We do not set the SA_RESTART flag.  This is because signals will be
-	 * blocked at all times except when ServerLoop is waiting for something to
-	 * happen, and during that window, we want signals to exit the select(2)
-	 * wait so that ServerLoop can respond if anything interesting happened.
-	 * On some platforms, signals marked SA_RESTART would not cause the
-	 * select() wait to end.
-	 *
-	 * Child processes will generally want SA_RESTART, so pqsignal() sets that
-	 * flag.  We expect children to set up their own handlers before
-	 * unblocking signals.
-	 *
 	 * CAUTION: when changing this list, check for side-effects on the signal
 	 * handling setup of child processes.  See tcop/postgres.c,
 	 * bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/walwriter.c,
@@ -638,26 +629,21 @@ PostmasterMain(int argc, char *argv[])
 	pqinitmask();
 	PG_SETMASK(&BlockSig);
 
-	pqsignal_pm(SIGHUP, SIGHUP_handler);	/* reread config file and have
-											 * children do same */
-	pqsignal_pm(SIGINT, pmdie); /* send SIGTERM and shut down */
-	pqsignal_pm(SIGQUIT, pmdie);	/* send SIGQUIT and die */
-	pqsignal_pm(SIGTERM, pmdie);	/* wait for children and shut down */
-	pqsignal_pm(SIGALRM, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGPIPE, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGUSR1, sigusr1_handler);	/* message from child process */
-	pqsignal_pm(SIGUSR2, dummy_handler);	/* unused, reserve for children */
-	pqsignal_pm(SIGCHLD, reaper);	/* handle child termination */
+	pqsignal(SIGHUP, handle_reload_request_signal);
+	pqsignal(SIGINT, handle_shutdown_request_signal);
+	pqsignal(SIGQUIT, handle_shutdown_request_signal);
+	pqsignal(SIGTERM, handle_shutdown_request_signal);
+	pqsignal(SIGALRM, SIG_IGN);	/* ignored */
+	pqsignal(SIGPIPE, SIG_IGN);	/* ignored */
+	pqsignal(SIGUSR1, handle_action_request_signal);
+	pqsignal(SIGUSR2, dummy_handler);	/* unused, reserve for children */
+	pqsignal(SIGCHLD, handle_child_exit_signal);
 
-#ifdef SIGURG
+	/* This may configure SIGURG, depending on platform. */
+	InitializeLatchSupport();
+	InitProcessLocalLatch();
 
-	/*
-	 * Ignore SIGURG for now.  Child processes may change this (see
-	 * InitializeLatchSupport), but they will not receive any such signals
-	 * until they wait on a latch.
-	 */
-	pqsignal_pm(SIGURG, SIG_IGN);	/* ignored */
-#endif
+	PG_SETMASK(&UnBlockSig);
 
 	/*
 	 * No other place in Postgres should touch SIGTTIN/SIGTTOU handling.  We
@@ -667,15 +653,15 @@ PostmasterMain(int argc, char *argv[])
 	 * child processes should just allow the inherited settings to stand.
 	 */
 #ifdef SIGTTIN
-	pqsignal_pm(SIGTTIN, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTIN, SIG_IGN);	/* ignored */
 #endif
 #ifdef SIGTTOU
-	pqsignal_pm(SIGTTOU, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTOU, SIG_IGN);	/* ignored */
 #endif
 
 	/* ignore SIGXFSZ, so that ulimit violations work like disk full */
 #ifdef SIGXFSZ
-	pqsignal_pm(SIGXFSZ, SIG_IGN);	/* ignored */
+	pqsignal(SIGXFSZ, SIG_IGN);	/* ignored */
 #endif
 
 	/*
@@ -1698,6 +1684,35 @@ DetermineSleepTime(struct timeval *timeout)
 	}
 }
 
+/*
+ * Initialize the WaitEventSet we'll use in our main event loop.
+ */
+static void
+InitializePostmasterWaitSet(void)
+{
+	/* Set up a WaitEventSet for our latch and listening sockets. */
+	wait_set = CreateWaitEventSet(CurrentMemoryContext, 1 + MAXLISTEN);
+	AddWaitEventToSet(wait_set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
+	for (int i = 0; i < MAXLISTEN; i++)
+	{
+		int			fd = ListenSocket[i];
+
+		if (fd == PGINVALID_SOCKET)
+			break;
+		AddWaitEventToSet(wait_set, WL_SOCKET_ACCEPT, fd, NULL, NULL);
+	}
+}
+
+/*
+ * Activate or deactivate the server socket events.
+ */
+static void
+AdjustServerSocketEvents(bool active)
+{
+	for (int pos = 1; pos < GetNumRegisteredWaitEvents(wait_set); ++pos)
+		ModifyWaitEvent(wait_set, pos, active ? WL_SOCKET_ACCEPT : 0, NULL);
+}
+
 /*
  * Main idle loop of postmaster
  *
@@ -1706,97 +1721,62 @@ DetermineSleepTime(struct timeval *timeout)
 static int
 ServerLoop(void)
 {
-	fd_set		readmask;
-	int			nSockets;
 	time_t		last_lockfile_recheck_time,
 				last_touch_time;
+	WaitEvent	events[MAXLISTEN];
+	int			nevents;
 
+	InitializePostmasterWaitSet();
 	last_lockfile_recheck_time = last_touch_time = time(NULL);
 
-	nSockets = initMasks(&readmask);
-
 	for (;;)
 	{
-		fd_set		rmask;
-		int			selres;
 		time_t		now;
+		struct timeval timeout;
 
-		/*
-		 * Wait for a connection request to arrive.
-		 *
-		 * We block all signals except while sleeping. That makes it safe for
-		 * signal handlers, which again block all signals while executing, to
-		 * do nontrivial work.
-		 *
-		 * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
-		 * any new connections, so we don't call select(), and just sleep.
-		 */
-		memcpy((char *) &rmask, (char *) &readmask, sizeof(fd_set));
-
-		if (pmState == PM_WAIT_DEAD_END)
-		{
-			PG_SETMASK(&UnBlockSig);
-
-			pg_usleep(100000L); /* 100 msec seems reasonable */
-			selres = 0;
-
-			PG_SETMASK(&BlockSig);
-		}
-		else
-		{
-			/* must set timeout each time; some OSes change it! */
-			struct timeval timeout;
-
-			/* Needs to run with blocked signals! */
-			DetermineSleepTime(&timeout);
-
-			PG_SETMASK(&UnBlockSig);
-
-			selres = select(nSockets, &rmask, NULL, NULL, &timeout);
-
-			PG_SETMASK(&BlockSig);
-		}
+		DetermineSleepTime(&timeout);
 
-		/* Now check the select() result */
-		if (selres < 0)
-		{
-			if (errno != EINTR && errno != EWOULDBLOCK)
-			{
-				ereport(LOG,
-						(errcode_for_socket_access(),
-						 errmsg("select() failed in postmaster: %m")));
-				return STATUS_ERROR;
-			}
-		}
+		nevents = WaitEventSetWait(wait_set,
+								   timeout.tv_sec * 1000 + timeout.tv_usec / 1000,
+								   events,
+								   lengthof(events),
+								   0 /* postmaster posts no wait_events */);
 
 		/*
-		 * New connection pending on any of our sockets? If so, fork a child
-		 * process to deal with it.
+		 * Latch set by signal handler, or new connection pending on any of our
+		 * sockets? If the latter, fork a child process to deal with it.
 		 */
-		if (selres > 0)
+		for (int i = 0; i < nevents; i++)
 		{
-			int			i;
-
-			for (i = 0; i < MAXLISTEN; i++)
+			if (events[i].events & WL_LATCH_SET)
 			{
-				if (ListenSocket[i] == PGINVALID_SOCKET)
-					break;
-				if (FD_ISSET(ListenSocket[i], &rmask))
+				ResetLatch(MyLatch);
+
+				/* Process work scheduled by signal handlers. */
+				if (pending_shutdown_request)
+					process_shutdown_request();
+				if (pending_child_exit)
+					process_child_exit();
+				if (pending_reload_request)
+					process_reload_request();
+				if (pending_action_request)
+					process_action_request();
+			}
+			else if (events[i].events & WL_SOCKET_ACCEPT)
+			{
+				Port	   *port;
+
+				port = ConnCreate(events[i].fd);
+				if (port)
 				{
-					Port	   *port;
+					BackendStartup(port);
 
-					port = ConnCreate(ListenSocket[i]);
-					if (port)
-					{
-						BackendStartup(port);
-
-						/*
-						 * We no longer need the open socket or port structure
-						 * in this process
-						 */
-						StreamClose(port->sock);
-						ConnFree(port);
-					}
+					/*
+					 * We no longer need the open socket or port structure
+					 * in this process
+					 */
+					StreamClose(port->sock);
+					ConnFree(port);
 				}
 			}
 		}
@@ -1939,34 +1919,6 @@ ServerLoop(void)
 	}
 }
 
-/*
- * Initialise the masks for select() for the ports we are listening on.
- * Return the number of sockets to listen on.
- */
-static int
-initMasks(fd_set *rmask)
-{
-	int			maxsock = -1;
-	int			i;
-
-	FD_ZERO(rmask);
-
-	for (i = 0; i < MAXLISTEN; i++)
-	{
-		int			fd = ListenSocket[i];
-
-		if (fd == PGINVALID_SOCKET)
-			break;
-		FD_SET(fd, rmask);
-
-		if (fd > maxsock)
-			maxsock = fd;
-	}
-
-	return maxsock + 1;
-}
-
-
 /*
  * Read a client's startup packet and do something according to it.
  *
@@ -2609,6 +2561,10 @@ ClosePostmasterPorts(bool am_syslogger)
 {
 	int			i;
 
+	/* Release resources held by the postmaster's WaitEventSet. */
+	if (wait_set)
+		FreeWaitEventSetAfterFork(wait_set);
+
 #ifndef WIN32
 
 	/*
@@ -2707,14 +2663,45 @@ InitProcessGlobals(void)
 #endif
 }
 
+/*
+ * Child processes use SIGUSR1 for pmsignals.  pg_ctl uses SIGUSR1 to ask
+ * postmaster to check for logrotate and promote files.
+ */
+static void
+handle_action_request_signal(SIGNAL_ARGS)
+{
+	int save_errno = errno;
+
+	pending_action_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
 
 /*
- * SIGHUP -- reread config files, and tell children to do same
+ * pg_ctl uses SIGHUP to request a reload of the configuration files.
  */
 static void
-SIGHUP_handler(SIGNAL_ARGS)
+handle_reload_request_signal(SIGNAL_ARGS)
 {
-	int			save_errno = errno;
+	int save_errno = errno;
+
+	pending_reload_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Re-read config files, and tell children to do same.
+ */
+static void
+process_reload_request(void)
+{
+	pending_reload_request = false;
+
+	ereport(DEBUG2,
+			(errmsg_internal("postmaster received reload request signal")));
 
 	if (Shutdown <= SmartShutdown)
 	{
@@ -2771,27 +2758,50 @@ SIGHUP_handler(SIGNAL_ARGS)
 		write_nondefault_variables(PGC_SIGHUP);
 #endif
 	}
+}
+
+/*
+ * pg_ctl uses SIGTERM, SIGINT and SIGQUIT to request different types of
+ * shutdown.
+ */
+static void
+handle_shutdown_request_signal(SIGNAL_ARGS)
+{
+	int save_errno = errno;
+
+	switch (postgres_signal_arg)
+	{
+		case SIGTERM:
+			pending_shutdown_request = SmartShutdown;
+			break;
+		case SIGINT:
+			pending_shutdown_request = FastShutdown;
+			break;
+		case SIGQUIT:
+			pending_shutdown_request = ImmediateShutdown;
+			break;
+	}
+	SetLatch(MyLatch);
 
 	errno = save_errno;
 }
 
-
 /*
- * pmdie -- signal handler for processing various postmaster signals.
+ * Process shutdown request.
  */
 static void
-pmdie(SIGNAL_ARGS)
+process_shutdown_request(void)
 {
-	int			save_errno = errno;
+	int		mode = pending_shutdown_request;
 
 	ereport(DEBUG2,
-			(errmsg_internal("postmaster received signal %d",
-							 postgres_signal_arg)));
+			(errmsg_internal("postmaster received shutdown request signal")));
 
-	switch (postgres_signal_arg)
-	{
-		case SIGTERM:
+	pending_shutdown_request = NoShutdown;
 
+	switch (mode)
+	{
+		case SmartShutdown:
 			/*
 			 * Smart Shutdown:
 			 *
@@ -2830,7 +2840,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGINT:
+		case FastShutdown:
 
 			/*
 			 * Fast Shutdown:
@@ -2871,7 +2881,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGQUIT:
+		case ImmediateShutdown:
 
 			/*
 			 * Immediate Shutdown:
@@ -2908,20 +2918,30 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 	}
+}
+
+static void
+handle_child_exit_signal(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	pending_child_exit = true;
+	SetLatch(MyLatch);
 
 	errno = save_errno;
 }
 
 /*
- * Reaper -- signal handler to cleanup after a child process dies.
+ * Cleanup after a child process dies.
  */
 static void
-reaper(SIGNAL_ARGS)
+process_child_exit(void)
 {
-	int			save_errno = errno;
 	int			pid;			/* process id of dead child process */
 	int			exitstatus;		/* its exit status */
 
+	pending_child_exit = false;
+
 	ereport(DEBUG4,
 			(errmsg_internal("reaping dead processes")));
 
@@ -3213,8 +3233,6 @@ reaper(SIGNAL_ARGS)
 	 * or actions to make.
 	 */
 	PostmasterStateMachine();
-
-	errno = save_errno;
 }
 
 /*
@@ -3642,8 +3660,9 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 /*
  * Advance the postmaster's state machine and take actions as appropriate
  *
- * This is common code for pmdie(), reaper() and sigusr1_handler(), which
- * receive the signals that might mean we need to change state.
+ * This is common code for process_shutdown_request(), process_child_exit() and
+ * process_action_request(), which process the signals that might mean we need
+ * to change state.
  */
 static void
 PostmasterStateMachine(void)
@@ -3796,6 +3815,9 @@ PostmasterStateMachine(void)
 
 	if (pmState == PM_WAIT_DEAD_END)
 	{
+		/* Don't allow any new socket connection events. */
+		AdjustServerSocketEvents(false);
+
 		/*
 		 * PM_WAIT_DEAD_END state ends when the BackendList is entirely empty
 		 * (ie, no dead_end children remain), and the archiver is gone too.
@@ -3905,6 +3927,9 @@ PostmasterStateMachine(void)
 		pmState = PM_STARTUP;
 		/* crash recovery started, reset SIGKILL flag */
 		AbortStartTime = 0;
+
+		/* start accepting server socket connection events again */
+		AdjustServerSocketEvents(true);
 	}
 }
 
@@ -5013,12 +5038,16 @@ ExitPostmaster(int status)
 }
 
 /*
- * sigusr1_handler - handle signal conditions from child processes
+ * Handle pmsignal conditions representing requests from backends,
+ * and check for promote and logrotate requests from pg_ctl.
  */
 static void
-sigusr1_handler(SIGNAL_ARGS)
+process_action_request(void)
 {
-	int			save_errno = errno;
+	pending_action_request = false;
+
+	ereport(DEBUG2,
+			(errmsg_internal("postmaster received action request signal")));
 
 	/*
 	 * RECOVERY_STARTED and BEGIN_HOT_STANDBY signals are ignored in
@@ -5159,8 +5188,6 @@ sigusr1_handler(SIGNAL_ARGS)
 		 */
 		signal_child(StartupPID, SIGUSR2);
 	}
-
-	errno = save_errno;
 }
 
 /*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3082093d1e..655e881688 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -24,7 +24,6 @@
 #include <signal.h>
 #include <unistd.h>
 #include <sys/resource.h>
-#include <sys/select.h>
 #include <sys/socket.h>
 #include <sys/time.h>
 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index eb1046450b..1a8885b73e 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -135,8 +135,7 @@ InitPostmasterChild(void)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitProcessLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -189,8 +188,7 @@ InitStandaloneProcess(const char *argv0)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitProcessLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -232,6 +230,13 @@ SwitchToSharedLatch(void)
 	SetLatch(MyLatch);
 }
 
+void
+InitProcessLocalLatch(void)
+{
+	MyLatch = &LocalLatchData;
+	InitLatch(MyLatch);
+}
+
 void
 SwitchBackToLocalLatch(void)
 {
diff --git a/src/include/libpq/pqsignal.h b/src/include/libpq/pqsignal.h
index 7890b426a8..76eb380a4f 100644
--- a/src/include/libpq/pqsignal.h
+++ b/src/include/libpq/pqsignal.h
@@ -53,7 +53,4 @@ extern PGDLLIMPORT sigset_t StartupBlockSig;
 
 extern void pqinitmask(void);
 
-/* pqsigfunc is declared in src/include/port.h */
-extern pqsigfunc pqsignal_pm(int signo, pqsigfunc func);
-
 #endif							/* PQSIGNAL_H */
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 795182fa51..f64f81cf00 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -310,6 +310,7 @@ extern PGDLLIMPORT char *DatabasePath;
 /* now in utils/init/miscinit.c */
 extern void InitPostmasterChild(void);
 extern void InitStandaloneProcess(const char *argv0);
+extern void InitProcessLocalLatch(void);
 extern void SwitchToSharedLatch(void);
 extern void SwitchBackToLocalLatch(void);
 
-- 
2.30.2

v4-0001-Add-WL_SOCKET_ACCEPT-event-to-WaitEventSet-API.patch
From 3da9af2a9250ef052ba25be434f5bc01d4e36520 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 6 Dec 2022 15:21:11 +1300
Subject: [PATCH v4 1/5] Add WL_SOCKET_ACCEPT event to WaitEventSet API.

To be able to handle incoming connections on a server socket with
the WaitEventSet API, we'll need a new kind of event to indicate that
the socket is ready to accept a connection.

On Unix, it's just the same as WL_SOCKET_READABLE, but on Windows there
is a different kernel event that we need to map our abstraction to.

A future commit will use this.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c | 13 ++++++++++++-
 src/include/storage/latch.h     |  7 +++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index eb3a569aae..7ced8264f0 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -864,6 +864,9 @@ FreeWaitEventSet(WaitEventSet *set)
  * - WL_SOCKET_CONNECTED: Wait for socket connection to be established,
  *	 can be combined with other WL_SOCKET_* events (on non-Windows
  *	 platforms, this is the same as WL_SOCKET_WRITEABLE)
+ * - WL_SOCKET_ACCEPT: Wait for new connection to a server socket,
+ *	 can be combined with other WL_SOCKET_* events (on non-Windows
+ *	 platforms, this is the same as WL_SOCKET_READABLE)
  * - WL_SOCKET_CLOSED: Wait for socket to be closed by remote peer.
  * - WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies
  *
@@ -874,7 +877,7 @@ FreeWaitEventSet(WaitEventSet *set)
  * i.e. it must be a process-local latch initialized with InitLatch, or a
  * shared latch associated with the current process by calling OwnLatch.
  *
- * In the WL_SOCKET_READABLE/WRITEABLE/CONNECTED cases, EOF and error
+ * In the WL_SOCKET_READABLE/WRITEABLE/CONNECTED/ACCEPT cases, EOF and error
  * conditions cause the socket to be reported as readable/writable/connected,
  * so that the caller can deal with the condition.
  *
@@ -1312,6 +1315,8 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event)
 			flags |= FD_WRITE;
 		if (event->events & WL_SOCKET_CONNECTED)
 			flags |= FD_CONNECT;
+		if (event->events & WL_SOCKET_ACCEPT)
+			flags |= FD_ACCEPT;
 
 		if (*handle == WSA_INVALID_EVENT)
 		{
@@ -2067,6 +2072,12 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 			/* connected */
 			occurred_events->events |= WL_SOCKET_CONNECTED;
 		}
+		if ((cur_event->events & WL_SOCKET_ACCEPT) &&
+			(resEvents.lNetworkEvents & FD_ACCEPT))
+		{
+			/* incoming connection could be accepted */
+			occurred_events->events |= WL_SOCKET_ACCEPT;
+		}
 		if (resEvents.lNetworkEvents & FD_CLOSE)
 		{
 			/* EOF/error, so signal all caller-requested socket flags */
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 68ab740f16..c55838db60 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -135,9 +135,16 @@ typedef struct Latch
 #define WL_SOCKET_CONNECTED  WL_SOCKET_WRITEABLE
 #endif
 #define WL_SOCKET_CLOSED 	 (1 << 7)
+#ifdef WIN32
+#define WL_SOCKET_ACCEPT	 (1 << 8)
+#else
+/* avoid having to deal with case on platforms not requiring it */
+#define WL_SOCKET_ACCEPT	WL_SOCKET_READABLE
+#endif
 #define WL_SOCKET_MASK		(WL_SOCKET_READABLE | \
 							 WL_SOCKET_WRITEABLE | \
 							 WL_SOCKET_CONNECTED | \
+							 WL_SOCKET_ACCEPT | \
 							 WL_SOCKET_CLOSED)
 
 typedef struct WaitEvent
-- 
2.30.2

v4-0002-Don-t-leak-a-signalfd-when-using-latches-in-the-p.patch
From 61480441f67ca7fac96ca4bcfe85f27013a47aa8 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 6 Dec 2022 16:13:36 +1300
Subject: [PATCH v4 2/5] Don't leak a signalfd when using latches in the
 postmaster.

At the time of commit 6a2a70a02 we didn't use latch infrastructure in
the postmaster.  We're planning to start doing that, so we'd better make
sure that the signalfd inherited from a postmaster is not duplicated and
then leaked in the child.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 7ced8264f0..c4b9153690 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -283,6 +283,22 @@ InitializeLatchSupport(void)
 #ifdef WAIT_USE_SIGNALFD
 	sigset_t	signalfd_mask;
 
+	if (IsUnderPostmaster)
+	{
+		/*
+		 * It would probably be safe to re-use the inherited signalfd since
+		 * signalfds only see the current processes pending signals, but it
+		 * seems less surprising to close it and create our own.
+		 */
+		if (signal_fd != -1)
+		{
+			/* Release postmaster's signal FD; ignore any error */
+			(void) close(signal_fd);
+			signal_fd = -1;
+			ReleaseExternalFD();
+		}
+	}
+
 	/* Block SIGURG, because we'll receive it through a signalfd. */
 	sigaddset(&UnBlockSig, SIGURG);
 
-- 
2.30.2

v4-0003-Allow-parent-s-WaitEventSets-to-be-freed-after-fo.patch
From e3a0156228684520c17f3aaa8b1ef60c5f15b350 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 6 Dec 2022 16:24:05 +1300
Subject: [PATCH v4 3/5] Allow parent's WaitEventSets to be freed after fork().

An epoll fd belonging to the parent should be closed in the child.  A
kqueue fd is automatically closed, but we should adjust our counter.
For poll and Windows systems, nothing special is required.  On all
systems we free the memory.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c | 17 +++++++++++++++++
 src/include/storage/latch.h     |  1 +
 2 files changed, 18 insertions(+)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index c4b9153690..51c239eefa 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -869,6 +869,23 @@ FreeWaitEventSet(WaitEventSet *set)
 	pfree(set);
 }
 
+/*
+ * Free a previously created WaitEventSet in a child process after a fork().
+ */
+void
+FreeWaitEventSetAfterFork(WaitEventSet *set)
+{
+#if defined(WAIT_USE_EPOLL)
+	close(set->epoll_fd);
+	ReleaseExternalFD();
+#elif defined(WAIT_USE_KQUEUE)
+	/* kqueues are not normally inherited by child processes */
+	ReleaseExternalFD();
+#endif
+
+	pfree(set);
+}
+
 /* ---
  * Add an event to the set. Possible events are:
  * - WL_LATCH_SET: Wait for the latch to be set
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index c55838db60..63a1fc440c 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -175,6 +175,7 @@ extern void ShutdownLatchSupport(void);
 
 extern WaitEventSet *CreateWaitEventSet(MemoryContext context, int nevents);
 extern void FreeWaitEventSet(WaitEventSet *set);
+extern void FreeWaitEventSetAfterFork(WaitEventSet *set);
 extern int	AddWaitEventToSet(WaitEventSet *set, uint32 events, pgsocket fd,
 							  Latch *latch, void *user_data);
 extern void ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch);
-- 
2.30.2

v4-0004-Allow-socket-WaitEvents-to-be-temporarily-blocked.patch
From 7ceb93c7602ffb80ae32bbd87705dc638640c38a Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 6 Dec 2022 16:40:48 +1300
Subject: [PATCH v4 4/5] Allow socket WaitEvents to be temporarily blocked.

Allow ModifyWaitEvent(..., ..., 0, ...) as a way to suppress events from
a socket that we are not interested in.  This also blocks event errors
from being reported.  Another call to ModifyWaitEvent() can be used to
reenable events.

For kqueue, no change other than removing an assertion, because the code
already deletes all events for 0 (that falls out of having separate
events for read and write).

For epoll, teach ModifyWaitEvent() calls to delete and re-add to
transition between non-zero and zero event masks.

For poll, use fd -1 (allowed by POSIX for an empty entry) to make sure
we don't get any events.

For Windows, suppress the request for FD_CLOSE, which conveys errors.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c | 36 ++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 51c239eefa..1d6fe69ef7 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1008,14 +1008,14 @@ void
 ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch)
 {
 	WaitEvent  *event;
-#if defined(WAIT_USE_KQUEUE)
+#if defined(WAIT_USE_KQUEUE) || defined(WAIT_USE_EPOLL)
 	int			old_events;
 #endif
 
 	Assert(pos < set->nevents);
 
 	event = &set->events[pos];
-#if defined(WAIT_USE_KQUEUE)
+#if defined(WAIT_USE_KQUEUE) || defined(WAIT_USE_EPOLL)
 	old_events = event->events;
 #endif
 
@@ -1065,7 +1065,12 @@ ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch)
 	}
 
 #if defined(WAIT_USE_EPOLL)
-	WaitEventAdjustEpoll(set, event, EPOLL_CTL_MOD);
+	if (events == 0 && old_events != 0)
+		WaitEventAdjustEpoll(set, event, EPOLL_CTL_DEL);
+	else if (events != 0 && old_events == 0)
+		WaitEventAdjustEpoll(set, event, EPOLL_CTL_ADD);
+	else
+		WaitEventAdjustEpoll(set, event, EPOLL_CTL_MOD);
 #elif defined(WAIT_USE_KQUEUE)
 	WaitEventAdjustKqueue(set, event, old_events);
 #elif defined(WAIT_USE_POLL)
@@ -1103,9 +1108,6 @@ WaitEventAdjustEpoll(WaitEventSet *set, WaitEvent *event, int action)
 	else
 	{
 		Assert(event->fd != PGINVALID_SOCKET);
-		Assert(event->events & (WL_SOCKET_READABLE |
-								WL_SOCKET_WRITEABLE |
-								WL_SOCKET_CLOSED));
 
 		if (event->events & WL_SOCKET_READABLE)
 			epoll_ev.events |= EPOLLIN;
@@ -1149,6 +1151,14 @@ WaitEventAdjustPoll(WaitEventSet *set, WaitEvent *event)
 	{
 		pollfd->events = POLLIN;
 	}
+	else if (event->events == 0)
+	{
+		/*
+		 * If we're suppressing all socket events, remove the file descriptor
+		 * so poll() ignores this entry.
+		 */
+		pollfd->fd = -1;
+	}
 	else
 	{
 		Assert(event->events & (WL_SOCKET_READABLE |
@@ -1233,11 +1243,6 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events)
 		return;
 
 	Assert(event->events != WL_LATCH_SET || set->latch != NULL);
-	Assert(event->events == WL_LATCH_SET ||
-		   event->events == WL_POSTMASTER_DEATH ||
-		   (event->events & (WL_SOCKET_READABLE |
-							 WL_SOCKET_WRITEABLE |
-							 WL_SOCKET_CLOSED)));
 
 	if (event->events == WL_POSTMASTER_DEATH)
 	{
@@ -1340,7 +1345,14 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event)
 	}
 	else
 	{
-		int			flags = FD_CLOSE;	/* always check for errors/EOF */
+		int			flags = 0;
+
+		/*
+		 * Check for errors/EOF unless we're completely suppressing all events
+		 * for this socket.
+		 */
+		if (event->events != 0)
+			flags = FD_CLOSE;
 
 		if (event->events & WL_SOCKET_READABLE)
 			flags |= FD_READ;
-- 
2.30.2

#8Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#7)
Re: Using WaitEventSet in the postmaster

On 2022-12-07 00:58:06 +1300, Thomas Munro wrote:

> One way to fix that for the epoll version is to EPOLL_CTL_DEL and
> EPOLL_CTL_ADD, whenever transitioning to/from a zero event mask.
> Tried like that in this version. Another approach would be to
> (finally) write DeleteWaitEvent() to do the same thing at a higher
> level... seems overkill for this.

What about just recreating the WES during crash restart?

>> This seems to hardcode the specific wait events we're waiting for based on
>> latch.c infrastructure. Not really convinced that's a good idea.

> What are you objecting to? The assumption that the first socket is at
> position 1? The use of GetNumRegisteredWaitEvents()?

The latter.

#9Justin Pryzby
pryzby@telsasoft.com
In reply to: Thomas Munro (#7)
Re: Using WaitEventSet in the postmaster

> From 61480441f67ca7fac96ca4bcfe85f27013a47aa8 Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas.munro@gmail.com>
> Date: Tue, 6 Dec 2022 16:13:36 +1300
> Subject: [PATCH v4 2/5] Don't leak a signalfd when using latches in the
> postmaster.
>
> +		/*
> +		 * It would probably be safe to re-use the inherited signalfd since
> +		 * signalfds only see the current processes pending signals, but it

I think you mean "current process's", right ?

#10Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#8)
4 attachment(s)
Re: Using WaitEventSet in the postmaster

On Wed, Dec 7, 2022 at 12:12 PM Andres Freund <andres@anarazel.de> wrote:

> On 2022-12-07 00:58:06 +1300, Thomas Munro wrote:
>
>> One way to fix that for the epoll version is to EPOLL_CTL_DEL and
>> EPOLL_CTL_ADD, whenever transitioning to/from a zero event mask.
>> Tried like that in this version. Another approach would be to
>> (finally) write DeleteWaitEvent() to do the same thing at a higher
>> level... seems overkill for this.
>
> What about just recreating the WES during crash restart?

It seems a bit like cheating but yeah that's a super simple solution,
and removes one patch from the stack. Done like that in this version.

>>> This seems to hardcode the specific wait events we're waiting for based on
>>> latch.c infrastructure. Not really convinced that's a good idea.
>>
>> What are you objecting to? The assumption that the first socket is at
>> position 1? The use of GetNumRegisteredWaitEvents()?
>
> The latter.

Removed.

Attachments:

v5-0001-Add-WL_SOCKET_ACCEPT-event-to-WaitEventSet-API.patch
From 07b04dc410118ad04fd0006edda7ba80f241357a Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 6 Dec 2022 15:21:11 +1300
Subject: [PATCH v5 1/4] Add WL_SOCKET_ACCEPT event to WaitEventSet API.

To be able to handle incoming connections on a server socket with
the WaitEventSet API, we'll need a new kind of event to indicate that
the socket is ready to accept a connection.

On Unix, it's just the same as WL_SOCKET_READABLE, but on Windows there
is a different kernel event that we need to map our abstraction to.

A future commit will use this.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c | 13 ++++++++++++-
 src/include/storage/latch.h     |  7 +++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index eb3a569aae..7ced8264f0 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -864,6 +864,9 @@ FreeWaitEventSet(WaitEventSet *set)
  * - WL_SOCKET_CONNECTED: Wait for socket connection to be established,
  *	 can be combined with other WL_SOCKET_* events (on non-Windows
  *	 platforms, this is the same as WL_SOCKET_WRITEABLE)
+ * - WL_SOCKET_ACCEPT: Wait for new connection to a server socket,
+ *	 can be combined with other WL_SOCKET_* events (on non-Windows
+ *	 platforms, this is the same as WL_SOCKET_READABLE)
  * - WL_SOCKET_CLOSED: Wait for socket to be closed by remote peer.
  * - WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies
  *
@@ -874,7 +877,7 @@ FreeWaitEventSet(WaitEventSet *set)
  * i.e. it must be a process-local latch initialized with InitLatch, or a
  * shared latch associated with the current process by calling OwnLatch.
  *
- * In the WL_SOCKET_READABLE/WRITEABLE/CONNECTED cases, EOF and error
+ * In the WL_SOCKET_READABLE/WRITEABLE/CONNECTED/ACCEPT cases, EOF and error
  * conditions cause the socket to be reported as readable/writable/connected,
  * so that the caller can deal with the condition.
  *
@@ -1312,6 +1315,8 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event)
 			flags |= FD_WRITE;
 		if (event->events & WL_SOCKET_CONNECTED)
 			flags |= FD_CONNECT;
+		if (event->events & WL_SOCKET_ACCEPT)
+			flags |= FD_ACCEPT;
 
 		if (*handle == WSA_INVALID_EVENT)
 		{
@@ -2067,6 +2072,12 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 			/* connected */
 			occurred_events->events |= WL_SOCKET_CONNECTED;
 		}
+		if ((cur_event->events & WL_SOCKET_ACCEPT) &&
+			(resEvents.lNetworkEvents & FD_ACCEPT))
+		{
+			/* incoming connection could be accepted */
+			occurred_events->events |= WL_SOCKET_ACCEPT;
+		}
 		if (resEvents.lNetworkEvents & FD_CLOSE)
 		{
 			/* EOF/error, so signal all caller-requested socket flags */
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 68ab740f16..c55838db60 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -135,9 +135,16 @@ typedef struct Latch
 #define WL_SOCKET_CONNECTED  WL_SOCKET_WRITEABLE
 #endif
 #define WL_SOCKET_CLOSED 	 (1 << 7)
+#ifdef WIN32
+#define WL_SOCKET_ACCEPT	 (1 << 8)
+#else
+/* avoid having to deal with case on platforms not requiring it */
+#define WL_SOCKET_ACCEPT	WL_SOCKET_READABLE
+#endif
 #define WL_SOCKET_MASK		(WL_SOCKET_READABLE | \
 							 WL_SOCKET_WRITEABLE | \
 							 WL_SOCKET_CONNECTED | \
+							 WL_SOCKET_ACCEPT | \
 							 WL_SOCKET_CLOSED)
 
 typedef struct WaitEvent
-- 
2.35.1

v5-0002-Don-t-leak-a-signalfd-when-using-latches-in-the-p.patch
From 827866959dbbe537f6677271093f6d7730bd2527 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 6 Dec 2022 16:13:36 +1300
Subject: [PATCH v5 2/4] Don't leak a signalfd when using latches in the
 postmaster.

At the time of commit 6a2a70a02 we didn't use latch infrastructure in
the postmaster.  We're planning to start doing that, so we'd better make
sure that the signalfd inherited from a postmaster is not duplicated and
then leaked in the child.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 7ced8264f0..b32c96b63d 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -283,6 +283,22 @@ InitializeLatchSupport(void)
 #ifdef WAIT_USE_SIGNALFD
 	sigset_t	signalfd_mask;
 
+	if (IsUnderPostmaster)
+	{
+		/*
+		 * It would probably be safe to re-use the inherited signalfd since
+		 * signalfds only see the current process's pending signals, but it
+		 * seems less surprising to close it and create our own.
+		 */
+		if (signal_fd != -1)
+		{
+			/* Release postmaster's signal FD; ignore any error */
+			(void) close(signal_fd);
+			signal_fd = -1;
+			ReleaseExternalFD();
+		}
+	}
+
 	/* Block SIGURG, because we'll receive it through a signalfd. */
 	sigaddset(&UnBlockSig, SIGURG);
 
-- 
2.35.1

v5-0003-Allow-parent-s-WaitEventSets-to-be-freed-after-fo.patch
From 6cdba2a3e68b23e4bec06e9db3feffdf64cd80cb Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 6 Dec 2022 16:24:05 +1300
Subject: [PATCH v5 3/4] Allow parent's WaitEventSets to be freed after fork().

An epoll fd belonging to the parent should be closed in the child.  A
kqueue fd is automatically closed, but we should adjust our counter.
For poll and Windows systems, nothing special is required.  On all
systems we free the memory.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c | 17 +++++++++++++++++
 src/include/storage/latch.h     |  1 +
 2 files changed, 18 insertions(+)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index b32c96b63d..de4fbcdfb9 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -869,6 +869,23 @@ FreeWaitEventSet(WaitEventSet *set)
 	pfree(set);
 }
 
+/*
+ * Free a previously created WaitEventSet in a child process after a fork().
+ */
+void
+FreeWaitEventSetAfterFork(WaitEventSet *set)
+{
+#if defined(WAIT_USE_EPOLL)
+	close(set->epoll_fd);
+	ReleaseExternalFD();
+#elif defined(WAIT_USE_KQUEUE)
+	/* kqueues are not normally inherited by child processes */
+	ReleaseExternalFD();
+#endif
+
+	pfree(set);
+}
+
 /* ---
  * Add an event to the set. Possible events are:
  * - WL_LATCH_SET: Wait for the latch to be set
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index c55838db60..63a1fc440c 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -175,6 +175,7 @@ extern void ShutdownLatchSupport(void);
 
 extern WaitEventSet *CreateWaitEventSet(MemoryContext context, int nevents);
 extern void FreeWaitEventSet(WaitEventSet *set);
+extern void FreeWaitEventSetAfterFork(WaitEventSet *set);
 extern int	AddWaitEventToSet(WaitEventSet *set, uint32 events, pgsocket fd,
 							  Latch *latch, void *user_data);
 extern void ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch);
-- 
2.35.1

v5-0004-Give-the-postmaster-a-WaitEventSet-and-a-latch.patch
From 7f7bb05d8e85b15d6113aa68e330caaa9f882718 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 9 Nov 2022 22:59:58 +1300
Subject: [PATCH v5 4/4] Give the postmaster a WaitEventSet and a latch.

Traditionally, the postmaster's architecture was quite unusual.  It did
a lot of work inside signal handlers, which were only unblocked while
waiting in select() to make that safe.

Switch to a more typical architecture, where signal handlers just set
flags and use a latch to close races.  Now the postmaster looks like
all other PostgreSQL processes, multiplexing its event processing in
epoll_wait()/kevent()/poll()/WaitForMultipleObjects() depending on the
OS.

Changes:

 * Allow the postmaster to set up its own local latch.  For now we don't
   want other backends setting the postmaster's latch directly (that
   would require latches robust against arbitrary corruption of shared
   memory).

 * The existing signal handlers are cut in two: a handle_XXX part that
   sets a pending_XXX variable and sets the local latch, and a
   process_XXX part.

 * Signal handlers are now installed with the regular pqsignal()
   function rather than the special pqsignal_pm() function; the concerns
   about the portability of SA_RESTART vs select() are no longer
   relevant: SUSv2 left it implementation-defined whether select()
   restarts, but didn't add that qualification for poll(), and it doesn't
   matter anyway because we call SetLatch(), creating a new reason to wake
   up.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/libpq/pqsignal.c          |  40 ---
 src/backend/postmaster/fork_process.c |  12 +-
 src/backend/postmaster/postmaster.c   | 379 ++++++++++++++------------
 src/backend/tcop/postgres.c           |   1 -
 src/backend/utils/init/miscinit.c     |  13 +-
 src/include/libpq/pqsignal.h          |   3 -
 src/include/miscadmin.h               |   1 +
 7 files changed, 225 insertions(+), 224 deletions(-)

diff --git a/src/backend/libpq/pqsignal.c b/src/backend/libpq/pqsignal.c
index 1ab34c5214..718043a39d 100644
--- a/src/backend/libpq/pqsignal.c
+++ b/src/backend/libpq/pqsignal.c
@@ -97,43 +97,3 @@ pqinitmask(void)
 	sigdelset(&StartupBlockSig, SIGALRM);
 #endif
 }
-
-/*
- * Set up a postmaster signal handler for signal "signo"
- *
- * Returns the previous handler.
- *
- * This is used only in the postmaster, which has its own odd approach to
- * signal handling.  For signals with handlers, we block all signals for the
- * duration of signal handler execution.  We also do not set the SA_RESTART
- * flag; this should be safe given the tiny range of code in which the
- * postmaster ever unblocks signals.
- *
- * pqinitmask() must have been invoked previously.
- */
-pqsigfunc
-pqsignal_pm(int signo, pqsigfunc func)
-{
-	struct sigaction act,
-				oact;
-
-	act.sa_handler = func;
-	if (func == SIG_IGN || func == SIG_DFL)
-	{
-		/* in these cases, act the same as pqsignal() */
-		sigemptyset(&act.sa_mask);
-		act.sa_flags = SA_RESTART;
-	}
-	else
-	{
-		act.sa_mask = BlockSig;
-		act.sa_flags = 0;
-	}
-#ifdef SA_NOCLDSTOP
-	if (signo == SIGCHLD)
-		act.sa_flags |= SA_NOCLDSTOP;
-#endif
-	if (sigaction(signo, &act, &oact) < 0)
-		return SIG_ERR;
-	return oact.sa_handler;
-}
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index ec67761487..e1e7d91c52 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -12,24 +12,28 @@
 #include "postgres.h"
 
 #include <fcntl.h>
+#include <signal.h>
 #include <time.h>
 #include <sys/stat.h>
 #include <sys/time.h>
 #include <unistd.h>
 
+#include "libpq/pqsignal.h"
 #include "postmaster/fork_process.h"
 
 #ifndef WIN32
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
- * child in the parent process.
+ * child in the parent process.  Signals are blocked while forking, so
+ * the child must unblock.
  */
 pid_t
 fork_process(void)
 {
 	pid_t		result;
 	const char *oomfilename;
+	sigset_t	save_mask;
 
 #ifdef LINUX_PROFILE
 	struct itimerval prof_itimer;
@@ -51,6 +55,7 @@ fork_process(void)
 	getitimer(ITIMER_PROF, &prof_itimer);
 #endif
 
+	sigprocmask(SIG_SETMASK, &BlockSig, &save_mask);
 	result = fork();
 	if (result == 0)
 	{
@@ -103,6 +108,11 @@ fork_process(void)
 		/* do post-fork initialization for random number generation */
 		pg_strong_random_init();
 	}
+	else
+	{
+		/* in parent, restore signal mask */
+		sigprocmask(SIG_SETMASK, &save_mask, NULL);
+	}
 
 	return result;
 }
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a8a246921f..f7e5972114 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -70,7 +70,6 @@
 #include <time.h>
 #include <sys/wait.h>
 #include <ctype.h>
-#include <sys/select.h>
 #include <sys/stat.h>
 #include <sys/socket.h>
 #include <fcntl.h>
@@ -362,6 +361,15 @@ static volatile sig_atomic_t WalReceiverRequested = false;
 static volatile bool StartWorkerNeeded = true;
 static volatile bool HaveCrashedWorker = false;
 
+/* set when signals arrive */
+static volatile sig_atomic_t pending_action_request;
+static volatile sig_atomic_t pending_child_exit;
+static volatile sig_atomic_t pending_reload_request;
+static volatile sig_atomic_t pending_shutdown_request;
+
+/* I/O multiplexing event */
+static WaitEventSet *wait_set;
+
 #ifdef USE_SSL
 /* Set when and if SSL has been initialized properly */
 static bool LoadedSSL = false;
@@ -380,10 +388,14 @@ static void getInstallationPaths(const char *argv0);
 static void checkControlFile(void);
 static Port *ConnCreate(int serverFd);
 static void ConnFree(Port *port);
-static void SIGHUP_handler(SIGNAL_ARGS);
-static void pmdie(SIGNAL_ARGS);
-static void reaper(SIGNAL_ARGS);
-static void sigusr1_handler(SIGNAL_ARGS);
+static void handle_action_request_signal(SIGNAL_ARGS);
+static void handle_child_exit_signal(SIGNAL_ARGS);
+static void handle_reload_request_signal(SIGNAL_ARGS);
+static void handle_shutdown_request_signal(SIGNAL_ARGS);
+static void process_action_request(void);
+static void process_child_exit(void);
+static void process_reload_request(void);
+static void process_shutdown_request(void);
 static void process_startup_packet_die(SIGNAL_ARGS);
 static void dummy_handler(SIGNAL_ARGS);
 static void StartupPacketTimeoutHandler(void);
@@ -401,7 +413,6 @@ static int	BackendStartup(Port *port);
 static int	ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done);
 static void SendNegotiateProtocolVersion(List *unrecognized_protocol_options);
 static void processCancelRequest(Port *port, void *pkt);
-static int	initMasks(fd_set *rmask);
 static void report_fork_failure_to_client(Port *port, int errnum);
 static CAC_state canAcceptConnections(int backend_type);
 static bool RandomCancelKey(int32 *cancel_key);
@@ -609,26 +620,6 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * Set up signal handlers for the postmaster process.
 	 *
-	 * In the postmaster, we use pqsignal_pm() rather than pqsignal() (which
-	 * is used by all child processes and client processes).  That has a
-	 * couple of special behaviors:
-	 *
-	 * 1. We tell sigaction() to block all signals for the duration of the
-	 * signal handler.  This is faster than our old approach of
-	 * blocking/unblocking explicitly in the signal handler, and it should also
-	 * prevent excessive stack consumption if signals arrive quickly.
-	 *
-	 * 2. We do not set the SA_RESTART flag.  This is because signals will be
-	 * blocked at all times except when ServerLoop is waiting for something to
-	 * happen, and during that window, we want signals to exit the select(2)
-	 * wait so that ServerLoop can respond if anything interesting happened.
-	 * On some platforms, signals marked SA_RESTART would not cause the
-	 * select() wait to end.
-	 *
-	 * Child processes will generally want SA_RESTART, so pqsignal() sets that
-	 * flag.  We expect children to set up their own handlers before
-	 * unblocking signals.
-	 *
 	 * CAUTION: when changing this list, check for side-effects on the signal
 	 * handling setup of child processes.  See tcop/postgres.c,
 	 * bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/walwriter.c,
@@ -638,26 +629,21 @@ PostmasterMain(int argc, char *argv[])
 	pqinitmask();
 	PG_SETMASK(&BlockSig);
 
-	pqsignal_pm(SIGHUP, SIGHUP_handler);	/* reread config file and have
-											 * children do same */
-	pqsignal_pm(SIGINT, pmdie); /* send SIGTERM and shut down */
-	pqsignal_pm(SIGQUIT, pmdie);	/* send SIGQUIT and die */
-	pqsignal_pm(SIGTERM, pmdie);	/* wait for children and shut down */
-	pqsignal_pm(SIGALRM, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGPIPE, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGUSR1, sigusr1_handler);	/* message from child process */
-	pqsignal_pm(SIGUSR2, dummy_handler);	/* unused, reserve for children */
-	pqsignal_pm(SIGCHLD, reaper);	/* handle child termination */
+	pqsignal(SIGHUP, handle_reload_request_signal);
+	pqsignal(SIGINT, handle_shutdown_request_signal);
+	pqsignal(SIGQUIT, handle_shutdown_request_signal);
+	pqsignal(SIGTERM, handle_shutdown_request_signal);
+	pqsignal(SIGALRM, SIG_IGN);	/* ignored */
+	pqsignal(SIGPIPE, SIG_IGN);	/* ignored */
+	pqsignal(SIGUSR1, handle_action_request_signal);
+	pqsignal(SIGUSR2, dummy_handler);	/* unused, reserve for children */
+	pqsignal(SIGCHLD, handle_child_exit_signal);
 
-#ifdef SIGURG
+	/* This may configure SIGURG, depending on platform. */
+	InitializeLatchSupport();
+	InitProcessLocalLatch();
 
-	/*
-	 * Ignore SIGURG for now.  Child processes may change this (see
-	 * InitializeLatchSupport), but they will not receive any such signals
-	 * until they wait on a latch.
-	 */
-	pqsignal_pm(SIGURG, SIG_IGN);	/* ignored */
-#endif
+	PG_SETMASK(&UnBlockSig);
 
 	/*
 	 * No other place in Postgres should touch SIGTTIN/SIGTTOU handling.  We
@@ -667,15 +653,15 @@ PostmasterMain(int argc, char *argv[])
 	 * child processes should just allow the inherited settings to stand.
 	 */
 #ifdef SIGTTIN
-	pqsignal_pm(SIGTTIN, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTIN, SIG_IGN);	/* ignored */
 #endif
 #ifdef SIGTTOU
-	pqsignal_pm(SIGTTOU, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTOU, SIG_IGN);	/* ignored */
 #endif
 
 	/* ignore SIGXFSZ, so that ulimit violations work like disk full */
 #ifdef SIGXFSZ
-	pqsignal_pm(SIGXFSZ, SIG_IGN);	/* ignored */
+	pqsignal(SIGXFSZ, SIG_IGN);	/* ignored */
 #endif
 
 	/*
@@ -1698,6 +1684,37 @@ DetermineSleepTime(struct timeval *timeout)
 	}
 }
 
+/*
+ * Activate or deactivate notifications of server socket events.  Since we
+ * don't currently have a way to remove events from an existing WaitEventSet,
+ * we'll just destroy and recreate the whole thing.  This is called during
+ * shutdown so we can wait for backends to exit without accepting new
+ * connections, and during crash reinitialization when we need to start
+ * listening for new connections again.
+ */
+static void
+ConfigurePostmasterWaitSet(bool accept_connections)
+{
+	if (wait_set)
+		FreeWaitEventSet(wait_set);
+	wait_set = NULL;
+
+	wait_set = CreateWaitEventSet(CurrentMemoryContext, 1 + MAXLISTEN);
+	AddWaitEventToSet(wait_set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
+
+	if (accept_connections)
+	{
+		for (int i = 0; i < MAXLISTEN; i++)
+		{
+			int			fd = ListenSocket[i];
+
+			if (fd == PGINVALID_SOCKET)
+				break;
+			AddWaitEventToSet(wait_set, WL_SOCKET_ACCEPT, fd, NULL, NULL);
+		}
+	}
+}
+
 /*
  * Main idle loop of postmaster
  *
@@ -1706,97 +1723,62 @@ DetermineSleepTime(struct timeval *timeout)
 static int
 ServerLoop(void)
 {
-	fd_set		readmask;
-	int			nSockets;
 	time_t		last_lockfile_recheck_time,
 				last_touch_time;
+	WaitEvent	events[MAXLISTEN];
+	int			nevents;
 
+	ConfigurePostmasterWaitSet(true);
 	last_lockfile_recheck_time = last_touch_time = time(NULL);
 
-	nSockets = initMasks(&readmask);
-
 	for (;;)
 	{
-		fd_set		rmask;
-		int			selres;
 		time_t		now;
+		struct timeval timeout;
 
-		/*
-		 * Wait for a connection request to arrive.
-		 *
-		 * We block all signals except while sleeping. That makes it safe for
-		 * signal handlers, which again block all signals while executing, to
-		 * do nontrivial work.
-		 *
-		 * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
-		 * any new connections, so we don't call select(), and just sleep.
-		 */
-		memcpy((char *) &rmask, (char *) &readmask, sizeof(fd_set));
-
-		if (pmState == PM_WAIT_DEAD_END)
-		{
-			PG_SETMASK(&UnBlockSig);
+		DetermineSleepTime(&timeout);
 
-			pg_usleep(100000L); /* 100 msec seems reasonable */
-			selres = 0;
-
-			PG_SETMASK(&BlockSig);
-		}
-		else
-		{
-			/* must set timeout each time; some OSes change it! */
-			struct timeval timeout;
-
-			/* Needs to run with blocked signals! */
-			DetermineSleepTime(&timeout);
-
-			PG_SETMASK(&UnBlockSig);
-
-			selres = select(nSockets, &rmask, NULL, NULL, &timeout);
-
-			PG_SETMASK(&BlockSig);
-		}
-
-		/* Now check the select() result */
-		if (selres < 0)
-		{
-			if (errno != EINTR && errno != EWOULDBLOCK)
-			{
-				ereport(LOG,
-						(errcode_for_socket_access(),
-						 errmsg("select() failed in postmaster: %m")));
-				return STATUS_ERROR;
-			}
-		}
+		nevents = WaitEventSetWait(wait_set,
+								   timeout.tv_sec * 1000 + timeout.tv_usec / 1000,
+								   events,
+								   lengthof(events),
+								   0 /* postmaster posts no wait_events */);
 
 		/*
-		 * New connection pending on any of our sockets? If so, fork a child
-		 * process to deal with it.
+		 * Latch set by signal handler, or new connection pending on any of our
+		 * sockets? If the latter, fork a child process to deal with it.
 		 */
-		if (selres > 0)
+		for (int i = 0; i < nevents; i++)
 		{
-			int			i;
-
-			for (i = 0; i < MAXLISTEN; i++)
+			if (events[i].events & WL_LATCH_SET)
 			{
-				if (ListenSocket[i] == PGINVALID_SOCKET)
-					break;
-				if (FD_ISSET(ListenSocket[i], &rmask))
+				ResetLatch(MyLatch);
+
+				/* Process work scheduled by signal handlers. */
+				if (pending_shutdown_request)
+					process_shutdown_request();
+				if (pending_child_exit)
+					process_child_exit();
+				if (pending_reload_request)
+					process_reload_request();
+				if (pending_action_request)
+					process_action_request();
+			}
+			else if (events[i].events & WL_SOCKET_ACCEPT)
+			{
+				Port	   *port;
+
+				port = ConnCreate(events[i].fd);
+				if (port)
 				{
-					Port	   *port;
+					BackendStartup(port);
 
-					port = ConnCreate(ListenSocket[i]);
-					if (port)
-					{
-						BackendStartup(port);
-
-						/*
-						 * We no longer need the open socket or port structure
-						 * in this process
-						 */
-						StreamClose(port->sock);
-						ConnFree(port);
-					}
+					/*
+					 * We no longer need the open socket or port structure
+					 * in this process
+					 */
+					StreamClose(port->sock);
+					ConnFree(port);
 				}
 			}
 		}
@@ -1939,34 +1921,6 @@ ServerLoop(void)
 	}
 }
 
-/*
- * Initialise the masks for select() for the ports we are listening on.
- * Return the number of sockets to listen on.
- */
-static int
-initMasks(fd_set *rmask)
-{
-	int			maxsock = -1;
-	int			i;
-
-	FD_ZERO(rmask);
-
-	for (i = 0; i < MAXLISTEN; i++)
-	{
-		int			fd = ListenSocket[i];
-
-		if (fd == PGINVALID_SOCKET)
-			break;
-		FD_SET(fd, rmask);
-
-		if (fd > maxsock)
-			maxsock = fd;
-	}
-
-	return maxsock + 1;
-}
-
-
 /*
  * Read a client's startup packet and do something according to it.
  *
@@ -2609,6 +2563,10 @@ ClosePostmasterPorts(bool am_syslogger)
 {
 	int			i;
 
+	/* Release resources held by the postmaster's WaitEventSet. */
+	if (wait_set)
+		FreeWaitEventSetAfterFork(wait_set);
+
 #ifndef WIN32
 
 	/*
@@ -2707,14 +2665,45 @@ InitProcessGlobals(void)
 #endif
 }
 
+/*
+ * Child processes use SIGUSR1 for pmsignals.  pg_ctl uses SIGUSR1 to ask
+ * postmaster to check for logrotate and promote files.
+ */
+static void
+handle_action_request_signal(SIGNAL_ARGS)
+{
+	int save_errno = errno;
+
+	pending_action_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
 
 /*
- * SIGHUP -- reread config files, and tell children to do same
+ * pg_ctl uses SIGHUP to request a reload of the configuration files.
  */
 static void
-SIGHUP_handler(SIGNAL_ARGS)
+handle_reload_request_signal(SIGNAL_ARGS)
 {
-	int			save_errno = errno;
+	int save_errno = errno;
+
+	pending_reload_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Re-read config files, and tell children to do same.
+ */
+static void
+process_reload_request(void)
+{
+	pending_reload_request = false;
+
+	ereport(DEBUG2,
+			(errmsg_internal("postmaster received reload request signal")));
 
 	if (Shutdown <= SmartShutdown)
 	{
@@ -2771,27 +2760,50 @@ SIGHUP_handler(SIGNAL_ARGS)
 		write_nondefault_variables(PGC_SIGHUP);
 #endif
 	}
+}
+
+/*
+ * pg_ctl uses SIGTERM, SIGINT and SIGQUIT to request different types of
+ * shutdown.
+ */
+static void
+handle_shutdown_request_signal(SIGNAL_ARGS)
+{
+	int save_errno = errno;
+
+	switch (postgres_signal_arg)
+	{
+		case SIGTERM:
+			pending_shutdown_request = SmartShutdown;
+			break;
+		case SIGINT:
+			pending_shutdown_request = FastShutdown;
+			break;
+		case SIGQUIT:
+			pending_shutdown_request = ImmediateShutdown;
+			break;
+	}
+	SetLatch(MyLatch);
 
 	errno = save_errno;
 }
 
-
 /*
- * pmdie -- signal handler for processing various postmaster signals.
+ * Process shutdown request.
  */
 static void
-pmdie(SIGNAL_ARGS)
+process_shutdown_request(void)
 {
-	int			save_errno = errno;
+	int		mode = pending_shutdown_request;
 
 	ereport(DEBUG2,
-			(errmsg_internal("postmaster received signal %d",
-							 postgres_signal_arg)));
+			(errmsg_internal("postmaster received shutdown request signal")));
 
-	switch (postgres_signal_arg)
-	{
-		case SIGTERM:
+	pending_shutdown_request = NoShutdown;
 
+	switch (mode)
+	{
+		case SmartShutdown:
 			/*
 			 * Smart Shutdown:
 			 *
@@ -2830,7 +2842,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGINT:
+		case FastShutdown:
 
 			/*
 			 * Fast Shutdown:
@@ -2871,7 +2883,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGQUIT:
+		case ImmediateShutdown:
 
 			/*
 			 * Immediate Shutdown:
@@ -2908,20 +2920,30 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 	}
+}
+
+static void
+handle_child_exit_signal(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	pending_child_exit = true;
+	SetLatch(MyLatch);
 
 	errno = save_errno;
 }
 
 /*
- * Reaper -- signal handler to cleanup after a child process dies.
+ * Cleanup after a child process dies.
  */
 static void
-reaper(SIGNAL_ARGS)
+process_child_exit(void)
 {
-	int			save_errno = errno;
 	int			pid;			/* process id of dead child process */
 	int			exitstatus;		/* its exit status */
 
+	pending_child_exit = false;
+
 	ereport(DEBUG4,
 			(errmsg_internal("reaping dead processes")));
 
@@ -3213,8 +3235,6 @@ reaper(SIGNAL_ARGS)
 	 * or actions to make.
 	 */
 	PostmasterStateMachine();
-
-	errno = save_errno;
 }
 
 /*
@@ -3642,8 +3662,9 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 /*
  * Advance the postmaster's state machine and take actions as appropriate
  *
- * This is common code for pmdie(), reaper() and sigusr1_handler(), which
- * receive the signals that might mean we need to change state.
+ * This is common code for process_shutdown_request(), process_child_exit() and
+ * process_action_request(), which process the signals that might mean we need
+ * to change state.
  */
 static void
 PostmasterStateMachine(void)
@@ -3796,6 +3817,9 @@ PostmasterStateMachine(void)
 
 	if (pmState == PM_WAIT_DEAD_END)
 	{
+		/* Don't allow any new socket connection events. */
+		ConfigurePostmasterWaitSet(false);
+
 		/*
 		 * PM_WAIT_DEAD_END state ends when the BackendList is entirely empty
 		 * (ie, no dead_end children remain), and the archiver is gone too.
@@ -3905,6 +3929,9 @@ PostmasterStateMachine(void)
 		pmState = PM_STARTUP;
 		/* crash recovery started, reset SIGKILL flag */
 		AbortStartTime = 0;
+
+		/* start accepting server socket connection events again */
+		ConfigurePostmasterWaitSet(true);
 	}
 }
 
@@ -5013,12 +5040,16 @@ ExitPostmaster(int status)
 }
 
 /*
- * sigusr1_handler - handle signal conditions from child processes
+ * Handle pmsignal conditions representing requests from backends,
+ * and check for promote and logrotate requests from pg_ctl.
  */
 static void
-sigusr1_handler(SIGNAL_ARGS)
+process_action_request(void)
 {
-	int			save_errno = errno;
+	pending_action_request = false;
+
+	ereport(DEBUG2,
+			(errmsg_internal("postmaster received action request signal")));
 
 	/*
 	 * RECOVERY_STARTED and BEGIN_HOT_STANDBY signals are ignored in
@@ -5159,8 +5190,6 @@ sigusr1_handler(SIGNAL_ARGS)
 		 */
 		signal_child(StartupPID, SIGUSR2);
 	}
-
-	errno = save_errno;
 }
 
 /*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3082093d1e..655e881688 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -24,7 +24,6 @@
 #include <signal.h>
 #include <unistd.h>
 #include <sys/resource.h>
-#include <sys/select.h>
 #include <sys/socket.h>
 #include <sys/time.h>
 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index eb1046450b..1a8885b73e 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -135,8 +135,7 @@ InitPostmasterChild(void)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitProcessLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -189,8 +188,7 @@ InitStandaloneProcess(const char *argv0)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitProcessLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -232,6 +230,13 @@ SwitchToSharedLatch(void)
 	SetLatch(MyLatch);
 }
 
+void
+InitProcessLocalLatch(void)
+{
+	MyLatch = &LocalLatchData;
+	InitLatch(MyLatch);
+}
+
 void
 SwitchBackToLocalLatch(void)
 {
diff --git a/src/include/libpq/pqsignal.h b/src/include/libpq/pqsignal.h
index 7890b426a8..76eb380a4f 100644
--- a/src/include/libpq/pqsignal.h
+++ b/src/include/libpq/pqsignal.h
@@ -53,7 +53,4 @@ extern PGDLLIMPORT sigset_t StartupBlockSig;
 
 extern void pqinitmask(void);
 
-/* pqsigfunc is declared in src/include/port.h */
-extern pqsigfunc pqsignal_pm(int signo, pqsigfunc func);
-
 #endif							/* PQSIGNAL_H */
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 795182fa51..f64f81cf00 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -310,6 +310,7 @@ extern PGDLLIMPORT char *DatabasePath;
 /* now in utils/init/miscinit.c */
 extern void InitPostmasterChild(void);
 extern void InitStandaloneProcess(const char *argv0);
+extern void InitProcessLocalLatch(void);
 extern void SwitchToSharedLatch(void);
 extern void SwitchBackToLocalLatch(void);
 
-- 
2.35.1
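
The handle_*_signal()/process_*() split in this patch follows a well-known pattern. The handler only records the event in a sig_atomic_t flag and wakes the main loop, and the real work happens later in ordinary code. Below is a minimal POSIX sketch of that pattern. It assumes a self-pipe stands in for the postmaster's latch and poll() stands in for WaitEventSetWait(). The names wakeup_pipe, install_reload_handler(), handle_reload_signal() and wait_for_reload() are hypothetical and are not part of PostgreSQL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical self-pipe standing in for the postmaster's latch. */
static int wakeup_pipe[2] = {-1, -1};
static volatile sig_atomic_t pending_reload_request;

/* Async-signal-safe handler: set a flag, poke the pipe, preserve errno. */
static void
handle_reload_signal(int signo)
{
    int save_errno = errno;

    (void) signo;
    pending_reload_request = true;
    (void) write(wakeup_pipe[1], "x", 1);   /* EAGAIN is fine: already awake */
    errno = save_errno;
}

static int
install_reload_handler(void)
{
    struct sigaction act;

    if (pipe(wakeup_pipe) < 0)
        return -1;
    /* Non-blocking, so neither the handler nor the drain loop can hang. */
    fcntl(wakeup_pipe[0], F_SETFL, O_NONBLOCK);
    fcntl(wakeup_pipe[1], F_SETFL, O_NONBLOCK);

    memset(&act, 0, sizeof(act));
    act.sa_handler = handle_reload_signal;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_RESTART;  /* fine here: the pipe, not EINTR, wakes us */
    return sigaction(SIGHUP, &act, NULL);
}

/*
 * One main-loop iteration: sleep in poll() until woken or timed out, drain
 * the pipe (the "ResetLatch" step), then check the flag outside the handler.
 * Returns the number of reload requests processed (0 or 1).
 */
static int
wait_for_reload(int timeout_ms)
{
    struct pollfd pfd = {.fd = wakeup_pipe[0], .events = POLLIN};
    char buf[64];
    int processed = 0;

    if (poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN))
    {
        while (read(wakeup_pipe[0], buf, sizeof(buf)) > 0)
            ;                   /* drain all pending wakeup bytes */
    }

    /* Check flags only after draining, so no wakeup can be lost. */
    if (pending_reload_request)
    {
        pending_reload_request = false;
        processed++;            /* real reload work would go here */
    }
    return processed;
}
```

Call install_reload_handler() once, then call wait_for_reload() in a loop. Draining the pipe before testing the flag mirrors calling ResetLatch() before checking the pending_* variables in ServerLoop(). A signal that arrives in between still leaves a byte in the pipe, so the next poll() wakes up.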

#11Thomas Munro
thomas.munro@gmail.com
In reply to: Justin Pryzby (#9)
Re: Using WaitEventSet in the postmaster

On Wed, Dec 7, 2022 at 2:08 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

+             /*
+              * It would probably be safe to re-use the inherited signalfd since
+              * signalfds only see the current processes pending signals, but it

I think you mean "current process's", right ?

Fixed in v5, thanks.
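
Some background for the comment quoted above: with signalfd, a signal is blocked with sigprocmask() and then read from a descriptor as a struct signalfd_siginfo record. A read only ever returns signals pending for the calling process. Below is a minimal Linux-only sketch. For illustration it uses SIGUSR1 rather than the SIGURG that latch.c uses. read_signal_via_signalfd() is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/*
 * Block SIGUSR1, send it to ourselves, and collect it through a signalfd.
 * Returns the signal number that was read, or -1 on failure.
 */
static int
read_signal_via_signalfd(void)
{
    sigset_t mask;
    struct signalfd_siginfo info;
    int fd;
    ssize_t n;

    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);
    /* The signal must be blocked, or its default action would run instead. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;

    fd = signalfd(-1, &mask, SFD_CLOEXEC);
    if (fd < 0)
        return -1;

    /* SIGUSR1 is now pending for this process, so the signalfd reports it. */
    if (kill(getpid(), SIGUSR1) < 0)
    {
        close(fd);
        return -1;
    }

    n = read(fd, &info, sizeof(info));  /* consumes the pending signal */
    close(fd);
    return n == (ssize_t) sizeof(info) ? (int) info.ssi_signo : -1;
}
```

Because a signalfd read is per-process, a forked child reading an inherited signalfd would see its own signals rather than the postmaster's. That is why reusing it would probably be safe, although the patch closes it anyway. Note that the sketch leaves SIGUSR1 blocked when it returns.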

#12Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#10)
Re: Using WaitEventSet in the postmaster

Hi,

On 2022-12-07 14:12:37 +1300, Thomas Munro wrote:

On Wed, Dec 7, 2022 at 12:12 PM Andres Freund <andres@anarazel.de> wrote:

On 2022-12-07 00:58:06 +1300, Thomas Munro wrote:

One way to fix that for the epoll version is to EPOLL_CTL_DEL and
EPOLL_CTL_ADD, whenever transitioning to/from a zero event mask.
Tried like that in this version. Another approach would be to
(finally) write DeleteWaitEvent() to do the same thing at a higher
level... seems overkill for this.

What about just recreating the WES during crash restart?

It seems a bit like cheating but yeah that's a super simple solution,
and removes one patch from the stack. Done like that in this version.

I somewhat wish we'd do that more aggressively during crash-restart, rather
than the opposite. Mostly around shared memory contents though, so perhaps
that's not that comparable...

Greetings,

Andres Freund
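
The two options above differ in how the listening sockets stop being reported. With raw epoll, dropping one descriptor takes a single EPOLL_CTL_DEL call. The approach adopted instead discards the whole WaitEventSet and builds a new one, as ConfigurePostmasterWaitSet() does in the v5 patch. Below is a minimal Linux-only sketch of the EPOLL_CTL_DEL route. It assumes a pipe stands in for a listening socket. count_ready(), stop_watching() and demo_remove() are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <sys/epoll.h>
#include <unistd.h>

/* Report how many registered descriptors are ready, without blocking. */
static int
count_ready(int epfd)
{
    struct epoll_event events[8];

    return epoll_wait(epfd, events, 8, 0);
}

/* Stop reporting events for fd, leaving the rest of the set intact. */
static int
stop_watching(int epfd, int fd)
{
    /* Since Linux 2.6.9 the event argument may be NULL for EPOLL_CTL_DEL. */
    return epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
}

/*
 * Register a readable pipe, then remove it.  Stores the number of ready
 * descriptors seen before and after removal; returns 0, or -1 on error.
 */
static int
demo_remove(int *before, int *after)
{
    int p[2];
    int epfd;
    struct epoll_event ev = {.events = EPOLLIN};

    if (pipe(p) < 0 || (epfd = epoll_create1(EPOLL_CLOEXEC)) < 0)
        return -1;
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0 ||
        write(p[1], "x", 1) != 1)       /* make p[0] readable */
        return -1;

    *before = count_ready(epfd);        /* the pipe is reported */
    stop_watching(epfd, p[0]);
    *after = count_ready(epfd);         /* no longer registered */

    close(epfd);
    close(p[0]);
    close(p[1]);
    return 0;
}
```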

#13Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#12)
4 attachment(s)
Re: Using WaitEventSet in the postmaster

Oops, v5 was broken as visible on cfbot (a last second typo broke it).
Here's a better one.

Attachments:

v6-0001-Add-WL_SOCKET_ACCEPT-event-to-WaitEventSet-API.patch (text/x-patch; charset=US-ASCII)
From 07b04dc410118ad04fd0006edda7ba80f241357a Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 6 Dec 2022 15:21:11 +1300
Subject: [PATCH v6 1/4] Add WL_SOCKET_ACCEPT event to WaitEventSet API.

To be able to handle incoming connections on a server socket with
the WaitEventSet API, we'll need a new kind of event to indicate that
the socket is ready to accept a connection.

On Unix, it's just the same as WL_SOCKET_READABLE, but on Windows there
is a different kernel event that we need to map our abstraction to.

A future commit will use this.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c | 13 ++++++++++++-
 src/include/storage/latch.h     |  7 +++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index eb3a569aae..7ced8264f0 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -864,6 +864,9 @@ FreeWaitEventSet(WaitEventSet *set)
  * - WL_SOCKET_CONNECTED: Wait for socket connection to be established,
  *	 can be combined with other WL_SOCKET_* events (on non-Windows
  *	 platforms, this is the same as WL_SOCKET_WRITEABLE)
+ * - WL_SOCKET_ACCEPT: Wait for new connection to a server socket,
+ *	 can be combined with other WL_SOCKET_* events (on non-Windows
+ *	 platforms, this is the same as WL_SOCKET_READABLE)
  * - WL_SOCKET_CLOSED: Wait for socket to be closed by remote peer.
  * - WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies
  *
@@ -874,7 +877,7 @@ FreeWaitEventSet(WaitEventSet *set)
  * i.e. it must be a process-local latch initialized with InitLatch, or a
  * shared latch associated with the current process by calling OwnLatch.
  *
- * In the WL_SOCKET_READABLE/WRITEABLE/CONNECTED cases, EOF and error
+ * In the WL_SOCKET_READABLE/WRITEABLE/CONNECTED/ACCEPT cases, EOF and error
  * conditions cause the socket to be reported as readable/writable/connected,
  * so that the caller can deal with the condition.
  *
@@ -1312,6 +1315,8 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event)
 			flags |= FD_WRITE;
 		if (event->events & WL_SOCKET_CONNECTED)
 			flags |= FD_CONNECT;
+		if (event->events & WL_SOCKET_ACCEPT)
+			flags |= FD_ACCEPT;
 
 		if (*handle == WSA_INVALID_EVENT)
 		{
@@ -2067,6 +2072,12 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 			/* connected */
 			occurred_events->events |= WL_SOCKET_CONNECTED;
 		}
+		if ((cur_event->events & WL_SOCKET_ACCEPT) &&
+			(resEvents.lNetworkEvents & FD_ACCEPT))
+		{
+			/* incoming connection could be accepted */
+			occurred_events->events |= WL_SOCKET_ACCEPT;
+		}
 		if (resEvents.lNetworkEvents & FD_CLOSE)
 		{
 			/* EOF/error, so signal all caller-requested socket flags */
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 68ab740f16..c55838db60 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -135,9 +135,16 @@ typedef struct Latch
 #define WL_SOCKET_CONNECTED  WL_SOCKET_WRITEABLE
 #endif
 #define WL_SOCKET_CLOSED 	 (1 << 7)
+#ifdef WIN32
+#define WL_SOCKET_ACCEPT	 (1 << 8)
+#else
+/* avoid having to deal with case on platforms not requiring it */
+#define WL_SOCKET_ACCEPT	WL_SOCKET_READABLE
+#endif
 #define WL_SOCKET_MASK		(WL_SOCKET_READABLE | \
 							 WL_SOCKET_WRITEABLE | \
 							 WL_SOCKET_CONNECTED | \
+							 WL_SOCKET_ACCEPT | \
 							 WL_SOCKET_CLOSED)
 
 typedef struct WaitEvent
-- 
2.35.1
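
On Unix, a listening socket with a connection waiting in its backlog is reported as readable. That is why this patch can define WL_SOCKET_ACCEPT as WL_SOCKET_READABLE outside Windows. Below is a minimal POSIX sketch that checks this with poll() on a loopback TCP socket bound to an ephemeral port. pending_connection_is_readable() is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 if poll() reports POLLIN on a listening socket once a client has
 * connected and accept() then succeeds, 0 if not, -1 on setup failure.
 */
static int
pending_connection_is_readable(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct pollfd pfd;
    int listener, client, conn, result = 0;

    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a port */
    if (bind(listener, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(listener, 1) < 0 ||
        getsockname(listener, (struct sockaddr *) &addr, &len) < 0)
    {
        close(listener);
        return -1;
    }

    /* Nothing is pending yet, so a zero-timeout poll reports nothing. */
    pfd.fd = listener;
    pfd.events = POLLIN;
    if (poll(&pfd, 1, 0) != 0)
    {
        close(listener);
        return 0;
    }

    /* A loopback connect() completes against the listen backlog. */
    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0 ||
        connect(client, (struct sockaddr *) &addr, sizeof(addr)) < 0)
    {
        if (client >= 0)
            close(client);
        close(listener);
        return -1;
    }

    /* The pending connection shows up as plain readability. */
    if (poll(&pfd, 1, 1000) == 1 && (pfd.revents & POLLIN))
    {
        conn = accept(listener, NULL, NULL);
        if (conn >= 0)
        {
            result = 1;
            close(conn);
        }
    }
    close(client);
    close(listener);
    return result;
}
```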

v6-0002-Don-t-leak-a-signalfd-when-using-latches-in-the-p.patch (text/x-patch; charset=US-ASCII)
From 827866959dbbe537f6677271093f6d7730bd2527 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 6 Dec 2022 16:13:36 +1300
Subject: [PATCH v6 2/4] Don't leak a signalfd when using latches in the
 postmaster.

At the time of commit 6a2a70a02 we didn't use latch infrastructure in
the postmaster.  We're planning to start doing that, so we'd better make
sure that the signalfd inherited from a postmaster is not duplicated and
then leaked in the child.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 7ced8264f0..b32c96b63d 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -283,6 +283,22 @@ InitializeLatchSupport(void)
 #ifdef WAIT_USE_SIGNALFD
 	sigset_t	signalfd_mask;
 
+	if (IsUnderPostmaster)
+	{
+		/*
+		 * It would probably be safe to re-use the inherited signalfd since
+		 * signalfds only see the current process's pending signals, but it
+		 * seems less surprising to close it and create our own.
+		 */
+		if (signal_fd != -1)
+		{
+			/* Release postmaster's signal FD; ignore any error */
+			(void) close(signal_fd);
+			signal_fd = -1;
+			ReleaseExternalFD();
+		}
+	}
+
 	/* Block SIGURG, because we'll receive it through a signalfd. */
 	sigaddset(&UnBlockSig, SIGURG);
 
-- 
2.35.1
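
Both this patch and the fork_process() change in this series depend on what fork() copies into the child. That includes open descriptors, such as an inherited signalfd, and the signal mask. Below is a minimal POSIX sketch of the fork_process() idea: signals are blocked across fork(), so the child starts with them blocked while the parent restores its own mask. It assumes sigfillset() as a stand-in for PostgreSQL's BlockSig. fork_with_signals_blocked() and sigusr1_blocked() are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork with all blockable signals blocked.  The child checks that it
 * inherited the blocked mask (exit status 0 means yes); the parent restores
 * its previous mask.  Returns the child's exit status, or -1 on error.
 */
static int
fork_with_signals_blocked(void)
{
    sigset_t block_all, save_mask, child_mask;
    pid_t pid;
    int status;

    sigfillset(&block_all);     /* SIGKILL/SIGSTOP are silently excluded */
    sigprocmask(SIG_SETMASK, &block_all, &save_mask);
    pid = fork();
    if (pid == 0)
    {
        /* Child: a real server would install its handlers, then unblock. */
        sigprocmask(SIG_SETMASK, NULL, &child_mask);
        _exit(sigismember(&child_mask, SIGUSR1) == 1 ? 0 : 1);
    }

    /* Parent (or failed fork): put the original mask back. */
    sigprocmask(SIG_SETMASK, &save_mask, NULL);
    if (pid < 0 || waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Returns 1 if SIGUSR1 is currently blocked in the calling process. */
static int
sigusr1_blocked(void)
{
    sigset_t cur;

    sigprocmask(SIG_SETMASK, NULL, &cur);
    return sigismember(&cur, SIGUSR1);
}
```

One plausible reason to block first is that no handler inherited from the postmaster can then run in the child before the child has installed its own. The fork_process() comment itself only says that the child must unblock.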

v6-0003-Allow-parent-s-WaitEventSets-to-be-freed-after-fo.patch (text/x-patch; charset=US-ASCII)
From 6cdba2a3e68b23e4bec06e9db3feffdf64cd80cb Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 6 Dec 2022 16:24:05 +1300
Subject: [PATCH v6 3/4] Allow parent's WaitEventSets to be freed after fork().

An epoll fd belonging to the parent should be closed in the child.  A
kqueue fd is automatically closed, but we should adjust our counter.
For poll and Windows systems, nothing special is required.  On all
systems we free the memory.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c | 17 +++++++++++++++++
 src/include/storage/latch.h     |  1 +
 2 files changed, 18 insertions(+)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index b32c96b63d..de4fbcdfb9 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -869,6 +869,23 @@ FreeWaitEventSet(WaitEventSet *set)
 	pfree(set);
 }
 
+/*
+ * Free a previously created WaitEventSet in a child process after a fork().
+ */
+void
+FreeWaitEventSetAfterFork(WaitEventSet *set)
+{
+#if defined(WAIT_USE_EPOLL)
+	close(set->epoll_fd);
+	ReleaseExternalFD();
+#elif defined(WAIT_USE_KQUEUE)
+	/* kqueues are not normally inherited by child processes */
+	ReleaseExternalFD();
+#endif
+
+	pfree(set);
+}
+
 /* ---
  * Add an event to the set. Possible events are:
  * - WL_LATCH_SET: Wait for the latch to be set
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index c55838db60..63a1fc440c 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -175,6 +175,7 @@ extern void ShutdownLatchSupport(void);
 
 extern WaitEventSet *CreateWaitEventSet(MemoryContext context, int nevents);
 extern void FreeWaitEventSet(WaitEventSet *set);
+extern void FreeWaitEventSetAfterFork(WaitEventSet *set);
 extern int	AddWaitEventToSet(WaitEventSet *set, uint32 events, pgsocket fd,
 							  Latch *latch, void *user_data);
 extern void ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch);
-- 
2.35.1
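
The epoll half of the commit message above can be checked directly. fork() duplicates the descriptor, and both copies refer to the same epoll instance, so an epoll_ctl() in the child would also change the parent's interest list. Closing the child's copy, as FreeWaitEventSetAfterFork() does, leaves the parent's set working. Below is a minimal Linux-only sketch. child_inherits_epoll_fd() is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create an epoll set and fork.  The child checks that the descriptor is
 * valid in its own table, then closes it.  Returns 1 if the child saw the
 * inherited descriptor and the parent's copy still works afterwards,
 * 0 if not, -1 on error.
 */
static int
child_inherits_epoll_fd(void)
{
    int epfd = epoll_create1(0);   /* no CLOEXEC; fork() copies it anyway */
    int status;
    int parent_ok;
    pid_t pid;
    struct epoll_event ev;

    if (epfd < 0)
        return -1;

    pid = fork();
    if (pid == 0)
    {
        /* Child: fork() duplicated the descriptor, so it is valid here... */
        int inherited = fcntl(epfd, F_GETFD) != -1;

        /* ...and, like FreeWaitEventSetAfterFork(), we close our copy. */
        close(epfd);
        _exit(inherited ? 0 : 1);
    }
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
    {
        close(epfd);
        return -1;
    }

    /* The child's close() does not affect the parent's descriptor. */
    parent_ok = epoll_wait(epfd, &ev, 1, 0) == 0;
    close(epfd);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0 && parent_ok;
}
```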

v6-0004-Give-the-postmaster-a-WaitEventSet-and-a-latch.patch (text/x-patch; charset=US-ASCII)
From 1d708127bc626b7d59c35a0b59ece99b089ae1b8 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 9 Nov 2022 22:59:58 +1300
Subject: [PATCH v6 4/4] Give the postmaster a WaitEventSet and a latch.

Traditionally, the postmaster's architecture was quite unusual.  It did
a lot of work inside signal handlers, which were only unblocked while
waiting in select() to make that safe.

Switch to a more typical architecture, where signal handlers just set
flags and use a latch to close races.  Now the postmaster looks like
all other PostgreSQL processes, multiplexing its event processing in
epoll_wait()/kevent()/poll()/WaitForMultipleObjects() depending on the
OS.

Changes:

 * Allow the postmaster to set up its own local latch.  For now we don't
   want other backends setting the postmaster's latch directly (that
   would require latches robust against arbitrary corruption of shared
   memory).

 * The existing signal handlers are cut in two: a handle_XXX part that
   sets a pending_XXX variable and sets the local latch, and a
   process_XXX part.

 * Signal handlers are now installed with the regular pqsignal()
   function rather than the special pqsignal_pm() function; the concerns
   about the portability of SA_RESTART vs select() are no longer
   relevant: SUSv2 left it implementation-defined whether select()
   restarts, but didn't add that qualification for poll(), and it doesn't
   matter anyway because we call SetLatch(), creating a new reason to wake
   up.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/libpq/pqsignal.c          |  40 ---
 src/backend/postmaster/fork_process.c |  12 +-
 src/backend/postmaster/postmaster.c   | 379 ++++++++++++++------------
 src/backend/tcop/postgres.c           |   1 -
 src/backend/utils/init/miscinit.c     |  13 +-
 src/include/libpq/pqsignal.h          |   3 -
 src/include/miscadmin.h               |   1 +
 7 files changed, 225 insertions(+), 224 deletions(-)

diff --git a/src/backend/libpq/pqsignal.c b/src/backend/libpq/pqsignal.c
index 1ab34c5214..718043a39d 100644
--- a/src/backend/libpq/pqsignal.c
+++ b/src/backend/libpq/pqsignal.c
@@ -97,43 +97,3 @@ pqinitmask(void)
 	sigdelset(&StartupBlockSig, SIGALRM);
 #endif
 }
-
-/*
- * Set up a postmaster signal handler for signal "signo"
- *
- * Returns the previous handler.
- *
- * This is used only in the postmaster, which has its own odd approach to
- * signal handling.  For signals with handlers, we block all signals for the
- * duration of signal handler execution.  We also do not set the SA_RESTART
- * flag; this should be safe given the tiny range of code in which the
- * postmaster ever unblocks signals.
- *
- * pqinitmask() must have been invoked previously.
- */
-pqsigfunc
-pqsignal_pm(int signo, pqsigfunc func)
-{
-	struct sigaction act,
-				oact;
-
-	act.sa_handler = func;
-	if (func == SIG_IGN || func == SIG_DFL)
-	{
-		/* in these cases, act the same as pqsignal() */
-		sigemptyset(&act.sa_mask);
-		act.sa_flags = SA_RESTART;
-	}
-	else
-	{
-		act.sa_mask = BlockSig;
-		act.sa_flags = 0;
-	}
-#ifdef SA_NOCLDSTOP
-	if (signo == SIGCHLD)
-		act.sa_flags |= SA_NOCLDSTOP;
-#endif
-	if (sigaction(signo, &act, &oact) < 0)
-		return SIG_ERR;
-	return oact.sa_handler;
-}
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index ec67761487..e1e7d91c52 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -12,24 +12,28 @@
 #include "postgres.h"
 
 #include <fcntl.h>
+#include <signal.h>
 #include <time.h>
 #include <sys/stat.h>
 #include <sys/time.h>
 #include <unistd.h>
 
+#include "libpq/pqsignal.h"
 #include "postmaster/fork_process.h"
 
 #ifndef WIN32
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
- * child in the parent process.
+ * child in the parent process.  Signals are blocked while forking, so
+ * the child must unblock.
  */
 pid_t
 fork_process(void)
 {
 	pid_t		result;
 	const char *oomfilename;
+	sigset_t	save_mask;
 
 #ifdef LINUX_PROFILE
 	struct itimerval prof_itimer;
@@ -51,6 +55,7 @@ fork_process(void)
 	getitimer(ITIMER_PROF, &prof_itimer);
 #endif
 
+	sigprocmask(SIG_SETMASK, &BlockSig, &save_mask);
 	result = fork();
 	if (result == 0)
 	{
@@ -103,6 +108,11 @@ fork_process(void)
 		/* do post-fork initialization for random number generation */
 		pg_strong_random_init();
 	}
+	else
+	{
+		/* in parent, restore signal mask */
+		sigprocmask(SIG_SETMASK, &save_mask, NULL);
+	}
 
 	return result;
 }
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a8a246921f..d51202f53f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -70,7 +70,6 @@
 #include <time.h>
 #include <sys/wait.h>
 #include <ctype.h>
-#include <sys/select.h>
 #include <sys/stat.h>
 #include <sys/socket.h>
 #include <fcntl.h>
@@ -362,6 +361,15 @@ static volatile sig_atomic_t WalReceiverRequested = false;
 static volatile bool StartWorkerNeeded = true;
 static volatile bool HaveCrashedWorker = false;
 
+/* set when signals arrive */
+static volatile sig_atomic_t pending_action_request;
+static volatile sig_atomic_t pending_child_exit;
+static volatile sig_atomic_t pending_reload_request;
+static volatile sig_atomic_t pending_shutdown_request;
+
+/* I/O multiplexing event */
+static WaitEventSet *wait_set;
+
 #ifdef USE_SSL
 /* Set when and if SSL has been initialized properly */
 static bool LoadedSSL = false;
@@ -380,10 +388,14 @@ static void getInstallationPaths(const char *argv0);
 static void checkControlFile(void);
 static Port *ConnCreate(int serverFd);
 static void ConnFree(Port *port);
-static void SIGHUP_handler(SIGNAL_ARGS);
-static void pmdie(SIGNAL_ARGS);
-static void reaper(SIGNAL_ARGS);
-static void sigusr1_handler(SIGNAL_ARGS);
+static void handle_action_request_signal(SIGNAL_ARGS);
+static void handle_child_exit_signal(SIGNAL_ARGS);
+static void handle_reload_request_signal(SIGNAL_ARGS);
+static void handle_shutdown_request_signal(SIGNAL_ARGS);
+static void process_action_request(void);
+static void process_child_exit(void);
+static void process_reload_request(void);
+static void process_shutdown_request(void);
 static void process_startup_packet_die(SIGNAL_ARGS);
 static void dummy_handler(SIGNAL_ARGS);
 static void StartupPacketTimeoutHandler(void);
@@ -401,7 +413,6 @@ static int	BackendStartup(Port *port);
 static int	ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done);
 static void SendNegotiateProtocolVersion(List *unrecognized_protocol_options);
 static void processCancelRequest(Port *port, void *pkt);
-static int	initMasks(fd_set *rmask);
 static void report_fork_failure_to_client(Port *port, int errnum);
 static CAC_state canAcceptConnections(int backend_type);
 static bool RandomCancelKey(int32 *cancel_key);
@@ -609,26 +620,6 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * Set up signal handlers for the postmaster process.
 	 *
-	 * In the postmaster, we use pqsignal_pm() rather than pqsignal() (which
-	 * is used by all child processes and client processes).  That has a
-	 * couple of special behaviors:
-	 *
-	 * 1. We tell sigaction() to block all signals for the duration of the
-	 * signal handler.  This is faster than our old approach of
-	 * blocking/unblocking explicitly in the signal handler, and it should also
-	 * prevent excessive stack consumption if signals arrive quickly.
-	 *
-	 * 2. We do not set the SA_RESTART flag.  This is because signals will be
-	 * blocked at all times except when ServerLoop is waiting for something to
-	 * happen, and during that window, we want signals to exit the select(2)
-	 * wait so that ServerLoop can respond if anything interesting happened.
-	 * On some platforms, signals marked SA_RESTART would not cause the
-	 * select() wait to end.
-	 *
-	 * Child processes will generally want SA_RESTART, so pqsignal() sets that
-	 * flag.  We expect children to set up their own handlers before
-	 * unblocking signals.
-	 *
 	 * CAUTION: when changing this list, check for side-effects on the signal
 	 * handling setup of child processes.  See tcop/postgres.c,
 	 * bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/walwriter.c,
@@ -638,26 +629,21 @@ PostmasterMain(int argc, char *argv[])
 	pqinitmask();
 	PG_SETMASK(&BlockSig);
 
-	pqsignal_pm(SIGHUP, SIGHUP_handler);	/* reread config file and have
-											 * children do same */
-	pqsignal_pm(SIGINT, pmdie); /* send SIGTERM and shut down */
-	pqsignal_pm(SIGQUIT, pmdie);	/* send SIGQUIT and die */
-	pqsignal_pm(SIGTERM, pmdie);	/* wait for children and shut down */
-	pqsignal_pm(SIGALRM, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGPIPE, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGUSR1, sigusr1_handler);	/* message from child process */
-	pqsignal_pm(SIGUSR2, dummy_handler);	/* unused, reserve for children */
-	pqsignal_pm(SIGCHLD, reaper);	/* handle child termination */
+	pqsignal(SIGHUP, handle_reload_request_signal);
+	pqsignal(SIGINT, handle_shutdown_request_signal);
+	pqsignal(SIGQUIT, handle_shutdown_request_signal);
+	pqsignal(SIGTERM, handle_shutdown_request_signal);
+	pqsignal(SIGALRM, SIG_IGN);	/* ignored */
+	pqsignal(SIGPIPE, SIG_IGN);	/* ignored */
+	pqsignal(SIGUSR1, handle_action_request_signal);
+	pqsignal(SIGUSR2, dummy_handler);	/* unused, reserve for children */
+	pqsignal(SIGCHLD, handle_child_exit_signal);
 
-#ifdef SIGURG
+	/* This may configure SIGURG, depending on platform. */
+	InitializeLatchSupport();
+	InitProcessLocalLatch();
 
-	/*
-	 * Ignore SIGURG for now.  Child processes may change this (see
-	 * InitializeLatchSupport), but they will not receive any such signals
-	 * until they wait on a latch.
-	 */
-	pqsignal_pm(SIGURG, SIG_IGN);	/* ignored */
-#endif
+	PG_SETMASK(&UnBlockSig);
 
 	/*
 	 * No other place in Postgres should touch SIGTTIN/SIGTTOU handling.  We
@@ -667,15 +653,15 @@ PostmasterMain(int argc, char *argv[])
 	 * child processes should just allow the inherited settings to stand.
 	 */
 #ifdef SIGTTIN
-	pqsignal_pm(SIGTTIN, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTIN, SIG_IGN);	/* ignored */
 #endif
 #ifdef SIGTTOU
-	pqsignal_pm(SIGTTOU, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTOU, SIG_IGN);	/* ignored */
 #endif
 
 	/* ignore SIGXFSZ, so that ulimit violations work like disk full */
 #ifdef SIGXFSZ
-	pqsignal_pm(SIGXFSZ, SIG_IGN);	/* ignored */
+	pqsignal(SIGXFSZ, SIG_IGN);	/* ignored */
 #endif
 
 	/*
@@ -1698,6 +1684,37 @@ DetermineSleepTime(struct timeval *timeout)
 	}
 }
 
+/*
+ * Activate or deactivate notifications of server socket events.  Since we
+ * don't currently have a way to remove events from an existing WaitEventSet,
+ * we'll just destroy and recreate the whole thing.  This is called during
+ * shutdown so we can wait for backends to exit without accepting new
+ * connections, and during crash reinitialization when we need to start
+ * listening for new connections again.
+ */
+static void
+ConfigurePostmasterWaitSet(bool accept_connections)
+{
+	if (wait_set)
+		FreeWaitEventSet(wait_set);
+	wait_set = NULL;
+
+	wait_set = CreateWaitEventSet(CurrentMemoryContext, 1 + MAXLISTEN);
+	AddWaitEventToSet(wait_set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
+
+	if (accept_connections)
+	{
+		for (int i = 0; i < MAXLISTEN; i++)
+		{
+			int			fd = ListenSocket[i];
+
+			if (fd == PGINVALID_SOCKET)
+				break;
+			AddWaitEventToSet(wait_set, WL_SOCKET_ACCEPT, fd, NULL, NULL);
+		}
+	}
+}
+
 /*
  * Main idle loop of postmaster
  *
@@ -1706,97 +1723,62 @@ DetermineSleepTime(struct timeval *timeout)
 static int
 ServerLoop(void)
 {
-	fd_set		readmask;
-	int			nSockets;
 	time_t		last_lockfile_recheck_time,
 				last_touch_time;
+	WaitEvent	events[MAXLISTEN];
+	int			nevents;
 
+	ConfigurePostmasterWaitSet(true);
 	last_lockfile_recheck_time = last_touch_time = time(NULL);
 
-	nSockets = initMasks(&readmask);
-
 	for (;;)
 	{
-		fd_set		rmask;
-		int			selres;
 		time_t		now;
+		struct timeval timeout;
 
-		/*
-		 * Wait for a connection request to arrive.
-		 *
-		 * We block all signals except while sleeping. That makes it safe for
-		 * signal handlers, which again block all signals while executing, to
-		 * do nontrivial work.
-		 *
-		 * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
-		 * any new connections, so we don't call select(), and just sleep.
-		 */
-		memcpy((char *) &rmask, (char *) &readmask, sizeof(fd_set));
-
-		if (pmState == PM_WAIT_DEAD_END)
-		{
-			PG_SETMASK(&UnBlockSig);
+		DetermineSleepTime(&timeout);
 
-			pg_usleep(100000L); /* 100 msec seems reasonable */
-			selres = 0;
-
-			PG_SETMASK(&BlockSig);
-		}
-		else
-		{
-			/* must set timeout each time; some OSes change it! */
-			struct timeval timeout;
-
-			/* Needs to run with blocked signals! */
-			DetermineSleepTime(&timeout);
-
-			PG_SETMASK(&UnBlockSig);
-
-			selres = select(nSockets, &rmask, NULL, NULL, &timeout);
-
-			PG_SETMASK(&BlockSig);
-		}
-
-		/* Now check the select() result */
-		if (selres < 0)
-		{
-			if (errno != EINTR && errno != EWOULDBLOCK)
-			{
-				ereport(LOG,
-						(errcode_for_socket_access(),
-						 errmsg("select() failed in postmaster: %m")));
-				return STATUS_ERROR;
-			}
-		}
+		nevents = WaitEventSetWait(wait_set,
+								   timeout.tv_sec * 1000 + timeout.tv_usec / 1000,
+								   events,
+								   lengthof(events),
+								   0 /* postmaster posts no wait_events */);
 
 		/*
-		 * New connection pending on any of our sockets? If so, fork a child
-		 * process to deal with it.
+		 * Latch set by signal handler, or new connection pending on any of our
+		 * sockets? If the latter, fork a child process to deal with it.
 		 */
-		if (selres > 0)
+		for (int i = 0; i < nevents; i++)
 		{
-			int			i;
-
-			for (i = 0; i < MAXLISTEN; i++)
+			if (events[i].events & WL_LATCH_SET)
 			{
-				if (ListenSocket[i] == PGINVALID_SOCKET)
-					break;
-				if (FD_ISSET(ListenSocket[i], &rmask))
+				ResetLatch(MyLatch);
+
+				/* Process work scheduled by signal handlers. */
+				if (pending_shutdown_request)
+					process_shutdown_request();
+				if (pending_child_exit)
+					process_child_exit();
+				if (pending_reload_request)
+					process_reload_request();
+				if (pending_action_request)
+					process_action_request();
+			}
+			else if (events[i].events & WL_SOCKET_ACCEPT)
+			{
+				Port	   *port;
+
+				port = ConnCreate(events[i].fd);
+				if (port)
 				{
-					Port	   *port;
+					BackendStartup(port);
 
-					port = ConnCreate(ListenSocket[i]);
-					if (port)
-					{
-						BackendStartup(port);
-
-						/*
-						 * We no longer need the open socket or port structure
-						 * in this process
-						 */
-						StreamClose(port->sock);
-						ConnFree(port);
-					}
+					/*
+					 * We no longer need the open socket or port structure
+					 * in this process
+					 */
+					StreamClose(port->sock);
+					ConnFree(port);
 				}
 			}
 		}
@@ -1939,34 +1921,6 @@ ServerLoop(void)
 	}
 }
 
-/*
- * Initialise the masks for select() for the ports we are listening on.
- * Return the number of sockets to listen on.
- */
-static int
-initMasks(fd_set *rmask)
-{
-	int			maxsock = -1;
-	int			i;
-
-	FD_ZERO(rmask);
-
-	for (i = 0; i < MAXLISTEN; i++)
-	{
-		int			fd = ListenSocket[i];
-
-		if (fd == PGINVALID_SOCKET)
-			break;
-		FD_SET(fd, rmask);
-
-		if (fd > maxsock)
-			maxsock = fd;
-	}
-
-	return maxsock + 1;
-}
-
-
 /*
  * Read a client's startup packet and do something according to it.
  *
@@ -2609,6 +2563,10 @@ ClosePostmasterPorts(bool am_syslogger)
 {
 	int			i;
 
+	/* Release resources held by the postmaster's WaitEventSet. */
+	if (wait_set)
+		FreeWaitEventSetAfterFork(wait_set);
+
 #ifndef WIN32
 
 	/*
@@ -2707,14 +2665,45 @@ InitProcessGlobals(void)
 #endif
 }
 
+/*
+ * Child processes use SIGUSR1 to send 'pmsignals'.  pg_ctl uses SIGUSR1 to ask
+ * postmaster to check for logrotate and promote files.
+ */
+static void
+handle_action_request_signal(SIGNAL_ARGS)
+{
+	int save_errno = errno;
+
+	pending_action_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
 
 /*
- * SIGHUP -- reread config files, and tell children to do same
+ * pg_ctl uses SIGHUP to request a reload of the configuration files.
  */
 static void
-SIGHUP_handler(SIGNAL_ARGS)
+handle_reload_request_signal(SIGNAL_ARGS)
 {
-	int			save_errno = errno;
+	int save_errno = errno;
+
+	pending_reload_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Re-read config files, and tell children to do same.
+ */
+static void
+process_reload_request(void)
+{
+	pending_reload_request = false;
+
+	ereport(DEBUG2,
+			(errmsg_internal("postmaster received reload request signal")));
 
 	if (Shutdown <= SmartShutdown)
 	{
@@ -2771,27 +2760,50 @@ SIGHUP_handler(SIGNAL_ARGS)
 		write_nondefault_variables(PGC_SIGHUP);
 #endif
 	}
+}
+
+/*
+ * pg_ctl uses SIGTERM, SIGINT and SIGQUIT to request different types of
+ * shutdown.
+ */
+static void
+handle_shutdown_request_signal(SIGNAL_ARGS)
+{
+	int save_errno = errno;
+
+	switch (postgres_signal_arg)
+	{
+		case SIGTERM:
+			pending_shutdown_request = SmartShutdown;
+			break;
+		case SIGINT:
+			pending_shutdown_request = FastShutdown;
+			break;
+		case SIGQUIT:
+			pending_shutdown_request = ImmediateShutdown;
+			break;
+	}
+	SetLatch(MyLatch);
 
 	errno = save_errno;
 }
 
-
 /*
- * pmdie -- signal handler for processing various postmaster signals.
+ * Process shutdown request.
  */
 static void
-pmdie(SIGNAL_ARGS)
+process_shutdown_request(void)
 {
-	int			save_errno = errno;
+	int		mode = pending_shutdown_request;
 
 	ereport(DEBUG2,
-			(errmsg_internal("postmaster received signal %d",
-							 postgres_signal_arg)));
+			(errmsg_internal("postmaster received shutdown request signal")));
 
-	switch (postgres_signal_arg)
-	{
-		case SIGTERM:
+	pending_shutdown_request = NoShutdown;
 
+	switch (mode)
+	{
+		case SmartShutdown:
 			/*
 			 * Smart Shutdown:
 			 *
@@ -2830,7 +2842,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGINT:
+		case FastShutdown:
 
 			/*
 			 * Fast Shutdown:
@@ -2871,7 +2883,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGQUIT:
+		case ImmediateShutdown:
 
 			/*
 			 * Immediate Shutdown:
@@ -2908,20 +2920,30 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 	}
+}
+
+static void
+handle_child_exit_signal(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	pending_child_exit = true;
+	SetLatch(MyLatch);
 
 	errno = save_errno;
 }
 
 /*
- * Reaper -- signal handler to cleanup after a child process dies.
+ * Cleanup after a child process dies.
  */
 static void
-reaper(SIGNAL_ARGS)
+process_child_exit(void)
 {
-	int			save_errno = errno;
 	int			pid;			/* process id of dead child process */
 	int			exitstatus;		/* its exit status */
 
+	pending_child_exit = false;
+
 	ereport(DEBUG4,
 			(errmsg_internal("reaping dead processes")));
 
@@ -3213,8 +3235,6 @@ reaper(SIGNAL_ARGS)
 	 * or actions to make.
 	 */
 	PostmasterStateMachine();
-
-	errno = save_errno;
 }
 
 /*
@@ -3642,8 +3662,9 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 /*
  * Advance the postmaster's state machine and take actions as appropriate
  *
- * This is common code for pmdie(), reaper() and sigusr1_handler(), which
- * receive the signals that might mean we need to change state.
+ * This is common code for process_shutdown_request(), process_child_exit() and
+ * process_action_request(), which process the signals that might mean we need
+ * to change state.
  */
 static void
 PostmasterStateMachine(void)
@@ -3796,6 +3817,9 @@ PostmasterStateMachine(void)
 
 	if (pmState == PM_WAIT_DEAD_END)
 	{
+		/* Don't allow any new socket connection events. */
+		ConfigurePostmasterWaitSet(false);
+
 		/*
 		 * PM_WAIT_DEAD_END state ends when the BackendList is entirely empty
 		 * (ie, no dead_end children remain), and the archiver is gone too.
@@ -3905,6 +3929,9 @@ PostmasterStateMachine(void)
 		pmState = PM_STARTUP;
 		/* crash recovery started, reset SIGKILL flag */
 		AbortStartTime = 0;
+
+		/* start accepting server socket connection events again */
+		ConfigurePostmasterWaitSet(true);
 	}
 }
 
@@ -5013,12 +5040,16 @@ ExitPostmaster(int status)
 }
 
 /*
- * sigusr1_handler - handle signal conditions from child processes
+ * Handle pmsignal conditions representing requests from backends,
+ * and check for promote and logrotate requests from pg_ctl.
  */
 static void
-sigusr1_handler(SIGNAL_ARGS)
+process_action_request(void)
 {
-	int			save_errno = errno;
+	pending_action_request = false;
+
+	ereport(DEBUG2,
+			(errmsg_internal("postmaster received action request signal")));
 
 	/*
 	 * RECOVERY_STARTED and BEGIN_HOT_STANDBY signals are ignored in
@@ -5159,8 +5190,6 @@ sigusr1_handler(SIGNAL_ARGS)
 		 */
 		signal_child(StartupPID, SIGUSR2);
 	}
-
-	errno = save_errno;
 }
 
 /*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3082093d1e..655e881688 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -24,7 +24,6 @@
 #include <signal.h>
 #include <unistd.h>
 #include <sys/resource.h>
-#include <sys/select.h>
 #include <sys/socket.h>
 #include <sys/time.h>
 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index eb1046450b..1a8885b73e 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -135,8 +135,7 @@ InitPostmasterChild(void)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitProcessLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -189,8 +188,7 @@ InitStandaloneProcess(const char *argv0)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitProcessLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -232,6 +230,13 @@ SwitchToSharedLatch(void)
 	SetLatch(MyLatch);
 }
 
+void
+InitProcessLocalLatch(void)
+{
+	MyLatch = &LocalLatchData;
+	InitLatch(MyLatch);
+}
+
 void
 SwitchBackToLocalLatch(void)
 {
diff --git a/src/include/libpq/pqsignal.h b/src/include/libpq/pqsignal.h
index 7890b426a8..76eb380a4f 100644
--- a/src/include/libpq/pqsignal.h
+++ b/src/include/libpq/pqsignal.h
@@ -53,7 +53,4 @@ extern PGDLLIMPORT sigset_t StartupBlockSig;
 
 extern void pqinitmask(void);
 
-/* pqsigfunc is declared in src/include/port.h */
-extern pqsigfunc pqsignal_pm(int signo, pqsigfunc func);
-
 #endif							/* PQSIGNAL_H */
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 795182fa51..f64f81cf00 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -310,6 +310,7 @@ extern PGDLLIMPORT char *DatabasePath;
 /* now in utils/init/miscinit.c */
 extern void InitPostmasterChild(void);
 extern void InitStandaloneProcess(const char *argv0);
+extern void InitProcessLocalLatch(void);
 extern void SwitchToSharedLatch(void);
 extern void SwitchBackToLocalLatch(void);
 
-- 
2.35.1

#14Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#13)
1 attachment(s)
Re: Using WaitEventSet in the postmaster

I pushed the small preliminary patches. Here's a rebase of the main patch.

Attachments:

v7-0001-Give-the-postmaster-a-WaitEventSet-and-a-latch.patch (text/x-patch; charset=US-ASCII)
From d23fba75cf693ffabc068a36424b7be22342c1b2 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 9 Nov 2022 22:59:58 +1300
Subject: [PATCH v7] Give the postmaster a WaitEventSet and a latch.

Traditionally, the postmaster's architecture was quite unusual.  It did
a lot of work inside signal handlers, which were only unblocked while
waiting in select() to make that safe.

Switch to a more typical architecture, where signal handlers just set
flags and use a latch to close races.  Now the postmaster looks like
all other PostgreSQL processes, multiplexing its event processing in
epoll_wait()/kevent()/poll()/WaitForMultipleObjects() depending on the
OS.

Changes:

 * Allow the postmaster to set up its own local latch.  For now we don't
   want other backends setting the postmaster's latch directly (that
   would require latches robust against arbitrary corruption of shared
   memory).

 * The existing signal handlers are cut in two: a handle_XXX part that
   sets a pending_XXX variable and sets the local latch, and a
   process_XXX part that ServerLoop runs to do the real work once the
   latch wakes it up.

 * Signal handlers are now installed with the regular pqsignal()
   function rather than the special pqsignal_pm() function.  The
   concerns about the portability of SA_RESTART vs select() are no
   longer relevant: SUSv2 left it implementation-defined whether
   select() restarts, but didn't add that qualification for poll(), and
   it doesn't matter anyway because we call SetLatch(), creating a new
   reason to wake up.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/libpq/pqsignal.c          |  40 ---
 src/backend/postmaster/fork_process.c |  12 +-
 src/backend/postmaster/postmaster.c   | 379 ++++++++++++++------------
 src/backend/tcop/postgres.c           |   1 -
 src/backend/utils/init/miscinit.c     |  13 +-
 src/include/libpq/pqsignal.h          |   3 -
 src/include/miscadmin.h               |   1 +
 7 files changed, 225 insertions(+), 224 deletions(-)

diff --git a/src/backend/libpq/pqsignal.c b/src/backend/libpq/pqsignal.c
index 1ab34c5214..718043a39d 100644
--- a/src/backend/libpq/pqsignal.c
+++ b/src/backend/libpq/pqsignal.c
@@ -97,43 +97,3 @@ pqinitmask(void)
 	sigdelset(&StartupBlockSig, SIGALRM);
 #endif
 }
-
-/*
- * Set up a postmaster signal handler for signal "signo"
- *
- * Returns the previous handler.
- *
- * This is used only in the postmaster, which has its own odd approach to
- * signal handling.  For signals with handlers, we block all signals for the
- * duration of signal handler execution.  We also do not set the SA_RESTART
- * flag; this should be safe given the tiny range of code in which the
- * postmaster ever unblocks signals.
- *
- * pqinitmask() must have been invoked previously.
- */
-pqsigfunc
-pqsignal_pm(int signo, pqsigfunc func)
-{
-	struct sigaction act,
-				oact;
-
-	act.sa_handler = func;
-	if (func == SIG_IGN || func == SIG_DFL)
-	{
-		/* in these cases, act the same as pqsignal() */
-		sigemptyset(&act.sa_mask);
-		act.sa_flags = SA_RESTART;
-	}
-	else
-	{
-		act.sa_mask = BlockSig;
-		act.sa_flags = 0;
-	}
-#ifdef SA_NOCLDSTOP
-	if (signo == SIGCHLD)
-		act.sa_flags |= SA_NOCLDSTOP;
-#endif
-	if (sigaction(signo, &act, &oact) < 0)
-		return SIG_ERR;
-	return oact.sa_handler;
-}
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index ec67761487..e1e7d91c52 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -12,24 +12,28 @@
 #include "postgres.h"
 
 #include <fcntl.h>
+#include <signal.h>
 #include <time.h>
 #include <sys/stat.h>
 #include <sys/time.h>
 #include <unistd.h>
 
+#include "libpq/pqsignal.h"
 #include "postmaster/fork_process.h"
 
 #ifndef WIN32
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
- * child in the parent process.
+ * child in the parent process.  Signals are blocked while forking, so
+ * the child must unblock.
  */
 pid_t
 fork_process(void)
 {
 	pid_t		result;
 	const char *oomfilename;
+	sigset_t	save_mask;
 
 #ifdef LINUX_PROFILE
 	struct itimerval prof_itimer;
@@ -51,6 +55,7 @@ fork_process(void)
 	getitimer(ITIMER_PROF, &prof_itimer);
 #endif
 
+	sigprocmask(SIG_SETMASK, &BlockSig, &save_mask);
 	result = fork();
 	if (result == 0)
 	{
@@ -103,6 +108,11 @@ fork_process(void)
 		/* do post-fork initialization for random number generation */
 		pg_strong_random_init();
 	}
+	else
+	{
+		/* in parent, restore signal mask */
+		sigprocmask(SIG_SETMASK, &save_mask, NULL);
+	}
 
 	return result;
 }
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index f459dab360..e107c18ff7 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -70,7 +70,6 @@
 #include <time.h>
 #include <sys/wait.h>
 #include <ctype.h>
-#include <sys/select.h>
 #include <sys/stat.h>
 #include <sys/socket.h>
 #include <fcntl.h>
@@ -362,6 +361,15 @@ static volatile sig_atomic_t WalReceiverRequested = false;
 static volatile bool StartWorkerNeeded = true;
 static volatile bool HaveCrashedWorker = false;
 
+/* set when signals arrive */
+static volatile sig_atomic_t pending_action_request;
+static volatile sig_atomic_t pending_child_exit;
+static volatile sig_atomic_t pending_reload_request;
+static volatile sig_atomic_t pending_shutdown_request;
+
+/* I/O multiplexing object */
+static WaitEventSet *wait_set;
+
 #ifdef USE_SSL
 /* Set when and if SSL has been initialized properly */
 static bool LoadedSSL = false;
@@ -380,10 +388,14 @@ static void getInstallationPaths(const char *argv0);
 static void checkControlFile(void);
 static Port *ConnCreate(int serverFd);
 static void ConnFree(Port *port);
-static void SIGHUP_handler(SIGNAL_ARGS);
-static void pmdie(SIGNAL_ARGS);
-static void reaper(SIGNAL_ARGS);
-static void sigusr1_handler(SIGNAL_ARGS);
+static void handle_action_request_signal(SIGNAL_ARGS);
+static void handle_child_exit_signal(SIGNAL_ARGS);
+static void handle_reload_request_signal(SIGNAL_ARGS);
+static void handle_shutdown_request_signal(SIGNAL_ARGS);
+static void process_action_request(void);
+static void process_child_exit(void);
+static void process_reload_request(void);
+static void process_shutdown_request(void);
 static void process_startup_packet_die(SIGNAL_ARGS);
 static void dummy_handler(SIGNAL_ARGS);
 static void StartupPacketTimeoutHandler(void);
@@ -401,7 +413,6 @@ static int	BackendStartup(Port *port);
 static int	ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done);
 static void SendNegotiateProtocolVersion(List *unrecognized_protocol_options);
 static void processCancelRequest(Port *port, void *pkt);
-static int	initMasks(fd_set *rmask);
 static void report_fork_failure_to_client(Port *port, int errnum);
 static CAC_state canAcceptConnections(int backend_type);
 static bool RandomCancelKey(int32 *cancel_key);
@@ -609,26 +620,6 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * Set up signal handlers for the postmaster process.
 	 *
-	 * In the postmaster, we use pqsignal_pm() rather than pqsignal() (which
-	 * is used by all child processes and client processes).  That has a
-	 * couple of special behaviors:
-	 *
-	 * 1. We tell sigaction() to block all signals for the duration of the
-	 * signal handler.  This is faster than our old approach of
-	 * blocking/unblocking explicitly in the signal handler, and it should also
-	 * prevent excessive stack consumption if signals arrive quickly.
-	 *
-	 * 2. We do not set the SA_RESTART flag.  This is because signals will be
-	 * blocked at all times except when ServerLoop is waiting for something to
-	 * happen, and during that window, we want signals to exit the select(2)
-	 * wait so that ServerLoop can respond if anything interesting happened.
-	 * On some platforms, signals marked SA_RESTART would not cause the
-	 * select() wait to end.
-	 *
-	 * Child processes will generally want SA_RESTART, so pqsignal() sets that
-	 * flag.  We expect children to set up their own handlers before
-	 * unblocking signals.
-	 *
 	 * CAUTION: when changing this list, check for side-effects on the signal
 	 * handling setup of child processes.  See tcop/postgres.c,
 	 * bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/walwriter.c,
@@ -638,26 +629,21 @@ PostmasterMain(int argc, char *argv[])
 	pqinitmask();
 	PG_SETMASK(&BlockSig);
 
-	pqsignal_pm(SIGHUP, SIGHUP_handler);	/* reread config file and have
-											 * children do same */
-	pqsignal_pm(SIGINT, pmdie); /* send SIGTERM and shut down */
-	pqsignal_pm(SIGQUIT, pmdie);	/* send SIGQUIT and die */
-	pqsignal_pm(SIGTERM, pmdie);	/* wait for children and shut down */
-	pqsignal_pm(SIGALRM, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGPIPE, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGUSR1, sigusr1_handler);	/* message from child process */
-	pqsignal_pm(SIGUSR2, dummy_handler);	/* unused, reserve for children */
-	pqsignal_pm(SIGCHLD, reaper);	/* handle child termination */
+	pqsignal(SIGHUP, handle_reload_request_signal);
+	pqsignal(SIGINT, handle_shutdown_request_signal);
+	pqsignal(SIGQUIT, handle_shutdown_request_signal);
+	pqsignal(SIGTERM, handle_shutdown_request_signal);
+	pqsignal(SIGALRM, SIG_IGN);	/* ignored */
+	pqsignal(SIGPIPE, SIG_IGN);	/* ignored */
+	pqsignal(SIGUSR1, handle_action_request_signal);
+	pqsignal(SIGUSR2, dummy_handler);	/* unused, reserve for children */
+	pqsignal(SIGCHLD, handle_child_exit_signal);
 
-#ifdef SIGURG
+	/* This may configure SIGURG, depending on platform. */
+	InitializeLatchSupport();
+	InitProcessLocalLatch();
 
-	/*
-	 * Ignore SIGURG for now.  Child processes may change this (see
-	 * InitializeLatchSupport), but they will not receive any such signals
-	 * until they wait on a latch.
-	 */
-	pqsignal_pm(SIGURG, SIG_IGN);	/* ignored */
-#endif
+	PG_SETMASK(&UnBlockSig);
 
 	/*
 	 * No other place in Postgres should touch SIGTTIN/SIGTTOU handling.  We
@@ -667,15 +653,15 @@ PostmasterMain(int argc, char *argv[])
 	 * child processes should just allow the inherited settings to stand.
 	 */
 #ifdef SIGTTIN
-	pqsignal_pm(SIGTTIN, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTIN, SIG_IGN);	/* ignored */
 #endif
 #ifdef SIGTTOU
-	pqsignal_pm(SIGTTOU, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTOU, SIG_IGN);	/* ignored */
 #endif
 
 	/* ignore SIGXFSZ, so that ulimit violations work like disk full */
 #ifdef SIGXFSZ
-	pqsignal_pm(SIGXFSZ, SIG_IGN);	/* ignored */
+	pqsignal(SIGXFSZ, SIG_IGN);	/* ignored */
 #endif
 
 	/*
@@ -1698,6 +1684,37 @@ DetermineSleepTime(struct timeval *timeout)
 	}
 }
 
+/*
+ * Activate or deactivate notifications of server socket events.  Since we
+ * don't currently have a way to remove events from an existing WaitEventSet,
+ * we'll just destroy and recreate the whole thing.  This is called during
+ * shutdown so we can wait for backends to exit without accepting new
+ * connections, and during crash reinitialization when we need to start
+ * listening for new connections again.
+ */
+static void
+ConfigurePostmasterWaitSet(bool accept_connections)
+{
+	if (wait_set)
+		FreeWaitEventSet(wait_set);
+	wait_set = NULL;
+
+	wait_set = CreateWaitEventSet(CurrentMemoryContext, 1 + MAXLISTEN);
+	AddWaitEventToSet(wait_set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
+
+	if (accept_connections)
+	{
+		for (int i = 0; i < MAXLISTEN; i++)
+		{
+			int			fd = ListenSocket[i];
+
+			if (fd == PGINVALID_SOCKET)
+				break;
+			AddWaitEventToSet(wait_set, WL_SOCKET_ACCEPT, fd, NULL, NULL);
+		}
+	}
+}
+
 /*
  * Main idle loop of postmaster
  *
@@ -1706,97 +1723,62 @@ DetermineSleepTime(struct timeval *timeout)
 static int
 ServerLoop(void)
 {
-	fd_set		readmask;
-	int			nSockets;
 	time_t		last_lockfile_recheck_time,
 				last_touch_time;
+	WaitEvent	events[MAXLISTEN];
+	int			nevents;
 
+	ConfigurePostmasterWaitSet(true);
 	last_lockfile_recheck_time = last_touch_time = time(NULL);
 
-	nSockets = initMasks(&readmask);
-
 	for (;;)
 	{
-		fd_set		rmask;
-		int			selres;
 		time_t		now;
+		struct timeval timeout;
 
-		/*
-		 * Wait for a connection request to arrive.
-		 *
-		 * We block all signals except while sleeping. That makes it safe for
-		 * signal handlers, which again block all signals while executing, to
-		 * do nontrivial work.
-		 *
-		 * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
-		 * any new connections, so we don't call select(), and just sleep.
-		 */
-		memcpy((char *) &rmask, (char *) &readmask, sizeof(fd_set));
-
-		if (pmState == PM_WAIT_DEAD_END)
-		{
-			PG_SETMASK(&UnBlockSig);
+		DetermineSleepTime(&timeout);
 
-			pg_usleep(100000L); /* 100 msec seems reasonable */
-			selres = 0;
-
-			PG_SETMASK(&BlockSig);
-		}
-		else
-		{
-			/* must set timeout each time; some OSes change it! */
-			struct timeval timeout;
-
-			/* Needs to run with blocked signals! */
-			DetermineSleepTime(&timeout);
-
-			PG_SETMASK(&UnBlockSig);
-
-			selres = select(nSockets, &rmask, NULL, NULL, &timeout);
-
-			PG_SETMASK(&BlockSig);
-		}
-
-		/* Now check the select() result */
-		if (selres < 0)
-		{
-			if (errno != EINTR && errno != EWOULDBLOCK)
-			{
-				ereport(LOG,
-						(errcode_for_socket_access(),
-						 errmsg("select() failed in postmaster: %m")));
-				return STATUS_ERROR;
-			}
-		}
+		nevents = WaitEventSetWait(wait_set,
+								   timeout.tv_sec * 1000 + timeout.tv_usec / 1000,
+								   events,
+								   lengthof(events),
+								   0 /* postmaster posts no wait_events */);
 
 		/*
-		 * New connection pending on any of our sockets? If so, fork a child
-		 * process to deal with it.
+		 * Latch set by signal handler, or new connection pending on any of our
+		 * sockets? If the latter, fork a child process to deal with it.
 		 */
-		if (selres > 0)
+		for (int i = 0; i < nevents; i++)
 		{
-			int			i;
-
-			for (i = 0; i < MAXLISTEN; i++)
+			if (events[i].events & WL_LATCH_SET)
 			{
-				if (ListenSocket[i] == PGINVALID_SOCKET)
-					break;
-				if (FD_ISSET(ListenSocket[i], &rmask))
+				ResetLatch(MyLatch);
+
+				/* Process work scheduled by signal handlers. */
+				if (pending_shutdown_request)
+					process_shutdown_request();
+				if (pending_child_exit)
+					process_child_exit();
+				if (pending_reload_request)
+					process_reload_request();
+				if (pending_action_request)
+					process_action_request();
+			}
+			else if (events[i].events & WL_SOCKET_ACCEPT)
+			{
+				Port	   *port;
+
+				port = ConnCreate(events[i].fd);
+				if (port)
 				{
-					Port	   *port;
+					BackendStartup(port);
 
-					port = ConnCreate(ListenSocket[i]);
-					if (port)
-					{
-						BackendStartup(port);
-
-						/*
-						 * We no longer need the open socket or port structure
-						 * in this process
-						 */
-						StreamClose(port->sock);
-						ConnFree(port);
-					}
+					/*
+					 * We no longer need the open socket or port structure
+					 * in this process
+					 */
+					StreamClose(port->sock);
+					ConnFree(port);
 				}
 			}
 		}
@@ -1939,34 +1921,6 @@ ServerLoop(void)
 	}
 }
 
-/*
- * Initialise the masks for select() for the ports we are listening on.
- * Return the number of sockets to listen on.
- */
-static int
-initMasks(fd_set *rmask)
-{
-	int			maxsock = -1;
-	int			i;
-
-	FD_ZERO(rmask);
-
-	for (i = 0; i < MAXLISTEN; i++)
-	{
-		int			fd = ListenSocket[i];
-
-		if (fd == PGINVALID_SOCKET)
-			break;
-		FD_SET(fd, rmask);
-
-		if (fd > maxsock)
-			maxsock = fd;
-	}
-
-	return maxsock + 1;
-}
-
-
 /*
  * Read a client's startup packet and do something according to it.
  *
@@ -2609,6 +2563,10 @@ ClosePostmasterPorts(bool am_syslogger)
 {
 	int			i;
 
+	/* Release resources held by the postmaster's WaitEventSet. */
+	if (wait_set)
+		FreeWaitEventSetAfterFork(wait_set);
+
 #ifndef WIN32
 
 	/*
@@ -2707,14 +2665,45 @@ InitProcessGlobals(void)
 #endif
 }
 
+/*
+ * Child processes use SIGUSR1 to send 'pmsignals'.  pg_ctl uses SIGUSR1 to ask
+ * postmaster to check for logrotate and promote files.
+ */
+static void
+handle_action_request_signal(SIGNAL_ARGS)
+{
+	int save_errno = errno;
+
+	pending_action_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
 
 /*
- * SIGHUP -- reread config files, and tell children to do same
+ * pg_ctl uses SIGHUP to request a reload of the configuration files.
  */
 static void
-SIGHUP_handler(SIGNAL_ARGS)
+handle_reload_request_signal(SIGNAL_ARGS)
 {
-	int			save_errno = errno;
+	int save_errno = errno;
+
+	pending_reload_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Re-read config files, and tell children to do same.
+ */
+static void
+process_reload_request(void)
+{
+	pending_reload_request = false;
+
+	ereport(DEBUG2,
+			(errmsg_internal("postmaster received reload request signal")));
 
 	if (Shutdown <= SmartShutdown)
 	{
@@ -2771,27 +2760,50 @@ SIGHUP_handler(SIGNAL_ARGS)
 		write_nondefault_variables(PGC_SIGHUP);
 #endif
 	}
+}
+
+/*
+ * pg_ctl uses SIGTERM, SIGINT and SIGQUIT to request different types of
+ * shutdown.
+ */
+static void
+handle_shutdown_request_signal(SIGNAL_ARGS)
+{
+	int save_errno = errno;
+
+	switch (postgres_signal_arg)
+	{
+		case SIGTERM:
+			pending_shutdown_request = SmartShutdown;
+			break;
+		case SIGINT:
+			pending_shutdown_request = FastShutdown;
+			break;
+		case SIGQUIT:
+			pending_shutdown_request = ImmediateShutdown;
+			break;
+	}
+	SetLatch(MyLatch);
 
 	errno = save_errno;
 }
 
-
 /*
- * pmdie -- signal handler for processing various postmaster signals.
+ * Process shutdown request.
  */
 static void
-pmdie(SIGNAL_ARGS)
+process_shutdown_request(void)
 {
-	int			save_errno = errno;
+	int		mode = pending_shutdown_request;
 
 	ereport(DEBUG2,
-			(errmsg_internal("postmaster received signal %d",
-							 postgres_signal_arg)));
+			(errmsg_internal("postmaster received shutdown request signal")));
 
-	switch (postgres_signal_arg)
-	{
-		case SIGTERM:
+	pending_shutdown_request = NoShutdown;
 
+	switch (mode)
+	{
+		case SmartShutdown:
 			/*
 			 * Smart Shutdown:
 			 *
@@ -2830,7 +2842,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGINT:
+		case FastShutdown:
 
 			/*
 			 * Fast Shutdown:
@@ -2871,7 +2883,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGQUIT:
+		case ImmediateShutdown:
 
 			/*
 			 * Immediate Shutdown:
@@ -2908,20 +2920,30 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 	}
+}
+
+static void
+handle_child_exit_signal(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	pending_child_exit = true;
+	SetLatch(MyLatch);
 
 	errno = save_errno;
 }
 
 /*
- * Reaper -- signal handler to cleanup after a child process dies.
+ * Cleanup after a child process dies.
  */
 static void
-reaper(SIGNAL_ARGS)
+process_child_exit(void)
 {
-	int			save_errno = errno;
 	int			pid;			/* process id of dead child process */
 	int			exitstatus;		/* its exit status */
 
+	pending_child_exit = false;
+
 	ereport(DEBUG4,
 			(errmsg_internal("reaping dead processes")));
 
@@ -3213,8 +3235,6 @@ reaper(SIGNAL_ARGS)
 	 * or actions to make.
 	 */
 	PostmasterStateMachine();
-
-	errno = save_errno;
 }
 
 /*
@@ -3642,8 +3662,9 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 /*
  * Advance the postmaster's state machine and take actions as appropriate
  *
- * This is common code for pmdie(), reaper() and sigusr1_handler(), which
- * receive the signals that might mean we need to change state.
+ * This is common code for process_shutdown_request(), process_child_exit() and
+ * process_action_request(), which process the signals that might mean we need
+ * to change state.
  */
 static void
 PostmasterStateMachine(void)
@@ -3796,6 +3817,9 @@ PostmasterStateMachine(void)
 
 	if (pmState == PM_WAIT_DEAD_END)
 	{
+		/* Don't allow any new socket connection events. */
+		ConfigurePostmasterWaitSet(false);
+
 		/*
 		 * PM_WAIT_DEAD_END state ends when the BackendList is entirely empty
 		 * (ie, no dead_end children remain), and the archiver is gone too.
@@ -3905,6 +3929,9 @@ PostmasterStateMachine(void)
 		pmState = PM_STARTUP;
 		/* crash recovery started, reset SIGKILL flag */
 		AbortStartTime = 0;
+
+		/* start accepting server socket connection events again */
+		ConfigurePostmasterWaitSet(true);
 	}
 }
 
@@ -5013,12 +5040,16 @@ ExitPostmaster(int status)
 }
 
 /*
- * sigusr1_handler - handle signal conditions from child processes
+ * Handle pmsignal conditions representing requests from backends,
+ * and check for promote and logrotate requests from pg_ctl.
  */
 static void
-sigusr1_handler(SIGNAL_ARGS)
+process_action_request(void)
 {
-	int			save_errno = errno;
+	pending_action_request = false;
+
+	ereport(DEBUG2,
+			(errmsg_internal("postmaster received action request signal")));
 
 	/*
 	 * RECOVERY_STARTED and BEGIN_HOT_STANDBY signals are ignored in
@@ -5159,8 +5190,6 @@ sigusr1_handler(SIGNAL_ARGS)
 		 */
 		signal_child(StartupPID, SIGUSR2);
 	}
-
-	errno = save_errno;
 }
 
 /*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 01d264b5ab..9b320889b0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -24,7 +24,6 @@
 #include <signal.h>
 #include <unistd.h>
 #include <sys/resource.h>
-#include <sys/select.h>
 #include <sys/socket.h>
 #include <sys/time.h>
 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index eb1046450b..1a8885b73e 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -135,8 +135,7 @@ InitPostmasterChild(void)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitProcessLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -189,8 +188,7 @@ InitStandaloneProcess(const char *argv0)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitProcessLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -232,6 +230,13 @@ SwitchToSharedLatch(void)
 	SetLatch(MyLatch);
 }
 
+void
+InitProcessLocalLatch(void)
+{
+	MyLatch = &LocalLatchData;
+	InitLatch(MyLatch);
+}
+
 void
 SwitchBackToLocalLatch(void)
 {
diff --git a/src/include/libpq/pqsignal.h b/src/include/libpq/pqsignal.h
index 7890b426a8..76eb380a4f 100644
--- a/src/include/libpq/pqsignal.h
+++ b/src/include/libpq/pqsignal.h
@@ -53,7 +53,4 @@ extern PGDLLIMPORT sigset_t StartupBlockSig;
 
 extern void pqinitmask(void);
 
-/* pqsigfunc is declared in src/include/port.h */
-extern pqsigfunc pqsignal_pm(int signo, pqsigfunc func);
-
 #endif							/* PQSIGNAL_H */
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 795182fa51..f64f81cf00 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -310,6 +310,7 @@ extern PGDLLIMPORT char *DatabasePath;
 /* now in utils/init/miscinit.c */
 extern void InitPostmasterChild(void);
 extern void InitStandaloneProcess(const char *argv0);
+extern void InitProcessLocalLatch(void);
 extern void SwitchToSharedLatch(void);
 extern void SwitchBackToLocalLatch(void);
 
-- 
2.38.1

#15Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#14)
1 attachment(s)
Re: Using WaitEventSet in the postmaster

On Fri, Dec 23, 2022 at 8:46 PM Thomas Munro <thomas.munro@gmail.com> wrote:

I pushed the small preliminary patches. Here's a rebase of the main patch.

Here are some questions I have considered. Anyone got an opinion
on point 3, in particular?

1. Is it OK that we are now using APIs that might throw, in places
where we weren't? I think so: we don't really expect WaitEventSet
APIs to throw unless something is pretty seriously wrong, and if you
hack things to inject failures there then you get a FATAL error and
the postmaster exits and the children detect that. I think that is
appropriate.

2. Is it really OK to delete the pqsignal_pm() infrastructure? I
think so. The need for sa_mask to block all signals is gone: all
signal handlers should now be re-entrant (ie safe in case of
interruption while already in a signal handler), safe against stack
overflow (pqsignal() still blocks re-entry for the *same* signal
number, because we use sigaction() without SA_NODEFER, so a handler
can only be interrupted by a different signal, and the number of
actions installed is finite and small), and safe to run at any time
(ie safe to interrupt the user context because we just do known-good
sig_atomic_t/syscall stuff and save/restore errno).  The concern about
SA_RESTART is gone, because we no longer use the underspecified
select() interface: the system calls used by the replacement
implementations, even poll(), return with EINTR for handlers installed
with SA_RESTART, and that's moot anyway because the latch guarantees
they return with a different event.  (For the record, select() is
nearly extinct in backend code; I found one other user and I plan to
remove it, see the RADIUS thread, CF #4103.)

3. Is it OK to clobber the shared pending flag for SIGQUIT, SIGTERM,
SIGINT?  If you send all of these extremely rapidly, it's
indeterminate which one will be seen by process_shutdown_request().  I
think that's probably acceptable? To be strict about processing only
the first one that is delivered, I think you'd need an sa_mask to
block all three signals, and then you wouldn't change
pending_shutdown_request if it's already set, which I'm willing to
code up if someone thinks that's important. (<vapourware>Ideally I
would invent WL_SIGNAL to consume signal events serially without
handlers or global variables</vapourware>.)

4. Is anything new leaking into child processes due to this new
infrastructure? I don't think so; the postmaster's MemoryContext is
destroyed, and before that I'm releasing kernel resources on OSes that
need it (namely Linux, where the epoll fd and signalfd need to be
closed).

5. Is the signal mask being correctly handled during forking? I
think so: I decided to push the masking logic directly into the
routine that forks, to make it easy to verify that all paths set the
mask the way we want. (While thinking about that I noticed that
signals don't seem to be initially masked on Windows; I think that's a
pre-existing condition, and I assume we get away with it because
nothing reaches the fake signal dispatch code soon enough to break
anything? Not changing that in this patch.)

6. Is the naming and style OK? Better ideas welcome, but basically I
tried to avoid all unnecessary refactoring and changes, so no real
logic moves around, and the changes are pretty close to "mechanical".
One bikeshed decision was what to call the {handle,process}_XXX
functions and associated flags. Maybe "action" isn't the best name;
but it could be a request from pg_ctl or a request from a child
process. I went with newly invented names for these handlers rather
than "handle_SIGUSR1" etc because (1) the 3 different shutdown request
signals point to a common handler and (2) I hope to switch to latches
instead of SIGUSR1 for "action" in later work. But I could switch to
got_SIGUSR1 style variables if someone thinks it's better.
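Point 3 describes a stricter alternative: use sa_mask to block all three shutdown signals and record a request only if none is pending yet. Below is a minimal standalone sketch of that idea in plain POSIX C, outside the postmaster. The enum, the helper names and the missing latch are assumptions for illustration; this is not the patch's code.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Hypothetical shutdown modes, loosely mirroring the postmaster's names. */
enum
{
	NO_SHUTDOWN = 0,
	SMART_SHUTDOWN,
	FAST_SHUTDOWN,
	IMMEDIATE_SHUTDOWN
};

static volatile sig_atomic_t pending_shutdown_request = NO_SHUTDOWN;

/* The three signals that request shutdown. */
static void
make_shutdown_sigset(sigset_t *set)
{
	sigemptyset(set);
	sigaddset(set, SIGTERM);
	sigaddset(set, SIGINT);
	sigaddset(set, SIGQUIT);
}

/*
 * Runs with all three shutdown signals blocked (see sa_mask below), so the
 * test-and-set cannot interleave with another shutdown handler.
 */
static void
handle_shutdown_request_signal(int signo)
{
	int			save_errno = errno;

	if (pending_shutdown_request == NO_SHUTDOWN)	/* first one wins */
	{
		switch (signo)
		{
			case SIGTERM:
				pending_shutdown_request = SMART_SHUTDOWN;
				break;
			case SIGINT:
				pending_shutdown_request = FAST_SHUTDOWN;
				break;
			case SIGQUIT:
				pending_shutdown_request = IMMEDIATE_SHUTDOWN;
				break;
		}
	}
	/* A real postmaster handler would also call SetLatch(MyLatch) here. */
	errno = save_errno;
}

static int
install_shutdown_handlers(void)
{
	struct sigaction act;

	memset(&act, 0, sizeof(act));
	act.sa_handler = handle_shutdown_request_signal;
	make_shutdown_sigset(&act.sa_mask);
	act.sa_flags = SA_RESTART;
	if (sigaction(SIGTERM, &act, NULL) != 0 ||
		sigaction(SIGINT, &act, NULL) != 0 ||
		sigaction(SIGQUIT, &act, NULL) != 0)
		return -1;
	return 0;
}

/* Main-loop side: read and clear the flag with the same signals blocked. */
static int
consume_shutdown_request(void)
{
	sigset_t	shutdown_sigs,
				save_mask;
	int			mode;

	make_shutdown_sigset(&shutdown_sigs);
	sigprocmask(SIG_BLOCK, &shutdown_sigs, &save_mask);
	mode = pending_shutdown_request;
	pending_shutdown_request = NO_SHUTDOWN;
	sigprocmask(SIG_SETMASK, &save_mask, NULL);
	return mode;
}
```

With this sketch, SIGINT followed quickly by SIGTERM yields FAST_SHUTDOWN. Once that request is consumed the flag resets, so a later SIGQUIT can still escalate. Whether "first one wins" or some other policy, such as letting the most severe mode win, is preferable is the open question in point 3.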

Here's a new version, with small changes:
* remove a stray reference to select() in a pqcomm.c comment
* move PG_SETMASK(&UnBlockSig) below the bit that sets up SIGTTIN etc
* pgindent
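The ordering change above, together with point 2, amounts to a familiar pattern:
1. Block signals.
2. Install ordinary handlers with sigaction() (empty sa_mask, SA_RESTART, no SA_NODEFER) that only set a flag and wake the main loop.
3. Unblock, and do the real work in the loop.

Below is a rough standalone sketch of that pattern. A non-blocking self-pipe stands in for the latch, and the names are hypothetical stand-ins that mirror the patch. The sketch is not PostgreSQL's latch implementation.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-ins for the postmaster's latch and pending flag. */
static int	latch_pipe[2] = {-1, -1};
static volatile sig_atomic_t pending_reload_request = 0;

/* Like SetLatch() from a handler; EAGAIN on a full pipe is harmless. */
static void
set_wakeup(void)
{
	ssize_t		rc = write(latch_pipe[1], "", 1);

	(void) rc;
}

static void
handle_reload_request_signal(int signo)
{
	int			save_errno = errno;

	(void) signo;
	pending_reload_request = 1;
	set_wakeup();
	errno = save_errno;
}

static int
setup_signals(void)
{
	sigset_t	block_all,
				unblock_all;
	struct sigaction act;

	/* 1. Block everything while the handler and wakeup pipe are set up. */
	sigfillset(&block_all);
	sigprocmask(SIG_SETMASK, &block_all, NULL);

	if (pipe(latch_pipe) != 0)
		return -1;
	fcntl(latch_pipe[0], F_SETFL, O_NONBLOCK);
	fcntl(latch_pipe[1], F_SETFL, O_NONBLOCK);

	/* 2. Ordinary handler: empty sa_mask, SA_RESTART, no SA_NODEFER. */
	memset(&act, 0, sizeof(act));
	act.sa_handler = handle_reload_request_signal;
	sigemptyset(&act.sa_mask);
	act.sa_flags = SA_RESTART;
	if (sigaction(SIGHUP, &act, NULL) != 0)
		return -1;

	/* 3. Only now begin accepting signals. */
	sigemptyset(&unblock_all);
	sigprocmask(SIG_SETMASK, &unblock_all, NULL);
	return 0;
}

/* One iteration of a server loop; returns the number of reloads handled. */
static int
server_loop_iteration(int timeout_ms)
{
	struct pollfd pfd;
	char		buf[16];
	int			processed = 0;

	pfd.fd = latch_pipe[0];
	pfd.events = POLLIN;
	pfd.revents = 0;
	if (poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN))
	{
		/*
		 * Like ResetLatch(): drain before checking flags, so a signal that
		 * arrives afterwards leaves a byte behind for the next wait.
		 */
		while (read(latch_pipe[0], buf, sizeof(buf)) > 0)
			;
		if (pending_reload_request)
		{
			pending_reload_request = 0;
			processed++;		/* real work runs here, not in the handler */
		}
	}
	return processed;
}
```

A poll() interrupted by the signal simply returns early here. The byte left in the pipe guarantees that the next iteration wakes up, which is why SA_RESTART no longer matters for this loop.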

Attachments:

v8-0001-Give-the-postmaster-a-WaitEventSet-and-a-latch.patch (application/x-patch)
From 95fd8e1a832ba063f7f1a34f6548de39ca8e691c Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 9 Nov 2022 22:59:58 +1300
Subject: [PATCH v8] Give the postmaster a WaitEventSet and a latch.

Traditionally, the postmaster's architecture was quite unusual.  It did
a lot of work inside signal handlers, which were only unblocked while
waiting in select() to make that safe.

Switch to a more typical architecture, where signal handlers just set
flags and use a latch to close races.  Now the postmaster looks like
all other PostgreSQL processes, multiplexing its event processing in
epoll_wait()/kevent()/poll()/WaitForMultipleObjects() depending on the
OS.

Changes:

 * Allow the postmaster to set up its own local latch.  For now we don't
   want other backends setting the postmaster's latch directly (that
   would require latches robust against arbitrary corruption of shared
   memory).

 * The existing signal handlers are cut in two: a handle_XXX part that
   sets a pending_XXX variable and sets the local latch, and a
   process_XXX part.

 * Signal handlers are now installed with the regular pqsignal()
   function rather than the special pqsignal_pm() function; the concerns
   about the portability of SA_RESTART vs select() are no longer
   relevant: SUSv2 left it implementation-defined whether select()
   restarts, but didn't add that qualification for poll(), and it doesn't
   matter anyway because we call SetLatch(), creating a new reason to wake
   up.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/libpq/pqcomm.c            |   3 +-
 src/backend/libpq/pqsignal.c          |  40 ---
 src/backend/postmaster/fork_process.c |  12 +-
 src/backend/postmaster/postmaster.c   | 379 ++++++++++++++------------
 src/backend/tcop/postgres.c           |   1 -
 src/backend/utils/init/miscinit.c     |  13 +-
 src/include/libpq/pqsignal.h          |   3 -
 src/include/miscadmin.h               |   1 +
 8 files changed, 227 insertions(+), 225 deletions(-)

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 7a043bf6b0..864c9debe8 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -683,8 +683,7 @@ Setup_AF_UNIX(const char *sock_path)
  *		server port.  Set port->sock to the FD of the new connection.
  *
  * ASSUME: that this doesn't need to be non-blocking because
- *		the Postmaster uses select() to tell when the socket is ready for
- *		accept().
+ *		the Postmaster waits for the socket to be ready to accept().
  *
  * RETURNS: STATUS_OK or STATUS_ERROR
  */
diff --git a/src/backend/libpq/pqsignal.c b/src/backend/libpq/pqsignal.c
index b815be6eea..d233e3a2fd 100644
--- a/src/backend/libpq/pqsignal.c
+++ b/src/backend/libpq/pqsignal.c
@@ -97,43 +97,3 @@ pqinitmask(void)
 	sigdelset(&StartupBlockSig, SIGALRM);
 #endif
 }
-
-/*
- * Set up a postmaster signal handler for signal "signo"
- *
- * Returns the previous handler.
- *
- * This is used only in the postmaster, which has its own odd approach to
- * signal handling.  For signals with handlers, we block all signals for the
- * duration of signal handler execution.  We also do not set the SA_RESTART
- * flag; this should be safe given the tiny range of code in which the
- * postmaster ever unblocks signals.
- *
- * pqinitmask() must have been invoked previously.
- */
-pqsigfunc
-pqsignal_pm(int signo, pqsigfunc func)
-{
-	struct sigaction act,
-				oact;
-
-	act.sa_handler = func;
-	if (func == SIG_IGN || func == SIG_DFL)
-	{
-		/* in these cases, act the same as pqsignal() */
-		sigemptyset(&act.sa_mask);
-		act.sa_flags = SA_RESTART;
-	}
-	else
-	{
-		act.sa_mask = BlockSig;
-		act.sa_flags = 0;
-	}
-#ifdef SA_NOCLDSTOP
-	if (signo == SIGCHLD)
-		act.sa_flags |= SA_NOCLDSTOP;
-#endif
-	if (sigaction(signo, &act, &oact) < 0)
-		return SIG_ERR;
-	return oact.sa_handler;
-}
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index 569b52e849..9aafd0c385 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -12,24 +12,28 @@
 #include "postgres.h"
 
 #include <fcntl.h>
+#include <signal.h>
 #include <time.h>
 #include <sys/stat.h>
 #include <sys/time.h>
 #include <unistd.h>
 
+#include "libpq/pqsignal.h"
 #include "postmaster/fork_process.h"
 
 #ifndef WIN32
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
- * child in the parent process.
+ * child in the parent process.  Signals are blocked while forking, so
+ * the child must unblock.
  */
 pid_t
 fork_process(void)
 {
 	pid_t		result;
 	const char *oomfilename;
+	sigset_t	save_mask;
 
 #ifdef LINUX_PROFILE
 	struct itimerval prof_itimer;
@@ -51,6 +55,7 @@ fork_process(void)
 	getitimer(ITIMER_PROF, &prof_itimer);
 #endif
 
+	sigprocmask(SIG_SETMASK, &BlockSig, &save_mask);
 	result = fork();
 	if (result == 0)
 	{
@@ -103,6 +108,11 @@ fork_process(void)
 		/* do post-fork initialization for random number generation */
 		pg_strong_random_init();
 	}
+	else
+	{
+		/* in parent, restore signal mask */
+		sigprocmask(SIG_SETMASK, &save_mask, NULL);
+	}
 
 	return result;
 }
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index eac3450774..405c13bad0 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -70,7 +70,6 @@
 #include <time.h>
 #include <sys/wait.h>
 #include <ctype.h>
-#include <sys/select.h>
 #include <sys/stat.h>
 #include <sys/socket.h>
 #include <fcntl.h>
@@ -362,6 +361,15 @@ static volatile sig_atomic_t WalReceiverRequested = false;
 static volatile bool StartWorkerNeeded = true;
 static volatile bool HaveCrashedWorker = false;
 
+/* set when signals arrive */
+static volatile sig_atomic_t pending_action_request;
+static volatile sig_atomic_t pending_child_exit;
+static volatile sig_atomic_t pending_reload_request;
+static volatile sig_atomic_t pending_shutdown_request;
+
+/* I/O multiplexing object */
+static WaitEventSet *wait_set;
+
 #ifdef USE_SSL
 /* Set when and if SSL has been initialized properly */
 static bool LoadedSSL = false;
@@ -380,10 +388,14 @@ static void getInstallationPaths(const char *argv0);
 static void checkControlFile(void);
 static Port *ConnCreate(int serverFd);
 static void ConnFree(Port *port);
-static void SIGHUP_handler(SIGNAL_ARGS);
-static void pmdie(SIGNAL_ARGS);
-static void reaper(SIGNAL_ARGS);
-static void sigusr1_handler(SIGNAL_ARGS);
+static void handle_action_request_signal(SIGNAL_ARGS);
+static void handle_child_exit_signal(SIGNAL_ARGS);
+static void handle_reload_request_signal(SIGNAL_ARGS);
+static void handle_shutdown_request_signal(SIGNAL_ARGS);
+static void process_action_request(void);
+static void process_child_exit(void);
+static void process_reload_request(void);
+static void process_shutdown_request(void);
 static void process_startup_packet_die(SIGNAL_ARGS);
 static void dummy_handler(SIGNAL_ARGS);
 static void StartupPacketTimeoutHandler(void);
@@ -401,7 +413,6 @@ static int	BackendStartup(Port *port);
 static int	ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done);
 static void SendNegotiateProtocolVersion(List *unrecognized_protocol_options);
 static void processCancelRequest(Port *port, void *pkt);
-static int	initMasks(fd_set *rmask);
 static void report_fork_failure_to_client(Port *port, int errnum);
 static CAC_state canAcceptConnections(int backend_type);
 static bool RandomCancelKey(int32 *cancel_key);
@@ -609,26 +620,6 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * Set up signal handlers for the postmaster process.
 	 *
-	 * In the postmaster, we use pqsignal_pm() rather than pqsignal() (which
-	 * is used by all child processes and client processes).  That has a
-	 * couple of special behaviors:
-	 *
-	 * 1. We tell sigaction() to block all signals for the duration of the
-	 * signal handler.  This is faster than our old approach of
-	 * blocking/unblocking explicitly in the signal handler, and it should also
-	 * prevent excessive stack consumption if signals arrive quickly.
-	 *
-	 * 2. We do not set the SA_RESTART flag.  This is because signals will be
-	 * blocked at all times except when ServerLoop is waiting for something to
-	 * happen, and during that window, we want signals to exit the select(2)
-	 * wait so that ServerLoop can respond if anything interesting happened.
-	 * On some platforms, signals marked SA_RESTART would not cause the
-	 * select() wait to end.
-	 *
-	 * Child processes will generally want SA_RESTART, so pqsignal() sets that
-	 * flag.  We expect children to set up their own handlers before
-	 * unblocking signals.
-	 *
 	 * CAUTION: when changing this list, check for side-effects on the signal
 	 * handling setup of child processes.  See tcop/postgres.c,
 	 * bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/walwriter.c,
@@ -638,26 +629,19 @@ PostmasterMain(int argc, char *argv[])
 	pqinitmask();
 	PG_SETMASK(&BlockSig);
 
-	pqsignal_pm(SIGHUP, SIGHUP_handler);	/* reread config file and have
-											 * children do same */
-	pqsignal_pm(SIGINT, pmdie); /* send SIGTERM and shut down */
-	pqsignal_pm(SIGQUIT, pmdie);	/* send SIGQUIT and die */
-	pqsignal_pm(SIGTERM, pmdie);	/* wait for children and shut down */
-	pqsignal_pm(SIGALRM, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGPIPE, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGUSR1, sigusr1_handler);	/* message from child process */
-	pqsignal_pm(SIGUSR2, dummy_handler);	/* unused, reserve for children */
-	pqsignal_pm(SIGCHLD, reaper);	/* handle child termination */
+	pqsignal(SIGHUP, handle_reload_request_signal);
+	pqsignal(SIGINT, handle_shutdown_request_signal);
+	pqsignal(SIGQUIT, handle_shutdown_request_signal);
+	pqsignal(SIGTERM, handle_shutdown_request_signal);
+	pqsignal(SIGALRM, SIG_IGN); /* ignored */
+	pqsignal(SIGPIPE, SIG_IGN); /* ignored */
+	pqsignal(SIGUSR1, handle_action_request_signal);
+	pqsignal(SIGUSR2, dummy_handler);	/* unused, reserve for children */
+	pqsignal(SIGCHLD, handle_child_exit_signal);
 
-#ifdef SIGURG
-
-	/*
-	 * Ignore SIGURG for now.  Child processes may change this (see
-	 * InitializeLatchSupport), but they will not receive any such signals
-	 * until they wait on a latch.
-	 */
-	pqsignal_pm(SIGURG, SIG_IGN);	/* ignored */
-#endif
+	/* This may configure SIGURG, depending on platform. */
+	InitializeLatchSupport();
+	InitProcessLocalLatch();
 
 	/*
 	 * No other place in Postgres should touch SIGTTIN/SIGTTOU handling.  We
@@ -667,17 +651,20 @@ PostmasterMain(int argc, char *argv[])
 	 * child processes should just allow the inherited settings to stand.
 	 */
 #ifdef SIGTTIN
-	pqsignal_pm(SIGTTIN, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTIN, SIG_IGN); /* ignored */
 #endif
 #ifdef SIGTTOU
-	pqsignal_pm(SIGTTOU, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTOU, SIG_IGN); /* ignored */
 #endif
 
 	/* ignore SIGXFSZ, so that ulimit violations work like disk full */
 #ifdef SIGXFSZ
-	pqsignal_pm(SIGXFSZ, SIG_IGN);	/* ignored */
+	pqsignal(SIGXFSZ, SIG_IGN); /* ignored */
 #endif
 
+	/* Begin accepting signals. */
+	PG_SETMASK(&UnBlockSig);
+
 	/*
 	 * Options setup
 	 */
@@ -1698,6 +1685,37 @@ DetermineSleepTime(struct timeval *timeout)
 	}
 }
 
+/*
+ * Activate or deactivate notifications of server socket events.  Since we
+ * don't currently have a way to remove events from an existing WaitEventSet,
+ * we'll just destroy and recreate the whole thing.  This is called during
+ * shutdown so we can wait for backends to exit without accepting new
+ * connections, and during crash reinitialization when we need to start
+ * listening for new connections again.
+ */
+static void
+ConfigurePostmasterWaitSet(bool accept_connections)
+{
+	if (wait_set)
+		FreeWaitEventSet(wait_set);
+	wait_set = NULL;
+
+	wait_set = CreateWaitEventSet(CurrentMemoryContext, 1 + MAXLISTEN);
+	AddWaitEventToSet(wait_set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
+
+	if (accept_connections)
+	{
+		for (int i = 0; i < MAXLISTEN; i++)
+		{
+			int			fd = ListenSocket[i];
+
+			if (fd == PGINVALID_SOCKET)
+				break;
+			AddWaitEventToSet(wait_set, WL_SOCKET_ACCEPT, fd, NULL, NULL);
+		}
+	}
+}
+
 /*
  * Main idle loop of postmaster
  *
@@ -1706,97 +1724,62 @@ DetermineSleepTime(struct timeval *timeout)
 static int
 ServerLoop(void)
 {
-	fd_set		readmask;
-	int			nSockets;
 	time_t		last_lockfile_recheck_time,
 				last_touch_time;
+	WaitEvent	events[MAXLISTEN];
+	int			nevents;
 
+	ConfigurePostmasterWaitSet(true);
 	last_lockfile_recheck_time = last_touch_time = time(NULL);
 
-	nSockets = initMasks(&readmask);
-
 	for (;;)
 	{
-		fd_set		rmask;
-		int			selres;
 		time_t		now;
+		struct timeval timeout;
 
-		/*
-		 * Wait for a connection request to arrive.
-		 *
-		 * We block all signals except while sleeping. That makes it safe for
-		 * signal handlers, which again block all signals while executing, to
-		 * do nontrivial work.
-		 *
-		 * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
-		 * any new connections, so we don't call select(), and just sleep.
-		 */
-		memcpy((char *) &rmask, (char *) &readmask, sizeof(fd_set));
-
-		if (pmState == PM_WAIT_DEAD_END)
-		{
-			PG_SETMASK(&UnBlockSig);
-
-			pg_usleep(100000L); /* 100 msec seems reasonable */
-			selres = 0;
+		DetermineSleepTime(&timeout);
 
-			PG_SETMASK(&BlockSig);
-		}
-		else
-		{
-			/* must set timeout each time; some OSes change it! */
-			struct timeval timeout;
-
-			/* Needs to run with blocked signals! */
-			DetermineSleepTime(&timeout);
-
-			PG_SETMASK(&UnBlockSig);
-
-			selres = select(nSockets, &rmask, NULL, NULL, &timeout);
-
-			PG_SETMASK(&BlockSig);
-		}
-
-		/* Now check the select() result */
-		if (selres < 0)
-		{
-			if (errno != EINTR && errno != EWOULDBLOCK)
-			{
-				ereport(LOG,
-						(errcode_for_socket_access(),
-						 errmsg("select() failed in postmaster: %m")));
-				return STATUS_ERROR;
-			}
-		}
+		nevents = WaitEventSetWait(wait_set,
+								   timeout.tv_sec * 1000 + timeout.tv_usec / 1000,
+								   events,
+								   lengthof(events),
+								   0 /* postmaster posts no wait_events */ );
 
 		/*
-		 * New connection pending on any of our sockets? If so, fork a child
-		 * process to deal with it.
+		 * Latch set by signal handler, or new connection pending on any of
+		 * our sockets? If the latter, fork a child process to deal with it.
 		 */
-		if (selres > 0)
+		for (int i = 0; i < nevents; i++)
 		{
-			int			i;
-
-			for (i = 0; i < MAXLISTEN; i++)
+			if (events[i].events & WL_LATCH_SET)
 			{
-				if (ListenSocket[i] == PGINVALID_SOCKET)
-					break;
-				if (FD_ISSET(ListenSocket[i], &rmask))
+				ResetLatch(MyLatch);
+
+				/* Process work scheduled by signal handlers. */
+				if (pending_shutdown_request)
+					process_shutdown_request();
+				if (pending_child_exit)
+					process_child_exit();
+				if (pending_reload_request)
+					process_reload_request();
+				if (pending_action_request)
+					process_action_request();
+			}
+			else if (events[i].events & WL_SOCKET_ACCEPT)
+			{
+				Port	   *port;
+
+				port = ConnCreate(events[i].fd);
+				if (port)
 				{
-					Port	   *port;
+					BackendStartup(port);
 
-					port = ConnCreate(ListenSocket[i]);
-					if (port)
-					{
-						BackendStartup(port);
-
-						/*
-						 * We no longer need the open socket or port structure
-						 * in this process
-						 */
-						StreamClose(port->sock);
-						ConnFree(port);
-					}
+					/*
+					 * We no longer need the open socket or port structure in
+					 * this process
+					 */
+					StreamClose(port->sock);
+					ConnFree(port);
 				}
 			}
 		}
@@ -1939,34 +1922,6 @@ ServerLoop(void)
 	}
 }
 
-/*
- * Initialise the masks for select() for the ports we are listening on.
- * Return the number of sockets to listen on.
- */
-static int
-initMasks(fd_set *rmask)
-{
-	int			maxsock = -1;
-	int			i;
-
-	FD_ZERO(rmask);
-
-	for (i = 0; i < MAXLISTEN; i++)
-	{
-		int			fd = ListenSocket[i];
-
-		if (fd == PGINVALID_SOCKET)
-			break;
-		FD_SET(fd, rmask);
-
-		if (fd > maxsock)
-			maxsock = fd;
-	}
-
-	return maxsock + 1;
-}
-
-
 /*
  * Read a client's startup packet and do something according to it.
  *
@@ -2609,6 +2564,10 @@ ClosePostmasterPorts(bool am_syslogger)
 {
 	int			i;
 
+	/* Release resources held by the postmaster's WaitEventSet. */
+	if (wait_set)
+		FreeWaitEventSetAfterFork(wait_set);
+
 #ifndef WIN32
 
 	/*
@@ -2707,15 +2666,46 @@ InitProcessGlobals(void)
 #endif
 }
 
+/*
+ * Child processes use SIGUSR1 to send 'pmsignals'.  pg_ctl uses SIGUSR1 to ask
+ * postmaster to check for logrotate and promote files.
+ */
+static void
+handle_action_request_signal(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	pending_action_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
 
 /*
- * SIGHUP -- reread config files, and tell children to do same
+ * pg_ctl uses SIGHUP to request a reload of the configuration files.
  */
 static void
-SIGHUP_handler(SIGNAL_ARGS)
+handle_reload_request_signal(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
 
+	pending_reload_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Re-read config files, and tell children to do same.
+ */
+static void
+process_reload_request(void)
+{
+	pending_reload_request = false;
+
+	ereport(DEBUG2,
+			(errmsg_internal("postmaster received reload request signal")));
+
 	if (Shutdown <= SmartShutdown)
 	{
 		ereport(LOG,
@@ -2771,26 +2761,50 @@ SIGHUP_handler(SIGNAL_ARGS)
 		write_nondefault_variables(PGC_SIGHUP);
 #endif
 	}
+}
+
+/*
+ * pg_ctl uses SIGTERM, SIGINT and SIGQUIT to request different types of
+ * shutdown.
+ */
+static void
+handle_shutdown_request_signal(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	switch (postgres_signal_arg)
+	{
+		case SIGTERM:
+			pending_shutdown_request = SmartShutdown;
+			break;
+		case SIGINT:
+			pending_shutdown_request = FastShutdown;
+			break;
+		case SIGQUIT:
+			pending_shutdown_request = ImmediateShutdown;
+			break;
+	}
+	SetLatch(MyLatch);
 
 	errno = save_errno;
 }
 
-
 /*
- * pmdie -- signal handler for processing various postmaster signals.
+ * Process shutdown request.
  */
 static void
-pmdie(SIGNAL_ARGS)
+process_shutdown_request(void)
 {
-	int			save_errno = errno;
+	int			mode = pending_shutdown_request;
 
 	ereport(DEBUG2,
-			(errmsg_internal("postmaster received signal %d",
-							 postgres_signal_arg)));
+			(errmsg_internal("postmaster received shutdown request signal")));
 
-	switch (postgres_signal_arg)
+	pending_shutdown_request = NoShutdown;
+
+	switch (mode)
 	{
-		case SIGTERM:
+		case SmartShutdown:
 
 			/*
 			 * Smart Shutdown:
@@ -2830,7 +2844,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGINT:
+		case FastShutdown:
 
 			/*
 			 * Fast Shutdown:
@@ -2871,7 +2885,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGQUIT:
+		case ImmediateShutdown:
 
 			/*
 			 * Immediate Shutdown:
@@ -2908,20 +2922,30 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 	}
+}
+
+static void
+handle_child_exit_signal(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	pending_child_exit = true;
+	SetLatch(MyLatch);
 
 	errno = save_errno;
 }
 
 /*
- * Reaper -- signal handler to cleanup after a child process dies.
+ * Cleanup after a child process dies.
  */
 static void
-reaper(SIGNAL_ARGS)
+process_child_exit(void)
 {
-	int			save_errno = errno;
 	int			pid;			/* process id of dead child process */
 	int			exitstatus;		/* its exit status */
 
+	pending_child_exit = false;
+
 	ereport(DEBUG4,
 			(errmsg_internal("reaping dead processes")));
 
@@ -3213,8 +3237,6 @@ reaper(SIGNAL_ARGS)
 	 * or actions to make.
 	 */
 	PostmasterStateMachine();
-
-	errno = save_errno;
 }
 
 /*
@@ -3642,8 +3664,9 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 /*
  * Advance the postmaster's state machine and take actions as appropriate
  *
- * This is common code for pmdie(), reaper() and sigusr1_handler(), which
- * receive the signals that might mean we need to change state.
+ * This is common code for process_shutdown_request(), process_child_exit() and
+ * process_action_request(), which process the signals that might mean we need
+ * to change state.
  */
 static void
 PostmasterStateMachine(void)
@@ -3796,6 +3819,9 @@ PostmasterStateMachine(void)
 
 	if (pmState == PM_WAIT_DEAD_END)
 	{
+		/* Don't allow any new socket connection events. */
+		ConfigurePostmasterWaitSet(false);
+
 		/*
 		 * PM_WAIT_DEAD_END state ends when the BackendList is entirely empty
 		 * (ie, no dead_end children remain), and the archiver is gone too.
@@ -3905,6 +3931,9 @@ PostmasterStateMachine(void)
 		pmState = PM_STARTUP;
 		/* crash recovery started, reset SIGKILL flag */
 		AbortStartTime = 0;
+
+		/* start accepting server socket connection events again */
+		ConfigurePostmasterWaitSet(true);
 	}
 }
 
@@ -5013,12 +5042,16 @@ ExitPostmaster(int status)
 }
 
 /*
- * sigusr1_handler - handle signal conditions from child processes
+ * Handle pmsignal conditions representing requests from backends,
+ * and check for promote and logrotate requests from pg_ctl.
  */
 static void
-sigusr1_handler(SIGNAL_ARGS)
+process_action_request(void)
 {
-	int			save_errno = errno;
+	pending_action_request = false;
+
+	ereport(DEBUG2,
+			(errmsg_internal("postmaster received action request signal")));
 
 	/*
 	 * RECOVERY_STARTED and BEGIN_HOT_STANDBY signals are ignored in
@@ -5159,8 +5192,6 @@ sigusr1_handler(SIGNAL_ARGS)
 		 */
 		signal_child(StartupPID, SIGUSR2);
 	}
-
-	errno = save_errno;
 }
 
 /*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 31479e8212..ea867ccc95 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -24,7 +24,6 @@
 #include <signal.h>
 #include <unistd.h>
 #include <sys/resource.h>
-#include <sys/select.h>
 #include <sys/socket.h>
 #include <sys/time.h>
 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 9b840a6318..0cdc1e11a3 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -135,8 +135,7 @@ InitPostmasterChild(void)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitProcessLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -189,8 +188,7 @@ InitStandaloneProcess(const char *argv0)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitProcessLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -232,6 +230,13 @@ SwitchToSharedLatch(void)
 	SetLatch(MyLatch);
 }
 
+void
+InitProcessLocalLatch(void)
+{
+	MyLatch = &LocalLatchData;
+	InitLatch(MyLatch);
+}
+
 void
 SwitchBackToLocalLatch(void)
 {
diff --git a/src/include/libpq/pqsignal.h b/src/include/libpq/pqsignal.h
index 29ee5ad2b6..1e66f25b76 100644
--- a/src/include/libpq/pqsignal.h
+++ b/src/include/libpq/pqsignal.h
@@ -53,7 +53,4 @@ extern PGDLLIMPORT sigset_t StartupBlockSig;
 
 extern void pqinitmask(void);
 
-/* pqsigfunc is declared in src/include/port.h */
-extern pqsigfunc pqsignal_pm(int signo, pqsigfunc func);
-
 #endif							/* PQSIGNAL_H */
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 0ffeefc437..96b3a1e1a0 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -310,6 +310,7 @@ extern PGDLLIMPORT char *DatabasePath;
 /* now in utils/init/miscinit.c */
 extern void InitPostmasterChild(void);
 extern void InitStandaloneProcess(const char *argv0);
+extern void InitProcessLocalLatch(void);
 extern void SwitchToSharedLatch(void);
 extern void SwitchBackToLocalLatch(void);
 
-- 
2.38.1

#16Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#15)
Re: Using WaitEventSet in the postmaster

Hi,

On 2023-01-07 11:08:36 +1300, Thomas Munro wrote:

1. Is it OK that we are now using APIs that might throw, in places
where we weren't? I think so: we don't really expect WaitEventSet
APIs to throw unless something is pretty seriously wrong, and if you
hack things to inject failures there then you get a FATAL error and
the postmaster exits and the children detect that. I think that is
appropriate.

I think it's OK in principle. We might find something to fix in the future,
but I don't see anything fundamental or obvious.

2. Is it really OK to delete the pqsignal_pm() infrastructure? I
think so.

Same.

3. Is it OK to clobber the shared pending flag for SIGQUIT, SIGTERM,
SIGINT? If you send all of these extremely rapidly, it's
indeterminate which one will be seen by handle_shutdown_request().

That doesn't seem optimal. I'm mostly worried that we can end up downgrading a
shutdown request.

I think that's probably acceptable? To be strict about processing only the
first one that is delivered, I think you'd need an sa_mask to block all
three signals, and then you wouldn't change pending_shutdown_request if it's
already set, which I'm willing to code up if someone thinks that's
important. (<vapourware>Ideally I would invent WL_SIGNAL to consume signal
events serially without handlers or global variables</vapourware>.)

Hm. The need for a blocking sa_mask solely comes from using one variable in
three signal handlers, right? It's not pretty, but to me the easiest fix here
seems to be to have separate pending_{fast,smart,immediate}_shutdown_request
variables and deal with them in process_shutdown_request(). It might still
make sense to have one pending_shutdown_request variable, to avoid unnecessary
branches before calling process_shutdown_request().
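A minimal sketch of that suggestion, assuming one flag per signal plus a summary flag. The mode names and consume_shutdown_request() are hypothetical stand-ins rather than the patch's code, and the handler takes a plain int instead of PostgreSQL's SIGNAL_ARGS. Since each handler writes only its own variable, no sa_mask is needed, and the consumer picks the most urgent pending request, so a gentler signal can never downgrade a harsher one.

```c
#include <signal.h>

/* Hypothetical shutdown levels, ordered from least to most urgent. */
typedef enum
{
	NO_SHUTDOWN = 0,
	SMART_SHUTDOWN,
	FAST_SHUTDOWN,
	IMMEDIATE_SHUTDOWN
} shutdown_mode;

/* One flag per signal, so concurrent handlers never overwrite each other. */
static volatile sig_atomic_t pending_shutdown_request;
static volatile sig_atomic_t pending_smart_shutdown_request;
static volatile sig_atomic_t pending_fast_shutdown_request;
static volatile sig_atomic_t pending_immediate_shutdown_request;

static void
handle_shutdown_request_signal(int signo)
{
	switch (signo)
	{
		case SIGTERM:
			pending_smart_shutdown_request = 1;
			break;
		case SIGINT:
			pending_fast_shutdown_request = 1;
			break;
		case SIGQUIT:
			pending_immediate_shutdown_request = 1;
			break;
	}
	/* Summary flag lets the main loop skip the per-mode checks cheaply. */
	pending_shutdown_request = 1;
	/* The postmaster would also SetLatch(MyLatch) here. */
}

/* Clear all pending requests and return the most urgent one. */
static shutdown_mode
consume_shutdown_request(void)
{
	shutdown_mode mode = NO_SHUTDOWN;

	/* Clear first: a signal arriving during the checks re-arms it. */
	pending_shutdown_request = 0;
	if (pending_smart_shutdown_request)
	{
		pending_smart_shutdown_request = 0;
		mode = SMART_SHUTDOWN;
	}
	if (pending_fast_shutdown_request)
	{
		pending_fast_shutdown_request = 0;
		mode = FAST_SHUTDOWN;
	}
	if (pending_immediate_shutdown_request)
	{
		pending_immediate_shutdown_request = 0;
		mode = IMMEDIATE_SHUTDOWN;
	}
	return mode;
}
```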

5. Is the signal mask being correctly handled during forking? I
think so: I decided to push the masking logic directly into the
routine that forks, to make it easy to verify that all paths set the
mask the way we want.

Hm. If I understand correctly, you used sigprocmask() directly (vs
PG_SETMASK()) in fork_process() because you want the old mask? But why do we
restore the prior mask, instead of just using PG_SETMASK(&UnBlockSig); as we
still do in a bunch of places in the postmaster?

Not that I'm advocating for that, but would there be any real harm in just
continuing to accept signals post fork? Now all the signal handlers should
just end up pointlessly setting a local variable that's not going to be read
any further? If true, it'd be good to add a comment explaining that this is
just a belt-and-suspenders thing.

(While thinking about that I noticed that signals don't seem to be initially
masked on Windows; I think that's a pre-existing condition, and I assume we
get away with it because nothing reaches the fake signal dispatch code soon
enough to break anything? Not changing that in this patch.)

It's indeed a bit odd that we do pgwin32_signal_initialize() before the
initmask() and PG_SETMASK(&BlockSig) in InitPostmasterChild(). I guess it's
kinda harmless though?

I'm now somewhat weirded out by the choice to do pg_strong_random_init() in
fork_process() rather than InitPostmasterChild(). Seems odd.

6. Is the naming and style OK? Better ideas welcome, but basically I
tried to avoid all unnecessary refactoring and changes, so no real
logic moves around, and the changes are pretty close to "mechanical".
One bikeshed decision was what to call the {handle,process}_XXX
functions and associated flags.

I wonder if it'd be good to have a _pm_ in the name.

Maybe "action" isn't the best name;

Yeah, I don't like it. A shutdown is also an action, etc. What about just using
_pmsignal_? It's a bit odd because there are two signals in the name, but it
still feels better than 'action' and better than the existing sigusr1_handler.

+
+/* I/O multiplexing object */
+static WaitEventSet *wait_set;

I'd name it a bit more obviously connected to postmaster, particularly because
it does survive into forked processes and needs to be closed there.

+/*
+ * Activate or deactivate notifications of server socket events.  Since we
+ * don't currently have a way to remove events from an existing WaitEventSet,
+ * we'll just destroy and recreate the whole thing.  This is called during
+ * shutdown so we can wait for backends to exit without accepting new
+ * connections, and during crash reinitialization when we need to start
+ * listening for new connections again.
+ */

I'd maybe reference that this gets cleaned up via ClosePostmasterPorts(), it's
not *immediately* obvious.

+static void
+ConfigurePostmasterWaitSet(bool accept_connections)
+{
+	if (wait_set)
+		FreeWaitEventSet(wait_set);
+	wait_set = NULL;
+
+	wait_set = CreateWaitEventSet(CurrentMemoryContext, 1 + MAXLISTEN);
+	AddWaitEventToSet(wait_set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);

Is there any reason to use MAXLISTEN here? We know how many sockets we're
listening on by this point, I think? No idea if the overhead matters anywhere,
but ...

I guess all the other code already does so, but afaict we don't dynamically
allocate resources there for things like ListenSocket[].

-		/* Now check the select() result */
-		if (selres < 0)
-		{
-			if (errno != EINTR && errno != EWOULDBLOCK)
-			{
-				ereport(LOG,
-						(errcode_for_socket_access(),
-						 errmsg("select() failed in postmaster: %m")));
-				return STATUS_ERROR;
-			}
-		}
+		nevents = WaitEventSetWait(wait_set,
+								   timeout.tv_sec * 1000 + timeout.tv_usec / 1000,
+								   events,
+								   lengthof(events),
+								   0 /* postmaster posts no wait_events */ );
/*
-		 * New connection pending on any of our sockets? If so, fork a child
-		 * process to deal with it.
+		 * Latch set by signal handler, or new connection pending on any of
+		 * our sockets? If the latter, fork a child process to deal with it.
*/
-		if (selres > 0)
+		for (int i = 0; i < nevents; i++)
{

Hm. This is preexisting behaviour, but now it seems somewhat odd that we might
end up happily forking a backend for each socket without checking signals
in between. Forking might take a while, and if a signal arrived since the
WaitEventSetWait() we won't react to it.

static void
PostmasterStateMachine(void)
@@ -3796,6 +3819,9 @@ PostmasterStateMachine(void)

if (pmState == PM_WAIT_DEAD_END)
{
+		/* Don't allow any new socket connection events. */
+		ConfigurePostmasterWaitSet(false);

Hm. Is anything actually using the wait set until we re-create it with (true)
below?

Greetings,

Andres Freund

#17Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#16)
1 attachment(s)
Re: Using WaitEventSet in the postmaster

On Sat, Jan 7, 2023 at 12:25 PM Andres Freund <andres@anarazel.de> wrote:

On 2023-01-07 11:08:36 +1300, Thomas Munro wrote:

3. Is it OK to clobber the shared pending flag for SIGQUIT, SIGTERM,
SIGINT? If you send all of these extremely rapidly, it's
indeterminate which one will be seen by handle_shutdown_request().

That doesn't seem optimal. I'm mostly worried that we can end up downgrading a
shutdown request.

I was contemplating whether I needed to do some more push-ups to
prefer the first delivered signal (instead of the last), but you're
saying that it would be enough to prefer the fastest shutdown type, in
cases where more than one signal was handled between server loops.
WFM.

It's not pretty, but to me the easiest fix here
seems to be to have separate pending_{fast,smart,immediate}_shutdown_request
variables and deal with them in process_shutdown_request(). It might still
make sense to have one pending_shutdown_request variable, to avoid unnecessary
branches before calling process_shutdown_request().

OK, tried that way.

5. Is the signal mask being correctly handled during forking? I
think so: I decided to push the masking logic directly into the
routine that forks, to make it easy to verify that all paths set the
mask the way we want.

Hm. If I understand correctly, you used sigprocmask() directly (vs
PG_SETMASK()) in fork_process() because you want the old mask? But why do we
restore the prior mask, instead of just using PG_SETMASK(&UnBlockSig); as we
still do in a bunch of places in the postmaster?

It makes zero difference in practice but I think it's a nicer way to
code it because it doesn't make an unnecessary assumption about what
the signal mask was on entry.

Not that I'm advocating for that, but would there be any real harm in just
continuing to accept signals post fork? Now all the signal handlers should
just end up pointlessly setting a local variable that's not going to be read
any further? If true, it'd be good to add a comment explaining that this is
just a belt-and-suspenders thing.

Seems plausible and a nice idea to research. I think it might take
some analysis of important signals that children might miss before
they install their own handlers. Comment added.

6. Is the naming and style OK? Better ideas welcome, but basically I
tried to avoid all unnecessary refactoring and changes, so no real
logic moves around, and the changes are pretty close to "mechanical".
One bikeshed decision was what to call the {handle,process}_XXX
functions and associated flags.

I wonder if it'd be good to have a _pm_ in the name.

I dunno about this one, it's all static stuff in a file called
postmaster.c and one (now) already has pm in it (see below).

Maybe "action" isn't the best name;

Yeah, I don't like it. A shutdown is also an action, etc. What about just using
_pmsignal_? It's a bit odd because there are two signals in the name, but it
still feels better than 'action' and better than the existing sigusr1_handler.

Done.

+
+/* I/O multiplexing object */
+static WaitEventSet *wait_set;

I'd name it a bit more obviously connected to postmaster, particularly because
it does survive into forked processes and needs to be closed there.

Done, as pm_wait_set.

+/*
+ * Activate or deactivate notifications of server socket events.  Since we
+ * don't currently have a way to remove events from an existing WaitEventSet,
+ * we'll just destroy and recreate the whole thing.  This is called during
+ * shutdown so we can wait for backends to exit without accepting new
+ * connections, and during crash reinitialization when we need to start
+ * listening for new connections again.
+ */

I'd maybe reference that this gets cleaned up via ClosePostmasterPorts(), it's
not *immediately* obvious.

Done.

+static void
+ConfigurePostmasterWaitSet(bool accept_connections)
+{
+     if (wait_set)
+             FreeWaitEventSet(wait_set);
+     wait_set = NULL;
+
+     wait_set = CreateWaitEventSet(CurrentMemoryContext, 1 + MAXLISTEN);
+     AddWaitEventToSet(wait_set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);

Is there any reason to use MAXLISTEN here? We know how many sockets we're
listening on by this point, I think? No idea if the overhead matters anywhere,
but ...

Fixed.

/*
-              * New connection pending on any of our sockets? If so, fork a child
-              * process to deal with it.
+              * Latch set by signal handler, or new connection pending on any of
+              * our sockets? If the latter, fork a child process to deal with it.
*/
-             if (selres > 0)
+             for (int i = 0; i < nevents; i++)
{

Hm. This is preexisting behaviour, but now it seems somewhat odd that we might
end up happily forking a backend for each socket without checking signals
in between. Forking might take a while, and if a signal arrived since the
WaitEventSetWait() we won't react to it.

Right, so if you have 64 server sockets and they all have clients
waiting on their listen queue, we'll service one connection from each
before getting back to checking for pmsignals or shutdown, and that's
unchanged in this patch. (FWIW I also noticed that problem while
experimenting with the idea that you could handle multiple clients in
one go on OSes that report the listen queue depth size along with
WL_SOCKET_ACCEPT events, but didn't pursue it...)

I guess we could check every time through the nevents loop. I may
look into that in a later patch, but I prefer to keep the same policy
in this patch.
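A minimal sketch of the "check every time through the nevents loop" idea, assuming simplified event records. ready_event, start_backend() and the flag are hypothetical stand-ins for WaitEvent, ConnCreate()/BackendStartup() and the pending_XXX variables. Stopping early is safe with a level-triggered wait: any listening socket skipped here is simply reported again by the next wait.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical simplified event record. */
typedef struct
{
	bool		is_accept;		/* a WL_SOCKET_ACCEPT-like event */
	int			fd;
} ready_event;

static volatile sig_atomic_t pending_shutdown_request;
static int	backends_started;

/* Stand-in for ConnCreate() + BackendStartup(); the real one forks. */
static void
start_backend(int fd)
{
	(void) fd;
	backends_started++;
}

/*
 * Handle a batch of ready events, checking the pending flag before each one.
 * Returns how many events were consumed; the caller then goes back to the top
 * of its loop to process signals.
 */
static int
service_ready_events(const ready_event *events, int nevents)
{
	int			i;

	for (i = 0; i < nevents; i++)
	{
		if (pending_shutdown_request)
			break;
		if (events[i].is_accept)
			start_backend(events[i].fd);
	}
	return i;
}
```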

static void
PostmasterStateMachine(void)
@@ -3796,6 +3819,9 @@ PostmasterStateMachine(void)

if (pmState == PM_WAIT_DEAD_END)
{
+             /* Don't allow any new socket connection events. */
+             ConfigurePostmasterWaitSet(false);

Hm. Is anything actually using the wait set until we re-create it with (true)
below?

Yes. While in PM_WAIT_DEAD_END state, waiting for children to exit,
there may be clients trying to connect. On master, we have a special
pg_usleep(100000L) instead of select() just for that state so we can
ignore incoming connections while waiting for the SIGCHLD reaper to
advance our state, but in this new world that's not enough. We need
to wait for the latch to be set by handle_child_exit_signal(). So I
used the regular WES to wait for the latch (that is, no more special
case for that state), but I need to ignore socket events. If I
didn't, then an incoming connection sitting in the listen queue would
cause the server loop to burn 100% CPU reporting a level-triggered
WL_SOCKET_ACCEPT for sockets that we don't want to accept (yet).
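The busy-loop hazard is easy to reproduce with plain poll(), which is also level-triggered. This sketch assumes the classic self-pipe trick as the latch (one of the wakeup mechanisms latch.c has used on Unix; WaitEventSet may use other kernel APIs), and wakeup() and build_wait_set() are hypothetical names. A pending "connection" stays readable on every call until it is accepted, so the only way to wait for the latch alone is to leave the listening sockets out of the set.

```c
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * "Set the latch": write one byte into the self-pipe.  write() is
 * async-signal-safe, so this can be called from a signal handler.
 */
static void
wakeup(int pipe_write_fd)
{
	int			save_errno = errno;

	if (write(pipe_write_fd, "", 1) < 0)
	{
		/* EAGAIN on a full non-blocking pipe: a wakeup is already pending. */
	}
	errno = save_errno;
}

/*
 * Analogue of ConfigurePostmasterWaitSet(): always watch the self-pipe, and
 * watch the listening sockets only while accepting connections.
 * Returns the number of pollfd entries filled in.
 */
static int
build_wait_set(struct pollfd *set, int pipe_read_fd,
			   const int *listen_fds, int nlisten, bool accept_connections)
{
	int			n = 0;

	set[n].fd = pipe_read_fd;
	set[n].events = POLLIN;
	set[n].revents = 0;
	n++;

	if (accept_connections)
	{
		for (int i = 0; i < nlisten; i++)
		{
			set[n].fd = listen_fds[i];
			set[n].events = POLLIN;
			set[n].revents = 0;
			n++;
		}
	}
	return n;
}
```

A real implementation must also drain the pipe after waking up (as ResetLatch() does for a latch), or the latch event itself would become a busy loop.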

Attachments:

v9-0001-Give-the-postmaster-a-WaitEventSet-and-a-latch.patch
From d1ecf3c0ab86057adede3bd369bd8c8bbaf1e001 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 9 Nov 2022 22:59:58 +1300
Subject: [PATCH v9] Give the postmaster a WaitEventSet and a latch.

Traditionally, the postmaster's architecture was quite unusual.  It did
a lot of work inside signal handlers, which were only unblocked while
waiting in select() to make that safe.

Switch to a more typical architecture, where signal handlers just set
flags and use a latch to close races.  Now the postmaster looks like
all other PostgreSQL processes, multiplexing its event processing in
epoll_wait()/kevent()/poll()/WaitForMultipleObjects() depending on the
OS.

Changes:

 * Allow the postmaster to set up its own local latch.  For now we don't
   want other backends setting the postmaster's latch directly (that
   would require latches robust against arbitrary corruption of shared
   memory).

 * The existing signal handlers are cut in two: a handle_XXX part that
   sets a pending_XXX variable and sets the local latch, and a
   process_XXX part.

 * Signal handlers are now installed with the regular pqsignal()
   function rather than the special pqsignal_pm() function; the concerns
   about the portability of SA_RESTART vs select() are no longer
   relevant: SUSv2 left it implementation-defined whether select()
   restarts, but didn't add that qualification for poll(), and it doesn't
   matter anyway because we call SetLatch() creating a new reason to wake
   up.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/libpq/pqcomm.c            |   3 +-
 src/backend/libpq/pqsignal.c          |  40 ---
 src/backend/postmaster/fork_process.c |  18 +-
 src/backend/postmaster/postmaster.c   | 405 +++++++++++++++-----------
 src/backend/tcop/postgres.c           |   1 -
 src/backend/utils/init/miscinit.c     |  13 +-
 src/include/libpq/pqsignal.h          |   3 -
 src/include/miscadmin.h               |   1 +
 8 files changed, 259 insertions(+), 225 deletions(-)

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 7a043bf6b0..864c9debe8 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -683,8 +683,7 @@ Setup_AF_UNIX(const char *sock_path)
  *		server port.  Set port->sock to the FD of the new connection.
  *
  * ASSUME: that this doesn't need to be non-blocking because
- *		the Postmaster uses select() to tell when the socket is ready for
- *		accept().
+ *		the Postmaster waits for the socket to be ready to accept().
  *
  * RETURNS: STATUS_OK or STATUS_ERROR
  */
diff --git a/src/backend/libpq/pqsignal.c b/src/backend/libpq/pqsignal.c
index b815be6eea..d233e3a2fd 100644
--- a/src/backend/libpq/pqsignal.c
+++ b/src/backend/libpq/pqsignal.c
@@ -97,43 +97,3 @@ pqinitmask(void)
 	sigdelset(&StartupBlockSig, SIGALRM);
 #endif
 }
-
-/*
- * Set up a postmaster signal handler for signal "signo"
- *
- * Returns the previous handler.
- *
- * This is used only in the postmaster, which has its own odd approach to
- * signal handling.  For signals with handlers, we block all signals for the
- * duration of signal handler execution.  We also do not set the SA_RESTART
- * flag; this should be safe given the tiny range of code in which the
- * postmaster ever unblocks signals.
- *
- * pqinitmask() must have been invoked previously.
- */
-pqsigfunc
-pqsignal_pm(int signo, pqsigfunc func)
-{
-	struct sigaction act,
-				oact;
-
-	act.sa_handler = func;
-	if (func == SIG_IGN || func == SIG_DFL)
-	{
-		/* in these cases, act the same as pqsignal() */
-		sigemptyset(&act.sa_mask);
-		act.sa_flags = SA_RESTART;
-	}
-	else
-	{
-		act.sa_mask = BlockSig;
-		act.sa_flags = 0;
-	}
-#ifdef SA_NOCLDSTOP
-	if (signo == SIGCHLD)
-		act.sa_flags |= SA_NOCLDSTOP;
-#endif
-	if (sigaction(signo, &act, &oact) < 0)
-		return SIG_ERR;
-	return oact.sa_handler;
-}
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index 569b52e849..509587636e 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -12,24 +12,28 @@
 #include "postgres.h"
 
 #include <fcntl.h>
+#include <signal.h>
 #include <time.h>
 #include <sys/stat.h>
 #include <sys/time.h>
 #include <unistd.h>
 
+#include "libpq/pqsignal.h"
 #include "postmaster/fork_process.h"
 
 #ifndef WIN32
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
- * child in the parent process.
+ * child in the parent process.  Signals are blocked while forking, so
+ * the child must unblock.
  */
 pid_t
 fork_process(void)
 {
 	pid_t		result;
 	const char *oomfilename;
+	sigset_t	save_mask;
 
 #ifdef LINUX_PROFILE
 	struct itimerval prof_itimer;
@@ -51,6 +55,13 @@ fork_process(void)
 	getitimer(ITIMER_PROF, &prof_itimer);
 #endif
 
+	/*
+	 * We start postmaster children with signals blocked.  This allows them to
+	 * install their own handlers before unblocking, to avoid races where they
+	 * might run the postmaster's handler and miss an important control signal.
+	 * With more analysis this could potentially be relaxed.
+	 */
+	sigprocmask(SIG_SETMASK, &BlockSig, &save_mask);
 	result = fork();
 	if (result == 0)
 	{
@@ -103,6 +114,11 @@ fork_process(void)
 		/* do post-fork initialization for random number generation */
 		pg_strong_random_init();
 	}
+	else
+	{
+		/* in parent, restore signal mask */
+		sigprocmask(SIG_SETMASK, &save_mask, NULL);
+	}
 
 	return result;
 }
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index eac3450774..ea035c91f5 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -70,7 +70,6 @@
 #include <time.h>
 #include <sys/wait.h>
 #include <ctype.h>
-#include <sys/select.h>
 #include <sys/stat.h>
 #include <sys/socket.h>
 #include <fcntl.h>
@@ -362,6 +361,17 @@ static volatile sig_atomic_t WalReceiverRequested = false;
 static volatile bool StartWorkerNeeded = true;
 static volatile bool HaveCrashedWorker = false;
 
+/* set when signals arrive */
+static volatile sig_atomic_t pending_pmsignal;
+static volatile sig_atomic_t pending_child_exit;
+static volatile sig_atomic_t pending_reload_request;
+static volatile sig_atomic_t pending_shutdown_request;
+static volatile sig_atomic_t pending_fast_shutdown_request;
+static volatile sig_atomic_t pending_immediate_shutdown_request;
+
+/* I/O multiplexing object */
+static WaitEventSet *pm_wait_set;
+
 #ifdef USE_SSL
 /* Set when and if SSL has been initialized properly */
 static bool LoadedSSL = false;
@@ -380,10 +390,14 @@ static void getInstallationPaths(const char *argv0);
 static void checkControlFile(void);
 static Port *ConnCreate(int serverFd);
 static void ConnFree(Port *port);
-static void SIGHUP_handler(SIGNAL_ARGS);
-static void pmdie(SIGNAL_ARGS);
-static void reaper(SIGNAL_ARGS);
-static void sigusr1_handler(SIGNAL_ARGS);
+static void handle_pmsignal_signal(SIGNAL_ARGS);
+static void handle_child_exit_signal(SIGNAL_ARGS);
+static void handle_reload_request_signal(SIGNAL_ARGS);
+static void handle_shutdown_request_signal(SIGNAL_ARGS);
+static void process_pmsignal(void);
+static void process_child_exit(void);
+static void process_reload_request(void);
+static void process_shutdown_request(void);
 static void process_startup_packet_die(SIGNAL_ARGS);
 static void dummy_handler(SIGNAL_ARGS);
 static void StartupPacketTimeoutHandler(void);
@@ -401,7 +415,6 @@ static int	BackendStartup(Port *port);
 static int	ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done);
 static void SendNegotiateProtocolVersion(List *unrecognized_protocol_options);
 static void processCancelRequest(Port *port, void *pkt);
-static int	initMasks(fd_set *rmask);
 static void report_fork_failure_to_client(Port *port, int errnum);
 static CAC_state canAcceptConnections(int backend_type);
 static bool RandomCancelKey(int32 *cancel_key);
@@ -609,26 +622,6 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * Set up signal handlers for the postmaster process.
 	 *
-	 * In the postmaster, we use pqsignal_pm() rather than pqsignal() (which
-	 * is used by all child processes and client processes).  That has a
-	 * couple of special behaviors:
-	 *
-	 * 1. We tell sigaction() to block all signals for the duration of the
-	 * signal handler.  This is faster than our old approach of
-	 * blocking/unblocking explicitly in the signal handler, and it should also
-	 * prevent excessive stack consumption if signals arrive quickly.
-	 *
-	 * 2. We do not set the SA_RESTART flag.  This is because signals will be
-	 * blocked at all times except when ServerLoop is waiting for something to
-	 * happen, and during that window, we want signals to exit the select(2)
-	 * wait so that ServerLoop can respond if anything interesting happened.
-	 * On some platforms, signals marked SA_RESTART would not cause the
-	 * select() wait to end.
-	 *
-	 * Child processes will generally want SA_RESTART, so pqsignal() sets that
-	 * flag.  We expect children to set up their own handlers before
-	 * unblocking signals.
-	 *
 	 * CAUTION: when changing this list, check for side-effects on the signal
 	 * handling setup of child processes.  See tcop/postgres.c,
 	 * bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/walwriter.c,
@@ -638,26 +631,19 @@ PostmasterMain(int argc, char *argv[])
 	pqinitmask();
 	PG_SETMASK(&BlockSig);
 
-	pqsignal_pm(SIGHUP, SIGHUP_handler);	/* reread config file and have
-											 * children do same */
-	pqsignal_pm(SIGINT, pmdie); /* send SIGTERM and shut down */
-	pqsignal_pm(SIGQUIT, pmdie);	/* send SIGQUIT and die */
-	pqsignal_pm(SIGTERM, pmdie);	/* wait for children and shut down */
-	pqsignal_pm(SIGALRM, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGPIPE, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGUSR1, sigusr1_handler);	/* message from child process */
-	pqsignal_pm(SIGUSR2, dummy_handler);	/* unused, reserve for children */
-	pqsignal_pm(SIGCHLD, reaper);	/* handle child termination */
-
-#ifdef SIGURG
+	pqsignal(SIGHUP, handle_reload_request_signal);
+	pqsignal(SIGINT, handle_shutdown_request_signal);
+	pqsignal(SIGQUIT, handle_shutdown_request_signal);
+	pqsignal(SIGTERM, handle_shutdown_request_signal);
+	pqsignal(SIGALRM, SIG_IGN); /* ignored */
+	pqsignal(SIGPIPE, SIG_IGN); /* ignored */
+	pqsignal(SIGUSR1, handle_pmsignal_signal);
+	pqsignal(SIGUSR2, dummy_handler);	/* unused, reserve for children */
+	pqsignal(SIGCHLD, handle_child_exit_signal);
 
-	/*
-	 * Ignore SIGURG for now.  Child processes may change this (see
-	 * InitializeLatchSupport), but they will not receive any such signals
-	 * until they wait on a latch.
-	 */
-	pqsignal_pm(SIGURG, SIG_IGN);	/* ignored */
-#endif
+	/* This may configure SIGURG, depending on platform. */
+	InitializeLatchSupport();
+	InitProcessLocalLatch();
 
 	/*
 	 * No other place in Postgres should touch SIGTTIN/SIGTTOU handling.  We
@@ -667,17 +653,20 @@ PostmasterMain(int argc, char *argv[])
 	 * child processes should just allow the inherited settings to stand.
 	 */
 #ifdef SIGTTIN
-	pqsignal_pm(SIGTTIN, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTIN, SIG_IGN); /* ignored */
 #endif
 #ifdef SIGTTOU
-	pqsignal_pm(SIGTTOU, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTOU, SIG_IGN); /* ignored */
 #endif
 
 	/* ignore SIGXFSZ, so that ulimit violations work like disk full */
 #ifdef SIGXFSZ
-	pqsignal_pm(SIGXFSZ, SIG_IGN);	/* ignored */
+	pqsignal(SIGXFSZ, SIG_IGN); /* ignored */
 #endif
 
+	/* Begin accepting signals. */
+	PG_SETMASK(&UnBlockSig);
+
 	/*
 	 * Options setup
 	 */
@@ -1698,6 +1687,45 @@ DetermineSleepTime(struct timeval *timeout)
 	}
 }
 
+/*
+ * Activate or deactivate notifications of server socket events.  Since we
+ * don't currently have a way to remove events from an existing WaitEventSet,
+ * we'll just destroy and recreate the whole thing.  This is called during
+ * shutdown so we can wait for backends to exit without accepting new
+ * connections, and during crash reinitialization when we need to start
+ * listening for new connections again.  This will be freed in fork children by
+ * ClosePostmasterPorts().
+ */
+static void
+ConfigurePostmasterWaitSet(bool accept_connections)
+{
+	int			nsockets;
+
+	if (pm_wait_set)
+		FreeWaitEventSet(pm_wait_set);
+	pm_wait_set = NULL;
+
+	/* How many server sockets do we need to wait for? */
+	nsockets = 0;
+	if (accept_connections)
+	{
+		while (nsockets < MAXLISTEN &&
+			   ListenSocket[nsockets] != PGINVALID_SOCKET)
+			++nsockets;
+	}
+
+	pm_wait_set = CreateWaitEventSet(CurrentMemoryContext, 1 + nsockets);
+	AddWaitEventToSet(pm_wait_set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch,
+					  NULL);
+
+	if (accept_connections)
+	{
+		for (int i = 0; i < nsockets; i++)
+			AddWaitEventToSet(pm_wait_set, WL_SOCKET_ACCEPT, ListenSocket[i],
+							  NULL, NULL);
+	}
+}
+
 /*
  * Main idle loop of postmaster
  *
@@ -1706,97 +1734,62 @@ DetermineSleepTime(struct timeval *timeout)
 static int
 ServerLoop(void)
 {
-	fd_set		readmask;
-	int			nSockets;
 	time_t		last_lockfile_recheck_time,
 				last_touch_time;
+	WaitEvent	events[MAXLISTEN];
+	int			nevents;
 
+	ConfigurePostmasterWaitSet(true);
 	last_lockfile_recheck_time = last_touch_time = time(NULL);
 
-	nSockets = initMasks(&readmask);
-
 	for (;;)
 	{
-		fd_set		rmask;
-		int			selres;
 		time_t		now;
+		struct timeval timeout;
 
-		/*
-		 * Wait for a connection request to arrive.
-		 *
-		 * We block all signals except while sleeping. That makes it safe for
-		 * signal handlers, which again block all signals while executing, to
-		 * do nontrivial work.
-		 *
-		 * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
-		 * any new connections, so we don't call select(), and just sleep.
-		 */
-		memcpy((char *) &rmask, (char *) &readmask, sizeof(fd_set));
-
-		if (pmState == PM_WAIT_DEAD_END)
-		{
-			PG_SETMASK(&UnBlockSig);
-
-			pg_usleep(100000L); /* 100 msec seems reasonable */
-			selres = 0;
-
-			PG_SETMASK(&BlockSig);
-		}
-		else
-		{
-			/* must set timeout each time; some OSes change it! */
-			struct timeval timeout;
-
-			/* Needs to run with blocked signals! */
-			DetermineSleepTime(&timeout);
-
-			PG_SETMASK(&UnBlockSig);
+		DetermineSleepTime(&timeout);
 
-			selres = select(nSockets, &rmask, NULL, NULL, &timeout);
-
-			PG_SETMASK(&BlockSig);
-		}
-
-		/* Now check the select() result */
-		if (selres < 0)
-		{
-			if (errno != EINTR && errno != EWOULDBLOCK)
-			{
-				ereport(LOG,
-						(errcode_for_socket_access(),
-						 errmsg("select() failed in postmaster: %m")));
-				return STATUS_ERROR;
-			}
-		}
+		nevents = WaitEventSetWait(pm_wait_set,
+								   timeout.tv_sec * 1000 + timeout.tv_usec / 1000,
+								   events,
+								   lengthof(events),
+								   0 /* postmaster posts no wait_events */ );
 
 		/*
-		 * New connection pending on any of our sockets? If so, fork a child
-		 * process to deal with it.
+		 * Latch set by signal handler, or new connection pending on any of
+		 * our sockets? If the latter, fork a child process to deal with it.
 		 */
-		if (selres > 0)
+		for (int i = 0; i < nevents; i++)
 		{
-			int			i;
-
-			for (i = 0; i < MAXLISTEN; i++)
+			if (events[i].events & WL_LATCH_SET)
 			{
-				if (ListenSocket[i] == PGINVALID_SOCKET)
-					break;
-				if (FD_ISSET(ListenSocket[i], &rmask))
+				ResetLatch(MyLatch);
+
+				/* Process work scheduled by signal handlers. */
+				if (pending_shutdown_request)
+					process_shutdown_request();
+				if (pending_child_exit)
+					process_child_exit();
+				if (pending_reload_request)
+					process_reload_request();
+				if (pending_pmsignal)
+					process_pmsignal();
+			}
+			else if (events[i].events & WL_SOCKET_ACCEPT)
+			{
+				Port	   *port;
+
+				port = ConnCreate(events[i].fd);
+				if (port)
 				{
-					Port	   *port;
+					BackendStartup(port);
 
-					port = ConnCreate(ListenSocket[i]);
-					if (port)
-					{
-						BackendStartup(port);
-
-						/*
-						 * We no longer need the open socket or port structure
-						 * in this process
-						 */
-						StreamClose(port->sock);
-						ConnFree(port);
-					}
+					/*
+					 * We no longer need the open socket or port structure in
+					 * this process
+					 */
+					StreamClose(port->sock);
+					ConnFree(port);
 				}
 			}
 		}
@@ -1939,34 +1932,6 @@ ServerLoop(void)
 	}
 }
 
-/*
- * Initialise the masks for select() for the ports we are listening on.
- * Return the number of sockets to listen on.
- */
-static int
-initMasks(fd_set *rmask)
-{
-	int			maxsock = -1;
-	int			i;
-
-	FD_ZERO(rmask);
-
-	for (i = 0; i < MAXLISTEN; i++)
-	{
-		int			fd = ListenSocket[i];
-
-		if (fd == PGINVALID_SOCKET)
-			break;
-		FD_SET(fd, rmask);
-
-		if (fd > maxsock)
-			maxsock = fd;
-	}
-
-	return maxsock + 1;
-}
-
-
 /*
  * Read a client's startup packet and do something according to it.
  *
@@ -2609,6 +2574,10 @@ ClosePostmasterPorts(bool am_syslogger)
 {
 	int			i;
 
+	/* Release resources held by the postmaster's WaitEventSet. */
+	if (pm_wait_set)
+		FreeWaitEventSetAfterFork(pm_wait_set);
+
 #ifndef WIN32
 
 	/*
@@ -2707,15 +2676,46 @@ InitProcessGlobals(void)
 #endif
 }
 
+/*
+ * Child processes use SIGUSR1 to send 'pmsignals'.  pg_ctl uses SIGUSR1 to ask
+ * postmaster to check for logrotate and promote files.
+ */
+static void
+handle_pmsignal_signal(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	pending_pmsignal = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
 
 /*
- * SIGHUP -- reread config files, and tell children to do same
+ * pg_ctl uses SIGHUP to request a reload of the configuration files.
  */
 static void
-SIGHUP_handler(SIGNAL_ARGS)
+handle_reload_request_signal(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
 
+	pending_reload_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Re-read config files, and tell children to do same.
+ */
+static void
+process_reload_request(void)
+{
+	pending_reload_request = false;
+
+	ereport(DEBUG2,
+			(errmsg_internal("postmaster received reload request signal")));
+
 	if (Shutdown <= SmartShutdown)
 	{
 		ereport(LOG,
@@ -2771,26 +2771,66 @@ SIGHUP_handler(SIGNAL_ARGS)
 		write_nondefault_variables(PGC_SIGHUP);
 #endif
 	}
+}
+
+/*
+ * pg_ctl uses SIGTERM, SIGINT and SIGQUIT to request different types of
+ * shutdown.
+ */
+static void
+handle_shutdown_request_signal(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	switch (postgres_signal_arg)
+	{
+		case SIGTERM:
+			pending_shutdown_request = true;
+			/* smart is implied if the other two flags aren't set */
+			break;
+		case SIGINT:
+			pending_shutdown_request = true;
+			pending_fast_shutdown_request = true;
+			break;
+		case SIGQUIT:
+			pending_shutdown_request = true;
+			pending_immediate_shutdown_request = true;
+			break;
+	}
+	SetLatch(MyLatch);
 
 	errno = save_errno;
 }
 
-
 /*
- * pmdie -- signal handler for processing various postmaster signals.
+ * Process shutdown request.
  */
 static void
-pmdie(SIGNAL_ARGS)
+process_shutdown_request(void)
 {
-	int			save_errno = errno;
+	int			mode;
 
 	ereport(DEBUG2,
-			(errmsg_internal("postmaster received signal %d",
-							 postgres_signal_arg)));
+			(errmsg_internal("postmaster received shutdown request signal")));
 
-	switch (postgres_signal_arg)
+	/*
+	 * If more than one shutdown request signal arrived in one go around the
+	 * server loop, we don't know which arrived first.  Take the one that is
+	 * the most immediate.
+	 */
+	if (pending_immediate_shutdown_request)
+		mode = ImmediateShutdown;
+	else if (pending_fast_shutdown_request)
+		mode = FastShutdown;
+	else
+		mode = SmartShutdown;
+	pending_shutdown_request = false;
+	pending_immediate_shutdown_request = false;
+	pending_fast_shutdown_request = false;
+
+	switch (mode)
 	{
-		case SIGTERM:
+		case SmartShutdown:
 
 			/*
 			 * Smart Shutdown:
@@ -2830,7 +2870,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGINT:
+		case FastShutdown:
 
 			/*
 			 * Fast Shutdown:
@@ -2871,7 +2911,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGQUIT:
+		case ImmediateShutdown:
 
 			/*
 			 * Immediate Shutdown:
@@ -2908,20 +2948,30 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 	}
+}
+
+static void
+handle_child_exit_signal(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	pending_child_exit = true;
+	SetLatch(MyLatch);
 
 	errno = save_errno;
 }
 
 /*
- * Reaper -- signal handler to cleanup after a child process dies.
+ * Cleanup after a child process dies.
  */
 static void
-reaper(SIGNAL_ARGS)
+process_child_exit(void)
 {
-	int			save_errno = errno;
 	int			pid;			/* process id of dead child process */
 	int			exitstatus;		/* its exit status */
 
+	pending_child_exit = false;
+
 	ereport(DEBUG4,
 			(errmsg_internal("reaping dead processes")));
 
@@ -3213,8 +3263,6 @@ reaper(SIGNAL_ARGS)
 	 * or actions to make.
 	 */
 	PostmasterStateMachine();
-
-	errno = save_errno;
 }
 
 /*
@@ -3642,8 +3690,9 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 /*
  * Advance the postmaster's state machine and take actions as appropriate
  *
- * This is common code for pmdie(), reaper() and sigusr1_handler(), which
- * receive the signals that might mean we need to change state.
+ * This is common code for process_shutdown_request(), process_child_exit() and
+ * process_pmsignal(), which process the signals that might mean we need
+ * to change state.
  */
 static void
 PostmasterStateMachine(void)
@@ -3796,6 +3845,9 @@ PostmasterStateMachine(void)
 
 	if (pmState == PM_WAIT_DEAD_END)
 	{
+		/* Don't allow any new socket connection events. */
+		ConfigurePostmasterWaitSet(false);
+
 		/*
 		 * PM_WAIT_DEAD_END state ends when the BackendList is entirely empty
 		 * (ie, no dead_end children remain), and the archiver is gone too.
@@ -3905,6 +3957,9 @@ PostmasterStateMachine(void)
 		pmState = PM_STARTUP;
 		/* crash recovery started, reset SIGKILL flag */
 		AbortStartTime = 0;
+
+		/* start accepting server socket connection events again */
+		ConfigurePostmasterWaitSet(true);
 	}
 }
 
@@ -5013,12 +5068,16 @@ ExitPostmaster(int status)
 }
 
 /*
- * sigusr1_handler - handle signal conditions from child processes
+ * Handle pmsignal conditions representing requests from backends,
+ * and check for promote and logrotate requests from pg_ctl.
  */
 static void
-sigusr1_handler(SIGNAL_ARGS)
+process_pmsignal(void)
 {
-	int			save_errno = errno;
+	pending_pmsignal = false;
+
+	ereport(DEBUG2,
+			(errmsg_internal("postmaster received action request signal")));
 
 	/*
 	 * RECOVERY_STARTED and BEGIN_HOT_STANDBY signals are ignored in
@@ -5159,8 +5218,6 @@ sigusr1_handler(SIGNAL_ARGS)
 		 */
 		signal_child(StartupPID, SIGUSR2);
 	}
-
-	errno = save_errno;
 }
 
 /*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 31479e8212..ea867ccc95 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -24,7 +24,6 @@
 #include <signal.h>
 #include <unistd.h>
 #include <sys/resource.h>
-#include <sys/select.h>
 #include <sys/socket.h>
 #include <sys/time.h>
 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 9b840a6318..0cdc1e11a3 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -135,8 +135,7 @@ InitPostmasterChild(void)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitProcessLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -189,8 +188,7 @@ InitStandaloneProcess(const char *argv0)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitProcessLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -232,6 +230,13 @@ SwitchToSharedLatch(void)
 	SetLatch(MyLatch);
 }
 
+void
+InitProcessLocalLatch(void)
+{
+	MyLatch = &LocalLatchData;
+	InitLatch(MyLatch);
+}
+
 void
 SwitchBackToLocalLatch(void)
 {
diff --git a/src/include/libpq/pqsignal.h b/src/include/libpq/pqsignal.h
index 29ee5ad2b6..1e66f25b76 100644
--- a/src/include/libpq/pqsignal.h
+++ b/src/include/libpq/pqsignal.h
@@ -53,7 +53,4 @@ extern PGDLLIMPORT sigset_t StartupBlockSig;
 
 extern void pqinitmask(void);
 
-/* pqsigfunc is declared in src/include/port.h */
-extern pqsigfunc pqsignal_pm(int signo, pqsigfunc func);
-
 #endif							/* PQSIGNAL_H */
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 0ffeefc437..96b3a1e1a0 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -310,6 +310,7 @@ extern PGDLLIMPORT char *DatabasePath;
 /* now in utils/init/miscinit.c */
 extern void InitPostmasterChild(void);
 extern void InitStandaloneProcess(const char *argv0);
+extern void InitProcessLocalLatch(void);
 extern void SwitchToSharedLatch(void);
 extern void SwitchBackToLocalLatch(void);
 
-- 
2.38.1

#18Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#17)
Re: Using WaitEventSet in the postmaster

Hi,

On 2023-01-07 18:08:11 +1300, Thomas Munro wrote:

On Sat, Jan 7, 2023 at 12:25 PM Andres Freund <andres@anarazel.de> wrote:

On 2023-01-07 11:08:36 +1300, Thomas Munro wrote:

3. Is it OK to clobber the shared pending flag for SIGQUIT, SIGTERM,
SIGINT? If you send all of these extremely rapidly, it's
indeterminate which one will be seen by handle_shutdown_request().

That doesn't seem optimal. I'm mostly worried that we can end up downgrading a
shutdown request.

I was contemplating whether I needed to do some more push-ups to
prefer the first delivered signal (instead of the last), but you're
saying that it would be enough to prefer the fastest shutdown type, in
cases where more than one signal was handled between server loops.
WFM.

I don't see any need for such an order requirement - in case of receiving a
"less severe" shutdown request first, we'd process the more severe one soon
after. There's nothing to be gained by trying to follow the order of the
incoming signals.

Afaict that's also the behaviour today. pmdie() has blocks like this:
		case SIGTERM:

			/*
			 * Smart Shutdown:
			 *
			 * Wait for children to end their work, then shut down.
			 */
			if (Shutdown >= SmartShutdown)
				break;

I briefly wondered about deduplicating that code, now that we know the
requested mode ahead of the switch. So it could be something like:

	/* don't interrupt an already in-progress shutdown in a more "severe" mode */
	if (mode < Shutdown)
		return;

but that's again probably something for later.
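The "most severe pending request wins, and never downgrade" rule discussed here can be sketched as a standalone program. The mode constants mirror the ordering of postmaster.c's NoShutdown < SmartShutdown < FastShutdown < ImmediateShutdown; `record_shutdown_signal` and `process_shutdown_request_sketch` are hypothetical names. Note that this sketch compares with `<=` so that a repeated request of the same severity is ignored, matching the quoted `if (Shutdown >= SmartShutdown) break;` block, whereas the snippet above uses `<`. It also ignores the question of a signal arriving between reading and clearing the flags.

```c
#include <signal.h>

/* Same ordering as postmaster.c: a higher value means more severe. */
#define NoShutdown			0
#define SmartShutdown		1
#define FastShutdown		2
#define ImmediateShutdown	3

static volatile sig_atomic_t pending_shutdown_request;
static volatile sig_atomic_t pending_fast_shutdown_request;
static volatile sig_atomic_t pending_immediate_shutdown_request;

/* Mode of the shutdown already in progress. */
static int	Shutdown = NoShutdown;

/* Handler half: only record what was asked for. */
static void
record_shutdown_signal(int signo)
{
	switch (signo)
	{
		case SIGTERM:
			pending_shutdown_request = 1;	/* smart is implied */
			break;
		case SIGINT:
			pending_shutdown_request = 1;
			pending_fast_shutdown_request = 1;
			break;
		case SIGQUIT:
			pending_shutdown_request = 1;
			pending_immediate_shutdown_request = 1;
			break;
	}
}

/* Main-loop half: returns the mode acted on, or NoShutdown if ignored. */
static int
process_shutdown_request_sketch(void)
{
	int			mode;

	if (!pending_shutdown_request)
		return NoShutdown;

	/* Several signals may have arrived; take the most severe one. */
	if (pending_immediate_shutdown_request)
		mode = ImmediateShutdown;
	else if (pending_fast_shutdown_request)
		mode = FastShutdown;
	else
		mode = SmartShutdown;
	pending_shutdown_request = 0;
	pending_fast_shutdown_request = 0;
	pending_immediate_shutdown_request = 0;

	/* Don't interrupt an in-progress shutdown with an equal or milder one. */
	if (mode <= Shutdown)
		return NoShutdown;
	Shutdown = mode;
	return mode;
}
```

Since a milder request arriving first is simply upgraded on a later pass, the arrival order of the signals does not matter for the final outcome.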

5. Is the signal mask being correctly handled during forking? I
think so: I decided to push the masking logic directly into the
routine that forks, to make it easy to verify that all paths set the
mask the way we want.

Hm. If I understand correctly, you used sigprocmask() directly (vs
PG_SETMASK()) in fork_process() because you want the old mask? But why do we
restore the prior mask, instead of just using PG_SETMASK(&UnBlockSig); as we
still do in a bunch of places in the postmaster?

It makes zero difference in practice but I think it's a nicer way to
code it because it doesn't make an unnecessary assumption about what
the signal mask was on entry.

Heh, to me doing something slightly different than in other places actually
seems to make it a bit less nice :). There's also some benefit in the
certainty of knowing what the mask will be. But it doesn't matter much.

6. Is the naming and style OK? Better ideas welcome, but basically I
tried to avoid all unnecessary refactoring and changes, so no real
logic moves around, and the changes are pretty close to "mechanical".
One bikeshed decision was what to call the {handle,process}_XXX
functions and associated flags.

I wonder if it'd be good to have a _pm_ in the name.

I dunno about this one, it's all static stuff in a file called
postmaster.c and one (now) already has pm in it (see below).

I guess stuff like signal handlers and their state somehow seems more global
to me than their C linkage type suggests. Hence the desire to be a bit more
"namespaced" in their naming. I do find it somewhat annoying when reasonably
important global variables aren't uniquely named when using a debugger...

But again, this isn't anything that should hold up the patch.

Is there any reason to use MAXLISTEN here? We know how many sockets we're
listening on by this point, I think? No idea if the overhead matters anywhere,
but ...

Fixed.

I was thinking of determining the number once, in PostmasterMain(). But that's
perhaps better done as a separate change... WFM.

Hm. This is preexisting behaviour, but now it seems somewhat odd that we might
end up happily forking a backend for each socket without checking signals
in between. Forking might take a while, and if a signal arrived since the
WaitEventSetWait() we'll not react to it.

Right, so if you have 64 server sockets and they all have clients
waiting on their listen queue, we'll service one connection from each
before getting back to checking for pmsignals or shutdown, and that's
unchanged in this patch. (FWIW I also noticed that problem while
experimenting with the idea that you could handle multiple clients in
one go on OSes that report the listen queue depth size along with
WL_SOCKET_ACCEPT events, but didn't pursue it...)

I guess we could check every time through the nevents loop. I may
look into that in a later patch, but I prefer to keep the same policy
in this patch.

Makes sense. This was mainly me trying to make sure we're not changing subtle
stuff accidentally (and trying to understand how things work currently, as a
prerequisite).
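The idea of checking for pending signal work "every time through the nevents loop" can be sketched as follows. Everything here is hypothetical: `SketchEvent` is not PostgreSQL's WaitEvent, `do_accept` stands in for ConnCreate()/BackendStartup(), latch reset is omitted, and the policy of stopping further accepts once shutdown is requested is only an illustration of reacting between forks, not what the postmaster does.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical, simplified event record. */
typedef struct SketchEvent
{
	bool		is_latch;		/* true: latch event; false: listen socket */
	int			fd;
} SketchEvent;

static volatile sig_atomic_t pending_shutdown;
static bool shutdown_requested = false;

/* Returns the number of connections accepted from this batch. */
static int
process_events(const SketchEvent *events, int nevents,
			   void (*do_accept) (int fd))
{
	int			accepted = 0;

	for (int i = 0; i < nevents; i++)
	{
		/* Re-check handler flags before every event, not once per wait. */
		if (pending_shutdown)
		{
			pending_shutdown = 0;
			shutdown_requested = true;
		}
		if (events[i].is_latch)
			continue;			/* flags were just checked above */
		if (shutdown_requested)
			break;				/* illustrative policy: stop forking */
		do_accept(events[i].fd);	/* may be slow, e.g. a fork */
		accepted++;
	}
	return accepted;
}
```

With a signal arriving during the first (slow) accept, the second and third sockets in the same batch are not serviced, whereas checking only once per wait would fork for all of them first.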

static void
PostmasterStateMachine(void)
@@ -3796,6 +3819,9 @@ PostmasterStateMachine(void)

if (pmState == PM_WAIT_DEAD_END)
{
+             /* Don't allow any new socket connection events. */
+             ConfigurePostmasterWaitSet(false);

Hm. Is anything actually using the wait set until we re-create it with (true)
below?

Yes. While in PM_WAIT_DEAD_END state, waiting for children to exit,
there may be clients trying to connect.

Oh, Right. I had misread the diff, thinking the
ConfigurePostmasterWaitSet(false) was in the same PM_NO_CHILDREN branch that
the ConfigurePostmasterWaitSet(true) was in.

On master, we have a special pg_usleep(100000L) instead of select() just for
that state so we can ignore incoming connections while waiting for the
SIGCHLD reaper to advance our state, but in this new world that's not
enough. We need to wait for the latch to be set by
handle_child_exit_signal(). So I used the regular WES to wait for the latch
(that is, no more special case for that state), but I need to ignore socket
events. If I didn't, then an incoming connection sitting in the listen
queue would cause the server loop to burn 100% CPU reporting a
level-triggered WL_SOCKET_ACCEPT for sockets that we don't want to accept
(yet).

Yea, this is clearly the better approach.
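The CPU-burning problem described above comes from level-triggered readiness: a listening socket with an unaccepted connection in its queue reports "readable" on every wait until the connection is accepted or the socket is removed from the wait set. A minimal demonstration with plain poll() on a loopback TCP socket, assuming POSIX sockets are available (the helper names are hypothetical):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a loopback listener and connect one client that is never accepted. */
static int
make_listener_with_pending_client(int *client_out)
{
	struct sockaddr_in addr;
	socklen_t	len = sizeof(addr);
	int			listener = socket(AF_INET, SOCK_STREAM, 0);
	int			client;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;			/* let the kernel pick a free port */
	if (listener < 0 ||
		bind(listener, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
		listen(listener, 8) < 0 ||
		getsockname(listener, (struct sockaddr *) &addr, &len) < 0)
		return -1;

	client = socket(AF_INET, SOCK_STREAM, 0);
	if (client < 0 ||
		connect(client, (struct sockaddr *) &addr, sizeof(addr)) < 0)
		return -1;
	*client_out = client;
	return listener;
}

/* Wait once on the listener without accepting; returns poll()'s result. */
static int
poll_listener(int listener, int timeout_ms)
{
	struct pollfd pfd = {.fd = listener, .events = POLLIN};

	return poll(&pfd, 1, timeout_ms);
}
```

Every call to poll_listener() returns immediately while the client sits in the queue, so a loop that neither accepts nor stops watching the socket never sleeps. Dropping the sockets from the set in PM_WAIT_DEAD_END avoids exactly this.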

A few more code review comments:

DetermineSleepTime() still deals with struct timeval, which we maintain at
some effort. Just to then convert it away from struct timeval in the
WaitEventSetWait() call. That doesn't seem quite right, and is basically
introduced in this patch.
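The round trip being criticised is the inline `timeout.tv_sec * 1000 + timeout.tv_usec / 1000` conversion at the WaitEventSetWait() call. A sketch of that conversion next to a hypothetical millisecond-native alternative follows; `determine_sleep_time_ms` is not the real DetermineSleepTime(), and its 60-second cap and millisecond timestamps are assumptions of this sketch.

```c
#include <sys/time.h>

/* The conversion done inline at the WaitEventSetWait() call site. */
static long
timeval_to_ms(const struct timeval *tv)
{
	return (long) tv->tv_sec * 1000 + (long) (tv->tv_usec / 1000);
}

/*
 * Hypothetical alternative that never builds a struct timeval: sleep until
 * next_wakeup_ms, never negative, capped at 60 seconds.
 */
static int
determine_sleep_time_ms(long long now_ms, long long next_wakeup_ms)
{
	long long	delta = next_wakeup_ms - now_ms;

	if (delta < 0)
		return 0;
	if (delta > 60 * 1000)
		return 60 * 1000;
	return (int) delta;
}
```

Computing milliseconds directly removes both the intermediate struct and the sub-millisecond truncation at the call site.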

I think ServerLoop still has an outdated comment:

*
* NB: Needs to be called with signals blocked

which we aren't doing (nor need to be doing) anymore.

/*
-		 * New connection pending on any of our sockets? If so, fork a child
-		 * process to deal with it.
+		 * Latch set by signal handler, or new connection pending on any of
+		 * our sockets? If the latter, fork a child process to deal with it.
*/
-		if (selres > 0)
+		for (int i = 0; i < nevents; i++)
{
-			int			i;
-
-			for (i = 0; i < MAXLISTEN; i++)
+			if (events[i].events & WL_LATCH_SET)
{
-				if (ListenSocket[i] == PGINVALID_SOCKET)
-					break;
-				if (FD_ISSET(ListenSocket[i], &rmask))
+				ResetLatch(MyLatch);
+
+				/* Process work scheduled by signal handlers. */

Very minor: It feels a tad off to say that the work was scheduled by signal
handlers, it's either from other processes or by the OS. But ...

+/*
+ * Child processes use SIGUSR1 to send 'pmsignals'.  pg_ctl uses SIGUSR1 to ask
+ * postmaster to check for logrotate and promote files.
+ */

s/send/notify us of/, since the concrete "pmsignal" is actually transported
outside of the "OS signal" level?

LGTM.

I think this is a significant improvement, thanks for working on it.

Greetings,

Andres Freund

#19Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#18)
2 attachment(s)
Re: Using WaitEventSet in the postmaster

On Sun, Jan 8, 2023 at 11:55 AM Andres Freund <andres@anarazel.de> wrote:

On 2023-01-07 18:08:11 +1300, Thomas Munro wrote:

On Sat, Jan 7, 2023 at 12:25 PM Andres Freund <andres@anarazel.de> wrote:

On 2023-01-07 11:08:36 +1300, Thomas Munro wrote:

3. Is it OK to clobber the shared pending flag for SIGQUIT, SIGTERM,
SIGINT? If you send all of these extremely rapidly, it's
indeterminate which one will be seen by handle_shutdown_request().

That doesn't seem optimal. I'm mostly worried that we can end up downgrading a
shutdown request.

I was contemplating whether I needed to do some more push-ups to
prefer the first delivered signal (instead of the last), but you're
saying that it would be enough to prefer the fastest shutdown type, in
cases where more than one signal was handled between server loops.
WFM.

I don't see any need for such an order requirement - in case of receiving a
"less severe" shutdown request first, we'd process the more severe one soon
after. There's nothing to be gained by trying to follow the order of the
incoming signals.

Oh, I fully agree. I was working through the realisation that I might
need to serialise the handlers to implement the priority logic
correctly (upgrades good, downgrades bad), but your suggestion
fast-forwards to the right answer and doesn't require blocking, so I
prefer it, and had already gone that way in v9. In this version I've
added a comment to explain that the outcome is the same in the end,
and also fixed the flag clearing logic which was subtly wrong before.

I wonder if it'd be good to have a _pm_ in the name.

I dunno about this one, it's all static stuff in a file called
postmaster.c and one (now) already has pm in it (see below).

I guess stuff like signal handlers and their state somehow seems more global
to me than their C linkage type suggests. Hence the desire to be a bit more
"namespaced" in their naming. I do find it somewhat annoying when reasonably
important global variables aren't uniquely named when using a debugger...

Alright, renamed.

A few more code review comments:

DetermineSleepTime() still deals with struct timeval, which we maintain at
some effort. Just to then convert it away from struct timeval in the
WaitEventSetWait() call. That doesn't seem quite right, and is basically
introduced in this patch.

I agree, but I was trying to minimise the patch: signals and events
stuff is a lot already. I didn't want to touch DetermineSleepTime()'s
time logic in the same commit. But here's a separate patch for that.

I think ServerLoop still has an outdated comment:

*
* NB: Needs to be called with signals blocked

Fixed.

+ /* Process work scheduled by signal handlers. */

Very minor: It feels a tad off to say that the work was scheduled by signal
handlers, it's either from other processes or by the OS. But ...

OK, now it's "requested via signal handlers".

+/*
+ * Child processes use SIGUSR1 to send 'pmsignals'.  pg_ctl uses SIGUSR1 to ask
+ * postmaster to check for logrotate and promote files.
+ */

s/send/notify us of/, since the concrete "pmsignal" is actually transported
outside of the "OS signal" level?

Fixed.

LGTM.

Thanks. Here's v10. I'll wait a bit longer to see if anyone else has feedback.

Attachments:

v10-0001-Give-the-postmaster-a-WaitEventSet-and-a-latch.patch (text/x-patch; charset=US-ASCII)
From 01bd9b3ef6029d5e5d568b514874a23a167d826d Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 9 Nov 2022 22:59:58 +1300
Subject: [PATCH v10 1/2] Give the postmaster a WaitEventSet and a latch.

Switch to an architecture similar to regular backends, where signal
handlers just set flags instead of doing real work.

Changes:

 * Allow the postmaster to set up its own local latch.  For now, we don't
   want other backends setting a shared memory latch directly (but that
   could be made to work with more research on robustness).

 * The existing signal handlers are cut in two: a handle_pm_XXX part that
   sets a pending_pm_XXX variable and sets the local latch, and a
   process_pm_XXX part.

 * Signal handlers are now installed with the regular pqsignal()
   function rather then the special pqsignal_pm() function; the concerns
   about the portability of SA_RESTART vs select() are no longer
   relevant, as we are not using select() or relying on EINTR.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/libpq/pqcomm.c            |   3 +-
 src/backend/libpq/pqsignal.c          |  40 ---
 src/backend/postmaster/fork_process.c |  18 +-
 src/backend/postmaster/postmaster.c   | 412 +++++++++++++++-----------
 src/backend/tcop/postgres.c           |   1 -
 src/backend/utils/init/miscinit.c     |  13 +-
 src/include/libpq/pqsignal.h          |   3 -
 src/include/miscadmin.h               |   1 +
 8 files changed, 264 insertions(+), 227 deletions(-)

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 7a043bf6b0..864c9debe8 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -683,8 +683,7 @@ Setup_AF_UNIX(const char *sock_path)
  *		server port.  Set port->sock to the FD of the new connection.
  *
  * ASSUME: that this doesn't need to be non-blocking because
- *		the Postmaster uses select() to tell when the socket is ready for
- *		accept().
+ *		the Postmaster waits for the socket to be ready to accept().
  *
  * RETURNS: STATUS_OK or STATUS_ERROR
  */
diff --git a/src/backend/libpq/pqsignal.c b/src/backend/libpq/pqsignal.c
index b815be6eea..d233e3a2fd 100644
--- a/src/backend/libpq/pqsignal.c
+++ b/src/backend/libpq/pqsignal.c
@@ -97,43 +97,3 @@ pqinitmask(void)
 	sigdelset(&StartupBlockSig, SIGALRM);
 #endif
 }
-
-/*
- * Set up a postmaster signal handler for signal "signo"
- *
- * Returns the previous handler.
- *
- * This is used only in the postmaster, which has its own odd approach to
- * signal handling.  For signals with handlers, we block all signals for the
- * duration of signal handler execution.  We also do not set the SA_RESTART
- * flag; this should be safe given the tiny range of code in which the
- * postmaster ever unblocks signals.
- *
- * pqinitmask() must have been invoked previously.
- */
-pqsigfunc
-pqsignal_pm(int signo, pqsigfunc func)
-{
-	struct sigaction act,
-				oact;
-
-	act.sa_handler = func;
-	if (func == SIG_IGN || func == SIG_DFL)
-	{
-		/* in these cases, act the same as pqsignal() */
-		sigemptyset(&act.sa_mask);
-		act.sa_flags = SA_RESTART;
-	}
-	else
-	{
-		act.sa_mask = BlockSig;
-		act.sa_flags = 0;
-	}
-#ifdef SA_NOCLDSTOP
-	if (signo == SIGCHLD)
-		act.sa_flags |= SA_NOCLDSTOP;
-#endif
-	if (sigaction(signo, &act, &oact) < 0)
-		return SIG_ERR;
-	return oact.sa_handler;
-}
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index 569b52e849..509587636e 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -12,24 +12,28 @@
 #include "postgres.h"
 
 #include <fcntl.h>
+#include <signal.h>
 #include <time.h>
 #include <sys/stat.h>
 #include <sys/time.h>
 #include <unistd.h>
 
+#include "libpq/pqsignal.h"
 #include "postmaster/fork_process.h"
 
 #ifndef WIN32
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
- * child in the parent process.
+ * child in the parent process.  Signals are blocked while forking, so
+ * the child must unblock.
  */
 pid_t
 fork_process(void)
 {
 	pid_t		result;
 	const char *oomfilename;
+	sigset_t	save_mask;
 
 #ifdef LINUX_PROFILE
 	struct itimerval prof_itimer;
@@ -51,6 +55,13 @@ fork_process(void)
 	getitimer(ITIMER_PROF, &prof_itimer);
 #endif
 
+	/*
+	 * We start postmaster children with signals blocked.  This allows them to
+	 * install their own handlers before unblocking, to avoid races where they
+	 * might run the postmaster's handler and miss an important control signal.
+	 * With more analysis this could potentially be relaxed.
+	 */
+	sigprocmask(SIG_SETMASK, &BlockSig, &save_mask);
 	result = fork();
 	if (result == 0)
 	{
@@ -103,6 +114,11 @@ fork_process(void)
 		/* do post-fork initialization for random number generation */
 		pg_strong_random_init();
 	}
+	else
+	{
+		/* in parent, restore signal mask */
+		sigprocmask(SIG_SETMASK, &save_mask, NULL);
+	}
 
 	return result;
 }
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index eac3450774..8bbc1042a5 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -70,7 +70,6 @@
 #include <time.h>
 #include <sys/wait.h>
 #include <ctype.h>
-#include <sys/select.h>
 #include <sys/stat.h>
 #include <sys/socket.h>
 #include <fcntl.h>
@@ -362,6 +361,17 @@ static volatile sig_atomic_t WalReceiverRequested = false;
 static volatile bool StartWorkerNeeded = true;
 static volatile bool HaveCrashedWorker = false;
 
+/* set when signals arrive */
+static volatile sig_atomic_t pending_pm_pmsignal;
+static volatile sig_atomic_t pending_pm_child_exit;
+static volatile sig_atomic_t pending_pm_reload_request;
+static volatile sig_atomic_t pending_pm_shutdown_request;
+static volatile sig_atomic_t pending_pm_fast_shutdown_request;
+static volatile sig_atomic_t pending_pm_immediate_shutdown_request;
+
+/* I/O multiplexing object */
+static WaitEventSet *pm_wait_set;
+
 #ifdef USE_SSL
 /* Set when and if SSL has been initialized properly */
 static bool LoadedSSL = false;
@@ -380,10 +390,14 @@ static void getInstallationPaths(const char *argv0);
 static void checkControlFile(void);
 static Port *ConnCreate(int serverFd);
 static void ConnFree(Port *port);
-static void SIGHUP_handler(SIGNAL_ARGS);
-static void pmdie(SIGNAL_ARGS);
-static void reaper(SIGNAL_ARGS);
-static void sigusr1_handler(SIGNAL_ARGS);
+static void handle_pm_pmsignal_signal(SIGNAL_ARGS);
+static void handle_pm_child_exit_signal(SIGNAL_ARGS);
+static void handle_pm_reload_request_signal(SIGNAL_ARGS);
+static void handle_pm_shutdown_request_signal(SIGNAL_ARGS);
+static void process_pm_pmsignal(void);
+static void process_pm_child_exit(void);
+static void process_pm_reload_request(void);
+static void process_pm_shutdown_request(void);
 static void process_startup_packet_die(SIGNAL_ARGS);
 static void dummy_handler(SIGNAL_ARGS);
 static void StartupPacketTimeoutHandler(void);
@@ -401,7 +415,6 @@ static int	BackendStartup(Port *port);
 static int	ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done);
 static void SendNegotiateProtocolVersion(List *unrecognized_protocol_options);
 static void processCancelRequest(Port *port, void *pkt);
-static int	initMasks(fd_set *rmask);
 static void report_fork_failure_to_client(Port *port, int errnum);
 static CAC_state canAcceptConnections(int backend_type);
 static bool RandomCancelKey(int32 *cancel_key);
@@ -609,26 +622,6 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * Set up signal handlers for the postmaster process.
 	 *
-	 * In the postmaster, we use pqsignal_pm() rather than pqsignal() (which
-	 * is used by all child processes and client processes).  That has a
-	 * couple of special behaviors:
-	 *
-	 * 1. We tell sigaction() to block all signals for the duration of the
-	 * signal handler.  This is faster than our old approach of
-	 * blocking/unblocking explicitly in the signal handler, and it should also
-	 * prevent excessive stack consumption if signals arrive quickly.
-	 *
-	 * 2. We do not set the SA_RESTART flag.  This is because signals will be
-	 * blocked at all times except when ServerLoop is waiting for something to
-	 * happen, and during that window, we want signals to exit the select(2)
-	 * wait so that ServerLoop can respond if anything interesting happened.
-	 * On some platforms, signals marked SA_RESTART would not cause the
-	 * select() wait to end.
-	 *
-	 * Child processes will generally want SA_RESTART, so pqsignal() sets that
-	 * flag.  We expect children to set up their own handlers before
-	 * unblocking signals.
-	 *
 	 * CAUTION: when changing this list, check for side-effects on the signal
 	 * handling setup of child processes.  See tcop/postgres.c,
 	 * bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/walwriter.c,
@@ -638,26 +631,19 @@ PostmasterMain(int argc, char *argv[])
 	pqinitmask();
 	PG_SETMASK(&BlockSig);
 
-	pqsignal_pm(SIGHUP, SIGHUP_handler);	/* reread config file and have
-											 * children do same */
-	pqsignal_pm(SIGINT, pmdie); /* send SIGTERM and shut down */
-	pqsignal_pm(SIGQUIT, pmdie);	/* send SIGQUIT and die */
-	pqsignal_pm(SIGTERM, pmdie);	/* wait for children and shut down */
-	pqsignal_pm(SIGALRM, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGPIPE, SIG_IGN);	/* ignored */
-	pqsignal_pm(SIGUSR1, sigusr1_handler);	/* message from child process */
-	pqsignal_pm(SIGUSR2, dummy_handler);	/* unused, reserve for children */
-	pqsignal_pm(SIGCHLD, reaper);	/* handle child termination */
-
-#ifdef SIGURG
+	pqsignal(SIGHUP, handle_pm_reload_request_signal);
+	pqsignal(SIGINT, handle_pm_shutdown_request_signal);
+	pqsignal(SIGQUIT, handle_pm_shutdown_request_signal);
+	pqsignal(SIGTERM, handle_pm_shutdown_request_signal);
+	pqsignal(SIGALRM, SIG_IGN); /* ignored */
+	pqsignal(SIGPIPE, SIG_IGN); /* ignored */
+	pqsignal(SIGUSR1, handle_pm_pmsignal_signal);
+	pqsignal(SIGUSR2, dummy_handler);	/* unused, reserve for children */
+	pqsignal(SIGCHLD, handle_pm_child_exit_signal);
 
-	/*
-	 * Ignore SIGURG for now.  Child processes may change this (see
-	 * InitializeLatchSupport), but they will not receive any such signals
-	 * until they wait on a latch.
-	 */
-	pqsignal_pm(SIGURG, SIG_IGN);	/* ignored */
-#endif
+	/* This may configure SIGURG, depending on platform. */
+	InitializeLatchSupport();
+	InitProcessLocalLatch();
 
 	/*
 	 * No other place in Postgres should touch SIGTTIN/SIGTTOU handling.  We
@@ -667,17 +653,20 @@ PostmasterMain(int argc, char *argv[])
 	 * child processes should just allow the inherited settings to stand.
 	 */
 #ifdef SIGTTIN
-	pqsignal_pm(SIGTTIN, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTIN, SIG_IGN); /* ignored */
 #endif
 #ifdef SIGTTOU
-	pqsignal_pm(SIGTTOU, SIG_IGN);	/* ignored */
+	pqsignal(SIGTTOU, SIG_IGN); /* ignored */
 #endif
 
 	/* ignore SIGXFSZ, so that ulimit violations work like disk full */
 #ifdef SIGXFSZ
-	pqsignal_pm(SIGXFSZ, SIG_IGN);	/* ignored */
+	pqsignal(SIGXFSZ, SIG_IGN); /* ignored */
 #endif
 
+	/* Begin accepting signals. */
+	PG_SETMASK(&UnBlockSig);
+
 	/*
 	 * Options setup
 	 */
@@ -1698,105 +1687,107 @@ DetermineSleepTime(struct timeval *timeout)
 	}
 }
 
+/*
+ * Activate or deactivate notifications of server socket events.  Since we
+ * don't currently have a way to remove events from an existing WaitEventSet,
+ * we'll just destroy and recreate the whole thing.  This is called during
+ * shutdown so we can wait for backends to exit without accepting new
+ * connections, and during crash reinitialization when we need to start
+ * listening for new connections again.  This will be freed in fork children by
+ * ClosePostmasterPorts().
+ */
+static void
+ConfigurePostmasterWaitSet(bool accept_connections)
+{
+	int			nsockets;
+
+	if (pm_wait_set)
+		FreeWaitEventSet(pm_wait_set);
+	pm_wait_set = NULL;
+
+	/* How many server sockets do we need to wait for? */
+	nsockets = 0;
+	if (accept_connections)
+	{
+		while (nsockets < MAXLISTEN &&
+			   ListenSocket[nsockets] != PGINVALID_SOCKET)
+			++nsockets;
+	}
+
+	pm_wait_set = CreateWaitEventSet(CurrentMemoryContext, 1 + nsockets);
+	AddWaitEventToSet(pm_wait_set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch,
+					  NULL);
+
+	if (accept_connections)
+	{
+		for (int i = 0; i < nsockets; i++)
+			AddWaitEventToSet(pm_wait_set, WL_SOCKET_ACCEPT, ListenSocket[i],
+							  NULL, NULL);
+	}
+}
+
 /*
  * Main idle loop of postmaster
- *
- * NB: Needs to be called with signals blocked
  */
 static int
 ServerLoop(void)
 {
-	fd_set		readmask;
-	int			nSockets;
 	time_t		last_lockfile_recheck_time,
 				last_touch_time;
+	WaitEvent	events[MAXLISTEN];
+	int			nevents;
 
+	ConfigurePostmasterWaitSet(true);
 	last_lockfile_recheck_time = last_touch_time = time(NULL);
 
-	nSockets = initMasks(&readmask);
-
 	for (;;)
 	{
-		fd_set		rmask;
-		int			selres;
 		time_t		now;
+		struct timeval timeout;
 
-		/*
-		 * Wait for a connection request to arrive.
-		 *
-		 * We block all signals except while sleeping. That makes it safe for
-		 * signal handlers, which again block all signals while executing, to
-		 * do nontrivial work.
-		 *
-		 * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
-		 * any new connections, so we don't call select(), and just sleep.
-		 */
-		memcpy((char *) &rmask, (char *) &readmask, sizeof(fd_set));
+		DetermineSleepTime(&timeout);
 
-		if (pmState == PM_WAIT_DEAD_END)
-		{
-			PG_SETMASK(&UnBlockSig);
-
-			pg_usleep(100000L); /* 100 msec seems reasonable */
-			selres = 0;
-
-			PG_SETMASK(&BlockSig);
-		}
-		else
-		{
-			/* must set timeout each time; some OSes change it! */
-			struct timeval timeout;
-
-			/* Needs to run with blocked signals! */
-			DetermineSleepTime(&timeout);
-
-			PG_SETMASK(&UnBlockSig);
-
-			selres = select(nSockets, &rmask, NULL, NULL, &timeout);
-
-			PG_SETMASK(&BlockSig);
-		}
-
-		/* Now check the select() result */
-		if (selres < 0)
-		{
-			if (errno != EINTR && errno != EWOULDBLOCK)
-			{
-				ereport(LOG,
-						(errcode_for_socket_access(),
-						 errmsg("select() failed in postmaster: %m")));
-				return STATUS_ERROR;
-			}
-		}
+		nevents = WaitEventSetWait(pm_wait_set,
+								   timeout.tv_sec * 1000 + timeout.tv_usec / 1000,
+								   events,
+								   lengthof(events),
+								   0 /* postmaster posts no wait_events */ );
 
 		/*
-		 * New connection pending on any of our sockets? If so, fork a child
-		 * process to deal with it.
+		 * Latch set by signal handler, or new connection pending on any of
+		 * our sockets? If the latter, fork a child process to deal with it.
 		 */
-		if (selres > 0)
+		for (int i = 0; i < nevents; i++)
 		{
-			int			i;
-
-			for (i = 0; i < MAXLISTEN; i++)
+			if (events[i].events & WL_LATCH_SET)
 			{
-				if (ListenSocket[i] == PGINVALID_SOCKET)
-					break;
-				if (FD_ISSET(ListenSocket[i], &rmask))
+				ResetLatch(MyLatch);
+
+				/* Process work requested via signal handlers. */
+				if (pending_pm_shutdown_request)
+					process_pm_shutdown_request();
+				if (pending_pm_child_exit)
+					process_pm_child_exit();
+				if (pending_pm_reload_request)
+					process_pm_reload_request();
+				if (pending_pm_pmsignal)
+					process_pm_pmsignal();
+			}
+			else if (events[i].events & WL_SOCKET_ACCEPT)
+			{
+				Port	   *port;
+
+				port = ConnCreate(events[i].fd);
+				if (port)
 				{
-					Port	   *port;
+					BackendStartup(port);
 
-					port = ConnCreate(ListenSocket[i]);
-					if (port)
-					{
-						BackendStartup(port);
-
-						/*
-						 * We no longer need the open socket or port structure
-						 * in this process
-						 */
-						StreamClose(port->sock);
-						ConnFree(port);
-					}
+					/*
+					 * We no longer need the open socket or port structure in
+					 * this process
+					 */
+					StreamClose(port->sock);
+					ConnFree(port);
 				}
 			}
 		}
@@ -1939,34 +1930,6 @@ ServerLoop(void)
 	}
 }
 
-/*
- * Initialise the masks for select() for the ports we are listening on.
- * Return the number of sockets to listen on.
- */
-static int
-initMasks(fd_set *rmask)
-{
-	int			maxsock = -1;
-	int			i;
-
-	FD_ZERO(rmask);
-
-	for (i = 0; i < MAXLISTEN; i++)
-	{
-		int			fd = ListenSocket[i];
-
-		if (fd == PGINVALID_SOCKET)
-			break;
-		FD_SET(fd, rmask);
-
-		if (fd > maxsock)
-			maxsock = fd;
-	}
-
-	return maxsock + 1;
-}
-
-
 /*
  * Read a client's startup packet and do something according to it.
  *
@@ -2609,6 +2572,10 @@ ClosePostmasterPorts(bool am_syslogger)
 {
 	int			i;
 
+	/* Release resources held by the postmaster's WaitEventSet. */
+	if (pm_wait_set)
+		FreeWaitEventSetAfterFork(pm_wait_set);
+
 #ifndef WIN32
 
 	/*
@@ -2707,15 +2674,46 @@ InitProcessGlobals(void)
 #endif
 }
 
+/*
+ * Child processes use SIGUSR1 to notify us of 'pmsignals'.  pg_ctl uses
+ * SIGUSR1 to ask postmaster to check for logrotate and promote files.
+ */
+static void
+handle_pm_pmsignal_signal(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	pending_pm_pmsignal = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
 
 /*
- * SIGHUP -- reread config files, and tell children to do same
+ * pg_ctl uses SIGHUP to request a reload of the configuration files.
  */
 static void
-SIGHUP_handler(SIGNAL_ARGS)
+handle_pm_reload_request_signal(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
 
+	pending_pm_reload_request = true;
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Re-read config files, and tell children to do same.
+ */
+static void
+process_pm_reload_request(void)
+{
+	pending_pm_reload_request = false;
+
+	ereport(DEBUG2,
+			(errmsg_internal("postmaster received reload request signal")));
+
 	if (Shutdown <= SmartShutdown)
 	{
 		ereport(LOG,
@@ -2771,26 +2769,71 @@ SIGHUP_handler(SIGNAL_ARGS)
 		write_nondefault_variables(PGC_SIGHUP);
 #endif
 	}
+}
+
+/*
+ * pg_ctl uses SIGTERM, SIGINT and SIGQUIT to request different types of
+ * shutdown.
+ */
+static void
+handle_pm_shutdown_request_signal(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	switch (postgres_signal_arg)
+	{
+		case SIGTERM:
+			/* smart is implied if the other two flags aren't set */
+			pending_pm_shutdown_request = true;
+			break;
+		case SIGINT:
+			pending_pm_fast_shutdown_request = true;
+			pending_pm_shutdown_request = true;
+			break;
+		case SIGQUIT:
+			pending_pm_immediate_shutdown_request = true;
+			pending_pm_shutdown_request = true;
+			break;
+	}
+	SetLatch(MyLatch);
 
 	errno = save_errno;
 }
 
-
 /*
- * pmdie -- signal handler for processing various postmaster signals.
+ * Process shutdown request.
  */
 static void
-pmdie(SIGNAL_ARGS)
+process_pm_shutdown_request(void)
 {
-	int			save_errno = errno;
+	int			mode;
 
 	ereport(DEBUG2,
-			(errmsg_internal("postmaster received signal %d",
-							 postgres_signal_arg)));
+			(errmsg_internal("postmaster received shutdown request signal")));
 
-	switch (postgres_signal_arg)
+	pending_pm_shutdown_request = false;
+
+	/*
+	 * If more than one shutdown request signal arrived since the last server
+	 * loop, take the one that is the most immediate.  That matches the
+	 * priority that would apply if we processed them one by one in any order.
+	 */
+	if (pending_pm_immediate_shutdown_request)
 	{
-		case SIGTERM:
+		pending_pm_immediate_shutdown_request = false;
+		mode = ImmediateShutdown;
+	}
+	else if (pending_pm_fast_shutdown_request)
+	{
+		pending_pm_fast_shutdown_request = false;
+		mode = FastShutdown;
+	}
+	else
+		mode = SmartShutdown;
+
+	switch (mode)
+	{
+		case SmartShutdown:
 
 			/*
 			 * Smart Shutdown:
@@ -2830,7 +2873,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGINT:
+		case FastShutdown:
 
 			/*
 			 * Fast Shutdown:
@@ -2871,7 +2914,7 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 
-		case SIGQUIT:
+		case ImmediateShutdown:
 
 			/*
 			 * Immediate Shutdown:
@@ -2908,20 +2951,30 @@ pmdie(SIGNAL_ARGS)
 			PostmasterStateMachine();
 			break;
 	}
+}
+
+static void
+handle_pm_child_exit_signal(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	pending_pm_child_exit = true;
+	SetLatch(MyLatch);
 
 	errno = save_errno;
 }
 
 /*
- * Reaper -- signal handler to cleanup after a child process dies.
+ * Cleanup after a child process dies.
  */
 static void
-reaper(SIGNAL_ARGS)
+process_pm_child_exit(void)
 {
-	int			save_errno = errno;
 	int			pid;			/* process id of dead child process */
 	int			exitstatus;		/* its exit status */
 
+	pending_pm_child_exit = false;
+
 	ereport(DEBUG4,
 			(errmsg_internal("reaping dead processes")));
 
@@ -3213,8 +3266,6 @@ reaper(SIGNAL_ARGS)
 	 * or actions to make.
 	 */
 	PostmasterStateMachine();
-
-	errno = save_errno;
 }
 
 /*
@@ -3642,8 +3693,9 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 /*
  * Advance the postmaster's state machine and take actions as appropriate
  *
- * This is common code for pmdie(), reaper() and sigusr1_handler(), which
- * receive the signals that might mean we need to change state.
+ * This is common code for process_pm_shutdown_request(),
+ * process_pm_child_exit() and process_pm_pmsignal(), which process the
+ * signals that might mean we need to change state.
  */
 static void
 PostmasterStateMachine(void)
@@ -3796,6 +3848,9 @@ PostmasterStateMachine(void)
 
 	if (pmState == PM_WAIT_DEAD_END)
 	{
+		/* Don't allow any new socket connection events. */
+		ConfigurePostmasterWaitSet(false);
+
 		/*
 		 * PM_WAIT_DEAD_END state ends when the BackendList is entirely empty
 		 * (ie, no dead_end children remain), and the archiver is gone too.
@@ -3905,6 +3960,9 @@ PostmasterStateMachine(void)
 		pmState = PM_STARTUP;
 		/* crash recovery started, reset SIGKILL flag */
 		AbortStartTime = 0;
+
+		/* start accepting server socket connection events again */
+		ConfigurePostmasterWaitSet(true);
 	}
 }
 
@@ -5013,12 +5071,16 @@ ExitPostmaster(int status)
 }
 
 /*
- * sigusr1_handler - handle signal conditions from child processes
+ * Handle pmsignal conditions representing requests from backends,
+ * and check for promote and logrotate requests from pg_ctl.
  */
 static void
-sigusr1_handler(SIGNAL_ARGS)
+process_pm_pmsignal(void)
 {
-	int			save_errno = errno;
+	pending_pm_pmsignal = false;
+
+	ereport(DEBUG2,
+			(errmsg_internal("postmaster received action request signal")));
 
 	/*
 	 * RECOVERY_STARTED and BEGIN_HOT_STANDBY signals are ignored in
@@ -5159,8 +5221,6 @@ sigusr1_handler(SIGNAL_ARGS)
 		 */
 		signal_child(StartupPID, SIGUSR2);
 	}
-
-	errno = save_errno;
 }
 
 /*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 224ab290af..470b734e9e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -24,7 +24,6 @@
 #include <signal.h>
 #include <unistd.h>
 #include <sys/resource.h>
-#include <sys/select.h>
 #include <sys/socket.h>
 #include <sys/time.h>
 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 9b840a6318..0cdc1e11a3 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -135,8 +135,7 @@ InitPostmasterChild(void)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitProcessLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -189,8 +188,7 @@ InitStandaloneProcess(const char *argv0)
 
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
-	MyLatch = &LocalLatchData;
-	InitLatch(MyLatch);
+	InitProcessLocalLatch();
 	InitializeLatchWaitSet();
 
 	/*
@@ -232,6 +230,13 @@ SwitchToSharedLatch(void)
 	SetLatch(MyLatch);
 }
 
+void
+InitProcessLocalLatch(void)
+{
+	MyLatch = &LocalLatchData;
+	InitLatch(MyLatch);
+}
+
 void
 SwitchBackToLocalLatch(void)
 {
diff --git a/src/include/libpq/pqsignal.h b/src/include/libpq/pqsignal.h
index 29ee5ad2b6..1e66f25b76 100644
--- a/src/include/libpq/pqsignal.h
+++ b/src/include/libpq/pqsignal.h
@@ -53,7 +53,4 @@ extern PGDLLIMPORT sigset_t StartupBlockSig;
 
 extern void pqinitmask(void);
 
-/* pqsigfunc is declared in src/include/port.h */
-extern pqsigfunc pqsignal_pm(int signo, pqsigfunc func);
-
 #endif							/* PQSIGNAL_H */
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 0ffeefc437..96b3a1e1a0 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -310,6 +310,7 @@ extern PGDLLIMPORT char *DatabasePath;
 /* now in utils/init/miscinit.c */
 extern void InitPostmasterChild(void);
 extern void InitStandaloneProcess(const char *argv0);
+extern void InitProcessLocalLatch(void);
 extern void SwitchToSharedLatch(void);
 extern void SwitchBackToLocalLatch(void);
 
-- 
2.38.1
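The new handlers in the patch above only record which shutdown signals arrived. The real decision is deferred to process_pm_shutdown_request(), which runs from the main loop and picks the most immediate pending request. Below is a minimal, self-contained sketch of that coalescing logic. It assumes plain POSIX sigaction() and a simplified enum. ShutdownMode and the function names are hypothetical stand-ins, not PostgreSQL's.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the postmaster's shutdown modes. */
typedef enum
{
	NoShutdown,
	SmartShutdown,
	FastShutdown,
	ImmediateShutdown
} ShutdownMode;

/* Written by the signal handler, read by the main loop. */
static volatile sig_atomic_t pending_shutdown_request;
static volatile sig_atomic_t pending_fast_shutdown_request;
static volatile sig_atomic_t pending_immediate_shutdown_request;

/* The handler only records what was asked for; no state machine work here. */
static void
handle_shutdown_request_signal(int signo)
{
	switch (signo)
	{
		case SIGTERM:			/* smart is implied if no other flag is set */
			pending_shutdown_request = true;
			break;
		case SIGINT:
			pending_fast_shutdown_request = true;
			pending_shutdown_request = true;
			break;
		case SIGQUIT:
			pending_immediate_shutdown_request = true;
			pending_shutdown_request = true;
			break;
	}
	/* In the patch, SetLatch(MyLatch) follows here to wake the main loop. */
}

static void
install_shutdown_handlers(void)
{
	struct sigaction sa;

	sa.sa_handler = handle_shutdown_request_signal;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sigaction(SIGTERM, &sa, NULL);
	sigaction(SIGINT, &sa, NULL);
	sigaction(SIGQUIT, &sa, NULL);
}

/* Called from the main loop: pick the most immediate pending request. */
static ShutdownMode
process_shutdown_request(void)
{
	if (!pending_shutdown_request)
		return NoShutdown;
	pending_shutdown_request = false;

	if (pending_immediate_shutdown_request)
	{
		pending_immediate_shutdown_request = false;
		return ImmediateShutdown;
	}
	if (pending_fast_shutdown_request)
	{
		pending_fast_shutdown_request = false;
		return FastShutdown;
	}
	return SmartShutdown;
}
```

For example, suppose SIGTERM and then SIGQUIT are raised before the loop gets around to calling process_shutdown_request(). The call should return ImmediateShutdown, and a second call should return NoShutdown.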

v10-0002-Refactor-DetermineSleepTime-to-use-milliseconds.patch (text/x-patch; charset=US-ASCII)
From 35eb0bee3b4e90115c75739c16892c158b375c40 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 11 Jan 2023 14:48:06 +1300
Subject: [PATCH v10 2/2] Refactor DetermineSleepTime() to use milliseconds.

Since we're not using select() anymore, we don't need to bother with
struct timeval.  We can work directly in milliseconds, which the latch
API wants.  This change was kept separate to make review easier.
---
 src/backend/postmaster/postmaster.c | 59 ++++++++++-------------------
 1 file changed, 19 insertions(+), 40 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 8bbc1042a5..b55e3e77bc 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1586,7 +1586,7 @@ checkControlFile(void)
 }
 
 /*
- * Determine how long should we let ServerLoop sleep.
+ * Determine how long should we let ServerLoop sleep, in milliseconds.
  *
  * In normal conditions we wait at most one minute, to ensure that the other
  * background tasks handled by ServerLoop get done even when no requests are
@@ -1594,8 +1594,8 @@ checkControlFile(void)
  * we don't actually sleep so that they are quickly serviced.  Other exception
  * cases are as shown in the code.
  */
-static void
-DetermineSleepTime(struct timeval *timeout)
+static int
+DetermineSleepTime(void)
 {
 	TimestampTz next_wakeup = 0;
 
@@ -1608,26 +1608,20 @@ DetermineSleepTime(struct timeval *timeout)
 	{
 		if (AbortStartTime != 0)
 		{
+			int			seconds;
+
 			/* time left to abort; clamp to 0 in case it already expired */
-			timeout->tv_sec = SIGKILL_CHILDREN_AFTER_SECS -
-				(time(NULL) - AbortStartTime);
-			timeout->tv_sec = Max(timeout->tv_sec, 0);
-			timeout->tv_usec = 0;
+			seconds = Max(0,
+						  SIGKILL_CHILDREN_AFTER_SECS - (time(NULL) - AbortStartTime));
+
+			return Max(seconds * 1000, 0);
 		}
 		else
-		{
-			timeout->tv_sec = 60;
-			timeout->tv_usec = 0;
-		}
-		return;
+			return 60 * 1000;
 	}
 
 	if (StartWorkerNeeded)
-	{
-		timeout->tv_sec = 0;
-		timeout->tv_usec = 0;
-		return;
-	}
+		return 0;
 
 	if (HaveCrashedWorker)
 	{
@@ -1665,26 +1659,14 @@ DetermineSleepTime(struct timeval *timeout)
 
 	if (next_wakeup != 0)
 	{
-		long		secs;
-		int			microsecs;
-
-		TimestampDifference(GetCurrentTimestamp(), next_wakeup,
-							&secs, &microsecs);
-		timeout->tv_sec = secs;
-		timeout->tv_usec = microsecs;
-
-		/* Ensure we don't exceed one minute */
-		if (timeout->tv_sec > 60)
-		{
-			timeout->tv_sec = 60;
-			timeout->tv_usec = 0;
-		}
-	}
-	else
-	{
-		timeout->tv_sec = 60;
-		timeout->tv_usec = 0;
+		/* Ensure we don't exceed one minute, or go under 0. */
+		return Max(0,
+				   Min(60 * 1000,
+					   TimestampDifferenceMilliseconds(GetCurrentTimestamp(),
+													   next_wakeup)));
 	}
+
+	return 60 * 1000;
 }
 
 /*
@@ -1743,12 +1725,9 @@ ServerLoop(void)
 	for (;;)
 	{
 		time_t		now;
-		struct timeval timeout;
-
-		DetermineSleepTime(&timeout);
 
 		nevents = WaitEventSetWait(pm_wait_set,
-								   timeout.tv_sec * 1000 + timeout.tv_usec / 1000,
+								   DetermineSleepTime(),
 								   events,
 								   lengthof(events),
 								   0 /* postmaster posts no wait_events */ );
-- 
2.38.1
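In the patch above, the millisecond version of DetermineSleepTime() mostly comes down to clamping a deadline into the range 0 to 60000 ms. The sketch below shows the two clamps, with the current time passed in explicitly so it can be tested. The helper names are hypothetical. The 5-second abort window is an assumed value standing in for SIGKILL_CHILDREN_AFTER_SECS.

```c
#include <time.h>

/* Assumed value standing in for SIGKILL_CHILDREN_AFTER_SECS. */
#define SKETCH_SIGKILL_AFTER_SECS	5
/* Never sleep longer than one minute. */
#define SKETCH_MAX_SLEEP_MS			(60 * 1000)

/*
 * Milliseconds left before escalating to SIGKILL, clamped to 0 if the
 * deadline has already passed.  abort_start is when the abort began.
 */
static int
sleep_ms_until_abort_deadline(time_t abort_start, time_t now)
{
	long		seconds = SKETCH_SIGKILL_AFTER_SECS - (long) (now - abort_start);

	if (seconds < 0)
		seconds = 0;
	return (int) (seconds * 1000);
}

/*
 * Clamp the distance to the next scheduled wakeup into [0, 60000] ms,
 * as the refactored function does with Min() and Max().
 */
static int
clamp_sleep_ms(long ms_until_wakeup)
{
	if (ms_until_wakeup < 0)
		return 0;
	if (ms_until_wakeup > SKETCH_MAX_SLEEP_MS)
		return SKETCH_MAX_SLEEP_MS;
	return (int) ms_until_wakeup;
}
```

Returning a plain int of milliseconds means the result can be passed straight to WaitEventSetWait(). There is no struct timeval to reset before each wait, and no tv_sec * 1000 + tv_usec / 1000 conversion.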

#20Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#19)
Re: Using WaitEventSet in the postmaster

On Wed, Jan 11, 2023 at 4:07 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Thanks. Here's v10. I'll wait a bit longer to see if anyone else has feedback.

Pushed, after a few very minor adjustments, mostly comments. Thanks
for the reviews and pointers. I think there are quite a lot of
refactoring and refinement opportunities unlocked by this change (I
have some draft proposals already), but for now I'll keep an eye on
the build farm.

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#20)
Re: Using WaitEventSet in the postmaster

Thomas Munro <thomas.munro@gmail.com> writes:

Pushed, after a few very minor adjustments, mostly comments. Thanks
for the reviews and pointers. I think there are quite a lot of
refactoring and refinement opportunities unlocked by this change (I
have some draft proposals already), but for now I'll keep an eye on
the build farm.

skink seems to have found a problem:

==2011873== VALGRINDERROR-BEGIN
==2011873== Syscall param epoll_wait(events) points to unaddressable byte(s)
==2011873== at 0x4D8DC73: epoll_wait (epoll_wait.c:30)
==2011873== by 0x55CA49: WaitEventSetWaitBlock (latch.c:1527)
==2011873== by 0x55D591: WaitEventSetWait (latch.c:1473)
==2011873== by 0x4F2B28: ServerLoop (postmaster.c:1729)
==2011873== by 0x4F3E85: PostmasterMain (postmaster.c:1452)
==2011873== by 0x42643C: main (main.c:200)
==2011873== Address 0x7b1e620 is 1,360 bytes inside a recently re-allocated block of size 8,192 alloc'd
==2011873== at 0x48407B4: malloc (vg_replace_malloc.c:381)
==2011873== by 0x6D9D30: AllocSetContextCreateInternal (aset.c:446)
==2011873== by 0x4F2D9B: PostmasterMain (postmaster.c:614)
==2011873== by 0x42643C: main (main.c:200)
==2011873==
==2011873== VALGRINDERROR-END

regards, tom lane

#22Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#21)
Re: Using WaitEventSet in the postmaster

On Thu, Jan 12, 2023 at 7:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

skink seems to have found a problem:

==2011873== VALGRINDERROR-BEGIN
==2011873== Syscall param epoll_wait(events) points to unaddressable byte(s)
==2011873== at 0x4D8DC73: epoll_wait (epoll_wait.c:30)
==2011873== by 0x55CA49: WaitEventSetWaitBlock (latch.c:1527)
==2011873== by 0x55D591: WaitEventSetWait (latch.c:1473)
==2011873== by 0x4F2B28: ServerLoop (postmaster.c:1729)
==2011873== by 0x4F3E85: PostmasterMain (postmaster.c:1452)
==2011873== by 0x42643C: main (main.c:200)
==2011873== Address 0x7b1e620 is 1,360 bytes inside a recently re-allocated block of size 8,192 alloc'd
==2011873== at 0x48407B4: malloc (vg_replace_malloc.c:381)
==2011873== by 0x6D9D30: AllocSetContextCreateInternal (aset.c:446)
==2011873== by 0x4F2D9B: PostmasterMain (postmaster.c:614)
==2011873== by 0x42643C: main (main.c:200)
==2011873==
==2011873== VALGRINDERROR-END

Repro'd here on Valgrind. Oh, that's interesting. WaitEventSetWait()
wants to use an internal buffer of the size given to the constructor
function, but passes the size of the caller's output buffer to
epoll_wait() and friends. Perhaps it should use Min(nevents,
set->nevents_space). I mean, I should have noticed that, but I think
that's arguably a pre-existing bug in the WES code, or at least an
unhelpful interface. Thinking...
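The mismatch described here can be shown in miniature. The internal buffer that the kernel writes into is sized when the set is created. The caller passes a separately sized output array. The request to the kernel therefore has to be capped at the smaller of the two. This is a hypothetical sketch: fake_kernel_wait() stands in for epoll_wait()/kevent(), and none of these names are PostgreSQL's.

```c
#include <string.h>

#define SKETCH_MIN(x, y)	((x) < (y) ? (x) : (y))

/* Hypothetical event record, standing in for epoll/kqueue result structs. */
typedef struct
{
	int			fd;
	unsigned	events;
} SketchEvent;

/* A wait set whose internal result buffer is sized at construction time. */
typedef struct
{
	int			nevents_space;	/* capacity of ret_events[] */
	SketchEvent *ret_events;	/* internal buffer the "kernel" fills */
} SketchWaitSet;

/* Stand-in for the kernel: writes up to maxevents ready events into buf. */
static int
fake_kernel_wait(SketchEvent *buf, int maxevents, int nready)
{
	int			n = SKETCH_MIN(maxevents, nready);

	for (int i = 0; i < n; i++)
	{
		buf[i].fd = i;
		buf[i].events = 1;
	}
	return n;
}

/*
 * Copy ready events into the caller's array of size nevents.  Whatever the
 * kernel returns must fit in BOTH buffers, hence the cap.
 */
static int
sketch_wait(SketchWaitSet *set, SketchEvent *occurred, int nevents, int nready)
{
	int			rc = fake_kernel_wait(set->ret_events,
									  SKETCH_MIN(nevents, set->nevents_space),
									  nready);

	memcpy(occurred, set->ret_events, (size_t) rc * sizeof(SketchEvent));
	return rc;
}
```

Take a 3-slot internal buffer and a 64-slot caller array. Asking for 64 events without the cap would let the fake kernel write past the internal buffer. With the cap, at most 3 events come back.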

#23Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#22)
1 attachment(s)
Re: Using WaitEventSet in the postmaster

On Thu, Jan 12, 2023 at 7:57 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Repro'd here on Valgrind. Oh, that's interesting. WaitEventSetWait()
wants to use an internal buffer of the size given to the constructor
function, but passes the size of the caller's output buffer to
epoll_wait() and friends. Perhaps it should use Min(nevents,
set->nevents_space). I mean, I should have noticed that, but I think
that's arguably a pre-existing bug in the WES code, or at least an
unhelpful interface. Thinking...

Yeah. This stops valgrind complaining here.

Attachments:

0001-Fix-WaitEventSetWait-buffer-overrun.patch (text/x-patch; charset=US-ASCII)
From f3d201a2509affc8248a930f18a58cdb7ed3220f Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 12 Jan 2023 20:05:38 +1300
Subject: [PATCH] Fix WaitEventSetWait() buffer overrun.

The WAIT_USE_EPOLL and WAIT_USE_KQUEUE implementations of
WaitEventSetWaitBlock() confused the size of their internal buffer with
the size of the caller's output buffer, and could ask the kernel for too
many events.  In fact the set of events retrieved from the kernel needs
to be able to fit in both buffers, so take the minimum of the two.

The WAIT_USE_POLL and WAIT_USE_WIN32 implementations didn't have this
confusion.

This probably didn't come up before because we always used the same
number in both places, but commit 7389aad6 calculates a dynamic size at
construction time, while using MAXLISTEN for its output event buffer on
the stack.  That seems like a reasonable thing to want to do, so
consider this to be a pre-existing bug worth fixing.

As reported by skink, valgrind and Tom Lane.

Discussion: https://postgr.es/m/901504.1673504836%40sss.pgh.pa.us
---
 src/backend/storage/ipc/latch.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index a238c5827c..d79d71a851 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1525,7 +1525,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 
 	/* Sleep */
 	rc = epoll_wait(set->epoll_fd, set->epoll_ret_events,
-					nevents, cur_timeout);
+					Min(nevents, set->nevents_space), cur_timeout);
 
 	/* Check return code */
 	if (rc < 0)
@@ -1685,7 +1685,8 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 
 	/* Sleep */
 	rc = kevent(set->kqueue_fd, NULL, 0,
-				set->kqueue_ret_events, nevents,
+				set->kqueue_ret_events,
+				Min(nevents, set->nevents_space),
 				timeout_p);
 
 	/* Check return code */
-- 
2.35.1

#24Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#23)
Re: Using WaitEventSet in the postmaster

Hi,

On 2023-01-12 20:35:43 +1300, Thomas Munro wrote:

Subject: [PATCH] Fix WaitEventSetWait() buffer overrun.

The WAIT_USE_EPOLL and WAIT_USE_KQUEUE implementations of
WaitEventSetWaitBlock() confused the size of their internal buffer with
the size of the caller's output buffer, and could ask the kernel for too
many events. In fact the set of events retrieved from the kernel needs
to be able to fit in both buffers, so take the minimum of the two.

The WAIT_USE_POLL and WAIT_USE_WIN32 implementations didn't have this
confusion.

This probably didn't come up before because we always used the same
number in both places, but commit 7389aad6 calculates a dynamic size at
construction time, while using MAXLISTEN for its output event buffer on
the stack. That seems like a reasonable thing to want to do, so
consider this to be a pre-existing bug worth fixing.

As reported by skink, valgrind and Tom Lane.

Discussion: /messages/by-id/901504.1673504836@sss.pgh.pa.us

Makes sense. We should backpatch this, I think?

Greetings,

Andres Freund

#25Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#24)
Re: Using WaitEventSet in the postmaster

On Fri, Jan 13, 2023 at 7:26 AM Andres Freund <andres@anarazel.de> wrote:

On 2023-01-12 20:35:43 +1300, Thomas Munro wrote:

Subject: [PATCH] Fix WaitEventSetWait() buffer overrun.

Makes sense. We should backpatch this, I think?

Done.

#26Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#25)
1 attachment(s)
Re: Using WaitEventSet in the postmaster

The nearby thread about searching for uses of volatile reminded me: we
can now drop a bunch of these in postmaster.c. The patch I originally
wrote to do that as part of this series somehow morphed into an
experimental patch to nuke all global variables[1], but of course we
should at least drop the now redundant use of volatile and
sig_atomic_t. See attached.

[1]: /messages/by-id/CA+hUKGKH_RPAo=NgPfHKj--565aL1qiVpUGdWt1_pmJehY+dmw@mail.gmail.com
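The point here is about who touches each variable. Once the signal handlers only set a flag and wake the main loop, every other variable is read and written from ordinary code. Only the handler-set flags still need volatile sig_atomic_t. The minimal POSIX sketch below shows that split. It uses the classic self-pipe trick as a stand-in for SetLatch()/WaitEventSetWait(); it is not meant to describe how PostgreSQL's latch is implemented on any given platform. All names in it are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Written by the signal handler: still needs volatile sig_atomic_t. */
static volatile sig_atomic_t pending_pmsignal = false;

/* Only ever touched by the main loop: a plain bool is enough. */
static bool start_worker_needed = false;

/* Self-pipe standing in for the latch: [0] is the read end, [1] the write end. */
static int	wakeup_pipe[2] = {-1, -1};

static void
handle_sigusr1(int signo)
{
	int			save_errno = errno;
	ssize_t		rc;

	(void) signo;
	pending_pmsignal = true;
	/* Wake the loop; if the pipe is full it is already "set", so ignore EAGAIN. */
	rc = write(wakeup_pipe[1], "x", 1);
	(void) rc;
	errno = save_errno;
}

static int
setup_wakeup(void)
{
	struct sigaction sa;

	if (pipe(wakeup_pipe) != 0)
		return -1;
	for (int i = 0; i < 2; i++)
	{
		int			flags = fcntl(wakeup_pipe[i], F_GETFL);

		if (flags < 0 || fcntl(wakeup_pipe[i], F_SETFL, flags | O_NONBLOCK) < 0)
			return -1;
	}
	sa.sa_handler = handle_sigusr1;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;
	return sigaction(SIGUSR1, &sa, NULL);
}

/* One pass of the main loop: wait, drain the pipe, then do the real work. */
static bool
main_loop_iteration(int timeout_ms)
{
	struct pollfd pfd = {.fd = wakeup_pipe[0], .events = POLLIN};
	char		buf[64];

	if (poll(&pfd, 1, timeout_ms) <= 0)
		return false;			/* timeout or interrupted: nothing to do */
	while (read(wakeup_pipe[0], buf, sizeof(buf)) > 0)
		;						/* drain; plays the role of ResetLatch() */
	if (pending_pmsignal)
	{
		pending_pmsignal = false;
		start_worker_needed = true; /* ordinary code, ordinary variable */
	}
	return true;
}
```

The pipe is drained before the flag is checked. A signal that arrives in between leaves a byte in the pipe, so at worst the next iteration wakes up spuriously and the request is not lost.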

Attachments:

0001-Remove-unneeded-volatile-qualitifiers-from-postmaste.patch (text/x-patch; charset=US-ASCII)
From 74f8b703f1a23b47bf613813014bc432c62c75d8 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 28 Jan 2023 14:08:23 +1300
Subject: [PATCH] Remove unneeded volatile qualifiers from postmaster.c.

Several flags were marked volatile, and in some cases used sig_atomic_t,
because they were accessed from signal handlers.  After commit 7389aad6,
we can just use unqualified bool.
---
 src/backend/postmaster/postmaster.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 62fba5fcee..f92dbc2270 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -359,17 +359,17 @@ bool		ClientAuthInProgress = false;	/* T during new-client
 bool		redirection_done = false;	/* stderr redirected for syslogger? */
 
 /* received START_AUTOVAC_LAUNCHER signal */
-static volatile sig_atomic_t start_autovac_launcher = false;
+static bool start_autovac_launcher = false;
 
 /* the launcher needs to be signaled to communicate some condition */
-static volatile bool avlauncher_needs_signal = false;
+static bool avlauncher_needs_signal = false;
 
 /* received START_WALRECEIVER signal */
-static volatile sig_atomic_t WalReceiverRequested = false;
+static bool WalReceiverRequested = false;
 
 /* set when there's a worker that needs to be started up */
-static volatile bool StartWorkerNeeded = true;
-static volatile bool HaveCrashedWorker = false;
+static bool StartWorkerNeeded = true;
+static bool HaveCrashedWorker = false;
 
 /* set when signals arrive */
 static volatile sig_atomic_t pending_pm_pmsignal;
-- 
2.38.1

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#26)
Re: Using WaitEventSet in the postmaster

Thomas Munro <thomas.munro@gmail.com> writes:

The nearby thread about searching for uses of volatile reminded me: we
can now drop a bunch of these in postmaster.c. The patch I originally
wrote to do that as part of this series somehow morphed into an
experimental patch to nuke all global variables[1], but of course we
should at least drop the now redundant use of volatile and
sig_atomic_t. See attached.

+1

regards, tom lane

#28Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#26)
Re: Using WaitEventSet in the postmaster

Hi,

On 2023-01-28 14:25:38 +1300, Thomas Munro wrote:

The nearby thread about searching for uses of volatile reminded me: we
can now drop a bunch of these in postmaster.c. The patch I originally
wrote to do that as part of this series somehow morphed into an
experimental patch to nuke all global variables[1],

Hah.

but of course we should at least drop the now redundant use of volatile and
sig_atomic_t. See attached.

+1

Greetings,

Andres Freund