Latch implementation that wakes on postmaster death on both win32 and Unix

Started by Peter Geoghegan, over 14 years ago, 46 messages
#1Peter Geoghegan
peter@2ndquadrant.com
1 attachment(s)

Attached is the latest revision of the latch implementation that
monitors postmaster death, plus the archiver client that now relies on
that new functionality and thereby works well without a tight
PostmasterIsAlive() polling loop.

On second thought, it is reasonable for the patch to be evaluated with
the archiver changes. Any problems that we'll have with latch changes
are likely problems that all WL_POSTMASTER_DEATH latch clients will
have, so we might as well include the simplest such client initially.
Once I have buy-in on the latch changes, the archiver work becomes
uncontroversial, I think.
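
To make the client side concrete, here is a rough sketch (not part of the patch) of how a caller might request wake-up conditions and inspect the returned bit field. The WL_* values are copied from the attached latch.h; WL_ALL_EVENTS, filter_wake_events() and must_exit() are hypothetical names used only for illustration, standing in for what WaitLatchOrSocket() does when it builds its result.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the patch's storage/latch.h */
#define WL_LATCH_SET         (1 << 0)
#define WL_SOCKET_READABLE   (1 << 1)
#define WL_SOCKET_WRITEABLE  (1 << 2)
#define WL_TIMEOUT           (1 << 3)
#define WL_POSTMASTER_DEATH  (1 << 4)

/* Hypothetical: union of every known event bit, used as a sanity check */
#define WL_ALL_EVENTS \
	(WL_LATCH_SET | WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE | \
	 WL_TIMEOUT | WL_POSTMASTER_DEATH)

/*
 * Hypothetical stand-in for the result handling in WaitLatchOrSocket():
 * given the events the caller asked for and the conditions that actually
 * occurred, report only the requested ones.
 */
static int
filter_wake_events(int wakeEvents, int occurred)
{
	assert((wakeEvents & ~WL_ALL_EVENTS) == 0);
	return wakeEvents & occurred;
}

/* Hypothetical client-side check: did we wake because the postmaster died? */
static bool
must_exit(int result)
{
	return (result & WL_POSTMASTER_DEATH) != 0;
}
```

A client that cares about postmaster death could pass WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH and test the returned value with something like must_exit(); a client that never asks for WL_POSTMASTER_DEATH never sees that bit in the result.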

The lifesign terminology has been dropped. We now close() the file
descriptor that represents "ownership" - the write end of our
anonymous pipe - in each child backend directly in the forking
machinery (the thin fork() wrapper for the non-EXEC_BACKEND case),
through a call to ReleasePostmasterDeathWatchHandle(). We don't have
to do that on Windows, and we don't.
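
The idea can be illustrated in isolation with a minimal sketch, assuming plain POSIX pipe() and select(). It is not the patch's code: owner_is_gone() is a hypothetical helper, and only the two index macros reuse names from the patch. Because nothing is ever written to the pipe, the read end only becomes readable at EOF, which happens once every copy of the write end has been closed.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Indexes into the pipe() pair, mirroring the names used in the patch */
#define POSTMASTER_FD_WATCH 0	/* read end: children select() on it */
#define POSTMASTER_FD_OWN   1	/* write end: only the parent keeps it */

/*
 * Hypothetical helper: returns true once every holder of the write end
 * has gone away, i.e. the read end reports EOF. Waits at most timeout_ms
 * milliseconds before giving up and reporting "still alive".
 */
static bool
owner_is_gone(int watch_fd, long timeout_ms)
{
	fd_set		input_mask;
	struct timeval tv;
	char		c;

	assert(watch_fd >= 0);

	FD_ZERO(&input_mask);
	FD_SET(watch_fd, &input_mask);
	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;

	/* select() marks the fd readable at EOF, when no writer remains */
	if (select(watch_fd + 1, &input_mask, NULL, NULL, &tv) <= 0)
		return false;			/* timeout or error: assume still alive */

	/* Nothing is ever written, so a zero-length read means EOF */
	return read(watch_fd, &c, 1) == 0;
}
```

In a single process the effect can be reproduced by calling pipe(), checking that owner_is_gone() returns false, closing the write end, and checking again. With real child processes, each child has to close its inherited copy of the write end first, otherwise a surviving child would keep the pipe "owned" after the parent dies; that is why the close() lives in the fork wrapper.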

I've handled the non-win32 EXEC_BACKEND case, which I understand just
exists for testing purposes. I've done the usual BackendParameters
stuff.

A ReleasePostmasterDeathWatchHandle() call is unnecessary on win32
(the function doesn't exist there - the need to call it on Unix is a
result of its implementation). I'd like to avoid having calls to it in
each auxiliary process. It should be called in a single sweet spot
that doesn't put any burden on child process authors to remember to
call it themselves.

Disappointingly, and despite a big effort, there doesn't seem to be a
way to have the win32 WaitForMultipleObjects() call wake on postmaster
death in addition to everything else in the same way that select()
does, so there are now two blocking calls, each in a thread of its own
(when the latch code is interested in postmaster death - otherwise,
it's single threaded as before).

The threading stuff (in particular, the fact that we used a named pipe
in a thread where the name of the pipe comes from the process PID) is
inspired by win32 signal emulation, src/backend/port/win32/signal.c .

You can easily observe that it works as advertised on Windows by
starting Postgres with archiving, using task manager to monitor
processes, and doing the following to the postmaster (assuming it has
a PID of 1234). This is the Windows equivalent of kill -9 :

C:\Users\Peter>taskkill /pid 1234 /F

You'll see that it takes about a second for the archiver to exit. All
processes exit.

Thoughts?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

new_latch.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e71090f..b1d38f5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10150,7 +10150,7 @@ retry:
 					/*
 					 * Wait for more WAL to arrive, or timeout to be reached
 					 */
-					WaitLatch(&XLogCtl->recoveryWakeupLatch, 5000000L);
+					WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 5000000L);
 					ResetLatch(&XLogCtl->recoveryWakeupLatch);
 				}
 				else
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 6dae7c9..c60986c 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -94,6 +94,7 @@
 
 #include "miscadmin.h"
 #include "storage/latch.h"
+#include "storage/pmsignal.h"
 #include "storage/shmem.h"
 
 /* Are we currently in WaitLatch? The signal handler would like to know. */
@@ -108,6 +109,15 @@ static void initSelfPipe(void);
 static void drainSelfPipe(void);
 static void sendSelfPipeByte(void);
 
+/* 
+ * Constants that represent which of a pair of fds given
+ * to pipe() is watched and owned in the context of 
+ * dealing with postmaster death
+ */
+#define POSTMASTER_FD_WATCH 0
+#define POSTMASTER_FD_OWN 1
+
+extern int postmaster_alive_fds[2];
 
 /*
  * Initialize a backend-local latch.
@@ -188,22 +198,22 @@ DisownLatch(volatile Latch *latch)
  * backend-local latch initialized with InitLatch, or a shared latch
  * associated with the current process by calling OwnLatch.
  *
- * Returns 'true' if the latch was set, or 'false' if timeout was reached.
+ * Returns bit field indicating which condition(s) caused the wake-up.
  */
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 /*
  * Like WaitLatch, but will also return when there's data available in
- * 'sock' for reading or writing. Returns 0 if timeout was reached,
- * 1 if the latch was set, 2 if the socket became readable or writable.
+ * 'sock' for reading or writing.
+ *
+ * Returns bit field indicating which condition(s) caused the wake-up.
  */
 int
-WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout)
 {
 	struct timeval tv,
 			   *tvp = NULL;
@@ -211,12 +221,13 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	fd_set		output_mask;
 	int			rc;
 	int			result = 0;
+	bool		found = false;
 
 	if (latch->owner_pid != MyProcPid)
 		elog(ERROR, "cannot wait on a latch owned by another process");
 
 	/* Initialize timeout */
-	if (timeout >= 0)
+	if (timeout >= 0 && (wakeEvents & WL_TIMEOUT))
 	{
 		tv.tv_sec = timeout / 1000000L;
 		tv.tv_usec = timeout % 1000000L;
@@ -224,7 +235,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	}
 
 	waiting = true;
-	for (;;)
+	do
 	{
 		int			hifd;
 
@@ -235,16 +246,30 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		 * do that), and the select() will return immediately.
 		 */
 		drainSelfPipe();
-		if (latch->is_set)
-		{
-			result = 1;
-			break;
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
+ 		{
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 * Since latch is set, no other factor could have 
+			 * coincided that could make us wake up 
+			 * independently of the latch being set, so no
+			 * need to worry about having missed something.
+			 */
+			break; 
 		}
 
 		FD_ZERO(&input_mask);
 		FD_SET(selfpipe_readfd, &input_mask);
 		hifd = selfpipe_readfd;
+
+		if (wakeEvents & WL_POSTMASTER_DEATH)
+		{
+			FD_SET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask);
+			if (postmaster_alive_fds[POSTMASTER_FD_WATCH] > hifd)
+				hifd = postmaster_alive_fds[POSTMASTER_FD_WATCH];
+		}
-		if (sock != PGINVALID_SOCKET && forRead)
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_READABLE))
 		{
 			FD_SET(sock, &input_mask);
 			if (sock > hifd)
@@ -252,7 +277,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		}
 
 		FD_ZERO(&output_mask);
-		if (sock != PGINVALID_SOCKET && forWrite)
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_WRITEABLE))
 		{
 			FD_SET(sock, &output_mask);
 			if (sock > hifd)
@@ -268,20 +293,35 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 					(errcode_for_socket_access(),
 					 errmsg("select() failed: %m")));
 		}
-		if (rc == 0)
+		if (rc == 0 && (wakeEvents & WL_TIMEOUT))
 		{
 			/* timeout exceeded */
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
-		if (sock != PGINVALID_SOCKET &&
-			((forRead && FD_ISSET(sock, &input_mask)) ||
-			 (forWrite && FD_ISSET(sock, &output_mask))))
+		if (sock != PGINVALID_SOCKET)
 		{
-			result = 2;
-			break;				/* data available in socket */
+			if ((wakeEvents & WL_SOCKET_READABLE ) && FD_ISSET(sock, &input_mask))
+			{
+				result |= WL_SOCKET_READABLE;
+				found = true; /* data available in socket */
+			}
+			if ((wakeEvents & WL_SOCKET_WRITEABLE) && FD_ISSET(sock, &output_mask))
+			{
+				result |= WL_SOCKET_WRITEABLE;
+				found = true;
+			}
 		}
+		if ((wakeEvents & WL_POSTMASTER_DEATH) && 
+			 FD_ISSET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask) && 
+			 !PostmasterIsAlive(true))
+ 		{
+			result |= WL_POSTMASTER_DEATH;
+ 			found = true;
+ 		}
 	}
+	while(!found);
+
 	waiting = false;
 
 	return result;
@@ -430,3 +470,99 @@ drainSelfPipe(void)
 			elog(ERROR, "unexpected EOF on self-pipe");
 	}
 }
+
+/*
+ * Called once from the postmaster, so that child processes can subsequently  
+ * monitor if their parent is dead. We open up an anonymous pipe, and have child
+ * processes block on a select() call that examines if the read file descriptor 
+ * is ready for reading. They do so through a latch.
+ *
+ * Child processes are responsible for releasing the death watch handler, so 
+ * that only the postmaster holds it, and a select() on the fd returns upon the 
+ * one and only holder (the postmaster) dying.
+ *
+ * This is a trick that obviates the need for auxiliary backends to have tight 
+ * polling loops where they check if the postmaster is alive. We do this because 
+ * that pattern results in an excessive number of wakeups per second when idle.
+ */
+void 
+InitPostmasterDeathWatchHandle(void)
+{
+	int flags;
+
+	/* 
+	 * Create pipe. The postmaster is deemed dead if
+	 * no process has the writing end (POSTMASTER_FD_OWN) open.
+	 *
+	 */
+	Assert(MyProcPid == PostmasterPid);
+	if (pipe(postmaster_alive_fds)) 
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg( "pipe() call failed to create pipe to monitor postmaster death: %s", strerror(errno))));
+	}
+
+	flags = fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_GETFL);
+	if (flags < 0)
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("Failed to set the postmaster death watching fd's flags: %s", strerror(errno))));
+	}
+
+	/* 
+	 * Set FNONBLOCK to allow checking for the fd's presence with a read() call
+	 * and FASYNC to deliver a signal to our process group if the life sign vanishes
+	 */
+	flags |= FNONBLOCK | FASYNC;
+	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_SETFL, flags))
+	{	
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("Failed to set the postmaster death watching fd's flags: %s", strerror(errno))));
+	}
+ 
+	/* Send SIGIO signal to the whole process group */
+	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_SETOWN, -getpgrp())) 
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("Failed to set the life sign's watching end's notification pid: %s", strerror(errno))));
+	}
+}
+
+/*
+ * Release postmaster death watch handle.
+ *
+ * Important: This must be called immediately after a process 
+ * forks from the postmaster. Otherwise, latch clients will 
+ * not wake up on postmaster death, even if they have requested 
+ * to.
+ *
+ * Even some hypothetical backend that doesn't care about postmaster
+ * death has a responsibility to call this function - otherwise,
+ * some other latch client backend could wait in vain to be informed 
+ * of postmaster death, because the irresponsible backend held open
+ * the ownership file descriptor and outlived the postmaster.
+ *
+ * We call the function within the fork machinery to handle all cases,
+ * so new backends need not bother with this themselves
+ */
+void
+ReleasePostmasterDeathWatchHandle(void)
+{
+	/* MyProcPid won't have been set yet */
+	Assert(PostmasterPid != getpid());
+	/* Please don't ask twice */
+	Assert(postmaster_alive_fds[POSTMASTER_FD_OWN] != -1);
+	/* Release parent's ownership fd - only postmaster should hold it */
+	if (close(postmaster_alive_fds[ POSTMASTER_FD_OWN])) 
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("Failed to close file descriptor associated with Postmaster death in child process %d", MyProcPid)));
+	}
+	postmaster_alive_fds[POSTMASTER_FD_OWN] = -1;
+}
+
diff --git a/src/backend/port/win32_latch.c b/src/backend/port/win32_latch.c
index 3509302..26c4655 100644
--- a/src/backend/port/win32_latch.c
+++ b/src/backend/port/win32_latch.c
@@ -25,8 +25,16 @@
 #include "miscadmin.h"
 #include "replication/walsender.h"
 #include "storage/latch.h"
+#include "storage/pmsignal.h"
 #include "storage/shmem.h"
 
+static DWORD WINAPI
+death_pipe_thread(LPVOID param);
+static DWORD WINAPI
+block_death_pipe_thread(LPVOID param);
+
+static volatile Latch *current_latch = NULL;
+static volatile bool postmasted_died = false;
 
 void
 InitLatch(volatile Latch *latch)
@@ -81,15 +89,14 @@ DisownLatch(volatile Latch *latch)
 	latch->owner_pid = 0;
 }
 
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 int
-WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, SOCKET sock, long timeout)
 {
 	DWORD		rc;
 	HANDLE		events[3];
@@ -97,19 +104,20 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 	HANDLE		sockevent = WSA_INVALID_EVENT; /* silence compiler */
 	int			numevents;
 	int			result = 0;
+	bool		found = false;
 
 	latchevent = latch->event;
 
 	events[0] = latchevent;
 	events[1] = pgwin32_signal_event;
 	numevents = 2;
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (sock != PGINVALID_SOCKET && ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 	{
 		int			flags = 0;
 
-		if (forRead)
+		if (wakeEvents & WL_SOCKET_READABLE)
 			flags |= FD_READ;
-		if (forWrite)
+		if (wakeEvents & WL_SOCKET_WRITEABLE)
 			flags |= FD_WRITE;
 
 		sockevent = WSACreateEvent();
@@ -117,8 +125,9 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 		events[numevents++] = sockevent;
 	}
 
-	for (;;)
+	do
 	{
+		HANDLE pipe_monitor_thread = NULL;
 		/*
 		 * Reset the event, and check if the latch is set already. If someone
 		 * sets the latch between this and the WaitForMultipleObjects() call
@@ -129,18 +138,52 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 			elog(ERROR, "ResetEvent failed: error code %d", (int) GetLastError());
 		if (latch->is_set)
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 * Since latch is set, no other factor could have 
+			 * coincided that could make us wake up 
+			 * independently of the latch being set, so no
+			 * need to worry about having missed something.
+			 */
 			break;
 		}
 
+		if (wakeEvents & WL_POSTMASTER_DEATH)
+		{
+			current_latch = latch;
+			/* Start thread to monitor postmaster death 
+			 * independently of monitoring everything else
+			 * in WaitForMultipleObjects() call below
+			 */
+			pipe_monitor_thread = CreateThread(NULL, 0, block_death_pipe_thread, NULL, 0, NULL);
+			Assert(pipe_monitor_thread);
+			if (pipe_monitor_thread == NULL)
+				ereport(FATAL,
+						(errmsg_internal("failed to create postmaster death monitoring thread")));
+		}
+
 		rc = WaitForMultipleObjects(numevents, events, FALSE,
 							   (timeout >= 0) ? (timeout / 1000) : INFINITE);
-		if (rc == WAIT_FAILED)
+
+		if (wakeEvents & WL_POSTMASTER_DEATH)
+			TerminateThread(pipe_monitor_thread, 0);
+		
+
+		if ( (wakeEvents & WL_POSTMASTER_DEATH) && 
+			 postmasted_died && 
+			 !PostmasterIsAlive(true))
+		{
+			/* Postmaster died */
+			result |= WL_POSTMASTER_DEATH;
+			found = true;
+		}
+		else if (rc == WAIT_FAILED)
 			elog(ERROR, "WaitForMultipleObjects() failed: error code %d", (int) GetLastError());
 		else if (rc == WAIT_TIMEOUT)
 		{
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
 		else if (rc == WAIT_OBJECT_0 + 1)
 			pgwin32_dispatch_queued_signals();
@@ -155,17 +198,24 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 				ereport(FATAL,
 						(errmsg_internal("failed to enumerate network events: %i", (int) GetLastError())));
 
-			if ((forRead && resEvents.lNetworkEvents & FD_READ) ||
-				(forWrite && resEvents.lNetworkEvents & FD_WRITE))
-				result = 2;
-			break;
+			if ((wakeEvents & WL_SOCKET_READABLE ) && (resEvents.lNetworkEvents & FD_READ))
+			{
+				result |= WL_SOCKET_READABLE;
+				found = true; 
+			}
+			if ((wakeEvents & WL_SOCKET_WRITEABLE) && (resEvents.lNetworkEvents & FD_WRITE))
+			{
+				result |= WL_SOCKET_WRITEABLE;
+				found = true;
+			}
 		}
 		else if (rc != WAIT_OBJECT_0)
 			elog(ERROR, "unexpected return code from WaitForMultipleObjects(): %d", (int) rc);
 	}
+	while(!found);
 
 	/* Clean up the handle we created for the socket */
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (sock != PGINVALID_SOCKET && ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 	{
 		WSAEventSelect(sock, sockevent, 0);
 		WSACloseEvent(sockevent);
@@ -214,3 +264,96 @@ ResetLatch(volatile Latch *latch)
 {
 	latch->is_set = false;
 }
+
+/*
+ * Called once from the postmaster, so that its child processes can 
+ * subsequently monitor if their parent is dead. 
+ *
+ * Spawns a thread that creates pipe instances as needed for child processes
+ * to block on through ReadFile() calls. If the call ever returns, the 
+ * postmaster is dead.
+ */
+void 
+InitPostmasterDeathWatchHandle(void)
+{
+	HANDLE		 pipe_spawn_thread;	
+	/* Create thread for spawning pipe instances as needed */
+	Assert(MyProcPid == PostmasterPid);
+	pipe_spawn_thread = CreateThread(NULL, 0, death_pipe_thread, NULL, 0, NULL);
+
+	if (pipe_spawn_thread == NULL)
+		ereport(FATAL,
+			(errmsg_internal("failed to create postmaster death watch handler thread")));
+}
+
+/* Postmaster death handling thread */
+static DWORD WINAPI
+death_pipe_thread(LPVOID param)
+{
+	char		pipename[128];
+	snprintf(pipename, sizeof(pipename), "\\\\.\\pipe\\pm_death_%d", PostmasterPid);
+
+	for(;;)
+	{
+		BOOL success;
+		HANDLE pipe;
+
+		pipe = CreateNamedPipe(pipename,
+							   PIPE_ACCESS_OUTBOUND | FILE_FLAG_WRITE_THROUGH,
+							   PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
+							   PIPE_UNLIMITED_INSTANCES, 4096, 4096, 0, NULL);    
+
+		if (pipe == INVALID_HANDLE_VALUE) 
+		{
+			write_stderr("CreateNamedPipe failed when creating death pipe, GetLastError(): %d\n", (int) GetLastError());
+			return -1;
+		}
+	
+		success = ConnectNamedPipe(pipe, NULL);
+		if (!success)
+		{
+			write_stderr("ConnectNamedPipe failed when creating death pipe, GetLastError(): %d\n", (int) GetLastError());
+			return -1;
+		}
+	}
+	return 0;
+}
+
+/* Postmaster death pipe blocking thread */
+static DWORD WINAPI
+block_death_pipe_thread(LPVOID param)
+{
+	char 		buf[4096];
+	DWORD		cbRead;
+	HANDLE		pipe;
+	char		pipename[128];
+
+	snprintf(pipename, sizeof(pipename), "\\\\.\\pipe\\pm_death_%d", PostmasterPid);
+	postmasted_died = false;
+
+	pipe = CreateFile(
+		pipename,
+		GENERIC_READ,    /* read access */
+		FILE_SHARE_READ, /* read sharing */
+		NULL,            /* default security attributes */
+		OPEN_EXISTING,
+		0,
+		NULL);
+	/* Block on call to ReadFile */
+	ReadFile( 
+		pipe,
+		buf,     /* buffer to receive reply */
+		4096,    /* size of buffer */
+		&cbRead, /* number of bytes read */
+		NULL);   /* not overlapped */
+
+	/* If control ever reaches here, the postmaster died */
+	postmasted_died = true;
+	/* We've set postmaster_died; thus, this won't be 
+	 * spuriously handled as a regular case of 
+	 * calling SetLatch()
+	 */
+	SetLatch(current_latch);
+	return 0;
+}
+
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index b2fe9a1..6e2f37a 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -11,6 +11,7 @@
  */
 #include "postgres.h"
 #include "postmaster/fork_process.h"
+#include "storage/latch.h"
 
 #include <fcntl.h>
 #include <time.h>
@@ -61,6 +62,7 @@ fork_process(void)
 #ifdef LINUX_PROFILE
 		setitimer(ITIMER_PROF, &prof_itimer, NULL);
 #endif
+		ReleasePostmasterDeathWatchHandle();
 
 		/*
 		 * By default, Linux tends to kill the postmaster in out-of-memory
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index b40375a..1548a3b 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -40,6 +40,7 @@
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
+#include "storage/latch.h"
 #include "storage/pg_shmem.h"
 #include "storage/pmsignal.h"
 #include "utils/guc.h"
@@ -87,6 +88,12 @@ static volatile sig_atomic_t got_SIGTERM = false;
 static volatile sig_atomic_t wakened = false;
 static volatile sig_atomic_t ready_to_stop = false;
 
+/*
+ * Latch that archiver loop waits on until it is awakened by 
+ * signals, each of which there is a handler for
+ */
+static volatile Latch mainloop_latch;
+
 /* ----------
  * Local function forward declarations
  * ----------
@@ -228,6 +235,8 @@ PgArchiverMain(int argc, char *argv[])
 
 	MyProcPid = getpid();		/* reset MyProcPid */
 
+	InitLatch(&mainloop_latch); /* initialise latch used in main loop, now that we are a subprocess */
+
 	MyStartTime = time(NULL);	/* record Start Time for logging */
 
 	/*
@@ -282,6 +291,8 @@ ArchSigHupHandler(SIGNAL_ARGS)
 {
 	/* set flag to re-read config file at next convenient time */
 	got_SIGHUP = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGTERM signal handler for archiver process */
@@ -295,6 +306,8 @@ ArchSigTermHandler(SIGNAL_ARGS)
 	 * archive commands.
 	 */
 	got_SIGTERM = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR1 signal handler for archiver process */
@@ -303,6 +316,8 @@ pgarch_waken(SIGNAL_ARGS)
 {
 	/* set flag that there is work to be done */
 	wakened = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR2 signal handler for archiver process */
@@ -311,6 +326,8 @@ pgarch_waken_stop(SIGNAL_ARGS)
 {
 	/* set flag to do a final cycle and shut down afterwards */
 	ready_to_stop = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /*
@@ -334,6 +351,13 @@ pgarch_MainLoop(void)
 
 	do
 	{
+		/*
+		 * There shouldn't be anything for the archiver to do except to wait
+		 * on a latch ... however, the archiver exists to protect our data,
+		 * so she wakes up occasionally to allow herself to be proactive.
+		 */
+		ResetLatch(&mainloop_latch);
+
 		/* When we get SIGUSR2, we do one more archive cycle, then exit */
 		time_to_stop = ready_to_stop;
 
@@ -370,26 +394,28 @@ pgarch_MainLoop(void)
 			last_copy_time = time(NULL);
 		}
 
-		/*
-		 * There shouldn't be anything for the archiver to do except to wait
-		 * for a signal ... however, the archiver exists to protect our data,
-		 * so she wakes up occasionally to allow herself to be proactive.
+		/* 
+		 * Wait on latch, until various signals are received, or 
+		 * until a poll will be forced by PGARCH_AUTOWAKE_INTERVAL
+		 * having passed since last_copy_time, or on the postmaster's
+		 * untimely demise.
 		 *
-		 * On some platforms, signals won't interrupt the sleep.  To ensure we
-		 * respond reasonably promptly when someone signals us, break down the
-		 * sleep into 1-second increments, and check for interrupts after each
-		 * nap.
+		 * The caveat about signals resetting the timeout of 
+		 * WaitLatch()/select() on some platforms can be safely disregarded, 
+		 * because we handle all expected signals, and all handlers 
+		 * call SetLatch() where that matters anyway
 		 */
-		while (!(wakened || ready_to_stop || got_SIGHUP ||
-				 !PostmasterIsAlive(true)))
-		{
-			time_t		curtime;
 
-			pg_usleep(1000000L);
+		if (!time_to_stop) /* Don't wait during last iteration */
+		{
+			time_t		 curtime = time(NULL);	
+			unsigned int timeout_secs  = (unsigned int) PGARCH_AUTOWAKE_INTERVAL - 
+					(unsigned int) (curtime - last_copy_time);
+			WaitLatch(&mainloop_latch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, timeout_secs * 1000000L);
 			curtime = time(NULL);
 			if ((unsigned int) (curtime - last_copy_time) >=
 				(unsigned int) PGARCH_AUTOWAKE_INTERVAL)
-				wakened = true;
+				wakened = true; /* wakened by timeout - this wasn't a SIGHUP, etc */
 		}
 
 		/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 1e2aa9f..a010c11 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -356,6 +356,7 @@ static void RandomSalt(char *md5Salt);
 static void signal_child(pid_t pid, int signal);
 static bool SignalSomeChildren(int signal, int targets);
 
+
 #define SignalChildren(sig)			   SignalSomeChildren(sig, BACKEND_TYPE_ALL)
 
 /*
@@ -443,6 +444,7 @@ typedef struct
 	HANDLE		syslogPipe[2];
 #else
 	int			syslogPipe[2];
+	int			postmaster_alive_fds[2];
 #endif
 	char		my_exec_path[MAXPGPATH];
 	char		pkglib_path[MAXPGPATH];
@@ -472,6 +474,15 @@ static void ShmemBackendArrayRemove(Backend *bn);
 #define EXIT_STATUS_0(st)  ((st) == 0)
 #define EXIT_STATUS_1(st)  (WIFEXITED(st) && WEXITSTATUS(st) == 1)
 
+/* 
+ * 2 file descriptors that represent postmaster lifesign.
+ * First is LIFESIGN_FD_WATCH, second is LIFESIGN_FD_OWN.
+ * (macros defined in unix_latch.c)
+ */
+#ifndef WIN32
+int postmaster_alive_fds[2];
+#endif
+
 
 /*
  * Postmaster main entry point
@@ -492,6 +503,13 @@ PostmasterMain(int argc, char *argv[])
 	IsPostmasterEnvironment = true;
 
 	/*
+	 * Initialise mechanism that allows waiting latch clients 
+	 * to wake on postmaster death, to finish their
+	 * remaining business
+	 */
+	InitPostmasterDeathWatchHandle();
+
+	/*
 	 * for security, no dir or file created can be group or other accessible
 	 */
 	umask(S_IRWXG | S_IRWXO);
@@ -4753,6 +4771,9 @@ save_backend_variables(BackendParameters *param, Port *port,
 
 	memcpy(&param->syslogPipe, &syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&param->postmaster_alive_fds, &postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
 	strlcpy(param->my_exec_path, my_exec_path, MAXPGPATH);
 
 	strlcpy(param->pkglib_path, pkglib_path, MAXPGPATH);
@@ -4968,6 +4989,10 @@ restore_backend_variables(BackendParameters *param, Port *port)
 
 	memcpy(&syslogPipe, &param->syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&postmaster_alive_fds, &param->postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
+
 	strlcpy(my_exec_path, param->my_exec_path, MAXPGPATH);
 
 	strlcpy(pkglib_path, param->pkglib_path, MAXPGPATH);
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 1d4df8a..efda804 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -171,7 +171,7 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 		 * postmaster death regularly while waiting. Note that timeout here
 		 * does not necessarily release from loop.
 		 */
-		WaitLatch(&MyProc->waitLatch, 60000000L);
+		WaitLatch(&MyProc->waitLatch, WL_LATCH_SET | WL_TIMEOUT, 60000000L);
 
 		/* Must reset the latch before testing state. */
 		ResetLatch(&MyProc->waitLatch);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 470e6d1..27cc350 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -805,8 +805,9 @@ WalSndLoop(void)
 			}
 
 			/* Sleep */
-			WaitLatchOrSocket(&MyWalSnd->latch, MyProcPort->sock,
-							  true, pq_is_send_pending(),
+			WaitLatchOrSocket(&MyWalSnd->latch,
+							  WL_LATCH_SET | WL_SOCKET_READABLE | (pq_is_send_pending()? WL_SOCKET_WRITEABLE:0) |  WL_TIMEOUT,
+							  MyProcPort->sock,
 							  sleeptime * 1000L);
 
 			/* Check for replication timeout */
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 03ec071..5418c73 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -38,11 +38,11 @@ extern void InitLatch(volatile Latch *latch);
 extern void InitSharedLatch(volatile Latch *latch);
 extern void OwnLatch(volatile Latch *latch);
 extern void DisownLatch(volatile Latch *latch);
-extern bool WaitLatch(volatile Latch *latch, long timeout);
-extern int WaitLatchOrSocket(volatile Latch *latch, pgsocket sock,
-				  bool forRead, bool forWrite, long timeout);
+extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout);
+extern int WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout);
 extern void SetLatch(volatile Latch *latch);
 extern void ResetLatch(volatile Latch *latch);
+extern void InitPostmasterDeathWatchHandle(void);
 
 #define TestLatch(latch) (((volatile Latch *) latch)->is_set)
 
@@ -52,8 +52,20 @@ extern void ResetLatch(volatile Latch *latch);
  */
 #ifndef WIN32
 extern void latch_sigusr1_handler(void);
+/* 
+ * On Unix, it is necessary to call ReleasePostmasterDeathWatchHandle() 
+ * after forking from PM
+ */
+extern void ReleasePostmasterDeathWatchHandle(void);
 #else
 #define latch_sigusr1_handler()
 #endif
 
+/* Bitmasks for events that may wake-up WaitLatch() clients */
+#define WL_LATCH_SET         (1 << 0)
+#define WL_SOCKET_READABLE   (1 << 1)
+#define WL_SOCKET_WRITEABLE  (1 << 2)
+#define WL_TIMEOUT           (1 << 3)
+#define WL_POSTMASTER_DEATH  (1 << 4)
+
 #endif   /* LATCH_H */
#2Peter Geoghegan
peter@2ndquadrant.com
In reply to: Peter Geoghegan (#1)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

I'm a bit disappointed that no one has commented on this yet. I would
have appreciated some preliminary feedback.

Anyway, I've added it to CommitFest 2011-06.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Geoghegan (#1)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 24.05.2011 23:43, Peter Geoghegan wrote:

> Attached is the latest revision of the latch implementation that
> monitors postmaster death, plus the archiver client that now relies on
> that new functionality and thereby works well without a tight
> PostmasterIsAlive() polling loop.

The Unix stuff looks good to me at first glance.

> The lifesign terminology has been dropped. We now close() the file
> descriptor that represents "ownership" - the write end of our
> anonymous pipe - in each child backend directly in the forking
> machinery (the thin fork() wrapper for the non-EXEC_BACKEND case),
> through a call to ReleasePostmasterDeathWatchHandle(). We don't have
> to do that on Windows, and we don't.

There's one reference left to "life sign" in comments. (FWIW, I don't
have a problem with that terminology myself)

> Disappointingly, and despite a big effort, there doesn't seem to be a
> way to have the win32 WaitForMultipleObjects() call wake on postmaster
> death in addition to everything else in the same way that select()
> does, so there are now two blocking calls, each in a thread of its own
> (when the latch code is interested in postmaster death - otherwise,
> it's single threaded as before).
>
> The threading stuff (in particular, the fact that we used a named pipe
> in a thread where the name of the pipe comes from the process PID) is
> inspired by win32 signal emulation, src/backend/port/win32/signal.c .

That's a pity, all those threads and named pipes are a bit gross for a
safety mechanism like this.

Looking at the MSDN docs again, can't you simply include
PostmasterHandle in the WaitForMultipleObjects() call to have it return
when the process dies? It should be possible to mix different kinds of
handles in one call, including process handles. Does it not work as
advertised?

You can easily observe that it works as advertised on Windows by
starting Postgres with archiving, using task manager to monitor
processes, and doing the following to the postmaster (assuming it has
a PID of 1234). This is the Windows equivalent of kill -9 :

C:\Users\Peter>taskkill /pid 1234 /F

You'll see that it takes about a second for the archiver to exit. All
processes exit.

Hmm, shouldn't the archiver exit almost instantaneously now that there's
no polling anymore?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#4Peter Geoghegan
peter@2ndquadrant.com
In reply to: Heikki Linnakangas (#3)
1 attachment(s)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 26 May 2011 11:22, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

The Unix-stuff looks good to me at a first glance.

Good.

There's one reference left to "life sign" in comments. (FWIW, I don't have a
problem with that terminology myself)

Should have caught that one. Removed.

Looking at the MSDN docs again, can't you simply include PostmasterHandle in
the WaitForMultipleObjects() call to have it return when the process dies?
It should be possible to mix different kinds of handles in one call,
including process handles. Does it not work as advertised?

Uh, I might have done that, had I been aware of PostmasterHandle. I
tried various convoluted ways to make it do what ReadFile() did for
me, before finally biting the bullet and just using ReadFile() in a
separate thread. I've tried adding PostmasterHandle though, and it
works well - it appears to behave exactly the same as my original
implementation.

This simplifies things considerably. Now, on win32, things are
actually simpler than on Unix.

You'll see that it takes about a second for the archiver to exit. All
processes exit.

Hmm, shouldn't the archiver exit almost instantaneously now that there's no
polling anymore?

Actually, just one "lagger" process sometimes remains that takes maybe
as long as a second, a bit longer than the others. I assumed that it
was the archiver, but I was probably wrong. I also didn't see that
very consistently.

Attached revision doesn't use any threads or pipes on win32. It's far
neater there. I'm still seeing that "lagger" process (which is an
overstatement) at times, so I guess it is normal. On Windows, there is
no detailed PS output, so I actually don't know what the lagger
process is, and no easy way to determine that immediately occurs to
me.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

new_latch.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e71090f..b1d38f5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10150,7 +10150,7 @@ retry:
 					/*
 					 * Wait for more WAL to arrive, or timeout to be reached
 					 */
-					WaitLatch(&XLogCtl->recoveryWakeupLatch, 5000000L);
+					WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 5000000L);
 					ResetLatch(&XLogCtl->recoveryWakeupLatch);
 				}
 				else
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 6dae7c9..fa1d382 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -94,6 +94,7 @@
 
 #include "miscadmin.h"
 #include "storage/latch.h"
+#include "storage/pmsignal.h"
 #include "storage/shmem.h"
 
 /* Are we currently in WaitLatch? The signal handler would like to know. */
@@ -108,6 +109,15 @@ static void initSelfPipe(void);
 static void drainSelfPipe(void);
 static void sendSelfPipeByte(void);
 
+/* 
+ * Constants that represent which of a pair of fds given
+ * to pipe() is watched and owned in the context of 
+ * dealing with postmaster death
+ */
+#define POSTMASTER_FD_WATCH 0
+#define POSTMASTER_FD_OWN 1
+
+extern int postmaster_alive_fds[2];
 
 /*
  * Initialize a backend-local latch.
@@ -188,22 +198,22 @@ DisownLatch(volatile Latch *latch)
  * backend-local latch initialized with InitLatch, or a shared latch
  * associated with the current process by calling OwnLatch.
  *
- * Returns 'true' if the latch was set, or 'false' if timeout was reached.
+ * Returns bit field indicating which condition(s) caused the wake-up.
  */
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 /*
  * Like WaitLatch, but will also return when there's data available in
- * 'sock' for reading or writing. Returns 0 if timeout was reached,
- * 1 if the latch was set, 2 if the socket became readable or writable.
+ * 'sock' for reading or writing.
+ *
+ * Returns bit field indicating which condition(s) caused the wake-up.
  */
 int
-WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout)
 {
 	struct timeval tv,
 			   *tvp = NULL;
@@ -211,12 +221,13 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	fd_set		output_mask;
 	int			rc;
 	int			result = 0;
+	bool		found = false;
 
 	if (latch->owner_pid != MyProcPid)
 		elog(ERROR, "cannot wait on a latch owned by another process");
 
 	/* Initialize timeout */
-	if (timeout >= 0)
+	if (timeout >= 0 && (wakeEvents & WL_TIMEOUT))
 	{
 		tv.tv_sec = timeout / 1000000L;
 		tv.tv_usec = timeout % 1000000L;
@@ -224,7 +235,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	}
 
 	waiting = true;
-	for (;;)
+	do
 	{
 		int			hifd;
 
@@ -235,16 +246,30 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		 * do that), and the select() will return immediately.
 		 */
 		drainSelfPipe();
-		if (latch->is_set)
-		{
-			result = 1;
-			break;
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
+ 		{
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 * Since latch is set, no other factor could have 
+			 * coincided that could make us wake up 
+			 * independently of the latch being set, so no
+			 * need to worry about having missed something.
+			 */
+			break; 
 		}
 
 		FD_ZERO(&input_mask);
 		FD_SET(selfpipe_readfd, &input_mask);
 		hifd = selfpipe_readfd;
+
+		if (wakeEvents & WL_POSTMASTER_DEATH)
+		{
+			FD_SET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask);
+			if (postmaster_alive_fds[POSTMASTER_FD_WATCH] > hifd)
+				hifd = postmaster_alive_fds[POSTMASTER_FD_WATCH];
+		}
-		if (sock != PGINVALID_SOCKET && forRead)
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_READABLE))
 		{
 			FD_SET(sock, &input_mask);
 			if (sock > hifd)
@@ -252,7 +277,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		}
 
 		FD_ZERO(&output_mask);
-		if (sock != PGINVALID_SOCKET && forWrite)
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_WRITEABLE))
 		{
 			FD_SET(sock, &output_mask);
 			if (sock > hifd)
@@ -268,20 +293,35 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 					(errcode_for_socket_access(),
 					 errmsg("select() failed: %m")));
 		}
-		if (rc == 0)
+		if (rc == 0 && (wakeEvents & WL_TIMEOUT))
 		{
 			/* timeout exceeded */
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
-		if (sock != PGINVALID_SOCKET &&
-			((forRead && FD_ISSET(sock, &input_mask)) ||
-			 (forWrite && FD_ISSET(sock, &output_mask))))
+		if (sock != PGINVALID_SOCKET)
 		{
-			result = 2;
-			break;				/* data available in socket */
+			if ((wakeEvents & WL_SOCKET_READABLE ) && FD_ISSET(sock, &input_mask))
+			{
+				result |= WL_SOCKET_READABLE;
+				found = true; /* data available in socket */
+			}
+			if ((wakeEvents & WL_SOCKET_WRITEABLE) && FD_ISSET(sock, &output_mask))
+			{
+				result |= WL_SOCKET_WRITEABLE;
+				found = true;
+			}
 		}
+		if ((wakeEvents & WL_POSTMASTER_DEATH) && 
+			 FD_ISSET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask) && 
+			 !PostmasterIsAlive(true))
+ 		{
+			result |= WL_POSTMASTER_DEATH;
+ 			found = true;
+ 		}
 	}
+	while(!found);
+
 	waiting = false;
 
 	return result;
@@ -430,3 +470,99 @@ drainSelfPipe(void)
 			elog(ERROR, "unexpected EOF on self-pipe");
 	}
 }
+
+/*
+ * Called once from the postmaster, so that child processes can subsequently  
+ * monitor if their parent is dead. We open up an anonymous pipe, and have child
+ * processes block on a select() call that examines if the read file descriptor 
+ * is ready for reading. They do so through a latch.
+ *
+ * Child processes are responsible for releasing the death watch handle, so
+ * that only the postmaster holds it, and a select() on the fd returns upon the 
+ * one and only holder (the postmaster) dying.
+ *
+ * This is a trick that obviates the need for auxiliary backends to have tight 
+ * polling loops where they check if the postmaster is alive. We do this because 
+ * that pattern results in an excessive number of wakeups per second when idle.
+ */
+void 
+InitPostmasterDeathWatchHandle(void)
+{
+	int flags;
+
+	/* 
+	 * Create pipe. The postmaster is deemed dead if
+	 * no process has the writing end (POSTMASTER_FD_OWN) open.
+	 *
+	 */
+	Assert(MyProcPid == PostmasterPid);
+	if (pipe(postmaster_alive_fds)) 
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg( "pipe() call failed to create pipe to monitor postmaster death: %s", strerror(errno))));
+	}
+
+	flags = fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_GETFL);
+	if (flags < 0)
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("Failed to set the postmaster death watching fd's flags: %s", strerror(errno))));
+	}
+
+	/* 
+	 * Set FNONBLOCK to allow checking for the fd's presence with a read() call
+	 * and FASYNC to deliver a signal to our process group if the descriptor vanishes
+	 */
+	flags |= FNONBLOCK | FASYNC;
+	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_SETFL, flags))
+	{	
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("Failed to set the postmaster death watching fd's flags: %s", strerror(errno))));
+	}
+ 
+	/* Send SIGIO signal to the whole process group */
+	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_SETOWN, -getpgrp())) 
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("Failed to set the postmaster's watching end's notification: %s", strerror(errno))));
+	}
+}
+
+/*
+ * Release postmaster death watch handle.
+ *
+ * Important: This must be called immediately after a process 
+ * forks from the postmaster. Otherwise, latch clients will 
+ * not wake up on postmaster death, even if they have requested 
+ * to.
+ *
+ * Even some hypothetical backend that doesn't care about postmaster
+ * death has a responsibility to call this function - otherwise,
+ * some other latch client backend could wait in vain to be informed 
+ * of postmaster death, because the irresponsible backend held open
+ * the ownership file descriptor and outlived the postmaster.
+ *
+ * We call the function within the fork machinery to handle all cases,
+ * so new backends need not bother with this themselves
+ */
+void
+ReleasePostmasterDeathWatchHandle(void)
+{
+	/* MyProcPid won't have been set yet */
+	Assert(PostmasterPid != getpid());
+	/* Please don't ask twice */
+	Assert(postmaster_alive_fds[POSTMASTER_FD_OWN] != -1);
+	/* Release parent's ownership fd - only postmaster should hold it */
+	if (close(postmaster_alive_fds[ POSTMASTER_FD_OWN])) 
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("Failed to close file descriptor associated with Postmaster death in child process %d", MyProcPid)));
+	}
+	postmaster_alive_fds[POSTMASTER_FD_OWN] = -1;
+}
+
diff --git a/src/backend/port/win32_latch.c b/src/backend/port/win32_latch.c
index 3509302..ad5d914 100644
--- a/src/backend/port/win32_latch.c
+++ b/src/backend/port/win32_latch.c
@@ -25,8 +25,10 @@
 #include "miscadmin.h"
 #include "replication/walsender.h"
 #include "storage/latch.h"
+#include "storage/pmsignal.h"
 #include "storage/shmem.h"
 
+extern HANDLE PostmasterHandle;
 
 void
 InitLatch(volatile Latch *latch)
@@ -81,35 +83,35 @@ DisownLatch(volatile Latch *latch)
 	latch->owner_pid = 0;
 }
 
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 int
-WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, SOCKET sock, long timeout)
 {
 	DWORD		rc;
-	HANDLE		events[3];
+	HANDLE		events[4];
 	HANDLE		latchevent;
 	HANDLE		sockevent = WSA_INVALID_EVENT; /* silence compiler */
 	int			numevents;
 	int			result = 0;
+	bool		found = false;
 
 	latchevent = latch->event;
 
 	events[0] = latchevent;
 	events[1] = pgwin32_signal_event;
 	numevents = 2;
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (sock != PGINVALID_SOCKET && ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 	{
 		int			flags = 0;
 
-		if (forRead)
+		if (wakeEvents & WL_SOCKET_READABLE)
 			flags |= FD_READ;
-		if (forWrite)
+		if (wakeEvents & WL_SOCKET_WRITEABLE)
 			flags |= FD_WRITE;
 
 		sockevent = WSACreateEvent();
@@ -117,7 +119,12 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 		events[numevents++] = sockevent;
 	}
 
-	for (;;)
+	if (wakeEvents & WL_POSTMASTER_DEATH)
+	{
+		events[numevents++] = PostmasterHandle;
+	}
+
+	do
 	{
 		/*
 		 * Reset the event, and check if the latch is set already. If someone
@@ -129,22 +136,37 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 			elog(ERROR, "ResetEvent failed: error code %d", (int) GetLastError());
 		if (latch->is_set)
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 * Since latch is set, no other factor could have 
+			 * coincided that could make us wake up 
+			 * independently of the latch being set, so no
+			 * need to worry about having missed something.
+			 */
 			break;
 		}
-
 		rc = WaitForMultipleObjects(numevents, events, FALSE,
 							   (timeout >= 0) ? (timeout / 1000) : INFINITE);
-		if (rc == WAIT_FAILED)
+
+		if ( (wakeEvents & WL_POSTMASTER_DEATH) && 
+			 !PostmasterIsAlive(true))
+		{
+			/* Postmaster died */
+			result |= WL_POSTMASTER_DEATH;
+			found = true;
+		}
+		else if (rc == WAIT_FAILED)
 			elog(ERROR, "WaitForMultipleObjects() failed: error code %d", (int) GetLastError());
 		else if (rc == WAIT_TIMEOUT)
 		{
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
 		else if (rc == WAIT_OBJECT_0 + 1)
 			pgwin32_dispatch_queued_signals();
-		else if (rc == WAIT_OBJECT_0 + 2)
+		else if (rc == WAIT_OBJECT_0 + 2 && 
+				 ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE))) 
 		{
 			WSANETWORKEVENTS resEvents;
 
@@ -155,17 +177,24 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 				ereport(FATAL,
 						(errmsg_internal("failed to enumerate network events: %i", (int) GetLastError())));
 
-			if ((forRead && resEvents.lNetworkEvents & FD_READ) ||
-				(forWrite && resEvents.lNetworkEvents & FD_WRITE))
-				result = 2;
-			break;
+			if ((wakeEvents & WL_SOCKET_READABLE) && (resEvents.lNetworkEvents & FD_READ))
+			{
+				result |= WL_SOCKET_READABLE;
+				found = true; 
+			}
+			if ((wakeEvents & WL_SOCKET_WRITEABLE) && (resEvents.lNetworkEvents & FD_WRITE))
+			{
+				result |= WL_SOCKET_WRITEABLE;
+				found = true;
+			}
 		}
 		else if (rc != WAIT_OBJECT_0)
 			elog(ERROR, "unexpected return code from WaitForMultipleObjects(): %d", (int) rc);
 	}
+	while(!found);
 
 	/* Clean up the handle we created for the socket */
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (sock != PGINVALID_SOCKET && ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 	{
 		WSAEventSelect(sock, sockevent, 0);
 		WSACloseEvent(sockevent);
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index b2fe9a1..6e2f37a 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -11,6 +11,7 @@
  */
 #include "postgres.h"
 #include "postmaster/fork_process.h"
+#include "storage/latch.h"
 
 #include <fcntl.h>
 #include <time.h>
@@ -61,6 +62,7 @@ fork_process(void)
 #ifdef LINUX_PROFILE
 		setitimer(ITIMER_PROF, &prof_itimer, NULL);
 #endif
+		ReleasePostmasterDeathWatchHandle();
 
 		/*
 		 * By default, Linux tends to kill the postmaster in out-of-memory
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index b40375a..1548a3b 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -40,6 +40,7 @@
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
+#include "storage/latch.h"
 #include "storage/pg_shmem.h"
 #include "storage/pmsignal.h"
 #include "utils/guc.h"
@@ -87,6 +88,12 @@ static volatile sig_atomic_t got_SIGTERM = false;
 static volatile sig_atomic_t wakened = false;
 static volatile sig_atomic_t ready_to_stop = false;
 
+/*
+ * Latch that archiver loop waits on until it is awakened by 
+ * signals, each of which there is a handler for
+ */
+static volatile Latch mainloop_latch;
+
 /* ----------
  * Local function forward declarations
  * ----------
@@ -228,6 +235,8 @@ PgArchiverMain(int argc, char *argv[])
 
 	MyProcPid = getpid();		/* reset MyProcPid */
 
+	InitLatch(&mainloop_latch); /* initialise latch used in main loop, now that we are a subprocess */
+
 	MyStartTime = time(NULL);	/* record Start Time for logging */
 
 	/*
@@ -282,6 +291,8 @@ ArchSigHupHandler(SIGNAL_ARGS)
 {
 	/* set flag to re-read config file at next convenient time */
 	got_SIGHUP = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGTERM signal handler for archiver process */
@@ -295,6 +306,8 @@ ArchSigTermHandler(SIGNAL_ARGS)
 	 * archive commands.
 	 */
 	got_SIGTERM = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR1 signal handler for archiver process */
@@ -303,6 +316,8 @@ pgarch_waken(SIGNAL_ARGS)
 {
 	/* set flag that there is work to be done */
 	wakened = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR2 signal handler for archiver process */
@@ -311,6 +326,8 @@ pgarch_waken_stop(SIGNAL_ARGS)
 {
 	/* set flag to do a final cycle and shut down afterwards */
 	ready_to_stop = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /*
@@ -334,6 +351,13 @@ pgarch_MainLoop(void)
 
 	do
 	{
+		/*
+		 * There shouldn't be anything for the archiver to do except to wait
+		 * on a latch ... however, the archiver exists to protect our data,
+		 * so she wakes up occasionally to allow herself to be proactive.
+		 */
+		ResetLatch(&mainloop_latch);
+
 		/* When we get SIGUSR2, we do one more archive cycle, then exit */
 		time_to_stop = ready_to_stop;
 
@@ -370,26 +394,28 @@ pgarch_MainLoop(void)
 			last_copy_time = time(NULL);
 		}
 
-		/*
-		 * There shouldn't be anything for the archiver to do except to wait
-		 * for a signal ... however, the archiver exists to protect our data,
-		 * so she wakes up occasionally to allow herself to be proactive.
+		/* 
+		 * Wait on latch, until various signals are received, or 
+		 * until a poll will be forced by PGARCH_AUTOWAKE_INTERVAL
+		 * having passed since last_copy_time, or on the postmaster's
+		 * untimely demise.
 		 *
-		 * On some platforms, signals won't interrupt the sleep.  To ensure we
-		 * respond reasonably promptly when someone signals us, break down the
-		 * sleep into 1-second increments, and check for interrupts after each
-		 * nap.
+		 * The caveat about signals resetting the timeout of 
+		 * WaitLatch()/select() on some platforms can be safely disregarded, 
+		 * because we handle all expected signals, and all handlers 
+		 * call SetLatch() where that matters anyway
 		 */
-		while (!(wakened || ready_to_stop || got_SIGHUP ||
-				 !PostmasterIsAlive(true)))
-		{
-			time_t		curtime;
 
-			pg_usleep(1000000L);
+		if (!time_to_stop) /* Don't wait during last iteration */
+		{
+			time_t		 curtime = time(NULL);	
+			unsigned int timeout_secs  = (unsigned int) PGARCH_AUTOWAKE_INTERVAL - 
+					(unsigned int) (curtime - last_copy_time);
+			WaitLatch(&mainloop_latch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, timeout_secs * 1000000L);
 			curtime = time(NULL);
 			if ((unsigned int) (curtime - last_copy_time) >=
 				(unsigned int) PGARCH_AUTOWAKE_INTERVAL)
-				wakened = true;
+				wakened = true; /* wakened by timeout - this wasn't a SIGHUP, etc */
 		}
 
 		/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 1e2aa9f..dd2335e 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -356,6 +356,7 @@ static void RandomSalt(char *md5Salt);
 static void signal_child(pid_t pid, int signal);
 static bool SignalSomeChildren(int signal, int targets);
 
+
 #define SignalChildren(sig)			   SignalSomeChildren(sig, BACKEND_TYPE_ALL)
 
 /*
@@ -443,6 +444,7 @@ typedef struct
 	HANDLE		syslogPipe[2];
 #else
 	int			syslogPipe[2];
+	int			postmaster_alive_fds[2];
 #endif
 	char		my_exec_path[MAXPGPATH];
 	char		pkglib_path[MAXPGPATH];
@@ -472,6 +474,15 @@ static void ShmemBackendArrayRemove(Backend *bn);
 #define EXIT_STATUS_0(st)  ((st) == 0)
 #define EXIT_STATUS_1(st)  (WIFEXITED(st) && WEXITSTATUS(st) == 1)
 
+/* 
+ * Two file descriptors used to monitor whether the postmaster is alive.
+ * First is POSTMASTER_FD_WATCH, second is POSTMASTER_FD_OWN.
+ * (macros defined in unix_latch.c)
+ */
+#ifndef WIN32
+int postmaster_alive_fds[2];
+#endif
+
 
 /*
  * Postmaster main entry point
@@ -491,6 +502,15 @@ PostmasterMain(int argc, char *argv[])
 
 	IsPostmasterEnvironment = true;
 
+#ifndef WIN32
+	/*
+	 * Initialise mechanism that allows waiting latch clients 
+	 * to wake on postmaster death, to finish their
+	 * remaining business
+	 */
+	InitPostmasterDeathWatchHandle();
+#endif
+
 	/*
 	 * for security, no dir or file created can be group or other accessible
 	 */
@@ -4753,6 +4773,9 @@ save_backend_variables(BackendParameters *param, Port *port,
 
 	memcpy(&param->syslogPipe, &syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&param->postmaster_alive_fds, &postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
 	strlcpy(param->my_exec_path, my_exec_path, MAXPGPATH);
 
 	strlcpy(param->pkglib_path, pkglib_path, MAXPGPATH);
@@ -4968,6 +4991,10 @@ restore_backend_variables(BackendParameters *param, Port *port)
 
 	memcpy(&syslogPipe, &param->syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&postmaster_alive_fds, &param->postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
+
 	strlcpy(my_exec_path, param->my_exec_path, MAXPGPATH);
 
 	strlcpy(pkglib_path, param->pkglib_path, MAXPGPATH);
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 1d4df8a..efda804 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -171,7 +171,7 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 		 * postmaster death regularly while waiting. Note that timeout here
 		 * does not necessarily release from loop.
 		 */
-		WaitLatch(&MyProc->waitLatch, 60000000L);
+		WaitLatch(&MyProc->waitLatch, WL_LATCH_SET | WL_TIMEOUT, 60000000L);
 
 		/* Must reset the latch before testing state. */
 		ResetLatch(&MyProc->waitLatch);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 470e6d1..27cc350 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -805,8 +805,9 @@ WalSndLoop(void)
 			}
 
 			/* Sleep */
-			WaitLatchOrSocket(&MyWalSnd->latch, MyProcPort->sock,
-							  true, pq_is_send_pending(),
+			WaitLatchOrSocket(&MyWalSnd->latch,
+							  WL_LATCH_SET | WL_SOCKET_READABLE | (pq_is_send_pending()? WL_SOCKET_WRITEABLE:0) |  WL_TIMEOUT,
+							  MyProcPort->sock,
 							  sleeptime * 1000L);
 
 			/* Check for replication timeout */
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 03ec071..5464dbc 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -38,9 +38,8 @@ extern void InitLatch(volatile Latch *latch);
 extern void InitSharedLatch(volatile Latch *latch);
 extern void OwnLatch(volatile Latch *latch);
 extern void DisownLatch(volatile Latch *latch);
-extern bool WaitLatch(volatile Latch *latch, long timeout);
-extern int WaitLatchOrSocket(volatile Latch *latch, pgsocket sock,
-				  bool forRead, bool forWrite, long timeout);
+extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout);
+extern int WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout);
 extern void SetLatch(volatile Latch *latch);
 extern void ResetLatch(volatile Latch *latch);
 
@@ -52,8 +51,25 @@ extern void ResetLatch(volatile Latch *latch);
  */
 #ifndef WIN32
 extern void latch_sigusr1_handler(void);
+/*
+ * On unix, it is necessary to Init monitoring
+ * of postmaster being alive
+ */
+extern void InitPostmasterDeathWatchHandle(void);
+/* 
+ * It is also necessary to call ReleasePostmasterDeathWatchHandle() 
+ * after forking from PM for the Unix implementation
+ */
+extern void ReleasePostmasterDeathWatchHandle(void);
 #else
 #define latch_sigusr1_handler()
 #endif
 
+/* Bitmasks for events that may wake-up WaitLatch() clients */
+#define WL_LATCH_SET         (1 << 0)
+#define WL_SOCKET_READABLE   (1 << 1)
+#define WL_SOCKET_WRITEABLE  (1 << 2)
+#define WL_TIMEOUT           (1 << 3)
+#define WL_POSTMASTER_DEATH  (1 << 4)
+
 #endif   /* LATCH_H */
#5Dave Page
dpage@pgadmin.org
In reply to: Peter Geoghegan (#4)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On Thu, May 26, 2011 at 11:58 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

Attached revision doesn't use any threads or pipes on win32. It's far
neater there. I'm still seeing that "lagger" process (which is an
overstatement) at times, so I guess it is normal. On Windows, there is
no detailed PS output, so I actually don't know what the lagger
process is, and no easy way to determine that immediately occurs to
me.

Process Explorer might help you there:
http://technet.microsoft.com/en-us/sysinternals/bb896653

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Peter Geoghegan
peter@2ndquadrant.com
In reply to: Heikki Linnakangas (#3)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

I had another quick look over this patch, and realised that I made a
minor mistake:

+void
+ReleasePostmasterDeathWatchHandle(void)
+{
+	/* MyProcPid won't have been set yet */
+	Assert(PostmasterPid != getpid());
+	/* Please don't ask twice */
+	Assert(postmaster_alive_fds[POSTMASTER_FD_OWN] != -1);
+	/* Release parent's ownership fd - only postmaster should hold it */
+	if (close(postmaster_alive_fds[ POSTMASTER_FD_OWN]))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("Failed to close file descriptor associated with Postmaster death in child process %d", MyProcPid)));
+	}
+	postmaster_alive_fds[POSTMASTER_FD_OWN] = -1;
+}
+

MyProcPid is used in this errmsg, and as noted in the first comment,
it isn't expected to be initialised when
ReleasePostmasterDeathWatchHandle() is called. Therefore, MyProcPid
should be replaced with a call to getpid(), just as it is for
Assert(PostmasterPid != getpid()).

I suppose that you could take the view that MyProcPid ought to be
initialised before the function is called, but I thought this was the
least worst way. Better to do it this way than to touch all the
different ways in which MyProcPid might be initialised, I suspect.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#7Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Geoghegan (#6)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 16.06.2011 15:07, Peter Geoghegan wrote:

I had another quick look over this patch, and realised that I made a
minor mistake:

+void
+ReleasePostmasterDeathWatchHandle(void)
+{
+	/* MyProcPid won't have been set yet */
+	Assert(PostmasterPid != getpid());
+	/* Please don't ask twice */
+	Assert(postmaster_alive_fds[POSTMASTER_FD_OWN] != -1);
+	/* Release parent's ownership fd - only postmaster should hold it */
+	if (close(postmaster_alive_fds[ POSTMASTER_FD_OWN]))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("Failed to close file descriptor associated with Postmaster death in child process %d", MyProcPid)));
+	}
+	postmaster_alive_fds[POSTMASTER_FD_OWN] = -1;
+}
+

MyProcPid is used in this errmsg, and as noted in the first comment,
it isn't expected to be initialised when
ReleasePostmasterDeathWatchHandle() is called. Therefore, MyProcPid
should be replaced with a call to getpid(), just as it is for
Assert(PostmasterPid != getpid()).

I suppose that you could take the view that MyProcPid ought to be
initialised before the function is called, but I thought this was the
least worst way. Better to do it this way than to touch all the
different ways in which MyProcPid might be initialised, I suspect.

Hmm, I'm not sure having the pid in that error message is too useful in
the first place. The process was just spawned, and it will die at that
error. When you try to debug that sort of error, what would you compare
the pid with? And you can include the pid in log_line_prefix if it turns
out to be useful after all.

PS. error messages should begin with lower-case letter.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#8Peter Geoghegan
peter@2ndquadrant.com
In reply to: Heikki Linnakangas (#7)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 16 June 2011 13:15, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Hmm, I'm not sure having the pid in that error message is too useful in the
first place. The process was just spawned, and it will die at that error.
When you try to debug that sort of error, what would you compare the pid
with? And you can include the pid in log_line_prefix if it turns out to be
useful after all.

All fair points. FWIW, I think it's pretty unlikely that anyone will
ever see this error message.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#9Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Geoghegan (#8)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

Peter Geoghegan wrote:

--- 247,277 ----
* do that), and the select() will return immediately.
*/
drainSelfPipe();
! 		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
!  		{
! 			result |= WL_LATCH_SET;
! 			found = true;
! 			/*
! 			 * Leave loop immediately, avoid blocking again.
! 			 * Since latch is set, no other factor could have
! 			 * coincided that could make us wake up
! 			 * independently of the latch being set, so no
! 			 * need to worry about having missed something.
! 			 */
break;
}

I don't understand that comment. Why can't e.g. postmaster death happen
at the same time as a latch is set? I think the code is fine as it is,
we just need to document that if there are several events that would
wake up WaitLatch(), we make no promise that we return all of them in
the return value. I believe all the callers would be fine with that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#10Peter Geoghegan
peter@2ndquadrant.com
In reply to: Heikki Linnakangas (#9)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 16 June 2011 15:27, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

I don't understand that comment. Why can't e.g. postmaster death happen at
the same time as a latch is set? I think the code is fine as it is, we just
need to document that if there are several events that would wake up
WaitLatch(), we make no promise that we return all of them in the return
value. I believe all the callers would be fine with that.

I see your perspective: there is a window for the Postmaster to die
after the latch is set, but before it returns control to clients, and
this won't be reported. I would argue that Postmaster death didn't
actually coincide with the latch being set, because it didn't actually
cause the select() to unblock, and therefore we don't have a
responsibility to report it. Even if that view doesn't stand up to
scrutiny, and it may not, it is a fairly academic point, because, as
you say, it's unlikely that clients will ever much care. I'd be happy
to document that we make no promises, on the off chance that some
future caller might care.
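
The contract settled on here (at least one wake-up event is reported, but not necessarily all of them) shapes how a caller should read the return value. A minimal sketch, assuming the WL_* bit values from the patch's latch.h; the enum and next_action() are hypothetical names used only for illustration, not part of the patch:

```c
#include <assert.h>

/* Event bits, copied from the patch's src/include/storage/latch.h. */
#define WL_LATCH_SET         (1 << 0)
#define WL_SOCKET_READABLE   (1 << 1)
#define WL_SOCKET_WRITEABLE  (1 << 2)
#define WL_TIMEOUT           (1 << 3)
#define WL_POSTMASTER_DEATH  (1 << 4)

/* Hypothetical actions a latch client might take after waking up. */
typedef enum
{
	ACT_EXIT,					/* postmaster is gone: shut down */
	ACT_WORK,					/* latch was set: there is work to do */
	ACT_POLL,					/* timeout: do the periodic check */
	ACT_RETRY					/* nothing we asked for: wait again */
} WakeAction;

/*
 * Hypothetical caller-side helper: map a WaitLatch() result to an action.
 * Each bit is tested on its own, because several may be set at once, and a
 * clear bit is no proof that the corresponding event did not happen.
 */
static WakeAction
next_action(int wake_result)
{
	assert(wake_result != 0);	/* at least one event is always reported */

	if (wake_result & WL_POSTMASTER_DEATH)
		return ACT_EXIT;
	if (wake_result & WL_LATCH_SET)
		return ACT_WORK;
	if (wake_result & WL_TIMEOUT)
		return ACT_POLL;
	return ACT_RETRY;
}
```

Because a coinciding postmaster death may go unreported on one wake-up, a caller written this way simply picks it up on a later wait rather than relying on a single return value being complete.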

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#11Florian Pflug
fgp@phlo.org
In reply to: Peter Geoghegan (#2)
Re: Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On May26, 2011, at 11:25 , Peter Geoghegan wrote:

I'm a bit disappointed that no one has commented on this yet. I would
have appreciated some preliminary feedback.

I noticed that your patch doesn't seem to register a SIGIO handler, i.e.
it doesn't use async IO machinery (or rather a tiny part thereof) to
get asynchronously notified if the postmaster dies.

If that is on purpose, you can remove the fsetown() call, as it serves
no purpose without such a handler I think. Or, you might want to add
such a signal handler, and make it simply do "kill(getpid(), SIGTERM)".

best regards,
Florian Pflug

#12Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Geoghegan (#6)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

This patch breaks silent_mode=on. In silent_mode, postmaster forks early
on, to detach from the controlling tty. It uses fork_process() for that,
which with the patch closes the write end of the postmaster-alive pipe, but
that's wrong because the child becomes the postmaster process.
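
Why a closed write end matters can be shown in isolation: once no process holds the write end of the pipe, the read end becomes readable and read() returns 0 (EOF), which watchers interpret as postmaster death. A minimal sketch, assuming a POSIX system; owner_is_gone() is a hypothetical helper and not a function from the patch:

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll the read end of a "liveness" pipe without
 * blocking.  Returns 1 if every write end has been closed (read() sees
 * EOF), 0 if some process still holds a write end, and -1 on error.
 */
static int
owner_is_gone(int watch_fd)
{
	fd_set		rfds;
	struct timeval tv = {0, 0};	/* zero timeout: poll, don't block */
	char		c;
	ssize_t		n;
	int			rc;

	assert(watch_fd >= 0 && watch_fd < FD_SETSIZE);

	FD_ZERO(&rfds);
	FD_SET(watch_fd, &rfds);
	rc = select(watch_fd + 1, &rfds, NULL, NULL, &tv);
	if (rc < 0)
		return -1;
	if (rc == 0)
		return 0;				/* not readable: a write end is still open */

	/* Readable although nobody ever writes: expect EOF. */
	n = read(watch_fd, &c, 1);
	if (n < 0)
		return -1;
	return (n == 0) ? 1 : 0;
}
```

In the silent_mode case, the forked child that goes on to be the postmaster would have closed its copy of the write end; when the original parent then exits, no write end remains and every watcher sees EOF even though the postmaster is running.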

On a stylistic note, the "extern" declaration in unix_latch.c is ugly;
extern declarations should be in header files. Come to think of it, I
feel the Init- and ReleasePostmasterDeathWatchHandle() functions should
go to postmaster.c. postmaster_alive_fds[] and PostmasterHandle serve
the same purpose, declaration and initialization of both should be kept
together, perhaps by moving the initialization of PostmasterHandle into
Init- and ReleasePostmasterDeathWatchHandle().

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#13Alvaro Herrera
alvherre@commandprompt.com
In reply to: Peter Geoghegan (#8)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

Excerpts from Peter Geoghegan's message of Thu Jun 16 08:42:39 -0400 2011:

On 16 June 2011 13:15, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Hmm, I'm not sure having the pid in that error message is too useful in the
first place. The process was just spawned, and it will die at that error.
When you try to debug that sort of error, what would you compare the pid
with? And you can include the pid in log_line_prefix if it turns out to be
useful after all.

All fair points. FWIW, I think it's pretty unlikely that anyone will
ever see this error message.

... in which case the getpid() call is not that expensive anyway.

I agree that the PID should be part of the log_line_prefix though, which
is why I was trying to propose we include it (or the session ID) in the
default log_line_prefix along with a suitable timestamp.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#14Peter Geoghegan
peter@2ndquadrant.com
In reply to: Heikki Linnakangas (#12)
1 attachment(s)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 16 June 2011 16:30, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

This patch breaks silent_mode=on. In silent_mode, postmaster forks early on,
to detach from the controlling tty. It uses fork_process() for that, which
with the patch closes the write end of the postmaster-alive pipe, but that's
wrong because the child becomes the postmaster process.

Attached patch revision addresses that issue. There is a thin
macro-based wrapper around fork_process(), depending on whether or not
it is desirable to ReleasePostmasterDeathWatchHandle() after forking.
All callers to fork_process() are unchanged.
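
The shape of that wrapper can be sketched outside the backend. This is an illustration only, assuming a POSIX system: alive_fds, do_fork_sketch() and child_keeps_write_end() are hypothetical stand-ins for postmaster_alive_fds, do_fork_process() and a test harness, and the real function does considerably more around the fork() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for postmaster_alive_fds: [0] is watched, [1] is owned. */
static int	alive_fds[2] = {-1, -1};

/*
 * Fork; in the child, give up the write end unless the child is meant to
 * carry on as the owner (the silent_mode case).
 */
static pid_t
do_fork_sketch(bool remain_owner)
{
	pid_t		pid = fork();

	if (pid == 0 && !remain_owner)
	{
		assert(alive_fds[1] != -1);		/* must not release twice */
		close(alive_fds[1]);
		alive_fds[1] = -1;
	}
	return pid;
}

/* Existing callers keep the old spelling; only the daemonizing fork differs. */
#define fork_child()		do_fork_sketch(false)
#define fork_remain_owner()	do_fork_sketch(true)

/*
 * Harness: fork with the given policy and report whether the child still
 * holds the write end.  Returns 1 if it does, 0 if not, -1 on error.
 */
static int
child_keeps_write_end(bool remain_owner)
{
	int			status;
	pid_t		pid = do_fork_sketch(remain_owner);

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(alive_fds[1] != -1 ? 1 : 0);	/* child reports via exit code */
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Keeping the policy in a single boolean argument means ordinary child processes cannot forget to release the handle, while the one caller that re-creates the postmaster opts out explicitly.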

On a stylistic note, the "extern" declaration in unix_latch.c is ugly;
extern declarations should be in header files.

Just an oversight.

Come to think of it, I feel
the Init- and ReleasePostmasterDeathWatchHandle() functions should go to
postmaster.c. postmaster_alive_fds[] and PostmasterHandle serve the same
purpose, declaration and initialization of both should be kept together,
perhaps by moving the initialization of PostmasterHandle into Init- and
ReleasePostmasterDeathWatchHandle().

I've removed the "no coinciding wakeEvents" comment that you objected
to (or clarified that other wakeEvents can coincide), and have
documented the fact that we make no guarantees about reporting all
events that caused a latch wake-up. We will report at least one
though.

I've moved Init- and ReleasePostmasterDeathWatchHandle() into postmaster.c.

I have to disagree with the idea of moving initialisation of
PostmasterHandle into InitPostmasterDeathWatchHandle(). The Init- and
Release- functions, which only exist on Unix builds, initialise
and subsequently release the watching handle. There's a symmetry to
it. If we created a win32 InitPostmasterDeathWatchHandle(), we'd have
no reason to create a win32 Release-, so the symmetry would be lost.
Also, PostmasterHandle does not exist for the express purpose of latch
clients monitoring postmaster death, unlike postmaster_alive_fds[] -
it existed before now. I guess I don't feel too strongly about it
though. It just doesn't seem like a maintainability win.

On 16 June 2011 15:49, Florian Pflug <fgp@phlo.org> wrote:

I noticed that your patch doesn't seem to register a SIGIO handler, i.e.
it doesn't use async IO machinery (or rather a tiny part thereof) to
get asynchronously notified if the postmaster dies.

If that is on purpose, you can remove the fsetown() call, as it serves
no purpose without such a handler I think. Or, you might want to add
such a signal handler, and make it simply do "kill(getpid(), SIGTERM)".

It is on purpose - I'm not interested in asynchronous notification for
the time being at least, because it doesn't occur to me how we can
handle that failure usefully in an asynchronous fashion. Anyway, that
code has been simplified, and my intent clarified. Thanks.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

new_latch.v3.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index aa0b029..691ac42 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10161,7 +10161,7 @@ retry:
 					/*
 					 * Wait for more WAL to arrive, or timeout to be reached
 					 */
-					WaitLatch(&XLogCtl->recoveryWakeupLatch, 5000000L);
+					WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 5000000L);
 					ResetLatch(&XLogCtl->recoveryWakeupLatch);
 				}
 				else
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 6dae7c9..e88631d 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -93,7 +93,9 @@
 #endif
 
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "storage/latch.h"
+#include "storage/pmsignal.h"
 #include "storage/shmem.h"
 
 /* Are we currently in WaitLatch? The signal handler would like to know. */
@@ -188,22 +190,26 @@ DisownLatch(volatile Latch *latch)
  * backend-local latch initialized with InitLatch, or a shared latch
  * associated with the current process by calling OwnLatch.
  *
- * Returns 'true' if the latch was set, or 'false' if timeout was reached.
+ * Returns bit field indicating which condition(s) caused the wake-up.
+ * Note that there is no guarantee that callers will have all wake-up conditions
+ * returned, but we will report at least one.
  */
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 /*
  * Like WaitLatch, but will also return when there's data available in
- * 'sock' for reading or writing. Returns 0 if timeout was reached,
- * 1 if the latch was set, 2 if the socket became readable or writable.
+ * 'sock' for reading or writing.
+ *
+ * Returns bit field indicating which condition(s) caused the wake-up.
+ * Note that there is no guarantee that callers will have all wake-up conditions
+ * returned, but we will report at least one.
  */
 int
-WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout)
 {
 	struct timeval tv,
 			   *tvp = NULL;
@@ -211,12 +217,13 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	fd_set		output_mask;
 	int			rc;
 	int			result = 0;
+	bool		found = false;
 
 	if (latch->owner_pid != MyProcPid)
 		elog(ERROR, "cannot wait on a latch owned by another process");
 
 	/* Initialize timeout */
-	if (timeout >= 0)
+	if (timeout >= 0 && (wakeEvents & WL_TIMEOUT))
 	{
 		tv.tv_sec = timeout / 1000000L;
 		tv.tv_usec = timeout % 1000000L;
@@ -224,7 +231,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	}
 
 	waiting = true;
-	for (;;)
+	do
 	{
 		int			hifd;
 
@@ -235,16 +242,30 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		 * do that), and the select() will return immediately.
 		 */
 		drainSelfPipe();
-		if (latch->is_set)
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 *
+			 * Don't attempt to report any other reason
+			 * for returning to callers that may have
+			 * happened to coincide.
+			 */
 			break;
 		}
 
 		FD_ZERO(&input_mask);
 		FD_SET(selfpipe_readfd, &input_mask);
+
+		if (wakeEvents & WL_POSTMASTER_DEATH)
+		{
+			FD_SET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask);
+			if (postmaster_alive_fds[POSTMASTER_FD_WATCH] > hifd)
+				hifd = postmaster_alive_fds[POSTMASTER_FD_WATCH];
+		}
 		hifd = selfpipe_readfd;
-		if (sock != PGINVALID_SOCKET && forRead)
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_READABLE))
 		{
 			FD_SET(sock, &input_mask);
 			if (sock > hifd)
@@ -252,7 +273,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		}
 
 		FD_ZERO(&output_mask);
-		if (sock != PGINVALID_SOCKET && forWrite)
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_WRITEABLE))
 		{
 			FD_SET(sock, &output_mask);
 			if (sock > hifd)
@@ -268,20 +289,35 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 					(errcode_for_socket_access(),
 					 errmsg("select() failed: %m")));
 		}
-		if (rc == 0)
+		if (rc == 0 && (wakeEvents & WL_TIMEOUT))
 		{
 			/* timeout exceeded */
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
-		if (sock != PGINVALID_SOCKET &&
-			((forRead && FD_ISSET(sock, &input_mask)) ||
-			 (forWrite && FD_ISSET(sock, &output_mask))))
+		if (sock != PGINVALID_SOCKET)
 		{
-			result = 2;
-			break;				/* data available in socket */
+			if ((wakeEvents & WL_SOCKET_READABLE ) && FD_ISSET(sock, &input_mask))
+			{
+				result |= WL_SOCKET_READABLE;
+				found = true; /* data available in socket */
+			}
+			if ((wakeEvents & WL_SOCKET_WRITEABLE) && FD_ISSET(sock, &output_mask))
+			{
+				result |= WL_SOCKET_WRITEABLE;
+				found = true;
+			}
+		}
+		if ((wakeEvents & WL_POSTMASTER_DEATH) &&
+			 FD_ISSET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask) &&
+			 !PostmasterIsAlive(true))
+		{
+			result |= WL_POSTMASTER_DEATH;
+			found = true;
 		}
 	}
+	while(!found);
+
 	waiting = false;
 
 	return result;
diff --git a/src/backend/port/win32_latch.c b/src/backend/port/win32_latch.c
index 4bcf7b7..ea03aa2 100644
--- a/src/backend/port/win32_latch.c
+++ b/src/backend/port/win32_latch.c
@@ -23,8 +23,10 @@
 #include <unistd.h>
 
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "replication/walsender.h"
 #include "storage/latch.h"
+#include "storage/pmsignal.h"
 #include "storage/shmem.h"
 
 
@@ -81,43 +83,47 @@ DisownLatch(volatile Latch *latch)
 	latch->owner_pid = 0;
 }
 
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 int
-WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, SOCKET sock, long timeout)
 {
 	DWORD		rc;
-	HANDLE		events[3];
+	HANDLE		events[4];
 	HANDLE		latchevent;
 	HANDLE		sockevent = WSA_INVALID_EVENT;	/* silence compiler */
 	int			numevents;
 	int			result = 0;
+	bool		found = false;
 
 	latchevent = latch->event;
 
 	events[0] = latchevent;
 	events[1] = pgwin32_signal_event;
 	numevents = 2;
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (sock != PGINVALID_SOCKET && ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 	{
 		int			flags = 0;
 
-		if (forRead)
+		if (wakeEvents & WL_SOCKET_READABLE)
 			flags |= FD_READ;
-		if (forWrite)
+		if (wakeEvents & WL_SOCKET_WRITEABLE)
 			flags |= FD_WRITE;
 
 		sockevent = WSACreateEvent();
 		WSAEventSelect(sock, sockevent, flags);
 		events[numevents++] = sockevent;
 	}
+	if (wakeEvents & WL_POSTMASTER_DEATH)
+	{
+		events[numevents++] = PostmasterHandle;
+	}
 
-	for (;;)
+	do
 	{
 		/*
 		 * Reset the event, and check if the latch is set already. If someone
@@ -127,24 +133,39 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 		 */
 		if (!ResetEvent(latchevent))
 			elog(ERROR, "ResetEvent failed: error code %d", (int) GetLastError());
-		if (latch->is_set)
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 *
+			 * Don't attempt to report any other reason
+			 * for returning to callers that may have
+			 * happened to coincide.
+			 */
 			break;
 		}
 
 		rc = WaitForMultipleObjects(numevents, events, FALSE,
 							   (timeout >= 0) ? (timeout / 1000) : INFINITE);
-		if (rc == WAIT_FAILED)
+		if ( (wakeEvents & WL_POSTMASTER_DEATH) &&
+			 !PostmasterIsAlive(true))
+		{
+			/* Postmaster died */
+			result |= WL_POSTMASTER_DEATH;
+			found = true;
+		}
+		else if (rc == WAIT_FAILED)
 			elog(ERROR, "WaitForMultipleObjects() failed: error code %d", (int) GetLastError());
 		else if (rc == WAIT_TIMEOUT)
 		{
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
 		else if (rc == WAIT_OBJECT_0 + 1)
 			pgwin32_dispatch_queued_signals();
-		else if (rc == WAIT_OBJECT_0 + 2)
+		else if (rc == WAIT_OBJECT_0 + 2 &&
+				 ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 		{
 			WSANETWORKEVENTS resEvents;
 
@@ -155,17 +176,24 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 				ereport(FATAL,
 						(errmsg_internal("failed to enumerate network events: %i", (int) GetLastError())));
 
-			if ((forRead && resEvents.lNetworkEvents & FD_READ) ||
-				(forWrite && resEvents.lNetworkEvents & FD_WRITE))
-				result = 2;
-			break;
+			if ((wakeEvents & WL_SOCKET_READABLE) && (resEvents.lNetworkEvents & FD_READ))
+			{
+				result |= WL_SOCKET_READABLE;
+				found = true;
+			}
+			if ((wakeEvents & WL_SOCKET_WRITEABLE) && (resEvents.lNetworkEvents & FD_WRITE))
+			{
+				result |= WL_SOCKET_WRITEABLE;
+				found = true;
+			}
 		}
 		else if (rc != WAIT_OBJECT_0)
 			elog(ERROR, "unexpected return code from WaitForMultipleObjects(): %d", (int) rc);
 	}
+	while(!found);
 
 	/* Clean up the handle we created for the socket */
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (sock != PGINVALID_SOCKET && ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 	{
 		WSAEventSelect(sock, sockevent, 0);
 		WSACloseEvent(sockevent);
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index b2fe9a1..a3a107a 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -10,6 +10,7 @@
  *	  src/backend/postmaster/fork_process.c
  */
 #include "postgres.h"
+#include "postmaster/postmaster.h"
 #include "postmaster/fork_process.h"
 
 #include <fcntl.h>
@@ -19,13 +20,14 @@
 #include <unistd.h>
 
 #ifndef WIN32
+
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
  * child in the parent process.
  */
 pid_t
-fork_process(void)
+do_fork_process(bool remain_postmaster)
 {
 	pid_t		result;
 
@@ -61,6 +63,14 @@ fork_process(void)
 #ifdef LINUX_PROFILE
 		setitimer(ITIMER_PROF, &prof_itimer, NULL);
 #endif
+		/*
+		 * Unless the postmaster is forking merely to re-create itself rather
+		 * than to create a distinct process (as it does for silent_mode),
+		 * release the handle that the postmaster holds to indicate that it is
+		 * alive to latch clients in auxiliary processes.
+		 */
+		if (!remain_postmaster)
+			ReleasePostmasterDeathWatchHandle();
 
 		/*
 		 * By default, Linux tends to kill the postmaster in out-of-memory
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index b40375a..a56fe92 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -40,6 +40,7 @@
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
+#include "storage/latch.h"
 #include "storage/pg_shmem.h"
 #include "storage/pmsignal.h"
 #include "utils/guc.h"
@@ -87,6 +88,12 @@ static volatile sig_atomic_t got_SIGTERM = false;
 static volatile sig_atomic_t wakened = false;
 static volatile sig_atomic_t ready_to_stop = false;
 
+/*
+ * Latch that archiver loop waits on until it is awakened by
+ * signals, each of which there is a handler for
+ */
+static volatile Latch mainloop_latch;
+
 /* ----------
  * Local function forward declarations
  * ----------
@@ -228,6 +235,8 @@ PgArchiverMain(int argc, char *argv[])
 
 	MyProcPid = getpid();		/* reset MyProcPid */
 
+	InitLatch(&mainloop_latch); /* initialise latch used in main loop, now that we are a subprocess */
+
 	MyStartTime = time(NULL);	/* record Start Time for logging */
 
 	/*
@@ -282,6 +291,8 @@ ArchSigHupHandler(SIGNAL_ARGS)
 {
 	/* set flag to re-read config file at next convenient time */
 	got_SIGHUP = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGTERM signal handler for archiver process */
@@ -295,6 +306,8 @@ ArchSigTermHandler(SIGNAL_ARGS)
 	 * archive commands.
 	 */
 	got_SIGTERM = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR1 signal handler for archiver process */
@@ -303,6 +316,8 @@ pgarch_waken(SIGNAL_ARGS)
 {
 	/* set flag that there is work to be done */
 	wakened = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR2 signal handler for archiver process */
@@ -311,6 +326,8 @@ pgarch_waken_stop(SIGNAL_ARGS)
 {
 	/* set flag to do a final cycle and shut down afterwards */
 	ready_to_stop = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /*
@@ -334,6 +351,13 @@ pgarch_MainLoop(void)
 
 	do
 	{
+		/*
+		 * There shouldn't be anything for the archiver to do except to wait
+		 * on a latch ... however, the archiver exists to protect our data,
+		 * so she wakes up occasionally to allow herself to be proactive.
+		 */
+		ResetLatch(&mainloop_latch);
+
 		/* When we get SIGUSR2, we do one more archive cycle, then exit */
 		time_to_stop = ready_to_stop;
 
@@ -371,25 +395,27 @@ pgarch_MainLoop(void)
 		}
 
 		/*
-		 * There shouldn't be anything for the archiver to do except to wait
-		 * for a signal ... however, the archiver exists to protect our data,
-		 * so she wakes up occasionally to allow herself to be proactive.
+		 * Wait on latch, until various signals are received, or
+		 * until a poll will be forced by PGARCH_AUTOWAKE_INTERVAL
+		 * having passed since last_copy_time, or on the postmaster's
+		 * untimely demise.
 		 *
-		 * On some platforms, signals won't interrupt the sleep.  To ensure we
-		 * respond reasonably promptly when someone signals us, break down the
-		 * sleep into 1-second increments, and check for interrupts after each
-		 * nap.
+		 * The caveat about signals resetting the timeout of
+		 * WaitLatch()/select() on some platforms can be safely disregarded,
+		 * because we handle all expected signals, and all handlers
+		 * call SetLatch() where that matters anyway
 		 */
-		while (!(wakened || ready_to_stop || got_SIGHUP ||
-				 !PostmasterIsAlive(true)))
-		{
-			time_t		curtime;
 
-			pg_usleep(1000000L);
+		if (!time_to_stop) /* Don't wait during last iteration */
+		{
+			time_t		 curtime = time(NULL);
+			unsigned int timeout_secs  = (unsigned int) PGARCH_AUTOWAKE_INTERVAL -
+					(unsigned int) (curtime - last_copy_time);
+			WaitLatch(&mainloop_latch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, timeout_secs * 1000000L);
 			curtime = time(NULL);
 			if ((unsigned int) (curtime - last_copy_time) >=
 				(unsigned int) PGARCH_AUTOWAKE_INTERVAL)
-				wakened = true;
+				wakened = true; /* wakened by timeout - this wasn't a SIGHUP, etc */
 		}
 
 		/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 1f0d4e6..85901c4 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -443,6 +443,7 @@ typedef struct
 	HANDLE		syslogPipe[2];
 #else
 	int			syslogPipe[2];
+	int			postmaster_alive_fds[2];
 #endif
 	char		my_exec_path[MAXPGPATH];
 	char		pkglib_path[MAXPGPATH];
@@ -472,6 +473,13 @@ static void ShmemBackendArrayRemove(Backend *bn);
 #define EXIT_STATUS_0(st)  ((st) == 0)
 #define EXIT_STATUS_1(st)  (WIFEXITED(st) && WEXITSTATUS(st) == 1)
 
+/*
+ * Two file descriptors used for monitoring whether the postmaster is alive.
+ * First is POSTMASTER_FD_WATCH, second is POSTMASTER_FD_OWN.
+ */
+#ifndef WIN32
+int postmaster_alive_fds[2];
+#endif
 
 /*
  * Postmaster main entry point
@@ -491,6 +499,15 @@ PostmasterMain(int argc, char *argv[])
 
 	IsPostmasterEnvironment = true;
 
+#ifndef WIN32
+	/*
+	 * Initialise mechanism that allows waiting latch clients
+	 * to wake on postmaster death, to finish their
+	 * remaining business
+	 */
+	InitPostmasterDeathWatchHandle();
+#endif
+
 	/*
 	 * for security, no dir or file created can be group or other accessible
 	 */
@@ -1307,7 +1324,7 @@ pmdaemonize(void)
 	/*
 	 * Okay to fork.
 	 */
-	pid = fork_process();
+	pid = fork_process_remain_postmaster();
 	if (pid == (pid_t) -1)
 	{
 		write_stderr("%s: could not fork background process: %s\n",
@@ -4753,6 +4770,9 @@ save_backend_variables(BackendParameters *param, Port *port,
 
 	memcpy(&param->syslogPipe, &syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&param->postmaster_alive_fds, &postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
 	strlcpy(param->my_exec_path, my_exec_path, MAXPGPATH);
 
 	strlcpy(param->pkglib_path, pkglib_path, MAXPGPATH);
@@ -4968,6 +4988,10 @@ restore_backend_variables(BackendParameters *param, Port *port)
 
 	memcpy(&syslogPipe, &param->syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&postmaster_alive_fds, &param->postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
+
 	strlcpy(my_exec_path, param->my_exec_path, MAXPGPATH);
 
 	strlcpy(pkglib_path, param->pkglib_path, MAXPGPATH);
@@ -5083,5 +5107,86 @@ pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired)
 	/* Queue SIGCHLD signal */
 	pg_queue_signal(SIGCHLD);
 }
+#else
+/*
+ * Called once from the postmaster, so that child processes can subsequently
+ * monitor if their parent is dead. We open up an anonymous pipe, and have child
+ * processes block on a select() call that examines if the read file descriptor
+ * is ready for reading. They do so through a latch.
+ *
+ * Child processes are responsible for releasing the death watch handler, so
+ * that only the postmaster holds it, and a select() on the fd returns upon the
+ * one and only holder (the postmaster) dying.
+ *
+ * This is a trick that obviates the need for auxiliary backends to have tight
+ * polling loops where they check if the postmaster is alive. We do this because
+ * that pattern results in an excessive number of wakeups per second when idle.
+ */
+void
+InitPostmasterDeathWatchHandle(void)
+{
+	int flags;
+	/*
+	 * Create pipe. The postmaster is deemed dead if
+	 * no process has the writing end (POSTMASTER_FD_OWN) open.
+	 */
+	Assert(MyProcPid == PostmasterPid);
+	if (pipe(postmaster_alive_fds))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg( "pipe() call failed to create pipe to monitor postmaster death: %s", strerror(errno))));
+	}
+	flags = fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_GETFL);
+	if (flags < 0)
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to set the postmaster death watching fd's flags: %s", strerror(errno))));
+	}
+	/*
+	 * Set FNONBLOCK to allow checking for the fd's presence with a select() call
+	 */
+	flags |= FNONBLOCK;
+	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_SETFL, FNONBLOCK))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to set the postmaster death watching fd's flags: %s", strerror(errno))));
+	}
+}
 
-#endif   /* WIN32 */
+/*
+ * Release postmaster death watch handle.
+ *
+ * Important: This must be called immediately after a process
+ * forks from the postmaster. Otherwise, latch clients will
+ * not wake up on postmaster death, even if they have requested
+ * to.
+ *
+ * Even some hypothetical backend that doesn't care about postmaster
+ * death has a responsibility to call this function - otherwise,
+ * some other latch client backend could wait in vain to be informed
+ * of postmaster death, because the irresponsible backend held open
+ * the ownership file descriptor and outlived the postmaster.
+ *
+ * We call the function within the fork machinery to handle all cases,
+ * so new backends need not bother with this themselves
+ */
+void
+ReleasePostmasterDeathWatchHandle(void)
+{
+	/* MyProcPid won't have been set yet */
+	Assert(PostmasterPid != getpid());
+	/* Please don't ask twice */
+	Assert(postmaster_alive_fds[POSTMASTER_FD_OWN] != -1);
+	/* Release parent's ownership fd - only postmaster should hold it */
+	if (close(postmaster_alive_fds[POSTMASTER_FD_OWN]))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to close file descriptor associated with Postmaster death in child process")));
+	}
+	postmaster_alive_fds[POSTMASTER_FD_OWN] = -1;
+}
+#endif
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 08a4086..646f90b 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -171,7 +171,7 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 		 * postmaster death regularly while waiting. Note that timeout here
 		 * does not necessarily release from loop.
 		 */
-		WaitLatch(&MyProc->waitLatch, 60000000L);
+		WaitLatch(&MyProc->waitLatch, WL_LATCH_SET | WL_TIMEOUT, 60000000L);
 
 		/* Must reset the latch before testing state. */
 		ResetLatch(&MyProc->waitLatch);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 470e6d1..27cc350 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -805,8 +805,9 @@ WalSndLoop(void)
 			}
 
 			/* Sleep */
-			WaitLatchOrSocket(&MyWalSnd->latch, MyProcPort->sock,
-							  true, pq_is_send_pending(),
+			WaitLatchOrSocket(&MyWalSnd->latch,
+							  WL_LATCH_SET | WL_SOCKET_READABLE | (pq_is_send_pending()? WL_SOCKET_WRITEABLE:0) |  WL_TIMEOUT,
+							  MyProcPort->sock,
 							  sleeptime * 1000L);
 
 			/* Check for replication timeout */
diff --git a/src/include/postmaster/fork_process.h b/src/include/postmaster/fork_process.h
index 0553fd2..e0abe5d 100644
--- a/src/include/postmaster/fork_process.h
+++ b/src/include/postmaster/fork_process.h
@@ -12,6 +12,8 @@
 #ifndef FORK_PROCESS_H
 #define FORK_PROCESS_H
 
-extern pid_t fork_process(void);
+extern pid_t do_fork_process(bool remain_postmaster);
+#define fork_process() do_fork_process(false)
+#define fork_process_remain_postmaster() do_fork_process(true)
 
 #endif   /* FORK_PROCESS_H */
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index 25cc84a..497cf51 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -33,6 +33,25 @@ extern bool restart_after_crash;
 
 #ifdef WIN32
 extern HANDLE PostmasterHandle;
+#else
+/*
+ * Constants that represent which of a pair of fds given
+ * to pipe() is watched and owned in the context of
+ * dealing with postmaster death
+ */
+#define POSTMASTER_FD_WATCH 0
+#define POSTMASTER_FD_OWN 1
+extern int postmaster_alive_fds[2];
+/*
+ * On unix, it is necessary to Init monitoring
+ * of postmaster being alive
+ */
+extern void InitPostmasterDeathWatchHandle(void);
+/*
+ * It is also necessary to call ReleasePostmasterDeathWatchHandle()
+ * after forking from PM for the Unix implementation
+ */
+extern void ReleasePostmasterDeathWatchHandle(void);
 #endif
 
 extern const char *progname;
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 03ec071..6865ac7 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -38,9 +38,8 @@ extern void InitLatch(volatile Latch *latch);
 extern void InitSharedLatch(volatile Latch *latch);
 extern void OwnLatch(volatile Latch *latch);
 extern void DisownLatch(volatile Latch *latch);
-extern bool WaitLatch(volatile Latch *latch, long timeout);
-extern int WaitLatchOrSocket(volatile Latch *latch, pgsocket sock,
-				  bool forRead, bool forWrite, long timeout);
+extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout);
+extern int WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout);
 extern void SetLatch(volatile Latch *latch);
 extern void ResetLatch(volatile Latch *latch);
 
@@ -56,4 +55,11 @@ extern void latch_sigusr1_handler(void);
 #define latch_sigusr1_handler()
 #endif
 
+/* Bitmasks for events that may wake-up WaitLatch() clients */
+#define WL_LATCH_SET         (1 << 0)
+#define WL_SOCKET_READABLE   (1 << 1)
+#define WL_SOCKET_WRITEABLE  (1 << 2)
+#define WL_TIMEOUT           (1 << 3)
+#define WL_POSTMASTER_DEATH  (1 << 4)
+
 #endif   /* LATCH_H */
#15Peter Geoghegan
peter@2ndquadrant.com
In reply to: Peter Geoghegan (#14)
1 attachment(s)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

I took another look at this this evening, and realised that my
comments could be a little clearer.

Attached revision cleans them up a bit.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

new_latch.v4.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index aa0b029..691ac42 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10161,7 +10161,7 @@ retry:
 					/*
 					 * Wait for more WAL to arrive, or timeout to be reached
 					 */
-					WaitLatch(&XLogCtl->recoveryWakeupLatch, 5000000L);
+					WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 5000000L);
 					ResetLatch(&XLogCtl->recoveryWakeupLatch);
 				}
 				else
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 6dae7c9..f65d389 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -93,7 +93,9 @@
 #endif
 
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "storage/latch.h"
+#include "storage/pmsignal.h"
 #include "storage/shmem.h"
 
 /* Are we currently in WaitLatch? The signal handler would like to know. */
@@ -188,22 +190,25 @@ DisownLatch(volatile Latch *latch)
  * backend-local latch initialized with InitLatch, or a shared latch
  * associated with the current process by calling OwnLatch.
  *
- * Returns 'true' if the latch was set, or 'false' if timeout was reached.
+ * Returns bit field indicating which condition(s) caused the wake-up.
+ *
+ * Note that there is no guarantee that callers will have all wake-up conditions
+ * returned, but we will report at least one.
  */
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 /*
  * Like WaitLatch, but will also return when there's data available in
- * 'sock' for reading or writing. Returns 0 if timeout was reached,
- * 1 if the latch was set, 2 if the socket became readable or writable.
+ * 'sock' for reading or writing.
+ *
+ * Returns same bit mask and makes same guarantees as WaitLatch.
  */
 int
-WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout)
 {
 	struct timeval tv,
 			   *tvp = NULL;
@@ -211,12 +216,13 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	fd_set		output_mask;
 	int			rc;
 	int			result = 0;
+	bool		found = false;
 
 	if (latch->owner_pid != MyProcPid)
 		elog(ERROR, "cannot wait on a latch owned by another process");
 
 	/* Initialize timeout */
-	if (timeout >= 0)
+	if (timeout >= 0 && (wakeEvents & WL_TIMEOUT))
 	{
 		tv.tv_sec = timeout / 1000000L;
 		tv.tv_usec = timeout % 1000000L;
@@ -224,7 +230,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	}
 
 	waiting = true;
-	for (;;)
+	do
 	{
 		int			hifd;
 
@@ -235,16 +241,30 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		 * do that), and the select() will return immediately.
 		 */
 		drainSelfPipe();
-		if (latch->is_set)
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 *
+			 * Don't attempt to report any other reason
+			 * for returning to callers that may have
+			 * happened to coincide.
+			 */
 			break;
 		}
 
 		FD_ZERO(&input_mask);
 		FD_SET(selfpipe_readfd, &input_mask);
+
+		if (wakeEvents & WL_POSTMASTER_DEATH)
+		{
+			FD_SET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask);
+			if (postmaster_alive_fds[POSTMASTER_FD_WATCH] > hifd)
+				hifd = postmaster_alive_fds[POSTMASTER_FD_WATCH];
+		}
 		hifd = selfpipe_readfd;
-		if (sock != PGINVALID_SOCKET && forRead)
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_READABLE))
 		{
 			FD_SET(sock, &input_mask);
 			if (sock > hifd)
@@ -252,7 +272,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		}
 
 		FD_ZERO(&output_mask);
-		if (sock != PGINVALID_SOCKET && forWrite)
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_WRITEABLE))
 		{
 			FD_SET(sock, &output_mask);
 			if (sock > hifd)
@@ -268,20 +288,35 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 					(errcode_for_socket_access(),
 					 errmsg("select() failed: %m")));
 		}
-		if (rc == 0)
+		if (rc == 0 && (wakeEvents & WL_TIMEOUT))
 		{
 			/* timeout exceeded */
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
-		if (sock != PGINVALID_SOCKET &&
-			((forRead && FD_ISSET(sock, &input_mask)) ||
-			 (forWrite && FD_ISSET(sock, &output_mask))))
+		if (sock != PGINVALID_SOCKET)
 		{
-			result = 2;
-			break;				/* data available in socket */
+			if ((wakeEvents & WL_SOCKET_READABLE ) && FD_ISSET(sock, &input_mask))
+			{
+				result |= WL_SOCKET_READABLE;
+				found = true; /* data available in socket */
+			}
+			if ((wakeEvents & WL_SOCKET_WRITEABLE) && FD_ISSET(sock, &output_mask))
+			{
+				result |= WL_SOCKET_WRITEABLE;
+				found = true;
+			}
+		}
+		if ((wakeEvents & WL_POSTMASTER_DEATH) &&
+			 FD_ISSET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask) &&
+			 !PostmasterIsAlive(true))
+		{
+			result |= WL_POSTMASTER_DEATH;
+			found = true;
 		}
 	}
+	while(!found);
+
 	waiting = false;
 
 	return result;
diff --git a/src/backend/port/win32_latch.c b/src/backend/port/win32_latch.c
index 4bcf7b7..ea03aa2 100644
--- a/src/backend/port/win32_latch.c
+++ b/src/backend/port/win32_latch.c
@@ -23,8 +23,10 @@
 #include <unistd.h>
 
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "replication/walsender.h"
 #include "storage/latch.h"
+#include "storage/pmsignal.h"
 #include "storage/shmem.h"
 
 
@@ -81,43 +83,47 @@ DisownLatch(volatile Latch *latch)
 	latch->owner_pid = 0;
 }
 
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 int
-WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, SOCKET sock, long timeout)
 {
 	DWORD		rc;
-	HANDLE		events[3];
+	HANDLE		events[4];
 	HANDLE		latchevent;
 	HANDLE		sockevent = WSA_INVALID_EVENT;	/* silence compiler */
 	int			numevents;
 	int			result = 0;
+	bool		found = false;
 
 	latchevent = latch->event;
 
 	events[0] = latchevent;
 	events[1] = pgwin32_signal_event;
 	numevents = 2;
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (sock != PGINVALID_SOCKET && ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 	{
 		int			flags = 0;
 
-		if (forRead)
+		if (wakeEvents & WL_SOCKET_READABLE)
 			flags |= FD_READ;
-		if (forWrite)
+		if (wakeEvents & WL_SOCKET_WRITEABLE)
 			flags |= FD_WRITE;
 
 		sockevent = WSACreateEvent();
 		WSAEventSelect(sock, sockevent, flags);
 		events[numevents++] = sockevent;
 	}
+	if (wakeEvents & WL_POSTMASTER_DEATH)
+	{
+		events[numevents++] = PostmasterHandle;
+	}
 
-	for (;;)
+	do
 	{
 		/*
 		 * Reset the event, and check if the latch is set already. If someone
@@ -127,24 +133,39 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 		 */
 		if (!ResetEvent(latchevent))
 			elog(ERROR, "ResetEvent failed: error code %d", (int) GetLastError());
-		if (latch->is_set)
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 *
+			 * Don't attempt to report any other reason
+			 * for returning to callers that may have
+			 * happened to coincide.
+			 */
 			break;
 		}
 
 		rc = WaitForMultipleObjects(numevents, events, FALSE,
 							   (timeout >= 0) ? (timeout / 1000) : INFINITE);
-		if (rc == WAIT_FAILED)
+		if ( (wakeEvents & WL_POSTMASTER_DEATH) &&
+			 !PostmasterIsAlive(true))
+		{
+			/* Postmaster died */
+			result |= WL_POSTMASTER_DEATH;
+			found = true;
+		}
+		else if (rc == WAIT_FAILED)
 			elog(ERROR, "WaitForMultipleObjects() failed: error code %d", (int) GetLastError());
 		else if (rc == WAIT_TIMEOUT)
 		{
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
 		else if (rc == WAIT_OBJECT_0 + 1)
 			pgwin32_dispatch_queued_signals();
-		else if (rc == WAIT_OBJECT_0 + 2)
+		else if (rc == WAIT_OBJECT_0 + 2 &&
+				 ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 		{
 			WSANETWORKEVENTS resEvents;
 
@@ -155,17 +176,24 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 				ereport(FATAL,
 						(errmsg_internal("failed to enumerate network events: %i", (int) GetLastError())));
 
-			if ((forRead && resEvents.lNetworkEvents & FD_READ) ||
-				(forWrite && resEvents.lNetworkEvents & FD_WRITE))
-				result = 2;
-			break;
+			if ((wakeEvents & WL_SOCKET_READABLE) && (resEvents.lNetworkEvents & FD_READ))
+			{
+				result |= WL_SOCKET_READABLE;
+				found = true;
+			}
+			if ((wakeEvents & WL_SOCKET_WRITEABLE) && (resEvents.lNetworkEvents & FD_WRITE))
+			{
+				result |= WL_SOCKET_WRITEABLE;
+				found = true;
+			}
 		}
 		else if (rc != WAIT_OBJECT_0)
 			elog(ERROR, "unexpected return code from WaitForMultipleObjects(): %d", (int) rc);
 	}
+	while(!found);
 
 	/* Clean up the handle we created for the socket */
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (sock != PGINVALID_SOCKET && ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 	{
 		WSAEventSelect(sock, sockevent, 0);
 		WSACloseEvent(sockevent);
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index b2fe9a1..40896d3 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -10,6 +10,7 @@
  *	  src/backend/postmaster/fork_process.c
  */
 #include "postgres.h"
+#include "postmaster/postmaster.h"
 #include "postmaster/fork_process.h"
 
 #include <fcntl.h>
@@ -19,13 +20,14 @@
 #include <unistd.h>
 
 #ifndef WIN32
+
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
  * child in the parent process.
  */
 pid_t
-fork_process(void)
+do_fork_process(bool remain_postmaster)
 {
 	pid_t		result;
 
@@ -61,6 +63,17 @@ fork_process(void)
 #ifdef LINUX_PROFILE
 		setitimer(ITIMER_PROF, &prof_itimer, NULL);
 #endif
+		/*
+		 * Usually, we're forking to create a new, distinct process. That process
+		 * should release the postmaster death watch handle, which is required by
+		 * the implementation, as described in unix_latch.c.
+		 *
+		 * Less frequently, we want to fork for some other reason (such as for
+		 * silent_mode), and the child process is intended to become the new
+		 * postmaster. It should therefore retain the death watch handle.
+		 */
+		if (!remain_postmaster)
+			ReleasePostmasterDeathWatchHandle();
 
 		/*
 		 * By default, Linux tends to kill the postmaster in out-of-memory
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index b40375a..a56fe92 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -40,6 +40,7 @@
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
+#include "storage/latch.h"
 #include "storage/pg_shmem.h"
 #include "storage/pmsignal.h"
 #include "utils/guc.h"
@@ -87,6 +88,12 @@ static volatile sig_atomic_t got_SIGTERM = false;
 static volatile sig_atomic_t wakened = false;
 static volatile sig_atomic_t ready_to_stop = false;
 
+/*
+ * Latch that archiver loop waits on until it is awakened by
+ * signals, each of which there is a handler for
+ */
+static volatile Latch mainloop_latch;
+
 /* ----------
  * Local function forward declarations
  * ----------
@@ -228,6 +235,8 @@ PgArchiverMain(int argc, char *argv[])
 
 	MyProcPid = getpid();		/* reset MyProcPid */
 
+	InitLatch(&mainloop_latch); /* initialise latch used in main loop, now that we are a subprocess */
+
 	MyStartTime = time(NULL);	/* record Start Time for logging */
 
 	/*
@@ -282,6 +291,8 @@ ArchSigHupHandler(SIGNAL_ARGS)
 {
 	/* set flag to re-read config file at next convenient time */
 	got_SIGHUP = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGTERM signal handler for archiver process */
@@ -295,6 +306,8 @@ ArchSigTermHandler(SIGNAL_ARGS)
 	 * archive commands.
 	 */
 	got_SIGTERM = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR1 signal handler for archiver process */
@@ -303,6 +316,8 @@ pgarch_waken(SIGNAL_ARGS)
 {
 	/* set flag that there is work to be done */
 	wakened = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR2 signal handler for archiver process */
@@ -311,6 +326,8 @@ pgarch_waken_stop(SIGNAL_ARGS)
 {
 	/* set flag to do a final cycle and shut down afterwards */
 	ready_to_stop = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /*
@@ -334,6 +351,13 @@ pgarch_MainLoop(void)
 
 	do
 	{
+		/*
+		 * There shouldn't be anything for the archiver to do except to wait
+		 * on a latch ... however, the archiver exists to protect our data,
+		 * so she wakes up occasionally to allow herself to be proactive.
+		 */
+		ResetLatch(&mainloop_latch);
+
 		/* When we get SIGUSR2, we do one more archive cycle, then exit */
 		time_to_stop = ready_to_stop;
 
@@ -371,25 +395,27 @@ pgarch_MainLoop(void)
 		}
 
 		/*
-		 * There shouldn't be anything for the archiver to do except to wait
-		 * for a signal ... however, the archiver exists to protect our data,
-		 * so she wakes up occasionally to allow herself to be proactive.
+		 * Wait on latch, until various signals are received, or
+		 * until a poll will be forced by PGARCH_AUTOWAKE_INTERVAL
+		 * having passed since last_copy_time, or on the postmaster's
+		 * untimely demise.
 		 *
-		 * On some platforms, signals won't interrupt the sleep.  To ensure we
-		 * respond reasonably promptly when someone signals us, break down the
-		 * sleep into 1-second increments, and check for interrupts after each
-		 * nap.
+		 * The caveat about signals resetting the timeout of
+		 * WaitLatch()/select() on some platforms can be safely disregarded,
+		 * because we handle all expected signals, and all handlers
+		 * call SetLatch() where that matters anyway
 		 */
-		while (!(wakened || ready_to_stop || got_SIGHUP ||
-				 !PostmasterIsAlive(true)))
-		{
-			time_t		curtime;
 
-			pg_usleep(1000000L);
+		if (!time_to_stop) /* Don't wait during last iteration */
+		{
+			time_t		 curtime = time(NULL);
+			unsigned int timeout_secs  = (unsigned int) PGARCH_AUTOWAKE_INTERVAL -
+					(unsigned int) (curtime - last_copy_time);
+			WaitLatch(&mainloop_latch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, timeout_secs * 1000000L);
 			curtime = time(NULL);
 			if ((unsigned int) (curtime - last_copy_time) >=
 				(unsigned int) PGARCH_AUTOWAKE_INTERVAL)
-				wakened = true;
+				wakened = true; /* wakened by timeout - this wasn't a SIGHUP, etc */
 		}
 
 		/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6572292..feb5e8a 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -443,6 +443,7 @@ typedef struct
 	HANDLE		syslogPipe[2];
 #else
 	int			syslogPipe[2];
+	int			postmaster_alive_fds[2];
 #endif
 	char		my_exec_path[MAXPGPATH];
 	char		pkglib_path[MAXPGPATH];
@@ -472,6 +473,13 @@ static void ShmemBackendArrayRemove(Backend *bn);
 #define EXIT_STATUS_0(st)  ((st) == 0)
 #define EXIT_STATUS_1(st)  (WIFEXITED(st) && WEXITSTATUS(st) == 1)
 
+/*
+ * Two file descriptors used to monitor whether the postmaster is alive.
+ * First is POSTMASTER_FD_WATCH, second is POSTMASTER_FD_OWN.
+ */
+#ifndef WIN32
+int postmaster_alive_fds[2];
+#endif
 
 /*
  * Postmaster main entry point
@@ -491,6 +499,15 @@ PostmasterMain(int argc, char *argv[])
 
 	IsPostmasterEnvironment = true;
 
+#ifndef WIN32
+	/*
+	 * Initialise mechanism that allows waiting latch clients
+	 * to wake on postmaster death, to finish their
+	 * remaining business
+	 */
+	InitPostmasterDeathWatchHandle();
+#endif
+
 	/*
 	 * for security, no dir or file created can be group or other accessible
 	 */
@@ -1312,7 +1329,7 @@ pmdaemonize(void)
 	/*
 	 * Okay to fork.
 	 */
-	pid = fork_process();
+	pid = fork_process_remain_postmaster();
 	if (pid == (pid_t) -1)
 	{
 		write_stderr("%s: could not fork background process: %s\n",
@@ -4758,6 +4775,9 @@ save_backend_variables(BackendParameters *param, Port *port,
 
 	memcpy(&param->syslogPipe, &syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&param->postmaster_alive_fds, &postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
 	strlcpy(param->my_exec_path, my_exec_path, MAXPGPATH);
 
 	strlcpy(param->pkglib_path, pkglib_path, MAXPGPATH);
@@ -4973,6 +4993,10 @@ restore_backend_variables(BackendParameters *param, Port *port)
 
 	memcpy(&syslogPipe, &param->syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&postmaster_alive_fds, &param->postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
+
 	strlcpy(my_exec_path, param->my_exec_path, MAXPGPATH);
 
 	strlcpy(pkglib_path, param->pkglib_path, MAXPGPATH);
@@ -5088,5 +5112,86 @@ pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired)
 	/* Queue SIGCHLD signal */
 	pg_queue_signal(SIGCHLD);
 }
+#else
+/*
+ * Called once from the postmaster, so that child processes can subsequently
+ * monitor if their parent is dead. We open up an anonymous pipe, and have child
+ * processes block on a select() call that examines if the read file descriptor
+ * is ready for reading. They do so through a latch.
+ *
+ * Child processes are responsible for releasing the death watch handler, so
+ * that only the postmaster holds it, and a select() on the fd returns upon the
+ * one and only holder (the postmaster) dying.
+ *
+ * This is a trick that obviates the need for auxiliary backends to have tight
+ * polling loops where they check if the postmaster is alive. We do this because
+ * that pattern results in an excessive number of wakeups per second when idle.
+ */
+void
+InitPostmasterDeathWatchHandle(void)
+{
+	int flags;
+	/*
+	 * Create pipe. The postmaster is deemed dead if
+	 * no process has the writing end (POSTMASTER_FD_OWN) open.
+	 */
+	Assert(MyProcPid == PostmasterPid);
+	if (pipe(postmaster_alive_fds))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg( "pipe() call failed to create pipe to monitor postmaster death: %s", strerror(errno))));
+	}
+	flags = fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_GETFL);
+	if (flags < 0)
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to set the postmaster death watching fd's flags: %s", strerror(errno))));
+	}
+	/*
+	 * Set FNONBLOCK to allow checking for the fd's presence with a select() call
+	 */
+	flags |= FNONBLOCK;
+	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_SETFL, FNONBLOCK))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to set the postmaster death watching fd's flags: %s", strerror(errno))));
+	}
+}
 
-#endif   /* WIN32 */
+/*
+ * Release postmaster death watch handle.
+ *
+ * Important: This must be called immediately after a process
+ * forks from the postmaster. Otherwise, latch clients will
+ * not wake up on postmaster death, even if they have requested
+ * to.
+ *
+ * Even some hypothetical backend that doesn't care about postmaster
+ * death has a responsibility to call this function - otherwise,
+ * some other latch client backend could wait in vain to be informed
+ * of postmaster death, because the irresponsible backend held open
+ * the ownership file descriptor and outlived the postmaster.
+ *
+ * We call the function within the fork machinery to handle all cases,
+ * so new backends need not bother with this themselves
+ */
+void
+ReleasePostmasterDeathWatchHandle(void)
+{
+	/* MyProcPid won't have been set yet */
+	Assert(PostmasterPid != getpid());
+	/* Please don't ask twice */
+	Assert(postmaster_alive_fds[POSTMASTER_FD_OWN] != -1);
+	/* Release parent's ownership fd - only postmaster should hold it */
+	if (close(postmaster_alive_fds[POSTMASTER_FD_OWN]))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to close file descriptor associated with Postmaster death in child process")));
+	}
+	postmaster_alive_fds[POSTMASTER_FD_OWN] = -1;
+}
+#endif
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 08a4086..646f90b 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -171,7 +171,7 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 		 * postmaster death regularly while waiting. Note that timeout here
 		 * does not necessarily release from loop.
 		 */
-		WaitLatch(&MyProc->waitLatch, 60000000L);
+		WaitLatch(&MyProc->waitLatch, WL_LATCH_SET | WL_TIMEOUT, 60000000L);
 
 		/* Must reset the latch before testing state. */
 		ResetLatch(&MyProc->waitLatch);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 470e6d1..27cc350 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -805,8 +805,9 @@ WalSndLoop(void)
 			}
 
 			/* Sleep */
-			WaitLatchOrSocket(&MyWalSnd->latch, MyProcPort->sock,
-							  true, pq_is_send_pending(),
+			WaitLatchOrSocket(&MyWalSnd->latch,
+							  WL_LATCH_SET | WL_SOCKET_READABLE | (pq_is_send_pending()? WL_SOCKET_WRITEABLE:0) |  WL_TIMEOUT,
+							  MyProcPort->sock,
 							  sleeptime * 1000L);
 
 			/* Check for replication timeout */
diff --git a/src/include/postmaster/fork_process.h b/src/include/postmaster/fork_process.h
index 0553fd2..e0abe5d 100644
--- a/src/include/postmaster/fork_process.h
+++ b/src/include/postmaster/fork_process.h
@@ -12,6 +12,8 @@
 #ifndef FORK_PROCESS_H
 #define FORK_PROCESS_H
 
-extern pid_t fork_process(void);
+extern pid_t do_fork_process(bool remain_postmaster);
+#define fork_process() do_fork_process(false)
+#define fork_process_remain_postmaster() do_fork_process(true)
 
 #endif   /* FORK_PROCESS_H */
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index 25cc84a..497cf51 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -33,6 +33,25 @@ extern bool restart_after_crash;
 
 #ifdef WIN32
 extern HANDLE PostmasterHandle;
+#else
+/*
+ * Constants that represent which of a pair of fds given
+ * to pipe() is watched and owned in the context of
+ * dealing with postmaster death
+ */
+#define POSTMASTER_FD_WATCH 0
+#define POSTMASTER_FD_OWN 1
+extern int postmaster_alive_fds[2];
+/*
+ * On unix, it is necessary to Init monitoring
+ * of postmaster being alive
+ */
+extern void InitPostmasterDeathWatchHandle(void);
+/*
+ * It is also necessary to call ReleasePostmasterDeathWatchHandle()
+ * after forking from PM for the Unix implementation
+ */
+extern void ReleasePostmasterDeathWatchHandle(void);
 #endif
 
 extern const char *progname;
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 03ec071..6865ac7 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -38,9 +38,8 @@ extern void InitLatch(volatile Latch *latch);
 extern void InitSharedLatch(volatile Latch *latch);
 extern void OwnLatch(volatile Latch *latch);
 extern void DisownLatch(volatile Latch *latch);
-extern bool WaitLatch(volatile Latch *latch, long timeout);
-extern int WaitLatchOrSocket(volatile Latch *latch, pgsocket sock,
-				  bool forRead, bool forWrite, long timeout);
+extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout);
+extern int WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout);
 extern void SetLatch(volatile Latch *latch);
 extern void ResetLatch(volatile Latch *latch);
 
@@ -56,4 +55,11 @@ extern void latch_sigusr1_handler(void);
 #define latch_sigusr1_handler()
 #endif
 
+/* Bitmasks for events that may wake-up WaitLatch() clients */
+#define WL_LATCH_SET         (1 << 0)
+#define WL_SOCKET_READABLE   (1 << 1)
+#define WL_SOCKET_WRITEABLE  (1 << 2)
+#define WL_TIMEOUT           (1 << 3)
+#define WL_POSTMASTER_DEATH  (1 << 4)
+
 #endif   /* LATCH_H */
#16Fujii Masao
masao.fujii@gmail.com
In reply to: Peter Geoghegan (#15)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On Mon, Jun 20, 2011 at 1:00 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

I took another look at this this evening, and realised that my
comments could be a little clearer.

Attached revision cleans them up a bit.

Since I'm not familiar with Windows, I haven't read the Windows-related
code. But the following are my comments on your patch.

+		if (wakeEvents & WL_POSTMASTER_DEATH)
+		{
+			FD_SET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask);
+			if (postmaster_alive_fds[POSTMASTER_FD_WATCH] > hifd)
+				hifd = postmaster_alive_fds[POSTMASTER_FD_WATCH];
+		}
 		hifd = selfpipe_readfd;

'hifd' should be initialized to 'selfpipe_readfd' before the above
'if' block. Otherwise,
'hifd = postmaster_alive_fds[POSTMASTER_FD_WATCH]' might have no effect.
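
To illustrate the ordering issue, here is a minimal, self-contained sketch (not the patch's code) of building a read set for select() while tracking the highest descriptor. The helper name build_read_set and its parameters are hypothetical; the only point is that 'hifd' has to be seeded before anything is compared against it.

```c
#include <sys/select.h>

/*
 * Hypothetical helper, for illustration only: add base_fd and, optionally,
 * extra_fd to *set, and return the highest fd added (select() takes that
 * value plus one as its first argument).
 */
static int
build_read_set(fd_set *set, int base_fd, int extra_fd, int want_extra)
{
    int hifd;

    FD_ZERO(set);
    FD_SET(base_fd, set);
    hifd = base_fd;     /* seed first, so later comparisons are meaningful */

    if (want_extra)
    {
        FD_SET(extra_fd, set);
        if (extra_fd > hifd)
            hifd = extra_fd;
    }
    return hifd;
}
```

In unix_latch.c terms, base_fd would correspond to selfpipe_readfd and extra_fd to postmaster_alive_fds[POSTMASTER_FD_WATCH].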

+			time_t		 curtime = time(NULL);
+			unsigned int timeout_secs  = (unsigned int) PGARCH_AUTOWAKE_INTERVAL -
+					(unsigned int) (curtime - last_copy_time);
+			WaitLatch(&mainloop_latch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, timeout_secs * 1000000L);

Why does the archiver still need to wake up periodically?

+	flags |= FNONBLOCK;
+	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_SETFL, FNONBLOCK))

Is the variable 'flags' really required? It's not used by fcntl() to
set the fd nonblocking.

Is FNONBLOCK equal to O_NONBLOCK? If so, shouldn't we use O_NONBLOCK
for the sake of consistency? Other code (e.g., noblock.c) uses O_NONBLOCK
rather than FNONBLOCK.

+			WaitLatchOrSocket(&MyWalSnd->latch,
+							  WL_LATCH_SET | WL_SOCKET_READABLE | (pq_is_send_pending()? WL_SOCKET_WRITEABLE:0) |  WL_TIMEOUT,
+							  MyProcPort->sock,

I think it would be worthwhile for walsender to check for the postmaster death event. No?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#17Peter Geoghegan
peter@2ndquadrant.com
In reply to: Fujii Masao (#16)
1 attachment(s)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

Thanks for giving this your attention, Fujii. The attached patch addresses
your concerns.

On 20 June 2011 05:53, Fujii Masao <masao.fujii@gmail.com> wrote:

'hifd' should be initialized to 'selfpipe_readfd' before the above
'if' block. Otherwise,
'hifd = postmaster_alive_fds[POSTMASTER_FD_WATCH]' might have no effect.

That's an oversight that I should have caught. Fixed.

Why does the archiver still need to wake up periodically?

That is consistent with its earlier behaviour..."she wakes up
occasionally to allow herself to be proactive". This comment does not
refer to the frequent updates that currently occur within the tight
polling loop. I think any concern about that would apply equally to
the original, unpatched code.
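
As an aside on the timeout handed to WaitLatch(): the patch computes timeout_secs by unsigned subtraction, which would wrap around to a very large value if more than PGARCH_AUTOWAKE_INTERVAL seconds had already elapsed since last_copy_time. Whether pgarch_MainLoop() can actually reach that state is not established here. A minimal sketch of a clamped calculation, where remaining_wait_secs is a hypothetical helper and not part of the patch:

```c
#include <time.h>

/*
 * Hypothetical helper, for illustration only: seconds left until
 * 'interval' seconds have passed since 'last', clamped so the result
 * never wraps around once the interval has already expired.
 */
static unsigned int
remaining_wait_secs(time_t now, time_t last, unsigned int interval)
{
    double elapsed = difftime(now, last);

    if (elapsed < 0)
        return interval;    /* clock went backwards; wait a full interval */
    if (elapsed >= (double) interval)
        return 0;           /* already due; do not sleep at all */
    return interval - (unsigned int) elapsed;
}
```

The result would then be multiplied by 1000000L, as in the patch, to obtain the microsecond timeout.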

Is the variable 'flags' really required? It's not used by fcntl() to
set the fd nonblocking.

Yes, it's superfluous. Removed.

Is FNONBLOCK equal to O_NONBLOCK? If so, shouldn't we use O_NONBLOCK
for the sake of consistency? Other code (e.g., noblock.c) uses O_NONBLOCK
rather than FNONBLOCK.

FNONBLOCK is just an alias for O_NONBLOCK, so it seems reasonable to
be consistent in which variant we use. I have found suggestions that
FNONBLOCK might break the build on OS X; if that's true, there's an
excellent reason to prefer O_NONBLOCK.
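
For reference, the general idiom for setting O_NONBLOCK with fcntl() reads the current status flags and ORs the new one in. The sketch below shows that read-modify-write form; set_nonblocking is a hypothetical helper, and this is not necessarily what the attached patch does, since the patch no longer keeps a 'flags' variable.

```c
#include <fcntl.h>

/*
 * Hypothetical helper, for illustration only: set O_NONBLOCK on fd while
 * preserving whatever status flags are already set.  Returns 0 on success,
 * -1 on failure (errno is left as set by fcntl()).
 */
static int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    return 0;
}
```

On a freshly created pipe the difference from passing O_NONBLOCK alone is small, but the read-modify-write form avoids silently clearing flags such as O_APPEND on descriptors that have them.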

I think it would be worthwhile for walsender to check for the postmaster death event. No?

It does check it, but only in the same way that it always has (a tight
polling loop). I would like to make walsender use the new
functionality. That is another patch, though, which I thought best to
have reviewed independently, and only once this patch is committed. For
now, I've only made walsender use the new interface, changing as little
as possible and not affecting walsender's behaviour, as a stopgap towards
that patch.
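
For context, a caller of the new interface tests the returned bit field against the WL_* flags. In the sketch below the flag values are copied from the patch's latch.h; wake_reason is a hypothetical helper, and the priority order it applies (postmaster death first) is an assumption rather than anything the patch specifies.

```c
/* Flag values as defined in the patch's src/include/storage/latch.h */
#define WL_LATCH_SET         (1 << 0)
#define WL_SOCKET_READABLE   (1 << 1)
#define WL_SOCKET_WRITEABLE  (1 << 2)
#define WL_TIMEOUT           (1 << 3)
#define WL_POSTMASTER_DEATH  (1 << 4)

/*
 * Hypothetical helper, for illustration only: name the most urgent event
 * found in a WaitLatch()-style result.  Checking postmaster death before
 * everything else is an assumed policy.
 */
static const char *
wake_reason(int result)
{
    if (result & WL_POSTMASTER_DEATH)
        return "postmaster death";
    if (result & WL_LATCH_SET)
        return "latch set";
    if (result & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE))
        return "socket ready";
    if (result & WL_TIMEOUT)
        return "timeout";
    return "none";
}
```

A walsender converted to the new functionality would presumably add WL_POSTMASTER_DEATH to its wakeEvents and test that bit in the value returned by WaitLatchOrSocket().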

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

new_latch.v5.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index aa0b029..691ac42 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10161,7 +10161,7 @@ retry:
 					/*
 					 * Wait for more WAL to arrive, or timeout to be reached
 					 */
-					WaitLatch(&XLogCtl->recoveryWakeupLatch, 5000000L);
+					WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 5000000L);
 					ResetLatch(&XLogCtl->recoveryWakeupLatch);
 				}
 				else
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 6dae7c9..f97d838 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -93,7 +93,9 @@
 #endif
 
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "storage/latch.h"
+#include "storage/pmsignal.h"
 #include "storage/shmem.h"
 
 /* Are we currently in WaitLatch? The signal handler would like to know. */
@@ -188,22 +190,25 @@ DisownLatch(volatile Latch *latch)
  * backend-local latch initialized with InitLatch, or a shared latch
  * associated with the current process by calling OwnLatch.
  *
- * Returns 'true' if the latch was set, or 'false' if timeout was reached.
+ * Returns bit field indicating which condition(s) caused the wake-up.
+ *
+ * Note that there is no guarantee that callers will have all wake-up conditions
+ * returned, but we will report at least one.
  */
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 /*
  * Like WaitLatch, but will also return when there's data available in
- * 'sock' for reading or writing. Returns 0 if timeout was reached,
- * 1 if the latch was set, 2 if the socket became readable or writable.
+ * 'sock' for reading or writing.
+ *
+ * Returns same bit mask and makes same guarantees as WaitLatch.
  */
 int
-WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout)
 {
 	struct timeval tv,
 			   *tvp = NULL;
@@ -211,12 +216,13 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	fd_set		output_mask;
 	int			rc;
 	int			result = 0;
+	bool		found = false;
 
 	if (latch->owner_pid != MyProcPid)
 		elog(ERROR, "cannot wait on a latch owned by another process");
 
 	/* Initialize timeout */
-	if (timeout >= 0)
+	if (timeout >= 0 && (wakeEvents & WL_TIMEOUT))
 	{
 		tv.tv_sec = timeout / 1000000L;
 		tv.tv_usec = timeout % 1000000L;
@@ -224,7 +230,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	}
 
 	waiting = true;
-	for (;;)
+	do
 	{
 		int			hifd;
 
@@ -235,16 +241,31 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		 * do that), and the select() will return immediately.
 		 */
 		drainSelfPipe();
-		if (latch->is_set)
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 *
+			 * Don't attempt to report any other reason
+			 * for returning to callers that may have
+			 * happened to coincide.
+			 */
 			break;
 		}
 
 		FD_ZERO(&input_mask);
 		FD_SET(selfpipe_readfd, &input_mask);
 		hifd = selfpipe_readfd;
-		if (sock != PGINVALID_SOCKET && forRead)
+
+		if (wakeEvents & WL_POSTMASTER_DEATH)
+		{
+			FD_SET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask);
+			if (postmaster_alive_fds[POSTMASTER_FD_WATCH] > hifd)
+				hifd = postmaster_alive_fds[POSTMASTER_FD_WATCH];
+		}
+
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_READABLE))
 		{
 			FD_SET(sock, &input_mask);
 			if (sock > hifd)
@@ -252,7 +273,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		}
 
 		FD_ZERO(&output_mask);
-		if (sock != PGINVALID_SOCKET && forWrite)
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_WRITEABLE))
 		{
 			FD_SET(sock, &output_mask);
 			if (sock > hifd)
@@ -268,20 +289,34 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 					(errcode_for_socket_access(),
 					 errmsg("select() failed: %m")));
 		}
-		if (rc == 0)
+		if (rc == 0 && (wakeEvents & WL_TIMEOUT))
 		{
 			/* timeout exceeded */
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
-		if (sock != PGINVALID_SOCKET &&
-			((forRead && FD_ISSET(sock, &input_mask)) ||
-			 (forWrite && FD_ISSET(sock, &output_mask))))
+		if (sock != PGINVALID_SOCKET)
 		{
-			result = 2;
-			break;				/* data available in socket */
+			if ((wakeEvents & WL_SOCKET_READABLE ) && FD_ISSET(sock, &input_mask))
+			{
+				result |= WL_SOCKET_READABLE;
+				found = true; /* data available in socket */
+			}
+			if ((wakeEvents & WL_SOCKET_WRITEABLE) && FD_ISSET(sock, &output_mask))
+			{
+				result |= WL_SOCKET_WRITEABLE;
+				found = true;
+			}
+		}
+		if ((wakeEvents & WL_POSTMASTER_DEATH) &&
+			 FD_ISSET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask) &&
+			 !PostmasterIsAlive(true))
+		{
+			result |= WL_POSTMASTER_DEATH;
+			found = true;
 		}
 	}
+	while(!found);
 	waiting = false;
 
 	return result;
diff --git a/src/backend/port/win32_latch.c b/src/backend/port/win32_latch.c
index 4bcf7b7..ea03aa2 100644
--- a/src/backend/port/win32_latch.c
+++ b/src/backend/port/win32_latch.c
@@ -23,8 +23,10 @@
 #include <unistd.h>
 
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "replication/walsender.h"
 #include "storage/latch.h"
+#include "storage/pmsignal.h"
 #include "storage/shmem.h"
 
 
@@ -81,43 +83,47 @@ DisownLatch(volatile Latch *latch)
 	latch->owner_pid = 0;
 }
 
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 int
-WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, SOCKET sock, long timeout)
 {
 	DWORD		rc;
-	HANDLE		events[3];
+	HANDLE		events[4];
 	HANDLE		latchevent;
 	HANDLE		sockevent = WSA_INVALID_EVENT;	/* silence compiler */
 	int			numevents;
 	int			result = 0;
+	bool		found = false;
 
 	latchevent = latch->event;
 
 	events[0] = latchevent;
 	events[1] = pgwin32_signal_event;
 	numevents = 2;
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (sock != PGINVALID_SOCKET && ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 	{
 		int			flags = 0;
 
-		if (forRead)
+		if (wakeEvents & WL_SOCKET_READABLE)
 			flags |= FD_READ;
-		if (forWrite)
+		if (wakeEvents & WL_SOCKET_WRITEABLE)
 			flags |= FD_WRITE;
 
 		sockevent = WSACreateEvent();
 		WSAEventSelect(sock, sockevent, flags);
 		events[numevents++] = sockevent;
 	}
+	if (wakeEvents & WL_POSTMASTER_DEATH)
+	{
+		events[numevents++] = PostmasterHandle;
+	}
 
-	for (;;)
+	do
 	{
 		/*
 		 * Reset the event, and check if the latch is set already. If someone
@@ -127,24 +133,39 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 		 */
 		if (!ResetEvent(latchevent))
 			elog(ERROR, "ResetEvent failed: error code %d", (int) GetLastError());
-		if (latch->is_set)
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 *
+			 * Don't attempt to report any other reason
+			 * for returning to callers that may have
+			 * happened to coincide.
+			 */
 			break;
 		}
 
 		rc = WaitForMultipleObjects(numevents, events, FALSE,
 							   (timeout >= 0) ? (timeout / 1000) : INFINITE);
-		if (rc == WAIT_FAILED)
+		if ( (wakeEvents & WL_POSTMASTER_DEATH) &&
+			 !PostmasterIsAlive(true))
+		{
+			/* Postmaster died */
+			result |= WL_POSTMASTER_DEATH;
+			found = true;
+		}
+		else if (rc == WAIT_FAILED)
 			elog(ERROR, "WaitForMultipleObjects() failed: error code %d", (int) GetLastError());
 		else if (rc == WAIT_TIMEOUT)
 		{
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
 		else if (rc == WAIT_OBJECT_0 + 1)
 			pgwin32_dispatch_queued_signals();
-		else if (rc == WAIT_OBJECT_0 + 2)
+		else if (rc == WAIT_OBJECT_0 + 2 &&
+				 ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 		{
 			WSANETWORKEVENTS resEvents;
 
@@ -155,17 +176,24 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 				ereport(FATAL,
 						(errmsg_internal("failed to enumerate network events: %i", (int) GetLastError())));
 
-			if ((forRead && resEvents.lNetworkEvents & FD_READ) ||
-				(forWrite && resEvents.lNetworkEvents & FD_WRITE))
-				result = 2;
-			break;
+			if ((wakeEvents & WL_SOCKET_READABLE) && (resEvents.lNetworkEvents & FD_READ))
+			{
+				result |= WL_SOCKET_READABLE;
+				found = true;
+			}
+			if ((wakeEvents & WL_SOCKET_WRITEABLE) && (resEvents.lNetworkEvents & FD_WRITE))
+			{
+				result |= WL_SOCKET_WRITEABLE;
+				found = true;
+			}
 		}
 		else if (rc != WAIT_OBJECT_0)
 			elog(ERROR, "unexpected return code from WaitForMultipleObjects(): %d", (int) rc);
 	}
+	while(!found);
 
 	/* Clean up the handle we created for the socket */
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (sock != PGINVALID_SOCKET && ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 	{
 		WSAEventSelect(sock, sockevent, 0);
 		WSACloseEvent(sockevent);
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index b2fe9a1..db9401a 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -11,6 +11,8 @@
  */
 #include "postgres.h"
 #include "postmaster/fork_process.h"
+#include "postmaster/postmaster.h"
+
 
 #include <fcntl.h>
 #include <time.h>
@@ -19,13 +21,14 @@
 #include <unistd.h>
 
 #ifndef WIN32
+
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
  * child in the parent process.
  */
 pid_t
-fork_process(void)
+do_fork_process(bool remain_postmaster)
 {
 	pid_t		result;
 
@@ -61,6 +64,17 @@ fork_process(void)
 #ifdef LINUX_PROFILE
 		setitimer(ITIMER_PROF, &prof_itimer, NULL);
 #endif
+		/*
+		 * Usually, we're forking to create a new, distinct process. That process
+		 * should release the postmaster death watch handle, which is required by
+		 * the implementation, as described in unix_latch.c.
+		 *
+		 * Less frequently, we want to fork for some other reason (such as for
+		 * silent_mode), and the child process is intended to become the new
+		 * postmaster. It should therefore retain the death watch handle.
+		 */
+		if (!remain_postmaster)
+			ReleasePostmasterDeathWatchHandle();
 
 		/*
 		 * By default, Linux tends to kill the postmaster in out-of-memory
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index b40375a..a56fe92 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -40,6 +40,7 @@
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
+#include "storage/latch.h"
 #include "storage/pg_shmem.h"
 #include "storage/pmsignal.h"
 #include "utils/guc.h"
@@ -87,6 +88,12 @@ static volatile sig_atomic_t got_SIGTERM = false;
 static volatile sig_atomic_t wakened = false;
 static volatile sig_atomic_t ready_to_stop = false;
 
+/*
+ * Latch that archiver loop waits on until it is awakened by
+ * signals, each of which there is a handler for
+ */
+static volatile Latch mainloop_latch;
+
 /* ----------
  * Local function forward declarations
  * ----------
@@ -228,6 +235,8 @@ PgArchiverMain(int argc, char *argv[])
 
 	MyProcPid = getpid();		/* reset MyProcPid */
 
+	InitLatch(&mainloop_latch); /* initialise latch used in main loop, now that we are a subprocess */
+
 	MyStartTime = time(NULL);	/* record Start Time for logging */
 
 	/*
@@ -282,6 +291,8 @@ ArchSigHupHandler(SIGNAL_ARGS)
 {
 	/* set flag to re-read config file at next convenient time */
 	got_SIGHUP = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGTERM signal handler for archiver process */
@@ -295,6 +306,8 @@ ArchSigTermHandler(SIGNAL_ARGS)
 	 * archive commands.
 	 */
 	got_SIGTERM = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR1 signal handler for archiver process */
@@ -303,6 +316,8 @@ pgarch_waken(SIGNAL_ARGS)
 {
 	/* set flag that there is work to be done */
 	wakened = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR2 signal handler for archiver process */
@@ -311,6 +326,8 @@ pgarch_waken_stop(SIGNAL_ARGS)
 {
 	/* set flag to do a final cycle and shut down afterwards */
 	ready_to_stop = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /*
@@ -334,6 +351,13 @@ pgarch_MainLoop(void)
 
 	do
 	{
+		/*
+		 * There shouldn't be anything for the archiver to do except to wait
+		 * on a latch ... however, the archiver exists to protect our data,
+		 * so she wakes up occasionally to allow herself to be proactive.
+		 */
+		ResetLatch(&mainloop_latch);
+
 		/* When we get SIGUSR2, we do one more archive cycle, then exit */
 		time_to_stop = ready_to_stop;
 
@@ -371,25 +395,27 @@ pgarch_MainLoop(void)
 		}
 
 		/*
-		 * There shouldn't be anything for the archiver to do except to wait
-		 * for a signal ... however, the archiver exists to protect our data,
-		 * so she wakes up occasionally to allow herself to be proactive.
+		 * Wait on latch, until various signals are received, or
+		 * until a poll will be forced by PGARCH_AUTOWAKE_INTERVAL
+		 * having passed since last_copy_time, or on the postmaster's
+		 * untimely demise.
 		 *
-		 * On some platforms, signals won't interrupt the sleep.  To ensure we
-		 * respond reasonably promptly when someone signals us, break down the
-		 * sleep into 1-second increments, and check for interrupts after each
-		 * nap.
+		 * The caveat about signals resetting the timeout of
+		 * WaitLatch()/select() on some platforms can be safely disregarded,
+		 * because we handle all expected signals, and all handlers
+		 * call SetLatch() where that matters anyway
 		 */
-		while (!(wakened || ready_to_stop || got_SIGHUP ||
-				 !PostmasterIsAlive(true)))
-		{
-			time_t		curtime;
 
-			pg_usleep(1000000L);
+		if (!time_to_stop) /* Don't wait during last iteration */
+		{
+			time_t		 curtime = time(NULL);
+			unsigned int timeout_secs  = (unsigned int) PGARCH_AUTOWAKE_INTERVAL -
+					(unsigned int) (curtime - last_copy_time);
+			WaitLatch(&mainloop_latch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, timeout_secs * 1000000L);
 			curtime = time(NULL);
 			if ((unsigned int) (curtime - last_copy_time) >=
 				(unsigned int) PGARCH_AUTOWAKE_INTERVAL)
-				wakened = true;
+				wakened = true; /* wakened by timeout - this wasn't a SIGHUP, etc */
 		}
 
 		/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6572292..b621636 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -443,6 +443,7 @@ typedef struct
 	HANDLE		syslogPipe[2];
 #else
 	int			syslogPipe[2];
+	int			postmaster_alive_fds[2];
 #endif
 	char		my_exec_path[MAXPGPATH];
 	char		pkglib_path[MAXPGPATH];
@@ -472,6 +473,13 @@ static void ShmemBackendArrayRemove(Backend *bn);
 #define EXIT_STATUS_0(st)  ((st) == 0)
 #define EXIT_STATUS_1(st)  (WIFEXITED(st) && WEXITSTATUS(st) == 1)
 
+/*
+ * Two file descriptors for monitoring whether the postmaster is alive.
+ * First is POSTMASTER_FD_WATCH, second is POSTMASTER_FD_OWN.
+ */
+#ifndef WIN32
+int postmaster_alive_fds[2];
+#endif
 
 /*
  * Postmaster main entry point
@@ -491,6 +499,15 @@ PostmasterMain(int argc, char *argv[])
 
 	IsPostmasterEnvironment = true;
 
+#ifndef WIN32
+	/*
+	 * Initialise mechanism that allows waiting latch clients
+	 * to wake on postmaster death, to finish their
+	 * remaining business
+	 */
+	InitPostmasterDeathWatchHandle();
+#endif
+
 	/*
 	 * for security, no dir or file created can be group or other accessible
 	 */
@@ -1312,7 +1329,7 @@ pmdaemonize(void)
 	/*
 	 * Okay to fork.
 	 */
-	pid = fork_process();
+	pid = fork_process_remain_postmaster();
 	if (pid == (pid_t) -1)
 	{
 		write_stderr("%s: could not fork background process: %s\n",
@@ -4758,6 +4775,9 @@ save_backend_variables(BackendParameters *param, Port *port,
 
 	memcpy(&param->syslogPipe, &syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&param->postmaster_alive_fds, &postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
 	strlcpy(param->my_exec_path, my_exec_path, MAXPGPATH);
 
 	strlcpy(param->pkglib_path, pkglib_path, MAXPGPATH);
@@ -4973,6 +4993,10 @@ restore_backend_variables(BackendParameters *param, Port *port)
 
 	memcpy(&syslogPipe, &param->syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&postmaster_alive_fds, &param->postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
+
 	strlcpy(my_exec_path, param->my_exec_path, MAXPGPATH);
 
 	strlcpy(pkglib_path, param->pkglib_path, MAXPGPATH);
@@ -5088,5 +5112,83 @@ pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired)
 	/* Queue SIGCHLD signal */
 	pg_queue_signal(SIGCHLD);
 }
+#else
+/*
+ * Called once from the postmaster, so that child processes can subsequently
+ * monitor if their parent is dead. We open up an anonymous pipe, and have child
+ * processes block on a select() call that examines if the read file descriptor
+ * is ready for reading. They do so through a latch.
+ *
+ * Child processes are responsible for releasing the death watch handler, so
+ * that only the postmaster holds it, and a select() on the fd returns upon the
+ * one and only holder (the postmaster) dying.
+ *
+ * This is a trick that obviates the need for auxiliary backends to have tight
+ * polling loops where they check if the postmaster is alive. We do this because
+ * that pattern results in an excessive number of wakeups per second when idle.
+ */
+void
+InitPostmasterDeathWatchHandle(void)
+{
+	/*
+	 * Create pipe. The postmaster is deemed dead if
+	 * no process has the writing end (POSTMASTER_FD_OWN) open.
+	 */
+	Assert(MyProcPid == PostmasterPid);
+	if (pipe(postmaster_alive_fds))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg( "pipe() call failed to create pipe to monitor postmaster death: %s", strerror(errno))));
+	}
+	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_GETFL) < 0)
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to set the postmaster death watching fd's flags: %s", strerror(errno))));
+	}
+	/*
+	 * Set O_NONBLOCK to allow checking for the fd's presence with a select() call
+	 */
+	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_SETFL, O_NONBLOCK))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to set the postmaster death watching fd's flags: %s", strerror(errno))));
+	}
+}
 
-#endif   /* WIN32 */
+/*
+ * Release postmaster death watch handle.
+ *
+ * Important: This must be called immediately after a process
+ * forks from the postmaster. Otherwise, latch clients will
+ * not wake up on postmaster death, even if they have requested
+ * to.
+ *
+ * Even some hypothetical backend that doesn't care about postmaster
+ * death has a responsibility to call this function - otherwise,
+ * some other latch client backend could wait in vain to be informed
+ * of postmaster death, because the irresponsible backend held open
+ * the ownership file descriptor and outlived the postmaster.
+ *
+ * We call the function within the fork machinery to handle all cases,
+ * so new backends need not bother with this themselves
+ */
+void
+ReleasePostmasterDeathWatchHandle(void)
+{
+	/* MyProcPid won't have been set yet */
+	Assert(PostmasterPid != getpid());
+	/* Please don't ask twice */
+	Assert(postmaster_alive_fds[POSTMASTER_FD_OWN] != -1);
+	/* Release parent's ownership fd - only postmaster should hold it */
+	if (close(postmaster_alive_fds[POSTMASTER_FD_OWN]))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to close file descriptor associated with postmaster death in child process")));
+	}
+	postmaster_alive_fds[POSTMASTER_FD_OWN] = -1;
+}
+#endif
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 08a4086..646f90b 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -171,7 +171,7 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 		 * postmaster death regularly while waiting. Note that timeout here
 		 * does not necessarily release from loop.
 		 */
-		WaitLatch(&MyProc->waitLatch, 60000000L);
+		WaitLatch(&MyProc->waitLatch, WL_LATCH_SET | WL_TIMEOUT, 60000000L);
 
 		/* Must reset the latch before testing state. */
 		ResetLatch(&MyProc->waitLatch);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 470e6d1..27cc350 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -805,8 +805,9 @@ WalSndLoop(void)
 			}
 
 			/* Sleep */
-			WaitLatchOrSocket(&MyWalSnd->latch, MyProcPort->sock,
-							  true, pq_is_send_pending(),
+			WaitLatchOrSocket(&MyWalSnd->latch,
+							  WL_LATCH_SET | WL_SOCKET_READABLE | (pq_is_send_pending()? WL_SOCKET_WRITEABLE:0) |  WL_TIMEOUT,
+							  MyProcPort->sock,
 							  sleeptime * 1000L);
 
 			/* Check for replication timeout */
diff --git a/src/include/postmaster/fork_process.h b/src/include/postmaster/fork_process.h
index 0553fd2..e0abe5d 100644
--- a/src/include/postmaster/fork_process.h
+++ b/src/include/postmaster/fork_process.h
@@ -12,6 +12,8 @@
 #ifndef FORK_PROCESS_H
 #define FORK_PROCESS_H
 
-extern pid_t fork_process(void);
+extern pid_t do_fork_process(bool remain_postmaster);
+#define fork_process() do_fork_process(false)
+#define fork_process_remain_postmaster() do_fork_process(true)
 
 #endif   /* FORK_PROCESS_H */
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index 25cc84a..497cf51 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -33,6 +33,25 @@ extern bool restart_after_crash;
 
 #ifdef WIN32
 extern HANDLE PostmasterHandle;
+#else
+/*
+ * Constants that represent which of a pair of fds given
+ * to pipe() is watched and owned in the context of
+ * dealing with postmaster death
+ */
+#define POSTMASTER_FD_WATCH 0
+#define POSTMASTER_FD_OWN 1
+extern int postmaster_alive_fds[2];
+/*
+ * On unix, it is necessary to Init monitoring
+ * of postmaster being alive
+ */
+extern void InitPostmasterDeathWatchHandle(void);
+/*
+ * It is also necessary to call ReleasePostmasterDeathWatchHandle()
+ * after forking from PM for the Unix implementation
+ */
+extern void ReleasePostmasterDeathWatchHandle(void);
 #endif
 
 extern const char *progname;
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 03ec071..6865ac7 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -38,9 +38,8 @@ extern void InitLatch(volatile Latch *latch);
 extern void InitSharedLatch(volatile Latch *latch);
 extern void OwnLatch(volatile Latch *latch);
 extern void DisownLatch(volatile Latch *latch);
-extern bool WaitLatch(volatile Latch *latch, long timeout);
-extern int WaitLatchOrSocket(volatile Latch *latch, pgsocket sock,
-				  bool forRead, bool forWrite, long timeout);
+extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout);
+extern int WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout);
 extern void SetLatch(volatile Latch *latch);
 extern void ResetLatch(volatile Latch *latch);
 
@@ -56,4 +55,11 @@ extern void latch_sigusr1_handler(void);
 #define latch_sigusr1_handler()
 #endif
 
+/* Bitmasks for events that may wake-up WaitLatch() clients */
+#define WL_LATCH_SET         (1 << 0)
+#define WL_SOCKET_READABLE   (1 << 1)
+#define WL_SOCKET_WRITEABLE  (1 << 2)
+#define WL_TIMEOUT           (1 << 3)
+#define WL_POSTMASTER_DEATH  (1 << 4)
+
 #endif   /* LATCH_H */
#18Fujii Masao
masao.fujii@gmail.com
In reply to: Peter Geoghegan (#17)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On Tue, Jun 21, 2011 at 6:22 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

Thanks for giving this your attention Fujii. Attached patch addresses
your concerns.

Thanks for updating the patch! I have a few comments;

+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket
sock, long timeout)

If 'wakeEvents' is zero, we cannot get out of WaitLatch(). Is something like
Assert(wakeEvents != 0) required? Or should WaitLatch() always wait
on the latch even when 'wakeEvents' is zero?

In unix_latch.c, select() in WaitLatchOrSocket() checks the timeout only when
WL_TIMEOUT is set in 'wakeEvents'. OTOH, in win32_latch.c, WaitForMultipleObjects()
in WaitLatchOrSocket() always checks the timeout even if WL_TIMEOUT is not
given. Is this intentional?

+		else if (rc == WAIT_OBJECT_0 + 2 &&
+				 ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))

Another corner case: when WL_POSTMASTER_DEATH and WL_SOCKET_READABLE
are given and 'sock' is set to PGINVALID_SOCKET, we can wrongly pass through the
above check. Is this OK?

		rc = WaitForMultipleObjects(numevents, events, FALSE,
 							   (timeout >= 0) ? (timeout / 1000) : INFINITE);
-		if (rc == WAIT_FAILED)
+		if ( (wakeEvents & WL_POSTMASTER_DEATH) &&
+			 !PostmasterIsAlive(true))

After WaitForMultipleObjects() detects the death of the postmaster,
WaitForSingleObject() is called in PostmasterIsAlive(). In this case, what code
does WaitForSingleObject() return? I wonder whether WaitForSingleObject() returns
a code other than WAIT_TIMEOUT and can really detect the death of the postmaster.

+	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_GETFL) < 0)
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to set the postmaster death watching fd's flags: %s", strerror(errno))));
+	}

Is the above check really required? It's harmless, but looks unnecessary.

+			 errmsg( "pipe() call failed to create pipe to monitor postmaster death: %s", strerror(errno))));
+			 errmsg("failed to set the postmaster death watching fd's flags: %s", strerror(errno))));
+			 errmsg("failed to set the postmaster death watching fd's flags: %s", strerror(errno))));

'%m' should be used instead of '%s' and 'strerror(errno)'.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#19Peter Geoghegan
peter@2ndquadrant.com
In reply to: Fujii Masao (#18)
1 attachment(s)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

Attached patch addresses Fujii's more recent concerns.

On 22 June 2011 04:54, Fujii Masao <masao.fujii@gmail.com> wrote:

+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket
sock, long timeout)

If 'wakeEvents' is zero, we cannot get out of WaitLatch(). Is something like
Assert(wakeEvents != 0) required? Or should WaitLatch() always wait
on the latch even when 'wakeEvents' is zero?

Well, not waking when the client has not specified an event to wake on
is the correct thing to do in that case. Such a call would also be
inherently undesirable, so I'd be happy to guard against it using an
assertion. Both implementations now use one.
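
To make the contract concrete, here is a minimal standalone sketch. The WL_* values are copied from the latch.h hunk of the patch; filter_wake_events() is a hypothetical helper that does not appear in the patch, and the standard assert() stands in for PostgreSQL's Assert(). It only illustrates that an empty request is rejected and that the caller sees only the conditions it asked for.

```c
#include <assert.h>

/* Bit values as defined in the patch's src/include/storage/latch.h hunk. */
#define WL_LATCH_SET         (1 << 0)
#define WL_SOCKET_READABLE   (1 << 1)
#define WL_SOCKET_WRITEABLE  (1 << 2)
#define WL_TIMEOUT           (1 << 3)
#define WL_POSTMASTER_DEATH  (1 << 4)

/*
 * Hypothetical helper (not part of the patch): given the set of events the
 * caller asked for and the set that actually occurred, return the subset
 * that may be reported back.  An empty request could never be satisfied,
 * so it is rejected up front, mirroring the assertion discussed above.
 */
static int
filter_wake_events(int wakeEvents, int occurred)
{
	assert(wakeEvents != 0);
	return wakeEvents & occurred;
}
```

A caller would then test individual bits of the result, for example `if (rc & WL_POSTMASTER_DEATH)`, rather than comparing the return value against 0, 1 or 2 as before.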

In unix_latch.c, select() in WaitLatchOrSocket() checks the timeout only when
WL_TIMEOUT is set in 'wakeEvents'. OTOH, in win32_latch.c, WaitForMultipleObjects()
in WaitLatchOrSocket() always checks the timeout even if WL_TIMEOUT is not
given. Is this intentional?

No, it's a mistake. Fixed.
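
For illustration only, one way the Windows side could honour WL_TIMEOUT in the same way as the Unix side is sketched below. This is not the v6 code: win32_wait_millis() and SKETCH_INFINITE are hypothetical names, with SKETCH_INFINITE standing in for the Win32 INFINITE constant (0xFFFFFFFF) so that the fragment compiles anywhere. The timeout is taken in microseconds, as in the patch, and converted to the milliseconds that WaitForMultipleObjects() expects.

```c
#define WL_TIMEOUT           (1 << 3)	/* value from the patch's latch.h */

/* Stand-in for Win32's INFINITE so the sketch builds on any platform. */
#define SKETCH_INFINITE      0xFFFFFFFFUL

/*
 * Hypothetical helper: compute the millisecond timeout to hand to
 * WaitForMultipleObjects().  The wait is bounded only when the caller both
 * supplied a non-negative timeout and asked for WL_TIMEOUT; otherwise it
 * blocks indefinitely, matching the select() path in unix_latch.c.
 */
static unsigned long
win32_wait_millis(int wakeEvents, long timeout_us)
{
	if (timeout_us >= 0 && (wakeEvents & WL_TIMEOUT))
		return (unsigned long) (timeout_us / 1000);
	return SKETCH_INFINITE;
}
```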

+               else if (rc == WAIT_OBJECT_0 + 2 &&
+                                ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))

Another corner case: when WL_POSTMASTER_DEATH and WL_SOCKET_READABLE
are given and 'sock' is set to PGINVALID_SOCKET, we can wrongly pass through the
above check. Is this OK?

I see your point - Assert(sock != PGINVALID_SOCKET) could be violated.
We fix the issue now by setting and checking a bool that simply
indicates that we're interested in sockets.
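
The underlying hazard is that the position of each handle in the events[] array depends on which events were registered, so "WAIT_OBJECT_0 + 2" does not always mean "socket". The sketch below is an alternative illustration of that point, not the bool-based fix in the attached patch: WaitSlots and plan_wait_slots() are hypothetical names, and the slot layout (latch at 0, signal event at 1) is taken from the win32_latch.c hunk quoted above.

```c
#define WL_SOCKET_READABLE   (1 << 1)	/* values from the patch's latch.h */
#define WL_SOCKET_WRITEABLE  (1 << 2)
#define WL_POSTMASTER_DEATH  (1 << 4)

/* Hypothetical record of where each optional handle lands in events[]. */
typedef struct
{
	int			numevents;
	int			sock_slot;	/* -1 when no socket event is registered */
	int			pm_slot;	/* -1 when postmaster death is not requested */
} WaitSlots;

static WaitSlots
plan_wait_slots(int wakeEvents, int sock_is_valid)
{
	WaitSlots	s;

	s.numevents = 2;		/* slot 0: latch event, slot 1: signal event */
	s.sock_slot = -1;
	s.pm_slot = -1;

	/* The socket takes a slot only if it is valid and was asked for. */
	if (sock_is_valid &&
		(wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)))
		s.sock_slot = s.numevents++;

	/* The postmaster handle takes whichever slot comes next. */
	if (wakeEvents & WL_POSTMASTER_DEATH)
		s.pm_slot = s.numevents++;

	return s;
}
```

With WL_POSTMASTER_DEATH | WL_SOCKET_READABLE and an invalid socket, the postmaster handle ends up in slot 2, which is exactly the case where a test of "rc == WAIT_OBJECT_0 + 2" plus the socket flags would misfire.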

               rc = WaitForMultipleObjects(numevents, events, FALSE,
                                                          (timeout >= 0) ? (timeout / 1000) : INFINITE);
-               if (rc == WAIT_FAILED)
+               if ( (wakeEvents & WL_POSTMASTER_DEATH) &&
+                        !PostmasterIsAlive(true))

After WaitForMultipleObjects() detects the death of the postmaster,
WaitForSingleObject() is called in PostmasterIsAlive(). In this case, what code
does WaitForSingleObject() return? I wonder whether WaitForSingleObject() returns
a code other than WAIT_TIMEOUT and can really detect the death of the postmaster.

As noted up-thread, the fact that the archiver does wake and finish on
Postmaster death can be clearly observed on Windows. I'm not sure why
you wonder that, as this is fairly standard use of
PostmasterIsAlive(). I've verified that the WaitLatch() call
correctly reports Postmaster death in its return code on Windows, and
indeed that it actually wakes up.

Are you suggesting that there should be a defensive else if{ } for the
case where PostmasterIsAlive() incorrectly reports that the PM is
alive due to some implementation-related race condition, and we've
already considered every other possibility? Well, I suppose that's not
necessary, because we will loop until we find a reason - it's okay to
miss it the first time around, because whatever caused
WaitForMultipleObjects() to wake up will cause it to immediately
return for the next iteration.

In any case, we don't rely on the PostmasterIsAlive() call at all
anymore, so it doesn't matter. We just look at rc's value now, as we
do for every other case, though it's a bit trickier when checking
Postmaster death. Similarly, we don't have a PostmasterIsAlive() call
within the unix latch implementation anymore.
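
For readers unfamiliar with the Unix mechanism, the following is a small standalone sketch of the pipe trick the patch relies on: once every holder of the write end has gone away, select() reports the read end as readable and read() returns 0 (EOF). The roles are deliberately inverted relative to the patch so that the demonstration fits in one function: here a short-lived child plays the "owner" and the calling process is the watcher. owner_death_is_observed(), FD_WATCH and FD_OWN are hypothetical names chosen for this sketch.

```c
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define FD_WATCH 0				/* read end, kept by the watcher */
#define FD_OWN   1				/* write end, kept only by the owner */

/*
 * Returns 1 if the watcher saw the owner's death through select() on the
 * read end of the pipe, 0 if it did not, and -1 if setup failed.
 */
static int
owner_death_is_observed(void)
{
	int			alive_fds[2];
	pid_t		owner;
	fd_set		input_mask;
	struct timeval tv;
	char		c;
	int			rc;
	int			observed;

	if (pipe(alive_fds) != 0)
		return -1;

	owner = fork();
	if (owner < 0)
		return -1;
	if (owner == 0)
	{
		/* Owner: holds FD_OWN, never writes, then dies. */
		close(alive_fds[FD_WATCH]);
		_exit(0);
	}

	/*
	 * Watcher: release the ownership end.  If we kept it open, the pipe
	 * would never reach EOF and the owner's death would go unnoticed.
	 */
	close(alive_fds[FD_OWN]);

	FD_ZERO(&input_mask);
	FD_SET(alive_fds[FD_WATCH], &input_mask);
	tv.tv_sec = 5;				/* generous upper bound for the demo */
	tv.tv_usec = 0;
	rc = select(alive_fds[FD_WATCH] + 1, &input_mask, NULL, NULL, &tv);

	/* Readable with a zero-byte read means every write end is closed. */
	observed = (rc == 1 &&
				FD_ISSET(alive_fds[FD_WATCH], &input_mask) &&
				read(alive_fds[FD_WATCH], &c, 1) == 0);

	close(alive_fds[FD_WATCH]);
	waitpid(owner, NULL, 0);
	return observed;
}
```

The close() of FD_OWN in the watcher corresponds to what ReleasePostmasterDeathWatchHandle() does in each forked child in the patch; without it the EOF would never arrive.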

+       if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_GETFL) < 0)
+       {
+               ereport(FATAL,
+                       (errcode_for_socket_access(),
+                        errmsg("failed to set the postmaster death watching fd's flags: %s", strerror(errno))));
+       }

Is the above check really required? It's harmless, but looks unnecessary.

Yes, it's not possible for it to detect an error condition now. Removed.

'%m' should be used instead of '%s' and 'strerror(errno)'.

It is of course better to use the simpler, built-in facility. Fixed.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

new_latch.v6.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4952d22..bfe6bcd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10158,7 +10158,7 @@ retry:
 					/*
 					 * Wait for more WAL to arrive, or timeout to be reached
 					 */
-					WaitLatch(&XLogCtl->recoveryWakeupLatch, 5000000L);
+					WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 5000000L);
 					ResetLatch(&XLogCtl->recoveryWakeupLatch);
 				}
 				else
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 6dae7c9..6d2e3a1 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -93,6 +93,7 @@
 #endif
 
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "storage/latch.h"
 #include "storage/shmem.h"
 
@@ -188,22 +189,25 @@ DisownLatch(volatile Latch *latch)
  * backend-local latch initialized with InitLatch, or a shared latch
  * associated with the current process by calling OwnLatch.
  *
- * Returns 'true' if the latch was set, or 'false' if timeout was reached.
+ * Returns bit field indicating which condition(s) caused the wake-up.
+ *
+ * Note that there is no guarantee that callers will have all wake-up conditions
+ * returned, but we will report at least one.
  */
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 /*
  * Like WaitLatch, but will also return when there's data available in
- * 'sock' for reading or writing. Returns 0 if timeout was reached,
- * 1 if the latch was set, 2 if the socket became readable or writable.
+ * 'sock' for reading or writing.
+ *
+ * Returns same bit mask and makes same guarantees as WaitLatch.
  */
 int
-WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout)
 {
 	struct timeval tv,
 			   *tvp = NULL;
@@ -211,12 +215,15 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	fd_set		output_mask;
 	int			rc;
 	int			result = 0;
+	bool		found = false;
+
+	Assert(wakeEvents != 0);
 
 	if (latch->owner_pid != MyProcPid)
 		elog(ERROR, "cannot wait on a latch owned by another process");
 
 	/* Initialize timeout */
-	if (timeout >= 0)
+	if (timeout >= 0 && (wakeEvents & WL_TIMEOUT))
 	{
 		tv.tv_sec = timeout / 1000000L;
 		tv.tv_usec = timeout % 1000000L;
@@ -224,7 +231,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	}
 
 	waiting = true;
-	for (;;)
+	do
 	{
 		int			hifd;
 
@@ -235,16 +242,31 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		 * do that), and the select() will return immediately.
 		 */
 		drainSelfPipe();
-		if (latch->is_set)
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 *
+			 * Don't attempt to report any other reason
+			 * for returning to callers that may have
+			 * happened to coincide.
+			 */
 			break;
 		}
 
 		FD_ZERO(&input_mask);
 		FD_SET(selfpipe_readfd, &input_mask);
 		hifd = selfpipe_readfd;
-		if (sock != PGINVALID_SOCKET && forRead)
+
+		if (wakeEvents & WL_POSTMASTER_DEATH)
+		{
+			FD_SET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask);
+			if (postmaster_alive_fds[POSTMASTER_FD_WATCH] > hifd)
+				hifd = postmaster_alive_fds[POSTMASTER_FD_WATCH];
+		}
+
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_READABLE))
 		{
 			FD_SET(sock, &input_mask);
 			if (sock > hifd)
@@ -252,7 +274,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		}
 
 		FD_ZERO(&output_mask);
-		if (sock != PGINVALID_SOCKET && forWrite)
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_WRITEABLE))
 		{
 			FD_SET(sock, &output_mask);
 			if (sock > hifd)
@@ -268,20 +290,33 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 					(errcode_for_socket_access(),
 					 errmsg("select() failed: %m")));
 		}
-		if (rc == 0)
+		if (rc == 0 && (wakeEvents & WL_TIMEOUT))
 		{
 			/* timeout exceeded */
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
-		if (sock != PGINVALID_SOCKET &&
-			((forRead && FD_ISSET(sock, &input_mask)) ||
-			 (forWrite && FD_ISSET(sock, &output_mask))))
+		if (sock != PGINVALID_SOCKET)
 		{
-			result = 2;
-			break;				/* data available in socket */
+			if ((wakeEvents & WL_SOCKET_READABLE ) && FD_ISSET(sock, &input_mask))
+			{
+				result |= WL_SOCKET_READABLE;
+				found = true; /* data available in socket */
+			}
+			if ((wakeEvents & WL_SOCKET_WRITEABLE) && FD_ISSET(sock, &output_mask))
+			{
+				result |= WL_SOCKET_WRITEABLE;
+				found = true;
+			}
+		}
+		if ((wakeEvents & WL_POSTMASTER_DEATH) &&
+			 FD_ISSET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask))
+		{
+			result |= WL_POSTMASTER_DEATH;
+			found = true;
 		}
 	}
+	while(!found);
 	waiting = false;
 
 	return result;
diff --git a/src/backend/port/win32_latch.c b/src/backend/port/win32_latch.c
index 4bcf7b7..21e406f 100644
--- a/src/backend/port/win32_latch.c
+++ b/src/backend/port/win32_latch.c
@@ -23,6 +23,7 @@
 #include <unistd.h>
 
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "replication/walsender.h"
 #include "storage/latch.h"
 #include "storage/shmem.h"
@@ -81,43 +82,51 @@ DisownLatch(volatile Latch *latch)
 	latch->owner_pid = 0;
 }
 
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 int
-WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, SOCKET sock, long timeout)
 {
 	DWORD		rc;
-	HANDLE		events[3];
+	HANDLE		events[4];
 	HANDLE		latchevent;
 	HANDLE		sockevent = WSA_INVALID_EVENT;	/* silence compiler */
 	int			numevents;
 	int			result = 0;
+	bool		found = false;
+	bool		checking_socket = false;
+
+	Assert(wakeEvents != 0);
 
 	latchevent = latch->event;
 
 	events[0] = latchevent;
 	events[1] = pgwin32_signal_event;
 	numevents = 2;
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (sock != PGINVALID_SOCKET && ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 	{
 		int			flags = 0;
+		checking_socket = true;
 
-		if (forRead)
+		if (wakeEvents & WL_SOCKET_READABLE)
 			flags |= FD_READ;
-		if (forWrite)
+		if (wakeEvents & WL_SOCKET_WRITEABLE)
 			flags |= FD_WRITE;
 
 		sockevent = WSACreateEvent();
 		WSAEventSelect(sock, sockevent, flags);
 		events[numevents++] = sockevent;
 	}
+	if (wakeEvents & WL_POSTMASTER_DEATH)
+	{
+		events[numevents++] = PostmasterHandle;
+	}
 
-	for (;;)
+	do
 	{
 		/*
 		 * Reset the event, and check if the latch is set already. If someone
@@ -127,24 +136,41 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 		 */
 		if (!ResetEvent(latchevent))
 			elog(ERROR, "ResetEvent failed: error code %d", (int) GetLastError());
-		if (latch->is_set)
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 *
+			 * Don't attempt to report any other reason
+			 * for returning to callers that may have
+			 * happened to coincide.
+			 */
 			break;
 		}
 
 		rc = WaitForMultipleObjects(numevents, events, FALSE,
-							   (timeout >= 0) ? (timeout / 1000) : INFINITE);
-		if (rc == WAIT_FAILED)
+								(timeout >= 0 && (wakeEvents & WL_TIMEOUT)) ? (timeout / 1000) : INFINITE);
+		/* Whether or not we're interested in sockets affects which
+		 * rc value indicates that PostmasterHandle indicated PM death
+		 */
+		if ( (wakeEvents & WL_POSTMASTER_DEATH) &&
+			 rc ==  WAIT_OBJECT_0 + (checking_socket? 3:2))
+		{
+			/* Postmaster died */
+			result |= WL_POSTMASTER_DEATH;
+			found = true;
+		}
+		else if (rc == WAIT_FAILED)
 			elog(ERROR, "WaitForMultipleObjects() failed: error code %d", (int) GetLastError());
 		else if (rc == WAIT_TIMEOUT)
 		{
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
 		else if (rc == WAIT_OBJECT_0 + 1)
 			pgwin32_dispatch_queued_signals();
-		else if (rc == WAIT_OBJECT_0 + 2)
+		else if (rc == WAIT_OBJECT_0 + 2 && checking_socket) /* If we are checking_socket, it will be WAIT_OBJECT_0 + 2 */
 		{
 			WSANETWORKEVENTS resEvents;
 
@@ -155,17 +181,24 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 				ereport(FATAL,
 						(errmsg_internal("failed to enumerate network events: %i", (int) GetLastError())));
 
-			if ((forRead && resEvents.lNetworkEvents & FD_READ) ||
-				(forWrite && resEvents.lNetworkEvents & FD_WRITE))
-				result = 2;
-			break;
+			if ((wakeEvents & WL_SOCKET_READABLE) && (resEvents.lNetworkEvents & FD_READ))
+			{
+				result |= WL_SOCKET_READABLE;
+				found = true;
+			}
+			if ((wakeEvents & WL_SOCKET_WRITEABLE) && (resEvents.lNetworkEvents & FD_WRITE))
+			{
+				result |= WL_SOCKET_WRITEABLE;
+				found = true;
+			}
 		}
 		else if (rc != WAIT_OBJECT_0)
 			elog(ERROR, "unexpected return code from WaitForMultipleObjects(): %d", (int) rc);
 	}
+	while(!found);
 
 	/* Clean up the handle we created for the socket */
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (sock != PGINVALID_SOCKET && ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 	{
 		WSAEventSelect(sock, sockevent, 0);
 		WSACloseEvent(sockevent);
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index b2fe9a1..db9401a 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -11,6 +11,8 @@
  */
 #include "postgres.h"
 #include "postmaster/fork_process.h"
+#include "postmaster/postmaster.h"
+
 
 #include <fcntl.h>
 #include <time.h>
@@ -19,13 +21,14 @@
 #include <unistd.h>
 
 #ifndef WIN32
+
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
  * child in the parent process.
  */
 pid_t
-fork_process(void)
+do_fork_process(bool remain_postmaster)
 {
 	pid_t		result;
 
@@ -61,6 +64,17 @@ fork_process(void)
 #ifdef LINUX_PROFILE
 		setitimer(ITIMER_PROF, &prof_itimer, NULL);
 #endif
+		/*
+		 * Usually, we're forking to create a new, distinct process. That process
+		 * should release the postmaster death watch handle, which is required by
+		 * the implementation, as described in unix_latch.c.
+		 *
+		 * Less frequently, we want to fork for some other reason (such as for
+		 * silent_mode), and the child process is intended to become the new
+		 * postmaster. It should therefore retain the death watch handle.
+		 */
+		if (!remain_postmaster)
+			ReleasePostmasterDeathWatchHandle();
 
 		/*
 		 * By default, Linux tends to kill the postmaster in out-of-memory
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index b40375a..a56fe92 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -40,6 +40,7 @@
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
+#include "storage/latch.h"
 #include "storage/pg_shmem.h"
 #include "storage/pmsignal.h"
 #include "utils/guc.h"
@@ -87,6 +88,12 @@ static volatile sig_atomic_t got_SIGTERM = false;
 static volatile sig_atomic_t wakened = false;
 static volatile sig_atomic_t ready_to_stop = false;
 
+/*
+ * Latch that the archiver loop waits on until it is awakened by
+ * signals, each of which has a handler that sets the latch
+ */
+static volatile Latch mainloop_latch;
+
 /* ----------
  * Local function forward declarations
  * ----------
@@ -228,6 +235,8 @@ PgArchiverMain(int argc, char *argv[])
 
 	MyProcPid = getpid();		/* reset MyProcPid */
 
+	InitLatch(&mainloop_latch); /* initialise latch used in main loop, now that we are a subprocess */
+
 	MyStartTime = time(NULL);	/* record Start Time for logging */
 
 	/*
@@ -282,6 +291,8 @@ ArchSigHupHandler(SIGNAL_ARGS)
 {
 	/* set flag to re-read config file at next convenient time */
 	got_SIGHUP = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGTERM signal handler for archiver process */
@@ -295,6 +306,8 @@ ArchSigTermHandler(SIGNAL_ARGS)
 	 * archive commands.
 	 */
 	got_SIGTERM = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR1 signal handler for archiver process */
@@ -303,6 +316,8 @@ pgarch_waken(SIGNAL_ARGS)
 {
 	/* set flag that there is work to be done */
 	wakened = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR2 signal handler for archiver process */
@@ -311,6 +326,8 @@ pgarch_waken_stop(SIGNAL_ARGS)
 {
 	/* set flag to do a final cycle and shut down afterwards */
 	ready_to_stop = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /*
@@ -334,6 +351,13 @@ pgarch_MainLoop(void)
 
 	do
 	{
+		/*
+		 * There shouldn't be anything for the archiver to do except to wait
+		 * on a latch ... however, the archiver exists to protect our data,
+		 * so she wakes up occasionally to allow herself to be proactive.
+		 */
+		ResetLatch(&mainloop_latch);
+
 		/* When we get SIGUSR2, we do one more archive cycle, then exit */
 		time_to_stop = ready_to_stop;
 
@@ -371,25 +395,27 @@ pgarch_MainLoop(void)
 		}
 
 		/*
-		 * There shouldn't be anything for the archiver to do except to wait
-		 * for a signal ... however, the archiver exists to protect our data,
-		 * so she wakes up occasionally to allow herself to be proactive.
+		 * Wait on latch, until various signals are received, or
+		 * until a poll will be forced by PGARCH_AUTOWAKE_INTERVAL
+		 * having passed since last_copy_time, or on the postmaster's
+		 * untimely demise.
 		 *
-		 * On some platforms, signals won't interrupt the sleep.  To ensure we
-		 * respond reasonably promptly when someone signals us, break down the
-		 * sleep into 1-second increments, and check for interrupts after each
-		 * nap.
+		 * The caveat about signals resetting the timeout of
+		 * WaitLatch()/select() on some platforms can be safely disregarded,
+		 * because we handle all expected signals, and all handlers
+		 * call SetLatch() where that matters anyway
 		 */
-		while (!(wakened || ready_to_stop || got_SIGHUP ||
-				 !PostmasterIsAlive(true)))
-		{
-			time_t		curtime;
 
-			pg_usleep(1000000L);
+		if (!time_to_stop) /* Don't wait during last iteration */
+		{
+			time_t		 curtime = time(NULL);
+			unsigned int timeout_secs  = (unsigned int) PGARCH_AUTOWAKE_INTERVAL -
+					(unsigned int) (curtime - last_copy_time);
+			WaitLatch(&mainloop_latch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, timeout_secs * 1000000L);
 			curtime = time(NULL);
 			if ((unsigned int) (curtime - last_copy_time) >=
 				(unsigned int) PGARCH_AUTOWAKE_INTERVAL)
-				wakened = true;
+				wakened = true; /* wakened by timeout - this wasn't a SIGHUP, etc */
 		}
 
 		/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6572292..5a3e059 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -443,6 +443,7 @@ typedef struct
 	HANDLE		syslogPipe[2];
 #else
 	int			syslogPipe[2];
+	int			postmaster_alive_fds[2];
 #endif
 	char		my_exec_path[MAXPGPATH];
 	char		pkglib_path[MAXPGPATH];
@@ -472,6 +473,13 @@ static void ShmemBackendArrayRemove(Backend *bn);
 #define EXIT_STATUS_0(st)  ((st) == 0)
 #define EXIT_STATUS_1(st)  (WIFEXITED(st) && WEXITSTATUS(st) == 1)
 
+/*
+ * Two file descriptors used to monitor whether the postmaster is alive.
+ * First is POSTMASTER_FD_WATCH, second is POSTMASTER_FD_OWN.
+ */
+#ifndef WIN32
+int postmaster_alive_fds[2];
+#endif
 
 /*
  * Postmaster main entry point
@@ -491,6 +499,15 @@ PostmasterMain(int argc, char *argv[])
 
 	IsPostmasterEnvironment = true;
 
+#ifndef WIN32
+	/*
+	 * Initialise mechanism that allows waiting latch clients
+	 * to wake on postmaster death, to finish their
+	 * remaining business
+	 */
+	InitPostmasterDeathWatchHandle();
+#endif
+
 	/*
 	 * for security, no dir or file created can be group or other accessible
 	 */
@@ -1312,7 +1329,7 @@ pmdaemonize(void)
 	/*
 	 * Okay to fork.
 	 */
-	pid = fork_process();
+	pid = fork_process_remain_postmaster();
 	if (pid == (pid_t) -1)
 	{
 		write_stderr("%s: could not fork background process: %s\n",
@@ -4758,6 +4775,9 @@ save_backend_variables(BackendParameters *param, Port *port,
 
 	memcpy(&param->syslogPipe, &syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&param->postmaster_alive_fds, &postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
 	strlcpy(param->my_exec_path, my_exec_path, MAXPGPATH);
 
 	strlcpy(param->pkglib_path, pkglib_path, MAXPGPATH);
@@ -4973,6 +4993,10 @@ restore_backend_variables(BackendParameters *param, Port *port)
 
 	memcpy(&syslogPipe, &param->syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&postmaster_alive_fds, &param->postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
+
 	strlcpy(my_exec_path, param->my_exec_path, MAXPGPATH);
 
 	strlcpy(pkglib_path, param->pkglib_path, MAXPGPATH);
@@ -5088,5 +5112,79 @@ pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired)
 	/* Queue SIGCHLD signal */
 	pg_queue_signal(SIGCHLD);
 }
+#else
+/*
+ * Initialise one and only handle for monitoring postmaster death.
+ *
+ * Called once from the postmaster, so that child processes can subsequently
+ * monitor if their parent is dead. We open up an anonymous pipe, and have child
+ * processes block on a select() call that examines if the read file descriptor
+ * is ready for reading. They do so through a latch.
+ *
+ * Child processes are responsible for releasing the death watch handle, so
+ * that only the postmaster holds it, and a select() on the fd returns upon the
+ * one and only holder (the postmaster) dying.
+ *
+ * This is a trick that obviates the need for auxiliary backends to have tight
+ * polling loops where they check if the postmaster is alive. We do this because
+ * that pattern results in an excessive number of wakeups per second when idle.
+ */
+void
+InitPostmasterDeathWatchHandle(void)
+{
+	/*
+	 * Create pipe. The postmaster is deemed dead if
+	 * no process has the writing end (POSTMASTER_FD_OWN) open.
+	 */
+	Assert(MyProcPid == PostmasterPid);
+	if (pipe(postmaster_alive_fds))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg( "pipe() call failed to create pipe to monitor postmaster death: %m")));
+	}
+	/*
+	 * Set O_NONBLOCK to allow checking for the fd's presence with a select() call
+	 */
+	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_SETFL, O_NONBLOCK))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to set the postmaster death watching fd's flags: %m")));
+	}
+}
 
-#endif   /* WIN32 */
+/*
+ * Release postmaster death watch handle.
+ *
+ * Important: This must be called immediately after a process
+ * forks from the postmaster. Otherwise, latch clients will
+ * not wake up on postmaster death, even if they have requested
+ * to.
+ *
+ * Even some hypothetical backend that doesn't care about postmaster
+ * death has a responsibility to call this function - otherwise,
+ * some other latch client backend could wait in vain to be informed
+ * of postmaster death, because the irresponsible backend held open
+ * the ownership file descriptor and outlived the postmaster.
+ *
+ * We call the function within the fork machinery to handle all cases,
+ * so backends need not bother with this themselves.
+ */
+void
+ReleasePostmasterDeathWatchHandle(void)
+{
+	/* MyProcPid won't have been set yet */
+	Assert(PostmasterPid != getpid());
+	/* Please don't ask twice */
+	Assert(postmaster_alive_fds[POSTMASTER_FD_OWN] != -1);
+	/* Release parent's ownership fd - only postmaster should hold it */
+	if (close(postmaster_alive_fds[POSTMASTER_FD_OWN]))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to close file descriptor associated with postmaster death in child process")));
+	}
+	postmaster_alive_fds[POSTMASTER_FD_OWN] = -1;
+}
+#endif
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 2b52d16..7cf6206 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -171,7 +171,7 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 		 * postmaster death regularly while waiting. Note that timeout here
 		 * does not necessarily release from loop.
 		 */
-		WaitLatch(&MyProc->waitLatch, 60000000L);
+		WaitLatch(&MyProc->waitLatch, WL_LATCH_SET | WL_TIMEOUT, 60000000L);
 
 		/* Must reset the latch before testing state. */
 		ResetLatch(&MyProc->waitLatch);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 470e6d1..27cc350 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -805,8 +805,9 @@ WalSndLoop(void)
 			}
 
 			/* Sleep */
-			WaitLatchOrSocket(&MyWalSnd->latch, MyProcPort->sock,
-							  true, pq_is_send_pending(),
+			WaitLatchOrSocket(&MyWalSnd->latch,
+							  WL_LATCH_SET | WL_SOCKET_READABLE | (pq_is_send_pending()? WL_SOCKET_WRITEABLE:0) |  WL_TIMEOUT,
+							  MyProcPort->sock,
 							  sleeptime * 1000L);
 
 			/* Check for replication timeout */
diff --git a/src/include/postmaster/fork_process.h b/src/include/postmaster/fork_process.h
index 0553fd2..e0abe5d 100644
--- a/src/include/postmaster/fork_process.h
+++ b/src/include/postmaster/fork_process.h
@@ -12,6 +12,8 @@
 #ifndef FORK_PROCESS_H
 #define FORK_PROCESS_H
 
-extern pid_t fork_process(void);
+extern pid_t do_fork_process(bool remain_postmaster);
+#define fork_process() do_fork_process(false)
+#define fork_process_remain_postmaster() do_fork_process(true)
 
 #endif   /* FORK_PROCESS_H */
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index 25cc84a..497cf51 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -33,6 +33,25 @@ extern bool restart_after_crash;
 
 #ifdef WIN32
 extern HANDLE PostmasterHandle;
+#else
+/*
+ * Constants that represent which of a pair of fds given
+ * to pipe() is watched and owned in the context of
+ * dealing with postmaster death
+ */
+#define POSTMASTER_FD_WATCH 0
+#define POSTMASTER_FD_OWN 1
+extern int postmaster_alive_fds[2];
+/*
+ * On unix, it is necessary to Init monitoring
+ * of postmaster being alive
+ */
+extern void InitPostmasterDeathWatchHandle(void);
+/*
+ * It is also necessary to call ReleasePostmasterDeathWatchHandle()
+ * after forking from PM for the Unix implementation
+ */
+extern void ReleasePostmasterDeathWatchHandle(void);
 #endif
 
 extern const char *progname;
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 03ec071..6865ac7 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -38,9 +38,8 @@ extern void InitLatch(volatile Latch *latch);
 extern void InitSharedLatch(volatile Latch *latch);
 extern void OwnLatch(volatile Latch *latch);
 extern void DisownLatch(volatile Latch *latch);
-extern bool WaitLatch(volatile Latch *latch, long timeout);
-extern int WaitLatchOrSocket(volatile Latch *latch, pgsocket sock,
-				  bool forRead, bool forWrite, long timeout);
+extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout);
+extern int WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout);
 extern void SetLatch(volatile Latch *latch);
 extern void ResetLatch(volatile Latch *latch);
 
@@ -56,4 +55,11 @@ extern void latch_sigusr1_handler(void);
 #define latch_sigusr1_handler()
 #endif
 
+/* Bitmasks for events that may wake-up WaitLatch() clients */
+#define WL_LATCH_SET         (1 << 0)
+#define WL_SOCKET_READABLE   (1 << 1)
+#define WL_SOCKET_WRITEABLE  (1 << 2)
+#define WL_TIMEOUT           (1 << 3)
+#define WL_POSTMASTER_DEATH  (1 << 4)
+
 #endif   /* LATCH_H */
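
The Unix half of the patch above rests on one property of pipes: once every
process holding the write end has closed it or exited, the read end becomes
readable and read() returns 0 (EOF). The following is a minimal standalone
sketch of that behaviour, not code from the patch. The helper name
child_notices_parent_death() and the extra "report" pipe are assumptions made
purely for illustration; a throwaway process stands in for the postmaster and
its child stands in for a backend.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo helper (not part of the patch).  Returns 1 if a child
 * saw EOF on the watch pipe after its parent exited, 0 if it saw anything
 * else, and -1 on a setup failure.
 */
static int
child_notices_parent_death(void)
{
	int			report[2];		/* child -> caller: one result byte */
	pid_t		parent_pid;
	char		verdict = 0;
	ssize_t		n;

	if (pipe(report) != 0)
		return -1;

	parent_pid = fork();
	if (parent_pid < 0)
		return -1;

	if (parent_pid == 0)
	{
		/* Stand-in for the postmaster: it alone keeps watch_fds[1] open. */
		int			watch_fds[2];
		pid_t		child_pid;

		close(report[0]);
		if (pipe(watch_fds) != 0)
			_exit(2);

		child_pid = fork();
		if (child_pid < 0)
			_exit(2);

		if (child_pid == 0)
		{
			/* Stand-in for a backend: drop the ownership end right away. */
			fd_set		input_mask;
			char		buf;
			char		seen = 0;

			close(watch_fds[1]);

			FD_ZERO(&input_mask);
			FD_SET(watch_fds[0], &input_mask);

			/* Blocks until no process holds the write end any more. */
			if (select(watch_fds[0] + 1, &input_mask, NULL, NULL, NULL) == 1 &&
				read(watch_fds[0], &buf, 1) == 0)
				seen = 1;		/* EOF: the "postmaster" is gone */

			if (write(report[1], &seen, 1) != 1)
				_exit(1);
			_exit(0);
		}

		/* The "postmaster" dies; the kernel closes its write end. */
		_exit(0);
	}

	/* Caller: wait for the orphaned child's verdict. */
	close(report[1]);
	n = read(report[0], &verdict, 1);
	close(report[0]);
	waitpid(parent_pid, NULL, 0);

	return (n == 1) ? verdict : -1;
}
```

If the child skipped the close(watch_fds[1]) call, its own copy of the write
end would keep the pipe alive and select() would never return, which is why
the patch releases the handle inside the fork wrapper rather than leaving it
to each child process.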
#20Fujii Masao
masao.fujii@gmail.com
In reply to: Peter Geoghegan (#19)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On Wed, Jun 22, 2011 at 9:11 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

               rc = WaitForMultipleObjects(numevents, events, FALSE,
                                                          (timeout >= 0) ? (timeout / 1000) : INFINITE);
-               if (rc == WAIT_FAILED)
+               if ( (wakeEvents & WL_POSTMASTER_DEATH) &&
+                        !PostmasterIsAlive(true))

After WaitForMultipleObjects() detects the death of the postmaster,
WaitForSingleObject() is called in PostmasterIsAlive(). In this case, what
code does WaitForSingleObject() return? I wonder whether
WaitForSingleObject() returns a code other than WAIT_TIMEOUT and can really
detect the death of the postmaster.

As noted up-thread, the fact that the archiver does wake and finish on
Postmaster death can be clearly observed on Windows. I'm not sure why
you wonder that, as this is fairly standard use of
PostmasterIsAlive().

I was afraid that, if PostmasterHandle were an auto-reset event object, its
event state would be reset automatically just after WaitForMultipleObjects()
detected the postmaster death event. But your observation proved that my
concern was unfounded.

I have some further comments:

+#ifndef WIN32
+	/*
+	 * Initialise mechanism that allows waiting latch clients
+	 * to wake on postmaster death, to finish their
+	 * remaining business
+	 */
+	InitPostmasterDeathWatchHandle();
+#endif

Calling this function before creating TopMemoryContext looks unsafe. What if
the function calls ereport(FATAL)?

That ereport() can be called before postgresql.conf is read, i.e., before the
GUCs for error reporting are set. Is this OK? If not,
InitPostmasterDeathWatchHandle() should be moved after SelectConfigFiles().

+#ifndef WIN32
+int postmaster_alive_fds[2];
+#endif

Should postmaster_alive_fds[] be initialized to "{-1, -1}"?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#21Peter Geoghegan
peter@2ndquadrant.com
In reply to: Fujii Masao (#20)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 24 June 2011 12:30, Fujii Masao <masao.fujii@gmail.com> wrote:

+#ifndef WIN32
+       /*
+        * Initialise mechanism that allows waiting latch clients
+        * to wake on postmaster death, to finish their
+        * remaining business
+        */
+       InitPostmasterDeathWatchHandle();
+#endif

Calling this function before creating TopMemoryContext looks unsafe. What if
the function calls ereport(FATAL)?

That ereport() can be called before postgresql.conf is read, i.e., before the
GUCs for error reporting are set. Is this OK? If not,
InitPostmasterDeathWatchHandle() should be moved after SelectConfigFiles().

I see no reason to take the risk that it might at some point - I've moved it.

+#ifndef WIN32
+int postmaster_alive_fds[2];
+#endif

Should postmaster_alive_fds[] be initialized to "{-1, -1}"?

Yes, they should. That works better.

I think that Heikki is currently taking another look at my work: in a
message to the list a short time ago, he indicated that, while reviewing
my patch, he realised that there may be an independent problem with
silent_mode. I will wait for his remarks before producing another version
of the patch that incorporates those two small changes.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
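
The point about initializing postmaster_alive_fds[] to {-1, -1} can be
illustrated with a small sketch. A global int array is zero-initialized in C,
and 0 is a valid descriptor (normally stdin), so without an explicit sentinel
a "not yet released" check such as Assert(fd != -1) cannot tell "never opened"
from "open". The names alive_fds and release_own_fd() below are hypothetical
simplifications, not the patch's actual code, and errors are returned instead
of being reported with ereport().

```c
#include <assert.h>
#include <unistd.h>

#define POSTMASTER_FD_WATCH 0
#define POSTMASTER_FD_OWN   1

/* -1 marks "no descriptor here"; a zero-filled array would name fd 0. */
static int	alive_fds[2] = {-1, -1};

/*
 * Hypothetical, simplified release routine: closes the ownership end once.
 * Returns 0 on success, -1 if there is nothing to release or close() fails.
 */
static int
release_own_fd(void)
{
	if (alive_fds[POSTMASTER_FD_OWN] == -1)
		return -1;				/* never opened, or already released */

	if (close(alive_fds[POSTMASTER_FD_OWN]) != 0)
		return -1;

	/* Reset to the sentinel so a second call is detected. */
	alive_fds[POSTMASTER_FD_OWN] = -1;
	return 0;
}
```

With the sentinel in place, calling release_own_fd() before pipe() has run, or
calling it twice, is caught instead of silently closing descriptor 0.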

#22Peter Geoghegan
peter@2ndquadrant.com
In reply to: Peter Geoghegan (#21)
1 attachment(s)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

Attached is a patch that addresses Fujii's third and most recent set of concerns.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
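
For readers of the attachment below: WaitLatch() and WaitLatchOrSocket() now
return a bit mask of WL_* flags rather than 0/1/2, and more than one bit may
be set. The following sketch shows one way a caller might decode such a
result. The WL_* values are copied from the latch.h hunk of the patch;
describe_wake_events() is a hypothetical helper invented for this example and
does not exist in PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copied from the patch's latch.h hunk */
#define WL_LATCH_SET         (1 << 0)
#define WL_SOCKET_READABLE   (1 << 1)
#define WL_SOCKET_WRITEABLE  (1 << 2)
#define WL_TIMEOUT           (1 << 3)
#define WL_POSTMASTER_DEATH  (1 << 4)

/*
 * Hypothetical helper: writes the names of the bits set in 'rc' into 'buf',
 * separated by '|', and returns how many wake-up conditions were reported.
 */
static int
describe_wake_events(int rc, char *buf, size_t buflen)
{
	static const struct
	{
		int			bit;
		const char *name;
	}			events[] = {
		{WL_LATCH_SET, "WL_LATCH_SET"},
		{WL_SOCKET_READABLE, "WL_SOCKET_READABLE"},
		{WL_SOCKET_WRITEABLE, "WL_SOCKET_WRITEABLE"},
		{WL_TIMEOUT, "WL_TIMEOUT"},
		{WL_POSTMASTER_DEATH, "WL_POSTMASTER_DEATH"},
	};
	int			count = 0;
	size_t		i;

	if (buflen == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < sizeof(events) / sizeof(events[0]); i++)
	{
		if (rc & events[i].bit)
		{
			/* strncat always NUL-terminates; leave room for that byte */
			if (count > 0)
				strncat(buf, "|", buflen - strlen(buf) - 1);
			strncat(buf, events[i].name, buflen - strlen(buf) - 1);
			count++;
		}
	}
	return count;
}
```

A caller that only cares about one condition can test the bit directly, for
example (rc & WL_POSTMASTER_DEATH), instead of comparing rc against a single
value as the old 0/1/2 convention encouraged.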

Attachments:

new_latch.v7.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4952d22..bfe6bcd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10158,7 +10158,7 @@ retry:
 					/*
 					 * Wait for more WAL to arrive, or timeout to be reached
 					 */
-					WaitLatch(&XLogCtl->recoveryWakeupLatch, 5000000L);
+					WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 5000000L);
 					ResetLatch(&XLogCtl->recoveryWakeupLatch);
 				}
 				else
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 6dae7c9..6d2e3a1 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -93,6 +93,7 @@
 #endif
 
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "storage/latch.h"
 #include "storage/shmem.h"
 
@@ -188,22 +189,25 @@ DisownLatch(volatile Latch *latch)
  * backend-local latch initialized with InitLatch, or a shared latch
  * associated with the current process by calling OwnLatch.
  *
- * Returns 'true' if the latch was set, or 'false' if timeout was reached.
+ * Returns bit field indicating which condition(s) caused the wake-up.
+ *
+ * Note that there is no guarantee that callers will have all wake-up conditions
+ * returned, but we will report at least one.
  */
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 /*
  * Like WaitLatch, but will also return when there's data available in
- * 'sock' for reading or writing. Returns 0 if timeout was reached,
- * 1 if the latch was set, 2 if the socket became readable or writable.
+ * 'sock' for reading or writing.
+ *
+ * Returns same bit mask and makes same guarantees as WaitLatch.
  */
 int
-WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout)
 {
 	struct timeval tv,
 			   *tvp = NULL;
@@ -211,12 +215,15 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	fd_set		output_mask;
 	int			rc;
 	int			result = 0;
+	bool		found = false;
+
+	Assert(wakeEvents != 0);
 
 	if (latch->owner_pid != MyProcPid)
 		elog(ERROR, "cannot wait on a latch owned by another process");
 
 	/* Initialize timeout */
-	if (timeout >= 0)
+	if (timeout >= 0 && (wakeEvents & WL_TIMEOUT))
 	{
 		tv.tv_sec = timeout / 1000000L;
 		tv.tv_usec = timeout % 1000000L;
@@ -224,7 +231,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	}
 
 	waiting = true;
-	for (;;)
+	do
 	{
 		int			hifd;
 
@@ -235,16 +242,31 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		 * do that), and the select() will return immediately.
 		 */
 		drainSelfPipe();
-		if (latch->is_set)
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 *
+			 * Don't attempt to report any other reason
+			 * for returning to callers that may have
+			 * happened to coincide.
+			 */
 			break;
 		}
 
 		FD_ZERO(&input_mask);
 		FD_SET(selfpipe_readfd, &input_mask);
 		hifd = selfpipe_readfd;
-		if (sock != PGINVALID_SOCKET && forRead)
+
+		if (wakeEvents & WL_POSTMASTER_DEATH)
+		{
+			FD_SET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask);
+			if (postmaster_alive_fds[POSTMASTER_FD_WATCH] > hifd)
+				hifd = postmaster_alive_fds[POSTMASTER_FD_WATCH];
+		}
+
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_READABLE))
 		{
 			FD_SET(sock, &input_mask);
 			if (sock > hifd)
@@ -252,7 +274,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		}
 
 		FD_ZERO(&output_mask);
-		if (sock != PGINVALID_SOCKET && forWrite)
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_WRITEABLE))
 		{
 			FD_SET(sock, &output_mask);
 			if (sock > hifd)
@@ -268,20 +290,33 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 					(errcode_for_socket_access(),
 					 errmsg("select() failed: %m")));
 		}
-		if (rc == 0)
+		if (rc == 0 && (wakeEvents & WL_TIMEOUT))
 		{
 			/* timeout exceeded */
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
-		if (sock != PGINVALID_SOCKET &&
-			((forRead && FD_ISSET(sock, &input_mask)) ||
-			 (forWrite && FD_ISSET(sock, &output_mask))))
+		if (sock != PGINVALID_SOCKET)
 		{
-			result = 2;
-			break;				/* data available in socket */
+			if ((wakeEvents & WL_SOCKET_READABLE ) && FD_ISSET(sock, &input_mask))
+			{
+				result |= WL_SOCKET_READABLE;
+				found = true; /* data available in socket */
+			}
+			if ((wakeEvents & WL_SOCKET_WRITEABLE) && FD_ISSET(sock, &output_mask))
+			{
+				result |= WL_SOCKET_WRITEABLE;
+				found = true;
+			}
+		}
+		if ((wakeEvents & WL_POSTMASTER_DEATH) &&
+			 FD_ISSET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask))
+		{
+			result |= WL_POSTMASTER_DEATH;
+			found = true;
 		}
 	}
+	while(!found);
 	waiting = false;
 
 	return result;
diff --git a/src/backend/port/win32_latch.c b/src/backend/port/win32_latch.c
index 4bcf7b7..66f2a40 100644
--- a/src/backend/port/win32_latch.c
+++ b/src/backend/port/win32_latch.c
@@ -23,6 +23,7 @@
 #include <unistd.h>
 
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "replication/walsender.h"
 #include "storage/latch.h"
 #include "storage/shmem.h"
@@ -81,43 +82,51 @@ DisownLatch(volatile Latch *latch)
 	latch->owner_pid = 0;
 }
 
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 int
-WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, SOCKET sock, long timeout)
 {
 	DWORD		rc;
-	HANDLE		events[3];
+	HANDLE		events[4];
 	HANDLE		latchevent;
 	HANDLE		sockevent = WSA_INVALID_EVENT;	/* silence compiler */
 	int			numevents;
 	int			result = 0;
+	bool		found = false;
+	bool		checking_socket = false;
+
+	Assert(wakeEvents != 0);
 
 	latchevent = latch->event;
 
 	events[0] = latchevent;
 	events[1] = pgwin32_signal_event;
 	numevents = 2;
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (sock != PGINVALID_SOCKET && ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 	{
 		int			flags = 0;
+		checking_socket = true;
 
-		if (forRead)
+		if (wakeEvents & WL_SOCKET_READABLE)
 			flags |= FD_READ;
-		if (forWrite)
+		if (wakeEvents & WL_SOCKET_WRITEABLE)
 			flags |= FD_WRITE;
 
 		sockevent = WSACreateEvent();
 		WSAEventSelect(sock, sockevent, flags);
 		events[numevents++] = sockevent;
 	}
+	if (wakeEvents & WL_POSTMASTER_DEATH)
+	{
+		events[numevents++] = PostmasterHandle;
+	}
 
-	for (;;)
+	do
 	{
 		/*
 		 * Reset the event, and check if the latch is set already. If someone
@@ -127,24 +136,41 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 		 */
 		if (!ResetEvent(latchevent))
 			elog(ERROR, "ResetEvent failed: error code %d", (int) GetLastError());
-		if (latch->is_set)
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 *
+			 * Don't attempt to report any other reason
+			 * for returning to callers that may have
+			 * happened to coincide.
+			 */
 			break;
 		}
 
 		rc = WaitForMultipleObjects(numevents, events, FALSE,
-							   (timeout >= 0) ? (timeout / 1000) : INFINITE);
-		if (rc == WAIT_FAILED)
+                                                          (timeout >= 0 && (wakeEvents & WL_TIMEOUT)) ? (timeout / 1000) : INFINITE);
+		/* Whether or not we're interested in sockets affects which
+		 * rc value indicates that PostmasterHandle indicated PM death
+		 */
+		if ( (wakeEvents & WL_POSTMASTER_DEATH) &&
+			 rc ==  WAIT_OBJECT_0 + (checking_socket? 3:2))
+		{
+			/* Postmaster died */
+			result |= WL_POSTMASTER_DEATH;
+			found = true;
+		}
+		else if (rc == WAIT_FAILED)
 			elog(ERROR, "WaitForMultipleObjects() failed: error code %d", (int) GetLastError());
 		else if (rc == WAIT_TIMEOUT)
 		{
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
 		else if (rc == WAIT_OBJECT_0 + 1)
 			pgwin32_dispatch_queued_signals();
-		else if (rc == WAIT_OBJECT_0 + 2)
+		else if (rc == WAIT_OBJECT_0 + 2 && checking_socket) /* If we are checking_socket, it will be WAIT_OBJECT_0 + 2 */
 		{
 			WSANETWORKEVENTS resEvents;
 
@@ -155,17 +181,24 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 				ereport(FATAL,
 						(errmsg_internal("failed to enumerate network events: %i", (int) GetLastError())));
 
-			if ((forRead && resEvents.lNetworkEvents & FD_READ) ||
-				(forWrite && resEvents.lNetworkEvents & FD_WRITE))
-				result = 2;
-			break;
+			if ((wakeEvents & WL_SOCKET_READABLE) && (resEvents.lNetworkEvents & FD_READ))
+			{
+				result |= WL_SOCKET_READABLE;
+				found = true;
+			}
+			if ((wakeEvents & WL_SOCKET_WRITEABLE) && (resEvents.lNetworkEvents & FD_WRITE))
+			{
+				result |= WL_SOCKET_WRITEABLE;
+				found = true;
+			}
 		}
 		else if (rc != WAIT_OBJECT_0)
 			elog(ERROR, "unexpected return code from WaitForMultipleObjects(): %d", (int) rc);
 	}
+	while(!found);
 
 	/* Clean up the handle we created for the socket */
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (sock != PGINVALID_SOCKET && ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 	{
 		WSAEventSelect(sock, sockevent, 0);
 		WSACloseEvent(sockevent);
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index b2fe9a1..db9401a 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -11,6 +11,8 @@
  */
 #include "postgres.h"
 #include "postmaster/fork_process.h"
+#include "postmaster/postmaster.h"
+
 
 #include <fcntl.h>
 #include <time.h>
@@ -19,13 +21,14 @@
 #include <unistd.h>
 
 #ifndef WIN32
+
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
  * child in the parent process.
  */
 pid_t
-fork_process(void)
+do_fork_process(bool remain_postmaster)
 {
 	pid_t		result;
 
@@ -61,6 +64,17 @@ fork_process(void)
 #ifdef LINUX_PROFILE
 		setitimer(ITIMER_PROF, &prof_itimer, NULL);
 #endif
+		/*
+		 * Usually, we're forking to create a new, distinct process. That process
+		 * should release the postmaster death watch handle, which is required by
+		 * the implementation, as described in unix_latch.c.
+		 *
+		 * Less frequently, we want to fork for some other reason (such as for
+		 * silent_mode), and the child process is intended to become the new
+		 * postmaster. It should therefore retain the death watch handle.
+		 */
+		if (!remain_postmaster)
+			ReleasePostmasterDeathWatchHandle();
 
 		/*
 		 * By default, Linux tends to kill the postmaster in out-of-memory
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index b40375a..a56fe92 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -40,6 +40,7 @@
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
+#include "storage/latch.h"
 #include "storage/pg_shmem.h"
 #include "storage/pmsignal.h"
 #include "utils/guc.h"
@@ -87,6 +88,12 @@ static volatile sig_atomic_t got_SIGTERM = false;
 static volatile sig_atomic_t wakened = false;
 static volatile sig_atomic_t ready_to_stop = false;
 
+/*
+ * Latch that archiver loop waits on until it is awakened by
+ * signals, each of which there is a handler for
+ */
+static volatile Latch mainloop_latch;
+
 /* ----------
  * Local function forward declarations
  * ----------
@@ -228,6 +235,8 @@ PgArchiverMain(int argc, char *argv[])
 
 	MyProcPid = getpid();		/* reset MyProcPid */
 
+	InitLatch(&mainloop_latch); /* initialise latch used in main loop, now that we are a subprocess */
+
 	MyStartTime = time(NULL);	/* record Start Time for logging */
 
 	/*
@@ -282,6 +291,8 @@ ArchSigHupHandler(SIGNAL_ARGS)
 {
 	/* set flag to re-read config file at next convenient time */
 	got_SIGHUP = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGTERM signal handler for archiver process */
@@ -295,6 +306,8 @@ ArchSigTermHandler(SIGNAL_ARGS)
 	 * archive commands.
 	 */
 	got_SIGTERM = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR1 signal handler for archiver process */
@@ -303,6 +316,8 @@ pgarch_waken(SIGNAL_ARGS)
 {
 	/* set flag that there is work to be done */
 	wakened = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR2 signal handler for archiver process */
@@ -311,6 +326,8 @@ pgarch_waken_stop(SIGNAL_ARGS)
 {
 	/* set flag to do a final cycle and shut down afterwards */
 	ready_to_stop = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /*
@@ -334,6 +351,13 @@ pgarch_MainLoop(void)
 
 	do
 	{
+		/*
+		 * There shouldn't be anything for the archiver to do except to wait
+		 * on a latch ... however, the archiver exists to protect our data,
+		 * so she wakes up occasionally to allow herself to be proactive.
+		 */
+		ResetLatch(&mainloop_latch);
+
 		/* When we get SIGUSR2, we do one more archive cycle, then exit */
 		time_to_stop = ready_to_stop;
 
@@ -371,25 +395,27 @@ pgarch_MainLoop(void)
 		}
 
 		/*
-		 * There shouldn't be anything for the archiver to do except to wait
-		 * for a signal ... however, the archiver exists to protect our data,
-		 * so she wakes up occasionally to allow herself to be proactive.
+		 * Wait on latch, until various signals are received, or
+		 * until a poll will be forced by PGARCH_AUTOWAKE_INTERVAL
+		 * having passed since last_copy_time, or on the postmaster's
+		 * untimely demise.
 		 *
-		 * On some platforms, signals won't interrupt the sleep.  To ensure we
-		 * respond reasonably promptly when someone signals us, break down the
-		 * sleep into 1-second increments, and check for interrupts after each
-		 * nap.
+		 * The caveat about signals resetting the timeout of
+		 * WaitLatch()/select() on some platforms can be safely disregarded,
+		 * because we handle all expected signals, and all handlers
+		 * call SetLatch() where that matters anyway
 		 */
-		while (!(wakened || ready_to_stop || got_SIGHUP ||
-				 !PostmasterIsAlive(true)))
-		{
-			time_t		curtime;
 
-			pg_usleep(1000000L);
+		if (!time_to_stop) /* Don't wait during last iteration */
+		{
+			time_t		 curtime = time(NULL);
+			unsigned int timeout_secs  = (unsigned int) PGARCH_AUTOWAKE_INTERVAL -
+					(unsigned int) (curtime - last_copy_time);
+			WaitLatch(&mainloop_latch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, timeout_secs * 1000000L);
 			curtime = time(NULL);
 			if ((unsigned int) (curtime - last_copy_time) >=
 				(unsigned int) PGARCH_AUTOWAKE_INTERVAL)
-				wakened = true;
+				wakened = true; /* wakened by timeout - this wasn't a SIGHUP, etc */
 		}
 
 		/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6572292..019c323 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -443,6 +443,7 @@ typedef struct
 	HANDLE		syslogPipe[2];
 #else
 	int			syslogPipe[2];
+	int			postmaster_alive_fds[2];
 #endif
 	char		my_exec_path[MAXPGPATH];
 	char		pkglib_path[MAXPGPATH];
@@ -472,6 +473,13 @@ static void ShmemBackendArrayRemove(Backend *bn);
 #define EXIT_STATUS_0(st)  ((st) == 0)
 #define EXIT_STATUS_1(st)  (WIFEXITED(st) && WEXITSTATUS(st) == 1)
 
+/*
+ * Two file descriptors used to monitor whether the postmaster is alive.
+ * First is POSTMASTER_FD_WATCH, second is POSTMASTER_FD_OWN.
+ */
+#ifndef WIN32
+int postmaster_alive_fds[2] = {-1, -1};
+#endif
 
 /*
  * Postmaster main entry point
@@ -720,6 +728,15 @@ PostmasterMain(int argc, char *argv[])
 	if (!SelectConfigFiles(userDoption, progname))
 		ExitPostmaster(2);
 
+#ifndef WIN32
+	/*
+	 * Initialise mechanism that allows waiting latch clients
+	 * to wake on postmaster death, to finish their
+	 * remaining business
+	 */
+	InitPostmasterDeathWatchHandle();
+#endif
+
 	/* Verify that DataDir looks reasonable */
 	checkDataDir();
 
@@ -1312,7 +1329,7 @@ pmdaemonize(void)
 	/*
 	 * Okay to fork.
 	 */
-	pid = fork_process();
+	pid = fork_process_remain_postmaster();
 	if (pid == (pid_t) -1)
 	{
 		write_stderr("%s: could not fork background process: %s\n",
@@ -4758,6 +4775,9 @@ save_backend_variables(BackendParameters *param, Port *port,
 
 	memcpy(&param->syslogPipe, &syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&param->postmaster_alive_fds, &postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
 	strlcpy(param->my_exec_path, my_exec_path, MAXPGPATH);
 
 	strlcpy(param->pkglib_path, pkglib_path, MAXPGPATH);
@@ -4973,6 +4993,10 @@ restore_backend_variables(BackendParameters *param, Port *port)
 
 	memcpy(&syslogPipe, &param->syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&postmaster_alive_fds, &param->postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
+
 	strlcpy(my_exec_path, param->my_exec_path, MAXPGPATH);
 
 	strlcpy(pkglib_path, param->pkglib_path, MAXPGPATH);
@@ -5088,5 +5112,79 @@ pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired)
 	/* Queue SIGCHLD signal */
 	pg_queue_signal(SIGCHLD);
 }
+#else
+/*
+ * Initialise one and only handle for monitoring postmaster death.
+ *
+ * Called once from the postmaster, so that child processes can subsequently
+ * monitor if their parent is dead. We open up an anonymous pipe, and have child
+ * processes block on a select() call that examines if the read file descriptor
+ * is ready for reading. They do so through a latch.
+ *
+ * Child processes are responsible for releasing the death watch handle, so
+ * that only the postmaster holds it, and a select() on the fd returns upon the
+ * one and only holder (the postmaster) dying.
+ *
+ * This is a trick that obviates the need for auxiliary backends to have tight
+ * polling loops where they check if the postmaster is alive. We do this because
+ * that pattern results in an excessive number of wakeups per second when idle.
+ */
+void
+InitPostmasterDeathWatchHandle(void)
+{
+	/*
+	 * Create pipe. The postmaster is deemed dead if
+	 * no process has the writing end (POSTMASTER_FD_OWN) open.
+	 */
+	Assert(MyProcPid == PostmasterPid);
+	if (pipe(postmaster_alive_fds))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg( "pipe() call failed to create pipe to monitor postmaster death: %m")));
+	}
+	/*
+	 * Set O_NONBLOCK to allow checking for the fd's presence with a select() call
+	 */
+	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_SETFL, O_NONBLOCK))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to set the postmaster death watching fd's flags: %m")));
+	}
+}
 
-#endif   /* WIN32 */
+/*
+ * Release postmaster death watch handle.
+ *
+ * Important: This must be called immediately after a process
+ * forks from the postmaster. Otherwise, latch clients will
+ * not wake up on postmaster death, even if they have requested
+ * to.
+ *
+ * Even some hypothetical backend that doesn't care about postmaster
+ * death has a responsibility to call this function - otherwise,
+ * some other latch client backend could wait in vain to be informed
+ * of postmaster death, because the irresponsible backend held open
+ * the ownership file descriptor and outlived the postmaster.
+ *
+ * We call this function within the fork machinery to handle all cases,
+ * so backends need not bother with it themselves.
+ */
+void
+ReleasePostmasterDeathWatchHandle(void)
+{
+	/* MyProcPid won't have been set yet */
+	Assert(PostmasterPid != getpid());
+	/* Please don't ask twice, or before initialisation */
+	Assert(postmaster_alive_fds[POSTMASTER_FD_OWN] != -1);
+	/* Release parent's ownership fd - only postmaster should hold it */
+	if (close(postmaster_alive_fds[POSTMASTER_FD_OWN]))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to close file descriptor associated with postmaster death in child process")));
+	}
+	postmaster_alive_fds[POSTMASTER_FD_OWN] = -1;
+}
+#endif
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 2b52d16..7cf6206 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -171,7 +171,7 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 		 * postmaster death regularly while waiting. Note that timeout here
 		 * does not necessarily release from loop.
 		 */
-		WaitLatch(&MyProc->waitLatch, 60000000L);
+		WaitLatch(&MyProc->waitLatch, WL_LATCH_SET | WL_TIMEOUT, 60000000L);
 
 		/* Must reset the latch before testing state. */
 		ResetLatch(&MyProc->waitLatch);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 470e6d1..27cc350 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -805,8 +805,9 @@ WalSndLoop(void)
 			}
 
 			/* Sleep */
-			WaitLatchOrSocket(&MyWalSnd->latch, MyProcPort->sock,
-							  true, pq_is_send_pending(),
+			WaitLatchOrSocket(&MyWalSnd->latch,
+							  WL_LATCH_SET | WL_SOCKET_READABLE | (pq_is_send_pending()? WL_SOCKET_WRITEABLE:0) |  WL_TIMEOUT,
+							  MyProcPort->sock,
 							  sleeptime * 1000L);
 
 			/* Check for replication timeout */
diff --git a/src/include/postmaster/fork_process.h b/src/include/postmaster/fork_process.h
index 0553fd2..e0abe5d 100644
--- a/src/include/postmaster/fork_process.h
+++ b/src/include/postmaster/fork_process.h
@@ -12,6 +12,8 @@
 #ifndef FORK_PROCESS_H
 #define FORK_PROCESS_H
 
-extern pid_t fork_process(void);
+extern pid_t do_fork_process(bool remain_postmaster);
+#define fork_process() do_fork_process(false)
+#define fork_process_remain_postmaster() do_fork_process(true)
 
 #endif   /* FORK_PROCESS_H */
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index 25cc84a..497cf51 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -33,6 +33,25 @@ extern bool restart_after_crash;
 
 #ifdef WIN32
 extern HANDLE PostmasterHandle;
+#else
+/*
+ * Constants that represent which of a pair of fds given
+ * to pipe() is watched and owned in the context of
+ * dealing with postmaster death
+ */
+#define POSTMASTER_FD_WATCH 0
+#define POSTMASTER_FD_OWN 1
+extern int postmaster_alive_fds[2];
+/*
+ * On unix, it is necessary to Init monitoring
+ * of postmaster being alive
+ */
+extern void InitPostmasterDeathWatchHandle(void);
+/*
+ * It is also necessary to call ReleasePostmasterDeathWatchHandle()
+ * after forking from PM for the Unix implementation
+ */
+extern void ReleasePostmasterDeathWatchHandle(void);
 #endif
 
 extern const char *progname;
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 03ec071..6865ac7 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -38,9 +38,8 @@ extern void InitLatch(volatile Latch *latch);
 extern void InitSharedLatch(volatile Latch *latch);
 extern void OwnLatch(volatile Latch *latch);
 extern void DisownLatch(volatile Latch *latch);
-extern bool WaitLatch(volatile Latch *latch, long timeout);
-extern int WaitLatchOrSocket(volatile Latch *latch, pgsocket sock,
-				  bool forRead, bool forWrite, long timeout);
+extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout);
+extern int WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout);
 extern void SetLatch(volatile Latch *latch);
 extern void ResetLatch(volatile Latch *latch);
 
@@ -56,4 +55,11 @@ extern void latch_sigusr1_handler(void);
 #define latch_sigusr1_handler()
 #endif
 
+/* Bitmasks for events that may wake-up WaitLatch() clients */
+#define WL_LATCH_SET         (1 << 0)
+#define WL_SOCKET_READABLE   (1 << 1)
+#define WL_SOCKET_WRITEABLE  (1 << 2)
+#define WL_TIMEOUT           (1 << 3)
+#define WL_POSTMASTER_DEATH  (1 << 4)
+
 #endif   /* LATCH_H */
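
The WL_* bitmask above changes how callers interpret the return value of WaitLatch(): it is now a set of bits rather than a bool or a small integer code. As a rough illustration only, here is a minimal sketch of how a caller might pick the most urgent reason out of a returned mask. The helper name first_wake_reason() and its priority order are assumptions made for this example and are not part of the patch; only the WL_* values are taken from it.

```c
#include <assert.h>

/* Same bit values as the WL_* definitions in the patch's latch.h */
#define WL_LATCH_SET         (1 << 0)
#define WL_SOCKET_READABLE   (1 << 1)
#define WL_SOCKET_WRITEABLE  (1 << 2)
#define WL_TIMEOUT           (1 << 3)
#define WL_POSTMASTER_DEATH  (1 << 4)

/*
 * Hypothetical helper (not in the patch): return the single most urgent
 * bit from a WaitLatch()-style result, or 0 if no bit is set.
 * Postmaster death is checked first, since nothing else matters once the
 * parent is gone; the rest of the ordering is an arbitrary choice.
 */
int
first_wake_reason(int wakeEvents, int result)
{
	/* Assumption: only events that were requested are ever reported */
	assert((result & ~wakeEvents) == 0);

	if (result & WL_POSTMASTER_DEATH)
		return WL_POSTMASTER_DEATH;
	if (result & WL_LATCH_SET)
		return WL_LATCH_SET;
	if (result & WL_SOCKET_READABLE)
		return WL_SOCKET_READABLE;
	if (result & WL_SOCKET_WRITEABLE)
		return WL_SOCKET_WRITEABLE;
	if (result & WL_TIMEOUT)
		return WL_TIMEOUT;
	return 0;
}
```

A caller such as the archiver loop would pass the same mask it gave to WaitLatch() as wakeEvents, and test for WL_POSTMASTER_DEATH before doing any further work.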
#23Fujii Masao
masao.fujii@gmail.com
In reply to: Peter Geoghegan (#22)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On Sat, Jun 25, 2011 at 10:41 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

Attached is a patch that addresses Fujii's third and most recent set of concerns.

Thanks for updating the patch!

I think that Heikki is currently taking another look at my work,
because he indicates in a new message to the list a short time ago
that while reviewing my patch, he realised that there may be an
independent problem with silent_mode. I will wait for his remarks
before producing another version of the patch that incorporates those
two small changes.

Yes, we should wait for the comments from Heikki. But I have some other
comments:

InitPostmasterDeathWatchHandle() can be a static function because it's
used only in postmaster.c.

ReleasePostmasterDeathWatchHandle() can call ereport(FATAL) before
StartChildProcess() or BackendStartup() calls on_exit_reset() and resets
MyProcPid. This looks unsafe. If that ereport(FATAL) is ever reached,
a process other than the postmaster would run the postmaster's
proc-exit handlers. That ereport(FATAL) would also report the wrong PID
when %p is specified in log_line_prefix. What about closing the pipe in
ClosePostmasterPorts() and removing ReleasePostmasterDeathWatchHandle()?
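
The hazard described above has a close analogue in plain POSIX: exit handlers registered before fork() are inherited by the child, and a child that terminates through exit() runs them. The following is a minimal sketch of that analogue only. It uses atexit() rather than PostgreSQL's own proc-exit callback list, and the names parent_only_cleanup() and child_runs_parent_handler() are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int	report_fd = -1;		/* where the handler leaves its mark */

/* Stands in for a cleanup callback that only the parent should ever run */
static void
parent_only_cleanup(void)
{
	char		mark = 'X';

	if (report_fd >= 0 && write(report_fd, &mark, 1) != 1)
		return;					/* nothing useful to do on failure */
}

/*
 * Fork a child that terminates via exit() (use_exit != 0) or _exit().
 * Returns 1 if the child ran the handler inherited from the parent,
 * 0 if it did not, -1 on error.
 */
int
child_runs_parent_handler(int use_exit)
{
	static int	registered = 0;
	int			fds[2];
	pid_t		pid;
	char		mark;
	ssize_t		n;

	if (!registered)
	{
		if (atexit(parent_only_cleanup) != 0)
			return -1;
		registered = 1;
	}
	if (pipe(fds) != 0)
		return -1;

	report_fd = fds[1];
	pid = fork();
	if (pid < 0)
	{
		report_fd = -1;
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0)
	{
		close(fds[0]);
		if (use_exit)
			exit(1);			/* runs parent_only_cleanup() in the child */
		_exit(1);				/* skips all atexit handlers */
	}

	/* Parent: disarm the handler for ourselves, then look for the mark */
	report_fd = -1;
	close(fds[1]);
	n = read(fds[0], &mark, 1);	/* returns 0 (EOF) if nothing was written */
	close(fds[0]);
	waitpid(pid, NULL, 0);
	return (n == 1) ? 1 : 0;
}
```

In this sketch the child that calls exit() runs the inherited handler and the one that calls _exit() does not. This is why an error path that can fire in the child before the inherited callbacks are cleared is risky.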

+	/*
+	 * Set O_NONBLOCK to allow checking for the fd's presence with a select() call
+	 */
+	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_SETFL, O_NONBLOCK))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to set the postmaster death watching fd's flags: %m")));
+	}

I don't think that the pipe fd needs to be set to non-blocking mode
since we don't read or write on it.
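
To illustrate the point, here is a minimal standalone sketch (not PostgreSQL code) in which a child waits in select() on the read end of an ordinary blocking pipe and is woken when the process holding the write end exits. The function name select_wakes_on_parent_death(), the 100 ms delay and the 5 second safety timeout are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if a "backend" blocked in select() on the read end of a plain
 * blocking pipe was woken by the death of its "postmaster", 0 otherwise.
 */
int
select_wakes_on_parent_death(void)
{
	int			report[2];		/* backend -> caller: outcome byte */
	pid_t		pm;
	char		outcome = 0;

	if (pipe(report) != 0)
		return 0;

	pm = fork();
	if (pm < 0)
	{
		close(report[0]);
		close(report[1]);
		return 0;
	}
	if (pm == 0)
	{
		/* "Postmaster": owns the write end of the death-watch pipe */
		int			alive[2];
		pid_t		backend;
		struct timeval nap = {0, 100000};	/* 100 ms */

		close(report[0]);
		if (pipe(alive) != 0)
			_exit(1);
		backend = fork();
		if (backend == 0)
		{
			fd_set		input_mask;
			struct timeval tv = {5, 0};		/* safety net only */
			int			rc;

			/* Release "ownership"; only the postmaster keeps the write end */
			close(alive[1]);

			FD_ZERO(&input_mask);
			FD_SET(alive[0], &input_mask);
			rc = select(alive[0] + 1, &input_mask, NULL, NULL, &tv);

			/* EOF makes the fd readable; we never read() or write() it */
			outcome = (rc == 1 && FD_ISSET(alive[0], &input_mask)) ? 'D' : 'T';
			if (write(report[1], &outcome, 1) != 1)
				_exit(1);
			_exit(0);
		}
		/* Stay alive briefly so the backend is really waiting, then die */
		close(report[1]);
		select(0, NULL, NULL, NULL, &nap);
		_exit(0);
	}

	/* Caller: wait for the backend's verdict */
	close(report[1]);
	if (read(report[0], &outcome, 1) != 1)
		outcome = 0;
	close(report[0]);
	waitpid(pm, NULL, 0);
	return outcome == 'D';
}
```

No fcntl() call appears anywhere in the sketch: select() reports the read end as readable once every copy of the write end has been closed, whether or not the descriptor is in non-blocking mode.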

http://developer.postgresql.org/pgdocs/postgres/error-style-guide.html
According to the error style guide, I think that it's better to change the
following messages:

+ errmsg( "pipe() call failed to create pipe to monitor postmaster death: %m")));

"could not create pipe for monitoring postmaster death: %m"

+ errmsg("failed to close file descriptor associated with postmaster death in child process")));

"could not close postmaster pipe: %m"

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#24Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Fujii Masao (#23)
1 attachment(s)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 30.06.2011 09:36, Fujii Masao wrote:

On Sat, Jun 25, 2011 at 10:41 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

Attached is a patch that addresses Fujii's third and most recent set of concerns.

Thanks for updating the patch!

I think that Heikki is currently taking another look at my work,
because he indicates in a new message to the list a short time ago
that while reviewing my patch, he realised that there may be an
independent problem with silent_mode. I will wait for his remarks
before producing another version of the patch that incorporates those
two small changes.

Yes, we should wait for the comments from Heikki. But I have some other
comments:

Here's a WIP patch with some mostly cosmetic changes I've done so far.
I haven't tested the Windows code at all yet. It seems that no-one is
objecting to removing silent_mode altogether, so I'm going to do that
before committing this patch.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

new_latch-v7.1.patch (text/x-diff)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a7f5373..155acea 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10165,7 +10165,7 @@ retry:
 					/*
 					 * Wait for more WAL to arrive, or timeout to be reached
 					 */
-					WaitLatch(&XLogCtl->recoveryWakeupLatch, 5000000L);
+					WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 5000000L);
 					ResetLatch(&XLogCtl->recoveryWakeupLatch);
 				}
 				else
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 6dae7c9..1a2e141 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -93,6 +93,7 @@
 #endif
 
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "storage/latch.h"
 #include "storage/shmem.h"
 
@@ -179,31 +180,32 @@ DisownLatch(volatile Latch *latch)
  * Wait for given latch to be set or until timeout is exceeded.
  * If the latch is already set, the function returns immediately.
  *
- * The 'timeout' is given in microseconds, and -1 means wait forever.
- * On some platforms, signals cause the timeout to be restarted, so beware
- * that the function can sleep for several times longer than the specified
- * timeout.
+ * The 'timeout' is given in microseconds. It must be >= 0 if WL_TIMEOUT
+ * event is given, otherwise it is ignored. On some platforms, signals cause
+ * the timeout to be restarted, so beware that the function can sleep for
+ * several times longer than the specified timeout.
  *
  * The latch must be owned by the current process, ie. it must be a
  * backend-local latch initialized with InitLatch, or a shared latch
  * associated with the current process by calling OwnLatch.
  *
- * Returns 'true' if the latch was set, or 'false' if timeout was reached.
+ * Returns bit field indicating which condition(s) caused the wake-up. Note
+ * that if multiple wake-up conditions are true, there is no guarantee that
+ * we return all of them in one call, but we will return at least one.
  */
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 /*
- * Like WaitLatch, but will also return when there's data available in
- * 'sock' for reading or writing. Returns 0 if timeout was reached,
- * 1 if the latch was set, 2 if the socket became readable or writable.
+ * Like WaitLatch, but with an extra socket argument for WL_SOCKET_*
+ * conditions.
  */
 int
-WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
+				  long timeout)
 {
 	struct timeval tv,
 			   *tvp = NULL;
@@ -212,19 +214,26 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	int			rc;
 	int			result = 0;
 
+	Assert(wakeEvents != 0);
+
+	/* Ignore WL_SOCKET_* events if no valid socket is given */
+	if (sock == PGINVALID_SOCKET)
+		wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
+
 	if (latch->owner_pid != MyProcPid)
 		elog(ERROR, "cannot wait on a latch owned by another process");
 
 	/* Initialize timeout */
-	if (timeout >= 0)
+	if (wakeEvents & WL_TIMEOUT)
 	{
+		Assert(timeout >= 0);
 		tv.tv_sec = timeout / 1000000L;
 		tv.tv_usec = timeout % 1000000L;
 		tvp = &tv;
 	}
 
 	waiting = true;
-	for (;;)
+	do
 	{
 		int			hifd;
 
@@ -235,16 +244,28 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		 * do that), and the select() will return immediately.
 		 */
 		drainSelfPipe();
-		if (latch->is_set)
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			/*
+			 * Leave loop immediately, avoid blocking again. We don't attempt
+			 * to report any other events that might also be satisfied.
+			 */
 			break;
 		}
 
 		FD_ZERO(&input_mask);
 		FD_SET(selfpipe_readfd, &input_mask);
 		hifd = selfpipe_readfd;
-		if (sock != PGINVALID_SOCKET && forRead)
+
+		if (wakeEvents & WL_POSTMASTER_DEATH)
+		{
+			FD_SET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask);
+			if (postmaster_alive_fds[POSTMASTER_FD_WATCH] > hifd)
+				hifd = postmaster_alive_fds[POSTMASTER_FD_WATCH];
+		}
+
+		if (wakeEvents & WL_SOCKET_READABLE)
 		{
 			FD_SET(sock, &input_mask);
 			if (sock > hifd)
@@ -252,7 +273,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		}
 
 		FD_ZERO(&output_mask);
-		if (sock != PGINVALID_SOCKET && forWrite)
+		if (wakeEvents & WL_SOCKET_WRITEABLE)
 		{
 			FD_SET(sock, &output_mask);
 			if (sock > hifd)
@@ -268,20 +289,26 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 					(errcode_for_socket_access(),
 					 errmsg("select() failed: %m")));
 		}
-		if (rc == 0)
+		if (rc == 0 && (wakeEvents & WL_TIMEOUT))
 		{
 			/* timeout exceeded */
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
 		}
-		if (sock != PGINVALID_SOCKET &&
-			((forRead && FD_ISSET(sock, &input_mask)) ||
-			 (forWrite && FD_ISSET(sock, &output_mask))))
+		if ((wakeEvents & WL_SOCKET_READABLE) && FD_ISSET(sock, &input_mask))
 		{
-			result = 2;
-			break;				/* data available in socket */
+			/* data available in socket */
+			result |= WL_SOCKET_READABLE;
 		}
-	}
+		if ((wakeEvents & WL_SOCKET_WRITEABLE) && FD_ISSET(sock, &output_mask))
+		{
+			result |= WL_SOCKET_WRITEABLE;
+		}
+		if ((wakeEvents & WL_POSTMASTER_DEATH) &&
+			 FD_ISSET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask))
+		{
+			result |= WL_POSTMASTER_DEATH;
+		}
+	} while(result == 0);
 	waiting = false;
 
 	return result;
diff --git a/src/backend/port/win32_latch.c b/src/backend/port/win32_latch.c
index 4bcf7b7..fc97323 100644
--- a/src/backend/port/win32_latch.c
+++ b/src/backend/port/win32_latch.c
@@ -23,6 +23,7 @@
 #include <unistd.h>
 
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "replication/walsender.h"
 #include "storage/latch.h"
 #include "storage/shmem.h"
@@ -81,43 +82,66 @@ DisownLatch(volatile Latch *latch)
 	latch->owner_pid = 0;
 }
 
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 int
-WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, SOCKET sock, long timeout)
 {
 	DWORD		rc;
-	HANDLE		events[3];
+	HANDLE		events[4];
 	HANDLE		latchevent;
-	HANDLE		sockevent = WSA_INVALID_EVENT;	/* silence compiler */
+	HANDLE		sockevent = WSA_INVALID_EVENT;
 	int			numevents;
 	int			result = 0;
+	int			pmdeath_eventno;
+	long		timeout_ms;
+
+	Assert(wakeEvents != 0);
+
+	/* Ignore WL_SOCKET_* events if no valid socket is given */
+	if (sock == PGINVALID_SOCKET)
+		wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
+
+	/* Convert timeout to milliseconds for WaitForMultipleObjects() */
+	if ((wakeEvents & WL_TIMEOUT) != 0)
+	{
+		Assert(timeout >= 0);
+		timeout_ms = timeout / 1000;
+	}
+	else
+		timeout_ms = INFINITE;
 
+	/* Construct an array of event handles for WaitforMultipleObjects() */
 	latchevent = latch->event;
 
 	events[0] = latchevent;
 	events[1] = pgwin32_signal_event;
 	numevents = 2;
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (((wakeEvents & WL_SOCKET_READABLE) ||
+		 (wakeEvents & WL_SOCKET_WRITEABLE)))
 	{
 		int			flags = 0;
 
-		if (forRead)
+		if (wakeEvents & WL_SOCKET_READABLE)
 			flags |= FD_READ;
-		if (forWrite)
+		if (wakeEvents & WL_SOCKET_WRITEABLE)
 			flags |= FD_WRITE;
 
 		sockevent = WSACreateEvent();
 		WSAEventSelect(sock, sockevent, flags);
 		events[numevents++] = sockevent;
 	}
+	if (wakeEvents & WL_POSTMASTER_DEATH)
+	{
+		pmdeath_eventno = numevents;
+		events[numevents++] = PostmasterHandle;
+	}
 
-	for (;;)
+	do
 	{
 		/*
 		 * Reset the event, and check if the latch is set already. If someone
@@ -127,45 +151,64 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 		 */
 		if (!ResetEvent(latchevent))
 			elog(ERROR, "ResetEvent failed: error code %d", (int) GetLastError());
-		if (latch->is_set)
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			/*
+			 * Leave loop immediately, avoid blocking again. We don't attempt
+			 * to report any other events that might also be satisfied.
+			 */
 			break;
 		}
 
-		rc = WaitForMultipleObjects(numevents, events, FALSE,
-							   (timeout >= 0) ? (timeout / 1000) : INFINITE);
+		rc = WaitForMultipleObjects(numevents, events, FALSE, timeout_ms);
+
 		if (rc == WAIT_FAILED)
 			elog(ERROR, "WaitForMultipleObjects() failed: error code %d", (int) GetLastError());
+
+		/* Participate in the Windows signal emulation */
+		else if (rc == WAIT_OBJECT_0 + 1)
+			pgwin32_dispatch_queued_signals();
+
+		else if ((wakeEvents & WL_POSTMASTER_DEATH) &&
+			rc == WAIT_OBJECT_0 + pmdeath_eventno)
+		{
+			/* Postmaster died */
+			result |= WL_POSTMASTER_DEATH;
+		}
 		else if (rc == WAIT_TIMEOUT)
 		{
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
 		}
-		else if (rc == WAIT_OBJECT_0 + 1)
-			pgwin32_dispatch_queued_signals();
-		else if (rc == WAIT_OBJECT_0 + 2)
+		else if ((wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)) != 0 &&
+				 rc == WAIT_OBJECT_0 + 2)	/* socket is at event slot 2 */
 		{
 			WSANETWORKEVENTS resEvents;
 
-			Assert(sock != PGINVALID_SOCKET);
-
 			ZeroMemory(&resEvents, sizeof(resEvents));
 			if (WSAEnumNetworkEvents(sock, sockevent, &resEvents) == SOCKET_ERROR)
 				ereport(FATAL,
 						(errmsg_internal("failed to enumerate network events: %i", (int) GetLastError())));
 
-			if ((forRead && resEvents.lNetworkEvents & FD_READ) ||
-				(forWrite && resEvents.lNetworkEvents & FD_WRITE))
-				result = 2;
-			break;
+			if ((wakeEvents & WL_SOCKET_READABLE) &&
+				(resEvents.lNetworkEvents & FD_READ))
+			{
+				result |= WL_SOCKET_READABLE;
+			}
+			if ((wakeEvents & WL_SOCKET_WRITEABLE) &&
+				(resEvents.lNetworkEvents & FD_WRITE))
+			{
+				result |= WL_SOCKET_WRITEABLE;
+			}
 		}
+		/* Otherwise it must be the latch event */
 		else if (rc != WAIT_OBJECT_0)
 			elog(ERROR, "unexpected return code from WaitForMultipleObjects(): %d", (int) rc);
 	}
+	while(result == 0);
 
 	/* Clean up the handle we created for the socket */
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (sockevent != WSA_INVALID_EVENT)
 	{
 		WSAEventSelect(sock, sockevent, 0);
 		WSACloseEvent(sockevent);
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index b2fe9a1..db9401a 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -11,6 +11,8 @@
  */
 #include "postgres.h"
 #include "postmaster/fork_process.h"
+#include "postmaster/postmaster.h"
+
 
 #include <fcntl.h>
 #include <time.h>
@@ -19,13 +21,14 @@
 #include <unistd.h>
 
 #ifndef WIN32
+
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
  * child in the parent process.
  */
 pid_t
-fork_process(void)
+do_fork_process(bool remain_postmaster)
 {
 	pid_t		result;
 
@@ -61,6 +64,17 @@ fork_process(void)
 #ifdef LINUX_PROFILE
 		setitimer(ITIMER_PROF, &prof_itimer, NULL);
 #endif
+		/*
+		 * Usually, we're forking to create a new, distinct process. That process
+		 * should release the postmaster death watch handle, which is required by
+		 * the implementation, as described in unix_latch.c.
+		 *
+		 * Less frequently, we want to fork for some other reason (such as for
+		 * silent_mode), and the child process is intended to become the new
+		 * postmaster. It should therefore retain the death watch handle.
+		 */
+		if (!remain_postmaster)
+			ReleasePostmasterDeathWatchHandle();
 
 		/*
 		 * By default, Linux tends to kill the postmaster in out-of-memory
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index b40375a..a56fe92 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -40,6 +40,7 @@
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
+#include "storage/latch.h"
 #include "storage/pg_shmem.h"
 #include "storage/pmsignal.h"
 #include "utils/guc.h"
@@ -87,6 +88,12 @@ static volatile sig_atomic_t got_SIGTERM = false;
 static volatile sig_atomic_t wakened = false;
 static volatile sig_atomic_t ready_to_stop = false;
 
+/*
+ * Latch that archiver loop waits on until it is awakened by
+ * signals, each of which there is a handler for
+ */
+static volatile Latch mainloop_latch;
+
 /* ----------
  * Local function forward declarations
  * ----------
@@ -228,6 +235,8 @@ PgArchiverMain(int argc, char *argv[])
 
 	MyProcPid = getpid();		/* reset MyProcPid */
 
+	InitLatch(&mainloop_latch); /* initialise latch used in main loop, now that we are a subprocess */
+
 	MyStartTime = time(NULL);	/* record Start Time for logging */
 
 	/*
@@ -282,6 +291,8 @@ ArchSigHupHandler(SIGNAL_ARGS)
 {
 	/* set flag to re-read config file at next convenient time */
 	got_SIGHUP = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGTERM signal handler for archiver process */
@@ -295,6 +306,8 @@ ArchSigTermHandler(SIGNAL_ARGS)
 	 * archive commands.
 	 */
 	got_SIGTERM = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR1 signal handler for archiver process */
@@ -303,6 +316,8 @@ pgarch_waken(SIGNAL_ARGS)
 {
 	/* set flag that there is work to be done */
 	wakened = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR2 signal handler for archiver process */
@@ -311,6 +326,8 @@ pgarch_waken_stop(SIGNAL_ARGS)
 {
 	/* set flag to do a final cycle and shut down afterwards */
 	ready_to_stop = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /*
@@ -334,6 +351,13 @@ pgarch_MainLoop(void)
 
 	do
 	{
+		/*
+		 * There shouldn't be anything for the archiver to do except to wait
+		 * on a latch ... however, the archiver exists to protect our data,
+		 * so she wakes up occasionally to allow herself to be proactive.
+		 */
+		ResetLatch(&mainloop_latch);
+
 		/* When we get SIGUSR2, we do one more archive cycle, then exit */
 		time_to_stop = ready_to_stop;
 
@@ -371,25 +395,27 @@ pgarch_MainLoop(void)
 		}
 
 		/*
-		 * There shouldn't be anything for the archiver to do except to wait
-		 * for a signal ... however, the archiver exists to protect our data,
-		 * so she wakes up occasionally to allow herself to be proactive.
+		 * Wait on latch, until various signals are received, or
+		 * until a poll will be forced by PGARCH_AUTOWAKE_INTERVAL
+		 * having passed since last_copy_time, or on the postmaster's
+		 * untimely demise.
 		 *
-		 * On some platforms, signals won't interrupt the sleep.  To ensure we
-		 * respond reasonably promptly when someone signals us, break down the
-		 * sleep into 1-second increments, and check for interrupts after each
-		 * nap.
+		 * The caveat about signals resetting the timeout of
+		 * WaitLatch()/select() on some platforms can be safely disregarded,
+		 * because we handle all expected signals, and all handlers
+		 * call SetLatch() where that matters anyway
 		 */
-		while (!(wakened || ready_to_stop || got_SIGHUP ||
-				 !PostmasterIsAlive(true)))
-		{
-			time_t		curtime;
 
-			pg_usleep(1000000L);
+		if (!time_to_stop) /* Don't wait during last iteration */
+		{
+			time_t		 curtime = time(NULL);
+			unsigned int timeout_secs  = (unsigned int) PGARCH_AUTOWAKE_INTERVAL -
+					(unsigned int) (curtime - last_copy_time);
+			WaitLatch(&mainloop_latch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, timeout_secs * 1000000L);
 			curtime = time(NULL);
 			if ((unsigned int) (curtime - last_copy_time) >=
 				(unsigned int) PGARCH_AUTOWAKE_INTERVAL)
-				wakened = true;
+				wakened = true; /* wakened by timeout - this wasn't a SIGHUP, etc */
 		}
 
 		/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6572292..1ec4fda 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -443,6 +443,7 @@ typedef struct
 	HANDLE		syslogPipe[2];
 #else
 	int			syslogPipe[2];
+	int			postmaster_alive_fds[2];
 #endif
 	char		my_exec_path[MAXPGPATH];
 	char		pkglib_path[MAXPGPATH];
@@ -472,6 +473,13 @@ static void ShmemBackendArrayRemove(Backend *bn);
 #define EXIT_STATUS_0(st)  ((st) == 0)
 #define EXIT_STATUS_1(st)  (WIFEXITED(st) && WEXITSTATUS(st) == 1)
 
+/*
+ * Two file descriptors used to monitor whether the postmaster is alive.
+ * First is POSTMASTER_FD_WATCH, second is POSTMASTER_FD_OWN.
+ */
+#ifndef WIN32
+int postmaster_alive_fds[2] = { -1, -1 };
+#endif
 
 /*
  * Postmaster main entry point
@@ -998,6 +1006,12 @@ PostmasterMain(int argc, char *argv[])
 		ereport(FATAL,
 				(errmsg_internal("could not duplicate postmaster handle: error code %d",
 								 (int) GetLastError())));
+#else
+	/*
+	 * Initialise mechanism that allows waiting latch clients to wake on
+	 * postmaster death, to finish their remaining business
+	 */
+	InitPostmasterDeathWatchHandle();
 #endif
 
 	/*
@@ -1312,7 +1326,7 @@ pmdaemonize(void)
 	/*
 	 * Okay to fork.
 	 */
-	pid = fork_process();
+	pid = fork_process_remain_postmaster();
 	if (pid == (pid_t) -1)
 	{
 		write_stderr("%s: could not fork background process: %s\n",
@@ -4758,6 +4772,9 @@ save_backend_variables(BackendParameters *param, Port *port,
 
 	memcpy(&param->syslogPipe, &syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&param->postmaster_alive_fds, &postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
 	strlcpy(param->my_exec_path, my_exec_path, MAXPGPATH);
 
 	strlcpy(param->pkglib_path, pkglib_path, MAXPGPATH);
@@ -4973,6 +4990,10 @@ restore_backend_variables(BackendParameters *param, Port *port)
 
 	memcpy(&syslogPipe, &param->syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&postmaster_alive_fds, &param->postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
+
 	strlcpy(my_exec_path, param->my_exec_path, MAXPGPATH);
 
 	strlcpy(pkglib_path, param->pkglib_path, MAXPGPATH);
@@ -5088,5 +5109,79 @@ pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired)
 	/* Queue SIGCHLD signal */
 	pg_queue_signal(SIGCHLD);
 }
+#else
+/*
+ * Initialise one and only handle for monitoring postmaster death.
+ *
+ * Called once from the postmaster, so that child processes can subsequently
+ * monitor if their parent is dead. We open up an anonymous pipe, and have child
+ * processes block on a select() call that examines if the read file descriptor
+ * is ready for reading. They do so through a latch.
+ *
+ * Child processes are responsible for releasing the death watch handle, so
+ * that only the postmaster holds it, and a select() on the fd returns upon the
+ * one and only holder (the postmaster) dying.
+ *
+ * This is a trick that obviates the need for auxiliary backends to have tight
+ * polling loops where they check if the postmaster is alive. We do this because
+ * that pattern results in an excessive number of wakeups per second when idle.
+ */
+void
+InitPostmasterDeathWatchHandle(void)
+{
+	/*
+	 * Create pipe. The postmaster is deemed dead if
+	 * no process has the writing end (POSTMASTER_FD_OWN) open.
+	 */
+	Assert(MyProcPid == PostmasterPid);
+	if (pipe(postmaster_alive_fds))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg( "pipe() call failed to create pipe to monitor postmaster death: %m")));
+	}
+	/*
+	 * Set O_NONBLOCK to allow checking for the fd's presence with a select() call
+	 */
+	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_SETFL, O_NONBLOCK))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to set the postmaster death watching fd's flags: %m")));
+	}
+}
 
-#endif   /* WIN32 */
+/*
+ * Release postmaster death watch handle.
+ *
+ * Important: This must be called immediately after a process
+ * forks from the postmaster. Otherwise, latch clients will
+ * not wake up on postmaster death, even if they have requested
+ * to.
+ *
+ * Even some hypothetical backend that doesn't care about postmaster
+ * death has a responsibility to call this function - otherwise,
+ * some other latch client backend could wait in vain to be informed
+ * of postmaster death, because the irresponsible backend held open
+ * the ownership file descriptor and outlived the postmaster.
+ *
+ * We call this function within the fork machinery to handle all cases,
+ * so backends need not bother with it themselves.
+ */
+void
+ReleasePostmasterDeathWatchHandle(void)
+{
+	/* MyProcPid won't have been set yet */
+	Assert(PostmasterPid != getpid());
+	/* Please don't ask twice */
+	Assert(postmaster_alive_fds[POSTMASTER_FD_OWN] != -1);
+	/* Release parent's ownership fd - only postmaster should hold it */
+	if (close(postmaster_alive_fds[POSTMASTER_FD_OWN]))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to close file descriptor associated with postmaster death in child process")));
+	}
+	postmaster_alive_fds[POSTMASTER_FD_OWN] = -1;
+}
+#endif
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 2b52d16..7cf6206 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -171,7 +171,7 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 		 * postmaster death regularly while waiting. Note that timeout here
 		 * does not necessarily release from loop.
 		 */
-		WaitLatch(&MyProc->waitLatch, 60000000L);
+		WaitLatch(&MyProc->waitLatch, WL_LATCH_SET | WL_TIMEOUT, 60000000L);
 
 		/* Must reset the latch before testing state. */
 		ResetLatch(&MyProc->waitLatch);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 470e6d1..090b831 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -779,6 +779,7 @@ WalSndLoop(void)
 		{
 			TimestampTz finish_time = 0;
 			long		sleeptime;
+			int			wakeEvents;
 
 			/* Reschedule replication timeout */
 			if (replication_timeout > 0)
@@ -805,9 +806,11 @@ WalSndLoop(void)
 			}
 
 			/* Sleep */
-			WaitLatchOrSocket(&MyWalSnd->latch, MyProcPort->sock,
-							  true, pq_is_send_pending(),
-							  sleeptime * 1000L);
+			wakeEvents  = WL_LATCH_SET | WL_SOCKET_READABLE | WL_TIMEOUT;
+			if (pq_is_send_pending())
+				wakeEvents |= WL_SOCKET_WRITEABLE;
+			WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents,
+							  MyProcPort->sock, sleeptime * 1000L);
 
 			/* Check for replication timeout */
 			if (replication_timeout > 0 &&
diff --git a/src/include/postmaster/fork_process.h b/src/include/postmaster/fork_process.h
index 0553fd2..e0abe5d 100644
--- a/src/include/postmaster/fork_process.h
+++ b/src/include/postmaster/fork_process.h
@@ -12,6 +12,8 @@
 #ifndef FORK_PROCESS_H
 #define FORK_PROCESS_H
 
-extern pid_t fork_process(void);
+extern pid_t do_fork_process(bool remain_postmaster);
+#define fork_process() do_fork_process(false)
+#define fork_process_remain_postmaster() do_fork_process(true)
 
 #endif   /* FORK_PROCESS_H */
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index 25cc84a..497cf51 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -33,6 +33,25 @@ extern bool restart_after_crash;
 
 #ifdef WIN32
 extern HANDLE PostmasterHandle;
+#else
+/*
+ * Constants that represent which of a pair of fds given
+ * to pipe() is watched and owned in the context of
+ * dealing with postmaster death
+ */
+#define POSTMASTER_FD_WATCH 0
+#define POSTMASTER_FD_OWN 1
+extern int postmaster_alive_fds[2];
+/*
+ * On unix, it is necessary to Init monitoring
+ * of postmaster being alive
+ */
+extern void InitPostmasterDeathWatchHandle(void);
+/*
+ * It is also necessary to call ReleasePostmasterDeathWatchHandle()
+ * after forking from PM for the Unix implementation
+ */
+extern void ReleasePostmasterDeathWatchHandle(void);
 #endif
 
 extern const char *progname;
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 03ec071..6865ac7 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -38,9 +38,8 @@ extern void InitLatch(volatile Latch *latch);
 extern void InitSharedLatch(volatile Latch *latch);
 extern void OwnLatch(volatile Latch *latch);
 extern void DisownLatch(volatile Latch *latch);
-extern bool WaitLatch(volatile Latch *latch, long timeout);
-extern int WaitLatchOrSocket(volatile Latch *latch, pgsocket sock,
-				  bool forRead, bool forWrite, long timeout);
+extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout);
+extern int WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout);
 extern void SetLatch(volatile Latch *latch);
 extern void ResetLatch(volatile Latch *latch);
 
@@ -56,4 +55,11 @@ extern void latch_sigusr1_handler(void);
 #define latch_sigusr1_handler()
 #endif
 
+/* Bitmasks for events that may wake-up WaitLatch() clients */
+#define WL_LATCH_SET         (1 << 0)
+#define WL_SOCKET_READABLE   (1 << 1)
+#define WL_SOCKET_WRITEABLE  (1 << 2)
+#define WL_TIMEOUT           (1 << 3)
+#define WL_POSTMASTER_DEATH  (1 << 4)
+
 #endif   /* LATCH_H */
#25Peter Geoghegan
peter@2ndquadrant.com
In reply to: Heikki Linnakangas (#24)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 30 June 2011 08:58, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Here's a WIP patch with some mostly cosmetic changes I've done thus far. I
haven't tested the Windows code at all yet. It seems that no-one is
objecting to removing silent_mode altogether, so I'm going to do that before
committing this patch.

I'm mostly happy with the changes you've made, but I note:

Fujii is of course correct in pointing out that
InitPostmasterDeathWatchHandle() should be a static function.

s/the implementation, as described in unix_latch.c/the implementation/
- This is my mistake. I see no reason to mention the .c file. Use
ctags.

Minor niggle, but there is a little errant whitespace at the top of
fork_process.c.

(wakeEvents & WL_TIMEOUT) != 0 -- I was going to note that this was
redundant, but then I remembered that stupid MSVC warning, which I
wouldn't have seen here because I didn't use it for my Windows build
due to an infuriating issue with winflex (Our own Cygwin built version
of flex for windows). I wouldn't have expected that when it was set to
build C though. I'm not sure exactly why it isn't necessary in other
places where we're (arguably) doing the same thing.
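
A side note on the explicit `!= 0` form: once the comparison is spelled out, the mask test needs its own parentheses, because in C `!=` binds more tightly than `&`. The sketch below is only an illustration; the function names are hypothetical, and the WL_* values are copied from the latch.h hunk of the patch. The second function writes out, with explicit parentheses, how the unparenthesized expression would be parsed.

```c
#include <stdbool.h>

/* Same values as the WL_* bitmasks in the patch's latch.h */
#define WL_LATCH_SET         (1 << 0)
#define WL_SOCKET_READABLE   (1 << 1)
#define WL_SOCKET_WRITEABLE  (1 << 2)

/* Intended test: is either socket bit requested in wakeEvents? */
bool
wants_socket(int wakeEvents)
{
    return (wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)) != 0;
}

/*
 * How "wakeEvents & (A | B) != 0" is actually parsed: the comparison is
 * evaluated first and yields 1, so only bit 0 of wakeEvents is tested.
 * The parentheses are spelled out here to make that parse visible.
 */
bool
wants_socket_misparsed(int wakeEvents)
{
    return wakeEvents & ((WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE) != 0);
}
```

With these definitions, a request for WL_SOCKET_WRITEABLE alone is recognised by the first function but missed by the second, while WL_LATCH_SET alone is wrongly treated as a socket request by the second.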

On 30 June 2011 07:36, Fujii Masao <masao.fujii@gmail.com> wrote:

ReleasePostmasterDeathWatchHandle() can call ereport(FATAL) before
StartChildProcess() or BackendStartup() calls on_exit_reset() and resets
MyProcPid. This looks unsafe. If that ereport(FATAL) is unfortunately
called, a process other than postmaster would perform the postmaster's
proc-exit handlers. And that ereport(FATAL) would report wrong pid
when %p is specified in log_line_prefix. What about closing the pipe in
ClosePostmasterPorts() and removing ReleasePostmasterDeathWatchHandle()?

Hmm. That is a valid criticism. I'd rather move the
ReleasePostmasterDeathWatchHandle() call into ClosePostmasterPorts()
though.
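
Wherever the close ends up living, the underlying requirement is the same: a watcher only sees end-of-file on the read end once every copy of the write end has been closed. The following is a minimal single-process sketch of that behaviour, not code from the patch. It uses dup() to stand in for a copy of the fd inherited across fork(), and the function names are hypothetical.

```c
#include <sys/select.h>
#include <unistd.h>

/* Returns 1 if select() reports fd readable right now (zero timeout), else 0. */
static int
readable_now(int fd)
{
    fd_set          input_mask;
    struct timeval  tv = {0, 0};

    FD_ZERO(&input_mask);
    FD_SET(fd, &input_mask);
    return select(fd + 1, &input_mask, NULL, NULL, &tv) > 0 &&
        FD_ISSET(fd, &input_mask);
}

/*
 * Simulates the owner closing its write end while a second copy of that
 * write end exists. Returns 1 if the read end then shows EOF, 0 if it
 * does not, -1 on setup failure.
 */
int
eof_seen_after_owner_close(int stray_copy_released)
{
    int fds[2];
    int stray;
    int seen;

    if (pipe(fds) != 0)
        return -1;

    /* Stand-in for a write-end copy inherited by some other child */
    stray = dup(fds[1]);
    if (stray < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (stray_copy_released)
        close(stray);           /* the well-behaved case */
    close(fds[1]);              /* the "postmaster" goes away */

    seen = readable_now(fds[0]);

    if (!stray_copy_released)
        close(stray);
    close(fds[0]);
    return seen;
}
```

Calling eof_seen_after_owner_close(1) reports the EOF, whereas eof_seen_after_owner_close(0) does not, which is the "wait in vain" case the comment on ReleasePostmasterDeathWatchHandle() warns about.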

http://developer.postgresql.org/pgdocs/postgres/error-style-guide.html
According to the error style guide, I think that it's better to change the
following messages:

I don't think that the way I've phrased my error messages is
inconsistent with that style guide, except perhaps the pipe()
reference, but if you feel it's important to try and use "could not",
I have no objections.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#26Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#25)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On Thu, Jun 30, 2011 at 5:47 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

I don't think that the way I've phrased my error messages is
inconsistent with that style guide, except perhaps the pipe()
reference, but if you feel it's important to try and use "could not",
I have no objections.

I like Fujii's rephrasing - we don't usually mention the name of the
system call.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#27Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Geoghegan (#25)
1 attachment(s)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

Ok, here's a new patch, addressing the issues Fujii raised, and with a
bunch of stylistic changes of my own. Also, I committed a patch to
remove silent_mode, so the fork_process() changes are now gone. I'm
going to sleep over this and review once again tomorrow, and commit if
it still looks good to me and no-one else reports new issues.

There are two small issues left:

I don't like the names POSTMASTER_FD_WATCH and POSTMASTER_FD_OWN. At a
quick glance, it's not at all clear which is which. I couldn't come up
with better names, so for now I just added some comments to clarify
that. I would find WRITE/READ more clear, but to make sense of that you
need to know how the pipe is used. Any suggestions or opinions on that?

The BUGS section of Linux man page for select(2) says:

Under Linux, select() may report a socket file descriptor as "ready for
reading", while nevertheless a subsequent read blocks. This could for
example happen when data has arrived but upon examination has wrong
checksum and is discarded. There may be other circumstances in which a
file descriptor is spuriously reported as ready. Thus it may be safer
to use O_NONBLOCK on sockets that should not block.

So in theory, on Linux WaitLatch might sometimes incorrectly
return WL_POSTMASTER_DEATH. None of the callers check for the
WL_POSTMASTER_DEATH return code; they call PostmasterIsAlive() before
assuming the postmaster has died, so that won't affect correctness at
the moment. I doubt that scenario can even happen in our case, select()
on a pipe that is never written to. But maybe we should add an
assertion to WaitLatch to assert that if select() reports that the
postmaster pipe has been closed, PostmasterIsAlive() also returns false.
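
One possible way to cross-check a readiness report, sketched here under assumptions rather than taken from the patch: since the watch fd is non-blocking and nothing is ever written to the pipe, a read() that returns 0 confirms that every write end is closed, while -1 with EAGAIN means the owner is still there. In this demo the roles are reversed for testability (the forked child plays the pipe owner and the calling process plays the watcher), and both function names are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if read() sees EOF on the watch fd,
 * 0 otherwise. Assumes watch_fd is non-blocking and never written to.
 */
static int
owner_confirmed_dead(int watch_fd)
{
    char    c;

    return read(watch_fd, &c, 1) == 0;
}

/*
 * Returns 1 if the owner was seen alive first and its death was later
 * reported by select() and confirmed by read(); 0 otherwise; -1 on
 * setup failure.
 */
int
watch_owner_death(void)
{
    int     fds[2];
    pid_t   owner;
    int     alive_before;
    int     rc;
    int     result;

    if (pipe(fds) != 0)
        return -1;
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK) != 0 || (owner = fork()) < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (owner == 0)
    {
        /* Owner: holds the write end, naps for 100 ms, then exits */
        struct timeval nap = {0, 100000};

        close(fds[0]);
        select(0, NULL, NULL, NULL, &nap);
        _exit(0);               /* the kernel closes the write end */
    }

    close(fds[1]);              /* watcher releases its own write-end copy */
    alive_before = !owner_confirmed_dead(fds[0]);

    do
    {
        fd_set          input_mask;
        struct timeval  tv = {5, 0};

        FD_ZERO(&input_mask);
        FD_SET(fds[0], &input_mask);
        rc = select(fds[0] + 1, &input_mask, NULL, NULL, &tv);
    } while (rc < 0 && errno == EINTR);

    result = (alive_before && rc > 0 && owner_confirmed_dead(fds[0])) ? 1 : 0;

    close(fds[0]);
    waitpid(owner, NULL, 0);
    return result;
}
```

The same read-based check could back an assertion of the kind suggested above, although whether that is cheaper or clearer than calling PostmasterIsAlive() is a separate question.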

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

new_latch-v7.2.patch (text/x-diff)
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 10165,10171 **** retry:
  					/*
  					 * Wait for more WAL to arrive, or timeout to be reached
  					 */
! 					WaitLatch(&XLogCtl->recoveryWakeupLatch, 5000000L);
  					ResetLatch(&XLogCtl->recoveryWakeupLatch);
  				}
  				else
--- 10165,10171 ----
  					/*
  					 * Wait for more WAL to arrive, or timeout to be reached
  					 */
! 					WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 5000000L);
  					ResetLatch(&XLogCtl->recoveryWakeupLatch);
  				}
  				else
*** a/src/backend/port/unix_latch.c
--- b/src/backend/port/unix_latch.c
***************
*** 93,98 ****
--- 93,99 ----
  #endif
  
  #include "miscadmin.h"
+ #include "postmaster/postmaster.h"
  #include "storage/latch.h"
  #include "storage/shmem.h"
  
***************
*** 179,209 **** DisownLatch(volatile Latch *latch)
   * Wait for given latch to be set or until timeout is exceeded.
   * If the latch is already set, the function returns immediately.
   *
!  * The 'timeout' is given in microseconds, and -1 means wait forever.
!  * On some platforms, signals cause the timeout to be restarted, so beware
!  * that the function can sleep for several times longer than the specified
!  * timeout.
   *
   * The latch must be owned by the current process, ie. it must be a
   * backend-local latch initialized with InitLatch, or a shared latch
   * associated with the current process by calling OwnLatch.
   *
!  * Returns 'true' if the latch was set, or 'false' if timeout was reached.
   */
! bool
! WaitLatch(volatile Latch *latch, long timeout)
  {
! 	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
  }
  
  /*
!  * Like WaitLatch, but will also return when there's data available in
!  * 'sock' for reading or writing. Returns 0 if timeout was reached,
!  * 1 if the latch was set, 2 if the socket became readable or writable.
   */
  int
! WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
! 				  bool forWrite, long timeout)
  {
  	struct timeval tv,
  			   *tvp = NULL;
--- 180,211 ----
   * Wait for given latch to be set or until timeout is exceeded.
   * If the latch is already set, the function returns immediately.
   *
!  * The 'timeout' is given in microseconds. It must be >= 0 if WL_TIMEOUT
!  * event is given, otherwise it is ignored. On some platforms, signals cause
!  * the timeout to be restarted, so beware that the function can sleep for
!  * several times longer than the specified timeout.
   *
   * The latch must be owned by the current process, ie. it must be a
   * backend-local latch initialized with InitLatch, or a shared latch
   * associated with the current process by calling OwnLatch.
   *
!  * Returns bit field indicating which condition(s) caused the wake-up. Note
!  * that if multiple wake-up conditions are true, there is no guarantee that
!  * we return all of them in one call, but we will return at least one.
   */
! int
! WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
  {
! 	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
  }
  
  /*
!  * Like WaitLatch, but with an extra socket argument for WL_SOCKET_*
!  * conditions.
   */
  int
! WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
! 				  long timeout)
  {
  	struct timeval tv,
  			   *tvp = NULL;
***************
*** 212,230 **** WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
  	int			rc;
  	int			result = 0;
  
  	if (latch->owner_pid != MyProcPid)
  		elog(ERROR, "cannot wait on a latch owned by another process");
  
  	/* Initialize timeout */
! 	if (timeout >= 0)
  	{
  		tv.tv_sec = timeout / 1000000L;
  		tv.tv_usec = timeout % 1000000L;
  		tvp = &tv;
  	}
  
  	waiting = true;
! 	for (;;)
  	{
  		int			hifd;
  
--- 214,239 ----
  	int			rc;
  	int			result = 0;
  
+ 	Assert(wakeEvents != 0);
+ 
+ 	/* Ignore WL_SOCKET_* events if no valid socket is given */
+ 	if (sock == PGINVALID_SOCKET)
+ 		wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
+ 
  	if (latch->owner_pid != MyProcPid)
  		elog(ERROR, "cannot wait on a latch owned by another process");
  
  	/* Initialize timeout */
! 	if (wakeEvents & WL_TIMEOUT)
  	{
+ 		Assert(timeout >= 0);
  		tv.tv_sec = timeout / 1000000L;
  		tv.tv_usec = timeout % 1000000L;
  		tvp = &tv;
  	}
  
  	waiting = true;
! 	do
  	{
  		int			hifd;
  
***************
*** 235,250 **** WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
  		 * do that), and the select() will return immediately.
  		 */
  		drainSelfPipe();
! 		if (latch->is_set)
  		{
! 			result = 1;
  			break;
  		}
  
  		FD_ZERO(&input_mask);
  		FD_SET(selfpipe_readfd, &input_mask);
  		hifd = selfpipe_readfd;
! 		if (sock != PGINVALID_SOCKET && forRead)
  		{
  			FD_SET(sock, &input_mask);
  			if (sock > hifd)
--- 244,271 ----
  		 * do that), and the select() will return immediately.
  		 */
  		drainSelfPipe();
! 		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
  		{
! 			result |= WL_LATCH_SET;
! 			/*
! 			 * Leave loop immediately, avoid blocking again. We don't attempt
! 			 * to report any other events that might also be satisfied.
! 			 */
  			break;
  		}
  
  		FD_ZERO(&input_mask);
  		FD_SET(selfpipe_readfd, &input_mask);
  		hifd = selfpipe_readfd;
! 
! 		if (wakeEvents & WL_POSTMASTER_DEATH)
! 		{
! 			FD_SET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask);
! 			if (postmaster_alive_fds[POSTMASTER_FD_WATCH] > hifd)
! 				hifd = postmaster_alive_fds[POSTMASTER_FD_WATCH];
! 		}
! 
! 		if (wakeEvents & WL_SOCKET_READABLE)
  		{
  			FD_SET(sock, &input_mask);
  			if (sock > hifd)
***************
*** 252,265 **** WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
  		}
  
  		FD_ZERO(&output_mask);
! 		if (sock != PGINVALID_SOCKET && forWrite)
  		{
  			FD_SET(sock, &output_mask);
  			if (sock > hifd)
  				hifd = sock;
  		}
  
  		rc = select(hifd + 1, &input_mask, &output_mask, NULL, tvp);
  		if (rc < 0)
  		{
  			if (errno == EINTR)
--- 273,289 ----
  		}
  
  		FD_ZERO(&output_mask);
! 		if (wakeEvents & WL_SOCKET_WRITEABLE)
  		{
  			FD_SET(sock, &output_mask);
  			if (sock > hifd)
  				hifd = sock;
  		}
  
+ 		/* Sleep */
  		rc = select(hifd + 1, &input_mask, &output_mask, NULL, tvp);
+ 
+ 		/* Check return code */
  		if (rc < 0)
  		{
  			if (errno == EINTR)
***************
*** 268,287 **** WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
  					(errcode_for_socket_access(),
  					 errmsg("select() failed: %m")));
  		}
! 		if (rc == 0)
  		{
  			/* timeout exceeded */
! 			result = 0;
! 			break;
  		}
! 		if (sock != PGINVALID_SOCKET &&
! 			((forRead && FD_ISSET(sock, &input_mask)) ||
! 			 (forWrite && FD_ISSET(sock, &output_mask))))
  		{
! 			result = 2;
! 			break;				/* data available in socket */
  		}
! 	}
  	waiting = false;
  
  	return result;
--- 292,317 ----
  					(errcode_for_socket_access(),
  					 errmsg("select() failed: %m")));
  		}
! 		if (rc == 0 && (wakeEvents & WL_TIMEOUT))
  		{
  			/* timeout exceeded */
! 			result |= WL_TIMEOUT;
  		}
! 		if ((wakeEvents & WL_SOCKET_READABLE) && FD_ISSET(sock, &input_mask))
  		{
! 			/* data available in socket */
! 			result |= WL_SOCKET_READABLE;
  		}
! 		if ((wakeEvents & WL_SOCKET_WRITEABLE) && FD_ISSET(sock, &output_mask))
! 		{
! 			result |= WL_SOCKET_WRITEABLE;
! 		}
! 		if ((wakeEvents & WL_POSTMASTER_DEATH) &&
! 			 FD_ISSET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask))
! 		{
! 			result |= WL_POSTMASTER_DEATH;
! 		}
! 	} while(result == 0);
  	waiting = false;
  
  	return result;
*** a/src/backend/port/win32_latch.c
--- b/src/backend/port/win32_latch.c
***************
*** 23,28 ****
--- 23,29 ----
  #include <unistd.h>
  
  #include "miscadmin.h"
+ #include "postmaster/postmaster.h"
  #include "replication/walsender.h"
  #include "storage/latch.h"
  #include "storage/shmem.h"
***************
*** 81,123 **** DisownLatch(volatile Latch *latch)
  	latch->owner_pid = 0;
  }
  
! bool
! WaitLatch(volatile Latch *latch, long timeout)
  {
! 	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
  }
  
  int
! WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
! 				  bool forWrite, long timeout)
  {
  	DWORD		rc;
! 	HANDLE		events[3];
  	HANDLE		latchevent;
! 	HANDLE		sockevent = WSA_INVALID_EVENT;	/* silence compiler */
  	int			numevents;
  	int			result = 0;
  
  	latchevent = latch->event;
  
  	events[0] = latchevent;
  	events[1] = pgwin32_signal_event;
  	numevents = 2;
! 	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
  	{
  		int			flags = 0;
  
! 		if (forRead)
  			flags |= FD_READ;
! 		if (forWrite)
  			flags |= FD_WRITE;
  
  		sockevent = WSACreateEvent();
  		WSAEventSelect(sock, sockevent, flags);
  		events[numevents++] = sockevent;
  	}
  
! 	for (;;)
  	{
  		/*
  		 * Reset the event, and check if the latch is set already. If someone
--- 82,148 ----
  	latch->owner_pid = 0;
  }
  
! int
! WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
  {
! 	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
  }
  
  int
! WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, SOCKET sock,
! 				  long timeout)
  {
  	DWORD		rc;
! 	HANDLE		events[4];
  	HANDLE		latchevent;
! 	HANDLE		sockevent = WSA_INVALID_EVENT;
  	int			numevents;
  	int			result = 0;
+ 	int			pmdeath_eventno;
+ 	long		timeout_ms;
+ 
+ 	Assert(wakeEvents != 0);
+ 
+ 	/* Ignore WL_SOCKET_* events if no valid socket is given */
+ 	if (sock == PGINVALID_SOCKET)
+ 		wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
+ 
+ 	/* Convert timeout to milliseconds for WaitForMultipleObjects() */
+ 	if (wakeEvents & WL_TIMEOUT)
+ 	{
+ 		Assert(timeout >= 0);
+ 		timeout_ms = timeout / 1000;
+ 	}
+ 	else
+ 		timeout_ms = INFINITE;
  
+ 	/* Construct an array of event handles for WaitForMultipleObjects() */
  	latchevent = latch->event;
  
  	events[0] = latchevent;
  	events[1] = pgwin32_signal_event;
  	numevents = 2;
! 	if (((wakeEvents & WL_SOCKET_READABLE) ||
! 		 (wakeEvents & WL_SOCKET_WRITEABLE)))
  	{
  		int			flags = 0;
  
! 		if (wakeEvents & WL_SOCKET_READABLE)
  			flags |= FD_READ;
! 		if (wakeEvents & WL_SOCKET_WRITEABLE)
  			flags |= FD_WRITE;
  
  		sockevent = WSACreateEvent();
  		WSAEventSelect(sock, sockevent, flags);
  		events[numevents++] = sockevent;
  	}
+ 	if (wakeEvents & WL_POSTMASTER_DEATH)
+ 	{
+ 		pmdeath_eventno = numevents;
+ 		events[numevents++] = PostmasterHandle;
+ 	}
  
! 	do
  	{
  		/*
  		 * Reset the event, and check if the latch is set already. If someone
***************
*** 127,171 **** WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
  		 */
  		if (!ResetEvent(latchevent))
  			elog(ERROR, "ResetEvent failed: error code %d", (int) GetLastError());
! 		if (latch->is_set)
  		{
! 			result = 1;
  			break;
  		}
  
! 		rc = WaitForMultipleObjects(numevents, events, FALSE,
! 							   (timeout >= 0) ? (timeout / 1000) : INFINITE);
  		if (rc == WAIT_FAILED)
  			elog(ERROR, "WaitForMultipleObjects() failed: error code %d", (int) GetLastError());
  		else if (rc == WAIT_TIMEOUT)
  		{
! 			result = 0;
! 			break;
  		}
! 		else if (rc == WAIT_OBJECT_0 + 1)
! 			pgwin32_dispatch_queued_signals();
! 		else if (rc == WAIT_OBJECT_0 + 2)
  		{
  			WSANETWORKEVENTS resEvents;
  
- 			Assert(sock != PGINVALID_SOCKET);
- 
  			ZeroMemory(&resEvents, sizeof(resEvents));
  			if (WSAEnumNetworkEvents(sock, sockevent, &resEvents) == SOCKET_ERROR)
  				ereport(FATAL,
  						(errmsg_internal("failed to enumerate network events: %i", (int) GetLastError())));
  
! 			if ((forRead && resEvents.lNetworkEvents & FD_READ) ||
! 				(forWrite && resEvents.lNetworkEvents & FD_WRITE))
! 				result = 2;
! 			break;
  		}
  		else if (rc != WAIT_OBJECT_0)
  			elog(ERROR, "unexpected return code from WaitForMultipleObjects(): %d", (int) rc);
  	}
  
  	/* Clean up the handle we created for the socket */
! 	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
  	{
  		WSAEventSelect(sock, sockevent, 0);
  		WSACloseEvent(sockevent);
--- 152,215 ----
  		 */
  		if (!ResetEvent(latchevent))
  			elog(ERROR, "ResetEvent failed: error code %d", (int) GetLastError());
! 		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
  		{
! 			result |= WL_LATCH_SET;
! 			/*
! 			 * Leave loop immediately, avoid blocking again. We don't attempt
! 			 * to report any other events that might also be satisfied.
! 			 */
  			break;
  		}
  
! 		rc = WaitForMultipleObjects(numevents, events, FALSE, timeout_ms);
! 
  		if (rc == WAIT_FAILED)
  			elog(ERROR, "WaitForMultipleObjects() failed: error code %d", (int) GetLastError());
+ 
+ 		/* Participate in Windows signal emulation */
+ 		else if (rc == WAIT_OBJECT_0 + 1)
+ 			pgwin32_dispatch_queued_signals();
+ 
+ 		else if ((wakeEvents & WL_POSTMASTER_DEATH) &&
+ 			rc == WAIT_OBJECT_0 + pmdeath_eventno)
+ 		{
+ 			/* Postmaster died */
+ 			result |= WL_POSTMASTER_DEATH;
+ 		}
  		else if (rc == WAIT_TIMEOUT)
  		{
! 			result |= WL_TIMEOUT;
  		}
! 		else if ((wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)) != 0 &&
! 				 rc == WAIT_OBJECT_0 + 2)	/* socket is at event slot 2 */
  		{
  			WSANETWORKEVENTS resEvents;
  
  			ZeroMemory(&resEvents, sizeof(resEvents));
  			if (WSAEnumNetworkEvents(sock, sockevent, &resEvents) == SOCKET_ERROR)
  				ereport(FATAL,
  						(errmsg_internal("failed to enumerate network events: %i", (int) GetLastError())));
  
! 			if ((wakeEvents & WL_SOCKET_READABLE) &&
! 				(resEvents.lNetworkEvents & FD_READ))
! 			{
! 				result |= WL_SOCKET_READABLE;
! 			}
! 			if ((wakeEvents & WL_SOCKET_WRITEABLE) &&
! 				(resEvents.lNetworkEvents & FD_WRITE))
! 			{
! 				result |= WL_SOCKET_WRITEABLE;
! 			}
  		}
+ 		/* Otherwise it must be the latch event */
  		else if (rc != WAIT_OBJECT_0)
  			elog(ERROR, "unexpected return code from WaitForMultipleObjects(): %d", (int) rc);
  	}
+ 	while(result == 0);
  
  	/* Clean up the handle we created for the socket */
! 	if (sockevent != WSA_INVALID_EVENT)
  	{
  		WSAEventSelect(sock, sockevent, 0);
  		WSACloseEvent(sockevent);
*** a/src/backend/postmaster/pgarch.c
--- b/src/backend/postmaster/pgarch.c
***************
*** 40,45 ****
--- 40,46 ----
  #include "postmaster/postmaster.h"
  #include "storage/fd.h"
  #include "storage/ipc.h"
+ #include "storage/latch.h"
  #include "storage/pg_shmem.h"
  #include "storage/pmsignal.h"
  #include "utils/guc.h"
***************
*** 87,92 **** static volatile sig_atomic_t got_SIGTERM = false;
--- 88,99 ----
  static volatile sig_atomic_t wakened = false;
  static volatile sig_atomic_t ready_to_stop = false;
  
+ /*
+  * Latch that archiver loop waits on until it is awakened by
+  * signals, each of which there is a handler for
+  */
+ static volatile Latch mainloop_latch;
+ 
  /* ----------
   * Local function forward declarations
   * ----------
***************
*** 228,233 **** PgArchiverMain(int argc, char *argv[])
--- 235,242 ----
  
  	MyProcPid = getpid();		/* reset MyProcPid */
  
+ 	InitLatch(&mainloop_latch); /* initialise latch used in main loop, now that we are a subprocess */
+ 
  	MyStartTime = time(NULL);	/* record Start Time for logging */
  
  	/*
***************
*** 282,287 **** ArchSigHupHandler(SIGNAL_ARGS)
--- 291,298 ----
  {
  	/* set flag to re-read config file at next convenient time */
  	got_SIGHUP = true;
+ 	/* Let the waiting loop iterate */
+ 	SetLatch(&mainloop_latch);
  }
  
  /* SIGTERM signal handler for archiver process */
***************
*** 295,300 **** ArchSigTermHandler(SIGNAL_ARGS)
--- 306,313 ----
  	 * archive commands.
  	 */
  	got_SIGTERM = true;
+ 	/* Let the waiting loop iterate */
+ 	SetLatch(&mainloop_latch);
  }
  
  /* SIGUSR1 signal handler for archiver process */
***************
*** 303,308 **** pgarch_waken(SIGNAL_ARGS)
--- 316,323 ----
  {
  	/* set flag that there is work to be done */
  	wakened = true;
+ 	/* Let the waiting loop iterate */
+ 	SetLatch(&mainloop_latch);
  }
  
  /* SIGUSR2 signal handler for archiver process */
***************
*** 311,316 **** pgarch_waken_stop(SIGNAL_ARGS)
--- 326,333 ----
  {
  	/* set flag to do a final cycle and shut down afterwards */
  	ready_to_stop = true;
+ 	/* Let the waiting loop iterate */
+ 	SetLatch(&mainloop_latch);
  }
  
  /*
***************
*** 334,339 **** pgarch_MainLoop(void)
--- 351,363 ----
  
  	do
  	{
+ 		/*
+ 		 * There shouldn't be anything for the archiver to do except to wait
+ 		 * on a latch ... however, the archiver exists to protect our data,
+ 		 * so she wakes up occasionally to allow herself to be proactive.
+ 		 */
+ 		ResetLatch(&mainloop_latch);
+ 
  		/* When we get SIGUSR2, we do one more archive cycle, then exit */
  		time_to_stop = ready_to_stop;
  
***************
*** 371,395 **** pgarch_MainLoop(void)
  		}
  
  		/*
! 		 * There shouldn't be anything for the archiver to do except to wait
! 		 * for a signal ... however, the archiver exists to protect our data,
! 		 * so she wakes up occasionally to allow herself to be proactive.
  		 *
! 		 * On some platforms, signals won't interrupt the sleep.  To ensure we
! 		 * respond reasonably promptly when someone signals us, break down the
! 		 * sleep into 1-second increments, and check for interrupts after each
! 		 * nap.
  		 */
- 		while (!(wakened || ready_to_stop || got_SIGHUP ||
- 				 !PostmasterIsAlive(true)))
- 		{
- 			time_t		curtime;
  
! 			pg_usleep(1000000L);
  			curtime = time(NULL);
  			if ((unsigned int) (curtime - last_copy_time) >=
  				(unsigned int) PGARCH_AUTOWAKE_INTERVAL)
! 				wakened = true;
  		}
  
  		/*
--- 395,421 ----
  		}
  
  		/*
! 		 * Wait on latch, until various signals are received, or
! 		 * until a poll will be forced by PGARCH_AUTOWAKE_INTERVAL
! 		 * having passed since last_copy_time, or on the postmaster's
! 		 * untimely demise.
  		 *
! 		 * The caveat about signals resetting the timeout of
! 		 * WaitLatch()/select() on some platforms can be safely disregarded,
! 		 * because we handle all expected signals, and all handlers
! 		 * call SetLatch() where that matters anyway
  		 */
  
! 		if (!time_to_stop) /* Don't wait during last iteration */
! 		{
! 			time_t		 curtime = time(NULL);
! 			unsigned int timeout_secs  = (unsigned int) PGARCH_AUTOWAKE_INTERVAL -
! 					(unsigned int) (curtime - last_copy_time);
! 			WaitLatch(&mainloop_latch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, timeout_secs * 1000000L);
  			curtime = time(NULL);
  			if ((unsigned int) (curtime - last_copy_time) >=
  				(unsigned int) PGARCH_AUTOWAKE_INTERVAL)
! 				wakened = true; /* wakened by timeout - this wasn't a SIGHUP, etc */
  		}
  
  		/*
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***************
*** 368,373 **** static int	CountChildren(int target);
--- 368,374 ----
  static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
  static pid_t StartChildProcess(AuxProcType type);
  static void StartAutovacuumWorker(void);
+ static void InitPostmasterDeathWatchHandle(void);
  
  #ifdef EXEC_BACKEND
  
***************
*** 383,390 **** typedef struct
  	HANDLE		procHandle;
  	DWORD		procId;
  } win32_deadchild_waitinfo;
- 
- HANDLE		PostmasterHandle;
  #endif
  
  static pid_t backend_forkexec(Port *port);
--- 384,389 ----
***************
*** 439,444 **** typedef struct
--- 438,444 ----
  	HANDLE		initial_signal_pipe;
  	HANDLE		syslogPipe[2];
  #else
+ 	int			postmaster_alive_fds[2];
  	int			syslogPipe[2];
  #endif
  	char		my_exec_path[MAXPGPATH];
***************
*** 469,474 **** static void ShmemBackendArrayRemove(Backend *bn);
--- 469,484 ----
  #define EXIT_STATUS_0(st)  ((st) == 0)
  #define EXIT_STATUS_1(st)  (WIFEXITED(st) && WEXITSTATUS(st) == 1)
  
+ /*
+  * File descriptors for pipe used to monitor if postmaster is alive.
+  * First is POSTMASTER_FD_WATCH, second is POSTMASTER_FD_OWN.
+  */
+ #ifndef WIN32
+ int postmaster_alive_fds[2] = { -1, -1 };
+ #else
+ /* Process handle of postmaster used for the same purpose on Windows */
+ HANDLE		PostmasterHandle;
+ #endif
  
  /*
   * Postmaster main entry point
***************
*** 962,969 **** PostmasterMain(int argc, char *argv[])
  	 */
  	BackendList = DLNewList();
  
! #ifdef WIN32
  
  	/*
  	 * Initialize I/O completion port used to deliver list of dead children.
  	 */
--- 972,984 ----
  	 */
  	BackendList = DLNewList();
  
! 	/*
! 	 * Initialize pipe (or process handle on Windows) that allows children to
! 	 * wake up from sleep on postmaster death.
! 	 */
! 	InitPostmasterDeathWatchHandle();
  
+ #ifdef WIN32
  	/*
  	 * Initialize I/O completion port used to deliver list of dead children.
  	 */
***************
*** 971,991 **** PostmasterMain(int argc, char *argv[])
  	if (win32ChildQueue == NULL)
  		ereport(FATAL,
  		   (errmsg("could not create I/O completion port for child queue")));
- 
- 	/*
- 	 * Set up a handle that child processes can use to check whether the
- 	 * postmaster is still running.
- 	 */
- 	if (DuplicateHandle(GetCurrentProcess(),
- 						GetCurrentProcess(),
- 						GetCurrentProcess(),
- 						&PostmasterHandle,
- 						0,
- 						TRUE,
- 						DUPLICATE_SAME_ACCESS) == 0)
- 		ereport(FATAL,
- 				(errmsg_internal("could not duplicate postmaster handle: error code %d",
- 								 (int) GetLastError())));
  #endif
  
  	/*
--- 986,991 ----
***************
*** 1965,1970 **** ClosePostmasterPorts(bool am_syslogger)
--- 1965,1983 ----
  {
  	int			i;
  
+ #ifndef WIN32
+ 	/*
+ 	 * Close the write end of postmaster death watch pipe. It's important to
+ 	 * do this as early as possible, so that if postmaster dies, others won't
+ 	 * think that it's still running because we're holding the pipe open.
+ 	 */
+ 	if (close(postmaster_alive_fds[POSTMASTER_FD_OWN]))
+ 		ereport(FATAL,
+ 			(errcode_for_file_access(),
+ 			 errmsg_internal("could not close postmaster death monitoring pipe in child process: %m")));
+ 	postmaster_alive_fds[POSTMASTER_FD_OWN] = -1;
+ #endif
+ 
  	/* Close the listen sockets */
  	for (i = 0; i < MAXLISTEN; i++)
  	{
***************
*** 4643,4648 **** save_backend_variables(BackendParameters *param, Port *port,
--- 4656,4664 ----
  								 pgwin32_create_signal_listener(childPid),
  								 childProcess))
  		return false;
+ #else
+ 	memcpy(&param->postmaster_alive_fds, &postmaster_alive_fds,
+ 		   sizeof(postmaster_alive_fds));
  #endif
  
  	memcpy(&param->syslogPipe, &syslogPipe, sizeof(syslogPipe));
***************
*** 4858,4863 **** restore_backend_variables(BackendParameters *param, Port *port)
--- 4874,4882 ----
  #ifdef WIN32
  	PostmasterHandle = param->PostmasterHandle;
  	pgwin32_initial_signal_pipe = param->initial_signal_pipe;
+ #else
+ 	memcpy(&postmaster_alive_fds, &param->postmaster_alive_fds,
+ 		   sizeof(postmaster_alive_fds));
  #endif
  
  	memcpy(&syslogPipe, &param->syslogPipe, sizeof(syslogPipe));
***************
*** 4979,4981 **** pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired)
--- 4998,5051 ----
  }
  
  #endif   /* WIN32 */
+ 
+ /*
+  * Initialize one and only handle for monitoring postmaster death.
+  *
+  * Called once in the postmaster, so that child processes can subsequently
+  * monitor if their parent is dead.
+  */
+ static void
+ InitPostmasterDeathWatchHandle(void)
+ {
+ #ifndef WIN32
+ 	/*
+ 	 * Create a pipe. Postmaster holds the write end of the pipe open
+ 	 * (POSTMASTER_FD_OWN), and children hold the read end. Children can
+ 	 * pass the read file descriptor to select() to wake up in case postmaster
+ 	 * dies. Children must close the write end as soon as possible after
+ 	 * forking, because EOF won't be signaled in the read end until all
+ 	 * processes have closed the write fd. ClosePostmasterPorts() takes care
+ 	 * of closing the write fd.
+ 	 */
+ 	Assert(MyProcPid == PostmasterPid);
+ 	if (pipe(postmaster_alive_fds))
+ 		ereport(FATAL,
+ 				(errcode_for_file_access(),
+ 				 errmsg_internal("could not create pipe to monitor postmaster death: %m")));
+ 
+ 	/*
+ 	 * Set O_NONBLOCK to allow testing for the fd's presence with a read()
+ 	 * call.
+ 	 */
+ 	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_SETFL, O_NONBLOCK))
+ 		ereport(FATAL,
+ 				(errcode_for_socket_access(),
+ 				 errmsg_internal("could not set postmaster death monitoring pipe to non-blocking mode: %m")));
+ 
+ #else
+ 	/*
+ 	 * On Windows, we use a process handle for the same purpose.
+ 	 */
+ 	if (DuplicateHandle(GetCurrentProcess(),
+ 						GetCurrentProcess(),
+ 						GetCurrentProcess(),
+ 						&PostmasterHandle,
+ 						0,
+ 						TRUE,
+ 						DUPLICATE_SAME_ACCESS) == 0)
+ 		ereport(FATAL,
+ 				(errmsg_internal("could not duplicate postmaster handle: error code %d",
+ 								 (int) GetLastError())));
+ #endif   /* WIN32 */
+ }
*** a/src/backend/replication/syncrep.c
--- b/src/backend/replication/syncrep.c
***************
*** 171,177 **** SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
  		 * postmaster death regularly while waiting. Note that timeout here
  		 * does not necessarily release from loop.
  		 */
! 		WaitLatch(&MyProc->waitLatch, 60000000L);
  
  		/* Must reset the latch before testing state. */
  		ResetLatch(&MyProc->waitLatch);
--- 171,177 ----
  		 * postmaster death regularly while waiting. Note that timeout here
  		 * does not necessarily release from loop.
  		 */
! 		WaitLatch(&MyProc->waitLatch, WL_LATCH_SET | WL_TIMEOUT, 60000000L);
  
  		/* Must reset the latch before testing state. */
  		ResetLatch(&MyProc->waitLatch);
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***************
*** 779,784 **** WalSndLoop(void)
--- 779,785 ----
  		{
  			TimestampTz finish_time = 0;
  			long		sleeptime;
+ 			int			wakeEvents;
  
  			/* Reschedule replication timeout */
  			if (replication_timeout > 0)
***************
*** 805,813 **** WalSndLoop(void)
  			}
  
  			/* Sleep */
! 			WaitLatchOrSocket(&MyWalSnd->latch, MyProcPort->sock,
! 							  true, pq_is_send_pending(),
! 							  sleeptime * 1000L);
  
  			/* Check for replication timeout */
  			if (replication_timeout > 0 &&
--- 806,816 ----
  			}
  
  			/* Sleep */
! 			wakeEvents  = WL_LATCH_SET | WL_SOCKET_READABLE | WL_TIMEOUT;
! 			if (pq_is_send_pending())
! 				wakeEvents |= WL_SOCKET_WRITEABLE;
! 			WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents,
! 							  MyProcPort->sock, sleeptime * 1000L);
  
  			/* Check for replication timeout */
  			if (replication_timeout > 0 &&
*** a/src/include/postmaster/postmaster.h
--- b/src/include/postmaster/postmaster.h
***************
*** 33,38 **** extern bool restart_after_crash;
--- 33,46 ----
  
  #ifdef WIN32
  extern HANDLE PostmasterHandle;
+ #else
+ extern int postmaster_alive_fds[2];
+ /*
+  * Constants that represent which of postmaster_alive_fds is held by
+  * postmaster, and which is used in children to check for postmaster death.
+  */
+ #define POSTMASTER_FD_WATCH		0	/* used in children to check for postmaster death */
+ #define POSTMASTER_FD_OWN		1	/* kept open by postmaster only */
  #endif
  
  extern const char *progname;
*** a/src/include/storage/latch.h
--- b/src/include/storage/latch.h
***************
*** 38,46 **** extern void InitLatch(volatile Latch *latch);
  extern void InitSharedLatch(volatile Latch *latch);
  extern void OwnLatch(volatile Latch *latch);
  extern void DisownLatch(volatile Latch *latch);
! extern bool WaitLatch(volatile Latch *latch, long timeout);
! extern int WaitLatchOrSocket(volatile Latch *latch, pgsocket sock,
! 				  bool forRead, bool forWrite, long timeout);
  extern void SetLatch(volatile Latch *latch);
  extern void ResetLatch(volatile Latch *latch);
  
--- 38,45 ----
  extern void InitSharedLatch(volatile Latch *latch);
  extern void OwnLatch(volatile Latch *latch);
  extern void DisownLatch(volatile Latch *latch);
! extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout);
! extern int WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout);
  extern void SetLatch(volatile Latch *latch);
  extern void ResetLatch(volatile Latch *latch);
  
***************
*** 56,59 **** extern void latch_sigusr1_handler(void);
--- 55,65 ----
  #define latch_sigusr1_handler()
  #endif
  
+ /* Bitmasks for events that may wake-up WaitLatch() clients */
+ #define WL_LATCH_SET         (1 << 0)
+ #define WL_SOCKET_READABLE   (1 << 1)
+ #define WL_SOCKET_WRITEABLE  (1 << 2)
+ #define WL_TIMEOUT           (1 << 3)
+ #define WL_POSTMASTER_DEATH  (1 << 4)
+ 
  #endif   /* LATCH_H */
#28Florian Pflug
fgp@phlo.org
In reply to: Heikki Linnakangas (#27)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On Jul4, 2011, at 17:53 , Heikki Linnakangas wrote:

Under Linux, select() may report a socket file descriptor as "ready for
reading", while nevertheless a subsequent read blocks. This could for
example happen when data has arrived but upon examination has wrong
checksum and is discarded. There may be other circumstances in which a
file descriptor is spuriously reported as ready. Thus it may be safer
to use O_NONBLOCK on sockets that should not block.

So in theory, on Linux WaitLatch might sometimes incorrectly return WL_POSTMASTER_DEATH. None of the callers check for the WL_POSTMASTER_DEATH return code; they call PostmasterIsAlive() before assuming the postmaster has died, so that won't affect correctness at the moment. I doubt that scenario can even happen in our case, select() on a pipe that is never written to. But maybe we should add an assertion to WaitLatch to assert that if select() reports that the postmaster pipe has been closed, PostmasterIsAlive() also returns false.

The correct solution would be to read() from the pipe after select()
returns, and only return WL_POSTMASTER_DEATH if the read doesn't return
EAGAIN. To prevent that read() from blocking if the read event was indeed
spurious, O_NONBLOCK must be set on the pipe, but the patch does that already.
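
A minimal sketch of that check, for illustration only: writer_is_gone() is a hypothetical helper, not part of the patch, and it assumes the descriptor passed in is the read end of the pipe with O_NONBLOCK already set. It treats only EOF as death; how to handle unexpected data or other errors is an assumption made here, not something the patch specifies.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): confirm a "readable" report
 * from select() on the death-watch pipe.  watch_fd must be the read end
 * of the pipe and must already be in O_NONBLOCK mode.
 */
bool
writer_is_gone(int watch_fd)
{
	char		c;
	ssize_t		rc;

	assert(watch_fd >= 0);

	/* Retry if a signal interrupts the read */
	do
	{
		rc = read(watch_fd, &c, 1);
	} while (rc < 0 && errno == EINTR);

	if (rc == 0)
		return true;		/* EOF: every write end has been closed */

	if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return false;		/* nothing to read: the wakeup was spurious */

	/*
	 * Data on the pipe, or some other error.  Neither is expected on a
	 * pipe that is never written to; this sketch assumes "still alive".
	 */
	return false;
}
```

A caller would invoke this only after select() has flagged the descriptor, and report postmaster death only when it returns true.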

Btw, with the death-watch / life-sign / whatever infrastructure in place,
shouldn't PostmasterIsAlive() be using that instead of getppid() / kill(0)?

best regards,
Florian Pflug

#29Peter Geoghegan
peter@2ndquadrant.com
In reply to: Heikki Linnakangas (#27)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 4 July 2011 16:53, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Ok, here's a new patch, addressing the issues Fujii raised, and with a bunch
of stylistic changes of my own. Also, I committed a patch to remove
silent_mode, so the fork_process() changes are now gone. I'm going to sleep
over this and review once again tomorrow, and commit if it still looks good
to me and no-one else reports new issues.

Looks good.

I don't like the names POSTMASTER_FD_WATCH and POSTMASTER_FD_OWN. At a quick
glance, it's not at all clear which is which. I couldn't come up with better
names, so for now I just added some comments to clarify that. I would find
WRITE/READ more clear, but to make sense of that you need to know how the pipe is
used. Any suggestions or opinions on that?

We could bikeshed about that until the cows come home, but we're not
going to come up with names that make the purpose of each evident at a
glance - it's too involved. If we could, we would have thought of them
already. Besides, I've probably already written all the client code
those macros will ever have.

On 4 July 2011 17:36, Florian Pflug <fgp@phlo.org> wrote:

On Jul4, 2011, at 17:53 , Heikki Linnakangas wrote:

      Under Linux, select() may report a socket file descriptor as "ready for
      reading",  while nevertheless a subsequent read blocks.  This could for
      example happen when data has arrived but  upon  examination  has  wrong
      checksum and is discarded.  There may be other circumstances in which a
      file descriptor is spuriously reported as ready.  Thus it may be  safer
      to use O_NONBLOCK on sockets that should not block.

So in theory, on Linux WaitLatch might sometimes incorrectly return WL_POSTMASTER_DEATH. None of the callers check for the WL_POSTMASTER_DEATH return code; they call PostmasterIsAlive() before assuming the postmaster has died, so that won't affect correctness at the moment. I doubt that scenario can even happen in our case, select() on a pipe that is never written to. But maybe we should add an assertion to WaitLatch to assert that if select() reports that the postmaster pipe has been closed, PostmasterIsAlive() also returns false.

The correct solution would be to read() from the pipe after select()
returns, and only return WL_POSTMASTER_DEATH if the read doesn't return
EAGAIN. To prevent that read() from blocking if the read event was indeed
spurious, O_NONBLOCK must be set on the pipe, but the patch does that already.

Let's have some perspective on this. We're talking about a highly
doubtful chance that latches may wake when they shouldn't. Latches are
typically expected to wake up for a variety of reasons, and if that
occurred in the archiver's case with my patch applied, I think we'd
just go asleep again without anything happening. It seems likely that
latch client code in general will never trip up on this, as long as
it's not exclusively relying on the WaitLatch() return value to report
pm death.

Maybe we should restore the return value of WaitLatch to its previous
format (so it doesn't return a bitmask)? That way, we don't report
that the Postmaster died, and therefore clients are required to call
PostmasterIsAlive() to be sure. In any case, I'm in favour of the
assertion.

Btw, with the death-watch / life-sign / whatever infrastructure in place,
shouldn't PostmasterIsAlive() be using that instead of getppid() / kill(0)?

Hmm, maybe. That seems like a separate issue though, that can be
addressed with another patch. It does have the considerable
disadvantage of making Heikki's proposed assertion useless. Is
the implementation of PostmasterIsAlive() really a problem at the
moment?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#30Florian Pflug
fgp@phlo.org
In reply to: Peter Geoghegan (#29)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On Jul4, 2011, at 23:11 , Peter Geoghegan wrote:

On 4 July 2011 17:36, Florian Pflug <fgp@phlo.org> wrote:

On Jul4, 2011, at 17:53 , Heikki Linnakangas wrote:

Under Linux, select() may report a socket file descriptor as "ready for
reading", while nevertheless a subsequent read blocks. This could for
example happen when data has arrived but upon examination has wrong
checksum and is discarded. There may be other circumstances in which a
file descriptor is spuriously reported as ready. Thus it may be safer
to use O_NONBLOCK on sockets that should not block.

So in theory, on Linux WaitLatch might sometimes incorrectly return WL_POSTMASTER_DEATH. None of the callers check for the WL_POSTMASTER_DEATH return code; they call PostmasterIsAlive() before assuming the postmaster has died, so that won't affect correctness at the moment. I doubt that scenario can even happen in our case, select() on a pipe that is never written to. But maybe we should add an assertion to WaitLatch to assert that if select() reports that the postmaster pipe has been closed, PostmasterIsAlive() also returns false.

The correct solution would be to read() from the pipe after select()
returns, and only return WL_POSTMASTER_DEATH if the read doesn't return
EAGAIN. To prevent that read() from blocking if the read event was indeed
spurious, O_NONBLOCK must be set on the pipe, but the patch does that already.

Let's have some perspective on this. We're talking about a highly
doubtful chance that latches may wake when they shouldn't.

Yeah, as long as there's just a spurious wake up, sure. However,
having WaitLatch() indicate a postmaster death in that case seems
dangerous. Some caller, sooner or later, is bound to get it wrong,
i.e. forget to re-check PostmasterIsAlive.

Maybe we should restore the return value of WaitLatch to its previous
format (so it doesn't return a bitmask)? That way, we don't report
that the Postmaster died, and therefore clients are required to call
PostmasterIsAlive() to be sure.

That'd solve the issue too.

In any case, I'm in favour of the assertion.

I don't really see the value in that assertion. It'd cause spurious
assertion failures in the case of spurious events reported by select().
If we do expect such an event, we should close the hole instead of asserting.
If we don't, then what's the point of the assert?

BTW, do we currently retry the select() on EINTR (meaning a signal has
arrived)? If we don't, that'd be an additional source of spurious returns
from select.

Btw, with the death-watch / life-sign / whatever infrastructure in place,
shouldn't PostmasterIsAlive() be using that instead of getppid() / kill(0)?

Hmm, maybe. That seems like a separate issue though, that can be
addressed with another patch. It does have the considerable
disadvantage of making Heikki's proposed assertion useless. Is
the implementation of PostmasterIsAlive() really a problem at the
moment?

I'm not sure that there is currently a guarantee that PostmasterIsAlive
will return false immediately after select() indicates postmaster
death. If e.g. the postmaster's parent is still running (which happens
for example if you launch postgres via daemontools), the re-parenting of
backends to init might not happen until the postmaster zombie has been
vanquished by its parent's call of waitpid(). It's not entirely
inconceivable for getppid() to then return the (dead) postmaster's pid
until that waitpid() call has occurred.

But agreed, this is probably best handled by a separate patch.

best regards,
Florian Pflug

#31Peter Geoghegan
peter@2ndquadrant.com
In reply to: Florian Pflug (#30)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 4 July 2011 22:42, Florian Pflug <fgp@phlo.org> wrote:

If we do expect such an event, we should close the hole instead of asserting.
If we don't, then what's the point of the assert?

You can say the same thing about any assertion. I'm not going to
attempt to close the hole because I don't believe that there is one. I
would be happy to see your "read() from the pipe after select()" test
asserted though.

BTW, do we currently retry the select() on EINTR (meaning a signal has
arrived)? If we don't, that'd be an additional source of spurious returns
from select.

Why might it be? WaitLatch() is currently documented to potentially
have its timeout invalidated by the process receiving a signal, which
is the exact opposite problem. We do account for this within the
archiver calling code though, and I remark upon it in a comment there.
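
For reference, a retry-on-EINTR wrapper around select() might look roughly like the sketch below. wait_readable_retry() is a made-up name and this is not the latch code; it only illustrates the trade-off. Because it restarts the full timeout after every interruption, a steady stream of signals can stretch the wait, which is the timeout caveat mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: wait until fd is readable or timeout_us
 * microseconds elapse.  Returns 1 if readable, 0 on timeout, and -1 on
 * any select() error other than EINTR.
 */
int
wait_readable_retry(int fd, long timeout_us)
{
	assert(fd >= 0 && fd < FD_SETSIZE && timeout_us >= 0);

	for (;;)
	{
		fd_set		rfds;
		struct timeval tv;
		int			rc;

		/* select() may modify both arguments, so rebuild them each time */
		FD_ZERO(&rfds);
		FD_SET(fd, &rfds);
		tv.tv_sec = timeout_us / 1000000L;
		tv.tv_usec = timeout_us % 1000000L;

		rc = select(fd + 1, &rfds, NULL, NULL, &tv);
		if (rc < 0 && errno == EINTR)
			continue;		/* signal arrived: retry with the full timeout */
		if (rc < 0)
			return -1;
		return (rc > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0;
	}
}
```

A caller that needs an accurate overall deadline would have to recompute the remaining time before each retry instead of reusing timeout_us.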

I'm not sure that there is currently a guarantee that PostmasterIsAlive
will return false immediately after select() indicates postmaster
death. If e.g. the postmaster's parent is still running (which happens
for example if you launch postgres via daemontools), the re-parenting of
backends to init might not happen until the postmaster zombie has been
vanquished by its parent's call of waitpid(). It's not entirely
inconceivable for getppid() to then return the (dead) postmaster's pid
until that waitpid() call has occurred.

Yes, this did occur to me - it's hard to reason about what exactly
happens here, and probably impossible to have the behaviour guaranteed
across platforms, however unlikely it seems. I'd like to hear what
Heikki has to say about asserting or otherwise verifying postmaster
death in the case of apparent postmaster death wake-up.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#32Fujii Masao
masao.fujii@gmail.com
In reply to: Florian Pflug (#28)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On Tue, Jul 5, 2011 at 1:36 AM, Florian Pflug <fgp@phlo.org> wrote:

On Jul4, 2011, at 17:53 , Heikki Linnakangas wrote:

      Under Linux, select() may report a socket file descriptor as "ready for
      reading",  while nevertheless a subsequent read blocks.  This could for
      example happen when data has arrived but  upon  examination  has  wrong
      checksum and is discarded.  There may be other circumstances in which a
      file descriptor is spuriously reported as ready.  Thus it may be  safer
      to use O_NONBLOCK on sockets that should not block.

So in theory, on Linux WaitLatch might sometimes incorrectly return WL_POSTMASTER_DEATH. None of the callers check for the WL_POSTMASTER_DEATH return code; they call PostmasterIsAlive() before assuming the postmaster has died, so that won't affect correctness at the moment. I doubt that scenario can even happen in our case, select() on a pipe that is never written to. But maybe we should add an assertion to WaitLatch to assert that if select() reports that the postmaster pipe has been closed, PostmasterIsAlive() also returns false.

The correct solution would be to read() from the pipe after select()
returns, and only return WL_POSTMASTER_DEATH if the read doesn't return
EAGAIN. To prevent that read() from blocking if the read event was indeed
spurious, O_NONBLOCK must be set on the pipe, but the patch does that already.

+1

The syslogger already works this way: it calls read() on the pipe after
select(), and if read() returns zero it concludes that EOF has arrived
and that no write-side process remains.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#33Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Florian Pflug (#30)
1 attachment(s)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 05.07.2011 00:42, Florian Pflug wrote:

On Jul4, 2011, at 23:11 , Peter Geoghegan wrote:

On 4 July 2011 17:36, Florian Pflug<fgp@phlo.org> wrote:

Btw, with the death-watch / life-sign / whatever infrastructure in place,
shouldn't PostmasterIsAlive() be using that instead of getppid() / kill(0)?

Hmm, maybe. That seems like a separate issue though, that can be
addressed with another patch. It does have the considerable
disadvantage of making Heikki's proposed assertion useless. Is
the implementation of PostmasterIsAlive() really a problem at the
moment?

I'm not sure that there is currently a guarantee that PostmasterIsAlive
will return false immediately after select() indicates postmaster
death. If e.g. the postmaster's parent is still running (which happens
for example if you launch postgres via daemontools), the re-parenting of
backends to init might not happen until the postmaster zombie has been
vanquished by its parent's call of waitpid(). It's not entirely
inconceivable for getppid() to then return the (dead) postmaster's pid
until that waitpid() call has occurred.

Good point, and testing shows that that is exactly what happens at least
on Linux (see attached test program). So, as the code stands, the
children will go into a busy loop until the grandparent calls waitpid().
That's not good.
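
The underlying effect can be reproduced in a few lines. This is not the attached forktest.c, just a sketch with hypothetical helper names (pid_exists() and probe_zombie()). It shows that a kill(pid, 0) probe still succeeds on a process that has exited but has not been reaped, which is the state a dead postmaster is in until its own parent calls waitpid().

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* kill(pid, 0) sends no signal; it only tests whether the pid exists */
bool
pid_exists(pid_t pid)
{
	return kill(pid, 0) == 0 || errno == EPERM;
}

/*
 * Fork a child that exits at once, then probe its pid before and after
 * reaping it.  Returns 0 and fills in both results, or -1 on failure.
 */
int
probe_zombie(bool *seen_before_reap, bool *seen_after_reap)
{
	siginfo_t	info;
	pid_t		pid;

	assert(seen_before_reap != NULL && seen_after_reap != NULL);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);			/* child: die immediately */

	/* Wait for the exit, but leave the zombie in place (WNOWAIT) */
	if (waitid(P_PID, pid, &info, WEXITED | WNOWAIT) != 0)
		return -1;
	*seen_before_reap = pid_exists(pid);

	/* Now reap it, as the grandparent's waitpid() eventually would */
	if (waitpid(pid, NULL, 0) != pid)
		return -1;
	*seen_after_reap = pid_exists(pid);

	return 0;
}
```

On Linux I would expect the first probe to report the pid as still present and the second, after the reap, to report it gone; other platforms may behave differently.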

In that light, I agree we should replace kill() in PostmasterIsAlive()
with read() on the pipe. It would react faster than the kill()-based
test, which seems like a good thing. Or perhaps do both, and return
false if either test says the postmaster is dead.
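
A minimal sketch of such a read()-based test, assuming a pipe whose write end is held only by the parent and to which nothing is ever written. The function names are hypothetical and this is not the code from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put 'fd' into non-blocking mode; returns 0 on success, -1 on error */
static int
set_nonblocking(int fd)
{
	int			flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Probe the read end of the pipe.  Returns 1 if a writer still exists,
 * 0 on EOF (every write end closed, so the owner is gone), and -1 on stray
 * data or an unexpected error.
 */
static int
pipe_writer_alive(int watch_fd)
{
	char		c;
	ssize_t		rc;

	assert(watch_fd >= 0);

	rc = read(watch_fd, &c, 1);
	if (rc == 0)
		return 0;				/* EOF: every write end has been closed */
	if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 1;				/* nothing to read, writer still there */
	return -1;					/* stray data or a real error */
}
```

Because the kernel closes the write end when the owning process exits, EOF shows up as soon as the process is gone, whether or not it has been reaped yet.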

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

forktest.c (text/x-csrc)

#34Peter Geoghegan
peter@2ndquadrant.com
In reply to: Heikki Linnakangas (#33)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 5 July 2011 07:49, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Good point, and testing shows that that is exactly what happens at least on
Linux (see attached test program). So, as the code stands, the children will
go into a busy loop until the grandparent calls waitpid(). That's not good.

In that light, I agree we should replace kill() in PostmasterIsAlive() with
read() on the pipe. It would react faster than the kill()-based test, which
seems like a good thing. Or perhaps do both, and return false if either test
says the postmaster is dead.

Hmm. Why assume that the opposite problem doesn't exist? What if the
kill-based test is faster than the read() on the pipe on some platform
or under some confluence of events?

I suggest that we agree on a standard for determining whether or not
the postmaster is dead and stick to it - that's already the case on
Windows. Since that standard cannot be the kill() based test, because
that would make a postmaster death aware latch implementation
impossible, it has to be the read() test proposed by Florian.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#35Peter Geoghegan
peter@2ndquadrant.com
In reply to: Peter Geoghegan (#34)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

I now think that we shouldn't change the return value format from the
most recent revisions of the patch (i.e. returning a bitfield). We
should leave it as-is, while documenting that it's possible, although
extremely unlikely, for it to incorrectly report Postmaster death, and
that clients therefore have an onus to check that themselves using
PostmasterIsAlive(). We already provide fairly weak guarantees as to
the validity of that return value ("Note that if multiple wake-up
conditions are true, there is no guarantee that we return all of them
in one call, but we will return at least one"). Making them a bit
weaker still seems acceptable.
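
A sketch of what that onus could look like in a client's event loop. The flag values are assumptions made for this example (they are not copied from latch.h), and is_alive stands in for PostmasterIsAlive(); after_wakeup() and the stubs are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values are assumptions for this sketch, not copied from latch.h */
#define WL_LATCH_SET		(1 << 0)
#define WL_TIMEOUT			(1 << 3)
#define WL_POSTMASTER_DEATH	(1 << 4)

typedef enum { KEEP_RUNNING, EXIT_NOW } LoopAction;

/*
 * Decide what an event loop should do with a wait result.  The death bit
 * is treated as a hint and is confirmed through 'is_alive' before the
 * process acts on it.
 */
static LoopAction
after_wakeup(int rc, bool (*is_alive) (void))
{
	assert(rc != 0);			/* at least one wake-up reason is reported */

	if ((rc & WL_POSTMASTER_DEATH) && !is_alive())
		return EXIT_NOW;
	return KEEP_RUNNING;		/* possibly spurious: just loop again */
}

/* Stand-ins for the real liveness check, used to exercise both paths */
static bool stub_alive(void) { return true; }
static bool stub_dead(void) { return false; }
```

A spurious death report then costs one extra loop iteration rather than an unwarranted exit.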

In addition, we'd change the implementation of PostmasterIsAlive() to
/just/ perform the read() test as already described.

I'm not concerned about the possibility of spurious extra cycles of
auxiliary process event loops - should I be?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#36Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#35)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On Thu, Jul 7, 2011 at 1:41 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

I now think that we shouldn't change the return value format from the
most recent revisions of the patch (i.e. returning a bitfield). We
should leave it as-is, while documenting that it's possible, although
extremely unlikely, for it to incorrectly report Postmaster death, and
that clients therefore have an onus to check that themselves using
PostmasterIsAlive(). We already provide fairly weak guarantees as to
the validity of that return value ("Note that if multiple wake-up
conditions are true, there is no guarantee that we return all of them
in one call, but we will return at least one"). Making them a bit
weaker still seems acceptable.

I agree - that seems like a good way to handle it.

In addition, we'd change the implementation of PostmasterIsAlive() to
/just/ perform the read() test as already described.

I'm not concerned about the possibility of spurious extra cycles of
auxiliary process event loops - should I be?

A tight loop would be bad, but an occasional spurious wake-up seems harmless.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#37Peter Geoghegan
peter@2ndquadrant.com
In reply to: Robert Haas (#36)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 7 July 2011 19:15, Robert Haas <robertmhaas@gmail.com> wrote:

I'm not concerned about the possibility of spurious extra cycles of
auxiliary process event loops - should I be?

A tight loop would be bad, but an occasional spurious wake-up seems harmless.

We should also assert !PostmasterIsAlive() from within the latch code
after waking due to apparent Postmaster death. The reason that I don't
want to follow Florian's suggestion to check it in production is that
I don't know what to do if the postmaster turns out to be alive. Why
is it more reasonable to try again than to just return? If the
spurious wake-up thing was a problem that we could actually reproduce,
then maybe I'd have an opinion on it. As it stands, our entire basis
for thinking this may be a problem is the sentence "There may be other
circumstances in which a file descriptor is spuriously reported as
ready". That seems rather flimsy.

Anyone that still has any misgivings about this will probably feel
better once the assertion is never reported to fail on any of the
diverse systems that PostgreSQL will be tested on in advance of the
9.2 release.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#38Florian Pflug
fgp@phlo.org
In reply to: Peter Geoghegan (#37)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On Jul8, 2011, at 11:57 , Peter Geoghegan wrote:

On 7 July 2011 19:15, Robert Haas <robertmhaas@gmail.com> wrote:

I'm not concerned about the possibility of spurious extra cycles of
auxiliary process event loops - should I be?

A tight loop would be bad, but an occasional spurious wake-up seems harmless.

We should also assert !PostmasterIsAlive() from within the latch code
after waking due to apparent Postmaster death. The reason that I don't
want to follow Florian's suggestion to check it in production is that
I don't know what to do if the postmaster turns out to be alive. Why
is it more reasonable to try again than to just return?

I'd say return, but don't indicate postmaster death in the return value
if PostmasterIsAlive() returns true. Or don't call PostmasterIsAlive() in
WaitLatch(), and return indicating postmaster death whenever select()
says so, and put the burden of re-checking on the callers.

I agree that retrying isn't all that reasonable.
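
The first option above (return, but leave the death bit out when a second check disagrees) could be sketched as follows. The flag value is an assumption for this example and confirm_death_bit() is a hypothetical name; the boolean argument stands in for the result of PostmasterIsAlive().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed flag value for this sketch; not copied from latch.h */
#define WL_POSTMASTER_DEATH	(1 << 4)

/*
 * Drop the postmaster-death bit from a wait result unless the second
 * check agrees that the postmaster is gone.  Other bits pass through.
 */
static int
confirm_death_bit(int result, bool postmaster_alive)
{
	assert(result >= 0);

	if ((result & WL_POSTMASTER_DEATH) && postmaster_alive)
		result &= ~WL_POSTMASTER_DEATH;
	return result;
}
```

Note that the filtered result can be zero when the death report was the only event, so a caller of such a function would have to accept an empty result as a plain spurious wake-up.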

If the
spurious wake-up thing was a problem that we could actually reproduce,
then maybe I'd have an opinion on it. As it stands, our entire basis
for thinking this may be a problem is the sentence "There may be other
circumstances in which a file descriptor is spuriously reported as
ready". That seems rather flimsy.

Flimsy or not, it pretty clearly warns us not to depend on there being
no spurious wake ups. Whether or not we know how to actually produce
them is IMHO largely irrelevant - what matters is whether the guarantees
given by select() match the expectations of our code. Which, according to
the cited passage, they currently don't.

Anyone that still has any misgivings about this will probably feel
better once the assertion is never reported to fail on any of the
diverse systems that PostgreSQL will be tested on in advance of the
9.2 release.

I'm not so convinced that WaitLatch() will get exercised much on
assert-enabled builds. But I might very well be wrong there...

best regards,
Florian Pflug

#39Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Florian Pflug (#38)
1 attachment(s)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 08.07.2011 13:58, Florian Pflug wrote:

On Jul8, 2011, at 11:57 , Peter Geoghegan wrote:

On 7 July 2011 19:15, Robert Haas <robertmhaas@gmail.com> wrote:

I'm not concerned about the possibility of spurious extra cycles of
auxiliary process event loops - should I be?

A tight loop would be bad, but an occasional spurious wake-up seems harmless.

We should also assert !PostmasterIsAlive() from within the latch code
after waking due to apparent Postmaster death. The reason that I don't
want to follow Florian's suggestion to check it in production is that
I don't know what to do if the postmaster turns out to be alive. Why
is it more reasonable to try again than to just return?

I'd say return, but don't indicate postmaster death in the return value
if PostmasterIsAlive() returns true. Or don't call PostmasterIsAlive() in
WaitLatch(), and return indicating postmaster death whenever select()
says so, and put the burden of re-checking on the callers.

I put the burden on the callers. Removing the return value from
WaitLatch() altogether just makes life unnecessarily difficult for
callers that could safely use that information, even if you sometimes
get spurious wakeups. In particular, the coding in pgarch.c is nicer if
you can simply check the return code for WL_TIMEOUT, rather than call
time(NULL) to figure out if the timeout was reached.
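
As a sketch of the arithmetic involved, assuming an interval given in whole seconds and a timeout given in microseconds: the sleep length is computed once before the wait, and a WL_TIMEOUT bit in the result then says the interval elapsed without a second clock read. The helper name remaining_sleep_usec() is hypothetical.

```c
#include <assert.h>
#include <time.h>

/*
 * Microseconds left until 'interval' seconds have passed since 'last_copy',
 * or 0 if the interval has already elapsed (the caller should then skip
 * the wait and poll at once).  Assumes a small interval, so the product
 * fits in a long.
 */
static long
remaining_sleep_usec(time_t now, time_t last_copy, int interval)
{
	long		left;

	assert(interval >= 0);

	left = (long) interval - (long) (now - last_copy);
	return left > 0 ? left * 1000000L : 0;
}
```

For example, with a 60-second interval and a last copy 30 seconds ago, the helper yields a 30-second timeout expressed in microseconds.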

Attached is a new version of this patch. PostmasterIsAlive() now uses
read() on the pipe instead of kill().

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

latch-v8.patch (text/x-diff)
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 9938,9944 **** HandleStartupProcInterrupts(void)
  	 * Emergency bailout if postmaster has died.  This is to avoid the
  	 * necessity for manual cleanup of all postmaster children.
  	 */
! 	if (IsUnderPostmaster && !PostmasterIsAlive(true))
  		exit(1);
  }
  
--- 9938,9944 ----
  	 * Emergency bailout if postmaster has died.  This is to avoid the
  	 * necessity for manual cleanup of all postmaster children.
  	 */
! 	if (IsUnderPostmaster && !PostmasterIsAlive())
  		exit(1);
  }
  
***************
*** 10165,10171 **** retry:
  					/*
  					 * Wait for more WAL to arrive, or timeout to be reached
  					 */
! 					WaitLatch(&XLogCtl->recoveryWakeupLatch, 5000000L);
  					ResetLatch(&XLogCtl->recoveryWakeupLatch);
  				}
  				else
--- 10165,10171 ----
  					/*
  					 * Wait for more WAL to arrive, or timeout to be reached
  					 */
! 					WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 5000000L);
  					ResetLatch(&XLogCtl->recoveryWakeupLatch);
  				}
  				else
*** a/src/backend/port/unix_latch.c
--- b/src/backend/port/unix_latch.c
***************
*** 93,98 ****
--- 93,99 ----
  #endif
  
  #include "miscadmin.h"
+ #include "postmaster/postmaster.h"
  #include "storage/latch.h"
  #include "storage/shmem.h"
  
***************
*** 176,209 **** DisownLatch(volatile Latch *latch)
  }
  
  /*
!  * Wait for given latch to be set or until timeout is exceeded.
!  * If the latch is already set, the function returns immediately.
   *
!  * The 'timeout' is given in microseconds, and -1 means wait forever.
!  * On some platforms, signals cause the timeout to be restarted, so beware
!  * that the function can sleep for several times longer than the specified
!  * timeout.
   *
   * The latch must be owned by the current process, ie. it must be a
   * backend-local latch initialized with InitLatch, or a shared latch
   * associated with the current process by calling OwnLatch.
   *
!  * Returns 'true' if the latch was set, or 'false' if timeout was reached.
   */
! bool
! WaitLatch(volatile Latch *latch, long timeout)
  {
! 	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
  }
  
  /*
!  * Like WaitLatch, but will also return when there's data available in
!  * 'sock' for reading or writing. Returns 0 if timeout was reached,
!  * 1 if the latch was set, 2 if the socket became readable or writable.
   */
  int
! WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
! 				  bool forWrite, long timeout)
  {
  	struct timeval tv,
  			   *tvp = NULL;
--- 177,220 ----
  }
  
  /*
!  * Wait for a given latch to be set, postmaster death, or until timeout is
!  * exceeded. 'wakeEvents' is a bitmask that specifies which of those events
!  * to wait for. If the latch is already set (and WL_LATCH_SET is given), the
!  * function returns immediately.
   *
!  * The 'timeout' is given in microseconds. It must be >= 0 if WL_TIMEOUT
!  * event is given, otherwise it is ignored. On some platforms, signals cause
!  * the timeout to be restarted, so beware that the function can sleep for
!  * several times longer than the specified timeout.
   *
   * The latch must be owned by the current process, ie. it must be a
   * backend-local latch initialized with InitLatch, or a shared latch
   * associated with the current process by calling OwnLatch.
   *
!  * Returns bit field indicating which condition(s) caused the wake-up. Note
!  * that if multiple wake-up conditions are true, there is no guarantee that
!  * we return all of them in one call, but we will return at least one. Also,
!  * according to the select(2) man page on Linux, select(2) may spuriously
!  * return and report a file descriptor as readable, when it's not. We use
!  * select(2), so WaitLatch can also spuriously claim that a socket is
!  * readable, or postmaster has died, even when none of the wake conditions
!  * have been satisfied. That should be rare in practice, but the caller
!  * should not use the return value for anything critical, re-checking the
!  * situation with PostmasterIsAlive() or read() on a socket if necessary.
   */
! int
! WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
  {
! 	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
  }
  
  /*
!  * Like WaitLatch, but with an extra socket argument for WL_SOCKET_*
!  * conditions.
   */
  int
! WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
! 				  long timeout)
  {
  	struct timeval tv,
  			   *tvp = NULL;
***************
*** 212,230 **** WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
  	int			rc;
  	int			result = 0;
  
! 	if (latch->owner_pid != MyProcPid)
  		elog(ERROR, "cannot wait on a latch owned by another process");
  
  	/* Initialize timeout */
! 	if (timeout >= 0)
  	{
  		tv.tv_sec = timeout / 1000000L;
  		tv.tv_usec = timeout % 1000000L;
  		tvp = &tv;
  	}
  
  	waiting = true;
! 	for (;;)
  	{
  		int			hifd;
  
--- 223,248 ----
  	int			rc;
  	int			result = 0;
  
! 	/* Ignore WL_SOCKET_* events if no valid socket is given */
! 	if (sock == PGINVALID_SOCKET)
! 		wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
! 
! 	Assert(wakeEvents != 0);	/* must have at least one wake event */
! 
! 	if ((wakeEvents & WL_LATCH_SET) && latch->owner_pid != MyProcPid)
  		elog(ERROR, "cannot wait on a latch owned by another process");
  
  	/* Initialize timeout */
! 	if (wakeEvents & WL_TIMEOUT)
  	{
+ 		Assert(timeout >= 0);
  		tv.tv_sec = timeout / 1000000L;
  		tv.tv_usec = timeout % 1000000L;
  		tvp = &tv;
  	}
  
  	waiting = true;
! 	do
  	{
  		int			hifd;
  
***************
*** 235,250 **** WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
  		 * do that), and the select() will return immediately.
  		 */
  		drainSelfPipe();
! 		if (latch->is_set)
  		{
! 			result = 1;
  			break;
  		}
  
  		FD_ZERO(&input_mask);
  		FD_SET(selfpipe_readfd, &input_mask);
  		hifd = selfpipe_readfd;
! 		if (sock != PGINVALID_SOCKET && forRead)
  		{
  			FD_SET(sock, &input_mask);
  			if (sock > hifd)
--- 253,280 ----
  		 * do that), and the select() will return immediately.
  		 */
  		drainSelfPipe();
! 		if ((wakeEvents & WL_LATCH_SET) && latch->is_set)
  		{
! 			result |= WL_LATCH_SET;
! 			/*
! 			 * Leave loop immediately, avoid blocking again. We don't attempt
! 			 * to report any other events that might also be satisfied.
! 			 */
  			break;
  		}
  
  		FD_ZERO(&input_mask);
  		FD_SET(selfpipe_readfd, &input_mask);
  		hifd = selfpipe_readfd;
! 
! 		if (wakeEvents & WL_POSTMASTER_DEATH)
! 		{
! 			FD_SET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask);
! 			if (postmaster_alive_fds[POSTMASTER_FD_WATCH] > hifd)
! 				hifd = postmaster_alive_fds[POSTMASTER_FD_WATCH];
! 		}
! 
! 		if (wakeEvents & WL_SOCKET_READABLE)
  		{
  			FD_SET(sock, &input_mask);
  			if (sock > hifd)
***************
*** 252,265 **** WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
  		}
  
  		FD_ZERO(&output_mask);
! 		if (sock != PGINVALID_SOCKET && forWrite)
  		{
  			FD_SET(sock, &output_mask);
  			if (sock > hifd)
  				hifd = sock;
  		}
  
  		rc = select(hifd + 1, &input_mask, &output_mask, NULL, tvp);
  		if (rc < 0)
  		{
  			if (errno == EINTR)
--- 282,298 ----
  		}
  
  		FD_ZERO(&output_mask);
! 		if (wakeEvents & WL_SOCKET_WRITEABLE)
  		{
  			FD_SET(sock, &output_mask);
  			if (sock > hifd)
  				hifd = sock;
  		}
  
+ 		/* Sleep */
  		rc = select(hifd + 1, &input_mask, &output_mask, NULL, tvp);
+ 
+ 		/* Check return code */
  		if (rc < 0)
  		{
  			if (errno == EINTR)
***************
*** 268,287 **** WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
  					(errcode_for_socket_access(),
  					 errmsg("select() failed: %m")));
  		}
! 		if (rc == 0)
  		{
  			/* timeout exceeded */
! 			result = 0;
! 			break;
  		}
! 		if (sock != PGINVALID_SOCKET &&
! 			((forRead && FD_ISSET(sock, &input_mask)) ||
! 			 (forWrite && FD_ISSET(sock, &output_mask))))
  		{
! 			result = 2;
! 			break;				/* data available in socket */
  		}
! 	}
  	waiting = false;
  
  	return result;
--- 301,326 ----
  					(errcode_for_socket_access(),
  					 errmsg("select() failed: %m")));
  		}
! 		if (rc == 0 && (wakeEvents & WL_TIMEOUT))
  		{
  			/* timeout exceeded */
! 			result |= WL_TIMEOUT;
  		}
! 		if ((wakeEvents & WL_SOCKET_READABLE) && FD_ISSET(sock, &input_mask))
  		{
! 			/* data available in socket */
! 			result |= WL_SOCKET_READABLE;
  		}
! 		if ((wakeEvents & WL_SOCKET_WRITEABLE) && FD_ISSET(sock, &output_mask))
! 		{
! 			result |= WL_SOCKET_WRITEABLE;
! 		}
! 		if ((wakeEvents & WL_POSTMASTER_DEATH) &&
! 			 FD_ISSET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask))
! 		{
! 			result |= WL_POSTMASTER_DEATH;
! 		}
! 	} while(result == 0);
  	waiting = false;
  
  	return result;
*** a/src/backend/port/win32_latch.c
--- b/src/backend/port/win32_latch.c
***************
*** 23,28 ****
--- 23,29 ----
  #include <unistd.h>
  
  #include "miscadmin.h"
+ #include "postmaster/postmaster.h"
  #include "replication/walsender.h"
  #include "storage/latch.h"
  #include "storage/shmem.h"
***************
*** 81,123 **** DisownLatch(volatile Latch *latch)
  	latch->owner_pid = 0;
  }
  
! bool
! WaitLatch(volatile Latch *latch, long timeout)
  {
! 	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
  }
  
  int
! WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
! 				  bool forWrite, long timeout)
  {
  	DWORD		rc;
! 	HANDLE		events[3];
  	HANDLE		latchevent;
! 	HANDLE		sockevent = WSA_INVALID_EVENT;	/* silence compiler */
  	int			numevents;
  	int			result = 0;
  
  	latchevent = latch->event;
  
  	events[0] = latchevent;
  	events[1] = pgwin32_signal_event;
  	numevents = 2;
! 	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
  	{
  		int			flags = 0;
  
! 		if (forRead)
  			flags |= FD_READ;
! 		if (forWrite)
  			flags |= FD_WRITE;
  
  		sockevent = WSACreateEvent();
  		WSAEventSelect(sock, sockevent, flags);
  		events[numevents++] = sockevent;
  	}
  
! 	for (;;)
  	{
  		/*
  		 * Reset the event, and check if the latch is set already. If someone
--- 82,148 ----
  	latch->owner_pid = 0;
  }
  
! int
! WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
  {
! 	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
  }
  
  int
! WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, SOCKET sock,
! 				  long timeout)
  {
  	DWORD		rc;
! 	HANDLE		events[4];
  	HANDLE		latchevent;
! 	HANDLE		sockevent = WSA_INVALID_EVENT;
  	int			numevents;
  	int			result = 0;
+ 	int			pmdeath_eventno;
+ 	long		timeout_ms;
+ 
+ 	Assert(wakeEvents != 0);
+ 
+ 	/* Ignore WL_SOCKET_* events if no valid socket is given */
+ 	if (sock == PGINVALID_SOCKET)
+ 		wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
+ 
+ 	/* Convert timeout to milliseconds for WaitForMultipleObjects() */
+ 	if (wakeEvents & WL_TIMEOUT)
+ 	{
+ 		Assert(timeout >= 0);
+ 		timeout_ms = timeout / 1000;
+ 	}
+ 	else
+ 		timeout_ms = INFINITE;
  
+ 	/* Construct an array of event handles for WaitforMultipleObjects() */
  	latchevent = latch->event;
  
  	events[0] = latchevent;
  	events[1] = pgwin32_signal_event;
  	numevents = 2;
! 	if (((wakeEvents & WL_SOCKET_READABLE) ||
! 		 (wakeEvents & WL_SOCKET_WRITEABLE)))
  	{
  		int			flags = 0;
  
! 		if (wakeEvents & WL_SOCKET_READABLE)
  			flags |= FD_READ;
! 		if (wakeEvents & WL_SOCKET_WRITEABLE)
  			flags |= FD_WRITE;
  
  		sockevent = WSACreateEvent();
  		WSAEventSelect(sock, sockevent, flags);
  		events[numevents++] = sockevent;
  	}
+ 	if (wakeEvents & WL_POSTMASTER_DEATH)
+ 	{
+ 		pmdeath_eventno = numevents;
+ 		events[numevents++] = PostmasterHandle;
+ 	}
  
! 	do
  	{
  		/*
  		 * Reset the event, and check if the latch is set already. If someone
***************
*** 127,171 **** WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
  		 */
  		if (!ResetEvent(latchevent))
  			elog(ERROR, "ResetEvent failed: error code %d", (int) GetLastError());
! 		if (latch->is_set)
  		{
! 			result = 1;
  			break;
  		}
  
! 		rc = WaitForMultipleObjects(numevents, events, FALSE,
! 							   (timeout >= 0) ? (timeout / 1000) : INFINITE);
  		if (rc == WAIT_FAILED)
  			elog(ERROR, "WaitForMultipleObjects() failed: error code %d", (int) GetLastError());
  		else if (rc == WAIT_TIMEOUT)
  		{
! 			result = 0;
! 			break;
  		}
! 		else if (rc == WAIT_OBJECT_0 + 1)
! 			pgwin32_dispatch_queued_signals();
! 		else if (rc == WAIT_OBJECT_0 + 2)
  		{
  			WSANETWORKEVENTS resEvents;
  
- 			Assert(sock != PGINVALID_SOCKET);
- 
  			ZeroMemory(&resEvents, sizeof(resEvents));
  			if (WSAEnumNetworkEvents(sock, sockevent, &resEvents) == SOCKET_ERROR)
  				ereport(FATAL,
  						(errmsg_internal("failed to enumerate network events: %i", (int) GetLastError())));
  
! 			if ((forRead && resEvents.lNetworkEvents & FD_READ) ||
! 				(forWrite && resEvents.lNetworkEvents & FD_WRITE))
! 				result = 2;
! 			break;
  		}
  		else if (rc != WAIT_OBJECT_0)
  			elog(ERROR, "unexpected return code from WaitForMultipleObjects(): %d", (int) rc);
  	}
  
  	/* Clean up the handle we created for the socket */
! 	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
  	{
  		WSAEventSelect(sock, sockevent, 0);
  		WSACloseEvent(sockevent);
--- 152,215 ----
  		 */
  		if (!ResetEvent(latchevent))
  			elog(ERROR, "ResetEvent failed: error code %d", (int) GetLastError());
! 		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
  		{
! 			result |= WL_LATCH_SET;
! 			/*
! 			 * Leave loop immediately, avoid blocking again. We don't attempt
! 			 * to report any other events that might also be satisfied.
! 			 */
  			break;
  		}
  
! 		rc = WaitForMultipleObjects(numevents, events, FALSE, timeout_ms);
! 
  		if (rc == WAIT_FAILED)
  			elog(ERROR, "WaitForMultipleObjects() failed: error code %d", (int) GetLastError());
+ 
+ 		/* Participate in Windows signal emulation */
+ 		else if (rc == WAIT_OBJECT_0 + 1)
+ 			pgwin32_dispatch_queued_signals();
+ 
+ 		else if ((wakeEvents & WL_POSTMASTER_DEATH) &&
+ 			rc == WAIT_OBJECT_0 + pmdeath_eventno)
+ 		{
+ 			/* Postmaster died */
+ 			result |= WL_POSTMASTER_DEATH;
+ 		}
  		else if (rc == WAIT_TIMEOUT)
  		{
! 			result |= WL_TIMEOUT;
  		}
! 		else if ((wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)) != 0 &&
! 				 rc == WAIT_OBJECT_0 + 2)	/* socket is at event slot 2 */
  		{
  			WSANETWORKEVENTS resEvents;
  
  			ZeroMemory(&resEvents, sizeof(resEvents));
  			if (WSAEnumNetworkEvents(sock, sockevent, &resEvents) == SOCKET_ERROR)
  				ereport(FATAL,
  						(errmsg_internal("failed to enumerate network events: %i", (int) GetLastError())));
  
! 			if ((wakeEvents & WL_SOCKET_READABLE) &&
! 				(resEvents.lNetworkEvents & FD_READ))
! 			{
! 				result |= WL_SOCKET_READABLE;
! 			}
! 			if ((wakeEvents & WL_SOCKET_WRITEABLE) &&
! 				(resEvents.lNetworkEvents & FD_WRITE))
! 			{
! 				result |= WL_SOCKET_WRITEABLE;
! 			}
  		}
+ 		/* Otherwise it must be the latch event */
  		else if (rc != WAIT_OBJECT_0)
  			elog(ERROR, "unexpected return code from WaitForMultipleObjects(): %d", (int) rc);
  	}
+ 	while(result == 0);
  
  	/* Clean up the handle we created for the socket */
! 	if (sockevent != WSA_INVALID_EVENT)
  	{
  		WSAEventSelect(sock, sockevent, 0);
  		WSACloseEvent(sockevent);
*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
***************
*** 556,562 **** AutoVacLauncherMain(int argc, char *argv[])
  		 * Emergency bailout if postmaster has died.  This is to avoid the
  		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive(true))
  			proc_exit(1);
  
  		launcher_determine_sleep((AutoVacuumShmem->av_freeWorkers != NULL),
--- 556,562 ----
  		 * Emergency bailout if postmaster has died.  This is to avoid the
  		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive())
  			proc_exit(1);
  
  		launcher_determine_sleep((AutoVacuumShmem->av_freeWorkers != NULL),
***************
*** 593,599 **** AutoVacLauncherMain(int argc, char *argv[])
  			 * Emergency bailout if postmaster has died.  This is to avoid the
  			 * necessity for manual cleanup of all postmaster children.
  			 */
! 			if (!PostmasterIsAlive(true))
  				proc_exit(1);
  
  			if (got_SIGTERM || got_SIGHUP || got_SIGUSR2)
--- 593,599 ----
  			 * Emergency bailout if postmaster has died.  This is to avoid the
  			 * necessity for manual cleanup of all postmaster children.
  			 */
! 			if (!PostmasterIsAlive())
  				proc_exit(1);
  
  			if (got_SIGTERM || got_SIGHUP || got_SIGUSR2)
*** a/src/backend/postmaster/bgwriter.c
--- b/src/backend/postmaster/bgwriter.c
***************
*** 381,387 **** BackgroundWriterMain(void)
  		 * Emergency bailout if postmaster has died.  This is to avoid the
  		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive(true))
  			exit(1);
  
  		/*
--- 381,387 ----
  		 * Emergency bailout if postmaster has died.  This is to avoid the
  		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive())
  			exit(1);
  
  		/*
*** a/src/backend/postmaster/pgarch.c
--- b/src/backend/postmaster/pgarch.c
***************
*** 40,45 ****
--- 40,46 ----
  #include "postmaster/postmaster.h"
  #include "storage/fd.h"
  #include "storage/ipc.h"
+ #include "storage/latch.h"
  #include "storage/pg_shmem.h"
  #include "storage/pmsignal.h"
  #include "utils/guc.h"
***************
*** 87,92 **** static volatile sig_atomic_t got_SIGTERM = false;
--- 88,98 ----
  static volatile sig_atomic_t wakened = false;
  static volatile sig_atomic_t ready_to_stop = false;
  
+ /*
+  * Latch used by signal handlers to wake up the sleep in the main loop.
+  */
+ static Latch mainloop_latch;
+ 
  /* ----------
   * Local function forward declarations
   * ----------
***************
*** 228,233 **** PgArchiverMain(int argc, char *argv[])
--- 234,241 ----
  
  	MyProcPid = getpid();		/* reset MyProcPid */
  
+ 	InitLatch(&mainloop_latch); /* initialize latch used in main loop */
+ 
  	MyStartTime = time(NULL);	/* record Start Time for logging */
  
  	/*
***************
*** 282,287 **** ArchSigHupHandler(SIGNAL_ARGS)
--- 290,297 ----
  {
  	/* set flag to re-read config file at next convenient time */
  	got_SIGHUP = true;
+ 	/* let the waiting loop iterate */
+ 	SetLatch(&mainloop_latch);
  }
  
  /* SIGTERM signal handler for archiver process */
***************
*** 295,300 **** ArchSigTermHandler(SIGNAL_ARGS)
--- 305,312 ----
  	 * archive commands.
  	 */
  	got_SIGTERM = true;
+ 	/* let the waiting loop iterate */
+ 	SetLatch(&mainloop_latch);
  }
  
  /* SIGUSR1 signal handler for archiver process */
***************
*** 303,308 **** pgarch_waken(SIGNAL_ARGS)
--- 315,322 ----
  {
  	/* set flag that there is work to be done */
  	wakened = true;
+ 	/* let the waiting loop iterate */
+ 	SetLatch(&mainloop_latch);
  }
  
  /* SIGUSR2 signal handler for archiver process */
***************
*** 311,316 **** pgarch_waken_stop(SIGNAL_ARGS)
--- 325,332 ----
  {
  	/* set flag to do a final cycle and shut down afterwards */
  	ready_to_stop = true;
+ 	/* let the waiting loop iterate */
+ 	SetLatch(&mainloop_latch);
  }
  
  /*
***************
*** 321,327 **** pgarch_waken_stop(SIGNAL_ARGS)
  static void
  pgarch_MainLoop(void)
  {
! 	time_t		last_copy_time = 0;
  	bool		time_to_stop;
  
  	/*
--- 337,343 ----
  static void
  pgarch_MainLoop(void)
  {
! 	pg_time_t	last_copy_time = 0;
  	bool		time_to_stop;
  
  	/*
***************
*** 332,339 **** pgarch_MainLoop(void)
--- 348,362 ----
  	 */
  	wakened = true;
  
+ 	/*
+ 	 * There shouldn't be anything for the archiver to do except to wait
+ 	 * for a signal ... however, the archiver exists to protect our data,
+ 	 * so she wakes up occasionally to allow herself to be proactive.
+ 	 */
  	do
  	{
+ 		ResetLatch(&mainloop_latch);
+ 
  		/* When we get SIGUSR2, we do one more archive cycle, then exit */
  		time_to_stop = ready_to_stop;
  
***************
*** 371,394 **** pgarch_MainLoop(void)
  		}
  
  		/*
! 		 * There shouldn't be anything for the archiver to do except to wait
! 		 * for a signal ... however, the archiver exists to protect our data,
! 		 * so she wakes up occasionally to allow herself to be proactive.
! 		 *
! 		 * On some platforms, signals won't interrupt the sleep.  To ensure we
! 		 * respond reasonably promptly when someone signals us, break down the
! 		 * sleep into 1-second increments, and check for interrupts after each
! 		 * nap.
  		 */
! 		while (!(wakened || ready_to_stop || got_SIGHUP ||
! 				 !PostmasterIsAlive(true)))
  		{
! 			time_t		curtime;
  
! 			pg_usleep(1000000L);
! 			curtime = time(NULL);
! 			if ((unsigned int) (curtime - last_copy_time) >=
! 				(unsigned int) PGARCH_AUTOWAKE_INTERVAL)
  				wakened = true;
  		}
  
--- 394,419 ----
  		}
  
  		/*
! 		 * Sleep until a signal is received, or until a poll is forced by
! 		 * PGARCH_AUTOWAKE_INTERVAL having passed since last_copy_time, or
! 		 * until postmaster dies.
  		 */
! 		if (!time_to_stop) /* Don't wait during last iteration */
  		{
! 			pg_time_t curtime = (pg_time_t) time(NULL);
! 			int		timeout;
  
! 			timeout = PGARCH_AUTOWAKE_INTERVAL - (curtime - last_copy_time);
! 			if (timeout > 0)
! 			{
! 				int rc;
! 				rc = WaitLatch(&mainloop_latch,
! 							   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
! 							   timeout * 1000000L);
! 				if (rc & WL_TIMEOUT)
! 					wakened = true;
! 			}
! 			else
  				wakened = true;
  		}
  
***************
*** 397,403 **** pgarch_MainLoop(void)
  		 * or after completing one more archiving cycle after receiving
  		 * SIGUSR2.
  		 */
! 	} while (PostmasterIsAlive(true) && !time_to_stop);
  }
  
  /*
--- 422,428 ----
  		 * or after completing one more archiving cycle after receiving
  		 * SIGUSR2.
  		 */
! 	} while (PostmasterIsAlive() && !time_to_stop);
  }
  
  /*
***************
*** 429,435 **** pgarch_ArchiverCopyLoop(void)
  			 * command, and the second is to avoid conflicts with another
  			 * archiver spawned by a newer postmaster.
  			 */
! 			if (got_SIGTERM || !PostmasterIsAlive(true))
  				return;
  
  			/*
--- 454,460 ----
  			 * command, and the second is to avoid conflicts with another
  			 * archiver spawned by a newer postmaster.
  			 */
! 			if (got_SIGTERM || !PostmasterIsAlive())
  				return;
  
  			/*
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***************
*** 3111,3117 **** PgstatCollectorMain(int argc, char *argv[])
  			 * We can only get here if the select/poll timeout elapsed. Check
  			 * for postmaster death.
  			 */
! 			if (!PostmasterIsAlive(true))
  				break;
  		}
  	}							/* end of message-processing loop */
--- 3111,3117 ----
  			 * We can only get here if the select/poll timeout elapsed. Check
  			 * for postmaster death.
  			 */
! 			if (!PostmasterIsAlive())
  				break;
  		}
  	}							/* end of message-processing loop */
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***************
*** 368,373 **** static int	CountChildren(int target);
--- 368,374 ----
  static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
  static pid_t StartChildProcess(AuxProcType type);
  static void StartAutovacuumWorker(void);
+ static void InitPostmasterDeathWatchHandle(void);
  
  #ifdef EXEC_BACKEND
  
***************
*** 383,390 **** typedef struct
  	HANDLE		procHandle;
  	DWORD		procId;
  } win32_deadchild_waitinfo;
- 
- HANDLE		PostmasterHandle;
  #endif
  
  static pid_t backend_forkexec(Port *port);
--- 384,389 ----
***************
*** 439,444 **** typedef struct
--- 438,444 ----
  	HANDLE		initial_signal_pipe;
  	HANDLE		syslogPipe[2];
  #else
+ 	int			postmaster_alive_fds[2];
  	int			syslogPipe[2];
  #endif
  	char		my_exec_path[MAXPGPATH];
***************
*** 469,474 **** static void ShmemBackendArrayRemove(Backend *bn);
--- 469,484 ----
  #define EXIT_STATUS_0(st)  ((st) == 0)
  #define EXIT_STATUS_1(st)  (WIFEXITED(st) && WEXITSTATUS(st) == 1)
  
+ #ifndef WIN32
+ /*
+  * File descriptors for pipe used to monitor if postmaster is alive.
+  * First is POSTMASTER_FD_WATCH, second is POSTMASTER_FD_OWN.
+  */
+ int postmaster_alive_fds[2] = { -1, -1 };
+ #else
+ /* Process handle of postmaster used for the same purpose on Windows */
+ HANDLE		PostmasterHandle;
+ #endif
  
  /*
   * Postmaster main entry point
***************
*** 962,969 **** PostmasterMain(int argc, char *argv[])
  	 */
  	BackendList = DLNewList();
  
! #ifdef WIN32
  
  	/*
  	 * Initialize I/O completion port used to deliver list of dead children.
  	 */
--- 972,984 ----
  	 */
  	BackendList = DLNewList();
  
! 	/*
! 	 * Initialize pipe (or process handle on Windows) that allows children to
! 	 * wake up from sleep on postmaster death.
! 	 */
! 	InitPostmasterDeathWatchHandle();
  
+ #ifdef WIN32
  	/*
  	 * Initialize I/O completion port used to deliver list of dead children.
  	 */
***************
*** 971,991 **** PostmasterMain(int argc, char *argv[])
  	if (win32ChildQueue == NULL)
  		ereport(FATAL,
  		   (errmsg("could not create I/O completion port for child queue")));
- 
- 	/*
- 	 * Set up a handle that child processes can use to check whether the
- 	 * postmaster is still running.
- 	 */
- 	if (DuplicateHandle(GetCurrentProcess(),
- 						GetCurrentProcess(),
- 						GetCurrentProcess(),
- 						&PostmasterHandle,
- 						0,
- 						TRUE,
- 						DUPLICATE_SAME_ACCESS) == 0)
- 		ereport(FATAL,
- 				(errmsg_internal("could not duplicate postmaster handle: error code %d",
- 								 (int) GetLastError())));
  #endif
  
  	/*
--- 986,991 ----
***************
*** 1965,1970 **** ClosePostmasterPorts(bool am_syslogger)
--- 1965,1983 ----
  {
  	int			i;
  
+ #ifndef WIN32
+ 	/*
+ 	 * Close the write end of postmaster death watch pipe. It's important to
+ 	 * do this as early as possible, so that if postmaster dies, others won't
+ 	 * think that it's still running because we're holding the pipe open.
+ 	 */
+ 	if (close(postmaster_alive_fds[POSTMASTER_FD_OWN]))
+ 		ereport(FATAL,
+ 			(errcode_for_file_access(),
+ 			 errmsg_internal("could not close postmaster death monitoring pipe in child process: %m")));
+ 	postmaster_alive_fds[POSTMASTER_FD_OWN] = -1;
+ #endif
+ 
  	/* Close the listen sockets */
  	for (i = 0; i < MAXLISTEN; i++)
  	{
***************
*** 4643,4648 **** save_backend_variables(BackendParameters *param, Port *port,
--- 4656,4664 ----
  								 pgwin32_create_signal_listener(childPid),
  								 childProcess))
  		return false;
+ #else
+ 	memcpy(&param->postmaster_alive_fds, &postmaster_alive_fds,
+ 		   sizeof(postmaster_alive_fds));
  #endif
  
  	memcpy(&param->syslogPipe, &syslogPipe, sizeof(syslogPipe));
***************
*** 4858,4863 **** restore_backend_variables(BackendParameters *param, Port *port)
--- 4874,4882 ----
  #ifdef WIN32
  	PostmasterHandle = param->PostmasterHandle;
  	pgwin32_initial_signal_pipe = param->initial_signal_pipe;
+ #else
+ 	memcpy(&postmaster_alive_fds, &param->postmaster_alive_fds,
+ 		   sizeof(postmaster_alive_fds));
  #endif
  
  	memcpy(&syslogPipe, &param->syslogPipe, sizeof(syslogPipe));
***************
*** 4979,4981 **** pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired)
--- 4998,5051 ----
  }
  
  #endif   /* WIN32 */
+ 
+ /*
+  * Initialize one and only handle for monitoring postmaster death.
+  *
+  * Called once in the postmaster, so that child processes can subsequently
+  * monitor if their parent is dead.
+  */
+ static void
+ InitPostmasterDeathWatchHandle(void)
+ {
+ #ifndef WIN32
+ 	/*
+ 	 * Create a pipe. Postmaster holds the write end of the pipe open
+ 	 * (POSTMASTER_FD_OWN), and children hold the read end. Children can
+ 	 * pass the read file descriptor to select() to wake up in case postmaster
+ 	 * dies, or check for postmaster death with a (read() == 0). Children must
+ 	 * close the write end as soon as possible after forking, because EOF
+ 	 * won't be signaled in the read end until all processes have closed the
+ 	 * write fd. That is taken care of in ClosePostmasterPorts().
+ 	 */
+ 	Assert(MyProcPid == PostmasterPid);
+ 	if (pipe(postmaster_alive_fds))
+ 		ereport(FATAL,
+ 				(errcode_for_file_access(),
+ 				 errmsg_internal("could not create pipe to monitor postmaster death: %m")));
+ 
+ 	/*
+ 	 * Set O_NONBLOCK to allow testing for the fd's presence with a read()
+ 	 * call.
+ 	 */
+ 	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_SETFL, O_NONBLOCK))
+ 		ereport(FATAL,
+ 				(errcode_for_socket_access(),
+ 				 errmsg_internal("could not set postmaster death monitoring pipe to non-blocking mode: %m")));
+ 
+ #else
+ 	/*
+ 	 * On Windows, we use a process handle for the same purpose.
+ 	 */
+ 	if (DuplicateHandle(GetCurrentProcess(),
+ 						GetCurrentProcess(),
+ 						GetCurrentProcess(),
+ 						&PostmasterHandle,
+ 						0,
+ 						TRUE,
+ 						DUPLICATE_SAME_ACCESS) == 0)
+ 		ereport(FATAL,
+ 				(errmsg_internal("could not duplicate postmaster handle: error code %d",
+ 								 (int) GetLastError())));
+ #endif   /* WIN32 */
+ }
*** a/src/backend/postmaster/walwriter.c
--- b/src/backend/postmaster/walwriter.c
***************
*** 227,233 **** WalWriterMain(void)
  		 * Emergency bailout if postmaster has died.  This is to avoid the
  		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive(true))
  			exit(1);
  
  		/*
--- 227,233 ----
  		 * Emergency bailout if postmaster has died.  This is to avoid the
  		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive())
  			exit(1);
  
  		/*
*** a/src/backend/replication/syncrep.c
--- b/src/backend/replication/syncrep.c
***************
*** 171,177 **** SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
  		 * postmaster death regularly while waiting. Note that timeout here
  		 * does not necessarily release from loop.
  		 */
! 		WaitLatch(&MyProc->waitLatch, 60000000L);
  
  		/* Must reset the latch before testing state. */
  		ResetLatch(&MyProc->waitLatch);
--- 171,177 ----
  		 * postmaster death regularly while waiting. Note that timeout here
  		 * does not necessarily release from loop.
  		 */
! 		WaitLatch(&MyProc->waitLatch, WL_LATCH_SET | WL_TIMEOUT, 60000000L);
  
  		/* Must reset the latch before testing state. */
  		ResetLatch(&MyProc->waitLatch);
***************
*** 239,245 **** SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
  		 * acknowledgement, because all the wal sender processes will exit. So
  		 * just bail out.
  		 */
! 		if (!PostmasterIsAlive(true))
  		{
  			ProcDiePending = true;
  			whereToSendOutput = DestNone;
--- 239,245 ----
  		 * acknowledgement, because all the wal sender processes will exit. So
  		 * just bail out.
  		 */
! 		if (!PostmasterIsAlive())
  		{
  			ProcDiePending = true;
  			whereToSendOutput = DestNone;
*** a/src/backend/replication/walreceiver.c
--- b/src/backend/replication/walreceiver.c
***************
*** 287,293 **** WalReceiverMain(void)
  		 * Emergency bailout if postmaster has died.  This is to avoid the
  		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive(true))
  			exit(1);
  
  		/*
--- 287,293 ----
  		 * Emergency bailout if postmaster has died.  This is to avoid the
  		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive())
  			exit(1);
  
  		/*
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***************
*** 212,218 **** WalSndHandshake(void)
  		 * Emergency bailout if postmaster has died.  This is to avoid the
  		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive(true))
  			exit(1);
  
  		/*
--- 212,218 ----
  		 * Emergency bailout if postmaster has died.  This is to avoid the
  		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive())
  			exit(1);
  
  		/*
***************
*** 713,719 **** WalSndLoop(void)
  		 * Emergency bailout if postmaster has died.  This is to avoid the
  		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive(true))
  			exit(1);
  
  		/* Process any requests or signals received recently */
--- 713,719 ----
  		 * Emergency bailout if postmaster has died.  This is to avoid the
  		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive())
  			exit(1);
  
  		/* Process any requests or signals received recently */
***************
*** 779,784 **** WalSndLoop(void)
--- 779,785 ----
  		{
  			TimestampTz finish_time = 0;
  			long		sleeptime;
+ 			int			wakeEvents;
  
  			/* Reschedule replication timeout */
  			if (replication_timeout > 0)
***************
*** 805,813 **** WalSndLoop(void)
  			}
  
  			/* Sleep */
! 			WaitLatchOrSocket(&MyWalSnd->latch, MyProcPort->sock,
! 							  true, pq_is_send_pending(),
! 							  sleeptime * 1000L);
  
  			/* Check for replication timeout */
  			if (replication_timeout > 0 &&
--- 806,816 ----
  			}
  
  			/* Sleep */
! 			wakeEvents  = WL_LATCH_SET | WL_SOCKET_READABLE | WL_TIMEOUT;
! 			if (pq_is_send_pending())
! 				wakeEvents |= WL_SOCKET_WRITEABLE;
! 			WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents,
! 							  MyProcPort->sock, sleeptime * 1000L);
  
  			/* Check for replication timeout */
  			if (replication_timeout > 0 &&
*** a/src/backend/storage/ipc/pmsignal.c
--- b/src/backend/storage/ipc/pmsignal.c
***************
*** 267,308 **** MarkPostmasterChildInactive(void)
  
  /*
   * PostmasterIsAlive - check whether postmaster process is still alive
-  *
-  * amDirectChild should be passed as "true" by code that knows it is
-  * executing in a direct child process of the postmaster; pass "false"
-  * if an indirect child or not sure.  The "true" case uses a faster and
-  * more reliable test, so use it when possible.
   */
  bool
! PostmasterIsAlive(bool amDirectChild)
  {
  #ifndef WIN32
! 	if (amDirectChild)
! 	{
! 		pid_t		ppid = getppid();
  
! 		/* If the postmaster is still our parent, it must be alive. */
! 		if (ppid == PostmasterPid)
  			return true;
! 
! 		/* If the init process is our parent, postmaster must be dead. */
! 		if (ppid == 1)
! 			return false;
! 
! 		/*
! 		 * If we get here, our parent process is neither the postmaster nor
! 		 * init.  This can occur on BSD and MacOS systems if a debugger has
! 		 * been attached.  We fall through to the less-reliable kill() method.
! 		 */
  	}
  
- 	/*
- 	 * Use kill() to see if the postmaster is still alive.	This can sometimes
- 	 * give a false positive result, since the postmaster's PID may get
- 	 * recycled, but it is good enough for existing uses by indirect children
- 	 * and in debugging environments.
- 	 */
- 	return (kill(PostmasterPid, 0) == 0);
  #else							/* WIN32 */
  	return (WaitForSingleObject(PostmasterHandle, 0) == WAIT_TIMEOUT);
  #endif   /* WIN32 */
--- 267,293 ----
  
  /*
   * PostmasterIsAlive - check whether postmaster process is still alive
   */
  bool
! PostmasterIsAlive(void)
  {
  #ifndef WIN32
! 	char c;
! 	ssize_t rc;
  
! 	rc = read(postmaster_alive_fds[POSTMASTER_FD_WATCH], &c, 1);
! 	if (rc < 0)
! 	{
! 		if (errno == EAGAIN || errno == EWOULDBLOCK)
  			return true;
! 		else
! 			elog(FATAL, "read on postmaster death monitoring pipe failed: %m");
  	}
+ 	else if (rc > 0)
+ 		elog(FATAL, "unexpected data in postmaster death monitoring pipe");
+ 
+ 	return false;
  
  #else							/* WIN32 */
  	return (WaitForSingleObject(PostmasterHandle, 0) == WAIT_TIMEOUT);
  #endif   /* WIN32 */
*** a/src/include/postmaster/postmaster.h
--- b/src/include/postmaster/postmaster.h
***************
*** 32,37 **** extern bool restart_after_crash;
--- 32,45 ----
  
  #ifdef WIN32
  extern HANDLE PostmasterHandle;
+ #else
+ extern int postmaster_alive_fds[2];
+ /*
+  * Constants that represent which of postmaster_alive_fds is held by
+  * postmaster, and which is used in children to check for postmaster death.
+  */
+ #define POSTMASTER_FD_WATCH		0	/* used in children to check for postmaster death */
+ #define POSTMASTER_FD_OWN		1	/* kept open by postmaster only */
  #endif
  
  extern const char *progname;
*** a/src/include/storage/latch.h
--- b/src/include/storage/latch.h
***************
*** 31,36 **** typedef struct
--- 31,43 ----
  #endif
  } Latch;
  
+ /* Bitmasks for events that may wake-up WaitLatch() clients */
+ #define WL_LATCH_SET         (1 << 0)
+ #define WL_SOCKET_READABLE   (1 << 1)
+ #define WL_SOCKET_WRITEABLE  (1 << 2)
+ #define WL_TIMEOUT           (1 << 3)
+ #define WL_POSTMASTER_DEATH  (1 << 4)
+ 
  /*
   * prototypes for functions in latch.c
   */
***************
*** 38,46 **** extern void InitLatch(volatile Latch *latch);
  extern void InitSharedLatch(volatile Latch *latch);
  extern void OwnLatch(volatile Latch *latch);
  extern void DisownLatch(volatile Latch *latch);
! extern bool WaitLatch(volatile Latch *latch, long timeout);
! extern int WaitLatchOrSocket(volatile Latch *latch, pgsocket sock,
! 				  bool forRead, bool forWrite, long timeout);
  extern void SetLatch(volatile Latch *latch);
  extern void ResetLatch(volatile Latch *latch);
  
--- 45,53 ----
  extern void InitSharedLatch(volatile Latch *latch);
  extern void OwnLatch(volatile Latch *latch);
  extern void DisownLatch(volatile Latch *latch);
! extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout);
! extern int WaitLatchOrSocket(volatile Latch *latch, int wakeEvents,
! 				  pgsocket sock, long timeout);
  extern void SetLatch(volatile Latch *latch);
  extern void ResetLatch(volatile Latch *latch);
  
*** a/src/include/storage/pmsignal.h
--- b/src/include/storage/pmsignal.h
***************
*** 50,55 **** extern bool IsPostmasterChildWalSender(int slot);
  extern void MarkPostmasterChildActive(void);
  extern void MarkPostmasterChildInactive(void);
  extern void MarkPostmasterChildWalSender(void);
! extern bool PostmasterIsAlive(bool amDirectChild);
  
  #endif   /* PMSIGNAL_H */
--- 50,55 ----
  extern void MarkPostmasterChildActive(void);
  extern void MarkPostmasterChildInactive(void);
  extern void MarkPostmasterChildWalSender(void);
! extern bool PostmasterIsAlive(void);
  
  #endif   /* PMSIGNAL_H */
#40Peter Geoghegan
peter@2ndquadrant.com
In reply to: Heikki Linnakangas (#39)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 8 July 2011 13:40, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

I put the burden on the callers. Removing the return value from WaitLatch()
altogether just makes life unnecessarily difficult for callers that could
safely use that information, even if you sometimes get spurious wakeups. In
particular, the coding in pgarch.c is nicer if you can simply check the
return code for WL_TIMEOUT, rather than call time(NULL) to figure out if the
timeout was reached.

+1

Attached is a new version of this patch. PostmasterIsAlive() now uses read()
on the pipe instead of kill().

The consensus so far is that in practice spurious wake-ups in
auxiliary process event loops won't be a problem. You may want to wait
for others to weigh in here.

This comment in pgarch.c is slightly malformed - note the quote:

/*
* Sleep until a signal is received, or until a poll is forced by
' PGARCH_AUTOWAKE_INTERVAL having passed since last_copy_time, or
* until postmaster dies.
*/

Other than that, I suggest you commit v8 as-is.

Incidentally, I like that this removes the amDirectChild argument to
PostmasterIsAlive() - an added benefit.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#41Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Geoghegan (#40)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 08.07.2011 16:11, Peter Geoghegan wrote:

Incidentally, I like that this removes the amDirectChild argument to
PostmasterIsAlive() - an added benefit.

amDirectChild==false has actually been dead code for years. But the new
pipe method would work for a non-direct child too as long as the pipe fd
is inherited by the non-direct child, should we need that in the future
again.
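
To illustrate why parentage doesn't matter for the pipe method, here is a minimal standalone sketch (not the patch's code). The names alive_fds, FD_WATCH, FD_OWN, init_death_watch() and owner_is_alive() are hypothetical stand-ins for postmaster_alive_fds, the POSTMASTER_FD_* macros, InitPostmasterDeathWatchHandle() and PostmasterIsAlive(). The read end reports EOF only once every copy of the write end has been closed, whoever holds the read end.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-ins for postmaster_alive_fds and its index macros. */
enum { FD_WATCH = 0, FD_OWN = 1 };
static int alive_fds[2] = { -1, -1 };

/* Create the pipe and make the read end non-blocking; 0 on success. */
static int
init_death_watch(void)
{
	int		flags;

	if (pipe(alive_fds) != 0)
		return -1;
	flags = fcntl(alive_fds[FD_WATCH], F_GETFL);
	if (flags < 0)
		return -1;
	if (fcntl(alive_fds[FD_WATCH], F_SETFL, flags | O_NONBLOCK) == -1)
		return -1;
	return 0;
}

/*
 * Returns 1 while some process still holds the write end open,
 * 0 once read() reports EOF, and -1 on anything unexpected
 * (stray data in the pipe, or some other read() error).
 */
static int
owner_is_alive(void)
{
	char	c;
	ssize_t	rc;

	assert(alive_fds[FD_WATCH] >= 0);
	rc = read(alive_fds[FD_WATCH], &c, 1);
	if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 1;				/* nothing to read, write end still open */
	if (rc == 0)
		return 0;				/* EOF: every write end has been closed */
	return -1;
}
```

Closing alive_fds[FD_OWN] in the same process is enough to simulate the owner exiting: owner_is_alive() returns 1 before the close() and 0 after it. This is also why every child has to close its inherited copy of the write end early.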

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#42Florian Pflug
fgp@phlo.org
In reply to: Heikki Linnakangas (#39)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On Jul8, 2011, at 14:40 , Heikki Linnakangas wrote:

On 08.07.2011 13:58, Florian Pflug wrote:

On Jul8, 2011, at 11:57 , Peter Geoghegan wrote:

On 7 July 2011 19:15, Robert Haas<robertmhaas@gmail.com> wrote:

I'm not concerned about the possibility of spurious extra cycles of
auxiliary process event loops - should I be?

A tight loop would be bad, but an occasional spurious wake-up seems harmless.

We should also assert !PostmasterIsAlive() from within the latch code
after waking due to apparent Postmaster death. The reason that I don't
want to follow Florian's suggestion to check it in production is that
I don't know what to do if the postmaster turns out to be alive. Why
is it more reasonable to try again than to just return?

I'd say return, but don't indicate postmaster death in the return value
if PostmasterIsAlive() returns true. Or don't call PostmasterIsAlive() in
WaitLatch(), and return indicating postmaster death whenever select()
says so, and put the burden of re-checking on the callers.

I put the burden on the callers. Removing the return value from WaitLatch() altogether just makes life unnecessarily difficult for callers that could safely use that information, even if you sometimes get spurious wakeups. In particular, the coding in pgarch.c is nicer if you can simply check the return code for WL_TIMEOUT, rather than call time(NULL) to figure out if the timeout was reached.

Attached is a new version of this patch. PostmasterIsAlive() now uses read() on the pipe instead of kill().

I did notice a few (very minor) loose ends...

SyncRepWaitForLSN() says
/*
* Wait on latch for up to 60 seconds. This allows us to check for
* postmaster death regularly while waiting. Note that timeout here
* does not necessarily release from loop.
*/
WaitLatch(&MyProc->waitLatch, 60000000L);

I guess that 60-second timeout is unnecessary now that we'll wake up
on postmaster death anyway.

Also, none of the callers of WaitLatch() seems to actually inspect
the WL_POSTMASTER_DEATH bit of the result. We might want to make
their !PostmasterIsAlive() check conditional on the WL_POSTMASTER_DEATH
bit being set. At least in the case of SyncRepWaitForLSN(), it seems
that avoiding the extra read() syscall might be beneficial.

Maybe these cleanups would better be done in a separate patch, though...

best regards,
Florian Pflug

#43Peter Geoghegan
peter@2ndquadrant.com
In reply to: Florian Pflug (#42)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 8 July 2011 15:58, Florian Pflug <fgp@phlo.org> wrote:

SyncRepWaitForLSN() says
 /*
  * Wait on latch for up to 60 seconds. This allows us to check for
  * postmaster death regularly while waiting. Note that timeout here
  * does not necessarily release from loop.
  */
 WaitLatch(&MyProc->waitLatch, 60000000L);

I guess that 60-second timeout is unnecessary now that we'll wake up
on postmaster death anyway.

We won't wake up on Postmaster death here, because we haven't asked to
- not yet, anyway. We're just using the new interface here for that
one function call in v8.

Also, none of the callers of WaitLatch() seems to actually inspect
the WL_POSTMASTER_DEATH bit of the result. We might want to make
their !PostmasterIsAlive() check conditional on the WL_POSTMASTER_DEATH
bit being set. At least in the case of SyncRepWaitForLSN(), it seems
that avoiding the extra read() syscall might be beneficial.

I don't think so. Postmaster death is an anomaly, so why bother with
any sort of optimisation for that case? Also, that's exactly the sort
of thing that we're trying to caution callers against doing with this
comment:

"That should be rare in practice, but the caller should not use the
return value for anything critical, re-checking the situation with
PostmasterIsAlive() or read() on a socket if necessary."

You might say that the only reason we even bother reporting postmaster
death in the returned bitfield is because there is an expectation that
it will report it, given that we use the same masks on wakeEvents to
inform the function what events we'll actually be waiting on for the
call.
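
As a rough illustration of that convention, here is a standalone sketch, not the latch implementation itself. The function wait_for_death_or_timeout() is hypothetical and watches only a pipe read end with select(); the mask values are copied from latch.h in the patch. It reports why it woke up using the same bits a caller would have passed in.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Mask values as defined in latch.h by the patch. */
#define WL_TIMEOUT           (1 << 3)
#define WL_POSTMASTER_DEATH  (1 << 4)

/*
 * Hypothetical helper: sleep until watch_fd becomes readable (EOF on a
 * death-watch pipe counts as readable) or timeout_us microseconds pass.
 * Returns a bitmask describing the wake-up reason, or 0 if select()
 * failed (for example with EINTR), which the caller treats as spurious.
 */
static int
wait_for_death_or_timeout(int watch_fd, long timeout_us)
{
	fd_set		rfds;
	struct timeval tv;
	int			rc;

	assert(timeout_us >= 0);
	FD_ZERO(&rfds);
	FD_SET(watch_fd, &rfds);
	tv.tv_sec = timeout_us / 1000000L;
	tv.tv_usec = timeout_us % 1000000L;

	rc = select(watch_fd + 1, &rfds, NULL, NULL, &tv);
	if (rc < 0)
		return 0;
	if (rc == 0)
		return WL_TIMEOUT;
	if (FD_ISSET(watch_fd, &rfds))
		return WL_POSTMASTER_DEATH;
	return 0;
}
```

With both ends of a fresh pipe open, a short wait returns WL_TIMEOUT; after the write end is closed, the same call returns WL_POSTMASTER_DEATH. A caller that cares would still re-check with PostmasterIsAlive() before acting on that bit.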

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#44Florian Pflug
fgp@phlo.org
In reply to: Peter Geoghegan (#43)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On Jul8, 2011, at 17:56 , Peter Geoghegan wrote:

On 8 July 2011 15:58, Florian Pflug <fgp@phlo.org> wrote:

SyncRepWaitForLSN() says
/*
* Wait on latch for up to 60 seconds. This allows us to check for
* postmaster death regularly while waiting. Note that timeout here
* does not necessarily release from loop.
*/
WaitLatch(&MyProc->waitLatch, 60000000L);

I guess that 60-second timeout is unnecessary now that we'll wake up
on postmaster death anyway.

We won't wake up on Postmaster death here, because we haven't asked to
- not yet, anyway. We're just using the new interface here for that
one function call in v8.

Oh, right. I still think it might be worthwhile to convert it, but
not in this patch.

Also, none of the callers of WaitLatch() seems to actually inspect
the WL_POSTMASTER_DEATH bit of the result. We might want to make
their !PostmasterIsAlive() check conditional on the WL_POSTMASTER_DEATH
bit being set. At least in the case of SyncRepWaitForLSN(), it seems
that avoiding the extra read() syscall might be beneficial.

I don't think so. Postmaster death is an anomaly, so why bother with
any sort of optimisation for that case? Also, that's exactly the sort
of thing that we're trying to caution callers against doing with this
comment:

"That should be rare in practice, but the caller should not use the
return value for anything critical, re-checking the situation with
PostmasterIsAlive() or read() on a socket if necessary."

Uh, I phrased that badly. What I meant was doing

if ((result & WL_POSTMASTER_DEATH) && !PostmasterIsAlive())

instead of

if (!PostmasterIsAlive())

It seems that currently SyncRepWaitForLSN() will execute
PostmasterIsAlive() after every wake up. But actually it only needs
to do that if WaitLatch() sets WL_POSTMASTER_DEATH. Usually we wouldn't
care, but in the case of SyncRepWaitForLSN() I figured that we might.
It's in the code path of COMMIT (in the case of synchronous replication)
after all...

We'd not optimize the case of a dead postmaster, but the case of
a live one. Which I do hope is the common case ;-)
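
A small standalone sketch of the idea, under stated assumptions: stub_postmaster_is_alive() is a hypothetical stand-in for PostmasterIsAlive() that counts its calls (each real call costs a read() syscall), and the mask values are copied from latch.h in the patch. It only pays off if the caller actually passes WL_POSTMASTER_DEATH in wakeEvents, which SyncRepWaitForLSN() does not do in v8.

```c
#include <assert.h>
#include <stdbool.h>

/* Mask values as defined in latch.h by the patch. */
#define WL_LATCH_SET         (1 << 0)
#define WL_POSTMASTER_DEATH  (1 << 4)

static int	alive_checks = 0;		/* how many "syscalls" we made */
static bool postmaster_alive = true;	/* simulated postmaster state */

/* Hypothetical stub for PostmasterIsAlive(); counts invocations. */
static bool
stub_postmaster_is_alive(void)
{
	alive_checks++;
	return postmaster_alive;
}

/*
 * Bail out only if the wait reported postmaster death AND the re-check
 * confirms it.  The && short-circuits, so the re-check is skipped on
 * ordinary wake-ups.
 */
static bool
should_bail_out(int result)
{
	return (result & WL_POSTMASTER_DEATH) && !stub_postmaster_is_alive();
}

/* Walk through the three interesting cases. */
static void
check_example(void)
{
	/* Ordinary wake-up: no liveness check at all. */
	assert(!should_bail_out(WL_LATCH_SET));
	assert(alive_checks == 0);

	/* Spurious death report: re-check says alive, keep going. */
	assert(!should_bail_out(WL_POSTMASTER_DEATH));
	assert(alive_checks == 1);

	/* Real death: re-check confirms it, bail out. */
	postmaster_alive = false;
	assert(should_bail_out(WL_LATCH_SET | WL_POSTMASTER_DEATH));
	assert(alive_checks == 2);
}
```

The re-check keeps the caution from the latch comment intact: the returned bit is only a hint, and the decision still rests on the liveness check.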

You might say that the only reason we even bother reporting postmaster
death in the returned bitfield is because there is an expectation that
it will report it, given that we use the same masks on wakeEvents to
inform the function what events we'll actually be waiting on for the
call.

I kinda guessed that to be the reason after reading the latest patch ;-)

best regards,
Florian Pflug

#45Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Geoghegan (#43)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 08.07.2011 18:56, Peter Geoghegan wrote:

On 8 July 2011 15:58, Florian Pflug<fgp@phlo.org> wrote:

Also, none of the callers of WaitLatch() seems to actually inspect
the WL_POSTMASTER_DEATH bit of the result. We might want to make
their !PostmasterIsAlive() check conditional on the WL_POSTMASTER_DEATH
bit being set. At least in the case of SyncRepWaitForLSN(), it seems
that avoiding the extra read() syscall might be beneficial.

I don't think so. Postmaster death is an anomaly, so why bother with
any sort of optimisation for that case?

We currently call PostmasterIsAlive() on every iteration of the loop,
and what Florian is saying is that it would be more efficient to only
call PostmasterIsAlive() if WaitLatch() reports that the postmaster has
died. That's an optimization that would help the case where postmaster
has not died. However, I don't think it makes any difference in practice
because the loop usually only iterates once, and there are break
statements to exit the loop as soon as one of the other exit conditions
is satisfied.

I just committed the v8 of the patch, BTW, after fixing the comment typo
you pointed out. Thanks!

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#46Peter Geoghegan
peter@2ndquadrant.com
In reply to: Heikki Linnakangas (#45)
Re: Latch implementation that wakes on postmaster death on both win32 and Unix

On 8 July 2011 17:10, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

I just committed the v8 of the patch, BTW, after fixing the comment typo you
pointed out. Thanks!

Great, thanks.

Also, thank you Florian and Fujii.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services