PostmasterIsAlive() in recovery (non-USE_POSTMASTER_DEATH_SIGNAL builds)
Hello,
In commits 9f095299 and f98b8476 we improved recovery performance on
Linux and FreeBSD but we didn't help other operating systems. David
just confirmed for me that commenting out the PostmasterIsAlive() call
in the main recovery loop speeds up crash recovery considerably on his
Windows system: 93s -> 70s or 1.32x faster. So I think we should do
something like what Heikki originally proposed to lower the frequency
of checks, on systems where we don't have the ability to skip the
check completely. Please see attached.
Attachments:
0001-Poll-postmaster-less-frequently-in-recovery.patch (text/x-patch)
From a602b79a354bcfea75494c62c3dd9dee7dd8fa9e Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 17 Sep 2020 21:11:01 +1200
Subject: [PATCH] Poll postmaster less frequently in recovery.
Since commits 9f095299 and f98b8476 we don't poll the postmaster
pipe/event at all during recovery on Linux and FreeBSD, but on other
operating systems we were still doing it for every WAL record. Do it
less frequently on operating systems where system calls are still
required, at the cost of delaying shutdown a bit after postmaster death,
to avoid expensive system calls known to slow down CPU-bound recovery by
10-30%.
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi
---
src/backend/postmaster/startup.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index fd9ac35dac..2ef5da6908 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -134,6 +134,10 @@ StartupRereadConfig(void)
void
HandleStartupProcInterrupts(void)
{
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+ static int count = 0;
+#endif
+
/*
* Process any requests or signals received recently.
*/
@@ -151,9 +155,15 @@ HandleStartupProcInterrupts(void)
/*
* Emergency bailout if postmaster has died. This is to avoid the
- * necessity for manual cleanup of all postmaster children.
+ * necessity for manual cleanup of all postmaster children. Do this less
+ * frequently on systems for which we don't have signals to make that
+ * cheap.
*/
- if (IsUnderPostmaster && !PostmasterIsAlive())
+ if (IsUnderPostmaster &&
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+ count++ % 1024 == 0 &&
+#endif
+ !PostmasterIsAlive())
exit(1);
/* Process barrier events */
--
2.20.1
On 17/09/2020 12:48, Thomas Munro wrote:
Hello,
In commits 9f095299 and f98b8476 we improved recovery performance on
Linux and FreeBSD but we didn't help other operating systems. David
just confirmed for me that commenting out the PostmasterIsAlive() call
in the main recovery loop speeds up crash recovery considerably on his
Windows system: 93s -> 70s or 1.32x faster.
Nice speedup!
So I think we should do
something like what Heikki originally proposed to lower the frequency
of checks, on systems where we don't have the ability to skip the
check completely. Please see attached.
If you put the counter in HandleStartupProcInterrupts(), it could be a
long wait if the startup process is e.g. waiting for WAL to arrive in
the loop in WaitForWALToBecomeAvailable(), or in recoveryPausesHere().
My original patch only reduced the frequency in the WAL redo loop, when
you're actively replaying records.
We could probably do better on Windows. Maybe the signal handler thread
could wait on the PostmasterHandle at the same time that it waits on the
signal pipe, and set postmaster_possibly_dead. But I'm not going to work
on that, and it would only help on Windows, so I'm OK with just adding
the counter.
- Heikki
On Thu, Sep 17, 2020 at 10:19 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 17/09/2020 12:48, Thomas Munro wrote:
So I think we should do
something like what Heikki originally proposed to lower the frequency
of checks, on systems where we don't have the ability to skip the
check completely. Please see attached.
If you put the counter in HandleStartupProcInterrupts(), it could be a
long wait if the startup process is e.g. waiting for WAL to arrive in
the loop in WaitForWALToBecomeAvailable(), or in recoveryPausesHere().
My original patch only reduced the frequency in the WAL redo loop, when
you're actively replaying records.
Oh, I checked that WaitForWALToBecomeAvailable() already handled
postmaster death via events rather than polling, with
WL_EXIT_ON_PM_DEATH, but I hadn't clocked that recoveryPausesHere()
uses pg_usleep() and polling. Hmm. Perhaps we should change that
instead? The reason I did it that way is that I didn't want to make
the new ProcSignalBarrierPending handler less reactive.
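(For concreteness, a conversion along those lines would look roughly like
this. Sketch only: the exit condition is invented and the wait event is
illustrative.)

for (;;)
{
    HandleStartupProcInterrupts();

    if (condition_met)          /* hypothetical loop-exit condition */
        break;

    /*
     * Sleep up to 1 s.  WL_EXIT_ON_PM_DEATH makes the wait itself exit
     * the process if the postmaster dies, so no PostmasterIsAlive()
     * poll is needed in the loop.
     */
    (void) WaitLatch(NULL, WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, 1000,
                     WAIT_EVENT_RECOVERY_PAUSE);
}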
We could probably do better on Windows. Maybe the signal handler thread
could wait on the PostmasterHandle at the same time that it waits on the
signal pipe, and set postmaster_possibly_dead. But I'm not going to work
on that, and it would only help on Windows, so I'm OK with just adding
the counter.
Yeah, I had the same thought.
On 17/09/2020 13:31, Thomas Munro wrote:
On Thu, Sep 17, 2020 at 10:19 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
If you put the counter in HandleStartupProcInterrupts(), it could be a
long wait if the startup process is e.g. waiting for WAL to arrive in
the loop in WaitForWALToBecomeAvailable(), or in recoveryPausesHere().
My original patch only reduced the frequency in the WAL redo loop, when
you're actively replaying records.
Oh, I checked that WaitForWALToBecomeAvailable() already handled
postmaster death via events rather than polling, with
WL_EXIT_ON_PM_DEATH, but I hadn't clocked that recoveryPausesHere()
uses pg_usleep() and polling. Hmm. Perhaps we should change that
instead? The reason I did it that way is that I didn't want to make
the new ProcSignalBarrierPending handler less reactive.
Hmm, so for speedy response to postmaster death, you're relying on the
loops to have other postmaster-death checks besides
HandleStartupProcInterrupts(), in the form of WL_EXIT_ON_PM_DEATH. That
seems a bit fragile, at the very least it needs a comment in
HandleStartupProcInterrupts() to call it out.
Note that there's one more loop in ShutdownWalRcv() that uses pg_usleep().
- Heikki
On Thu, Sep 17, 2020 at 10:47 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Hmm, so for speedy response to postmaster death, you're relying on the
loops to have other postmaster-death checks besides
HandleStartupProcInterrupts(), in the form of WL_EXIT_ON_PM_DEATH. That
seems a bit fragile, at the very least it needs a comment in
HandleStartupProcInterrupts() to call it out.
Surely that's the direction we want to go in, though, no? Commit
cfdf4dc4 was intended to standardise the way we react to postmaster
death where waiting is involved. I updated the comment in
HandleStartupProcInterrupts() to highlight that the
PostmasterIsAlive() check in there is only for the benefit of
CPU-bound loops.
Note that there's one more loop in ShutdownWalRcv() that uses pg_usleep().
Updating that one required me to invent a new wait_event for
pg_stat_activity, which seems like progress.
Unfortunately, while I was doing that I realised that WaitLatch()
without WL_SET_LATCH was broken by commit 3347c982bab (in master
only), in a way not previously reached by the tests. So 0001 is a
patch to fix that.
Attachments:
v2-0001-Allow-WaitLatch-to-be-used-without-a-latch.patch (text/x-patch)
From 2afa37048fe659bb1f1b3c8aec89824b2f00a892 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 18 Sep 2020 00:04:56 +1200
Subject: [PATCH v2 1/3] Allow WaitLatch() to be used without a latch.
Due to flaws in commit 3347c982bab, using WaitLatch() without
WL_SET_LATCH could cause an assertion failure or crash. Repair.
While here, also add a check that the latch we're switching to belongs
to this backend.
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
---
src/backend/storage/ipc/latch.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 4153cc8557..63c6c97536 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -924,7 +924,22 @@ ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch)
if (events == WL_LATCH_SET)
{
+ if (latch && latch->owner_pid != MyProcPid)
+ elog(ERROR, "cannot wait on a latch owned by another process");
set->latch = latch;
+ /*
+ * On Unix, we don't need to modify the kernel object because the
+ * underlying pipe is the same for all latches so we can return
+ * immediately. On Windows, we need to update our array of handles,
+ * but we leave the old one in place and tolerate spurious wakeups if
+ * the latch is disabled.
+ */
+#if defined(WAIT_USE_WIN32)
+ if (!latch)
+ return;
+#else
+ return;
+#endif
}
#if defined(WAIT_USE_EPOLL)
@@ -1386,7 +1401,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
/* There's data in the self-pipe, clear it. */
drainSelfPipe();
- if (set->latch->is_set)
+ if (set->latch && set->latch->is_set)
{
occurred_events->fd = PGINVALID_SOCKET;
occurred_events->events = WL_LATCH_SET;
@@ -1536,7 +1551,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
/* There's data in the self-pipe, clear it. */
drainSelfPipe();
- if (set->latch->is_set)
+ if (set->latch && set->latch->is_set)
{
occurred_events->fd = PGINVALID_SOCKET;
occurred_events->events = WL_LATCH_SET;
@@ -1645,7 +1660,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
/* There's data in the self-pipe, clear it. */
drainSelfPipe();
- if (set->latch->is_set)
+ if (set->latch && set->latch->is_set)
{
occurred_events->fd = PGINVALID_SOCKET;
occurred_events->events = WL_LATCH_SET;
@@ -1812,7 +1827,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
if (!ResetEvent(set->latch->event))
elog(ERROR, "ResetEvent failed: error code %lu", GetLastError());
- if (set->latch->is_set)
+ if (set->latch && set->latch->is_set)
{
occurred_events->fd = PGINVALID_SOCKET;
occurred_events->events = WL_LATCH_SET;
--
2.20.1
v2-0002-Poll-postmaster-less-frequently-in-recovery.patch (text/x-patch)
From 5d62d3091f0980299f1471f0a8006c0dd0bea648 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 17 Sep 2020 21:11:01 +1200
Subject: [PATCH v2 2/3] Poll postmaster less frequently in recovery.
Since commits 9f095299 and f98b8476 we don't poll the postmaster
pipe at all during crash recovery on Linux and FreeBSD, but on other
operating systems we were still doing it for every WAL record. Do it
less frequently on operating systems where system calls are required, at
the cost of delaying exit a bit after postmaster death, to avoid
expensive system calls reported to slow down CPU-bound recovery by
10-30%.
Replace a couple of pg_usleep()-based wait loops with WaitLatch() loops,
to make sure they respond promptly to postmaster death now that
HandleStartupProcInterrupts() doesn't check every time through on some
systems.
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
---
doc/src/sgml/monitoring.sgml | 4 ++++
src/backend/access/transam/xlog.c | 5 ++---
src/backend/postmaster/pgstat.c | 3 +++
src/backend/postmaster/startup.c | 16 ++++++++++++++--
src/backend/replication/walreceiverfuncs.c | 6 ++++--
src/include/pgstat.h | 1 +
6 files changed, 28 insertions(+), 7 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 673a0e73e4..b0a8fc1582 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1734,6 +1734,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting for confirmation from a remote server during synchronous
replication.</entry>
</row>
+ <row>
+ <entry><literal>WalrcvExit</literal></entry>
+ <entry>Waiting for the walreceiver to exit.</entry>
+ </row>
<row>
<entry><literal>XactGroupUpdate</literal></entry>
<entry>Waiting for the group leader to update transaction status at
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a38371a64f..578006fa32 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6027,9 +6027,8 @@ recoveryPausesHere(bool endOfRecovery)
HandleStartupProcInterrupts();
if (CheckForStandbyTrigger())
return;
- pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
- pg_usleep(1000000L); /* 1000 ms */
- pgstat_report_wait_end();
+ WaitLatch(NULL, WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, 1000,
+ WAIT_EVENT_RECOVERY_PAUSE);
}
}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e6be2b7836..395d61f082 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3875,6 +3875,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
case WAIT_EVENT_SYNC_REP:
event_name = "SyncRep";
break;
+ case WAIT_EVENT_WALRCV_EXIT:
+ event_name = "WalrcvExit";
+ break;
case WAIT_EVENT_XACT_GROUP_UPDATE:
event_name = "XactGroupUpdate";
break;
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index fd9ac35dac..330b361a3e 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -134,6 +134,10 @@ StartupRereadConfig(void)
void
HandleStartupProcInterrupts(void)
{
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+ static int count = 0;
+#endif
+
/*
* Process any requests or signals received recently.
*/
@@ -151,9 +155,17 @@ HandleStartupProcInterrupts(void)
/*
* Emergency bailout if postmaster has died. This is to avoid the
- * necessity for manual cleanup of all postmaster children.
+ * necessity for manual cleanup of all postmaster children. Do this less
+ * frequently on systems for which we don't have signals to make that
+ * cheap. Any loop that sleeps should be using WaitLatch or similar and
+ * handling postmaster death that way; the check here is intended only to
+ * deal with CPU-bound loops such as the main redo loop.
*/
- if (IsUnderPostmaster && !PostmasterIsAlive())
+ if (IsUnderPostmaster &&
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+ count++ % 1024 == 0 &&
+#endif
+ !PostmasterIsAlive())
exit(1);
/* Process barrier events */
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index e675757301..d0263ba16e 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -23,8 +23,10 @@
#include <signal.h>
#include "access/xlog_internal.h"
+#include "pgstat.h"
#include "postmaster/startup.h"
#include "replication/walreceiver.h"
+#include "storage/latch.h"
#include "storage/pmsignal.h"
#include "storage/shmem.h"
#include "utils/timestamp.h"
@@ -207,8 +209,8 @@ ShutdownWalRcv(void)
* process.
*/
HandleStartupProcInterrupts();
-
- pg_usleep(100000); /* 100ms */
+ WaitLatch(NULL, WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, 100,
+ WAIT_EVENT_WALRCV_EXIT);
}
}
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0dfbac46b4..f934acaed9 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -887,6 +887,7 @@ typedef enum
WAIT_EVENT_REPLICATION_SLOT_DROP,
WAIT_EVENT_SAFE_SNAPSHOT,
WAIT_EVENT_SYNC_REP,
+ WAIT_EVENT_WALRCV_EXIT,
WAIT_EVENT_XACT_GROUP_UPDATE
} WaitEventIPC;
--
2.20.1
On 2020/09/18 9:30, Thomas Munro wrote:
On Thu, Sep 17, 2020 at 10:47 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Hmm, so for speedy response to postmaster death, you're relying on the
loops to have other postmaster-death checks besides
HandleStartupProcInterrupts(), in the form of WL_EXIT_ON_PM_DEATH. That
seems a bit fragile, at the very least it needs a comment in
HandleStartupProcInterrupts() to call it out.
Surely that's the direction we want to go in, though, no? Commit
cfdf4dc4 was intended to standardise the way we react to postmaster
death where waiting is involved. I updated the comment in
HandleStartupProcInterrupts() to highlight that the
PostmasterIsAlive() check in there is only for the benefit of
CPU-bound loops.
Note that there's one more loop in ShutdownWalRcv() that uses pg_usleep().
Updating that one required me to invent a new wait_event for
pg_stat_activity, which seems like progress.
Unfortunately, while I was doing that I realised that WaitLatch()
without WL_SET_LATCH was broken by commit 3347c982bab (in master
only), in a way not previously reached by the tests. So 0001 is a
patch to fix that.
- pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
- pg_usleep(1000000L); /* 1000 ms */
- pgstat_report_wait_end();
+ WaitLatch(NULL, WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, 1000,
+ WAIT_EVENT_RECOVERY_PAUSE);
This change may cause up to a one-second delay in responding to a standby
promotion request during a WAL replay pause? It's only one second,
but I'd basically like to avoid this (unnecessary) wait to shorten the
failover time as much as possible. So what about using WL_SET_LATCH here?
But when using WL_SET_LATCH, one concern is that the walreceiver can
wake up the startup process too frequently, even during a WAL replay pause.
Is this also problematic? I'm OK with this, but if not, keeping pg_usleep()
as in the original code might be better.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Sat, Sep 19, 2020 at 6:07 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
- pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
- pg_usleep(1000000L); /* 1000 ms */
- pgstat_report_wait_end();
+ WaitLatch(NULL, WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, 1000,
+ WAIT_EVENT_RECOVERY_PAUSE);
This change may cause up to a one-second delay in responding to a standby
promotion request during a WAL replay pause? It's only one second,
but I'd basically like to avoid this (unnecessary) wait to shorten the
failover time as much as possible. So what about using WL_SET_LATCH here?
Right, there is a comment saying that we could do that:
* XXX Could also be done with shared latch, avoiding the pg_usleep loop.
* Probably not worth the trouble though. This state shouldn't be one that
* anyone cares about server power consumption in.
But when using WL_SET_LATCH, one concern is that the walreceiver can
wake up the startup process too frequently, even during a WAL replay pause.
Is this also problematic? I'm OK with this, but if not, keeping pg_usleep()
as in the original code might be better.
You're right, at least if we used recoveryWakeupLatch. Although we'd
react to pg_wal_replay_resume() faster, which would be nice, we
wouldn't be saving energy, we'd be using more energy due to all the
other latch wakeups that we'd be ignoring. I believe the correct
solution to this problem is to add a ConditionVariable
"recoveryPauseChanged" into XLogCtlData, and then broadcast on it in
SetRecoveryPause(). This would be a trivial change, except for one
small problem: ConditionVariableTimedSleep() contains
CHECK_FOR_INTERRUPTS(), but startup.c has its own special interrupt
handling rather than using ProcessInterrupts() from postgres.c. Maybe
that's OK, I'm not sure, but it requires more thought, and I propose
to keep the existing sloppy polling for now and leave precise wakeup
improvements for a separate patch. The primary goal of this patch is
to switch to the standard treatment of postmaster death in wait loops,
so that we're free to reduce the sampling frequency in
HandleStartupProcInterrupts(), to fix a horrible performance problem.
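(To sketch that idea concretely, under the assumption of a new
recoveryPauseChanged field in XLogCtlData; ConditionVariableInit() at
shared-memory initialization is elided, and a real version would also
need to wake promptly on promotion triggers:)

/* Hypothetical: in XLogCtlData, next to the existing recoveryPause flag */
ConditionVariable recoveryPauseChanged;

void
SetRecoveryPause(bool recoveryPause)
{
    SpinLockAcquire(&XLogCtl->info_lck);
    XLogCtl->recoveryPause = recoveryPause;
    SpinLockRelease(&XLogCtl->info_lck);

    /* Wake anyone waiting for the pause state to change. */
    ConditionVariableBroadcast(&XLogCtl->recoveryPauseChanged);
}

/* In recoveryPausesHere(), replacing the timed sleep: */
ConditionVariablePrepareToSleep(&XLogCtl->recoveryPauseChanged);
while (RecoveryIsPaused())
{
    HandleStartupProcInterrupts();
    if (CheckForStandbyTrigger())
        break;
    /* NB: contains the CHECK_FOR_INTERRUPTS() discussed above */
    ConditionVariableSleep(&XLogCtl->recoveryPauseChanged,
                           WAIT_EVENT_RECOVERY_PAUSE);
}
ConditionVariableCancelSleep();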
I have at least tweaked that comment about pg_usleep(), though,
because that was out of date; I also used (void) WaitLatch(...) to
make it look like other places where we ignore the return value
(perhaps some static analyser out there somewhere cares?)
By the way, a CV could probably be used for walreceiver state changes
too, to improve ShutdownWalRcv().
Although I know from CI that this builds and passes "make check" on
Windows, I'm hoping to attract some review of the 0001 patch from a
Windows person, and confirmation that it passes "check-world" (or at
least src/test/recovery check) there, because I don't have CI scripts
for that yet.
Attachments:
v3-0002-Poll-postmaster-less-frequently-in-recovery.patch (text/x-patch)
From 1c72f965e24676d7ade05701617e2381b7ff3065 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 17 Sep 2020 21:11:01 +1200
Subject: [PATCH v3 2/3] Poll postmaster less frequently in recovery.
Since commits 9f095299 and f98b8476 we don't poll the postmaster
pipe at all during crash recovery on Linux and FreeBSD, but on other
operating systems we were still doing it for every WAL record. Do it
less frequently on operating systems where system calls are required, at
the cost of delaying exit a bit after postmaster death, to avoid
expensive system calls reported to slow down CPU-bound recovery by
10-30%.
Replace a couple of pg_usleep()-based wait loops with WaitLatch() loops,
to make sure they respond promptly to postmaster death now that
HandleStartupProcInterrupts() doesn't check every time through on some
systems.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi
---
doc/src/sgml/monitoring.sgml | 4 ++++
src/backend/access/transam/xlog.c | 7 +++----
src/backend/postmaster/pgstat.c | 3 +++
src/backend/postmaster/startup.c | 16 ++++++++++++++--
src/backend/replication/walreceiverfuncs.c | 6 ++++--
src/include/pgstat.h | 1 +
6 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 673a0e73e4..b0a8fc1582 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1734,6 +1734,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting for confirmation from a remote server during synchronous
replication.</entry>
</row>
+ <row>
+ <entry><literal>WalrcvExit</literal></entry>
+ <entry>Waiting for the walreceiver to exit.</entry>
+ </row>
<row>
<entry><literal>XactGroupUpdate</literal></entry>
<entry>Waiting for the group leader to update transaction status at
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 61754312e2..f9d9b38a8a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5999,7 +5999,7 @@ recoveryStopsAfter(XLogReaderState *record)
* the paused state starts at the end of recovery because of
* recovery_target_action=pause, and false otherwise.
*
- * XXX Could also be done with shared latch, avoiding the pg_usleep loop.
+ * XXX Could also be done with shared latch, avoiding the WL_TIMEOUT loop.
* Probably not worth the trouble though. This state shouldn't be one that
* anyone cares about server power consumption in.
*/
@@ -6028,9 +6028,8 @@ recoveryPausesHere(bool endOfRecovery)
HandleStartupProcInterrupts();
if (CheckForStandbyTrigger())
return;
- pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
- pg_usleep(1000000L); /* 1000 ms */
- pgstat_report_wait_end();
+ (void) WaitLatch(NULL, WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, 1000,
+ WAIT_EVENT_RECOVERY_PAUSE);
}
}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e6be2b7836..395d61f082 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3875,6 +3875,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
case WAIT_EVENT_SYNC_REP:
event_name = "SyncRep";
break;
+ case WAIT_EVENT_WALRCV_EXIT:
+ event_name = "WalrcvExit";
+ break;
case WAIT_EVENT_XACT_GROUP_UPDATE:
event_name = "XactGroupUpdate";
break;
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 64af7b8707..a802041ca7 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -134,6 +134,10 @@ StartupRereadConfig(void)
void
HandleStartupProcInterrupts(void)
{
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+ static int count = 0;
+#endif
+
/*
* Process any requests or signals received recently.
*/
@@ -151,9 +155,17 @@ HandleStartupProcInterrupts(void)
/*
* Emergency bailout if postmaster has died. This is to avoid the
- * necessity for manual cleanup of all postmaster children.
+ * necessity for manual cleanup of all postmaster children. Do this less
+ * frequently on systems for which we don't have signals to make that
+ * cheap. Any loop that sleeps should be using WaitLatch or similar and
+ * handling postmaster death that way; the check here is intended only to
+ * deal with CPU-bound loops such as the main redo loop.
*/
- if (IsUnderPostmaster && !PostmasterIsAlive())
+ if (IsUnderPostmaster &&
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+ count++ % 1024 == 0 &&
+#endif
+ !PostmasterIsAlive())
exit(1);
/* Process barrier events */
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index e675757301..059353f342 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -23,8 +23,10 @@
#include <signal.h>
#include "access/xlog_internal.h"
+#include "pgstat.h"
#include "postmaster/startup.h"
#include "replication/walreceiver.h"
+#include "storage/latch.h"
#include "storage/pmsignal.h"
#include "storage/shmem.h"
#include "utils/timestamp.h"
@@ -207,8 +209,8 @@ ShutdownWalRcv(void)
* process.
*/
HandleStartupProcInterrupts();
-
- pg_usleep(100000); /* 100ms */
+ (void) WaitLatch(NULL, WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, 100,
+ WAIT_EVENT_WALRCV_EXIT);
}
}
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0dfbac46b4..f934acaed9 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -887,6 +887,7 @@ typedef enum
WAIT_EVENT_REPLICATION_SLOT_DROP,
WAIT_EVENT_SAFE_SNAPSHOT,
WAIT_EVENT_SYNC_REP,
+ WAIT_EVENT_WALRCV_EXIT,
WAIT_EVENT_XACT_GROUP_UPDATE
} WaitEventIPC;
--
2.20.1
v3-0001-Allow-WaitLatch-to-be-used-without-a-latch.patch (text/x-patch)
From a415ad0f7733ac85f315943dd0a1de9b24d909c5 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 18 Sep 2020 00:04:56 +1200
Subject: [PATCH v3 1/3] Allow WaitLatch() to be used without a latch.
Due to flaws in commit 3347c982bab, using WaitLatch() without
WL_LATCH_SET could cause an assertion failure or crash. Repair.
While here, also add a check that the latch we're switching to belongs
to this backend, when changing from one latch to another.
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
---
src/backend/storage/ipc/latch.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 4153cc8557..63c6c97536 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -924,7 +924,22 @@ ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch)
if (events == WL_LATCH_SET)
{
+ if (latch && latch->owner_pid != MyProcPid)
+ elog(ERROR, "cannot wait on a latch owned by another process");
set->latch = latch;
+ /*
+ * On Unix, we don't need to modify the kernel object because the
+ * underlying pipe is the same for all latches so we can return
+ * immediately. On Windows, we need to update our array of handles,
+ * but we leave the old one in place and tolerate spurious wakeups if
+ * the latch is disabled.
+ */
+#if defined(WAIT_USE_WIN32)
+ if (!latch)
+ return;
+#else
+ return;
+#endif
}
#if defined(WAIT_USE_EPOLL)
@@ -1386,7 +1401,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
/* There's data in the self-pipe, clear it. */
drainSelfPipe();
- if (set->latch->is_set)
+ if (set->latch && set->latch->is_set)
{
occurred_events->fd = PGINVALID_SOCKET;
occurred_events->events = WL_LATCH_SET;
@@ -1536,7 +1551,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
/* There's data in the self-pipe, clear it. */
drainSelfPipe();
- if (set->latch->is_set)
+ if (set->latch && set->latch->is_set)
{
occurred_events->fd = PGINVALID_SOCKET;
occurred_events->events = WL_LATCH_SET;
@@ -1645,7 +1660,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
/* There's data in the self-pipe, clear it. */
drainSelfPipe();
- if (set->latch->is_set)
+ if (set->latch && set->latch->is_set)
{
occurred_events->fd = PGINVALID_SOCKET;
occurred_events->events = WL_LATCH_SET;
@@ -1812,7 +1827,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
if (!ResetEvent(set->latch->event))
elog(ERROR, "ResetEvent failed: error code %lu", GetLastError());
- if (set->latch->is_set)
+ if (set->latch && set->latch->is_set)
{
occurred_events->fd = PGINVALID_SOCKET;
occurred_events->events = WL_LATCH_SET;
--
2.20.1
On Sun, 20 Sep 2020 at 09:29, Thomas Munro <thomas.munro@gmail.com> wrote:
Although I know from CI that this builds and passes "make check" on
Windows, I'm hoping to attract some review of the 0001 patch from a
Windows person, and confirmation that it passes "check-world" (or at
least src/test/recovery check) there, because I don't have CI scripts
for that yet.
I've gone as far as running the recovery tests on the v3-0001 patch
using a Windows machine. They pass:
L:\Projects\Postgres\d\src\tools\msvc>vcregress taptest src/test/recovery
...
t/001_stream_rep.pl .................. ok
t/002_archiving.pl ................... ok
t/003_recovery_targets.pl ............ ok
t/004_timeline_switch.pl ............. ok
t/005_replay_delay.pl ................ ok
t/006_logical_decoding.pl ............ ok
t/007_sync_rep.pl .................... ok
t/008_fsm_truncation.pl .............. ok
t/009_twophase.pl .................... ok
t/010_logical_decoding_timelines.pl .. ok
t/011_crash_recovery.pl .............. skipped: Test fails on Windows perl
t/012_subtransactions.pl ............. ok
t/013_crash_restart.pl ............... ok
t/014_unlogged_reinit.pl ............. ok
t/015_promotion_pages.pl ............. ok
t/016_min_consistency.pl ............. ok
t/017_shm.pl ......................... skipped: SysV shared memory not
supported by this platform
t/018_wal_optimize.pl ................ ok
t/019_replslot_limit.pl .............. ok
t/020_archive_status.pl .............. ok
All tests successful.
Files=20, Tests=222, 397 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU)
Result: PASS
L:\Projects\Postgres\d\src\tools\msvc>git diff --stat
src/backend/storage/ipc/latch.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
David
On Wed, Sep 23, 2020 at 2:27 PM David Rowley <dgrowleyml@gmail.com> wrote:
I've gone as far as running the recovery tests on the v3-0001 patch
using a Windows machine. They pass:
Thanks! I pushed that one, because it was effectively a bug fix
(WaitLatch() without a latch was supposed to work).
I'll wait longer for feedback on the main patch; perhaps someone has a
better idea, or wants to take issue with the magic number 1024 (ie
limit on how many records we'll replay before we notice the postmaster
has exited), or my plan to harmonise those wait loops? It has a CF
entry for the next CF.
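(For scale, purely as an illustration and not a measurement: if CPU-bound
replay were applying 100k WAL records per second, checking every 1024th
record would cut roughly 100k postmaster pipe reads per second down to
about 100, while the extra delay in noticing postmaster death during
replay would be bounded by about 10 ms of replay work.)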
On 2020/09/23 12:47, Thomas Munro wrote:
On Wed, Sep 23, 2020 at 2:27 PM David Rowley <dgrowleyml@gmail.com> wrote:
I've gone as far as running the recovery tests on the v3-0001 patch
using a Windows machine. They pass:
Thanks! I pushed that one, because it was effectively a bug fix
(WaitLatch() without a latch was supposed to work).
Great!
I'll wait longer for feedback on the main patch; perhaps someone has a
better idea, or wants to take issue with the magic number 1024 (ie
limit on how many records we'll replay before we notice the postmaster
has exited), or my plan to harmonise those wait loops? It has a CF
entry for the next CF.
Does this patch work fine in the warm-standby case using pg_standby?
IIUC the startup process doesn't call WaitLatch() in that case, so ISTM that,
with the patch, it cannot detect postmaster death immediately.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Thu, Sep 24, 2020 at 2:39 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Does this patch work fine in the warm-standby case using pg_standby?
IIUC the startup process doesn't call WaitLatch() in that case, so ISTM that,
with the patch, it cannot detect postmaster death immediately.
Right, RestoreArchivedFile() uses system(), so I guess it can hang
around for a long time after unexpected postmaster exit on every OS if
the command waits. To respond to various kinds of important
interrupts, I suppose that'd ideally use something like
OpenPipeStream() and a typical WaitLatch() loop with CFI(). I'm not
sure what our policy is or should be for exiting while we have running
subprocesses. I guess that is a separate issue.
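(A very rough sketch of the OpenPipeStream() direction, not a proposal;
the command's output is simply discarded here, and the non-blocking pipe
plus WaitLatch() integration mentioned above is deliberately left out:)

/* Hypothetical replacement for the system() call in RestoreArchivedFile() */
FILE       *fh = OpenPipeStream(xlogRestoreCmd, "r");
char        buf[1024];

if (fh == NULL)
    ereport(ERROR,
            (errcode_for_file_access(),
             errmsg("could not execute restore command \"%s\": %m",
                    xlogRestoreCmd)));

/*
 * Consume the command's output until it exits, servicing startup-process
 * interrupts between reads.  This still blocks in fgets() while the
 * command is silent; reacting promptly to postmaster death would need a
 * non-blocking pipe plus a WaitLatch() loop as described above.
 */
while (fgets(buf, sizeof(buf), fh) != NULL)
    HandleStartupProcInterrupts();

(void) ClosePipeStream(fh);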
Here's a rebase, no code change.
Attachments:
v4-0001-Poll-postmaster-less-frequently-in-recovery.patch (text/x-patch)
From 2cd588ecc72729930a06b20da65537dfcb8b2f52 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 24 Sep 2020 17:37:54 +1200
Subject: [PATCH v4] Poll postmaster less frequently in recovery.
Since commits 9f095299 and f98b8476 we don't poll the postmaster
pipe at all during crash recovery on Linux and FreeBSD, but on other
operating systems we were still doing it for every WAL record. Do it
less frequently on operating systems where system calls are required, at
the cost of delaying exit a bit after postmaster death, to avoid
expensive system calls reported to slow down CPU-bound recovery by
as much as 10-30%.
Replace a couple of pg_usleep()-based wait loops with WaitLatch() loops,
to make sure they respond promptly to postmaster death now that
HandleStartupProcInterrupts() doesn't check every time through on some
systems.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi
---
doc/src/sgml/monitoring.sgml | 4 ++++
src/backend/access/transam/xlog.c | 7 +++----
src/backend/postmaster/pgstat.c | 3 +++
src/backend/postmaster/startup.c | 16 ++++++++++++++--
src/backend/replication/walreceiverfuncs.c | 6 ++++--
src/include/pgstat.h | 1 +
6 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4e0193a967..65210ee064 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1734,6 +1734,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting for confirmation from a remote server during synchronous
replication.</entry>
</row>
+ <row>
+ <entry><literal>WalrcvExit</literal></entry>
+ <entry>Waiting for the walreceiver to exit.</entry>
+ </row>
<row>
<entry><literal>XactGroupUpdate</literal></entry>
<entry>Waiting for the group leader to update transaction status at
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 61754312e2..f9d9b38a8a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5999,7 +5999,7 @@ recoveryStopsAfter(XLogReaderState *record)
* the paused state starts at the end of recovery because of
* recovery_target_action=pause, and false otherwise.
*
- * XXX Could also be done with shared latch, avoiding the pg_usleep loop.
+ * XXX Could also be done with shared latch, avoiding the WL_TIMEOUT loop.
* Probably not worth the trouble though. This state shouldn't be one that
* anyone cares about server power consumption in.
*/
@@ -6028,9 +6028,8 @@ recoveryPausesHere(bool endOfRecovery)
HandleStartupProcInterrupts();
if (CheckForStandbyTrigger())
return;
- pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
- pg_usleep(1000000L); /* 1000 ms */
- pgstat_report_wait_end();
+ (void) WaitLatch(NULL, WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, 1000,
+ WAIT_EVENT_RECOVERY_PAUSE);
}
}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e6be2b7836..395d61f082 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3875,6 +3875,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
case WAIT_EVENT_SYNC_REP:
event_name = "SyncRep";
break;
+ case WAIT_EVENT_WALRCV_EXIT:
+ event_name = "WalrcvExit";
+ break;
case WAIT_EVENT_XACT_GROUP_UPDATE:
event_name = "XactGroupUpdate";
break;
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 64af7b8707..a802041ca7 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -134,6 +134,10 @@ StartupRereadConfig(void)
void
HandleStartupProcInterrupts(void)
{
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+ static int count = 0;
+#endif
+
/*
* Process any requests or signals received recently.
*/
@@ -151,9 +155,17 @@ HandleStartupProcInterrupts(void)
/*
* Emergency bailout if postmaster has died. This is to avoid the
- * necessity for manual cleanup of all postmaster children.
+ * necessity for manual cleanup of all postmaster children. Do this less
+ * frequently on systems for which we don't have signals to make that
+ * cheap. Any loop that sleeps should be using WaitLatch or similar and
+ * handling postmaster death that way; the check here is intended only to
+ * deal with CPU-bound loops such as the main redo loop.
*/
- if (IsUnderPostmaster && !PostmasterIsAlive())
+ if (IsUnderPostmaster &&
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+ count++ % 1024 == 0 &&
+#endif
+ !PostmasterIsAlive())
exit(1);
/* Process barrier events */
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index e675757301..059353f342 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -23,8 +23,10 @@
#include <signal.h>
#include "access/xlog_internal.h"
+#include "pgstat.h"
#include "postmaster/startup.h"
#include "replication/walreceiver.h"
+#include "storage/latch.h"
#include "storage/pmsignal.h"
#include "storage/shmem.h"
#include "utils/timestamp.h"
@@ -207,8 +209,8 @@ ShutdownWalRcv(void)
* process.
*/
HandleStartupProcInterrupts();
-
- pg_usleep(100000); /* 100ms */
+ (void) WaitLatch(NULL, WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, 100,
+ WAIT_EVENT_WALRCV_EXIT);
}
}
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0dfbac46b4..f934acaed9 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -887,6 +887,7 @@ typedef enum
WAIT_EVENT_REPLICATION_SLOT_DROP,
WAIT_EVENT_SAFE_SNAPSHOT,
WAIT_EVENT_SYNC_REP,
+ WAIT_EVENT_WALRCV_EXIT,
WAIT_EVENT_XACT_GROUP_UPDATE
} WaitEventIPC;
--
2.20.1
On Thu, Sep 24, 2020 at 05:55:17PM +1200, Thomas Munro wrote:
Right, RestoreArchivedFile() uses system(), so I guess it can hang
around for a long time after unexpected postmaster exit on every OS if
the command waits. To respond to various kinds of important
interrupts, I suppose that'd ideally use something like
OpenPipeStream() and a typical WaitLatch() loop with CFI(). I'm not
sure what our policy is or should be for exiting while we have running
subprocesses. I guess that is a separate issue.
- if (IsUnderPostmaster && !PostmasterIsAlive())
+ if (IsUnderPostmaster &&
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+ count++ % 1024 == 0 &&
+#endif
+ !PostmasterIsAlive())
That's pretty hack-ish, but still efficient. Could we consider a
different approach like something relying on
PostmasterIsAliveInternal() with repetitive call handling? This may
not be the only place where we care about that, particularly for
non-core code.
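(One way to read that suggestion: move the rate limiting into the
pmsignal.h wrapper itself. A sketch only; the counter, the 1024
threshold, and the per-translation-unit static are all illustrative:)

#ifndef USE_POSTMASTER_DEATH_SIGNAL
static inline bool
PostmasterIsAlive(void)
{
    static uint32 calls = 0;    /* note: one counter per translation unit */

    /* Only actually read the postmaster pipe on every 1024th call. */
    if (calls++ % 1024 != 0)
        return true;
    return PostmasterIsAliveInternal();
}
#endif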
No objections to the two changes from pg_usleep() to WaitLatch(), so
they could be applied separately first.
--
Michael
On Mon, Nov 16, 2020 at 8:56 PM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Sep 24, 2020 at 05:55:17PM +1200, Thomas Munro wrote:
Right, RestoreArchivedFile() uses system(), so I guess it can hang
around for a long time after unexpected postmaster exit on every OS if
the command waits. To respond to various kinds of important
interrupts, I suppose that'd ideally use something like
OpenPipeStream() and a typical WaitLatch() loop with CFI(). I'm not
sure what our policy is or should be for exiting while we have running
subprocesses. I guess that is a separate issue.
- if (IsUnderPostmaster && !PostmasterIsAlive())
+ if (IsUnderPostmaster &&
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+ count++ % 1024 == 0 &&
+#endif
+ !PostmasterIsAlive())
That's pretty hack-ish, but still efficient. Could we consider a
different approach like something relying on
PostmasterIsAliveInternal() with repetitive call handling? This may
not be the only place where we care about that, particularly for
non-core code.
As far as I know there aren't any other places that poll
PostmasterIsAlive() in a loop like this. All others have been removed
from core code: either they already had a WaitLatch() or similar, so we
just had to add WL_EXIT_ON_PM_DEATH, or they are pure CPU-bound loops
that don't actively try to detect postmaster death. That's why it seems
utterly insane that here we try to detect it X million times per
second.
No objections to the two changes from pg_usleep() to WaitLatch(), so
they could be applied separately first.
I thought about committing that first part, and got as far as
splitting the patch into two (see attached), but then I re-read
Fujii-san's message about the speed of promotion and realised that we
really should have something like a condition variable for walRcvState
changes. I'll look into that.
Attachments:
v5-0001-Replace-some-sleep-poll-loops-with-WaitLatch.patch (text/x-patch)
From f90c4b58989168a1fe5a2dd48181444a18351bfb Mon Sep 17 00:00:00 2001
From: Thomas Munro <tmunro@postgresql.org>
Date: Mon, 1 Mar 2021 23:08:55 +1300
Subject: [PATCH v5 1/2] Replace some sleep/poll loops with WaitLatch().
Although the replaced sleeps are fairly short, a proposed patch to
reduce the frequency of reads of the postmaster pipe could cause the
loops to take a very long time to react to postmaster exit. Instead,
use the standard waiting infrastructure (with no latch), which can exit
automatically based on readiness.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
---
doc/src/sgml/monitoring.sgml | 4 ++++
src/backend/access/transam/xlog.c | 7 +++----
src/backend/postmaster/pgstat.c | 3 +++
src/backend/replication/walreceiverfuncs.c | 6 ++++--
src/include/pgstat.h | 1 +
5 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3513e127b7..cf00210cb3 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1757,6 +1757,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting for confirmation from a remote server during synchronous
replication.</entry>
</row>
+ <row>
+ <entry><literal>WalrcvExit</literal></entry>
+ <entry>Waiting for the walreceiver to exit.</entry>
+ </row>
<row>
<entry><literal>XactGroupUpdate</literal></entry>
<entry>Waiting for the group leader to update transaction status at
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 377afb8732..e63e19eed3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6017,7 +6017,7 @@ recoveryStopsAfter(XLogReaderState *record)
* the paused state starts at the end of recovery because of
* recovery_target_action=pause, and false otherwise.
*
- * XXX Could also be done with shared latch, avoiding the pg_usleep loop.
+ * XXX Could also be done with shared latch, avoiding the WL_TIMEOUT loop.
* Probably not worth the trouble though. This state shouldn't be one that
* anyone cares about server power consumption in.
*/
@@ -6046,9 +6046,8 @@ recoveryPausesHere(bool endOfRecovery)
HandleStartupProcInterrupts();
if (CheckForStandbyTrigger())
return;
- pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
- pg_usleep(1000000L); /* 1000 ms */
- pgstat_report_wait_end();
+ (void) WaitLatch(NULL, WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, 1000,
+ WAIT_EVENT_RECOVERY_PAUSE);
}
}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f75b52719d..2dbfb81e40 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4122,6 +4122,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
case WAIT_EVENT_SYNC_REP:
event_name = "SyncRep";
break;
+ case WAIT_EVENT_WALRCV_EXIT:
+ event_name = "WalrcvExit";
+ break;
case WAIT_EVENT_XACT_GROUP_UPDATE:
event_name = "XactGroupUpdate";
break;
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index 63e60478ea..f51dd2fa20 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -23,8 +23,10 @@
#include <signal.h>
#include "access/xlog_internal.h"
+#include "pgstat.h"
#include "postmaster/startup.h"
#include "replication/walreceiver.h"
+#include "storage/latch.h"
#include "storage/pmsignal.h"
#include "storage/shmem.h"
#include "utils/timestamp.h"
@@ -208,8 +210,8 @@ ShutdownWalRcv(void)
* process.
*/
HandleStartupProcInterrupts();
-
- pg_usleep(100000); /* 100ms */
+ (void) WaitLatch(NULL, WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, 100,
+ WAIT_EVENT_WALRCV_EXIT);
}
}
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 724068cf87..222e38037c 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -998,6 +998,7 @@ typedef enum
WAIT_EVENT_REPLICATION_SLOT_DROP,
WAIT_EVENT_SAFE_SNAPSHOT,
WAIT_EVENT_SYNC_REP,
+ WAIT_EVENT_WALRCV_EXIT,
WAIT_EVENT_XACT_GROUP_UPDATE
} WaitEventIPC;
--
2.30.1
v5-0002-Poll-postmaster-less-frequently-in-recovery.patch (text/x-patch)
From 8988e877cbdc5a7f19f481352b72598aee7314c8 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 24 Sep 2020 17:37:54 +1200
Subject: [PATCH v5 2/2] Poll postmaster less frequently in recovery.
Since commits 9f095299 and f98b8476 we don't poll the postmaster
pipe at all during crash recovery on Linux and FreeBSD, but on other
operating systems we were still doing it for every WAL record. Do it
less frequently on operating systems where system calls are required, at
the cost of delaying exit a bit after postmaster death, to avoid
expensive system calls reported to slow down CPU-bound recovery by
as much as 10-30%.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi
---
src/backend/postmaster/startup.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index f781fdc6fc..682fc675ab 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -134,6 +134,10 @@ StartupRereadConfig(void)
void
HandleStartupProcInterrupts(void)
{
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+ static int count = 0;
+#endif
+
/*
* Process any requests or signals received recently.
*/
@@ -151,9 +155,17 @@ HandleStartupProcInterrupts(void)
/*
* Emergency bailout if postmaster has died. This is to avoid the
- * necessity for manual cleanup of all postmaster children.
+ * necessity for manual cleanup of all postmaster children. Do this less
+ * frequently on systems for which we don't have signals to make that
+ * cheap. Any loop that sleeps should be using WaitLatch or similar and
+ * handling postmaster death that way; the check here is intended only to
+ * deal with CPU-bound loops such as the main redo loop.
*/
- if (IsUnderPostmaster && !PostmasterIsAlive())
+ if (IsUnderPostmaster &&
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+ count++ % 1024 == 0 &&
+#endif
+ !PostmasterIsAlive())
exit(1);
/* Process barrier events */
--
2.30.1
On Tue, Mar 2, 2021 at 12:00 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Mon, Nov 16, 2020 at 8:56 PM Michael Paquier <michael@paquier.xyz> wrote:
No objections to the two changes from pg_usleep() to WaitLatch(), so
they could be applied separately first.
I thought about committing that first part, and got as far as
splitting the patch into two (see attached), but then I re-read
Fujii-san's message about the speed of promotion and realised that we
really should have something like a condition variable for walRcvState
changes. I'll look into that.
Here's an experimental attempt at that, though I'm not sure if it's
the right approach. Of course it's not necessary to use condition
variables here: we could use recoveryWakeupLatch, because we're not in
any doubt about who needs to be woken up. But then you could get
constant wakeups while recovery is paused, unless you also suppressed
that somehow. You could use the startup process's procLatch,
advertised in shmem, but that's almost a condition variable. With a
condition variable, you get to name it something like
walRcvStateChanged, and then the programming rule is very clear: if
you change walRcvState, you need to broadcast that fact (and you don't
have to worry about who might be interested). One question I haven't
got to the bottom of: is it a problem for the startup process that CVs
use CHECK_FOR_INTERRUPTS()?
Attachments:
v6-0003-Poll-postmaster-less-frequently-in-recovery.patch (text/x-patch)
From 1dfc8a1b25b33c4d1115bcea68ebe8732d56ba93 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 24 Sep 2020 17:37:54 +1200
Subject: [PATCH v6 3/3] Poll postmaster less frequently in recovery.
Since commits 9f095299 and f98b8476 we don't poll the postmaster
pipe at all during crash recovery on Linux and FreeBSD, but on other
operating systems we were still doing it for every WAL record. Do it
less frequently on operating systems where system calls are required, at
the cost of delaying exit a bit after postmaster death, to avoid
expensive system calls reported to slow down CPU-bound recovery by
as much as 10-30%.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi
---
src/backend/postmaster/startup.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index f781fdc6fc..682fc675ab 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -134,6 +134,10 @@ StartupRereadConfig(void)
void
HandleStartupProcInterrupts(void)
{
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+ static int count = 0;
+#endif
+
/*
* Process any requests or signals received recently.
*/
@@ -151,9 +155,17 @@ HandleStartupProcInterrupts(void)
/*
* Emergency bailout if postmaster has died. This is to avoid the
- * necessity for manual cleanup of all postmaster children.
+ * necessity for manual cleanup of all postmaster children. Do this less
+ * frequently on systems for which we don't have signals to make that
+ * cheap. Any loop that sleeps should be using WaitLatch or similar and
+ * handling postmaster death that way; the check here is intended only to
+ * deal with CPU-bound loops such as the main redo loop.
*/
- if (IsUnderPostmaster && !PostmasterIsAlive())
+ if (IsUnderPostmaster &&
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+ count++ % 1024 == 0 &&
+#endif
+ !PostmasterIsAlive())
exit(1);
/* Process barrier events */
--
2.30.1
v6-0001-Add-condition-variable-for-walreceiver-state.patch (text/x-patch)
From 1e28b9c21a953e66c48f78a90192f3a0ca83d0aa Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 2 Mar 2021 11:08:06 +1300
Subject: [PATCH v6 1/3] Add condition variable for walreceiver state.
Use this new CV to wait for walreceiver shutdown without a sleep/poll
loop, while also benefiting from standard postmaster death handling.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
---
doc/src/sgml/monitoring.sgml | 4 +++
src/backend/postmaster/pgstat.c | 3 ++
src/backend/replication/walreceiver.c | 10 ++++++
src/backend/replication/walreceiverfuncs.c | 42 ++++++++++++++--------
src/include/pgstat.h | 1 +
src/include/replication/walreceiver.h | 2 ++
6 files changed, 48 insertions(+), 14 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3513e127b7..cf00210cb3 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1757,6 +1757,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting for confirmation from a remote server during synchronous
replication.</entry>
</row>
+ <row>
+ <entry><literal>WalrcvExit</literal></entry>
+ <entry>Waiting for the walreceiver to exit.</entry>
+ </row>
<row>
<entry><literal>XactGroupUpdate</literal></entry>
<entry>Waiting for the group leader to update transaction status at
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f75b52719d..2dbfb81e40 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4122,6 +4122,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
case WAIT_EVENT_SYNC_REP:
event_name = "SyncRep";
break;
+ case WAIT_EVENT_WALRCV_EXIT:
+ event_name = "WalrcvExit";
+ break;
case WAIT_EVENT_XACT_GROUP_UPDATE:
event_name = "XactGroupUpdate";
break;
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index e5f8a06fea..397a94d7af 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -207,6 +207,8 @@ WalReceiverMain(void)
case WALRCV_STOPPED:
SpinLockRelease(&walrcv->mutex);
+ /* We might have changed state and fallen through above. */
+ ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
proc_exit(1);
break;
@@ -249,6 +251,8 @@ WalReceiverMain(void)
SpinLockRelease(&walrcv->mutex);
+ ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
+
pg_atomic_write_u64(&WalRcv->writtenUpto, 0);
/* Arrange to clean up at walreceiver exit */
@@ -647,6 +651,8 @@ WalRcvWaitForStartPosition(XLogRecPtr *startpoint, TimeLineID *startpointTLI)
walrcv->receiveStartTLI = 0;
SpinLockRelease(&walrcv->mutex);
+ ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
+
set_ps_display("idle");
/*
@@ -675,6 +681,8 @@ WalRcvWaitForStartPosition(XLogRecPtr *startpoint, TimeLineID *startpointTLI)
*startpointTLI = walrcv->receiveStartTLI;
walrcv->walRcvState = WALRCV_STREAMING;
SpinLockRelease(&walrcv->mutex);
+
+ ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
break;
}
if (walrcv->walRcvState == WALRCV_STOPPING)
@@ -784,6 +792,8 @@ WalRcvDie(int code, Datum arg)
walrcv->latch = NULL;
SpinLockRelease(&walrcv->mutex);
+ ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
+
/* Terminate the connection gracefully. */
if (wrconn != NULL)
walrcv_disconnect(wrconn);
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index 63e60478ea..9106f43d51 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -23,8 +23,10 @@
#include <signal.h>
#include "access/xlog_internal.h"
+#include "pgstat.h"
#include "postmaster/startup.h"
#include "replication/walreceiver.h"
+#include "storage/latch.h"
#include "storage/pmsignal.h"
#include "storage/shmem.h"
#include "utils/timestamp.h"
@@ -62,6 +64,7 @@ WalRcvShmemInit(void)
/* First time through, so initialize */
MemSet(WalRcv, 0, WalRcvShmemSize());
WalRcv->walRcvState = WALRCV_STOPPED;
+ ConditionVariableInit(&WalRcv->walRcvStateChanged);
SpinLockInit(&WalRcv->mutex);
pg_atomic_init_u64(&WalRcv->writtenUpto, 0);
WalRcv->latch = NULL;
@@ -95,12 +98,18 @@ WalRcvRunning(void)
if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
{
- SpinLockAcquire(&walrcv->mutex);
+ bool state_changed = false;
+ SpinLockAcquire(&walrcv->mutex);
if (walrcv->walRcvState == WALRCV_STARTING)
+ {
state = walrcv->walRcvState = WALRCV_STOPPED;
-
+ state_changed = true;
+ }
SpinLockRelease(&walrcv->mutex);
+
+ if (state_changed)
+ ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
}
}
@@ -140,12 +149,18 @@ WalRcvStreaming(void)
if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
{
- SpinLockAcquire(&walrcv->mutex);
+ bool state_changed = false;
+ SpinLockAcquire(&walrcv->mutex);
if (walrcv->walRcvState == WALRCV_STARTING)
+ {
state = walrcv->walRcvState = WALRCV_STOPPED;
-
+ state_changed = true;
+ }
SpinLockRelease(&walrcv->mutex);
+
+ if (state_changed)
+ ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
}
}
@@ -191,6 +206,8 @@ ShutdownWalRcv(void)
}
SpinLockRelease(&walrcv->mutex);
+ ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
+
/*
* Signal walreceiver process if it was still running.
*/
@@ -199,18 +216,13 @@ ShutdownWalRcv(void)
/*
* Wait for walreceiver to acknowledge its death by setting state to
- * WALRCV_STOPPED.
+ * WALRCV_STOPPED. This wait contains a standard CHECK_FOR_INTERRUPTS().
*/
+ ConditionVariablePrepareToSleep(&walrcv->walRcvStateChanged);
while (WalRcvRunning())
- {
- /*
- * This possibly-long loop needs to handle interrupts of startup
- * process.
- */
- HandleStartupProcInterrupts();
-
- pg_usleep(100000); /* 100ms */
- }
+ ConditionVariableSleep(&walrcv->walRcvStateChanged,
+ WAIT_EVENT_WALRCV_EXIT);
+ ConditionVariableCancelSleep();
}
/*
@@ -298,6 +310,8 @@ RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
SpinLockRelease(&walrcv->mutex);
+ ConditionVariableBroadcast(&walrcv->walRcvStateChanged);
+
if (launch)
SendPostmasterSignal(PMSIGNAL_START_WALRECEIVER);
else if (latch)
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 724068cf87..222e38037c 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -998,6 +998,7 @@ typedef enum
WAIT_EVENT_REPLICATION_SLOT_DROP,
WAIT_EVENT_SAFE_SNAPSHOT,
WAIT_EVENT_SYNC_REP,
+ WAIT_EVENT_WALRCV_EXIT,
WAIT_EVENT_XACT_GROUP_UPDATE
} WaitEventIPC;
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index a97a59a6a3..bddea21a30 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -19,6 +19,7 @@
#include "port/atomics.h"
#include "replication/logicalproto.h"
#include "replication/walsender.h"
+#include "storage/condition_variable.h"
#include "storage/latch.h"
#include "storage/spin.h"
#include "utils/tuplestore.h"
@@ -62,6 +63,7 @@ typedef struct
*/
pid_t pid;
WalRcvState walRcvState;
+ ConditionVariable walRcvStateChanged;
pg_time_t startTime;
/*
--
2.30.1
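The protocol this patch introduces, distilled to a sketch (names are as in the patch above; this is a summary, not code from the patch):

    /* Writer side: change walRcvState under the spinlock, then broadcast. */
    SpinLockAcquire(&walrcv->mutex);
    walrcv->walRcvState = WALRCV_STOPPED;
    SpinLockRelease(&walrcv->mutex);
    ConditionVariableBroadcast(&walrcv->walRcvStateChanged);

    /* Reader side: sleep until the state we are waiting for is reached. */
    ConditionVariablePrepareToSleep(&walrcv->walRcvStateChanged);
    while (WalRcvRunning())
        ConditionVariableSleep(&walrcv->walRcvStateChanged,
                               WAIT_EVENT_WALRCV_EXIT);
    ConditionVariableCancelSleep();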
v6-0002-Add-condition-variable-for-recovery-pause-resume.patchtext/x-patch; charset=US-ASCII; name=v6-0002-Add-condition-variable-for-recovery-pause-resume.patchDownload
From f03e945f140d52a9e4910ab0d66a01d659c43c9f Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 2 Mar 2021 11:47:39 +1300
Subject: [PATCH v6 2/3] Add condition variable for recovery pause/resume.
This gives us a fast reaction time when recovery is resumed or the
postmaster exits. Unfortunately we still need to wake up every second
to perform more expensive checks during the recovery pause loop.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
---
src/backend/access/transam/xlog.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 377afb8732..17240e9b8f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -723,6 +723,7 @@ typedef struct XLogCtlData
TimestampTz currentChunkStartTime;
/* Are we requested to pause recovery? */
bool recoveryPause;
+ ConditionVariable recoveryPauseChanged;
/*
* lastFpwDisableRecPtr points to the start of the last replayed
@@ -5204,6 +5205,7 @@ XLOGShmemInit(void)
SpinLockInit(&XLogCtl->info_lck);
SpinLockInit(&XLogCtl->ulsn_lck);
InitSharedLatch(&XLogCtl->recoveryWakeupLatch);
+ ConditionVariableInit(&XLogCtl->recoveryPauseChanged);
}
/*
@@ -6016,10 +6018,6 @@ recoveryStopsAfter(XLogReaderState *record)
* endOfRecovery is true if the recovery target is reached and
* the paused state starts at the end of recovery because of
* recovery_target_action=pause, and false otherwise.
- *
- * XXX Could also be done with shared latch, avoiding the pg_usleep loop.
- * Probably not worth the trouble though. This state shouldn't be one that
- * anyone cares about server power consumption in.
*/
static void
recoveryPausesHere(bool endOfRecovery)
@@ -6041,15 +6039,22 @@ recoveryPausesHere(bool endOfRecovery)
(errmsg("recovery has paused"),
errhint("Execute pg_wal_replay_resume() to continue.")));
+ ConditionVariablePrepareToSleep(&XLogCtl->recoveryPauseChanged);
while (RecoveryIsPaused())
{
HandleStartupProcInterrupts();
if (CheckForStandbyTrigger())
- return;
- pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
- pg_usleep(1000000L); /* 1000 ms */
- pgstat_report_wait_end();
+ break;
+
+ /*
+ * We wait on a condition variable that will wake us as soon as the
+ * pause ends, but we use a timeout so we can check the above exit
+ * condition periodically too.
+ */
+ ConditionVariableTimedSleep(&XLogCtl->recoveryPauseChanged, 1000,
+ WAIT_EVENT_RECOVERY_PAUSE);
}
+ ConditionVariableCancelSleep();
}
bool
@@ -6070,6 +6075,8 @@ SetRecoveryPause(bool recoveryPause)
SpinLockAcquire(&XLogCtl->info_lck);
XLogCtl->recoveryPause = recoveryPause;
SpinLockRelease(&XLogCtl->info_lck);
+
+ ConditionVariableBroadcast(&XLogCtl->recoveryPauseChanged);
}
/*
--
2.30.1
On Tue, Mar 2, 2021 at 2:10 PM Thomas Munro <thomas.munro@gmail.com> wrote:
... One question I haven't
got to the bottom of: is it a problem for the startup process that CVs
use CHECK_FOR_INTERRUPTS()?
This was a red herring. The startup process already reaches CFI() via
various paths, as I figured out pretty quickly with a debugger. So
I'd like to go ahead and commit these patches.
Michael, when you said "That's pretty hack-ish, still efficient" in
reference to this code:
- if (IsUnderPostmaster && !PostmasterIsAlive())
+ if (IsUnderPostmaster &&
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+ count++ % 1024 == 0 &&
+#endif
+ !PostmasterIsAlive())
Is that an objection, and do you see a specific better way?
I know that someone just needs to write a Windows patch to get us a
postmaster death signal when the postmaster's event fires, and then
the problem will go away on Windows. I still want this change,
because we don't have such a patch yet, and even when someone writes
that, there are still a couple of Unixes that could benefit.
On Thu, Mar 11, 2021 at 04:37:39PM +1300, Thomas Munro wrote:
Michael, when you said "That's pretty hack-ish, still efficient" in
reference to this code:
- if (IsUnderPostmaster && !PostmasterIsAlive())
+ if (IsUnderPostmaster &&
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+ count++ % 1024 == 0 &&
+#endif
+ !PostmasterIsAlive())
Is that an objection, and do you see a specific better way?
I'd like to believe that there are more elegant ways to write that,
but based on the numbers you are giving, there is too much gain here
to ignore it. I would avoid 1024 as a hardcoded value though, so you
could just stick that in a #define or such. So please feel free to go
ahead. Thanks for asking.
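A minimal sketch of that suggestion (the macro and counter names here are illustrative, not necessarily what was committed):

    #define POSTMASTER_POLL_RATE_LIMIT 1024

    /* In HandleStartupProcInterrupts(); sketch only. */
    #ifndef USE_POSTMASTER_DEATH_SIGNAL
        static uint32 postmaster_poll_count = 0;
    #endif

        if (IsUnderPostmaster &&
    #ifndef USE_POSTMASTER_DEATH_SIGNAL
            postmaster_poll_count++ % POSTMASTER_POLL_RATE_LIMIT == 0 &&
    #endif
            !PostmasterIsAlive())
            exit(1);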
I know that someone just needs to write a Windows patch to get us a
postmaster death signal when the postmaster's event fires, and then
the problem will go away on Windows. I still want this change,
because we don't have such a patch yet, and even when someone writes
that, there are still a couple of Unixes that could benefit.
Wow. This probably means that we would be able to get rid of
USE_POSTMASTER_DEATH_SIGNAL?
--
Michael
On Thu, Mar 11, 2021 at 7:34 PM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Mar 11, 2021 at 04:37:39PM +1300, Thomas Munro wrote:
Michael, when you said "That's pretty hack-ish, still efficient" in
reference to this code:
- if (IsUnderPostmaster && !PostmasterIsAlive())
+ if (IsUnderPostmaster &&
+#ifndef USE_POSTMASTER_DEATH_SIGNAL
+ count++ % 1024 == 0 &&
+#endif
+ !PostmasterIsAlive())
Is that an objection, and do you see a specific better way?
I'd like to believe that there are more elegant ways to write that,
but based on the numbers you are giving, there is too much gain here
to ignore it. I would avoid 1024 as a hardcoded value though, so you
could just stick that in a #define or such. So please feel free to go
ahead. Thanks for asking.
Thanks! I rebased over the recent recovery pause/resume state
management change and simplified the walRcvState patch a bit (no need
to broadcast for every state change, just the changes to STOPPED
state). So that gets us to the point where there are no loops with
HandleStartupProcInterrupts() and a sleep in them (that'd be bad, it'd
take a long time to notice the postmaster going away if it only checks
every 1024 loops; all loops that sleep need to be using the latch
infrastructure so they can notice the postmaster exiting immediately).
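(The latch-based pattern referred to here looks roughly like the
following sketch; the exit condition is a placeholder. The point is
that WL_EXIT_ON_PM_DEATH makes the wait itself terminate the process
promptly if the postmaster dies, so no PostmasterIsAlive() polling is
needed.)

    for (;;)
    {
        if (condition_satisfied())      /* hypothetical wait-over test */
            break;

        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                         1000L /* ms */,
                         WAIT_EVENT_RECOVERY_PAUSE);
        ResetLatch(MyLatch);
    }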
Then I turned that 1024 into a macro as you suggested for the main
patch, and pushed.
It looks like RecoveryRequiresIntParameter() should be sharing code
with recoveryPausesHere(), but I didn't try to do that in this commit.
I know that someone just needs to write a Windows patch to get us a
postmaster death signal when the postmaster's event fires, and then
the problem will go away on Windows. I still want this change,
because we don't have such a patch yet, and even when someone writes
that, there are still a couple of Unixes that could benefit.
Wow. This probably means that we would be able to get rid of
USE_POSTMASTER_DEATH_SIGNAL?
We'll still need it, because there'd still be systems with no signal:
NetBSD, OpenBSD, AIX, HPUX, illumos.
On 2021/03/02 10:10, Thomas Munro wrote:
On Tue, Mar 2, 2021 at 12:00 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Mon, Nov 16, 2020 at 8:56 PM Michael Paquier <michael@paquier.xyz> wrote:
No objections with the two changes from pg_usleep() to WaitLatch() so
they could be applied separately first.
I thought about committing that first part, and got as far as
splitting the patch into two (see attached), but then I re-read
Fujii-san's message about the speed of promotion and realised that we
really should have something like a condition variable for walRcvState
changes. I'll look into that.
Here's an experimental attempt at that, though I'm not sure if it's
the right approach. Of course it's not necessary to use condition
variables here: we could use recoveryWakeupLatch, because we're not in
any doubt about who needs to be woken up. But then you could get
constant wakeups while recovery is paused, unless you also suppressed
that somehow. You could use the startup process's procLatch,
advertised in shmem, but that's almost a condition variable. With a
condition variable, you get to name it something like
walRcvStateChanged, and then the programming rule is very clear: if
you change walRcvState, you need to broadcast that fact (and you don't
have to worry about who might be interested). One question I haven't
got to the bottom of: is it a problem for the startup process that CVs
use CHECK_FOR_INTERRUPTS()?
I found that the 0001 patch was committed in de829ddf23, which added the
new wait event WalrcvExit. This name is not consistent with other wait
events. I think it would be better to rename it to WalReceiverExit. Thoughts?
Patch attached.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
rename_walrcvexit_wait_event.patchtext/plain; charset=UTF-8; name=rename_walrcvexit_wait_event.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 19540206f9..43c07da20e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1763,8 +1763,8 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
replication.</entry>
</row>
<row>
- <entry><literal>WalrcvExit</literal></entry>
- <entry>Waiting for the walreceiver to exit.</entry>
+ <entry><literal>WalReceiverExit</literal></entry>
+ <entry>Waiting for the WAL receiver to exit.</entry>
</row>
<row>
<entry><literal>WalReceiverWaitStart</literal></entry>
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b7af7c2707..60f45ccc4e 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4121,8 +4121,8 @@ pgstat_get_wait_ipc(WaitEventIPC w)
case WAIT_EVENT_SYNC_REP:
event_name = "SyncRep";
break;
- case WAIT_EVENT_WALRCV_EXIT:
- event_name = "WalrcvExit";
+ case WAIT_EVENT_WAL_RECEIVER_EXIT:
+ event_name = "WalReceiverExit";
break;
case WAIT_EVENT_WAL_RECEIVER_WAIT_START:
event_name = "WalReceiverWaitStart";
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index fff6c54c45..6f0acbfdef 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -224,7 +224,7 @@ ShutdownWalRcv(void)
ConditionVariablePrepareToSleep(&walrcv->walRcvStoppedCV);
while (WalRcvRunning())
ConditionVariableSleep(&walrcv->walRcvStoppedCV,
- WAIT_EVENT_WALRCV_EXIT);
+ WAIT_EVENT_WAL_RECEIVER_EXIT);
ConditionVariableCancelSleep();
}
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 2c82313550..87672e6f30 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1008,7 +1008,7 @@ typedef enum
WAIT_EVENT_REPLICATION_SLOT_DROP,
WAIT_EVENT_SAFE_SNAPSHOT,
WAIT_EVENT_SYNC_REP,
- WAIT_EVENT_WALRCV_EXIT,
+ WAIT_EVENT_WAL_RECEIVER_EXIT,
WAIT_EVENT_WAL_RECEIVER_WAIT_START,
WAIT_EVENT_XACT_GROUP_UPDATE
} WaitEventIPC;
On Tue, Mar 23, 2021 at 2:44 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
I found that the 0001 patch was committed in de829ddf23, which added the
new wait event WalrcvExit. This name is not consistent with other wait
events. I think it would be better to rename it to WalReceiverExit. Thoughts?
Patch attached.
Agreed, your names are better. Thanks.
On 2021/03/23 10:52, Thomas Munro wrote:
On Tue, Mar 23, 2021 at 2:44 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
I found that the 0001 patch was committed in de829ddf23, which added the
new wait event WalrcvExit. This name is not consistent with other wait
events. I think it would be better to rename it to WalReceiverExit. Thoughts?
Patch attached.
Agreed, your names are better. Thanks.
Thanks! I will commit the patch.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2021/03/23 14:49, Fujii Masao wrote:
On 2021/03/23 10:52, Thomas Munro wrote:
On Tue, Mar 23, 2021 at 2:44 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
I found that the 0001 patch was committed in de829ddf23, which added the
new wait event WalrcvExit. This name is not consistent with other wait
events. I think it would be better to rename it to WalReceiverExit. Thoughts?
Patch attached.
Agreed, your names are better. Thanks.
Thanks! I will commit the patch.
Pushed. Thanks!
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Fri, Mar 12, 2021 at 7:55 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Thu, Mar 11, 2021 at 7:34 PM Michael Paquier <michael@paquier.xyz> wrote:
Wow. This probably means that we would be able to get rid of
USE_POSTMASTER_DEATH_SIGNAL?
We'll still need it, because there'd still be systems with no signal:
NetBSD, OpenBSD, AIX, HPUX, illumos.
Erm, and of course macOS.
There is actually something we could do here: ioctl(I_SETSIG) for
SysV-derived systems and fcntl(O_ASYNC) for BSD-derived systems seems
like a promising way to get a SIGIO signal when the postmaster goes
away and the pipe becomes readable. Previously I'd discounted this,
because it's not in POSIX and I doubted it would work well on other
systems. But I was flicking through Stevens' UNIX book while trying
to figure out that POLLHUP stuff from a nearby thread (though it's
useless for that purpose) and I learned from section 12.6 that SIGIO
is fairly ancient, originating in 4.2BSD, adopted by SVR4, so it's
likely present in quite a few systems, maybe even all of our support
platforms if we're prepared to do it two different ways. Just a
thought.
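A self-contained illustration of the O_ASYNC idea (not from the patch;
a minimal sketch, assuming a BSD-derived or Linux system where O_ASYNC
works on pipes):

    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static volatile sig_atomic_t parent_gone = 0;

    static void
    sigio_handler(int signo)
    {
        parent_gone = 1;
    }

    int
    main(void)
    {
        int         fds[2];
        sigset_t    mask, oldmask;

        if (pipe(fds) < 0)
            exit(1);

        /* Block SIGIO so we can wait for it race-free with sigsuspend(). */
        sigemptyset(&mask);
        sigaddset(&mask, SIGIO);
        sigprocmask(SIG_BLOCK, &mask, &oldmask);
        signal(SIGIO, sigio_handler);

        if (fork() == 0)
        {
            /* Child: watch the read end; parent holds the only write end. */
            close(fds[1]);
            if (fcntl(fds[0], F_SETOWN, getpid()) < 0 ||
                fcntl(fds[0], F_SETFL, O_ASYNC | O_NONBLOCK) < 0)
                exit(1);
            while (!parent_gone)
                sigsuspend(&oldmask);   /* SIGIO fires when the pipe closes */
            printf("parent exited\n");
            return 0;
        }

        /* Parent: keep the write end open briefly, then exit. */
        close(fds[0]);
        sleep(1);
        return 0;
    }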
On Wed, Mar 31, 2021 at 7:02 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Fri, Mar 12, 2021 at 7:55 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Thu, Mar 11, 2021 at 7:34 PM Michael Paquier <michael@paquier.xyz> wrote:
Wow. This probably means that we would be able to get rid of
USE_POSTMASTER_DEATH_SIGNAL?
We'll still need it, because there'd still be systems with no signal:
NetBSD, OpenBSD, AIX, HPUX, illumos.
Erm, and of course macOS.
There is actually something we could do here: ioctl(I_SETSIG) for
SysV-derived systems and fcntl(O_ASYNC) for BSD-derived systems seems
like a promising way to get a SIGIO signal when the postmaster goes
away and the pipe becomes readable. Previously I'd discounted this,
because it's not in POSIX and I doubted it would work well on other
systems. But I was flicking through Stevens' UNIX book while trying
to figure out that POLLHUP stuff from a nearby thread (though it's
useless for that purpose) and I learned from section 12.6 that SIGIO
is fairly ancient, originating in 4.2BSD, adopted by SVR4, so it's
likely present in quite a few systems, maybe even all of our support
platforms if we're prepared to do it two different ways. Just a
thought.
Alright, here's a proof-of-concept patch that does that. Adding to the next CF.
This seems to work on Linux, macOS, FreeBSD and OpenBSD (and I assume
any other BSD). Can anyone tell me if it works on illumos, AIX or
HPUX, and if not, how to fix it or disable the feature gracefully?
For now the patch assumes that if you have SIGIO then you can do this;
perhaps it should also test for O_ASYNC? Perhaps HPUX has the signal
but requires a different incantation with I_SETSIG?
Full disclosure: The reason for my interest in this subject is that I
have a work-in-progress patch set to make latches, locks and condition
variables more efficient using futexes on several OSes, but it needs a
signal to wake on postmaster death.
Attachments:
0001-Use-SIGIO-to-detect-postmaster-death.patchtext/x-patch; charset=US-ASCII; name=0001-Use-SIGIO-to-detect-postmaster-death.patchDownload
From acff1b99d464eca453cdde3f5ca8079c925e47ac Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 9 Apr 2021 16:00:07 +1200
Subject: [PATCH] Use SIGIO to detect postmaster death.
Previously we used the non-POSIX calls prctl/procctl to request a signal on
death of our parent process on Linux and FreeBSD. Instead, use O_ASYNC
to request SIGIO when the pipe is closed by the postmaster. The effect
is about the same, but it should work on many other Unix systems too.
It's not in POSIX either, so it remains optional.
Discussion: https://postgr.es/m/CA%2BhUKGJiejg%3DGPTkf3S53N0-Vr4fOQ4wev7DmAVVLHbh%3DO9%2Bdg%40mail.gmail.com
---
configure | 2 +-
configure.ac | 2 --
src/backend/storage/ipc/pmsignal.c | 41 ++++++++----------------------
src/include/pg_config.h.in | 6 -----
src/include/storage/pmsignal.h | 11 +-------
src/tools/msvc/Solution.pm | 2 --
6 files changed, 12 insertions(+), 52 deletions(-)
diff --git a/configure b/configure
index 70f4555264..9b69fb203a 100755
--- a/configure
+++ b/configure
@@ -13345,7 +13345,7 @@ $as_echo "#define HAVE_STDBOOL_H 1" >>confdefs.h
fi
-for ac_header in atomic.h copyfile.h execinfo.h getopt.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/event.h sys/ipc.h sys/prctl.h sys/procctl.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/uio.h sys/un.h termios.h ucred.h wctype.h
+for ac_header in atomic.h copyfile.h execinfo.h getopt.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/event.h sys/ipc.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/uio.h sys/un.h termios.h ucred.h wctype.h
do :
as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default"
diff --git a/configure.ac b/configure.ac
index ba67c95bcc..f08c60acd6 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1360,8 +1360,6 @@ AC_CHECK_HEADERS(m4_normalize([
sys/epoll.h
sys/event.h
sys/ipc.h
- sys/prctl.h
- sys/procctl.h
sys/pstat.h
sys/resource.h
sys/select.h
diff --git a/src/backend/storage/ipc/pmsignal.c b/src/backend/storage/ipc/pmsignal.c
index 280c2395c9..e9f5b4fc41 100644
--- a/src/backend/storage/ipc/pmsignal.c
+++ b/src/backend/storage/ipc/pmsignal.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include <fcntl.h>
#include <signal.h>
#include <unistd.h>
@@ -92,23 +93,6 @@ postmaster_death_handler(int signo)
{
postmaster_possibly_dead = true;
}
-
-/*
- * The available signals depend on the OS. SIGUSR1 and SIGUSR2 are already
- * used for other things, so choose another one.
- *
- * Currently, we assume that we can always find a signal to use. That
- * seems like a reasonable assumption for all platforms that are modern
- * enough to have a parent-death signaling mechanism.
- */
-#if defined(SIGINFO)
-#define POSTMASTER_DEATH_SIGNAL SIGINFO
-#elif defined(SIGPWR)
-#define POSTMASTER_DEATH_SIGNAL SIGPWR
-#else
-#error "cannot find a signal to use for postmaster death"
-#endif
-
#endif /* USE_POSTMASTER_DEATH_SIGNAL */
/*
@@ -405,21 +389,16 @@ void
PostmasterDeathSignalInit(void)
{
#ifdef USE_POSTMASTER_DEATH_SIGNAL
- int signum = POSTMASTER_DEATH_SIGNAL;
-
/* Register our signal handler. */
- pqsignal(signum, postmaster_death_handler);
-
- /* Request a signal on parent exit. */
-#if defined(PR_SET_PDEATHSIG)
- if (prctl(PR_SET_PDEATHSIG, signum) < 0)
- elog(ERROR, "could not request parent death signal: %m");
-#elif defined(PROC_PDEATHSIG_CTL)
- if (procctl(P_PID, 0, PROC_PDEATHSIG_CTL, &signum) < 0)
- elog(ERROR, "could not request parent death signal: %m");
-#else
-#error "USE_POSTMASTER_DEATH_SIGNAL set, but there is no mechanism to request the signal"
-#endif
+ pqsignal(SIGIO, postmaster_death_handler);
+
+ /* Ask for SIGIO if the pipe becomes readable, indicating it was closed. */
+ if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH],
+ F_SETOWN, MyProcPid) < 0)
+ elog(ERROR, "could not own postmaster pipe: %m");
+ if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH],
+ F_SETFL, O_ASYNC | O_NONBLOCK) < 0)
+ elog(ERROR, "could not enable SIGIO on postmaster pipe: %m");
/*
* Just in case the parent was gone already and we missed it, we'd better
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 783b8fc1ba..13f99b36f1 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -620,12 +620,6 @@
/* Define to 1 if you have the <sys/ipc.h> header file. */
#undef HAVE_SYS_IPC_H
-/* Define to 1 if you have the <sys/prctl.h> header file. */
-#undef HAVE_SYS_PRCTL_H
-
-/* Define to 1 if you have the <sys/procctl.h> header file. */
-#undef HAVE_SYS_PROCCTL_H
-
/* Define to 1 if you have the <sys/pstat.h> header file. */
#undef HAVE_SYS_PSTAT_H
diff --git a/src/include/storage/pmsignal.h b/src/include/storage/pmsignal.h
index 8ed4d87ae6..7ab00c4024 100644
--- a/src/include/storage/pmsignal.h
+++ b/src/include/storage/pmsignal.h
@@ -16,14 +16,6 @@
#include <signal.h>
-#ifdef HAVE_SYS_PRCTL_H
-#include "sys/prctl.h"
-#endif
-
-#ifdef HAVE_SYS_PROCCTL_H
-#include "sys/procctl.h"
-#endif
-
/*
* Reasons for signaling the postmaster. We can cope with simultaneous
* signals for different reasons. If the same reason is signaled multiple
@@ -83,8 +75,7 @@ extern void PostmasterDeathSignalInit(void);
* the parent dies. Checking the flag first makes PostmasterIsAlive() a lot
* cheaper in usual case that the postmaster is alive.
*/
-#if (defined(HAVE_SYS_PRCTL_H) && defined(PR_SET_PDEATHSIG)) || \
- (defined(HAVE_SYS_PROCCTL_H) && defined(PROC_PDEATHSIG_CTL))
+#if defined(SIGIO)
#define USE_POSTMASTER_DEATH_SIGNAL
#endif
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index d2bc7abef0..b4130dddaf 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -397,8 +397,6 @@ sub GenerateFiles
HAVE_SYS_EPOLL_H => undef,
HAVE_SYS_EVENT_H => undef,
HAVE_SYS_IPC_H => undef,
- HAVE_SYS_PRCTL_H => undef,
- HAVE_SYS_PROCCTL_H => undef,
HAVE_SYS_PSTAT_H => undef,
HAVE_SYS_RESOURCE_H => undef,
HAVE_SYS_SELECT_H => undef,
--
2.30.1
On 09/04/2021 07:01, Thomas Munro wrote:
On Wed, Mar 31, 2021 at 7:02 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Fri, Mar 12, 2021 at 7:55 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Thu, Mar 11, 2021 at 7:34 PM Michael Paquier <michael@paquier.xyz> wrote:
Wow. This probably means that we would be able to get rid of
USE_POSTMASTER_DEATH_SIGNAL?
We'll still need it, because there'd still be systems with no signal:
NetBSD, OpenBSD, AIX, HPUX, illumos.
Erm, and of course macOS.
There is actually something we could do here: ioctl(I_SETSIG) for
SysV-derived systems and fcntl(O_ASYNC) for BSD-derived systems seems
like a promising way to get a SIGIO signal when the postmaster goes
away and the pipe becomes readable. Previously I'd discounted this,
because it's not in POSIX and I doubted it would work well on other
systems. But I was flicking through Stevens' UNIX book while trying
to figure out that POLLHUP stuff from a nearby thread (though it's
useless for that purpose) and I learned from section 12.6 that SIGIO
is fairly ancient, originating in 4.2BSD, adopted by SVR4, so it's
likely present in quite a few systems, maybe even all of our support
platforms if we're prepared to do it two different ways. Just a
thought.
Alright, here's a proof-of-concept patch that does that. Adding to the next CF.
Looks good to me.
This seems to work on Linux, macOS, FreeBSD and OpenBSD (and I assume
any other BSD). Can anyone tell me if it works on illumos, AIX or
HPUX, and if not, how to fix it or disable the feature gracefully?
For now the patch assumes that if you have SIGIO then you can do this;
perhaps it should also test for O_ASYNC? Perhaps HPUX has the signal
but requires a different incantation with I_SETSIG?
I think it would be OK to just commit this (after REL_14_STABLE has been
created) and see what breaks in the buildfarm. Then we'll at least know
if we need more autoconf checks or something to disable this.
- Heikki
Heikki Linnakangas <hlinnaka@iki.fi> writes:
On 09/04/2021 07:01, Thomas Munro wrote:
This seems to work on Linux, macOS, FreeBSD and OpenBSD (and I assume
any other BSD). Can anyone tell me if it works on illumos, AIX or
HPUX, and if not, how to fix it or disable the feature gracefully?
For now the patch assumes that if you have SIGIO then you can do this;
perhaps it should also test for O_ASYNC? Perhaps HPUX has the signal
but requires a different incantation with I_SETSIG?
I took a look on HPUX 10.20 (gaur's host):
* SIGIO exists, but signal.h only defines it with
-D_INCLUDE_HPUX_SOURCE which we don't use.
* I found I_SETSIG, but ...
$ grep -r SETSIG /usr/include
/usr/include/sys/stropts.h:#define I_SETSIG _IO('S', 9) /* request SIGPOLL signal on events */
stropts.h seems to be for a feature called "streams", which is
probably nonstandard enough that we don't want to deal with it.
So I think the short answer on this platform is that if you conditionalize
on #ifdef SIGIO then it will just not do it, and we should be fine.
Can't say about HPUX 11.
regards, tom lane
On Fri, Jun 11, 2021 at 1:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
On 09/04/2021 07:01, Thomas Munro wrote:
This seems to work on Linux, macOS, FreeBSD and OpenBSD (and I assume
any other BSD). Can anyone tell me if it works on illumos, AIX or
HPUX, and if not, how to fix it or disable the feature gracefully?
For now the patch assumes that if you have SIGIO then you can do this;
perhaps it should also test for O_ASYNC? Perhaps HPUX has the signal
but requires a different incantation with I_SETSIG?
I took a look on HPUX 10.20 (gaur's host):
Thanks both for looking!
Unfortunately I'll have to withdraw this patch in its current form.
On closer inspection, only the last backend to start up and do
F_SETOWN on the pipe receives the signal, presumably because F_SETOWN
records a single owner on the open file description that every backend
inherits, so each call overwrites the previous one. We'd probably have
to create a separate pipe for each backend, or something like that,
which seems unwarranted so far.
* I found I_SETSIG, but ...
$ grep -r SETSIG /usr/include
/usr/include/sys/stropts.h:#define I_SETSIG _IO('S', 9) /* request SIGPOLL signal on events */
stropts.h seems to be for a feature called "streams", which is
probably nonstandard enough that we don't want to deal with it.
Agreed. It is technically POSIX, but optional and marked obsolescent.
It's annoying to see that I_SETSIG did allow multiple processes to
register to receive signals for the same event on the same underlying
stream, unlike F_SETOWN. Oh well.
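(For completeness, the STREAMS variant mentioned above would look
something like this on a platform that has it; untested, illustrative
only:

    #include <stropts.h>

    /* Request SIGPOLL on input or hangup on a STREAMS file descriptor. */
    if (ioctl(fd, I_SETSIG, S_INPUT | S_HANGUP) < 0)
        elog(ERROR, "could not request SIGPOLL: %m");

Unlike F_SETOWN, each process issuing I_SETSIG registers itself
individually for the same stream, which is what made it attractive.)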